[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/multimodal.ipynb)

# Multi-modal

This guide shows how to curate and register ECCITE-seq data from [Papalexi21](https://www.nature.com/articles/s41592-019-0392-0) as [MuData](https://github.com/scverse/mudata) objects.

ECCITE-seq measures single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.

[MuData objects](https://mudata.readthedocs.io) build on top of AnnData objects to store multimodal data.


In [None]:
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --modules bionty

In [None]:
import lamindb as ln
import bionty as bt

ln.track()

## Creating MuData Artifacts

lamindb provides the {meth}`~lamindb.Artifact.from_mudata` method to create an {class}`~lamindb.Artifact` from a MuData object.

In [None]:
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata

In [None]:
mdata_af = ln.Artifact.from_mudata(mdata, key="papalexi.h5mu")
mdata_af

In [None]:
# MuData Artifacts have the corresponding otype
mdata_af.otype

In [None]:
# load the MuData object behind the artifact back into memory
papalexi_in_memory = mdata_af.load()
papalexi_in_memory

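## Schema

A MuData schema is a composite: define the labels and one schema per slot first, then combine them into a single schema with `otype="MuData"`.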

In [None]:
# define labels
perturbation = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="Perturbed", type=perturbation).save()
ln.ULabel(name="NT", type=perturbation).save()

replicate = ln.ULabel(name="Replicate", is_type=True).save()
ln.ULabel(name="rep1", type=replicate).save()
ln.ULabel(name="rep2", type=replicate).save()
ln.ULabel(name="rep3", type=replicate).save()

# define obs schema
obs_schema = ln.Schema(
    name="mudata_papalexi21_subset_obs_schema",
    features=[
        ln.Feature(name="perturbation", dtype="cat[ULabel[Perturbation]]").save(),
        ln.Feature(name="replicate", dtype="cat[ULabel[Replicate]]").save(),
    ],
).save()

obs_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_obs_schema",
    features=[
        ln.Feature(name="nCount_RNA", dtype=int).save(),
        ln.Feature(name="nFeature_RNA", dtype=int).save(),
        ln.Feature(name="percent.mito", dtype=float).save(),
    ],
    coerce_dtype=True,
).save()

obs_schema_hto = ln.Schema(
    name="mudata_papalexi21_subset_hto_obs_schema",
    features=[
        ln.Feature(name="nCount_HTO", dtype=float).save(),
        ln.Feature(name="nFeature_HTO", dtype=int).save(),
        ln.Feature(name="technique", dtype=bt.ExperimentalFactor).save(),
    ],
    coerce_dtype=True,
).save()

var_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_var_schema",
    itype=bt.Gene.symbol,
    dtype=float,
).save()

# define composite schema
mudata_schema = ln.Schema(
    name="mudata_papalexi21_subset_mudata_schema",
    otype="MuData",
    slots={
        "obs": obs_schema,
        "rna:obs": obs_schema_rna,
        "hto:obs": obs_schema_hto,
        "rna:var": var_schema_rna,
    },
).save()
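
Two of the per-modality schemas above set `coerce_dtype=True`. Here is a minimal sketch of the idea, assuming the flag means "try to cast a column to the declared dtype before checking it". The `check_column` helper and the count values are made up for illustration and do not reproduce lamindb's validation code.

```python
def check_column(values, dtype, coerce=False):
    """Return the column if every value has the declared dtype.

    With coerce=True, values are first cast to the dtype, but only when the
    cast is lossless (e.g. 1200.0 -> 1200); otherwise a TypeError is raised.
    """
    if coerce:
        cast = [dtype(v) for v in values]
        if any(c != v for c, v in zip(cast, values)):
            raise TypeError(f"cannot losslessly coerce values to {dtype.__name__}")
        values = cast
    if not all(type(v) is dtype for v in values):
        raise TypeError(f"expected {dtype.__name__} values")
    return values


# Hypothetical count column that was stored as floats.
counts = [1200.0, 845.0, 3010.0]

# Without coercion, a float column does not satisfy an int feature.
try:
    check_column(counts, int)
except TypeError as error:
    print("strict check failed:", error)

# With coercion, the same column is cast and then passes the check.
print("coerced:", check_column(counts, int, coerce=True))
```

The lossless-cast guard is a design choice of this sketch: it keeps coercion from silently truncating values such as `1.5`.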

In [None]:
mudata_schema
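
The slot keys in the composite schema read as `modality:attribute`, with a bare `obs` pointing at the table shared across modalities. The sketch below illustrates that naming convention on a toy nested dictionary. `toy_mdata`, `split_slot`, and `resolve_slot` are hypothetical names written for this illustration and are not part of lamindb or mudata.

```python
# Toy stand-in for a MuData-like object: a global "obs" table plus
# per-modality "obs" tables, represented by their column names only.
toy_mdata = {
    "obs": ["perturbation", "replicate"],
    "mod": {
        "rna": {
            "obs": ["nCount_RNA", "nFeature_RNA", "percent.mito"],
            "var": ["<gene symbols>"],  # placeholder, not real data
        },
        "hto": {
            "obs": ["nCount_HTO", "nFeature_HTO", "technique"],
        },
    },
}


def split_slot(slot):
    """Split a slot key into (modality, attribute); modality is None for global slots."""
    modality, sep, attribute = slot.rpartition(":")
    return (modality if sep else None), attribute


def resolve_slot(container, slot):
    """Return the table that a slot key points at in the toy container."""
    modality, attribute = split_slot(slot)
    if modality is None:
        return container[attribute]
    return container["mod"][modality][attribute]


slot_keys = ["obs", "rna:obs", "hto:obs", "rna:var"]
resolved = {key: resolve_slot(toy_mdata, key) for key in slot_keys}
for key, columns in resolved.items():
    print(f"{key!r} -> {columns}")
```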

## Validate MuData annotations

In [None]:
curator = ln.curators.MuDataCurator(mdata, mudata_schema)

In [None]:
# the first pass is expected to fail; print the error instead of discarding it
try:
    curator.validate()
except ln.errors.ValidationError as error:
    print(error)

In [None]:
# map synonyms among the RNA variables to standardized gene symbols
curator.slots["rna:var"].cat.standardize("columns")

In [None]:
# register the symbols that are still not validated as new records
curator.slots["rna:var"].cat.add_new_from("columns")

In [None]:
curator.validate()
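
The cells above follow a validate, standardize, add new, validate cycle. As a rough sketch of that logic in plain Python, assume that standardization maps synonyms onto registered names and that any names still unknown afterwards get registered as new records. The registry, synonym table, and column names are toy values chosen for illustration, and the helpers only mimic the method names used above; they are not lamindb functions.

```python
# Toy registry and synonym table (illustrative values only).
registry = {"CD274", "PDCD1"}
synonyms = {"PD-L1": "CD274", "PD1": "PDCD1"}


def non_validated(names):
    """Names that are not in the registry, in their original order."""
    return [name for name in names if name not in registry]


def standardize(names):
    """Replace known synonyms with their registered names."""
    return [synonyms.get(name, name) for name in names]


def add_new_from(names):
    """Register every name that is still unknown."""
    registry.update(non_validated(names))


columns = ["CD274", "PD1", "NOVEL1"]

# 1. First validation pass: a synonym and an unknown name fail.
first_pass = non_validated(columns)

# 2. Standardize: the synonym is mapped onto its registered name.
columns = standardize(columns)
second_pass = non_validated(columns)

# 3. Add new: the remaining unknown name is registered.
add_new_from(columns)
third_pass = non_validated(columns)

print("before standardize:", first_pass)
print("after standardize:", second_pass)
print("after add_new_from:", third_pass)
```

Standardizing before adding new records matters in this sketch: running `add_new_from` first would register the synonym `PD1` as a separate entry instead of mapping it onto `PDCD1`.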

## Register curated Artifact

In [None]:
artifact = curator.save_artifact(key="mudata_papalexi21_subset.h5mu")

In [None]:
artifact.describe()

In [None]:
ln.finish()

In [None]:
# clean up test instance
!rm -r test-multimodal
!lamin delete --force test-multimodal