[![Jupyter Notebook](https://img.shields.io/badge/Jupyter%20Notebook-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/multimodal.ipynb)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/laminlabs/lamin-usecases/main?labpath=lamin-usecases%2Fdocs%2Fmultimodal.ipynb)

# Multi-modal

Single-cell experiments have moved beyond RNA alone and can also measure other modalities, such as chromatin accessibility, surface proteins, or adaptive immune receptors.
ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.

Here, we'll showcase how to curate and register ECCITE-seq data from [Papalexi21](https://www.nature.com/articles/s41592-019-0392-0) in the form of [MuData](https://github.com/scverse/mudata) objects.

## Setup

In [None]:
!lamin init --storage ./test-multimodal --schema bionty

In [None]:
import lamindb as ln
import lnschema_bionty as lb

lb.settings.species = "human"

In [None]:
ln.track()

## Papalexi21

Let's use a MuData object:

### Transform ![](https://img.shields.io/badge/Transform-10b981)

In [None]:
mdata = ln.dev.datasets.mudata_papalexi21_subset()
mdata

MuData objects build on top of AnnData objects to store and serialize multimodal data.
More information can be found in the [MuData documentation](https://mudata.readthedocs.io/en/latest/).
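
To make that structure concrete, here is a toy stand-in written in plain Python. It is not the MuData API: the class names `ToyModality` and `ToyMuData`, the helper `shared_cells`, and all cell and feature names are invented for illustration. It only mirrors the idea that each modality has its own feature axis (`var_names`) and can be accessed by key, as in `mdata["rna"]`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModality:
    """One modality: a cells x features matrix with named rows and columns."""

    obs_names: list  # cell barcodes (rows)
    var_names: list  # feature names (columns)
    X: list          # measurements, one inner list per cell

    def __post_init__(self):
        # Every cell needs exactly one value per feature.
        if len(self.X) != len(self.obs_names):
            raise ValueError("X needs one row per cell")
        if any(len(row) != len(self.var_names) for row in self.X):
            raise ValueError("each row of X needs one value per feature")


@dataclass
class ToyMuData:
    """A container of modalities addressed by name (hypothetical, not MuData)."""

    mod: dict = field(default_factory=dict)

    def __getitem__(self, key):
        return self.mod[key]

    def shared_cells(self):
        """Cells present in every modality, in the first modality's order."""
        modalities = list(self.mod.values())
        if not modalities:
            return []
        common = set(modalities[0].obs_names)
        for modality in modalities[1:]:
            common &= set(modality.obs_names)
        return [name for name in modalities[0].obs_names if name in common]


# Placeholder data: three cells with RNA counts, two of them also with ADT counts.
toy = ToyMuData(
    mod={
        "rna": ToyModality(
            obs_names=["cell_1", "cell_2", "cell_3"],
            var_names=["GENE_A", "GENE_B", "GENE_C"],
            X=[[0, 2, 1], [3, 0, 0], [1, 1, 4]],
        ),
        "adt": ToyModality(
            obs_names=["cell_1", "cell_2"],
            var_names=["PROTEIN_X", "PROTEIN_Y"],
            X=[[10, 0], [4, 7]],
        ),
    }
)

print(toy["rna"].var_names)
print(toy["adt"].var_names)
print(toy.shared_cells())
```

The point of the sketch is that the two modalities carry different kinds of features, which is why each one is validated against a different reference table later on.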

First, we register the file:

In [None]:
file = ln.File(
    "papalexi21_subset.h5mu", description="Sub-sampled MuData from Papalexi21"
)
file.save()

Now let's validate and register the three feature sets this data contains:
1. RNA (gene expression)
2. ADT (antibody-derived tags reflecting surface proteins)
3. obs (metadata)

For the two modalities `rna` and `adt`, we use Bionty tables as the reference:

### Validate ![](https://img.shields.io/badge/Validate-10b981)

In [None]:
mdata["rna"].var_names[:5]

In [None]:
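# Check which gene symbols already match records in the Gene registry
lb.Gene.validate(mdata["rna"].var_names, lb.Gene.symbol);

In [None]:
# Create Gene records for the symbols found in the reference and save them
genes = lb.Gene.from_values(mdata["rna"].var_names, lb.Gene.symbol)
ln.save(genes)

In [None]:
# Map synonyms to their standardized symbols
mdata["rna"].var_names = lb.Gene.standardize(mdata["rna"].var_names, lb.Gene.symbol)

In [None]:
# Boolean mask of the symbols that validate after standardization
validated = lb.Gene.validate(mdata["rna"].var_names, lb.Gene.symbol)

In [None]:
# Register the remaining, unvalidated symbols as new Gene records
new_genes = [lb.Gene(symbol=symbol) for symbol in mdata["rna"].var_names[~validated]]
ln.save(new_genes)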

In [None]:
lb.Gene.validate(mdata["rna"].var_names, lb.Gene.symbol);
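
The cells above follow a validate, standardize, register-the-rest pattern. The sketch below replays that pattern on a toy reference table in plain Python. It is an illustration under stated assumptions, not LaminDB's implementation: the `REFERENCE` table, the gene symbols, and the helper functions `validate`, `standardize`, and `register_new` are all invented here, and the toy collapses the distinction between the public reference and the local registry into a single dictionary.

```python
# Toy reference table: canonical symbol -> known synonyms (invented values).
REFERENCE = {
    "GENE1": {"G1", "OLDGENE1"},
    "GENE2": set(),
    "GENE3": {"G3"},
}


def validate(values, reference):
    """Return one boolean per value: is it a canonical symbol in the reference?"""
    return [value in reference for value in values]


def standardize(values, reference):
    """Replace known synonyms by their canonical symbol; leave other values as-is."""
    synonym_to_symbol = {
        synonym: symbol
        for symbol, synonyms in reference.items()
        for synonym in synonyms
    }
    return [synonym_to_symbol.get(value, value) for value in values]


def register_new(values, reference):
    """Add values to the reference as new canonical symbols without synonyms."""
    for value in values:
        reference.setdefault(value, set())


var_names = ["GENE1", "G3", "NOVEL1", "GENE2"]

# Step 1: the first pass flags the synonym and the unknown symbol.
first_pass = validate(var_names, REFERENCE)

# Step 2: standardizing resolves the synonym "G3" to "GENE3".
var_names = standardize(var_names, REFERENCE)
second_pass = validate(var_names, REFERENCE)

# Step 3: whatever still fails is registered as a new entry.
unvalidated = [v for v, ok in zip(var_names, second_pass) if not ok]
register_new(unvalidated, REFERENCE)
final_pass = validate(var_names, REFERENCE)

print(first_pass, second_pass, final_pass, sep="\n")
```

Standardizing before registering new entries matters: without it, the synonym would be registered as a separate, duplicate symbol instead of being mapped to the existing one.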

In [None]:
feature_set_rna = ln.FeatureSet.from_values(
    mdata["rna"].var_names, field=lb.Gene.symbol
)

In [None]:
mdata["adt"].var_names

In [None]:
lb.CellMarker.validate(mdata["adt"].var_names);

In [None]:
markers = lb.CellMarker.from_values(mdata["adt"].var_names)
ln.save(markers)

In [None]:
lb.CellMarker.validate(mdata["adt"].var_names);

### Register ![](https://img.shields.io/badge/Register-10b981)

In [None]:
feature_set_adt = ln.FeatureSet.from_values(
    mdata["adt"].var_names, field=lb.CellMarker.name
)

Link them to the file:

In [None]:
file.features.add_feature_set(feature_set_rna, slot="rna")
file.features.add_feature_set(feature_set_adt, slot="adt")

The third feature set is the `obs` metadata:

In [None]:
obs = mdata["rna"].obs

We're only interested in a single metadata column, `gene_target`:

In [None]:
ln.Feature(name="gene_target", type="category").save()

In [None]:
features = ln.Feature.from_df(obs)
ln.save(features)

In [None]:
feature_set_obs = ln.FeatureSet.from_df(obs)

In [None]:
file.features.add_feature_set(feature_set_obs, slot="obs")

In [None]:
gene_targets = lb.Gene.from_values(obs["gene_target"], lb.Gene.symbol)
ln.save(gene_targets)
features = ln.Feature.lookup()
file.labels.add(gene_targets, feature=features.gene_target)

In [None]:
nt = ln.ULabel(name="NT", description="Non-targeting control of perturbations")
nt.save()

In [None]:
file.labels.add(nt, feature=features.gene_target)
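
The `gene_target` column mixes two kinds of values: gene symbols, which can be matched against the gene registry, and the control label `NT`, which is not a gene and is therefore registered as a generic label. A minimal sketch of that split in plain Python follows. The helper `split_by_registry`, the set `known_symbols`, and the placeholder symbols are assumptions made for illustration and are not part of LaminDB.

```python
def split_by_registry(values, known_symbols):
    """Split unique values into those found in a registry and the rest.

    Order of first appearance is preserved in both output lists.
    """
    unique = list(dict.fromkeys(values))  # de-duplicate, keep order
    matched = [value for value in unique if value in known_symbols]
    unmatched = [value for value in unique if value not in known_symbols]
    return matched, unmatched


# Placeholder column content: perturbed genes plus the non-targeting control.
gene_target = ["GENE1", "NT", "GENE2", "NT", "GENE1"]
known_symbols = {"GENE1", "GENE2", "GENE3"}

gene_labels, other_labels = split_by_registry(gene_target, known_symbols)

print(gene_labels)   # values that would be linked as gene records
print(other_labels)  # values that need a generic label, here the control
```

Both groups end up linked to the file under the same feature, so a query on `gene_target` can return perturbed genes and the control alike.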

In [None]:
for col in ["orig.ident", "perturbation", "replicate", "Phase", "guide_ID"]:
    labels = [ln.ULabel(name=name) for name in obs[col].unique()]
    ln.save(labels)

None of these labels seem like something we'd want to validate or track for this file, so we save them as labels but don't link them to the file.

In [None]:
file.features

In [None]:
file.describe()

In [None]:
file.view_flow()

In [None]:
# clean up test instance
!lamin delete --force test-multimodal
!rm -r test-multimodal