[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/multimodal.ipynb)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/laminlabs/lamin-usecases/main?labpath=lamin-usecases%2Fdocs%2Fmultimodal.ipynb)

# Multi-modal

```{warning}

This is, for now, just a stub.

```

Here, we'll show how to curate and register ECCITE-seq data from [Papalexi21](https://www.nature.com/articles/s41592-019-0392-0) in the form of [MuData](https://github.com/scverse/mudata) objects. ECCITE-seq profiles single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.


## Setup

In [None]:
!lamin init --storage ./test-multimodal --schema bionty

In [None]:
import lamindb as ln
import bionty as bt

bt.settings.organism = "human"

In [None]:
ln.settings.transform.stem_uid = "yMWSFirS6qv2"
ln.settings.transform.version = "0"
ln.track()

## Papalexi21

Let's use a MuData object:

### Transform ![](https://img.shields.io/badge/Transform-10b981)

In [None]:
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata

MuData objects build on top of AnnData objects to store and serialize multimodal data.
More information can be found in the [MuData documentation](https://mudata.readthedocs.io/en/latest/).
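
To make the layout concrete, here is a minimal plain-Python stand-in for the idea: each modality keeps its own feature (variable) names, while the observations (cells) are shared across modalities. The `ToyModality` and `ToyMuData` classes and all example names are hypothetical and only illustrate the concept; they are not the `mudata` API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModality:
    """Hypothetical stand-in for one modality (an AnnData-like table)."""

    obs_names: list  # cell barcodes
    var_names: list  # features measured in this modality


@dataclass
class ToyMuData:
    """Hypothetical stand-in for a multimodal container keyed by modality name."""

    mod: dict = field(default_factory=dict)

    def __getitem__(self, name):
        return self.mod[name]

    def shared_obs(self):
        """Return the cells present in every modality, in first-modality order."""
        modalities = list(self.mod.values())
        if not modalities:
            return []
        common = set(modalities[0].obs_names)
        for modality in modalities[1:]:
            common &= set(modality.obs_names)
        return [name for name in modalities[0].obs_names if name in common]


toy = ToyMuData(
    mod={
        "rna": ToyModality(["cell1", "cell2", "cell3"], ["GENE_A", "GENE_B"]),
        "adt": ToyModality(["cell1", "cell2", "cell3"], ["PROTEIN_X"]),
    }
)
print(toy["rna"].var_names)  # features of the RNA modality only
print(toy.shared_obs())      # cells measured in both modalities
```

Indexing by modality name mirrors how the notebook accesses `mdata["rna"]` and `mdata["adt"]`.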

First, we register the artifact:

In [None]:
artifact = ln.Artifact(
    "papalexi21_subset.h5mu", description="Sub-sampled MuData from Papalexi21"
)
artifact.save()

Now let's validate and register the three feature sets this data contains:

1. RNA (gene expression)
2. ADT (antibody-derived tags reflecting surface proteins)
3. obs (metadata)

For the two modalities `rna` and `adt`, we use bionty tables as the reference:

### Validate ![](https://img.shields.io/badge/Validate-10b981)

In [None]:
mdata["rna"].var_names[:5]

In [None]:
bt.Gene.validate(mdata["rna"].var_names, bt.Gene.symbol);

In [None]:
genes = bt.Gene.from_values(mdata["rna"].var_names, bt.Gene.symbol)
ln.save(genes)

In [None]:
mdata["rna"].var_names = bt.Gene.standardize(mdata["rna"].var_names, bt.Gene.symbol)

In [None]:
validated = bt.Gene.validate(mdata["rna"].var_names, bt.Gene.symbol)

In [None]:
new_genes = [bt.Gene(symbol=symbol) for symbol in mdata["rna"].var_names[~validated]]
ln.save(new_genes)

In [None]:
bt.Gene.validate(mdata["rna"].var_names, bt.Gene.symbol);
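
The validate, standardize, and register sequence in the cells above follows a general pattern that can be sketched without any database. The reference set, synonym map, helper functions, and gene symbols below are all hypothetical and made up for illustration; this is not how `bionty` implements these methods.

```python
# Toy reference registry: official symbols plus a synonym map (all hypothetical).
REFERENCE_SYMBOLS = {"GENEA", "GENEB", "GENEC"}
SYNONYMS = {"GA": "GENEA", "GB-OLD": "GENEB"}


def validate(values, reference):
    """Return one boolean per value: True if the value is in the reference."""
    return [value in reference for value in values]


def standardize(values, synonyms):
    """Map known synonyms to their official symbol; leave other values unchanged."""
    return [synonyms.get(value, value) for value in values]


var_names = ["GENEC", "GA", "GB-OLD", "NOVEL1"]

# 1. Validate the raw names: synonyms and unknown names fail.
raw_validated = validate(var_names, REFERENCE_SYMBOLS)

# 2. Standardize synonyms, then validate again: only the unknown name fails.
var_names = standardize(var_names, SYNONYMS)
validated = validate(var_names, REFERENCE_SYMBOLS)

# 3. Register whatever is still not validated as new reference entries.
new_symbols = [name for name, ok in zip(var_names, validated) if not ok]
REFERENCE_SYMBOLS.update(new_symbols)

# 4. A final validation pass now succeeds for every name.
print(all(validate(var_names, REFERENCE_SYMBOLS)))
```

Standardizing before registering new entries avoids adding a synonym as if it were a new gene.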

In [None]:
feature_set_rna = ln.FeatureSet.from_values(
    mdata["rna"].var_names, field=bt.Gene.symbol
)

In [None]:
mdata["adt"].var_names

In [None]:
bt.CellMarker.validate(mdata["adt"].var_names);

In [None]:
markers = bt.CellMarker.from_values(mdata["adt"].var_names)
ln.save(markers)

In [None]:
bt.CellMarker.validate(mdata["adt"].var_names);

### Register ![](https://img.shields.io/badge/Register-10b981) 

In [None]:
feature_set_adt = ln.FeatureSet.from_values(
    mdata["adt"].var_names, field=bt.CellMarker.name
)

Link both feature sets to the artifact:

In [None]:
artifact.features._add_feature_set(feature_set_rna, slot="rna")
artifact.features._add_feature_set(feature_set_adt, slot="adt")

The third feature set describes the observation-level metadata in `obs`:

In [None]:
obs = mdata["rna"].obs

We're only interested in a single metadata column, `gene_target`, so we register it explicitly as a categorical feature:

In [None]:
ln.Feature(name="gene_target", type="category").save()

In [None]:
features = ln.Feature.from_df(obs)
ln.save(features)

In [None]:
feature_set_obs = ln.FeatureSet.from_df(obs)

In [None]:
artifact.features._add_feature_set(feature_set_obs, slot="obs")

In [None]:
gene_targets = bt.Gene.from_values(obs["gene_target"], bt.Gene.symbol)
ln.save(gene_targets)
features = ln.Feature.lookup()
artifact.labels.add(gene_targets, feature=features.gene_target)

In [None]:
nt = ln.ULabel(name="NT", description="Non-targeting control of perturbations")
nt.save()

In [None]:
artifact.labels.add(nt, feature=features.gene_target)
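
The cells above suggest that the `gene_target` column mixes two kinds of values: gene symbols, which are registered as gene records, and the control label `NT`, which is registered as a generic label instead. A minimal sketch of that split, using hypothetical target values and a hypothetical reference set rather than the actual dataset contents:

```python
# Hypothetical values standing in for a perturbation-target column.
gene_target = ["GENEA", "NT", "GENEB", "NT", "GENEA"]

# Hypothetical reference of known gene symbols.
known_genes = {"GENEA", "GENEB", "GENEC"}

# Deduplicate first, then split by membership in the reference.
unique_targets = sorted(set(gene_target))
gene_labels = [value for value in unique_targets if value in known_genes]
other_labels = [value for value in unique_targets if value not in known_genes]

print(gene_labels)   # candidates for gene records
print(other_labels)  # candidates for generic labels, such as a control
```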

In [None]:
for col in ["orig.ident", "perturbation", "replicate", "Phase", "guide_ID"]:
    labels = [ln.ULabel(name=name) for name in obs[col].unique()]
    ln.save(labels)

These labels are now saved as `ULabel` records. Because none of them seem like something we'd want to track or validate for this artifact, we don't link them to the artifact.
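
The loop above creates one label per unique value in each column. The same per-column deduplication can be sketched on plain Python data; the rows, column names, and values here are hypothetical and chosen only for illustration.

```python
# Hypothetical metadata rows; column names and values are illustrative only.
rows = [
    {"replicate": "rep1", "Phase": "G1"},
    {"replicate": "rep2", "Phase": "S"},
    {"replicate": "rep1", "Phase": "G1"},
]

label_names = {}
for col in ["replicate", "Phase"]:
    # dict.fromkeys drops duplicates while keeping first-seen order
    label_names[col] = list(dict.fromkeys(row[col] for row in rows))

print(label_names)
```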

In [None]:
artifact.features

In [None]:
artifact.describe()

In [None]:
artifact.view_lineage()

In [None]:
# clean up test instance
!lamin delete --force test-multimodal
!rm -r test-multimodal