[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/multimodal.ipynb)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/laminlabs/lamin-usecases/main?labpath=lamin-usecases%2Fdocs%2Fmultimodal.ipynb)

# Multi-modal

```{warning}

This is, for now, just a stub.

```

This guide shows how to curate and register ECCITE-seq data from [Papalexi21](https://www.nature.com/articles/s41592-019-0392-0) as [MuData](https://github.com/scverse/mudata) objects. ECCITE-seq profiles single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.


## Setup

In [None]:
!lamin init --storage ./test-multimodal --schema bionty

In [None]:
import lamindb as ln
import bionty as bt

bt.settings.organism = "human"

In [None]:
ln.settings.transform.stem_uid = "yMWSFirS6qv2"
ln.settings.transform.version = "0"
ln.track()

## Papalexi21

We'll work with a sub-sampled MuData object from the Papalexi21 dataset.

MuData objects build on AnnData objects to store and serialize multimodal data, with one AnnData object per modality.
See the [MuData documentation](https://mudata.readthedocs.io/en/latest/) for details.

In [None]:
mdata = ln.core.datasets.mudata_papalexi21_subset()

In [None]:
mdata

In [None]:
mdata.obs

## Standardize and validate metadata

In [None]:
annotate = ln.Annotate.from_mudata(
    mdata,
    var_index={
        "rna": bt.Gene.symbol,  # gene expression
        "adt": bt.CellMarker.name,  # antibody-derived tags reflecting surface proteins
        "hto": ln.Feature.name,  # cell hashing (hashtag oligos)
        "gdo": ln.Feature.name,  # guide RNAs
    },
    categoricals={
        # shared categoricals
        "perturbation": ln.ULabel.name,
        "replicate": ln.ULabel.name,
        # modality-specific categorical, written as "<modality>:<column>"
        "hto:technique": bt.ExperimentalFactor.name,
    },
)
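
The `categoricals` mapping in the cell above mixes shared keys (`"perturbation"`, `"replicate"`) with a modality-prefixed key (`"hto:technique"`). As a rough illustration of that naming convention, the sketch below splits such keys into a modality and a column name. `split_categorical_key` is a hypothetical helper written for this page; it is not part of lamindb or MuData, and it makes no claim about how `ln.Annotate` parses keys internally.

```python
# Hypothetical helper (not a lamindb or MuData function): it only illustrates
# how a "modality:column" key differs from a shared column key.
from typing import Optional, Tuple


def split_categorical_key(key: str) -> Tuple[Optional[str], str]:
    """Split 'hto:technique' into ('hto', 'technique').

    Keys without a ':' are treated as shared columns and get modality None.
    """
    modality, sep, column = key.partition(":")
    if not sep:
        return None, key
    return modality, column


# The three keys used in the `categoricals` mapping above
keys = ["perturbation", "replicate", "hto:technique"]
parsed = {key: split_categorical_key(key) for key in keys}

for key, (modality, column) in parsed.items():
    scope = "shared" if modality is None else f"modality '{modality}'"
    print(f"{key!r}: column {column!r} ({scope})")
```

Under this reading, `"hto:technique"` refers to the `technique` column of the `hto` modality, which matches the later call `annotate.add_validated_from("technique", modality="hto")`.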

In [None]:
# register gene symbols from mdata["rna"].var.index that are not yet in the registry
annotate.add_new_from_var_index("rna")
# register new features from the var.index of the hto and gdo modalities
annotate.add_new_from_var_index("hto")
annotate.add_new_from_var_index("gdo")

In [None]:
# optional: register additional columns we'd like to annotate
annotate.add_new_from_columns(modality="rna")
annotate.add_new_from_columns(modality="adt")
annotate.add_new_from_columns(modality="hto")
annotate.add_new_from_columns(modality="gdo")

In [None]:
annotate.validate()

In [None]:
# add validated and new categories
annotate.add_new_from("perturbation")
annotate.add_new_from("replicate")
annotate.add_validated_from("technique", modality="hto")

In [None]:
annotate.validate()
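
The cells above follow a validate, register, validate-again pattern: the first `annotate.validate()` reports values that are not yet in a registry, the `add_new_from` calls register them, and the second `annotate.validate()` confirms that nothing is left over. The sketch below mimics that loop with a plain Python set standing in for a registry. It is a conceptual illustration only: `partition_values` is a hypothetical function, the values are made-up placeholders rather than labels from the dataset, and none of this reflects how lamindb implements validation.

```python
# Conceptual sketch only: a Python set stands in for a registry such as ln.ULabel.
# All values are made-up placeholders, not labels from the Papalexi21 data.
def partition_values(values, registry):
    """Return (validated, non_validated) lists, preserving input order."""
    validated = [v for v in values if v in registry]
    non_validated = [v for v in values if v not in registry]
    return validated, non_validated


registry = {"label_a", "label_b"}               # values already registered
observed = ["label_a", "label_b", "label_c"]    # values found in a column

# First pass: one value is unknown, so validation does not pass
validated, non_validated = partition_values(observed, registry)
first_pass_ok = not non_validated

# Register the unknown values (the analogue of the add_new_from step)
registry.update(non_validated)

# Second pass: every observed value is now in the registry
_, remaining = partition_values(observed, registry)
second_pass_ok = not remaining

print("first pass ok:", first_pass_ok, "| not validated:", non_validated)
print("second pass ok:", second_pass_ok)
```

Running validation twice is what makes the registration step explicit: new values enter the registry only through a deliberate call, never as a side effect of checking.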

## Register and annotate artifact

In [None]:
artifact = annotate.save_artifact(description="Sub-sampled MuData from Papalexi21")

In [None]:
artifact.describe()

In [None]:
artifact.view_lineage()

In [None]:
# clean up test instance
!lamin delete --force test-multimodal
!rm -r test-multimodal