![scrna1/6](https://img.shields.io/badge/scrna1/6-lightgrey)
[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/scrna.ipynb)
[![lamindata](https://img.shields.io/badge/Source%20%26%20report%20on%20LaminHub-mediumseagreen)](https://lamin.ai/laminlabs/lamindata/transform/Nv48yAceNSh87CpJ/QtqH9DLlh3aEUnYu4oe7)

# scRNA-seq

You'll learn how to manage a growing number of scRNA-seq datasets as a single queryable & batch-iterable collection.

Along the way, you'll see how to create reports, leverage data lineage, and query individual datasets.

If you're only interested in _using_ a large curated scRNA-seq collection, see the [CELLxGENE Census guide](inv:docs#cellxgene).

Here, you will:

1. create an {class}`~lamindb.Artifact` from an `AnnData` object and seed a growing {class}`~lamindb.Collection` with it (![scrna1/6](https://img.shields.io/badge/scrna1/6-lightgrey), current page)
2. append a new dataset and create a new version of this collection ([![scrna2/6](https://img.shields.io/badge/scrna2/6-lightgrey)](/scrna2))
3. query & inspect artifacts by metadata individually ([![scrna3/6](https://img.shields.io/badge/scrna3/6-lightgrey)](/scrna3))
4. load the joint collection and save analytical results ([![scrna4/6](https://img.shields.io/badge/scrna4/6-lightgrey)](/scrna4))
5. iterate over the collection and train a model ([![scrna5/6](https://img.shields.io/badge/scrna5/6-lightgrey)](/scrna5))
6. discuss converting a collection to a single TileDB SOMA store of the same data ([![scrna6/6](https://img.shields.io/badge/scrna6/6-lightgrey)](/scrna6))

```{toctree}
:maxdepth: 1
:hidden:

scrna2
scrna3
scrna4
scrna5
scrna6
```

In [None]:
# !pip install 'lamindb[jupyter,aws,bionty]' 
!lamin init --storage ./test-scrna --schema bionty

In [None]:
import lamindb as ln
import bionty as bt

ln.context.uid = "Nv48yAceNSh80000"
ln.context.track()

## Populate metadata registries based on an artifact

Let us look at the standardized data of [Conde _et al._, Science (2022)](https://doi.org/10.1126/science.abl5197), [available from CELLxGENE](https://cellxgene.cziscience.com/collections/62ef75e4-cbea-454e-a0ce-998ec40223d3). {func}`~lamindb.core.datasets.anndata_human_immune_cells` loads a subsampled version:

In [None]:
adata = ln.core.datasets.anndata_human_immune_cells()
adata

Let's curate this `AnnData` object against the metadata registries:

In [None]:
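curate = ln.Curate.from_anndata(
    adata,
    # validate the gene identifiers in adata.var.index against Ensembl gene IDs
    var_index=bt.Gene.ensembl_gene_id,
    # map each categorical obs column to the registry field it is validated against
    categoricals={
        adata.obs.donor.name: ln.ULabel.name,
        adata.obs.tissue.name: bt.Tissue.name,
        adata.obs.cell_type.name: bt.CellType.name,
        adata.obs.assay.name: bt.ExperimentalFactor.name,
    },
    organism="human",
)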

In [None]:
# check which values are not yet validated
curate.validate()

In [None]:
# save all values that could be validated (e.g., against public ontologies) to the registries
curate.add_validated_from("all")

In [None]:
# register gene identifiers from the var index that couldn't be validated as new records
curate.add_new_from_var_index()

In [None]:
# register the remaining donor and cell type values as new labels
curate.add_new_from("donor")
curate.add_new_from("cell_type")

In [None]:
# validate again now that the missing values are registered
curate.validate()
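The curation steps above follow a validate, register, re-validate loop. The sketch below is a toy, standard-library model of that loop, not lamindb's implementation: it assumes each registry is just a set of accepted terms per column and that a "public ontology" is a plain reference set. The `toy_*` helpers and the toy metadata are hypothetical; their names only echo the `Curate` methods for readability.

```python
# Toy model of the curation loop above; NOT lamindb's implementation.
# Assumptions (hypothetical): a registry is a set of accepted terms per column,
# and a "public ontology" is a larger reference set of known terms.


def toy_validate(obs, registries):
    """Return {column: sorted values not yet in that column's registry}."""
    missing = {}
    for column, values in obs.items():
        unknown = sorted(set(values) - registries[column])
        if unknown:
            missing[column] = unknown
    return missing


def toy_add_validated_from(obs, registries, public):
    """Register values that also appear in the public reference set."""
    for column, values in obs.items():
        registries[column] |= set(values) & public.get(column, set())


def toy_add_new_from(column, obs, registries):
    """Register all remaining values of one column as new terms."""
    registries[column] |= set(obs[column])


# Hypothetical toy metadata with two categorical columns.
toy_obs = {
    "cell_type": ["T cell", "B cell", "my novel cell type"],
    "donor": ["D1", "D2"],
}
toy_registries = {"cell_type": set(), "donor": set()}  # empty registries
toy_public = {"cell_type": {"T cell", "B cell", "monocyte"}}  # toy ontology

print(toy_validate(toy_obs, toy_registries))  # all values missing at first
toy_add_validated_from(toy_obs, toy_registries, toy_public)
print(toy_validate(toy_obs, toy_registries))  # only non-ontology values remain
toy_add_new_from("donor", toy_obs, toy_registries)
toy_add_new_from("cell_type", toy_obs, toy_registries)
print(toy_validate(toy_obs, toy_registries))  # nothing missing anymore
```

Registering from a reference first and only then adding the leftovers as new terms keeps newly created terms to the values that genuinely have no public counterpart.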

When we save the curated `AnnData` as an {class}`~lamindb.Artifact`, it's automatically annotated with the validated features and labels:

In [None]:
artifact = curate.save_artifact(description="Human immune cells from Conde22")

It is annotated with rich metadata:

In [None]:
artifact.describe()

You can also see the types of the metadata:

In [None]:
artifact.describe(print_types=True)
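Conceptually, the annotations link the artifact to its validated labels, grouped by the feature (column) they annotate, and the typed view adds information about each feature. A minimal sketch of that data model, assuming plain dicts: `ToyArtifact`, its `annotate` helper, and its output format are hypothetical and do not reproduce what `artifact.describe()` prints. The toy class only mimics `print_types=True` by printing the registry next to each feature.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArtifact:
    """Hypothetical stand-in for an annotated artifact; not lamindb's class."""

    description: str
    # feature name -> (registry the labels come from, validated labels)
    features: dict = field(default_factory=dict)

    def annotate(self, feature, registry, labels):
        # Store each label once, in a stable (sorted) order.
        self.features[feature] = (registry, sorted(set(labels)))

    def describe(self, print_types=False):
        lines = [f"Artifact: {self.description}"]
        for feature, (registry, labels) in self.features.items():
            type_info = f" [{registry}]" if print_types else ""
            lines.append(f"  {feature}{type_info}: {', '.join(labels)}")
        return "\n".join(lines)


toy_artifact = ToyArtifact("toy dataset")
toy_artifact.annotate("cell_type", "bionty.CellType", ["T cell", "B cell", "T cell"])
toy_artifact.annotate("donor", "ULabel", ["D1", "D2"])
print(toy_artifact.describe())
print(toy_artifact.describe(print_types=True))
```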

## Seed a collection

Let's create the first version of a collection that will grow to encompass many `h5ad` files as more data is ingested.

```{note}

To see where this incremental growth can lead, take a look at the [CELLxGENE Census guide](inv:docs#cellxgene), which covers an instance with ~1k h5ads and ~50 million cells.

```

In [None]:
collection = ln.Collection(
    artifact, name="My versioned scRNA-seq collection", version="1"
)
collection.save()
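Versioning lets later versions add data while earlier versions stay unchanged. A minimal sketch of that idea, assuming a collection version is just an immutable name, version string, and tuple of artifact identifiers; `ToyCollection`, its `append` method, and the placeholder uids are hypothetical and not lamindb's API (appending a real dataset is covered in scrna2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCollection:
    """Hypothetical immutable collection version; not lamindb's Collection."""

    name: str
    version: str
    artifact_uids: tuple

    def append(self, artifact_uid):
        # Return a NEW version that keeps all existing artifacts and adds one;
        # the current version object is left unchanged.
        return ToyCollection(
            name=self.name,
            version=str(int(self.version) + 1),
            artifact_uids=self.artifact_uids + (artifact_uid,),
        )


toy_v1 = ToyCollection("My versioned scRNA-seq collection", "1", ("artifact-A",))
toy_v2 = toy_v1.append("artifact-B")  # placeholder uid for a second dataset
print(toy_v1.version, toy_v1.artifact_uids)
print(toy_v2.version, toy_v2.artifact_uids)
```

Making each version immutable is the design choice that keeps version 1 reproducible: anyone who loaded it earlier still gets exactly the same artifacts.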

In version 1, the collection contains just this one artifact, so the two hold the same data. Still, they're tracked independently, and each can be queried through its own registry:

In [None]:
collection.describe()
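"Tracked independently" means each record lives in its own registry and can be looked up on its own. In lamindb you would query a registry such as `ln.Collection` or `ln.Artifact` with its `filter()` method; the sketch below only illustrates the idea, assuming each registry is a plain list of dicts. `filter_records`, the registries, and the uids are hypothetical.

```python
# Hypothetical toy registries: each is a list of plain dict records.
toy_artifact_registry = [
    {"uid": "artifact-A", "description": "Human immune cells from Conde22"},
]
toy_collection_registry = [
    {
        "uid": "collection-1",
        "name": "My versioned scRNA-seq collection",
        "version": "1",
        "artifacts": ["artifact-A"],
    },
]


def filter_records(registry, **conditions):
    """Return the records whose fields equal all given conditions."""
    return [
        record
        for record in registry
        if all(record.get(key) == value for key, value in conditions.items())
    ]


hits = filter_records(toy_collection_registry, version="1")
print(hits[0]["artifacts"])
print(filter_records(toy_artifact_registry, uid="artifact-A"))
```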

Access the underlying artifacts like so:

In [None]:
collection.artifacts

View the data lineage:

In [None]:
collection.view_lineage()
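Lineage is a directed graph from each record to the records and runs it came from. As a sketch, assume each node lists its direct upstream nodes: here the collection is built from the artifact, and both were created by this notebook's run, which `ln.context.track()` registered. Finding everything upstream of a record is then a breadth-first traversal. The node names and `toy_ancestors` are hypothetical and not what `view_lineage()` renders.

```python
from collections import deque

# Hypothetical lineage graph: node -> list of direct upstream nodes.
toy_upstream = {
    "collection v1": ["artifact", "notebook run"],
    "artifact": ["notebook run"],
    "notebook run": [],
}


def toy_ancestors(node, graph):
    """Return all upstream nodes of `node` in breadth-first order."""
    seen, order = set(), []
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:  # a node can be reachable via several paths
            continue
        seen.add(current)
        order.append(current)
        queue.extend(graph.get(current, []))
    return order


print(toy_ancestors("collection v1", toy_upstream))
```

Tracking visited nodes matters because lineage graphs are DAGs rather than trees: the notebook run is reachable both directly and through the artifact, yet it is reported once.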