![scrna4/6](https://img.shields.io/badge/scrna4/6-lightgrey)
[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/scrna3.ipynb)
[![lamindata](https://img.shields.io/badge/Source%20%26%20report%20on%20LaminHub-mediumseagreen)](https://lamin.ai/laminlabs/transform/mfWKm8OtAzp8z8)

# Analyze a collection in memory

Here, we'll analyze the growing collection by loading it into memory.
This is only possible if the collection fits into memory.
If your data is large, you'll likely want to iterate over the collection to train a model, the topic of the next page ([![scrna5/6](https://img.shields.io/badge/scrna5/6-lightgrey)](/scrna5)).

In [None]:
import lamindb as ln
import bionty as bt
import anndata as ad

In [None]:
ln.settings.transform.stem_uid = "mfWKm8OtAzp8"
ln.settings.transform.version = "1"
ln.track()

In [None]:
ln.Collection.df()

In [None]:
collection = ln.Collection.filter(
    name="My versioned scRNA-seq collection", version="2"
).one()

In [None]:
collection.artifacts.df()

If the collection isn't too large, we can now load it into memory.

Under the hood, the `AnnData` objects are concatenated during loading.

How long this takes depends on factors such as the number and size of the artifacts.

If you load the collection often, consider storing a concatenated version of it rather than the individual pieces.

In [None]:
adata = collection.load()

As in pandas, concatenation uses an outer join by default:

In [None]:
adata
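
To illustrate what an outer join means here, the following is a minimal sketch using plain pandas rather than `collection.load()` itself. The cell and gene names are made up for illustration. An outer join keeps the union of all columns (variables) and fills the gaps with missing values, whereas an inner join keeps only the columns shared by all inputs.

```python
import pandas as pd

# Two hypothetical count tables with partially overlapping genes (columns)
df1 = pd.DataFrame(
    {"gene_a": [1.0, 2.0], "gene_b": [3.0, 4.0]},
    index=["cell1", "cell2"],
)
df2 = pd.DataFrame(
    {"gene_b": [5.0], "gene_c": [6.0]},
    index=["cell3"],
)

# Outer join: union of columns, missing entries become NaN
outer = pd.concat([df1, df2], join="outer")

# Inner join: only columns present in every input survive
inner = pd.concat([df1, df2], join="inner")

print(outer)
print(inner)
```

With an outer join, no variable is dropped, at the cost of missing values wherever an input did not measure that variable.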

The `AnnData` object references the individual artifacts in its `.obs` annotations:

In [None]:
adata.obs.artifact_uid.cat.categories
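
Because `artifact_uid` is a categorical column, it can be used to count or select cells per source artifact. The sketch below uses a small pandas DataFrame as a stand-in for `adata.obs`. The uids `uidA` and `uidB` and the cell names are hypothetical placeholders, not values from this collection.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs; uids and cell names are made up
obs = pd.DataFrame(
    {"artifact_uid": pd.Categorical(["uidA", "uidA", "uidB"])},
    index=["cell1", "cell2", "cell3"],
)

# Number of cells contributed by each artifact
cells_per_artifact = obs["artifact_uid"].value_counts()

# Boolean mask selecting the cells that came from one artifact
mask = obs["artifact_uid"] == "uidA"
subset = obs[mask]

print(cells_per_artifact)
print(subset)
```

On the real object, the same kind of mask should work for subsetting the data itself, for example `adata[adata.obs.artifact_uid == uid]` for a uid taken from the categories listed above.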

We can easily obtain Ensembl IDs for gene symbols using the lookup object:

In [None]:
genes = bt.Gene.lookup(field="symbol")

In [None]:
genes.itm2b.ensembl_gene_id

Let us create a plot:

In [None]:
import scanpy as sc

sc.pp.pca(adata, n_comps=2)

In [None]:
sc.pl.pca(
    adata,
    color=genes.itm2b.ensembl_gene_id,
    title=(
        f"{genes.itm2b.symbol} / {genes.itm2b.ensembl_gene_id} /"
        f" {genes.itm2b.description}"
    ),
    save="_itm2b",
)

We could save the plot as a PDF artifact and then see it in the flow diagram:

In [None]:
artifact = ln.Artifact("./figures/pca_itm2b.pdf", description="My result on ITM2B")
artifact.save()
artifact.view_lineage()

But given that the image is part of the notebook, we can also rely on the report that is created when saving the notebook from the command line:

```
lamin save <notebook_path>
```

To see the current notebook, visit: [lamin.ai/laminlabs/transform/mfWKm8OtAzp8z8](https://lamin.ai/laminlabs/transform/mfWKm8OtAzp8z8)

![](https://lamin-site-assets.s3.amazonaws.com/.lamindb/RGXj5wcAf7EAc6J8aBoM.png)