[![hub](https://img.shields.io/badge/Source%20%26%20report%20-mediumseagreen)](https://lamin.ai/laminlabs/arc-virtual-cell-atlas/transform/l6GZa1J999W5)

# Arc Virtual Cell Atlas: scRNA-seq

The [Arc Virtual Cell Atlas](https://github.com/ArcInstitute/arc-virtual-cell-atlas) hosts one of the biggest collections of scRNA-seq datasets.

Lamin mirrors the atlas for simplified access here: https://lamin.ai/laminlabs/arc-virtual-cell-atlas

If you use the data academically, please cite the original publications, [Youngblut _et al._ (2025)](https://arcinstitute.org/manuscripts/scBaseCamp) and [Zhang _et al._ (2025)](https://biorxiv.org/10.1101/2025.02.20.639398).

Connect to the public LaminDB instance that mirrors the Arc Virtual Cell Atlas:

In [None]:
# pip install 'lamindb[jupyter,bionty,wetlab,gcp]'
!lamin connect laminlabs/arc-virtual-cell-atlas

In [None]:
import lamindb as ln
import bionty as bt
import wetlab as wl

## Metadata

21 organisms.

In [None]:
bt.Organism.df()

50 cell lines.

In [None]:
bt.CellLine.df()

100 compounds.

In [None]:
wl.Compound.df()

## The Tahoe collection

Every individual dataset in the atlas is an `.h5ad` file that is registered as an artifact in LaminDB.

Let us first query the `Tahoe` collection and list its artifacts.

In [None]:
collection = ln.Collection.get(key="tahoe100")
collection.artifacts.df()

Each of the datasets was validated against the same schema:

In [None]:
schema = ln.Schema.get(name="tahoe100_anndata_schema")

In [None]:
ln.Artifact.filter(schema=schema).df()

Here is what the features in a dataset look like:

In [None]:
artifact = ln.Artifact.filter(schema=schema).first()  # the first artifact validated against this schema
artifact.describe()

The genes are indexed with a "stable ID", a unique combination of Ensembl gene ID and gene symbol.

Every `AnnData` object covers a broad range of perturbations, biosamples, and cell lines. The plates are approximate replicates of each other.

You can download an `.h5ad` into your local cache like so:

```python
artifact.cache()
```

Note that, despite what the suffix might suggest, the `.h5ad` file is presently _not_ compressed.
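Because the files are uncompressed, a single `.h5ad` can be large, so it may be worth checking free disk space before calling `artifact.cache()`. The sketch below uses only the Python standard library; the helper names `human_bytes` and `fits_in_cache` and the 10% `headroom` margin are hypothetical choices for illustration, not part of LaminDB.

```python
import shutil
from pathlib import Path
from typing import Union


def human_bytes(n: int) -> str:
    """Format a byte count with binary prefixes, e.g. 1536 -> '1.5 KiB'."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} TiB"  # anything >= 1024 GiB


def fits_in_cache(size_bytes: int, cache_dir: Union[str, Path], headroom: float = 0.1) -> bool:
    """Return True if the file plus a safety margin fits in the free space of cache_dir.

    headroom is a hypothetical margin (fraction of size_bytes), not a LaminDB setting.
    """
    free_bytes = shutil.disk_usage(cache_dir).free
    return size_bytes * (1 + headroom) <= free_bytes


# Hypothetical example: a 12 GiB file, using the current directory as the cache location.
size = 12 * 1024**3
print(human_bytes(size), "fits:", fits_in_cache(size, "."))
```

In the notebook, you could pass the artifact's recorded file size in bytes, `artifact.size`, together with your cache directory, e.g. `fits_in_cache(artifact.size, ln.setup.settings.cache_dir)`, assuming `ln.setup.settings.cache_dir` points at the local cache that `artifact.cache()` writes to in your setup.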