[![Jupyter Notebook](https://img.shields.io/badge/Jupyter%20Notebook-orange)](https://github.com/laminlabs/cellxgene-census-lamin/blob/main/docs/03-cellxgene-census.ipynb)
[![census](https://img.shields.io/badge/laminlabs/cellxgene--census-mediumseagreen)](https://lamin.ai/laminlabs/cellxgene-census)

# cellxgene-census

`cellxgene-census` is a Python client for querying the concatenated CELLxGENE datasets.

This notebook shows how to query registered `h5ad` files by their metadata.

For more background, see:

- [CELLxGENE Census](https://chanzuckerberg.github.io/cellxgene-census/)
- [TileDB-SOMA](https://github.com/single-cell-data/TileDB-SOMA)

## Setup



First, load the public instance:

In [None]:
!lamin load laminlabs/cellxgene-census

In [None]:
import lamindb as ln
import lnschema_bionty as lb

In [None]:
lb.settings.organism = "human"

## Search metadata

In [None]:
lb.CellType.search("effector Tcell").head()

## Ontological hierarchies

In [None]:
teff = lb.CellType.filter(uid="yvHkIrVI").one()

In [None]:
teff.view_parents(distance=2, with_children=True)

In [None]:
teff.children.df()
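
To make the `distance` and `with_children` arguments more concrete, here is a minimal sketch of a bounded ancestor walk and a direct-children lookup on a plain dictionary. It does not use LaminDB and is not how `view_parents` is implemented. `PARENTS`, `ancestors_within`, and `direct_children` are hypothetical names, and the toy hierarchy is simplified for illustration rather than copied from the Cell Ontology.

```python
# Hypothetical toy ontology: child term -> list of direct parent terms.
# Names and edges are simplified for illustration only.
PARENTS = {
    "effector T cell": ["T cell"],
    "T cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "effector CD8 T cell": ["effector T cell"],
}


def ancestors_within(term, distance):
    """Return {ancestor: steps} for ancestors at most `distance` steps away."""
    found = {}
    frontier = [term]
    for step in range(1, distance + 1):
        next_frontier = []
        for node in frontier:
            for parent in PARENTS.get(node, []):
                if parent not in found:
                    found[parent] = step
                    next_frontier.append(parent)
        frontier = next_frontier
    return found


def direct_children(term):
    """Return the terms that list `term` as a direct parent."""
    return sorted(child for child, parents in PARENTS.items() if term in parents)


print(ancestors_within("effector T cell", 2))
print(direct_children("effector T cell"))
```

With `distance=2`, the walk stops two levels above the starting term, so `"leukocyte"` is not reached in this toy hierarchy.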

## Query `H5AD` files by metadata

In [None]:
features = ln.Feature.lookup()
assays = lb.ExperimentalFactor.lookup()
cell_types = lb.CellType.lookup()
tissues = lb.Tissue.lookup()
ulabels = ln.ULabel.lookup()
suspension_types = ulabels.is_suspension_type.children.all().lookup()

In [None]:
query = (
    ln.File.filter(
        organism=lb.settings.organism,
        cell_types__name__in=[
            cell_types.dendritic_cell.name,
            cell_types.neutrophil.name,
        ],
        tissues=tissues.kidney,
        ulabels=suspension_types.cell,
        experimental_factors=assays.ln_10x_3_v2,
    )
    .order_by("size")
    .distinct()
)
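
The keyword arguments passed to `filter()` are combined, so a file must satisfy every condition to be returned, and the `__in` suffix matches any value in the given list. The following is a minimal sketch of that logic on plain Python dictionaries, assuming Django-style lookup semantics. `FILES` and `toy_filter` are hypothetical, the records are made up, and nothing here touches the real registry.

```python
# Hypothetical in-memory records standing in for registered files.
FILES = [
    {"uid": "a1", "size": 300, "tissue": "kidney", "cell_types": {"neutrophil", "T cell"}},
    {"uid": "b2", "size": 100, "tissue": "kidney", "cell_types": {"dendritic cell"}},
    {"uid": "c3", "size": 200, "tissue": "lung", "cell_types": {"neutrophil"}},
    {"uid": "d4", "size": 50, "tissue": "kidney", "cell_types": {"B cell"}},
]


def toy_filter(records, tissue, cell_types_in):
    """Keep records that match every condition, sorted by ascending size."""
    wanted = set(cell_types_in)
    hits = [
        r
        for r in records
        if r["tissue"] == tissue  # equality condition
        and r["cell_types"] & wanted  # "__in": any linked cell type is in the list
    ]
    return sorted(hits, key=lambda r: r["size"])  # mirrors .order_by("size")


result = toy_filter(FILES, "kidney", ["dendritic cell", "neutrophil"])
print([r["uid"] for r in result])
```

In the real query, filtering across linked records can return the same file more than once (for example, when a file is linked to both cell types), which is presumably why the query ends with `.distinct()`. The toy version avoids this because each record is tested once.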

Display all search results as a `DataFrame`:

```python
query.df()
```

## Access a queried `H5AD` file

In [None]:
file = query.first()
file

Optionally, [search for a file on the UI](https://lamin.ai/laminlabs/cellxgene-census/records/core/File) and fetch it by its `uid`:

```python
file = ln.File.filter(uid="...").one()
```

Query for a collection that you found on https://cellxgene.cziscience.com/collections:

```python
ln.File.filter(ulabels__name__contains="Mapping single-cell transcriptomes in the intra-tumoral and associated territories of kidney cancer").one()
```

Note that the most recent collections may not have been added yet.

Describe all linked metadata:

In [None]:
file.describe()

Access all registered features (standardized obs columns):

In [None]:
file.features

Get labels from a feature:

In [None]:
features = ln.Feature.lookup()

In [None]:
# tissues
file.labels.get(features.tissue).df()

In [None]:
# check the corresponding collection/publication
file.labels.get(features.collection).one()

Use `file.backed()` to open the underlying `h5ad` file without loading it fully into memory, or `file.load()` to load it into memory. See {class}`~lamindb.File` for details.

In [None]:
file.backed()

In [None]:
adata = file.load()

In [None]:
adata

To learn how the human part of the instance was created, see {doc}`census-registries`.

To query `cellxgene-census` using LaminDB registries, see {doc}`query-census`.

The full docs are available [here](https://cellxgene-census-lamin-c192.netlify.app/notebooks).

```{toctree}
:maxdepth: 1
:hidden:

census-registries
query-census
```