# Query from cellxgene-census server

`cellxgene-census` is a Python client for querying the CELLxGENE Census, a concatenated collection of CELLxGENE datasets.

This notebook shows how to query `cellxgene-census` using LaminDB registries.

For more background, see:

- [CELLxGENE Census](https://chanzuckerberg.github.io/cellxgene-census/)
- [TileDB-SOMA](https://github.com/single-cell-data/TileDB-SOMA)

## Setup

First, load the public instance:

```bash
lamin load laminlabs/cellxgene-census
```

In [None]:
import lamindb as ln
import lnschema_bionty as lb
import cellxgene_census

In [None]:
lb.settings.organism = "human"
# The scientific name is used below to select the organism in the Census
human = lb.settings.organism.scientific_name

## Query datasets by metadata

In [None]:
# Lookups give auto-completable access to registry records;
# with return_field="name", each attribute resolves to the record's name
modalities = ln.Modality.lookup()
features = ln.Feature.lookup(return_field="name")
assays = lb.ExperimentalFactor.lookup(return_field="name")
cell_types = lb.CellType.lookup(return_field="name")
tissues = lb.Tissue.lookup(return_field="name")
ulabels = ln.ULabel.lookup()
suspension_types = ulabels.is_suspension_type.children.all().lookup(return_field="name")

In [None]:
value_filter = (
    f'{features.tissue} == "{tissues.brain}" and {features.cell_type} in'
    f' ["{cell_types.microglial_cell}", "{cell_types.neuron}"] and'
    f' {features.suspension_type} == "{suspension_types.cell}" and {features.assay} =='
    f' "{assays.ln_10x_3_v3}"'
)

In [None]:
value_filter
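
To make the structure of the filter string concrete, here is a minimal sketch that assembles the same kind of expression from plain strings. The literal values (`brain`, `microglial cell`, `neuron`, `cell`, `10x 3' v3`) are assumptions about what the lookups above resolve to, and `build_value_filter` is a hypothetical helper, not part of `cellxgene_census` or LaminDB.

```python
def build_value_filter(conditions):
    """Join column conditions into a value filter string.

    `conditions` maps a column name to either a single string
    (equality test) or a list of strings (membership test).
    """
    clauses = []
    for column, value in conditions.items():
        if isinstance(value, (list, tuple)):
            quoted = ", ".join(f'"{v}"' for v in value)
            clauses.append(f"{column} in [{quoted}]")
        else:
            clauses.append(f'{column} == "{value}"')
    return " and ".join(clauses)


# Assumed values; the notebook obtains them from the registry lookups instead
example_filter = build_value_filter(
    {
        "tissue": "brain",
        "cell_type": ["microglial cell", "neuron"],
        "suspension_type": "cell",
        "assay": "10x 3' v3",
    }
)
print(example_filter)
```

Building the string from lookups, as the notebook does, avoids typos in these values, since each name comes from a validated registry record.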

In [None]:
%%time

with cellxgene_census.open_soma() as census:
    # Read the obs rows that match the filter as a slice of the SOMADataFrame
    cell_metadata = census["census_data"][human].obs.read(value_filter=value_filter)

    # Concatenate the results into a single pyarrow.Table
    cell_metadata = cell_metadata.concat()

    # Convert to a pandas.DataFrame
    cell_metadata = cell_metadata.to_pandas()

In [None]:
cell_metadata.shape

In [None]:
cell_metadata.head()

## Fetch AnnData from census based on filters

In [None]:
%%time

with cellxgene_census.open_soma() as census:
    adata = cellxgene_census.get_anndata(
        census=census,
        organism=human,
        obs_value_filter=value_filter,
        column_names={
            "obs": [
                features.assay,
                features.assay_ontology_term_id,
                features.cell_type,
                features.cell_type_ontology_term_id,
                features.tissue,
                features.tissue_ontology_term_id,
                features.disease,
                features.disease_ontology_term_id,
                features.suspension_type,
            ]
        },
    )

In [None]:
# Index genes by feature_id so they can be validated against lb.Gene.ensembl_gene_id
adata.var = adata.var.set_index("feature_id")

In [None]:
adata

In [None]:
adata.var.head()

In [None]:
adata.obs.head()

## Register the queried data in LaminDB

In [None]:
ln.track()

Register `AnnData`:

In [None]:
file = ln.File.from_anndata(
    adata,
    description=(
        "microglial and neuron cell data from 10x 3' v3 in brain queried from Census"
    ),
    field=lb.Gene.ensembl_gene_id,
    modality=modalities.rna,
)

In [None]:
file.save()

Link validated metadata:

In [None]:
feature_records = features.dict()

# Link labels for the name columns only; skip the ontology term ID columns
for col in adata.obs.columns:
    if not col.endswith("ontology_term_id"):
        file.labels.add(adata.obs[col], feature_records.get(col))
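
As a minimal sketch of which columns the loop above links, the same suffix test can be run on a plain list of column names. The list assumes that each `features.<name>` attribute requested from the Census resolves to a column of the same name; the actual `adata.obs` may contain additional columns.

```python
# Assumed obs column names, mirroring the columns requested from the Census
obs_columns = [
    "assay",
    "assay_ontology_term_id",
    "cell_type",
    "cell_type_ontology_term_id",
    "tissue",
    "tissue_ontology_term_id",
    "disease",
    "disease_ontology_term_id",
    "suspension_type",
]

# Keep only the columns that do not hold ontology term IDs
label_columns = [c for c in obs_columns if not c.endswith("ontology_term_id")]
print(label_columns)
```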

In [None]:
file.describe()

In [None]:
# clean up test instance
!lamin delete --force test-census
!rm -r ./test-census