# Query using `tiledbsoma`

The [first guide](cellxgene) showed how to query for `AnnData` objects.

This guide queries "Census", i.e., a `tiledbsoma` array store that concatenates many `AnnData` objects.

Load your LaminDB instance for querying data:

In [None]:
!lamin load laminlabs/cellxgene

In [None]:
import lamindb as ln
import bionty as bt
import tiledbsoma

census_version = "2024-07-01"

## Query data

Create lookups so that we can auto-complete valid values:

In [None]:
features = ln.Feature.lookup(return_field="name")
assays = bt.ExperimentalFactor.lookup(return_field="name")
cell_types = bt.CellType.lookup(return_field="name")
tissues = bt.Tissue.lookup(return_field="name")
ulabels = ln.ULabel.lookup()
suspension_types = ulabels.is_suspension_type.children.all().lookup(return_field="name")

Create a query expression (a value filter on cell metadata columns) for the `tiledbsoma` array store:

In [None]:
value_filter = (
    f'{features.tissue} == "{tissues.brain}" and {features.cell_type} in'
    f' ["{cell_types.microglial_cell}", "{cell_types.neuron}"] and'
    f' {features.suspension_type} == "{suspension_types.cell}" and {features.assay} =='
    f' "{assays.ln_10x_3_v3}"'
)
value_filter
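
To make the shape of such an expression concrete, here is a minimal sketch that assembles a filter string from plain Python values instead of lookups. The helper `build_value_filter` is hypothetical (it is not part of `lamindb` or `tiledbsoma`), and the column names and values are assumed stand-ins for what the lookups above resolve to.

```python
def build_value_filter(equals, isin):
    """Join equality and membership conditions with `and` (hypothetical helper)."""
    # one `column == "value"` clause per equality condition
    clauses = [f'{column} == "{value}"' for column, value in equals.items()]
    # one `column in ["a", "b"]` clause per membership condition
    for column, values in isin.items():
        quoted = ", ".join(f'"{v}"' for v in values)
        clauses.append(f"{column} in [{quoted}]")
    return " and ".join(clauses)


# assumed column names and values, for illustration only
example_filter = build_value_filter(
    equals={"tissue": "brain", "suspension_type": "cell"},
    isin={"cell_type": ["microglial cell", "neuron"]},
)
print(example_filter)
```

Building the string from lookups, as in the cell above, has the advantage that misspelled values fail early at attribute access rather than silently returning an empty query result.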

Query for the `tiledbsoma` array store that contains all concatenated expression data.

In [None]:
census = ln.Artifact.filter(description=f"Census {census_version}").one()

Query slices within the array store. (This will run a lot faster from within the AWS `us-west-2` data center.)

In [None]:
human = "homo_sapiens"  # subset to human data

# open the array store for queries
with census.open() as store:
    # read SOMADataFrame as a slice
    cell_metadata = store["census_data"][human].obs.read(value_filter=value_filter)
    # concatenate results to pyarrow.Table
    cell_metadata = cell_metadata.concat()
    # convert to pandas.DataFrame
    cell_metadata = cell_metadata.to_pandas()

cell_metadata.shape

In [None]:
cell_metadata.head()
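
A quick way to sanity-check a query result is to count cells per category and confirm that only the requested values appear. The sketch below uses a small toy `DataFrame` as a stand-in for `cell_metadata`; the column names and values are assumptions for illustration, not actual query output.

```python
import pandas as pd

# toy stand-in for `cell_metadata`; the real frame comes from the query above
toy_metadata = pd.DataFrame(
    {
        "cell_type": ["neuron", "neuron", "microglial cell"],
        "tissue": ["brain", "brain", "brain"],
    }
)

# count cells per cell type
cell_type_counts = toy_metadata["cell_type"].value_counts()
# collect the distinct tissues to confirm the filter was applied
tissues_present = set(toy_metadata["tissue"].unique())

print(cell_type_counts)
print(tissues_present)
```

The same two calls can be applied to `cell_metadata` directly once the query has run.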

## Create an `AnnData`

In [None]:
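with census.open() as store:
    # select the human experiment within the array store
    experiment = store["census_data"][human]
    # query the RNA measurement for cells matching the value filter
    adata = experiment.axis_query(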
        "RNA",
        obs_query=tiledbsoma.AxisQuery(value_filter=value_filter)
    ).to_anndata(
        X_name="raw",
        column_names={
            "obs": [
                features.assay,
                features.cell_type,
                features.tissue,
                features.disease,
                features.suspension_type,
            ]
        }
    )

In [None]:
adata.var = adata.var.set_index("feature_id")
adata

In [None]:
adata.var.head()

In [None]:
adata.obs.head()