![scrna3/6](https://img.shields.io/badge/scrna3/6-lightgrey)
[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/scrna2.ipynb)
[![lamindata](https://img.shields.io/badge/Source%20%26%20report%20on%20LaminHub-mediumseagreen)](https://lamin.ai/laminlabs/lamindata/record/core/Transform?id=agayZTonayqAz8)

# Query individual files

Here, we'll query individual files and inspect their metadata.

You can skip this guide if you are only interested in working with the overall dataset.

In [None]:
import lamindb as ln
import lnschema_bionty as lb
import anndata as ad

In [None]:
ln.track()

## Query files by provenance metadata

In [None]:
users = ln.User.lookup()

In [None]:
ln.Transform.filter(created_by=users.testuser1).search("scrna")

In [None]:
transform = ln.Transform.filter(uid="Nv48yAceNSh8z8").one()

In [None]:
ln.File.filter(transform=transform).df()
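Conceptually, querying by provenance means selecting the file records that point to a given transform. The following pure-Python sketch models that idea with hypothetical `Transform` and `File` dataclasses. It does not use lamindb and is not how lamindb implements `filter()`, which runs a database query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    # hypothetical stand-in for a registered notebook or pipeline
    uid: str
    name: str


@dataclass(frozen=True)
class File:
    # hypothetical stand-in for a file record that remembers its producing transform
    key: str
    transform: Transform


def files_by_transform(files, transform):
    """Return the files whose provenance points to `transform`."""
    return [f for f in files if f.transform == transform]


ingest = Transform(uid="T1", name="scrna ingestion")
other = Transform(uid="T2", name="bulk rna ingestion")
files = [
    File("a.h5ad", ingest),
    File("b.h5ad", ingest),
    File("c.parquet", other),
]

print([f.key for f in files_by_transform(files, ingest)])  # ['a.h5ad', 'b.h5ad']
```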

## Query files by biological metadata 

In [None]:
assays = lb.ExperimentalFactor.lookup()
organism = lb.Organism.lookup()
cell_types = lb.CellType.lookup()

In [None]:
query = ln.File.filter(
    # all three conditions must hold for a file to be returned
    experimental_factors=assays.single_cell_rna_sequencing,
    organism=organism.human,
    cell_types=cell_types.gamma_delta_t_cell,
)

In [None]:
query.df()
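The keyword arguments passed to `ln.File.filter()` are combined with a logical AND: a file is returned only if it is linked to all three labels. A minimal pure-Python sketch of that semantics, assuming each hypothetical `FileRecord` stores its labels as sets of names (real records link to registry entries instead):

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    # hypothetical file record carrying sets of biological labels
    key: str
    assays: set = field(default_factory=set)
    organisms: set = field(default_factory=set)
    cell_types: set = field(default_factory=set)


def filter_by_labels(records, assay, organism, cell_type):
    """Keep only records annotated with all three labels (logical AND)."""
    return [
        r
        for r in records
        if assay in r.assays and organism in r.organisms and cell_type in r.cell_types
    ]


records = [
    FileRecord("pbmc.h5ad", {"single-cell RNA sequencing"}, {"human"}, {"gamma-delta T cell", "B cell"}),
    FileRecord("mouse.h5ad", {"single-cell RNA sequencing"}, {"mouse"}, {"gamma-delta T cell"}),
    FileRecord("bulk.h5ad", {"bulk RNA-seq"}, {"human"}, {"gamma-delta T cell"}),
]

hits = filter_by_labels(records, "single-cell RNA sequencing", "human", "gamma-delta T cell")
print([r.key for r in hits])  # ['pbmc.h5ad']
```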

## Inspect file metadata

In [None]:
query_set = ln.File.filter().all()

# pick the first two registered files for comparison
file1, file2 = query_set[0], query_set[1]

In [None]:
file1.describe()

In [None]:
file1.view_flow()

In [None]:
file2.describe()

In [None]:
file2.view_flow()

## Compare features

Here, we compute the genes shared by both files from their registered metadata, without loading the files:

In [None]:
file1_genes = file1.features["var"]
file2_genes = file2.features["var"]

shared_genes = file1_genes & file2_genes
len(shared_genes)

In [None]:
shared_genes.list("symbol")[:10]
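The `&` above combines the two gene querysets so that only records present in both are kept; it works on registered metadata rather than on the data files. The same idea in plain Python, with hypothetical gene symbols standing in for each file's `"var"` feature set and a hypothetical helper `shared_symbols`:

```python
def shared_symbols(genes_a, genes_b):
    """Return the gene symbols present in both collections, sorted for stable output."""
    return sorted(set(genes_a) & set(genes_b))


# hypothetical gene symbols standing in for each file's registered "var" features
file1_genes = ["CD3E", "CD4", "CD8A", "TRDC", "MS4A1"]
file2_genes = ["CD3E", "CD8A", "TRDC", "NKG7"]

shared = shared_symbols(file1_genes, file2_genes)
print(len(shared))  # 3
print(shared[:10])  # ['CD3E', 'CD8A', 'TRDC']
```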

## Compare cell types

In [None]:
file1_celltypes = file1.cell_types.all()
file2_celltypes = file2.cell_types.all()

shared_celltypes = file1_celltypes & file2_celltypes
shared_celltypes_names = shared_celltypes.list("name")
shared_celltypes_names

## Load the individual files

We could either load the files into memory or access them in `backed` mode through `.backed()`, which lazily loads their content from the cloud or from disk.
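The difference between the two modes can be illustrated without lamindb: eager loading reads everything up front, whereas a backed accessor defers reads until an element is requested. `LazyRows` and `load_all` are hypothetical helpers over a plain text file; lamindb's `.backed()` operates on the file's own storage format and is not implemented this way.

```python
import os
import tempfile


class LazyRows:
    """Hypothetical backed accessor: reads a row from disk only when it is indexed."""

    def __init__(self, path):
        self.path = path
        self.reads = 0  # counts rows returned to the caller, not lines scanned

    def __getitem__(self, i):
        with open(self.path) as fh:
            for j, line in enumerate(fh):
                if j == i:
                    self.reads += 1
                    return line.rstrip("\n")
        raise IndexError(i)


def load_all(path):
    """Eager counterpart: read every row into memory at once."""
    with open(path) as fh:
        return [line.rstrip("\n") for line in fh]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cells.txt")
    with open(path, "w") as fh:
        fh.write("\n".join(f"cell_{k}" for k in range(1000)))

    eager = load_all(path)  # all 1000 rows now live in memory
    lazy = LazyRows(path)   # nothing has been read yet
    third = lazy[2]         # only this row is materialized
```

Eager loading pays the full I/O cost once and then allows fast random access; the lazy variant is preferable when only a small slice of a large file is needed.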

Let's load them into memory:

In [None]:
adata1 = file1.load()
adata2 = file2.load()

We can now subset the two datasets by shared cell types:

In [None]:
# keep only cells whose annotated cell type occurs in both files
adata1_subset = adata1[adata1.obs["cell_type"].isin(shared_celltypes_names)]

adata2_subset = adata2[adata2.obs["cell_type"].isin(shared_celltypes_names)]
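The subsetting above builds a boolean mask over cells with `isin` and uses it to index the `AnnData` object. A stdlib-only sketch of the same masking logic, with hypothetical cell IDs and annotations in place of `obs` and hypothetical helpers `isin_mask` and `subset_rows`:

```python
def isin_mask(values, allowed):
    """Boolean mask analogous to pandas Series.isin: True where the value is in `allowed`."""
    allowed = set(allowed)
    return [v in allowed for v in values]


def subset_rows(rows, mask):
    """Keep the rows whose mask entry is True, mirroring adata[mask]."""
    return [row for row, keep in zip(rows, mask) if keep]


# hypothetical per-cell annotations standing in for adata1.obs["cell_type"]
cell_types_1 = ["gamma-delta T cell", "B cell", "monocyte", "gamma-delta T cell"]
cell_ids_1 = ["c0", "c1", "c2", "c3"]
shared_celltypes_names = ["gamma-delta T cell", "monocyte"]

mask = isin_mask(cell_types_1, shared_celltypes_names)
subset = subset_rows(cell_ids_1, mask)
print(mask)    # [True, False, True, True]
print(subset)  # ['c0', 'c2', 'c3']
```

Note that indexing an `AnnData` object returns a view; calling `.copy()` on the subset materializes it as an independent object.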