# Linking scRNA-seq data against `Gene`

In [None]:
!lamin delete test-scrna
!lamin init --storage ./test-scrna --schema bionty

In [None]:
import lamindb as ln
import lnschema_bionty as lb

ln.track()

Consider an scRNA-seq count matrix in the form of an in-memory `AnnData` object:

In [None]:
adata = ln.dev.datasets.anndata_mouse_sc_lymph_node()

In [None]:
adata

Check out the features of this dataset:

In [None]:
adata.var.head()

## Parse features

The features in this data object are genes, indexed by Ensembl gene IDs. We'd like to link these features so that we can query the data by gene.

Features are often knowledge-based entities. [Bionty](https://lamin.ai/docs/bionty) provides several knowledge-based tables for basic biological entities.

```{note}

- For an overview of knowledge tables, see: {mod}`~bionty`.
```

Now let's parse the features, matching each Ensembl gene ID in the data against the `Gene` table:

In [None]:
featureset = ln.Featureset.from_iterable(
    adata.var.index, lb.Gene.ensembl_gene_id, species="mouse"
)  # Don't forget to specify species here, default is "human"

Commit the feature set to the database:

In [None]:
ln.save(featureset);

Here, all features were successfully (unambiguously) linked against their canonical reference in `bionty.Gene`.
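
As a rough illustration of what linking against a reference means, the sketch below splits a list of identifiers into those found in a reference table and those that are not. It is a conceptual stand-in only, not how `ln.Featureset.from_iterable` is implemented: the `REFERENCE` dictionary, the identifiers in it, and the helper `split_by_reference` are all hypothetical.

```python
# Hypothetical reference table: Ensembl-style gene ID -> symbol.
# These entries are made up for illustration and are not taken from Bionty.
REFERENCE = {
    "ENSMUSG_DEMO_0001": "DemoA",
    "ENSMUSG_DEMO_0002": "DemoB",
    "ENSMUSG_DEMO_0003": "DemoC",
}


def split_by_reference(feature_ids, reference):
    """Return (mapped, unmapped) identifiers, preserving input order."""
    mapped = [fid for fid in feature_ids if fid in reference]
    unmapped = [fid for fid in feature_ids if fid not in reference]
    return mapped, unmapped


# Hypothetical feature index, standing in for `adata.var.index`.
feature_ids = ["ENSMUSG_DEMO_0001", "ENSMUSG_DEMO_0003", "ENSMUSG_DEMO_9999"]
mapped, unmapped = split_by_reference(feature_ids, REFERENCE)

print("mapped:", mapped)
print("unmapped:", unmapped)
```

An empty `unmapped` list corresponds to the situation described above, where every feature is found in the reference.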

The saved feature set has type `gene` and is indexed by its hash:

In [None]:
featureset
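
To make the idea of indexing a set by its hash concrete, here is a minimal sketch of an order-independent digest over a collection of identifiers. The hashing scheme LaminDB actually uses is not stated in this guide, so the function `feature_set_hash` and its choice of SHA-256 over sorted, de-duplicated IDs are assumptions for illustration only.

```python
import hashlib


def feature_set_hash(feature_ids):
    """Hypothetical order-independent digest of a set of identifiers.

    Sorting and de-duplicating first means the same set of features
    always yields the same digest, regardless of input order.
    """
    canonical = "\n".join(sorted(set(feature_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same features in a different order (and with a duplicate) hash identically.
h1 = feature_set_hash(["gene_b", "gene_a"])
h2 = feature_set_hash(["gene_a", "gene_b", "gene_a"])
h3 = feature_set_hash(["gene_a"])

print(h1 == h2)  # same set of features
print(h1 == h3)  # different set of features
```

With a scheme like this, two datasets measuring the same genes would resolve to a single feature set record rather than two duplicates.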

The feature set `featureset` links records for 10k genes. Here are the symbols of the first 5, all of which can be queried:

In [None]:
featureset.genes.values_list("symbol", flat=True)[:5]

Hence, genes can be queried not just by Ensembl ID, but also by gene symbol, NCBI ID, gene type, etc.

## Track data with features (genes)

Now we can track the data together with the feature set by linking them:

In [None]:
file = ln.File(adata, name="Mouse Lymph Node scRNA-seq")

In [None]:
ln.save(file);

In [None]:
file.featuresets.add(featureset)

The feature sets can now be accessed via their relationship to the data object:

In [None]:
file.featuresets.values_list()

## Querying data by features

```{seealso}

Basic queries: {doc}`/guide/select`

```

Let us query gene records by symbol:

In [None]:
ln.select(lb.Gene, symbol="Actg1").df()

Query all feature sets that contain the gene:

In [None]:
ln.select(ln.Featureset).filter(genes__symbol="Actg1").df()

Query files whose featuresets contain the gene:

In [None]:
ln.select(ln.File).filter(featuresets__genes__symbol="Actg1").df()
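
The double-underscore lookup `featuresets__genes__symbol` reads as a chain of relationships: from files to their feature sets, from feature sets to their genes, and finally to the gene's `symbol` field. The sketch below traces that chain over plain dictionaries. All table contents, the file and feature set keys, and the helper `files_with_gene` are hypothetical; this only illustrates the traversal and is not LaminDB's query engine.

```python
# Hypothetical in-memory stand-ins for three linked tables.
GENES = {1: "Actg1", 2: "Cd3e", 3: "Ptprc"}  # gene id -> symbol
FEATURESET_GENES = {"fs1": {1, 2}, "fs2": {3}}  # feature set -> gene ids
FILE_FEATURESETS = {  # file -> feature sets
    "file_a": {"fs1"},
    "file_b": {"fs2"},
    "file_c": {"fs1", "fs2"},
}


def files_with_gene(symbol):
    """Return the sorted names of files linked to a gene with this symbol."""
    # Step 1: resolve the symbol to gene ids.
    gene_ids = {gid for gid, sym in GENES.items() if sym == symbol}
    # Step 2: find feature sets that contain any of those genes.
    fs_ids = {fs for fs, members in FEATURESET_GENES.items() if members & gene_ids}
    # Step 3: find files linked to any of those feature sets.
    return sorted(f for f, linked in FILE_FEATURESETS.items() if linked & fs_ids)


print(files_with_gene("Actg1"))
```

Each step of the helper corresponds to one hop in the lookup path, read from right to left.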