# Link data to biological features

When ingesting a dataset, you can provide a `feature_model` so that the data can be queried by standardized features.

In [None]:
import lamindb as db
from bionty import Gene, lookup
import scanpy as sc

db.header()

## Dataset

Here we have two datasets:
- `data1` is a flow cytometry dataset, stored as a file in the `.fcs` format.
- `data2` is an scRNA-seq dataset, held in memory as an AnnData object; its variables contain a column of Ensembl IDs.

In [None]:
from urllib.request import urlretrieve

data1, _ = urlretrieve(
    "https://lamindb-test.s3.amazonaws.com/example.fcs", "example.fcs"
)
data1
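
Before ingesting, it can be worth confirming that the download really is an FCS file. The following is a minimal sketch, assuming the usual FCS layout in which the file begins with a six-byte ASCII version tag such as `FCS3.0`. The helper name `fcs_version` is hypothetical and is not part of lamindb or bionty.

```python
import re

# Hypothetical helper, not part of lamindb or bionty.
# Assumption: an FCS file starts with a 6-byte ASCII version tag, e.g. b"FCS3.0".
_FCS_MAGIC = re.compile(rb"FCS\d\.\d")


def fcs_version(path):
    """Return the version tag from the file header, or None if it is absent."""
    with open(path, "rb") as fh:
        magic = fh.read(6)  # only the first six bytes are needed
    if _FCS_MAGIC.fullmatch(magic):
        return magic.decode("ascii")
    return None
```

Calling `fcs_version(data1)` on the downloaded file should return a version string; a result of `None` would suggest the download failed or the file is not in FCS format.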

Note that the name of the gene ID column must match the database field. You can look up the available field names in `lookup.gene_ids`.

In [None]:
data2 = sc.datasets.pbmc3k()
data2.var.rename(columns={"gene_ids": lookup.gene_ids.ensembl_gene_id}, inplace=True)
data2.var.head()
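
Since the features of `data2` will be matched by Ensembl ID, a quick format check on that column can catch problems early. This is a sketch under one assumption: human Ensembl gene IDs consist of `ENSG` followed by 11 digits, optionally with a `.<version>` suffix. The function name `malformed_ensembl_ids` and the toy input are illustrative only.

```python
import re

# Assumption: human Ensembl gene IDs are "ENSG" plus 11 digits,
# optionally followed by a ".<version>" suffix.
ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}(\.\d+)?")


def malformed_ensembl_ids(ids):
    """Return the entries that do not look like human Ensembl gene IDs."""
    return [i for i in ids if not ENSEMBL_GENE_ID.fullmatch(str(i))]


# Toy input for illustration; not taken from the dataset above.
example_ids = ["ENSG00000010610", "ENSG00000153563", "CD4", ""]
print(malformed_ensembl_ids(example_ids))  # ['CD4', '']
```

With the dataset above, one would pass the renamed column, for example `data2.var[lookup.gene_ids.ensembl_gene_id]`, and expect an empty list.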

## Curate features

For `data1`, we specify the feature model using the bionty `Gene` class, with `hgnc_symbol` as the ID.

In [None]:
feature_model1 = Gene(id=lookup.gene_ids.hgnc_symbol)

In [None]:
db.do.ingest.add(data1, feature_model=feature_model1)

For `data2`, we'd like to ingest features based on the Ensembl IDs.

In [None]:
feature_model2 = Gene(id=lookup.gene_ids.ensembl_gene_id)

In [None]:
db.do.ingest.add(data2, name="scanpy_pbmc3k", feature_model=feature_model2)

In [None]:
db.do.ingest.status

`db.do.ingest.logs` stores information about the mapped features:

In [None]:
next(iter(db.do.ingest.logs.items()))
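
Conceptually, mapping features means checking each feature name against a reference vocabulary and recording which ones matched. The sketch below illustrates that idea with plain Python; it is not lamindb's implementation, and the function `partition_features`, the tiny reference set, and the channel names are all toy assumptions.

```python
def partition_features(features, reference):
    """Split features into those found in the reference and those not found."""
    reference = set(reference)
    mapped = [f for f in features if f in reference]
    unmapped = [f for f in features if f not in reference]
    return {"mapped": mapped, "unmapped": unmapped}


# Toy data: a few gene symbols as the reference, and channel names
# as they might appear in a flow cytometry file.
reference_symbols = {"CD3E", "CD4", "CD8A"}
channels = ["FSC-A", "SSC-A", "CD3E", "CD4"]

summary = partition_features(channels, reference_symbols)
print(summary)
# {'mapped': ['CD3E', 'CD4'], 'unmapped': ['FSC-A', 'SSC-A']}
```

Scatter channels such as `FSC-A` and `SSC-A` are not genes, so they would be expected to remain unmapped under a gene-based feature model.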

Finalize the ingestion.

In [None]:
db.do.ingest.commit()