# Ingest data: `db.ingest`

Use the {class}`~lamindb.db.ingest` class to ingest data of any format.

In [None]:
import lamindb as ln
import sklearn.datasets
import scanpy as sc

ln.nb.header()

## Ingest files

Example: A single image file from [Paradisi *et al.* (2005)](https://bmcmolcellbiol.biomedcentral.com/articles/10.1186/1471-2121-6-27):

<img width="150" alt="Laminopathic nuclei" src="https://upload.wikimedia.org/wikipedia/commons/2/28/Laminopathic_nuclei.jpg">

In [None]:
filepath = ln.datasets.file_jpg_paradisi05()
filepath

To track this dataset, stage it for ingestion:

In [None]:
ln.db.ingest.add(filepath);

Check what we staged:

In [None]:
ln.db.ingest.status()

## Ingest in-memory data

Example: A `DataFrame` storing the iris dataset:

In [None]:
df = sklearn.datasets.load_iris(as_frame=True).frame

df.head()

When ingesting in-memory objects, a `name` argument needs to be passed:

In [None]:
ln.db.ingest.add(df, name="iris");
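
One plausible reason for this requirement is that a file carries a name in its path, whereas an in-memory object does not. The sketch below illustrates that idea only; `resolve_name` is a hypothetical helper and not part of LaminDB's API or its actual implementation.

```python
from pathlib import Path


def resolve_name(data, name=None):
    """Pick a display name for a dataset (hypothetical helper, not LaminDB API)."""
    # An explicitly passed name always wins.
    if name is not None:
        return name
    # A file path carries a usable name: its stem.
    if isinstance(data, (str, Path)):
        return Path(data).stem
    # Anything else lives only in memory and has no name to fall back on.
    raise ValueError("in-memory objects need an explicit `name`")


file_name = resolve_name(Path("images/nuclei.jpg"))
memory_name = resolve_name([5.1, 3.5, 1.4, 0.2], name="iris")
```

Calling `resolve_name` on an in-memory object without `name` raises a `ValueError` in this sketch.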

## Ingest with feature models

So far, we have used LaminDB like a data lake: we cannot yet query for the features[^features] of the ingested data.

[^features]: We mostly use the term feature as a synonym for variable (statistics), column and field (databases), and dimension (machine learning).

By passing a _feature model_ to `db.ingest`, LaminDB creates links[^relations] to underlying entities and behaves much like a data warehouse.

[^relations]: We mostly use the term link as a synonym for relation and reference.

Example: An scRNA-seq count matrix in the form of an in-memory `AnnData` object:

In [None]:
data = sc.read(ln.datasets.file_mouse_sc_lymph_node())

data.var.head()

In [None]:
ingest3 = ln.db.ingest.add(data, name="Mouse Lymph Node scRNA-seq")

The features in this dataset represent the entity `gene` and are indexed by Ensembl gene ids.

Bionty provides a number of feature models for all basic biological entities that are typically measured.

For linking against protein complexes, see the guide on [ingesting flow cytometry data with cell markers](https://lamin.ai/docs/db/faq/flow).

```{note}

[Bionty](https://lamin.ai/docs/bionty) is a data model generator for biology based on knowledge from scientific databases.

- For an overview of feature models, see: [`bionty.lookup.feature_model`](https://lamin.ai/docs/bionty/bionty.lookup#bionty.lookup.feature_model).
- For an overview of gene ids, see: [`bionty.lookup.gene_id`](https://lamin.ai/docs/bionty/bionty.lookup#bionty.lookup.gene_id).
```

In [None]:
import bionty as bt

In [None]:
feature_model = bt.Gene(
    id=bt.lookup.gene_id.ensembl_gene_id,
    species=bt.lookup.species.mouse,
)

The `feature_model` links features against a reference; here, the gene reference [`bionty.Gene`](https://lamin.ai/docs/bionty/bionty.gene#bionty.Gene).

Ingesting data with a `feature_model` enables querying for features by various ids, names, and feature properties.

For example, here we ingest genes with their Ensembl ids, but we can also query for them based on [gene symbol, NCBI ids, gene type, etc.](https://lamin.ai/docs/db/guide/query-load#Query-data-objects-by-linked-entities)
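
To illustrate why linking against a reference makes such queries possible, here is a minimal, library-free sketch. Everything in it is an assumption made for illustration: the reference table, its placeholder ids and symbols, and the helpers `link_features` and `query_by` are hypothetical and do not reflect Bionty's data or LaminDB's API.

```python
# Hypothetical stand-in for a gene reference table.
# The ids and symbols are placeholders, not real Ensembl entries.
REFERENCE = {
    "ENSMUSG-TOY-0001": {"symbol": "GeneA", "gene_type": "protein_coding"},
    "ENSMUSG-TOY-0002": {"symbol": "GeneB", "gene_type": "protein_coding"},
    "ENSMUSG-TOY-0003": {"symbol": "GeneC", "gene_type": "lncRNA"},
}


def link_features(feature_ids, reference):
    """Map each feature id to its reference record, or None if it is unknown."""
    return {fid: reference.get(fid) for fid in feature_ids}


def query_by(links, field, value):
    """Return the ids of linked features whose reference record matches."""
    return sorted(
        fid
        for fid, record in links.items()
        if record is not None and record[field] == value
    )


# The dataset only stores ids; one of them is absent from the reference.
dataset_features = ["ENSMUSG-TOY-0001", "ENSMUSG-TOY-0003", "ENSMUSG-TOY-9999"]
links = link_features(dataset_features, REFERENCE)

# Once linked, features can be found by properties the dataset never stored.
by_symbol = query_by(links, "symbol", "GeneC")
by_type = query_by(links, "gene_type", "protein_coding")
```

The dataset itself only holds ids; the symbol and gene type come from the reference record each id is linked to.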

In [None]:
ingest3.link.features(feature_model)

Here, all 10,000 features were successfully (unambiguously) linked against their canonical reference in `bionty.Gene`.

We can retrieve the integrity information:

In [None]:
ingest3.feature_model
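
What "unambiguously linked" means can be sketched in plain Python. This is a toy model under stated assumptions: the candidate table, the ids, and the `classify_links` helper are hypothetical, and the actual integrity report returned by LaminDB may be structured differently.

```python
from collections import Counter


def classify_links(feature_ids, candidates):
    """Label each feature id by how many reference entries it matches."""
    status = {}
    for fid in feature_ids:
        n_matches = len(candidates.get(fid, []))
        if n_matches == 1:
            status[fid] = "linked"  # exactly one match: unambiguous
        elif n_matches > 1:
            status[fid] = "ambiguous"  # several matches: cannot pick one
        else:
            status[fid] = "unmapped"  # no match in the reference
    return status


# Hypothetical candidate matches per feature id (placeholder values).
CANDIDATES = {
    "id-1": ["ref-a"],
    "id-2": ["ref-b", "ref-c"],
    "id-3": ["ref-d"],
}

status = classify_links(["id-1", "id-2", "id-3", "id-4"], CANDIDATES)
summary = Counter(status.values())
```

In this toy model, a dataset is fully linked when `summary["linked"]` equals the number of features.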

## Complete ingestion

Before completing the ingestion, let's check what we staged:

In [None]:
ln.db.ingest.status()

Let's now commit these data to the DB:

In [None]:
ln.db.ingest.commit()

We see that several links are made in the background: the data object is associated with its source (this Jupyter notebook, `jupynb`) and the user who operates the notebook.