# Ingest data: `db.Ingest`

Use the {class}`~lamindb.db.Ingest` class to ingest data of any format.

In [None]:
import lamindb as ln

ln.nb.header()

## Ingest files

Example: A single image file from [Paradisi *et al.* (2005)](https://bmcmolcellbiol.biomedcentral.com/articles/10.1186/1471-2121-6-27):

<img width="150" alt="Laminopathic nuclei" src="https://upload.wikimedia.org/wikipedia/commons/2/28/Laminopathic_nuclei.jpg">

In [None]:
filepath = ln.datasets.file_jpg_paradisi05()
filepath

Any ingestion starts by instantiating {class}`~lamindb.db.Ingest` with information about a data source.

Here, the data source is a Jupyter notebook and is automatically inferred. In {doc}`pipeline`, the data source is a computational pipeline.

In [None]:
ingest = ln.db.Ingest()

To track a file, stage it for ingestion via `ingest.add()`.

In [None]:
staged = ingest.add(filepath)

Each `.add(data)` operation creates an {class}`~lamindb.dev.db.Staged` object.

Among other attributes, it gives access to the data object record that will be inserted into the database:

In [None]:
staged.dobject

We'll see below why this matters when linking metadata to the data object.
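The stage-then-commit flow can be pictured with a small in-memory model. The sketch below is hypothetical: `MiniIngest`, `MiniStaged`, and `DObject` are illustrative names, not LaminDB classes, and a real `Staged` object carries more attributes. It only shows the idea that `.add()` builds the data object record up front and holds it until commit, so the record can be inspected or linked in between.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DObject:
    # Hypothetical data object record: just a name and a file suffix.
    name: str
    suffix: str


@dataclass
class MiniStaged:
    # Hypothetical stand-in for a staged item: the data plus its pending record.
    data: object
    dobject: DObject


@dataclass
class MiniIngest:
    staged: list = field(default_factory=list)     # added, not yet committed
    committed: list = field(default_factory=list)  # stands in for the database

    def add(self, filepath):
        path = Path(filepath)
        # Build the record at staging time so it can be inspected before commit.
        item = MiniStaged(data=path, dobject=DObject(name=path.stem, suffix=path.suffix))
        self.staged.append(item)
        return item

    def commit(self):
        # Move every pending record into the "database" and clear the staging area.
        self.committed.extend(s.dobject for s in self.staged)
        self.staged.clear()


ingest = MiniIngest()
staged = ingest.add("Laminopathic_nuclei.jpg")
print(staged.dobject)  # DObject(name='Laminopathic_nuclei', suffix='.jpg')
ingest.commit()
```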

## Ingest in-memory data

In [None]:
import sklearn.datasets

Example: A `DataFrame` storing the iris dataset:

In [None]:
df = sklearn.datasets.load_iris(as_frame=True).frame

df.head()

When ingesting an in-memory object, pass a `name` argument, as there is no file name to derive it from:

In [None]:
ingest.add(df, name="iris");
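A file carries a name in its path, whereas an in-memory `DataFrame` or `AnnData` object does not. The rule can be illustrated with a hypothetical helper (`infer_name` is not a LaminDB function; it is only a sketch of the logic):

```python
from pathlib import Path


def infer_name(data, name=None):
    """Return a record name for `data` (hypothetical helper, for illustration)."""
    if name is not None:
        # An explicit name always wins.
        return name
    if isinstance(data, (str, Path)):
        # Files: derive the name from the file stem.
        return Path(data).stem
    # In-memory objects have no path to fall back on.
    raise ValueError("Pass `name=` when ingesting an in-memory object.")


print(infer_name("iris.csv"))               # iris
print(infer_name({"a": [1]}, name="iris"))  # iris
```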

## Linking features of ingested data

So far, we have used LaminDB like a data lake: we cannot yet select ingested data by its _features_.

We use the term _feature_ to cover its synonyms in other fields: variable (statistics), column/field (databases), and dimension (machine learning).

If you link the features of ingested data against a _feature model_, LaminDB creates links to the underlying entities and behaves much like a data warehouse.
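The difference between the two modes can be sketched in plain SQL. Assuming a hypothetical minimal schema (a `dobject` table, a `feature` table, and a `dobject_feature` link table; these names are illustrative, not LaminDB's actual schema), a lake only lets you query objects by their own attributes, whereas link rows let you ask which data objects measure a given feature:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE dobject (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE feature (id TEXT PRIMARY KEY);
    -- Link table: which data object contains which feature.
    CREATE TABLE dobject_feature (
        dobject_id INTEGER REFERENCES dobject(id),
        feature_id TEXT REFERENCES feature(id)
    );
    INSERT INTO dobject VALUES (1, 'iris'), (2, 'lymph node');
    INSERT INTO feature VALUES ('sepal length (cm)'), ('gene_a');
    INSERT INTO dobject_feature VALUES (1, 'sepal length (cm)'), (2, 'gene_a');
    """
)

# Data-lake style: only the object's own metadata is queryable.
print(con.execute("SELECT name FROM dobject WHERE name = 'iris'").fetchall())

# Warehouse style: select data objects by a feature they measure.
rows = con.execute(
    """
    SELECT d.name FROM dobject d
    JOIN dobject_feature l ON l.dobject_id = d.id
    WHERE l.feature_id = ?
    """,
    ("gene_a",),
).fetchall()
print(rows)  # [('lymph node',)]
```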

Example: An scRNA-seq count matrix, in the form of an in-memory `AnnData` object:

In [None]:
import scanpy as sc

In [None]:
adata = sc.read(ln.datasets.file_mouse_sc_lymph_node())

adata.var.head()

In [None]:
staged = ingest.add(adata, name="Mouse Lymph Node scRNA-seq")

The features in this dataset represent the entity `gene` and are indexed by Ensembl gene ids.

Bionty provides feature models for the basic biological entities that are typically measured.

For linking against protein complexes, see the guide on [ingesting flow cytometry data with cell markers](https://lamin.ai/docs/db/faq/flow).

```{note}

[Bionty](https://lamin.ai/docs/bionty) is a biological data model generator based on scientific databases.

- For an overview of feature models, see [`bionty.lookup.feature_model`](https://lamin.ai/docs/bionty/bionty.lookup#bionty.lookup.feature_model).
- For an overview of gene ids, see [`bionty.lookup.gene_id`](https://lamin.ai/docs/bionty/bionty.lookup#bionty.lookup.gene_id).
```

In [None]:
import bionty as bt

feature_model = bt.Gene(
    id=bt.lookup.gene_id.ensembl_gene_id,
    species=bt.lookup.species.mouse,
)

The `feature_model` links features against a reference, here, the gene reference [`bionty.Gene`](https://lamin.ai/docs/bionty/bionty.gene#bionty.Gene).

Linking data against a `feature_model` lets you select it by a variety of feature ids, names, and properties.

For example, here we ingest genes with their Ensembl ids, but we can also select them by [gene symbol, NCBI id, gene type, etc.](https://lamin.ai/docs/db/guide/select-load#Select-data-objects-by-linked-entities)
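Selecting by an alternative identifier amounts to translating it through the reference table first. A minimal sketch with toy data, assuming a reference that maps each Ensembl id to a symbol and a gene type (the ids, symbols, and dataset names are placeholders, not real Ensembl or Bionty records, and `select_by` is a hypothetical helper):

```python
# Toy reference table: one row per gene (placeholder values).
reference = [
    {"ensembl_gene_id": "ENSMUSG_TOY1", "symbol": "GeneA", "gene_type": "protein_coding"},
    {"ensembl_gene_id": "ENSMUSG_TOY2", "symbol": "GeneB", "gene_type": "lncRNA"},
    {"ensembl_gene_id": "ENSMUSG_TOY3", "symbol": "GeneC", "gene_type": "protein_coding"},
]

# Features linked at ingestion time, stored by Ensembl id only.
linked = {
    "dataset_1": {"ENSMUSG_TOY1", "ENSMUSG_TOY2"},
    "dataset_2": {"ENSMUSG_TOY3"},
}


def select_by(field, value):
    """Return datasets containing any gene whose reference `field` equals `value`."""
    # Step 1: translate the query into Ensembl ids via the reference.
    ids = {row["ensembl_gene_id"] for row in reference if row[field] == value}
    # Step 2: match those ids against the linked features of each dataset.
    return sorted(name for name, features in linked.items() if features & ids)


print(select_by("symbol", "GeneB"))              # ['dataset_1']
print(select_by("gene_type", "protein_coding"))  # ['dataset_1', 'dataset_2']
```

Because the data only store Ensembl ids, every other identifier stays queryable as long as the reference table carries it.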

In [None]:
staged.link_features(feature_model)

Here, all 10,000 features were linked successfully and unambiguously against their canonical reference in `bionty.Gene`.
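"Unambiguous" means that each feature id matches exactly one reference record. One way to check this is to count the matches per id. The sketch below uses toy data; `classify_links` and the placeholder ids are hypothetical and not Bionty's implementation:

```python
from collections import Counter


def classify_links(feature_ids, reference_ids):
    """Split feature ids into linked (one match), ambiguous (several), unmapped (none)."""
    counts = Counter(reference_ids)  # number of reference records per id
    result = {"linked": [], "ambiguous": [], "unmapped": []}
    for fid in feature_ids:
        n = counts[fid]  # Counter returns 0 for ids absent from the reference
        key = "linked" if n == 1 else "ambiguous" if n > 1 else "unmapped"
        result[key].append(fid)
    return result


# Placeholder reference with a duplicated entry to show the ambiguous case.
reference_ids = ["ENSMUSG_TOY1", "ENSMUSG_TOY2", "ENSMUSG_TOY2"]
result = classify_links(["ENSMUSG_TOY1", "ENSMUSG_TOY2", "ENSMUSG_TOY9"], reference_ids)
print(result)
# {'linked': ['ENSMUSG_TOY1'], 'ambiguous': ['ENSMUSG_TOY2'], 'unmapped': ['ENSMUSG_TOY9']}
```

In the outcome reported above, every feature would fall into the `linked` group.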

## Complete ingestion

Before completing the ingestion, let's check what we staged:

In [None]:
ingest.status()

Let's now commit these data to the DB:

In [None]:
ingest.commit()

Several links are made in the background: the data object is associated with its source (this Jupyter notebook, `jupynb`) and with the user who runs the notebook.
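These background links can be pictured as foreign keys on the data object record. A minimal relational sketch, assuming hypothetical `users`, `jupynb`, and `dobject` tables (the table and column names are illustrative, not LaminDB's actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE users (id TEXT PRIMARY KEY, handle TEXT);
    CREATE TABLE jupynb (id TEXT PRIMARY KEY, name TEXT);
    -- Each data object points to the notebook that ingested it and the user who ran it.
    CREATE TABLE dobject (
        id INTEGER PRIMARY KEY,
        name TEXT,
        jupynb_id TEXT REFERENCES jupynb(id),
        user_id TEXT REFERENCES users(id)
    );
    INSERT INTO users VALUES ('u1', 'testuser');
    INSERT INTO jupynb VALUES ('nb1', 'Ingest data');
    INSERT INTO dobject (name, jupynb_id, user_id) VALUES ('iris', 'nb1', 'u1');
    """
)

# Recover the provenance of a data object by following its foreign keys.
row = con.execute(
    """
    SELECT d.name, j.name, u.handle FROM dobject d
    JOIN jupynb j ON j.id = d.jupynb_id
    JOIN users u ON u.id = d.user_id
    """
).fetchone()
print(row)  # ('iris', 'Ingest data', 'testuser')
```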