# Quickstart

Run the following on the command line to install LaminDB and sign up for a user identity that is linked to your data and analyses (similar to a GitHub username):
```
pip install lamindb
lndb signup <email>
```
After confirming the signup email, run
```
lndb login <handle/username>
```
and
```
lndb init --storage mydb
```

If you want to plug in an S3 or a Google Cloud bucket, just supply `--storage s3://my-bucket` or `--storage gs://my-bucket`.
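The three `--storage` values shown above differ only in their URI scheme. As a rough illustration (this is not LaminDB's own logic, and `classify_storage` is a hypothetical helper), the following sketch tells a local directory apart from an S3 or Google Cloud bucket using only the standard library:

```python
from urllib.parse import urlparse


def classify_storage(storage: str) -> str:
    """Hypothetical helper: label a --storage value by its URI scheme."""
    scheme = urlparse(storage).scheme
    if scheme == "s3":
        return "aws-s3"
    if scheme == "gs":
        return "google-cloud-storage"
    if scheme == "":
        # No scheme at all: treat the value as a local directory name.
        return "local"
    raise ValueError(f"unsupported storage scheme: {scheme!r}")


for value in ("mydb", "s3://my-bucket", "gs://my-bucket"):
    print(value, "->", classify_storage(value))
```

The labels returned here are arbitrary; the point is only that a bare name such as `mydb` has no scheme, while bucket locations carry `s3://` or `gs://`.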

In [None]:
import lamindb as ln

ln.nb.header()

## Ingest files

Ingest a simple image file from [Paradisi *et al.* (2005)](https://bmcmolcellbiol.biomedcentral.com/articles/10.1186/1471-2121-6-27):

<img width="150" alt="Laminopathic nuclei" src="https://upload.wikimedia.org/wikipedia/commons/2/28/Laminopathic_nuclei.jpg">

In [None]:
filepath = ln.datasets.file_jpg_paradisi05()
filepath

To track this dataset, stage it for ingestion via `.add`:

In [None]:
ln.db.ingest.add(filepath)

## Ingest in-memory data

You can also ingest a data object that is loaded into memory, for instance, a `DataFrame`:

In [None]:
import sklearn.datasets

df = sklearn.datasets.load_iris(as_frame=True).frame

df.head()

When ingesting in-memory objects, a `name` parameter needs to be passed:

In [None]:
ln.db.ingest.add(df, name="iris")

Upon ingestion, an in-memory data object is saved in a corresponding file format. In this case, the dataframe is saved as a `.feather` file in LaminDB. See the [`dobject` reference](https://lamin.ai/docs/lnschema-core/lnschema_core.dobject) for more details.

In [None]:
ln.db.ingest.status

## Ingest with feature models

So far, we have used LaminDB like a data lake: we cannot yet query for the features of the ingested data.

By providing _feature models_ at ingestion, we can use LaminDB as a queryable data warehouse that stores links and monitors data integrity.

A feature model creates a link between a feature and a reference table that defines the entity underlying the feature.

Example:

In [None]:
import scanpy as sc

data = sc.read(ln.datasets.file_mouse_sc_lymph_node())

data.var.head()

The features in this dataset represent the entity `gene` and are indexed by Ensembl gene ids.

Bionty provides a number of feature models for all basic biological entities that are typically measured. Below we show an example of genes as entities. Also see [ingesting flow cytometry data with cell markers](https://lamin.ai/docs/db/faq/flow).

```{note}

[Bionty](https://lamin.ai/docs/bionty) is a data model generator for biology based on knowledge from scientific databases.

- For an overview of feature models, see: [`bionty.lookup.feature_model`](https://lamin.ai/docs/bionty/bionty.lookup#bionty.lookup.feature_model).
- For an overview of gene ids, see: [`bionty.lookup.gene_id`](https://lamin.ai/docs/bionty/bionty.lookup#bionty.lookup.gene_id).
```

In [None]:
import bionty as bt

In [None]:
feature_model = bt.Gene(
    id=bt.lookup.gene_id.ensembl_gene_id,
    species=bt.lookup.species.mouse,
)

The `feature_model` links features against a reference, here, the gene reference [`bionty.Gene`](https://lamin.ai/docs/bionty/bionty.gene#bionty.Gene).

Ingesting data with a `feature_model` enables querying for features by a number of ids, names, and feature properties.

For example, here we ingest genes with their Ensembl ids, but we can also query for them by [gene symbol, NCBI id, gene type, etc.](https://lamin.ai/docs/db/guide/query-load#Query-data-objects-by-linked-entities)

In [None]:
ln.db.ingest.add(
    data,
    name="Mouse Lymph Node scRNA-seq",
    feature_model=feature_model,
)

Here, all 10000 features were successfully (unambiguously) linked against their canonical reference in `bionty.Gene`.
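To make the idea of linking features against a reference more concrete, here is a minimal sketch in plain Python. It is not how Bionty or LaminDB implement linking; the `reference` table, its ids and symbols, and the `link_features` helper are all placeholders invented for illustration.

```python
# Hypothetical reference table: identifier -> record.
# The ids and symbols are placeholders, not real Ensembl entries.
reference = {
    "ID0001": {"symbol": "geneA", "gene_type": "protein_coding"},
    "ID0002": {"symbol": "geneB", "gene_type": "lncRNA"},
    "ID0003": {"symbol": "geneC", "gene_type": "protein_coding"},
}


def link_features(feature_ids, reference):
    """Split feature ids into those found in the reference and those missing."""
    linked = {fid: reference[fid] for fid in feature_ids if fid in reference}
    unlinked = [fid for fid in feature_ids if fid not in reference]
    return linked, unlinked


features = ["ID0001", "ID0003", "ID9999"]
linked, unlinked = link_features(features, reference)
print(f"{len(linked)} of {len(features)} features linked; unlinked: {unlinked}")

# Because each linked feature points at a reference record, it can also be
# looked up by another property of that record, e.g. its symbol.
by_symbol = {record["symbol"]: fid for fid, record in linked.items()}
print(by_symbol["geneC"])
```

In this toy setting, "integrity" is simply the share of features that resolve to exactly one reference record; an id such as `ID9999` that is absent from the reference stays unlinked.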

We can retrieve the integrity information:

## Complete ingestion

Before completing the ingestion, let's check what we staged:

In [None]:
ln.db.ingest.status

Let's now commit these data to the DB:

In [None]:
ln.db.ingest.commit()

We see that several links are made in the background: the data object is associated with its source (this Jupyter notebook, `jupynb`) and the user who operates the notebook.
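The stage-then-commit pattern used above (`add`, `status`, `commit`) can be pictured with a toy staging area. The `StagingArea` class below is purely illustrative and makes no claim about LaminDB's implementation; it only shows why nothing is persisted until `commit` is called.

```python
class StagingArea:
    """Toy stage-then-commit container (illustrative only)."""

    def __init__(self):
        self._staged = {}    # name -> object awaiting commit
        self.committed = {}  # name -> object after commit

    def add(self, obj, name):
        """Stage an object under a name without persisting it."""
        self._staged[name] = obj

    @property
    def status(self):
        """Names of the objects that are staged but not yet committed."""
        return sorted(self._staged)

    def commit(self):
        """Move all staged objects to the committed store; return the count."""
        count = len(self._staged)
        self.committed.update(self._staged)
        self._staged.clear()
        return count


staging = StagingArea()
staging.add([5.1, 3.5, 1.4, 0.2], name="iris")       # placeholder payload
staging.add(b"placeholder-bytes", name="image.jpg")  # placeholder payload
print(staging.status)                   # staged names before commit
print(staging.commit(), "objects committed")
print(staging.status)                   # empty after commit
```

Separating staging from committing leaves room to inspect what will be written before anything reaches the database.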

## Load and query data

You can now load and query data, for instance, via:
```
dobject = ln.db.query.dobject(name="iris").first()
df = ln.db.load(dobject)
```

For many more features, see [lamin.ai/docs/db/guide](https://lamin.ai/docs/db/guide).