# Introduction

```{include} ../README.md
:start-line: 6
:end-line: -4
```

:::{dropdown} LaminDB features

```{include} features-lamindb.md
```
:::

LaminHub is a data collaboration hub built on LaminDB, similar to how GitHub is built on git.

:::{dropdown} LaminHub features

```{include} features-laminhub.md
```
:::

LaminHub is free for public data. Enterprise features, support, integration tests & wetlab plug-ins, hosted in your infrastructure or ours, are available on a paid plan: please [reach out](https://lamin.ai/contact)!

## Quickstart

```{warning}

Public beta: The API is close to stable, but some breaking changes might still occur.

```

### Setup

[Sign up](https://lamin.ai/signup-form) for a free account (see more [info](https://lamin.ai/docs/setup)).

On the command line, run `pip install 'lamindb[jupyter,bionty]'` and `lamin login <email> --password <password>`.

Initialize a LaminDB instance like you'd initialize a git repository:

In [None]:
!lamin init --schema bionty --storage ./lamin-intro  # or s3://my-bucket, gs://my-bucket as default storage

Because we passed `--schema bionty`, this instance mounted the plug-in {mod}`lnschema_bionty`.

### Register a dataset

In [None]:
import lamindb as ln
import pandas as pd

# track data flow through current notebook
ln.track()

# access a new batch of data
df = pd.DataFrame(
    {"CD8": [1, 2, 3], "CD45": [3, 4, 5], "perturbation": ["DMSO", "IFNG", "DMSO"]}
)

# create a dataset
dataset = ln.Dataset(df, name="Immune phenotyping 1")
# register dataset
dataset.save()

### Access a dataset

In [None]:
# search for a dataset
ln.Dataset.search("immune")

# query a dataset
dataset = ln.Dataset.filter(name__contains="phenotyping 1").one()

# view data flow
dataset.view_flow()

# describe metadata
dataset.describe()

# load the dataset
df = dataset.load()

### Validate & annotate a dataset

Validate the column names in a `DataFrame` _schema-less_:

In [None]:
# define validation criteria
names_types = [("CD8", "number"), ("CD45", "number"), ("perturbation", "category")]

# save validation criteria as features
features = [ln.Feature(name=name, type=type) for (name, type) in names_types]
ln.save(features)

# create dataset & validate features
dataset = ln.Dataset.from_df(df, name="Immune phenotyping 1")
# register dataset & link validated features
dataset.save()

# access linked features
dataset.features
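
To make the idea of schema-less validation concrete, here is a minimal plain-Python sketch. It is not LaminDB's implementation: the `registered_features` dictionary, the `validate_columns` helper, and the extra `batch` column are hypothetical. It assumes that validating means checking each column name against the names of previously saved features.

```python
# Hypothetical stand-in for a feature registry (feature name -> type).
# This is an illustration only, not LaminDB's API.
registered_features = {"CD8": "number", "CD45": "number", "perturbation": "category"}

# Column names as in the DataFrame above, plus a hypothetical unregistered column.
column_names = ["CD8", "CD45", "perturbation", "batch"]


def validate_columns(names, registry):
    """Split column names into those found in the registry and those not found."""
    validated = {name: registry[name] for name in names if name in registry}
    unvalidated = [name for name in names if name not in registry]
    return validated, unvalidated


validated, unvalidated = validate_columns(column_names, registered_features)
print("validated:", validated)
print("not validated:", unvalidated)
```

In this sketch, only the validated names would be linked to the dataset as features; the unregistered column is reported rather than silently accepted.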

Use the {mod}`lnschema_bionty` plug-in to type biological entities and validate column names _schema-full_:

In [None]:
# requires the 'bionty' schema
import lnschema_bionty as lb

# set a global species for multi-species registries
lb.settings.species = "human"

# create cell marker records from the public ontology
cell_markers = [lb.CellMarker.from_bionty(name=name) for name in ["CD8", "CD45"]]
ln.save(cell_markers)

# create dataset & validate features
dataset = ln.Dataset.from_df(
    df.iloc[:, :2], name="Immune phenotyping 2", field=lb.CellMarker.name
)
# register dataset & link validated features
dataset.save()

dataset.features

### Query for annotations

Query for a panel of cell markers & the linked datasets:

In [None]:
# an object to auto-complete cell markers
cell_markers = lb.CellMarker.lookup()

# all cell marker panels containing CD45
panels_with_cd45 = ln.FeatureSet.filter(cell_markers=cell_markers.cd45).all()

# all datasets measuring CD45
ln.Dataset.filter(feature_sets__in=panels_with_cd45).df()
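
Because metadata is stored in SQL, a query like this amounts to joins across link tables. The following sketch uses only Python's built-in `sqlite3` module to show the shape of such a query. All table and column names are hypothetical and do not reflect LaminDB's actual schema; the `CD4` marker and the `Other panel` dataset are made-up rows for contrast.

```python
import sqlite3

# In-memory database with hypothetical tables (not LaminDB's real schema).
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE cell_marker (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE feature_set (id INTEGER PRIMARY KEY);
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE feature_set_cell_marker (feature_set_id INTEGER, cell_marker_id INTEGER);
    CREATE TABLE dataset_feature_set (dataset_id INTEGER, feature_set_id INTEGER);
    """
)
con.executemany("INSERT INTO cell_marker VALUES (?, ?)", [(1, "CD8"), (2, "CD45"), (3, "CD4")])
con.executemany("INSERT INTO feature_set VALUES (?)", [(1,), (2,)])
con.executemany(
    "INSERT INTO dataset VALUES (?, ?)", [(1, "Immune phenotyping 2"), (2, "Other panel")]
)
# Feature set 1 contains CD8 and CD45; feature set 2 contains only CD4.
con.executemany("INSERT INTO feature_set_cell_marker VALUES (?, ?)", [(1, 1), (1, 2), (2, 3)])
# Dataset 1 links to feature set 1; dataset 2 links to feature set 2.
con.executemany("INSERT INTO dataset_feature_set VALUES (?, ?)", [(1, 1), (2, 2)])

# All datasets linked to a feature set that contains the marker CD45.
rows = con.execute(
    """
    SELECT DISTINCT d.name
    FROM dataset AS d
    JOIN dataset_feature_set AS dfs ON dfs.dataset_id = d.id
    JOIN feature_set_cell_marker AS fscm ON fscm.feature_set_id = dfs.feature_set_id
    JOIN cell_marker AS cm ON cm.id = fscm.cell_marker_id
    WHERE cm.name = ?
    ORDER BY d.name
    """,
    ("CD45",),
).fetchall()
datasets_with_cd45 = [name for (name,) in rows]
print(datasets_with_cd45)
con.close()
```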

### Annotate with biological labels

Use the Experimental Factor Ontology to link a validated label for the readout:

In [None]:
# search the public ontology from the bionty store
lb.ExperimentalFactor.bionty().search("facs").head(2)

# create a record for facs
facs = lb.ExperimentalFactor.from_bionty(ontology_id="EFO:0009108")
facs.save()

# label with an in-house assay
immune_assay1 = lb.ExperimentalFactor(name="Immune phenotyping assay 1")
immune_assay1.save()

dataset.experimental_factors.add(facs, immune_assay1)

# create a tissue from a public ontology
bone_marrow = lb.Tissue.from_bionty(name="bone marrow")
bone_marrow.save()

dataset.tissues.add(bone_marrow)

dataset.describe()

## More examples

### Understand data flow

View the sequence of data transformations ({class}`~lamindb.Transform`) in a project (from [here](docs:project-flow), based on [Schmidt _et al._, 2022](https://pubmed.ncbi.nlm.nih.gov/35113687/)):

```python
transform.view_parents()
```

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/b0geN1HDHXlORqMOOPay.svg" width="400">

Or view the generating flow of a file or dataset:

```python
file.view_flow()
```

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/KQmzmmLOeBN0C8YkitMr.svg" width="800">


Both figures are based on mere calls to `ln.track()` in notebooks, pipelines & apps.
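
As a rough mental model of how a flow graph can be derived from tracked runs, consider the plain-Python sketch below. It is not how LaminDB stores or renders data flow: the `runs` log, the file and transform names, and the `generating_flow` helper are all hypothetical. It assumes each tracked run records which files it read and which it wrote.

```python
# Hypothetical run log: each run names its transform, its input files, and its output files.
runs = [
    {"transform": "ingest.ipynb", "inputs": [], "outputs": ["raw.fcs"]},
    {"transform": "preprocess.py", "inputs": ["raw.fcs"], "outputs": ["clean.parquet"]},
    {"transform": "analysis.ipynb", "inputs": ["clean.parquet"], "outputs": ["figure.svg"]},
]


def generating_flow(file, runs):
    """Return the (transform, output) steps that lead to `file`, upstream steps first."""
    produced_by = {out: run for run in runs for out in run["outputs"]}
    steps = []
    seen = set()

    def visit(name):
        # Stop at files already visited or files no tracked run produced.
        if name in seen or name not in produced_by:
            return
        seen.add(name)
        run = produced_by[name]
        for parent in run["inputs"]:
            visit(parent)
        steps.append((run["transform"], name))

    visit(file)
    return steps


flow = generating_flow("figure.svg", runs)
for transform, output in flow:
    print(f"{transform} -> {output}")
```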


### Manage biological registries

Create a cell type registry from public knowledge and add a new cell state (from [here](bio-registries)):

In [None]:
import lnschema_bionty as lb

# create an ontology-coupled cell type record and save it
lb.CellType.from_bionty(name="neuron").save()

# create a record to track a new cell state
new_cell_state = lb.CellType(name="my neuron cell state", description="explains X")
new_cell_state.save()

# express that it's a neuron state
cell_types = lb.CellType.lookup()
new_cell_state.parents.add(cell_types.neuron)

In [None]:
# view ontological hierarchy
new_cell_state.view_parents(distance=2)
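
The hierarchy view can be thought of as a bounded walk up the parent links. The sketch below illustrates this in plain Python, assuming that `distance` limits the number of parent levels shown. The `parents` mapping and the `ancestors_within` helper are hypothetical, and the entries above "neuron" are illustrative rather than taken from a specific ontology release.

```python
# Hypothetical child -> parents mapping (illustrative, not read from an ontology).
parents = {
    "my neuron cell state": ["neuron"],
    "neuron": ["neural cell"],
    "neural cell": ["cell"],
    "cell": [],
}


def ancestors_within(record, parents, distance):
    """Return ancestors of `record` at most `distance` levels up, nearest first."""
    found = []
    frontier = [record]
    for _ in range(distance):
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, []):
                if parent not in found:
                    found.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return found


nearby = ancestors_within("my neuron cell state", parents, distance=2)
print(nearby)
```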

### Leverage a mesh of instances

LaminDB is a distributed system like git.

For instance, collaborators can load your instance using:

```shell
$ lamin load myhandle/myinstance
```

### Manage custom schemas

LaminDB can be customized & extended with schema & app plug-ins building on the [Django](https://github.com/django/django) ecosystem. Examples are

- [lnschema_bionty](lnschema_bionty): Registries for basic biological entities, coupled to public ontologies.
- [lnschema_lamin1](https://github.com/laminlabs/lnschema-lamin1): Exemplary custom schema to manage samples, treatments, etc.

If you'd like to create your own schema or app:

1. Create a git repository with registries similar to [lnschema_lamin1](https://github.com/laminlabs/lnschema-lamin1)
2. Create & deploy migrations via `lamin migrate create` and `lamin migrate deploy`

It's fastest if we do this for you based on our templates within an enterprise plan.

## Design

LaminDB builds semantics of R&D and biology into well-established infrastructure.

It provides a SQL-schema specification for common entities: {class}`~lamindb.File`, {class}`~lamindb.Dataset`, {class}`~lamindb.Transform`, {class}`~lamindb.Feature`, {class}`~lamindb.ULabel`, etc. See the [API reference](reference) or the [source code](https://github.com/laminlabs/lnschema-core/blob/main/lnschema_core/models.py).

```{dropdown} What is the schema language?

Data models are defined in Python using the Django ORM. Django translates them to SQL.

[Django](https://github.com/django/django) is one of the most-used & highly-starred projects on GitHub (~1M dependents, ~73k stars) and has been robustly maintained for 15 years.

In the first year, LaminDB used SQLModel/SQLAlchemy -- we might bring back compatibility.

```
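
The general idea of defining a data model in Python and translating it to SQL can be illustrated without Django. The sketch below is a deliberately simplified stand-in using only the standard library: it is not Django, and the `Feature` dataclass, the `SQL_TYPES` mapping, and the `create_table_sql` helper are hypothetical and unrelated to LaminDB's actual models.

```python
import sqlite3
from dataclasses import dataclass, fields

# Hypothetical mapping from Python types to SQLite column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Feature:
    """A toy data model; each field becomes one table column."""

    id: int
    name: str
    type: str


def create_table_sql(model):
    """Build a CREATE TABLE statement from a dataclass definition."""
    columns = []
    for field in fields(model):
        column = f"{field.name} {SQL_TYPES[field.type]}"
        if field.name == "id":
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {model.__name__.lower()} ({', '.join(columns)})"


ddl = create_table_sql(Feature)
print(ddl)

# Run the generated statement against an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute(ddl)
con.execute("INSERT INTO feature VALUES (?, ?, ?)", (1, "CD8", "number"))
rows = con.execute("SELECT id, name, type FROM feature").fetchall()
print(rows)
con.close()
```

An ORM such as Django does much more than this (relations, migrations, query building), but the direction of translation is the same: the Python class is the source of truth and the SQL is derived from it.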

On top of the schema, LaminDB is a Python API that abstracts over storage & database access, data transformations, and (biological) ontologies.

The code for this is open-source & accessible through the dependencies & repositories listed below.
 
### Dependencies

- Data is stored in a platform-independent way: 
    - location → local, on AWS S3 or GCP Storage, accessed through `fsspec`
    - format → blob-like files or queryable formats like Parquet, zarr, HDF5, TileDB & DuckDB
- Metadata is stored in SQL: current backends are SQLite (small teams) and Postgres (any team size).
- Django ORM for schema management & metadata queries (until v0.41: SQLModel & SQLAlchemy).
- Biological knowledge sources & ontologies: see [Bionty](https://lamin.ai/docs/bionty).

For more details, see the [pyproject.toml](https://github.com/laminlabs/lamindb/blob/main/pyproject.toml) file in lamindb & the linked repositories below.

### Repositories

LaminDB and its plug-ins consist of open-source Python libraries & publicly hosted metadata assets:

- [lamindb](https://github.com/laminlabs/lamindb): Core API, which builds on the [core schema](https://github.com/laminlabs/lnschema-core).
- [lnschema-bionty](https://github.com/laminlabs/lnschema-bionty): Registries for basic biological entities, coupled to public ontologies.
- [lnschema-lamin1](https://github.com/laminlabs/lnschema-lamin1): Exemplary custom schema to manage samples, treatments, etc.
- [lamindb-setup](https://github.com/laminlabs/lamindb-setup): Set up & configure LaminDB, client for LaminHub.
- [bionty](https://github.com/laminlabs/bionty): Accessor for public biological ontologies.
- [nbproject](https://github.com/laminlabs/nbproject): Metadata parser for Jupyter notebooks.
- [lamin-utils](https://github.com/laminlabs/lamin-utils): Generic utilities, e.g., a logger.
- [readfcs](https://github.com/laminlabs/readfcs): FCS file reader.
<!-- [bionty-assets](https://github.com/laminlabs/bionty-assets): Hosted assets of parsed public biological ontologies. -->

LaminHub is not open-sourced, and neither are plug-ins that model lab operations.


### Assumptions & principles

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/BunYmHkyFLITlM5MYQci.svg" width="350px" style="background: transparent" align="right">

1. Data is generated by instruments that process physical samples: it comes in batches stored as immutable files.
2. Files are transformed into more useful data representations, e.g.:
   - Summary statistics like count matrices for fastq files
   - Array stores of non-array-like input data (e.g., images)
   - Higher-level embeddings for lower-level array, text or graph representations
   - Concatenated array stores for large-scale atlas-like datasets
3. Semantics of high-level embeddings ("inflammatory", "lipophile") are anchored in experimental metadata and knowledge (ontologies)
4. Experimental metadata is another ontology type
5. Experiments measure features ({class}`~lamindb.Feature`, {class}`~lnschema_bionty.CellMarker`, ...)
6. Samples are annotated by labels ({class}`~lamindb.ULabel`, {class}`~lnschema_bionty.CellLine`, ...)
7. Learning and data warehousing both iterate data transformations ({class}`~lamindb.Transform`)
8. Basic biological entities should have the same meaning to anyone and across any data platform
9. Schema migrations have to be easy

### Influences

LaminDB was influenced by many other projects; see {doc}`docs:influences`.

## Notebooks

- Find all guide notebooks [here](https://github.com/laminlabs/lamindb/tree/main/docs/guide).
- You can run these notebooks in hosted versions of JupyterLab, e.g., [Saturn Cloud](https://github.com/laminlabs/run-lamin-on-saturn), Google Vertex AI, Google Colab, and others.
- Jupyter Lab & Notebook offer a fully interactive experience; VS Code & others require using the CLI to track notebooks: `lamin track my-notebook.ipynb`

In [None]:
!lamin delete --force lamin-intro