```{include} ../README.md
:start-line: 0
:end-line: 3
```

# Guide

```{include} ../README.md
:start-line: 6
:end-line: -4
```

LaminApp is a UI for collaborating on data & analyses: if LaminDB is like git, LaminApp is like GitHub.

```{dropdown} What does LaminApp look like?

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/mgiwiUGN6HXZww7NOv9H.png" width="700px">

Once logged in, you can explore a public demo LaminDB instance at [lamin.ai/laminlabs/lamindata](https://lamin.ai/laminlabs/lamindata).

```

```{dropdown} What is open-source & what isn't?

LaminDB and its dependencies are open-source and Apache 2.0 licensed.

LaminApp is closed-source and free for public data.

Enterprise features for LaminApp, support, integration tests, and schemas hosted in your infrastructure or ours are available on a paid plan: please [reach out](https://lamin.ai/contact)!

```

## Quickstart

Run `pip install 'lamindb[jupyter,bionty]'` and `lamin signup <email>` in the terminal (more [info](https://lamin.ai/docs/setup)).

Initialize a LaminDB instance with the {mod}`lnschema_bionty` plug-in, much like you'd initialize a git repository:

In [None]:
!lamin init --schema bionty --storage ./mydata  # or s3://my-bucket, gs://my-bucket as default storage

Register a dataset while tracking data flow:

In [None]:
import lamindb as ln
import pandas as pd

# track data flow of current notebook
ln.track()

# access a new batch of data
df = pd.DataFrame(
    {"CD8": [1, 2, 3], "CD45": [3, 4, 5], "perturbation": ["DMSO", "IFNG", "DMSO"]}
)

# create a dataset
dataset = ln.Dataset(df, name="Immune phenotyping 1")
dataset.save()  # register dataset

Access the dataset:

In [None]:
# search for a dataset
ln.Dataset.search("immune")

# query a dataset
dataset = ln.Dataset.filter(name__contains="phenotyping 1").one()

# view generating data flow
dataset.view_flow()

# describe metadata
dataset.describe()

# load the dataset
df = dataset.load()

Validate a `DataFrame` schema-less:

In [None]:
# define validation criteria
names_types = [("CD8", "number"), ("CD45", "number"), ("perturbation", "category")]

# create features to save validation criteria
features = [ln.Feature(name=name, type=feature_type) for name, feature_type in names_types]
ln.save(features)  # save features

# create dataset & validate features
dataset = ln.Dataset.from_df(df, name="Immune phenotyping 1")
dataset.save()  # register dataset & link validated features

# confirm that validated features are linked
dataset.features
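
To make the idea of validating columns against registered features concrete, here is a minimal, library-free sketch. It is not how LaminDB implements validation: `FEATURE_REGISTRY`, `infer_type` and `validate_columns` are hypothetical names, and the type check is a deliberately crude assumption.

```python
# Hypothetical registry of validated features: name -> type label.
# It mirrors the (name, type) pairs above; it is not LaminDB's data model.
FEATURE_REGISTRY = {"CD8": "number", "CD45": "number", "perturbation": "category"}


def infer_type(values):
    """Return "number" if every value is an int or float, else "category"."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "number"
    return "category"


def validate_columns(columns, registry):
    """Split columns into validated names and a dict of problems."""
    validated, problems = [], {}
    for name, values in columns.items():
        if name not in registry:
            problems[name] = "not registered"
        elif infer_type(values) != registry[name]:
            problems[name] = f"expected {registry[name]}, got {infer_type(values)}"
        else:
            validated.append(name)
    return validated, problems


columns = {
    "CD8": [1, 2, 3],
    "CD45": [3, 4, 5],
    "perturbation": ["DMSO", "IFNG", "DMSO"],
    "batch": ["a", "b", "c"],  # not in the registry, so it fails validation
}
validated, problems = validate_columns(columns, FEATURE_REGISTRY)
print(validated)
print(problems)
```

Only columns whose names and inferred types match a registered feature count as validated; everything else is reported rather than silently linked.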

Use the {mod}`lnschema_bionty` plug-in to type biological entities and validate schema-full:

In [None]:
import lnschema_bionty as lb

# set a default species for ontology lookups
lb.settings.species = "human"

# create cell marker records from the public ontology
cell_markers = [lb.CellMarker.from_bionty(name=name) for name in ["CD8", "CD45"]]
ln.save(cell_markers)

# create dataset & validate features
dataset = ln.Dataset.from_df(
    df.iloc[:, :2], name="Immune phenotyping 2", field=lb.CellMarker.name
)
dataset.save()  # register dataset & link validated features

dataset.features

Query for a panel of cell markers & the linked datasets:

In [None]:
# an object to auto-complete cell markers
cell_markers = lb.CellMarker.lookup()

# all cell marker panels containing CD45
panels_with_cd45 = ln.FeatureSet.filter(cell_markers=cell_markers.cd45).all()

# all datasets measuring CD45
ln.Dataset.filter(feature_sets__in=panels_with_cd45).df()
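
The two filters above follow links from cell markers to feature sets to datasets. As a rough illustration of that relational idea, the sketch below runs a comparable join in an in-memory SQLite database. The table and column names are assumptions made up for this example and do not reflect LaminDB's actual schema.

```python
import sqlite3

# In-memory database with hypothetical tables and link tables.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE cell_marker (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE feature_set (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE feature_set_cell_marker (feature_set_id INTEGER, cell_marker_id INTEGER);
    CREATE TABLE dataset_feature_set (dataset_id INTEGER, feature_set_id INTEGER);

    INSERT INTO cell_marker VALUES (1, 'CD8'), (2, 'CD45'), (3, 'CD3');
    INSERT INTO feature_set VALUES (1, 'panel A'), (2, 'panel B');
    INSERT INTO dataset VALUES (1, 'Immune phenotyping 2'), (2, 'Other dataset');
    INSERT INTO feature_set_cell_marker VALUES (1, 1), (1, 2), (2, 3);
    INSERT INTO dataset_feature_set VALUES (1, 1), (2, 2);
    """
)

# All datasets linked to a feature set that contains the given cell marker.
rows = con.execute(
    """
    SELECT DISTINCT d.name
    FROM dataset AS d
    JOIN dataset_feature_set AS dfs ON dfs.dataset_id = d.id
    JOIN feature_set_cell_marker AS fscm ON fscm.feature_set_id = dfs.feature_set_id
    JOIN cell_marker AS cm ON cm.id = fscm.cell_marker_id
    WHERE cm.name = ?
    ORDER BY d.name
    """,
    ("CD45",),
).fetchall()
datasets_with_cd45 = [name for (name,) in rows]
print(datasets_with_cd45)
```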

Use the Experimental Factor Ontology to link a validated label for the readout:

In [None]:
# search the public ontology from the bionty store
lb.ExperimentalFactor.bionty().search("facs").head()

# create a record for facs
facs = lb.ExperimentalFactor.from_bionty(ontology_id="EFO:0009108")
facs.save()

# label with an in-house assay
my_assay = lb.ExperimentalFactor(name="My immune phenotyping assay")
my_assay.save()

dataset.experimental_factors.set([facs, my_assay])

dataset.describe()

## Overview

### Track data flow

Understand the data flow that generated a given file or dataset (from [here](docs:project-flow)):

```python
file.view_flow()
```

<img src="https://raw.githubusercontent.com/laminlabs/lamindb/main/docs/img/readme/view_lineage.svg" width="800">
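
Conceptually, a data flow view walks backwards from a file through the transforms (notebooks, pipelines) that produced it. The sketch below shows that traversal on a toy lineage mapping. It is not how `view_flow()` is implemented, and `PRODUCED_BY`, `trace_flow` and all file and transform names are hypothetical.

```python
# Hypothetical lineage records: each output maps to the transform that made it
# and that transform's inputs. All names are made up for illustration.
PRODUCED_BY = {
    "clusters.parquet": ("cluster-notebook", ["normalized.parquet"]),
    "normalized.parquet": ("normalize-pipeline", ["raw_counts.h5ad"]),
    "raw_counts.h5ad": ("ingest-notebook", []),
}


def trace_flow(target, produced_by):
    """Return (transform, output) steps in execution order leading to target."""
    steps, seen = [], set()

    def visit(name):
        if name in seen or name not in produced_by:
            return
        seen.add(name)
        transform, inputs = produced_by[name]
        for parent in inputs:
            visit(parent)  # record upstream steps first
        steps.append((transform, name))

    visit(target)
    return steps


for transform, output in trace_flow("clusters.parquet", PRODUCED_BY):
    print(f"{transform} -> {output}")
```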

### Manage biological registries

Create a cell type registry from public knowledge and add a new cell state (from [here](bio-registries)):

In [None]:
import lnschema_bionty as lb

# create an ontology-coupled cell type record and save it
lb.CellType.from_bionty(name="neuron").save()

# create a record to track a new cell state
new_cell_state = lb.CellType(name="my neuron cell state", description="explains X")
new_cell_state.save()

# express that it's a neuron state
cell_types = lb.CellType.lookup()
new_cell_state.parents.add(cell_types.neuron)

In [None]:
# view ontological hierarchy
new_cell_state.view_parents(distance=2)
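
To illustrate what viewing parents up to a given distance means, the following sketch collects ancestors in a small parent mapping with a breadth-first walk. The mapping, the term names beyond those used above, and the helper `ancestors_within` are assumptions for illustration only, not the registry's real hierarchy or API.

```python
# Hypothetical child -> parents mapping; terms are illustrative only.
PARENTS = {
    "my neuron cell state": ["neuron"],
    "neuron": ["neural cell"],
    "neural cell": ["somatic cell"],
    "somatic cell": [],
}


def ancestors_within(term, parents, distance):
    """Return {ancestor: hops} for ancestors at most `distance` hops away."""
    found = {}
    frontier = [term]
    for hops in range(1, distance + 1):
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, []):
                if parent not in found:
                    found[parent] = hops
                    next_frontier.append(parent)
        frontier = next_frontier
    return found


print(ancestors_within("my neuron cell state", PARENTS, distance=2))
```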

### Access, validate & register

See graphic & quickstart above!

Browse, e.g., {doc}`docs:scrna`.

### Leverage a mesh of instances

LaminDB is a distributed system like git.

For instance, collaborators can load your instance using:

```shell
$ lamin load myhandle/myinstance
```

### Manage custom schemas

1. Create a GitHub repository with registries similar to [github.com/laminlabs/lnschema-lamin1](https://github.com/laminlabs/lnschema-lamin1)
2. Create & deploy migrations via `lamin migrate create` and `lamin migrate deploy`

It's fastest if we do this for you based on our templates within an enterprise plan.

## How does it work?

### Dependencies

LaminDB builds semantics of R&D and biology onto well-established tools:

- SQLite & Postgres for SQL databases using the Django ORM (previously: SQLModel)
- S3, GCP & local storage for object storage using fsspec
- Configurable storage formats: pyarrow, anndata, zarr, etc.
- Biological knowledge sources & ontologies: see [Bionty](https://lamin.ai/docs/bionty)

LaminDB is open source and stores data in universal formats (SQL, parquet, HDF5, TileDB, zarr, etc.) on AWS, GCP or locally.
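
As a small illustration of how a storage root such as `./mydata`, `s3://my-bucket` or `gs://my-bucket` can be told apart, the sketch below classifies roots by URL scheme using only the standard library. The `BACKENDS` mapping and `storage_backend` helper are hypothetical and say nothing about how LaminDB or fsspec resolve storage internally.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to a human-readable backend label.
BACKENDS = {"s3": "AWS S3", "gs": "Google Cloud Storage", "": "local filesystem"}


def storage_backend(root):
    """Classify a storage root by its URL scheme (an empty scheme means local)."""
    scheme = urlparse(root).scheme
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return BACKENDS[scheme]


for root in ["./mydata", "s3://my-bucket", "gs://my-bucket"]:
    print(root, "->", storage_backend(root))
```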

### Architecture

LaminDB consists of the `lamindb` Python package (repository [here](https://github.com/laminlabs/lamindb)) with its components:

- [bionty](https://github.com/laminlabs/bionty): Basic biological entities (usable standalone).
- [lamindb-setup](https://github.com/laminlabs/lamindb-setup): Setup & configure LaminDB, client for LaminHub.
- [lnschema-core](https://github.com/laminlabs/lnschema-core): Core schema, ORMs to model data objects & data flow.
- [lnschema-bionty](https://github.com/laminlabs/lnschema-bionty): Bionty schema, ORMs that are coupled to Bionty's entities.
- [lnschema-lamin1](https://github.com/laminlabs/lnschema-lamin1): Exemplary configured schema to track samples, treatments, etc.
- [nbproject](https://github.com/laminlabs/nbproject): Parse metadata from Jupyter notebooks.
- [lamin-utils](https://github.com/laminlabs/lamin-utils): Utilities for LaminDB and Bionty.
- [readfcs](https://github.com/laminlabs/readfcs): FCS file reader.

LaminHub & LaminApp are not open-sourced, and neither are templates that model lab operations.

## Notebooks

- Find all guide notebooks [here](https://github.com/laminlabs/lamindb/tree/main/docs/guide).
- You can run these notebooks in hosted versions of JupyterLab, e.g., [Saturn Cloud](https://github.com/laminlabs/run-lamin-on-saturn), Google Vertex AI, Google Colab, and others.
- JupyterLab & Jupyter Notebook offer a fully interactive experience; VS Code & other editors require using the CLI to track notebooks: `lamin track my-notebook.ipynb`

```{toctree}
:hidden:
:caption: Tutorial

tutorial
tutorial1
```

```{toctree}
:hidden:
:caption: "How to"

query-search
validate
bio-registries
schemas
setup
```

```{toctree}
:hidden:
:caption: Other topics

faq
storage
```

In [None]:
!lamin delete --force mydata