# Knowledge-coupled biological entities for lookups and typing

```{important}

LaminDB can be extended to speak biology via the [bionty schema](https://lamin.ai/docs/lnschema-bionty) module.

In addition, a [wetlab schema](https://lamin.ai/docs/lnschema-wetlab) module can be mounted to integrate the R&D experimental process.
```

Before ingesting any data, let us review one last concept: __knowledge-coupled entities__.

If ingested data isn't curated against public and in-house knowledge-derived standards, data integration becomes the pain it often is.

LaminDB (and the underlying [Bionty](https://lamin.ai/docs/bionty)) offers several ways to work with knowledge: lookups, curation, managing knowledge-coupled SQL tables, and writing code with the derived types.

Let us curate some data to illustrate the process.

In [None]:
import lamindb as ln
import lamindb.schema as lns
import lamindb.knowledge as lnk

ln.nb.header()

```{tip}

You can view the reference versions of the biological entities currently used in LaminDB:
```

In [None]:
ln.select(lns.bionty.dev.BiontyVersions).join(lns.bionty.dev.CurrentBiontyVersions).df()

## Look up ontology ids

```{note}

Module {class}`lamindb.knowledge` offers lookups via tab completion.

```

For instance, you can retrieve the cell type ontology id of "gamma delta T cell" like so:

In [None]:
ct_lookup = lnk.CellType().lookup

In [None]:
ct_lookup.gamma_delta_T_cell

```{seealso}

See the [Bionty](https://lamin.ai/docs/bionty) documentation for gene name aliasing and other types of lookups.

```
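
To make the idea of a tab-completable lookup concrete, here is a minimal sketch in plain Python. It is not how `lamindb.knowledge` or Bionty implement lookups; the helper names `to_attribute` and `build_lookup` and the three-entry table are assumptions made for illustration. The sketch turns each term name into a valid attribute name, so an interactive shell can complete it.

```python
import re
from types import SimpleNamespace


def to_attribute(name: str) -> str:
    """Turn a free-text term name into a valid Python attribute name."""
    attr = re.sub(r"\W+", "_", name).strip("_")
    # Attribute names cannot start with a digit, so prefix an underscore.
    return f"_{attr}" if attr[:1].isdigit() else attr


def build_lookup(terms: dict) -> SimpleNamespace:
    """Expose a name -> ontology id mapping as attributes (hypothetical helper)."""
    return SimpleNamespace(**{to_attribute(name): oid for name, oid in terms.items()})


# Illustrative excerpt only; the real tables ship with Bionty.
CELL_TYPES = {
    "gamma-delta T cell": "CL:0000798",
    "T cell": "CL:0000084",
    "B cell": "CL:0000236",
}

sketch_lookup = build_lookup(CELL_TYPES)
print(sketch_lookup.gamma_delta_T_cell)
```

Because the entries are ordinary attributes, typing `sketch_lookup.` followed by the tab key in IPython or Jupyter lists the available terms.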

## Create knowledge-derived records

You can also directly create a record for the `CellType` table:

In [None]:
lns.bionty.CellType(ontology_id=ct_lookup.gamma_delta_T_cell)

## Curate metadata by linking it against knowledge

Let us link all the biological samples in a cross-tissue scRNA-seq dataset:

In [None]:
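# load an example cross-tissue scRNA-seq dataset
adata = ln.dev.datasets.anndata_human_immune_cells()

# keep one row per unique combination of metadata values
meta = adata.obs.drop_duplicates(subset=adata.obs.columns)
meta.shape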

In [None]:
meta.head()

Let's start with the cell types: `.curate` checks whether the passed ids are present in the knowledge table.

It returns a new `DataFrame` indexed with the curated ids and a boolean `__curated__` column.

In [None]:
celltype_curate = lnk.CellType().curate(meta, column="cell_type_ontology_term_id")

Here, all terms can be linked. 🎉
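
The kind of check described for `.curate` can be sketched without any library. This is an assumption-laden illustration, not the Bionty implementation: `curate_rows` is a hypothetical helper, a list of dicts stands in for the `DataFrame`, the `donor` column is invented, and `CL:9999999` is a deliberately made-up id that is not in the reference set.

```python
def curate_rows(rows, column, known_ids):
    """Key each row by the id in `column` and flag whether that id is known.

    Hypothetical stand-in for a DataFrame indexed by the curated ids
    with a boolean `__curated__` column.
    """
    curated = []
    for row in rows:
        term_id = row[column]
        rest = {key: value for key, value in row.items() if key != column}
        curated.append((term_id, {**rest, "__curated__": term_id in known_ids}))
    return curated


# Illustrative reference set; a real one would come from the ontology.
KNOWN_CELL_TYPES = {"CL:0000798", "CL:0000084"}

sample_meta = [
    {"cell_type_ontology_term_id": "CL:0000798", "donor": "A"},
    {"cell_type_ontology_term_id": "CL:9999999", "donor": "B"},  # made-up id
]

result = curate_rows(sample_meta, "cell_type_ontology_term_id", KNOWN_CELL_TYPES)
unmapped = [term_id for term_id, row in result if not row["__curated__"]]
print(unmapped)
```

Any id that ends up in `unmapped` would need attention before linking, for example by fixing a typo or mapping an outdated term.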

## Update the content of the knowledge-managed tables in your SQL database

Assume there are cell types in this new dataset that are tracked in the ontology (`lnk.CellType`), but are not yet tracked in the DB table (`lns.bionty.CellType`).

Let us fix that!

We can go ahead and create records of the `CellType` table:

In [None]:
celltype_records = [
    lns.bionty.CellType(ontology_id=i) for i in celltype_curate.index.unique()
]

In [None]:
celltype_records[:3]
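
The gap between ids known to the ontology and ids already tracked in the database can be pictured with plain Python. This is only a sketch under stated assumptions: `CellTypeRecord` is a hypothetical dataclass standing in for `lns.bionty.CellType`, and `ids_in_db` is an invented example of what the table might already contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellTypeRecord:
    """Hypothetical stand-in for a row of the CellType table."""

    ontology_id: str


# Curated ids as they might appear in the metadata, with a repeat.
curated_ids = ["CL:0000798", "CL:0000084", "CL:0000798"]
# Assumed current content of the database table.
ids_in_db = {"CL:0000084"}

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_ids = list(dict.fromkeys(curated_ids))
missing_ids = [term_id for term_id in unique_ids if term_id not in ids_in_db]

new_records = [CellTypeRecord(ontology_id=term_id) for term_id in missing_ids]
print(new_records)
```

The notebook itself skips the set difference and simply creates a record for every unique curated id.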

We can do the same for tissues:

In [None]:
tissue_curate = lnk.Tissue().curate(meta, column="tissue_ontology_term_id")

In [None]:
tissue_records = [
    lns.bionty.Tissue(ontology_id=i) for i in tissue_curate.index.unique()
]

In [None]:
tissue_records[:3]

Finally, let's add them to the database:

In [None]:
ln.add(celltype_records + tissue_records)  # add all records in one transaction

Check that they are in the database:

In [None]:
ln.select(lns.bionty.Tissue, name="blood").one()
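
What "one transaction" means for the insert, and what the follow-up query does, can be illustrated with the standard-library `sqlite3` module. This is a sketch, not what `ln.add` or `ln.select` do internally: the table layout and column names are assumptions, and the rows are a small illustrative excerpt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified table layout for illustration.
conn.execute("CREATE TABLE cell_type (ontology_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE tissue (ontology_id TEXT PRIMARY KEY, name TEXT)")

cell_type_rows = [("CL:0000798",), ("CL:0000084",)]
tissue_rows = [("UBERON:0000178", "blood")]

# Using the connection as a context manager wraps both inserts in one
# transaction: commit if the block succeeds, roll back if it raises.
with conn:
    conn.executemany("INSERT INTO cell_type VALUES (?)", cell_type_rows)
    conn.executemany("INSERT INTO tissue VALUES (?, ?)", tissue_rows)

# Query by name, analogous to selecting the tissue named "blood".
blood = conn.execute(
    "SELECT ontology_id, name FROM tissue WHERE name = ?", ("blood",)
).fetchone()
n_cell_types = conn.execute("SELECT COUNT(*) FROM cell_type").fetchone()[0]
print(blood, n_cell_types)
```

Grouping the inserts this way means that a failure on any record leaves neither table partially filled.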

Now, some of the foundational knowledge is in place! 😌