# Manage biological registries 

Here, you'll learn how to collaborate on biological registries across dry & wetlab.

What's special is that LaminDB lets you leverage public ontologies for this through the plug-in {mod}`lnschema_bionty`.

```{note}

Registries can anchor dry & wetlab work to help find, access & model data. No more ambiguity about model dimensions and labels!

```

Let us start with an instance that has `lnschema_bionty` mounted:

In [None]:
!lamin init --storage ./test-registries --schema bionty

In [None]:
import lamindb as ln
import lnschema_bionty as lb

In [None]:
# pre-populate the cell type registry with a few records for this guide
lb.CellType(name="my cell type").save()
lb.CellType.from_bionty(name="T cell").save()

## Search or lookup terms from public source

Let's consider a public ontology for cell types:

```{tip}

The corresponding Bionty object can be accessed via `.bionty()`. For instance, `lb.CellType.bionty()` is equivalent to `bionty.CellType()`.

The Bionty object provides [full Bionty functionality](https://lamin.ai/docs/bionty/).
```

In [None]:
celltype_bt = lb.CellType.bionty()  # if you know bionty: same as bionty.CellType()

In [None]:
celltype_bt

We can use it to search cell types:

In [None]:
celltype_bt.search("gamma delta T cell").head(3)

And we can also use it to look up cell types with auto-complete:

In [None]:
celltype_bt_lookup = celltype_bt.lookup()
celltype_bt_lookup.gamma_delta_t_cell
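
To illustrate why a term appears as `gamma_delta_t_cell` in the lookup, here is a minimal sketch of how names could be turned into attribute-friendly keys. The helpers `to_attribute_key` and `make_lookup` are hypothetical and written only for this illustration; the normalization rule (lowercase, non-alphanumeric runs replaced by underscores) is an assumption inferred from the attribute names used in this guide, not Bionty's actual implementation.

```python
import re
from types import SimpleNamespace


def to_attribute_key(name):
    """Lowercase a name and replace runs of non-alphanumeric characters with '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


def make_lookup(names):
    """Build an object whose attributes point back to the original names."""
    return SimpleNamespace(**{to_attribute_key(name): name for name in names})


lookup = make_lookup(["gamma-delta T cell", "hematopoietic stem cell", "HSC"])
print(lookup.gamma_delta_t_cell)
print(lookup.hsc)
```

Because the keys are plain attributes, an interactive shell can offer them via tab completion.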

## Create a record for an in-house registry

You can create a registry record directly by passing the result of a Bionty lookup:

In [None]:
lb.CellType(celltype_bt_lookup.gamma_delta_t_cell)

Or create it explicitly from the public Bionty source:

In [None]:
celltype_record = lb.CellType.from_bionty(
    ontology_id=celltype_bt_lookup.gamma_delta_t_cell.ontology_id
)

In [None]:
celltype_record

Creating a record from a synonym raises a validation warning:

In [None]:
lb.CellType.from_bionty(name="T lymphocyte")

Standardize synonyms with {meth}`~lamindb.dev.SynonymsAware.map_synonyms` before creating the record:

In [None]:
standardized_name = lb.CellType.map_synonyms("T lymphocyte", field="name")[0]
standardized_name

In [None]:
lb.CellType.from_bionty(name=standardized_name)

When we save this record to the registry, logging informs us that we're also saving parent ontology terms.


```{dropdown} Will I always see 100 parents being saved?

No, this only happens a single time.

- If we accidentally save the same record again, lamindb will recognize that the record and all parents are already in the registry.
- If we save another record that has overlapping parents, only new parents will be saved.

```

In [None]:
celltype_record.save()
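
The behavior described in the dropdown above can be pictured with a small toy model. This is a sketch under stated assumptions, not lamindb's implementation: the registry is a plain `set` of names, `parents_of` is a made-up and heavily simplified hierarchy, and `save_with_parents` is a hypothetical helper.

```python
def save_with_parents(term, parents_of, registry):
    """Save `term` and every ancestor not yet in `registry`; return the newly saved names."""
    newly_saved = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in registry:
            continue  # already saved earlier, together with its ancestors
        registry.add(current)
        newly_saved.append(current)
        stack.extend(parents_of.get(current, []))
    return newly_saved


# Made-up, simplified hierarchy: term -> direct parents
parents_of = {
    "gamma-delta T cell": ["T cell"],
    "alpha-beta T cell": ["T cell"],
    "T cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
}

registry = set()
first = save_with_parents("gamma-delta T cell", parents_of, registry)
again = save_with_parents("gamma-delta T cell", parents_of, registry)
other = save_with_parents("alpha-beta T cell", parents_of, registry)
print(first, again, other)
```

In this model, the first save stores the term plus its three ancestors, the repeated save stores nothing, and saving a sibling term only adds the sibling itself because the shared parents are already present.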

View the ontological hierarchy:

In [None]:
celltype_record.view_parents()

Or access the parents directly:

In [None]:
celltype_record.parents.df()

You can construct hierarchies of terms by specifying parents:

In [None]:
my_celltype = lb.CellType.filter(name="my cell type").one()
my_celltype.parents.add(celltype_record)

In [None]:
celltype_record.view_parents(distance=2, with_children=True)

This cell type and all its parents can now be queried & searched in the registry using `lb.CellType.filter` and `lb.CellType.search`.

Further down the guide, we'll see how this will help us to annotate and validate files & datasets!
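
To picture what a distance-limited view of a hierarchy involves, here is a minimal sketch that walks up a parent mapping for a fixed number of steps. The helper `ancestors_within` and the toy hierarchy are hypothetical, and reading `distance` as "number of parent steps" is an assumption of this sketch; it is not taken from the `view_parents` implementation.

```python
def ancestors_within(term, parents_of, distance):
    """Breadth-first walk up the hierarchy; map each ancestor to its step count."""
    seen = {}
    frontier = [term]
    for step in range(1, distance + 1):
        next_frontier = []
        for current in frontier:
            for parent in parents_of.get(current, []):
                if parent not in seen:
                    seen[parent] = step
                    next_frontier.append(parent)
        frontier = next_frontier
    return seen


# Toy hierarchy mirroring the example above: term -> direct parents
parents_of = {
    "my cell type": ["gamma-delta T cell"],
    "gamma-delta T cell": ["T cell"],
    "T cell": ["lymphocyte"],
}

near = ancestors_within("my cell type", parents_of, distance=2)
print(near)
```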

## Bulk create records by parsing data

Consider a DataFrame-based example:

In [None]:
adata = ln.dev.datasets.anndata_with_obs()

In [None]:
adata.obs.head()

In [None]:
adata.obs.cell_type.value_counts()

You need to specify the field that corresponds to the values you are passing, for instance "CellType.name" or "CellType.ontology_id" in this case.

`Registry.from_values()` creates entries in the following steps:

1. If DB records already match the input field values, return those records without creating new ones.
2. (`lnschema_bionty` only) For values without a matching DB record, create records from the Bionty entries that match the corresponding Bionty field.
3. If neither of the above is possible, create new records with a single field containing the input values (and log a warning).
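
The three steps above can be sketched in plain Python. This is a toy model for illustration only, not how `from_values` is implemented: `resolve_values` is a hypothetical helper, the registry is a dictionary keyed by name, and the public ontology is a set of names.

```python
def resolve_values(values, registry, public_names):
    """Return (records, origins) for each unique, non-empty input value.

    registry: dict mapping name -> record (records already in the DB)
    public_names: set of names known to the public ontology
    """
    records, origins = [], {}
    for value in dict.fromkeys(values):  # de-duplicate while keeping input order
        if not value:  # empty values don't result in a record
            continue
        if value in registry:  # step 1: reuse the existing DB record
            records.append(registry[value])
            origins[value] = "db"
        elif value in public_names:  # step 2: create from the public ontology
            records.append({"name": value, "source": "public"})
            origins[value] = "public"
        else:  # step 3: create a record that only holds the input value
            print(f"warning: {value!r} could not be validated, creating a new record")
            records.append({"name": value, "source": None})
            origins[value] = "new"
    return records, origins


registry = {"T cell": {"name": "T cell", "source": "public"}}
public_names = {"T cell", "hepatocyte"}
records, origins = resolve_values(
    ["T cell", "hepatocyte", "my new cell type", "", "T cell"],
    registry,
    public_names,
)
print(origins)
```

Nothing is written to the registry in this sketch, which mirrors the guide's flow of inspecting the returned records first and saving them in a separate step.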

In [None]:
# Input has 4 unique values of cell type names
adata.obs.cell_type.unique().tolist()

In [None]:
cell_types = lb.CellType.from_values(adata.obs.cell_type, lb.CellType.name)

cell_types

What if the input contains synonyms?

In [None]:
celltype_names = [
    "gamma-delta T cell",  # existing record with the same name
    "T lymphocyte",  # existing record with synonym
    "hepatocyte",  # Bionty record with the same name
    "HSC",  # Bionty record with synonym
    "my new cell type",  # exists neither in the DB nor in Bionty
]

Map synonyms:

In [None]:
celltype_names_standardized = lb.CellType.map_synonyms(celltype_names)

In [None]:
celltype_names_standardized

As the warning message suggests, add 'hematopoietic stem cell' to the in-house registry:

In [None]:
ln.save(lb.CellType.from_values(["hematopoietic stem cell"], lb.CellType.name))

Now we can create records with standardized cell type names:

In [None]:
lb.CellType.from_values(celltype_names_standardized, lb.CellType.name)

Similarly, we can create entries based on cell type ontology ids, which eliminates the ambiguity of synonyms:

In [None]:
# Input has 3 unique values and 1 empty string (empty values don't result in a record)
adata.obs.cell_type_id.unique().tolist()

In [None]:
lb.CellType.from_values(adata.obs.cell_type_id, lb.CellType.ontology_id)

If we're happy with `cell_types`, we save them to the DB in one transaction:

In [None]:
ln.save(cell_types)

Our in-house registry grew a bit:

In [None]:
lb.CellType.filter().df()

## Search or lookup terms in the DB

In [None]:
lb.CellType.search("T cell").head(2)

In [None]:
celltype_db_lookup = lb.CellType.lookup()

In [None]:
hsc_record = celltype_db_lookup.hematopoietic_stem_cell

In [None]:
hsc_record

## Map or add synonyms to terms in the DB

Convert synonyms to standardized names:

In [None]:
lb.CellType.map_synonyms(["HSC", "blood forming stem cell"])

Add a new synonym to a record:

In [None]:
hsc_record.add_synonym("HSCs")

Now this new synonym can also be mapped:

In [None]:
lb.CellType.map_synonyms(["HSCs"])
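
As a rough mental model of mapping and adding synonyms, consider the following sketch. `SynonymIndex` is a hypothetical class written for this illustration and says nothing about how lamindb stores synonyms; returning unmapped values unchanged is an assumption of the sketch.

```python
class SynonymIndex:
    """Toy index from synonyms to standardized names."""

    def __init__(self):
        self._synonyms = {}  # standardized name -> set of synonyms

    def add_name(self, name, synonyms=()):
        self._synonyms.setdefault(name, set()).update(synonyms)

    def add_synonym(self, name, synonym):
        self._synonyms[name].add(synonym)

    def map_synonyms(self, values):
        # Invert the index, then map each value; unknown values pass through unchanged
        lookup = {syn: name for name, syns in self._synonyms.items() for syn in syns}
        return [lookup.get(value, value) for value in values]


index = SynonymIndex()
index.add_name("hematopoietic stem cell", {"HSC", "blood forming stem cell"})

before = index.map_synonyms(["HSC", "blood forming stem cell", "HSCs"])
index.add_synonym("hematopoietic stem cell", "HSCs")
after = index.map_synonyms(["HSCs"])
print(before, after)
```

Before the synonym is added, `"HSCs"` passes through unmapped; afterwards it resolves to the standardized name.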

A special synonym is "abbr" (abbreviation), which has its own field and can be assigned via:

In [None]:
hsc_record.set_abbr("HSC")

Similarly, you can create a lookup object from the `abbr` field:

In [None]:
celltype_db_lookup = lb.CellType.lookup("abbr")
hsc_record = celltype_db_lookup.hsc
hsc_record

The same workflow works for all of `lnschema_bionty`'s ORMs.

## Multi-species registries

Multi-species ORMs are species-aware, for instance, `Gene`:

In [None]:
lb.Gene.from_bionty(
    symbol="TCF7", species="human"
)  # error is raised without passing species

You can also omit the `species` argument, if you configure it globally:

In [None]:
lb.settings.species = "mouse"

In [None]:
lb.Gene.from_bionty(symbol="Ap5b1")
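
The interplay between an explicit `species` argument and the global setting can be sketched as a simple fallback. The `Settings` class and `resolve_species` helper below are hypothetical stand-ins for illustration, not the `lnschema_bionty` implementation.

```python
class Settings:
    """Toy global settings object with an optional default species."""

    species = None


settings = Settings()


def resolve_species(species=None):
    """Prefer the explicit argument, fall back to the global default, else fail."""
    chosen = species if species is not None else settings.species
    if chosen is None:
        raise ValueError("pass `species` or configure a global default")
    return chosen


try:
    resolve_species()  # no argument and no global default configured yet
except ValueError as error:
    print(f"error: {error}")

settings.species = "mouse"
print(resolve_species())  # falls back to the global default
print(resolve_species("human"))  # an explicit argument takes precedence
```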

## Track underlying ontology sources

Under the hood, ontology sources are tracked:

In [None]:
lb.BiontySource.filter(currently_used=True).df()

Each record is linked to a versioned bionty source (if it was created from bionty):

In [None]:
cell_type_record = lb.CellType.filter(name="hepatocyte").one()
cell_type_record.bionty_source

In [None]:
!lamin delete --force test-registries
!rm -r test-registries