# Manage biological registries 

If you only work with pre-defined ontologies (public or in-house), [Bionty](https://lamin.ai/docs/bionty/) is sufficient!

If you'd like to maintain in-house registries for basic entities along with ontologies, manage them using {mod}`lnschema_bionty`.

```{toctree}
:hidden:
:maxdepth: 1

../lnschema-bionty
```

Let us start with an instance that has {mod}`lnschema_bionty` mounted:

In [None]:
!lamin init --storage ./test-registries --schema bionty

In [None]:
import lamindb as ln
import lnschema_bionty as lb

ln.settings.verbosity = 3  # show hints

## Search or look up terms from a public source

Let us first grab a public ontology for cell types:

In [None]:
celltype_bt = lb.CellType.bionty()  # same as bionty.CellType()

celltype_bt

### Search a cell type

In [None]:
celltype_bt.search("gamma delta T cell").head(3)

### Look up a cell type with auto-completion

Create a lookup object:

In [None]:
celltype_bt_lookup = celltype_bt.lookup()

There are 2680 terms in it:

In [None]:
len(celltype_bt_lookup)

In [None]:
gd_tcell = celltype_bt_lookup.gamma_delta_t_cell
gd_tcell
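
To illustrate how a lookup object can expose free-text term names as auto-completable attributes, here is a minimal sketch. It is not Bionty's implementation: the helper names `to_attribute_name` and `build_lookup` are hypothetical, and the normalization rule (lowercase, non-word characters replaced by underscores) is an assumption that happens to reproduce the `gamma_delta_t_cell` attribute used above.

```python
import re
from types import SimpleNamespace


def to_attribute_name(term_name: str) -> str:
    """Turn a free-text term name into a lowercase Python identifier (assumed rule)."""
    key = re.sub(r"\W+", "_", term_name.lower()).strip("_")
    # Python identifiers cannot start with a digit, so prefix an underscore
    return f"_{key}" if key[:1].isdigit() else key


def build_lookup(term_names):
    """Hypothetical helper: expose each term name as an attribute."""
    return SimpleNamespace(**{to_attribute_name(n): n for n in term_names})


# Hand-written toy input, not the full public ontology
lookup = build_lookup(["gamma-delta T cell", "hepatocyte", "T cell"])
print(lookup.gamma_delta_t_cell)
```

In the real lookup object the attribute holds the full term entry rather than just its name; the sketch only shows the name-to-attribute mapping.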

## Create a record for an in-house registry

You can create a SQL record directly by passing the result of a Bionty search or lookup.

In [None]:
celltype_record = lb.CellType(gd_tcell)

celltype_record

Note that this term has `parents` in the ontology, so the parent records will also be added to the DB upon save:

In [None]:
gd_tcell.parents

Save the record to the DB to seed an in-house ontology:

In [None]:
celltype_record.save()

This cell type can now be queried from the DB:

In [None]:
gd_tcell_record = lb.CellType.select(name=celltype_record.name).one()
gd_tcell_record

Access its direct parents:

In [None]:
gd_tcell_record.parents.all()

View all parents as a graph (you may specify `distance=` to view only a subset of parents):

In [None]:
gd_tcell_record.view_parents()
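
The difference between direct parents and all parents up to a given distance can be sketched with a breadth-first walk over a toy hierarchy. This is an illustration only: the `PARENTS` mapping and the function `ancestors_by_distance` are hypothetical, the term names are placeholders rather than Cell Ontology data, and the assumption that `distance=` caps the number of parent hops is mine.

```python
from collections import deque

# Hypothetical child -> direct parents mapping (placeholder names, not real ontology data)
PARENTS = {
    "term_d": ["term_b", "term_c"],
    "term_b": ["term_a"],
    "term_c": ["term_a"],
    "term_a": [],
}


def ancestors_by_distance(term, parents, distance=None):
    """Map each ancestor of `term` to its shortest distance, optionally capped."""
    found = {}
    queue = deque([(term, 0)])
    while queue:
        node, depth = queue.popleft()
        # Stop expanding once the requested distance is reached
        if distance is not None and depth >= distance:
            continue
        for parent in parents.get(node, []):
            if parent not in found:
                found[parent] = depth + 1
                queue.append((parent, depth + 1))
    return found


all_ancestors = ancestors_by_distance("term_d", PARENTS)
direct_only = ancestors_by_distance("term_d", PARENTS, distance=1)
print(all_ancestors)
print(direct_only)
```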

Query for all parents at each distance up to a specified maximum:

In [None]:
distance = 5

results = []
for d in range(1, distance + 1):
    msg = f"Depth = {d}"
    if d == 1:
        condition = "children__name"
    else:
        condition = "children__" + condition
    records = lb.CellType.select(**{condition: gd_tcell_record.name}).list()
    msg += f", {len(records)} records found:"
    print(msg)
    print(records)
    results.append(records)
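
The loop above works by prepending `children__` to the lookup key once per level, so each iteration filters on a relationship one hop further away. The following standalone sketch prints the keys that the loop builds; the helper name `parent_lookup_keys` is hypothetical and no database is needed to run it.

```python
def parent_lookup_keys(max_distance: int, field: str = "name") -> list[str]:
    """Return one lookup key per distance, nearest parents first."""
    keys = []
    prefix = ""
    for _ in range(max_distance):
        # Each extra "children__" follows the parent/child relation one more hop
        prefix += "children__"
        keys.append(prefix + field)
    return keys


keys = parent_lookup_keys(3)
for depth, key in enumerate(keys, start=1):
    print(f"Depth = {depth}: {key}")
```

A record matched by `children__children__name` is one whose child has a child with the given name, that is, a parent at distance 2.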

## Bulk create records by parsing data

Consider a DataFrame-based example:

In [None]:
adata = ln.dev.datasets.anndata_with_obs()

In [None]:
adata.obs.head()

In [None]:
adata.obs.cell_type.value_counts()

You need to specify the field that corresponds to the values you are passing, for instance `CellType.name` or `CellType.ontology_id` in this case.

`ORM.from_values()` creates entries in the following steps:

1. If existing DB records match the input field values, return those records without creating new ones.
2. If input values match synonyms associated with existing DB records, return those records without creating new ones.
3. (`lnschema_bionty` only) For values without a DB record, create records from the Bionty entries that match on the corresponding Bionty field.
4. (`lnschema_bionty` only) Create records from the Bionty entries whose synonyms match.
5. If none of the above is possible, create new records with a single field containing the input values.
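
The resolution order above can be sketched as a toy function over two in-memory dictionaries. This is not how `from_values()` is implemented: the function `resolve`, the dictionaries `db` and `public`, and their hand-written contents are all hypothetical stand-ins for the DB registry and the public ontology.

```python
def find(registry, value):
    """Return (standardized name, matched_by_synonym) or None."""
    if value in registry:
        return value, False
    for name, synonyms in registry.items():
        if value in synonyms:
            return name, True
    return None


def resolve(values, db, public):
    """Map each non-empty value to (standardized name, step number 1-5)."""
    results = {}
    for value in values:
        if not value:
            continue  # empty values do not result in a record
        hit = find(db, value)
        if hit is not None:
            results[value] = (hit[0], 2 if hit[1] else 1)
            continue
        hit = find(public, value)
        if hit is not None:
            results[value] = (hit[0], 4 if hit[1] else 3)
            continue
        results[value] = (value, 5)  # new record holding only the input value
    return results


# Toy registries: name -> set of synonyms (hand-written for illustration)
db = {"gamma-delta T cell": set(), "T cell": {"T lymphocyte"}}
public = {"hepatocyte": set(), "hematopoietic stem cell": {"HSC"}}

resolved = resolve(
    ["gamma-delta T cell", "T lymphocyte", "hepatocyte", "HSC", "my new cell type", ""],
    db,
    public,
)
for value, (name, step) in resolved.items():
    print(f"{value!r} -> {name!r} (step {step})")
```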

In [None]:
# Save "T cell" first so that the DB holds a record whose synonyms can be matched
lb.CellType.from_bionty(name="T cell").save()

Let's try to create entries based on cell type names:

In [None]:
# Input has 4 unique values of cell type names
adata.obs.cell_type.unique().tolist()

In [None]:
cell_types = lb.CellType.from_values(adata.obs.cell_type, lb.CellType.name)

cell_types

What happens if the input contains synonyms?

In [None]:
lb.CellType.from_values(
    [
        "gamma-delta T cell",  # existing record with the same name
        "T lymphocyte",  # existing record with synonym
        "hepatocyte",  # Bionty record with the same name
        "HSC",  # Bionty record with synonym
        "my new cell type",  # exists neither in the DB nor in Bionty
    ],
    lb.CellType.name,
)

Similarly, we can create entries based on cell type ontology ids:

In [None]:
# Input has 3 unique values and 1 empty string (empty values don't result in a record)
adata.obs.cell_type_id.unique().tolist()

In [None]:
lb.CellType.from_values(adata.obs.cell_type_id, lb.CellType.ontology_id)

If we're happy with `cell_types`, we save them to the DB in one transaction:

In [None]:
ln.save(cell_types)

Our in-house registry grew a bit:

In [None]:
lb.CellType.select().df()

## Search or look up terms in the DB

In [None]:
lb.CellType.search("T cell").head(2)

In [None]:
celltype_db_lookup = lb.CellType.lookup()

In [None]:
hsc_record = celltype_db_lookup.hematopoietic_stem_cell

In [None]:
hsc_record

## Map or add synonyms to terms in the DB

Convert synonyms to standardized names:

In [None]:
lb.CellType.map_synonyms(["HSC", "blood forming stem cell"])

Add a new synonym to a record:

In [None]:
hsc_record.add_synonym("HSCs")

Now this new synonym can also be mapped:

In [None]:
lb.CellType.map_synonyms(["HSCs"])
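
The interplay of adding and mapping synonyms can be sketched without a database. This is a toy model under stated assumptions: `ToyRecord` and the free function `map_synonyms` are hypothetical, synonyms are assumed to be stored as one pipe-separated string per record, and unmapped values are assumed to be returned unchanged.

```python
class ToyRecord:
    """Hypothetical stand-in for a registry record."""

    def __init__(self, name, synonyms=""):
        self.name = name
        self.synonyms = synonyms  # assumed format: "syn1|syn2"

    def add_synonym(self, synonym):
        current = [s for s in self.synonyms.split("|") if s]
        if synonym not in current:
            current.append(synonym)
        self.synonyms = "|".join(current)


def map_synonyms(values, records):
    """Replace known synonyms by the record name; leave other values unchanged."""
    lookup = {}
    for record in records:
        for synonym in record.synonyms.split("|"):
            if synonym:
                lookup[synonym] = record.name
    return [lookup.get(value, value) for value in values]


hsc = ToyRecord("hematopoietic stem cell", "HSC|blood forming stem cell")
before = map_synonyms(["HSC", "HSCs"], [hsc])
hsc.add_synonym("HSCs")
after = map_synonyms(["HSC", "HSCs"], [hsc])
print(before)
print(after)
```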

The same workflow works for all of `lnschema_bionty`'s ORMs.

## Track underlying ontology sources

Under the hood, ontology sources are tracked:

In [None]:
lb.BiontySource.select(currently_used=True).df()

Each record is linked to a versioned Bionty source (if it was created from Bionty):

In [None]:
cell_type_record = lb.CellType.select(name="hepatocyte").one()
cell_type_record.bionty_source

In [None]:
!lamin delete test-registries
!rm -r test-registries