# Cell type

lamindb provides access to the following public cell type ontologies:

1. [Cell Ontology](https://obophenotype.github.io/cell-ontology)

## Setup

In [None]:
!lamin init --storage ./test-cell-type --schema bionty

In [None]:
import lnschema_bionty as lb
import pandas as pd

## Bionty objects

Let us create a public knowledge accessor with {meth}`~docs:lnschema_bionty.dev.BioRegistry.public`, which chooses a default public knowledge source from {class}`~docs:lnschema_bionty.PublicSource`. It's a [Bionty](https://lamin.ai/docs/bionty/bionty.bionty) object, which you can think of as a less-capable registry:

In [None]:
public = lb.CellType.public()
public

As with registries, you can export the ontology as a `DataFrame`:

In [None]:
df = public.df()
df.head()

Unlike registries, you can also export it as a Pronto object via `public.ontology`.

## Look up terms

As with registries, terms can be looked up with auto-complete:

In [None]:
lookup = public.lookup()

The `.` accessor provides normalized terms (lower case, containing only alphanumeric characters and underscores):

In [None]:
lookup.cd8_positive_alpha_beta_t_cell
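The normalization rule described above can be approximated in a few lines. This is a minimal sketch, not lamindb's actual implementation: the helper name `normalize_key` is hypothetical, and the library's exact rules (for example, for names that start with a digit) may differ.

```python
import re


def normalize_key(name: str) -> str:
    """Hypothetical helper: turn a term name into an attribute-friendly key."""
    # Lower-case the name, collapse every run of non-alphanumeric characters
    # into a single underscore, and trim underscores from both ends.
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


key = normalize_key("CD8-positive, alpha-beta T cell")
print(key)
```

Under these assumptions, the original name maps to the same key used with the `.` accessor in the cell above.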

To look up the exact original strings, convert the lookup object to a dict and use the `[]` accessor:

In [None]:
lookup_dict = lookup.dict()
lookup_dict["CD8-positive, alpha-beta T cell"]

## Search terms

Search behaves in the same way as it does for registries:

In [None]:
public.search("CD8 positive T cell").head(5)

To search a different field (the default is `.name`), pass it as `field`:

In [None]:
public.search("CD8 positive alpha beta T cell", field=public.definition).head(5)
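Conceptually, a search ranks the values of the chosen field by how closely they match the query. The sketch below illustrates that idea with the standard library's `difflib.SequenceMatcher`. This is a stand-in technique, not the scoring that lamindb uses, and both `toy_search` and the three-name list are hypothetical.

```python
from difflib import SequenceMatcher


def toy_search(query: str, names: list[str], limit: int = 5) -> list[tuple[str, float]]:
    """Hypothetical helper: rank names by string similarity to the query."""
    scored = [
        (name, SequenceMatcher(None, query.lower(), name.lower()).ratio())
        for name in names
    ]
    # Highest similarity first; keep only the top `limit` hits.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


# Toy stand-in for the values of the searched field.
names = ["B cell", "neuron", "CD8-positive, alpha-beta T cell"]
results = toy_search("CD8 positive T cell", names)
print(results[0][0])
```

Choosing a different field amounts to ranking a different column of values with the same scoring function.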

## Standardize cell type identifiers

Let us generate a `DataFrame` that stores a number of cell type identifiers, some of which are corrupted:

In [None]:
df_orig = pd.DataFrame(
    index=[
        "Boettcher cell",
        "bone marrow cell",
        "interstitial cell of ovary",
        "pancreatic ductal cell",
        "This cell type does not exist",
    ]
)
df_orig

We can check which of our values validate against the reference ontology, and list those that do not:

In [None]:
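# Boolean array: True where a value matches a term name in the ontology
validated = public.validate(df_orig.index, public.name)
# Show the values that failed validation
df_orig.index[~validated]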
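The validation step above boils down to an exact-match membership test that yields one boolean per identifier. The following is a minimal sketch in plain Python, assuming a toy reference set in place of the ontology's `name` field. The set's contents are chosen for illustration only, and the real `validate` method may apply additional logic.

```python
# Toy reference set standing in for the ontology's term names (assumption).
reference_names = {
    "Boettcher cell",
    "bone marrow cell",
}

identifiers = [
    "Boettcher cell",
    "bone marrow cell",
    "This cell type does not exist",
]

# One boolean per identifier: True if it exactly matches a reference name.
mask = [name in reference_names for name in identifiers]

# Keep only the identifiers that failed validation.
not_validated = [name for name, ok in zip(identifiers, mask) if not ok]
print(not_validated)
```

Inverting the mask, as `~validated` does in the cell above, selects the entries that need manual correction.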

## Ontology source versions

For any given entity, we can choose from a number of versions:

In [None]:
lb.PublicSource.filter(entity="CellType").df()

When instantiating a Bionty object, we can choose a source or version:

In [None]:
public_source = lb.PublicSource.filter(
    source="cl", version="2023-04-20", organism="all"
).one()
public = lb.CellType.public(public_source=public_source)
public
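The `filter(...).one()` pattern above narrows the source records by field values and then insists on a single match. A rough sketch of that behavior on plain dictionaries follows. The helper `one`, the record layout, and the second version string are all hypothetical placeholders, not lamindb internals or real release identifiers.

```python
def one(records: list[dict]) -> dict:
    """Hypothetical helper: return the single record, or raise if ambiguous."""
    if len(records) != 1:
        raise ValueError(f"expected exactly one record, found {len(records)}")
    return records[0]


# Toy source records; "placeholder-version" is a made-up value.
sources = [
    {"source": "cl", "version": "2023-04-20", "organism": "all"},
    {"source": "cl", "version": "placeholder-version", "organism": "all"},
]

# Filter on all three fields so that exactly one record remains.
matches = [
    s
    for s in sources
    if s["source"] == "cl" and s["version"] == "2023-04-20" and s["organism"] == "all"
]
selected = one(matches)
print(selected["version"])
```

Requiring exactly one match makes an under-specified filter fail loudly instead of silently picking an arbitrary version.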

The currently used ontologies can be displayed using:

In [None]:
lb.PublicSource.filter(currently_used=True).df()

In [None]:
!lamin delete --force test-cell-type
!rm -r test-cell-type