# Cell marker ontologies

lamindb provides access to the following public cell marker ontologies:

1. [CellMarker](http://xteam.xbio.top/CellMarker)

## Setup

In [None]:
!lamin load test-ontologies

In [None]:
import lnschema_bionty as lb
import pandas as pd

# adds an entry "human" into an empty instance
lb.settings.organism = "human"

## Bionty objects

Let us create a public knowledge accessor with {meth}`~lnschema_bionty.dev.BioRegistry.bionty`, which chooses a default public knowledge source from {class}`~lnschema_bionty.BiontySource`. It's a [Bionty](https://lamin.ai/docs/bionty/bionty.bionty) object, which you can think of as a less-capable registry:

In [None]:
cell_marker_bt = lb.CellMarker.bionty()
cell_marker_bt

As for registries, you can export the ontology as a `DataFrame`:

In [None]:
df = cell_marker_bt.df()
df.head()

Unlike registries, you can also export it as a Pronto object via `cell_marker_bt.ontology`.

## Look up terms

As for registries, terms can be looked up with auto-complete:

In [None]:
lookup = cell_marker_bt.lookup()

The `.` accessor exposes normalized terms (lower case, containing only alphanumeric characters and underscores):

In [None]:
lookup.immp1l
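To make the normalization rule concrete, here is a minimal sketch of how a term could be turned into a lookup key. The helper `normalize_key` is hypothetical and written for illustration only; it is not part of Bionty, and the library's actual rules may differ in edge cases.

```python
import re


def normalize_key(term: str) -> str:
    """Hypothetical helper: approximate the lookup-key normalization.

    Lower-cases the term and collapses every run of characters that is not
    a letter or digit into a single underscore.
    """
    key = re.sub(r"[^0-9a-zA-Z]+", "_", term)
    return key.strip("_").lower()


# Illustrative inputs; "HLA-DR" is only used to show the underscore rule.
normalized = {term: normalize_key(term) for term in ["IMMP1L", "HLA-DR"]}
```

Under these assumptions, `IMMP1L` becomes the key `immp1l`, which matches the attribute used above.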

To look up the exact original strings, convert the lookup object to a dict and use the `[]` accessor:

In [None]:
lookup_dict = lookup.dict()
lookup_dict["IMMP1L"]

## Search terms

Search behaves in the same way as it does for registries:

In [None]:
cell_marker_bt = lb.CellMarker.bionty()
cell_marker_bt.search("CD4").head(5)

Search another field (default is `.name`):

In [None]:
cell_marker_bt.search(
    "CD4", field=cell_marker_bt.gene_symbol
).head(1)
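For intuition about what a ranked search returns, the following sketch ranks a handful of names by string similarity using the standard library's `difflib`. This is a stand-in technique, not the algorithm Bionty uses, and the list of names is a small made-up sample rather than the real reference table.

```python
from difflib import get_close_matches

# Made-up sample of names; the real reference table is much larger.
sample_names = ["CD4", "CD45RA", "CD14", "CCR7"]

# Return up to three names, best match first, dropping weak matches.
matches = get_close_matches("CD4", sample_names, n=3, cutoff=0.6)
```

The exact match ranks first, and names that share little with the query fall below the cutoff and are dropped.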

## Standardize cell marker identifiers

Let us generate a `DataFrame` that stores a number of cell marker identifiers, some of which are corrupted:

In [None]:
markers = pd.DataFrame(
    index=[
        "KI67",
        "CCR7",
        "CD14",
        "CD8",
        "CD45RA",
        "CD4",
        "CD3",
        "CD127a",
        "PD1",
        "Invalid-1",
        "Invalid-2",
        "CD66b",
        "Siglec8",
        "Time",
    ]
)

Now let’s check which cell markers can be found in the reference:

In [None]:
cell_marker_bt.inspect(markers.index, cell_marker_bt.name);

The logging output suggests mapping synonyms:

In [None]:
synonyms_mapper = cell_marker_bt.standardize(markers.index, return_mapper=True)
synonyms_mapper

Let's replace the synonyms with standardized names in the `DataFrame`:

In [None]:
markers.rename(index=synonyms_mapper, inplace=True)
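The mapper is a plain dictionary from input label to standardized name, so labels that are absent from it are left unchanged by the rename. A minimal sketch of that behavior on a plain list, using a made-up mapper (`example_mapper` is illustrative and not actual `standardize` output):

```python
# Made-up mapper for illustration; real values come from standardize().
example_mapper = {"PD1": "PD-1"}

labels = ["PD1", "CD4", "Time"]

# Labels without an entry in the mapper pass through untouched,
# which mirrors how DataFrame.rename treats a dict-like mapper.
renamed = [example_mapper.get(label, label) for label in labels]
```

This is why non-marker channels such as `Time` survive the rename as they are.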

`Time`, `Invalid-1` and `Invalid-2` are non-marker channels, so they won’t validate against the cell marker reference:

In [None]:
cell_marker_bt.inspect(markers.index, cell_marker_bt.name);
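Conceptually, inspection splits the labels into those found in the reference and those that are not. The sketch below shows that split with a set-membership check; `reference_names` is a small made-up stand-in for the reference's name column, not real Bionty data.

```python
# Made-up stand-in for the names in the reference table.
reference_names = {"CD4", "CD14", "CD127"}

candidates = ["CD4", "CD127a", "Time", "Invalid-1"]

# One boolean per candidate: True if the label is a known reference name.
is_validated = [name in reference_names for name in candidates]

# Collect the labels that still need manual attention.
non_validated = [
    name for name, ok in zip(candidates, is_validated) if not ok
]
```

In this toy setup, `CD127a` lands in `non_validated` next to the non-marker channels, which is the situation addressed next.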

`CD127a` is not found. Let's check the lookup with auto-completion:

In [None]:
lookup = cell_marker_bt.lookup()
lookup.cd127

The correct name is `CD127`; `CD127a` was a typo:

In [None]:
curated_df = markers.rename(index={"CD127a": lookup.cd127.name})

Optionally, search:

In [None]:
cell_marker_bt.search("CD127a").head()

Now we see that all cell marker candidates validate:

In [None]:
cell_marker_bt.validate(curated_df.index, cell_marker_bt.name);

## Ontology source versions

For any given entity, we can choose from a number of versions:

In [None]:
lb.BiontySource.filter(entity="CellMarker").df()

When instantiating a Bionty object, we can choose a source or version:

In [None]:
bionty_source = lb.BiontySource.filter(source="cellmarker", version="2.0", organism="human").one()
cell_marker_bt = lb.CellMarker.bionty(bionty_source=bionty_source)
cell_marker_bt

The currently used ontologies can be displayed using:

In [None]:
lb.BiontySource.filter(currently_used=True).df()