# Pathway ontologies

lamindb provides access to the following public pathway ontologies through [lnschema-bionty](https://github.com/laminlabs/lnschema-bionty):

1. [Gene Ontology](https://bioportal.bioontology.org/ontologies/GO)
2. [Pathway Ontology](https://bioportal.bioontology.org/ontologies/PW)

Here we show how to access and search pathway ontologies to standardize new data.

## Setup

In [None]:
!lamin load test-ontologies

In [None]:
import lnschema_bionty as lb
import pandas as pd

# adds an entry "human" into an empty instance
lb.settings.organism = "human"

## Bionty objects

Let us create a public knowledge accessor with {meth}`~lnschema_bionty.dev.BioRegistry.bionty`, which chooses a default public knowledge source from {class}`~lnschema_bionty.BiontySource`. It's a [Bionty](https://lamin.ai/docs/bionty/bionty.bionty) object, which you can think of as a less-capable registry:

In [None]:
pathway_bt = lb.Pathway.bionty()
pathway_bt

As for registries, you can export the ontology as a `DataFrame`:

In [None]:
df = pathway_bt.df()
df.head()

Unlike registries, you can also export it as a Pronto object via `pathway_bt.ontology`.

## Look up terms

As for registries, terms can be looked up with auto-complete:

In [None]:
lookup = pathway_bt.lookup()

The `.` accessor provides normalized terms (lower case, containing only alphanumeric characters and underscores):

In [None]:
lookup.acetyl_coa_assimilation_pathway
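The normalization described above can be approximated in a few lines. This is only a minimal sketch using the standard library: `normalize_key` is a hypothetical helper, not a function from lnschema-bionty, and the library's actual rules (for example, for terms that start with a digit) may differ.

```python
import re


def normalize_key(term: str) -> str:
    """Hypothetical helper: lower-case a term and collapse every run of
    non-alphanumeric characters into a single underscore."""
    return re.sub(r"[^0-9a-z]+", "_", term.lower()).strip("_")


# A pathway name and an ontology ID, both taken from this guide
name_key = normalize_key("acetyl-CoA assimilation pathway")
id_key = normalize_key("GO:0019681")
print(name_key)
print(id_key)
```

Under these assumptions, the two keys correspond to the attribute names used with the `.` accessor in this guide.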

To look up the exact original strings, convert the lookup object to a dict and use the `[]` accessor:

In [None]:
lookup_dict = lookup.dict()
lookup_dict["acetyl-CoA assimilation pathway"]

By default, the `name` field is used to generate lookup keys. You can specify another field to look up:

In [None]:
lookup = pathway_bt.lookup(pathway_bt.ontology_id)

In [None]:
lookup.go_0019681

## Search terms

Search behaves in the same way as it does for registries:

In [None]:
pathway_bt = lb.Pathway.bionty()
pathway_bt.search("acetyl coa assimilation").head(3)

By default, search also covers synonyms:

In [None]:
pathway_bt.search("acetyl-CoA catabolism").head(1)
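To illustrate why covering synonyms matters, here is a toy sketch of a search that scores a query against both the name and the synonyms of each record. It is not the algorithm used by `search()`: the records, the `|`-separated synonym format, and the `score` helper are assumptions made for this example, and `difflib.SequenceMatcher` stands in for whatever matching the library performs.

```python
from difflib import SequenceMatcher

# Toy records for illustration only; not exported from any ontology
records = [
    {"name": "citric acid cycle", "synonyms": "Krebs cycle|TCA cycle"},
    {"name": "glycolysis", "synonyms": "Embden-Meyerhof pathway"},
]


def score(query: str, record: dict) -> float:
    """Best similarity between the query and the name or any synonym."""
    candidates = [record["name"], *record["synonyms"].split("|")]
    return max(
        SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
        for candidate in candidates
    )


query = "krebs cycle"
# Rank records by descending similarity to the query
results = sorted(
    ((record["name"], score(query, record)) for record in records),
    key=lambda pair: pair[1],
    reverse=True,
)
print(results[0])
```

Because the query matches a synonym exactly, the record named "citric acid cycle" ranks first even though the query does not appear in its name.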

Search another field (default is `.name`):

In [None]:
pathway_bt.search(
    "Chemical reactions and pathways resulting in the breakdown of Cinnamic Acid, 3-Phenyl-2-Propenoic Acid.",
    field=pathway_bt.definition,
).head()

## Standardize pathway identifiers

Let us generate a `DataFrame` that stores a number of pathway identifiers, some of which are corrupted:

In [None]:
df_orig = pd.DataFrame(
    index=[
        "GO:1905210",
        "GO:1905211",
        "GO:1905212",
        "GO:1905208",
        "This pathway does not exist",
    ]
)
df_orig

We can check which of our values validate against the ontology reference and then select those that do not by inverting the resulting boolean mask:

In [None]:
validated = pathway_bt.validate(df_orig.index, pathway_bt.ontology_id)
df_orig.index[~validated]
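The boolean mask can also be used to split a `DataFrame` into rows that passed validation and rows that still need curation. The sketch below runs without a lamindb instance: `reference_ids` is a hypothetical stand-in for the ontology reference, and `Index.isin` stands in for `validate`, assuming that `validate` returns a boolean array aligned with the input.

```python
import pandas as pd

# Hypothetical stand-in for the identifiers known to the ontology reference
reference_ids = {"GO:1905210", "GO:1905211", "GO:1905212", "GO:1905208"}

df_example = pd.DataFrame(
    index=[
        "GO:1905210",
        "GO:1905211",
        "This pathway does not exist",
    ]
)

# Boolean mask: True where the identifier is found in the reference
validated = df_example.index.isin(reference_ids)

# Rows that passed validation and rows that need curation
df_valid = df_example[validated]
df_invalid = df_example[~validated]
print(df_invalid.index.tolist())
```

Keeping the two subsets separate makes it easy to continue with the validated rows while the remaining identifiers are inspected by hand.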

## Ontology source versions

For any given entity, we can choose from a number of versions:

In [None]:
lb.BiontySource.filter(entity="Pathway").df()

When instantiating a Bionty object, we can choose a source or version:

In [None]:
bionty_source = lb.BiontySource.filter(source="go", version="2023-05-10", organism="all").one()
pathway_bt = lb.Pathway.bionty(bionty_source=bionty_source)
pathway_bt

The currently used ontologies can be displayed using:

In [None]:
lb.BiontySource.filter(currently_used=True).df()