# Intro to Biomedical Ontologies: Owlready2

Biomedical ontologies are a tough field to approach, starting with the first question: "what is an ontology?"

I often answer with "it's a hairball of knowledge."

Imagine that a person or group decided: "let's represent knowledge as something close to a network of neuronal connections."

I am not an ontologist (someone who creates new ontologies). I consider myself one of the few people who can figure out how to leverage existing ontologies for very specific biomedical and clinical tasks.

In [1]:
import owlready2

hpo = owlready2.get_ontology("http://purl.obolibrary.org/obo/hp.owl").load()
# mondo = owlready2.get_ontology("http://purl.obolibrary.org/obo/mondo.owl").load()
# efo = owlready2.get_ontology("http://www.ebi.ac.uk/efo/efo.owl").load()


In [100]:
search_term = "moyamoya"

# Crude searcher: wildcard, case-insensitive match on concept labels
def obo_searcher(ontology, search_term):
    matches = ontology.search(label=f"*{search_term}*", _case_sensitive=False)
    data = [{
        "concept": x,
        "label": x.label,
        "iri": x.iri,
        "synonyms": x.hasExactSynonym,
        "name": x.name,
        "subclasses": list(x.subclasses()),
        "xrefs": x.hasDbXref
    } for x in matches if x.label]  # skip concepts without a label

    return data


In [101]:
results = obo_searcher(hpo, search_term)

# Results: a list of dictionaries, one per matching concept, with its metadata

In [102]:
results

[{'concept': obo.HP_0011834,
  'label': ['Moyamoya phenomenon'],
  'iri': 'http://purl.obolibrary.org/obo/HP_0011834',
  'synonyms': [],
  'name': 'HP_0011834',
  'subclasses': [],
  'xrefs': ['UMLS:C4023169']}]
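
The `*{search_term}*` pattern with `_case_sensitive=False` behaves like a case-insensitive "contains" match on labels. The sketch below imitates that behavior on plain strings with the standard library's `fnmatch`; it is only an illustration of the matching idea, not how `owlready2` implements its search, and `wildcard_match` is a made-up helper name.

```python
from fnmatch import fnmatchcase


def wildcard_match(labels, pattern):
    """Return the labels matching a shell-style pattern, ignoring case."""
    pattern = pattern.lower()
    return [label for label in labels if fnmatchcase(label.lower(), pattern)]


# Labels taken from the HPO results shown in this notebook
labels = [
    "Moyamoya phenomenon",
    "Diabetes mellitus",
    "Central diabetes insipidus",
]

matches = wildcard_match(labels, "*diabetes*")
print(matches)
```

Dropping the leading or trailing `*` turns the same pattern into an "ends with" or "starts with" match.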

## Check one concept

In [103]:
results

[{'concept': obo.HP_0011834,
  'label': ['Moyamoya phenomenon'],
  'iri': 'http://purl.obolibrary.org/obo/HP_0011834',
  'synonyms': [],
  'name': 'HP_0011834',
  'subclasses': [],
  'xrefs': ['UMLS:C4023169']}]

In [104]:
results[0]['concept'].hasExactSynonym

[]

## Get `is_a` concepts

In [105]:
is_a = results[0]['concept'].is_a

# is_a returns a list, since a concept can have more than one direct parent
[x.label for x in is_a]

[['Abnormal cerebral artery morphology']]

## Get all ancestors

In [106]:
ancestors = results[0]['concept'].ancestors()

# ancestors() returns a set of all ancestors, including the concept itself
[x.label for x in ancestors]

[['Abnormality of cardiovascular system morphology'],
 ['Morphological central nervous system abnormality'],
 ['Abnormality of brain morphology'],
 ['Abnormal blood vessel morphology'],
 ['Abnormal cerebral artery morphology'],
 ['Abnormal systemic arterial morphology'],
 ['Abnormality of the cardiovascular system'],
 ['Abnormal cerebral vascular morphology'],
 ['Abnormality of the nervous system'],
 [],
 ['Phenotypic abnormality'],
 ['Moyamoya phenomenon'],
 ['Abnormal nervous system morphology'],
 ['All'],
 [locstr('Abnormal vascular morphology', 'en')],
 ['Abnormality of the vasculature']]

## Get all descendants

In [107]:
descendants = results[0]['concept'].descendants()

# descendants() includes the concept itself by default
[(x.label, x.name) for x in descendants]

[(['Moyamoya phenomenon'], 'HP_0011834')]

## Get all Subclasses

In [108]:
subclasses = results[0]['concept'].subclasses()

# subclasses() yields only the direct children, so a leaf concept gives an empty list
[(x.label, x.name) for x in subclasses]

[]
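
To make the difference between direct parents, ancestors, direct children, and descendants concrete, here is a minimal sketch on a toy hierarchy stored as a plain dictionary. The term names and the hierarchy are invented for illustration and are not taken from HPO; the functions only mirror the behavior seen above (ancestors and descendants include the term itself), not the `owlready2` implementation.

```python
# Toy hierarchy: child -> list of direct parents (all names are made up)
PARENTS = {
    "root": [],
    "vascular": ["root"],
    "neural": ["root"],
    "cerebral_artery": ["vascular", "neural"],  # two direct parents
    "leaf_term": ["cerebral_artery"],
}


def direct_children(term):
    """Terms that list `term` as a direct parent."""
    return [child for child, parents in PARENTS.items() if term in parents]


def closure(term, neighbours):
    """Walk the graph from `term`, collecting every reachable term, itself included."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(neighbours(current))
    return seen


def ancestors(term):
    return closure(term, lambda t: PARENTS[t])


def descendants(term):
    return closure(term, direct_children)


print(PARENTS["leaf_term"])            # direct parents only
print(sorted(ancestors("leaf_term")))  # transitive, includes the term itself
print(direct_children("leaf_term"))    # a leaf has no children
print(sorted(descendants("leaf_term")))
```

A leaf term has no direct children, yet its descendant set still contains the term itself, which matches the two outputs above.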

## Things to Note:

- `label`: returns a list, not a single string, since a concept can carry more than one label. Synonyms are stored separately in `hasExactSynonym`.
- `iri`: the unique ID for this concept.
- `name`: the concept ID. Even though this is an HPO term, ontologies sometimes reference concepts from external ontologies as part of the "semantic web," so the ID does not always come from the ontology you loaded.
- `xrefs`: `owlready2` is thinly documented, but it seems to be a single-person effort (one I have never contributed to myself) in a field that is not the most approachable, so cut the maintainer some slack. The oddly named `.hasDbXref` returns a list of cross-walks to external vocabularies, which is one of the more useful things to know.
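
The cross-references come back as `PREFIX:ID` strings, so a common next step is grouping them by source vocabulary. Below is a minimal sketch under that assumption; `group_xrefs` is a hypothetical helper, and the example values are the xrefs that HPO returns for "Diabetes mellitus" later in this notebook.

```python
from collections import defaultdict


def group_xrefs(xrefs):
    """Group 'PREFIX:ID' strings into {prefix: [ids]}."""
    grouped = defaultdict(list)
    for xref in xrefs:
        prefix, sep, local_id = str(xref).partition(":")
        if not sep:
            # No colon found: keep the whole value under an empty prefix
            prefix, local_id = "", str(xref)
        grouped[prefix].append(local_id)
    return dict(grouped)


xrefs = ["MSH:D003920", "SNOMEDCT_US:73211009", "UMLS:C0011849"]
by_source = group_xrefs(xrefs)
print(by_source)
# {'MSH': ['D003920'], 'SNOMEDCT_US': ['73211009'], 'UMLS': ['C0011849']}
```

With the xrefs grouped this way, looking up the UMLS identifier for a concept is just `by_source.get("UMLS", [])`.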

# TODO Visual Interface

In [112]:
results[0]

{'concept': obo.HP_0011834,
 'label': ['Moyamoya phenomenon'],
 'iri': 'http://purl.obolibrary.org/obo/HP_0011834',
 'synonyms': [],
 'name': 'HP_0011834',
 'subclasses': [],
 'xrefs': ['UMLS:C4023169']}

# It's hard to work with Python objects

My goal is to create various plots using Altair.

Plotting libraries don't know what to do with arbitrary Python objects such as `owlready2` concepts.

They generally need more primitive data types, so in our case we convert everything to strings.

In [116]:
results

[{'concept': obo.HP_0011834,
  'label': ['Moyamoya phenomenon'],
  'iri': 'http://purl.obolibrary.org/obo/HP_0011834',
  'synonyms': [],
  'name': 'HP_0011834',
  'subclasses': [],
  'xrefs': ['UMLS:C4023169']}]

In [137]:

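def sanitize(v):
    # Recursively convert owlready2 objects into plain strings
    if isinstance(v, list):
        return [sanitize(x) for x in v]
    elif isinstance(v, dict):
        return {k: sanitize(x) for k, x in v.items()}
    else:
        return str(v)

def sanitize_results(data):
    # Build new dictionaries so the original results keep their concept objects
    return [{k: sanitize(v) for k, v in d.items()} for d in data]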



In [139]:
import polars as pl
sanitized_results = sanitize_results(results)
pl.from_dicts(sanitized_results)

| concept | label | iri | synonyms | name | subclasses | xrefs |
| --- | --- | --- | --- | --- | --- | --- |
| str | str | str | str | str | str | str |
| "obo.HP_0011834…" | "Moyamoya pheno…" | "http://purl.ob…" | "" | "HP_0011834" | "" | "UMLS:C4023169" |


In [140]:
from pprint import pprint

pprint(sanitized_results)

[{'concept': 'obo.HP_0011834',
  'iri': 'http://purl.obolibrary.org/obo/HP_0011834',
  'label': 'Moyamoya phenomenon',
  'name': 'HP_0011834',
  'subclasses': '',
  'synonyms': '',
  'xrefs': 'UMLS:C4023169'}]
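
If flat strings are preferred over lists (for a CSV export or a tooltip, say), one option is to join each list into a single delimited string. This is a minimal sketch of that idea; `flatten_value`, `flatten_record`, and the `"; "` separator are illustrative choices, not part of `owlready2` or Polars.

```python
def flatten_value(value, sep="; "):
    """Join list-like values into one string; stringify everything else."""
    if isinstance(value, (list, tuple)):
        return sep.join(str(item) for item in value)
    return str(value)


def flatten_record(record, sep="; "):
    """Return a new dict whose values are all plain strings."""
    return {key: flatten_value(value, sep) for key, value in record.items()}


record = {
    "label": ["Moyamoya phenomenon"],
    "synonyms": [],
    "name": "HP_0011834",
    "xrefs": ["UMLS:C4023169"],
}

flat = flatten_record(record)
print(flat)
```

An empty list becomes an empty string, and a single-item list becomes just that item.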


# Test drive it with a bigger use case

In [144]:
import polars as pl
sanitized_results = sanitize_results(obo_searcher(hpo, "diabetes"))
df = pl.from_dicts(sanitized_results)

In [145]:
df

| concept | label | iri | synonyms | name | subclasses | xrefs |
| --- | --- | --- | --- | --- | --- | --- |
| str | list[str] | str | list[str] | str | list[str] | list[str] |
| "obo.HP_0000819…" | ["Diabetes mellitus"] | "http://purl.ob…" | [] | "HP_0000819" | ["obo.HP_0000831", "obo.HP_0001953", … "obo.HP_0100651"] | ["MSH:D003920", "SNOMEDCT_US:73211009", "UMLS:C0011849"] |
| "obo.HP_0000831…" | ["Insulin-resistant diabetes mellitus"] | "http://purl.ob…" | ["Insulin resistant diabetes", "Insulin resistant diabetes mellitus", "Insulin-resistant diabetes"] | "HP_0000831" | ["obo.HP_0000857", "obo.HP_0000877"] | ["UMLS:C0854110"] |
| "obo.HP_0000857…" | ["Neonatal insulin-dependent diabetes mellitus"] | "http://purl.ob…" | [] | "HP_0000857" | ["obo.HP_0008255"] | ["UMLS:C3278636"] |
| "obo.HP_0000863…" | ["Central diabetes insipidus"] | "http://purl.ob…" | ["Neurohypophyseal diabetes insipidus"] | "HP_0000863" | [] | ["MSH:D020790", "SNOMEDCT_US:45369008", "UMLS:C0687720"] |
| "obo.HP_0000873…" | ["Diabetes insipidus"] | "http://purl.ob…" | [] | "HP_0000873" | ["obo.HP_0000863", "obo.HP_0009806"] | ["MSH:D003919", "SNOMEDCT_US:15771004", "UMLS:C0011848"] |
| "obo.HP_0000877…" | ["Insulin-resistant diabetes mellitus at puberty"] | "http://purl.ob…" | ["Insulin-resistant diabetes mellitus at puberty"] | "HP_0000877" | [] | ["UMLS:C1837792"] |
| "obo.HP_0004904…" | ["Maturity-onset diabetes of the young"] | "http://purl.ob…" | ["Maturity onset diabetes of the young"] | "HP_0004904" | [] | ["MSH:C562772", "SNOMEDCT_US:609561005", "UMLS:C0342276"] |
| "obo.HP_0005978…" | ["Type II diabetes mellitus"] | "http://purl.ob…" | ["Diabetes mellitus Type II", "Diabetes mellitus, noninsulin-dependent", … "Type II diabetes"] | "HP_0005978" | ["obo.HP_0008205"] | ["MSH:D003924", "SNOMEDCT_US:44054006", "UMLS:C0011860"] |
| "obo.HP_0008205…" | ["Insulin-dependent but ketosis-resistant diabetes"] | "http://purl.ob…" | [] | "HP_0008205" | [] | ["UMLS:C1842404"] |
| "obo.HP_0008255…" | ["Transient neonatal diabetes mellitus"] | "http://purl.ob…" | [] | "HP_0008255" | [] | ["SNOMEDCT_US:237603002", "UMLS:C0342273"] |


In [151]:
import altair as alt

search_input = alt.param(
    value='',
    bind=alt.binding(
        input='search',
        placeholder="Diseases/symptoms",
        name='Search ',
    )
)
alt.Chart(df.to_pandas()).mark_rect(size=60).encode(
    x='count(synonyms):Q',
    y='label:N',
    # tooltip='Name:N',
    opacity=alt.condition(
        alt.expr.test(alt.expr.regexp(search_input, 'i'), alt.datum.label),
        alt.value(1),
        alt.value(0.05)
    )
).transform_flatten(
    ["label"]
).transform_flatten(
    ["synonyms"]
).add_params(
    search_input
)
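
The two `transform_flatten` calls turn each list element into its own row, which is what lets `count(synonyms)` produce one number per label. The plain-Python sketch below shows that reshaping on three records from the table above; `flatten_field` is a hypothetical helper, and dropping records whose list is empty is a choice made in this sketch, so check the Vega-Lite documentation before assuming the chart does the same.

```python
from collections import Counter


def flatten_field(rows, field):
    """Emit one output row per element of the list stored in `field`."""
    out = []
    for row in rows:
        for item in row[field]:
            out.append({**row, field: item})
    return out


rows = [
    {
        "label": ["Insulin-resistant diabetes mellitus"],
        "synonyms": [
            "Insulin resistant diabetes",
            "Insulin resistant diabetes mellitus",
            "Insulin-resistant diabetes",
        ],
    },
    {
        "label": ["Central diabetes insipidus"],
        "synonyms": ["Neurohypophyseal diabetes insipidus"],
    },
    {"label": ["Diabetes mellitus"], "synonyms": []},
]

# Flatten labels first, then synonyms, in the same order as the chart
long_rows = flatten_field(flatten_field(rows, "label"), "synonyms")

# Count synonym rows per label, like the chart's x encoding
counts = Counter(row["label"] for row in long_rows)
print(counts)
```

After flattening, every row holds a single label string and a single synonym string, which is the shape a bar or rect mark can aggregate over.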