**Hpo-toolkit tutorial**

This notebook shows how to install `hpo-toolkit` and use it to work with the Human Phenotype Ontology (HPO).

# Installation

The toolkit is available at [PyPI](https://pypi.org/project/hpo-toolkit), so it can be installed with `pip` by running `pip install hpo-toolkit`.

# Load HPO

`hpo-toolkit` supports reading ontologies in [Obographs](https://github.com/geneontology/obographs) JSON format.

We can download and open the latest HPO from *https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json*

In [1]:
import hpotk
from hpotk.ontology import Ontology
from hpotk.ontology.load.obographs import load_ontology

# to peek under the hood
import logging
hpotk.util.setup_logging(logging.DEBUG)

o: Ontology = load_ontology('https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json')

2023-02-28 10:15:40,870 hpotk.util           DEBUG : Opening https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json
2023-02-28 10:15:40,871 hpotk.util           DEBUG : Using default encoding 'utf-8'
2023-02-28 10:15:40,872 hpotk.util           DEBUG : Looks like a URL: https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json
2023-02-28 10:15:40,872 hpotk.util           DEBUG : Downloading with timeout=30s
2023-02-28 10:15:40,945 hpotk.util           DEBUG : Looks like un-compressed data
2023-02-28 10:15:41,677 hpotk.ontology.load.obographs._load DEBUG : Extracting ontology terms
2023-02-28 10:15:41,678 hpotk.ontology.io.obographs DEBUG : Missing node type in {'id': 'http://purl.obolibrary.org/obo/GO_0000016', 'lbl': 'lactase activity'}
2023-02-28 10:15:41,678 hpotk.ontology.io.obographs DEBUG : Missing node type in {'id': 'http://purl.obolibrary.org/obo/GO_0003857', 'lbl': '3-hydroxyacyl-CoA dehydrogenase activity'}
...

The code downloads the latest HPO JSON file and parses it into an `Ontology`.

# `Ontology`

`hpo-toolkit` provides several data structures for modeling ontology data:
- `TermId` - an identifier of an ontology concept
- `Term` - a class representing ontology concept data
- `OntologyGraph` - a graph storing the hierarchy between ontology concepts
- `Ontology` - a top-level container consisting of an `OntologyGraph`, non-obsolete `Term`s, and all (primary and obsolete) `TermId`s of an ontology, along with additional metadata
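
To make the relationship between a CURIE and a `TermId` concrete, here is a minimal, self-contained sketch. `ToyTermId` is a hypothetical stand-in written for illustration, not the `hpotk.model.TermId` class; it only assumes that a CURIE has the form `PREFIX:ID`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTermId:
    """Hypothetical stand-in for a term identifier (not hpotk's class)."""
    prefix: str
    id: str

    @classmethod
    def from_curie(cls, curie: str) -> "ToyTermId":
        # Split on the first colon only; both parts must be non-empty.
        prefix, sep, local_id = curie.partition(":")
        if not sep or not prefix or not local_id:
            raise ValueError(f"Not a CURIE: {curie!r}")
        return cls(prefix, local_id)

    @property
    def value(self) -> str:
        # Reassemble the CURIE from its two parts.
        return f"{self.prefix}:{self.id}"


toy_id = ToyTermId.from_curie("HP:0001166")
print(toy_id.prefix, toy_id.id, toy_id.value)
```

Because the dataclass is frozen, instances are hashable and can serve as set members or dictionary keys, which is convenient for the lookups and graph traversals an ontology container needs.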

This section provides an overview of `hpo-toolkit`'s functionality.

## Get all `Term`s and `TermId`s

All `TermId`s, both primary and obsolete, can be iterated over via the `o.term_ids` property:

In [2]:
print(next(iter(o.term_ids)))

TermId(prefix="HP", id="0000001")


Similarly, you can iterate over ontology `Term`s via `o.terms`:

In [3]:
print(next(iter(o.terms)))

Term(identifier=TermId(prefix="HP", id="0000001"), name="All", definition=None, comment=Root of all terms in the Human Phenotype Ontology., is_obsolete=False, alt_term_ids="[]")


Use `len` to get the number of primary (non-obsolete) `Term`s:

In [4]:
len(o)

16874

## Query `Term`

### Test presence of a `TermId` in the ontology

Presence of a `TermId` can be tested in the same fashion as you would test the presence of an element in a Python container:

In [5]:
current_arachnodactyly_id = "HP:0001166"  # as of Dec 28th, 2022

assert current_arachnodactyly_id in o

The test works both for primary and obsolete `TermId`s:

In [6]:
obsolete_arachnodactyly_id = "HP:0001505"

assert obsolete_arachnodactyly_id in o

Queries work with a simple CURIE `str` (e.g. `HP:0001166`) or a `TermId`:

In [7]:
from hpotk.model import TermId
assert current_arachnodactyly_id in o and TermId.from_curie(current_arachnodactyly_id) in o

### Get a specific `Term`

Use the `get_term` method to retrieve a specific `Term`:

In [8]:
arachnodactyly = o.get_term(current_arachnodactyly_id)
arachnodactyly

Term(identifier=TermId(prefix="HP", id="0001166"), name="Arachnodactyly", definition=Abnormally long and slender fingers ("spider fingers")., comment=None, is_obsolete=False, alt_term_ids="[TermId(prefix="HP", id="0001505")]")

Each term has:
- `identifier` - a `hpotk.model.TermId` corresponding to the term's CURIE
- `name` - the term's name (e.g. *"Hypertension"*)
- `alt_term_ids` - alternative term IDs, i.e. IDs of obsolete terms that have been replaced by this term
- `is_obsolete` - obsoletion status

In [9]:
print(f'ID: {arachnodactyly.identifier.value}')
print(f'Name: {arachnodactyly.name}')
print(f'Alt ids: {arachnodactyly.alt_term_ids}')
print(f'Is obsolete: {arachnodactyly.is_obsolete}')

ID: HP:0001166
Name: Arachnodactyly
Alt ids: [TermId(prefix="HP", id="0001505")]
Is obsolete: False


`get_term` always returns the primary `Term`, even for an obsolete `TermId`:

In [10]:
assert o.get_term(current_arachnodactyly_id) == o.get_term(obsolete_arachnodactyly_id)
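
One way to picture this behavior is a lookup table that maps every known ID, primary or obsolete, to its primary term. The sketch below is a plain-Python illustration under that assumption, not `hpotk`'s implementation; `ToyTerm` and `build_index` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ToyTerm:
    """Hypothetical, simplified term record (not hpotk's Term)."""
    identifier: str
    name: str
    alt_term_ids: Tuple[str, ...] = ()


def build_index(terms: Iterable[ToyTerm]) -> Dict[str, ToyTerm]:
    # Register each term under its primary ID and under every alternative ID.
    index: Dict[str, ToyTerm] = {}
    for term in terms:
        index[term.identifier] = term
        for alt_id in term.alt_term_ids:
            index[alt_id] = term
    return index


index = build_index([
    ToyTerm("HP:0001166", "Arachnodactyly", alt_term_ids=("HP:0001505",)),
])

# Both the primary and the obsolete ID resolve to the same primary term.
assert index["HP:0001505"] is index["HP:0001166"]
print(index["HP:0001505"].identifier)  # HP:0001166
```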

# Ontology algorithms

`hpotk.algorithm` provides several ontology algorithms.

## Ontology traversal

Use `get_parents` and `get_ancestors` to get a `set` with *parents* or *ancestors* of a `TermId`:

In [11]:
print('#' * 20 + ' Parents ' + '#' * 20)
for parent in hpotk.algorithm.get_parents(o, current_arachnodactyly_id):
    term = o.get_term(parent)
    print(f"{term.identifier.value} - {term.name}")

print('\n'+'#' * 20 + ' Ancestors ' + '#' * 18)
for ancestor in hpotk.algorithm.get_ancestors(o, current_arachnodactyly_id):
    term = o.get_term(ancestor)
    print(f"{term.identifier.value} - {term.name}")

#################### Parents ####################
HP:0001238 - Slender finger
HP:0100807 - Long fingers

#################### Ancestors ##################
HP:0011842 - Abnormal skeletal morphology
HP:0002817 - Abnormality of the upper limb
HP:0033127 - Abnormality of the musculoskeletal system
HP:0000118 - Phenotypic abnormality
HP:0002813 - Abnormality of limb bone morphology
HP:0040064 - Abnormality of limbs
HP:0001167 - Abnormality of finger
HP:0011297 - Abnormal digit morphology
HP:0001238 - Slender finger
HP:0040068 - Abnormality of limb bone
HP:0000001 - All
HP:0001155 - Abnormality of the hand
HP:0000924 - Abnormality of the skeletal system
HP:0100807 - Long fingers
HP:0011844 - Abnormal appendicular skeleton morphology


In a similar fashion, we get the *children* or *descendants* by calling `get_children` and `get_descendants`:

In [12]:
print('#' * 20 + ' Children ' + '#' * 20)
for child in hpotk.algorithm.get_children(o, 'HP:0100807'):  # Long fingers
    term = o.get_term(child)
    print(f"{term.identifier.value} - {term.name}")
    
print('\n'+'#' * 20 + ' Descendants ' + '#' * 17)
for descendant in hpotk.algorithm.get_descendants(o, 'HP:0006109'):  # Absent phalangeal crease 
    term = o.get_term(descendant)
    print(f"{term.identifier.value} - {term.name}")

#################### Children ####################
HP:0001182 - Tapered finger
HP:0001166 - Arachnodactyly

#################### Descendants #################
HP:0001032 - Absent distal interphalangeal creases
HP:0006077 - Absent proximal finger flexion creases
HP:0005780 - Absent fourth finger distal interphalangeal crease
HP:0006216 - Single interphalangeal crease of fifth finger
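
The traversal functions can be understood as graph searches over parent/child links. The following sketch uses a small hypothetical hierarchy with made-up IDs (`T:*`), not real HPO data, and plain dictionaries rather than `hpotk`'s `OntologyGraph`; the helper names `invert` and `reachable` are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Set

# child -> direct parents (hypothetical toy hierarchy)
PARENTS: Dict[str, Set[str]] = {
    "T:root": set(),
    "T:limb": {"T:root"},
    "T:hand": {"T:limb"},
    "T:long_finger": {"T:hand"},
    "T:slender_finger": {"T:hand"},
    "T:spider_finger": {"T:long_finger", "T:slender_finger"},
}


def invert(edges: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    # Build parent -> direct children from child -> direct parents.
    inverted: Dict[str, Set[str]] = {node: set() for node in edges}
    for child, parents in edges.items():
        for parent in parents:
            inverted[parent].add(child)
    return inverted


def reachable(edges: Dict[str, Set[str]], start: str) -> Set[str]:
    # Breadth-first search; the start node itself is not included.
    seen: Set[str] = set()
    queue = deque(edges[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges[node])
    return seen


CHILDREN = invert(PARENTS)
toy_ancestors = reachable(PARENTS, "T:spider_finger")
toy_descendants = reachable(CHILDREN, "T:hand")
print(sorted(toy_ancestors))
print(sorted(toy_descendants))
```

The results are sets, so their iteration order is not meaningful, which matches the unordered listings in the outputs above. The `seen` set also ensures that a node reachable through several paths is visited only once.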


# Validation

The toolkit provides functions for performing multiple sanity checks.

## Obsolete term IDs

We should always use primary term IDs instead of obsolete ones.

`HP:0100807` and `HP:0006010` are the primary and obsolete term IDs for *Long fingers*, respectively.
The `ObsoleteTermIdsValidator` points out usage of the obsolete term ID:

In [13]:
from hpotk.model import MinimalTerm

from hpotk.validate import ValidationLevel
from hpotk.validate import ObsoleteTermIdsValidator

obso_validator = ObsoleteTermIdsValidator(o)

# The term uses an obsolete term ID `HP:0006010` instead of the current `HP:0100807`.
inputs = [
    MinimalTerm.create_minimal_term(TermId.from_curie('HP:0006010'), name='Long fingers', alt_term_ids=[], is_obsolete=False)
]
results = obso_validator.validate(inputs)

# At least one error or warning was found
assert not results.is_ok()

# A sequence of errors/warnings is available via the `results` property
assert len(results.results) == 1

validation_result = results.results[0]

# Obsolete term ID is a warning. The toolkit can map the term ID to the primary term ID and use it in the downstream analyses
assert validation_result.level == ValidationLevel.WARNING

# A unique ID of the validation check
assert validation_result.category == 'obsolete_term_id_is_used'

# A message for human consumption
assert validation_result.message == 'Using the obsolete HP:0006010 instead of HP:0100807 for Long fingers'

## Violation of the annotation propagation rule

Annotating with a term implies annotation with all of its ancestors, so a set of terms should not contain both a term and one of its ancestors. `AnnotationPropagationValidator` reports such cases:



In [14]:
from hpotk.validate import AnnotationPropagationValidator

# Long fingers is a parent of Arachnodactyly
inputs = [
    MinimalTerm.create_minimal_term(TermId.from_curie('HP:0001166'), name='Arachnodactyly', alt_term_ids=[], is_obsolete=False),
    MinimalTerm.create_minimal_term(TermId.from_curie('HP:0100807'), name='Long fingers', alt_term_ids=[], is_obsolete=False)
]

ap_validator = AnnotationPropagationValidator(o)
results = ap_validator.validate(inputs)

val_result = results.results[0]

# Violation of the annotation propagation rule is an ERROR.
# Most analyses will yield biased/incorrect results when using both the term and its ancestor.
assert val_result.level == ValidationLevel.ERROR

# A unique ID of the validation check
assert val_result.category == 'annotation_propagation'

# A message for human consumption
assert val_result.message == 'Terms should not contain both Arachnodactyly [HP:0001166] and its ancestor Long fingers [HP:0100807]'
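
The check itself is simple to picture: compute the ancestors of every input term and report any ancestor that is also in the input. Below is a minimal sketch on a hypothetical toy hierarchy with made-up IDs; it does not use `hpotk`, and `find_redundant_pairs` is an illustrative name rather than a library function.

```python
from typing import Dict, Iterable, List, Set, Tuple

# child -> direct parents (hypothetical toy hierarchy, not real HPO data)
PARENTS: Dict[str, Set[str]] = {
    "T:root": set(),
    "T:hand": {"T:root"},
    "T:long_finger": {"T:hand"},
    "T:spider_finger": {"T:long_finger"},
}


def ancestors(term_id: str) -> Set[str]:
    # Collect all ancestors by walking up the parent links.
    found: Set[str] = set()
    stack = list(PARENTS[term_id])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(PARENTS[node])
    return found


def find_redundant_pairs(term_ids: Iterable[str]) -> List[Tuple[str, str]]:
    # Report (term, ancestor) pairs where both are present in the input.
    present = set(term_ids)
    return sorted(
        (term_id, anc)
        for term_id in present
        for anc in ancestors(term_id) & present
    )


pairs = find_redundant_pairs(["T:spider_finger", "T:long_finger"])
print(pairs)  # [('T:spider_finger', 'T:long_finger')]
```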

## Terms are phenotypic features

Most algorithms require a list of phenotypic features (i.e. descendants of [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118)),
but HPO also contains terms that do not represent phenotypic abnormalities, such as [Clinical modifier](https://hpo.jax.org/app/browse/term/HP:0012823), [Mode of inheritance](https://hpo.jax.org/app/browse/term/HP:0000005), etc.

`PhenotypicAbnormalityValidator` reports all terms that are not descendants of [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118).

In [15]:
from hpotk.validate import PhenotypicAbnormalityValidator

# Autosomal recessive inheritance is a mode of inheritance, not a phenotypic abnormality
inputs = [
    MinimalTerm.create_minimal_term(TermId.from_curie('HP:0000007'), name='Autosomal recessive inheritance', alt_term_ids=[], is_obsolete=False),
    MinimalTerm.create_minimal_term(TermId.from_curie('HP:0100807'), name='Long fingers', alt_term_ids=[], is_obsolete=False)
]

pa_validator = PhenotypicAbnormalityValidator(o)
results = pa_validator.validate(inputs)

val_result = results.results[0]

# Using non-phenotypic abnormality is an ERROR.
assert val_result.level == ValidationLevel.ERROR

# A unique ID of the validation check
assert val_result.category == 'phenotypic_abnormality_descendant'

# A message for human consumption
assert val_result.message == 'Autosomal recessive inheritance [HP:0000007] is not a descendant of Phenotypic abnormality [HP:0000118]'

## Perform multiple checks at the same time

The checks can be performed individually, as described in the previous sections, or together within a single validation run using `ValidationRunner`.

In [16]:
from hpotk.validate import ValidationRunner

inputs = [
    MinimalTerm.create_minimal_term(TermId.from_curie('HP:0000007'), name='Autosomal recessive inheritance', alt_term_ids=[], is_obsolete=False),
    MinimalTerm.create_minimal_term(TermId.from_curie('HP:0006010'), name='Long fingers', alt_term_ids=[], is_obsolete=False),
    MinimalTerm.create_minimal_term(TermId.from_curie('HP:0001166'), name='Arachnodactyly', alt_term_ids=[], is_obsolete=False),
]

runner = ValidationRunner(validators=[obso_validator, ap_validator, pa_validator])
results = runner.validate_all(inputs)
for issue in results.results:
    print(issue.message)

Using the obsolete HP:0006010 instead of HP:0100807 for Long fingers
Terms should not contain both Arachnodactyly [HP:0001166] and its ancestor Long fingers [HP:0100807]
Autosomal recessive inheritance [HP:0000007] is not a descendant of Phenotypic abnormality [HP:0000118]
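
Conceptually, a runner applies each validator to the same inputs and concatenates the findings. The sketch below illustrates that pattern with plain callables; `ToyIssue`, `run_all`, and the two toy checks are hypothetical and do not mirror the `hpotk.validate` classes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ToyIssue:
    """Hypothetical validation finding."""
    level: str
    category: str
    message: str


# A toy validator is any callable mapping term IDs to a list of issues.
ToyValidator = Callable[[Sequence[str]], List[ToyIssue]]


def check_prefix(term_ids: Sequence[str]) -> List[ToyIssue]:
    # Flag IDs that do not use the 'HP' prefix.
    return [
        ToyIssue("ERROR", "unexpected_prefix", f"{t} is not an HP term")
        for t in term_ids if not t.startswith("HP:")
    ]


def check_duplicates(term_ids: Sequence[str]) -> List[ToyIssue]:
    # Flag each ID that occurs more than once, reported once per ID.
    seen, reported, issues = set(), set(), []
    for t in term_ids:
        if t in seen and t not in reported:
            issues.append(ToyIssue("WARNING", "duplicate", f"{t} is listed more than once"))
            reported.add(t)
        seen.add(t)
    return issues


def run_all(validators: Sequence[ToyValidator], term_ids: Sequence[str]) -> List[ToyIssue]:
    # Apply validators in order and concatenate their findings.
    issues: List[ToyIssue] = []
    for validate in validators:
        issues.extend(validate(term_ids))
    return issues


issues = run_all([check_prefix, check_duplicates], ["HP:0001166", "GO:0000016", "HP:0001166"])
for issue in issues:
    print(issue.level, issue.message)
```

Keeping each check independent and letting the runner only aggregate means new checks can be added without touching existing ones.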


# HPO constants

The toolkit provides `TermId`s of the well-established and stable HPO concepts.

## Base terms

The terms that define the first level of HPO hierarchy (e.g. [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118)):

In [17]:
from hpotk.constants.hpo.base import PHENOTYPIC_ABNORMALITY
PHENOTYPIC_ABNORMALITY

TermId(prefix="HP", id="0000118")

## Frequency

Concepts to represent frequency of phenotypic abnormalities within a patient cohort, the children of [Frequency](https://hpo.jax.org/app/browse/term/HP:0040279) (e.g. [Occasional](https://hpo.jax.org/app/browse/term/HP:0040283)):

In [18]:
from hpotk.constants.hpo.frequency import OCCASIONAL
OCCASIONAL

TermId(prefix="HP", id="0040283")

## Inheritance

Selected descendants of [Mode of inheritance](https://hpo.jax.org/app/browse/term/HP:0000005) (e.g. [Autosomal dominant inheritance](https://hpo.jax.org/app/browse/term/HP:0000006)):

In [19]:
from hpotk.constants.hpo.inheritance import AUTOSOMAL_DOMINANT_INHERITANCE
AUTOSOMAL_DOMINANT_INHERITANCE

TermId(prefix="HP", id="0000006")

## Onset

All descendants of [Onset](https://hpo.jax.org/app/browse/term/HP:0003674) (e.g. [Congenital onset](https://hpo.jax.org/app/browse/term/HP:0003577)):

In [20]:
from hpotk.constants.hpo.onset import CONGENITAL_ONSET
CONGENITAL_ONSET

TermId(prefix="HP", id="0003577")

## Organ system

Children of [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118) that correspond to abnormalities of organ systems (e.g. [Abnormality of the ear](https://hpo.jax.org/app/browse/term/HP:0000598)):

In [21]:
from hpotk.constants.hpo.organ_system import ABNORMALITY_OF_THE_EAR
ABNORMALITY_OF_THE_EAR

TermId(prefix="HP", id="0000598")

## Severity

All descendants of [Severity](https://hpo.jax.org/app/browse/term/HP:0012824) (e.g. [Mild](https://hpo.jax.org/app/browse/term/HP:0012825)):

In [22]:
from hpotk.constants.hpo.severity import MILD
MILD

TermId(prefix="HP", id="0012825")

That's it for now!