**Hpo-toolkit tutorial**

This notebook shows how to install `hpo-toolkit` and use it to work with the Human Phenotype Ontology (HPO).

# Installation

The toolkit is available on [PyPI](https://pypi.org/project/hpo-toolkit), so it can be installed with `pip`, e.g. `pip install hpo-toolkit`.

# Load HPO

`hpo-toolkit` supports reading ontologies in [Obographs](https://github.com/geneontology/obographs) JSON format.

We can download and open the latest HPO from *https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json*

In [1]:
import hpotk
from hpotk.ontology import Ontology
from hpotk.ontology.load.obographs import load_ontology

# to peek under the hood
import logging
hpotk.util.setup_logging(logging.DEBUG)

o: Ontology = load_ontology('https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json')

2022-12-28 16:36:08,577 hpotk.util           DEBUG : Opening https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json
2022-12-28 16:36:08,578 hpotk.util           DEBUG : Using default encoding 'utf-8'
2022-12-28 16:36:08,579 hpotk.util           DEBUG : Looks like a URL: https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json
2022-12-28 16:36:08,579 hpotk.util           DEBUG : Downloading with timeout=30s
2022-12-28 16:36:08,814 hpotk.util           DEBUG : Looks like un-compressed data
2022-12-28 16:36:12,930 hpotk.ontology.load.obographs._load DEBUG : Extracting ontology terms
2022-12-28 16:36:12,930 hpotk.ontology.io.obographs DEBUG : Missing node type in {'id': 'http://purl.obolibrary.org/obo/GO_0000016', 'lbl': 'lactase activity'}
2022-12-28 16:36:12,931 hpotk.ontology.io.obographs DEBUG : Missing node type in {'id': 'http://purl.obolibrary.org/obo/GO_0003857', 'lbl': '3-hydroxyacyl-CoA dehydrogenase activity'}
2022-12

The code downloads the latest HPO JSON file and parses it into an `Ontology`.
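
To make the input format more concrete, here is a minimal sketch that reads a tiny hand-written Obographs-style document with the standard library only. It is not how `hpo-toolkit` is implemented: the helper names `purl_to_curie` and `read_terms_and_parents` are hypothetical, the snippet is not the real `hp.json`, and skipping nodes that lack a `type` is an assumption made for this sketch (prompted by the "Missing node type" lines in the log above).

```python
import json

# A tiny hand-written document in the Obographs JSON layout (NOT the real hp.json).
OBOGRAPHS_SNIPPET = """
{
  "graphs": [
    {
      "nodes": [
        {"id": "http://purl.obolibrary.org/obo/HP_0001166", "lbl": "Arachnodactyly", "type": "CLASS"},
        {"id": "http://purl.obolibrary.org/obo/HP_0001238", "lbl": "Slender finger", "type": "CLASS"},
        {"id": "http://purl.obolibrary.org/obo/HP_0100807", "lbl": "Long fingers", "type": "CLASS"},
        {"id": "http://purl.obolibrary.org/obo/GO_0000016", "lbl": "lactase activity"}
      ],
      "edges": [
        {"sub": "http://purl.obolibrary.org/obo/HP_0001166", "pred": "is_a",
         "obj": "http://purl.obolibrary.org/obo/HP_0001238"},
        {"sub": "http://purl.obolibrary.org/obo/HP_0001166", "pred": "is_a",
         "obj": "http://purl.obolibrary.org/obo/HP_0100807"}
      ]
    }
  ]
}
"""


def purl_to_curie(purl: str) -> str:
    """Turn '.../obo/HP_0001166' into the CURIE 'HP:0001166' (hypothetical helper)."""
    local = purl.rsplit("/", 1)[-1]
    prefix, _, local_id = local.partition("_")
    return f"{prefix}:{local_id}"


def read_terms_and_parents(text: str):
    """Return (CURIE -> label, child CURIE -> set of parent CURIEs)."""
    graph = json.loads(text)["graphs"][0]
    names = {}
    for node in graph["nodes"]:
        # Assumption of this sketch: nodes without a CLASS type are skipped.
        if node.get("type") != "CLASS":
            continue
        names[purl_to_curie(node["id"])] = node["lbl"]
    parents = {}
    for edge in graph["edges"]:
        if edge["pred"] == "is_a":
            child = purl_to_curie(edge["sub"])
            parents.setdefault(child, set()).add(purl_to_curie(edge["obj"]))
    return names, parents


names, parents = read_terms_and_parents(OBOGRAPHS_SNIPPET)
print(sorted(names))
print(sorted(parents["HP:0001166"]))
```

In practice `load_ontology` does all of this for you; the sketch only shows which parts of the JSON carry the term labels and the hierarchy.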

# `Ontology`

`hpo-toolkit` provides several data structures for modeling ontology data:
- `TermId` - an identifier of an ontology concept.
- `Term` - a class for representing ontology concept data.
- `OntologyGraph` - a graph for storing the hierarchy between ontology concepts.
- `Ontology` - a top-level container consisting of an `OntologyGraph`, non-obsolete `Term`s, and all (primary and obsolete) `TermId`s of an ontology, along with additional metadata.

This section provides an overview of `hpo-toolkit`'s functionality.

## Get all `Term`s and `TermId`s

All `TermId`s, both primary and obsolete, can be iterated over via the `o.term_ids` property:

In [2]:
print(next(iter(o.term_ids)))

TermId(prefix="HP", id="0000001")


Similarly, you can iterate over the ontology `Term`s via `o.terms`:

In [3]:
print(next(iter(o.terms)))

Term(identifier=TermId(prefix="HP", id="0000001"), name="All", definition=None, comment=Root of all terms in the Human Phenotype Ontology., is_obsolete=False, alt_term_ids="[]")


You can also get the number of primary (non-obsolete) `Term`s with `len`:

In [4]:
len(o)

16810

## Query `Term`

### Test presence of a `TermId` in the ontology

Presence of a `TermId` can be tested in the same fashion as you would test the presence of an element in a Python container:

In [5]:
current_arachnodactyly_id = "HP:0001166"  # as of Dec 28th, 2022

current_arachnodactyly_id in o

True

The test works both for primary and obsolete `TermId`s:

In [6]:
obsolete_arachnodactyly_id = "HP:0001505"

obsolete_arachnodactyly_id in o

True

Queries work with a simple CURIE `str` (e.g. `HP:0001166`) or a `TermId`:

In [7]:
from hpotk.model import TermId
assert current_arachnodactyly_id in o and TermId.from_curie(current_arachnodactyly_id) in o
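
As an illustration of why a CURIE `str` and a `TermId` can be used interchangeably, the following sketch models a term identifier with a small frozen dataclass. `SimpleTermId` and `normalize` are hypothetical names made up for this example; they mimic the `prefix`, `id`, and `value` attributes shown in this notebook but are not the `hpotk.model.TermId` implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleTermId:
    """Hypothetical stand-in for a term identifier such as HP:0001166."""
    prefix: str
    id: str

    @classmethod
    def from_curie(cls, curie: str) -> "SimpleTermId":
        # Split 'HP:0001166' at the first colon into prefix and local id.
        prefix, sep, local_id = curie.partition(":")
        if not sep or not prefix or not local_id:
            raise ValueError(f"Not a CURIE: {curie!r}")
        return cls(prefix, local_id)

    @property
    def value(self) -> str:
        # Rebuild the CURIE string from its two parts.
        return f"{self.prefix}:{self.id}"


def normalize(query) -> SimpleTermId:
    """Accept either a CURIE string or a SimpleTermId and return a SimpleTermId."""
    return query if isinstance(query, SimpleTermId) else SimpleTermId.from_curie(query)


known_ids = {SimpleTermId("HP", "0001166")}
print(normalize("HP:0001166") in known_ids)
print(normalize(SimpleTermId.from_curie("HP:0001166")) in known_ids)
```

Normalizing the query once at the boundary means the container only ever has to compare identifiers of one type.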

### Get a specific `Term`

Use the `get_term` method to get ahold of a specific `Term`:

In [8]:
arachnodactyly = o.get_term(current_arachnodactyly_id)
arachnodactyly

Term(identifier=TermId(prefix="HP", id="0001166"), name="Arachnodactyly", definition=Abnormally long and slender fingers ("spider fingers")., comment=None, is_obsolete=False, alt_term_ids="[TermId(prefix="HP", id="0001505")]")

Each term has:
- `identifier` - a `hpotk.model.TermId` corresponding to the term's CURIE
- `name` - the term's name (e.g. *"Hypertension"*)
- `alt_term_ids` - alternative term IDs, i.e. the IDs of obsolete terms that have been replaced by this term
- `is_obsolete` - obsoletion status

In [9]:
print(f'ID: {arachnodactyly.identifier.value}')
print(f'Name: {arachnodactyly.name}')
print(f'Alt ids: {arachnodactyly.alt_term_ids}')
print(f'Is obsolete: {arachnodactyly.is_obsolete}')

ID: HP:0001166
Name: Arachnodactyly
Alt ids: [TermId(prefix="HP", id="0001505")]
Is obsolete: False


`get_term` always returns the primary `Term`, even for an obsolete `TermId`:

In [10]:
assert o.get_term(current_arachnodactyly_id) == o.get_term(obsolete_arachnodactyly_id)
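
The behavior described in this section (membership tests that accept obsolete IDs, `len` counting only primary terms, and `get_term` resolving an obsolete ID to its primary term) can be pictured with a toy container. This is a sketch under stated assumptions: `ToyOntology` is a hypothetical class written for this example, it stores plain strings instead of `Term` objects, and it says nothing about how `Ontology` is implemented internally.

```python
class ToyOntology:
    """Hypothetical container: primary CURIE -> name, plus alternative-ID lookup."""

    def __init__(self, terms):
        # terms maps a primary CURIE to a (name, list of alternative CURIEs) pair.
        self._primary = {}
        self._alt_to_primary = {}
        for curie, (name, alt_ids) in terms.items():
            self._primary[curie] = name
            for alt in alt_ids:
                self._alt_to_primary[alt] = curie

    def __contains__(self, curie):
        # Both primary and obsolete (alternative) IDs count as present.
        return curie in self._primary or curie in self._alt_to_primary

    def __len__(self):
        # Only primary terms are counted.
        return len(self._primary)

    def get_term(self, curie):
        # Resolve an obsolete ID to its primary ID before the lookup.
        primary = self._alt_to_primary.get(curie, curie)
        name = self._primary.get(primary)
        return None if name is None else (primary, name)


toy = ToyOntology({"HP:0001166": ("Arachnodactyly", ["HP:0001505"])})
print("HP:0001505" in toy)
print(len(toy))
print(toy.get_term("HP:0001505"))
```

Keeping a separate obsolete-to-primary index is one simple way to get this behavior; the toy returns `None` for unknown IDs, which is a choice of this sketch only.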

# Ontology algorithms

`hpo-toolkit` provides several ontology algorithms.


## Ontology traversal

We can traverse the ontology hierarchy to get a `set` with parents or all ancestors of a `TermId`:

In [11]:
for parent in hpotk.algorithm.get_parents(o, current_arachnodactyly_id):
    p = o.get_term(parent)
    print(f"{p.identifier.value} - {p.name}")

HP:0001238 - Slender finger
HP:0100807 - Long fingers


In [12]:
for ancestor in hpotk.algorithm.get_ancestors(o, current_arachnodactyly_id):
    a = o.get_term(ancestor)
    print(f"{a.identifier.value} - {a.name}")

HP:0040068 - Abnormality of limb bone
HP:0002813 - Abnormality of limb bone morphology
HP:0001155 - Abnormality of the hand
HP:0040064 - Abnormality of limbs
HP:0000924 - Abnormality of the skeletal system
HP:0001238 - Slender finger
HP:0001167 - Abnormality of finger
HP:0002817 - Abnormality of the upper limb
HP:0000001 - All
HP:0011844 - Abnormal appendicular skeleton morphology
HP:0011297 - Abnormal digit morphology
HP:0000118 - Phenotypic abnormality
HP:0033127 - Abnormality of the musculoskeletal system
HP:0100807 - Long fingers
HP:0011842 - Abnormal skeletal morphology
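
For intuition about what such a traversal does, here is a minimal breadth-first sketch over a child-to-parents mapping. It is not the `hpotk.algorithm` implementation: `get_ancestors_sketch` and the `EX:` identifiers are hypothetical and form a made-up toy hierarchy rather than real HPO data.

```python
from collections import deque


def get_ancestors_sketch(parents, start):
    """Collect all ancestors of `start` given a child -> set-of-parents mapping."""
    seen = set()
    queue = deque(parents.get(start, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            # A term reachable via several paths is visited only once.
            continue
        seen.add(current)
        queue.extend(parents.get(current, ()))
    return seen


# Toy hierarchy: EX:4 has two parents, which share the root EX:1.
TOY_PARENTS = {
    "EX:4": {"EX:2", "EX:3"},
    "EX:2": {"EX:1"},
    "EX:3": {"EX:1"},
}

print(sorted(get_ancestors_sketch(TOY_PARENTS, "EX:4")))
```

Because the result is a `set`, the ancestors come back in no particular order, which matches the unordered listing printed above.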


# HPO constants

The toolkit provides `TermId` constants for well-established and stable HPO concepts.

## Base terms

The terms that define the first level of the HPO hierarchy (e.g. [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118)):

In [13]:
from hpotk.constants.hpo.base import PHENOTYPIC_ABNORMALITY
PHENOTYPIC_ABNORMALITY

TermId(prefix="HP", id="0000118")

## Frequency

Concepts that represent the frequency of phenotypic abnormalities within a patient cohort, i.e. the children of [Frequency](https://hpo.jax.org/app/browse/term/HP:0040279) (e.g. [Occasional](https://hpo.jax.org/app/browse/term/HP:0040283)):

In [14]:
from hpotk.constants.hpo.frequency import OCCASIONAL
OCCASIONAL

TermId(prefix="HP", id="0040283")

## Inheritance

Selected descendants of [Mode of inheritance](https://hpo.jax.org/app/browse/term/HP:0000005) (e.g. [Autosomal dominant inheritance](https://hpo.jax.org/app/browse/term/HP:0000006)):

In [15]:
from hpotk.constants.hpo.inheritance import AUTOSOMAL_DOMINANT_INHERITANCE
AUTOSOMAL_DOMINANT_INHERITANCE

TermId(prefix="HP", id="0000006")

## Onset

All descendants of [Onset](https://hpo.jax.org/app/browse/term/HP:0003674) (e.g. [Congenital onset](https://hpo.jax.org/app/browse/term/HP:0003577)):

In [16]:
from hpotk.constants.hpo.onset import CONGENITAL_ONSET
CONGENITAL_ONSET

TermId(prefix="HP", id="0003577")

## Organ system

Children of [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118) that correspond to abnormalities of organ systems (e.g. [Abnormality of the ear](https://hpo.jax.org/app/browse/term/HP:0000598)):

In [17]:
from hpotk.constants.hpo.organ_system import ABNORMALITY_OF_THE_EAR
ABNORMALITY_OF_THE_EAR

TermId(prefix="HP", id="0000598")

## Severity

All descendants of [Severity](https://hpo.jax.org/app/browse/term/HP:0012824) (e.g. [Mild](https://hpo.jax.org/app/browse/term/HP:0012825)):

In [18]:
from hpotk.constants.hpo.severity import MILD
MILD

TermId(prefix="HP", id="0012825")

That's it for now!