# Harnessing AI to annotate the severity of all phenotypic abnormalities within the Human Phenotype Ontology

Kitty B Murphy (Department of Brain Sciences, Imperial College London, UK, UK Dementia Research Institute at Imperial College London, UK)  
Brian M Schilder (Department of Brain Sciences, Imperial College London, UK, UK Dementia Research Institute at Imperial College London, UK)  
Nathan G Skene (Department of Brain Sciences, Imperial College London, UK, UK Dementia Research Institute at Imperial College London, UK)

## Abstract

The Human Phenotype Ontology (HPO) has played a crucial role in defining, diagnosing, prognosing, and treating human diseases by providing a standardised database for phenotypic abnormalities. However, there is currently no information pertaining to the severity of each phenotype, making systematic analyses and prioritisation of results difficult. With 18,082 abnormalities now corresponding to over 10,000 rare diseases, manual curation of such phenotypic annotations by experts would be labour-intensive and time-consuming. Leveraging advances in artificial intelligence, we employed the OpenAI GPT-4 model with Python to systematically annotate the severity of ~17,000 phenotypic abnormalities in the HPO. By checking that phenotypes with guaranteed outcomes were appropriately annotated, we demonstrate the potential for natural language processing technologies to automate the curation process accurately. For example, phenotypes such as “decreased male fertility” were used to compute a true positive rate, as they would be expected to be annotated as often, if not always, causing reduced fertility. Across the annotated outcomes, we observed >73% annotation accuracy. Using a novel approach, we developed a severity scoring system that incorporates both the nature of the phenotype outcome and the frequency of its occurrence. These severity metrics will enable efforts to systematically prioritise which human phenotypes are most detrimental to human well-being and are the best targets for therapeutic intervention.

## Introduction

Comprehensive annotation of phenotypic abnormalities is invaluable for defining, diagnosing, prognosing, and treating human disease. Since 2008, the Human Phenotype Ontology (HPO) has been instrumental to this, by providing a standardised database for the description and analysis of human phenotypes [1]. Through the development of open community resources, the depth and breadth of the HPO have continued to expand, and it now describes ~18,000 phenotypic abnormalities corresponding to >10,000 rare diseases. In recent years, the HPO has expanded its disease annotations so that each HPO term can have metadata including typical age of onset and frequency. In addition, there are the *Clinical modifier* and *Clinical course* subontologies, which contain terms to describe factors including severity and triggers, and mortality and progression, respectively. Describing the severity-related attributes of a disease is crucial for attaining significant objectives in rare diseases. These include enhancing diagnostic capabilities, as well as prioritising and guiding gene therapy trials.

To date, the HPO has largely been manually curated by experts including clinicians, clinical geneticists, and researchers. Although this approach ensures the quality and accuracy of the ontology, it is time-consuming and labour-intensive. As a result, fewer than 1% of terms within the HPO have metadata pertaining to features such as time course and severity. As artificial intelligence (AI) capabilities advance, there is an opportunity to integrate natural language processing technologies into the curation process. Here, we have used the OpenAI GPT-4 model (https://openai.com/) with Python to systematically annotate the severity of >17,000 phenotypic abnormalities within the HPO. Our severity annotation framework was based on previously defined criteria developed through consultation with clinicians [2]. The authors of that study consulted 192 healthcare professionals for their opinions on the relative severity of various clinical characteristics, and used the responses to create a system for categorising the severity of diseases. Briefly, each healthcare professional was sent a survey asking them first to rate how important a disease characteristic was for determining disease severity, and then to rate the severity of a set of given diseases. Using the responses, the authors were able to categorise clinical characteristics into 4 ‘severity tiers’. While characteristics such as shortened lifespan in infancy and intellectual disability were identified as highly severe and placed into tier 1, sensory impairment and reduced lifespan were categorised as less severe and placed into tier 4. Being able to quickly ascribe severity measures based on those criteria to HPO phenotypes will assist with interpreting phenome-wide studies.

Almost 800 phenotypes were annotated twice to evaluate annotation consistency, and a true positive rate of annotations was calculated to assess annotation accuracy. Additionally, based on the clinical characteristics and their occurrence, we have quantified the severity of each phenotype, providing an example of how these clinical characteristic annotations can be used to guide prioritisation of gene therapy trials. Ultimately, we hope that our resource will be of utility to those working in rare diseases, as well as the wider rare disease community.
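The two evaluation measures described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the frequency labels (`"never"`, `"rarely"`, `"often"`, `"always"`), and the toy data are all hypothetical. Each dictionary maps a phenotype name to the frequency with which it was annotated as causing a single outcome (here, reduced fertility).

```python
from typing import Dict, List

# Assumption: an annotation counts as "positive" when the outcome is said to
# occur often or always. The actual label vocabulary is not given in the text.
POSITIVE_LABELS = {"often", "always"}


def consistency_rate(first: Dict[str, str], second: Dict[str, str]) -> float:
    """Fraction of phenotypes annotated in both passes whose labels agree."""
    shared = sorted(set(first) & set(second))
    if not shared:
        raise ValueError("no phenotypes were annotated in both passes")
    agreements = sum(first[p] == second[p] for p in shared)
    return agreements / len(shared)


def true_positive_rate(annotations: Dict[str, str],
                       expected_positive: List[str]) -> float:
    """Fraction of phenotypes with a guaranteed outcome that were annotated
    as causing that outcome often or always."""
    checked = [p for p in expected_positive if p in annotations]
    if not checked:
        raise ValueError("none of the expected-positive phenotypes were annotated")
    hits = sum(annotations[p] in POSITIVE_LABELS for p in checked)
    return hits / len(checked)


# Toy data invented purely for illustration; these are not study results.
pass_one = {
    "decreased male fertility": "always",
    "phenotype B": "never",
    "phenotype C": "often",
}
pass_two = {
    "decreased male fertility": "always",
    "phenotype B": "rarely",
    "phenotype C": "often",
}

print(consistency_rate(pass_one, pass_two))
print(true_positive_rate(pass_one, ["decreased male fertility", "phenotype C"]))
```

Treating "often" and "always" as equally correct mirrors the expectation stated in the abstract that such phenotypes should be annotated as often, if not always, causing the outcome; a stricter check would accept only "always".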

```r
# Years of the historic eruptions recorded on La Palma
eruptions <- c(1492, 1585, 1646, 1677, 1712, 1949, 1971, 2021)
n_eruptions <- length(eruptions)
```

```r
# Draw a one-dimensional timeline with one tick mark per eruption year
par(mar = c(3, 1, 1, 1) + 0.1)
plot(eruptions, rep(0, n_eruptions),
  pch = "|", axes = FALSE)
axis(1)
box()
```

```r
# Mean interval between consecutive eruptions, excluding the 2021 eruption
avg_years_between_eruptions <- mean(diff(eruptions[-n_eruptions]))
avg_years_between_eruptions
```

```
[1] 79.83333
```

Based on data up to and including 1971, eruptions on La Palma happen every 79.8 years on average.

Studies of the volcano's magma system, such as @marrero2019, have proposed that two main magma reservoirs feed the Cumbre Vieja volcano: one in the mantle (30-40 km depth), which charges and in turn feeds a shallower crustal reservoir (10-20 km depth).

Eight eruptions have been recorded since the late 1400s (<a href="#fig-timeline" class="quarto-xref">Figure 1</a>).

Data and methods are discussed in <a href="#sec-data-methods" class="quarto-xref">Section 3</a>.

Let $x$ denote the number of eruptions in a year. Then, $x$ can be modeled by a Poisson distribution

<span id="eq-poisson">$$
p(x) = \frac{e^{-\lambda} \lambda^{x}}{x !}
 \qquad(1)$$</span>

where $\lambda$ is the rate of eruptions per year. Using <a href="#eq-poisson" class="quarto-xref">Equation 1</a>, the probability of an eruption in the next $t$ years can be calculated.
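As a worked illustration of that calculation, assume (this is not stated above) that eruption counts in different years are independent, so that the count over $t$ years is Poisson with mean $\lambda t$. The probability of at least one eruption then follows from the $x = 0$ term of Equation 1:

$$
P(\text{at least one eruption in } t \text{ years}) = 1 - P(\text{no eruptions in } t \text{ years}) = 1 - e^{-\lambda t}
$$

If $\lambda$ is estimated as the reciprocal of the 79.8-year average interval computed from the data up to 1971, then $\lambda \approx 1/79.8 \approx 0.0125$ eruptions per year, and for $t = 50$ this gives $1 - e^{-50/79.8} \approx 0.47$. This estimate of $\lambda$ is an illustrative choice rather than a fitted value.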

| Name                | Year |
|---------------------|------|
| Current             | 2021 |
| Teneguía            | 1971 |
| Nambroque           | 1949 |
| El Charco           | 1712 |
| Volcán San Antonio  | 1677 |
| Volcán San Martin   | 1646 |
| Tajuya near El Paso | 1585 |
| Montaña Quemada     | 1492 |

Table 1: Recent historic eruptions on La Palma

<a href="#tbl-history" class="quarto-xref">Table 1</a> summarises the eruptions recorded since the colonization of the islands by Europeans in the late 1400s.

## Data & Methods

## Conclusion

## References