<H1>ANKH</H1>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import os
import sys

# Put the local pyphetools checkout (path relative to this notebook) first on the import path
sys.path.insert(0, os.path.abspath('../../pyphetools'))
from pyphetools import *  # provides HpoParser, CaseEncoder, VariantValidator and MetaData used below

<h3>Import HPO Data</h3>

In [5]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()

<H1>Importing a single case report</H1>
<p>Here, we use functions of the pyphetools package to import data from a typical case report: <a href="https://pubmed.ncbi.nlm.nih.gov/33748234/" target="__blank">Wu JL, et al.</a> A three-year clinical investigation of a Chinese child with craniometaphyseal dysplasia caused by a mutated ANKH gene. World J Clin Cases. 2021 Mar 16;9(8):1853-1862.</p>
<p>The case report consists of several sections. We apply text mining to each of them and add corrections wherever text mining fails to capture an HPO term or calls a false-positive term.</p>
<p>The basic strategy is to call the <tt>add_vignette</tt> function on each section, judge the results by manual inspection, and add any missed terms through a custom dictionary (see the examples below).</p>
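<p>The idea behind the custom dictionary can be sketched in plain Python: each key is an exact phrase expected in the vignette, and each value is the HPO label that phrase should map to. The helper <tt>match_custom_terms</tt> below is hypothetical and only illustrates the matching step; it is not how pyphetools implements the <tt>custom_d</tt> argument.</p>

```python
import re


def match_custom_terms(text: str, custom_d: dict[str, str]) -> list[str]:
    """Return the HPO labels whose trigger phrase occurs in the text.

    Hypothetical helper: case-insensitive exact-phrase matching after
    collapsing line breaks and repeated whitespace.
    """
    flat = " ".join(text.split())
    found = []
    for phrase, hpo_label in custom_d.items():
        if re.search(re.escape(phrase), flat, flags=re.IGNORECASE):
            found.append(hpo_label)
    return found


# Excerpt of the "History of present illness" vignette used below.
excerpt = """The patient was examined at the local hospital, showed low bone mineral density and commenced
oral calcium supplements. His head circumference increased to 51 cm (standard value 45.2 cm)."""
custom_d = {"low bone mineral density": "Reduced bone mineral density",
            "head circumference increased to 51 cm": "Macrocephaly"}
print(match_custom_terms(excerpt, custom_d))
# ['Reduced bone mineral density', 'Macrocephaly']
```

<p>Exact-phrase matching is brittle by design: it only fires on wording the curator has already seen in the text, which is why it suits per-paper corrections rather than general text mining.</p>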

In [3]:
pmid = "PMID:33748234"
age = "P1Y5M"
encoder = CaseEncoder(concept_recognizer=hpo_cr, pmid=pmid, age_at_last_exam=age)
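<p>Ages are passed to pyphetools as ISO 8601 durations, so the 17-month-old patient becomes <tt>P1Y5M</tt> (1 year, 5 months) and the 20-month age used for the laboratory findings becomes <tt>P1Y8M</tt>. A minimal, hypothetical helper for this conversion (whole months only; weeks and days are not handled):</p>

```python
def months_to_iso8601(months: int) -> str:
    """Convert an age in whole months to an ISO 8601 duration such as 'P1Y5M'."""
    if months < 0:
        raise ValueError("age in months must be non-negative")
    years, rem = divmod(months, 12)
    duration = "P"
    if years:
        duration += f"{years}Y"
    if rem or not years:  # emit 'P0M' for age zero rather than a bare 'P'
        duration += f"{rem}M"
    return duration


print(months_to_iso8601(17))  # P1Y5M
print(months_to_iso8601(20))  # P1Y8M
```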

<h3>Chief complaints</h3>
<p>A 17-mo-old boy presented with progressive nasal obstruction, snoring and hearing loss symptoms when referred to the hospital.</p>

In [4]:
vignette = "A 17-mo-old boy presented with progressive nasal obstruction, snoring and hearing loss symptoms when referred to the hospital."
results = encoder.add_vignette(vignette=vignette)

In [5]:
results

   id          label               observed  measured
0  HP:0000365  Hearing impairment  True      True
1  HP:0001742  Nasal congestion    True      True
2  HP:0025267  Snoring             True      True


<h2>History of present illness</h2>

In [6]:
v2 = """
The patient’s medical history was first reviewed before the diagnosis. His head circumference was 45.5 cm, 
46.5 cm and 49.5 cm at age 3 mo, 6 mo and 12 mo, respectively. When he was 6 mo old, fiber nasopharyngoscopy 
revealed a double choanal stenosis. The patient was found to have a serious nasal obstruction at the age of 
12 mo due to a wide nasal bridge. Occasionally, he resorted to mouth breathing, especially at night. 
The patient was examined at the local hospital, showed low bone mineral density and commenced 
oral calcium supplements. His head circumference increased to 51 cm (standard value 45.2 cm). 
Consequently, the patient developed a prominent forehead, prognathism and occipital protuberance. 
At the age of 16 mo, the patient presented with mild hearing loss. He had been receiving calcium and 
vitamin D supplementation for 4 mo prior to examination at other hospital; however, the patient’s symptoms 
developed progressively. 
"""
d = {'low bone mineral density':'Reduced bone mineral density',
    'head circumference increased to 51 cm':'Macrocephaly'}
results = encoder.add_vignette(vignette=v2, custom_d=d)

In [7]:
results

   id          label                         observed  measured
0  HP:0004349  Reduced bone mineral density  True      True
1  HP:0000256  Macrocephaly                  True      True
2  HP:0000303  Mandibular prognathia         True      True
3  HP:0000365  Hearing impairment            True      True
4  HP:0000431  Wide nasal bridge             True      True
5  HP:0000452  Choanal stenosis              True      True
6  HP:0001742  Nasal congestion              True      True
7  HP:0011220  Prominent forehead            True      True


<h2>Physical examination</h2>

In [8]:
v3 = """
A wide nasal bridge, paranasal bossing, widely spaced eyes with an increased bizygomatic width, and 
prominent mandible (Figure 1) were noted. However, hypertelorism was not obviously discernible. 
Additionally, the patient’s frontal and maxillary sinuses were severely obstructed. He had 20 teeth 
with wide spacing between the teeth. His teeth appeared small. He exhibited no facial nerve palsy or 
limb muscle tension. His pain perception and muscular strength appeared normal. Nasal laryngeal mirror 
showed serious choanal stenosis on both sides. The bottom of the patient’s nose exhibited bossing and 
his palatine bone appeared thickened. The patient’s parents and his elder brother had completely normal 
features.
"""
d3 = {
    'wide spacing between the teeth': 'Widely spaced teeth'
}
results = encoder.add_vignette(vignette=v3, custom_d=d3)
results

   id          label                  observed  measured
0  HP:0000687  Widely spaced teeth    True      True
1  HP:0000303  Mandibular prognathia  True      True
2  HP:0000316  Hypertelorism          True      True
3  HP:0000316  Hypertelorism          True      True
4  HP:0000431  Wide nasal bridge      True      True
5  HP:0000452  Choanal stenosis       True      True
6  HP:0010628  Facial palsy           True      True
7  HP:0012531  Pain                   True      True
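<p>This output shows why manual inspection matters. Hypertelorism appears twice, and Facial palsy and Pain are called as observed although the vignette says there was "no facial nerve palsy" and that pain perception "appeared normal". The text is itself ambiguous about hypertelorism: widely spaced eyes are noted, yet hypertelorism "was not obviously discernible". The sketch below is a crude, sentence-level negation check (a simplified NegEx-style heuristic, not part of pyphetools) plus an order-preserving de-duplication of the result rows; the cue list and helper names are assumptions made for illustration.</p>

```python
import re

# Hypothetical, deliberately small list of negation cues; dedicated negation
# detectors use much richer cue lists and scope rules.
NEGATION_CUES = re.compile(r"\b(no|not|without|absence|normal)\b", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Collapse whitespace, then split after '.', '!' or '?'."""
    flat = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


def negated_mentions(text: str, phrases: list[str]) -> set[str]:
    """Return phrases that occur only in sentences containing a negation cue."""
    sentences = split_sentences(text)
    flagged = set()
    for phrase in phrases:
        hits = [s for s in sentences if phrase.lower() in s.lower()]
        if hits and all(NEGATION_CUES.search(s) for s in hits):
            flagged.add(phrase)
    return flagged


# Excerpt of the physical-examination vignette (v3).
V3_EXCERPT = """
A wide nasal bridge, paranasal bossing, widely spaced eyes with an increased bizygomatic width, and
prominent mandible (Figure 1) were noted. However, hypertelorism was not obviously discernible.
He exhibited no facial nerve palsy or limb muscle tension. His pain perception and muscular strength
appeared normal. Nasal laryngeal mirror showed serious choanal stenosis on both sides.
"""
phrases = ["widely spaced eyes", "hypertelorism", "facial nerve palsy",
           "pain perception", "choanal stenosis"]
print(sorted(negated_mentions(V3_EXCERPT, phrases)))
# ['facial nerve palsy', 'hypertelorism', 'pain perception']

# (id, label) pairs from the output table; dict.fromkeys keeps first occurrences in order.
rows = [
    ("HP:0000687", "Widely spaced teeth"),
    ("HP:0000303", "Mandibular prognathia"),
    ("HP:0000316", "Hypertelorism"),
    ("HP:0000316", "Hypertelorism"),
    ("HP:0000431", "Wide nasal bridge"),
    ("HP:0000452", "Choanal stenosis"),
    ("HP:0010628", "Facial palsy"),
    ("HP:0012531", "Pain"),
]
unique_rows = list(dict.fromkeys(rows))
print(len(rows), len(unique_rows))  # 8 7
```

<p>If <tt>results</tt> is a pandas DataFrame, as its display suggests, <tt>results.drop_duplicates(subset="id")</tt> performs the same de-duplication. A flagged phrase is only a prompt for the curator to re-read the sentence, not an automatic exclusion.</p>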


<h2>Radiograph</h2>

In [9]:
v4 ="""
Radiograph and facial appearance of the child aged 17 mo. A and B: Cranial computed tomography (CT) scan 
shows significantly increased bone density and thickened bone plate of the skull. The sinus cavity was small 
without inflation, and the nasal cavity was obviously narrowed. The nasal bone was thickened with abnormal 
morphology; C: CT scan shows that the middle ear cavities were narrowed; the lumen of the labyrinth 
(vestibular, semicircular canal and cochlear) was sclerotic and the ossicular chain was thickened. The width 
of the left optic canal was 3.93 mm, the width of the right optic canal was 4.17 mm; D: CT scan shows 
sclerosis of the clavicles and ribs; E: X-ray image shows pronounced metaphyseal flaring in the distal 
femora and “Flask deformation” of the proximal metaphysis on both sides (Erlenmeyer flask configuration); 
F: Facial appearance of the patient shows a wide nasal bridge, paranasal bossing, widely spaced eyes with 
an increased bizygomatic width, and a prominent mandible. 
"""
d4 = {}
results = encoder.add_vignette(vignette=v4, custom_d=d4)
results

   id          label                           observed  measured
0  HP:0000303  Mandibular prognathia           True      True
1  HP:0000316  Hypertelorism                   True      True
2  HP:0000431  Wide nasal bridge               True      True
3  HP:0003015  Flared metaphysis               True      True
4  HP:0011001  Increased bone mineral density  True      True


<h2>Laboratory examinations</h2>
<p>To add these data, we consulted Table 1 of the original publication and used the <tt>add_term</tt> function to add abnormal findings and to record normal findings as excluded terms (<tt>excluded=True</tt>).</p>
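<p>Whether a laboratory result becomes an observed directional term (for example an elevated concentration) or an excluded "Abnormal ..." term comes down to comparing the value with its reference range. A minimal sketch of that decision, using a hypothetical helper and illustrative numbers rather than the values in Table 1:</p>

```python
from enum import Enum


class LabCall(Enum):
    DECREASED = "decreased"
    NORMAL = "normal"
    INCREASED = "increased"


def classify_lab_value(value: float, low: float, high: float) -> LabCall:
    """Compare a value with an inclusive reference range [low, high]."""
    if low > high:
        raise ValueError("reference range is inverted")
    if value < low:
        return LabCall.DECREASED
    if value > high:
        return LabCall.INCREASED
    return LabCall.NORMAL


# Illustrative numbers only, not taken from the case report.
for value in (1.0, 2.0, 3.5):
    call = classify_lab_value(value, low=1.5, high=3.0)
    # NORMAL -> record the generic "Abnormal ..." term with excluded=True;
    # INCREASED/DECREASED -> record the matching directional term as observed.
    print(value, call.value)
# 1.0 decreased
# 2.0 normal
# 3.5 increased
```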

In [10]:
term="Increased circulating osteocalcin level"
age="P1Y8M" # 20 months
results = encoder.add_term(label=term, custom_age=age)
results

   id          label                                    observed  measured
0  HP:0031428  Increased circulating osteocalcin level  False     True


In [11]:
# Combined β-CTX (ng/mL)
term = "Increased circulating beta-C-terminal telopeptide concentration" # HP:0031425
results = encoder.add_term(label=term, custom_age="P1Y8M")
results

   id          label                                                            observed  measured
0  HP:0031425  Increased circulating beta-C-terminal telopeptide concentration  False     True


In [12]:
# low 25-hydroxyvitamin D
term = "Decreased circulating calcifediol concentration" # HP:0012053
results = encoder.add_term(label=term, custom_age="P1Y8M")
results

   id          label                                            observed  measured
0  HP:0012053  Decreased circulating calcifediol concentration  False     True


In [13]:
term = "Elevated circulating alkaline phosphatase concentration"
results = encoder.add_term(label=term, custom_age="P1Y8M")
results

   id          label                                                    observed  measured
0  HP:0003155  Elevated circulating alkaline phosphatase concentration  False     True


In [14]:
term = "Abnormal circulating calcium concentration"
results = encoder.add_term(label=term, excluded=True, custom_age="P1Y8M")
results

   id          label                                       observed  measured
0  HP:0004363  Abnormal circulating calcium concentration  False     True


In [15]:
# Abnormal blood phosphate concentration HP:0100529
results = encoder.add_term(id="HP:0100529", excluded=True, custom_age="P1Y8M")
results

   id          label                                   observed  measured
0  HP:0100529  Abnormal blood phosphate concentration  False     True


<h2>Imaging examinations</h2>

In [16]:
v6 = """
Cranial radiography showed that he had increased thickness of the craniofacial bones with obstructed 
sinuses and narrowing ears and other cavities (Figure 1A-C). Specifically, the middle ear cavities 
appeared narrowed, while the lumen of the labyrinth (the vestibule, semicircular canal and cochlea) 
appeared sclerotic. His mastoid cavity disappeared without tympanic cavity effusion. The width of left 
bony optic canal was 3.93 mm and the width on the right side was 4.17 mm. Prominent thickening of the 
calvaria was apparent and sclerosis of the skull base and maxilla was noted. The patient’s frontal and 
maxillary sinuses were severely obstructed with marked thickening of the cranial walls. His forehead 
bone was 14.22 mm thick. Thickening of his middle turbinate and inferior turbinate was also obvious. 
The patient’s nasal septum thickness was 8.44 mm, causing his nose and choanae to become significantly 
narrowed. His frontal sinus and maxillary sinus were invaded by dysplastic tissue. Marked thickening 
of the cranial walls was also noted.
Sclerosis of the clavicles and ribs was observed by computed tomography (CT) and X-ray (Figure 1E). 
Metaphyseal flaring in the distal femora due to a modeling defect was the most pronounced abnormality. 
“Flask deformation” of the proximal metaphysis on both sides (Erlenmeyer flask configuration) (Figure 1F) 
was also noted. When the patient was four years and 2 mo, he complained of pain in the legs; thus, long 
bone examination was performed again. CT and X-ray did not demonstrate significant changes compared to 
his previous image (Figure 2). CT of the middle ears showed the absence of effusions. The patient’s 
ossicular chain was large.

"""
d6 = {'Sclerosis of the clavicles': 'Clavicular sclerosis',
     'Erlenmeyer flask configuration': 'Erlenmeyer flask deformity of the femurs'}
results = encoder.add_vignette(vignette=v6, custom_d=d6)
results

   id          label                                     observed  measured
0  HP:0100923  Clavicular sclerosis                      True      True
1  HP:0004975  Erlenmeyer flask deformity of the femurs  True      True
2  HP:0002694  Sclerosis of skull base                   True      True
3  HP:0003015  Flared metaphysis                         True      True
4  HP:0012531  Pain                                      True      True


<h2>The variant</h2>
<p>A heterozygous mutation, c.1129_1131del (NM_054027), was found in exon 9 of the ANKH gene.</p>
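<p>In HGVS notation, <tt>c.1129_1131del</tt> deletes coding positions 1129 through 1131, that is, three nucleotides. Because three is a multiple of three, the deletion is in-frame, and since codon <i>n</i> spans c.(3<i>n</i>-2) to c.3<i>n</i>, these positions form exactly codon 377. The parser below is a hypothetical simplification that only handles plain coding-sequence ranges, not intronic offsets such as <tt>c.100+1</tt>, UTR positions or insertions.</p>

```python
import re

# c.STARTdel or c.START_ENDdel, optionally followed by the deleted bases.
CDS_DELETION = re.compile(r"^c\.(\d+)(?:_(\d+))?del([ACGT]*)$")


def parse_cds_deletion(hgvs_c: str) -> tuple[int, int]:
    """Return (start, end) coding positions of a simple c. deletion."""
    match = CDS_DELETION.match(hgvs_c)
    if match is None:
        raise ValueError(f"unsupported HGVS c. deletion: {hgvs_c!r}")
    start = int(match.group(1))
    end = int(match.group(2) or match.group(1))
    if end < start:
        raise ValueError("end position precedes start position")
    return start, end


def describe_deletion(hgvs_c: str) -> dict:
    """Length, reading-frame effect and affected codon numbers of a deletion."""
    start, end = parse_cds_deletion(hgvs_c)
    length = end - start + 1
    # Codon n spans c.(3n-2)..c.(3n), so position p lies in codon (p + 2) // 3.
    codons = list(range((start + 2) // 3, (end + 2) // 3 + 1))
    return {"length": length, "in_frame": length % 3 == 0, "codons": codons}


print(describe_deletion("c.1129_1131del"))
# {'length': 3, 'in_frame': True, 'codons': [377]}
```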

In [17]:
genome = 'hg38'
default_genotype = 'heterozygous'
transcript='NM_054027.4'
varValidator = VariantValidator(genome_build=genome, transcript=transcript)
var = varValidator.encode_hgvs(hgvs="c.1129_1131del")

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_054027.4%3Ac.1129_1131del/NM_054027.4
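<p>The URL printed above is the VariantValidator REST query issued for this variant. Judging only from that output (not from the pyphetools source), it follows the pattern <tt>https://rest.variantvalidator.org/VariantValidator/variantvalidator/{build}/{transcript}:{c. variant}/{transcript}</tt>, with the colon percent-encoded. The hypothetical helper below rebuilds the same string with the standard library and makes no network request.</p>

```python
from urllib.parse import quote

# Base path copied from the URL printed by the notebook.
BASE_URL = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def variantvalidator_url(genome_build: str, transcript: str, hgvs_c: str) -> str:
    """Build a VariantValidator query URL (pattern inferred from the notebook output)."""
    variant = quote(f"{transcript}:{hgvs_c}", safe="")  # ':' becomes '%3A'
    return f"{BASE_URL}/{genome_build}/{variant}/{quote(transcript, safe='')}"


print(variantvalidator_url("hg38", "NM_054027.4", "c.1129_1131del"))
```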


In [18]:
var.to_string()


'chr5:14716715GGAA>G'
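<p>The string returned by <tt>var.to_string()</tt> reads like a VCF-style, left-anchored record: chromosome, position, reference allele and alternate allele. Assuming that reading (inferred from this single output, not from pyphetools documentation), the deleted bases are the reference allele minus its shared first base, and their count should match the three-nucleotide deletion in the HGVS description. The bases are given on the genomic reference strand, so they need not read the same as the transcript's coding sequence. The parser below is a hypothetical sketch.</p>

```python
import re

# chrom:posREF>ALT, e.g. 'chr5:14716715GGAA>G' (format assumed from the notebook output)
VAR_STRING = re.compile(r"^(?P<chrom>chr\w+):(?P<pos>\d+)(?P<ref>[ACGTN]+)>(?P<alt>[ACGTN]+)$")


def deleted_bases(var_string: str) -> str:
    """Return the bases removed by a simple left-anchored deletion."""
    match = VAR_STRING.match(var_string)
    if match is None:
        raise ValueError(f"unrecognised variant string: {var_string!r}")
    ref, alt = match["ref"], match["alt"]
    if len(alt) != 1 or len(ref) < 2 or not ref.startswith(alt):
        raise ValueError("not a simple anchored deletion")
    return ref[1:]  # drop the shared anchor base


removed = deleted_bases("chr5:14716715GGAA>G")
print(removed, len(removed))  # GAA 3
```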

<h3>Individual</h3>
<p>We can now specify the disease diagnosis and the metadata, and then create the individual and the phenopacket.</p>


In [19]:
individual_id = "17-mo-old boy"
sex = "MALE"
age = "P1Y8M" 
disease_id = "OMIM:118600"
disease_label = "Chondrocalcinosis 2"

In [9]:
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199")
metadata.default_versions_with_hpo(version=hpo_version)
json_string = MessageToJson(metadata.to_ga4gh())
print(json_string)

{
  "created": "2023-01-09T09:52:45.737272977Z",
  "createdBy": "ORCID:0000-0002-0736-9199",
  "resources": [
    {
      "id": "geno",
      "name": "Genotype Ontology",
      "url": "http://purl.obolibrary.org/obo/geno.owl",
      "version": "2022-03-05",
      "namespacePrefix": "GENO",
      "iriPrefix": "http://purl.obolibrary.org/obo/GENO_"
    },
    {
      "id": "hgnc",
      "name": "HUGO Gene Nomenclature Committee",
      "url": "https://www.genenames.org",
      "version": "06/01/23",
      "namespacePrefix": "HGNC",
      "iriPrefix": "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/"
    },
    {
      "id": "omim",
      "name": "An Online Catalog of Human Genes and Genetic Disorders",
      "url": "https://www.omim.org",
      "version": "January 4, 2023",
      "namespacePrefix": "OMIM",
      "iriPrefix": "https://www.omim.org/entry/"
    },
    {
      "id": "hp",
      "name": "human phenotype ontology",
      "url": "http://purl.obolibrary.org/obo/hp.

In [22]:
ppacket = encoder.get_phenopacket(individual_id=individual_id, sex=sex, age=age, disease_id=disease_id, 
                                  disease_label=disease_label, variants=var, metadata=metadata.to_ga4gh())

In [23]:
json_string = MessageToJson(ppacket)
print(json_string)

{
  "id": "PMID_33748234_17-mo-old_boy",
  "subject": {
    "id": "17-mo-old boy",
    "timeAtLastEncounter": {
      "age": {
        "iso8601duration": "P1Y8M"
      }
    },
    "sex": "MALE"
  },
  "phenotypicFeatures": [
    {
      "type": {
        "id": "HP:0000365",
        "label": "Hearing impairment"
      },
      "onset": {
        "age": {
          "iso8601duration": "P1Y5M"
        }
      }
    },
    {
      "type": {
        "id": "HP:0001742",
        "label": "Nasal congestion"
      },
      "onset": {
        "age": {
          "iso8601duration": "P1Y5M"
        }
      }
    },
    {
      "type": {
        "id": "HP:0025267",
        "label": "Snoring"
      },
      "onset": {
        "age": {
          "iso8601duration": "P1Y5M"
        }
      }
    },
    {
      "type": {
        "id": "HP:0004349",
        "label": "Reduced bone mineral density"
      },
      "onset": {
        "age": {
          "iso8601duration": "P1Y5M"
        }
      }
    },
    {

In [25]:
output_directory = "phenopackets"
encoder.output_phenopacket(outdir=output_directory, phenopacket=ppacket)

Wrote phenopacket to phenopackets/PMID_33748234_17-mo-old_boy
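<p>To spot-check the written file, the JSON can be read back with the standard library. In the protobuf JSON mapping used by <tt>MessageToJson</tt>, fields holding default values are omitted by default, so a feature without an <tt>"excluded"</tt> key is an observed one. The helper below is a hypothetical sketch; the path in the usage comment is the one printed above and may need a <tt>.json</tt> suffix if the file on disk carries one.</p>

```python
import json
from pathlib import Path
from typing import List, Tuple, Union


def summarize_features(path: Union[str, Path]) -> List[Tuple[str, str, bool]]:
    """List (HPO id, label, excluded) for each phenotypic feature in a phenopacket JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = []
    for feature in data.get("phenotypicFeatures", []):
        term = feature["type"]
        # proto3 JSON omits false booleans, so a missing key means "observed"
        summary.append((term["id"], term["label"], feature.get("excluded", False)))
    return summary


# Usage with the file written above:
# for hpo_id, label, excluded in summarize_features("phenopackets/PMID_33748234_17-mo-old_boy"):
#     print("EXCLUDED" if excluded else "observed", hpo_id, label)
```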
