<H1>FBN1: acromicric and geleophysic dysplasia (Le Goff, 2011)</H1>
<p>Extract phenopackets from the clinical data in <a href="https://pubmed.ncbi.nlm.nih.gov/21683322/" target="__blank">Le Goff et al (2011)</a>.</p>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
from pyphetools.creation import *
# last tested with pyphetools version 0.2.22

In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199")
metadata.default_versions_with_hpo(version=hpo_version)

In [3]:
df = pd.read_excel('input/LeGoff_FBN1_AD_GD.xlsx')

<H2>Geleophysic dysplasia and Acromicric dysplasia: CLinical description</H2>
<p>According to Le Goff et al., Geleophysic dysplasia (GD, [MIM 231050]) and acromicric dysplasia (AD, [MIM 102370]) belong to the acromelic dysplasia group and are both characterized by severe short stature (&lt;−3 standard deviations [SD]), short hands and feet, joint limitations, and skin thickening.1
Radiological manifestations include delayed bone age, cone-shaped epiphyses, shortened long tubular bones, and ovoid vertebral bodies. GD is distinct from AD because it has an autosomal-recessive mode of inheritance, characteristic facial features—a “happy” face with full cheeks, a shortened nose, hypertelorism, a long and flat philtrum, and a thin upper lip—a progressive cardiac valvular thickening often leading to an early death, toe walking, tracheal stenosis, respiratory insufficiency, and lysosomal-like storage vacuoles in various tissues.</p> 
<p>In this notebook, we will first extract the data for Geleophysic dysplasia.</p>

<H3>Geleophysic dysplasia</H3>
<p>According to the authors, there were "Nineteen GD cases were included in the study, and they all fulfilled the diagnostic criteria, namely short stature &lt;−3 SD, short hands and feet, restricted joint mobility, characteristic facial features, and progressive cardiac involvement (Table 1, Figure 1). </p>
<p>Detailed clinical data are not available, but according to this description, we will assume that each proband had "progressive dilation and thickening of the pulmonary, aortic, and mitral valves, with stenosis of these three valves (See the <a href="https://www.ncbi.nlm.nih.gov/books/NBK11168/">GeneReview</a> entry for geleophysic dysplasia) as well as the above named facial features.</p>

In [4]:
gd_df = df[df['Diagnosis']=="GD"]
gd_df.set_index("Family", inplace=True)
gd_df

Unnamed: 0_level_0,Origin,Diagnosis,Age (Years),Height,Cardiac Involvement,Other,HGVS,protein
Family,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,Belgium,GD,Death at 9,<−6 SD (80 cm),mitral stenosis and insufficiency,tracheotomy at 3,c.5087A>G,p.Tyr1696Cys
2,France,GD,18,<−6 SD (112 cm),mitral stenosis,"HTAP, respiratory insufficiency,\n\nhepatomegaly, laryngeal stenosis",c.5096A>G,p.Tyr1699Cys
3,Russia,GD,12,<−6 SD (106 cm),no,hepatomegaly,c.5284G>A,p.Gly1762Ser
4,Switzerland,GD,21,<−6 SD (116 cm),"tricuspid stenosis, mild aortic insufficiency",,c.5096A>G,p.Tyr1699Cys
5,Russia,GD,8,−4 SD (103.5 cm),no,,c.5284G>A,p.Gly1762Ser
6,France,GD,5.7,−4 SD (97 cm),no,laryngeal and respiratory insufficiency,c.5284G>A,p.Gly1762Ser
7,U.K.,GD,Death at 3,−5 SD (75 cm),no,"respiratory insufficiency, HTAP,\n\nSleep apnea",c.5087A>G,p.Tyr1696Cys
8,Turkey,GD,4.5,−4 SD (85 cm),mitral and tricuspide stenosis,"respiratory insufficiency, hepatomegaly, spleep apnea",c.5096A>G,p.Tyr1699Cys
9,Algeria,GD,Death at 4,<−6 SD (60 cm),mitral and tricuspide stenosis,"laryngeal and respiratory insufficiency, HTAP",c.5117G>A,p.Cys1706Tyr
10,Lebanon,GD,14,−3.5 SD (133 cm),no,–,c.5157C>G,p.Cys1719Trp


<h3>Description of GD cases</h3>
<p>Nineteen GD cases were included in the study, and they all fulfilled the diagnostic criteria, namely short stature &lt;−3 SD, short hands and feet, restricted joint mobility, characteristic facial features, and progressive cardiac involvement.</p>
<p>From this description, we will assume that all patients have the following HPO terms in addition to the terms implied by the table (which was copied with minor modifications from Table 1 of the original paper).</p>
<ol>
    <li>Short stature HP:0004322</li>
    <li>Short palm HP:0004279</li>
    <li>Short foot HP:0001773</li>
    <li>Limitation of joint mobility HP:0001376</li>
    <li>Full cheeks HP:0000293</li>
    <li>Short nose HP:0003196</li>
    <li>Hypertelorism HP:0000316</li>
    <li>Long philtrum HP:0000343</li>
    <li>Smooth philtrum HP:0000319</li>
    <li>Thin upper lip vermilion HP:0000219</li>
</ol>
<p>See above for the list of characteristic facial features that we include here.</p>    

In [5]:
column_mapper_d = defaultdict(ColumnMapper)

In [6]:
constant_features = [
    ["Short stature", "HP:0004322"],
    ["Short palm", "HP:0004279"],
    ["Short foot", "HP:0001773"],
    ["Limitation of joint mobility", "HP:0001376"],
    ["Full cheeks","HP:0000293"],
    ["Short nose", "HP:0003196"],
    ["Hypertelorism","HP:0000316"],
    ["Long philtrum", "HP:0000343"],
    ["Smooth philtrum","HP:0000319"],
    ["Thin upper lip vermilion", "HP:0000219"],
]
# the constant list can be applied to any column. We will apply it to the column "Height", 
# because all individuals have short stature
constMapper = ConstantColumnMapper(term_list=constant_features)
#statureMapper.preview_column(gd_df["Height"])
column_mapper_d["Height"] = constMapper 

In [7]:
# Cardiac Involvement
# Note: Change entries to enable text mining
gd_df.at[1,"Cardiac Involvement"] = "Mitral stenosis; Mitral regurgitation" # was mitral stenosis and insufficiency
gd_df.at[8,"Cardiac Involvement"] = "Mitral stenosis; Tricuspid stenosis"  # was mitral and tricuspide stenosis
gd_df.at[9,"Cardiac Involvement"] = "Mitral stenosis; Tricuspid stenosis"  # was mitral and tricuspide stenosis
gd_df.at[15,"Cardiac Involvement"] = "Aortic stenosis; Mitral regurgitation; Aortic regurgitation"

#mitral and aortic valve insufficiencies
cardiacMapper = CustomColumnMapper(concept_recognizer=hpo_cr)
cardiacMapper.preview_column(gd_df["Cardiac Involvement"])
column_mapper_d["Cardiac Involvement"] = cardiacMapper

In [8]:
# Other
otherMap = CustomColumnMapper(concept_recognizer=hpo_cr)
otherMap.preview_column(gd_df["Other"])
column_mapper_d["Other"] = otherMap

<h2>Variants</h2>
<p>Note that there is an error in Table 2 of the original publication. NM_000138.5:c.5251T>G: Variant reference (T) does not agree with reference sequence (C). However, according to Clinvar the variant should be
NM_000138.5(FBN1):c.5250T>G (p.Ser1750Arg). We corrected this in the input file.</p>

In [9]:
transcript = "NM_000138.5"

genome = 'hg38'
transcript='NM_000138.5' # FBN1
varMapper = VariantColumnMapper(assembly=genome,
                                column_name='HGVS', 
                                transcript=transcript, 
                                default_genotype='heterozygous')

In [10]:
sexMapper = SexColumnMapper.not_provided()
ageMapper = AgeColumnMapper.by_year("Age (Years)")
pmid = "PMID:21683322"
encoder = CohortEncoder(df=gd_df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=column_mapper_d, 
                        individual_column_name="Family", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        variant_mapper=varMapper,
                        metadata=metadata,
                        pmid=pmid)
omim_id = "OMIM:614185"
omim_label = "Geleophysic dysplasia 2"
encoder.set_disease(disease_id=omim_id, label=omim_label)

In [11]:
output_directory = "phenopackets"
encoder.output_phenopackets(outdir=output_directory)

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5087A>G/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5096A>G/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5284G>A/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5096A>G/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5284G>A/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5284G>A/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5087A>G/NM_000138.5?content-type=application%2Fjson
https://rest.

<H1>Acromicric dysplasia</H1>

In [12]:
ad_df = df[df['Diagnosis']=="AD"]
ad_df.set_index("Family", inplace=True)
ad_df

Unnamed: 0_level_0,Origin,Diagnosis,Age (Years),Height,Cardiac Involvement,Other,HGVS,protein
Family,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
20,France,AD,10,−6 SD (99 cm),no,-,c.5182G>A,p.Ala1728Thr
21,France,AD,62,−6 SD (125 cm),no,broncho-pulmonary infection,c.5165C>G,p.Ser1722Cys
22-a,France,AD,10,−3 SD (121 cm),no,,c.5250T>G,p.Ser1750Arg
22-b,France,AD,13,−3.5 SD (128 cm),no,,c.5250T>G,p.Ser1750Arg
22-c,France,AD,40,−6 SD (128 cm),no,mother of 22 a and 22b,c.5250T>G,p.Ser1750Arg
23,Belgium,AD,14,<−6 SD (111 cm),no,broncho-pulmonary infection,c.5177G>T,p.Gly1726Val
24,Netherlands,AD,36,<−6 SD (119cm),no,"carpal tunnel syndrome, laminectomy C1-C3 for cervical spine stenosis",c.5096A>G,p.Tyr1699Cys
25,France,AD,13,<−6 SD (104cm),no,,c.5202_5204dup,p.Gln1735dup
26,Italy,AD,43,<−6 SD (129cm),no,,c.5273A>T,p.Asp1758Val
27-a,China,AD,10,−4 SD (117 cm),no,,c.5099A>G,p.Tyr1700Cys


In [13]:
column_mapper_d = defaultdict(ColumnMapper)

<h3>Constant features</h3>
<p>Ten AD cases, including two familial cases, were included in the study. They all fulfilled the diagnostic criteria for AD, namely severe short stature, short hands and feet, progressively stiff joints, and characteristic facial features. AD has an autosomal-dominant mode of inheritance and is characterized by distinct facial features—a round face, well-defined eyebrows, long eyelashes, a bulbous nose with anteverted nostrils, a long and prominent philtrum, and thick lips with a small mouth—a hoarse voice, a pseudomuscular build, and distinct skeleton features, including an internal notch of the femoral head, an internal notch of the second metacarpal, and the external notch of the fifth metacarpal.</p>
<p>From this description, we will assume that all patients have the following HPO terms in addition to the terms implied by the table (which was copied with minor modifications from Table 1 of the original paper).</p>
<ol>
     <li>Short stature HP:0004322</li>
    <li>Short palm HP:0004279</li>
    <li>Short foot HP:0001773</li>
    <li>Joint stiffness HP:0001387</li>
    <li>Round face HP:0000311</li>
    <li>Long eyelashes HP:0000527</li>
    <li>Bulbous nose HP:0000414</li>
    <li>Anteverted nares HP:0000463</li>
    <li>Internal notch of the femoral head HP:0031027</li>
</ol>

In [14]:
constant_features = [
    ["Short stature", "HP:0004322"],
    ["Short palm", "HP:0004279"],
    ["Short foot", "HP:0001773"],
    ["Joint stiffness", "HP:0001387"],
    ["Round face", "HP:0000311"],
    ["Long eyelashes", "HP:0000527"],
    ["Bulbous nose", "HP:0000414"],
    ["Anteverted nares", "HP:0000463"],
    ["Internal notch of the femoral head", "HP:0031027"]
]
# the constant list can be applied to any column. We will apply it to the column "Height", 
# because all individuals have short stature
constMapper = ConstantColumnMapper(term_list=constant_features)
#statureMapper.preview_column(gd_df["Height"])
column_mapper_d["Height"] = constMapper 

In [15]:
# No cardiac involvement, in particular, no "Mitral stenosis; Tricuspid stenosis; Aortic stenosis"
constant_excluded_features = [
    ["Mitral stenosis",  "HP:0001718"],
    ["Tricuspid stenosis", "HP:0010446"],
    ["Aortic valve stenosis","HP:0001650"]
]
# the constant list can be applied to any column. We will apply it to the column "Height", 
# because all individuals have short stature
constExcludedMapper = ConstantColumnMapper(term_list=constant_excluded_features, excluded=True)
#constExcludedMapper.preview_column(gd_df["Origin"])
column_mapper_d["Origin"] = constExcludedMapper 

In [16]:
# Other
otherMap = CustomColumnMapper(concept_recognizer=hpo_cr)
otherMap.preview_column(gd_df["Other"])
column_mapper_d["Other"] = otherMap

In [17]:
transcript = "NM_000138.5"

genome = 'hg38'
transcript='NM_000138.5' # FBN1
varMapper = VariantColumnMapper(assembly=genome,
                                column_name='HGVS', 
                                transcript=transcript, 
                                default_genotype='heterozygous')

In [18]:
sexMapper = SexColumnMapper.not_provided()
ageMapper = AgeColumnMapper.by_year("Age (Years)")
pmid = "PMID:21683322"
encoder = CohortEncoder(df=gd_df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=column_mapper_d, 
                        individual_column_name="Family", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        variant_mapper=varMapper,
                        metadata=metadata,
                        pmid=pmid)
omim_id = "OMIM:102370"
omim_label = "Acromicric dysplasia"
encoder.set_disease(disease_id=omim_id, label=omim_label)

In [19]:
output_directory = "phenopackets"
encoder.output_phenopackets(outdir=output_directory)

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5087A>G/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5096A>G/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5284G>A/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5096A>G/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5284G>A/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5284G>A/NM_000138.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000138.5%3Ac.5087A>G/NM_000138.5?content-type=application%2Fjson
https://rest.