<h1>GLI3: Demurger et al 2015</h1>
<p>Extract the clinical data from <a href="https://pubmed.ncbi.nlm.nih.gov/24736735/"target="__blank">Démurger F, et al. (2015) New insights into genotype-phenotype correlation for GLI3 mutations. Eur J Hum Genet ;23(1):92-102. PMID:24736735</a>.<p>
<p>Table 1 (and Supplemental Table 1) present data for Greig cephalopolysyndactyly syndrome (GCPS; MIM# 175700).</p>
<p>Table 2 (and Supplemental Table 2) present data for Pallister–Hall syndrome (PHS; MIM# 146510).</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.37


In [2]:
PMID = "PMID:24736735"
title = "New insights into genotype-phenotype correlation for GLI3 mutations"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-01-16


<H2>Greig cephalopolysyndactyly syndrome (GCPS; MIM# 175700)</H2>
<p>c.1543_1544dup was found in two affected sibs, was present at low level in DNA extracted from blood of their father (Family G068), suggesting a somatic mosaicism. We therefore remove the row corresponding to the father from further analysis.</p>
<p>Along the same line, a FISH analysis revealed a GLI3 deletion in only 56% of blood cells of a patient (G059) with bilateral preaxial PD of the feet and developmental delay. At least two patients (G005 and G019) had Greig cephalopolysyndactyly contiguous gene syndrome (GCPS-CGS) caused by haploinsufficiency of GLI3 and adjacent genes confirmed by array-CGH with a deletion of 7 and 9 Mb, respectively.</p>
<p>These individuals were also removed from the analysis because of the multifactorial pathophysiology.</p>
<p>We removed the corresponding rows from the following table.</p>

In [3]:
df1 = pd.read_csv("input/demurger_table_1.csv", delimiter="\t")
df1.head()

Unnamed: 0,N,cDNA alteration,Predicted protein alteration,Inheritance,Postaxial PD,Preaxial PD,Broad thumbs or halluces,Syndactyly,Macrocephaly,Widely spaced eyes,MRI Findings,Developmental delay,Additional findings
0,G029,327del,Phe109Leufs*50,F,-,FB,na,+,-,-,-,-,"Precocious puberty, scaphocephaly"
1,G070,427G>T,Glu143*,F,HB,FL,na,-,+,na,na,-,na
2,G070_Mother,427G>T,Glu143*,F,-,-,na,-,-,-,na,-,na
3,G118,444C>A,Tyr148*,F,-,FB,BT,+,+,na,na,-,na
4,G13684,444C>A,Tyr148*,F,-,FB,na,+,+,+,na,-,na


In [4]:
column_mapper_list = list()

In [5]:
postaxial_d = {'HB': 'Postaxial hand polydactyly',
              'FB': 'Postaxial foot polydactyly',}
excluded_d = {"-":'Postaxial polydactyly'}
postaxialMapper = OptionColumnMapper(column_name="Postaxial PD",
                                     concept_recognizer=hpo_cr, option_d=postaxial_d, excluded_d=excluded_d)
column_mapper_list.append(postaxialMapper)
postaxialMapper.preview_column(df1)

Unnamed: 0,mapping,count
0,Postaxial polydactyly (HP:0100259) (excluded),24
1,Postaxial hand polydactyly (HP:0001162) (observed),24
2,Postaxial foot polydactyly (HP:0001830) (observed),14


In [6]:
preaxial_d = {'HB': 'Preaxial hand polydactyly',
              'FB': 'Preaxial foot polydactyly',
              'FL': 'Preaxial foot polydactyly',}
excluded_d = {"-":'Preaxial polydactyly'}
preaxialMapper = OptionColumnMapper(column_name="Preaxial PD",concept_recognizer=hpo_cr, option_d=preaxial_d, excluded_d=excluded_d)
column_mapper_list.append(preaxialMapper)
preaxialMapper.preview_column(df1)

Unnamed: 0,mapping,count
0,Preaxial foot polydactyly (HP:0001841) (observed),33
1,Preaxial polydactyly (HP:0100258) (excluded),14
2,Preaxial hand polydactyly (HP:0001177) (observed),4


In [7]:
thumb_d = {"BT": "Broad thumb", 
          "BH": "Broad hallux",
          "+": [ "Broad thumb", "Broad hallux"]}
excluded = {"-": "Broad thumb"}
thumbMapper = OptionColumnMapper(column_name="Broad thumbs or halluces",concept_recognizer=hpo_cr, option_d=thumb_d,excluded_d=excluded_d)
column_mapper_list.append(thumbMapper)
thumbMapper.preview_column(df1)

Unnamed: 0,mapping,count
0,Broad thumb (HP:0011304) (observed),13
1,Broad hallux (HP:0010055) (observed),6


In [8]:
syndMapper = SimpleColumnMapper(column_name="Syndactyly",hpo_id="HP:0001159", hpo_label="Syndactyly", observed="+", excluded="-")
column_mapper_list.append(syndMapper)
syndMapper.preview_column(df1)

Unnamed: 0,mapping,count
0,"original value: ""+"" -> HP: Syndactyly (HP:0001159) (observed)",32
1,"original value: ""-"" -> HP: Syndactyly (HP:0001159) (excluded)",17
2,"original value: ""na"" -> HP: Syndactyly (HP:0001159) (not measured)",2


In [9]:
macMapper = SimpleColumnMapper(column_name="Macrocephaly",hpo_id="HP:0000256", hpo_label="Macrocephaly", observed="+", excluded="-")
column_mapper_list.append(macMapper)
macMapper.preview_column(df1)

Unnamed: 0,mapping,count
0,"original value: ""-"" -> HP: Macrocephaly (HP:0000256) (excluded)",20
1,"original value: ""+"" -> HP: Macrocephaly (HP:0000256) (observed)",28
2,"original value: ""na"" -> HP: Macrocephaly (HP:0000256) (not measured)",3


In [10]:
#Widely spaced eyes  Hypertelorism HP:0000316
htMapper = SimpleColumnMapper(column_name="Widely spaced eyes",hpo_id="HP:0000316", hpo_label="Hypertelorism", observed="+", excluded="-")
column_mapper_list.append(htMapper)
htMapper.preview_column(df1)

Unnamed: 0,mapping,count
0,"original value: ""-"" -> HP: Hypertelorism (HP:0000316) (excluded)",24
1,"original value: ""na"" -> HP: Hypertelorism (HP:0000316) (not measured)",7
2,"original value: ""+"" -> HP: Hypertelorism (HP:0000316) (observed)",18
3,"original value: ""VD"" -> HP: Hypertelorism (HP:0000316) (not measured)",2


In [11]:
# MRI Findings
mri_d = {'CCH': 'Hypoplasia of the corpus callosum',
         'CCA': 'Agenesis of corpus callosum',
         'pCCA': 'Partial agenesis of the corpus callosum',
         'VD': 'Ventriculomegaly'}
mriMapper = OptionColumnMapper(column_name="MRI Findings",concept_recognizer=hpo_cr, option_d=mri_d)
column_mapper_list.append(mriMapper)
mriMapper.preview_column(df1)

Unnamed: 0,mapping,count
0,Ventriculomegaly (HP:0002119) (observed),5
1,Hypoplasia of the corpus callosum (HP:0002079) (observed),4
2,Partial agenesis of the corpus callosum (HP:0001338) (observed),3
3,Agenesis of corpus callosum (HP:0001274) (observed),1


In [12]:
dd_d = {'+': 'Global developmental delay',
         'Mild': 'Mild global developmental delay',
         'Bilateral inguinal hernia': 'Inguinal hernia',
         'strabismus': 'Strabismus',
       "Cataract": "Cataract",
       "Seizures":"Seizure",
       "horseshoe kidney": "Horseshoe kidney",
       "Trigonocephaly": "Trigonocephaly",
       "macrosomia": "Macrosomia",
       "vermis dysgenesis": "Dysgenesis of the cerebellar vermis"}
excluded_d = {"–": "Global developmental delay"}
ddMapper = OptionColumnMapper(column_name="Developmental delay",concept_recognizer=hpo_cr, option_d=dd_d, excluded_d=excluded_d)
column_mapper_list.append(ddMapper)
ddMapper.preview_column(df1)

Unnamed: 0,mapping,count
0,Global developmental delay (HP:0001263) (observed),4
1,Mild global developmental delay (HP:0011342) (observed),3
2,Seizure (HP:0001250) (observed),1
3,Strabismus (HP:0000486) (observed),1
4,Horseshoe kidney (HP:0000085) (observed),1


In [13]:
add_d = {'Precocious puberty': 'Precocious puberty',
         'scaphocephaly': 'Scaphocephaly',
         'Delta phalanx': 'Triangular shaped phalanges of the hand',
         'Bifid distal phalanx': 'Partial duplication of the distal phalanges of the hand',
       "Hypoplastic cerebellum": "Cerebellar hypoplasia",
       "microretrognathism":"Microretrognathia",
       "keratoconus": "Keratoconus",
       "umbilical": "Umbilical hernia",
         "Umbilical hernia": "Umbilical hernia",
       "inguinal hernia": "Inguinal hernia",
       "Macrosomia": "Large for gestational age",
        "Brachydactyly":"Brachydactyly",
         "anterior anus": "Anteriorly placed anus",
         "Laryngomalacia":"Laryngomalacia",
         "Supernumerary nipples": "Supernumerary nipple"
        }
addMapper = OptionColumnMapper(column_name="Additional findings",concept_recognizer=hpo_cr, option_d=add_d)
column_mapper_list.append(addMapper)
addMapper.preview_column(df1)

Unnamed: 0,mapping,count
0,Precocious puberty (HP:0000826) (observed),1
1,Scaphocephaly (HP:0030799) (observed),1
2,Triangular shaped phalanges of the hand (HP:0009774) (observed),2
3,Atrial septal defect (HP:0001631) (observed),1
4,Umbilical hernia (HP:0001537) (observed),4
5,Partial duplication of the distal phalanges of the hand (HP:0010004) (observed),1
6,Cerebellar hypoplasia (HP:0001321) (observed),1
7,Microretrognathia (HP:0000308) (observed),1
8,Keratoconus (HP:0000563) (observed),1
9,Inguinal hernia (HP:0000023) (observed),1


In [14]:
struct_variants = { "rsa7p14.1(kit P179)x1",
                    "46,XY.ish del(7)(p14.1)(RP11-816F16-)",
                    "46,XX.ish del(7)(p14.1p14.1)(GLI3-)" }
def get_hgvs(alteration):
    if alteration in struct_variants:
        return alteration
    else:
        return f"c.{alteration}"
    

df1["allele_1"] = df1['cDNA alteration'].apply(lambda x: get_hgvs(x))

<h2>GLI3 Variants</h2>
<p>Variants are provided in table 1 according to NM_000168.6.</p>
<p>Note that the contents of the column "cDNA alteration" do not have the "c." required by HGVS, so we add it to all columns before proceding.</p>

In [15]:
gli3_transcript='NM_000168.6'
genome = 'hg38'
#hgvsMapper = VariantValidator(genome_build=genome, transcript=gli3_transcript)
gli3_symbol = "GLI3"
gli3_id = "HGNC:4319"
vman = VariantManager(df=df1, gene_id=gli3_id, gene_symbol=gli3_symbol, transcript=gli3_transcript, 
                      allele_1_column_name='allele_1',individual_column_name="N")

<h3>Small and Structural variants</h3>
<p>We encode the small variants using HGVS and trhe structural variants using the StructuralVariant class</p>

In [16]:
vman.code_as_chromosomal_deletion(struct_variants)
vman.to_summary()

Unnamed: 0,status,count,alleles
0,mapped,50,"c.1378del, c.1787A>C, c.518dup, c.3559C>T, c.4615_4624del, c.3427_3443del, c.2082_2083delinsAGAGAAGCC, c.1543_1544dup, c.4324C>T, c.997_998dup, c.4431dup, c.1498C>T, c.1874G>A, c.327del, c.4408C>T, c.833_843del, c.1767del, c.1063_1067dup, c.868C>T, c.679+1G>T, c.1786C>T, c.4099dup, c.444C>A, c.427G>T, c.1745del, c.1733G>C, c.4456C>T, c.4463del, c.3950del, c.3098dup, c.2685C>G, c.3006_3007insT, c.3040G>T, c.2149_2150insT, c.2799del, c.2647G>T, c.2941dup, c.3386_3387del, c.2072del, c.2385del, c.2641_2642dup, c.2875_2902del, c.2149C>T, c.2123_2126del, c.3324C>G, c.2121del, c.1995del, 46,XY.ish del(7)(p14.1)(RP11-816F16-), 46,XX.ish del(7)(p14.1p14.1)(GLI3-), rsa7p14.1(kit P179)x1"
1,unmapped,0,


In [17]:
gli3_variant_d = vman.get_variant_d()

In [18]:
variantMapper = VariantColumnMapper(variant_d=gli3_variant_d,
                                    variant_column_name="allele_1",
                                    default_genotype='heterozygous')

In [19]:
encoder = CohortEncoder(df=df1, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="N", 
                        metadata=metadata,
                        agemapper=AgeColumnMapper.not_provided(), 
                        sexmapper=SexColumnMapper.not_provided(),
                        variant_mapper=variantMapper)
omim_id = "OMIM:175700"
omim_label = "Greig cephalopolysyndactyly syndrome"
gcps = Disease(disease_id=omim_id, disease_label=omim_label)

encoder.set_disease(disease=gcps)

In [20]:
gcps_individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=gcps_individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
INFORMATION,NOT_MEASURED,14


In [21]:
individuals = cvalidator.get_error_free_individual_list()
phenopackets = [i.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh()) for i in individuals]
table = PhenopacketTable.from_phenopackets(phenopacket_list=phenopackets)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
G029 (UNKNOWN; ),Greig cephalopolysyndactyly syndrome (OMIM:175700),NM_000168.6:c.327del (heterozygous),Preaxial foot polydactyly (HP:0001841); Syndactyly (HP:0001159); Precocious puberty (HP:0000826); Scaphocephaly (HP:0030799); excluded: Postaxial polydactyly (HP:0100259); excluded: Macrocephaly (HP:0000256); excluded: Hypertelorism (HP:0000316)
G070 (UNKNOWN; ),Greig cephalopolysyndactyly syndrome (OMIM:175700),NM_000168.6:c.427G>T (heterozygous),Postaxial hand polydactyly (HP:0001162); Preaxial foot polydactyly (HP:0001841); Macrocephaly (HP:0000256); excluded: Syndactyly (HP:0001159)
G070_Mother (UNKNOWN; ),Greig cephalopolysyndactyly syndrome (OMIM:175700),NM_000168.6:c.427G>T (heterozygous),excluded: Postaxial polydactyly (HP:0100259); excluded: Preaxial polydactyly (HP:0100258); excluded: Syndactyly (HP:0001159); excluded: Macrocephaly (HP:0000256); excluded: Hypertelorism (HP:0000316)
G118 (UNKNOWN; ),Greig cephalopolysyndactyly syndrome (OMIM:175700),NM_000168.6:c.444C>A (heterozygous),Preaxial foot polydactyly (HP:0001841); Broad thumb (HP:0011304); Syndactyly (HP:0001159); Macrocephaly (HP:0000256); excluded: Postaxial polydactyly (HP:0100259)
G13684 (UNKNOWN; ),Greig cephalopolysyndactyly syndrome (OMIM:175700),NM_000168.6:c.444C>A (heterozygous),Preaxial foot polydactyly (HP:0001841); Syndactyly (HP:0001159); Macrocephaly (HP:0000256); Hypertelorism (HP:0000316); excluded: Postaxial polydactyly (HP:0100259)
G13684_Brother (UNKNOWN; ),Greig cephalopolysyndactyly syndrome (OMIM:175700),NM_000168.6:c.444C>A (heterozygous),Preaxial foot polydactyly (HP:0001841); Syndactyly (HP:0001159); Hypertelorism (HP:0000316); excluded: Postaxial polydactyly (HP:0100259); excluded: Macrocephaly (HP:0000256)
G13684_Mother (UNKNOWN; ),Greig cephalopolysyndactyly syndrome (OMIM:175700),NM_000168.6:c.444C>A (heterozygous),Preaxial foot polydactyly (HP:0001841); Syndactyly (HP:0001159); Macrocephaly (HP:0000256); excluded: Postaxial polydactyly (HP:0100259)
G099 (UNKNOWN; ),Greig cephalopolysyndactyly syndrome (OMIM:175700),NM_000168.6:c.518dup (heterozygous),Preaxial foot polydactyly (HP:0001841); Syndactyly (HP:0001159); Macrocephaly (HP:0000256); excluded: Postaxial polydactyly (HP:0100259); excluded: Hypertelorism (HP:0000316)
G048 (UNKNOWN; ),Greig cephalopolysyndactyly syndrome (OMIM:175700),NM_000168.6:c.679+1G>T (heterozygous),Postaxial hand polydactyly (HP:0001162); Preaxial foot polydactyly (HP:0001841); Preaxial hand polydactyly (HP:0001177); Ventriculomegaly (HP:0002119); Triangular shaped phalanges of the hand (HP:0009774)
G15198 (UNKNOWN; ),Greig cephalopolysyndactyly syndrome (OMIM:175700),NM_000168.6:c.833_843del (heterozygous),Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830); Preaxial foot polydactyly (HP:0001841); Preaxial hand polydactyly (HP:0001177); Broad thumb (HP:0011304); Syndactyly (HP:0001159); Macrocephaly (HP:0000256); Hypertelorism (HP:0000316)


In [22]:
Individual.output_individuals_as_phenopackets(individual_list=gcps_individuals, 
                                              metadata=metadata)

We output 51 GA4GH phenopackets to the directory phenopackets


<h1> Pallister–Hall syndrome (PHS; MIM# 146510)</h1>
<p>The second half of this notebook extracts data about PHS from supplemental table 2.</p>

In [23]:
df2 = pd.read_csv("input/demurger_table_2.csv", delimiter="\t")
df2.head()

Unnamed: 0,N,cDNA,Predicted protein alteration,Inheritance,Growth delay/GH deficiency,Insertional/postaxial PD,Brachytelephalangism/dactyly,Y-shaped metacarpal/metatarsal,Hypothalamic hamartoma,Craniofacial anomalies,Anal atresia,Bifid epiglottis,Cardiac anomalies,Renal anomalies,Genital anomalies,Lung dysplasia,Intellectual deficiency,Nail dysplasia,Other findings
0,P15112,1995del,Gly666Alafs*27,De novo,-,-,+,+,+,+,+,+,-,-,-,-,-,+,"Overlapping toes, preauricular tag"
1,G097,2072del,Gln691Argfs*2,De novo,+,+,na,+,+,-,-,na,-,-,+,-,-,na,"Micropenis, thin CC"
2,G085,2123_2126del,Gly708Valfs*24,Familial,-,+,na,+,na,-,-,na,-,na,-,na,-,+,na
3,Father,Gly708Valfs*24,Gly708Valfs*24,na,+,+,na,+,na,-,-,na,-,na,+,na,-,na,Micropenis
4,Aunt,Gly708Valfs*24,Gly708Valfs*24,na,na,+,+,na,na,-,-,na,-,na,-,na,-,+,na


<p>Note that the HPO parser and the metadata object can be reused.</p>

In [24]:
generator = SimpleColumnMapperGenerator(df=df2, observed="+", excluded="-", hpo_cr=hpo_cr)
column_mapper_list = generator.try_mapping_columns()
display(HTML(generator.to_html()))

Result,Columns
Mapped,Hypothalamic hamartoma; Anal atresia; Bifid epiglottis; Cardiac anomalies; Renal anomalies; Genital anomalies; Nail dysplasia
Unmapped,N; cDNA; Predicted protein alteration; Inheritance; Growth delay/GH deficiency; Insertional/postaxial PD; Brachytelephalangism/dactyly; Y-shaped metacarpal/metatarsal; Craniofacial anomalies; Lung dysplasia; Intellectual deficiency; Other findings


In [25]:
# Growth delay HP:0001510
label_d = {
    'Growth delay/GH deficiency': ["Growth delay", "HP:0001510"],
    'Brachytelephalangism/dactyly': ["Shortening of all distal phalanges of the fingers", "HP:0006118"], # Brachyteledactylyly
    'Craniofacial anomalies' : ["Abnormality of the face", "HP:0000271"],
    'Lung dysplasia': ["Abnormal lung development", "HP:4000059"],
    'Intellectual deficiency': ["Intellectual disability", "HP:0001249"]
}

for k, v in label_d.items():
    column_name = k
    print(column_name)
    hpo_label = v[0]
    hpo_id = v[1]
    mapper = SimpleColumnMapper(column_name=column_name, hpo_id=hpo_id, hpo_label=hpo_label, observed="+", excluded="-")
    column_mapper_list.append(mapper)

Growth delay/GH deficiency
Brachytelephalangism/dactyly
Craniofacial anomalies
Lung dysplasia
Intellectual deficiency


In [26]:
# 'Y-shaped metacarpal/metatarsal'
# Y-shaped metacarpals HP:0006042
# Y-shaped metatarsals HP:0010567
y_d = {'+': ["Y-shaped metacarpals", "Y-shaped metatarsals"]}
y_excluded = {"-": ["Y-shaped metacarpals", "Y-shaped metatarsals"]}
yMapper = OptionColumnMapper(column_name='Y-shaped metacarpal/metatarsal',
                             concept_recognizer=hpo_cr, option_d=y_d, excluded_d=y_excluded)
column_mapper_list.append(yMapper)
yMapper.preview_column(df2)

Unnamed: 0,mapping,count
0,Y-shaped metacarpals (HP:0006042) (observed),15
1,Y-shaped metatarsals (HP:0010567) (observed),15
2,Y-shaped metacarpals (HP:0006042) (excluded),3
3,Y-shaped metatarsals (HP:0010567) (excluded),3


In [27]:
other_findings_d = {'Overlapping toes': 'Overlapping toe',
 'preauricular tag': 'Preauricular skin tag',
 'Micropenis': 'Micropenis',
 'thin CC': 'Thin corpus callosum',
 'Sacrococcygeal teratoma': 'Sacrococcygeal teratoma',
 'conical teeth': 'Conical tooth',
 'cryptorchidism': 'Cryptorchidism',
 'micropenis': 'Micropenis',
 'syndactyly': 'Syndactyly',
 'unilateral renal agenesis': 'Unilateral renal agenesis',
 'fine motor delay': 'Motor delay',
 'choanal atresia': 'Choanal atresia',
 'fine DD': 'Global developmental delay',
 'Scoliosis': 'Scoliosis',
 'dental malposition': 'Tooth malposition',
 'Oligohydramnios': 'Oligohydramnios',
 'Seizures': 'Seizure',
 'panhypopituitarism': 'Panhypopituitarism',
 'renal hypoplasia': 'Renal hypoplasia',
 'Syndactyly': 'Syndactyly',
 'Agnathia': 'Mandibular aplasia',
 'hypoplastic maxillary': 'Hypoplasia of the maxilla',
 #'absence of oral orifice': 'PLACEHOLDER',
 'bilateral choanal atresia': 'Bilateral choanal atresia',
 'oligosyndactyly': 'Syndactyly',
 'arthrogryposis': 'Arthrogryposis multiplex congenita',
 'mesomelia bilateral radio-ulnar bowing': 'Mesomelia',
 'absence of tibia and fibula': 'Absent tibia',
 'bilateral renal agenesis': 'Bilateral renal agenesis',
 'pituitary gland agenesis': 'Anterior pituitary agenesis',
 'adrenal agenesis': 'Renal agenesis',
 'uterovaginal aplasia': 'Aplasia of the uterus',
 #'AVC': 'PLACEHOLDER',
 'CCA': 'Agenesis of corpus callosum',
 'microcephaly': 'Microcephaly',
 'Posterior cleft palate': 'Cleft palate',
 'micrognathia': 'Micrognathia',
 'micromelia': 'Micromelia',
 'club feet': 'Talipes equinovarus',
 'adrenal gland hypoplasia': 'Adrenal hypoplasia',
 'anteposed anus': 'Anteriorly placed anus',
 'Bilateral choanal atresia': 'Bilateral choanal atresia',
 'retrognathia': 'Retrognathia',
 'posterior cleft palate': 'Cleft palate',
 #'ear dysplasia': 'PLACEHOLDER',
 #'cervical chondroma': 'PLACEHOLDER',
 'adrenal and pituitary gland agenesis': 'Adrenal gland agenesis',
 #'abnormal aortic arch': 'PLACEHOLDER',
 'Premaxillary agenesis': 'Aplasia of the premaxilla',
 'microretrognathism': 'Microretrognathia',
 'arhinencephaly': 'Arrhinencephaly',
 'hygroma colli': 'Cystic hygroma',
 'intestinal malrotation': 'Intestinal malrotation',
 #'IAC': 'PLACEHOLDER',
 'adrenal gland agenesis': 'Adrenal gland agenesis',
 'Hypertelorism': 'Hypertelorism',
 'retrognatism': 'Retrognathia',
 'cleft palate': 'Cleft palate',
 'abnormal metacarpals': 'Abnormal metacarpal morphology',
# 'Limited ankle mobility': 'PLACEHOLDER',
 'hypopituitarism': 'Hypopituitarism',
 'hypospadias': 'Hypospadias',
 'speech delay': 'Delayed speech and language development',
 'gelastic seizures': 'Focal emotional seizure with laughing'}
other_findingsMapper = OptionColumnMapper(column_name='Other findings',concept_recognizer=hpo_cr, option_d=other_findings_d)
column_mapper_list.append(other_findingsMapper)
other_findingsMapper.preview_column(df2)

Unnamed: 0,mapping,count
0,Overlapping toe (HP:0001845) (observed),1
1,Preauricular skin tag (HP:0000384) (observed),1
2,Micropenis (HP:0000054) (observed),8
3,Thin corpus callosum (HP:0033725) (observed),1
4,Sacrococcygeal teratoma (HP:0030736) (observed),1
5,Conical tooth (HP:0000698) (observed),1
6,Cryptorchidism (HP:0000028) (observed),1
7,Syndactyly (HP:0001159) (observed),8
8,Unilateral renal agenesis (HP:0000122) (observed),3
9,Motor delay (HP:0001270) (observed),1


<h2>GLI3 variants</h2>

In [28]:
def get_allele(variant):
    """Fix HGVS notation from the original file.
    """
    if variant == "Gly708Valfs*24":
        return "c.2121del" 
    else:
        return f"c.{variant}"

df2["allele_1"] = df2['cDNA'].apply(lambda x: get_allele(x))

In [29]:
vman = VariantManager(df=df2, gene_id=gli3_id, gene_symbol=gli3_symbol, transcript=gli3_transcript, 
                      allele_1_column_name='allele_1',individual_column_name="N")

In [30]:
gli3_variants2 = vman.get_variant_d()
variantMapper = VariantColumnMapper(variant_d=gli3_variant_d,
                                   variant_column_name="allele_1",
                                    default_genotype='heterozygous')

In [31]:
encoder = CohortEncoder(df=df2, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="N", 
                        metadata=metadata,
                        agemapper=AgeColumnMapper.not_provided(), 
                        sexmapper=SexColumnMapper.not_provided(),
                        variant_mapper=variantMapper)
phs = Disease(disease_id="OMIM:146510", disease_label="Pallister-Hall syndrome")
encoder.set_disease(disease=phs)

In [32]:
phs_individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=phs_individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
ERROR,CONFLICT,2
WARNING,REDUNDANT,23
INFORMATION,NOT_MEASURED,63


In [33]:
phs_individuals = cvalidator.get_error_free_individual_list()
phenopackets = [i.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh()) for i in phs_individuals]
table = PhenopacketTable(phenopacket_list=phenopackets)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
P15112 (UNKNOWN; ),Pallister-Hall syndrome (OMIM:146510),NM_000168.6:c.1995del (heterozygous),Hypothalamic hamartoma (HP:0002444); Anal atresia (HP:0002023); Bifid epiglottis (HP:0010564); Nail dysplasia (HP:0002164); Shortening of all distal phalanges of the fingers (HP:0006118); Y-shaped metacarpals (HP:0006042); Y-shaped metatarsals (HP:0010567); Overlapping toe (HP:0001845); Preauricular skin tag (HP:0000384); excluded: Abnormal heart morphology (HP:0001627); excluded: Abnormality of the kidney (HP:0000077); excluded: Abnormality of the genital system (HP:0000078); excluded: Growth delay (HP:0001510); excluded: Abnormal lung development (HP:4000059); excluded: Intellectual disability (HP:0001249)
G097 (UNKNOWN; ),Pallister-Hall syndrome (OMIM:146510),NM_000168.6:c.2072del (heterozygous),Hypothalamic hamartoma (HP:0002444); Growth delay (HP:0001510); Y-shaped metacarpals (HP:0006042); Y-shaped metatarsals (HP:0010567); Micropenis (HP:0000054); Thin corpus callosum (HP:0033725); excluded: Anal atresia (HP:0002023); excluded: Abnormal heart morphology (HP:0001627); excluded: Abnormality of the kidney (HP:0000077); excluded: Abnormality of the face (HP:0000271); excluded: Abnormal lung development (HP:4000059); excluded: Intellectual disability (HP:0001249)
G085 (UNKNOWN; ),Pallister-Hall syndrome (OMIM:146510),NM_000168.6:c.2123_2126del (heterozygous),Nail dysplasia (HP:0002164); Y-shaped metacarpals (HP:0006042); Y-shaped metatarsals (HP:0010567); excluded: Anal atresia (HP:0002023); excluded: Abnormal heart morphology (HP:0001627); excluded: Abnormality of the genital system (HP:0000078); excluded: Growth delay (HP:0001510); excluded: Abnormality of the face (HP:0000271); excluded: Intellectual disability (HP:0001249)
Father (UNKNOWN; ),Pallister-Hall syndrome (OMIM:146510),NM_000168.6:c.2121del (heterozygous),Growth delay (HP:0001510); Y-shaped metacarpals (HP:0006042); Y-shaped metatarsals (HP:0010567); Micropenis (HP:0000054); excluded: Anal atresia (HP:0002023); excluded: Abnormal heart morphology (HP:0001627); excluded: Abnormality of the face (HP:0000271); excluded: Intellectual disability (HP:0001249)
Aunt (UNKNOWN; ),Pallister-Hall syndrome (OMIM:146510),NM_000168.6:c.2121del (heterozygous),Nail dysplasia (HP:0002164); Shortening of all distal phalanges of the fingers (HP:0006118); excluded: Anal atresia (HP:0002023); excluded: Abnormal heart morphology (HP:0001627); excluded: Abnormality of the genital system (HP:0000078); excluded: Abnormality of the face (HP:0000271); excluded: Intellectual disability (HP:0001249)
Grand-mother (UNKNOWN; ),Pallister-Hall syndrome (OMIM:146510),NM_000168.6:c.2121del (heterozygous),Nail dysplasia (HP:0002164); excluded: Anal atresia (HP:0002023); excluded: Abnormal heart morphology (HP:0001627); excluded: Abnormality of the genital system (HP:0000078); excluded: Abnormality of the face (HP:0000271); excluded: Intellectual disability (HP:0001249)
G121 (UNKNOWN; ),Pallister-Hall syndrome (OMIM:146510),NM_000168.6:c.2149_2150insT (heterozygous),Y-shaped metacarpals (HP:0006042); Y-shaped metatarsals (HP:0010567); excluded: Anal atresia (HP:0002023); excluded: Abnormal heart morphology (HP:0001627); excluded: Abnormality of the kidney (HP:0000077); excluded: Abnormality of the genital system (HP:0000078); excluded: Nail dysplasia (HP:0002164); excluded: Growth delay (HP:0001510); excluded: Abnormality of the face (HP:0000271); excluded: Intellectual disability (HP:0001249)
G001 (UNKNOWN; ),Pallister-Hall syndrome (OMIM:146510),NM_000168.6:c.2149C>T (heterozygous),Hypothalamic hamartoma (HP:0002444); Anal atresia (HP:0002023); Growth delay (HP:0001510); Shortening of all distal phalanges of the fingers (HP:0006118); Y-shaped metacarpals (HP:0006042); Y-shaped metatarsals (HP:0010567); Sacrococcygeal teratoma (HP:0030736); Conical tooth (HP:0000698); Cryptorchidism (HP:0000028); Micropenis (HP:0000054); Syndactyly (HP:0001159); Unilateral renal agenesis (HP:0000122); Motor delay (HP:0001270); excluded: Abnormal heart morphology (HP:0001627); excluded: Intellectual disability (HP:0001249)
G083 (UNKNOWN; ),Pallister-Hall syndrome (OMIM:146510),NM_000168.6:c.2385del (heterozygous),Y-shaped metacarpals (HP:0006042); Y-shaped metatarsals (HP:0010567); excluded: Anal atresia (HP:0002023); excluded: Abnormal heart morphology (HP:0001627); excluded: Abnormality of the kidney (HP:0000077); excluded: Abnormality of the genital system (HP:0000078); excluded: Nail dysplasia (HP:0002164); excluded: Growth delay (HP:0001510); excluded: Abnormality of the face (HP:0000271); excluded: Intellectual disability (HP:0001249)
Father (UNKNOWN; ),Pallister-Hall syndrome (OMIM:146510),NM_000168.6:c.2385del (heterozygous),Y-shaped metacarpals (HP:0006042); Y-shaped metatarsals (HP:0010567); excluded: Anal atresia (HP:0002023); excluded: Abnormal heart morphology (HP:0001627); excluded: Abnormality of the kidney (HP:0000077); excluded: Abnormality of the genital system (HP:0000078); excluded: Nail dysplasia (HP:0002164); excluded: Growth delay (HP:0001510); excluded: Abnormality of the face (HP:0000271); excluded: Intellectual disability (HP:0001249)


In [34]:
Individual.output_individuals_as_phenopackets(individual_list=phs_individuals, 
                                              metadata=metadata,
                                              outdir="phenopackets")

We output 21 GA4GH phenopackets to the directory phenopackets


In [35]:
# pxf validate --hpo ../hpo_data/hp.json *.json
# no errors.