<h1>GLI3: Demurger et al 2015</h1>
<p>Extract the clinical data from <a href="https://pubmed.ncbi.nlm.nih.gov/24736735/"target="__blank">Démurger F, et al. (2015) New insights into genotype-phenotype correlation for GLI3 mutations. Eur J Hum Genet ;23(1):92-102. PMID:24736735</a>.<p>
<p>Table 1 (and Supplemental Table 1) present data for Greig cephalopolysyndactyly syndrome (GCPS; MIM# 175700).</p>
<p>Table 2 (and Supplemental Table 2) present data for Pallister–Hall syndrome (PHS; MIM# 146510).</p>

In [16]:
import phenopackets as PPkt
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
import math
from csv import DictReader
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import re
import pyphetools
from pyphetools.creation import *
from pyphetools.output import PhenopacketTable
print(f"pyphetools version {pyphetools.__version__}")

pyphetools version 0.5.10


In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
pmid = "PMID:24736735"
title = "New insights into genotype-phenotype correlation for GLI3 mutations"
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", pmid=pmid, pubmed_title=title)
metadata.default_versions_with_hpo(version=hpo_version)
pmid="PMID:29198722"

<H2>Greig cephalopolysyndactyly syndrome (GCPS; MIM# 175700)</H2>
<p>c.1543_1544dup) was found in two affected sibs, was present at low level in DNA extracted from blood of their father (Family G068), suggesting a somatic mosaicism. We therefore remove the row corresponding to the father from further analysis.</p>
<p>Along the same line, a FISH analysis revealed a GLI3 deletion in only 56% of blood cells of a patient (G059) with bilateral preaxial PD of the feet and developmental delay. At least two patients (G005 and G019) had Greig cephalopolysyndactyly contiguous gene syndrome (GCPS-CGS) caused by haploinsufficiency of GLI3 and adjacent genes confirmed by array-CGH with a deletion of 7 and 9 Mb, respectively.</p>
<p>These individuals were also removed from the analysis because of the multifactorial pathophysiology.</p>
<p>We removed the corresponding rows from the following table.</p>

In [65]:
df1 = pd.read_csv("input/demurger_table_1.csv", delimiter="\t")
df1.head()

Unnamed: 0,N,cDNA alteration,Predicted protein alteration,Inheritance,Postaxial PD,Preaxial PD,Broad thumbs or halluces,Syndactyly,Macrocephaly,Widely spaced eyes,MRI Findings,Developmental delay,Additional findings
0,G029,327del,Phe109Leufs*50,F,–,FB,,+,–,–,–,–,"Precocious puberty, scaphocephaly"
1,G070,427G>T,Glu143*,F,HB,FL,,–,+,,,–,
2,Mother,427G>T,Glu143*,F,–,–,,–,–,–,,–,
3,G118,444C>A,Tyr148*,F,–,FB,BT,+,+,,,–,
4,G13684,444C>A,Tyr148*,F,–,FB,,+,+,+,,–,


In [66]:
column_mapper_d = defaultdict(ColumnMapper)

In [67]:
postaxial_d = {'HB': 'Postaxial hand polydactyly',
              'FB': 'Postaxial foot polydactyly',}
excluded_d = {"–":'Postaxial polydactyly'}
postaxialMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=postaxial_d, excluded_d=excluded_d)
postaxialMapper.preview_column(df1["Postaxial PD"])
column_mapper_d["Postaxial PD"] = postaxialMapper

In [68]:
preaxial_d = {'HB': 'Preaxial hand polydactyly',
              'FB': 'Preaxial foot polydactyly',
              'FL': 'Preaxial foot polydactyly',}
excluded_d = {"–":'Preaxial polydactyly'}
preaxialMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=preaxial_d, excluded_d=excluded_d)
preaxialMapper.preview_column(df1["Preaxial PD"])
column_mapper_d["Preaxial PD"] = preaxialMapper

In [69]:
thumb_d = {"BT": "Broad thumb", 
          "BH": "Broad hallux",
          "+": [ "Broad thumb", "Broad hallux"]}
excluded = {"–": "Broad thumb"}
thumbMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=thumb_d,excluded_d=excluded_d)
thumbMapper.preview_column(df1["Broad thumbs or halluces"])
column_mapper_d["Broad thumbs or halluces"] = thumbMapper

In [70]:
syndMapper = SimpleColumnMapper(hpo_id="HP:0001159", hpo_label="Syndactyly", observed="+", excluded="–")
syndMapper.preview_column(df1["Syndactyly"])
column_mapper_d["Syndactyly"] = syndMapper

In [71]:
macMapper = SimpleColumnMapper(hpo_id="HP:0000256", hpo_label="Macrocephaly", observed="+", excluded="–")
macMapper.preview_column(df1["Macrocephaly"])
column_mapper_d["Macrocephaly"] = macMapper

In [72]:
#Widely spaced eyes  Hypertelorism HP:0000316
htMapper = SimpleColumnMapper(hpo_id="HP:0000316", hpo_label="Hypertelorism", observed="+", excluded="–")
htMapper.preview_column(df1["Widely spaced eyes"])
column_mapper_d["Widely spaced eyes"] = htMapper

In [73]:
# MRI Findings
# CCA, corpus callosum agenesis; CCH, corpus callosum hypoplasia;
# pCCA, partial CCA;  VD, ventricular dilatation. #: Index cases are indicated in bold and related are below.
mri_d = {'CCH': 'Hypoplasia of the corpus callosum',
         'CCA': 'Agenesis of corpus callosum',
         'pCCA': 'Partial agenesis of the corpus callosum',
         'VD': 'Ventriculomegaly'}
mriMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=mri_d)
mriMapper.preview_column(df1["MRI Findings"])
column_mapper_d["MRI Findings"] = mriMapper

In [74]:
dd_d = {'+': 'Global developmental delay',
         'Mild': 'Mild global developmental delay',
         'Bilateral inguinal hernia': 'Inguinal hernia',
         'strabismus': 'Strabismus',
       "Cataract": "Cataract",
       "Seizures":"Seizure",
       "horseshoe kidney": "Horseshoe kidney",
       "Trigonocephaly": "Trigonocephaly",
       "macrosomia": "Macrosomia",
       "vermis dysgenesis": "Dysgenesis of the cerebellar vermis"}
excluded_d = {"–": "Global developmental delay"}
ddMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=dd_d, excluded_d=excluded_d)
ddMapper.preview_column(df1["Developmental delay"])
column_mapper_d["Developmental delay"] = ddMapper

In [75]:
df1["Additional findings"].unique()

array(['Precocious puberty, scaphocephaly', '\xa0', 'Delta phalanx',
       'Atrial septal defect', 'Umbilical hernia',
       'Bifid distal phalanx, BW= 4150', 'Cerebral prematurity sequelae',
       'Delta metacarpal, BW=4880', 'BW=4740',
       'Hypoplastic cerebellum, microretrognathism',
       'Bilateral keratoconus, umbilical and bilateral inguinal hernia',
       'Macrosomia', 'Neurofibromatosis type 1',
       'Brachydactyly, delta phalanx', 'Brachydactyly, speech delay',
       'Speech delay, exomphalos', 'Umbilical hernia, anterior anus',
       'Laryngomalacia', 'BW=4440', 'Supernumerary nipples'], dtype=object)

In [76]:
add_d = {'Precocious puberty': 'Precocious puberty',
         'scaphocephaly': 'Scaphocephaly',
         'Delta phalanx': 'Triangular shaped phalanges of the hand',
         'Bifid distal phalanx': 'Partial duplication of the distal phalanges of the hand',
       "Hypoplastic cerebellum": "Cerebellar hypoplasia",
       "microretrognathism":"Microretrognathia",
       "keratoconus": "Keratoconus",
       "umbilical": "Umbilical hernia",
         "Umbilical hernia": "Umbilical hernia",
       "inguinal hernia": "Inguinal hernia",
       "Macrosomia": "Large for gestational age",
        "Brachydactyly":"Brachydactyly",
         "anterior anus": "Anteriorly placed anus",
         "Laryngomalacia":"Laryngomalacia",
         "Supernumerary nipples": "Supernumerary nipple"
        }
addMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=add_d)
addMapper.preview_column(df1["Additional findings"])
column_mapper_d["Additional findings"] = addMapper

<h2>GLI3 Variants</h2>
<p>Variants are provided in table 1 according to NM_000168.6.</p>
<p>Note that the contents of the column "cDNA alteration" do not have the "c." required by HGVS, so we add it to all columns before proceding.</p>

In [85]:
transcript='NM_000168.6'
genome = 'hg38'
varMapper = VariantColumnMapper(assembly=genome,
                                column_name='cDNA alteration', 
                                transcript=transcript, 
                                default_genotype='heterozygous')

<h3>Small and Structural variants</h3>
<p>We encode the small variants using HGVS and trhe structural variants using the StructuralVariant class</p>

In [84]:
struct_variants = { 
                      "rsa7p14.1(kit P179)x1",
                      "46,XY.ish del(7)(p14.1)(RP11-816F16-)",
                      "46,XX.ish del(7)(p14.1p14.1)(GLI3-)",
                 }
gli3_symbol = "GLI3"
gli3_id = "HGNC:4319"
gli3_variants = df1['cDNA alteration'].unique()
gli3_variant_d = defaultdict(Variant)
for gli3v in gli3_variants:
    if gli3v in struct_variants:
        sv = StructuralVariant.chromosomal_deletion(cell_contents=gli3v, gene_id=gli3_id, gene_symbol=gli3_symbol)
        print(gli3v)
        print(sv)
        gli3_variant_d[gli3v] = sv
    else:
        hgvs = f"c.{gli3v}"
        v = varMapper.map_cell(gli3v)
        gli3_variant_d[gli3v] = v

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3A327del/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3A427G>T/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3A444C>A/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3A518dup/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3A679+1G>T/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3A833_843del/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3A868C>T/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidato

In [33]:
omim_id = "OMIM:175700"
omim_label = "Greig cephalopolysyndactyly syndrome"
encoder = CohortEncoder(df=df1, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=column_mapper_d, 
                        individual_column_name="N", 
                        agemapper=AgeColumnMapper.not_provided(), 
                        sexmapper=SexColumnMapper.not_provided(),
                        metadata=metadata,
                        pmid=pmid)
encoder.set_disease(disease_id=omim_id, label=omim_label)

In [34]:
gcps_individuals = encoder.get_individuals()

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.327del/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.427G>T/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.427G>T/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.444C>A/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.444C>A/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.444C>A/NM_000168.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.444C>A/NM_000168.6?content-type=application%2Fjson
https://rest.variant

c.327del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.327del/NM_000168.6?content-type=application%2Fjson
c.427G>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.427G>T/NM_000168.6?content-type=application%2Fjson
c.444C>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.444C>A/NM_000168.6?content-type=application%2Fjson
c.518dup
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.518dup/NM_000168.6?content-type=application%2Fjson
c.679+1G>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.679+1G>T/NM_000168.6?content-type=application%2Fjson
c.833_843del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%3Ac.833_843del/NM_000168.6?content-type=application%2Fjson
c.868C>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000168.6%