<h1>OFD1: Simpson-Golabi-Behmel syndrome, type 2. Budny et al., 2006</h1>
<p>This notebook processes the cohort reported in <a href="https://pubmed.ncbi.nlm.nih.gov/16783569/" target="_blank">Budny et al. (2006) A novel X-linked recessive mental retardation syndrome comprising macrocephaly and ciliary dysfunction is allelic to oral-facial-digital type I syndrome</a> and encodes it as GA4GH phenopackets.</p>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.8.28


<h2>Importing HPO data</h2>

In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
PMID = "PMID:16783569"
title = "A novel X-linked recessive mental retardation syndrome comprising macrocephaly and ciliary dysfunction is allelic to oral-facial-digital type I syndrome"
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", pmid=PMID, pubmed_title=title)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2023-10-09


<h2>Importing the supplemental table</h2>

In [3]:
df = pd.read_excel('input/PMID_16783569.xlsx')

In [4]:
df.head()

Unnamed: 0,individual,sex,age,variant,Developmental delay,Abnormal respiratory system physiology,Macrocephaly,Recurrent respiratory infections,"Intellectual disability, severe",High palate,Low-set ears,Broad thumb,Brachydactyly,Obesity,Large for gestational age,Postaxial polydactyly,Inguinal hernia
0,1,male,11,c.2122_2125dupGAAG,+,+,+,+,+,+,+,+,+,+,-,,
1,2,male,0,c.2122_2125dupGAAG,+,+,+,+,+,,,,,,+,+,
2,3,male,0,c.2122_2125dupGAAG,+,+,+,+,,,,,+,,,,+
3,4,male,0,c.2122_2125dupGAAG,+,+,+,+,,,,,,,,,
4,5,male,0,c.2122_2125dupGAAG,+,+,+,+,,,,,,,,,


<h2>Column mappers</h2>

In [5]:
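# hpo_cr was already created in the HPO import cell above, so it is reused here.
# Try to map each column header to an HPO term; '+' marks observed and '-' excluded features.
generator = SimpleColumnMapperGenerator(df=df, observed='+', excluded='-', hpo_cr=hpo_cr)
column_mapper_d = generator.try_mapping_columns()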

In [7]:
display(HTML(generator.to_html()))

Result,Columns
Mapped,"Developmental delay; Abnormal respiratory system physiology; Macrocephaly; Recurrent respiratory infections; Intellectual disability, severe; High palate; Low-set ears; Broad thumb; Brachydactyly; Obesity; Large for gestational age; Postaxial polydactyly; Inguinal hernia"
Unmapped,individual; sex; age; variant
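
<p>To make the '+'/'-' convention concrete, the following is a minimal sketch of how one row of the table can be read. The helper <code>cell_status</code> is hypothetical and is not part of pyphetools; it only illustrates the assumption that '+' means observed, '-' means excluded, and an empty cell means the feature was not measured.</p>

```python
# Hypothetical helper (not a pyphetools function): classify one raw table cell.
def cell_status(cell, observed="+", excluded="-"):
    """Return 'observed', 'excluded', or 'not measured' for a raw cell value."""
    value = "" if cell is None else str(cell).strip()
    if value == observed:
        return "observed"
    if value == excluded:
        return "excluded"
    return "not measured"


# The 13 mapped column headers, in table order.
columns = [
    "Developmental delay", "Abnormal respiratory system physiology",
    "Macrocephaly", "Recurrent respiratory infections",
    "Intellectual disability, severe", "High palate", "Low-set ears",
    "Broad thumb", "Brachydactyly", "Obesity", "Large for gestational age",
    "Postaxial polydactyly", "Inguinal hernia",
]

# Phenotype cells of individual 1, copied from the df.head() output above
# (None stands for an empty cell).
row_1 = ["+"] * 10 + ["-", None, None]

statuses = {col: cell_status(cell) for col, cell in zip(columns, row_1)}
observed_terms = [c for c, s in statuses.items() if s == "observed"]
excluded_terms = [c for c, s in statuses.items() if s == "excluded"]

print(len(observed_terms), "observed;", "excluded:", excluded_terms)
```

<p>For individual 1 this yields ten observed features and one excluded feature (Large for gestational age), which agrees with the first row of the phenopacket summary table in this notebook.</p>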


<h2>Variant Data</h2>
<p>The OFD1 variant (HGVS cDNA notation) is listed in the <code>variant</code> column and is encoded relative to transcript NM_003611.3 on genome build hg38.</p>

In [9]:
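genome = 'hg38'  # the build that is actually passed to VariantValidator below
default_genotype = 'hemizygous'  # X-linked variant in affected males
transcript = 'NM_003611.3'

vvalidator = VariantValidator(genome_build=genome, transcript=transcript)
# Encode each distinct HGVS string once and store the result under its original text.
var_d = {}
for v in df['variant'].unique():
    var_d[v] = vvalidator.encode_hgvs(v)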

varMapper = VariantColumnMapper(variant_d=var_d,
                                variant_column_name='variant', 
                                default_genotype=default_genotype)

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_003611.3%3Ac.2122_2125dupGAAG/NM_003611.3?content-type=application%2Fjson
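
<p>The URL printed above can be reproduced with the standard library, which shows how the genome build, transcript, and HGVS string fit together. This is only a sketch: <code>build_query_url</code> is a hypothetical helper, and the code reconstructs the printed URL rather than describing how pyphetools assembles it internally.</p>

```python
from urllib.parse import quote, urlencode

# Base path taken from the URL printed by the notebook.
BASE_URL = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def build_query_url(genome_build, transcript, hgvs_c):
    """Hypothetical helper: rebuild the query URL for one HGVS expression."""
    # Percent-encode the 'transcript:variant' string, including the colon.
    variant = quote(f"{transcript}:{hgvs_c}", safe="")
    query = urlencode({"content-type": "application/json"})
    return f"{BASE_URL}/{genome_build}/{variant}/{transcript}?{query}"


url = build_query_url("hg38", "NM_003611.3", "c.2122_2125dupGAAG")
print(url)
```

<p>Note that the build in the path is hg38, matching the <code>genome_build</code> argument given to <code>VariantValidator</code>.</p>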


<h2>Demographic data</h2>

In [10]:
ageMapper = AgeColumnMapper.by_year('age')
ageMapper.preview_column(df['age'])

Unnamed: 0,original column contents,age
0,11,P11Y
1,0,P0Y
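
<p>The preview shows ages in years rendered as ISO 8601 durations. The conversion can be sketched as follows; <code>years_to_iso8601</code> is a hypothetical stand-in used for illustration and is not the pyphetools implementation.</p>

```python
def years_to_iso8601(years):
    """Hypothetical helper: render a whole number of years as an ISO 8601 duration."""
    years = int(years)
    if years < 0:
        raise ValueError("age in years must not be negative")
    return f"P{years}Y"


# The two distinct values that appear in the 'age' column preview.
preview = {age: years_to_iso8601(age) for age in (11, 0)}
print(preview)
```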


In [11]:
sexMapper = SexColumnMapper(male_symbol='male', female_symbol='female', column_name='sex')
sexMapper.preview_column(df['sex'])

Unnamed: 0,original column contents,sex
0,male,MALE
1,male,MALE
2,male,MALE
3,male,MALE
4,male,MALE
5,male,MALE
6,male,MALE
7,male,MALE
8,male,MALE


In [12]:
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=column_mapper_d, 
                        individual_column_name="individual", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        variant_mapper=varMapper, 
                        metadata=metadata,
                        pmid=PMID)
sgb2 = Disease(disease_id='OMIM:300209', disease_label='Simpson-Golabi-Behmel syndrome, type 2')
encoder.set_disease(sgb2)

In [15]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals,
                             ontology=hpo_ontology,
                             min_hpo=1,
                             allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(ontology=hpo_ontology, cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
INFORMATION,NOT_MEASURED,69
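
<p>The count of 69 can be checked by hand, assuming that NOT_MEASURED counts the empty cells across the 13 mapped HPO columns (an interpretation of the category name, not something confirmed by the library documentation). The per-individual numbers below are read off the phenopacket summary table in this notebook, counting observed and excluded features together.</p>

```python
# Number of mapped HPO columns (see the "Mapped" row of the column mapper output).
n_hpo_columns = 13

# Observed + excluded features per individual, for individuals 1 to 9,
# as listed in the phenopacket summary table.
recorded_per_individual = [11, 7, 6, 4, 4, 4, 4, 4, 4]

total_cells = n_hpo_columns * len(recorded_per_individual)
not_measured = total_cells - sum(recorded_per_individual)

print(f"{total_cells} cells, {sum(recorded_per_individual)} recorded, "
      f"{not_measured} not measured")
```

<p>Under that assumption, 9 individuals with 13 columns give 117 cells, of which 48 carry a '+' or '-', leaving 69 unmeasured, in agreement with the summary.</p>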


In [16]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
1 (MALE; P11Y),"Simpson-Golabi-Behmel syndrome, type 2 (OMIM:300209)",NM_003611.3:c.2122_2125dup (hemizygous),Global developmental delay (HP:0001263); Abnormal respiratory system physiology (HP:0002795); Macrocephaly (HP:0000256); Recurrent respiratory infections (HP:0002205); Intellectual disability (HP:0001249); High palate (HP:0000218); Low-set ears (HP:0000369); Broad thumb (HP:0011304); Brachydactyly (HP:0001156); Obesity (HP:0001513); excluded: Large for gestational age (HP:0001520)
2 (MALE; P0Y),"Simpson-Golabi-Behmel syndrome, type 2 (OMIM:300209)",NM_003611.3:c.2122_2125dup (hemizygous),Global developmental delay (HP:0001263); Abnormal respiratory system physiology (HP:0002795); Macrocephaly (HP:0000256); Recurrent respiratory infections (HP:0002205); Intellectual disability (HP:0001249); Large for gestational age (HP:0001520); Postaxial polydactyly (HP:0100259)
3 (MALE; P0Y),"Simpson-Golabi-Behmel syndrome, type 2 (OMIM:300209)",NM_003611.3:c.2122_2125dup (hemizygous),Global developmental delay (HP:0001263); Abnormal respiratory system physiology (HP:0002795); Macrocephaly (HP:0000256); Recurrent respiratory infections (HP:0002205); Brachydactyly (HP:0001156); Inguinal hernia (HP:0000023)
4 (MALE; P0Y),"Simpson-Golabi-Behmel syndrome, type 2 (OMIM:300209)",NM_003611.3:c.2122_2125dup (hemizygous),Global developmental delay (HP:0001263); Abnormal respiratory system physiology (HP:0002795); Macrocephaly (HP:0000256); Recurrent respiratory infections (HP:0002205)
5 (MALE; P0Y),"Simpson-Golabi-Behmel syndrome, type 2 (OMIM:300209)",NM_003611.3:c.2122_2125dup (hemizygous),Global developmental delay (HP:0001263); Abnormal respiratory system physiology (HP:0002795); Macrocephaly (HP:0000256); Recurrent respiratory infections (HP:0002205)
6 (MALE; P0Y),"Simpson-Golabi-Behmel syndrome, type 2 (OMIM:300209)",NM_003611.3:c.2122_2125dup (hemizygous),Global developmental delay (HP:0001263); Abnormal respiratory system physiology (HP:0002795); Macrocephaly (HP:0000256); Recurrent respiratory infections (HP:0002205)
7 (MALE; P0Y),"Simpson-Golabi-Behmel syndrome, type 2 (OMIM:300209)",NM_003611.3:c.2122_2125dup (hemizygous),Global developmental delay (HP:0001263); Abnormal respiratory system physiology (HP:0002795); Macrocephaly (HP:0000256); Recurrent respiratory infections (HP:0002205)
8 (MALE; P0Y),"Simpson-Golabi-Behmel syndrome, type 2 (OMIM:300209)",NM_003611.3:c.2122_2125dup (hemizygous),Global developmental delay (HP:0001263); Abnormal respiratory system physiology (HP:0002795); Macrocephaly (HP:0000256); Recurrent respiratory infections (HP:0002205)
9 (MALE; P0Y),"Simpson-Golabi-Behmel syndrome, type 2 (OMIM:300209)",NM_003611.3:c.2122_2125dup (hemizygous),Global developmental delay (HP:0001263); Abnormal respiratory system physiology (HP:0002795); Macrocephaly (HP:0000256); Recurrent respiratory infections (HP:0002205)


In [17]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              metadata=metadata,
                                              outdir=output_directory)

We output 9 GA4GH phenopackets to the directory phenopackets
