<h1>Creation of phenopackets from tabular data (individuals in columns)</h1>
<p>In this notebook we process <a href="https://pubmed.ncbi.nlm.nih.gov/19800048/" target="_blank">Coene et al. (2009) OFD1 Is Mutated in X-Linked Joubert Syndrome and Interacts with LCA5-Encoded Lebercilin</a>.</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.4


<h2>Importing HPO data</h2>

In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
PMID = "PMID:19800048"
title = "OFD1 is mutated in X-linked Joubert syndrome and interacts with LCA5-encoded lebercilin"
cite = Citation(pmid=PMID, title=title)
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2023-10-09


<h2>Importing the supplemental table</h2>

In [3]:
df = pd.read_excel('input/PMID_19800048.xlsx')

In [4]:
df.head()

Unnamed: 0,Individual,Sex,Age,Genetic_variant,Polyhydramnios,"Intellectual disability, severe",Postaxial hand polydactyly,Postaxial foot polydactyly,Rod-cone dystrophy,Molar tooth sign on MRI,...,Frequent temper tantrums,Conductive hearing impairment,Cerebellar vermis hypoplasia,Low-set ears,Polyphagia,Obesity,Macrocephaly,Hypotonia,Dysmetria,Brachydactyly
0,III-9,male,34,c.2841_2847delAAAAGAC,,+,,,+,+,...,,,+,+,,,,,,
1,III-10,male,0,c.2841_2847delAAAAGAC,,+,,,+,,...,,,,,,,,,,
2,IV-2,male,0,c.2841_2847delAAAAGAC,,+,+,+,,,...,,,,,,,,,,
3,IV-3,male,0,c.2841_2847delAAAAGAC,,+,,,,,...,,,,,,,,,,
4,IV-4,male,0,c.2841_2847delAAAAGAC,,+,+,+,,,...,,,,,,,,,,


<h2>Column mappers</h2>
<p>Please see the notebook "Create phenopackets from tabular data with individuals in rows" for explanations. In the following cell, SimpleColumnMapperGenerator tries to match each column header to an HPO term and returns a dictionary of ColumnMappers. Cells containing '+' are treated as observed and cells containing '-' as excluded.</p>

In [5]:
hpo_cr = parser.get_hpo_concept_recognizer()
generator = SimpleColumnMapperGenerator(df=df, observed='+', excluded='-', hpo_cr=hpo_cr)
column_mapper_d = generator.try_mapping_columns()

In [6]:
display(HTML((generator.to_html())))

Result,Columns
Mapped,"Polyhydramnios; Intellectual disability, severe; Postaxial hand polydactyly; Postaxial foot polydactyly; Rod-cone dystrophy; Molar tooth sign on MRI; Microcephaly; Decreased body weight; Short stature; Tube feeding; Feeding difficulties; Motor delay; Hirsutism; Wide nasal bridge; Thick vermilion border; Absent speech; Recurrent fever; Frequent temper tantrums; Conductive hearing impairment; Cerebellar vermis hypoplasia; Low-set ears; Polyphagia; Obesity; Macrocephaly; Hypotonia; Dysmetria; Brachydactyly"
Unmapped,Individual; Sex; Age; Genetic_variant
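
<p>The observed/excluded convention passed to the generator can be sketched in plain Python. This is only an illustration, not the pyphetools implementation: the function name <code>classify_cell</code> and the example row are hypothetical, and treating empty cells as "not measured" is an assumption.</p>

```python
# Illustrative sketch only; classify_cell is a hypothetical helper,
# not part of pyphetools.
def classify_cell(cell, observed="+", excluded="-"):
    """Return 'observed', 'excluded', or 'not measured' for one table cell."""
    # Empty Excel cells are read by pandas as NaN, and NaN != NaN is True
    is_missing = cell is None or cell != cell
    text = "" if is_missing else str(cell).strip()
    if text == observed:
        return "observed"
    if text == excluded:
        return "excluded"
    # Assumption: anything else (including empty cells) counts as not measured
    return "not measured"


# Hypothetical row with one empty cell, one '+' and one '-'
row = {"Polyhydramnios": float("nan"), "Rod-cone dystrophy": "+", "Obesity": "-"}
statuses = {label: classify_cell(value) for label, value in row.items()}
print(statuses)
```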


<h2>Variant Data</h2>
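<p>The variant data (HGVS expressions relative to transcript NM_003611.2) is listed in the Genetic_variant column. We use VariantValidator with the hg38 genome build to encode each unique variant.</p>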

In [7]:
default_genotype = 'hemizygous'
transcript='NM_003611.2'
vvalidator = VariantValidator(genome_build="hg38", transcript=transcript)
var_d = {}
for v in df['Genetic_variant'].unique():
    var = vvalidator.encode_hgvs(v)
    var_d[v] = var

varMapper = VariantColumnMapper(variant_d=var_d,
                                variant_column_name='Genetic_variant', 
                                default_genotype=default_genotype)

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_003611.2%3Ac.2841_2847delAAAAGAC/NM_003611.2?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_003611.2%3Ac.2767delG/NM_003611.2?content-type=application%2Fjson
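
<p>The two request URLs printed above can be reproduced with the standard library alone. The sketch below only builds the URL string and sends no request; the helper name <code>variantvalidator_url</code> is hypothetical, and the URL layout is inferred from the printed output rather than taken from the VariantValidator documentation.</p>

```python
from urllib.parse import quote

BASE_URL = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def variantvalidator_url(hgvs, transcript="NM_003611.2", genome_build="hg38"):
    """Build a request URL shaped like the ones printed by the cell above."""
    # safe="" makes quote() percent-encode ':' and '/' as well
    description = quote(f"{transcript}:{hgvs}", safe="")
    content_type = quote("application/json", safe="")
    return (f"{BASE_URL}/{genome_build}/{description}/{transcript}"
            f"?content-type={content_type}")


url = variantvalidator_url("c.2767delG")
print(url)
```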


<h2>Demographic data</h2>

In [8]:
ageMapper = AgeColumnMapper.by_year('Age')
ageMapper.preview_column(df['Age'])

Unnamed: 0,original column contents,age
0,34,P34Y
1,0,P0Y
2,12,P12Y
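
<p>The preview shows ages in years rendered as ISO 8601 durations. A minimal sketch of that conversion follows, assuming whole-year ages; <code>years_to_iso8601</code> is a hypothetical helper and not the AgeColumnMapper code.</p>

```python
def years_to_iso8601(years):
    """Format a whole number of years as an ISO 8601 duration, e.g. 34 -> 'P34Y'."""
    years = int(years)
    if years < 0:
        raise ValueError("age in years must not be negative")
    return f"P{years}Y"


# The three distinct values shown in the preview above
previews = [years_to_iso8601(age) for age in (34, 0, 12)]
print(previews)
```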


In [9]:
sexMapper = SexColumnMapper(male_symbol='male', female_symbol='female', column_name='Sex')
sexMapper.preview_column(df['Sex'])

Unnamed: 0,original column contents,sex
0,male,MALE
1,male,MALE
2,male,MALE
3,male,MALE
4,male,MALE
5,male,MALE
6,male,MALE
7,male,MALE
8,male,MALE


In [10]:

encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=column_mapper_d, 
                        individual_column_name="Individual", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        variant_mapper=varMapper, 
                        metadata=metadata)
j10 = Disease(disease_id='OMIM:300804', disease_label='Joubert syndrome 10')
encoder.set_disease(disease=j10)

In [11]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,2
INFORMATION,NOT_MEASURED,188
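
<p>As a rough illustration of a redundancy check, assume that REDUNDANT means an individual was annotated with both a term and one of its ancestors, so that the ancestor adds no information. The sketch below uses a toy child-to-parent map rather than the HPO ontology object. The single parent link and the function names are assumptions made for the example.</p>

```python
# Toy child -> parent map; the link below is assumed for illustration only
PARENT = {"Tube feeding": "Feeding difficulties"}


def ancestors(term):
    """Collect all ancestors of a term by walking up the toy parent map."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def drop_redundant(terms):
    """Remove every term that is an ancestor of another term in the list."""
    implied = set()
    for term in terms:
        implied |= ancestors(term)
    return [term for term in terms if term not in implied]


observed = ["Feeding difficulties", "Tube feeding", "Absent speech"]
kept = drop_redundant(observed)
print(kept)
```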


In [12]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
III-9 (MALE; P34Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Rod-cone dystrophy (HP:0000510); Molar tooth sign on MRI (HP:0002419); Thick vermilion border (HP:0012471); Cerebellar vermis hypoplasia (HP:0001320); Low-set ears (HP:0000369)
III-10 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Rod-cone dystrophy (HP:0000510)
IV-2 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830)
IV-3 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249)
IV-4 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830)
IV-5 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830)
IV-6 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830)
IV-10 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830); Rod-cone dystrophy (HP:0000510); Molar tooth sign on MRI (HP:0002419); Microcephaly (HP:0000252); Decreased body weight (HP:0004325); Short stature (HP:0004322); Tube feeding (HP:0033454); Motor delay (HP:0001270); Hirsutism (HP:0001007); Wide nasal bridge (HP:0000431); Thick vermilion border (HP:0012471); Absent speech (HP:0001344); Recurrent fever (HP:0001954); Frequent temper tantrums (HP:0025161); Conductive hearing impairment (HP:0000405); Cerebellar vermis hypoplasia (HP:0001320); Low-set ears (HP:0000369)
UW87 (MALE; P12Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2767del (hemizygous),Polyhydramnios (HP:0001561); Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830); Molar tooth sign on MRI (HP:0002419); Tube feeding (HP:0033454); Absent speech (HP:0001344); Polyphagia (HP:0002591); Obesity (HP:0001513); Macrocephaly (HP:0000256); Hypotonia (HP:0001252); Dysmetria (HP:0001310); Brachydactyly (HP:0001156)


In [13]:
from IPython.display import HTML, display

phenopackets = [i.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh()) for i in individuals]
table = PhenopacketTable(phenopacket_list=phenopackets)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
III-9 (MALE; P34Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Rod-cone dystrophy (HP:0000510); Molar tooth sign on MRI (HP:0002419); Thick vermilion border (HP:0012471); Cerebellar vermis hypoplasia (HP:0001320); Low-set ears (HP:0000369)
III-10 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Rod-cone dystrophy (HP:0000510)
IV-2 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830)
IV-3 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249)
IV-4 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830)
IV-5 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830)
IV-6 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830)
IV-10 (MALE; P0Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2844_2850del (hemizygous),Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830); Rod-cone dystrophy (HP:0000510); Molar tooth sign on MRI (HP:0002419); Microcephaly (HP:0000252); Decreased body weight (HP:0004325); Short stature (HP:0004322); Tube feeding (HP:0033454); Motor delay (HP:0001270); Hirsutism (HP:0001007); Wide nasal bridge (HP:0000431); Thick vermilion border (HP:0012471); Absent speech (HP:0001344); Recurrent fever (HP:0001954); Frequent temper tantrums (HP:0025161); Conductive hearing impairment (HP:0000405); Cerebellar vermis hypoplasia (HP:0001320); Low-set ears (HP:0000369)
UW87 (MALE; P12Y),Joubert syndrome 10 (OMIM:300804),NM_003611.2:c.2767del (hemizygous),Polyhydramnios (HP:0001561); Intellectual disability (HP:0001249); Postaxial hand polydactyly (HP:0001162); Postaxial foot polydactyly (HP:0001830); Molar tooth sign on MRI (HP:0002419); Tube feeding (HP:0033454); Absent speech (HP:0001344); Polyphagia (HP:0002591); Obesity (HP:0001513); Macrocephaly (HP:0000256); Hypotonia (HP:0001252); Dysmetria (HP:0001310); Brachydactyly (HP:0001156)


In [14]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              metadata=metadata,
                                              outdir=output_directory)

We output 9 GA4GH phenopackets to the directory phenopackets
