<h1>Creation of phenopackets from tabular data (individuals in columns)</h1>
<p>This notebook processes the data from <a href="https://pubmed.ncbi.nlm.nih.gov/11179005/" target="__blank">Ferrante et al. (2001) Identification of the gene for oral-facial-digital type I syndrome</a>.</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.66


<h2>Importing HPO data</h2>

In [2]:
PMID = "PMID:11179005"
title = "Identification of the gene for oral-facial-digital type I syndrome"
cite = Citation(pmid=PMID, title=title)
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", citation=cite)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()

metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-03-06


<h2>Importing the supplemental table</h2>

In [3]:
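df = pd.read_excel('input/PMID_11179005.xlsx')
# Copy the case identifiers into a column with a simpler name; it serves as the individual identifier below
df['Individual'] = df['(CASE)']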

In [4]:
df.head(2)

Unnamed: 0,(CASE),sex,age,Onset,variant,Downslanted palpebral fissures,Dolichocephaly,Facial asymmetry,Localized skin lesion,Dental anomalies,...,Dry hair,Coarse hair,Alopecia,Short 2nd toe,Syndactyly,Polycystic kidney dysplasia,Polydactyly,Clinodactyly,Brachydactyly,Individual
0,1 (F),male,1,Congenital onset,c.1303A>C,+,+,-,+,+,...,-,+,+,-,-,-,-,-,-,1 (F)
1,3 (F),male,3,Congenital onset,c.312delG,-,-,-,-,-,...,-,-,-,+,-,+,+,,-,3 (F)


<h2>Column mappers</h2>
<p>Please see the notebook "Create phenopackets from tabular data with individuals in rows" for explanations. In the following cell, we use SimpleColumnMapperGenerator to create a list of ColumnMappers for every column whose header can be matched to an HPO term. Cells containing "+" are treated as observed and cells containing "-" as excluded. The generator reports which columns were mapped and which were left unmapped.</p>

In [5]:
generator = SimpleColumnMapperGenerator(df=df, observed="+", excluded="-", hpo_cr=hpo_cr)
column_mapper_list = generator.try_mapping_columns()
display(HTML(generator.to_html()))

Result,Columns
Mapped,Downslanted palpebral fissures; Dolichocephaly; Facial asymmetry; Localized skin lesion; Dental anomalies; Oral cleft; Brain imaging abnormality; Developmental delay; Hepatic cysts; Dry hair; Coarse hair; Alopecia; Short 2nd toe; Syndactyly; Polycystic kidney dysplasia; Polydactyly; Clinodactyly; Brachydactyly
Unmapped,(CASE); sex; age; Onset; variant; Individual
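
<p>As a rough illustration of what the observed/excluded symbols mean for a single cell, the sketch below classifies three cells taken from the second row shown by <code>df.head(2)</code>. It is plain Python and not the pyphetools implementation; the function name <code>classify_cell</code> and the status labels are hypothetical, and treating any other cell content as "not measured" is an assumption.</p>

```python
def classify_cell(cell, observed="+", excluded="-"):
    """Classify one table cell as observed, excluded or not measured.

    Hypothetical helper for illustration only; not part of pyphetools.
    """
    # Empty spreadsheet cells arrive as NaN (a float) or None, not as strings
    if not isinstance(cell, str):
        return "not_measured"
    value = cell.strip()
    if value == observed:
        return "observed"
    if value == excluded:
        return "excluded"
    # Anything else (including an empty string) is treated as not measured
    return "not_measured"


# Three cells from individual "3 (F)" in the table above
row = {"Polydactyly": "+", "Clinodactyly": float("nan"), "Brachydactyly": "-"}
statuses = {column: classify_cell(cell) for column, cell in row.items()}
print(statuses)
```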


<h2>Variant Data</h2>
<p>The OFD1 variants are listed in the variant column as HGVS coding-sequence (c.) expressions, which we interpret relative to the transcript NM_003611.3.</p>

In [6]:
genome = 'hg19'
default_genotype = 'hemizygous'
ofd1_transcript='NM_003611.3'
ofd1_id = "HGNC:2567"
vman = VariantManager(df=df, individual_column_name="Individual",
                      gene_id=ofd1_id, gene_symbol="OFD1",
                      allele_1_column_name='variant', transcript=ofd1_transcript)

var_d = vman.get_variant_d()
varMapper = VariantColumnMapper(variant_d=var_d,
                                variant_column_name='variant', 
                                default_genotype=default_genotype)
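
<p>The genotype strings that appear in the final phenopacket table combine the transcript with the c. expression from the variant column. A minimal sketch of that string composition is shown here; the helper <code>full_hgvs</code> is hypothetical and does no validation beyond a prefix check. Any normalization of the variant (for example, <code>c.312delG</code> in the input appears as <code>c.312+1del</code> in the final table) is not reproduced by this sketch.</p>

```python
def full_hgvs(transcript: str, c_dot: str) -> str:
    """Join a transcript accession and a c. expression into one HGVS string.

    Hypothetical helper for illustration only; not part of pyphetools.
    """
    c_dot = c_dot.strip()
    if not c_dot.startswith("c."):
        raise ValueError(f"Expected a coding-sequence expression, got '{c_dot}'")
    return f"{transcript}:{c_dot}"


ofd1_transcript = "NM_003611.3"
hgvs = full_hgvs(ofd1_transcript, "c.1303A>C")
print(hgvs)  # NM_003611.3:c.1303A>C
```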

<h2>Demographic data</h2>

In [7]:
encounterMapper = AgeColumnMapper.by_year('age')
#encounterMapper.preview_column(df)
onsetMapper = AgeColumnMapper.hpo_onset("Onset")
#onsetMapper.preview_column(df)
sexMapper = SexColumnMapper(male_symbol='male', female_symbol='female', column_name='sex')
#sexMapper.preview_column(df)
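
<p>The age column holds whole years, and the phenopacket table later in this notebook shows the corresponding ages as ISO 8601 durations such as P1Y. The conversion can be sketched as follows; <code>years_to_iso8601</code> is a hypothetical helper and not the pyphetools code, and it assumes the input is a non-negative whole number of years.</p>

```python
def years_to_iso8601(age_in_years) -> str:
    """Format a whole number of years as an ISO 8601 duration, e.g. 3 -> 'P3Y'.

    Hypothetical helper for illustration only; not part of pyphetools.
    """
    years = int(age_in_years)
    if years < 0:
        raise ValueError("Age must not be negative")
    return f"P{years}Y"


# Ages (in years) of the seven individuals in this cohort
ages = [1, 3, 4, 6, 10, 27, 28]
iso_ages = [years_to_iso8601(age) for age in ages]
print(iso_ages)
```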

In [8]:
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="Individual", 
                        age_of_onset_mapper=onsetMapper,
                        age_at_last_encounter_mapper=encounterMapper,
                        sexmapper=sexMapper,
                        variant_mapper=varMapper, 
                        metadata=metadata)
ofd1 = Disease(disease_id='OMIM:311200', disease_label='Orofaciodigital syndrome I')
encoder.set_disease(ofd1)

In [9]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
INFORMATION,NOT_MEASURED,12


In [10]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
1 (F) (MALE; P1Y),Orofaciodigital syndrome I (OMIM:311200),NM_003611.3:c.1303A>C (hemizygous),Downslanted palpebral fissures (HP:0000494); Dolichocephaly (HP:0000268); Localized skin lesion (HP:0011355); Abnormality of the dentition (HP:0000164); Orofacial cleft (HP:0000202); Brain imaging abnormality (HP:0410263); Global developmental delay (HP:0001263); Coarse hair (HP:0002208); Alopecia (HP:0001596); excluded: Facial asymmetry (HP:0000324); excluded: Hepatic cysts (HP:0001407); excluded: Dry hair (HP:0011359); excluded: Short 2nd toe (HP:0001885); excluded: Syndactyly (HP:0001159); excluded: Polycystic kidney dysplasia (HP:0000113); excluded: Polydactyly (HP:0010442); excluded: Clinodactyly (HP:0030084); excluded: Brachydactyly (HP:0001156)
3 (F) (MALE; P3Y),Orofaciodigital syndrome I (OMIM:311200),NM_003611.3:c.312+1del (hemizygous),Orofacial cleft (HP:0000202); Brain imaging abnormality (HP:0410263); Short 2nd toe (HP:0001885); Polycystic kidney dysplasia (HP:0000113); Polydactyly (HP:0010442); excluded: Downslanted palpebral fissures (HP:0000494); excluded: Dolichocephaly (HP:0000268); excluded: Facial asymmetry (HP:0000324); excluded: Localized skin lesion (HP:0011355); excluded: Abnormality of the dentition (HP:0000164); excluded: Global developmental delay (HP:0001263); excluded: Hepatic cysts (HP:0001407); excluded: Dry hair (HP:0011359); excluded: Coarse hair (HP:0002208); excluded: Alopecia (HP:0001596); excluded: Syndactyly (HP:0001159); excluded: Brachydactyly (HP:0001156)
4 (F) (MALE; P4Y),Orofaciodigital syndrome I (OMIM:311200),NM_003611.3:c.294_312del (hemizygous),Facial asymmetry (HP:0000324); Orofacial cleft (HP:0000202); Brain imaging abnormality (HP:0410263); Global developmental delay (HP:0001263); Hepatic cysts (HP:0001407); Dry hair (HP:0011359); Alopecia (HP:0001596); Syndactyly (HP:0001159); Polycystic kidney dysplasia (HP:0000113); Clinodactyly (HP:0030084); excluded: Downslanted palpebral fissures (HP:0000494); excluded: Dolichocephaly (HP:0000268); excluded: Localized skin lesion (HP:0011355); excluded: Abnormality of the dentition (HP:0000164); excluded: Coarse hair (HP:0002208); excluded: Short 2nd toe (HP:0001885); excluded: Polydactyly (HP:0010442); excluded: Brachydactyly (HP:0001156)
6 (S) (MALE; P6Y),Orofaciodigital syndrome I (OMIM:311200),NM_003611.3:c.121C>T (hemizygous),Localized skin lesion (HP:0011355); Orofacial cleft (HP:0000202); Syndactyly (HP:0001159); excluded: Downslanted palpebral fissures (HP:0000494); excluded: Dolichocephaly (HP:0000268); excluded: Facial asymmetry (HP:0000324); excluded: Abnormality of the dentition (HP:0000164); excluded: Hepatic cysts (HP:0001407); excluded: Dry hair (HP:0011359); excluded: Coarse hair (HP:0002208); excluded: Alopecia (HP:0001596); excluded: Short 2nd toe (HP:0001885); excluded: Polycystic kidney dysplasia (HP:0000113); excluded: Polydactyly (HP:0010442); excluded: Clinodactyly (HP:0030084); excluded: Brachydactyly (HP:0001156)
10 (S) (MALE; P10Y),Orofaciodigital syndrome I (OMIM:311200),NM_003611.3:c.1071_1078del (hemizygous),Orofacial cleft (HP:0000202); Syndactyly (HP:0001159); excluded: Downslanted palpebral fissures (HP:0000494); excluded: Dolichocephaly (HP:0000268); excluded: Abnormality of the dentition (HP:0000164); excluded: Dry hair (HP:0011359); excluded: Short 2nd toe (HP:0001885); excluded: Polycystic kidney dysplasia (HP:0000113); excluded: Polydactyly (HP:0010442); excluded: Clinodactyly (HP:0030084); excluded: Brachydactyly (HP:0001156)
27 (S) (MALE; P27Y),Orofaciodigital syndrome I (OMIM:311200),NM_003611.3:c.312+2del (hemizygous),Localized skin lesion (HP:0011355); Abnormality of the dentition (HP:0000164); Orofacial cleft (HP:0000202); Global developmental delay (HP:0001263); Polycystic kidney dysplasia (HP:0000113); excluded: Downslanted palpebral fissures (HP:0000494); excluded: Dolichocephaly (HP:0000268); excluded: Facial asymmetry (HP:0000324); excluded: Hepatic cysts (HP:0001407); excluded: Dry hair (HP:0011359); excluded: Coarse hair (HP:0002208); excluded: Alopecia (HP:0001596); excluded: Short 2nd toe (HP:0001885); excluded: Syndactyly (HP:0001159); excluded: Polydactyly (HP:0010442); excluded: Clinodactyly (HP:0030084); excluded: Brachydactyly (HP:0001156)
28 (S) (MALE; P28Y),Orofaciodigital syndrome I (OMIM:311200),NM_003611.3:c.1757del (hemizygous),Localized skin lesion (HP:0011355); Abnormality of the dentition (HP:0000164); Orofacial cleft (HP:0000202); Global developmental delay (HP:0001263); Alopecia (HP:0001596); Clinodactyly (HP:0030084); Brachydactyly (HP:0001156); excluded: Downslanted palpebral fissures (HP:0000494); excluded: Dolichocephaly (HP:0000268); excluded: Facial asymmetry (HP:0000324); excluded: Hepatic cysts (HP:0001407); excluded: Dry hair (HP:0011359); excluded: Coarse hair (HP:0002208); excluded: Short 2nd toe (HP:0001885); excluded: Syndactyly (HP:0001159); excluded: Polycystic kidney dysplasia (HP:0000113); excluded: Polydactyly (HP:0010442)
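
<p>The 12 NOT_MEASURED entries reported in the QC summary can be cross-checked against the table above. Eighteen columns were mapped, so an individual with fewer than 18 observed plus excluded terms has that many cells without a "+" or "-". The sketch below does this arithmetic with counts read off the table by hand; that NOT_MEASURED corresponds exactly to such cells is an assumption, although the totals agree.</p>

```python
N_MAPPED_COLUMNS = 18  # number of columns listed as "Mapped" by the generator

# (observed, excluded) term counts per individual, counted from the table above
recorded = {
    "1 (F)": (9, 9),
    "3 (F)": (5, 12),
    "4 (F)": (10, 8),
    "6 (S)": (3, 13),
    "10 (S)": (2, 9),
    "27 (S)": (5, 12),
    "28 (S)": (7, 10),
}

# Cells that are neither observed nor excluded, per individual
missing = {
    name: N_MAPPED_COLUMNS - (n_observed + n_excluded)
    for name, (n_observed, n_excluded) in recorded.items()
}
total_missing = sum(missing.values())
print(missing)
print(f"Total cells without '+' or '-': {total_missing}")
```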


In [11]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                             metadata=metadata,
                                             outdir=output_directory)

We output 7 GA4GH phenopackets to the directory phenopackets
