<h1>WFS1: Hildebrand, et al. (2008)</h1>
<p>We will process <a href="https://pubmed.ncbi.nlm.nih.gov/18688868/" target="_blank">Hildebrand, et al. (2008) Autoimmune Disease in a DFNA6/14/38 Family Carrying a Novel Missense Mutation in WFS1</a></p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.39


<h2>Importing HPO data</h2>

In [2]:
PMID = "PMID:18688868"
title = "Autoimmune disease in a DFNA6/14/38 family carrying a novel missense mutation in WFS1"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-01-16


<h2>Importing the supplemental table</h2>

In [3]:
df = pd.read_excel('input/PMID_18688868.xlsx')

In [4]:
df.head()

Unnamed: 0,patient_id,II:2,III:1,III:3,IV:2,IV:4,V:2
0,Sex,female,female,female,female,female,male
1,Age,97,55,69,38,43,17
2,Variant,c.2576G>A,c.2576G>A,c.2576G>A,c.2576G>A,c.2576G>A,c.2576G>A
3,Low-frequency sensorineural hearing impairment,+,+,+,+,+,+
4,Progressive sensorineural hearing impairment,+,+,+,+,+,+


<h1>Converting to row-based format</h1>
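<p>The supplemental table lists one individual per column and one attribute (sex, age, variant, clinical features) per row. The mappers used below work on one row per individual, so we transpose the table, use the first row as the column names, and copy the index into a patient_id column.</p>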

In [5]:
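dft = df.transpose()  # individuals become rows
dft.columns = dft.iloc[0]  # the first row holds the attribute names
dft.drop(dft.index[0], inplace=True)  # drop the row that is now the header
dft.columns = dft.columns.str.strip()  # remove stray whitespace from column names
dft = dft.dropna(axis=1, how='all')  # discard columns that are entirely empty
dft['patient_id'] = dft.index  # expose the individual identifiers as a column
dft.head()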

patient_id,Sex,Age,Variant,Low-frequency sensorineural hearing impairment,Progressive sensorineural hearing impairment,Graves disease,Crohn's disease,patient_id.1
II:2,female,97,c.2576G>A,+,+,-,-,II:2
III:1,female,55,c.2576G>A,+,+,+,-,III:1
III:3,female,69,c.2576G>A,+,+,-,-,III:3
IV:2,female,38,c.2576G>A,+,+,-,+,IV:2
IV:4,female,43,c.2576G>A,+,+,-,-,IV:4
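
<p>The reshaping step can be tried in isolation on a toy table. The sketch below uses plain pandas only; the helper name to_row_based and the two-individual table are illustrative assumptions, not part of pyphetools or of the supplemental file.</p>

```python
import pandas as pd

# Toy table in the same layout as the supplemental file:
# one column per individual, one row per attribute (values are illustrative).
wide = pd.DataFrame({
    "patient_id": ["Sex", "Age", "Graves disease"],
    "II:2": ["female", 97, "-"],
    "III:1": ["female", 55, "+"],
})


def to_row_based(wide_df: pd.DataFrame, label_column: str = "patient_id") -> pd.DataFrame:
    """Hypothetical helper: return one row per individual, one column per attribute."""
    # Use the attribute labels as the index, then swap rows and columns.
    long_df = wide_df.set_index(label_column).transpose()
    # Clean up the column labels and drop the leftover axis name.
    long_df.columns = long_df.columns.str.strip()
    long_df.columns.name = None
    # Keep the individual identifiers available as an ordinary column.
    long_df[label_column] = long_df.index
    return long_df


rows = to_row_based(wide)
print(rows)
```

<p>Setting the label column as the index before transposing avoids the separate "promote the first row to header, then drop it" steps, at the cost of assuming that the label column is present and named consistently.</p>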


In [6]:
generator = SimpleColumnMapperGenerator(df=dft, observed='+', excluded='-', hpo_cr=hpo_cr)
column_mapper_list = generator.try_mapping_columns()
display(HTML(generator.to_html()))

Result,Columns
Mapped,Low-frequency sensorineural hearing impairment; Progressive sensorineural hearing impairment; Graves disease; Crohn's disease
Unmapped,Sex; Age; Variant; patient_id


<h2>Variant Data</h2>
<p>The variant data (HGVS coding notation) is listed in the Variant column. We interpret it relative to the WFS1 transcript NM_006005.3 and validate it against the hg38 genome build.</p>

In [7]:
WFS1_transcript='NM_006005.3'
var_list = dft['Variant'].unique()
vvalidator = VariantValidator(genome_build='hg38', transcript=WFS1_transcript)
variant_d = {}
for v in var_list:
    var = vvalidator.encode_hgvs(v)
    variant_d[v] = var
    print(var)

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_006005.3%3Ac.2576G>A/NM_006005.3?content-type=application%2Fjson
NM_006005.3:c.2576G>A(chr4:6302371G>A)
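
<p>The request URL printed above can be reconstructed with the standard library, which makes it easier to check by hand what is being sent for a given variant. This is a sketch inferred from that printed URL only; the function name build_query_url is hypothetical and the code is not taken from pyphetools.</p>

```python
from urllib.parse import quote, urlencode

# Base path copied from the URL printed by the cell above.
BASE_URL = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def build_query_url(hgvs_c: str, transcript: str, genome_build: str = "hg38") -> str:
    """Hypothetical helper: compose a query URL in the shape shown in the output."""
    variant = f"{transcript}:{hgvs_c}"
    # Percent-encode the colon but leave '>' untouched, matching the printed URL.
    encoded_variant = quote(variant, safe=">")
    query = urlencode({"content-type": "application/json"})
    return f"{BASE_URL}/{genome_build}/{encoded_variant}/{transcript}?{query}"


url = build_query_url("c.2576G>A", "NM_006005.3")
print(url)
```

<p>The sketch only builds the string and does not contact the service, so it can be run offline.</p>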


In [8]:
varMapper = VariantColumnMapper(variant_d=variant_d, variant_column_name='Variant', default_genotype='heterozygous')

<h1>Demographic data</h1>
<p>pyphetools can be used to capture information about age, sex, and individual identifiers. This information is stored in a map of "IndividualMapper" objects. Special treatment may be required for the identifiers, which may be used as the column names or the row index.</p>

In [9]:
ageMapper = AgeColumnMapper.by_year('Age')
ageMapper.preview_column(dft)

Unnamed: 0,original column contents,age
0,97,P97Y
1,55,P55Y
2,69,P69Y
3,38,P38Y
4,43,P43Y
5,17,P17Y


In [10]:
sexMapper = SexColumnMapper(male_symbol='male', female_symbol='female', column_name='Sex')
sexMapper.preview_column(dft)

Unnamed: 0,original column contents,sex
0,female,FEMALE
1,female,FEMALE
2,female,FEMALE
3,female,FEMALE
4,female,FEMALE
5,male,MALE
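
<p>The two previews above amount to simple value conversions: ages in years become ISO 8601 durations and free-text sex labels become upper-case codes. The plain pandas sketch below mimics that behaviour for illustration; the helper names and the UNKNOWN_SEX fallback are assumptions and do not describe the internals of AgeColumnMapper or SexColumnMapper.</p>

```python
import pandas as pd


def years_to_iso8601(years) -> str:
    """Hypothetical helper: format an age in whole years as an ISO 8601 duration."""
    return f"P{int(years)}Y"


# Assumed mapping from the table's labels to the codes shown in the preview.
SEX_CODES = {"male": "MALE", "female": "FEMALE"}


def normalise_sex(value) -> str:
    """Hypothetical helper: map a free-text label to a code, falling back to UNKNOWN_SEX."""
    return SEX_CODES.get(str(value).strip().lower(), "UNKNOWN_SEX")


# Two individuals taken from the table above.
demo = pd.DataFrame({"Age": [97, 17], "Sex": ["female", "male"]}, index=["II:2", "V:2"])
preview = pd.DataFrame({
    "age": demo["Age"].map(years_to_iso8601),
    "sex": demo["Sex"].map(normalise_sex),
})
print(preview)
```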


In [11]:
encoder = CohortEncoder(df=dft, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="patient_id", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        variant_mapper=varMapper, 
                        metadata=metadata)
deafness_as6 = Disease(disease_id='OMIM:600965', disease_label='Deafness, autosomal dominant 6')
encoder.set_disease(deafness_as6)

In [12]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

In [13]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
II:2 (FEMALE; P97Y),"Deafness, autosomal dominant 6 (OMIM:600965)",NM_006005.3:c.2576G>A (heterozygous),Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); excluded: Graves disease (HP:0100647); excluded: Crohn's disease (HP:0100280)
III:1 (FEMALE; P55Y),"Deafness, autosomal dominant 6 (OMIM:600965)",NM_006005.3:c.2576G>A (heterozygous),Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); Graves disease (HP:0100647); excluded: Crohn's disease (HP:0100280)
III:3 (FEMALE; P69Y),"Deafness, autosomal dominant 6 (OMIM:600965)",NM_006005.3:c.2576G>A (heterozygous),Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); excluded: Graves disease (HP:0100647); excluded: Crohn's disease (HP:0100280)
IV:2 (FEMALE; P38Y),"Deafness, autosomal dominant 6 (OMIM:600965)",NM_006005.3:c.2576G>A (heterozygous),Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); Crohn's disease (HP:0100280); excluded: Graves disease (HP:0100647)
IV:4 (FEMALE; P43Y),"Deafness, autosomal dominant 6 (OMIM:600965)",NM_006005.3:c.2576G>A (heterozygous),Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); excluded: Graves disease (HP:0100647); excluded: Crohn's disease (HP:0100280)
V:2 (MALE; P17Y),"Deafness, autosomal dominant 6 (OMIM:600965)",NM_006005.3:c.2576G>A (heterozygous),Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); excluded: Graves disease (HP:0100647); excluded: Crohn's disease (HP:0100280)


In [14]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                             metadata=metadata,
                                             outdir=output_directory)

We output 6 GA4GH phenopackets to the directory phenopackets
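
<p>A quick sanity check is to read the exported files back and count the phenotypic features in each. The sketch below assumes that the phenopackets were written as JSON files with a .json extension and that they use the GA4GH field names id and phenotypicFeatures; the function name summarise_phenopackets is hypothetical.</p>

```python
import json
from pathlib import Path


def summarise_phenopackets(directory: str) -> list:
    """Hypothetical helper: return (phenopacket id, number of phenotypic features) per file."""
    summary = []
    # Assumption: one phenopacket per *.json file in the output directory.
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            packet = json.load(handle)
        features = packet.get("phenotypicFeatures", [])
        summary.append((packet.get("id", path.stem), len(features)))
    return summary


# Prints nothing if the directory is missing or contains no JSON files.
for packet_id, n_features in summarise_phenopackets("phenopackets"):
    print(packet_id, n_features)
```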
