<h1>WWOX: Gribaa (2007)</h1>
<p>This notebook encodes the four individuals reported in <a href="https://pubmed.ncbi.nlm.nih.gov/17470496/" target="_blank">Gribaa et al. (2007) A new form of childhood onset, autosomal recessive spinocerebellar ataxia and epilepsy is localized at 16q21-q23</a> as GA4GH phenopackets.</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.39


<h2>Importing HPO data</h2>

In [2]:
PMID = "PMID:17470496"
title = "A new form of childhood onset, autosomal recessive spinocerebellar ataxia and epilepsy is localized at 16q21-q23"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-01-16


<h2>Importing the supplemental table</h2>

In [3]:
df = pd.read_excel('input/PMID_17470496.xlsx')

In [4]:
df.head()

Unnamed: 0,Patient,II,II2,II3,II4
0,Sex,female,female,male,female
1,Age,19,18,16,10
2,Seizures,+,+,+,+
3,Motor delay,+,+,+,+
4,Developmental delay,+,+,+,+


In [5]:
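# Transpose the table so that each row is one patient and each column one attribute.
# After reset_index(), the original column headers (II, II2, ...) are in the 'index' column.
df = df.set_index('Patient').T.reset_index()
# Use the positional row number (0, 1, 2, ...) as the individual identifier
df['patient_id'] = df.index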
df.head()

Patient,index,Sex,Age,Seizures,Motor delay,Developmental delay,Ataxia,Gait ataxia,Dysarthria,Hyporeflexia,Impaired continence,Nystagmus,Variant,patient_id
0,II,female,19,+,+,+,+,+,+,+,-,+,c.139C>A,0
1,II2,female,18,+,+,+,+,+,+,+,-,+,c.139C>A,1
2,II3,male,16,+,+,+,+,+,+,+,-,+,c.139C>A,2
3,II4,female,10,+,+,+,+,+,+,+,+,+,c.139C>A,3
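
<p>The reshaping step above can be checked on a small toy table. This is a minimal sketch: the names <code>toy</code> and <code>wide</code> and the three-row table are made up for illustration and are not part of the original input file.</p>

```python
import pandas as pd

# Toy table laid out like the supplemental table:
# one row per attribute, one column per patient (hypothetical data).
toy = pd.DataFrame({
    "Patient": ["Sex", "Age", "Seizures"],
    "II": ["female", 19, "+"],
    "II2": ["female", 18, "+"],
})

# Attribute names become the column headers; patients become the rows.
wide = toy.set_index("Patient").T.reset_index()
# The former column headers (II, II2) now live in the 'index' column.
wide["patient_id"] = wide.index

print(wide.columns.tolist())
print(wide["patient_id"].tolist())
```

<p>Note that <code>patient_id</code> holds the row position, while the pedigree labels remain available in the <code>index</code> column.</p>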


In [6]:
generator = SimpleColumnMapperGenerator(df=df, observed='+', excluded='-', hpo_cr=hpo_cr)
column_mapper_list = generator.try_mapping_columns()
display(HTML(generator.to_html()))

Result,Columns
Mapped,Seizures; Motor delay; Developmental delay; Ataxia; Gait ataxia; Dysarthria; Hyporeflexia; Impaired continence; Nystagmus
Unmapped,index; Sex; Age; Variant; patient_id
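
<p>The convention used in the cell above ('+' for observed, '-' for excluded) can be illustrated with a small stand-alone function. This is only a sketch of the idea, not the pyphetools implementation; the function <code>classify</code> and the example row are hypothetical.</p>

```python
def classify(cell, observed="+", excluded="-"):
    """Map a table cell to a phenotype status (illustrative helper only)."""
    value = str(cell).strip()
    if value == observed:
        return "observed"
    if value == excluded:
        return "excluded"
    # Anything else (empty cell, free text) is treated as not recorded.
    return "not recorded"

# Hypothetical row in the style of the transposed table
row = {"Seizures": "+", "Impaired continence": "-", "Nystagmus": " + "}
statuses = {column: classify(cell) for column, cell in row.items()}
print(statuses)
```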


<h2>Variant Data</h2>

In [7]:
genome = 'hg38'
# Autosomal recessive disease: all individuals are recorded as homozygous for the variant
default_genotype = 'homozygous'
WWOX_transcript = 'NM_016373.2'
vvalidator = VariantValidator(genome_build=genome, transcript=WWOX_transcript)
var = vvalidator.encode_hgvs("c.139C>A")
var_d = {"c.139C>A": var}
varMapper = VariantColumnMapper(variant_d=var_d, variant_column_name='Variant', default_genotype=default_genotype)

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016373.2%3Ac.139C>A/NM_016373.2?content-type=application%2Fjson


<h2>Demographic data</h2>

In [8]:
ageMapper = AgeColumnMapper.by_year('Age')
#ageMapper.preview_column(df)
sexMapper = SexColumnMapper(male_symbol='male', female_symbol='female', column_name='Sex')
#sexMapper.preview_column(df)
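
<p>Ages given in whole years appear in the final table as ISO 8601 durations (for example P19Y). A minimal sketch of that conversion follows; the helper <code>years_to_iso8601</code> is hypothetical and is not how pyphetools necessarily implements it.</p>

```python
def years_to_iso8601(years):
    """Format an age in whole years as an ISO 8601 duration (illustrative helper)."""
    years = int(years)
    if years < 0:
        raise ValueError("age cannot be negative")
    return f"P{years}Y"

# Ages from the 'Age' column of the table
ages = [19, 18, 16, 10]
iso_ages = [years_to_iso8601(a) for a in ages]
print(iso_ages)
```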

In [9]:
encoder = CohortEncoder(df=df, hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="patient_id", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        variant_mapper=varMapper, 
                        metadata=metadata)
sca12 = Disease(disease_id='OMIM:614322', disease_label='Spinocerebellar ataxia, autosomal recessive 12')
encoder.set_disease(sca12)

In [10]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.BI_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,4
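
<p>The four REDUNDANT warnings are consistent with each individual carrying both Ataxia and its more specific descendant Gait ataxia: the Ataxia column was mapped, yet only Gait ataxia (HP:0002066) appears in the final table. The sketch below shows the general idea of dropping a term when a more specific observed term implies it. It is not the pyphetools implementation; the <code>PARENT</code> map is a hand-written, single-parent simplification of the HPO hierarchy, and the function names are hypothetical.</p>

```python
# Hand-written fragment of the HPO hierarchy (child -> parent), for illustration only.
# The real ontology allows several parents per term.
PARENT = {
    "HP:0002066": "HP:0001251",  # Gait ataxia -> Ataxia
}

def ancestors(term):
    """Collect all ancestors of a term by walking up the toy parent map."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found

def drop_redundant(observed):
    """Remove observed terms that are ancestors of another observed term."""
    observed_set = set(observed)
    redundant = set()
    for term in observed:
        redundant |= ancestors(term) & observed_set
    return [term for term in observed if term not in redundant]

# Seizure, Ataxia, Gait ataxia
observed = ["HP:0001250", "HP:0001251", "HP:0002066"]
cleaned = drop_redundant(observed)
print(cleaned)
```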


In [11]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
0 (FEMALE; P19Y),"Spinocerebellar ataxia, autosomal recessive 12 (OMIM:614322)",NM_016373.2:c.139C>A (homozygous),Seizure (HP:0001250); Motor delay (HP:0001270); Global developmental delay (HP:0001263); Gait ataxia (HP:0002066); Dysarthria (HP:0001260); Hyporeflexia (HP:0001265); Nystagmus (HP:0000639); excluded: Impaired continence (HP:0031064)
1 (FEMALE; P18Y),"Spinocerebellar ataxia, autosomal recessive 12 (OMIM:614322)",NM_016373.2:c.139C>A (homozygous),Seizure (HP:0001250); Motor delay (HP:0001270); Global developmental delay (HP:0001263); Gait ataxia (HP:0002066); Dysarthria (HP:0001260); Hyporeflexia (HP:0001265); Nystagmus (HP:0000639); excluded: Impaired continence (HP:0031064)
2 (MALE; P16Y),"Spinocerebellar ataxia, autosomal recessive 12 (OMIM:614322)",NM_016373.2:c.139C>A (homozygous),Seizure (HP:0001250); Motor delay (HP:0001270); Global developmental delay (HP:0001263); Gait ataxia (HP:0002066); Dysarthria (HP:0001260); Hyporeflexia (HP:0001265); Nystagmus (HP:0000639); excluded: Impaired continence (HP:0031064)
3 (FEMALE; P10Y),"Spinocerebellar ataxia, autosomal recessive 12 (OMIM:614322)",NM_016373.2:c.139C>A (homozygous),Seizure (HP:0001250); Motor delay (HP:0001270); Global developmental delay (HP:0001263); Gait ataxia (HP:0002066); Dysarthria (HP:0001260); Hyporeflexia (HP:0001265); Impaired continence (HP:0031064); Nystagmus (HP:0000639)


In [12]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              metadata=metadata,
                                              outdir=output_directory)

We output 4 GA4GH phenopackets to the directory phenopackets
