<h1> Explore creation of phenopackets from supplemental material</h1>
<p>Let's take <a href="https://pubmed.ncbi.nlm.nih.gov/30612693/" target="__blank">Platzer K., et al. (2019) De Novo Variants in MAPK8IP3 Cause Intellectual Disability with Variable Brain Anomalies</a> as an example</p>
<p>pyphetools provides a convenient way of extracting HPO terms from typical tables presented in supplemental material, in which columns either contain yes/no/not-observed indications for a specific phenotypic feature or contain variously formated strings with one or multiple phenotypic features. pyphetools uses text mining to capture as many <a href="https://hpo.jax.org/app/" target="__blank">HPO</a> terms as possible and allows users to specific optional dictionaries that map words or phrases used in the table to the primary labels of HPO terms.</p>
<p>Users can work on one column at a time and then generate a collection of <a href="https://pubmed.ncbi.nlm.nih.gov/35705716/" target="__blank">GA4GH phenopackets</a> to represent each patient included in the original supplemental material. These phenopackets can then be used for a variety of downstream applications.</p>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import os
import sys

sys.path.insert(0, os.path.abspath('../../pyphetools'))
from pyphetools import *

<h2>Importing HPO data</h2>
<p>pyphetools uses the Human Phenotype Ontology (HPO) to encode phenotypic features. The recommended way of doing this is to ingest the hp.json file using HpoParser, which in turn creates an HpoConceptRecognizer object. </p>

In [2]:
hpo_json_path = '/home/peter/data/hpo/hp.json'
parser = HpoParser(hpo_json_file=hpo_json_path)
hpo_cr = parser.get_hpo_concept_recognizer()

Length of valid_node_curies 16425


<h2>Importing the supplemental table</h2>
<p>Supplemental Table S1 of the Platzer et al (2019) paper is an Excel file that is included in the data subfolder and contains Detailed Clinical Information for All Individuals with Causative De Novo Variants in MAPK8IP3. We need to read this from the original Excel file because some of the cells contain new-line symbols.</p>
<p>Here, we use the pandas library to import this file (note that the Python package called openpyxl must be installed to read Excel files with pandas, although the library does not need to be imported in this notebook). pyphetools expects a pandas DataFrame as input, and users can choose any input format available for pandas include CSV, TSV, and Excel, or can use any other method to transform their input data into a Pandas DataFrame before using pyphetools.</p>

In [3]:
df = pd.read_excel('data/mmc2.xlsx')

<h2>Stepwise encoding of the supplementary material</h2>
<p>pyphetools supports efficient and accurate HPO-encoding of typical supplementary tables that describe a cohort of 
individuals found to have a certain disease. We recommend using the <tt>df.head()</tt> and the <tt>df.columns</tt> commands
to view the data and the column headers as follows.</p>

In [4]:
df.head()

Unnamed: 0,Indvidual\nin\nmanuscript,g.(hg19) Chr16:,Transcript\nNM_015133.4\nc.,p.,origin,genetic testing,Sex,age at last assesment,prenatal period,Exam at birth,...,neurological examination,result of external MRI,seizures,Sz onset and Sz types,AEDs used,Sz outcome,EEG,Additional symptoms,family history,further results of genetic testing
0,1,1756405,c.65delG,p.Gly22Alafs*3,de novo,TrioWES,M,14 y 8 m,,41 weeks:\nlength: 53.3 cm\nweight: 3.941 kg\nOFC: NA,...,ataxia,"mild cerebellar atrophy, hypointensity of the globi pallidi and substantia nigra, possible mild degree of abnormal iron or mineral deposition",no,,,,,"speech is ataxic but speaks in sentences/short phrases; attention issues, impulse control and emotional lability, OCD symptoms; recently developed scoliosis",unremarkable,
1,2,1756419,c.79G>T,p.Glu27*,de novo,SingleWES,M,4 y,,length: 49 cm\nweigth: 3215 g\nOFC: 35 cm,...,ataxia,normal,no,,,,,pre-natal pelvi-ureteric junction stenosis (spontaneous resolution at 6 m),,
2,3,1756451,c.111C>G,p.Tyr37*,de novo,TrioWES,M,4 y,,length: 20.5 in\nweight: 8 lb 2 oz\nOFC: NA,...,,Stable areas of T2 hyperintensity involving the central tegmental tracts,no,,,,,Nystagmus,unremarkable,770 kb duplicaion of 20p12.3 on chromosome microarray
3,4,1798706,c.1198G>A,p.Gly400Arg,de novo,TrioWES,M,7 y 6 m,"no prenatal care, no known problems","32 weeks:\nlength: NA,\nweight: 4 lbs,\nOFC: NA\n\nhad a 30 day hospital course",...,,no MRI done,no,,,,,"Left hearing loss; Dysmorphic features: hypertelorism inner canthal distance 4.3cm; low set prominent ears, slight overhangin columella, hypodontia; 5th finger clinodactyly and 5th finger brachydactylky; synophrys; Encopresis",Mother with learning disorder; finished 11th grade; Father with ADHD and learning disorder; finished 9th grade; Full sister with learning disorder; Full sister no known problems; Full brother with learning disorder,
4,5,1810410,c.1331T>C,p.Leu444Pro,de novo,TrioWES,M,10 y,,"40 weeks, length: 52 cm\nweight: 3810 g\nOFC: 36 cm",...,,perisylvian polymicrogyria,yes,10 y:\none event of a generalized seizure,,,"pathological EEG with normal age-related background activity (alpha-type), increased appearance of slowing over temporal and occipital regions","no dysmorphism, small teeth, severe s-configured scoliosis of thoracic and lumbar spine",,


<h3>Step 1: Determine the columns of interest</h3>
<p>Typically, some but not all columns of Supplemental tables include clinical phenotypic features that can be encoded using HPO. Inspect the table using the pandas <tt>head()</tt> function or the <tt>columns</tt> attribute and decide which columns to encode</p>

In [5]:
df.columns

Index(['Indvidual\nin\nmanuscript', 'g.(hg19) Chr16:',
       'Transcript\nNM_015133.4\nc.', 'p.', 'origin', 'genetic testing', 'Sex',
       'age at last assesment', 'prenatal period', 'Exam at birth',
       'body measurements\n(at last assesment if not otherwise specified)',
       'DD', 'severity of ID', 'development', 'regression', 'autism',
       'hypotonia', 'movement disorder', 'CVI', 'neurological examination',
       'result of external MRI', 'seizures', 'Sz onset and Sz types',
       'AEDs used', 'Sz outcome', 'EEG', 'Additional symptoms',
       'family history', 'further results of genetic testing'],
      dtype='object')

<h3>Step 2: Encode each column of interest using the ColumnMapper class</h3>
<p>We will show how to work with the ColumnMapper class in detail using the 'neurological examination' column. The basic idea is to make one ColumnMapper object for each column of interest. The column mapper knows how to map the contents using either default exact text matching or custom maps from whatever strings to HPO terms.</p>
<p>The first step is to create a ColumnMapper object and use the preview_column feature to see how many terms can be mapped using exact text mining</p>

In [6]:
neuroMapper = CustomColumnMapper(concept_recognizer=hpo_cr)
neuroMapper.preview_column(df['neurological examination'])

Unnamed: 0,column,terms
0,ataxia,Ataxia (HP:0001251)
1,,
2,spastic paraplegia,Spastic paraplegia (HP:0001258)
3,"spasticity; nerve conduction and EMG studies with abnormal findings ""remarkable for the failure to activate the leg muscles due to an upper motor neuron pattern of aberrant motor unit potential firing rates. These findings are consistent with dysfunction of the corticospinal pathways rather than a lower motor unit."" Significant low extremity weakness.",Spasticity (HP:0001257)
4,spasticity/stiff legs,Spasticity (HP:0001257)
5,spastic diplegic cerebral palsy,Cerebral palsy (HP:0100021)
6,"orobuccal dyspraxia, awkward gross and fine motricity, difficulty in coordination, unstable gait",


<h3>Adding manual mappings: CustomColumnMapper</h3>
<p>We can see that the string in the first column, 'ataxia' was mapped to the HPO term <i>Ataxia</i> (HP:0001251), and that several other concepts were identified. However, we missed several concepts that do not match exactly with HPO term labels or synonyms, but can be mapped using some domain knonwledge. For instance, low extremity weakness appears to be equivalent to <i>Lower limb muscle weakness</i> (HP:0007340).</p>
<p>To add these mappings, users should look up the primary label of the HPO terms in question and create a dictionary that maps the phrases used in the supplemental material to the HPO labels. The following cell shows a map for the 'neurological examination' column and calls the ColumnMapper constructor with this custom map</p>

In [7]:
neuro_exam_custom_map = {'low extremity weakness': 'Lower limb muscle weakness',  
                         'unstable gait': 'Unsteady gait',
                         'dysfunction of the corticospinal pathways':'Upper motor neuron dysfunction',
                         'spastic': 'Spasticity',
                         'orobuccal dyspraxia': 'Oromotor apraxia',
                         'difficulty in coordination':'Poor coordination'
                        }
neuroMapper = CustomColumnMapper(concept_recognizer=hpo_cr, custom_map_d=neuro_exam_custom_map, )
neuroMapper.preview_column(df['neurological examination'])

Unnamed: 0,column,terms
0,ataxia,Ataxia (HP:0001251)
1,,
2,spastic paraplegia,Spastic paraplegia (HP:0001258)
3,"spasticity; nerve conduction and EMG studies with abnormal findings ""remarkable for the failure to activate the leg muscles due to an upper motor neuron pattern of aberrant motor unit potential firing rates. These findings are consistent with dysfunction of the corticospinal pathways rather than a lower motor unit."" Significant low extremity weakness.",Lower limb muscle weakness (HP:0007340); Upper motor neuron dysfunction (HP:0002493); Spasticity (HP:0001257)
4,spasticity/stiff legs,Spasticity (HP:0001257)
5,spastic diplegic cerebral palsy,Spasticity (HP:0001257); Cerebral palsy (HP:0100021)
6,"orobuccal dyspraxia, awkward gross and fine motricity, difficulty in coordination, unstable gait",Unsteady gait (HP:0002317); Oromotor apraxia (HP:0007301); Poor coordination (HP:0002370)


<h3>Adding manual mappings: CustomColumnMapper</h3>
<p>The 'DD' column only contains information about <i>Global developmental delay</i> (HP:0001263). We use
    the SimpleColumnMapper class for columns such as this that contain Yes/No information (and sometimes an indication that the item was not measured or information is not available).</p>

In [8]:
df['DD'].head()

0    yes
1    yes
2    yes
3    yes
4    yes
Name: DD, dtype: object

In [9]:
ddMapper = SimpleColumnMapper(hpo_id='HP:0001263',
    hpo_label='Global developmental delay',
    observed='yes',
    excluded='no')

In [10]:
ddMapper.preview_column(df['DD'])

Unnamed: 0,term,status
0,Global developmental delay (HP:0001263),observed
1,Global developmental delay (HP:0001263),observed
2,Global developmental delay (HP:0001263),observed
3,Global developmental delay (HP:0001263),observed
4,Global developmental delay (HP:0001263),observed
5,Global developmental delay (HP:0001263),observed
6,Global developmental delay (HP:0001263),observed
7,Global developmental delay (HP:0001263),observed
8,Global developmental delay (HP:0001263),observed
9,Global developmental delay (HP:0001263),observed


<h3>Adding manual mappings from a list of options: OptionColumnMapper</h3>
<p>In some cases, a columnn has just a few options that may need to be manually mapped. In our example, the column 'severity of ID' contains <i>Intellectual disability, severe</i> (HP:0010864)  <i>Intellectual disability, moderate</i>  (HP:0002342), and <i>Intellectual disability, mild</i> (HP:0001256)</p>
<p>We use the OptionColumnMapper class for such cases.</p>

In [11]:
df['severity of ID']

0            moderate\n(IQ 48)
1                       severe
2                     moderate
3                         mild
4     moderate\n(IQ 49 and 65)
5                         mild
6                         mild
7                       severe
8                     moderate
9                     moderate
10                    moderate
11                      severe
12           moderate\n(IQ 49)
Name: severity of ID, dtype: object

In [12]:
severity_d = {'moderate\n(IQ 48)':'Intellectual disability, moderate',
             'moderate':'Intellectual disability, moderate',
             'moderate\n(IQ 49)': 'Intellectual disability, moderate',
             'severe': 'Intellectual disability, severe',
             'mild': 'Intellectual disability, mild'}
severityOfIdMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=severity_d)
severityOfIdMapper.preview_column(df['severity of ID'])

Unnamed: 0,term,status
0,"Intellectual disability, moderate (HP:0002342)",observed
1,"Intellectual disability, severe (HP:0010864)",observed
2,"Intellectual disability, moderate (HP:0002342)",observed
3,"Intellectual disability, mild (HP:0001256)",observed
4,"Intellectual disability, mild (HP:0001256)",observed
5,"Intellectual disability, mild (HP:0001256)",observed
6,"Intellectual disability, severe (HP:0010864)",observed
7,"Intellectual disability, moderate (HP:0002342)",observed
8,"Intellectual disability, moderate (HP:0002342)",observed
9,"Intellectual disability, moderate (HP:0002342)",observed


<h2>Collecting column mappings for the entire table</h2>
<p>pyphetools expects to get a disctionary whose keys correspond to the column names used by the pandas DataFrame, and the values are the corresponding ColumnMapper objects. In the following, we create this dictionary and then we create a ColumnMapper object for each of the columns to be mapped.</p>

In [13]:
column_mapper_d = defaultdict(ColumnMapper)
column_mapper_d['neurological examination'] = neuroMapper
column_mapper_d['DD'] = ddMapper
column_mapper_d['severity of ID'] = severityOfIdMapper

<h3>The remaining columns</h3>
<p>We now create mappers for the remaining columns with less comments.</p>

In [14]:
regressionMapper = SimpleColumnMapper(hpo_id='HP:0002376',
    hpo_label='Developmental regression',
    observed='yes',
    excluded='no')
regressionMapper.preview_column(df['regression'])
column_mapper_d['regression'] = regressionMapper

In [15]:
autismMapper = SimpleColumnMapper(hpo_id='HP:0000717',  
    hpo_label='Autism',
    observed='yes',
    excluded='no')
autismMapper.preview_column(df['autism'])
column_mapper_d['autism'] = autismMapper

In [16]:
hypotoniaMapper = SimpleColumnMapper(hpo_id='HP:0001252',  
    hpo_label='Hypotonia',
    observed='yes',
    excluded='no')
hypotoniaMapper.preview_column(df['hypotonia'])
column_mapper_d['hypotonia'] = hypotoniaMapper

In [17]:
movementMapper = SimpleColumnMapper(hpo_id='HP:0100022',  
    hpo_label='Abnormality of movement',
    observed='yes',
    excluded='no')
movementMapper.preview_column(df['movement disorder'])
column_mapper_d['movement disorder'] = movementMapper

In [18]:
# CVI stands for Cortical visual impairment HP:0100704
cviMapper = SimpleColumnMapper(hpo_id='HP:0100704',  
    hpo_label='Cerebral visual impairment',
    observed='yes',
    excluded='no')
cviMapper.preview_column(df['CVI'])
column_mapper_d['CVI'] = cviMapper

In [19]:
## THIS ONE IS NOT WORKING WELL ENOUGH YET
mri_custom_map = {'hypomyelination': 'CNS hypomyelination',  
                         'thinning of CC': 'Thin corpus callosum',
                         'white matter volume loss':'Reduced cerebral white matter volume',
                         'widened lateral ventricles': 'Lateral ventricle dilatation',
                         'dysgenesis of corpus callosum': 'Dysplastic corpus callosum',
                        }
mriMapper = CustomColumnMapper(concept_recognizer=hpo_cr, custom_map_d=mri_custom_map, )
mriMapper.preview_column(df['result of external MRI'])

Unnamed: 0,column,terms
0,"mild cerebellar atrophy, hypointensity of the globi pallidi and substantia nigra, possible mild degree of abnormal iron or mineral deposition",Cerebellar atrophy (HP:0001272)
1,normal,
2,Stable areas of T2 hyperintensity involving the central tegmental tracts,
3,no MRI done,
4,perisylvian polymicrogyria,Perisylvian polymicrogyria (HP:0012650)
5,perisylvian polymicrogyria extending from the superior/transverse temporal lobes into the frontal opercula,Polymicrogyria (HP:0002126); Perisylvian polymicrogyria (HP:0012650)
6,hypomyelination and thin corpus callosum,CNS hypomyelination (HP:0003429); Thin corpus callosum (HP:0033725)
7,"severe diffuse thinning of CC and relative hypoplasia of mesencephalon and brainstem, diffusely diminished cerebral white matter, and spinal cord malacia. \n\nMost recent brain MRI scan was significant for ""diffusely thin corpus callosum with global cerebral features of white matter volume loss. There is some abnormal FLAIR hyperintensity throughout the cerebral white matter most conspicuous within the periatrial white matter. Thinning of the cerebral white matter tracts is diffuse and not restricted to periventricular or deep white matter fibers and extends to the subcortical U. fibers resulting in an appearance analogous to polymicrogyria due to close approximation of the cortical mantle. Gray matter lesions are not seen and there appears be sparing of the brainstem and mesencephalon other than probable components of Wallerian degeneration. In summary, there has been progressive myelinization since last exam. The pattern of white matter loss and abnormality including involvement of the subcortical U fibers is atypical raising the question of a concomitant white matter metabolic disorder""; Spine MRI scan indicates inferior vermian hypoplasia and stable diffuse cord thinning. No focal cord lesion, syrinx or abnormal signal intensity. Findings appear stable from prior.",
8,widened lateral ventricles and thin corpus callosum,Lateral ventricle dilatation (HP:0006956); Thin corpus callosum (HP:0033725)
9,"periventricular leukomalacia, dysgenesis of corpus callosum",Dysplastic corpus callosum (HP:0006989); Periventricular leukomalacia (HP:0006970)


In [20]:
df['seizures']
seizureMapper = SimpleColumnMapper(hpo_id='HP:0001250',  
    hpo_label='Seizure',
    observed='yes',
    excluded='no')
seizureMapper.preview_column(df['seizures'])
column_mapper_d['seizures'] = seizureMapper

In [21]:
## THIS ONE IS NOT WORKING WELL ENOUGH YET
additional_custom_map = {'OCD': 'Obsessive-compulsive behavior',  
                         '5th finger clinodactyly': 'Clinodactyly of the 5th finger',
                         'small teeth':'Reduced cerebral white matter volume',
                         'widened lateral ventricles': 'Lateral ventricle dilatation',
                         'dysgenesis of corpus callosum': 'Dysplastic corpus callosum',
                        }
additionalFeaturesMapper = CustomColumnMapper(concept_recognizer=hpo_cr, custom_map_d=mri_custom_map, )
mriMapper.preview_column(df['Additional symptoms'])

Unnamed: 0,column,terms
0,"speech is ataxic but speaks in sentences/short phrases; attention issues, impulse control and emotional lability, OCD symptoms; recently developed scoliosis",Emotional lability (HP:0000712); Scoliosis (HP:0002650)
1,pre-natal pelvi-ureteric junction stenosis (spontaneous resolution at 6 m),
2,Nystagmus,Nystagmus (HP:0000639)
3,"Left hearing loss; Dysmorphic features: hypertelorism inner canthal distance 4.3cm; low set prominent ears, slight overhangin columella, hypodontia; 5th finger clinodactyly and 5th finger brachydactylky; synophrys; Encopresis",Hypertelorism (HP:0000316); Synophrys (HP:0000664); Hypodontia (HP:0000668); Clinodactyly (HP:0030084); Finger clinodactyly (HP:0040019); Encopresis (HP:0040183)
4,"no dysmorphism, small teeth, severe s-configured scoliosis of thoracic and lumbar spine",Scoliosis (HP:0002650)
5,,
6,"full cheeks, long philtrum, slight micrognatia, no specific dysmorphic features",Full cheeks (HP:0000293); Long philtrum (HP:0000343)
7,"normal birth weight, dramatic increased weight without excessive calorie intake; Small hands and feet.",Small hand (HP:0200055)
8,no dysmorphism,
9,"behavior: biting, scratching, throwing if fustrated, but generally happy dispositiion; myopic astigmatism and pseudostrabismus",Astigmatism (HP:0000483); Strabismus (HP:0000486); Myopic astigmatism (HP:0500041)


<H1>Mapping variants</H1>
<p>pyphetools attempts to map variants using the VariantValidator API.</p>

In [22]:
genome = 'hg38'
default_genotype = 'heterozygous'
transcript='NM_015133.4'
varMapper = VariantColumnMapper(assembly=genome,column_name='Transcript\nNM_015133.4\nc.', transcript=transcript, genotype=default_genotype)

<p>To use this tool, identify the column that contains valid HGVS nomenclature for the variants. Also, if there is a genotype ("heterozygous", "homozygous", "hemizygous") that applies to all variants, it can be used here.</p>
<p>This is the easiest situation (one het variant per patient in its own column). TODO -- provide a semimanual way of doing this for more complex cases)</p>

In [23]:
df['Transcript\nNM_015133.4\nc.']

0        c.65delG
1         c.79G>T
2        c.111C>G
3     c.1198G>A  
4       c.1331T>C
5       c.1331T>C
6       c.1574G>A
7       c.1732C>T
8       c.1732C>T
9       c.2982C>G
10      c.3436C>T
11      c.3436C>T
12      c.3436C>T
Name: Transcript\nNM_015133.4\nc., dtype: object

In [24]:
varMapper.preview_column(df['Transcript\nNM_015133.4\nc.'])

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.65delG/NM_015133.4
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.79G>T/NM_015133.4
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.111C>G/NM_015133.4
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1198G>A/NM_015133.4
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1331T>C/NM_015133.4
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1331T>C/NM_015133.4
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1574G>A/NM_015133.4
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1732C>T/NM_015133.4
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1732C>T/NM_015133.4
https://rest.variantval

Unnamed: 0,variant
0,chr16:1706402CG-C
1,chr16:1706418G-T
2,chr16:1706450C-G
3,chr16:1748705G-A
4,chr16:1760409T-C
5,chr16:1760409T-C
6,chr16:1762388G-A
7,chr16:1762843C-T
8,chr16:1762843C-T
9,chr16:1766768C-G


<h1>Putting it all together</h1>
<p>We now parse the entire DataFrame to generate lists of terms for each patient. We additionally need to specify the columns that contain age and sex and patient ids.</p>

In [25]:
subject_d = {'id': 'Indvidual\nin\nmanuscript',
             'sex': 'Sex',
             'age': 'age at last assesment'}


encoder = CohortEncoder(df=df, hpo_cr=hpo_cr, column_mapper_d=column_mapper_d, individual_d=subject_d, variant_mapper=varMapper)

<H2>Specifying the disease</H2>
<p>The disease is specified as a CURIE (disease id, e.g., OMIM:154700) and label, e.g., Marfan syndrome. 
The following function assumes that all individuals in the Supplemental file have the same disease, and so we set the
disease for the entire cohort encoder.</p>

In [26]:
disease_id = '618443'
disease_name = 'Neurodevelopmental disorder with or without variable brain abnormalities'

encoder.set_disease(disease_id=disease_id, label=disease_name)

<h2>Preview</h2>
<p>A preview function is provided to check results before exporting in Phenopacket Schema format.</p>

In [27]:
encoder.preview_dataframe()

Unnamed: 0_level_0,sex,age,phenotypic features
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,M,14 y 8 m,"Ataxia (HP:0001251)\nGlobal developmental delay (HP:0001263)\nIntellectual disability, moderate (HP:0002342)\nexcluded: Developmental regression (HP:0002376)\nAutism (HP:0000717)\nHypotonia (HP:0001252)\nexcluded: Abnormality of movement (HP:0100022)\nexcluded: Cerebral visual impairment (HP:0100704)\nexcluded: Seizure (HP:0001250)"
2,M,4 y,"Ataxia (HP:0001251)\nGlobal developmental delay (HP:0001263)\nIntellectual disability, severe (HP:0010864)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nexcluded: Abnormality of movement (HP:0100022)\nexcluded: Cerebral visual impairment (HP:0100704)\nexcluded: Seizure (HP:0001250)"
3,M,4 y,"Global developmental delay (HP:0001263)\nIntellectual disability, moderate (HP:0002342)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nexcluded: Abnormality of movement (HP:0100022)\nexcluded: Cerebral visual impairment (HP:0100704)\nexcluded: Seizure (HP:0001250)"
4,M,7 y 6 m,"Global developmental delay (HP:0001263)\nIntellectual disability, mild (HP:0001256)\nexcluded: Developmental regression (HP:0002376)\nAutism (HP:0000717)\nexcluded: Hypotonia (HP:0001252)\nexcluded: Abnormality of movement (HP:0100022)\nexcluded: Cerebral visual impairment (HP:0100704)\nexcluded: Seizure (HP:0001250)"
5,M,10 y,Global developmental delay (HP:0001263)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nexcluded: Abnormality of movement (HP:0100022)\nSeizure (HP:0001250)
6,F,9 y,"Global developmental delay (HP:0001263)\nIntellectual disability, mild (HP:0001256)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nexcluded: Seizure (HP:0001250)"
7,F,3 y,"Global developmental delay (HP:0001263)\nIntellectual disability, mild (HP:0001256)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nexcluded: Hypotonia (HP:0001252)\nexcluded: Seizure (HP:0001250)"
8,F,5 y,"Spastic paraplegia (HP:0001258)\nGlobal developmental delay (HP:0001263)\nIntellectual disability, severe (HP:0010864)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nSeizure (HP:0001250)"
9,F,6 y,"Lower limb muscle weakness (HP:0007340)\nUpper motor neuron dysfunction (HP:0002493)\nSpasticity (HP:0001257)\nGlobal developmental delay (HP:0001263)\nIntellectual disability, moderate (HP:0002342)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nSeizure (HP:0001250)"
10,M,4 y,"Global developmental delay (HP:0001263)\nIntellectual disability, moderate (HP:0002342)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nSeizure (HP:0001250)"


<h1>Exporting in GA4GH Phenopacket format</h1>
<p>The conversion code is in the Individual class. TODO -- we need to add additional information, mainly the MetaData</p>

In [28]:
individuals = encoder.get_individuals()

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.65delG/NM_015133.4
size of variant_list is {len(variant_list)}
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.79G>T/NM_015133.4
size of variant_list is {len(variant_list)}
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.111C>G/NM_015133.4
size of variant_list is {len(variant_list)}
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1198G>A/NM_015133.4
size of variant_list is {len(variant_list)}
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1331T>C/NM_015133.4
size of variant_list is {len(variant_list)}
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1331T>C/NM_015133.4
size of variant_list is {len(variant_list)}
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3

In [29]:
print(f"We extracted {len(individuals)} individuals")

We extracted 13 individuals


In [30]:
i1 = individuals[0]

In [31]:
phenopacket1 = i1.to_ga4gh_phenopacket()

Individual, size of variants 1


In [32]:
json_string = MessageToJson(phenopacket1)
print(json_string)

{
  "id": "1",
  "subject": {
    "id": "1",
    "timeAtLastEncounter": {
      "age": {
        "iso8601duration": "P2Y"
      }
    },
    "sex": "MALE"
  },
  "phenotypicFeatures": [
    {
      "type": {
        "id": "HP:0001251",
        "label": "Ataxia"
      }
    },
    {
      "type": {
        "id": "HP:0001263",
        "label": "Global developmental delay"
      }
    },
    {
      "type": {
        "id": "HP:0002342",
        "label": "Intellectual disability, moderate"
      }
    },
    {
      "type": {
        "id": "HP:0002376",
        "label": "Developmental regression"
      },
      "excluded": true
    },
    {
      "type": {
        "id": "HP:0000717",
        "label": "Autism"
      }
    },
    {
      "type": {
        "id": "HP:0001252",
        "label": "Hypotonia"
      }
    },
    {
      "type": {
        "id": "HP:0100022",
        "label": "Abnormality of movement"
      },
      "excluded": true
    },
    {
      "type": {
        "id": "HP:0100

<h2>Development plans</h2>
<p>To do for this package -- see the issues in GitHub. </p>