<h1>landscape of STXBP1-related disorders </h1>
<p>Extract the clinical data from <a href="https://pubmed.ncbi.nlm.nih.gov/35190816/"target="__blank">Xian et al. (2022) Assessing the landscape of STXBP1-related disorders in 534 individuals. Brain.</a>.<p>
    <p>Note that although OMIM lists only one disease associated with STXBP1, the authors assign patients to phenotypic groups, that we will use here for the diagnosis.</p>
    <p>According to the authors, patients with R406H and R406C mutations were more likely to have a burst suppression pattern on EEG and spastic tetraplegia, and less likely to have ataxia, compared to the rest of the cohort. Additionally, patients with premature termination mutations or deletions in the STXBP1 gene were more likely to have infantile spasms, hypsarrhythmia on EEG, ataxia, hypotonia, and neonatal seizure onset compared to patients with missense mutations.</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.78




In [2]:
PMID = "PMID:35190816"
title = "Assessing the landscape of STXBP1-related disorders in 534 individuals"
cite = Citation(pmid=PMID, title=title)
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-04-04


In [3]:
clinical_df = pd.read_table("input/brain-2021-00642-File011.tsv");
genotype_df = pd.read_table("input/brain-2021-00642-File011-genotype.tsv");

In [4]:
clinical_df.head(2)

Unnamed: 0,PatID,Source_Journal,Source_PMID*,Year,Sex,Phenotypic_group**,age_onset_m,age_offset_m,age_eval_y,Base_HPO***,HPO_term,Notes
0,STX_18469812_Subject_11,Nat Genet,18469812,2008.0,M,EOEE,2.0,,8.0,HP:0003593,Infantile onset,
1,STX_18469812_Subject_11,Nat Genet,18469812,2008.0,M,EOEE,2.0,,8.0,HP:0010818,Generalized tonic seizures,


In [5]:
onset_terms = {"Childhood onset", "Juvenile onset", "Neonatal onset","Infantile onset"}

<h2>PatientRow</h2>
<p>Parsing needs to combine the information from File011 (which has clinical data and HPO terms) and File011-genotype (which has STXBP1 variant data). We define a class called PatientRow, which contains the identifier,
sex, and age at last evaluation.</p>

In [6]:
import math
def is_integer(n):
    try:
        float(n)
    except ValueError:
        return False
    else:
        return float(n).is_integer()
    

def is_float(n):
    try:
        fn = float(n)
        if math.isnan(fn):
            return False
        return True
    except ValueError:
        return False
    
def iso_age_from_float(n):
    age_float = float(n)
    y = math.floor(age_float)
    m = math.floor(12*(age_float - y))
    return f"P{y}Y{m}M"
    

class PatientRow:
    def __init__(self, row):
        self.patID = row["PatID"]
        self.sex = row["Sex"]
        age_eval = row["age_eval_y"]
        if is_integer(age_eval):
            y = int(age_eval)
            self.age_eval = f"P{y}Y"
        elif is_float(age_eval):
            self.age_eval = iso_age_from_float(age_eval)
        else:
            #print(f"Could not parse age {age_eval}")
            self.age_eval = None
        self.phenogroup = row["Phenotypic_group**"]

In [7]:
from csv import DictReader
from collections import defaultdict
patient_d = defaultdict(list)
patient_demographic_d = defaultdict(PatientRow)
with open("input/brain-2021-00642-File011.tsv") as f:
    reader = DictReader(f, delimiter="\t")
    for row in reader:
        prow = PatientRow(row=row)
        patient_d[prow.patID].append(prow)
print(f"We extracted data on {len(patient_d)} individuals")

We extracted data on 534 individuals


<H1>Extracting genotypes</H1>
<p>Genotypes were extracted from the supplemental file brain-2021-00642-File011-genotype.tsv. Some of the indicated genotypes were not valid HGVS and were manually corrected, including entries such as
<tt>STXBP1:NM_001032221.3:exon18:c.1548_1559AT,STXBP1:NM_003165.3:exon18:c.1548_1559AT</tt> that was coded 
    as NM_001032221.3:c.1548_1559delinsAT, and the following other manual corrections.</p>
</p>
<ul>
<li><tt>NM_001032221.3:exon12:r.spl:NM_003165.3:exon12:r.spl</tt> was coded as NM_001032221.6:c.1029+1_1029+2delinsAA (PMID:31164858)</li>
    <li><tt>NM_001032221.3:exon1:r.spl:NM_003165.3:exon1:r.spl</tt>  was coded as NM_001032221.6:c.37+1_37+2del </li>
    <li><tt>NM_001032221.6:c.578+1->G</tt> was coded as NM_001032221.6:c.578+1dup</li>
    <li><tt>NM_001032221.3:exon9:c.794+2->T</tt> was coded as NM_001032221.3:c.794+2dup</li>
    <li><tt>NM_001032221.3:exon16:c.1360-1->C</tt> was coded as NM_001032221.3:c.1360-1_1360insC</li>
</ul>

In [8]:
genotype_df.head(2)

Unnamed: 0,PatID,Chr,Start,End,Ref,Alt,Func.refGeneWithVer,Gene.refGeneWithVer,GeneDetail.refGeneWithVer,ExonicFunc.refGeneWithVer,...,Otherinfo3,Otherinfo4,Otherinfo5,Otherinfo7,Otherinfo8,Otherinfo9,Otherinfo10,Otherinfo11,bed,Unnamed: 134
0,STX_18469812_Subject_11,9.0,130422313.0,130422313.0,T,A,exonic,STXBP1,.,nonsynonymous SNV,...,.,9,130422313,T,A,.,PASS,.,Name=70.695764689,
1,STX_18469812_Subject_3,9.0,130444768.0,130444768.0,G,A,exonic,STXBP1,.,nonsynonymous SNV,...,.,9,130444768,G,A,.,PASS,.,Name=75.862552050,


In [9]:
import re
def extract_var_inf(aachange):
    """
    aachange: e.g., STXBP1:NM_001032221.3:exon6:c.T353G:p.L118R,STXBP1:NM_003165.3:exon6:c.T353G:p.L118R
    """
    fields = aachange.split(":")
    index = 0
    i = 0
    transcript = "?"
    for f in fields:
        if f == "NM_001032221.3":
            index = i
            transcript = "NM_001032221.6"
            break
        elif f == "NM_003165.3":
            index = i
            transcript = "NM_003165.6"
            break
        i += 1
    if (i + 2) < len(fields):
        # note some entries are like this - c.1548_1559delinsAT,STXBP1
        variant = fields[i+2].split(",")[0]
    else:
        raise ValueError(f"Could not get variant because of fields: {fields} and i={i}")   
    return transcript, variant

def extract_splice_var(genedetail):
    """
    genedetail: e.g., NM_001032221.3:exon3:c.169+1G>A;NM_003165.3:exon3:c.169+1G>A
    """
    fields = genedetail.split(":")
    index = 0
    i = 0
    transcript = "?"
    # get the right field and update version to current
    for f in fields:
        if f == "NM_001032221.3":
            index = i
            transcript = "NM_001032221.6"
            break
        elif f == "NM_003165.3":
            index = i
            transcript = "NM_003165.6"
            break
        i += 1
    if (i + 2) < len(fields):
        variant = fields[i+2]
    else:
        raise ValueError(f"Could not get variant because of fields: {fields} and i={i} from genedetail \"{genedetail}\"")   
    return transcript, variant


class GenotypeEntry:
    def __init__(self, row):
        self.patID = row["PatID"]
        self.chrom = row["Chr"]
        self.start = row["Start"]
        self.end = row["End"]
        self.ref = row["Ref"]
        self.alt = row["Alt"]
        self.transcript = "?"
        func = row["Func.refGeneWithVer"]
        self.category = func
        genenot = row["Gene.refGeneWithVer"]
        aachange = row["AAChange.refGeneWithVer"]
        if func == "exonic":
            transcript, variant = extract_var_inf(aachange)
            self.transcript = transcript
            regex_del = r"c.\d+_\d+del"
            regex_single_nt_del = r"(c.\d+del)[ACGT]"
            regex_dup = r"c.(\d+)dup([A-Z]+)"
            regex_sub = r"c.([A-Z]+)(\d+)([A-Z]+)"
            regex_ins = r"c.(\d+)_(\d+)ins([A-Z]+)"  # e.g., 1372_1373insGCCGGAGCAA
            regex_delins = r"(c.\d+_\d+delins[A-Z]+)"
            result = re.search(regex_sub, variant)
            result_dup = re.search(regex_dup, variant)
            result_single_nt_del = re.search(regex_single_nt_del, variant)
            result_ins = re.search(regex_ins, variant)
            result_delins = re.search(regex_delins, variant)
            if re.match(regex_del, variant):
                self.hgvs = variant
            elif result:
                ref = result.group(1)
                position = result.group(2)
                alt = result.group(3)
                hgvs = f"c.{position}{ref}>{alt}"
                self.hgvs = hgvs
            elif result_dup:
                position=result_dup.group(1)
                hgvs = f"c.{position}dup"
                self.hgvs = hgvs
            elif result_single_nt_del:
                self.hgvs = result_single_nt_del.group(1)
            elif result_delins:
                self.hgvs = result_delins.group(1)
            elif result_ins:
                pos1 = result_ins.group(1)
                pos2 = result_ins.group(2)
                seq = result_ins.group(3)
                self.hgvs = f"c.{pos1}_{pos2}ins{seq}"
            else:
                raise ValueError(f"Could not parse variant {variant}")
        elif func == 'splicing':
            geneDetail = row["GeneDetail.refGeneWithVer"]
            transcript, variant = extract_splice_var(geneDetail)
            self.transcript = transcript
            self.hgvs = variant
        elif func == "NA":
            pass
        elif func == "intronic":
            pass  
        else:
            print(f"{self.patID}---function {func}")
            raise ValueError(f"Could not parse variant  for func {func}\n{row}")
                    

In [10]:
na_genotype = 0
intronic_genotype = 0
genotype_d = defaultdict(GenotypeEntry)
with open("input/brain-2021-00642-File011-genotype.tsv") as f:
    reader = DictReader(f, delimiter="\t")
    for row in reader:
        ge = GenotypeEntry(row=row)
        if ge.category == "intronic":
            intronic_genotype = intronic_genotype + 1
        elif ge.category == "NA":
            na_genotype = na_genotype +1
        else:
            patient_id = ge.patID
            genotype_d[patient_id] = ge
print(f"We got {len(genotype_d)} usable genotypes")
print(f"We got {na_genotype} NAs, and {intronic_genotype} intronic genotypes - both were skipped")  

We got 463 usable genotypes
We got 46 NAs, and 25 intronic genotypes - both were skipped


<h2>Extracting Hpo Terms</h2>
<p>The data for one patient is distributed across multiple rows of the input Excel file. In the Excel file,
excluded terms are coded with NP:0001234 instead of HP:0001234. We record such terms as excluded in the Phenopacket.</p>

In [11]:
def get_excluded_term(hpo_id, hpo_label,onset, offset):
    if hpo_id.startswith("NP"):
        excluded = True
        hpo_id = "H" + hpo_id[1:]
        if hpo_id not in hpo_ontology:
            raise(f"ERROR (get excluded term) - could not find {hpo_id}")
        hpotk_term = hpo_ontology.get_term(hpo_id)
        hpotk_name = hpotk_term.name
        # update labels if necessary
        if hpo_label != hpotk_name:
            hpo_label = hpotk_name
        return HpTerm(hpo_id=hpo_id, label=hpo_label, observed=False, onset=onset, resolution=offset) 
    else:
        raise ValueError(f"Attempt to use get_excluded_term with non negated term {hpo_id}")

def get_observed_term(hpo_id, hpo_label,onset, offset):
    if hpo_id not in hpo_ontology:
            raise(f"ERROR (get excluded term) - could not find {hpo_id}")
    hpotk_term = hpo_ontology.get_term(hpo_id)
    hpotk_name = hpotk_term.name
    # update labels if necessary
    if hpo_label != hpotk_name:
        hpo_label = hpotk_name
    return HpTerm(hpo_id=hpo_id, label=hpo_label, observed=True, onset=onset, resolution=offset) 


def row_to_hpo(row):
    """Transform a row of the dataframe to an HPO term
    """
    obsolete_ids = {"HP:0000720": "HP:0000712",
                   "HP:0011155": "HP:0032755",
                   "HP:0002281":"HP:0002282",
                   "HP:0040083":"HP:0030051",
                    "HP:0000735": "HP:0012760",
                   "HP:0040168":"HP:0007359"}
    ons = row["age_onset_m"]
    if isinstance(ons, float):
        onset_str = "na"
    else:
        onset_str = f"P{ons}M"
    onset = PyPheToolsAge.get_age(onset_str)
    ofs = row["age_offset_m"]
    if isinstance(ofs,float):
        offset_str = "na"
    else:
        offset_str = f"P{ofs}M"
    offset =  PyPheToolsAge.get_age(offset_str)
    hpo_id = row["Base_HPO***"]
    hpo_label = row["HPO_term"]
    if hpo_id in obsolete_ids:
        hpo_id = obsolete_ids.get(hpo_id)
    # excluded terms are coded with NP:0001234 instead of HP:0001234
    if hpo_id.startswith("NP"):
        return get_excluded_term(hpo_id, hpo_label, onset, offset)
    else:
        return get_observed_term(hpo_id, hpo_label, onset, offset)

In [12]:
patient_d = defaultdict(list)
patient_onset_d = dict()
patient_demographic_d = defaultdict(PatientRow)
for _, row in clinical_df.iterrows():
    patID = row["PatID"]
    if patID not in genotype_d:
        continue
    if patID not in patient_demographic_d:
        patient_demographic_d[patID] = PatientRow(row=row)
    hpo = row_to_hpo(row=row)
    if hpo is not None:
        if hpo.label in onset_terms:
            patient_onset_d[patID] = hpo.label
        else:
            patient_d[patID].append(hpo)
print(f"We got {len(patient_d)} patients and {len(patient_demographic_d)} demographics")
print(f"We also got {len(patient_onset_d)} onset annotations")

We got 463 patients and 463 demographics
We also got 253 onset annotations


<H2>Putting it all together</h2>

In [22]:
import pickle, os
def get_pickle_filename(name):
    """
    provide standard filenaming convention. We pickle results from VariantValidator to avoid
    calling API multiple times in different runs. For instance, the pickled file of variants for
    the SCL4A1 cohort will be called "variant_validator_cache_SLC4A1.pickle"
    """
    return f"variant_validator_cache_{name}.pickle"

def load_variant_pickle(name):
    """
    Load the pickle file. If the file cannot be found, return None. If it is found, return the
    pickled object (which in our case will be a dictionary of Variant objects).
    """
    fname = get_pickle_filename(name)
    if not os.path.isfile(fname):
        return None
    # De-serialize the object from the file
    with open(fname, "rb") as f:
        loaded_object = pickle.load(f)
        return loaded_object

def write_variant_pickle(name, my_object):
    """
    Write a dictionary to pickled file
    :param my_object: In this application, this argument will be a dictionary of Variant objects).
    """
    fname = get_pickle_filename(name)
    with open(fname, "wb") as f:
        pickle.dump(my_object, f)
v_d = load_variant_pickle("STXBP1")
if v_d is None:
    validated_var_d = dict
else:
    validated_var_d = v_d
print(f"Loaded {len(validated_var_d)} variants")

STXBP1_symbol = "STXBP1"
STXBP1_id = "HGNC:11444"
STXBP1_transcript = 'NM_001032221.6'
validator = VariantValidator(genome_build='hg38')
c = 0
for patid, gtype in genotype_d.items():
    #print(f"{patid} - {gtype.hgvs}")
    if gtype.transcript != 'NM_001032221.6' and gtype.transcript != 'NM_003165.6':
        raise ValueError(f"Unexpected transcript: {gtype.transcript}")
    total_hgvs = f"{gtype.transcript}:{gtype.hgvs}"
    if total_hgvs in validated_var_d:
        pass
    else:
        print(f"{patid}:{v}")
        v = validator.encode_hgvs(hgvs=gtype.hgvs, custom_transcript=gtype.transcript)
        validated_var_d[total_hgvs] = v
        write_variant_pickle("STXBP1", my_object=validated_var_d)

Loaded 153 variants
STX_BCH_002:NM_003165.6:c.1769C>T(chr9:127684434C>T)
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.901del/NM_001032221.6?content-type=application%2Fjson
STX_BCH_004:NM_001032221.6:c.901del(chr9:127668183TC>T)
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.794+1G>T/NM_001032221.6?content-type=application%2Fjson
STX_102:NM_001032221.6:c.794+1G>T(chr9:127666297G>T)
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1501_1519del/NM_001032221.6?content-type=application%2Fjson
STX_31387522_Patient_9:NM_001032221.6:c.1501_1519del(chr9:127680192CCCTTATATCTCTACCCGTT>C)
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.827dup/NM_001032221.6?content-type=application%2Fjson
STX_BCH_005:NM_001032221.6:c.827dup(chr9:127668111G>GT)
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_003

In [23]:
print(f"We extracted {len(validated_var_d)} unique variants")

We extracted 261 unique variants


# Clinical classification

The authors classify the patients as 'EOEE', 'Ohtahara Syndrome', 'NDD', 'West Syndrome', 'Other DEE','Atypical Rett Syndrome'. 
OMIM lists only one STXBP1-associated disease: [Developmental and epileptic encephalopathy 4 (OMIM:612164)](https://omim.org/entry/612164), which is what we use to code the patients.


In [26]:
individual_list = []
dee4 = Disease(disease_id="OMIM:612164", disease_label="Developmental and epileptic encephalopathy 4")
for pat_id, patRow in patient_demographic_d.items():
    hpo_list = patient_d.get(pat_id)
    if hpo_list is None:
        print(f"Could not find hpo list for {pat_id}")
        continue
    if len(hpo_list) == 0:
        print(f"warning, empty HPO list for {pat_id}")
    sex = patRow.sex
    age = PyPheToolsAge.get_age(patRow.age_eval)
    gtype = genotype_d.get(pat_id)
    if gtype is None:
        print(f"Could not find genotype for {pat_id} (should never happen)")
        continue
    total_hgvs = f"{gtype.transcript}:{gtype.hgvs}"
    if total_hgvs not  in validated_var_d:
        print(f"could not find {total_hgvs}")
        continue
    variant = validated_var_d.get(total_hgvs)
    variant.set_heterozygous()
    hpo_term_list = patient_d.get(pat_id)
    phenolabel = patRow.phenogroup
    if pat_id in patient_onset_d:
        onset = patient_onset_d.get(pat_id)
        onset_term = HpoAge(onset)
        ind = Individual(individual_id=pat_id, 
                     hpo_terms=hpo_term_list, 
                     sex=sex, 
                     age_of_onset=onset_term,
                     age_at_last_encounter=age,
                     interpretation_list=[variant.to_ga4gh_variant_interpretation()],
                     disease=dee4)
    else:
        ind = Individual(individual_id=pat_id, 
                     hpo_terms=hpo_term_list, 
                     sex=sex, age_at_last_encounter=age,
                     interpretation_list=[variant.to_ga4gh_variant_interpretation()],
                     disease=dee4)
    individual_list.append(ind)
    
print(f"Created {len(individual_list)} individual objects")

Created 463 individual objects


In [27]:
cvalidator = CohortValidator(cohort=individual_list, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,323


In [28]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
STX_18469812_Subject_11 (UNKNOWN; P8Y),Developmental and epileptic encephalopathy 4 (OMIM:612164),NM_001032221.6:c.251T>A (heterozygous),"Generalized tonic seizure (HP:0010818); Bilateral tonic-clonic seizure (HP:0002069); EEG with burst suppression (HP:0010851); Hypsarrhythmia (HP:0002521); Refractory (HP:0031375); Absent speech (HP:0001344); Severe muscular hypotonia (HP:0006829); Profound global developmental delay (HP:0012736); Spastic tetraplegia (HP:0002510); Choreoathetosis (HP:0001266); Frontal cortical atrophy (HP:0006913); Intellectual disability, profound (HP:0002187)"
STX_18469812_Subject_3 (UNKNOWN; P37Y),Developmental and epileptic encephalopathy 4 (OMIM:612164),NM_001032221.6:c.1631G>A (heterozygous),"Infantile spasms (HP:0012469); EEG with burst suppression (HP:0010851); Severe global developmental delay (HP:0011344); Intellectual disability, profound (HP:0002187); Absent speech (HP:0001344); Spastic diplegia (HP:0001264); Focal clonic seizure (HP:0002266); Generalized tonic seizure (HP:0010818); EEG with frontal focal spikes (HP:0012015)"
STX_18469812_Subject_6 (UNKNOWN; P6M),Developmental and epileptic encephalopathy 4 (OMIM:612164),NM_001032221.6:c.539G>A (heterozygous),EEG with burst suppression (HP:0010851); Infantile spasms (HP:0012469); Hypsarrhythmia (HP:0002521); Generalized tonic seizure (HP:0010818); Refractory (HP:0031375); Severe muscular hypotonia (HP:0006829); Profound global developmental delay (HP:0012736); Delayed CNS myelination (HP:0002188); Brain atrophy (HP:0012444); Generalized myoclonic seizure (HP:0002123)
STX_18469812_Subject_7 (UNKNOWN; P1Y1M),Developmental and epileptic encephalopathy 4 (OMIM:612164),NM_001032221.6:c.1328T>G (heterozygous),Infantile spasms (HP:0012469); Generalized tonic seizure (HP:0010818); EEG with burst suppression (HP:0010851); Hypsarrhythmia (HP:0002521); Refractory (HP:0031375); Profound global developmental delay (HP:0012736); Absent speech (HP:0001344); Severe muscular hypotonia (HP:0006829); Spastic tetraplegia (HP:0002510); Delayed CNS myelination (HP:0002188); Brain atrophy (HP:0012444)
STX_19557857_Patient_1 (UNKNOWN; P27Y),Developmental and epileptic encephalopathy 4 (OMIM:612164),NM_001032221.6:c.169+1G>A (heterozygous),"Focal impaired awareness seizure (HP:0002384); Hypotonia (HP:0001252); Tremor (HP:0001337); Hyperventilation (HP:0002883); Gait disturbance (HP:0001288); Focal clonic seizure (HP:0002266); Bilateral tonic-clonic seizure (HP:0002069); Generalized tonic seizure (HP:0010818); EEG with temporal focal spikes (HP:0012018); EEG with focal slow activity (HP:0010843); Bilateral multifocal epileptiform discharges (HP:0011189); Generalized myoclonic seizure (HP:0002123); Global developmental delay (HP:0001263); Absent speech (HP:0001344); Postural instability (HP:0002172); Drooling (HP:0002307); Intellectual disability, severe (HP:0010864)"
STX_19557857_Patient_2 (UNKNOWN; P15Y),Developmental and epileptic encephalopathy 4 (OMIM:612164),NM_001032221.6:c.1162C>T (heterozygous),"Focal impaired awareness seizure (HP:0002384); Hypotonia (HP:0001252); Tremor (HP:0001337); Focal motor seizure with version (HP:0011175); Global developmental delay (HP:0001263); Intellectual disability, severe (HP:0010864); Drooling (HP:0002307); EEG with frontal focal spikes (HP:0012015); EEG with focal slow activity (HP:0010843); Postural instability (HP:0002172); Gait disturbance (HP:0001288)"
STX_20876469_Patient_1 (UNKNOWN; P11Y),Developmental and epileptic encephalopathy 4 (OMIM:612164),NM_001032221.6:c.1434G>A (heterozygous),"Generalized myoclonic seizure (HP:0002123); EEG with focal epileptiform discharges (HP:0011185); EEG with generalized slow activity (HP:0010845); Intellectual disability, severe (HP:0010864); Ataxia (HP:0001251); Hyperactivity (HP:0000752); Focal myoclonic seizure (HP:0011166)"
STX_20876469_Patient_2 (UNKNOWN; P6Y),Developmental and epileptic encephalopathy 4 (OMIM:612164),NM_001032221.6:c.893_894del (heterozygous),"Generalized myoclonic seizure (HP:0002123); Hypsarrhythmia (HP:0002521); Generalized tonic seizure (HP:0010818); Multifocal epileptiform discharges (HP:0010841); Intellectual disability, profound (HP:0002187); Inability to walk (HP:0002540); Dyskinesia (HP:0100660); Hyperactivity (HP:0000752); Infantile spasms (HP:0012469)"
STX_20876469_Patient_3 (UNKNOWN; P11Y),Developmental and epileptic encephalopathy 4 (OMIM:612164),NM_001032221.6:c.1029+1G>T (heterozygous),"Generalized tonic seizure (HP:0010818); Focal-onset seizure (HP:0007359); Infantile spasms (HP:0012469); EEG with focal epileptiform discharges (HP:0011185); Hypsarrhythmia (HP:0002521); EEG with generalized slow activity (HP:0010845); Intellectual disability, profound (HP:0002187); Inability to walk (HP:0002540); Hyperactivity (HP:0000752)"
STX_20876469_Patient_6 (UNKNOWN; P11Y),Developmental and epileptic encephalopathy 4 (OMIM:612164),NM_001032221.6:c.429+1G>A (heterozygous),"Infantile spasms (HP:0012469); Bilateral tonic-clonic seizure with focal onset (HP:0007334); Bilateral multifocal epileptiform discharges (HP:0011189); Hypsarrhythmia (HP:0002521); Generalized tonic seizure (HP:0010818); Delayed CNS myelination (HP:0002188); EEG with generalized slow activity (HP:0010845); Intellectual disability, profound (HP:0002187); Inability to walk (HP:0002540); Axial hypotonia (HP:0008936); Head titubation (HP:0002599); Dyskinesia (HP:0100660); Cerebral visual impairment (HP:0100704); Precocious puberty in males (HP:0008185)"


<h3>Output the phenopackets to file</h3>

In [29]:
Individual.output_individuals_as_phenopackets(individual_list=individual_list,
                                              metadata=metadata)

We output 463 GA4GH phenopackets to the directory phenopackets


In [None]:
# pxf validate --hpo hp.json *.json
# no errors