<h1>Landscape of STXBP1-related disorders</h1>
<p>This notebook extracts the clinical data from <a href="https://pubmed.ncbi.nlm.nih.gov/35190816/" target="_blank">Xian et al. (2022) Assessing the landscape of STXBP1-related disorders in 534 individuals. Brain.</a></p>
    <p>Note that although OMIM lists only one disease associated with STXBP1, the authors assign patients to phenotypic groups, which we use here as the diagnosis.</p>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
import math
from csv import DictReader
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import re
from pyphetools.creation import *
from pyphetools.output import PhenopacketTable
# last tested with pyphetools version 0.3.0

In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199")
metadata.default_versions_with_hpo(version=hpo_version)

In [3]:
clinical_df = pd.read_table("input/brain-2021-00642-File011.tsv")
genotype_df = pd.read_table("input/brain-2021-00642-File011-genotype.tsv")

In [4]:
clinical_df.head()

Unnamed: 0,PatID,Source_Journal,Source_PMID*,Year,Sex,Phenotypic_group**,age_onset_m,age_offset_m,age_eval_y,Base_HPO***,HPO_term,Notes
0,STX_18469812_Subject_11,Nat Genet,18469812,2008.0,M,EOEE,2.0,,8.0,HP:0003593,Infantile onset,
1,STX_18469812_Subject_11,Nat Genet,18469812,2008.0,M,EOEE,2.0,,8.0,HP:0010818,Generalized tonic seizures,
2,STX_18469812_Subject_11,Nat Genet,18469812,2008.0,M,EOEE,2.0,,8.0,HP:0002069,Generalized tonic-clonic seizures,
3,STX_18469812_Subject_11,Nat Genet,18469812,2008.0,M,EOEE,2.0,,8.0,HP:0010851,EEG with burst suppression,
4,STX_18469812_Subject_11,Nat Genet,18469812,2008.0,M,EOEE,2.0,,8.0,HP:0002521,Hypsarrhythmia,


<h2>PatientRow</h2>
<p>Parsing needs to combine the information from File011 (which has clinical data and HPO terms) and File011-genotype (which has STXBP1 variant data). We define a class called PatientRow, which holds the patient identifier,
sex, age at last evaluation (as an ISO 8601 duration), and phenotypic group.</p>

In [5]:
def is_integer(n):
    try:
        float(n)
    except ValueError:
        return False
    else:
        return float(n).is_integer()
    

def is_float(n):
    try:
        fn = float(n)
        if math.isnan(fn):
            return False
        return True
    except ValueError:
        return False
    
def iso_age_from_float(n):
    age_float = float(n)
    y = math.floor(age_float)
    m = math.floor(12*(age_float - y))
    return f"P{y}Y{m}M"
    

class PatientRow:
    def __init__(self, row):
        self.patID = row["PatID"]
        self.sex = row["Sex"]
        age_eval = row["age_eval_y"]
        if is_integer(age_eval):
            y = int(age_eval)
            self.age_eval = f"P{y}Y"
        elif is_float(age_eval):
            self.age_eval = iso_age_from_float(age_eval)
        else:
            #print(f"Could not parse age {age_eval}")
            self.age_eval = None
        self.phenogroup = row["Phenotypic_group**"]
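
<p>As a standalone illustration of the age handling above, the following sketch converts an age in years to an ISO 8601 duration. The function name <tt>age_to_iso8601</tt> is hypothetical and not part of pyphetools. It folds the logic of the helper functions above into one function, under the assumption that missing or non-numeric ages should map to <tt>None</tt>.</p>

```python
import math


def age_to_iso8601(value):
    """Convert an age in years (number or numeric string) to an ISO 8601 duration.

    Hypothetical helper for illustration only. Returns None when the value
    is missing, non-numeric, NaN, or infinite.
    """
    try:
        age = float(value)
    except (TypeError, ValueError):
        return None
    if not math.isfinite(age):
        return None
    years = math.floor(age)
    if age == years:
        # Whole number of years, e.g. 8.0 -> P8Y
        return f"P{years}Y"
    # Fractional years are truncated to completed months, e.g. 0.5 -> P0Y6M
    months = math.floor(12 * (age - years))
    return f"P{years}Y{months}M"


print(age_to_iso8601("8.0"))         # P8Y
print(age_to_iso8601(0.5))           # P0Y6M
print(age_to_iso8601(float("nan")))  # None
```

<p>Months are truncated rather than rounded, which matches the <tt>math.floor</tt> call in <tt>iso_age_from_float</tt>.</p>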
        


In [6]:
patient_d = defaultdict(list)
patient_demographic_d = {}  # plain dict: PatientRow cannot be built without a row
with open("input/brain-2021-00642-File011.tsv") as f:
    reader = DictReader(f, delimiter="\t")
    for row in reader:
        prow = PatientRow(row=row)
        patient_d[prow.patID].append(prow)
print(f"We extracted data on {len(patient_d)} individuals")

We extracted data on 534 individuals


<h2>Extracting genotypes</h2>
<p>Genotypes were extracted from the supplemental file brain-2021-00642-File011-genotype.tsv. Some of the listed genotypes were not valid HGVS and were corrected manually. For example, the entry
<tt>STXBP1:NM_001032221.3:exon18:c.1548_1559AT,STXBP1:NM_003165.3:exon18:c.1548_1559AT</tt> was coded
    as NM_001032221.3:c.1548_1559delinsAT. The other manual corrections were as follows.</p>
<ul>
<li><tt>NM_001032221.3:exon12:r.spl:NM_003165.3:exon12:r.spl</tt> was coded as NM_001032221.6:c.1029+1_1029+2delinsAA (PMID:31164858)</li>
    <li><tt>NM_001032221.3:exon1:r.spl:NM_003165.3:exon1:r.spl</tt>  was coded as NM_001032221.6:c.37+1_37+2del </li>
    <li><tt>NM_001032221.6:c.578+1->G</tt> was coded as NM_001032221.6:c.578+1dup</li>
    <li><tt>NM_001032221.3:exon9:c.794+2->T</tt> was coded as NM_001032221.3:c.794+2dup</li>
    <li><tt>NM_001032221.3:exon16:c.1360-1->C</tt> was coded as NM_001032221.3:c.1360-1_1360insC</li>
</ul>

In [7]:
genotype_df.head()

Unnamed: 0,PatID,Chr,Start,End,Ref,Alt,Func.refGeneWithVer,Gene.refGeneWithVer,GeneDetail.refGeneWithVer,ExonicFunc.refGeneWithVer,...,Otherinfo3,Otherinfo4,Otherinfo5,Otherinfo7,Otherinfo8,Otherinfo9,Otherinfo10,Otherinfo11,bed,Unnamed: 134
0,STX_18469812_Subject_11,9.0,130422313.0,130422313.0,T,A,exonic,STXBP1,.,nonsynonymous SNV,...,.,9,130422313,T,A,.,PASS,.,Name=70.695764689,
1,STX_18469812_Subject_3,9.0,130444768.0,130444768.0,G,A,exonic,STXBP1,.,nonsynonymous SNV,...,.,9,130444768,G,A,.,PASS,.,Name=75.862552050,
2,STX_18469812_Subject_6,9.0,130425593.0,130425593.0,G,A,exonic,STXBP1,.,nonsynonymous SNV,...,.,9,130425593,G,A,.,PASS,.,Name=89.193399398,
3,STX_18469812_Subject_7,9.0,130439001.0,130439001.0,T,G,exonic,STXBP1,.,nonsynonymous SNV,...,.,9,130439001,T,G,.,PASS,.,Name=20.833326089,
4,STX_19557857_Patient_1,9.0,130416076.0,130416076.0,G,A,splicing,STXBP1,NM_001032221.3:exon3:c.169+1G>A:NM_003165.3:exon3:c.169+1G>A,.,...,.,9,130416076,G,A,.,PASS,.,.,


In [8]:
def extract_var_inf(aachange):
    """
    aachange: e.g., STXBP1:NM_001032221.3:exon6:c.T353G:p.L118R,STXBP1:NM_003165.3:exon6:c.T353G:p.L118R
    """
    fields = aachange.split(":")
    index = 0
    i = 0
    transcript = "?"
    for f in fields:
        if f == "NM_001032221.3":
            index = i
            transcript = "NM_001032221.6"
            break
        elif f == "NM_003165.3":
            index = i
            transcript = "NM_003165.6"
            break
        i += 1
    if (i + 2) < len(fields):
        # note some entries are like this - c.1548_1559delinsAT,STXBP1
        variant = fields[i+2].split(",")[0]
    else:
        raise ValueError(f"Could not get variant because of fields: {fields} and i={i}")   
    return transcript, variant

def extract_splice_var(genedetail):
    """
    genedetail: e.g., NM_001032221.3:exon3:c.169+1G>A:NM_003165.3:exon3:c.169+1G>A
    """
    fields = genedetail.split(":")
    index = 0
    i = 0
    transcript = "?"
    # get the right field and update version to current
    for f in fields:
        if f == "NM_001032221.3":
            index = i
            transcript = "NM_001032221.6"
            break
        elif f == "NM_003165.3":
            index = i
            transcript = "NM_003165.6"
            break
        i += 1
    if (i + 2) < len(fields):
        variant = fields[i+2]
    else:
        raise ValueError(f"Could not get variant because of fields: {fields} and i={i} from genedetail \"{genedetail}\"")   
    return transcript, variant


class GenotypeEntry:
    def __init__(self, row):
        self.patID = row["PatID"]
        self.chrom = row["Chr"]
        self.start = row["Start"]
        self.end = row["End"]
        self.ref = row["Ref"]
        self.alt = row["Alt"]
        self.transcript = "?"
        func = row["Func.refGeneWithVer"]
        self.category = func
        genenot = row["Gene.refGeneWithVer"]
        aachange = row["AAChange.refGeneWithVer"]
        if func == "exonic":
            transcript, variant = extract_var_inf(aachange)
            self.transcript = transcript
            regex_del = r"c.\d+_\d+del"
            regex_single_nt_del = r"(c.\d+del)[ACGT]"
            regex_dup = r"c.(\d+)dup([A-Z]+)"
            regex_sub = r"c.([A-Z]+)(\d+)([A-Z]+)"
            regex_ins = r"c.(\d+)_(\d+)ins([A-Z]+)"  # e.g., 1372_1373insGCCGGAGCAA
            regex_delins = r"(c.\d+_\d+delins[A-Z]+)"
            result = re.search(regex_sub, variant)
            result_dup = re.search(regex_dup, variant)
            result_single_nt_del = re.search(regex_single_nt_del, variant)
            result_ins = re.search(regex_ins, variant)
            result_delins = re.search(regex_delins, variant)
            if re.match(regex_del, variant):
                self.hgvs = variant
            elif result:
                ref = result.group(1)
                position = result.group(2)
                alt = result.group(3)
                hgvs = f"c.{position}{ref}>{alt}"
                self.hgvs = hgvs
            elif result_dup:
                position=result_dup.group(1)
                hgvs = f"c.{position}dup"
                self.hgvs = hgvs
            elif result_single_nt_del:
                self.hgvs = result_single_nt_del.group(1)
            elif result_delins:
                self.hgvs = result_delins.group(1)
            elif result_ins:
                pos1 = result_ins.group(1)
                pos2 = result_ins.group(2)
                seq = result_ins.group(3)
                self.hgvs = f"c.{pos1}_{pos2}ins{seq}"
            else:
                raise ValueError(f"Could not parse variant {variant}")
        elif func == 'splicing':
            geneDetail = row["GeneDetail.refGeneWithVer"]
            transcript, variant = extract_splice_var(geneDetail)
            self.transcript = transcript
            self.hgvs = variant
        elif func == "NA":
            pass
        elif func == "intronic":
            pass  
        else:
            print(f"{self.patID}---function {func}")
            raise ValueError(f"Could not parse variant  for func {func}\n{row}")
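
<p>The conversion performed for exonic variants can be illustrated in isolation. The sketch below is not the code used in this notebook: the function name <tt>annovar_cdna_to_hgvs</tt> is hypothetical, the patterns are anchored and restricted to A, C, G, and T, and the raw inputs with a trailing base (such as <tt>c.1130dupA</tt>) are assumed examples of the annotation style rather than entries copied from the supplemental file.</p>

```python
import re


def annovar_cdna_to_hgvs(variant):
    """Rewrite an ANNOVAR-style cDNA change as HGVS (illustrative sketch).

    Raises ValueError for anything that does not fit one of the patterns.
    """
    # Substitution: c.T353G -> c.353T>G
    m = re.fullmatch(r"c\.([ACGT])(\d+)([ACGT])", variant)
    if m:
        ref, position, alt = m.groups()
        return f"c.{position}{ref}>{alt}"
    # Duplication: drop the trailing bases, e.g. c.1130dupA -> c.1130dup
    m = re.fullmatch(r"(c\.\d+(?:_\d+)?dup)[ACGT]*", variant)
    if m:
        return m.group(1)
    # Deletion: drop the trailing bases, e.g. c.444delT -> c.444del
    m = re.fullmatch(r"(c\.\d+(?:_\d+)?del)[ACGT]*", variant)
    if m:
        return m.group(1)
    # Insertions and delins are kept unchanged
    if re.fullmatch(r"c\.\d+_\d+(?:delins|ins)[ACGT]+", variant):
        return variant
    raise ValueError(f"Could not parse variant {variant}")


print(annovar_cdna_to_hgvs("c.T353G"))              # c.353T>G
print(annovar_cdna_to_hgvs("c.1548_1559delinsAT"))  # c.1548_1559delinsAT
```

<p>An entry such as <tt>c.1548_1559AT</tt> matches none of the patterns and raises <tt>ValueError</tt>, which is the kind of entry that had to be corrected by hand.</p>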
                    

In [9]:
na_genotype = 0
intronic_genotype = 0
genotype_d = {}  # patient ID -> GenotypeEntry
with open("input/brain-2021-00642-File011-genotype.tsv") as f:
    reader = DictReader(f, delimiter="\t")
    for row in reader:
        ge = GenotypeEntry(row=row)
        if ge.category == "intronic":
            intronic_genotype = intronic_genotype + 1
        elif ge.category == "NA":
            na_genotype = na_genotype +1
        else:
            patient_id = ge.patID
            genotype_d[patient_id] = ge
print(f"We got {len(genotype_d)} usable genotypes")
print(f"We got {na_genotype} NAs, and {intronic_genotype} intronic genotypes - both were skipped")  

We got 463 usable genotypes
We got 46 NAs, and 25 intronic genotypes - both were skipped


<h2>Extracting HPO terms</h2>
<p>The data for each patient are spread across multiple rows of the input file. In that file,
excluded terms are coded as NP:0001234 instead of HP:0001234. We record such terms as excluded in the phenopacket.</p>

In [10]:
def row_to_hpo(row):
    """Transform a row of the dataframe to an HPO term
    """
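    try:
        age_onset_m = int(row["age_onset_m"])
        onset = f"P{age_onset_m}M"
    except (ValueError, TypeError):
        # missing (NaN) or non-numeric onset
        onset = None
    try:
        age_offset_m = int(row["age_offset_m"])
        offset = f"P{age_offset_m}M"
    except (ValueError, TypeError):
        # missing (NaN) or non-numeric offset
        offset = None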
    hpo_id = row["Base_HPO***"]
    hpo_label = row["HPO_term"]
    # excluded terms are coded with NP:0001234 instead of HP:0001234
    if hpo_id.startswith("NP"):
        hpo_id = "H" + hpo_id[1:]
        return HpTerm(hpo_id=hpo_id, label=hpo_label, observed=False, onset=onset, resolution=offset)
    else:
        return HpTerm(hpo_id=hpo_id, label=hpo_label, onset=onset, resolution=offset) 
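
<p>The NP/HP convention can be shown without pyphetools. In this minimal sketch the function name <tt>decode_hpo_id</tt> is hypothetical. It returns the normalized identifier together with a flag that is <tt>False</tt> for excluded terms, and it assumes that any other prefix should be treated as an error.</p>

```python
def decode_hpo_id(raw_id):
    """Return (hpo_id, observed) for an identifier from the clinical table.

    Illustrative helper: 'NP:' marks an excluded term, 'HP:' an observed one.
    """
    if raw_id.startswith("NP:"):
        # Excluded term: restore the HP prefix and flag it as not observed
        return "HP:" + raw_id[3:], False
    if raw_id.startswith("HP:"):
        return raw_id, True
    raise ValueError(f"Unexpected term identifier: {raw_id}")


print(decode_hpo_id("NP:0001234"))  # ('HP:0001234', False)
print(decode_hpo_id("HP:0003593"))  # ('HP:0003593', True)
```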

In [11]:
patient_d = defaultdict(list)
patient_demographic_d = {}  # patient ID -> PatientRow (first row seen for that patient)
for _, row in clinical_df.iterrows():
    patID = row["PatID"]
    if patID not in genotype_d:
        continue
    if patID not in patient_demographic_d:
        patient_demographic_d[patID] = PatientRow(row=row)
    hpo = row_to_hpo(row=row)
    patient_d[patID].append(hpo)
print(f"We got {len(patient_d)} patients and {len(patient_demographic_d)} demographics")

We got 463 patients and 463 demographics
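
<p>The grouping step can be sketched on toy data. The patient identifiers <tt>P1</tt> and <tt>P2</tt>, the column name <tt>HPO</tt>, and the function name <tt>group_terms_by_patient</tt> are made up for illustration. The point is only that rows are collected per patient and that patients without a usable genotype are skipped.</p>

```python
from collections import defaultdict

# Toy stand-ins for the clinical table and the usable-genotype dictionary
clinical_rows = [
    {"PatID": "P1", "HPO": "HP:0003593"},
    {"PatID": "P1", "HPO": "HP:0002521"},
    {"PatID": "P2", "HPO": "HP:0001344"},  # P2 has no usable genotype
]
usable_genotypes = {"P1": "c.251T>A"}


def group_terms_by_patient(rows, genotypes):
    """Collect terms per patient, keeping only patients with a genotype."""
    grouped = defaultdict(list)
    for row in rows:
        pat_id = row["PatID"]
        if pat_id not in genotypes:
            continue  # skip patients whose genotype was NA or intronic
        grouped[pat_id].append(row["HPO"])
    return dict(grouped)


print(group_terms_by_patient(clinical_rows, usable_genotypes))
# {'P1': ['HP:0003593', 'HP:0002521']}
```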


<h2>Putting it all together</h2>

In [12]:
disease_label = "Developmental and epileptic encephalopathy 4"
disease_id = "OMIM:612164"

In [13]:
validator = VariantValidator(genome_build='hg38')
validated_var_d = {}  # cache: "transcript:hgvs" -> validated variant
for patid, gtype in genotype_d.items():
    #print(f"{patid} - {gtype.hgvs}")
    if gtype.transcript != 'NM_001032221.6' and gtype.transcript != 'NM_003165.6':
        raise ValueError(f"Unexpected transcript: {gtype.transcript}")
    total_hgvs = f"{gtype.transcript}:{gtype.hgvs}"
    if total_hgvs not in validated_var_d:
        print(f"{patid}:{total_hgvs}")
        v = validator.encode_hgvs(hgvs=gtype.hgvs, custom_transcript=gtype.transcript)
        print(v)
        validated_var_d[total_hgvs] = v

STX_18469812_Subject_11:NM_001032221.6:c.251T>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.251T>A/NM_001032221.6?content-type=application%2Fjson
chr9:127660034T>A
STX_18469812_Subject_3:NM_001032221.6:c.1631G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1631G>A/NM_001032221.6?content-type=application%2Fjson
chr9:127682489G>A
STX_18469812_Subject_6:NM_001032221.6:c.539G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.539G>A/NM_001032221.6?content-type=application%2Fjson
chr9:127663314G>A
STX_18469812_Subject_7:NM_001032221.6:c.1328T>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1328T>G/NM_001032221.6?content-type=application%2Fjson
chr9:127676722T>G
STX_19557857_Patient_1:NM_001032221.6:c.169+1G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.169+1G>A/N

chr9:127665253C>G
STX_24189369_Patient_7:NM_001032221.6:c.1130dup
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1130dup/NM_001032221.6?content-type=application%2Fjson
chr9:127675822G>GA
STX_24781210_Patient_9:NM_001032221.6:c.875G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.875G>A/NM_001032221.6?content-type=application%2Fjson
chr9:127668160G>A
STX_25131622_Subject_71:NM_001032221.6:c.795-1G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.795-1G>A/NM_001032221.6?content-type=application%2Fjson
chr9:127668079G>A
STX_23533165_Patient_1:NM_001032221.6:c.1462-2A>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1462-2A>T/NM_001032221.6?content-type=application%2Fjson
chr9:127680155A>T
STX_23533165_Patient_2:NM_001032221.6:c.444del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_0

chr9:127672109T>C
STX_26865513_Patient_11:NM_001032221.6:c.1438C>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1438C>T/NM_001032221.6?content-type=application%2Fjson
chr9:127678509C>T
STX_26865513_Patient_13:NM_001032221.6:c.1359+1G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1359+1G>A/NM_001032221.6?content-type=application%2Fjson
chr9:127676754G>A
STX_26865513_Patient_16:NM_001032221.6:c.430-1G>C
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.430-1G>C/NM_001032221.6?content-type=application%2Fjson
chr9:127663204G>C
STX_26865513_Patient_19:NM_003165.6:c.1723C>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_003165.6%3Ac.1723C>T/NM_003165.6?content-type=application%2Fjson
chr9:127684388C>T
STX_26865513_Patient_20:NM_001032221.6:c.1548_1559delinsAT
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg

chr9:127669899G>T
STX_29896790_P1:NM_001032221.6:c.1157del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1157del/NM_001032221.6?content-type=application%2Fjson
chr9:127675847AC>A
STX_29896790_P2:NM_001032221.6:c.1030-1G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1030-1G>A/NM_001032221.6?content-type=application%2Fjson
chr9:127673180G>A
STX_29896790_P3:NM_001032221.6:c.217G>C
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.217G>C/NM_001032221.6?content-type=application%2Fjson
chr9:127658422G>C
STX_29896790_P4:NM_001032221.6:c.268G>C
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.268G>C/NM_001032221.6?content-type=application%2Fjson
chr9:127660051G>C
STX_29896790_P7:NM_001032221.6:c.1482dup
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1482dup/NM_001032221.6?c

chr9:127680192CCCTTATATCTCTACCCGTT>C
STX_31387522_Patient_9:NM_001032221.6:c.827dup
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.827dup/NM_001032221.6?content-type=application%2Fjson
chr9:127668111G>GT
STX_BCH_005:NM_003165.6:c.1708A>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_003165.6%3Ac.1708A>G/NM_003165.6?content-type=application%2Fjson
chr9:127684373A>G
STX_BCH_006:NM_001032221.6:c.464del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.464del/NM_001032221.6?content-type=application%2Fjson
chr9:127663238AG>A
STX_BCH_008:NM_001032221.6:c.847G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.847G>A/NM_001032221.6?content-type=application%2Fjson
chr9:127668132G>A
STX_BCH_009:NM_001032221.6:c.430-1G>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.430-1G>T/NM_001032221.6?con

chr9:127675942G>A
STX_G3_P13:NM_001032221.6:c.1495_1497del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1495_1497del/NM_001032221.6?content-type=application%2Fjson
chr9:127680187CACT>C
STX_G3_P19:NM_001032221.6:c.548T>C
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.548T>C/NM_001032221.6?content-type=application%2Fjson
chr9:127663323T>C
STX_G3_P21:NM_001032221.6:c.1372_1373insGCCGGAGCAA
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1372_1373insGCCGGAGCAA/NM_001032221.6?content-type=application%2Fjson
chr9:127678443C>CGCCGGAGCAA
STX_G3_P25:NM_001032221.6:c.1627G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1627G>A/NM_001032221.6?content-type=application%2Fjson
chr9:127682485G>A
STX_G3_P26:NM_001032221.6:c.795-1G>C
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3A

chr9:127660051G>T
STX_SP_E:NM_001032221.6:c.725del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.725del/NM_001032221.6?content-type=application%2Fjson
chr9:127666223TC>T
STX_Str_Ca_SD:NM_001032221.6:c.1218C>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1218C>A/NM_001032221.6?content-type=application%2Fjson
chr9:127675911C>A
STX_STXBP1adult_Pt11:NM_001032221.6:c.1282del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.1282del/NM_001032221.6?content-type=application%2Fjson
chr9:127676674TC>T
STX_STXBP1adult_Pt3:NM_001032221.6:c.701A>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.701A>G/NM_001032221.6?content-type=application%2Fjson
chr9:127666203A>G
STX_STXBP1adult_Pt4:NM_001032221.6:c.360dup
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001032221.6%3Ac.360dup/NM_001032221.6?co

In [14]:
print(f"We extracted {len(validated_var_d)} unique variants")

We extracted 261 unique variants
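
<p>The count of unique variants is lower than the number of patients because each distinct transcript and HGVS combination is validated only once and then reused. A minimal sketch of that caching pattern follows. The names <tt>validate_unique</tt> and <tt>stub_encode</tt> are hypothetical, and the stub stands in for the remote VariantValidator call so that the example runs offline.</p>

```python
def validate_unique(variants, encode):
    """Validate each distinct (transcript, hgvs) pair exactly once.

    `encode` is any callable taking (transcript, hgvs); here it is a stub.
    """
    cache = {}
    for transcript, hgvs in variants:
        key = f"{transcript}:{hgvs}"
        if key not in cache:
            cache[key] = encode(transcript, hgvs)
    return cache


calls = []  # records every call made to the stub


def stub_encode(transcript, hgvs):
    """Stand-in for a remote validation request (no network access)."""
    calls.append((transcript, hgvs))
    return f"validated({transcript}:{hgvs})"


observed = [
    ("NM_001032221.6", "c.251T>A"),
    ("NM_001032221.6", "c.1631G>A"),
    ("NM_001032221.6", "c.251T>A"),  # repeat: served from the cache
]
cache = validate_unique(observed, stub_encode)
print(len(cache), len(calls))  # 2 2
```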


In [15]:
individual_list = []
for pat_id, patRow in patient_demographic_d.items():
    hpo_list = patient_d.get(pat_id)
    if hpo_list is None:
        print(f"Could not find hpo list for {pat_id}")
        continue
    if len(hpo_list) == 0:
        print(f"warning, empty HPO list for {pat_id}")
    sex = patRow.sex
    age = patRow.age_eval
    gtype = genotype_d.get(pat_id)
    if gtype is None:
        print(f"Could not find genotype for {pat_id} (should never happen)")
        continue
    total_hgvs = f"{gtype.transcript}:{gtype.hgvs}"
    if total_hgvs not in validated_var_d:
        print(f"Could not find {total_hgvs}")
        continue
    variant = validated_var_d.get(total_hgvs)
    variant.set_heterozygous()
    hpo_term_list = patient_d.get(pat_id)
    phenolabel = patRow.phenogroup
    pheno_id = f"CUSTOM:{phenolabel}"
    ind = Individual(individual_id=pat_id, hpo_terms=hpo_term_list, sex=sex, age=age,variant_list=[variant], 
                   disease_id=pheno_id, disease_label=phenolabel )
    individual_list.append(ind)
    
print(f"Created {len(individual_list)} individual objects")

Created 463 individual objects


In [16]:
i1 = individual_list[10]
phenopacket1 = i1.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh())
json_string = MessageToJson(phenopacket1)
print(json_string)

{
  "id": "STX_20887364_Subject_1655",
  "subject": {
    "id": "STX_20887364_Subject_1655",
    "timeAtLastEncounter": {
      "age": {
        "iso8601duration": "P0Y6M"
      }
    }
  },
  "phenotypicFeatures": [
    {
      "type": {
        "id": "HP:0003593",
        "label": "Infantile onset"
      },
      "onset": {
        "age": {
          "iso8601duration": "P0M"
        }
      },
      "resolution": {
        "age": {
          "iso8601duration": "P6M"
        }
      }
    },
    {
      "type": {
        "id": "HP:0012469",
        "label": "Infantile spasms"
      },
      "onset": {
        "age": {
          "iso8601duration": "P0M"
        }
      },
      "resolution": {
        "age": {
          "iso8601duration": "P6M"
        }
      }
    },
    {
      "type": {
        "id": "HP:0010851",
        "label": "EEG with burst suppression"
      },
      "onset": {
        "age": {
          "iso8601duration": "P0M"
        }
      },
      "resolution": {
     

In [17]:
ppacket_list = [i.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh()) for i in individual_list]

In [18]:
table = PhenopacketTable(phenopacket_list=ppacket_list)

In [19]:
from IPython.display import display, HTML
display(HTML(table.to_html()))

Individual,Genotype,Phenotypic features
STX_18469812_Subject_11 (UNKNOWN; P8Y),NM_001032221.6:c.251T>A (heterozygous),"Infantile onset (HP:0003593); Generalized tonic seizures (HP:0010818); Generalized tonic-clonic seizures (HP:0002069); EEG with burst suppression (HP:0010851); Hypsarrhythmia (HP:0002521); Refractory (HP:0031375); Absent speech (HP:0001344); Severe muscular hypotonia (HP:0006829); Profound global developmental delay (HP:0012736); Spastic tetraplegia (HP:0002510); Choreoathetosis (HP:0001266); Frontal cortical atrophy (HP:0006913); Intellectual disability, profound (HP:0002187)"
STX_18469812_Subject_3 (UNKNOWN; P37Y),NM_001032221.6:c.1631G>A (heterozygous),"Neonatal onset (HP:0003623); Infantile spasms (HP:0012469); EEG with burst suppression (HP:0010851); Severe global developmental delay (HP:0011344); Intellectual disability, profound (HP:0002187); Absent speech (HP:0001344); Spastic diplegia (HP:0001264); Focal clonic seizures (HP:0002266); Generalized tonic seizures (HP:0010818); EEG with frontal focal spikes (HP:0012015)"
STX_18469812_Subject_6 (UNKNOWN; P0Y6M),NM_001032221.6:c.539G>A (heterozygous),Neonatal onset (HP:0003623); EEG with burst suppression (HP:0010851); Infantile spasms (HP:0012469); Hypsarrhythmia (HP:0002521); Generalized tonic seizures (HP:0010818); Refractory (HP:0031375); Severe muscular hypotonia (HP:0006829); Profound global developmental delay (HP:0012736); Delayed CNS myelination (HP:0002188); Brain atrophy (HP:0012444); Generalized myoclonic seizures (HP:0002123)
STX_18469812_Subject_7 (UNKNOWN; P1Y1M),NM_001032221.6:c.1328T>G (heterozygous),Infantile onset (HP:0003593); Infantile spasms (HP:0012469); Generalized tonic seizures (HP:0010818); EEG with burst suppression (HP:0010851); Hypsarrhythmia (HP:0002521); Refractory (HP:0031375); Profound global developmental delay (HP:0012736); Absent speech (HP:0001344); Severe muscular hypotonia (HP:0006829); Spastic tetraplegia (HP:0002510); Delayed CNS myelination (HP:0002188); Brain atrophy (HP:0012444)
STX_19557857_Patient_1 (UNKNOWN; P27Y),NM_001032221.6:c.169+1G>A (heterozygous),"Infantile onset (HP:0003593); Focal impaired awareness seizure (HP:0002384); EEG with focal epileptiform discharges (HP:0011185); Muscular hypotonia (HP:0001252); Tremor (HP:0001337); Hyperventilation (HP:0002883); Gait disturbance (HP:0001288); Focal clonic seizures (HP:0002266); Generalized tonic-clonic seizures (HP:0002069); Generalized tonic seizures (HP:0010818); EEG with temporal focal spikes (HP:0012018); EEG with focal slow activity (HP:0010843); Bilateral multifocal epileptiform discharges (HP:0011189); Generalized myoclonic seizures (HP:0002123); Global developmental delay (HP:0001263); Absent speech (HP:0001344); Postural instability (HP:0002172); Drooling (HP:0002307); Intellectual disability, severe (HP:0010864)"
STX_19557857_Patient_2 (UNKNOWN; P15Y),NM_001032221.6:c.1162C>T (heterozygous),"Childhood onset (HP:0011463); Focal impaired awareness seizure (HP:0002384); EEG with focal epileptiform discharges (HP:0011185); Muscular hypotonia (HP:0001252); Tremor (HP:0001337); Versive seizures (HP:0011175); Focal motor seizure (HP:0011153); Global developmental delay (HP:0001263); Intellectual disability, severe (HP:0010864); Drooling (HP:0002307); EEG with frontal focal spikes (HP:0012015); EEG with focal slow activity (HP:0010843); Postural instability (HP:0002172); Gait disturbance (HP:0001288)"
STX_20876469_Patient_1 (UNKNOWN; P11Y),NM_001032221.6:c.1434G>A (heterozygous),"Generalized myoclonic seizures (HP:0002123); Infantile onset (HP:0003593); EEG with focal epileptiform discharges (HP:0011185); EEG with generalized slow activity (HP:0010845); Intellectual disability, severe (HP:0010864); Ataxia (HP:0001251); Hyperactivity (HP:0000752); Focal myoclonic seizures (HP:0011166)"
STX_20876469_Patient_2 (UNKNOWN; P6Y),NM_001032221.6:c.893_894del (heterozygous),"Generalized myoclonic seizures (HP:0002123); Infantile onset (HP:0003593); EEG with focal epileptiform discharges (HP:0011185); Hypsarrhythmia (HP:0002521); Generalized tonic seizures (HP:0010818); Multifocal epileptiform discharges (HP:0010841); Intellectual disability, profound (HP:0002187); Inability to walk (HP:0002540); Dyskinesia (HP:0100660); Hyperactivity (HP:0000752); Infantile spasms (HP:0012469)"
STX_20876469_Patient_3 (UNKNOWN; P11Y),NM_001032221.6:c.1029+1G>T (heterozygous),"Generalized tonic seizures (HP:0010818); Neonatal onset (HP:0003623); Focal-onset seizure (HP:0007359); Infantile spasms (HP:0012469); EEG with focal epileptiform discharges (HP:0011185); Hypsarrhythmia (HP:0002521); EEG with generalized slow activity (HP:0010845); Intellectual disability, profound (HP:0002187); Inability to walk (HP:0002540); Hyperactivity (HP:0000752)"
STX_20876469_Patient_6 (UNKNOWN; P11Y),NM_001032221.6:c.429+1G>A (heterozygous),"Infantile onset (HP:0003593); Infantile spasms (HP:0012469); Generalized tonic-clonic seizures with focal onset (HP:0007334); Bilateral multifocal epileptiform discharges (HP:0011189); Hypsarrhythmia (HP:0002521); Generalized tonic seizures (HP:0010818); Delayed CNS myelination (HP:0002188); EEG with generalized slow activity (HP:0010845); Intellectual disability, profound (HP:0002187); Inability to walk (HP:0002540); Muscular hypotonia of the trunk (HP:0008936); Head titubation (HP:0002599); Dyskinesia (HP:0100660); Cerebral visual impairment (HP:0100704); Precocious puberty in males (HP:0008185)"


<h3>Output the phenopackets to file</h3>

In [20]:
Individual.output_individuals_as_phenopackets(individual_list=individual_list,metadata=metadata.to_ga4gh())

463