# GCSH

Data from [Arribas-Carreira L, et al. (2023) Pathogenic variants in GCSH encoding the moonlighting H-protein cause combined nonketotic hyperglycinemia and lipoate deficiency. Hum Mol Genet](https://pubmed.ncbi.nlm.nih.gov/36190515/)

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import HTML, display
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"pyphetools version {pyphetools.__version__}")

pyphetools version 0.9.53


In [2]:
PMID = "PMID:36190515"
title = "Pathogenic variants in GCSH encoding the moonlighting H-protein cause combined nonketotic hyperglycinemia and lipoate deficiency"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-02-27


In [3]:
df = pd.read_excel('input/Arribas-Carreira-GCSH-2023.xlsx')

In [4]:
df.head()

Unnamed: 0,Patient,Patient 1,Patient 2,Patient 3,Patient 4,Patient 5,Patient 6
0,Origin,Spain,Denmark,USA,UK-USA,Germany,France
1,Consanguinity,No,No,No,Yes,Yes,Yes
2,Sex,Female,Male,Female,Female,Male,Male
3,Pregnancy,Unremarkable,Unremarkable,Hyperemesis (HP:0012188),Unremarkable,Unremarkable,Unremarkable
4,Symptom onset,PD1,PD3,P2M,P6M,P4D,P3M


In [5]:
dft = df.transpose()
dft.columns = dft.iloc[0]
dft.drop(dft.index[0], inplace=True)
dft['individual_id'] = dft.index  # Set the new column 'patient_id' to be identical to the contents of the index
dft.head(2)

Patient,Origin,Consanguinity,Sex,Pregnancy,Symptom onset,Initial symptoms,Main symptoms,Other symptoms,EEG,Outcome,...,CSF glycine μM (Ref 2–10),CSF/plasma glycine ratio (Ref <0.02),Brain MRI,Corpus callosum,MRS,Variant 1,Variant effect 1,Variant 2,Variant effect 2,individual_id
Patient 1,Spain,No,Female,Unremarkable,PD1,"Lethargy (HP:0001254), Hypotonia (HP:0001252)","Coma, Apnea (HP:0002104)","Liver dysfunction (HP:0001410), thrombocytopenia (HP:0001873)",Burst suppression (HP:0010851),Deceased day 18,...,170,0.12,"Lesions middle and anterior cerebral artery distribution, caudate, thalami",Mild hypoplasia,,c.170A>G,p.His57Arg,c.148?_228?del,Deletion exon 2,Patient 1
Patient 2,Denmark,No,Male,Unremarkable,PD3,"Lethargy (HP:0001254), hypotonia (HP:0001252), hypoglycemia (HP:0001943)","Coma, hiccups (HP:0100247), hypotonia (HP:0001252), no reflexes (HP:0001284), myoclonias (HP:0003794), ventilation since day 4 (HP:0005946)",,"Burst suppression (HP:0010851), paroxysmal epileptic",Care withdrawal and demise day 7,...,288,0.2,"Diffusion restriction in PLIC, corticospinal tract, central tegmental tract and cerebellar white matter. Also, diffusion restriction anterior medial thalami","Thin corpus callosum (HP:0033725), low normal length",,c.1A>G,p.Met1?,c.293-2_293-1insT,p.(Asp98_Asp141del) Splicing IVS3 defect,Patient 2


In [6]:
#res = OptionColumnMapper.autoformat(df=dft, hpo_cr=hpo_cr)
column_mapper_list = list()
#print(res)

In [7]:
initial_d = {'Lethargy (HP:0001254)': 'Lethargy',
 'Hypotonia (HP:0001252)': 'Hypotonia',
 'hypotonia (HP:0001252)': 'Hypotonia',
 'hypoglycemia (HP:0001943)': 'Hypoglycemia',
 'Exaggerated startles (HP:0002267)': 'Exaggerated startle response',
 'infantile spasms (HP:0012469)': 'Infantile spasms',
 'partial seizures (HP:0007359)': 'Focal-onset seizure',
 #'Loss of skills following immunization (trigger by immunization HP:0025219)': 'PLACEHOLDER',
 'developmental delays (HP:0001263)': 'Global developmental delay',
 'apneic (HP:0002104)': 'Apnea',
 'pallor (HP:0000980) and cyanosis (HP:0000961)': 'Cyanosis',
 'requiring intubation and ventilation for 2\xa0weeks(HP:0004887)': 'Respiratory failure requiring assisted ventilation',
 'Left partial seizures': 'Focal-onset seizure'}
excluded = {}
initialMapper = OptionColumnMapper(column_name="Initial symptoms", concept_recognizer=hpo_cr, option_d=initial_d, excluded_d=excluded)
column_mapper_list.append(initialMapper)
initialMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Lethargy (HP:0001254) (observed),2
1,Hypotonia (HP:0001252) (observed),3
2,Hypoglycemia (HP:0001943) (observed),1
3,Exaggerated startle response (HP:0002267) (observed),1
4,Infantile spasms (HP:0012469) (observed),1
5,Focal-onset seizure (HP:0007359) (observed),2
6,Global developmental delay (HP:0001263) (observed),1
7,Apnea (HP:0002104) (observed),1
8,Cyanosis (HP:0000961) (observed),1
9,Respiratory failure requiring assisted ventilation (HP:0004887) (observed),1


In [8]:
main_d = {'Coma': 'Coma',
 'Apnea (HP:0002104)': 'Apnea',
 'hiccups (HP:0100247)': 'Recurrent singultus',
 'hypotonia (HP:0001252)': 'Hypotonia',
 'no reflexes (HP:0001284)': 'Areflexia',
 'myoclonias (HP:0003794)': 'Myoclonus',
 'ventilation since day 4 (HP:0005946)': 'Respiratory failure requiring assisted ventilation',
 'Epilepsy (HP:0001250)': 'Seizure',
 'progressive developmental delay (HP:0001263)': 'Global developmental delay',
 'Developmental delays (HP:0001263)': 'Global developmental delay',
 'hyperactivity (HP:0000752)': 'Hyperactivity',
 'impulsivity (HP:0100710)': 'Impulsivity',
 'rare hallucinations (HP:0000738)': 'Hallucinations',
 'limited communication': 'Global developmental delay',
 'Since age 3\xa0months: developmental delays (HP:0001263)': 'Global developmental delay',
 'dystonic movements (HP:0001332)': 'Dystonia',
 '6\xa0months: infantile spasms': 'Infantile spasms',
 'seizures controlled with benzoate and antiepileptics': 'Seizure'}
excluded = {}
mainMapper = OptionColumnMapper(column_name="Main symptoms", concept_recognizer=hpo_cr, option_d=main_d, excluded_d=excluded)
column_mapper_list.append(mainMapper)
mainMapper.preview_column(dft)


Unnamed: 0,mapping,count
0,Coma (HP:0001259) (observed),2
1,Apnea (HP:0002104) (observed),1
2,Recurrent singultus (HP:0100247) (observed),1
3,Hypotonia (HP:0001252) (observed),1
4,Areflexia (HP:0001284) (observed),1
5,Myoclonus (HP:0001336) (observed),1
6,Respiratory failure requiring assisted ventilation (HP:0004887) (observed),1
7,Seizure (HP:0001250) (observed),2
8,Global developmental delay (HP:0001263) (observed),4
9,Hyperactivity (HP:0000752) (observed),1


In [9]:
other_symptoms_d = {'Liver dysfunction (HP:0001410)': 'Decreased liver function',
 'thrombocytopenia (HP:0001873)': 'Thrombocytopenia',
 'Poor feeding (HP:0011968) requiring gastrostomy': 'Feeding difficulties'}
excluded = {}
other_symptomsMapper = OptionColumnMapper(column_name="Other symptoms", concept_recognizer=hpo_cr, option_d=other_symptoms_d, excluded_d=excluded)
column_mapper_list.append(other_symptomsMapper)
other_symptomsMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Decreased liver function (HP:0001410) (observed),1
1,Thrombocytopenia (HP:0001873) (observed),1
2,Feeding difficulties (HP:0011968) (observed),1


In [10]:
eeg_d = {'Burst suppression (HP:0010851)': 'EEG with burst suppression',
 'paroxysmal epileptic': 'Seizure',
 'Hypsarrhythmia (HP:0002521) with periodic attenuation': 'Hypsarrhythmia',
 'multifocal epilepsy (HP:0031165)': 'Multifocal seizures',}
eegMapper = OptionColumnMapper(column_name="EEG", concept_recognizer=hpo_cr, option_d=eeg_d, excluded_d=excluded)
column_mapper_list.append(eegMapper)
eegMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,EEG with burst suppression (HP:0010851) (observed),2
1,Seizure (HP:0001250) (observed),3
2,Hypsarrhythmia (HP:0002521) (observed),1
3,Multifocal seizures (HP:0031165) (observed),1


In [11]:
outcome_d = {
 'developmental delays (DQ 70)': 'Global developmental delay',
 'dystonic movements (HP:0001332)': 'Dystonia',
 'truncal hypotonia (HP:0008936)': 'Axial hypotonia',
 'proximal limb muscle stiffness and hyperreflexia (HP:0002191)': 'Hyperreflexia',
 'hyperactivity(HP:0011171) (HP:0000752)': 'Hyperactivity',
 'irritability (HP:0000737)': 'Irritability',
 'developmental delays': 'Global developmental delay',
 '3\xa0years: hyperactivity (HP:0000752)': 'Hyperactivity',
 'agitation (HP:0000713)': 'Agitation',
 'DQ ±55': 'Global developmental delay',
 'hyperactive (HP:0000752)': 'Hyperactivity',
 'autist form (HP:0000729)': 'Autistic behavior'}
excluded = {}
outcomeMapper = OptionColumnMapper(column_name="Outcome", concept_recognizer=hpo_cr, option_d=outcome_d, excluded_d=excluded)
column_mapper_list.append(outcomeMapper)
outcomeMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Global developmental delay (HP:0001263) (observed),3
1,Dystonia (HP:0001332) (observed),1
2,Axial hypotonia (HP:0008936) (observed),1
3,Hyperreflexia (HP:0001347) (observed),1
4,Hyperactivity (HP:0000752) (observed),3
5,Irritability (HP:0000737) (observed),1
6,Seizure (HP:0001250) (observed),2
7,Agitation (HP:0000713) (observed),1
8,Autistic behavior (HP:0000729) (observed),1


In [12]:
glycine_d = {'1406': 'Hyperglycemia',
 '1461': 'Hyperglycemia',
 '715': 'Hyperglycemia',
 '812': 'Hyperglycemia',
 '1381': 'Hyperglycemia'}
excluded = {}
glycineMapper = OptionColumnMapper(column_name="Plasma glycine μM (Ref 77–376)", concept_recognizer=hpo_cr, option_d=glycine_d, excluded_d=excluded)
column_mapper_list.append(glycineMapper)
glycineMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Hyperglycemia (HP:0003074) (observed),5


In [13]:
csf_glycine_d = {'170': 'Increased CSF glycine concentration',
 '288': 'Increased CSF glycine concentration',
 '65': 'Increased CSF glycine concentration',
 '88': 'Increased CSF glycine concentration'}
excluded = {}
csf_glycineMapper = OptionColumnMapper(column_name="CSF glycine μM (Ref 2–10)", concept_recognizer=hpo_cr, option_d=csf_glycine_d, excluded_d=excluded)
column_mapper_list.append(csf_glycineMapper)
csf_glycineMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Increased CSF glycine concentration (HP:0500230) (observed),5


In [14]:
brain_mri_d = {
 'Diffusion restriction in PLIC': 'Abnormal diffusion weighted cerebral MRI morphology',
 'diffusion restriction anterior medial thalami': 'Abnormal diffusion weighted cerebral MRI morphology',
 'Day 7: diffusion restriction corticospinal tract': 'Abnormal diffusion weighted cerebral MRI morphology',
 'enhance peritrigonal; 16 months: possibly thin caudate nuclei with mild diffusion restriction': 'Abnormal diffusion weighted cerebral MRI morphology',
 'Diffusion restriction of the entire supratentorial white matter': 'Abnormal diffusion weighted cerebral MRI morphology',
}
excluded = { 'PLIC; 18\xa0months: no diffusion restriction': 'Abnormal diffusion weighted cerebral MRI morphology',}
brain_mriMapper = OptionColumnMapper(column_name="Brain MRI", concept_recognizer=hpo_cr, option_d=brain_mri_d, excluded_d=excluded)
column_mapper_list.append(brain_mriMapper)
brain_mriMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Abnormal diffusion weighted cerebral MRI morphology (HP:0032615) (observed),4


In [15]:
corpus_callosum_d = {'Mild hypoplasia': 'Hypoplasia of the corpus callosum',
 'Thin corpus callosum (HP:0033725)': 'Thin corpus callosum',
 'Hypoplasia': 'Hypoplasia of the corpus callosum',
 'thin genu and body': 'Thin corpus callosum',
}
excluded = {}
corpus_callosumMapper = OptionColumnMapper(column_name="Corpus callosum", concept_recognizer=hpo_cr, option_d=corpus_callosum_d, excluded_d=excluded)
column_mapper_list.append(corpus_callosumMapper)
corpus_callosumMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Hypoplasia of the corpus callosum (HP:0002079) (observed),2
1,Thin corpus callosum (HP:0033725) (observed),2


In [16]:
mrs_d = {
 'Low NAA': 'Reduced brain N-acetyl aspartate level by MRS',
}
excluded = {}
mrsMapper = OptionColumnMapper(column_name="MRS", concept_recognizer=hpo_cr, option_d=mrs_d, excluded_d=excluded)
column_mapper_list.append(mrsMapper)
mrsMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Reduced brain N-acetyl aspartate level by MRS (HP:0012708) (observed),1


In [17]:
sexMapper = SexColumnMapper(column_name="Sex", female_symbol="Female", male_symbol="Male")
#sexMapper.preview_column(dft)

In [18]:
ageMapper = AgeColumnMapper.iso8601(column_name="Symptom onset")
#ageMapper.preview_column(dft)

In [19]:
GCSH_transcript = "NM_004483.5"
GCSH_id = "HGNC:4208"
vman = VariantManager(df=dft, 
                      individual_column_name="individual_id",
                      allele_1_column_name="Variant 1", 
                      allele_2_column_name="Variant 2", 
                      transcript=GCSH_transcript, 
                      gene_symbol="GCSH",
                      gene_id=GCSH_id)

[INFO] encoding variant "c.(292 + 1_293–1)_(*919_?)dup"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_004483.5%3Ac.(292 + 1_293–1)_(*919_?)dup/NM_004483.5?content-type=application%2Fjson
[ERROR] Could not retrieve Variant Validator information for c.(292 + 1_293–1)_(*919_?)dup: string indices must be integers
[INFO] encoding variant "c.148?_228?del"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_004483.5%3Ac.148?_228?del/NM_004483.5?content-type=application%2Fjson
[ERROR] Could not retrieve Variant Validator information for c.148?_228?del: string indices must be integers


In [20]:
vman.to_summary()

Unnamed: 0,status,count,alleles
0,mapped,6,"c.1A>G, c.226C>T, c.170A>G, c.442A>C, c.344C>T, c.293-2_293-1insT"
1,unmapped,2,"c.(292 + 1_293–1)_(*919_?)dup, c.148?_228?del"


In [21]:
vman.code_as_chromosomal_duplication({"c.(292 + 1_293–1)_(*919_?)dup"})
vman.code_as_chromosomal_deletion({"c.148?_228?del"})
vman.to_summary()

Unnamed: 0,status,count,alleles
0,mapped,8,"c.1A>G, c.226C>T, c.170A>G, c.442A>C, c.344C>T, c.293-2_293-1insT, c.(292 + 1_293–1)_(*919_?)dup, c.148?_228?del"
1,unmapped,0,


In [22]:
MMDS7 = Disease(disease_id='OMIM:620423', disease_label='Multiple mitochondrial dysfunctions syndrome 7')
encoder = CohortEncoder(df=dft, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="individual_id", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata)
encoder.set_disease(MMDS7)

In [23]:
individuals = encoder.get_individuals()

In [24]:
var_d = vman.get_variant_d()
for k,v in var_d.items():
    print(k,v)

c.1A>G NM_004483.5:c.1A>G(chr16:81096278T>C)
c.226C>T NM_004483.5:c.226C>T(chr16:81090603G>A)
c.170A>G NM_004483.5:c.170A>G(chr16:81090659T>C)
c.442A>C NM_004483.5:c.442A>C(chr16:81082946T>G)
c.344C>T NM_004483.5:c.344C>T(chr16:81084543G>A)
c.293-2_293-1insT NM_004483.5:c.293-2_293-1insT(chr16:81084595C>CA)
c.(292 + 1_293–1)_(*919_?)dup <pyphetools.creation.structural_variant.StructuralVariant object at 0x109b0c310>
c.148?_228?del <pyphetools.creation.structural_variant.StructuralVariant object at 0x15ea76d30>


In [25]:
individual_d = { i.id: i for i in individuals }

In [26]:
for _, row in dft.iterrows():
    id = row["individual_id"]
    allele_1 = row["Variant 1"]
    allele_2 = row["Variant 1"]
    i = individual_d.get(id)
    var_1 = var_d.get(allele_1)
    if allele_1 == allele_2:
        var_1.set_homozygous()
        i.add_variant(var_1)
    else:
        var_1.set_heterozygous()
        i.add_variant(var_1)
        var_2 = var_d.get(allele_2)
        var_2.set_heterozygous()
        i.add_variant(var_2)
individuals = list(individual_d.values())

In [27]:
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.BI_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,4


In [28]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
Patient 1 (FEMALE; P0D),Multiple mitochondrial dysfunctions syndrome 7 (OMIM:620423),NM_004483.5:c.170A>G (homozygous),Lethargy (HP:0001254); Hypotonia (HP:0001252); Coma (HP:0001259); Apnea (HP:0002104); Decreased liver function (HP:0001410); Thrombocytopenia (HP:0001873); EEG with burst suppression (HP:0010851); Hyperglycemia (HP:0003074); Increased CSF glycine concentration (HP:0500230); Hypoplasia of the corpus callosum (HP:0002079)
Patient 2 (MALE; P0D),Multiple mitochondrial dysfunctions syndrome 7 (OMIM:620423),NM_004483.5:c.1A>G (homozygous),Abnormal diffusion weighted cerebral MRI morphology (HP:0032615); Lethargy (HP:0001254); Respiratory failure requiring assisted ventilation (HP:0004887); Coma (HP:0001259); Myoclonus (HP:0001336); Increased CSF glycine concentration (HP:0500230); Thin corpus callosum (HP:0033725); Areflexia (HP:0001284); EEG with burst suppression (HP:0010851); Hypotonia (HP:0001252); Seizure (HP:0001250); Hyperglycemia (HP:0003074); Hypoglycemia (HP:0001943); Recurrent singultus (HP:0100247)
Patient 3 (FEMALE; P2M),Multiple mitochondrial dysfunctions syndrome 7 (OMIM:620423),NM_004483.5:c.1A>G (homozygous),Exaggerated startle response (HP:0002267); Infantile spasms (HP:0012469); Global developmental delay (HP:0001263); Feeding difficulties (HP:0011968); Hypsarrhythmia (HP:0002521); Multifocal seizures (HP:0031165); Hypoplasia of the corpus callosum (HP:0002079)
Patient 4 (FEMALE; P6M),Multiple mitochondrial dysfunctions syndrome 7 (OMIM:620423),NM_004483.5:c.442A>C (homozygous),Hallucinations (HP:0000738); Increased CSF glycine concentration (HP:0500230); Hyperactivity (HP:0000752); Hypotonia (HP:0001252); Hyperglycemia (HP:0003074); Global developmental delay (HP:0001263); Seizure (HP:0001250); Impulsivity (HP:0100710)
Patient 5 (MALE; P4D),Multiple mitochondrial dysfunctions syndrome 7 (OMIM:620423),c.(292 + 1_293–1)_(*919_?)dup: chromosomal_duplication (SO:1000037),Abnormal diffusion weighted cerebral MRI morphology (HP:0032615); Respiratory failure requiring assisted ventilation (HP:0004887); Apnea (HP:0002104); Increased CSF glycine concentration (HP:0500230); Hyperactivity (HP:0000752); Irritability (HP:0000737); Hyperglycemia (HP:0003074); Global developmental delay (HP:0001263); Axial hypotonia (HP:0008936); Hyperreflexia (HP:0001347); Reduced brain N-acetyl aspartate level by MRS (HP:0012708); Seizure (HP:0001250); Cyanosis (HP:0000961); Dystonia (HP:0001332)
Patient 6 (MALE; P3M),Multiple mitochondrial dysfunctions syndrome 7 (OMIM:620423),NM_004483.5:c.344C>T (homozygous),Abnormal diffusion weighted cerebral MRI morphology (HP:0032615); Agitation (HP:0000713); Increased CSF glycine concentration (HP:0500230); Hyperactivity (HP:0000752); Focal-onset seizure (HP:0007359); Infantile spasms (HP:0012469); Hyperglycemia (HP:0003074); Global developmental delay (HP:0001263); Autistic behavior (HP:0000729)


In [29]:
Individual.output_individuals_as_phenopackets(individual_list=individuals, metadata=metadata)

We output 6 GA4GH phenopackets to the directory phenopackets
