<h1>SMARCC2: Bosch et al. 2023</h1>
<p>Extract the clinical data from <a href="https://pubmed.ncbi.nlm.nih.gov/37551667/" target="_blank">Bosch E, et al. (2023) Elucidating the clinical and molecular spectrum of SMARCC2-associated NDD in a cohort of 65 affected individuals. Genet Med. PMID:37551667</a>.</p>
<p>The authors report that clinical presentation differed significantly between variant classes: likely gene-disrupting (LGD) variants were predominantly inherited and associated with mildly reduced or normal cognitive development, whereas non-truncating variants were mostly de novo and presented with severe developmental delay.</p>

In [1]:
import math
import re
from collections import defaultdict
from csv import DictReader

import pandas as pd
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson, Parse, ParseDict
from pyphetools.creation import *
from pyphetools.output import PhenopacketTable

pd.set_option('display.max_colwidth', None) # show entire column contents, important!
# last tested with pyphetools version 0.4.6

In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199")
metadata.default_versions_with_hpo(version=hpo_version)

In [13]:
df = pd.read_excel("input/FileS2_cases_clinical-table.xlsx", index_col="Patient_ID (in Project)", comment="##")

In [14]:
df.head()

Unnamed: 0_level_0,HPO,Ind-01,Ind-02,Ind-03,Ind-04,Ind-05,Ind-06,Ind-07,Ind-08,Ind-09,...,Machol_Ind 15,Li_Ind 1,Chen_Pat 123,Chen_Pat 124,Chen_Pat 126,Sun_case,Yi_case,Lo_twin 1,Lo_twin 2,Gofin_Subject 5
Patient_ID (in Project),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
#family,,Fam-01,Fam-02,Fam-03,Fam-04,Fam-05,Fam-06,Fam-07,Fam-08,Fam-09,...,Fam-45,Fam-46,Fam-47,Fam-48,Fam-49,Fam-50,Fam-51,Fam-52,Fam-52,Fam-53
#group,,novel,novel,novel,novel,novel,novel,novel,novel,novel,...,literature,literature,literature,literature,literature,literature,literature,literature,literature,literature
#analysis,,include,include,exclude,include,include,exclude,include,include,include,...,include,include,include,include,include,include,include,include,include,exclude
#Case published previously,,no,no,no,no,no,no,no,no,no,...,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes
#Literature reference,,,,,,,,,,,...,PMID:30580808,PMID:34881817,PMID:34906496,PMID:34906496,PMID:34906496,PMID:35241061,PMID:35536477,PMID:35699097,PMID:35699097,PMID:35796094


In [15]:
df.index

Index([                                '#family',
                                        '#group',
                                     '#analysis',
                    '#Case published previously',
                         '#Literature reference',
                    '#Age at last investigation',
                                          '#Sex',
                                '#Consanguinity',
                                    '#Ethnicity',
                       '#Family medical history',
       ...
       'Abnormality of the genitourinary system',
                         'Laryngotracheomalacia',
               'Generalized abnormality of skin',
                                            None,
                                    'Oral cleft',
                          'Recurrent infections',
        'Feeding difficulties/failure to thrive',
                             'Sleep disturbance',
                        'Recurrent otitis media',
                  '#Other anomalies or 

In [16]:
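# The 'HPO' column holds term identifiers, not a patient, so exclude it from the patient ids
pat_id_list = df.columns.drop("HPO")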

In [18]:
age_list = df.loc["#Age at last investigation"]
age_list

HPO                                     NaN
Ind-01                                11y8m
Ind-02                                 1y6m
Ind-03                                12y8m
Ind-04                                 6y5m
                             ...           
Sun_case                              fetus
Yi_case                                 28y
Lo_twin 1                               NaN
Lo_twin 2                               NaN
Gofin_Subject 5    died in perinatal period
Name: #Age at last investigation, Length: 66, dtype: object
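
<p>The ages above mix compact year/month strings ("11y8m", "28y") with free text ("fetus", "died in perinatal period") and missing values. One possible way to normalize them is sketched below. The helper <code>age_to_iso8601</code> is hypothetical (it is not part of pyphetools) and assumes that only plain year/month strings should be converted to ISO 8601 durations, with everything else left for manual review.</p>

```python
import math
import re

# Hypothetical helper, not part of pyphetools: converts strings such as
# "11y8m" or "28y" to ISO 8601 durations ("P11Y8M", "P28Y").
AGE_PATTERN = re.compile(r"^(?:(\d+)y)?(?:(\d+)m)?$")


def age_to_iso8601(age):
    """Return an ISO 8601 duration, or None if the cell is not a plain y/m age."""
    if age is None or (isinstance(age, float) and math.isnan(age)):
        return None  # missing value
    text = str(age).strip()
    match = AGE_PATTERN.match(text)
    if not text or match is None:
        return None  # free text such as "fetus" needs manual handling
    years, months = match.groups()
    iso = "P"
    if years is not None:
        iso += f"{int(years)}Y"
    if months is not None:
        iso += f"{int(months)}M"
    return iso


# Values taken from the output above
for value in ["11y8m", "1y6m", "28y", "fetus", float("nan")]:
    print(repr(value), "->", age_to_iso8601(value))
```

<p>Returning <code>None</code> for anything that is not a plain age keeps entries such as "fetus" visible as unresolved rather than silently guessing an age.</p>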

In [19]:
sex_list = df.loc["#Sex"]
sex_list

HPO                   NaN
Ind-01             female
Ind-02               male
Ind-03               male
Ind-04               male
                    ...  
Sun_case              NaN
Yi_case            female
Lo_twin 1            male
Lo_twin 2            male
Gofin_Subject 5    female
Name: #Sex, Length: 66, dtype: object
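
<p>The sex row contains "female", "male", and missing values (for example <code>Sun_case</code>). A minimal sketch of mapping these cells to the labels of the GA4GH Phenopacket Schema <code>Sex</code> enumeration follows. The function name <code>to_sex_label</code> is an illustrative assumption, and the notebook may instead rely on pyphetools' own handling of this row.</p>

```python
# Labels follow the Sex enumeration of the GA4GH Phenopacket Schema.
SEX_LABELS = {"female": "FEMALE", "male": "MALE"}


def to_sex_label(value):
    """Map a raw table cell to a phenopacket Sex label (hypothetical helper)."""
    if not isinstance(value, str):
        return "UNKNOWN_SEX"  # NaN or another missing value
    # Tolerate stray whitespace and capitalization in the spreadsheet
    return SEX_LABELS.get(value.strip().lower(), "UNKNOWN_SEX")


for value in ["female", "male", float("nan")]:
    print(repr(value), "->", to_sex_label(value))
```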

In [21]:
hg38_var_list = df.loc["#Variant(s) in SMARCC2 (genomic hg38/GRCh38)"]
hg38_var_list

HPO                                NaN
Ind-01              chr12-56169549-C-G
Ind-02              chr12-56168139-T-C
Ind-03              chr12-56168120-C-A
Ind-04              chr12-56168139-T-C
                          ...         
Sun_case           chr12-56164309-GT-G
Yi_case             chr12-56171697-C-G
Lo_twin 1          chr12-56165327-CA-C
Lo_twin 2          chr12-56165327-CA-C
Gofin_Subject 5     chr12-56172706-T-C
Name: #Variant(s) in SMARCC2 (genomic hg38/GRCh38), Length: 66, dtype: object
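
<p>The variant strings shown above follow a <code>chromosome-position-ref-alt</code> pattern. The sketch below splits such a string into its components. <code>Variant</code> and <code>parse_variant</code> are hypothetical names introduced for illustration, and the code assumes one variant per cell; cells in any other format (the row is labelled "Variant(s)") return <code>None</code> so they can be inspected by hand.</p>

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant(text) -> Optional[Variant]:
    """Parse 'chr12-56169549-C-G' into its four components (hypothetical helper)."""
    if not isinstance(text, str):
        return None  # NaN or another missing value
    parts = text.strip().split("-")
    if len(parts) != 4:
        return None  # unexpected format, e.g. several variants in one cell
    chrom, pos, ref, alt = parts
    if not pos.isdigit():
        return None
    return Variant(chrom, int(pos), ref, alt)


# Values taken from the output above
print(parse_variant("chr12-56169549-C-G"))
print(parse_variant("chr12-56164309-GT-G"))
```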

In [22]:
variant_type_list = df.loc["#variant type"]
variant_type_list.unique()

array([nan, 'missense', 'missense; confirmed protein loss', 'truncating',
       'splice; potentially inframe', 'splice; confirmed inframe',
       'inframe', 'splice; potentially truncating',
       'splice; confirmed NMD'], dtype=object)

In [23]:
allelic_state_list = df.loc["#Allelic state"]
allelic_state_list.unique()

array([nan, 'heterozygous', 'heterozygous '], dtype=object)
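
<p>The output above lists "heterozygous" twice because one spelling carries a trailing space. A small sketch of whitespace normalization, using plain Python so that it runs without the spreadsheet, is given below. <code>normalize_label</code> is an illustrative name; on the actual Series, <code>allelic_state_list.str.strip()</code> should have the same effect.</p>

```python
def normalize_label(value):
    """Strip surrounding whitespace from strings; pass other values (e.g. NaN) through."""
    return value.strip() if isinstance(value, str) else value


# The three distinct raw values reported by .unique() above
raw_states = ["heterozygous", "heterozygous ", float("nan")]
unique_states = {normalize_label(s) for s in raw_states if isinstance(s, str)}
print(unique_states)
```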

<h2>HPO data</h2>
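<p>Some rows of the file are already encoded as HPO terms: their <code>HPO</code> column contains an <code>HP:</code> identifier. For these rows we look only for a "yes" or "no" at the beginning of each cell and do not parse any further text that gives more detail (for example, "yes; strabismus").</p>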

In [38]:
df_hpo = df[df['HPO'].fillna('').str.contains("HP:")]

In [39]:
df_hpo.head()

Unnamed: 0_level_0,HPO,Ind-01,Ind-02,Ind-03,Ind-04,Ind-05,Ind-06,Ind-07,Ind-08,Ind-09,...,Machol_Ind 15,Li_Ind 1,Chen_Pat 123,Chen_Pat 124,Chen_Pat 126,Sun_case,Yi_case,Lo_twin 1,Lo_twin 2,Gofin_Subject 5
Patient_ID (in Project),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Microcephaly,HP:0000252,no,yes,yes,yes,no,no,no,no,yes,...,no,no,,,,,,,,
Macrocephaly,HP:0000256,no,no,no,no,no,no,no,yes,no,...,no,yes,,,,,,,,
Abnormal facial shape,HP:0001999,yes,no,yes,yes,yes; occipital plagiocephaly,no,no,yes; macrocephaly,no,...,no,yes; macrocephaly,,,,,no,,,
Abnormality of the eye,HP:0000478,no,yes; strabismus,yes; ptosis at 6y,no,yes; strabismus,yes; strabismus,no,no,no,...,no,,,,,,,,,
Abnormality of the hand,HP:0001155,no,no,no,yes; clinodactyly,no,no,no; slender appearance of hands and fingers without true arachnodactyly,no,no,...,no,yes; fetal finger pads bilaterally,,,,,,,,
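
<p>The rule described at the start of this section (read only a leading "yes" or "no" and ignore the rest of the cell) could be sketched as follows. <code>observed_status</code> is a hypothetical helper, not a pyphetools function, and it assumes that cells without a leading yes/no, including empty cells, should be treated as "not assessed".</p>

```python
import re

# Match "yes" or "no" only at the start of the cell, as a whole word
YES_NO = re.compile(r"^\s*(yes|no)\b", re.IGNORECASE)


def observed_status(cell):
    """Return True for a leading 'yes', False for a leading 'no', else None."""
    if not isinstance(cell, str):
        return None  # NaN or empty cell: feature not assessed
    match = YES_NO.match(cell)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


# Cells taken from the table above
cells = [
    "yes",
    "no",
    "yes; strabismus",
    "no; slender appearance of hands and fingers without true arachnodactyly",
    float("nan"),
]
for cell in cells:
    print(repr(cell), "->", observed_status(cell))
```

<p>A three-valued result keeps explicitly excluded features ("no") distinct from features that were simply not recorded.</p>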
