<h1>Mutation pattern and genotype-phenotype correlations of SETD2 in neurodevelopmental disorders</h1>
<p>Generate phenopackets from the data reported in <a href="https://pubmed.ncbi.nlm.nih.gov/33766796/">Chen et al. (2021)</a>.</p>
<p>The authors report: To analyze the correlations between SETD2 mutations and corresponding phenotypes, we systematically review the reported individuals with de novo SETD2 variants, classify the pathogenicity, and analyze the detailed phenotypes. We subsequently manually curate 17 SETD2 de novo variants in 17 individuals from published literature. Individuals with de novo SETD2 variants present common phenotypes including speech and motor delay, intellectual disability, macrocephaly, ASD, overgrowth and recurrent otitis media. </p>

In [1]:
import pandas as pd
from IPython.display import HTML, display
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
from pyphetools.creation import *
from pyphetools.visualization import PhenopacketTable, QcVisualizer
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.76


In [2]:
PMID = "PMID:33766796"  # Chen et al, 2021
title = "Mutation pattern and genotype-phenotype correlations of SETD2 in neurodevelopmental disorders"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-04-03


In [3]:
df = pd.read_table('./input/chen21_setd2.tsv', sep="\t").astype(str)
df.head()

Unnamed: 0,Patient,1,2,3,4,5,8,9,10,11,12,14,16,17,19
0,Sex,female,male,female,male,male,male,female,male,male,male,male,male,female,male
1,Weight.age.measured,,+10.28SD,+3SD,,1.14SD,-2SD,,0.2SD,+1.79SD,4SD,–,+1.5SD,+0.96SD,
2,Height.age.measured,+0.5SD,+3.14SD,,+3SD,+0.25SD,+2SD,,+2.5SD,1.14SD,2.8SD,0.61SD,+2.5SD,+1.79SD,+0.53SD
3,Speech delay,+,+,,+,+,+,+,+,,+,+,+,+,–
4,Motor delay,+,+,+,+,–,–,+,+,–,,+,+,–,


In [4]:
dft = df.transpose()
dft.columns = dft.iloc[0]
dft.drop(dft.index[0], inplace=True)
dft['individual_id'] = dft.index
dft.head()

Patient,Sex,Weight.age.measured,Height.age.measured,Speech delay,Motor delay,Intellectual disability,Macrocephaly,ASD,Recurrent otitis media,Seizure,...,Accelerated osseous maturation,Anxiety,ADHD,Obsessive behavior,Aggressive behavior,Self-injury behavior,Gastrointestinal disturbance,Variant,primary_dx,individual_id
1,female,,+0.5SD,+,+,,+,–,+,,...,+,,,,+,+,,c.6775del,LLS,1
2,male,+10.28SD,+3.14SD,+,+,+,+,+,,–,...,,,,+,+,,,c.6471T>A,"ASD, ID",2
3,female,+3SD,,,+,,+,+,+,+,...,,,,,,,+,c.6341del,ASD,3
4,male,,+3SD,+,+,+,+,–,,,...,+,,,,,,,c.5285_5286del,Sotos,4
5,male,1.14SD,+0.25SD,+,–,+,–,+,,–,...,,-,+,+,+,-,-,c.4715+1G>A,ASD,5


In [5]:
items = {
    'Speech delay': ["Delayed speech and language development", "HP:0000750"],
    'Motor delay': ['Motor delay', 'HP:0001270'],
    'Intellectual disability': ['Intellectual disability', 'HP:0001249'],
    'Macrocephaly': ['Macrocephaly', 'HP:0000256'],
    'ASD': ['Autism', 'HP:0000717'],
    'Recurrent otitis media': ['Recurrent otitis media','HP:0000403'],
    'Seizure': ['Seizure', 'HP:0001250'],
    'Facial deformity': ['Abnormal facial shape', 'HP:0001999'],
    'Hypotonia': ['Hypotonia', 'HP:0001252'],
    'Accelerated osseous maturation': ['Accelerated skeletal maturation','HP:0005616'],
    'Anxiety': ['Anxiety','HP:0000739'],
    'ADHD': ['Attention deficit hyperactivity disorder','HP:0007018'],
    'Obsessive behavior': ['Compulsive behaviors','HP:0000722'],
    'Aggressive behavior': ['Aggressive behavior','HP:0000718'],
    'Self-injury behavior': ['Self-injurious behavior','HP:0100716'],
}
column_mapper_d = hpo_cr.initialize_simple_column_maps(column_name_to_hpo_label_map=items, observed='+',
    excluded='-')
column_mapper_list = list(column_mapper_d.values())
print(f"We created {len(column_mapper_list)} simple column mappers")

We created 15 simple column mappers
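The table previews above appear to contain two different "absent" markers: the ASCII hyphen `-` and the en dash `–`. If that is the case in the input file, cells holding an en dash would not match `excluded='-'` and would be treated as not measured. A minimal sketch of a normalization step, assuming the dash-like characters listed below are the only variants present (the helper name `normalize_status` is hypothetical and not part of pyphetools):

```python
# Dash-like characters that may stand for "excluded" in the source table.
# Assumption: only these variants occur; extend the set if others are found.
DASH_VARIANTS = {"\u2013", "\u2014", "\u2212"}  # en dash, em dash, minus sign


def normalize_status(cell: str) -> str:
    """Return '-' if the cell consists solely of a dash-like character.

    Any other value (for example '+', '+1.5SD', or 'nan') is returned
    with surrounding whitespace stripped and is otherwise unchanged.
    """
    stripped = cell.strip()
    return "-" if stripped in DASH_VARIANTS else stripped


# Hypothetical usage on the transposed table, restricted to the phenotype
# columns named in `items`, before the column mappers are applied:
# for col in items:
#     dft[col] = dft[col].map(normalize_status)
```

Restricting the normalization to the phenotype columns leaves the measurement columns (such as `Weight.age.measured`) as they were reported.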


<h2>Transcript/Variant mapping</h2>

In [6]:
setd2_transcript = "NM_014159.7"
setd2_id = "HGNC:18420"
setd2_symbol = "SETD2"
vman = VariantManager(df=dft, individual_column_name="individual_id", allele_1_column_name="Variant",
                     gene_id=setd2_id, gene_symbol=setd2_symbol, transcript=setd2_transcript)

In [7]:
vman.to_summary()

Unnamed: 0,status,count,alleles
0,mapped,14,"c.5444T>G, c.4404dupA, c.4715+1G>A, c.5285_5286del, c.6895G>A, c.3185C>T, c.4644_4646del, c.6775del, c.6341del, c.6471T>A, c.4997A>G, c.121A>T, c.1647_1667delinsAC, c.2028del"
1,unmapped,0,


In [8]:
variant_d = vman.get_variant_d()
print(f"We get {len(variant_d)} unique variants")

We get 14 unique variants


In [9]:
varMapper = VariantColumnMapper(variant_d=variant_d,
                                variant_column_name='Variant',
                                default_genotype="heterozygous")

In [10]:
sexMapper = SexColumnMapper(male_symbol='male', female_symbol='female', column_name='Sex')
#sexMapper.preview_column(dft)

<h2>SETD2</h2>
<p>Variants in SETD2 are associated with three diseases in OMIM:</p>
<ul>
    <li>Intellectual developmental disorder, autosomal dominant 70 (OMIM:620157)</li>
    <li>Luscan-Lumish syndrome (OMIM:616831)</li>
    <li>Rabin-Pappas syndrome (OMIM:620155)</li>
</ul>

## Diagnosis

Variants in SETD2 are associated with three diseases in OMIM:

- Autosomal dominant intellectual developmental disorder-70 (MRD70) is characterized by mild global developmental delay, moderately impaired intellectual development with speech difficulties, and behavioral abnormalities.
- Luscan-Lumish syndrome (LLS) is characterized by macrocephaly, intellectual disability, speech delay, low sociability, and behavioral problems. More variable features include postnatal overgrowth, obesity, advanced carpal ossification, developmental delay, and seizures.
- Rabin-Pappas syndrome (RAPAS) is a multisystemic disorder characterized by severely impaired global development apparent from infancy, feeding difficulties with failure to thrive, small head circumference, and dysmorphic facial features. Affected individuals have impaired intellectual development and hypotonia; they do not achieve walking or meaningful speech. Other neurologic findings may include seizures, hearing loss, ophthalmologic defects, and brain imaging abnormalities.

In the publication by Chen et al., some individuals are described as having features similar to Sotos syndrome. There is no OMIM code for this presentation. We will code these individuals as Luscan-Lumish syndrome, because the original publication (PMID:24852293) on the disease that came to be known as Luscan-Lumish syndrome described the affected individuals as having Sotos-like features. Similarly, we will code the individuals described by Chen et al. as having predominantly autism spectrum disorder-like features as MRD70.



In [11]:
disease_d = {}
LLS = Disease(disease_id='OMIM:616831', disease_label='Luscan-Lumish syndrome')

disease_d['LLS'] = LLS
MRD70 = Disease(disease_id='OMIM:620157', disease_label='Intellectual developmental disorder, autosomal dominant 70')
disease_d['ID'] = MRD70
disease_d['ASD, ID'] = MRD70
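# No OMIM/Mondo ID available for the Sotos-like presentation; coded as LLS (see Diagnosis)
disease_d['Sotos'] = LLS
disease_d['ASD']= MRD70
# Map each individual to a Disease via the primary_dx label
# (disease_d.get returns None for a label that is not in disease_d)
disease_map = {}
for _, row in dft[['individual_id','primary_dx']].iterrows():
    individual_id = row['individual_id']  # avoid shadowing the built-in id()
    dx = row['primary_dx']
    disease_map[individual_id] = disease_d.get(dx)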
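Because the lookup above uses `dict.get`, a `primary_dx` label that is missing from `disease_d` would silently map to `None`. A stricter variant could fail early instead. The following is only a sketch using plain dictionaries; `build_disease_map` is a hypothetical helper, not a pyphetools function:

```python
def build_disease_map(dx_by_individual: dict, label_to_disease: dict) -> dict:
    """Map individual IDs to diseases, raising if any label is unmapped.

    dx_by_individual: individual ID -> primary_dx label from the table
    label_to_disease: primary_dx label -> disease object (or any value)
    """
    unknown = sorted(
        {dx for dx in dx_by_individual.values() if dx not in label_to_disease}
    )
    if unknown:
        raise KeyError(f"No disease assigned for primary_dx label(s): {unknown}")
    return {ind: label_to_disease[dx] for ind, dx in dx_by_individual.items()}


# Hypothetical usage with the objects defined in this notebook:
# dx_by_individual = dict(zip(dft['individual_id'], dft['primary_dx']))
# disease_map = build_disease_map(dx_by_individual, disease_d)
```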

In [12]:
encoder = CohortEncoder(df=dft,
                        hpo_cr=hpo_cr,
                        column_mapper_list=column_mapper_list,
                        individual_column_name="individual_id",
                        sexmapper=sexMapper,
                        variant_mapper=varMapper,
                        metadata=metadata)
encoder.set_disease_dictionary(disease_map)

In [13]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
INFORMATION,NOT_MEASURED,126


In [14]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
1 (FEMALE; n/a),Luscan-Lumish syndrome (OMIM:616831),NM_014159.7:c.6775del (heterozygous),Delayed speech and language development (HP:0000750); Motor delay (HP:0001270); Macrocephaly (HP:0000256); Recurrent otitis media (HP:0000403); Abnormal facial shape (HP:0001999); Accelerated skeletal maturation (HP:0005616); Aggressive behavior (HP:0000718); Self-injurious behavior (HP:0100716)
2 (MALE; n/a),"Intellectual developmental disorder, autosomal dominant 70 (OMIM:620157)",NM_014159.7:c.6471T>A (heterozygous),Delayed speech and language development (HP:0000750); Motor delay (HP:0001270); Intellectual disability (HP:0001249); Macrocephaly (HP:0000256); Autism (HP:0000717); Abnormal facial shape (HP:0001999); Compulsive behaviors (HP:0000722); Aggressive behavior (HP:0000718)
3 (FEMALE; n/a),"Intellectual developmental disorder, autosomal dominant 70 (OMIM:620157)",NM_014159.7:c.6341del (heterozygous),Motor delay (HP:0001270); Macrocephaly (HP:0000256); Autism (HP:0000717); Recurrent otitis media (HP:0000403); Seizure (HP:0001250); Hypotonia (HP:0001252)
4 (MALE; n/a),Luscan-Lumish syndrome (OMIM:616831),NM_014159.7:c.5285_5286del (heterozygous),Delayed speech and language development (HP:0000750); Motor delay (HP:0001270); Intellectual disability (HP:0001249); Macrocephaly (HP:0000256); Abnormal facial shape (HP:0001999); Hypotonia (HP:0001252); Accelerated skeletal maturation (HP:0005616)
5 (MALE; n/a),"Intellectual developmental disorder, autosomal dominant 70 (OMIM:620157)",NM_014159.7:c.4715+1G>A (heterozygous),Delayed speech and language development (HP:0000750); Intellectual disability (HP:0001249); Autism (HP:0000717); Attention deficit hyperactivity disorder (HP:0007018); Compulsive behaviors (HP:0000722); Aggressive behavior (HP:0000718); excluded: Anxiety (HP:0000739); excluded: Self-injurious behavior (HP:0100716)
8 (MALE; n/a),"Intellectual developmental disorder, autosomal dominant 70 (OMIM:620157)",NM_014159.7:c.4405dup (heterozygous),Delayed speech and language development (HP:0000750); Intellectual disability (HP:0001249); Macrocephaly (HP:0000256); Abnormal facial shape (HP:0001999)
9 (FEMALE; n/a),"Intellectual developmental disorder, autosomal dominant 70 (OMIM:620157)",NM_014159.7:c.2028del (heterozygous),Delayed speech and language development (HP:0000750); Motor delay (HP:0001270); Intellectual disability (HP:0001249); Macrocephaly (HP:0000256); Autism (HP:0000717); Seizure (HP:0001250); Hypotonia (HP:0001252); Anxiety (HP:0000739); Attention deficit hyperactivity disorder (HP:0007018)
10 (MALE; n/a),Luscan-Lumish syndrome (OMIM:616831),NM_014159.7:c.1647_1667delinsAC (heterozygous),Delayed speech and language development (HP:0000750); Motor delay (HP:0001270); Macrocephaly (HP:0000256); Recurrent otitis media (HP:0000403); Abnormal facial shape (HP:0001999); Aggressive behavior (HP:0000718)
11 (MALE; n/a),"Intellectual developmental disorder, autosomal dominant 70 (OMIM:620157)",NM_014159.7:c.6895G>A (heterozygous),Intellectual disability (HP:0001249); Autism (HP:0000717); Recurrent otitis media (HP:0000403); Anxiety (HP:0000739)
12 (MALE; n/a),Luscan-Lumish syndrome (OMIM:616831),NM_014159.7:c.5444T>G (heterozygous),Delayed speech and language development (HP:0000750); Macrocephaly (HP:0000256); Abnormal facial shape (HP:0001999); Accelerated skeletal maturation (HP:0005616)


In [15]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              metadata=metadata,
                                              outdir=output_directory)

We output 14 GA4GH phenopackets to the directory phenopackets
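As a quick sanity check, the reported count can be compared with the number of files actually written. The sketch below uses only the standard library and assumes that the output directory contains one `.json` file per phenopacket, each with a top-level `id` field as in the GA4GH Phenopacket Schema; `list_phenopacket_ids` is a hypothetical helper name.

```python
import json
from pathlib import Path


def list_phenopacket_ids(directory: str) -> list:
    """Return the top-level 'id' of every *.json file in `directory`.

    Files are read in sorted filename order; a file without an 'id'
    field contributes None to the result.
    """
    ids = []
    for path in sorted(Path(directory).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            packet = json.load(fh)
        ids.append(packet.get("id"))
    return ids


# Hypothetical usage after the export step:
# ids = list_phenopacket_ids(output_directory)
# print(f"Found {len(ids)} phenopacket files")
```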


In [16]:
# pxf validate --hpo hp.json *.json
# No errors