<h1>Mutation pattern and genotype-phenotype correlations of SETD2 in neurodevelopmental disorders</h1>
<p>Generate phenopackets from the data reported in <a href="https://pubmed.ncbi.nlm.nih.gov/33766796/">Chen et al. (2021)</a>.</p>
<p>The authors report: To analyze the correlations between SETD2 mutations and corresponding phenotypes, we systematically review the reported individuals with de novo SETD2 variants, classify the pathogenicity, and analyze the detailed phenotypes. We subsequently manually curate 17 SETD2 de novo variants in 17 individuals from published literature. Individuals with de novo SETD2 variants present common phenotypes including speech and motor delay, intellectual disability, macrocephaly, ASD, overgrowth and recurrent otitis media. </p>

In [14]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import numpy as np
import pyphetools
from pyphetools.creation import *
from pyphetools.output import PhenopacketTable
print(f"pyphetools version {pyphetools.__version__}")

pyphetools version 0.6.5


In [15]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
PMID = "PMID:33766796"  # Chen et al, 2021
title = "Mutation pattern and genotype-phenotype correlations of SETD2 in neurodevelopmental disorders"
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", pmid=PMID, pubmed_title=title)
metadata.default_versions_with_hpo(version=hpo_version)

In [16]:
df = pd.read_table('./input/chen21_setd2.tsv', sep="\t").astype(str)
df

Unnamed: 0,Patient,1,2,3,4,5,8,9,10,11,12,14,16,17,19
0,Sex,female,male,female,male,male,male,female,male,male,male,male,male,female,male
1,Weight.age.measured,,+10.28SD,+3SD,,1.14SD,-2SD,,0.2SD,+1.79SD,4SD,–,+1.5SD,+0.96SD,
2,Height.age.measured,+0.5SD,+3.14SD,,+3SD,+0.25SD,+2SD,,+2.5SD,1.14SD,2.8SD,0.61SD,+2.5SD,+1.79SD,+0.53SD
3,Speech delay,+,+,,+,+,+,+,+,,+,+,+,+,–
4,Motor delay,+,+,+,+,–,–,+,+,–,,+,+,–,
5,Intellectual disability,,+,,+,+,+,+,,+,,+,,,
6,Macrocephaly,+,+,+,+,–,+,+,+,,+,–,+,–,+
7,ASD,–,+,+,–,+,–,+,–,+,–,+,–,+,+
8,Recurrent otitis media,+,,+,,,,,+,+,,,,,+
9,Seizure,,–,+,,–,,+,,,,,,-,


In [17]:
dft = df.transpose()
dft.columns = dft.iloc[0]
dft.drop(dft.index[0], inplace=True)
dft['patient_id'] = dft.index
dft.head()

Patient,Sex,Weight.age.measured,Height.age.measured,Speech delay,Motor delay,Intellectual disability,Macrocephaly,ASD,Recurrent otitis media,Seizure,...,Accelerated osseous maturation,Anxiety,ADHD,Obsessive behavior,Aggressive behavior,Self-injury behavior,Gastrointestinal disturbance,Variant,primary_dx,patient_id
1,female,,+0.5SD,+,+,,+,–,+,,...,+,,,,+,+,,c.6775del,LLS,1
2,male,+10.28SD,+3.14SD,+,+,+,+,+,,–,...,,,,+,+,,,c.6471T>A,"ASD, ID",2
3,female,+3SD,,,+,,+,+,+,+,...,,,,,,,+,c.6341del,ASD,3
4,male,,+3SD,+,+,+,+,–,,,...,+,,,,,,,c.5285_5286del,Sotos,4
5,male,1.14SD,+0.25SD,+,–,+,–,+,,–,...,,-,+,+,+,-,-,c.4715+1G>A,ASD,5


In [18]:
hpo_cr = parser.get_hpo_concept_recognizer()

In [19]:
items = {
    'Speech delay': ["Delayed speech and language development", "HP:0000750"], 
    'Motor delay': ['Motor delay', 'HP:0001270'],
    'Intellectual disability': ['Intellectual disability', 'HP:0001249'],
    'Macrocephaly': ['Macrocephaly', 'HP:0000256'],
    'ASD': ['Autism', 'HP:0000717'],
    'Recurrent otitis media': ['Recurrent otitis media','HP:0000403'],
    'Seizure': ['Seizure', 'HP:0001250'],
    'Facial deformity': ['Abnormal facial shape', 'HP:0001999'],
    'Hypotonia': ['Hypotonia', 'HP:0001252'],
    'Accelerated osseous maturation': ['Accelerated skeletal maturation','HP:0005616'],
    'Anxiety': ['Anxiety','HP:0000739'],
    'ADHD': ['Attention deficit hyperactivity disorder','HP:0007018'],
    'Obsessive behavior': ['Compulsive behaviors','HP:0000722'],
    'Aggressive behavior': ['Aggressive behavior','HP:0000718'],
    'Self-injury behavior': ['Self-injurious behavior','HP:0100716'],
}
item_column_mapper_d = hpo_cr.initialize_simple_column_maps(column_name_to_hpo_label_map=items, observed='+',
    excluded='-')
print(f"We created {len(item_column_mapper_d)} simple column mappers")

We created 15 simple column mappers
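<p>Note that the input table mixes two different dash characters: most "absent" cells use an en dash (<code>–</code>), while a few use an ASCII hyphen (<code>-</code>). The mappers above are initialized with <code>excluded='-'</code>, so en-dash cells appear not to be picked up as exclusions (the JSON preview for individual 1 lists no excluded Autism term even though the ASD cell is <code>–</code>). The following is a minimal sketch, not part of the original notebook, of one way to normalize such cells before mapping; the helper name <code>normalize_marker</code> and the set of dash variants are assumptions made for illustration.</p>

```python
# Hypothetical helper (not from pyphetools): map typographic dashes that stand
# alone in a cell to the ASCII hyphen expected by excluded='-'.
DASH_VARIANTS = {"\u2013", "\u2014", "\u2212"}  # en dash, em dash, minus sign


def normalize_marker(cell: str) -> str:
    """Return '-' if the cell is a lone dash variant, else the cell unchanged."""
    if cell.strip() in DASH_VARIANTS:
        return "-"
    return cell


if __name__ == "__main__":
    for raw in ["+", "\u2013", "-", "-2SD", ""]:
        print(repr(raw), "->", repr(normalize_marker(raw)))
```

<p>Assuming <code>dft</code> holds only strings, as it does here after <code>.astype(str)</code>, the helper could be applied with <code>dft = dft.apply(lambda col: col.map(normalize_marker))</code> before the mappers are used. Doing so would add excluded terms to the phenopackets, so the outputs shown below would change.</p>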


<h2>Transcript/Variant mapping</h2>

In [20]:
setd2_transcript = "NM_014159.7"
genome = 'hg38'
default_genotype = 'heterozygous'
variant_list = dft['Variant'].unique()
print(variant_list)
variant_d = {}
vvalidator = VariantValidator(genome_build=genome, transcript=setd2_transcript)
for v in variant_list:
    var = vvalidator.encode_hgvs(v)
    variant_d[v] = var
print(f"We get {len(variant_d)} unique variants")

['c.6775del' 'c.6471T>A' 'c.6341del' 'c.5285_5286del' 'c.4715+1G>A'
 'c.4404dupA' 'c.2028del' 'c.1647_1667delinsAC' 'c.6895G>A' 'c.5444T>G'
 'c.4997A>G' 'c.4644_4646del' 'c.3185C>T' 'c.121A>T']
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_014159.7%3Ac.6775del/NM_014159.7?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_014159.7%3Ac.6471T>A/NM_014159.7?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_014159.7%3Ac.6341del/NM_014159.7?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_014159.7%3Ac.5285_5286del/NM_014159.7?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_014159.7%3Ac.4715+1G>A/NM_014159.7?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_014159.7%3Ac.4404d
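<p>The URLs printed above show the pattern of the VariantValidator REST request issued for each variant. As a rough illustration, the sketch below rebuilds that pattern with the standard library only. It does not contact the service, and it is not how pyphetools builds the request internally (that code is not shown here); the function name <code>variantvalidator_url</code> and the choice of unescaped characters are assumptions made solely to reproduce the printed URLs.</p>

```python
from urllib.parse import quote

# Base path as it appears in the printed output above.
BASE = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def variantvalidator_url(hgvs_c: str,
                         transcript: str = "NM_014159.7",
                         genome: str = "hg38") -> str:
    """Build a request URL of the same shape as the ones logged above."""
    # ':' is percent-encoded, while '+' and '>' are left as-is to match the log.
    variant = quote(f"{transcript}:{hgvs_c}", safe="+>")
    return f"{BASE}/{genome}/{variant}/{transcript}?content-type=application%2Fjson"


if __name__ == "__main__":
    print(variantvalidator_url("c.6775del"))
    print(variantvalidator_url("c.4715+1G>A"))
```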

In [21]:
varMapper = VariantColumnMapper(variant_d=variant_d, 
                                variant_column_name='Variant', 
                                default_genotype=default_genotype)

In [22]:
# Ages not available
sexMapper = SexColumnMapper(male_symbol='male', female_symbol='female', column_name='Sex')
#sexMapper.preview_column(dft['Sex'])

In [25]:
encoder = CohortEncoder(df=dft, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=item_column_mapper_d, 
                        individual_column_name="patient_id", 
                        sexmapper=sexMapper,
                        agemapper=AgeColumnMapper.not_provided(),
                        variant_mapper=varMapper, metadata=metadata,
                        pmid=PMID)
encoder.set_disease(disease_id='OMIM:616831', label='Luscan-Lumish syndrome')

<h2>SETD2</h2>
<p>Variants in SETD2 are associated with three diseases in OMIM:</p>
<ul>
    <li>Intellectual developmental disorder, autosomal dominant 70 (OMIM:620157)</li>
    <li>Luscan-Lumish syndrome (OMIM:616831)</li>
    <li>Rabin-Pappas syndrome (OMIM:620155)</li>
</ul>

In [26]:
individuals = encoder.get_individuals()

<h2>Diagnosis</h2>
<p>Some individuals with SETD2 variants only show features similar to Sotos syndrome. There is no OMIM code for this presentation, so we code it with the OMIM ID for the gene. Downstream analysis should use the original descriptions.</p>

In [30]:
for indi in individuals:
    row = dft.loc[indi.id]
    dx = row["primary_dx"]
    if dx == "LLS":
        indi.set_disease(disease_id='OMIM:616831', disease_label='Luscan-Lumish syndrome')
    elif "ID" in dx or "ASD" in dx:
        indi.set_disease(disease_id='OMIM:620157', disease_label='Intellectual developmental disorder, autosomal dominant 70')
    elif dx == "Sotos":
        # No OMIM disease entry exists for this presentation; a gene-based placeholder id is used
        indi.set_disease(disease_id="OMIM:612778a", disease_label="Sotos-like")
    else:
        raise ValueError(f"Unrecognized diagnosis: {dx}")
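<p>The order of the branches matters for combined values such as <code>"ASD, ID"</code>: the exact match on <code>"LLS"</code> is tested first, then the substring tests for <code>"ID"</code> and <code>"ASD"</code>, and only then the Sotos-like case. The standalone sketch below mirrors that decision logic on plain strings so that it can be checked without pyphetools. The function name <code>classify_primary_dx</code> and the returned category keys are hypothetical and are not used by the notebook.</p>

```python
def classify_primary_dx(dx: str) -> str:
    """Mirror the branch order of the loop above on a primary_dx string.

    The returned keys ("LLS", "IDD70", "SOTOS_LIKE") are illustrative labels only.
    """
    if dx == "LLS":
        return "LLS"          # Luscan-Lumish syndrome (OMIM:616831)
    if "ID" in dx or "ASD" in dx:
        return "IDD70"        # Intellectual developmental disorder, AD 70 (OMIM:620157)
    if dx == "Sotos":
        return "SOTOS_LIKE"   # no OMIM disease entry; coded via the gene
    raise ValueError(f"Unrecognized diagnosis: {dx}")


if __name__ == "__main__":
    # Values taken from the primary_dx column shown in the table preview.
    for dx in ["LLS", "ASD, ID", "ASD", "Sotos"]:
        print(dx, "->", classify_primary_dx(dx))
```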

In [31]:
# Preview
i1 = individuals[0]
phenopacket1 = i1.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh())
json_string = MessageToJson(phenopacket1)
print(json_string)

{
  "id": "PMID_33766796_1",
  "subject": {
    "id": "1",
    "sex": "FEMALE"
  },
  "phenotypicFeatures": [
    {
      "type": {
        "id": "HP:0000750",
        "label": "Delayed speech and language development"
      }
    },
    {
      "type": {
        "id": "HP:0001270",
        "label": "Motor delay"
      }
    },
    {
      "type": {
        "id": "HP:0000256",
        "label": "Macrocephaly"
      }
    },
    {
      "type": {
        "id": "HP:0000403",
        "label": "Recurrent otitis media"
      }
    },
    {
      "type": {
        "id": "HP:0001999",
        "label": "Abnormal facial shape"
      }
    },
    {
      "type": {
        "id": "HP:0005616",
        "label": "Accelerated skeletal maturation"
      }
    },
    {
      "type": {
        "id": "HP:0000718",
        "label": "Aggressive behavior"
      }
    },
    {
      "type": {
        "id": "HP:0100716",
        "label": "Self-injurious behavior"
      }
    }
  ],
  "interpretations": [
    {

In [32]:
from IPython.display import HTML, display

phenopackets = [i.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh()) for i in individuals]
table = PhenopacketTable(phenopacket_list=phenopackets)
display(HTML(table.to_html()))

Individual,Genotype,Phenotypic features
1 (FEMALE; ),NM_014159.7:c.6775del (heterozygous),Delayed speech and language development (HP:0000750); Motor delay (HP:0001270); Macrocephaly (HP:0000256); Recurrent otitis media (HP:0000403); Abnormal facial shape (HP:0001999); Accelerated skeletal maturation (HP:0005616); Aggressive behavior (HP:0000718); Self-injurious behavior (HP:0100716)
2 (MALE; ),NM_014159.7:c.6471T>A (heterozygous),Delayed speech and language development (HP:0000750); Motor delay (HP:0001270); Intellectual disability (HP:0001249); Macrocephaly (HP:0000256); Autism (HP:0000717); Abnormal facial shape (HP:0001999); Compulsive behaviors (HP:0000722); Aggressive behavior (HP:0000718)
3 (FEMALE; ),NM_014159.7:c.6341del (heterozygous),Motor delay (HP:0001270); Macrocephaly (HP:0000256); Autism (HP:0000717); Recurrent otitis media (HP:0000403); Seizure (HP:0001250); Hypotonia (HP:0001252)
4 (MALE; ),NM_014159.7:c.5285_5286del (heterozygous),Delayed speech and language development (HP:0000750); Motor delay (HP:0001270); Intellectual disability (HP:0001249); Macrocephaly (HP:0000256); Abnormal facial shape (HP:0001999); Hypotonia (HP:0001252); Accelerated skeletal maturation (HP:0005616)
5 (MALE; ),NM_014159.7:c.4715+1G>A (heterozygous),Delayed speech and language development (HP:0000750); Intellectual disability (HP:0001249); Autism (HP:0000717); Attention deficit hyperactivity disorder (HP:0007018); Compulsive behaviors (HP:0000722); Aggressive behavior (HP:0000718)
8 (MALE; ),NM_014159.7:c.4405dup (heterozygous),Delayed speech and language development (HP:0000750); Intellectual disability (HP:0001249); Macrocephaly (HP:0000256); Abnormal facial shape (HP:0001999)
9 (FEMALE; ),NM_014159.7:c.2028del (heterozygous),Delayed speech and language development (HP:0000750); Motor delay (HP:0001270); Intellectual disability (HP:0001249); Macrocephaly (HP:0000256); Autism (HP:0000717); Seizure (HP:0001250); Hypotonia (HP:0001252); Anxiety (HP:0000739); Attention deficit hyperactivity disorder (HP:0007018)
10 (MALE; ),NM_014159.7:c.1647_1667delinsAC (heterozygous),Delayed speech and language development (HP:0000750); Motor delay (HP:0001270); Macrocephaly (HP:0000256); Recurrent otitis media (HP:0000403); Abnormal facial shape (HP:0001999); Aggressive behavior (HP:0000718)
11 (MALE; ),NM_014159.7:c.6895G>A (heterozygous),Intellectual disability (HP:0001249); Autism (HP:0000717); Recurrent otitis media (HP:0000403); Anxiety (HP:0000739)
12 (MALE; ),NM_014159.7:c.5444T>G (heterozygous),Delayed speech and language development (HP:0000750); Macrocephaly (HP:0000256); Abnormal facial shape (HP:0001999); Accelerated skeletal maturation (HP:0005616)


In [33]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              pmid=PMID,
                                              metadata=metadata.to_ga4gh(),
                                              outdir=output_directory)

We output 14 GA4GH phenopackets to the directory phenopackets
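<p>As a quick sanity check on the exported files, the JSON can be read back with the standard library. The sketch below tallies observed HPO terms across a directory of phenopackets. It assumes that each phenopacket is stored as a separate <code>*.json</code> file in the output directory and that the files follow the structure shown in the JSON preview above (<code>phenotypicFeatures</code> entries with a <code>type</code> holding <code>id</code> and <code>label</code>, and an optional <code>excluded</code> flag); the function name <code>count_observed_terms</code> is hypothetical.</p>

```python
import json
from collections import Counter
from pathlib import Path


def count_observed_terms(directory: str) -> Counter:
    """Count observed (non-excluded) HPO terms over all *.json files in a directory."""
    counts = Counter()
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            packet = json.load(handle)
        for feature in packet.get("phenotypicFeatures", []):
            # Skip features explicitly marked as excluded.
            if feature.get("excluded", False):
                continue
            term = feature["type"]
            counts[(term["id"], term["label"])] += 1
    return counts


if __name__ == "__main__":
    # Prints nothing if the directory does not exist or holds no JSON files.
    for (hpo_id, label), n in count_observed_terms("phenopackets").most_common():
        print(f"{hpo_id}\t{label}\t{n}")
```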
