# GNAS
GNAS, which encodes the guanine nucleotide-binding protein G(s) subunit alpha, is a complex imprinted locus that produces multiple transcripts through alternative promoters and alternative splicing. The best-characterized transcript derived from GNAS, Gs-alpha, encodes the alpha subunit of the stimulatory guanine nucleotide-binding protein (G protein) ([OMIM:139320](https://omim.org/entry/139320)).

Pseudohypoparathyroidism type Ia (PHP Ia; [OMIM:103580](https://omim.org/entry/103580)) is caused by a mutation resulting in loss of function of the Gs-alpha isoform of the GNAS gene (139320) on the maternal allele. This results in expression of the Gs-alpha protein only from the paternal allele.

Pseudopseudohypoparathyroidism (PPHP; [OMIM:612463](https://omim.org/entry/612463)) is caused by a mutation resulting in loss of function of the Gs-alpha isoform of the GNAS gene (139320) on the paternal allele. This results in expression of the Gs-alpha protein only from the maternal allele.

[Thiele et al. (2015)](https://pubmed.ncbi.nlm.nih.gov/25802881/) report:

> Mutations are distributed throughout the Gsα coding exons of GNAS and there is a lack of genotype-phenotype correlation. The authors identified 58 different mutations in 88 patients and 27 relatives, and demonstrated a significantly higher occurrence of subcutaneous calcifications in patients harboring truncating versus missense mutations.

In this notebook, we extract phenopackets from the supplemental table provided by Thiele et al.

Note that the original publication reports variants according to the RefSeq transcript NM_001077488.1. We have translated the coordinates to the MANE transcript, NM_000516.7. Both are shown in the input Excel file that we created based on the original Word supplemental file.
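
In the rows shown below, the two transcript columns differ only in notation: the original file spells out the deleted or duplicated bases (`c.93dupG`), whereas the MANE column omits them (`c.93dup`). The following is a minimal sketch of that notation cleanup only. The helper name `strip_redundant_bases` is hypothetical, it is not how the Excel file was produced, and it does not remap coordinates between transcripts.

```python
import re

# Matches a coding-DNA del/dup whose affected bases are written out,
# e.g. "c.93dupG" or "c.94_96delAAG". Substitutions and delins are not matched.
_REDUNDANT_BASES = re.compile(r"^(c\.[0-9_+\-*]+(?:del|dup))[ACGT]+$")


def strip_redundant_bases(hgvs: str) -> str:
    """Drop explicit bases after 'del' or 'dup'; return other strings unchanged."""
    hgvs = hgvs.strip()
    match = _REDUNDANT_BASES.match(hgvs)
    return match.group(1) if match else hgvs


# Values taken from the NM_001077488.1 column of the table head shown below.
for original in ["c.93dupG", "c.89T>C", "c.94_96delAAG", "c.136_138dupCTG"]:
    print(original, "->", strip_redundant_bases(original))
```

Any variant whose position differs between the two transcripts would still need a proper coordinate mapping, which this sketch does not attempt.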

In [212]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
import typing
import os
import re
from google.protobuf.json_format import MessageToJson
from pyphetools.creation import Measurements, VariantManager, PromoterVariant, StructuralVariant, OntologyTerms
from pyphetools.pp.v202 import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.108


In [213]:
df = pd.read_excel("input/Thiele_2015_GNAS.xlsx")
df.head()

Unnamed: 0,Index No,Sex,Age,PHP Type,AHOsigns,PTH[pg/ml],Ca[mmol/l],Ph[mmol/l],Gsalpha activity[%],Endocrinopathies,NM_001077488.1,NM_000516.7,variant_comment,Location,Inheritance,Previously described
0,P1,F,P10Y,PHPIa,"o,sst,bm,scc,mr",669,1.75,2.62,62.6,"PTH, TSH",c.93dupG,c.93dup,p.K32EfsX22,Exon 1,de novo,"(Thiele et al., 2010)"
1,P2,F,P15Y10M,PHPIa,"o,bm",43,2.33,1.69,50.8,"TSH, GHRH",c.89T>C,c.89T>C,p.L30P,Exon 1,de novo,This study
2,P3,F,P21Y8M,PHPIa,"o,bm",nr,nr,nr,45.6,TSH,c.94_96delAAG,c.94_96del,p.K32del,Exon 1,de novo,This study
3,P4,M,P1Y,PHPIa,scc,el,nr,nr,49.4,PTH,c.136_138dupCTG,c.136_138dup,p.L46dup,Exon 1,de novo,"(Fernández Rebollo et al., 2013; Lim et al., 2002)"
4,P5,F,P16Y3M,PPHP,"bm,scc,mr",nr,nr,nr,49.0,no,c.136_138dupCTG,c.136_138dup,p.L46dup,,de novo,


In [214]:
PMID = "PMID:25802881"
title = "A positive genotype-phenotype correlation in a large cohort of patients with Pseudohypoparathyroidism Type Ia and Pseudo-pseudohypoparathyroidism and 33 newly identified mutations in the GNAS gene"
created_by="ORCID:0000-0002-5648-2155"
metadata = MetaData.metadata_for_pmid(created_by=created_by, pmid=PMID, citation_title=title, include_loinc=True)

# Parsing


Supplemental table 1 summarizes the available data of the cohort of PHPIa, PPHP, and POH patients, including the age at diagnosis, the clinical signs, laboratory results, peptide hormone resistance, the underlying genetic defect, and the inheritance. Abbreviations: P: patient, M: mother, S: sister, B: brother, f: female, m: male, nr: normal range, el: elevated, nk: not known.

Reference values: PTH: 15-55 pg/ml, Ca: 2.1-2.6 mmol/l, Ph: 0.9-1.75 mmol/l, Gsalpha activity: 85-115%
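
The laboratory columns mix numbers with qualitative codes, so every value has to be mapped to "low", "normal", or "high" before an HPO term can be chosen. The sketch below shows that decision on its own, using only the reference ranges quoted above. The names `REFERENCE_RANGES`, `QUALITATIVE`, and `classify` are hypothetical, and treating `na` and `low` as valid codes is an assumption that mirrors the parsing functions in this notebook rather than the table legend.

```python
import math
from typing import Optional

# Reference ranges (lower, upper) as quoted in the supplemental table legend.
REFERENCE_RANGES = {
    "PTH[pg/ml]": (15.0, 55.0),
    "Ca[mmol/l]": (2.1, 2.6),
    "Ph[mmol/l]": (0.9, 1.75),
}

# Qualitative codes; "low" is an assumption, it is not listed in the legend.
QUALITATIVE = {"nr": "normal", "el": "high", "low": "low"}


def classify(value, column: str) -> Optional[str]:
    """Return 'low', 'normal', or 'high', or None if the value is missing."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None  # empty spreadsheet cell
    if isinstance(value, str):
        value = value.strip()
        if value in ("", "na", "nk"):
            return None  # not available / not known
        if value in QUALITATIVE:
            return QUALITATIVE[value]
    lower, upper = REFERENCE_RANGES[column]
    number = float(value)  # raises ValueError for unrecognized text
    if number < lower:
        return "low"
    if number > upper:
        return "high"
    return "normal"


# Patient P1 from the table head: PTH 669, Ca 1.75, Ph 2.62.
print(classify(669, "PTH[pg/ml]"), classify(1.75, "Ca[mmol/l]"), classify(2.62, "Ph[mmol/l]"))
```

Keeping the classification separate from the construction of phenotypic features makes the range logic easy to test without any phenopacket objects.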

In [215]:
def get_aho_hpos(aho_signs):
    """
    Signs related to Albright hereditary osteodystrophy (AHO).
    Abbreviations used in the original publication: o: obesity, sst: short stature, bm: brachymetacarpia, scc: subcutaneous calcifications, mr: mental retardation
    """
    aho_list = aho_signs.split(",")
    hpo_term_d = {
        "o": ["Obesity", "HP:0001513"],
        "sst": ["Short stature", "HP:0004322"],
        "bm": ["Short metacarpal", "HP:0010049"],
        "scc": ["Subcutaneous ossification", "HP:0034282"],
        "mr": ["Intellectual disability", "HP:0001249"],
        "bd": ["Brachydactyly", "HP:0001156"]
    }
    phenotypic_feature_list = list()
    for aho in aho_list:
        if aho == "no signs" or aho == "":
            continue
        if aho == "nk":
            continue # not known
        elif aho in hpo_term_d:
            arr = hpo_term_d.get(aho)
            label = arr[0]
            hpo_id = arr[1]
            term = OntologyClass(id=hpo_id, label=label)
            phenotypic_feature_list.append(PhenotypicFeature(type=term))
        else:
            raise ValueError(f"Did not recognize AHO sign: \"{aho}\"")
    return phenotypic_feature_list       
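
Because `get_aho_hpos` raises on the first unrecognized abbreviation, it can help to audit the whole `AHOsigns` column first. The following is a minimal sketch under stated assumptions: `unknown_aho_codes` is a hypothetical helper, it works on plain strings rather than on a DataFrame, and unlike the function above it strips whitespace around each code.

```python
from collections import Counter

# Abbreviations handled by get_aho_hpos, plus the values it silently skips.
KNOWN_AHO_CODES = {"o", "sst", "bm", "scc", "mr", "bd"}
SKIPPED_VALUES = {"", "no signs", "nk"}


def unknown_aho_codes(cells) -> Counter:
    """Count every abbreviation that would trigger the ValueError above."""
    counts = Counter()
    for cell in cells:
        if not isinstance(cell, str):
            continue  # empty spreadsheet cells arrive as NaN (a float)
        for code in cell.split(","):
            code = code.strip()
            if code in SKIPPED_VALUES or code in KNOWN_AHO_CODES:
                continue
            counts[code] += 1
    return counts


# Values from the table head, followed by a made-up unknown code "xx".
print(unknown_aho_codes(["o,sst,bm,scc,mr", "o,bm", "scc", "bm,scc,mr"]))
print(unknown_aho_codes(["o, xx", "xx"]))
```

With the real table, the input could be `df["AHOsigns"]`; an empty counter means every abbreviation is covered by the dictionary.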


In [216]:
def get_disease(diagnosis_type):
    if diagnosis_type == "PHPIa": # PHPIa
        diseaseClass = OntologyClass(id="OMIM:103580", label="Pseudohypoparathyroidism Ia")
    elif diagnosis_type == "PPHP":
        diseaseClass = OntologyClass(id="OMIM:612463", label="Pseudopseudohypoparathyroidism")
    elif diagnosis_type == "POH":
        diseaseClass = OntologyClass(id="OMIM:166350", label="Osseous heteroplasia, progressive")
    else:
        raise ValueError(f"Did not recognize diagnosis \"{diagnosis_type}\"")
    return Disease(term=diseaseClass) 
    	

In [217]:
def parathormone(row:pd.Series, 
        measurement_list: typing.List[Measurement],
        phenotypic_feature_list: typing.List[PhenotypicFeature]):
    """
    PTH: 15-55 pg/ml
    LOINC:35566-9, Parathyrin [Mass/volume] in Serum or Plasma --baseline
    """
    value = row["PTH[pg/ml]"]
    if value == "na" or value == "nk":
        return None
    elevated_pth = PhenotypicFeature(type=OntologyClass(id="HP:0003165", label="Elevated circulating parathyroid hormone level"))
    diminished_pth = PhenotypicFeature(type=OntologyClass(id="HP:0031817", label="Decreased circulating parathyroid hormone level"))
    not_elevated_pth = PhenotypicFeature(type=OntologyClass(id="HP:0003165", label="Elevated circulating parathyroid hormone level"), excluded=True)
    if value == "nr":
        phenotypic_feature_list.append(not_elevated_pth)
        return
    if value == "el":
        phenotypic_feature_list.append(elevated_pth)
        return
    if value == "low":
        phenotypic_feature_list.append(diminished_pth)
        return
    if pd.isna(value): # empty input table cell; pd.isna avoids also skipping genuine numeric (float) values
        return None
    if isinstance(value,str) and value.endswith(" "):
        raise ValueError(f"Malformed PTH: \"{value}\"")
    lower_limit_of_normal = 15
    upper_limit_of_normal = 55
    try:
        concentration = float(value)
        m = Measurements.picogram_per_milliliter(code="LOINC:35566-9",
                                      label="Parathyrin [Mass/volume] in Serum or Plasma --baseline",
                                      concentration=concentration,
                                      low=lower_limit_of_normal,
                                      high=upper_limit_of_normal)
        measurement_list.append(m)
        if concentration > upper_limit_of_normal:
            pf = elevated_pth
        elif concentration < lower_limit_of_normal:
            pf = diminished_pth
        else:
            pf = not_elevated_pth
        phenotypic_feature_list.append(pf)
    except ValueError:
        print(f"Could not parse \"{value}\"") 

In [218]:
def calcium(row:pd.Series, 
        measurement_list: typing.List[Measurement],
        phenotypic_feature_list: typing.List[PhenotypicFeature]):
    """
    Ca: 2.1-2.6 mmol/l
    LOINC: 17861-6  Calcium [Mass/volume] in Serum or Plasma
    """
    value = row["Ca[mmol/l]"]
    if value == "na" or value == "nk":
        return None
    if pd.isna(value): # empty input table cell; pd.isna avoids also skipping genuine numeric (float) values
        return None
    if value == "nr":
        pf = PhenotypicFeature(type=OntologyClass(id="HP:0004363", label="Abnormal circulating calcium concentration"), excluded=True)
        phenotypic_feature_list.append(pf)
        return
    if value == "el":
        pf =  PhenotypicFeature(type=OntologyClass(id="HP:0003072", label="Hypercalcemia"))
        phenotypic_feature_list.append(pf)
        return
    if value == "low":
        pf = PhenotypicFeature(type=OntologyClass(id="HP:0002901", label="Hypocalcemia"))
        phenotypic_feature_list.append(pf)
        return
    if isinstance(value,str) and value.endswith(" "):
        raise ValueError(f"Malformed Ca: \"{value}\"")
    lower_limit_of_normal = 2.1
    upper_limit_of_normal = 2.6
    try:
        concentration = float(value)
        m = Measurements.millimole_per_liter(code="LOINC:17861-6",
                        label="Calcium [Mass/volume] in Serum or Plasma",
                        concentration=concentration,
                        low=lower_limit_of_normal,
                        high=upper_limit_of_normal)
        measurement_list.append(m)
        if concentration > upper_limit_of_normal:
            pf = PhenotypicFeature(type=OntologyClass(id="HP:0003072", label="Hypercalcemia"))
        elif concentration < lower_limit_of_normal:
            pf = PhenotypicFeature(type=OntologyClass(id="HP:0002901", label="Hypocalcemia"))
        else:
            pf = PhenotypicFeature(type=OntologyClass(id="HP:0004363", label="Abnormal circulating calcium concentration"), excluded=True)
        phenotypic_feature_list.append(pf)
    except ValueError:
        print(f"Could not parse \"{value}\"") 

In [219]:
def phosphate(row:pd.Series, 
        measurement_list: typing.List[Measurement],
        phenotypic_feature_list: typing.List[PhenotypicFeature]):
    """
    Ph 0.9-1.75mmol/l
    LOINC: 2777-1 Phosphate [Mass/volume] in Serum or Plasma
    """
    value = row["Ph[mmol/l]"]
    if value == "na" or value == "nk":
        return None
    if pd.isna(value): # empty input table cell; pd.isna avoids also skipping genuine numeric (float) values
        return None
    if value == "nr":
        pf = PhenotypicFeature(type=OntologyClass(id="HP:0100529", label="Abnormal blood phosphate concentration"), excluded=True)
        phenotypic_feature_list.append(pf)
        return
    if value == "el":
        pf = PhenotypicFeature(type=OntologyClass(id="HP:0002905", label="Hyperphosphatemia"))
        phenotypic_feature_list.append(pf)
        return
    if value == "low":
        pf = PhenotypicFeature(type=OntologyClass(id="HP:0002148", label="Hypophosphatemia"))
        phenotypic_feature_list.append(pf)
        return
    if isinstance(value,str) and value.endswith(" "):
        raise ValueError(f"Malformed Ph: \"{value}\"")
    lower_limit_of_normal = 0.9
    upper_limit_of_normal = 1.75
    try:
        concentration = float(value)
        m = Measurements.millimole_per_liter(code="LOINC:2777-1",
                        label="Phosphate [Mass/volume] in Serum or Plasma",
                        concentration=concentration,
                        low=lower_limit_of_normal,
                        high=upper_limit_of_normal)
        measurement_list.append(m)
        if concentration > upper_limit_of_normal:
            pf = PhenotypicFeature(type=OntologyClass(id="HP:0002905", label="Hyperphosphatemia"))
        elif concentration < lower_limit_of_normal:
            pf = PhenotypicFeature(type=OntologyClass(id="HP:0002148", label="Hypophosphatemia"))
        else:
            pf = PhenotypicFeature(type=OntologyClass(id="HP:0100529", label="Abnormal blood phosphate concentration"), excluded=True)
        phenotypic_feature_list.append(pf)
    except ValueError:
        print(f"Could not parse \"{value}\"")

In [220]:
gnas_symbol = "GNAS"
gnas_id = "HGNC:4392"
gnas_MANE_transcript = "NM_000516.7"
vmanager = VariantManager(df=df, 
    individual_column_name="Index No", 
    transcript=gnas_MANE_transcript, 
    gene_id=gnas_id, 
    gene_symbol=gnas_symbol, 
    allele_1_column_name=gnas_MANE_transcript)
variant_d = vmanager.get_variant_d()

In [221]:
def row_to_phenopacket(row:pd.Series):
    individual_id = row["Index No"]
    phenopacket_id = "PMID_25802881_{}".format(individual_id.replace(" ", "_"))
    sex = row["Sex"]
    age_at_dx = row["Age"]
    age_of_onset = Age(iso8601duration=age_at_dx)
    i = Individual(id=individual_id)
    if sex == "M":
        i.sex = Sex.MALE
    elif sex == "F":
        i.sex = Sex.FEMALE
    else:
        print(f"Warning: could not identify sex for {individual_id}")
    allele_1 = row[gnas_MANE_transcript]
    
    var = variant_d.get(allele_1)
    var_obj = var.to_variant_interpretation_202()
    var_obj.variation_descriptor.allelic_state = OntologyTerms.heterozygous()
    ## create genomic interpretation
    interpretation_list = list()
    genomic_interpretation = GenomicInterpretation(subject_or_biosample_id=individual_id, 
                                                       interpretation_status=GenomicInterpretation.InterpretationStatus.CAUSATIVE,
                                                       call=var_obj)
    interpretation_list.append(genomic_interpretation)
    diagnosis_type = row["PHP Type"]
    disease = get_disease(diagnosis_type=diagnosis_type)
    diagnosis = Diagnosis(disease=disease.term)
    interpretation = Interpretation(id=individual_id, progress_status=Interpretation.ProgressStatus.SOLVED, diagnosis=diagnosis)
    aho_signs = row["AHOsigns"]
    phenotypic_features = get_aho_hpos(aho_signs)
    measurements = list()
    parathormone(row=row, phenotypic_feature_list=phenotypic_features, measurement_list=measurements) 
    calcium(row=row, measurement_list=measurements, phenotypic_feature_list=phenotypic_features)
    phosphate(row=row, measurement_list=measurements, phenotypic_feature_list=phenotypic_features)
   
    ppkt = Phenopacket(id=phenopacket_id, 
                       subject=i, 
                       diseases=[disease], 
                       phenotypic_features=phenotypic_features, 
                       measurements=measurements, 
                       interpretations=[interpretation], 
                       meta_data=metadata)
    return ppkt
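
The `Age` column is passed to `Age(iso8601duration=...)` without any check, so a malformed cell would only surface later. Below is a minimal sketch of a validator for the year/month/day durations seen in the table (`P10Y`, `P15Y10M`). The function `parse_age` is hypothetical, is not part of pyphetools, and deliberately ignores weeks and time components.

```python
import re
from typing import Tuple

# ISO 8601 duration restricted to years, months, and days, e.g. "P15Y10M".
_AGE_PATTERN = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def parse_age(duration: str) -> Tuple[int, int, int]:
    """Return (years, months, days) or raise ValueError for malformed input."""
    match = _AGE_PATTERN.match(duration)
    if match is None or not any(match.groups()):
        raise ValueError(f"Not a supported ISO 8601 age: {duration!r}")
    years, months, days = (int(part) if part else 0 for part in match.groups())
    return years, months, days


# Ages from the table head.
for age in ["P10Y", "P15Y10M", "P21Y8M", "P1Y"]:
    print(age, parse_age(age))
```

Calling such a check at the top of `row_to_phenopacket` would turn a bad cell into an immediate, row-specific error.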


In [222]:
phenopacket_list = list()
firstRow = None
for _,row in df.iterrows():
    if firstRow is None:
        firstRow = row
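        # NOTE: this skips the first row on the assumption that it holds additional
        # definitions such as HPO identifiers. The df.head() output above shows patient P1
        # in row 0, so this skip may drop a real record and should be double-checked.
        continue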
    item = row_to_phenopacket(row)
    phenopacket_list.append(item.to_message())

In [223]:
outdir = "phenopackets"
os.makedirs(outdir, exist_ok=True)  # create the output directory if it does not exist yet
written = 0
json_list = list()
for ppkt in phenopacket_list:    
    json_string = MessageToJson(ppkt)
    fname = re.sub('[^A-Za-z0-9_-]', '', ppkt.id)  # remove any illegal characters from filename
    fname = fname.replace(" ", "_") + ".json"
    outpth = os.path.join(outdir, fname)
    with open(outpth, "wt") as fh:
        fh.write(json_string)
        json_list.append(json_string)
        written += 1
print(f"We output {written} GA4GH phenopackets to the directory {outdir}")

We output 114 GA4GH phenopackets to the directory phenopackets
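
In the cell above, the regular expression already removes spaces, so the later `replace(" ", "_")` has no effect; this is harmless here because `row_to_phenopacket` builds identifiers without spaces. The sketch below shows the two steps in the order that preserves word boundaries. The helper `phenopacket_filename` is a hypothetical name, not a pyphetools function.

```python
import re


def phenopacket_filename(phenopacket_id: str) -> str:
    """Build a safe JSON file name from a phenopacket identifier."""
    name = phenopacket_id.replace(" ", "_")      # keep word boundaries first
    name = re.sub(r"[^A-Za-z0-9_-]", "", name)   # then drop any other characters
    return name + ".json"


print(phenopacket_filename("PMID_25802881_P1"))
print(phenopacket_filename("PMID 25802881 P1/M"))
```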
