<H1>FBN1: Marfan syndrome (Katzke, 2002)</H1>
<p>Extract phenopackets from the clinical data in <a href="https://pubmed.ncbi.nlm.nih.gov/12203992/" target="__blank">Katzke et al (2002)</a>. Note that we will remove individual B46 from the analysis because only very scant clinical information is provided.</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.63


In [2]:
PMID = "PMID:12203992"
title = "TGGE screening of the entire FBN1 coding sequence in 126 individuals with marfan syndrome and related fibrillinopathies"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-02-27


In [3]:
df = pd.read_table("input/katzke_2002.tsv")
df.set_index('Patient', inplace=True)
df["individual_id"] = df.index
df.drop("B46",inplace=True)

In [4]:
df.head()

Unnamed: 0_level_0,Exon,Gent,Age,Gender,Skeletal,Ocular,Cardiovascular,Other,FH,Cosegregation,HGVS,Protein,individual_id
Patient,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
D15,2,y,25,M,14815,EL,ARD,St,+,,c.184C>T,R62C,D15
D55,2,n,31,M,-,"EL,RD",n,*,-,+,c.184C>T,R62C,D55
D10,3,n,16,F,71415,EL,n,n,-,+,c.344C>G,S115C,D10
B3,14,e,15,F,146,"EL,M",ARD,-,-,+,c.1760G>A,C587Y,B3
B1,14,y,8,M,4578121415,"EL,M","ARD,MVP",n,+,+,c.1787G>A,C596Y,B1


In [5]:
column_mapper_list = list()

In [6]:
def get_skeletal_items(skel):
    items = str(skel).split(",")
    d = {
        "1": "Tall stature",
        "2": "Pectus carinatum",
        "3": "Pectus excavatum",
        "4": "Arachnodactyly",
        "5": "Dolichostenomelia", # US/LSo0.86 or arm span to height
        "6": "Scoliosis",
        "7": "Limited elbow extension",
        "8": "Pes planus",
        "9": "Protrusio acetabuli",
        "10": "Pectus excavatum",
        "11": "Joint hypermobility",
        "12": "High palate",
        "13": "Malar flattening", #  typical facial appearance in the original publication
        "14": "Dolichocephaly",
        "15": "Enophthalmus"
    }
    results = set()
    for it in items:
        if it in d:
            results.add(d.get(it))
        elif it == "-" or it == "nan" or len(it) == 0:
            continue
        else:
            raise ValueError(f"Could not find \"{it}\" in list")
    if "1" in results and "5" in results:
        results.remove("1") # "1" is implied by "5" and is redundant, so remove it!
    return " ;".join(results)


for pat_id in df.index:
    skeletal = df.at[pat_id, "Skeletal"]
    df.at[pat_id, "skeletal_terms"] = get_skeletal_items(skel=skeletal)
    #print(df.at[pat_id, "Skeletal"])

In [7]:
df.head()

Unnamed: 0_level_0,Exon,Gent,Age,Gender,Skeletal,Ocular,Cardiovascular,Other,FH,Cosegregation,HGVS,Protein,individual_id,skeletal_terms
Patient,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
D15,2,y,25,M,14815,EL,ARD,St,+,,c.184C>T,R62C,D15,Pes planus ;Enophthalmus ;Arachnodactyly ;Tall stature
D55,2,n,31,M,-,"EL,RD",n,*,-,+,c.184C>T,R62C,D55,
D10,3,n,16,F,71415,EL,n,n,-,+,c.344C>G,S115C,D10,Dolichocephaly ;Limited elbow extension ;Enophthalmus
B3,14,e,15,F,146,"EL,M",ARD,-,-,+,c.1760G>A,C587Y,B3,Scoliosis ;Arachnodactyly ;Tall stature
B1,14,y,8,M,4578121415,"EL,M","ARD,MVP",n,+,+,c.1787G>A,C596Y,B1,Pes planus ;High palate ;Enophthalmus ;Dolichostenomelia ;Dolichocephaly ;Limited elbow extension ;Arachnodactyly


In [8]:
skelMapper = OptionColumnMapper(column_name='skeletal_terms',concept_recognizer=hpo_cr, option_d={})
column_mapper_list.append(skelMapper)
skelMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Pes planus (HP:0001763) (observed),13
1,Arachnodactyly (HP:0001166) (observed),26
2,Tall stature (HP:0000098) (observed),20
3,Dolichocephaly (HP:0000268) (observed),9
4,Limited elbow extension (HP:0001377) (observed),3
5,Scoliosis (HP:0002650) (observed),13
6,High palate (HP:0000218) (observed),11
7,Disproportionate tall stature (HP:0001519) (observed),7
8,Pectus excavatum (HP:0000767) (observed),8
9,Joint hypermobility (HP:0001382) (observed),14


In [9]:
ophth_d ={
    "EL": "Ectopia lentis",
    "M": "Myopia",
    "RD": "Retinal detachment",
    "FC": "Flat cornea"
}
ophthMapper = OptionColumnMapper(column_name="Ocular",concept_recognizer=hpo_cr, option_d=ophth_d)
column_mapper_list.append(ophthMapper)
ophthMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Ectopia lentis (HP:0001083) (observed),26
1,Retinal detachment (HP:0000541) (observed),1
2,Myopia (HP:0000545) (observed),18
3,Flat cornea (HP:0007720) (observed),2


In [10]:
cv_d ={
    "ARD": "Aortic root aneurysm",
    "MVP":"Mitral valve prolapse",
    'AR-dis':"Ascending aortic dissection",
    'MVP':"Mitral valve prolapse",
    "MR2*": "Mitral regurgitation",
}
cvMapper = OptionColumnMapper(column_name='Cardiovascular',concept_recognizer=hpo_cr, option_d=cv_d)
column_mapper_list.append(cvMapper)
cvMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Aortic root aneurysm (HP:0002616) (observed),19
1,Mitral valve prolapse (HP:0001634) (observed),8
2,Ascending aortic dissection (HP:0004933) (observed),3
3,Mitral regurgitation (HP:0001653) (observed),1


In [11]:
other_d = {
    "St": "Striae distensae",
    "H": "Inguinal hernia",
    "colob": "Lens coloboma"
}
otherMapper = OptionColumnMapper(column_name='Other',concept_recognizer=hpo_cr, option_d=other_d)
column_mapper_list.append(otherMapper)
otherMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Striae distensae (HP:0001065) (observed),7
1,Inguinal hernia (HP:0000023) (observed),5
2,Lens coloboma (HP:0100719) (observed),1


In [12]:
fbn1_transcript='NM_000138.5' 
vman = VariantManager(df=df, 
                      transcript=fbn1_transcript, 
                      gene_symbol="FBN1", 
                      allele_1_column_name='HGVS', 
                      individual_column_name="individual_id")

In [14]:
varMapper = VariantColumnMapper(variant_d=vman.get_variant_d(),
                                variant_column_name='HGVS', 
                                default_genotype='heterozygous')
sexMapper = SexColumnMapper(male_symbol="M", female_symbol="F", column_name="Gender")
ageMapper = AgeColumnMapper.by_year("Age")
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="Patient", 
                        age_at_last_encounter_mapper=ageMapper, 
                        age_of_onset_mapper=AgeColumnMapper.not_provided(),
                        sexmapper=sexMapper,
                        variant_mapper=varMapper,
                        metadata=metadata)
omim_id = "OMIM:154700"
omim_label = "Marfan syndrome"
mfs = Disease(disease_id=omim_id, disease_label=omim_label)
encoder.set_disease(disease=mfs)

In [15]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,4


In [16]:
individuals = cvalidator.get_error_free_individual_list()
phenopackets = [i.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh()) for i in individuals]
table = PhenopacketTable(phenopacket_list=phenopackets)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
D15 (MALE; P25Y),Marfan syndrome (OMIM:154700),NM_000138.5:c.184C>T (heterozygous),Pes planus (HP:0001763); Arachnodactyly (HP:0001166); Tall stature (HP:0000098); Ectopia lentis (HP:0001083); Aortic root aneurysm (HP:0002616); Striae distensae (HP:0001065)
D55 (MALE; P31Y),Marfan syndrome (OMIM:154700),NM_000138.5:c.184C>T (heterozygous),Ectopia lentis (HP:0001083); Retinal detachment (HP:0000541)
D10 (FEMALE; P16Y),Marfan syndrome (OMIM:154700),NM_000138.5:c.344C>G (heterozygous),Dolichocephaly (HP:0000268); Limited elbow extension (HP:0001377); Ectopia lentis (HP:0001083)
B3 (FEMALE; P15Y),Marfan syndrome (OMIM:154700),NM_000138.5:c.1760G>A (heterozygous),Scoliosis (HP:0002650); Arachnodactyly (HP:0001166); Tall stature (HP:0000098); Ectopia lentis (HP:0001083); Myopia (HP:0000545); Aortic root aneurysm (HP:0002616)
B1 (MALE; P8Y),Marfan syndrome (OMIM:154700),NM_000138.5:c.1787G>A (heterozygous),Pes planus (HP:0001763); High palate (HP:0000218); Disproportionate tall stature (HP:0001519); Dolichocephaly (HP:0000268); Limited elbow extension (HP:0001377); Arachnodactyly (HP:0001166); Ectopia lentis (HP:0001083); Myopia (HP:0000545); Aortic root aneurysm (HP:0002616); Mitral valve prolapse (HP:0001634)
D26 (FEMALE; P50Y),Marfan syndrome (OMIM:154700),NM_000138.5:c.1960G>A (heterozygous),Pes planus (HP:0001763); Flat cornea (HP:0007720); Aortic root aneurysm (HP:0002616); Ascending aortic dissection (HP:0004933); Striae distensae (HP:0001065)
B9 (MALE; P12Y),Marfan syndrome (OMIM:154700),NM_000138.5:c.2055C>G (heterozygous),Pes planus (HP:0001763); Arachnodactyly (HP:0001166); Pectus excavatum (HP:0000767); Tall stature (HP:0000098); Ectopia lentis (HP:0001083); Aortic root aneurysm (HP:0002616)
B19 (MALE; P3Y),Marfan syndrome (OMIM:154700),NM_000138.5:c.2055C>G (heterozygous),Scoliosis (HP:0002650); High palate (HP:0000218); Disproportionate tall stature (HP:0001519); Joint hypermobility (HP:0001382); Arachnodactyly (HP:0001166); Ectopia lentis (HP:0001083); Aortic root aneurysm (HP:0002616)
D59 (FEMALE; P16Y),Marfan syndrome (OMIM:154700),NM_000138.5:c.2042C>A (heterozygous),Scoliosis (HP:0002650); High palate (HP:0000218); Tall stature (HP:0000098); Joint hypermobility (HP:0001382); Arachnodactyly (HP:0001166); Ectopia lentis (HP:0001083); Myopia (HP:0000545)
D62 (FEMALE; P49Y),Marfan syndrome (OMIM:154700),NM_000138.5:c.2047T>C (heterozygous),Scoliosis (HP:0002650); Pes planus (HP:0001763); Joint hypermobility (HP:0001382); Arachnodactyly (HP:0001166); Pectus excavatum (HP:0000767); Ectopia lentis (HP:0001083); Myopia (HP:0000545); Mitral valve prolapse (HP:0001634); Striae distensae (HP:0001065)


In [17]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              metadata=metadata,
                                              outdir=output_directory)

We output 32 GA4GH phenopackets to the directory phenopackets
