# Yamaguchi T et al. (2023) COL3A1

Data from [Yamaguchi T et al. (2023) Comprehensive genetic screening for vascular Ehlers-Danlos syndrome through an amplification-based next-generation sequencing system. Am J Med Genet A. 2023 Jan;191(1):37-51.](https://pubmed.ncbi.nlm.nih.gov/36189931/).
The authors report that the suspected (initial) diagnosis was 'vEDS', 'LDS/FTAAD', or 'FTAAD' (see the `Suspected disease` column). We nevertheless code every individual with the final diagnosis, [Ehlers-Danlos syndrome, vascular type](https://omim.org/entry/130050).

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import HTML, display
from pyphetools.creation import *
from pyphetools.validation import *
from pyphetools.visualization import *
import pyphetools
print(f"pyphetools version {pyphetools.__version__}")

pyphetools version 0.9.63


In [2]:
PMID = "PMID:36189931"
title = "Comprehensive genetic screening for vascular Ehlers-Danlos syndrome through an amplification-based next-generation sequencing system"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-02-27


In [3]:
df = pd.read_excel('input/Yamaguchi_2022_PMID_36189931.xlsx')
df = df.astype(str)

In [4]:
df["individual_id"] = df["Patient"].apply(lambda x: f"Individual {x}")

In [5]:
df.head()

Unnamed: 0,Patient,Variants (NM_000090.3),Suspected disease,Age at genetic diagnosis (years),Sex,Aortic dissection,Aortic rupture,Arterial dissection,Arterial rupture,Uterine rupture,...,Acrogeria,Talipes equinovarus,Congenital hip dislocation,Hypermobility of small joints,Tendon and muscle rupture,Gingival recession/fragility,Keratoconus,Early onset varicose veins,Family history,individual_id
0,1,c.547G>A:p.Gly183Ser,vEDS,59,F,−,−,−,+,,...,−,−,−,−,−,−,,−,−,Individual 1
1,2,c.547G>A:p.Gly183Ser,vEDS,29,F,−,−,+,−,−,...,−,−,−,−,−,−,,−,"Mother, died, carotid-cavernous sinus fistula",Individual 2
2,3,c.556G>A:p.Gly186Ser,vEDS,d.15,M,+,+,−,−,,...,−,−,−,−,−,−,,−,"Mother, died at 30s, subarachnoid hemorrhage",Individual 3
3,4,c.565G>C:p.Gly189Arg,vEDS,45,F,−,−,+,−,−,...,−,−,−,−,−,−,,−,−,Individual 4
4,5,c.583G>A:p.Gly195Arg,vEDS,17,F,−,−,−,−,,...,−,−,−,+,−,−,,−,"Mother, died at 32 years, aortic dissection; grandfather, died at 30s, aortic dissection",Individual 5


## Phenotypic features

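`SimpleColumnMapperGenerator` tries to match each column header to an HPO term, treating `+` as observed and `−` as excluded. Columns it cannot match are reported as Unmapped; the phenotype columns among them are either mapped by hand or left uncoded.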

In [6]:
generator = SimpleColumnMapperGenerator(df=df, observed='+', excluded='−', hpo_cr=hpo_cr)
column_mapper_list = generator.try_mapping_columns()

In [7]:
display(HTML(generator.to_html()))

Result,Columns
Mapped,Aortic dissection; Aortic rupture; Arterial dissection; Arterial rupture; Uterine rupture; Easy bruising; Spontaneous pneumothorax; Talipes equinovarus; Congenital hip dislocation; Keratoconus
Unmapped,"Patient; Variants (NM_000090.3); Suspected disease; Age at genetic diagnosis (years); Sex; Sigmoid colon perforation; Carotid-cavernous sinus fistula; Thin, translucent skin; Characteristic facial features; Acrogeria; Hypermobility of small joints; Tendon and muscle rupture; Gingival recession/fragility; Early onset varicose veins; Family history; individual_id"
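Note that the `excluded` symbol passed to the generator is the Unicode minus sign (U+2212) used in the source table, not the ASCII hyphen `-`. The minimal sketch below illustrates the three-way classification of a cell, assuming that any value other than the two symbols (including the string `'nan'` that `df.astype(str)` produces for empty cells) counts as not measured. It is an illustration, not pyphetools' implementation; `classify_cell` and `Status` are hypothetical names.

```python
from enum import Enum

# The source table marks absent features with U+2212 MINUS SIGN, not ASCII "-"
OBSERVED = "+"
EXCLUDED = "\u2212"


class Status(Enum):
    OBSERVED = "observed"
    EXCLUDED = "excluded"
    NOT_MEASURED = "not measured"


def classify_cell(cell: str) -> Status:
    """Map one spreadsheet cell to a tri-state phenotype status (illustrative sketch)."""
    value = cell.strip()
    if value == OBSERVED:
        return Status.OBSERVED
    if value == EXCLUDED:
        return Status.EXCLUDED
    # Empty cells ("nan" after astype(str)) and unexpected symbols such as ASCII "-"
    return Status.NOT_MEASURED


column = ["+", "\u2212", "nan", "-"]
print([classify_cell(c).name for c in column])
# ['OBSERVED', 'EXCLUDED', 'NOT_MEASURED', 'NOT_MEASURED']
```

The last entry shows the pitfall: an ASCII hyphen would silently become "not measured" instead of "excluded".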


In [8]:
# Acrogeria -- needs an HPO term
# Carotid-cavernous sinus fistula -- needs an HPO term
# Characteristic facial features -- too general to code
feature_d = {
    "Early onset varicose veins": ["Varicose veins", "HP:0002619"],
    "Gingival recession/fragility": ["Gingival fragility", "HP:0034518"], 
    "Tendon and muscle rupture": ["Tendon rupture", "HP:0100550"],
    "Hypermobility of small joints":  ["Finger joint hypermobility", "HP:0006094"],
    "Thin, translucent skin": ["Dermal translucency", "HP:0010648"]
}

for k, v in feature_d.items():
    mapper = SimpleColumnMapper(column_name=k, hpo_id=v[1], hpo_label=v[0], observed="+", excluded="−")
    column_mapper_list.append(mapper)
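A hand-written dictionary like `feature_d` is easy to get subtly wrong, for example through a column header typed slightly differently from the spreadsheet or a truncated HPO id. The sketch below uses a hypothetical helper, `check_feature_map`, that checks only that each column exists and that each id has the `HP:` plus seven digits shape. Confirming that id and label agree would require the loaded ontology (`hpo_ontology`), which this sketch does not use.

```python
import re

# Same structure as feature_d above: column header -> [HPO label, HPO id]
feature_d = {
    "Early onset varicose veins": ["Varicose veins", "HP:0002619"],
    "Gingival recession/fragility": ["Gingival fragility", "HP:0034518"],
    "Tendon and muscle rupture": ["Tendon rupture", "HP:0100550"],
    "Hypermobility of small joints": ["Finger joint hypermobility", "HP:0006094"],
    "Thin, translucent skin": ["Dermal translucency", "HP:0010648"],
}

# HPO identifiers are "HP:" followed by exactly seven digits
HPO_ID = re.compile(r"HP:\d{7}")


def check_feature_map(fmap, columns):
    """Return a list of problems; an empty list means every entry looks well-formed."""
    problems = []
    for col, (_label, hpo_id) in fmap.items():
        if col not in columns:
            problems.append(f"missing column: {col}")
        if not HPO_ID.fullmatch(hpo_id):
            problems.append(f"malformed HPO id for {col}: {hpo_id}")
    return problems


columns = list(feature_d)  # in the notebook, pass df.columns instead
print(check_feature_map(feature_d, columns))  # []
```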

In [9]:
df["hgvs"] = df["Variants (NM_000090.3)"].apply(lambda x: x.split(":")[0])
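The variant column stores the cDNA and protein changes separated by a colon (e.g. `c.547G>A:p.Gly183Ser`), and only the `c.` part is passed to `VariantManager`. Splitting on `:` leaves a string without a colon unchanged. This is why the free-text `ex. 24–33 deletion` passes through as-is and is later reported as unmapped. A minimal sketch of the same step as a named function (`cdna_part` is a hypothetical name):

```python
def cdna_part(variant: str) -> str:
    """Keep the transcript-level (c.) part of strings like 'c.547G>A:p.Gly183Ser'."""
    # split(":")[0] returns the whole string when there is no colon
    return variant.split(":")[0].strip()


examples = ["c.547G>A:p.Gly183Ser", "ex. 24–33 deletion"]
print([cdna_part(v) for v in examples])
# ['c.547G>A', 'ex. 24–33 deletion']
```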

In [10]:
col3a1_transcript = "NM_000090.3"
col3a1_id = "HGNC:2201"
vmanager = VariantManager(df=df,
                          transcript=col3a1_transcript,
                          individual_column_name="Patient",
                          allele_1_column_name="hgvs",
                          gene_symbol="COL3A1",
                          gene_id=col3a1_id,
                          overwrite=True)

[INFO] encoding variant "c.724C>T"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.724C>T/NM_000090.3?content-type=application%2Fjson
[INFO] encoding variant "c.[1546G>T;1556G>T]"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.[1546G>T;1556G>T]/NM_000090.3?content-type=application%2Fjson
[INFO] encoding variant "c.598C>T"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.598C>T/NM_000090.3?content-type=application%2Fjson
[INFO] encoding variant "c.565G>C"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.565G>C/NM_000090.3?content-type=application%2Fjson
[INFO] encoding variant "c.897+2T>A"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.897+2T>A/NM_000090.3?content-type=application%2Fjson
[INFO] encoding variant "c.2518G>A"
https://rest.variantvalidator.org/VariantValidator/variantvalid
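The log shows one VariantValidator REST request per distinct allele, with the genome build (`hg38`), the variant description `transcript:hgvs`, and the transcript to report on as path segments. The sketch below only rebuilds that URL string from the logged pattern and makes no network call. It is inferred from the log lines above, not taken from pyphetools' source; `variantvalidator_url` is a hypothetical helper.

```python
BASE = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def variantvalidator_url(hgvs: str, transcript: str = "NM_000090.3", build: str = "hg38") -> str:
    """Rebuild the request URL seen in the log (pattern inferred from the log output)."""
    # Only the ':' between transcript and variant is percent-encoded (%3A), as in the log;
    # the final path segment restricts the response to the same transcript.
    return f"{BASE}/{build}/{transcript}%3A{hgvs}/{transcript}?content-type=application%2Fjson"


print(variantvalidator_url("c.724C>T"))
```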

In [11]:
vmanager.to_summary()

Unnamed: 0,status,count,alleles
0,mapped,33,"c.724C>T, c.[1546G>T;1556G>T], c.598C>T, c.565G>C, c.897+2T>A, c.2518G>A, c.659_664del, c.1330G>A, c.1662+1G>A, c.755G>A, c.2283+5G>T, c.547G>A, c.556G>A, c.2815G>A, c.754G>A, c.3525+1G>A, c.2134_2160del, c.848T>A, c.2870G>T, c.665G>A, c.897+2T>G, c.1194+1G>A, c.2869G>A, c.763G>T, c.1346G>T, c.1862G>A, c.583G>A, c.1977+5G>C, c.2357G>A, c.951+5G>C, c.2356G>A, c.3338G>A, c.3256G>C"
1,unmapped,1,ex. 24–33 deletion
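The only allele that could not be encoded is the free-text `ex. 24–33 deletion`, a multi-exon deletion with no `c.` HGVS string that VariantValidator could validate. Such alleles can be flagged before any requests are made. The sketch below uses a deliberately simplified pattern (a string starting with `c.` and containing no whitespace). Real HGVS syntax is far richer, so this is only a triage heuristic; `triage` is a hypothetical name.

```python
import re

# Simplified: transcript-level HGVS starts with "c." and contains no whitespace
HGVS_C = re.compile(r"c\.\S+")


def triage(alleles):
    """Split alleles into (HGVS-like, other) lists."""
    hgvs, other = [], []
    for allele in alleles:
        (hgvs if HGVS_C.fullmatch(allele) else other).append(allele)
    return hgvs, other


hgvs, other = triage(["c.724C>T", "c.[1546G>T;1556G>T]", "c.897+2T>A", "ex. 24–33 deletion"])
print(other)  # ['ex. 24–33 deletion']
```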


In [12]:
vmanager.code_as_chromosomal_deletion({'ex. 24–33 deletion'})

In [13]:
vmanager.to_summary()

Unnamed: 0,status,count,alleles
0,mapped,34,"c.724C>T, c.[1546G>T;1556G>T], c.598C>T, c.565G>C, c.897+2T>A, c.2518G>A, c.659_664del, c.1330G>A, c.1662+1G>A, c.755G>A, c.2283+5G>T, c.547G>A, c.556G>A, c.2815G>A, c.754G>A, c.3525+1G>A, c.2134_2160del, c.848T>A, c.2870G>T, c.665G>A, c.897+2T>G, c.1194+1G>A, c.2869G>A, c.763G>T, c.1346G>T, c.1862G>A, c.583G>A, c.1977+5G>C, c.2357G>A, c.951+5G>C, c.2356G>A, c.3338G>A, c.3256G>C, ex. 24–33 deletion"
1,unmapped,0,


## Demographic data

In [14]:
# Convert ages to ISO 8601 durations; the column mixes years (e.g. 59) and days (e.g. d.17)
def get_iso_8601(age_string):
    """Return 'PnD' for values written as 'd.n', otherwise 'PnY'."""
    age_string = age_string.strip()
    if age_string.startswith("d."):
        return f"P{age_string[2:]}D"
    return f"P{age_string}Y"

df["isoAge"] = df["Age at genetic diagnosis (years)"].apply(get_iso_8601)
ageMapper = AgeColumnMapper.iso8601(column_name="isoAge")
#ageMapper.preview_column(df)
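Because `df` was converted with `df.astype(str)`, an empty age cell would arrive as `'nan'` and `get_iso_8601` would turn it into `'PnanY'`. A check like the following sketch can catch such values before they reach `AgeColumnMapper.iso8601`. It assumes ages contain only integer year, month, and day components, which is narrower than full ISO 8601. `invalid_ages` is a hypothetical helper; in the notebook you would pass `df["isoAge"]`.

```python
import re

# "P" followed by at least one integer Y/M/D component, in that order (simplified ISO 8601)
ISO_DURATION = re.compile(r"P(?=\d)(\d+Y)?(\d+M)?(\d+D)?")


def invalid_ages(iso_ages):
    """Return the values that are not simple ISO 8601 durations."""
    return [age for age in iso_ages if not ISO_DURATION.fullmatch(age)]


print(invalid_ages(["P59Y", "P15D", "P1Y6M", "PnanY", "P"]))  # ['PnanY', 'P']
```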

In [15]:
sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Sex')
#sexMapper.preview_column(df)
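`SexColumnMapper` translates the `M`/`F` symbols into the sex values shown in the phenopacket table (`MALE`, `FEMALE`). The sketch below illustrates that mapping with the Phenopacket schema's `Sex` enum names, assuming that any other symbol should fall back to `UNKNOWN_SEX`. That fallback is an assumption, not documented pyphetools behaviour, and `map_sex` is a hypothetical name.

```python
# Phenopacket schema Sex enum names: UNKNOWN_SEX, FEMALE, MALE, OTHER_SEX
SEX_CODES = {"M": "MALE", "F": "FEMALE"}


def map_sex(symbol: str) -> str:
    """Translate a spreadsheet sex symbol; unrecognised symbols become UNKNOWN_SEX (assumption)."""
    return SEX_CODES.get(symbol.strip(), "UNKNOWN_SEX")


print([map_sex(s) for s in ["F", "M", "nan"]])  # ['FEMALE', 'MALE', 'UNKNOWN_SEX']
```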

In [19]:
varMapper = VariantColumnMapper(variant_d=vmanager.get_variant_d(),
                                variant_column_name="hgvs", 
                                default_genotype="heterozygous")

In [20]:
vEDS = Disease(disease_id='OMIM:130050', disease_label='Ehlers-Danlos syndrome, vascular type')
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="individual_id", 
                        age_of_onset_mapper=ageMapper, 
                        age_at_last_encounter_mapper=AgeColumnMapper.not_provided(),
                        sexmapper=sexMapper,
                        variant_mapper=varMapper,
                        metadata=metadata)
encoder.set_disease(vEDS)

In [21]:
individuals = encoder.get_individuals()

In [22]:
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
INFORMATION,NOT_MEASURED,82
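The validator was run with `AllelicRequirement.MONO_ALLELIC`, which fits vascular EDS as an autosomal dominant disorder in which each affected individual carries one heterozygous COL3A1 variant. The summary above contains only INFORMATION-level `NOT_MEASURED` entries. The sketch below shows the kind of rule a mono-allelic requirement expresses. It assumes genotypes are available as a plain dictionary, uses toy data, and is not the `CohortValidator` implementation; `check_mono_allelic` is a hypothetical name.

```python
def check_mono_allelic(genotypes):
    """genotypes maps individual id -> list of (allele, zygosity) tuples.

    Returns {individual id: problem} for individuals that do not carry
    exactly one heterozygous allele.
    """
    problems = {}
    for individual, alleles in genotypes.items():
        if len(alleles) != 1:
            problems[individual] = f"expected 1 allele, found {len(alleles)}"
        elif alleles[0][1] != "heterozygous":
            problems[individual] = f"expected heterozygous, found {alleles[0][1]}"
    return problems


# Toy data for illustration only
toy = {
    "Individual 1": [("c.547G>A", "heterozygous")],
    "Individual X": [],
    "Individual Y": [("c.724C>T", "homozygous")],
}
print(check_mono_allelic(toy))
```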


In [23]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
Individual 1 (FEMALE; n/a),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.547G>A (heterozygous),Arterial rupture (HP:0025019); Bruising susceptibility (HP:0000978); Dermal translucency (HP:0010648); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094)
Individual 2 (FEMALE; n/a),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.547G>A (heterozygous),Arterial dissection (HP:0005294); Bruising susceptibility (HP:0000978); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial rupture (HP:0025019); excluded: Uterine rupture (HP:0100718); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094); excluded: Dermal translucency (HP:0010648)
Individual 3 (MALE; n/a),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.556G>A (heterozygous),Aortic dissection (HP:0002647); Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Arterial rupture (HP:0025019); excluded: Bruising susceptibility (HP:0000978); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094); excluded: Dermal translucency (HP:0010648)
Individual 4 (FEMALE; n/a),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.565G>C (heterozygous),Arterial dissection (HP:0005294); Bruising susceptibility (HP:0000978); Dermal translucency (HP:0010648); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial rupture (HP:0025019); excluded: Uterine rupture (HP:0100718); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094)
Individual 5 (FEMALE; n/a),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.583G>A (heterozygous),Bruising susceptibility (HP:0000978); Finger joint hypermobility (HP:0006094); Dermal translucency (HP:0010648); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Arterial rupture (HP:0025019); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550)
Individual 6 (FEMALE; n/a),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.598C>T (heterozygous),Arterial dissection (HP:0005294); Bruising susceptibility (HP:0000978); Keratoconus (HP:0000563); Gingival fragility (HP:0034518); Dermal translucency (HP:0010648); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial rupture (HP:0025019); excluded: Uterine rupture (HP:0100718); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094)
Individual 7 (FEMALE; n/a),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.659_664del (heterozygous),Bruising susceptibility (HP:0000978); Talipes equinovarus (HP:0001762); Finger joint hypermobility (HP:0006094); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Arterial rupture (HP:0025019); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550); excluded: Dermal translucency (HP:0010648)
Individual 8 (FEMALE; n/a),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.665G>A (heterozygous),Aortic dissection (HP:0002647); Arterial dissection (HP:0005294); Bruising susceptibility (HP:0000978); Gingival fragility (HP:0034518); Dermal translucency (HP:0010648); excluded: Aortic rupture (HP:0031649); excluded: Arterial rupture (HP:0025019); excluded: Uterine rupture (HP:0100718); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094)
Individual 9 (MALE; n/a),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.724C>T (heterozygous),Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Arterial rupture (HP:0025019); excluded: Bruising susceptibility (HP:0000978); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094); excluded: Dermal translucency (HP:0010648)
Individual 10 (MALE; n/a),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.754G>A (heterozygous),Aortic dissection (HP:0002647); Spontaneous pneumothorax (HP:0002108); excluded: Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Arterial rupture (HP:0025019)
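In the table, each individual's observed features come first, followed by excluded features prefixed with `excluded:`. Terms with no recorded value do not appear, which is why Individual 10 lists only five terms. The sketch below reproduces that string layout from a list of `(label, HPO id, excluded)` tuples. It is a reconstruction of the displayed format, not `PhenopacketTable` code, and `format_features` is a hypothetical name.

```python
def format_features(features):
    """features: list of (label, hpo_id, excluded) tuples, in column order."""
    observed = [f"{label} ({hpo_id})" for label, hpo_id, excluded in features if not excluded]
    absent = [f"excluded: {label} ({hpo_id})" for label, hpo_id, excluded in features if excluded]
    return "; ".join(observed + absent)


# Individual 10, in spreadsheet column order
individual_10 = [
    ("Aortic dissection", "HP:0002647", False),
    ("Aortic rupture", "HP:0031649", True),
    ("Arterial dissection", "HP:0005294", True),
    ("Arterial rupture", "HP:0025019", True),
    ("Spontaneous pneumothorax", "HP:0002108", False),
]
print(format_features(individual_10))
```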


In [24]:
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              metadata=metadata)

We output 35 GA4GH phenopackets to the directory phenopackets
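Each phenopacket is written as a JSON file. To show roughly what such a file contains, the sketch below writes a deliberately simplified phenopacket using GA4GH Phenopacket v2 JSON field names (`subject`, `phenotypicFeatures`, `type`, `excluded`, `diseases`, `term`). It omits the required `metaData` block, the variant interpretations, and other fields that pyphetools emits. The file-naming scheme and `write_minimal_phenopacket` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_minimal_phenopacket(outdir, individual_id, sex, disease, features):
    """Write one simplified phenopacket as JSON and return its path.

    disease:  (OMIM id, label)
    features: list of (HPO label, HPO id, excluded) tuples
    Omits metaData, interpretations and other fields a complete phenopacket carries.
    """
    packet = {
        "id": individual_id.replace(" ", "_"),  # hypothetical id scheme
        "subject": {"id": individual_id, "sex": sex},
        "phenotypicFeatures": [
            {"type": {"id": hpo_id, "label": label}, "excluded": excluded}
            for label, hpo_id, excluded in features
        ],
        "diseases": [{"term": {"id": disease[0], "label": disease[1]}}],
    }
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{packet['id']}.json"
    path.write_text(json.dumps(packet, indent=2), encoding="utf-8")
    return path


# Usage: write one file into a throwaway directory and read it back
with tempfile.TemporaryDirectory() as tmp:
    p = write_minimal_phenopacket(
        tmp, "Individual 10", "MALE",
        ("OMIM:130050", "Ehlers-Danlos syndrome, vascular type"),
        [("Aortic dissection", "HP:0002647", False),
         ("Aortic rupture", "HP:0031649", True)],
    )
    filename = p.name
    written = json.loads(p.read_text(encoding="utf-8"))
```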
