# Yamaguchi T et al. (2023) COL3A1

Data from [Yamaguchi T et al. (2023) Comprehensive genetic screening for vascular Ehlers-Danlos syndrome through an amplification-based next-generation sequencing system. Am J Med Genet A. 2023 Jan;191(1):37-51.](https://pubmed.ncbi.nlm.nih.gov/36189931/).
The authors report the suspected (original) diagnosis as 'vEDS', 'LDS/FTAAD', or 'FTAAD'; here we code all individuals with the final diagnosis of [Ehlers-Danlos syndrome, vascular type](https://omim.org/entry/130050).

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import HTML, display
from pyphetools.creation import *
from pyphetools.validation import *
from pyphetools.visualization import *
import pyphetools
print(f"pyphetools version {pyphetools.__version__}")

pyphetools version 0.9.1


In [2]:
PMID = "PMID:36189931"
title = "Comprehensive genetic screening for vascular Ehlers-Danlos syndrome through an amplification-based next-generation sequencing system"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2023-10-09


In [3]:
df = pd.read_excel('input/Yamaguchi_2022_PMID_36189931.xlsx')
df = df.astype(str)

In [4]:
df.head()

Unnamed: 0,Patient,Variants (NM_000090.3),Suspected disease,Age at genetic diagnosis (years),Sex,Aortic dissection,Aortic rupture,Arterial dissection,Arterial rupture,Uterine rupture,...,Spontaneous pneumothorax,Acrogeria,Talipes equinovarus,Congenital hip dislocation,Hypermobility of small joints,Tendon and muscle rupture,Gingival recession/fragility,Keratoconus,Early onset varicose veins,Family history
0,1,c.547G>A:p.Gly183Ser,vEDS,59,F,−,−,−,+,,...,−,−,−,−,−,−,−,,−,−
1,2,c.547G>A:p.Gly183Ser,vEDS,29,F,−,−,+,−,−,...,−,−,−,−,−,−,−,,−,"Mother, died, carotid-cavernous sinus fistula"
2,3,c.556G>A:p.Gly186Ser,vEDS,d.15,M,+,+,−,−,,...,−,−,−,−,−,−,−,,−,"Mother, died at 30s, subarachnoid hemorrhage"
3,4,c.565G>C:p.Gly189Arg,vEDS,45,F,−,−,+,−,−,...,−,−,−,−,−,−,−,,−,−
4,5,c.583G>A:p.Gly195Arg,vEDS,17,F,−,−,−,−,,...,−,−,−,−,+,−,−,,−,"Mother, died at 32 years, aortic dissection; grandfather, died at 30s, aortic dissection"


## Phenotypic features

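The following code automatically maps the columns whose headers the HPO concept recognizer can match to an HPO term, reading `+` as observed and `−` as excluded. Columns that cannot be matched are listed as unmapped and handled manually below.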

In [5]:
generator = SimpleColumnMapperGenerator(df=df, observed='+', excluded='−', hpo_cr=hpo_cr)
column_mapper_d = generator.try_mapping_columns()
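
As a rough illustration of what the `observed` and `excluded` arguments mean, the sketch below classifies single cell values the way the table is read here: `+` as observed, `−` as excluded, and anything else (for example an empty cell, which becomes `nan` after `astype(str)`) as not measured. The function `classify_cell` is a hypothetical helper written for this note, not part of pyphetools, and the library's own handling may differ. The exclusion symbol in the spreadsheet appears to be the Unicode minus sign (U+2212) rather than the ASCII hyphen, so the two must not be confused.

```python
OBSERVED = "+"
EXCLUDED = "\u2212"  # Unicode minus sign, assumed to match the source spreadsheet


def classify_cell(cell: str, observed: str = OBSERVED, excluded: str = EXCLUDED) -> str:
    """Return 'observed', 'excluded', or 'not measured' for one table cell."""
    value = cell.strip()
    if value == observed:
        return "observed"
    if value == excluded:
        return "excluded"
    # Anything else (empty cell, 'nan', free text) is treated as not measured
    return "not measured"


# Example row with one value of each kind
row = {"Aortic dissection": "\u2212", "Arterial rupture": "+", "Uterine rupture": "nan"}
statuses = {column: classify_cell(value) for column, value in row.items()}
print(statuses)
```

An ASCII hyphen (`-`) would fall through to "not measured" in this sketch, which is one reason to check the exact character used in the input file.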

In [6]:
display(HTML(generator.to_html()))

Result,Columns
Mapped,Aortic dissection; Aortic rupture; Arterial dissection; Arterial rupture; Uterine rupture; Easy bruising; Spontaneous pneumothorax; Talipes equinovarus; Congenital hip dislocation; Keratoconus
Unmapped,"Patient; Variants (NM_000090.3); Suspected disease; Age at genetic diagnosis (years); Sex; Sigmoid colon perforation; Carotid-cavernous sinus fistula; Thin, translucent skin; Characteristic facial features; Acrogeria; Hypermobility of small joints; Tendon and muscle rupture; Gingival recession/fragility; Early onset varicose veins; Family history"


In [7]:
# "Acrogeria" -- needs an HPO term
# "Carotid-cavernous sinus fistula" -- needs an HPO term
# "Characteristic facial features" -- too general to code
feature_d = {
    "Early onset varicose veins": ["Varicose veins", "HP:0002619"],
    "Gingival recession/fragility": ["Gingival fragility", "HP:0034518"], 
    "Tendon and muscle rupture": ["Tendon rupture", "HP:0100550"],
    "Hypermobility of small joints":  ["Finger joint hypermobility", "HP:0006094"],
    "Thin, translucent skin": ["Dermal translucency", "HP:0010648"]
}

for k, v in feature_d.items():
    mapper = SimpleColumnMapper(hpo_id=v[1], hpo_label=v[0], observed="+", excluded="−")
    column_mapper_d[k] = mapper

In [8]:
df["hgvs"] = df["Variants (NM_000090.3)"].apply(lambda x: x.split(":")[0])
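
The variant column combines the cDNA-level and protein-level descriptions, separated by a colon, and the cell above keeps only the first part. The following sketch shows the same split on single strings; `split_variant` is a hypothetical helper used only for illustration. Entries without a colon, such as the exon deletion handled separately below, are returned unchanged as the first element.

```python
from typing import Optional, Tuple


def split_variant(cell: str) -> Tuple[str, Optional[str]]:
    """Split 'c.XXX:p.YYY' into (cDNA part, protein part or None)."""
    cdna, separator, protein = cell.partition(":")
    return cdna, (protein if separator else None)


print(split_variant("c.547G>A:p.Gly183Ser"))
print(split_variant("ex. 24–33 deletion"))
```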

In [9]:
col3a1_transcript = "NM_000090.3"
col3a1_id = "HGNC:2201"
vvalidator = VariantValidator(genome_build="hg38", transcript=col3a1_transcript)
variant_d = {}
for v in df["hgvs"].unique():
    if v == "ex. 24–33 deletion":
        var = StructuralVariant.chromosomal_deletion(cell_contents=v, gene_symbol="COL3A1", gene_id=col3a1_id)
    else:
        var = vvalidator.encode_hgvs(v)
    variant_d[v] = var

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.547G>A/NM_000090.3?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.556G>A/NM_000090.3?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.565G>C/NM_000090.3?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.583G>A/NM_000090.3?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.598C>T/NM_000090.3?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.659_664del/NM_000090.3?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_000090.3%3Ac.665G>A/NM_000090.3?content-type=application%2Fjson
https://rest.var
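
The logged request URLs above follow a regular pattern: genome build, then the transcript and cDNA change joined by an encoded colon (`%3A`), then the transcript again. The sketch below rebuilds that pattern from its parts. It is only a reconstruction from the log output, not necessarily how pyphetools assembles the request, and `variantvalidator_url` is a hypothetical name. No network request is made.

```python
BASE_URL = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def variantvalidator_url(
    hgvs_c: str,
    transcript: str = "NM_000090.3",
    genome_build: str = "hg38",
) -> str:
    """Rebuild a request URL in the same shape as the logged ones."""
    return (
        f"{BASE_URL}/{genome_build}/{transcript}%3A{hgvs_c}/{transcript}"
        "?content-type=application%2Fjson"
    )


print(variantvalidator_url("c.547G>A"))
```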

# Demographic data

In [10]:
# Revise the age column because it is a mix of years (e.g. 59) and days (e.g. d.15)
def get_iso_8601(age_string):
    if age_string.startswith("d."):
        age_string = age_string[2:]
        return f"P{age_string}D"
    else:
        return f"P{age_string}Y"
df["isoAge"] = df["Age at genetic diagnosis (years)"].apply(get_iso_8601)
ageMapper = AgeColumnMapper.iso8601(column_name="isoAge")
#ageMapper.preview_column(df["isoAge"])
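
Because the table was converted with `astype(str)`, an empty age cell becomes the string `nan`, and a conversion like the one above would turn it into `PnanY`. The sketch below is a stricter variant that returns `None` for anything that is not a plain number with an optional `d.` prefix. The name `to_iso8601_duration` is hypothetical, and the reading of `d.` as days is carried over from the cell above rather than checked against the publication.

```python
import re
from typing import Optional

# Optional 'd.' prefix followed by whole digits, e.g. '59' or 'd.15'
_AGE_PATTERN = re.compile(r"^(d\.)?(\d+)$")


def to_iso8601_duration(age_string: str) -> Optional[str]:
    """Convert '59' to 'P59Y' and 'd.15' to 'P15D'; return None otherwise."""
    match = _AGE_PATTERN.match(age_string.strip())
    if match is None:
        return None
    unit = "D" if match.group(1) else "Y"
    return f"P{match.group(2)}{unit}"


for raw in ["59", "d.15", "nan"]:
    print(raw, "->", to_iso8601_duration(raw))
```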

In [11]:
sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Sex')
#sexMapper.preview_column(df['Sex'])

In [12]:
vEDS = Disease(disease_id='OMIM:130050', disease_label='Ehlers-Danlos syndrome, vascular type')
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=column_mapper_d, 
                        individual_column_name="Patient", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata,  
                        citation=cite)
encoder.set_disease(vEDS)

In [13]:
individuals = encoder.get_individuals()

# Add variant data

In [14]:
patient_id_to_hgvs_d = {row["Patient"]:row["hgvs"] for _, row in df[["Patient", "hgvs"]].iterrows()}
for indi in individuals:
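    if indi.id not in patient_id_to_hgvs_d:
        raise ValueError(f"Could not find individual \"{indi.id}\"")
    hgvs = patient_id_to_hgvs_d.get(indi.id)
    var = variant_d.get(hgvs)
    # Fail with a clear message instead of an AttributeError on None
    if var is None:
        raise ValueError(f"Could not find variant \"{hgvs}\" for individual \"{indi.id}\"")
    var.set_heterozygous()
    indi.add_variant(var)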

# Perform Q/C

In [15]:
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(ontology=hpo_ontology, cohort_validator=cvalidator)
display(HTML(qc.to_html()))

ID,Level,Category,Message,HPO Term
PMID_36189931_10,INFORMATION,NOT_MEASURED,Bruising susceptibility (HP:0000978) was listed as not measured and will be omitted,not measured: Bruising susceptibility (HP:0000978)
PMID_36189931_10,INFORMATION,NOT_MEASURED,Congenital hip dislocation (HP:0001374) was listed as not measured and will be omitted,not measured: Congenital hip dislocation (HP:0001374)
PMID_36189931_23,INFORMATION,NOT_MEASURED,Congenital hip dislocation (HP:0001374) was listed as not measured and will be omitted,not measured: Congenital hip dislocation (HP:0001374)
PMID_36189931_30,INFORMATION,NOT_MEASURED,Congenital hip dislocation (HP:0001374) was listed as not measured and will be omitted,not measured: Congenital hip dislocation (HP:0001374)
PMID_36189931_10,INFORMATION,NOT_MEASURED,Dermal translucency (HP:0010648) was listed as not measured and will be omitted,not measured: Dermal translucency (HP:0010648)
PMID_36189931_30,INFORMATION,NOT_MEASURED,Dermal translucency (HP:0010648) was listed as not measured and will be omitted,not measured: Dermal translucency (HP:0010648)
PMID_36189931_10,INFORMATION,NOT_MEASURED,Finger joint hypermobility (HP:0006094) was listed as not measured and will be omitted,not measured: Finger joint hypermobility (HP:0006094)
PMID_36189931_30,INFORMATION,NOT_MEASURED,Finger joint hypermobility (HP:0006094) was listed as not measured and will be omitted,not measured: Finger joint hypermobility (HP:0006094)
PMID_36189931_9,INFORMATION,NOT_MEASURED,Gingival fragility (HP:0034518) was listed as not measured and will be omitted,not measured: Gingival fragility (HP:0034518)
PMID_36189931_10,INFORMATION,NOT_MEASURED,Gingival fragility (HP:0034518) was listed as not measured and will be omitted,not measured: Gingival fragility (HP:0034518)
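
To get a quick overview of a long QC table, the messages can be tallied per individual and per category. The sketch below does this for the first three rows shown above, assuming the table is available as comma-separated text (the notebook itself renders it as HTML). It uses only the standard library and is independent of pyphetools.

```python
import csv
import io
from collections import Counter

# First three QC rows, copied from the output above
qc_csv = """ID,Level,Category,Message,HPO Term
PMID_36189931_10,INFORMATION,NOT_MEASURED,Bruising susceptibility (HP:0000978) was listed as not measured and will be omitted,not measured: Bruising susceptibility (HP:0000978)
PMID_36189931_10,INFORMATION,NOT_MEASURED,Congenital hip dislocation (HP:0001374) was listed as not measured and will be omitted,not measured: Congenital hip dislocation (HP:0001374)
PMID_36189931_23,INFORMATION,NOT_MEASURED,Congenital hip dislocation (HP:0001374) was listed as not measured and will be omitted,not measured: Congenital hip dislocation (HP:0001374)
"""

rows = list(csv.DictReader(io.StringIO(qc_csv)))
per_individual = Counter(row["ID"] for row in rows)
per_category = Counter(row["Category"] for row in rows)
print(per_individual)
print(per_category)
```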


In [16]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
1 (FEMALE; P59Y),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.547G>A (heterozygous),Arterial rupture (HP:0025019); Bruising susceptibility (HP:0000978); Dermal translucency (HP:0010648); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094)
2 (FEMALE; P29Y),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.547G>A (heterozygous),Arterial dissection (HP:0005294); Bruising susceptibility (HP:0000978); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial rupture (HP:0025019); excluded: Uterine rupture (HP:0100718); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094); excluded: Dermal translucency (HP:0010648)
3 (MALE; P15D),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.556G>A (heterozygous),Aortic dissection (HP:0002647); Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Arterial rupture (HP:0025019); excluded: Bruising susceptibility (HP:0000978); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094); excluded: Dermal translucency (HP:0010648)
4 (FEMALE; P45Y),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.565G>C (heterozygous),Arterial dissection (HP:0005294); Bruising susceptibility (HP:0000978); Dermal translucency (HP:0010648); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial rupture (HP:0025019); excluded: Uterine rupture (HP:0100718); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094)
5 (FEMALE; P17Y),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.583G>A (heterozygous),Bruising susceptibility (HP:0000978); Finger joint hypermobility (HP:0006094); Dermal translucency (HP:0010648); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Arterial rupture (HP:0025019); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550)
6 (FEMALE; P54Y),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.598C>T (heterozygous),Arterial dissection (HP:0005294); Bruising susceptibility (HP:0000978); Keratoconus (HP:0000563); Gingival fragility (HP:0034518); Dermal translucency (HP:0010648); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial rupture (HP:0025019); excluded: Uterine rupture (HP:0100718); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094)
7 (FEMALE; P21Y),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.659_664del (heterozygous),Bruising susceptibility (HP:0000978); Talipes equinovarus (HP:0001762); Finger joint hypermobility (HP:0006094); excluded: Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Arterial rupture (HP:0025019); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Gingival fragility (HP:0034518); excluded: Tendon rupture (HP:0100550); excluded: Dermal translucency (HP:0010648)
8 (FEMALE; P34Y),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.665G>A (heterozygous),Aortic dissection (HP:0002647); Arterial dissection (HP:0005294); Bruising susceptibility (HP:0000978); Gingival fragility (HP:0034518); Dermal translucency (HP:0010648); excluded: Aortic rupture (HP:0031649); excluded: Arterial rupture (HP:0025019); excluded: Uterine rupture (HP:0100718); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094)
9 (MALE; P50Y),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.724C>T (heterozygous),Aortic dissection (HP:0002647); excluded: Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Arterial rupture (HP:0025019); excluded: Bruising susceptibility (HP:0000978); excluded: Spontaneous pneumothorax (HP:0002108); excluded: Talipes equinovarus (HP:0001762); excluded: Congenital hip dislocation (HP:0001374); excluded: Varicose veins (HP:0002619); excluded: Tendon rupture (HP:0100550); excluded: Finger joint hypermobility (HP:0006094); excluded: Dermal translucency (HP:0010648)
10 (MALE; P27Y),"Ehlers-Danlos syndrome, vascular type (OMIM:130050)",NM_000090.3:c.754G>A (heterozygous),Aortic dissection (HP:0002647); Spontaneous pneumothorax (HP:0002108); excluded: Aortic rupture (HP:0031649); excluded: Arterial dissection (HP:0005294); excluded: Arterial rupture (HP:0025019)


In [17]:
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              metadata=metadata,
                                              outdir="phenopackets")

We output 35 GA4GH phenopackets to the directory phenopackets
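
As a simple sanity check, the number of files in the output directory can be compared with the count reported above. The sketch below assumes the phenopackets are written as individual `.json` files, which is not confirmed by the output shown here; `count_phenopacket_files` is a hypothetical helper.

```python
from pathlib import Path


def count_phenopacket_files(outdir: str, suffix: str = ".json") -> int:
    """Count regular files with the given suffix directly inside outdir."""
    directory = Path(outdir)
    if not directory.is_dir():
        return 0
    return sum(
        1 for path in directory.iterdir()
        if path.is_file() and path.suffix == suffix
    )


# Prints 0 if the directory does not exist
print(count_phenopacket_files("phenopackets"))
```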
