# Exercise 4 - Rare form of Familial Hypercholesterolemia (FH)
**Questions are marked in bold**

During a check-up at the doctor, a child (HG04204) from a small family is found to have abnormally high cholesterol levels.

![alt](images/1.jpg)

It turns out that the parents are distantly related (second cousins), but there are no other records of abnormal cholesterol levels in the family.

![alt](images/2.jpg)

**Q1. If the abnormal levels are caused by genetics, what type of inheritance is most likely based on the above information?**

*Double-click this text to write your answer*

As part of research for a screening program, the two children and their parents undergo whole-exome sequencing (WES) to search for variants that could explain the phenotype. Let's load the WES data.

```r
# Load the WES data
vars <- readRDS('/course/novo23/wes/ex02.wes.rds')
# Functional consequences ordered from most to least severe (Ensembl severity order)
consequences <- c('transcript_ablation','splice_acceptor_variant','splice_donor_variant','stop_gained','frameshift_variant','stop_lost','start_lost','transcript_amplification','inframe_insertion','inframe_deletion','missense_variant','protein_altering_variant','splice_region_variant','incomplete_terminal_codon_variant','start_retained_variant','stop_retained_variant','synonymous_variant')
paste0('Loaded ', nrow(vars), ' WES variants')
```

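**Q2. Assuming a Familial Hypercholesterolemia (FH) prevalence of 0.1%, what is the maximum allele frequency a high-penetrance recessive FH variant could have?** For a high-penetrance recessive variant (every homozygote affected, Hardy-Weinberg equilibrium assumed), the prevalence of FH caused by that variant is the frequency of homozygotes: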
$$
Prevalence=P(homozygous)=AF^2
$$
Modify the code below to find the maximum allele frequency that yields an FH prevalence of 0.1%.

```r
# Chosen allele frequency
AF <- 0.032 # <- change this

# Calculate prevalence from the chosen allele frequency (homozygote frequency)
prevalence <- AF^2

# Print out the prevalence as a percentage
paste0('FH prevalence due to this variant = ', prevalence * 100, '%')
```
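Rather than searching by trial and error, the relation can also be inverted directly: since $\text{Prevalence} = \text{AF}^2$, the largest allowed allele frequency is $\sqrt{\text{Prevalence}}$. Below is a minimal sketch in base R, assuming Hardy-Weinberg equilibrium and that every homozygote is affected; `max_recessive_af` is a hypothetical helper, not part of the course code. The result is an upper bound: if other variants also cause FH, any single variant must be rarer than this.

```r
# Hypothetical helper: invert Prevalence = AF^2 (recessive, Hardy-Weinberg
# assumption) to get the largest allele frequency compatible with a prevalence
max_recessive_af <- function(prevalence) sqrt(prevalence)

fh_prevalence <- 0.001   # 0.1% expressed as a fraction
af_max <- max_recessive_af(fh_prevalence)
paste0('Maximum allele frequency = ', signif(af_max, 3))  # "Maximum allele frequency = 0.0316"

# Sanity check: squaring the result recovers the prevalence (0.001)
af_max^2
```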

[Ensembl](https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html) ranks functional consequences in the following order of severity (most severe first):

|Order|Consequence|IMPACT|
|---|---|---|
|1|transcript_ablation|HIGH|
|2|splice_acceptor_variant|HIGH|
|3|splice_donor_variant|HIGH|
|4|stop_gained|HIGH|
|5|frameshift_variant|HIGH|
|6|stop_lost|HIGH|
|7|start_lost|HIGH|
|8|transcript_amplification|HIGH|
|9|inframe_insertion|MODERATE|
|10|inframe_deletion|MODERATE|
|11|missense_variant|MODERATE|
|12|protein_altering_variant|MODERATE|
|13|splice_region_variant|LOW|
|14|incomplete_terminal_codon_variant|LOW|
|15|start_retained_variant|LOW|
|16|stop_retained_variant|LOW|
|17|synonymous_variant|LOW|
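The same ordering can be used programmatically to rank the consequences of observed variants. Below is a minimal sketch in base R; `severity_rank` and `impact_of` are hypothetical helpers, and the IMPACT vector is transcribed from the table above rather than fetched from Ensembl.

```r
# Ensembl severity order, copied from the table above (most severe first)
consequences <- c('transcript_ablation', 'splice_acceptor_variant', 'splice_donor_variant',
                  'stop_gained', 'frameshift_variant', 'stop_lost', 'start_lost',
                  'transcript_amplification', 'inframe_insertion', 'inframe_deletion',
                  'missense_variant', 'protein_altering_variant', 'splice_region_variant',
                  'incomplete_terminal_codon_variant', 'start_retained_variant',
                  'stop_retained_variant', 'synonymous_variant')
# IMPACT class for each consequence, in table order: 8 HIGH, 4 MODERATE, 5 LOW
impact <- rep(c('HIGH', 'MODERATE', 'LOW'), times = c(8, 4, 5))

# Hypothetical helpers: severity rank (1 = most severe) and IMPACT class
severity_rank <- function(x) match(x, consequences)
impact_of <- function(x) impact[severity_rank(x)]

observed <- c('missense_variant', 'stop_gained', 'synonymous_variant')
severity_rank(observed)   # 11 4 17
impact_of(observed)       # "MODERATE" "HIGH" "LOW"

# "stop_gained or worse" means a rank less than or equal to that of stop_gained
severity_rank(observed) <= severity_rank('stop_gained')   # FALSE TRUE FALSE
```

Comparing ranks with `<=` selects the same consequences as the `consequences[1:which(...)]` expression used in the filtering cell.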

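WES filtering:
1. Set `stop_gained` as the least severe consequence to keep, so that only variants with `stop_gained` or a more severe consequence (ranks 1 to 4 in the table above) pass the filter.
2. Based on your answer to **Q2**, choose a maximum allele frequency (AF) in gnomAD.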

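```r
# Filtering
# Try changing the maximum gnomAD AF and the least severe consequence to keep.
max_gnomad_AF <- 0.033
consequence_or_worse_than <- 'stop_gained'

# All consequences from the most severe down to the chosen one (Ensembl order)
or_worse <- consequences[1:which(consequences == consequence_or_worse_than)]
# Keep rare variants whose consequence is at least as severe as the chosen one
subset(vars,
       gnomAD_freq < max_gnomad_AF &
       Consequence %in% or_worse)
```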

| | Chr | Pos | Ref | Alt | HG03642 | HG03679 | HG04204 | HG04215 | Gene | Consequence | AminoAcid | rsID | gnomAD_freq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| 347 | chr1 | 1049050 | TGC... | T | 0\|0 | 0\|1 | 0\|0 | 0\|0 | AGRN | splice_donor_variant | | rs1553177542 | 0.0278467 |
| 5 | chr1 | 25553937 | G | A | 0\|1 | 0\|1 | 1\|1 | 1\|0 | LDLRAP1 | stop_gained | A | | 0.0 |
| 83 | chr3 | 131024739 | G | A | 0\|0 | 0\|1 | 0\|0 | 1\|0 | ASTE1 | stop_gained | A | rs549290479 | 0.0 |
| 6590 | chr3 | 195726458 | C | T | 0\|1 | 0\|0 | 0\|0 | 0\|0 | MUC20 | stop_gained | T | rs73203946 | 0.00226419 |
| 7483 | chr4 | 140389235 | G | A | 0\|0 | 0\|1 | 1\|0 | 0\|0 | CLGN | stop_gained | A | rs200583755 | 0.00328947 |
| 8610 | chr5 | 141173919 | G | T | 0\|0 | 1\|0 | 1\|0 | 1\|0 | PCDHB7 | stop_gained | T | rs138641501 | 0.000328731 |
| 10121 | chr6 | 54083615 | CTG | C | 1\|0 | 0\|0 | 0\|1 | 0\|1 | MLIP | splice_donor_variant | | rs536383818 | 0.00360656 |
| 10328 | chr6 | 99446136 | T | A | 0\|0 | 1\|0 | 1\|0 | 1\|0 | USP45 | stop_gained | A | rs189281869 | 0.0041507 |
| 123 | chr6 | 117387779 | C | G | 0\|1 | 0\|0 | 0\|1 | 0\|0 | ROS1 | splice_donor_variant | | rs573272485 | 0.0 |
| 12695 | chr8 | 51556839 | C | T | 1\|0 | 0\|0 | 0\|1 | 0\|1 | PXDNL | splice_donor_variant | | rs544725965 | 0.0180683 |
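To see how the allele-frequency cut-off shrinks the candidate list, here is a minimal sketch in base R that reuses the `gnomAD_freq` values copied from the filtered output above. The names `gnomad_af`, `thresholds` and `n_passing` are hypothetical, and in the notebook itself you would simply rerun the filtering cell with a different `max_gnomad_AF`.

```r
# gnomAD allele frequencies copied from the filtered output above
gnomad_af <- c(AGRN = 0.0278467, LDLRAP1 = 0, ASTE1 = 0, MUC20 = 0.00226419,
               CLGN = 0.00328947, PCDHB7 = 0.000328731, MLIP = 0.00360656,
               USP45 = 0.0041507, ROS1 = 0, PXDNL = 0.0180683)

# Count how many of these variants pass progressively stricter cut-offs
thresholds <- c(0.033, 0.01, 0.001, 0.0001)
n_passing <- sapply(thresholds, function(t) sum(gnomad_af < t))
data.frame(max_gnomad_AF = thresholds, n_variants = n_passing)   # 10, 8, 4, 3
```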


To find a candidate variant, try:
- looking at the genotypes of the affected individual (HG04204)
- looking up the gene names of candidate variants to determine their function
- setting a stricter maximum gnomAD AF filter
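If the inheritance is recessive, the affected child should be homozygous for the alternative allele, and each parent should be a heterozygous carrier. Below is a minimal sketch of two hypothetical helpers (not part of the course code) that classify the genotype strings shown in the table, assuming biallelic genotypes coded 0 (reference) and 1 (alternative), either phased (`|`) or unphased (`/`).

```r
# Hypothetical helpers for genotype strings such as '0|1' (phased) or
# '0/1' (unphased), where 0 = reference allele and 1 = alternative allele

# TRUE where the individual carries two copies of the alternative allele
is_hom_alt <- function(gt) gt %in% c('1|1', '1/1')

# TRUE where the individual carries exactly one alternative allele
is_het <- function(gt) gt %in% c('0|1', '1|0', '0/1', '1/0')

genotypes <- c('0|0', '0|1', '1|0', '1|1')
is_hom_alt(genotypes)   # FALSE FALSE FALSE TRUE
is_het(genotypes)       # FALSE TRUE TRUE FALSE
```

With `vars` and `or_worse` defined as in the cells above, `subset(vars, is_hom_alt(HG04204) & Consequence %in% or_worse)` would keep only the variants for which the affected child is homozygous.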

**Q3. Based on the above, propose a candidate disease variant that fits the inheritance pattern from Q1.**

*Double-click this text to write your answer*

# Continue with the next exercise
[Exercise 5 - WES of individuals with diabetes](5WESdiab.ipynb)