# Estimate the rare variant heritability of age-related hearing impairment traits

# Aim


# Concepts


Proportion of phenotypic variance captured by common SNPs: SNP-based heritability ($h^2_{SNP}$).

Discrepancy between pedigree-based heritability ($h^2_{ped}$) and $h^2_{SNP}$:

1. Causal variants are not well tagged by common SNPs because they are rare
2. Pedigree heritability is overestimated because of confounding with environmental effects or non-additive genetic variation
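
The two quantities and the two explanations above can be stated compactly. The variance-component symbols are notation introduced here for illustration, not taken from the analysis: $\sigma^2_P$ is the total phenotypic variance, $\sigma^2_{SNP}$ the additive variance captured by common SNPs, and $\sigma^2_A$ the total additive genetic variance.

$$
h^2_{SNP} = \frac{\sigma^2_{SNP}}{\sigma^2_P}, \qquad
h^2_{ped} = \frac{\sigma^2_A}{\sigma^2_P}, \qquad
h^2_{ped} - h^2_{SNP} = \frac{\sigma^2_A - \sigma^2_{SNP}}{\sigma^2_P}
$$

Under explanation 1 the gap is real, because rare causal variants contribute to $\sigma^2_A$ but not to $\sigma^2_{SNP}$. Under explanation 2 the pedigree estimate of $\sigma^2_A$ is inflated, so part of the gap is an artifact.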


## 1. Calculate the SNP-based heritability from common variants and compare it to the available literature

First, a genetic relationship matrix (GRM) needs to be calculated from all of the autosomal SNPs.
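
A minimal sketch of how such a GRM could be built with PLINK 1.9, which provides `--autosome` and `--make-grm-bin`. The file prefixes and the `run` and `build_grm` helpers are hypothetical names used only for illustration, and nothing here is confirmed as the command used in this project. By default the command is only printed, so it can be inspected first.

```shell
# Sketch only: the file prefixes below are hypothetical placeholders.

# run CMD...: print the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# build_grm IN_PREFIX OUT_PREFIX THREADS
# --autosome keeps chromosomes 1-22 only.
# --make-grm-bin writes the GRM in binary form (.grm.bin, .grm.N.bin, .grm.id).
build_grm() {
    run plink --bfile "$1" --autosome --make-grm-bin --threads "$3" --out "$2"
}

build_grm ukb_exome_qc ukb_exome_grm 20
```

Setting `RUN=1` in the environment executes the command instead of printing it, which requires `plink` on the `PATH`.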


## Input files

### Phenotype files

These files have already been QC'ed to include individuals with each of the hearing impairment traits and control individuals without hearing impairment-related phenotypes.

Mega-sample 

H-aid:

* /mnt/mfs/statgen/UKBiobank/phenotype_files/hearing_impairment/090321_UKBB_Hearing_aid_f3393_expandedwhite_6436cases_96601ctrl_PC1_2.tsv

H-diff:

* /mnt/mfs/statgen/UKBiobank/phenotype_files/hearing_impairment/090321_UKBB_Hearing_difficulty_f2247_expandedwhite_45502cases_96601ctrl_PC1_2.tsv

H-noise:

* /mnt/mfs/statgen/UKBiobank/phenotype_files/hearing_impairment/090321_UKBB_Hearing_noise_f2257_expandedwhite_65660cases_96601ctrl_PC1_2.tsv

H-both:

* /mnt/mfs/statgen/UKBiobank/phenotype_files/hearing_impairment/090321_UKBB_Combined_f2247_f2257_expandedwhite_38410cases_96601ctrl_PC1_2.tsv
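
Each file name encodes the expected number of cases and controls, so a quick tabulation can confirm that a file matches its name. The sketch below assumes a tab-separated file with a header row. The column name `status` and its 1/0 coding are hypothetical, since the actual column names are not given here.

```shell
# count_by_status FILE COLUMN
# Tabulate the values of COLUMN in a tab-separated file that has a header row.
count_by_status() {
    awk -F '\t' -v col="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { n[$c]++ }
        END { for (v in n) print v "\t" n[v] }
    ' "$1" | sort
}

# Toy demonstration on a three-row table (made-up data, hypothetical column name).
demo=$(mktemp)
printf 'FID\tIID\tstatus\n1\t1\t1\n2\t2\t0\n3\t3\t0\n' > "$demo"
count_by_status "$demo" status
rm -f "$demo"
```

On the toy table this reports two rows with value 0 and one row with value 1. For a real phenotype file, the counts would be compared with the numbers in its name.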


### Genotype files

Original exome sequence files in plink format are here: 
* /mnt/mfs/statgen/UKBiobank/data/exome_files/project_VCF/072721_run/plink/ukb23156_c{1..22}.merged.filtered.bed

The QC applied to these VCF files was:

- DP-SNPs=10
- DP-indels=10
- GQ=20
- AB-SNP=0.15
- AB-indel=0.20
- geno=0.1

Samples with missingness >10% (`--mind 0.1`) in the genotype array:

* ~/UKBiobank/data/exome_files/project_VCF/072721_run/merged_plink/mind_0.1/cache/ukb23155_qc_merged.mind_0.1.filtered.mindrem.id

An extra QC step is needed here to make sure we have the best-quality variants for the heritability calculation.

### Selecting white European samples

Wainschtein et al. (2022) do two rounds of PC calculation (20 PCs), one with common variants and one with rare variants. The pruning also uses different parameters for each of these analyses:

- Common: MAF 0.01-0.5, window 50Kb, r2=0.1
- Rare: MAF 0.004 (MAC=5) - 0.01, window 100Kb, r2=0.05

In our case, we will use our already defined white European population, which was classified using the genotype array data with common variants by calculating 10 PCs and using the Mahalanobis distance to detect outliers.

* /mnt/mfs/statgen/UKBiobank/results/083021_PCA_results/europeans/cache/UKB_genotypedatadownloaded083019.090221_sample_variant_qc_final_callrate90.filtered.extracted.europeans.filtered.fam

### Removing related individuals 

In this step we want to keep only the unrelated European individuals for the heritability calculations. We use a kinship threshold of 0.0625 (to remove individuals related up to the third degree).

* remove_samples=/mnt/mfs/statgen/UKBiobank/results/083021_PCA_results/090221_king/*.related_id
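
The two sample lists can also be combined into a single keep list of unrelated Europeans. This is a sketch under the assumption that both files carry FID and IID in their first two columns. That holds for a `.fam` file but has not been checked for the `.related_id` files, and the function name is hypothetical.

```shell
# make_unrelated_keep FAM RELATED_IDS
# Print the FID/IID pairs found in FAM that are not listed in RELATED_IDS.
make_unrelated_keep() {
    awk 'FILENAME == ARGV[1] { drop[$1, $2] = 1; next }   # first file: IDs to drop
         !(($1, $2) in drop) { print $1, $2 }             # second file: keep the rest
    ' "$2" "$1"
}

# Usage with the files named in this section (not run here):
# make_unrelated_keep "$keep_samples" "$related_ids" > europeans_unrelated.keep
```

The output has two columns, which is the layout PLINK expects for `--keep`.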

### Remove individuals showing excess of heterozygosity based on GRM off-diagonal?

It is not yet clear whether this step is necessary.

Here we start with QC'ed exome sequence data, but we will generate additional files with more stringent QC for the heritability calculation:

1. MAF: keep all rare variants (Wainschtein et al. 2022 use `--maf 0.0001`)
2. `--geno 0.05` (our original exome QC used `--geno 0.1`)
3. `--hwe 0.000001`
4. `--mind 0.05` (our original exome QC did not remove individuals based on `--mind`)
5. `--snps-only`: add this option to remove indels from the calculation
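
As a rough sketch, the filters listed above map onto a single PLINK 1.9 call. The file prefixes and the `run` and `qc_exome` helpers are hypothetical. No `--maf` filter is set, so that all rare variants are kept, and PLINK spells the SNP-only flag `--snps-only`. The command is printed rather than executed unless `RUN=1` is set.

```shell
# Sketch only: the file prefixes below are hypothetical placeholders.

# run CMD...: print the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# qc_exome IN_PREFIX OUT_PREFIX
# Apply the stricter variant and sample filters and drop indels.
qc_exome() {
    run plink --bfile "$1" \
        --geno 0.05 --hwe 0.000001 --mind 0.05 --snps-only \
        --make-bed --out "$2"
}

qc_exome exome_chr22 exome_chr22.strict_qc
```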

### Build the GRM from the exome sequence data?

# Step 1. Extra QC on the exome data

In [2]:
## Select White European unrelated individuals 
## Do some extra QC on the exome data
cwd=$UKBB_PATH/results/ARHI_heritability
## Use the exome filtered file
genoFile=`echo $UKBB_PATH/data/exome_files/project_VCF/072721_run/plink/ukb23156_c{1..22}.merged.filtered.bed`
## To keep the samples of white Europeans only
keep_samples=$UKBB_PATH/results/083021_PCA_results/europeans/cache/UKB_genotypedatadownloaded083019.090221_sample_variant_qc_final_callrate90.filtered.extracted.europeans.filtered.fam
## To remove related individuals 
remove_samples=$UKBB_PATH/results/083021_PCA_results/090221_king/*.related_id

# Do not set a MAF filter, this will keep both common and rare variants
maf_filter=0 
# Set a more stringent geno filter of 0.05
geno_filter=0.05
# Set a HWE filter 1x10^-6
hwe_filter=0.000001
# Do not set a sample missingness filter at this point, otherwise many samples would be removed
mind_filter=0

gwas_sbatch=~/hearing/heritability/herit_eur_unrel_exome_$(date +"%Y-%m-%d").sbatch
gwasqc_sos=~/project/xqtl-pipeline/code/data_preprocessing/genotype/GWAS_QC.ipynb
tpl_file=~/bioworkflows/admin/csg.yml
container=~/containers/bioinfo.sif
numThreads=20
job_size=1
mem='30G'

gwasqc_args="""qc:1
    --cwd $cwd
    --genoFile $genoFile
    --keep_samples $keep_samples
    --remove_samples $remove_samples
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --numThreads $numThreads 
    --job_size $job_size
    --container $container
    --mem $mem
"""

sos run ~/project/UKBB_GWAS_dev/admin/Get_Job_Script.ipynb csg\
    --template-file $tpl_file \
    --workflow-file $gwasqc_sos \
    --to-script $gwas_sbatch \
    --args "$gwasqc_args"


INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/hearing/heritability/herit_eur_unrel_exome_2022-05-12.sbatch
INFO: Workflow csg (ID=wceb9651c235b3364) is executed successfully with 1 completed step.
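
Before the generated job script is submitted, it may be worth confirming that every per-chromosome fileset is complete. The helper below is a hypothetical sketch, and submitting with `sbatch` is an assumption based on the `.sbatch` extension of the generated script.

```shell
# check_filesets PREFIX...
# Report any PLINK fileset that lacks a non-empty .bed, .bim or .fam file.
# Returns 0 only when every fileset is complete.
check_filesets() {
    missing=0
    for prefix in "$@"; do
        for ext in bed bim fam; do
            if [ ! -s "$prefix.$ext" ]; then
                echo "missing or empty: $prefix.$ext" >&2
                missing=$((missing + 1))
            fi
        done
    done
    [ "$missing" -eq 0 ]
}

# Usage with the per-chromosome exome files from the cell above (not run here):
# geno_dir=$UKBB_PATH/data/exome_files/project_VCF/072721_run/plink
# set --
# for chrom in $(seq 1 22); do
#     set -- "$@" "$geno_dir/ukb23156_c${chrom}.merged.filtered"
# done
# check_filesets "$@" && sbatch "$gwas_sbatch"
```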


In [None]:
###### Merge BEDs ######
# ${list_beds} lists the autosomal filesets to merge (all except chr 1);
# ${BED_file_merged} is the base fileset, presumably chr 1.

plink \
	--bfile ${BED_file_merged} \
	--merge-list ${list_beds} \
	--make-bed \
	--maf 0.0001 \
	--geno 0.05  \
	--hwe 0.000001 \
	--mind 0.05 \
	--out ${BED_file_merged_QC} \
	--threads ${ncpu}
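
The file passed to `--merge-list` holds one fileset per line, and PLINK 1.9 treats a line with a single name as the prefix of a binary fileset. Below is a sketch of generating that list for chromosomes 2 to 22. The function name and the prefix and suffix in the usage line are hypothetical, because the names of the QC'ed per-chromosome files are not given here.

```shell
# write_merge_list PREFIX SUFFIX OUT
# Write PREFIX<chr>SUFFIX for chr 2..22 to OUT, one fileset prefix per line.
# Chromosome 1 is left out because it is the base fileset given to --bfile.
write_merge_list() {
    chrom=2
    : > "$3"                      # create or truncate the output file
    while [ "$chrom" -le 22 ]; do
        echo "${1}${chrom}${2}" >> "$3"
        chrom=$((chrom + 1))
    done
}

# Usage (hypothetical file naming, not run here):
# write_merge_list "$cwd/ukb23156_c" ".merged.filtered.qc" "$list_beds"
```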