A snakemake pipeline for implementing Mendelian randomization
mr.smk is a snakemake/R pipeline for estimating the causal association of a given exposure with an outcome using the TwoSampleMR R package. The steps in the pipeline include:
- Clump GWAS summary statistics using clumping window of 250kb, r2 of 0.1 and significance level of 1 was used for the index and secondary SNPs.
- Plot a Manhattan plot
- Obstain genetic variants (Instruments) below a given PT from the exposure GWAS.
- Extract SNP effects from the outcome GWAS, and if exposure instrument is not avaliable in the outcome GWAS.
- For Exposure instruments missing in the Outcome GWAS, look for LD proxies in the 1000 genomes EUR samples.
- Harmonize exposure and outcome effects using
TwoSampleMR::harmonise_data
- Perform a Global Test of Pelietropy and idenfity outliers using
MR-PRESSO
- Run MR analysis, senstivity analysis and write a Rmarkdown html report.
Be sure to download and install the latest versions of the following software packages:
Plink should be located within the the /usr/local/bin/ directory. The following code can be used to move the executibles: cp </path/to/executible> /usr/local/bin/
The following R packages are also required:
## Install tidyverse, rmarkdown, and devtools
install.packages(c("tidyverse", "rmarkdown", "devtools"))
## Install TwoSampleMR and ggforce
devtools::install_github(c("MRCIEU/TwoSampleMR", '"rondolab/MR-PRESSO", "WSpiller/RadialMR"'))
Once all the prerequiste software is isntalled, MitoImpute can be installed on a git-enabled machine by typeing:
git clone https://github.com/marcoralab/MendelianRandomization.git
Summary statisitcs do not need to be in a specific format. However it is recomended to use the Summary Statistic Munger pipeline to standardize all the files.
- Tips for Formatting A Lot of GWAS Summary Association Statistics Data
- Across-cohort QC analyses of GWAS summary statistics from complex traits
The config.yaml file should contain the following information:
DataOut: path/to/dir/ for intermediate output files
DataOutput: path/to/dir/ for final output files
REF: path/to/population reference plink files (eg 1000 Geneomes)
clumpr2: r2 value for LD clumping - recomended 0.001
clumpkb: LD clumping window - recomended 1000
Pthreshold: pvalue theshold for selecting intrusments
## Forbidden wildcard combinations
missing: [
{"ExposureCode" : CODE, "OutcomeCode" : CODE, "Pthreshold" : PVALUE}
]
## LOAD as outcome
EXPOSURES:
-
CODE: code to use
NAME: trait name
FILE: path/to/gwas/summary/statistics
COLUMNS:
SNP: marker name
CHROM: chromsome
POS: POS
REF: REF
ALT: ALT
AF: AF
BETA: BETA
SE: SE
P: P
Z: Z
N: N
TRAIT: TRAIT
OUTCOMES:
-
CODE: Kunkle2019load
NAME: 'Late Onset Alzheimers Disease'
FILE: 1_RawData/GWAS/Kunkle2019load.chrall.CPRA_b37.tsv.gz
COLUMNS:
SNP: DBSNP_ID
CHROM: CHROM
POS: POS
REF: REF
ALT: ALT
AF: AF
BETA: BETA
SE: SE
P: P
Z: Z
N: N
TRAIT: TRAIT