Mendelian Randomziation Snakemake Pipeline

A snakemake pipeline for implementing Mendelian randomization

mr.smk is a snakemake/R pipeline for estimating the causal association of a given exposure with an outcome using the TwoSampleMR R package. The steps in the pipeline include:

Clump GWAS summary statistics using clumping window of 250kb, r2 of 0.1 and significance level of 1 was used for the index and secondary SNPs.
Plot a Manhattan plot
Obstain genetic variants (Instruments) below a given P_T from the exposure GWAS.
Extract SNP effects from the outcome GWAS, and if exposure instrument is not avaliable in the outcome GWAS.
For Exposure instruments missing in the Outcome GWAS, look for LD proxies in the 1000 genomes EUR samples.
Harmonize exposure and outcome effects using TwoSampleMR::harmonise_data
Perform a Global Test of Pelietropy and idenfity outliers using MR-PRESSO
Run MR analysis, senstivity analysis and write a Rmarkdown html report.

Getting Started

Installation

Be sure to download and install the latest versions of the following software packages:

Plink should be located within the the /usr/local/bin/ directory. The following code can be used to move the executibles: cp </path/to/executible> /usr/local/bin/

The following R packages are also required:

## Install tidyverse, rmarkdown, and devtools
install.packages(c("tidyverse", "rmarkdown", "devtools"))

## Install TwoSampleMR and ggforce
devtools::install_github(c("MRCIEU/TwoSampleMR", '"rondolab/MR-PRESSO", "WSpiller/RadialMR"'))

Once all the prerequiste software is isntalled, MitoImpute can be installed on a git-enabled machine by typeing:

git clone https://github.com/marcoralab/MendelianRandomization.git

Formatting GWAS Summary Statistics

Summary statisitcs do not need to be in a specific format. However it is recomended to use the Summary Statistic Munger pipeline to standardize all the files.

Config File

The config.yaml file should contain the following information:

DataOut: path/to/dir/ for intermediate output files
DataOutput: path/to/dir/ for final output files

REF: path/to/population reference plink files (eg 1000 Geneomes)
clumpr2: r2 value for LD clumping - recomended 0.001
clumpkb: LD clumping window - recomended 1000
Pthreshold: pvalue theshold for selecting intrusments

## Forbidden wildcard combinations
missing: [
   {"ExposureCode" : CODE, "OutcomeCode" : CODE, "Pthreshold" : PVALUE}
 ]

## LOAD as outcome
EXPOSURES:
  -
    CODE: code to use
    NAME: trait name
    FILE: path/to/gwas/summary/statistics
    COLUMNS:
      SNP: marker name
      CHROM: chromsome
      POS: POS
      REF: REF
      ALT: ALT
      AF: AF
      BETA: BETA
      SE: SE
      P: P
      Z: Z
      N: N
      TRAIT: TRAIT


OUTCOMES:
  -
    CODE: Kunkle2019load
    NAME: 'Late Onset Alzheimers Disease'
    FILE: 1_RawData/GWAS/Kunkle2019load.chrall.CPRA_b37.tsv.gz
    COLUMNS:
      SNP: DBSNP_ID
      CHROM: CHROM
      POS: POS
      REF: REF
      ALT: ALT
      AF: AF
      BETA: BETA
      SE: SE
      P: P
      Z: Z
      N: N
      TRAIT: TRAIT

Name		Name	Last commit message	Last commit date
Latest commit History 299 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
cluster.yaml		cluster.yaml
config.yaml		config.yaml
dag_mr.svg		dag_mr.svg
mr.smk		mr.smk

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Mendelian Randomziation Snakemake Pipeline

Getting Started

Installation

Formatting GWAS Summary Statistics

Further Reading on formating summary statistics

Config File

About

Releases

Packages

Languages

License

marcoralab/MRPipeline

Folders and files

Latest commit

History

Repository files navigation

Mendelian Randomziation Snakemake Pipeline

Getting Started

Installation

Formatting GWAS Summary Statistics

Further Reading on formating summary statistics

Config File

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages