Skip to content
/ MRPipeline Public template

Investigating the causal relationship between exposures and Alzheimer's endophenotypes

License

Notifications You must be signed in to change notification settings

marcoralab/MRPipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Mendelian Randomziation Snakemake Pipeline

A snakemake pipeline for implementing Mendelian randomization


mr.smk is a snakemake/R pipeline for estimating the causal association of a given exposure with an outcome using the TwoSampleMR R package. The steps in the pipeline include:

  1. Clump GWAS summary statistics using clumping window of 250kb, r2 of 0.1 and significance level of 1 was used for the index and secondary SNPs.
  2. Plot a Manhattan plot
  3. Obstain genetic variants (Instruments) below a given PT from the exposure GWAS.
  4. Extract SNP effects from the outcome GWAS, and if exposure instrument is not avaliable in the outcome GWAS.
  5. For Exposure instruments missing in the Outcome GWAS, look for LD proxies in the 1000 genomes EUR samples.
  6. Harmonize exposure and outcome effects using TwoSampleMR::harmonise_data
  7. Perform a Global Test of Pelietropy and idenfity outliers using MR-PRESSO
  8. Run MR analysis, senstivity analysis and write a Rmarkdown html report.

DAG

Getting Started

Installation

Be sure to download and install the latest versions of the following software packages:

  1. Python 3
  2. Snakemake
  3. R
  4. Rstudio
  5. PLINK

Plink should be located within the the /usr/local/bin/ directory. The following code can be used to move the executibles: cp </path/to/executible> /usr/local/bin/

The following R packages are also required:

  1. tidyverse
  2. rmarkdown
  3. TwoSampleMR
  4. MR-PRESSO
  5. RadialMR
## Install tidyverse, rmarkdown, and devtools
install.packages(c("tidyverse", "rmarkdown", "devtools"))

## Install TwoSampleMR and ggforce
devtools::install_github(c("MRCIEU/TwoSampleMR", '"rondolab/MR-PRESSO", "WSpiller/RadialMR"'))

Once all the prerequiste software is isntalled, MitoImpute can be installed on a git-enabled machine by typeing:

git clone https://github.com/marcoralab/MendelianRandomization.git

Formatting GWAS Summary Statistics

Summary statisitcs do not need to be in a specific format. However it is recomended to use the Summary Statistic Munger pipeline to standardize all the files.

Further Reading on formating summary statistics

  1. Tips for Formatting A Lot of GWAS Summary Association Statistics Data
  2. Across-cohort QC analyses of GWAS summary statistics from complex traits

Config File

The config.yaml file should contain the following information:

DataOut: path/to/dir/ for intermediate output files
DataOutput: path/to/dir/ for final output files

REF: path/to/population reference plink files (eg 1000 Geneomes)
clumpr2: r2 value for LD clumping - recomended 0.001
clumpkb: LD clumping window - recomended 1000
Pthreshold: pvalue theshold for selecting intrusments

## Forbidden wildcard combinations
missing: [
   {"ExposureCode" : CODE, "OutcomeCode" : CODE, "Pthreshold" : PVALUE}
 ]

## LOAD as outcome
EXPOSURES:
  -
    CODE: code to use
    NAME: trait name
    FILE: path/to/gwas/summary/statistics
    COLUMNS:
      SNP: marker name
      CHROM: chromsome
      POS: POS
      REF: REF
      ALT: ALT
      AF: AF
      BETA: BETA
      SE: SE
      P: P
      Z: Z
      N: N
      TRAIT: TRAIT


OUTCOMES:
  -
    CODE: Kunkle2019load
    NAME: 'Late Onset Alzheimers Disease'
    FILE: 1_RawData/GWAS/Kunkle2019load.chrall.CPRA_b37.tsv.gz
    COLUMNS:
      SNP: DBSNP_ID
      CHROM: CHROM
      POS: POS
      REF: REF
      ALT: ALT
      AF: AF
      BETA: BETA
      SE: SE
      P: P
      Z: Z
      N: N
      TRAIT: TRAIT

About

Investigating the causal relationship between exposures and Alzheimer's endophenotypes

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages