Skip to content
/ hipo Public

Heritability Informed Power Optimization (HIPO)

Notifications You must be signed in to change notification settings

gqi/hipo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hipo

hipo is an R package that performs heritability informed power optimization (HIPO)1 for conducting multi-trait association analysis on summary level data.

Installing the package.

Before using this package, you need to install the LDSC software that performs LD score regression2,3 to estimate heritability and genetic correlation. Click here for LDSC. You must have Python and all the Python packages required by LDSC. LDSC recommends installing Anaconda 2 which comes with all the required Python packages.

You can use the LD score files and HapMap 3 SNP list provided by LD Hub. Download and unzip by running the following code in Terminal

wget https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2
wget https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz2
bunzip2 eur_w_ld_chr.tar.bz2
tar -xvf eur_w_ld_chr.tar
bunzip2 w_hm3.snplist.bz2

and put them in the ldsc folder. You can also specify your own LD score file later in the function.

The dplyr packages is also required. Type install.packages("dplyr") in the R console to install.

Example

Once you have installed and loaded the package, here's an example to guide you through the procedure.

  1. Download and unzip the Global Lipids Genetics Consortium (2013) summary results from joint analysis of metabochip and GWAS data. Click on the link below to download.
  1. Read the data (I store the summary level data in the folder blood_lipid_data). To run the script below, install readr package first by typing install.packages("readr") in the R console.
rm(list=ls())
library(readr)
library(dplyr)
traitvec = c('LDL', 'HDL', 'TG', 'TC') # Following the order on the consortium website
sumstats = list()
for (trait in traitvec){
    sumstats[[trait]] = read_delim(paste0('blood_lipid_data/jointGwasMc_', trait, '.txt'), delim = '\t') %>% filter(rsid!='.') # Remove SNPs without an rs number.
}
  1. Preprocess the data and transform them into data frames with 9 columns:
  • rsid: RS number of SNPs.
  • A1: effect allele.
  • A2: reference allele.
  • N: sample size.
  • z: z-statistic.
  • pval: p-value.
  • chr: chromosome.
  • bp: physical position.
  • freqA1: allele frequency of A1.
for (trait in traitvec){
    sumstats[[trait]] = sumstats[[trait]] %>%
        mutate(position = strsplit(SNP_hg19, split = ':')) %>%
        mutate(chr = as.integer(gsub('chr', '',sapply(position, function(x) x[1]))), bp = as.integer(sapply(position, function(x) x[2]))) %>%
        mutate(z = beta/se, pval = 2*(1-pnorm(abs(z))) , A1 = toupper(A1), A2 = toupper(A2)) %>%
        select(rsid, A1, A2, N, z, pval, chr, bp, freqA1 = Freq.A1.1000G.EUR) %>%
    print(trait)
}
  1. Implement HIPO Change the directory arguments before running the code below:
  • out.path: the directory where you want the intermediate LDSC output files to be stored.
  • ldsc.path: the directory that stores the LDSC package. We will be calling LDSC from there.
  • python.path: the directory of the Python version you want to use. On a Mac computer, check ~/.bash_profile file for the path to Anaconda 2. python.path can be unspecified. But in this case, your default version of Python must have all the required packages.
library(hipo)
res = hipo(sumstats, out.path = "PATH_TO_STORE_LDSC_OUTPUT_FILES", maf.thr = 0.05, HIPOD.num = 4, ldsc.path = "PATH_TO_LDSC", python.path = "PATH_TO_PYTHON", mergeallele = TRUE, truncate = NULL)

Note: Error KeyError: '[-1 -1 -1 ... -1 -2 -1] not in index' is usually caused by a new version of pandas. Try reverting pandas to version 0.18.1.

References

  1. Qi, Guanghao, and Chatterjee, Nilanjan. "Heritability informed power optimization (HIPO) leads to enhanced detection of genetic associations across multiple traits." bioRxiv (2017): 218404.
  2. Bulik-Sullivan, Brendan K., et al. "LD score regression distinguishes confounding from polygenicity in genome-wide association studies." Nature genetics 47.3 (2015): 291-295.
  3. Bulik-Sullivan, Brendan, et al. "An atlas of genetic correlations across human diseases and traits." Nature genetics 47.11 (2015): 1236-1241.

About

Heritability Informed Power Optimization (HIPO)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages