This is a demo of a PRS script that automatically trains and calculates PRS scores across all popular PRS methods. Currently, the demo only supports LDpred2 and a simple PRS scoring method so that it can be completed in a timely fashion. For the same reason, the PRS methods are trained only on data from chromosome 22.
- Clone this repo into a folder on your local computer
git clone https://github.com/qlu-lab/PRS-DEMO.git
- Change directory into PRS-DEMO folder
cd PRS-DEMO
- PRS-DEMO is developed using R. The statistical computing software R (>=4.3) is required.
- The following packages are necessary for running PRS-DEMO, but they will be automatically installed for you when you run the demo if you don't already have them installed. Required R packages: tidyverse, data.table, R.utils, plyr, bigsnpr, bigreadr, optparse, foreach, rngtools
- Please download these R packages ahead of the demo using install.packages if you are able to
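To install the packages ahead of time from a terminal, something like the following sketch may help. It only builds and prints an R expression that installs whichever of the listed packages are missing. The helper name `build_install_expr` and the choice of the CRAN cloud mirror are assumptions made for illustration and are not part of PRS-DEMO.

```shell
# Packages listed in this README.
PKGS="tidyverse data.table R.utils plyr bigsnpr bigreadr optparse foreach rngtools"

# Hypothetical helper: build an R expression that installs only the
# packages that are not already installed.
build_install_expr() {
  quoted=""
  for p in $PKGS; do
    # Append each name as a quoted, comma-separated R string.
    quoted="${quoted:+$quoted, }\"$p\""
  done
  printf 'pkgs <- c(%s); missing <- setdiff(pkgs, rownames(installed.packages())); if (length(missing) > 0) install.packages(missing, repos = "https://cloud.r-project.org")\n' "$quoted"
}

# Print the expression so it can be inspected before anything is installed.
build_install_expr
```

To actually run the installation, pass the expression to R with `Rscript -e "$(build_install_expr)"`.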
- Make output folder for PRS weights
mkdir weights
- Download LD and GWAS data and put it in the input folder
If you don't already have wget installed on your computer, follow one of these tutorials to install it on your machine.
- Download and Install wget on Mac
- Download and Install wget on Linux
- Download and Install wget on Windows
Download the LD and GWAS data using wget
wget -nd -r -P ./input ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/PRS_demo/input
[Update!] Alternatively, you can download the input data from the Box drive
- Download PLINK
- Move the downloaded file from your Downloads folder to your PRS-DEMO folder
- For macs, you may see the following error message after downloading PLINK:
- If you get this error go to System Settings -> Privacy & Security and scroll down until you get to this section. Allow PLINK to be downloaded and try downloading again.
- If you are using Windows, download Git bash. Use this as your terminal to run the PRS script.
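Before running the demo, it can be worth confirming that the setup steps above took effect. The sketch below is a hypothetical pre-flight check, not something shipped with PRS-DEMO: the function name `check_inputs` is an assumption, and the paths are the ones used in the run command in this README.

```shell
# Hypothetical pre-flight check: report anything the run command needs
# that is not in place. Returns non-zero if something is missing.
check_inputs() {
  sumstats=$1     # GWAS summary statistics file
  plink=$2        # PLINK binary
  weights_dir=$3  # output folder for PRS weights
  status=0
  if [ ! -f "$sumstats" ]; then
    echo "missing file: $sumstats"
    status=1
  fi
  if [ ! -x "$plink" ]; then
    echo "missing or not executable: $plink (try: chmod +x $plink)"
    status=1
  fi
  if [ ! -d "$weights_dir" ]; then
    echo "missing folder: $weights_dir (try: mkdir $weights_dir)"
    status=1
  fi
  return $status
}

# Check the paths used in this README from inside the PRS-DEMO folder.
check_inputs ./input/gwas_train.txt.gz ./plink ./weights \
  || echo "Fix the items above before running calculate_prs.sh"
```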
- To run the script to get PRS scores, run
bash calculate_prs.sh \
-s ./input/gwas_train.txt.gz \
-l ./input/1kg_hm3_QCed_noM \
-g ./input/1kg_hm3_QCed_noM \
-p ./plink \
-m ldpred2,prs \
-o mac
Where flags are
- -s: path to sumstats_file
- -l: path to LD files
- -g: path to genotype file
- -p: path to PLINK software
- -m: PRS methods you want to run (right now it will default to run LDpred2 and default PRS method)
- -o: operating system (mac, windows, or linux)
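For readers curious how single-letter flags like these are typically handled in a bash script, here is a minimal sketch using the shell built-in `getopts`. It is not the actual contents of `calculate_prs.sh`: the function name `parse_flags`, the variable names, and the validation of `-o` are assumptions, and the default for `-m` simply mirrors the description above.

```shell
# Hypothetical flag parser mirroring the flags described in this README.
parse_flags() {
  OPTIND=1                       # reset so the function can be called repeatedly
  sumstats=""; ld=""; geno=""; plink=""; os=""
  methods="ldpred2,prs"          # assumed default, per the -m description
  while getopts "s:l:g:p:m:o:" opt "$@"; do
    case "$opt" in
      s) sumstats=$OPTARG ;;
      l) ld=$OPTARG ;;
      g) geno=$OPTARG ;;
      p) plink=$OPTARG ;;
      m) methods=$OPTARG ;;
      o) os=$OPTARG ;;
      *) echo "usage: -s FILE -l PREFIX -g PREFIX -p PLINK [-m METHODS] -o OS" >&2
         return 1 ;;
    esac
  done
  # Accept only the three operating systems named in the README.
  case "$os" in
    mac|windows|linux) return 0 ;;
    *) echo "-o must be mac, windows, or linux" >&2; return 1 ;;
  esac
}

if parse_flags -s ./input/gwas_train.txt.gz -l ./input/1kg_hm3_QCed_noM \
    -g ./input/1kg_hm3_QCed_noM -p ./plink -m ldpred2,prs -o mac; then
  echo "sumstats=$sumstats methods=$methods os=$os"
fi
```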
Output will be written to prs_scores.txt and the first few rows of data will look like:
The columns are as follows:
- FID: family ID from genotype file
- IID: individual ID from genotype file
- LDpred2_: 13 columns, one for each of the 13 tuning-parameter settings LDpred2 was run with
- LDpred2_auto: samples p from its posterior distribution and estimates h2 in each iteration of the Gibbs sampler
- All other columns are of the format LDpred2_0.03_0.001_sparse. In this example, the tuning parameters are:
- heritability (h2) is 0.03
- proportion of causal variants (p) is 0.001
- sparse is True (when sparse is True, for variants whose posterior probability of being a causal variant is smaller than the set proportion of causal variants value, their effect sizes will be exactly 0)
- PRS_: 5 columns, one for each of the 5 tuning-parameter settings the default PRS method was run with
- All columns are of the format PRS_0.01. In this example the tuning parameter is:
- p-value <= 0.01
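The column-name formats above can be decoded mechanically. The following sketch shows one way to do that in the shell. The function name `describe_column` is hypothetical, and because this README does not show how non-sparse LDpred2 columns are named, the sketch assumes that anything other than the literal suffix `sparse` means the sparse option was off.

```shell
# Hypothetical helper: turn a prs_scores.txt column name into a description.
describe_column() {
  col=$1
  case "$col" in
    FID|IID)
      echo "$col: ID column from the genotype file" ;;
    LDpred2_auto)
      echo "LDpred2 auto: p and h2 estimated during sampling" ;;
    LDpred2_*)
      rest=${col#LDpred2_}          # e.g. 0.03_0.001_sparse
      old_ifs=$IFS; IFS=_
      set -- $rest                  # split on underscores into $1 $2 $3
      IFS=$old_ifs
      h2=${1:-}; p=${2:-}
      # Assumption: only the literal suffix "sparse" marks a sparse model.
      if [ "${3:-}" = "sparse" ]; then sparse=TRUE; else sparse=FALSE; fi
      echo "LDpred2 grid: h2=$h2 p=$p sparse=$sparse" ;;
    PRS_*)
      echo "PRS threshold: p-value <= ${col#PRS_}" ;;
    *)
      echo "$col: unrecognized column" ;;
  esac
}

# Decode the example names used in this README.
for name in FID IID LDpred2_auto LDpred2_0.03_0.001_sparse PRS_0.01; do
  describe_column "$name"
done
```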