# PRS calculation

In this session, we will calculate polygenic risk scores (PRS) for height on ancient DNA (aDNA) samples using PRSice-2. More information about the tool can be found here: https://choishingwan.github.io/PRSice/

For this session, we will again use the quilt2 conda environment, which you should already have installed. If you have not, you can find it on our GitHub page https://github.com/lm-ut/Workshop_25/ or in this folder: /gpfs/helios/projects/echo_workshops/project.1.tk/conda_env

### Input GWAS file

We will calculate a score for height, so we will use the largest and most recent height GWAS, from Yengo et al. 2022 (https://pubmed.ncbi.nlm.nih.gov/36224396/). Have a look at the file:

In [None]:
cd
mkdir PRS_height
cd PRS_height

In [None]:
cp /gpfs/helios/projects/echo_workshops/project.1.tk/data/PRSice_files/GWAS_height_Yengo_2022 .

In [None]:
head GWAS_height_Yengo_2022

How many SNPs are in the file? How many genome-wide significant (P-value < 5e-08) SNPs are present?

In [None]:
awk '$10 < 5e-08' GWAS_height_Yengo_2022 | wc -l
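The command above answers only the second question and relies on the P-value sitting in column 10. As a minimal sketch, both counts can be obtained while looking up the P-value column by its header name `P` (the name passed to `--pvalue` later in this session). The example runs on a small made-up file: `toy_gwas.txt` and its values are hypothetical, and it assumes a single header line and whitespace-separated columns.

```shell
# Toy stand-in for the GWAS file (hypothetical values), so the commands can be tried anywhere
printf 'SNP CHR POS P\nrs1 1 100 1e-10\nrs2 1 200 0.03\nrs3 2 300 4e-09\n' > toy_gwas.txt

# Total number of SNPs: all lines minus the header line
n_snps=$(awk 'END {print NR - 1}' toy_gwas.txt)

# Genome-wide significant SNPs: locate the column named "P" in the header,
# then count the rows whose value in that column is below 5e-08
n_sig=$(awk 'NR == 1 {for (i = 1; i <= NF; i++) if ($i == "P") col = i; next}
             $col + 0 < 5e-08 {n++}
             END {print n + 0}' toy_gwas.txt)

echo "SNPs: ${n_snps}, genome-wide significant: ${n_sig}"
```

To run this on the real data, replace `toy_gwas.txt` with `GWAS_height_Yengo_2022` and skip the `printf` line.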

### PRSice-2

We will calculate PRS for various P-value thresholds using PRSice-2. The PRSice-2 files (the PRSice.R wrapper script and the PRSice_linux executable) are in this folder: /gpfs/helios/projects/echo_workshops/project.1.tk/scripts/PRSice/PRSice

In [None]:
cp /gpfs/helios/projects/echo_workshops/project.1.tk/scripts/PRSice/PRSice/PRSice.R .
cp /gpfs/helios/projects/echo_workshops/project.1.tk/scripts/PRSice/PRSice/PRSice_linux .
cp /gpfs/helios/projects/echo_workshops/project.1.tk/data/PRSice_files/1000G_NFE_for_height_PRS.* .
cp /gpfs/helios/projects/echo_workshops/project.1.tk/data/PRSice_files/England_Gretzinger_for_PRS.* .
cp /gpfs/helios/projects/echo_workshops/project.1.tk/scripts/PRSice/PRS_height.slurm .
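To make the idea of P-value thresholds concrete, here is a rough sketch of the underlying arithmetic: for each threshold, the score of an individual is the sum of effect size times effect-allele dosage over the SNPs whose P-value does not exceed the threshold. This is an illustration only, not PRSice-2's implementation, which additionally handles clumping, missing genotypes and its own score scaling. The file `toy_prs.txt`, its columns and its values are hypothetical.

```shell
# Toy data (hypothetical values): effect size, P-value and one individual's
# effect-allele dosage (0, 1 or 2) for four SNPs
printf 'SNP BETA P DOSAGE\nrs1 0.10 1e-10 2\nrs2 -0.05 0.0005 1\nrs3 0.02 0.30 0\nrs4 0.04 0.04 1\n' > toy_prs.txt

# For each threshold, sum BETA * DOSAGE over the SNPs with P no larger than the threshold
for pt in 5e-08 0.001 0.05 1; do
    awk -v pt="$pt" 'NR > 1 && $3 + 0 <= pt + 0 {s += $2 * $4; n++}
                     END {printf "Pt=%s SNPs=%d score=%.2f\n", pt, n, s}' toy_prs.txt
done | tee toy_scores.txt
```

A stricter threshold keeps fewer SNPs, so the score at `5e-08` here comes from a single SNP, while the score at `1` uses all four.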

In [None]:
#!/bin/bash
#SBATCH -J PRSice
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH -A ealloc_e7679_project1-tk-echo
#SBATCH --nodes=1

your_user="" #fill in your user name here

export PATH="/gpfs/helios/home/etais/${your_user}/miniconda3/bin:${PATH}"
source activate quilt2

Rscript ./PRSice.R \
--prsice PRSice_linux \
--base GWAS_height_Yengo_2022 \
--target 1000G_NFE_for_height_PRS \
--snp CHR_POS_ID --chr CHR --bp POS --A1 EFFECT_ALLELE --A2 OTHER_ALLELE \
--stat BETA --beta \
--pvalue P \
--clump-kb 250 --clump-p 1 --clump-r2 0.1 \
--bar-levels 5e-08,1e-05,0.001,0.05,0.5,1 \
--fastscore \
--no-regress \
--print-snp \
--out PRS_height_1000G_NFE

Rscript ./PRSice.R \
--prsice PRSice_linux \
--base GWAS_height_Yengo_2022 \
--target England_Gretzinger_for_PRS \
--snp CHR_POS_ID --chr CHR --bp POS --A1 EFFECT_ALLELE --A2 OTHER_ALLELE \
--stat BETA --beta \
--pvalue P \
--no-clump --extract PRS_height_1000G_NFE.snp \
--bar-levels 5e-08,1e-05,0.001,0.05,0.5,1 \
--fastscore \
--no-regress \
--print-snp \
--out PRS_height_Gretzinger
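Assuming the cell above shows the content of `PRS_height.slurm` (copied in the previous step) and that you have filled in `your_user`, the job can be submitted with Slurm's standard `sbatch` command and monitored with `squeue`. The guard around the commands and the `submitted` variable are only there so that the snippet fails gracefully when run outside the cluster.

```shell
# Submit the job only if Slurm and the job script are available here
if command -v sbatch >/dev/null 2>&1 && [ -f PRS_height.slurm ]; then
    sbatch PRS_height.slurm        # prints the ID of the submitted job
    squeue -u "$(whoami)"          # lists your pending and running jobs
    submitted=yes
else
    echo "sbatch or PRS_height.slurm not found: run this from PRS_height on the cluster"
    submitted=no
fi
```

Once the job has left the queue, the output files should appear in the current directory.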

This creates several files. The raw PRS can be found in the files with the .all_score extension, and the files with the .prsice extension contain some useful information.

In [None]:
cat PRS_height_1000G_NFE.prsice
cat PRS_height_Gretzinger.prsice

We will now PC-correct and standardize the scores.

In [None]:
cp /gpfs/helios/projects/echo_workshops/project.1.tk/data/PRSice_files/PC_file .
cp /gpfs/helios/projects/echo_workshops/project.1.tk/scripts/PRSice/PC_correction_standardization.R .

In [None]:
conda activate quilt2

In [None]:
Rscript PC_correction_standardization.R "PRS_height_1000G_NFE.all_score" "PRS_height_Gretzinger.all_score" "PC_file"

This will create two files: one with scaled scores for the 1000 Genomes data and one with scaled scores for the Gretzinger data. For the remaining analyses, we will use the Pt_0.001 scores (raw and scaled).

In [None]:
awk '{print $1, $2, $5, $23}' PRS_height_scaled_scores_Gretzinger.txt | head
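For intuition about the scaling step, one common way to standardize scores is to express each target score as a z-score relative to the mean and standard deviation of a reference panel. The sketch below shows only that step and leaves out the PC correction; it is an assumption about the general approach, and `PC_correction_standardization.R` may do this differently. The two toy files and their values are hypothetical.

```shell
# Toy raw scores (hypothetical values): a reference panel and two ancient samples
printf 'ref1 1.0\nref2 2.0\nref3 3.0\n' > toy_ref_scores.txt
printf 'anc1 2.5\nanc2 4.0\n' > toy_target_scores.txt

# Pass 1 (reference file): accumulate the sum and sum of squares
# Pass 2 (target file): compute the reference mean and SD once, then print z-scores
awk 'NR == FNR {n++; s += $2; ss += $2 * $2; next}
     FNR == 1  {m = s / n; sd = sqrt((ss - n * m * m) / (n - 1))}
     {printf "%s %.2f\n", $1, ($2 - m) / sd}' toy_ref_scores.txt toy_target_scores.txt > toy_scaled.txt
cat toy_scaled.txt
```

A scaled score of 2 then means the sample lies two reference standard deviations above the reference mean.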

We can now make some plots per subpopulation.

In [None]:
cp /gpfs/helios/projects/echo_workshops/project.1.tk/scripts/PRSice/post_PRS.R .
cp /gpfs/helios/projects/echo_workshops/project.1.tk/data/PRSice_files/pop_file.txt .

In [None]:
Rscript post_PRS.R

Feel free to create your own plots, e.g. a forest plot or a histogram.