# PRSice exercise

Explore the base GWAS and compute polygenic scores for height in Europeans using PRSice-2.

<img src="../Images/bash.png" alt="Bash" width="40"> Bash kernel.


```bash
ln -sf ../../Data
ln -sf ../Results
```

<img src="../Images/R.png" alt="R" width="40"> R kernel.

```r
# Suppress warnings and render plots inline as PNG
options(warn = -1)
options(jupyter.plot_mimetypes = "image/png")

# Load the GWAS plotting package qqman
suppressMessages(library("qqman"))

# Manhattan plot of the base GWAS summary statistics
# (manhattan() expects CHR, BP, P and SNP columns by default)
height_eur <- read.table("./Data/Height.QC.gz", header = TRUE)
manhattan(height_eur, main = "Manhattan plot height GWAS", cex.axis = 1.1)

# Q-Q plot of the GWAS P-values
qq(height_eur$P, main = "Q-Q plot")
```


Does the plot surprise you? The Q-Q plot deviates strongly from the diagonal expected under the null, and the Manhattan plot shows an overwhelming number of significant variants. Rare variants and poorly imputed variants can produce spurious associations, so to reduce false positives you should filter the base data on minor allele frequency (MAF) and imputation quality (INFO) before computing the scores.

<img src="../Images/bash.png" alt="Bash" width="40"> Bash kernel. 
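Before running PRSice, you can check how many base variants survive the MAF and INFO thresholds, and how many of those reach genome-wide significance (P < 5 × 10⁻⁸). The function below, `count_passing`, is a hypothetical helper written for this exercise and is not part of PRSice. It assumes `Height.QC.gz` is a gzipped, whitespace-delimited table whose header names the columns `MAF`, `INFO` and `P` (the names used by `--base-maf`, `--base-info` and the Q-Q plot above), and it keeps variants with MAF ≥ 0.01 and INFO ≥ 0.8 as an approximation of the PRSice filters.

```bash
# Hypothetical helper (not part of PRSice): count base-GWAS variants that
# pass MAF/INFO thresholds and how many of those have P < 5e-8.
# Assumes a gzipped, whitespace-delimited file with a header row that
# names the columns MAF, INFO and P.
count_passing() {
    local file="$1"
    local maf_min="${2:-0.01}"    # minimum minor allele frequency to keep
    local info_min="${3:-0.8}"    # minimum imputation INFO score to keep
    gzip -dc "$file" | awk -v maf_min="$maf_min" -v info_min="$info_min" '
        NR == 1 {
            # Map each header name to its column index
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            total++
            maf = $(col["MAF"]); info = $(col["INFO"]); p = $(col["P"])
            if (maf + 0 >= maf_min + 0 && info + 0 >= info_min + 0) {
                kept++
                # Skip missing P-values instead of treating them as 0
                if (p != "NA" && p + 0 < 5e-8) sig++
            }
        }
        END { printf "total=%d kept=%d sig=%d\n", total, kept, sig }'
}
```

For example, `count_passing ./Data/Height.QC.gz 0.01 0.8` prints the totals for the thresholds used in the PRSice command. Columns are located by header name rather than by position, so the sketch still works if the column order differs.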

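Run the PRS analysis on the simulated target dataset as follows. `--binary-target F` declares height a quantitative trait, `--stat BETA` tells PRSice to read effect sizes from the `BETA` column, and `--base-maf MAF:0.01` and `--base-info INFO:0.8` exclude base variants with MAF below 0.01 or INFO below 0.8: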

```bash
# Make sure the output directory exists before PRSice writes to it
mkdir -p Results/GWAS7

PRSice --base ./Data/Height.QC.gz \
    --target ./Data/EUR.QC \
    --binary-target F \
    --pheno ./Data/EUR.height \
    --cov ./Data/EUR.covariate \
    --base-maf MAF:0.01 \
    --base-info INFO:0.8 \
    --stat BETA \
    --out Results/GWAS7/EUR.PRSice
```

The `.summary` output file, printed by the next cell, shows that:

- The best-fit P-value threshold is approximately 0.4.
- The best-fitting model explains approximately 16% of the phenotypic variance (R² ≈ 0.16).

```bash
cat Results/GWAS7/EUR.PRSice.summary
```
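To pull the two values discussed above out of the summary file without reading the whole table, a small awk-based helper can select columns by header name. `summary_fields` is a hypothetical function, and the column names `Threshold` (best-fit P-value threshold) and `PRS.R2` (variance explained by the PRS) are assumptions based on the usual PRSice-2 summary layout; check them against the header printed by `cat` and adjust if they differ.

```bash
# Hypothetical helper: print selected columns of a PRSice .summary file
# as name=value pairs, one output line per data row.
# Assumption: the header contains the requested names (e.g. Threshold, PRS.R2);
# names not found in the header are reported as NA.
summary_fields() {
    local file="$1"
    shift
    awk -v want="$*" '
        NR == 1 {
            # Map header names to column indices and split the requested names
            for (i = 1; i <= NF; i++) col[$i] = i
            n = split(want, names, " ")
            next
        }
        {
            for (j = 1; j <= n; j++) {
                name = names[j]
                val = (name in col) ? $(col[name]) : "NA"
                printf "%s=%s%s", name, val, (j < n ? " " : "\n")
            }
        }' "$file"
}
```

For example, `summary_fields Results/GWAS7/EUR.PRSice.summary Threshold PRS.R2` prints the best-fit threshold and the variance explained on a single line.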

<img src="../Images/R.png" alt="R" width="40"> R kernel.

Below is an example of how you could plot measured height against the height PGS in R, with points coloured by sex, to visualize differences between males and females:

```r
library(ggplot2)

# Read the best-fit PRS, the height phenotype and the sex covariate
prs <- read.table("./Results/GWAS7/EUR.PRSice.best", header = TRUE)
height <- read.table("./Data/EUR.height", header = TRUE)
sex <- read.table("./Data/EUR.cov", header = TRUE)

# Label sex; assumes the lowest code is male (PLINK coding: 1 = male, 2 = female)
sex$Sex <- as.factor(sex$Sex)
levels(sex$Sex) <- c("Male", "Female")

# Merge the three tables on their shared columns (FID and IID)
dat <- merge(merge(prs, height), sex)

# Plot measured height against the polygenic score, coloured by sex
ggplot(dat, aes(x = PRS, y = Height, color = Sex)) +
    geom_point() +
    theme_bw() +
    labs(x = "Polygenic Score", y = "Height") +
    theme(axis.text = element_text(size = 12),
          axis.title = element_text(size = 12),
          legend.text = element_text(size = 12),
          legend.title = element_text(size = 14))
```
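`merge()` performs an inner join on the columns the tables share (here `FID` and `IID`), so any individual missing from one of the three files is silently dropped from the plot. A quick count from the Bash kernel shows how many samples survive. `count_shared_ids` is a hypothetical helper, and it assumes that the first two columns of every file are FID and IID and that each file has a single header row.

```bash
# Hypothetical helper: count samples whose FID/IID pair (first two columns)
# appears in every file given, i.e. the rows an inner join would keep.
# Assumes whitespace-delimited files, each with a single header row.
count_shared_ids() {
    awk '
        FNR == 1 { nfiles++; next }          # new file: count it, skip its header
        {
            key = $1 SUBSEP $2
            # Count each sample at most once per file
            if (!((nfiles, key) in seen)) {
                seen[nfiles, key] = 1
                hits[key]++
            }
        }
        END {
            for (k in hits) if (hits[k] == nfiles) shared++
            print shared + 0
        }' "$@"
}
```

For example, `count_shared_ids Results/GWAS7/EUR.PRSice.best Data/EUR.height Data/EUR.cov` should match the number of rows in `dat`, provided each FID/IID pair appears only once per file.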