# GWAS with PLINK and R

## Installation
Optional: One could additionally install **HaploView** from https://sourceforge.net/projects/haploview/ and **gPLINK** from https://zzz.bwh.harvard.edu/plink/dist/gPLINK-2.050.zip

In [None]:
sudo apt install plink1.9

In R, install **qqman** with `install.packages("qqman")`.

## Data

Download the example dataset by Shaun Purcell. This step is optional, since a copy is also provided in this repository:

In [None]:
# wget https://zzz.bwh.harvard.edu/plink/dist/example.zip

In [None]:
pwd

In [None]:
ls -l

## Processing

Convert the raw text genotype data (`gwas.org.ped` and `gwas.org.map`, read via `--file`) into PLINK's binary format (`gwas.bin.bed`, `.bim`, `.fam`) for faster data access:

In [None]:
plink1.9 --file gwas.org --make-bed --out gwas.bin

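Check for missing genotype calls (alleles not coded as A, C, G or T, e.g. PLINK's missing code `0`). For each sample, i.e. each row of the PED file, the command prints the row number, the number of allele columns, the number of missing alleles and the call rate; the last line gives the totals over all samples (`tail -4` keeps the last three samples plus that summary):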

In [None]:
awk '{alleles+=NF-6; mis=0; for(i=7; i<=NF; i++){if($i!~/[ATGC]/){mis++; allmis++}} print NR, NF-6, mis, (NF-6-mis)/(NF-6)} END{print allmis+0, alleles, (alleles-allmis)/alleles}' gwas.org.ped | tail -4
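
A minimal sketch of what the one-liner above counts, run on two invented PED rows instead of `gwas.org.ped`. It assumes the usual PED layout (six leading columns FID, IID, paternal ID, maternal ID, sex, phenotype, then two allele columns per variant) and PLINK's default missing-allele code `0`. The helper name `call_rate` is hypothetical, and unlike the one-liner it prints the sample ID instead of the row number.

```shell
# Hypothetical helper: per-sample call rate from PED-formatted lines on stdin.
# Columns 1-6 are FID IID PAT MAT SEX PHENO; columns 7+ hold two alleles per variant.
# An allele that is not A/C/G/T (e.g. the missing code 0) counts as missing.
call_rate() {
  awk '{
    n = NF - 6; mis = 0
    for (i = 7; i <= NF; i++) if ($i !~ /^[ACGT]$/) mis++
    printf "%s %d %d %.2f\n", $2, n, mis, (n - mis) / n
  }'
}

# Invented data: 3 variants per sample; ind2 has one missing genotype (0 0).
call_rate <<'EOF'
fam1 ind1 0 0 1 1 A A G T C C
fam2 ind2 0 0 2 2 A G 0 0 C T
EOF
```

This prints `ind1 6 0 1.00` and `ind2 6 2 0.67`: a missing genotype costs two alleles, so one missing call out of three lowers the call rate to 4/6.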

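### Validate converted data

`--maf 0.01` removes variants with a minor allele frequency below 1%, `--geno 0.05` removes variants with more than 5% missing calls, and `--mind 0.05` removes samples with more than 5% missing calls. `--write-snplist` additionally writes the IDs of the retained variants to `gwas.bin.filtered.snplist`.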

In [None]:
plink1.9 --maf 0.01 --geno 0.05 --mind 0.05 --write-snplist --make-bed --bfile gwas.bin --out gwas.bin.filtered
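
`--mind` and `--geno` look at the same missingness from two sides: per sample (rows) and per variant (columns). The sketch below computes, on invented PED rows, the per-variant missing rate and minor allele frequency, the two quantities that `--geno` and `--maf` threshold. It is not PLINK's implementation: it assumes biallelic variants, treats a genotype as missing if either allele is `0`, and `variant_qc` is a hypothetical name.

```shell
# Hypothetical helper: per-variant missing rate and minor allele frequency (MAF)
# from PED lines on stdin. Assumes biallelic variants; a genotype counts as
# missing if either allele is 0 (PLINK's default missing code).
variant_qc() {
  awk '{
    nvar = (NF - 6) / 2                    # two allele columns per variant
    for (j = 1; j <= nvar; j++) {
      a = $(5 + 2 * j); b = $(6 + 2 * j)   # alleles of variant j
      if (a == "0" || b == "0") { miss[j]++; continue }
      if (!(j in ref)) ref[j] = a          # first observed allele is the reference
      called[j] += 2
      refcnt[j] += (a == ref[j]) + (b == ref[j])
    }
  }
  END {
    for (j = 1; j <= nvar; j++) {
      f = called[j] ? refcnt[j] / called[j] : 0
      maf = (f < 0.5) ? f : 1 - f
      printf "var%d missing=%.2f maf=%.2f\n", j, miss[j] / NR, maf
    }
  }'
}

# Invented data: 4 samples, 2 variants; i3 is missing variant 2.
variant_qc <<'EOF'
f1 i1 0 0 1 1 A A C C
f2 i2 0 0 1 2 A G C C
f3 i3 0 0 2 1 G G 0 0
f4 i4 0 0 2 2 A G C C
EOF
```

The output is `var1 missing=0.00 maf=0.50` and `var2 missing=0.25 maf=0.00`, so `var2` would be removed by both `--geno 0.05` and `--maf 0.01`.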

### GWAS

In [None]:
head -5 gwas.bin.fam

In [None]:
awk '{print $6}' gwas.bin.fam | sort | uniq -c
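
Column 6 of a FAM file holds the phenotype; for case/control data PLINK uses 1 = control, 2 = case, and 0 or -9 = missing. A small sketch that labels the counts, shown on invented FAM rows (`count_status` is a hypothetical helper and assumes case/control coding, so quantitative phenotypes would all be counted as "missing"):

```shell
# Hypothetical helper: count samples by the phenotype in column 6 of a FAM file.
# PLINK case/control coding: 1 = control, 2 = case, 0 or -9 = missing.
count_status() {
  awk '{
    if ($6 == 1) s = "control"
    else if ($6 == 2) s = "case"
    else s = "missing"
    n[s]++
  }
  END { printf "case=%d control=%d missing=%d\n", n["case"], n["control"], n["missing"] }'
}

# Invented FAM rows: FID IID PAT MAT SEX PHENO
count_status <<'EOF'
fam1 ind1 0 0 1 2
fam2 ind2 0 0 2 1
fam3 ind3 0 0 1 2
fam4 ind4 0 0 2 -9
EOF
```

This prints `case=2 control=1 missing=1`; on the real data it would be called as `count_status < gwas.bin.fam`.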

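Run PLINK's basic allelic case/control association test (`--assoc`, a 1-df chi-square test per variant) and add multiple-testing corrected p-values (`--adjust`). Results are written to `gwas.bin.filtered.assoc` and `gwas.bin.filtered.assoc.adjusted`:

In [None]: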
plink1.9 --bfile gwas.bin.filtered --assoc --adjust --out gwas.bin.filtered

In [None]:
wc -l gwas.bin.filtered.assoc

In [None]:
head gwas.bin.filtered.assoc

In [None]:
sort -g -k 9 gwas.bin.filtered.assoc | head -5

In [None]:
awk '$9<0.05' gwas.bin.filtered.assoc | wc -l
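
The count of p < 0.05 is more informative next to the number of hits expected by chance: if none of the m tested variants were associated, roughly 0.05 · m would still pass. A hedged sketch on an invented `.assoc` fragment (P in column 9, as in PLINK 1.9's `.assoc` output); `nominal_hits` is a hypothetical name, and `NA` p-values are skipped.

```shell
# Hypothetical helper: count variants with P < alpha in a PLINK .assoc file read
# from stdin (P is column 9; the header line and NA p-values are skipped) and
# compare with alpha * m, the count expected by chance if nothing were associated.
nominal_hits() {
  awk -v alpha="${1:-0.05}" '
    NR > 1 && $9 != "NA" { m++; if ($9 < alpha) hits++ }
    END { printf "tested=%d hits=%d expected_by_chance=%.2f\n", m, hits, alpha * m }'
}

# Invented .assoc fragment (CHR SNP BP A1 F_A F_U A2 CHISQ P OR)
nominal_hits 0.05 <<'EOF'
 CHR  SNP   BP  A1   F_A   F_U  A2  CHISQ       P    OR
   1  rs1  100   A  0.30  0.20   G   4.20  0.0404  1.71
   1  rs2  200   C  0.10  0.11   T   0.10   0.752  0.90
   2  rs3  300   G  0.25  0.25   A      0       1     1
   2  rs4  400   T    NA    NA   C     NA      NA    NA
EOF
```

This prints `tested=3 hits=1 expected_by_chance=0.15`. On the real data: `nominal_hits 0.05 < gwas.bin.filtered.assoc`.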

In [None]:
head -5 gwas.bin.filtered.assoc.adjusted

In [None]:
wc -l gwas.bin.filtered.assoc.adjusted 

In [None]:
awk -v OFS="\t" '{print $1,$2,$3,$5,$9}' gwas.bin.filtered.assoc.adjusted | head -5

In [None]:
awk -v OFS="\t" '$9<0.1{print $1,$2,$3,$5,$9}' gwas.bin.filtered.assoc.adjusted 
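
The awk calls above print columns 1, 2, 3, 5 and 9 of the `.adjusted` file, which in PLINK 1.9 are CHR, SNP, UNADJ (unadjusted p), BONF (Bonferroni) and FDR_BH (Benjamini-Hochberg FDR). To make those two corrections concrete, the sketch below recomputes them from a few invented p-values using the textbook definitions: BONF = min(1, m · p), and the BH step-up value, i.e. the running minimum of m · p(i) / i taken from the largest p downwards. `adjust_p` is hypothetical and is not PLINK's code, although for the same m we would expect it to match those columns up to rounding.

```shell
# Hypothetical helper: Bonferroni and Benjamini-Hochberg adjustment of the
# p-values given one per line on stdin. Output: P BONF FDR_BH, sorted by P.
adjust_p() {
  sort -g | awk '
    { p[NR] = $1 }
    END {
      m = NR; q = 1
      for (i = m; i >= 1; i--) {   # BH step-up: running minimum from the largest p
        v = p[i] * m / i
        if (v < q) q = v
        bh[i] = q
      }
      for (i = 1; i <= m; i++) {
        b = p[i] * m; if (b > 1) b = 1
        printf "%g %g %g\n", p[i], b, bh[i]
      }
    }'
}

printf '%s\n' 0.01 0.04 0.03 0.20 | adjust_p
```

With m = 4 this prints `0.01 0.04 0.04`, `0.03 0.12 0.0533333`, `0.04 0.16 0.0533333` and `0.2 0.8 0.2`: BH is less conservative than Bonferroni, and the step-up minimum gives the 2nd and 3rd p-value the same FDR value.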

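### Calculate Hardy-Weinberg equilibrium statistics

`--hwe 0.00001` removes variants whose Hardy-Weinberg exact-test p-value is below 1e-5 and writes the remaining data as a new fileset; `--hardy` only reports the per-variant HWE statistics in a `.hwe` file.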

In [None]:
plink1.9 --bfile gwas.bin.filtered --hwe 0.00001 --make-bed --out gwas.bin.filtered.hwe

In [None]:
wc -l gwas.bin.filtered.hwe.bim gwas.bin.filtered.hwe.fam

In [None]:
plink1.9 --bfile gwas.bin.filtered --hardy --out gwas.bin.filtered.hardy

In [None]:
head -5 gwas.bin.filtered.hardy.hwe

Meaning of columns:
- CHR	Chromosome code
- SNP	Variant identifier
- TEST	Type of test: one of {'ALL', 'AFF', 'UNAFF', 'ALL(QT)', 'ALL(NP)'}
- A1	Allele 1 (usually minor)
- A2	Allele 2 (usually major)
- GENO	'/'-separated genotype counts (A1 hom, het, A2 hom)
- O(HET)	Observed heterozygote frequency
- E(HET)	Expected heterozygote frequency
- P	Hardy-Weinberg equilibrium exact test p-value
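
O(HET) and E(HET) can be recomputed from the GENO counts: with n genotyped individuals and A1 allele frequency p = (2·hom1 + het) / 2n, the expected heterozygote frequency under HWE is 2p(1 - p). The sketch below (hypothetical helper `het_from_geno`) only does this arithmetic, assuming E(HET) is based on the sample allele frequency; the P column comes from an exact test that it does not reimplement.

```shell
# Hypothetical helper: observed and expected heterozygosity from a PLINK
# GENO field "hom1/het/hom2" (A1 homozygotes / heterozygotes / A2 homozygotes).
het_from_geno() {
  awk -v g="$1" 'BEGIN {
    split(g, c, "/")
    n = c[1] + c[2] + c[3]                # genotyped individuals
    p = (2 * c[1] + c[2]) / (2 * n)       # A1 allele frequency
    printf "O(HET)=%.4f E(HET)=%.4f\n", c[2] / n, 2 * p * (1 - p)
  }'
}

het_from_geno "10/40/50"
```

This prints `O(HET)=0.4000 E(HET)=0.4200`. A count such as `30/0/70` gives `O(HET)=0.0000 E(HET)=0.4200`; a heterozygote deficit that strong is the kind of pattern behind very small HWE p-values and often points to genotyping errors.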

In [None]:
awk '$3=="UNAFF" && $9<0.001' gwas.bin.filtered.hardy.hwe | head -5

In [None]:
grep -E '(CHR|rs2513514|rs6110115|rs2508756|rs16976702)' gwas.bin.filtered.hardy.hwe

### Filter Hardy-Weinberg results
Keep controls only (`TEST=="UNAFF"`) with HWE p-values <= 0.001 and show the three smallest p-values:

In [None]:
awk '$3=="UNAFF" && $9<=0.001' gwas.bin.filtered.hardy.hwe | sort -g -k 9 | head -3

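### Create QC-filtered Dataset

Rebuild `gwas.bin.filtered` from the unfiltered `gwas.bin`, applying the earlier MAF and missingness filters together with an HWE filter that removes variants with p < 0.001 (`--hwe 0.001`). Note that this overwrites the `gwas.bin.filtered` fileset created above.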

In [None]:
plink1.9 --maf 0.01 --geno 0.05 --mind 0.05 --hwe 0.001 --bfile gwas.bin --make-bed --out gwas.bin.filtered

In [None]:
cat gwas.bin.filtered.log
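
The log reports how many variants and samples each filter removed, but not which ones. A hedged sketch that lists the variant IDs present in one BIM file but absent from another (BIM column 2 is the variant ID); `removed_variants` is a hypothetical helper and the two BIM files below are invented.

```shell
# Hypothetical helper: variant IDs (BIM column 2) present in BEFORE but not in AFTER.
removed_variants() {  # usage: removed_variants BEFORE.bim AFTER.bim
  awk 'NR == FNR { kept[$2] = 1; next }   # first file read: AFTER (kept variants)
       !($2 in kept) { print $2 }' "$2" "$1"
}

qc_tmp="$(mktemp -d)"
trap 'rm -rf "$qc_tmp"' EXIT
# Invented BIM files: CHR ID CM BP A1 A2
cat > "$qc_tmp/before.bim" <<'EOF'
1 rs1 0 1000 A G
1 rs2 0 2000 C T
2 rs3 0 3000 G A
EOF
cat > "$qc_tmp/after.bim" <<'EOF'
1 rs1 0 1000 A G
2 rs3 0 3000 G A
EOF
removed_variants "$qc_tmp/before.bim" "$qc_tmp/after.bim"
```

This prints `rs2`. On the real data it would be `removed_variants gwas.bin.bim gwas.bin.filtered.bim`. The `NR == FNR` idiom assumes the AFTER file is not empty; if every variant had been removed it would misbehave.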

### Perform Basic Association Analysis of QC-Dataset

In [None]:
plink1.9 --bfile gwas.bin.filtered --assoc --adjust --out gwas.bin.filtered.assoc --gplink

In [None]:
cat gwas.bin.filtered.assoc.log

In [None]:
head -3 gwas.bin.filtered.assoc.assoc gwas.bin.filtered.assoc.assoc.adjusted

In [None]:
wc -l gwas.bin.filtered.assoc.assoc gwas.bin.filtered.assoc.assoc.adjusted
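
With `--adjust`, the log of the association run should also report a genomic inflation estimate (lambda), on which the GC column is based. As far as we know PLINK derives it from the median association statistic; the common definition is lambda = median(chi-square) / 0.4549, where 0.4549 is the median of a chi-square distribution with 1 degree of freedom. A sketch of that definition (hypothetical helper `gc_lambda`, invented values), not PLINK's exact code:

```shell
# Hypothetical helper: genomic inflation factor from 1-df chi-square statistics
# read one per line on stdin: lambda = median(chisq) / 0.4549364,
# where 0.4549364 is the median of a chi-square distribution with 1 df.
gc_lambda() {
  sort -g | awk '{ x[NR] = $1 }
    END {
      if (NR % 2) med = x[(NR + 1) / 2]
      else        med = (x[NR / 2] + x[NR / 2 + 1]) / 2
      printf "%.3f\n", med / 0.4549364
    }'
}

printf '%s\n' 0.1 2.0 0.5 0.9 0.3 | gc_lambda
```

The median here is 0.5, which gives `1.099`. On the real data, CHISQ is column 8 of the `.assoc` file: `awk 'NR > 1 && $8 != "NA" { print $8 }' gwas.bin.filtered.assoc.assoc | gc_lambda`. Values close to 1 indicate little genome-wide inflation.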

### Analyze in R
```r
library(qqman)
gwas<-read.table("gwas.bin.filtered.assoc.assoc", header=TRUE)
manhattan(gwas, chr="CHR", bp="BP", snp="SNP", p="P",suggestiveline=F, genomewideline=T)
as.data.frame(table(gwas$CHR)) # number of SNPs per chromosome

SNPsOfInterest<-c("rs2513514","rs6110115")
manhattan(gwas, annotatePval = 0.01, highlight=SNPsOfInterest)
manhattan(gwas, annotatePval=0.001, highlight=SNPsOfInterest)
manhattan(gwas, annotatePval=0.0001, highlight=SNPsOfInterest)
manhattan(gwas, annotatePval=0.00001, highlight=SNPsOfInterest)
```

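Merge *gwas.bin.filtered.assoc.assoc* and *gwas.bin.filtered.assoc.assoc.adjusted* by SNP identifier. The `.adjusted` file has no base-pair positions, so the "BP" column (needed by `manhattan()`) is taken from the `.assoc` file. The header line of the `.adjusted` file is skipped because a new header is printed: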

In [None]:
awk 'BEGIN{print "CHR","BP","SNP","P","GC","BONF"; while((getline < "gwas.bin.filtered.assoc.assoc") > 0){snp[$2]=$3}} $1!="CHR"{print $1,snp[$2],$2,$3,$4,$5}' gwas.bin.filtered.assoc.assoc.adjusted > gwas.bin.filtered.assoc.assoc.adjusted.fused
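
A variant of the same merge that avoids `getline`: the two-file `NR == FNR` awk idiom reads the `.assoc` file first to remember each SNP's BP, then rewrites the `.adjusted` file, replacing its header line instead of passing it through. It is shown on two invented miniature files; `fuse_bp` is a hypothetical helper, and only the leading column positions match PLINK's output.

```shell
# Hypothetical helper: add BP (column 3 of a .assoc file) to a .adjusted file by SNP ID.
fuse_bp() {  # usage: fuse_bp ASSOC ADJUSTED
  awk 'NR == FNR { bp[$2] = $3; next }                                 # 1st file: SNP -> BP
       FNR == 1  { print "CHR", "BP", "SNP", "P", "GC", "BONF"; next }  # replace header
                 { print $1, bp[$2], $2, $3, $4, $5 }' "$1" "$2"
}

fuse_tmp="$(mktemp -d)"
trap 'rm -rf "$fuse_tmp"' EXIT
# Invented miniature files
cat > "$fuse_tmp/toy.assoc" <<'EOF'
CHR SNP BP A1 F_A F_U A2 CHISQ P OR
1 rs1 1000 A 0.30 0.20 G 4.2 0.04 1.7
2 rs2 5000 C 0.10 0.11 T 0.1 0.75 0.9
EOF
cat > "$fuse_tmp/toy.adjusted" <<'EOF'
CHR SNP UNADJ GC BONF
1 rs1 0.04 0.05 0.08
2 rs2 0.75 0.77 1
EOF
fuse_bp "$fuse_tmp/toy.assoc" "$fuse_tmp/toy.adjusted"
```

This prints the header `CHR BP SNP P GC BONF` exactly once, followed by `1 1000 rs1 0.04 0.05 0.08` and `2 5000 rs2 0.75 0.77 1`.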

```r
library(qqman)
gwas_adj<-read.table("gwas.bin.filtered.assoc.assoc.adjusted.fused", header=TRUE)
manhattan(gwas_adj, chr="CHR", bp="BP", snp="SNP", p="P",suggestiveline=T, annotatePval=0.0001, genomewideline=T)
manhattan(subset(gwas_adj, CHR==8),xlim=c(12000000,14000000),genomewideline=T)
qq(gwas_adj$P)
```
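
`qq()` plots the observed -log10(p) against the values expected if all p-values were uniform on (0, 1). The sketch below computes such pairs using the plotting positions (i - 0.5)/n; this choice is an assumption for illustration, and qqman's exact expected quantiles may differ slightly for small n. `qq_points` is hypothetical and the input p-values are invented.

```shell
# Sketch: expected vs observed -log10(p) pairs behind a Q-Q plot.
# Expected quantiles assume uniform p-values under the null: (i - 0.5) / n.
qq_points() {
  sort -g | awk '{ p[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%.3f %.3f\n", -log((i - 0.5) / NR) / log(10), -log(p[i]) / log(10)
    }'
}

printf '%s\n' 0.5 0.001 0.2 0.05 | qq_points
```

This prints `0.903 3.000`, `0.426 1.301`, `0.204 0.699` and `0.058 0.301` (expected, observed). Points that leave the diagonal only in the extreme tail suggest true associations, whereas a lift across the whole range points to genome-wide inflation, e.g. from population stratification.
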
Create plot with: `manhattan(gwas_adj, genomewideline=T, annotatePval=0.00001, annotateTop=F, highlight=gwas_adj$SNP[gwas_adj$BONF<1])`

![gwas.png](Images/gwas.png)

Create plot with: `manhattan(subset(gwas_adj, CHR==11),xlim=c(75000000,77000000),genomewideline=T, annotatePval=0.00001, annotateTop=F)`

![gwas-chr-11.png](Images/gwas-chr-11.png)

Create plot with: `manhattan(subset(gwas_adj, CHR==8),xlim=c(12000000,14000000),genomewideline=T)`

![gwas-chr-8.png](Images/gwas-chr-8.png)

Create plot with: `qq(gwas_adj$P)`

![gwas-qq.png](Images/gwas-qq.png)

In [None]:
grep rs9616985 gwas.bin.filtered.assoc.assoc.adjusted.fused

In [None]:
tail -5 gwas.bin.filtered.assoc.assoc.adjusted.fused

### Alternative Approach with gwaRs

In R, install the package with `install.packages("gwaRs")`.

In [None]:
R -e 'library("gwaRs")'

Merge *gwas.bin.filtered.assoc* and *gwas.bin.filtered.assoc.adjusted* in the same way, adding the "BP" column from the `.assoc` file by SNP identifier:

In [None]:
head -2 gwas.bin.filtered.assoc gwas.bin.filtered.assoc.adjusted

In [None]:
awk 'BEGIN{print "CHR","BP","SNP","P","GC","BONF"; while((getline < "gwas.bin.filtered.assoc") > 0){snp[$2]=$3}} $1!="CHR"{print $1,snp[$2],$2,$3,$4,$5}' gwas.bin.filtered.assoc.adjusted > gwas.bin.filtered.assoc.adjusted.fused

In [None]:
head gwas.bin.filtered.assoc.adjusted.fused

In [None]:
# Create Manhattan Plot
R -e 'png("Images/man-plot.png", width = 1000, height = 600);
library("gwaRs");
gwas_adj<-read.table("gwas.bin.filtered.assoc.adjusted.fused", header=TRUE);
man_plot(gwas_adj, annotatePval=0.00001, highlight=gwas_adj$SNP[gwas_adj$BONF<1], chromCol=c("tomato","lightslateblue"));
dev.off()'

![Manhattan Plot](Images/man-plot.png)

In [None]:
# Create Q-Q Plot
R -e 'png("Images/qq-plot.png", width = 1000, height = 600);
library("gwaRs");
gwas_adj<-read.table("gwas.bin.filtered.assoc.adjusted.fused", header=TRUE);
qq_plot(gwas_adj);
dev.off()'

![qq-plot](Images/qq-plot.png)

In [None]:
# Create Karyotype Plot
R -e 'png("Images/karyo-plot.png", width = 1000, height = 600);
library("gwaRs");
gwas_adj<-read.table("gwas.bin.filtered.assoc.adjusted.fused", header=TRUE);
karyotype_plot(gwas_adj);
dev.off()'

![karyo-plot](Images/karyo-plot.png)