Skip to content
3D GWAS across multiple phenotypes
R
Branch: master
Clone or download
Cannot retrieve the latest commit at this time.
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
R
data
doc
man
vignettes
.Rbuildignore
00check.log
DESCRIPTION
NAMESPACE
PheGWAS.Rproj
README.md

README.md

PheGWAS: Three-dimensional approach to dynamically visualize GWAS across multiple phenotypes

Introduction:

GWAS is Genome Wide Association Study which is done on many variants and one phenotype. The Manhattan plot is the popular way to visualize this( https://github.com/stephenturner/qqman ). On the other hand, PheWAS (https://rdrr.io/github/PheWAS/PheWAS/ ) is phenome wide association study which is done on a single variant and many phenotypes. PheGWAS is a new concept to do analysis on many variants and many phenotypes. This is a package to do interactive 3D visualization on many variants many phenotypes.

More details are found in the preprint: https://www.biorxiv.org/content/10.1101/694794v1. PheGWAS is an ongoing project from Department of Population Health and Genomics, University of Dundee. Efforts are being made to add new features that would make it more beneficial for comprehensive gene-disease association analysis.

Installation:

Prerequisites

Please Install plotly, devtools and Biomart before installing the PheGWAS package.

Install plotly

install.packages("plotly")

Install devtools

install.packages("devtools")

Install BioMart

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("biomaRt")

Installation of PheGWAS package

If you downloaded the tar version of the package

install.packages("<path to the tar>",dependencies = TRUE, repos = NULL)

Eg:

install.packages("/Downloads/PheGWAS_0.1.0.tgz",dependencies = TRUE, repos = NULL)

Or if you have the unzipped version of the package then

devtools::install("<path to package folder name>", dependencies=TRUE)

Eg:

devtools::install("/Downloads/PheGWAS-master",dependencies = TRUE)

Brief Documentation:

Interactive PheGWAS landscape mode features:

• X axis: Chromosome (level 1) and base pair positions (level 2).

• Z axis: The phenotypes name.

• Y axis: -log10(p-value) or other values such as the effect size can be specified.

• Info pop-up: Hovering over a point reveals information about the plotted SNP, the information includes Phenotype name, kbp position range, P value, Locus, SNP ID, Gene. The GENE name is mapped to the corresponding SNPID within the package by making use of the R package “BioMArt”.

• Interactive features like zoom in, zoom out, turntable rotation, rotation on an axis and pan scale features.

• The figure can be exported as static or interactive plot (which is recommended), in order to share with colleagues.

Exploring the GWAS summary data with PheGWAS:

Preprocessing the dataframe’s:

The first stage of the exploration is to prepare the dataframe to pass to the PheGWAS exploration functions. The preparation of the dataframe is done using the function “processphegwas”.

List of the GWAS summary stats files as dataframe’s are passed to this function, the GWAS summary stats file is to be of this format.

If gene to rsid is not available. Gene will be mapped internally with the BioMart Module

CHR        BP       rsid <pvalue of the phenotype>

If gene to rsid is available, this could be passed as an input parameter.

CHR        BP       rsid gene  <pvalue of the phenotype> 

Here the p-value column name must be replaced by phenotype name, this column name is taken for the y axis of the landscape.

The PheGWAS exploration is undertaken in different levels (as of now its 2 different levels). The exploration is done using function “landscape”

Level 1: Exploration on entire genome level:

The first level is the entire genome level. In this level the entire chromosomes are plotted in a 3D landscape view with each peak in the chromosome representing the highest pvalue in that chromosome. The X axis here is the chromosome number. Z axis and Y axis are phenotype name and -log10(p-value) respectively.

With this visualization the user gets an idea which chromosome contribute to multiple phenotypes. A chromosome contributing to multiple phenotypes has to be explored, as we are more interested to see if a certain region within the chromosome is contributing to multiple phenotypes. This is where the exploration is taken to the next level of the PheGWAS.

Level 2: Exploration on individual chromosomes:

The second level is the single chromosome level. In this level the entire chromosome is divided into different groups based on bp units given by the user (100kbp is the default). The x axis here is the BP groups in the chromosome. Z axis and Y axis are phenotype name and -log10(p-value) respectively.

Examples:

The 6 different data frames from 2 different consortium are made available in the package for getting used to the package and its usage.

Following processed summary data are from the lipid consortium:

hdl - HDL GWAS summary dataset ldl - LDL GWAS summary dataset trigs - TRIG GWAS summary dataset tchol - TOTAL CHOLESTROL GWAS summary dataset

Results are from H. Grallert et al., “Discovery and refinement of loci associated with lipid levels,” Nat. Genet., vol. 45, no. 11, pp. 1274–1283, 2013.

> head(hdl)
  CHR        BP       rsid     HDL                                                          gene
1   7  92383888       rs10 0.76460                                                          CDK6
2  12 126890980  rs1000000 0.33750 RP4-809F18.1;RP4-809F18.2;RP5-944M2.1;RP5-944M2.2;RP5-944M2.3
3   4  21618674 rs10000010 0.25460                            KCNIP4;RP11-556G22.1;RP11-556G22.3
4   4   1357325 rs10000012 0.26640                AC078852.1;AC078852.2;CRIPAK;MAEA;NKX1-1;UVSSA
5   4  37225069 rs10000013 0.63740                                              KIAA1239;MIR4801
6   4  84778125 rs10000017 0.05778                                                    RP11-8L2.1
> head(ldl)
  CHR        BP       rsid     LDL                                                          gene
1   7  92383888       rs10 0.03411                                                          CDK6
2  12 126890980  rs1000000 0.51210 RP4-809F18.1;RP4-809F18.2;RP5-944M2.1;RP5-944M2.2;RP5-944M2.3
3   4  21618674 rs10000010 0.13150                            KCNIP4;RP11-556G22.1;RP11-556G22.3
4   4   1357325 rs10000012 0.89260                AC078852.1;AC078852.2;CRIPAK;MAEA;NKX1-1;UVSSA
5   4  37225069 rs10000013 0.26410                                              KIAA1239;MIR4801
6   4  84778125 rs10000017 0.89340                                                    RP11-8L2.1
> head(trig)
  CHR        BP       rsid   TRIGS                                                          gene
1   7  92383888       rs10 0.39990                                                          CDK6
2  12 126890980  rs1000000 0.08781 RP4-809F18.1;RP4-809F18.2;RP5-944M2.1;RP5-944M2.2;RP5-944M2.3
3   4  21618674 rs10000010 0.26940                            KCNIP4;RP11-556G22.1;RP11-556G22.3
4   4   1357325 rs10000012 0.25140                AC078852.1;AC078852.2;CRIPAK;MAEA;NKX1-1;UVSSA
5   4  37225069 rs10000013 0.86350                                              KIAA1239;MIR4801
6   4  84778125 rs10000017 0.69120                                                    RP11-8L2.1
> head(tchol)
  CHR        BP       rsid TOTAL_CHOLESTROL                                                          gene
1   7  92383888       rs10          0.03129                                                          CDK6
2  12 126890980  rs1000000          0.99050 RP4-809F18.1;RP4-809F18.2;RP5-944M2.1;RP5-944M2.2;RP5-944M2.3
3   4  21618674 rs10000010          0.01953                            KCNIP4;RP11-556G22.1;RP11-556G22.3
4   4   1357325 rs10000012          0.94260                AC078852.1;AC078852.2;CRIPAK;MAEA;NKX1-1;UVSSA
5   4  37225069 rs10000013          0.24640                                              KIAA1239;MIR4801
6   4  84778125 rs10000017          0.98070                                                    RP11-8L2.1

Following processed summary data are from the giant consortium:

bmimen - BMI for men bmiwomen - BMI for women

Results are from Randall JC, Winkler TW, Kutalik Z, Berndt SI, Jackson AU, Monda KL, Kilpeläinen TO, Esko T, Mägi R, Li S, et al. (2013). Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLoS Genet 9: e1003500.

> head(bmimen)
  CHR        BP       rsid BMI_men
1  10   9960129  rs4747841  0.9400
2  10   9960259  rs4749917  0.9400
3  10 100012739   rs737656  0.7000
4  10 100012890   rs737657  0.7600
5  10 100013438 rs17524355  0.5700
6  10 100013563  rs7086391  0.0051
> head(bmiwomen)
  CHR        BP       rsid BMI_women
1  10   9960129  rs4747841      0.31
2  10   9960259  rs4749917      0.31
3  10 100012739   rs737656      0.28
4  10 100012890   rs737657      0.27
5  10 100013438 rs17524355      0.47
6  10 100013563  rs7086391      0.72

The dataframe’s are passed to processphegwas function as a list of dataframe’s.

library(PheGWAS)
x <- list(hdl,ldl,trig,tchol)
## This preprocess the dataframes’ for passing to the landscape function.
y <- processphegwas(x)

## pass the dataframe from the processphegwas
# 3D landscape visualization of all the phenotypes across the base pair #positions
landscape(y)

# 3D landscape visualization of all the phenotypes across the base pair #positions, showing only the crusts above a certain threshold
landscape(y,sliceval = 10)

# 3D landscape visualization of chromosome number 16
landscape(x,sliceval = 10,chromosome = 16)

# 3D landscape visualization of all the phenotypes across the base pair #positions, showing only the crusts that are within a certain region (between sliceval and upperlimit)
landscape(y,sliceval = 10 ,chromosome = 16,upperlimit = 50)

# 3D landscape visualization of chromosome number 16, with bp division on #100,000 (default is 1,000,000)
landscape(x ,chromosome = 16,bpdivision = 100,000)

# 3D landscape visualization of chromosome number 16, with bp division on #1,000,000 (default is 1,000,000), gene view active (refer paper for more details)

landscape(d, chromosome=16,geneview = TRUE)

## 3-D association table for many phenotypes
landscapetable(d, sliceval = 10, chromosome=3)
You can’t perform that action at this time.