-
Notifications
You must be signed in to change notification settings - Fork 2
/
HierarchicalPriors_04.Rmd
186 lines (140 loc) · 7.49 KB
/
HierarchicalPriors_04.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
---
title: "CoPheScan: Example with Hierarchical Priors"
author: "Ichcha Manipur"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{CoPheScan: Example with Hierarchical Priors}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "HP-"
)
```
The hierarchical model is ideal when a large set of variants and phenotypes are available i.e., a phenome-wide association study with a set of variants which are previously known to have a functional role or have been implicated in a disease.
Here we show how to use cophescan to infer hierarchical priors on a small test dataset.
```{r setup, message=FALSE, warning=FALSE}
library(cophescan)
library(dplyr)
library(ggplot2)
```
```{r}
data("cophe_multi_trait_data")
trait_dat = cophe_multi_trait_data$summ_stat$Trait_1
str(trait_dat)
querysnpid <- cophe_multi_trait_data$querysnpid
LD <- cophe_multi_trait_data$LD
```
#### Multi-trait analysis
The first step is preparing the input for the hierarchical model which are the log Bayes factors: lBF.Ha and lBF.Hc.
We will use `cophe.susie.lbf` to extract Bayes factors estimated using SuSIE.
Note: When there are no credible sets identified with SuSIE the function internally calculates lBF.Ha and lBF.Hc using the Approximate Bayes Factor method.
```{r message=FALSE, warning=FALSE, results='hide'}
## Hide print messages from coloc
res.multi.lbf <- list()
for (trait_idx in seq_along(cophe_multi_trait_data$summ_stat)){
querytrait_ss <- cophe_multi_trait_data$summ_stat[[trait_idx]]
# Here LD is the same
querytrait_ss$LD <- LD
trait_variant_pair <- paste0('Trait', trait_idx, '_', querysnpid)
res.multi.lbf[[trait_variant_pair]] <- cophe.susie.lbf(querytrait_ss, querysnpid = querysnpid, querytrait = paste0('Trait_', trait_idx))
}
res.multi.lbf.df = bind_rows(res.multi.lbf)
```
```{r}
head(res.multi.lbf.df)
```
**Note:**
1. The output of `cophe.susie` or `cophe.single` can also be used as input to the hierarchical model as it has all the fields required for the input. This would be useful when you would like to compare results from the fixed priors to those obtained from priors inferred using the hierarchical model. [Swap `cophe.susie` for `cophe.susie.lbf` above and instead of bind_rows do : `res.multi.lbf.df = multitrait.simplify(res.multi.lbf)`].
2. Use the for loop (as above) to run large datasets as storing all the coloc-structured data in a list is memory-intensive. This is also helpful in cases where there are multiple querysnps and different regions of the query traits have to be analysed
#### Run hierarchical model for priors
The input df for the multi.dat arguments should contain the following fields: "lBF.Ha","lBF.Hc" and "nsnps".
Set the argument covar to TRUE to include covariates
```{r warning=F, message=F, fig.width = 8, fig.height=8, out.width = "75%", fig.ncol = 3, fig.align = "center"}
covar=FALSE
covar_vec=rep(1, nrow(res.multi.lbf.df))
## Set covar to TRUE to include covariates, uncomment the following 2 lines
# covar=TRUE
# covar_vec = cophe_multi_trait_data$covar_vec
cophe.hier.res <- run_metrop_priors(res.multi.lbf.df, avg_posterior=TRUE, avg_pik = TRUE, covar_vec = covar_vec, covar = covar, nits = 50000, thin = 5)
names(cophe.hier.res)
```
Note: Setting posterior or pik to TRUE is memory intensive for very large datasets
#### Diagnostics of the hierarchical model
Run 3-4 chains of the model and check if there is convergence of chains.
Note: For large datasets run the chains separately and save them in individual .RData files. These can be loaded later for diagnostics.
```{r message=FALSE}
cophe.hier.res.chain.list <- lapply(1:4, function(x)
run_metrop_priors(res.multi.lbf.df, avg_posterior=TRUE, avg_pik = TRUE,
covar_vec = covar_vec, covar = covar, nits = 50000, thin = 5))
```
```{r mcmcDiagnostics, warning=FALSE, message=FALSE, fig.width=5, fig.height=5, out.width="80%", fig.align="center"}
# Store user parameters
old_par <- par(no.readonly = TRUE)
# chain_colors <- c("#e63946c4", "#f1faee", "#a8dadc", "#457b9d" )
chain_colors <- c("#f4f1de", "#e07a5fb2", "#3d405bb2", "#81b29aa6")
layout(matrix(c(1, 2, 3, 4, 5, 5), ncol=2, byrow = TRUE), respect = TRUE,
heights = c(0.9, 0.9, 0.1))
matplot(sapply(cophe.hier.res.chain.list, function(x) x$ll), type = "l",
col = chain_colors,
main ="loglik", ylab = "ll", xlab = "Iteration", lty = 1)
y_ax <- c("alpha", "beta", "gamma")
num_pars <- ifelse(covar, 3, 2)
for (idx in 1:num_pars) {
matplot(sapply(cophe.hier.res.chain.list, function(x) x$parameters[idx, ]),
type = "l", col = chain_colors,
main = paste(y_ax[idx]), ylab = y_ax[idx], xlab = "Iteration", lty = 1
)
}
if (!covar) {
plot(1, type = "n", axes = FALSE, xlab = "", ylab = "")
}
par(mar=c(0, 0, 0, 0))
plot(1, type = "n", axes = FALSE, xlab = "", ylab = "")
legend("top", legend = paste("Chain", 1:4), col = chain_colors, lty = 1, lwd = 2,
horiz = TRUE, bty = "n")
# Reset user parameters
par(old_par)
```
#### Prediction
`cophe.hier.res.chain.list[[1]]$avg.posterior` contains the posterior probabilities of the hypotheses : $H_n$, $H_a$ and $H_c$ for the queryvariant/querytrait pairs obtained from the hierarchical model. Here we use the first chain for prediction.
```{r}
res.post.prob = cbind(cophe.hier.res.chain.list[[1]]$avg.posterior, cophe.hier.res$data)
```
We can use the `cophe.hyp.predict` function to predict the hypothesis given the posterior probabilities.
The cophe.hyp.call column shows the predicted hypothesis for each query trait-query variant pair.
```{r}
res.hier.predict <- cophe.hyp.predict(as.data.frame(res.post.prob ))
col_disp <- c( "PP.Hn", "PP.Ha", "PP.Hc", "nsnps", "querysnp", "querytrait",
"typeBF", "grp", "cophe.hyp.call")
knitr::kable(res.hier.predict[, col_disp], row.names = FALSE, digits=3)
```
#### Visualisation
Use the `cophe_plot` function to return -log10(pval), ppHa and ppHc PheWAS plots from the cophescan output.
```{r cophePlots, message=FALSE}
res.plots = cophe_plot(res.hier.predict, traits.dat = cophe_multi_trait_data$summ_stat, querysnpid = querysnpid, query_trait_names = paste0('Trait_', 1:24))
# if (!require(ggpubr)) {
# install.packages("ggpubr")
# }
# ggpubr::ggarrange(res.plots$pval, res.plots$ppHa, res.plots$ppHc, ncol = 2,
# nrow = 2)
```
```{r, warning=FALSE, message=FALSE, echo=FALSE, out.width = "40%"}
res.plots$pval + theme(legend.position="bottom")
res.plots$ppHa + theme(legend.position="bottom")
res.plots$ppHc + theme(legend.position="bottom")
```
Note: For large datasets, it's not feasible to input all coloc-structured data into "traits.dat" at once.
Instead, use a loop and run the "get_beta" function over all the trait-variant pairs, and provide the resulting data frame (after binding the rows) as the "beta_p" argument in `cophe_plot`:
``` r
# beta_p_list <- lapply(seq_along(cophe_multi_trait_data$summ_stat), function(x) get_beta(list(cophe_multi_trait_data$summ_stat[[x]]), querysnpid, names(cophe_multi_trait_data$summ_stat)[x]))
# ### the datsets need not be in a list as in cophe_multi_trait_data$summ_stat and can be stored independently.
# beta_p_df = bind_rows(beta_p_list)
# ### Make sure the query trait names in beta_p_df are the same as in res.hier.predict
# res.plots = cophe_plot(res.hier.predict, querysnpid = querysnpid, query_trait_names = beta_p_df$querytrait, beta_p = beta_p_df)
```
------------------------------------------------------------------------