/
large_effect.Rmd
145 lines (117 loc) · 4.62 KB
/
large_effect.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
---
title: "SuSiE vs. FINEMAP in an example where the causal SNPs have relatively large effects"
author: Peter Carbonetto
output: workflowr::wflow_html
---
In this small example drawn from our [simulations][dsc], we show that
that FINEMAP works well with an "in-sample LD" matrix---that is, a
correlation matrix that was estimated using the same sample that was
used to compute the single-SNP association statistics---but, can
perform surprisingly poorly with an "out-of-sample" LD matrix. We have
observed that this degradation in performance only occurs in rare
cases---specifically, these are caases when the effects of the causal
SNPs are very large (*i.e.*, when individual causal SNPs explain a
large fraction of the total variance in the phenotype). In this
example, the phenotypes were simulated from a linear regression model
with large coefficients for the causal SNPs.
We also run SuSiE on the same data. Unlike FINEMAP, SuSiE performs
similarly well in this example with either the in-sample and
out-of-sample LD matrix.
```{r knitr-opts, include=FALSE}
knitr::opts_chunk$set(comment = "#",collapse = TRUE,results = "hold",
fig.align = "center",dpi = 120)
```
First, we load some packages used in the code below, and set the seed
for reproducibility.
```{r load-pkgs, message=FALSE}
library(data.table)
library(susieR)
set.seed(1)
```
Load the summary data: the least-squares effect estimates
$\hat{\beta}_i$ and their standard errors $\hat{s}_i$ for each SNP
$i$. Here we also compute the *z*-scores since SuSiE accepts the
*z*-scores as input.
```{r load-data-1}
dat1 <- readRDS("../data/small_data_11.rds")
dat3 <- readRDS("../data/small_data_11_sim_gaussian_pve_n_8_get_sumstats_n_1.rds")
maf <- dat1$maf$in_sample
bhat <- dat3$sumstats$bhat
shat <- dat3$sumstats$shat
z <- bhat/shat
```
In this simulation, two of the SNPs have a nonzero effect on the
phenotype:
```{r load-data-2}
dat2 <- readRDS("../data/small_data_11_sim_gaussian_pve_n_8.rds")
b <- drop(dat2$meta$true_coef)
vars <- which(b != 0)
vars
```
In-sample LD
------------
We begin by running SuSiE with the "in-sample" LD estimate.
```{r susie-in-1}
ldinfile <- "../data/small_data_11_sim_gaussian_pve_n_8_get_sumstats_n_1.ld_sample_n_file.in_n.ld.gz"
Rin <- as.matrix(fread(ldinfile))
fit1 <- susie_rss(z,Rin,n = 800,min_abs_corr = 0.1,refine = FALSE,
verbose = TRUE)
```
(We will get a recommendation to estimate the residual variance, but
to maintain consistency with the analysis below using an out-of-sample
LD estimate, we ignore this advice.)
SuSiE returns a single credible set (CS) containing a large number of strongly
correlated SNPs, and one of the SNPs in this CS is a (true) causal SNP.
```{r susie-in-2}
print(fit1$sets[c("cs","purity")])
```
Here's a visualization of this result. (In this plot, the CS is
depicted by the light blue circles, and the two causal SNPs are drawn
as red triangles.)
```{r susie-in-3, fig.height=3.5, fig.width=4.5}
cs1 <- fit1$sets$cs$L1
par(mar = c(4,4,1,1))
plot(1:1001,fit1$pip,pch = 20,cex = 0.8,ylim = c(0,0.1),
xlab = "SNP",ylab = "susie PIP")
points(cs1,fit1$pip[cs1],pch = 1,cex = 1,col = "cyan")
points(vars,fit1$pip[vars],pch = 2,cex = 0.8,col = "tomato")
```
Now let's try running FINEMAP on these same data:
```{r finemap-in-1, eval=FALSE}
p <- length(b)
dat <- data.frame(rsid = 1:p,
chromosome = rep(1,p),
position = rep(1,p),
allele1 = rep("A",p),
allele2 = rep("C",p),
maf = round(maf,digits = 6),
beta = round(bhat,digits = 6),
se = round(shat,digits = 6))
write.table(dat,"sim1.z",quote = FALSE,col.names = TRUE,row.names = FALSE)
system(paste("./finemap_v1.4.1_x86_64 --sss --log --in-files sim1.master",
"--n-causal-snps 5"))
```
Out-of-sample LD
----------------
*Add text here.*
```{r susie-out-1}
ldoutfile <- "../data/small_data_11.ld_refout_file.refout.ld.gz"
Rout <- as.matrix(fread(ldoutfile))
fit2 <- susie_rss(z,Rout,n = 800,min_abs_corr = 0.1,refine = FALSE,
verbose = TRUE)
```
As above, SuSiE returns a single CS containing one of the two causal
SNPs:
```{r susie-out-2}
print(fit2$sets[c("cs","purity")])
```
Here is a visualization of this result:
```{r susie-out-3, fig.height=3.5, fig.width=4.5}
cs1 <- fit2$sets$cs$L1
par(mar = c(4,4,1,1))
plot(1:1001,fit2$pip,pch = 20,cex = 0.8,ylim = c(0,0.1),
xlab = "SNP",ylab = "susie PIP")
points(cs1,fit2$pip[cs1],pch = 1,cex = 1,col = "cyan")
points(vars,fit2$pip[vars],pch = 2,cex = 0.8,col = "tomato")
```
[dsc]: https://github.com/zouyuxin/dsc_susierss