---
title: "SuSiE vs. FINEMAP in an example where the causal SNPs have relatively large effects"
author: Peter Carbonetto
output: workflowr::wflow_html
---
In this small example drawn from our [simulations][dsc], we show that
FINEMAP works well with an "in-sample LD" matrix---that is, a
correlation matrix that was estimated using the same sample that was
used to compute the single-SNP association statistics---but can
perform surprisingly poorly with an "out-of-sample" LD matrix. We have
observed that this degradation in performance occurs only in rare
cases---specifically, cases in which the effects of the causal
SNPs are very large (*i.e.*, when individual causal SNPs explain a
large fraction of the total variance in the phenotype). In this
example, the phenotypes were simulated from a linear regression model
with large coefficients for the causal SNPs.

We also run SuSiE on the same data. Unlike FINEMAP, SuSiE performs
similarly well in this example with either the in-sample or the
out-of-sample LD matrix.

```{r knitr-opts, include=FALSE}
knitr::opts_chunk$set(comment = "#",collapse = TRUE,results = "hold",
                      fig.align = "center",dpi = 120)
```
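
To make the distinction between the two kinds of LD matrix concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the helper `sim_geno`, the AR(1)-style correlation structure, and the reference panel size of 500 are hypothetical and are not taken from the simulations. The "in-sample" matrix is estimated from the same rows that would be used for the association statistics, whereas the "out-of-sample" matrix is estimated from a separate draw from the same population.

```{r sketch-ld-in-vs-out}
# Illustrative simulation only; none of these objects come from the
# data files used in this analysis.
set.seed(1)

# Hypothetical helper: simulate an n x p matrix with correlated columns
# (each column is correlated with its left neighbour).
sim_geno <- function (n, p, rho = 0.9) {
  Z <- matrix(rnorm(n * p),n,p)
  for (j in 2:p)
    Z[,j] <- rho * Z[,j - 1] + sqrt(1 - rho^2) * Z[,j]
  return(Z)
}

X_study <- sim_geno(800,20)  # The sample used for the association tests.
X_ref   <- sim_geno(500,20)  # A separate reference panel (assumed size).

R_in_sketch  <- cor(X_study) # "In-sample" LD.
R_out_sketch <- cor(X_ref)   # "Out-of-sample" LD.

# The two estimates agree only up to sampling error.
max_abs_diff <- max(abs(R_in_sketch - R_out_sketch))
print(max_abs_diff)
```

The two matrices estimate the same population correlations, so the entries differ only by sampling noise; the question explored below is how sensitive each fine-mapping method is to that noise.
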
First, we load some packages used in the code below.
```{r load-pkgs, message=FALSE}
library(data.table)
library(susieR)
library(ggplot2)
library(cowplot)
```
Load the summary data: the least-squares effect estimates
$\hat{\beta}_i$ and their standard errors $\hat{s}_i$ for each SNP
$i$. Here we also compute the *z*-scores since SuSiE accepts the
*z*-scores as input.
```{r load-data-1}
dat1 <- readRDS("../data/small_data_11.rds")
dat3 <- readRDS("../data/small_data_11_sim_gaussian_pve_n_8_get_sumstats_n_1.rds")
maf <- dat1$maf$in_sample
bhat <- dat3$sumstats$bhat
shat <- dat3$sumstats$shat
z <- bhat/shat
```
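
For readers who want to see where such summary statistics come from, the following sketch computes $\hat{\beta}$, $\hat{s}$ and the *z*-score for a single simulated SNP using the standard simple linear regression formulas, and checks the result against `lm`. The data and all variable names (`x_sketch`, `y_sketch`, and so on) are hypothetical; this is not necessarily how the summary statistics in the file loaded above were computed.

```{r sketch-zscores}
# Illustrative simulation only; variable names are hypothetical.
set.seed(2)
n_sketch <- 200
x_sketch <- rbinom(n_sketch,2,0.3)            # Genotype: 0, 1 or 2 copies.
y_sketch <- 0.5 * x_sketch + rnorm(n_sketch)  # Phenotype.

# Least-squares slope and its standard error, with an intercept in
# the model (hence the centering and the n - 2 degrees of freedom).
xc <- x_sketch - mean(x_sketch)
yc <- y_sketch - mean(y_sketch)
bhat_sketch <- sum(xc * yc)/sum(xc^2)
rss_sketch  <- sum((yc - bhat_sketch * xc)^2)
shat_sketch <- sqrt(rss_sketch/(n_sketch - 2)/sum(xc^2))
z_sketch    <- bhat_sketch/shat_sketch

# The z-score equals the t-statistic reported by lm.
t_lm <- coef(summary(lm(y_sketch ~ x_sketch)))["x_sketch","t value"]
print(all.equal(z_sketch,t_lm))
```
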
In this simulation, two of the SNPs have a nonzero effect on the
phenotype:
```{r load-data-2}
dat2 <- readRDS("../data/small_data_11_sim_gaussian_pve_n_8.rds")
b <- drop(dat2$meta$true_coef)
which(b != 0)
```
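
As a rough illustration of what "large effects" means here, the sketch below simulates a phenotype from a linear regression model with two independent causal SNPs and computes the proportion of phenotypic variance explained (PVE). The sample size matches the `n = 800` used later, but the allele frequencies, the coefficients in `b_pve` and the unit residual variance are hypothetical values chosen for illustration; they are not the settings used in the simulations. The per-SNP decomposition is only approximate and relies on the two SNPs being uncorrelated.

```{r sketch-pve}
# Illustrative simulation only; all values below are assumptions.
set.seed(3)
n_pve <- 800
X_pve <- cbind(rbinom(n_pve,2,0.3),rbinom(n_pve,2,0.4))
b_pve <- c(1.5,-1.2)                        # Hypothetical large effects.
g_pve <- drop(X_pve %*% b_pve)              # Genetic component.
y_pve <- g_pve + rnorm(n_pve)               # Phenotype.

# Total PVE, and the approximate share attributable to each SNP.
pve_total <- var(g_pve)/var(y_pve)
pve_each  <- b_pve^2 * apply(X_pve,2,var)/var(y_pve)
print(round(c(total = pve_total,snp1 = pve_each[1],snp2 = pve_each[2]),3))
```
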
In-sample LD
------------
We load the in-sample LD matrix, then fit the SuSiE model to the
*z*-scores using `susie_rss`.
```{r susie-in-sample-1}
ldinfile <- "small_data_11_sim_gaussian_pve_n_8_get_sumstats_n_1.ld_sample_n_file.in_n.ld"
Rin <- as.matrix(fread(ldinfile))
fit <- susie_rss(z,Rin,n = 800,min_abs_corr = 0.1,refine = FALSE,
                 verbose = TRUE)
```
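Next, we print the credible sets (CSs) and their purity, and plot the
posterior inclusion probabilities (PIPs). SNPs in the first and second
CSs are circled in cyan and gold, and the two causal SNPs are marked
with red triangles.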
```{r susie-in-sample-2}
print(fit$sets[c("cs","purity")])
vars <- which(b != 0)
cs1 <- fit$sets$cs$L1
cs2 <- fit$sets$cs$L2
plot(1:1001,fit$pip,pch = 20,cex = 0.8,ylim = c(0,0.5),
     xlab = "SNP",ylab = "susie PIP")
points(cs1,fit$pip[cs1],pch = 1,cex = 1,col = "cyan")
points(cs2,fit$pip[cs2],pch = 1,cex = 1,col = "gold")
points(vars,fit$pip[vars],pch = 2,cex = 0.8,col = "tomato")
```
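
The "purity" of a CS summarizes how strongly the SNPs in the CS are correlated with each other, and the `min_abs_corr` argument used above sets a threshold on the smallest absolute correlation within a CS. The sketch below shows one way such a summary could be computed from an LD matrix. The helper `cs_purity` and the toy matrix `R_toy` are hypothetical and written for illustration; this is not susieR's own implementation, which may differ in details.

```{r sketch-purity}
# Hypothetical helper: summarize the absolute pairwise correlations
# among the SNPs in a credible set "cs", given an LD matrix R.
cs_purity <- function (R, cs) {
  if (length(cs) == 1)
    return(c(min = 1,mean = 1,median = 1))
  r <- abs(R[cs,cs])
  r <- r[upper.tri(r)]  # Each distinct pair counted once.
  return(c(min = min(r),mean = mean(r),median = median(r)))
}

# Toy 3 x 3 LD matrix (made-up values).
R_toy <- matrix(c(1,   0.9, 0.2,
                  0.9, 1,   0.3,
                  0.2, 0.3, 1),3,3)

purity_pair <- cs_purity(R_toy,c(1,2))  # A tightly correlated pair.
purity_all  <- cs_purity(R_toy,1:3)     # Adding SNP 3 lowers the purity.
print(rbind(pair = purity_pair,all = purity_all))
```

In this toy example, the CS containing all three SNPs has a minimum absolute correlation of 0.2, so it would pass a threshold of 0.1 but not a stricter threshold such as 0.5.
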
Out-of-sample LD
----------------
[dsc]: https://github.com/zouyuxin/dsc_susierss