/
difficult_example.Rmd
188 lines (147 loc) · 5.34 KB
/
difficult_example.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
---
title: Add title here
author: Peter Carbonetto
date: June 23, 2020
site: workflowr::wflow_site
output: workflowr::wflow_html
---
To our surprise, we discovered that the Lasso (or Elastic Net)
sometimes provides more considerably more accurate predictions than
mr.ash in examples where there are many predictors having a small
effect on the outcome. Here I expand on an example from
[Matthew's brief investigation of this phenomenon][mr-ash-vs-lasso] to
better understand the (mis) behaviour of mr.ash.
```{r knitr, echo=FALSE}
knitr::opts_chunk$set(comment = "#",results = "hold",collapse = TRUE,
fig.align = "center",dpi = 140)
```
Load packages
-------------
These are the packages used in the analysis.
```{r load-pkgs, message=FALSE, warning=FALSE}
library(glmnet)
library(varbvs)
library(mr.ash.alpha)
library(ggplot2)
library(cowplot)
```
Simulate data
-------------
I simulate the data just as Matthew did, except that I split the data
into a training and a test set.
These are the data simulation settings: the number of samples in
training set, "n"; number of simulated variables, "p"; the number
of variables affecting the outcome ("p1"); and the proportion of variance in the outcome explained by the variables ("pve").
```{r data-sim-params}
n <- 500
p <- 1000
p1 <- 467
pve <- 0.95
```
Simulate a $2n \times p$ design matrix; the first $n$ rows is the
training set data, and the remaining $n$ rows are the test data.
```{r sim-x}
set.seed(15)
X <- matrix(rnorm(2*n*p),2*n,p)
X <- scale(X,center = TRUE,scale = TRUE)
```
Simulate the $p$ regression coefficients; only $p_1 < p$ of the
coefficients are nonzero.
```{r sim-coef}
b <- rep(0,p)
j <- sample(p,p1)
b[j] <- rnorm(p1)
```
Simulate the responses so that the target PVE is met.
```{r sim-y}
y <- drop(X %*% b)
se <- sqrt((1 - pve)/pve) * sd(y)
y <- y + rnorm(n,sd = se)
```
Split the data 50-50 into a training set and a test set.
```{r split-data}
test <- 1:n
Xtest <- X[test,]
ytest <- y[test]
X <- X[-test,]
y <- y[-test]
```
Fit the elastic net and mr.ash models
-------------------------------------
Fit the Elastic Net model, in which the penalty strength parameter
($\lambda$) is chosen via 10-fold cross-validation.
```{r fit-glmnet}
fit.glmnet <- cv.glmnet(X,y,alpha = 0.95,standardize = FALSE)
```
Fit the mr.ash model using the default settings.
```{r fit-mr-ash-1, warning=FALSE, results="hide"}
fit.mrash <- mr.ash(X,y,standardize = FALSE)
```
Fit the mr.ash model again, but give it some help by providing it with
the prior and residual variance used to simulate the data. Also, the
posterior estimates of the coefficients are initialized to the Elastic
Net estimates.
```{r fit-mr-ash-2, warning=FALSE, results="hide"}
b <- coef(fit.glmnet)[-1]
w1 <- p1/p
s <- se^2
fit.trueg <- mr.ash(X,y,beta.init = b,update.pi = FALSE,update.sigma2 = FALSE,
sigma2 = s,sa2 = c(0,1/s),pi = c(1 - w1,w1))
```
Now we provide mr.ash with a little less help: we initialize the prior
to the settings used to simulate the data, but allow mr.ash to fit the
prior.
```{r fit-mr-ash-3, warning=FALSE, results="hide"}
fit.trueginit <- mr.ash(X,y,beta.init = coef(fit.glmnet)[-1],
update.pi = TRUE,update.sigma2 = FALSE,
sigma2 = s,sa2 = c(0,1/s),pi = c(1 - w1,w1))
```
Evaluate models on test set
---------------------------
Predict the test set outcomes using the fitted models.
```{r predict-test-outcomes}
y.glmnet <- drop(predict(fit.glmnet,Xtest,s = "lambda.min"))
y.mrash <- predict(fit.mrash,Xtest)
y.trueg <- predict(fit.trueg,Xtest)
y.trueginit <- predict(fit.trueginit,Xtest)
```
Report the accuracy of the test predictions by the root-mean
squared error (RMSE).
```{r compute-rmse}
rmse <- function (x, y) sqrt(mean((x - y)^2))
cat(sprintf("glmnet: %0.3f\n",rmse(ytest,y.glmnet)))
cat(sprintf("mr.ash: %0.3f\n",rmse(ytest,y.mrash)))
cat(sprintf("mr.ash (true prior): %0.3f\n",rmse(ytest,y.trueg)))
cat(sprintf("mr.ash (true prior init): %0.3f\n",rmse(ytest,y.trueginit)))
```
A couple surprises
------------------
These results are surprising in a couple ways:
1. The Elastic Net method does very well, despite the fact that
we typically think of the method as being best suited for sparse
settings in which only a few variables have an effect.
2. Unsuprisingly, mr.ash does well when the prior is fixed to the true
settings. However, initializing the mr.ash prior to the truth, then
fitting the prior to the data, does not improve performance at all,
and in fact makes things slightly worse (at least in this example).
Let's investigate this second surprise a little more closely.
Add header here
---------------
```{r fit-varbvs, fig.width=4, fig.height=5}
logodds <- seq(-3,0,length.out = 50)
fit.varbvs <- varbvs(X,NULL,y,update.sigma = s,sa = 1/s,
logodds = logodds,verbose = FALSE)
```
```{r plot-elbo}
sigmoid10 <- function (x) 1/(1 + 10^(-x))
logw <- fit.varbvs$logw
pdat <- data.frame(w = sigmoid10(logodds),
elbo = logw,sigmoid10(logodds))
ggplot(pdat,aes(x = w,y = elbo)) +
geom_point() +
geom_line() +
scale_x_continuous(trans = "log10",breaks = c(0.001,0.01,0.1,0.5)) +
labs(x = "\u03c0",y = "ELBO") +
theme_cowplot(10)
```
[mr-ash-vs-lasso]: https://stephens999.github.io/misc/mr_ash_vs_lasso.html