analysis/difficult_example.Rmd

---
title: Add title here
author: Peter Carbonetto
date: June 23, 2020
site: workflowr::wflow_site
output: workflowr::wflow_html
---

To our surprise, we discovered that the Lasso (or Elastic Net)
sometimes provides more considerably more accurate predictions than
mr.ash in examples where there are many predictors having a small
effect on the outcome. Here I expand on an example from
[Matthew's brief investigation of this phenomenon][mr-ash-vs-lasso] to
better understand the (mis) behaviour of mr.ash.

```{r knitr, echo=FALSE}
knitr::opts_chunk$set(comment = "#",results = "hold",collapse = TRUE,
                      fig.align = "center",dpi = 140)
```

Load packages
-------------

These are the packages used in the analysis.

```{r load-pkgs, message=FALSE, warning=FALSE}
library(glmnet)
library(varbvs)
library(mr.ash.alpha)
library(ggplot2)
library(cowplot)
```

Simulate data
-------------

I simulate the data just as Matthew did, except that I split the data
into a training and a test set.

These are the data simulation settings: the number of samples in
training set, "n"; number of simulated variables, "p"; the number
of variables affecting the outcome ("p1"); and the proportion of variance in the outcome explained by the variables ("pve").

```{r data-sim-params}
n   <- 500
p   <- 1000
p1  <- 467
pve <- 0.95
```

Simulate a $2n \times p$ design matrix; the first $n$ rows is the
training set data, and the remaining $n$ rows are the test data.

```{r sim-x}
set.seed(15)
X <- matrix(rnorm(2*n*p),2*n,p)
X <- scale(X,center = TRUE,scale = TRUE)
```

Simulate the $p$ regression coefficients; only $p_1 < p$ of the
coefficients are nonzero.

```{r sim-coef}
b    <- rep(0,p)
j    <- sample(p,p1)
b[j] <- rnorm(p1)
```

Simulate the responses so that the target PVE is met.

```{r sim-y}
y  <- drop(X %*% b)
se <- sqrt((1 - pve)/pve) * sd(y)
y  <- y + rnorm(n,sd = se)
```

Split the data 50-50 into a training set and a test set.

```{r split-data}
test  <- 1:n
Xtest <- X[test,]
ytest <- y[test]
X     <- X[-test,]
y     <- y[-test]
```

Fit the elastic net and mr.ash models
-------------------------------------

Fit the Elastic Net model, in which the penalty strength parameter
($\lambda$) is chosen via 10-fold cross-validation.

```{r fit-glmnet}
fit.glmnet <- cv.glmnet(X,y,alpha = 0.95,standardize = FALSE)
```

Fit the mr.ash model using the default settings.

```{r fit-mr-ash-1, warning=FALSE, results="hide"}
fit.mrash <- mr.ash(X,y,standardize = FALSE)
```

Fit the mr.ash model again, but give it some help by providing it with
the prior and residual variance used to simulate the data. Also, the
posterior estimates of the coefficients are initialized to the Elastic
Net estimates.

```{r fit-mr-ash-2, warning=FALSE, results="hide"}
b  <- coef(fit.glmnet)[-1]
w1 <- p1/p
s  <- se^2
fit.trueg <- mr.ash(X,y,beta.init = b,update.pi = FALSE,update.sigma2 = FALSE,
                    sigma2 = s,sa2 = c(0,1/s),pi = c(1 - w1,w1))
```

Now we provide mr.ash with a little less help: we initialize the prior
to the settings used to simulate the data, but allow mr.ash to fit the
prior.

```{r fit-mr-ash-3, warning=FALSE, results="hide"}
fit.trueginit <- mr.ash(X,y,beta.init = coef(fit.glmnet)[-1],
                        update.pi = TRUE,update.sigma2 = FALSE,
                        sigma2 = s,sa2 = c(0,1/s),pi = c(1 - w1,w1))
```

Evaluate models on test set
---------------------------

Predict the test set outcomes using the fitted models.

```{r predict-test-outcomes}
y.glmnet    <- drop(predict(fit.glmnet,Xtest,s = "lambda.min"))
y.mrash     <- predict(fit.mrash,Xtest)
y.trueg     <- predict(fit.trueg,Xtest)
y.trueginit <- predict(fit.trueginit,Xtest)
```

Report the accuracy of the test predictions by the root-mean
squared error (RMSE).

```{r compute-rmse}
rmse <- function (x, y) sqrt(mean((x - y)^2))
cat(sprintf("glmnet:                   %0.3f\n",rmse(ytest,y.glmnet)))
cat(sprintf("mr.ash:                   %0.3f\n",rmse(ytest,y.mrash)))
cat(sprintf("mr.ash (true prior):      %0.3f\n",rmse(ytest,y.trueg)))
cat(sprintf("mr.ash (true prior init): %0.3f\n",rmse(ytest,y.trueginit)))
```

A couple surprises
------------------

These results are surprising in a couple ways:

1. The Elastic Net method does very well, despite the fact that
   we typically think of the method as being best suited for sparse
   settings in which only a few variables have an effect.

2. Unsuprisingly, mr.ash does well when the prior is fixed to the true
   settings. However, initializing the mr.ash prior to the truth, then
   fitting the prior to the data, does not improve performance at all,
   and in fact makes things slightly worse (at least in this example).

Let's investigate this second surprise a little more closely.

Add header here
---------------

```{r fit-varbvs, fig.width=4, fig.height=5}
logodds    <- seq(-3,0,length.out = 50)
fit.varbvs <- varbvs(X,NULL,y,update.sigma = s,sa = 1/s,
                     logodds = logodds,verbose = FALSE)
```

```{r plot-elbo} 
sigmoid10 <- function (x) 1/(1 + 10^(-x))
logw <- fit.varbvs$logw
pdat <- data.frame(w    = sigmoid10(logodds),
                   elbo = logw,sigmoid10(logodds))
ggplot(pdat,aes(x = w,y = elbo)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(trans = "log10",breaks = c(0.001,0.01,0.1,0.5)) +
  labs(x = "\u03c0",y = "ELBO") +
  theme_cowplot(10)
```

[mr-ash-vs-lasso]: https://stephens999.github.io/misc/mr_ash_vs_lasso.html