# MR-ASH simulation data analysis

Simulation procedures are [available here](https://gaow.github.io/mvarbvs/doc/pipeline/Pipelines.html#Simulations). In brief, we looked at various types of underlying mixture distributions of effect sizes, including those explored in Figure 2 of the [ash paper](https://academic.oup.com/biostatistics/article/18/2/275/2557030/False-discovery-rates-a-new-deal), with different values of the null proportion $\pi_0$.
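To make the simulation setting concrete, here is a minimal sketch of drawing effect sizes from a point mass at zero (weight $\pi_0$) plus a zero-mean normal mixture. The function name `simulate_effects` and the mixture weights and standard deviations used below are illustrative assumptions, not the values used by the pipeline.

```r
# Sketch: draw effects from pi0 * delta_0 + (1 - pi0) * sum_k w_k N(0, s_k^2).
# The weights `w` and standard deviations `s` below are illustrative assumptions.
simulate_effects <- function(p, pi0, w, s) {
  stopifnot(length(w) == length(s), abs(sum(w) - 1) < 1e-8)
  is_null <- runif(p) < pi0                              # TRUE = effect is exactly zero
  k <- sample(length(s), p, replace = TRUE, prob = w)    # mixture component of each effect
  beta <- rnorm(p, mean = 0, sd = s[k])
  beta[is_null] <- 0
  beta
}

set.seed(1)
# Dense, slab-like signal: half of the effects are non-zero with a large variance
beta_dense <- simulate_effects(p = 1000, pi0 = 0.5, w = 1, s = 4)
# Sparse signal: almost all effects are zero
beta_sparse <- simulate_effects(p = 1000, pi0 = 0.999, w = c(0.5, 0.5), s = c(0.25, 1))
mean(beta_dense == 0)   # fraction of null effects, close to pi0
```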

We perform three analyses on the simulated data:
1. Univariate analysis (R's `lm()`)
2. `mr-ash` analysis 
3. `ash` analysis using results from 1
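As a rough illustration of steps 1 and 3, the sketch below fits one simple regression per SNP with `lm()` and collects the effect estimates and standard errors that `ash` takes as input. The helper `univariate_lm` and the toy `X` and `y` are hypothetical and not part of the pipeline; the `ashr::ash()` call is left as a comment because it requires the `ashr` package.

```r
# Sketch: per-SNP simple regression, collecting betahat and its standard error.
# X (n x p genotypes) and y (length-n expression) are toy data, not simulation output.
univariate_lm <- function(X, y) {
  fit_one <- function(j) {
    # Row 2 of the coefficient table is the slope for SNP j (row 1 is the intercept)
    coef(summary(lm(y ~ X[, j])))[2, c("Estimate", "Std. Error")]
  }
  out <- t(vapply(seq_len(ncol(X)), fit_one, numeric(2)))
  colnames(out) <- c("betahat", "sebetahat")
  out
}

set.seed(1)
n <- 200; p <- 5
X <- matrix(rbinom(n * p, 2, 0.3), n, p)
y <- as.numeric(X[, 1] * 0.8 + rnorm(n))   # only the first SNP has an effect
uni <- univariate_lm(X, y)
# The summary statistics could then be passed on to ash, e.g.
# ashr::ash(uni[, "betahat"], uni[, "sebetahat"])
```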

Results are stored at: `/project/compbio/GTEx_eQTL/MRASH_results/Simulation`. 

## Highlights
### For genotype without LD convolution
These are results with `permuted` in the file name.
1. When the signal is dense and slab-like, `mr-ash` produces a rather sparse solution with a greatly overestimated $\hat{\sigma}$. With 50% signal, `mr-ash` estimates $\hat{\sigma} > 1000$ whereas the truth is 1 (see file `0p5_big_normal_1.expr.analyzed.pdf`). Also, although `mr-ash` shrinks the effects better than `ash`, its effect estimates are still much too large. There is also a big difference between the `mr-ash` and `ash` results.
2. $\hat{\sigma}$ is less overestimated when the signal is more spiky (see files `0p5_near_normal_1.expr.analyzed.pdf` and `0p5_spiky_1.expr.analyzed.pdf`).
3. In a somewhat more realistic scenario, both `ash` and `mr-ash` do quite a good job. There does not seem to be any particular harm in using `mr-ash` (see file `0p999_big_normal_1.expr.analyzed.pdf`).
4. The difference between `mr-ash` and `ash` in the more realistic scenario above is even less obvious when the signal is more spiky (see files `0p999_near_normal_1.expr.analyzed.pdf` and `0p999_spiky_1.expr.analyzed.pdf`). In fact, both `mr-ash` and `ash` recover the true signal less well than in the slab-like situation (but maybe the slab-like situation is more realistic?).
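For intuition about what removing LD does to a genotype matrix, here is a small sketch that shuffles individuals independently within each SNP column. This is only an assumption about what `permuted` means in the file names; the actual pipeline may construct these genotypes differently, and `permute_genotype` is a hypothetical helper.

```r
# Sketch: break LD by shuffling individuals independently within each SNP column.
# Assumption: this mimics the "permuted" genotypes; the pipeline may differ in detail.
permute_genotype <- function(X) {
  apply(X, 2, sample)   # each column keeps its allele counts but loses its link to the others
}

set.seed(1)
n <- 500
snp1 <- rbinom(n, 2, 0.3)
snp2 <- ifelse(runif(n) < 0.9, snp1, rbinom(n, 2, 0.3))   # toy SNP in strong LD with snp1
X <- cbind(snp1, snp2)
X_perm <- permute_genotype(X)
cor(X)[1, 2]        # high correlation: LD present
cor(X_perm)[1, 2]   # close to zero after permutation
```

Per-SNP allele frequencies are unchanged by the shuffle, so any difference between the two sets of results should come from the correlation structure alone.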

### For genotype with LD convolution
These are results without `permuted` in the file name.
1. When the signal is dense and slab-like, `mr-ash` suffers from the same problem as before. But in this scenario `ash` overestimates the effect sizes even more than the plain univariate analysis (see file `0p5_big_normal_1.expr.analyzed.pdf`).
2. In a somewhat more realistic scenario (see file `0p999_big_normal_1.expr.analyzed.pdf`) the advantage of `mr-ash` is obvious. However, $\hat{\sigma}$ is still slightly overestimated.
3. When the signal gets more spiky, $\hat{\sigma}$ seems to be slightly underestimated. Still, `mr-ash` does better than `ash` (see files `0p999_near_normal_1.expr.analyzed.pdf` and `0p999_spiky_1.expr.analyzed.pdf`).

Here we [compare the CDF of the estimates with the truth](http://stephenslab.github.io/ash/analysis/plot_cdf_eg.html):

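```r
library(ggplot2)
# `res` is assumed to hold one loaded simulation result; `res$meta` carries the true
# mixture: pi0 (null proportion), pi (non-null weights) and sigma (component sds).
g = ashr::normalmix(c(res$meta$pi0, (1 - res$meta$pi0) * res$meta$pi),
                    rep(0, length(res$meta$sigma) + 1),
                    c(0, res$meta$sigma))
x = seq(-6, 6, length = 500)
cdf_dat = data.frame(x = x, y = as.numeric(ashr::mixcdf(g, x)),
                     method = "truth", scenario = res$meta$name)
# Plot the true CDF, one panel per scenario
ggplot(cdf_dat, aes(x = x, y = y, color = method)) +
    geom_line(lwd = 1.5, alpha = 0.7) + facet_grid(. ~ scenario) +
    theme(legend.position = "bottom")
```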
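The true CDF evaluated above can also be written out by hand, which makes the role of $\pi_0$ explicit. The sketch below uses base R only; `mixture_cdf` is a hypothetical helper and the parameter values are illustrative, not taken from `res$meta`. It assumes the zero-standard-deviation component acts as a point mass at zero.

```r
# Sketch: CDF of pi0 * delta_0 + (1 - pi0) * sum_k w_k N(0, s_k^2), in base R.
mixture_cdf <- function(x, pi0, w, s) {
  stopifnot(length(w) == length(s))
  # CDF of the non-null (slab) part, evaluated at every point of x
  slab <- vapply(x, function(xi) sum(w * pnorm(xi, mean = 0, sd = s)), numeric(1))
  pi0 * (x >= 0) + (1 - pi0) * slab   # the point mass adds a jump of size pi0 at zero
}

x <- seq(-6, 6, length.out = 500)
truth <- mixture_cdf(x, pi0 = 0.5, w = c(0.5, 0.5), s = c(1, 2))   # illustrative values
```

A large $\pi_0$ shows up as a tall vertical step at zero, so in the sparse scenarios most of the visible difference between methods lies in the tails of the curve.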