# A modular approach to the M&M ASH model

## Motivation

In genetic association studies there is great interest in finding multiple causal variants (eQTL or GWAS hits). Several methods for fine mapping in univariate association problems (e.g., eQTL discovery in a single tissue) have been developed, but the problem is harder to address in multi-condition analysis. [DAP (Wen 2016)](http://dx.doi.org/10.1016/j.ajhg.2016.03.029) fine-maps using a deterministic approximation of posteriors, but due to computational limitations it lacks a principled approach to combine genome-wide information to estimate the hyper-parameters required by the algorithm.

On the other side of the coin, linkage disequilibrium (LD) can impact effect estimates in multivariate association problems. Under LD, the effect size estimate for a SNP is not an estimate of its true effect but of its "LD-convolved" effect, which is essentially the combined effect of all SNPs in LD with the SNP of interest. So not only are these estimates correlated, but more fundamentally they are inaccurate. In [mash (Urbut 2016)](http://dx.doi.org/10.1101/096552) we do not distinguish causal eQTLs from SNPs that are in LD with them; rather, we make inference only for the top SNP in each gene. Yet we have observed in GTEx data that effect sizes estimated for the top eQTL can be opposite in sign in brain vs non-brain tissues. This is likely due to multiple eQTLs in negative LD, rather than true effects being negative.
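
To make the "LD-convolved" effect concrete, here is a minimal simulation sketch. It is not part of the mash or m&m code. It assumes two standardized SNPs with correlation -0.8 and hypothetical true effects (1.0, 0.5); the function names `simulate` and `marginal_effects` are made up for illustration. Under these assumptions the single-SNP estimates approximate $Rb$, where $R$ is the LD matrix and $b$ the vector of true effects, so the marginal estimate for the second SNP is negative even though its true effect is positive.

```python
import numpy as np

def marginal_effects(X, y):
    """Single-SNP least-squares slope for each column of X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)

def simulate(n=5000, ld=-0.8, b=(1.0, 0.5), noise_sd=0.1, seed=0):
    """Two unit-variance SNPs with correlation `ld` (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    R = np.array([[1.0, ld], [ld, 1.0]])          # LD (correlation) matrix
    X = rng.multivariate_normal(np.zeros(2), R, size=n)
    b = np.asarray(b, dtype=float)
    y = X @ b + rng.normal(scale=noise_sd, size=n)
    return X, y, R, b

X, y, R, b = simulate()

b_marginal = marginal_effects(X, y)   # one SNP at a time: LD-convolved
b_convolved = R @ b                   # population value: (0.6, -0.3)
# Joint regression on both SNPs recovers the true effects.
b_joint = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]

print("true effects      :", b)
print("marginal estimates:", np.round(b_marginal, 2))
print("R @ b             :", b_convolved)
print("joint estimates   :", np.round(b_joint, 2))
```

The joint fit is only feasible here because there are two SNPs and many samples; with many SNPs in tight LD, a regularized multiple regression is the natural replacement for the plain least-squares step.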

So we attempt to fix both problems with the M&M ASH model, or m&m for short hereafter. We have developed [m&m assuming identity covariance](https://github.com/gaow/mvarbvs/tree/master/analysis/writeup/identity_cov) and [m&m assuming diagonal plus low rank covariance](https://github.com/gaow/mvarbvs/tree/master/analysis/writeup/lr_diag_cov), along with a very first draft [implementation](https://github.com/gaow/mnmashr). These are based on a variational inference framework similar to [varbvs (Carbonetto 2012)](http://stephenslab.uchicago.edu/assets/papers/Carbonetto2012.pdf) and [mrash](https://github.com/stephenslab/mvash) to handle issues with LD, yet use the multivariate notation that fits into the mash framework, at least in setting up the mixture model.

At the same time there is other potentially connected work in the lab, including [rss (Zhu 2016)](http://dx.doi.org/10.1101/042457), which can be extended to the summary statistics version of mrash, and BMASS (Turchin), which can be considered a special case of mash (if we are willing to use a known residual covariance matrix). Building m&m from scratch already involves implementing varbvs / mrash and mash as special cases; adding other special cases to m&m seems overly ambitious, and may result in a monster (rather than master!) algorithm / software that claims to do everything yet excels in nothing (in terms of performance) compared to existing individual pieces that have already been carefully designed, well engineered, extensively tested and properly documented`*`. Therefore we want to adopt a modular design for m&m that harnesses, rather than reinvents, all related Stephens lab work.

`*` *One aspect in which the monster algorithm is sure to suffer is that the mixture components have to be initialized up front, which means they have to be computed from LD-convoluted effect size estimates. The modular approach does not have this problem.*

## Step one: deconvoluted effect size estimation

The key is to verify that the hyper-parameters $\mathbf{\pi}$ obtained by running ash on effect size estimates from mr-ash agree with those estimated by mr-ash itself. If they do, we can generalize: running mash on mr-ash estimates should be equivalent to running the monster version of m&m.
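
As a rough illustration of what estimating $\mathbf{\pi}$ involves, the sketch below fits mixture proportions by EM for a zero-centered scale mixture of normals on a fixed grid of standard deviations. It is a simplified stand-in and not the ash or mr-ash implementation: the function names `fit_mixture_weights` and `total_variation`, the grid, and the simulated data are all assumptions made for illustration. The distance helper is one possible way to quantify whether two estimates of $\mathbf{\pi}$ on the same grid agree.

```python
import numpy as np

def fit_mixture_weights(bhat, se, grid_sd, n_iter=500, tol=1e-8):
    """EM for pi in the model bhat_j ~ sum_k pi_k N(0, se_j^2 + grid_sd_k^2)."""
    bhat = np.asarray(bhat, dtype=float)
    se = np.broadcast_to(np.asarray(se, dtype=float), bhat.shape)
    grid_sd = np.asarray(grid_sd, dtype=float)

    # Log-likelihood of each observation under each component (constant dropped).
    total_sd = np.sqrt(se[:, None] ** 2 + grid_sd[None, :] ** 2)
    log_lik = -0.5 * (bhat[:, None] / total_sd) ** 2 - np.log(total_sd)
    # Rescale each row for numerical stability; responsibilities are unchanged.
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))

    pi = np.full(grid_sd.size, 1.0 / grid_sd.size)
    for _ in range(n_iter):
        weighted = lik * pi                                   # E-step
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        new_pi = resp.mean(axis=0)                            # M-step
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi

def total_variation(p, q):
    """Total variation distance between two weight vectors on the same grid."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

# Simulated check: 80% null effects, 20% drawn from N(0, 1), observed with se = 0.1.
rng = np.random.default_rng(1)
n = 20000
is_null = rng.random(n) < 0.8
b = np.where(is_null, 0.0, rng.normal(scale=1.0, size=n))
se = 0.1
bhat = b + rng.normal(scale=se, size=n)

grid_sd = np.array([0.0, 0.5, 1.0, 2.0])
pi_hat = fit_mixture_weights(bhat, se, grid_sd)
print("estimated pi:", np.round(pi_hat, 3))
```

In the verification described above, one would call such a routine on the mr-ash effect estimates and compare the result with the proportions mr-ash reports, for example with `total_variation`, provided both use the same grid.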

## Step two: multivariate analysis on deconvoluted effect size estimates
Learns typical patterns of sparsity, sharing and correlations among effects from 

## Step three: fine-mapping for eQTL discovery