# Prototype of VEM in M&M ASH model
This notebook prototypes the VEM step of the M&M ASH model (version 2).

## M&M ASH model

We assume the following multivariate, multiple regression model with $N$ samples, $J$ effects and $R$ conditions (and **without covariates, for the time being**)
\begin{align}
\bs{Y}_{N\times R} = \bs{X}_{N \times J}\bs{B}_{J \times R} + \bs{E}_{N \times R},
\end{align}
where
\begin{align}
\bs{E} &\sim \N_{N \times R}(\bs{0}, \bs{I}_N, \bs{\Lambda}^{-1}),\\
\bs{\Lambda} &= \diag(\lambda_1,\ldots,\lambda_R).
\end{align}
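
Simulating from this model makes the noise structure concrete. Because the row covariance of $\bs{E}$ is $\bs{I}_N$ and $\bs{\Lambda}$ is diagonal, all entries of $\bs{E}$ are independent, and column $r$ has variance $1/\lambda_r$. The sketch below uses a hypothetical helper `simulate_mnm` and toy genotypes and effects chosen only for illustration:

```python
import numpy as np

def simulate_mnm(X, B, lam, rng):
    """Draw Y = X B + E with E ~ MN_{N x R}(0, I_N, Lambda^{-1}), Lambda = diag(lam).

    With row covariance I_N and a diagonal column covariance, all entries of E
    are independent and column r has variance 1 / lam[r].
    """
    N, R = X.shape[0], B.shape[1]
    E = rng.standard_normal((N, R)) / np.sqrt(lam)   # divide column r by sqrt(lam[r])
    return X @ B + E

# Toy usage (hypothetical genotypes and effects): 500 samples, 10 SNPs, 2 conditions
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(500, 10)).astype(float)
B = np.zeros((10, 2))
B[3] = [0.5, 0.3]                 # a single SNP with an effect in both conditions
lam = np.array([4.0, 1.0])        # residual precisions lambda_1, lambda_2
Y = simulate_mnm(X, B, lam, rng)
```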

We assume the true effects $\bs{b}_j$ (rows of $\bs{B}$) are iid with a prior that is a mixture of multivariate normals:

$$p(\bs{b}_j) = \sum_{t = 0}^T\pi_t\N_R(\bs{b}_j | \bs{0}, \bs{V}_t),$$

where the $\bs{V}_t$'s are $R \times R$ covariance matrices and the $\pi_t$'s are their mixture weights.
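
Sampling from this prior shows how the mixture works: first draw a component label for each $\bs{b}_j$, then draw $\bs{b}_j$ from that component. In the sketch below (hypothetical helper `sample_mixture_prior`), an all-zero $\bs{V}_t$ is treated as a point mass at zero. The usage lines mirror the test setting used later in this notebook: a null component plus the identity scaled by $0.5^2$ and $1^2$, with weights 0.9, 0.05 and 0.05.

```python
import numpy as np

def sample_mixture_prior(J, pi, V, rng):
    """Draw J iid rows b_j from sum_t pi_t N_R(0, V_t), t = 0, ..., T.

    pi: length T + 1 weights; V: list of T + 1 R x R covariances.
    An all-zero V_t is treated as a point mass at zero.
    Returns B (J x R) and the sampled component label of each row.
    """
    R = V[0].shape[0]
    labels = rng.choice(len(pi), size=J, p=pi)   # membership indicator gamma_j, as an integer
    B = np.zeros((J, R))
    for j, t in enumerate(labels):
        if V[t].any():                           # the point-mass (null) component stays at zero
            B[j] = rng.multivariate_normal(np.zeros(R), V[t])
    return B, labels

rng = np.random.default_rng(0)
V = [np.zeros((2, 2)), 0.25 * np.identity(2), np.identity(2)]
B, labels = sample_mixture_prior(1000, np.array([0.9, 0.05, 0.05]), V, rng)
```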

We place a Gamma prior on each diagonal element of $\bs{\Lambda}$,

$$\lambda_r \overset{iid}{\sim} \gm(\alpha, \beta),$$

and set $\alpha = \beta = 0$ so that it is equivalent to estimating $\bs{\Lambda}$ via maximum likelihood.

We can augment the prior of $\bs{b}_j$ with an indicator vector $\bs{\gamma}_j \in \{0, 1\}^{T+1}$ recording the membership of $\bs{b}_j$ in one of the $T + 1$ mixture components. The densities involved are

\begin{align}
p(\bs{Y},\bs{B},\bs{\Gamma},\bs{\Lambda}) &= p(\bs{Y}|\bs{B}, \bs{\Lambda})p(\bs{B}|\bs{\Gamma})p(\bs{\Gamma})p(\bs{\Lambda}), \\
p(\bs{Y}|\bs{B}, \bs{\Lambda}) &= \N_{N \times R}(\bs{Y}|\bs{X}\bs{B}, \bs{I}_N, \bs{\Lambda}^{-1}), \\
p(\lambda_r|\alpha,\beta) &= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda_r^{\alpha - 1}\exp\{-\beta\lambda_r\}, \\
p(\bs{b}_j|\bs{\gamma}_j) &= \prod_{t = 0}^T\left[\N(\bs{b}_j|\bs{0},\bs{V}_t)\right]^{\gamma_{jt}},\\
p(\bs{\gamma}_j) &= \prod_{t = 0}^{T} \pi_t^{\gamma_{jt}}.
\end{align}
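
The likelihood term $p(\bs{Y}|\bs{B}, \bs{\Lambda})$ factorizes over conditions because the row covariance is $\bs{I}_N$ and $\bs{\Lambda}$ is diagonal, so it equals $\sum_r \left[\frac{N}{2}\log\frac{\lambda_r}{2\pi} - \frac{\lambda_r}{2}\|\bs{y}_r - \bs{X}\bs{B}[, r]\|^2\right]$ on the log scale. A small numpy sketch (hypothetical helper `loglik_matrix_normal`) evaluates it column by column; it can be checked against `scipy.stats.matrix_normal.logpdf`.

```python
import numpy as np

def loglik_matrix_normal(Y, X, B, lam):
    """log N_{N x R}(Y | XB, I_N, Lambda^{-1}) with Lambda = diag(lam).

    Sums, over the R columns, N/2 * log(lam_r / (2 pi)) - lam_r / 2 * ||y_r - X b_r||^2.
    """
    N = Y.shape[0]
    rss = np.sum((Y - X @ B) ** 2, axis=0)   # length-R residual sums of squares
    return np.sum(0.5 * N * np.log(lam / (2 * np.pi)) - 0.5 * lam * rss)
```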

**We assume the $\bs{V}_t$'s and their weights $\pi_t$ are known. In practice we use `mashr` to estimate these quantities and provide them to M&M.**

### Variational approximation to densities

For the posterior of $(\bs{B}, \bs{\Gamma}, \bs{\Lambda})$ we seek a factorized (mean-field) variational approximation

\begin{align}
q(\bs{B}, \bs{\Gamma}, \bs{\Lambda}) = q(\bs{\Lambda})\prod_{j = 1}^{J}q(\bs{b}_j,\bs{\gamma}_j),
\end{align}

so that we can maximize over $q$ the following lower bound on the marginal log-likelihood:

\begin{align}
\log p(\bs{Y}) \geq \mathcal{L}(q) = \int q(\bs{B}, \bs{\Gamma}, \bs{\Lambda}) \log\left\{\frac{p(\bs{Y},\bs{B},\bs{\Gamma},\bs{\Lambda})}{q(\bs{B}, \bs{\Gamma}, \bs{\Lambda})}\right\}\dif\bs{B}\dif\bs{\Gamma}\dif\bs{\Lambda}.
\end{align}

Gao & Wei have previously developed [a version that assumes $\Lambda = I_R$](https://github.com/gaow/mvarbvs/blob/master/writeup/identity_cov/mnmash.pdf). This version generalizes it to a diagonal $\bs{\Lambda}$ with Gamma priors. [David has developed a version](https://www.overleaf.com/11985539jvwgjhrqnrry#/45465793/) that assumes a diagonal-plus-low-rank structure; that model differs somewhat from the one shown here and will be prototyped once this version works.

### Core updates

The complete derivation of the updates is documented elsewhere (in the two write-ups linked above). Here I list the core updates to guide the implementation of the algorithm.

Let $E[\bs{R}_{-j}] := \bs{Y} - \bs{X}\bs{\mu}_{\bs{B}} + \bs{x}_j\bs{\mu}_{\bs{B}[j, ]}^{\intercal}$, then


\begin{align}
\bs{\xi}_j &= E\left[\bs{R}_{-j}\right]^{\intercal}\bs{x}_j\|\bs{x}_j\|^{-2}, \\
\bs{\Sigma}_{jt} &= \left(\bs{V}_t^{-1} + \|\bs{x}_j\|^2\bs{\Lambda}\right)^{-1}, \\
\bs{\mu}_{jt} &= \bs{\Sigma}_{jt}\bs{\Lambda}E\left[\bs{R}_{-j}\right]^{\intercal}\bs{x}_j, \\
w_{jt} &= \frac{\pi_t\N(\bs{\xi}_j|\bs{0}, \bs{V}_t + \bs{\Lambda}^{-1}\|\bs{x}_j\|^{-2})}{\sum_{t' = 0}^T\pi_{t'}\N(\bs{\xi}_j|\bs{0}, \bs{V}_{t'} + \bs{\Lambda}^{-1}\|\bs{x}_j\|^{-2})},\\
\bs{\mu}_{\bs{B}[j, ]}  &= \sum_{t = 0}^T w_{jt}\bs{\mu}_{jt}
\end{align}
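
These updates can be sketched in numpy as one coordinate-ascent pass over the SNPs, refreshing the residual after each $j$. This is a hypothetical helper written for illustration, not the notebook's implementation. It assumes a fixed diagonal $\bs{\Lambda}$ (`lam`), a list `V` of prior covariances that may include an all-zero null matrix, and columns of `X` with nonzero norm. It writes $\bs{\Sigma}_{jt}$ as $\bs{V}_t(\bs{I} + \|\bs{x}_j\|^2\bs{\Lambda}\bs{V}_t)^{-1}$, which equals the expression above whenever $\bs{V}_t$ is invertible and is also defined for the null component.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def update_effects(X, Y, mu_B, V, pi, lam):
    """One pass of the core updates over j = 1, ..., J (hypothetical helper).

    X: N x J, Y: N x R, mu_B: J x R current posterior means,
    V: list of T + 1 prior covariances (an all-zero null matrix is allowed),
    pi: their weights, lam: diagonal of Lambda.
    Returns the updated mu_B and the J x (T + 1) matrix of weights w_jt.
    """
    J, R = mu_B.shape
    mu_B = mu_B.copy()
    W = np.zeros((J, len(V)))
    Lam, Lam_inv = np.diag(lam), np.diag(1.0 / lam)
    resid = Y - X @ mu_B                         # Y - X mu_B, kept up to date
    for j in range(J):
        x = X[:, j]
        xtx = x @ x                              # ||x_j||^2, assumed > 0
        Rj = resid + np.outer(x, mu_B[j])        # E[R_{-j}]
        Rx = Rj.T @ x                            # E[R_{-j}]^T x_j
        xi = Rx / xtx
        logw = np.empty(len(V))
        mus = np.empty((len(V), R))
        for t, Vt in enumerate(V):
            # Sigma_jt = (V_t^{-1} + ||x_j||^2 Lambda)^{-1} = V_t (I + ||x_j||^2 Lambda V_t)^{-1}
            Sigma = Vt @ np.linalg.inv(np.eye(R) + xtx * Lam @ Vt)
            mus[t] = Sigma @ Lam @ Rx
            logw[t] = np.log(pi[t]) + multivariate_normal.logpdf(
                xi, mean=np.zeros(R), cov=Vt + Lam_inv / xtx)
        W[j] = np.exp(logw - logsumexp(logw))   # normalized w_jt
        new_mu = W[j] @ mus                      # mu_B[j, ] = sum_t w_jt mu_jt
        resid -= np.outer(x, new_mu - mu_B[j])
        mu_B[j] = new_mu
    return mu_B, W
```

Iterating such passes, together with an update of $\bs{\Lambda}$, until $\mathcal{L}(q)$ stabilizes would give the full VEM loop; the sketch does not compute $\mathcal{L}(q)$ itself.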

We update until the lower bound $\mathcal{L}(q)$ converges.
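
The $\bs{\Lambda}$ update is not listed above. To make the earlier claim about $\alpha = \beta = 0$ concrete, a standard conjugate calculation under the factorization $q(\bs{\Lambda})\prod_j q(\bs{b}_j, \bs{\gamma}_j)$ (worked out here, not copied from the write-ups) gives, with $\bs{y}_r$ and $\bs{B}[, r]$ denoting the $r$-th columns of $\bs{Y}$ and $\bs{B}$,

\begin{align}
q(\lambda_r) &= \gm\left(\alpha + \frac{N}{2},\ \beta + \frac{1}{2}E_q\left[\|\bs{y}_r - \bs{X}\bs{B}[, r]\|^2\right]\right), \\
E_q[\lambda_r] &= \frac{\alpha + N/2}{\beta + \frac{1}{2}E_q\left[\|\bs{y}_r - \bs{X}\bs{B}[, r]\|^2\right]} = \frac{N}{E_q\left[\|\bs{y}_r - \bs{X}\bs{B}[, r]\|^2\right]} \quad \text{when } \alpha = \beta = 0,
\end{align}

which has the same form as the maximum-likelihood estimate $N / \|\bs{y}_r - \bs{X}\bs{B}[, r]\|^2$, with the residual sum of squares replaced by its expectation under $q$.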

## Initialization

* We fit `mashr` with effects learned from univariate analysis to obtain $\pi_t$ and $\bs{V}_t$.
  * In the first round these effects are "LD-polluted": each marginal effect also picks up the effects of SNPs in linkage disequilibrium with it.
* Use multivariate LASSO to order the columns (SNPs) of $X$ for input. A similar approach has previously been used with `varbvs`.
* We "stack" expression data under multiple conditions and impute missing values with mean imputation or `softImpute` to obtain a complete $Y$ matrix.
  * This version of the model does not impute missing data in $Y$ within its variational updates; this will be added in the next version.
* We regress out covariates beforehand.
  * This is the approach taken by Guan & Stephens (2011) but not by Carbonetto & Stephens (2012).
  * In the next version we will preprocess covariates by "stacking" them and performing a low-rank decomposition / imputation. For example, with 50 tissues the stacked covariates form a block matrix with over 1000 PEER factors in total and non-random missing data; a low-rank approximation should hopefully retain fewer than 50 PEER factors. We will then control for these covariates within the M&M model.
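
A minimal numpy sketch of two of these preprocessing steps, assuming missing values are coded as `NaN` and covariates come as an $N \times K$ matrix `Z`: column-wise mean imputation (the simpler of the two imputation options; `softImpute` is not shown) and regressing an intercept plus `Z` out of every column of $Y$. Both helper names are hypothetical.

```python
import numpy as np

def mean_impute(Y):
    """Replace NaNs in each column of Y by that column's observed mean (returns a copy)."""
    Y = np.array(Y, dtype=float)                 # copy, so the input is left untouched
    col_means = np.nanmean(Y, axis=0)
    rows, cols = np.where(np.isnan(Y))
    Y[rows, cols] = col_means[cols]
    return Y

def regress_out(Y, Z):
    """Residuals of every column of Y after least-squares regression on [1, Z]."""
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])
    coef, *_ = np.linalg.lstsq(Z1, Y, rcond=None)
    return Y - Z1 @ coef
```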
  
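**In this notebook we use a test data set of 2 tissues: Thyroid and Lung. As a first pass we also fix the prior covariances $\bs{V}_t$ to a null (all-zero) matrix plus the identity matrix scaled by $0.5^2$ and $1^2$, with weights $\pi_0 = 0.9$ for the null component and $0.05$ for each of the other two.**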

In [1]:
dat = readRDS('/home/gaow/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds')
str(dat)
attach(dat)
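# Marginal (single-SNP) OLS regression of y on each column of X, with an intercept;
# if covariates Z are given, they are regressed out of y first
univariate_regression = function(X, y, Z = NULL){
    if (!is.null(Z)) {
        y = .lm.fit(Z, y)$residuals
    }
    # standard errors from sigma^2 (X'X)^{-1}, with 2 fitted parameters (intercept and slope)
    calc_stderr = function(X, residuals) { sqrt(diag(sum(residuals^2) / (nrow(X) - 2) * chol2inv(chol(t(X) %*% X)))) }
    output = do.call(rbind, 
                  lapply(seq_len(ncol(X)), function(i) { 
                      g = .lm.fit(cbind(1, X[,i]), y)
                      return(c(coef(g)[2], calc_stderr(cbind(1, X[,i]), g$residuals)[2]))
                  })
                 )
    return(list(betahat = output[,1], sebetahat = output[,2], 
                new_y = y))
}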

betahat = matrix(0, dim(X)[2], dim(Y)[2])
sebetahat = matrix(0, dim(X)[2], dim(Y)[2])
for (i in 1:dim(Y)[2]) {
    res = univariate_regression(X, Y[,i])
    betahat[,i] = res$betahat
    sebetahat[,i] = res$sebetahat
}

str(betahat)
str(sebetahat)


List of 2
 $ Y:'data.frame':	698 obs. of  2 variables:
  ..$ Thyroid: num [1:698] 0.163 0.436 -0.212 0.327 -0.698 ...
  ..$ Lung   : num [1:698] 0.77011 0.77799 -0.65361 0.00672 -0.36792 ...
 $ X: num [1:698, 1:7492] 1 0 0 0 0 1 1 0 1 1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:698] "GTEX-111CU" "GTEX-111FC" "GTEX-111VG" "GTEX-111YS" ...
  .. ..$ : chr [1:7492] "chr1_170185243_G_A_b38" "chr1_170185272_T_C_b38" "chr1_170185405_C_A_b38" "chr1_170185417_G_A_b38" ...
 num [1:7492, 1:2] -0.0341 -0.07 -0.0492 -0.07 -0.07 ...
 num [1:7492, 1:2] 0.0393 0.0706 0.0555 0.0706 0.0706 ...


In [2]:
%get X Y betahat sebetahat --from R
import numpy as np

Loading required package: feather


## Data preview

In [3]:
X

array([[ 1.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  1.,  0., ...,  0.,  1.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.]])

In [4]:
Y = Y.values  # DataFrame.as_matrix() is deprecated and was removed in pandas 1.0

In [5]:
Y

array([[ 0.16348104,  0.77010917],
       [ 0.43588995,  0.77798736],
       [-0.21237311, -0.65361193],
       ..., 
       [ 0.62036618, -0.0035004 ],
       [ 0.00279156, -0.05439095],
       [-0.14650835,  0.29935286]])

## Utility: mash model

In [21]:
#%save /home/gaow/GIT/software/libgaow/py/src/model_mash.py -f
#!/usr/bin/env python3
__author__ = "Gao Wang"
__copyright__ = "Copyright 2016, Stephens lab"
__email__ = "gaow@uchicago.edu"
__license__ = "MIT"
__version__ = "0.1.0"

import numpy as np, scipy as sp
import scipy.linalg  # makes sp.linalg.lapack available
from scipy.stats import norm, multivariate_normal as mvnorm
from collections import OrderedDict

def inv_sympd(m):
    '''
    Inverse of symmetric positive definite
    https://stackoverflow.com/questions/40703042/more-efficient-way-to-invert-a-matrix-knowing-it-is-symmetric-and-positive-semi
    '''
    zz , _ = sp.linalg.lapack.dpotrf(m, False, False)
    inv_m , info = sp.linalg.lapack.dpotri(zz)
    # lapack only returns the upper or lower triangular part
    return np.triu(inv_m) + np.triu(inv_m, k=1).T

def get_svs(s, V):
    '''
    diag(s) @ V @ diag(s)
    '''
    return (s * V.T).T * s

class LikelihoodMASH:
    def __init__(self, data):
        self.J = data.B.shape[0]
        self.R = data.B.shape[1]
        self.P = len(data.U)
        self.data = data
        self.data.lik = {'relative_likelihood' : None,
                         'lfactor': None,
                         'marginal_loglik': None,
                         'loglik': None,
                         'null_loglik': None,
                         'alt_loglik': None}
        self.debug = None

    def compute_log10bf(self):
        self.data.log10bf = (self.data.lik['alt_loglik'] -  self.data.lik['null_loglik']) / np.log(10)

    def compute_relative_likelihood_matrix(self):
        matrix_llik = self._calc_likelihood_matrix_comcov() if self.data.is_common_cov() \
                      else self._calc_likelihood_matrix()
        lfactors = np.vstack(np.amax(matrix_llik, axis = 1))
        self.data.lik['relative_likelihood'] = np.exp(matrix_llik - lfactors)
        self.data.lik['lfactor'] = lfactors

    def _calc_likelihood_matrix(self):
        loglik = np.zeros((self.J, self.P))
        for j in range(self.J):
            sigma_mat = get_svs(self.data.S[j,:], self.data.V)
            try:
                loglik[j,:] = np.array([mvnorm.logpdf(self.data.B[j,:], cov = sigma_mat + self.data.U[p], allow_singular = True) for p in self.data.U])
            except Exception:
                self.debug = {'j': j, 'covs': [sigma_mat + self.data.U[p] for p in self.data.U]}
                raise
        return loglik

    def _calc_likelihood_matrix_comcov(self):
        sigma_mat = get_svs(self.data.S[0,:], self.data.V)
        # one length-J column per mixture component, giving a J by P matrix
        return np.column_stack([mvnorm.logpdf(self.data.B, cov = sigma_mat + self.data.U[p], allow_singular = True) for p in self.data.U])

    def compute_loglik_from_matrix(self, options = ('marginal', 'alt', 'null')):
        '''
        data.lik['relative_likelihood'] is J by P: the first column is null, the rest are alt
        '''
        lik = self.data.lik
        lfactor = np.ravel(lik['lfactor'])               # length J (stored as J by 1)
        log_s = np.sum(np.log(self.data.S), axis = 1)    # length J
        if 'marginal' in options:
            lik['marginal_loglik'] = np.log(lik['relative_likelihood'] @ self.data.pi) + lfactor - log_s
            lik['loglik'] = np.sum(lik['marginal_loglik'])
        if 'alt' in options:
            lik['alt_loglik'] = np.log(lik['relative_likelihood'][:,1:] @ (self.data.pi[1:] / (1 - self.data.pi[0]))) + lfactor - log_s
        if 'null' in options:
            lik['null_loglik'] = np.log(lik['relative_likelihood'][:,0]) + lfactor - log_s

    @classmethod
    def apply(cls, data):
        # used by MASHData.calc_likelihood()
        obj = cls(data)
        obj.compute_relative_likelihood_matrix()
        obj.compute_loglik_from_matrix()
        obj.compute_log10bf()

class PosteriorMASH:
    def __init__(self, data):
        '''
        // @param b_mat J by R
        // @param s_mat J by R
        // @param v_mat R by R
        // @param U_cube list of prior covariance matrices, for each mixture component P by R by R
        '''
        self.J = data.B.shape[0]
        self.R = data.B.shape[1]
        self.P = len(data.U)
        self.data = data
        self.data.post_mean_mat = np.zeros((self.R, self.J))
        self.data.post_mean2_mat = np.zeros((self.R, self.J))
        self.data.neg_prob_mat = np.zeros((self.R, self.J))
        self.data.zero_prob_mat = np.zeros((self.R, self.J))

    def compute_posterior_weights(self):
        d = (self.data.pi * self.data.lik['relative_likelihood'])
        self.data.posterior_weights = (d.T / np.sum(d, axis = 1))

    def compute_posterior(self):
        for j in range(self.J):
            Vinv_mat = inv_sympd(get_svs(self.data.S[j,:], self.data.V))
            mu1_mat = np.zeros((self.R, self.P))
            mu2_mat = np.zeros((self.R, self.P))
            zero_mat = np.zeros((self.R, self.P))
            neg_mat = np.zeros((self.R, self.P))
            for p, name in enumerate(self.data.U.keys()):
                U1_mat = self.get_posterior_cov(Vinv_mat, self.data.U[name])
                mu1_mat[:,p] = self.get_posterior_mean_vec(self.data.B[j,:], Vinv_mat, U1_mat)
                sigma_vec = np.sqrt(np.diag(U1_mat))
                null_cond = (sigma_vec == 0)
                mu2_mat[:,p] = np.square(mu1_mat[:,p]) + np.diag(U1_mat)
                if not null_cond.all():
                    # P(b < 0) under the component posterior N(mu, sigma^2)
                    neg_mat[np.invert(null_cond),p] = norm.cdf(0, loc=mu1_mat[np.invert(null_cond),p], scale=sigma_vec[np.invert(null_cond)])
                zero_mat[null_cond,p] = 1.0
            self.data.post_mean_mat[:,j] = mu1_mat @ self.data.posterior_weights[:,j]
            self.data.post_mean2_mat[:,j] = mu2_mat @ self.data.posterior_weights[:,j]
            self.data.neg_prob_mat[:,j] = neg_mat @ self.data.posterior_weights[:,j]
            self.data.zero_prob_mat[:,j] = zero_mat @ self.data.posterior_weights[:,j]

    def compute_posterior_comcov(self):
        Vinv_mat = inv_sympd(get_svs(self.data.S[0,:], self.data.V))
        for p, name in enumerate(self.data.U.keys()):
            U1_mat = self.get_posterior_cov(Vinv_mat, self.data.U[name])
            mu1_mat = self.get_posterior_mean_mat(self.data.B, Vinv_mat, U1_mat)  # R by J
            sigma_vec = np.sqrt(np.diag(U1_mat))
            null_cond = (sigma_vec == 0)
            sigma_mat = np.repeat(sigma_vec[:,None], self.J, axis = 1)  # R by J
            neg_mat = np.zeros((self.R, self.J))
            zero_mat = np.zeros((self.R, self.J))
            if not null_cond.all():
                # P(b < 0) under the component posterior N(mu, sigma^2)
                neg_mat[~null_cond,:] = norm.cdf(0, loc = mu1_mat[~null_cond,:], scale = sigma_mat[~null_cond,:])
            mu2_mat = np.square(mu1_mat) + np.diag(U1_mat)[:,None]
            zero_mat[null_cond,:] = 1.0
            w = self.data.posterior_weights[p,:]  # length J
            self.data.post_mean_mat += w * mu1_mat
            self.data.post_mean2_mat += w * mu2_mat
            self.data.neg_prob_mat += w * neg_mat
            self.data.zero_prob_mat += w * zero_mat

    @staticmethod
    def get_posterior_mean_vec(B, V_inv, U):
        return U @ (V_inv @ B)

    @staticmethod
    def get_posterior_mean_mat(B, V_inv, U):
        # B is J by R; returns the R by J matrix whose column j is U @ V_inv @ B[j,:]
        return U @ (V_inv @ B.T)

    @staticmethod
    def get_posterior_cov(V_inv, U):
        # posterior covariance (U^{-1} + V_inv)^{-1} = U (V_inv U + I)^{-1};
        # V_inv @ U is not symmetric in general, so use a general inverse rather than inv_sympd
        return U @ np.linalg.inv(V_inv @ U + np.identity(U.shape[0]))

    @classmethod
    def apply(cls, data):
        obj = cls(data)
        obj.compute_posterior_weights()
        if data.is_common_cov():
            obj.compute_posterior_comcov()
        else:
            obj.compute_posterior()

class PriorMASH:
    def __init__(self, data):
        self.data = data
        self.R = data.B.shape[1]

    def expand_cov(self, use_pointmass = True):
        def product(x,y):
            for item in y:
                yield x*item
        res = OrderedDict()
        if use_pointmass:
            res['null'] = np.zeros((self.R, self.R))
        res.update(OrderedDict(sum([[(f"{p}.{i+1}", g) for i, g in enumerate(product(self.data.U[p], np.square(self.data.grid)))] for p in self.data.U], [])))
        self.data.U = res

## Utility: regression data

In [20]:
#%save /home/gaow/GIT/software/libgaow/py/src/regression_data.py -f
#!/usr/bin/env python3
__author__ = "Gao Wang"
__copyright__ = "Copyright 2016, Stephens lab"
__email__ = "gaow@uchicago.edu"
__license__ = "MIT"
__version__ = "0.1.0"

#from model_mash import PriorMASH, LikelihoodMASH, PosteriorMASH

class RegressionData:
    def __init__(self, X = None, Y = None, Z = None, B = None, S = None):
        self.X = X
        self.Y = Y
        self.Z = Z
        self.B = B
        self.S = S
        self.lik = None
        self.log10bf = None

    def set_prior(self):
        pass

    def calc_likelihood(self):
        pass

    def calc_posterior(self):
        pass

    def calc_bf(self):
        pass

class MASHData(RegressionData):
    def __init__(self, X = None, Y = None, Z = None, B = None, S = None):
        RegressionData.__init__(self, X, Y, Z, B, S)
        self.post_mean_mat = None
        self.post_mean2_mat = None
        self.neg_prob_mat = None
        self.zero_prob_mat = None
        self._is_common_cov = None
        self.V = None
        self.U = None
        self.pi = None
        self.posterior_weights = None
        self.grid = None

    def is_common_cov(self):
        if self._is_common_cov is None and self.S is not None:
            self._is_common_cov = (self.S == self.S[0,:]).all()  # all rows of S identical
        return self._is_common_cov

    def calc_posterior(self):
        PosteriorMASH.apply(self)

    def calc_likelihood(self):
        LikelihoodMASH.apply(self)

## Main function calls

In [8]:
data = MASHData(X = X, Y = Y, B = betahat, S = sebetahat)
data.U = {'identity': np.identity(2)}
data.V = np.identity(2)
data.pi = np.array([0.9, 0.05, 0.05])
data.grid = [0.5, 1]
prior = PriorMASH(data)
prior.expand_cov()

In [12]:
lik = LikelihoodMASH(data)
lik.compute_relative_likelihood_matrix()

In [13]:
lik.compute_loglik_from_matrix(options = ['alt', 'null'])
lik.compute_log10bf()

In [14]:
import warnings
warnings.filterwarnings("error")
PosteriorMASH.apply(data)

### multivariate normal issue
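FIXME: for an all-zero covariance matrix, `mvnorm.logpdf` with `allow_singular = True` returns 0 at the origin, but a point mass at the origin has infinite density there, so this should be infinity.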

In [17]:
mvnorm.logpdf(np.array([0,0]), cov = np.array([[0,0],[0,0]]), allow_singular = True)

-0.0
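
One way to resolve the FIXME is to special-case the degenerate covariance before calling scipy. The sketch below assumes an all-zero covariance always means a point mass at the mean; `logpdf_allow_pointmass` is a hypothetical helper, not part of scipy. Note that `compute_relative_likelihood_matrix` subtracts the row maximum, so an infinite entry would turn into `nan` (inf minus inf) there and would need separate handling.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvnorm

def logpdf_allow_pointmass(x, mean, cov):
    """Log density that treats an all-zero covariance as a point mass at `mean`.

    A point mass has no finite density: return +inf at the mass point and -inf
    elsewhere. Any other covariance is passed to scipy with allow_singular=True.
    """
    x, mean, cov = np.asarray(x, float), np.asarray(mean, float), np.asarray(cov, float)
    if not cov.any():
        return np.inf if np.array_equal(x, mean) else -np.inf
    return mvnorm.logpdf(x, mean=mean, cov=cov, allow_singular=True)
```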

## VEM updates
The core function:

In [23]:
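def single_snp_multivariate(data, Y):
    '''
    single snp bayes regression of Y on each column of X
    under MASH model
    Y is N by R matrix
    X is N by J matrix
    Assume residual variance is identity for now
    '''
    # MASHData has no reset() method; as an assumption, recompute here the marginal
    # OLS estimates (with intercept) of each column of Y on each column of X
    Xc = data.X - data.X.mean(axis = 0)
    Yc = Y - Y.mean(axis = 0)
    xtx = np.sum(np.square(Xc), axis = 0)[:,None]   # J by 1, assumed > 0
    data.Y = Y
    data.B = (Xc.T @ Yc) / xtx                       # J by R
    rss = np.sum(np.square(Yc), axis = 0) - np.square(data.B) * xtx
    data.S = np.sqrt(rss / (Y.shape[0] - 2) / xtx)   # J by R
    data._is_common_cov = None
    lik = LikelihoodMASH(data)
    lik.compute_relative_likelihood_matrix()
    lik.compute_loglik_from_matrix(options = ['alt', 'null'])
    lik.compute_log10bf()
    PosteriorMASH.apply(data)
    # alpha normalizes Bayes factors (not log Bayes factors) across SNPs;
    # subtract the maximum on the log10 scale to avoid overflow
    bf = 10 ** (data.log10bf - np.max(data.log10bf))
    return {'alpha': bf / np.sum(bf), 'mu': data.post_mean_mat, 's2': data.post_mean2_mat - np.square(data.post_mean_mat)}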