# Idea

We would like to examine the distribution of the effect of gene expression on a trait ($\beta_{gene}$) across all genes. As a first step, let's ignore the possibility that a variant can also have an un-mediated effect.

$$\begin{align*}
    \hat\beta_{gwas} &= \beta_{gene} \hat\beta_{eqtl} + \epsilon \\
    \epsilon &\sim \mathcal{N}(0, \sigma^2) \\
    \beta_{gene} &\sim g
\end{align*}$$

Ideally, we want to estimate $g$ without any assumption on its shape (a non-parametric method), but let's keep things simple here and assume that $g$ is a mixture of zero-mean Gaussians. Namely,

$$\begin{align*}
    g = \sum_{k = 1}^K \pi_k \mathcal{N}(0, \sigma_k^2)
\end{align*}$$

Then the problem of solving for $g$ reduces to estimating $\pi_k$ and $\sigma_k^2$ (together with the noise variance $\sigma^2$) from the data $\hat\beta_{gwas}, \hat\beta_{eqtl}$.
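To make the generative model concrete, here is a minimal simulation sketch. The function name `simulate`, the standard-normal draw for $\hat\beta_{eqtl}$, and all parameter values are assumptions chosen for illustration; they are not part of the model above.

```python
import numpy as np

def simulate(n_genes, pi, sigma_k, sigma, seed=0):
    """Draw (beta_gwas_hat, beta_eqtl_hat, z) from the mediation-only model."""
    rng = np.random.default_rng(seed)
    pi = np.asarray(pi, dtype=float)
    sigma_k = np.asarray(sigma_k, dtype=float)
    # Component label Z_i per gene, then beta_gene_i ~ N(0, sigma_{Z_i}^2).
    z = rng.choice(len(pi), size=n_genes, p=pi)
    beta_gene = rng.normal(0.0, sigma_k[z])
    # Assumption: eQTL effect estimates drawn from N(0, 1) for illustration only.
    beta_eqtl = rng.normal(0.0, 1.0, size=n_genes)
    # Noise term epsilon ~ N(0, sigma^2).
    eps = rng.normal(0.0, sigma, size=n_genes)
    beta_gwas = beta_gene * beta_eqtl + eps
    return beta_gwas, beta_eqtl, z

# Illustrative values: two components, most genes with a small effect.
beta_gwas, beta_eqtl, z = simulate(5000, pi=[0.8, 0.2], sigma_k=[0.1, 1.0], sigma=0.5)
```

Simulated data like this is useful for checking whether an estimation procedure recovers known $\pi_k$ and $\sigma_k^2$.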

## EM algorithm

For a simple random effect model, the likelihood is obtained by marginalizing out $\beta_{gene}$. Namely, for the $K = 1$ case,

$$\begin{align*}
    \hat\beta_{gwas} &\sim \mathcal{N}(0, \hat\beta_{eqtl}^2 \sigma_1^2 + \sigma^2)
\end{align*}$$

Hence, by introducing a hidden variable $Z$ indicating which mixture component contributes the variance term, the complete-data likelihood is

$$\begin{align*}
    \hat\beta_{gwas} | Z = k &\sim \mathcal{N}(0, \hat\beta_{eqtl}^2 \sigma_k^2 + \sigma^2) \\
    Z &\sim \mathrm{Multinomial}(1; \pi_1, \dots, \pi_K)
\end{align*}$$
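Summing the component densities over $Z$ gives the observed-data log-likelihood, which is handy for monitoring any fitting procedure. A minimal sketch follows; the function names and the array layout (components along rows, genes along columns) are my own choices.

```python
import numpy as np

def log_normal_pdf(y, var):
    """Log density of N(0, var) evaluated at y (elementwise)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + y ** 2 / var)

def mixture_loglik(beta_gwas, beta_eqtl, pi, sigma2_k, sigma2):
    """Observed-data log-likelihood: sum_i log sum_k pi_k N(y_i; 0, tau_ki)."""
    pi = np.asarray(pi, dtype=float)
    sigma2_k = np.asarray(sigma2_k, dtype=float)
    # tau[k, i] = beta_eqtl_i^2 * sigma_k^2 + sigma^2
    tau = np.outer(sigma2_k, beta_eqtl ** 2) + sigma2
    log_l = log_normal_pdf(beta_gwas[None, :], tau) + np.log(pi)[:, None]
    # Log-sum-exp over components for numerical stability.
    m = log_l.max(axis=0)
    return float(np.sum(m + np.log(np.exp(log_l - m).sum(axis=0))))
```

With $K = 1$ and $\pi_1 = 1$ this reduces to the single-Gaussian marginal likelihood given earlier.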

From the complete likelihood, we can obtain the EM algorithm. 

- **E-step**:

$$\begin{align*}
    L_{ki}^{(t)} &= \mathcal{N}(\hat\beta_{gwas, i}; 0, \hat\beta_{eqtl, i}^2 (\sigma_k^{(t)})^2 + (\sigma^{(t)})^2) \\
    w_{ki}^{(t)} &= \Pr(Z_i = k | Data_i, \theta^{(t)}) \\
    &= \frac{L_{ki}^{(t)} \pi_k^{(t)}}{\sum_{k'} L_{k'i}^{(t)} \pi_{k'}^{(t)}}
\end{align*}$$

- **M-step**:

$$\begin{align*}
    \pi_{k}^{(t+1)} &= \frac{\sum_i {w_{ki}^{(t)}}}{\sum_{k'} \sum_i {w_{k'i}^{(t)}}} \\
    \sigma_k^{(t+1)}, \sigma^{(t+1)} &= \arg\max_{\sigma_k, \sigma} \sum_k \sum_i w_{ki}^{(t)} \left(\frac{1}{2}\log\frac{1}{\hat\beta_{eqtl, i}^2\sigma_k^2 + \sigma^2} - \frac{1}{2} \frac{\hat\beta_{gwas, i}^2}{\hat\beta_{eqtl, i}^2\sigma_k^2 + \sigma^2}\right)
\end{align*}$$
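The E-step and the closed-form part of the M-step (the update of $\pi_k$) can be sketched as follows. Function names and the `(K, n)` layout of the responsibility matrix `w` are assumptions of this sketch; the variance update is left out because it has no closed form.

```python
import numpy as np

def e_step(beta_gwas, beta_eqtl, pi, sigma2_k, sigma2):
    """Responsibilities w[k, i] = Pr(Z_i = k | data_i, current parameters)."""
    pi = np.asarray(pi, dtype=float)
    sigma2_k = np.asarray(sigma2_k, dtype=float)
    # tau[k, i] = beta_eqtl_i^2 * sigma_k^2 + sigma^2
    tau = np.outer(sigma2_k, beta_eqtl ** 2) + sigma2
    # log(L_ki * pi_k), computed in log space to avoid underflow.
    log_l = -0.5 * (np.log(2.0 * np.pi * tau) + beta_gwas ** 2 / tau)
    log_l = log_l + np.log(pi)[:, None]
    log_l -= log_l.max(axis=0)
    w = np.exp(log_l)
    return w / w.sum(axis=0)

def update_pi(w):
    """M-step for the mixture proportions: normalized row sums of w."""
    return w.sum(axis=1) / w.sum()
```

Since each column of `w` sums to one, the denominator `w.sum()` equals the number of genes.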

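To solve the last optimization problem, I plan to use gradient descent (on the negative objective) as a first try. To make the problem unconstrained ($\sigma^2$ and $\sigma_k^2$ must be positive), let $x = \log(\sigma^2), x_k = \log(\sigma_k^2)$. The derivatives below are those of twice the M-step objective; the common factor $\frac{1}{2}$ is dropped, which does not change the maximizer.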

$$\begin{align*}
    \tau_{ki} &= \hat\beta_{eqtl, i}^{2} \sigma_k^{2} + \sigma^{2} \\
    f_1 &= -\frac{1}{\tau_{ki}} + \frac{\hat\beta_{gwas, i}^2}{\tau_{ki}^2} \\
    \frac{\partial}{\partial x_k} &= e^{x_k} \sum_i \hat\beta_{eqtl, i}^{2} w_{ki}^{(t)} f_1 \\
    \frac{\partial}{\partial x} &= e^{x} \sum_i \sum_k w_{ki}^{(t)} f_1
\end{align*}$$
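A sketch of these gradient formulas in NumPy, with an objective function that can be used for a finite-difference check. The objective here is twice the M-step objective (no factor $\frac{1}{2}$), so that it matches the derivatives as written; the random test data and function names are assumptions for illustration.

```python
import numpy as np

def objective(x_k, x, w, beta_gwas, beta_eqtl):
    """sum_k sum_i w_ki * (-log tau_ki - beta_gwas_i^2 / tau_ki)."""
    tau = np.outer(np.exp(x_k), beta_eqtl ** 2) + np.exp(x)
    return np.sum(w * (-np.log(tau) - beta_gwas ** 2 / tau))

def gradient(x_k, x, w, beta_gwas, beta_eqtl):
    """Gradient with respect to x_k = log(sigma_k^2) and x = log(sigma^2)."""
    tau = np.outer(np.exp(x_k), beta_eqtl ** 2) + np.exp(x)
    f1 = -1.0 / tau + beta_gwas ** 2 / tau ** 2
    g_k = np.exp(x_k) * np.sum(beta_eqtl ** 2 * w * f1, axis=1)
    g_x = np.exp(x) * np.sum(w * f1)
    return g_k, g_x

# Arbitrary inputs, only for exercising the functions.
rng = np.random.default_rng(0)
n, K = 200, 2
beta_eqtl = rng.normal(size=n)
beta_gwas = rng.normal(size=n)
w = rng.dirichlet(np.ones(K), size=n).T  # shape (K, n), columns sum to 1
x_k, x = np.log([0.1, 1.0]), np.log(0.25)
g_k, g_x = gradient(x_k, x, w, beta_gwas, beta_eqtl)
```

Comparing `gradient` against central differences of `objective` is a cheap way to catch algebra mistakes before running the optimizer.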

It turns out that the gradient descent solution is not close enough to the optimum, in the sense that the gradient is still far from zero. I therefore also used the Newton-Raphson method to refine the solution locally.

$$\begin{align*}
    f_2 &= \frac{1}{\tau_{ki}^2} - \frac{2\hat\beta_{gwas, i}^2}{\tau_{ki}^3} \\
    \frac{\partial^2}{\partial x_k \partial x_k} &= e^{x_k} \sum_i \hat\beta_{eqtl, i}^{2} w_{ki}^{(t)} f_1 + e^{2 x_k} \sum_i \hat\beta_{eqtl, i}^{4} w_{ki}^{(t)} f_2\\
    \frac{\partial^2}{\partial x \partial x} &= e^{x} \sum_i \sum_k w_{ki}^{(t)} f_1 + e^{2x} \sum_i \sum_k w_{ki}^{(t)} f_2 \\
    \frac{\partial^2}{\partial x \partial x_k} &= e^{x + x_k} \sum_i \hat\beta_{eqtl, i}^2 w_{ki}^{(t)} f_2 \\
    \frac{\partial^2}{\partial x_k \partial x_{k'}} &= 0 \quad (k \ne k')
\end{align*}$$
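A sketch of the Hessian and a plain Newton-Raphson update in the log-variance parameterization, derived by the chain rule with $\partial \tau_{ki} / \partial x_k = \hat\beta_{eqtl, i}^2 e^{x_k}$ and $\partial \tau_{ki} / \partial x = e^{x}$. The parameter vector `theta = (x_1, ..., x_K, x)`, the function names, and the test inputs are assumptions of this sketch. The step has no line search or check that the Hessian is negative definite, so it is only meant for refinement near a maximum.

```python
import numpy as np

def grad_hess(theta, w, beta_gwas, beta_eqtl):
    """Gradient and Hessian of sum_k sum_i w_ki (-log tau_ki - y_i^2 / tau_ki)."""
    x_k, x = theta[:-1], theta[-1]
    K = len(x_k)
    b2 = beta_eqtl ** 2
    tau = np.outer(np.exp(x_k), b2) + np.exp(x)
    f1 = -1.0 / tau + beta_gwas ** 2 / tau ** 2
    f2 = 1.0 / tau ** 2 - 2.0 * beta_gwas ** 2 / tau ** 3
    grad = np.empty(K + 1)
    grad[:K] = np.exp(x_k) * np.sum(b2 * w * f1, axis=1)
    grad[K] = np.exp(x) * np.sum(w * f1)
    hess = np.zeros((K + 1, K + 1))
    idx = np.arange(K)
    # Diagonal entries: the first-order term equals the gradient component.
    hess[idx, idx] = grad[:K] + np.exp(2 * x_k) * np.sum(b2 ** 2 * w * f2, axis=1)
    hess[K, K] = grad[K] + np.exp(2 * x) * np.sum(w * f2)
    # Cross terms between x and each x_k; x_k and x_k' (k != k') do not interact.
    cross = np.exp(x + x_k) * np.sum(b2 * w * f2, axis=1)
    hess[K, :K] = cross
    hess[:K, K] = cross
    return grad, hess

def newton_step(theta, w, beta_gwas, beta_eqtl):
    """One undamped Newton-Raphson update: theta - H^{-1} g."""
    grad, hess = grad_hess(theta, w, beta_gwas, beta_eqtl)
    return theta - np.linalg.solve(hess, grad)

# Arbitrary inputs, only for exercising the functions.
rng = np.random.default_rng(0)
n, K = 200, 2
beta_eqtl = rng.normal(size=n)
beta_gwas = rng.normal(size=n)
w = rng.dirichlet(np.ones(K), size=n).T  # shape (K, n)
theta = np.log(np.array([0.1, 1.0, 0.25]))
grad, hess = grad_hess(theta, w, beta_gwas, beta_eqtl)
```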


## Other strategies? Estimating mixture proportions with grid search

Another option I can imagine is to fix $\sigma_k^2$ in the mixture model and estimate only the mixture proportions. In this setting we might obtain an analytical M-step, or I might even find an out-of-the-box solver for the task ([`ashr`](https://github.com/stephens999/ashr)?).

If I naively replace the above mixture model with one with fixed $\sigma_k^2$, the M-step still lacks an analytical solution, because $\sigma^2$ remains to be estimated. The only incremental improvement is that the optimization problem (root-finding problem) reduces from $K+1$ dimensions to one dimension. So I would not bother to implement it unless my previous solver turns out to be too unstable to rely on.
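For reference, a minimal sketch of what the fixed-grid variant could look like: $\pi_k$ is updated in closed form, and $\sigma^2$ is updated by a crude one-dimensional search over candidate values around the current estimate. This is a hand-written illustration and does not use `ashr`; the grid values, the candidate range, the simulated data, and the function names are all assumptions. Because the current $\sigma^2$ is always among the candidates, each iteration cannot decrease the log-likelihood (a generalized EM step).

```python
import numpy as np

def loglik_and_resp(y, b2, pi, s2_grid, s2):
    """Observed-data log-likelihood and responsibilities w[k, i]."""
    tau = np.outer(s2_grid, b2) + s2
    log_l = -0.5 * (np.log(2.0 * np.pi * tau) + y ** 2 / tau) + np.log(pi)[:, None]
    m = log_l.max(axis=0)
    log_norm = m + np.log(np.exp(log_l - m).sum(axis=0))
    return float(log_norm.sum()), np.exp(log_l - log_norm)

def fit_fixed_grid(y, b, s2_grid, s2_init, n_iter=50):
    """Generalized EM with fixed sigma_k^2 grid; estimates pi and sigma^2."""
    b2 = b ** 2
    base = np.outer(s2_grid, b2)  # beta_eqtl_i^2 * sigma_k^2, fixed throughout
    pi = np.full(len(s2_grid), 1.0 / len(s2_grid))
    s2 = float(s2_init)
    history = []
    for _ in range(n_iter):
        ll, w = loglik_and_resp(y, b2, pi, s2_grid, s2)
        history.append(ll)
        pi = w.mean(axis=1)  # closed-form update of the mixture proportions
        # 1-D search over sigma^2; the current value is kept as a candidate.
        cand = np.append(s2 * np.exp(np.linspace(-1.0, 1.0, 41)), s2)
        q = [np.sum(w * (-np.log(base + c) - y ** 2 / (base + c))) for c in cand]
        s2 = float(cand[int(np.argmax(q))])
    return pi, s2, history

# Simulated data with illustrative parameter values.
rng = np.random.default_rng(1)
n = 2000
b = rng.normal(size=n)
z = rng.choice(2, size=n, p=[0.7, 0.3])
beta_gene = rng.normal(0.0, np.array([0.1, 1.0])[z])
y = beta_gene * b + rng.normal(0.0, 0.5, size=n)
pi_hat, s2_hat, history = fit_fixed_grid(y, b, np.array([0.01, 0.1, 1.0, 4.0]), s2_init=1.0)
```

A proper one-dimensional root finder or Newton update on $x = \log(\sigma^2)$ could replace the candidate search if more precision is needed.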