## BIOSTAT 257: Homework 5
### Joanna Boland

Again we continue with the linear mixed effects model (LMM)
$$
    \mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \boldsymbol{\gamma}_i + \boldsymbol{\epsilon}_i, \quad i=1,\ldots,m,
$$
where   
- $\mathbf{Y}_i \in \mathbb{R}^{n_i}$ is the response vector of the $i$-th individual,  
- $\mathbf{X}_i \in \mathbb{R}^{n_i \times p}$ is the fixed effects predictor matrix of the $i$-th individual,  
- $\mathbf{Z}_i \in \mathbb{R}^{n_i \times q}$ is the random effects predictor matrix of the $i$-th individual,  
- $\boldsymbol{\epsilon}_i \in \mathbb{R}^{n_i}$ are multivariate normal $N(\mathbf{0}_{n_i},\sigma^2 \mathbf{I}_{n_i})$,  
- $\boldsymbol{\beta} \in \mathbb{R}^p$ are fixed effects, and  
- $\boldsymbol{\gamma}_i \in \mathbb{R}^q$ are random effects assumed to be $N(\mathbf{0}_q, \boldsymbol{\Sigma}_{q \times q})$ independent of $\boldsymbol{\epsilon}_i$.

The log-likelihood of the $i$-th datum $(\mathbf{y}_i, \mathbf{X}_i, \mathbf{Z}_i)$ is 
$$
    \ell_i(\boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2) = - \frac{n_i}{2} \log (2\pi) - \frac{1}{2} \log \det \boldsymbol{\Omega}_i - \frac{1}{2} (\mathbf{y}_i - \mathbf{X}_i \boldsymbol{\beta})^T \boldsymbol{\Omega}_i^{-1} (\mathbf{y}_i - \mathbf{X}_i \boldsymbol{\beta}),
$$
where
$$
    \boldsymbol{\Omega}_i = \sigma^2 \mathbf{I}_{n_i} + \mathbf{Z}_i \boldsymbol{\Sigma} \mathbf{Z}_i^T.
$$
Given $m$ independent data points $(\mathbf{y}_i, \mathbf{X}_i, \mathbf{Z}_i)$, $i=1,\ldots,m$, we seek the maximum likelihood estimate (MLE) by maximizing the log-likelihood
$$
\ell(\boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2) = \sum_{i=1}^m \ell_i(\boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2).
$$
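
As a reference point, the single-datum log-likelihood above can be evaluated naively by forming $\boldsymbol{\Omega}_i$ explicitly. The following is only a minimal sketch: the function name `loglik_datum` and the simulated inputs are hypothetical and not part of the assignment code, and forming the $n_i \times n_i$ matrix costs $O(n_i^3)$, so it is meant for checking rather than for performance.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not from the assignment): evaluates
# ℓ_i(β, Σ, σ²) = -n/2 log(2π) - 1/2 log det Ω - 1/2 rᵀ Ω⁻¹ r,
# where Ω = σ² I + Z Σ Zᵀ and r = y - X β.
function loglik_datum(y::AbstractVector, X::AbstractMatrix, Z::AbstractMatrix,
                      β::AbstractVector, Σ::AbstractMatrix, σ²::Real)
    n = length(y)
    Ω = Symmetric(σ² * I + Z * Σ * Z')   # marginal covariance of y
    Ωchol = cholesky(Ω)                   # Ω is positive definite when σ² > 0
    r = y - X * β                         # residual after removing fixed effects
    return -n / 2 * log(2π) - logdet(Ωchol) / 2 - dot(r, Ωchol \ r) / 2
end

# Simulated inputs, purely illustrative
Random.seed!(257)
n, p, q = 5, 2, 2
X = randn(n, p)
Z = randn(n, q)
β = [1.0, -0.5]
Σ = [1.0 0.2; 0.2 0.5]
σ² = 1.5
y = X * β + Z * randn(q) + sqrt(σ²) * randn(n)

println("log-likelihood of one datum: ", loglik_datum(y, X, Z, β, Σ, σ²))
```

Summing `loglik_datum` over the $m$ data points gives the full objective $\ell$.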

In HW4, we used the nonlinear programming (NLP) approach (Newton-type algorithms) for optimization. In this assignment, we derive and implement an expectation-maximization (EM) algorithm for the same problem.

```julia
# load necessary packages; make sure to install them first
using BenchmarkTools, Distributions, LinearAlgebra, Random, Revise
```

### Question 1: Refresher on Normal-Normal Model

Assume the conditional distribution
$$
\mathbf{y} \mid \boldsymbol{\gamma} \sim N(\mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\gamma}, \sigma^2 \mathbf{I}_n)
$$
and the prior distribution
$$
\boldsymbol{\gamma} \sim N(\mathbf{0}_q, \boldsymbol{\Sigma}).
$$
By Bayes' theorem, the posterior distribution is
\begin{eqnarray*}
f(\boldsymbol{\gamma} \mid \mathbf{y}) &=& \frac{f(\mathbf{y} \mid \boldsymbol{\gamma}) \times f(\boldsymbol{\gamma})}{f(\mathbf{y})},
\end{eqnarray*}
where $f$ denotes the corresponding density.

Note that, as functions of $\boldsymbol{\gamma}$,
\begin{eqnarray*}
f(\boldsymbol{\gamma}) &\propto& \text{exp}\Bigg(-\frac{1}{2}\boldsymbol{\gamma}^T\boldsymbol{\Sigma}^{-1} \boldsymbol{\gamma}\Bigg), \\ 
f(\mathbf{y} \mid \boldsymbol{\gamma}) &\propto& \text{exp}\Bigg(-\frac{1}{2}(\mathbf{y} - \mathbf{X} \boldsymbol{\beta} - \mathbf{Z} \boldsymbol{\gamma})^T(\sigma^2 \mathbf{I}_n)^{-1} (\mathbf{y} - \mathbf{X} \boldsymbol{\beta} - \mathbf{Z} \boldsymbol{\gamma})\Bigg) \\
f(\mathbf{y} \mid \boldsymbol{\gamma}) &\propto& \text{exp}\Bigg(-\frac{1}{2}\sigma^{-2}\boldsymbol{\gamma}^T\mathbf{Z}^T\mathbf{Z}\boldsymbol{\gamma} + \sigma^{-2}\boldsymbol{\gamma}^T\mathbf{Z}^T(\mathbf{y} - \mathbf{X} \boldsymbol{\beta})\Bigg), \\
f(\boldsymbol{\gamma} \mid \mathbf{y}) &\propto& f(\mathbf{y} \mid \boldsymbol{\gamma}) f(\boldsymbol{\gamma}) \\
f(\boldsymbol{\gamma} \mid \mathbf{y}) &\propto& \text{exp}\Bigg(-\frac{1}{2}\boldsymbol{\gamma}^T\boldsymbol{\Sigma}^{-1} \boldsymbol{\gamma} -\frac{1}{2}\sigma^{-2}\boldsymbol{\gamma}^T\mathbf{Z}^T\mathbf{Z}\boldsymbol{\gamma} + \sigma^{-2}\boldsymbol{\gamma}^T\mathbf{Z}^T(\mathbf{y} - \mathbf{X} \boldsymbol{\beta})\Bigg) \\
f(\boldsymbol{\gamma} \mid \mathbf{y}) &\propto& \text{exp}\Bigg(-\frac{1}{2}\boldsymbol{\gamma}^T(\boldsymbol{\Sigma}^{-1} + \sigma^{-2}\mathbf{Z}^T\mathbf{Z}) \boldsymbol{\gamma} + \sigma^{-2}\boldsymbol{\gamma}^T\mathbf{Z}^T(\mathbf{y} - \mathbf{X} \boldsymbol{\beta})\Bigg).
\end{eqnarray*}

Therefore, using properties of normal distributions and completing the square, we know that 

$$\boldsymbol{\gamma} \mid \mathbf{y} \sim N(A^{-1}b, A^{-1}),$$

where
$$A = \boldsymbol{\Sigma}^{-1} + \sigma^{-2}\mathbf{Z}^T\mathbf{Z}, \quad b = \sigma^{-2}\mathbf{Z}^T(\mathbf{y} - \mathbf{X} \boldsymbol{\beta})$$
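
For completeness, the completing-the-square step can be written out as follows. This is a sketch that uses only the symmetry and positive definiteness of $A$; the last term does not depend on $\boldsymbol{\gamma}$ and is absorbed into the normalizing constant, which leaves the kernel of a normal density with mean $A^{-1}b$ and covariance $A^{-1}$.

$$
-\frac{1}{2}\boldsymbol{\gamma}^T A \boldsymbol{\gamma} + \boldsymbol{\gamma}^T b = -\frac{1}{2}(\boldsymbol{\gamma} - A^{-1}b)^T A (\boldsymbol{\gamma} - A^{-1}b) + \frac{1}{2} b^T A^{-1} b.
$$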

Therefore, by the Woodbury identity,

\begin{eqnarray*}
\text{Var} (\boldsymbol{\gamma} \mid \mathbf{y}) &=& (\boldsymbol{\Sigma}^{-1} + \sigma^{-2}\mathbf{Z}^T\mathbf{Z})^{-1} \\
&=& \boldsymbol{\Sigma} - \boldsymbol{\Sigma}\mathbf{Z}^T(\sigma^{2}\mathbf{I} + \mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T)^{-1}\mathbf{Z}\boldsymbol{\Sigma},
\end{eqnarray*}

and additionally

\begin{eqnarray*}
\mathbb{E} (\boldsymbol{\gamma} \mid \mathbf{y}) &=& \sigma^{-2} (\sigma^{-2} \mathbf{Z}^T \mathbf{Z} + \boldsymbol{\Sigma}^{-1})^{-1 } \mathbf{Z}^T (\mathbf{y} - \mathbf{X} \boldsymbol{\beta}) \\
&=& \sigma^{-2}(\boldsymbol{\Sigma} - \boldsymbol{\Sigma}\mathbf{Z}^T(\sigma^{2}\mathbf{I} + \mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T)^{-1}\mathbf{Z}\boldsymbol{\Sigma})\mathbf{Z}^T (\mathbf{y} - \mathbf{X} \boldsymbol{\beta}) \\
&=& \boldsymbol{\Sigma}\mathbf{Z}^T(\sigma^{-2}\mathbf{I} - \sigma^{-2}(\sigma^{2}\mathbf{I} + \mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T)^{-1}\mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T)(\mathbf{y} - \mathbf{X} \boldsymbol{\beta}) \\
&=& \boldsymbol{\Sigma}\mathbf{Z}^T(\sigma^{2}\mathbf{I} + \mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T)^{-1}(\sigma^{-2}\mathbf{I}(\sigma^{2}\mathbf{I} + \mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T) - \sigma^{-2}\mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T)(\mathbf{y} - \mathbf{X} \boldsymbol{\beta}) \\
&=& \boldsymbol{\Sigma}\mathbf{Z}^T(\sigma^{2}\mathbf{I} + \mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T)^{-1}(\mathbf{I} + \sigma^{-2}\mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T - \sigma^{-2}\mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T)(\mathbf{y} - \mathbf{X} \boldsymbol{\beta}) \\
&=& \boldsymbol{\Sigma}\mathbf{Z}^T(\sigma^{2}\mathbf{I} + \mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T)^{-1}(\mathbf{y} - \mathbf{X} \boldsymbol{\beta})
\end{eqnarray*}
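
The two forms of the conditional mean and variance can be checked numerically. The sketch below assumes simulated inputs and uses hypothetical helper names (`posterior_direct`, `posterior_woodbury`) that are not part of the assignment code. The first helper inverts the $q \times q$ matrix $\boldsymbol{\Sigma}^{-1} + \sigma^{-2}\mathbf{Z}^T\mathbf{Z}$, and the second works through the $n \times n$ matrix $\sigma^2 \mathbf{I} + \mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}^T$.

```julia
using LinearAlgebra, Random

# Hypothetical helper: posterior mean and variance of γ from the
# completed-square form, with A = Σ⁻¹ + ZᵀZ/σ² and b = Zᵀr/σ².
function posterior_direct(r, Z, Σ, σ²)
    A = inv(Σ) + Z' * Z / σ²
    b = Z' * r / σ²
    return (A \ b, inv(A))
end

# Hypothetical helper: the same quantities via the Woodbury form,
# with Ω = σ² I + Z Σ Zᵀ.
function posterior_woodbury(r, Z, Σ, σ²)
    Ω = σ² * I + Z * Σ * Z'
    μ = Σ * Z' * (Ω \ r)
    V = Σ - Σ * Z' * (Ω \ (Z * Σ))
    return (μ, V)
end

# Simulated inputs, purely illustrative
Random.seed!(257)
n, p, q = 6, 2, 3
X = randn(n, p)
Z = randn(n, q)
β = randn(p)
Lfac = LowerTriangular(randn(q, q))
Σ = Lfac * Lfac' + I          # positive definite by construction
σ² = 1.5
r = randn(n) - X * β          # plays the role of y - Xβ

μd, Vd = posterior_direct(r, Z, Σ, σ²)
μw, Vw = posterior_woodbury(r, Z, Σ, σ²)
println("mean agrees:     ", isapprox(μd, μw; atol = 1e-8))
println("variance agrees: ", isapprox(Vd, Vw; atol = 1e-8))
```

Which form is cheaper depends on the sizes involved: the direct form solves a $q \times q$ system, while the Woodbury form solves an $n \times n$ system but reuses $\boldsymbol{\Omega}$, which already appears in the log-likelihood.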

### Question 2: Derive EM Algorithm

1. Write down the complete log-likelihood

\begin{eqnarray*}
\sum_{i=1}^m \log f(\mathbf{y}_i, \boldsymbol{\gamma}_i \mid \boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2) &=& \sum_{i=1}^m [\log f(\mathbf{y}_i \mid  \boldsymbol{\gamma}_i,\boldsymbol{\beta}, \sigma^2) + \log f( \boldsymbol{\gamma}_i \mid \boldsymbol{\Sigma})] \\
&=& \sum_{i=1}^m \Bigg[- \frac{n_i}{2} \log (2\pi) - \frac{1}{2} \log \det (\sigma^{2}\mathbf{I}_{n_i}) - \frac{1}{2} (\mathbf{y}_i - \mathbf{X}_i \boldsymbol{\beta} - \mathbf{Z}_i \boldsymbol{\gamma}_i)^T \sigma^{-2}\mathbf{I}_{n_i} (\mathbf{y}_i - \mathbf{X}_i \boldsymbol{\beta} - \mathbf{Z}_i \boldsymbol{\gamma}_i)
- \frac{q}{2} \log (2\pi) - \frac{1}{2} \log \det \boldsymbol{\Sigma} - \frac{1}{2} \boldsymbol{\gamma}_i^T \boldsymbol{\Sigma}^{-1}\boldsymbol{\gamma}_i\Bigg] \\
&=& \sum_{i=1}^m \Bigg[- \frac{n_i}{2} \log (2\pi) - \frac{1}{2} \log \det (\sigma^{2}\mathbf{I}_{n_i}) - \frac{1}{2} (\mathbf{y}_i - \mathbf{X}_i \boldsymbol{\beta} - \mathbf{Z}_i \boldsymbol{\gamma}_i)^T \sigma^{-2}\mathbf{I}_{n_i} (\mathbf{y}_i - \mathbf{X}_i \boldsymbol{\beta} - \mathbf{Z}_i \boldsymbol{\gamma}_i)
 - \frac{1}{2} \boldsymbol{\gamma}_i^T \boldsymbol{\Sigma}^{-1}\boldsymbol{\gamma}_i\Bigg] - \frac{qm}{2} \log (2\pi) - \frac{m}{2} \log \det \boldsymbol{\Sigma}
\end{eqnarray*}

2. Derive the $Q$ function (E-step).

\begin{eqnarray*}
Q(\boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2 \mid \boldsymbol{\beta}^{(t)}, \boldsymbol{\Sigma}^{(t)}, \sigma^{2(t)})
\end{eqnarray*}
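
One way to carry out this E-step is sketched below. It takes the conditional expectation of the complete log-likelihood given $\mathbf{y}_i$ at the current iterate, using the conditional mean and variance from Question 1. The symbols $\boldsymbol{\mu}_i^{(t)}$, $\mathbf{V}_i^{(t)}$ and $\boldsymbol{\Omega}_i^{(t)}$ are notation introduced here for illustration, defined as

\begin{eqnarray*}
\boldsymbol{\Omega}_i^{(t)} &=& \sigma^{2(t)} \mathbf{I}_{n_i} + \mathbf{Z}_i \boldsymbol{\Sigma}^{(t)} \mathbf{Z}_i^T, \\
\boldsymbol{\mu}_i^{(t)} &=& \boldsymbol{\Sigma}^{(t)} \mathbf{Z}_i^T (\boldsymbol{\Omega}_i^{(t)})^{-1} (\mathbf{y}_i - \mathbf{X}_i \boldsymbol{\beta}^{(t)}), \\
\mathbf{V}_i^{(t)} &=& \boldsymbol{\Sigma}^{(t)} - \boldsymbol{\Sigma}^{(t)} \mathbf{Z}_i^T (\boldsymbol{\Omega}_i^{(t)})^{-1} \mathbf{Z}_i \boldsymbol{\Sigma}^{(t)}.
\end{eqnarray*}

Using $\mathbb{E}(\mathbf{x}^T \mathbf{M} \mathbf{x}) = \boldsymbol{\mu}^T \mathbf{M} \boldsymbol{\mu} + \text{tr}(\mathbf{M} \mathbf{V})$ for a random vector $\mathbf{x}$ with mean $\boldsymbol{\mu}$ and covariance $\mathbf{V}$, and $\log \det (\sigma^2 \mathbf{I}_{n_i}) = n_i \log \sigma^2$, the $Q$ function becomes

\begin{eqnarray*}
Q(\boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2 \mid \boldsymbol{\beta}^{(t)}, \boldsymbol{\Sigma}^{(t)}, \sigma^{2(t)}) &=& \sum_{i=1}^m \mathbb{E} \left[ \log f(\mathbf{y}_i, \boldsymbol{\gamma}_i \mid \boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2) \mid \mathbf{y}_i, \boldsymbol{\beta}^{(t)}, \boldsymbol{\Sigma}^{(t)}, \sigma^{2(t)} \right] \\
&=& - \sum_{i=1}^m \frac{n_i}{2} \log (2\pi) - \sum_{i=1}^m \frac{n_i}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^m \left[ \| \mathbf{y}_i - \mathbf{X}_i \boldsymbol{\beta} - \mathbf{Z}_i \boldsymbol{\mu}_i^{(t)} \|_2^2 + \text{tr}(\mathbf{Z}_i \mathbf{V}_i^{(t)} \mathbf{Z}_i^T) \right] \\
& & - \frac{qm}{2} \log (2\pi) - \frac{m}{2} \log \det \boldsymbol{\Sigma} - \frac{1}{2} \sum_{i=1}^m \text{tr} \left[ \boldsymbol{\Sigma}^{-1} \left( \mathbf{V}_i^{(t)} + \boldsymbol{\mu}_i^{(t)} \boldsymbol{\mu}_i^{(t)T} \right) \right].
\end{eqnarray*}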


### Question 3: Objective of a single datum

We modify the code from HW4 to evaluate the objective, the conditional mean of $\boldsymbol{\gamma}$, and the conditional variance of $\boldsymbol{\gamma}$. Start-up code is provided below. You do _not_ have to use this code.