# Multivariate Normal

## Things To Note
### Notation
$Y_i = \begin{pmatrix} Y_{i1} \\ \vdots \\ Y_{in_i} \end{pmatrix}$

### Assumptions
First, data from different subjects are independent: $Y_i \perp\!\!\!\perp Y_j$ for $i \ne j$.  
Second, within subject $i$, the measurement at occasion $k$ is not independent of the measurement at occasion $k'$: $Y_{ik} \not\!\perp\!\!\!\perp Y_{ik'}$ for $k \ne k'$.

### Reminder of Transformations
Let $a,b,c,d$ be scalar constants, and let $Y_i = \begin{pmatrix} Y_{i1} \\ \vdots \\ Y_{in_i} \end{pmatrix}$ as above.  
1. $E[aY+b] = aE[Y]+b$
2. $\text{Cov}(aY_1 + b, cY_2 +d)=ac\,\text{Cov}(Y_1,Y_2)$
3. Now let $c=(c_1,\ldots,c_n)^T$ be a vector of constants. Then $E[c^TY]=c^TE[Y]$.
4. $\text{Var}(c^TY)=c^T\text{Var}(Y)c$, where $\text{Var}(Y) = \text{Cov}(Y)$ is the covariance matrix of $Y$.
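
Rules 3 and 4 also hold exactly for sample means and sample covariances, so they can be checked numerically. The following is a minimal sketch using NumPy; the data matrix `Y` and the vector `c` are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 draws of a 3-element vector Y (one draw per row).
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.5],
                [0.0, 0.0, 1.0]])
Y = rng.normal(size=(200, 3)) @ mix
c = np.array([2.0, -1.0, 0.5])      # hypothetical vector of constants

lin = Y @ c                          # c^T Y for every draw

# Rule 3: mean of c^T Y equals c^T times the mean vector.
mean_lhs = lin.mean()
mean_rhs = c @ Y.mean(axis=0)

# Rule 4: variance of c^T Y equals c^T Cov(Y) c.
var_lhs = lin.var(ddof=1)
var_rhs = c @ np.cov(Y, rowvar=False) @ c

print(mean_lhs, mean_rhs)
print(var_lhs, var_rhs)
```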

## Multivariate Normal (MVN)
$Y \sim MVN(\mu, \Sigma)$ where $\mu$ is the mean, which helps us capture the average patterns over time, and $\Sigma$ is the covariance matrix, which helps us model dependency/correlations.   
In this class, we will learn techniques to model the mean $\mu$ so that we represent the average pattern in the repeated measures over time, and model the covariance $\Sigma$ so that we best represent the correlated dependencies in the repeated measures over time.  
-  $\mu_{n \times 1} = E[Y_{n \times 1}]$
-  $\Sigma_{n \times n} = \text{Cov}(Y_{n \times 1}) = E[(Y-\mu)(Y-\mu)^T] = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix}$, where the variances are on the diagonal and the covariances are off the diagonal. When we model the covariance matrix $\Sigma$, we have to make sure the model keeps $\Sigma$ symmetric and positive definite.
- $R_{n \times n} = \text{Corr}(Y_{n \times 1}) = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{pmatrix}$, where $\rho_{jk} = \sigma_{jk} / \sqrt{\sigma_{jj} \sigma_{kk}} \in [-1,1]$, which is the correlation between the $j$th and $k$th element of $Y= ({Y}_{1}, \cdots, {Y}_{j}, \cdots, {Y}_{k}, \cdots, {Y}_{n})^T$.
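
The relation $\rho_{jk} = \sigma_{jk} / \sqrt{\sigma_{jj} \sigma_{kk}}$ can be applied to a whole matrix at once. Below is a small sketch in NumPy; the helper name `cov_to_corr` and the example matrix are hypothetical and not part of the notes.

```python
import numpy as np

def cov_to_corr(sigma):
    """Convert a covariance matrix to a correlation matrix.

    Divides each sigma_jk by sqrt(sigma_jj * sigma_kk).
    """
    sigma = np.asarray(sigma, dtype=float)
    sd = np.sqrt(np.diag(sigma))      # standard deviations
    return sigma / np.outer(sd, sd)

# Hypothetical 2 x 2 covariance matrix (variances 4 and 9, covariance 2).
sigma = np.array([[4.0, 2.0],
                  [2.0, 9.0]])
R = cov_to_corr(sigma)
print(R)   # off-diagonal entries are 2 / (2 * 3) = 1/3
```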

### Definition Reminders
#### Positive Definite
$\Sigma_{n \times n}$ is **positive definite** if for any vector $c \ne 0$, $c^T\Sigma c > 0$. **Note**: If $\Sigma_{n \times n}$ is positive definite, then $\Sigma_{n \times n}$ is full rank.  
Here, $\Sigma = \text{Cov}(Y)$ is **full rank** if there is no $c_0 \ne 0$ such that $c_0^TY=k$, where $k$ is a constant. In other words, no nontrivial linear combination of the elements of $Y$ is constant.  
##### Example
Suppose $Y_{3 \times 1}$ and $\text{Cov}(Y_{3 \times 1})=\Sigma_{3 \times 3} \equiv I_{3 \times 3}$, which is the identity matrix.  
Take any $c=(c_1,c_2,c_3)^T \ne 0$.  
$c^T\Sigma c = \begin{pmatrix} c_1 & c_2 & c_3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} c_1 & c_2 & c_3 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = c_1^2 + c_2^2 + c_3^2 > 0$.  
Therefore, $I_{3 \times 3}$ is positive definite.
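
In practice, positive definiteness is usually checked numerically rather than by hand. One common approach, sketched below under the assumption that NumPy is available, is to verify that the matrix is symmetric and that its smallest eigenvalue is strictly positive. The function name `is_positive_definite` and the tolerance `tol` are illustrative choices, not something defined in these notes.

```python
import numpy as np

def is_positive_definite(sigma, tol=1e-10):
    """Return True if sigma is symmetric with all eigenvalues > tol."""
    sigma = np.asarray(sigma, dtype=float)
    if not np.allclose(sigma, sigma.T):
        return False
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(sigma).min() > tol)

print(is_positive_definite(np.eye(3)))                  # identity matrix
print(is_positive_definite([[1.0, 1.0], [1.0, 1.0]]))   # rank 1, not full rank
```

The second matrix has eigenvalues 0 and 2, so it is not positive definite.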
#### Positive Semi-definite
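$\Sigma_{n \times n}$ is **positive semi-definite** if for any vector $c \ne 0$, $c^T\Sigma c \ge 0$. Equality is allowed: $\Sigma$ is positive semi-definite but not positive definite when $\text{Var}(c_0^TY) = c_0^T\Sigma c_0 = 0$ for some $c_0 \ne 0$, that is, when $c_0^TY=k$ for a constant $k$ (meaning $\Sigma$ is not full rank).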
##### Example
Take $Y_{3 \times 1} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}$ and define this transformation $Z = \begin{pmatrix} Y_1 - \bar{Y} \\ Y_2 - \bar{Y} \\ Y_3 - \bar{Y} \end{pmatrix}$ where $\bar{Y} = \frac{Y_1 + Y_2 + Y_3}{3}$.  
We will show $\text{Cov}(Z)$ is positive semi-definite.   
Take $c_0=(1/3,1/3,1/3)^T$.  
$c_0^TZ= \begin{pmatrix} 1/3 & 1/3 & 1/3 \end{pmatrix} \begin{pmatrix} Y_1 - \bar{Y} \\ Y_2 - \bar{Y} \\ Y_3 - \bar{Y} \end{pmatrix} = \frac{1}{3}(Y_1+Y_2+Y_3)-\frac{1}{3}(\bar{Y}+\bar{Y}+\bar{Y})=\bar{Y}-\bar{Y}=0$. Our constant $k$ is $0$ here, so we showed $c_0^TZ=0$ ($c_0^TY=k$ from above).  
Next, we compute $\text{Var}(c_0^TZ)$.  
$Z_i = Y_i - \bar{Y} = Y_i - \frac{1}{3}(Y_1+Y_2+Y_3)$  
$Z = \begin{pmatrix} Y_1 - \bar{Y} \\ Y_2 - \bar{Y} \\ Y_3 - \bar{Y} \end{pmatrix} = \begin{pmatrix} 1-\frac{1}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & 1-\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & 1-\frac{1}{3} \end{pmatrix} \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = AY$  
$\text{Var}(c_0^TZ) = c_0^T\text{Var}(Z)c_0= \begin{pmatrix} 1/3 & 1/3 & 1/3 \end{pmatrix} \text{Var}(AY) \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix} = \begin{pmatrix} 1/3 & 1/3 & 1/3 \end{pmatrix} A \text{Var}(Y) A^T \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}$  
$\text{Var}(c_0^TZ) = \begin{pmatrix} 1/3 & 1/3 & 1/3 \end{pmatrix} \begin{pmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{pmatrix} \text{Var}(Y) \begin{pmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{pmatrix} \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}$  
We don't have to compute the full product. It is enough to note that each column of $A$ sums to $\frac{2}{3} - \frac{1}{3} - \frac{1}{3} = 0$, so $c_0^TA = \begin{pmatrix} 1/3 & 1/3 & 1/3 \end{pmatrix} \begin{pmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \end{pmatrix}$. Therefore, $\text{Var}(c_0^TZ)=0$.  
Putting everything together, $\text{Cov}(Z)$ is positive semi-definite but not positive definite, because the nonzero vector $c_0$ gives $\text{Var}(c_0^TZ)=0$.
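
The same conclusion can be checked numerically. Subtracting the mean, $Z_i = Y_i - \bar{Y}$, corresponds to multiplying $Y$ by the matrix $I - \frac{1}{3}J$, where $J$ is the $3 \times 3$ matrix of ones. The sketch below assumes NumPy and uses a made-up covariance matrix `sigma_y` for $Y$; any valid covariance matrix would lead to the same zero variance.

```python
import numpy as np

n = 3
A = np.eye(n) - np.ones((n, n)) / n     # centering matrix: Z = A Y
c0 = np.full(n, 1.0 / n)                # c_0 = (1/3, 1/3, 1/3)^T

# Hypothetical covariance matrix of Y (symmetric, positive definite).
sigma_y = np.array([[2.0, 0.5, 0.3],
                    [0.5, 1.0, 0.2],
                    [0.3, 0.2, 1.5]])

cov_z = A @ sigma_y @ A.T               # Cov(Z) = A Cov(Y) A^T
var_c0z = c0 @ cov_z @ c0               # Var(c_0^T Z)
eigs = np.linalg.eigvalsh(cov_z)        # eigenvalues, ascending order

print(c0 @ A)      # numerically the zero vector
print(var_c0z)     # numerically zero
print(eigs)        # smallest eigenvalue numerically zero, none negative
```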

### Unbiased Estimators of $\mu$ and $\Sigma$
Let $Y_1, \ldots, Y_N$ be independent, with $Y_i = \begin{pmatrix} {Y}_{i1} & \cdots & {Y}_{in} \end{pmatrix}^T$ and $Y_i \sim MVN(\mu, \Sigma)$ for each $i$.
#### Sample Mean 
An unbiased estimator of $\mu$ is the sample mean, $\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$. $\bar{Y}$ is an unbiased estimator because $E[\bar{Y}]=\mu$. 
##### Proof of $E[\bar{Y}]=\mu$
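$E[\bar{Y}] = E\left[\frac{1}{N} \sum_{i=1}^{N} Y_i\right] = \frac{1}{N} \sum_{i=1}^{N} E[Y_i] = \frac{1}{N}(N\mu) = \mu$  
Because the $Y_i$ are independent, the covariance of the sum is the sum of the covariances, so $\text{Cov}(\bar{Y}) = \text{Cov}\left(\frac{1}{N} \sum_{i=1}^{N} Y_i\right) = \frac{1}{N^2}\text{Cov}\left(\sum_{i=1}^{N} Y_i\right)=\frac{1}{N^2}N\text{Cov}(Y_1)=\frac{1}{N}\text{Cov}(Y_1)=\frac{1}{N}\Sigma$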
#### Sample Covariance
An unbiased estimator of $\Sigma$ is the sample covariance, $S_{n \times n} = \frac{1}{N-1} \sum_{i=1}^{N} (Y_i - \bar{Y})(Y_i - \bar{Y})^T$. $S_{n \times n}$ is an unbiased estimator because $E[S]=\Sigma$. 
When we use the sample covariance $S$ as our estimator of $\Sigma$, we are assuming $\Sigma$ is unstructured. Unstructured means we are estimating every element of $\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix}$. If we instead assume $\Sigma$ is structured, we impose a pattern on $\Sigma$ so that fewer parameters need to be estimated.  
**Examples of Structured $\Sigma$**  
$\Sigma = \text{diag}(\sigma_{11}, \cdots, \sigma_{nn})$  
$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & 0 \\ \sigma_{21} & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{nn} \end{pmatrix}$ (This assumes all $0$'s except the diagonal and the $1,2$ block.)
##### Proof of $E[S]=\Sigma$
$E[(Y_i - \bar{Y})(Y_i - \bar{Y})^T]=E[Y_i Y_i^T] + E[\bar{Y} \bar{Y}^T] - E[Y_i \bar{Y}^T] - E[\bar{Y} Y_i^T] = [\Sigma + \mu \mu^T] + [\frac{1}{N}\Sigma + \mu \mu^T] - E[Y_i \bar{Y}^T] - E[\bar{Y} Y_i^T]$  
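Here we used $\Sigma = E[Y_iY_i^T] - \mu \mu^T$, so that $E[Y_iY_i^T] = \Sigma + \mu \mu^T$, and likewise $\text{Cov}(\bar{Y}) = \frac{1}{N}\Sigma$, so that $E[\bar{Y} \bar{Y}^T] = \frac{1}{N}\Sigma + \mu \mu^T$.  
For the cross terms, $E[Y_i \bar{Y}^T] = \frac{1}{N}\sum_{k=1}^{N}E[Y_i Y_k^T] = \frac{1}{N}\{\Sigma+ \mu \mu^T + (N-1)\mu \mu^T \}$, and $E[\bar{Y} Y_i^T]$ takes the same value.  
Note: $\Sigma+ \mu \mu^T$ is the term with $k=i$, and $(N-1)\mu \mu^T$ collects the $N-1$ terms with $k \ne i$, where independence gives $E[Y_i Y_k^T] = \mu \mu^T$.  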
Thus, $E[(Y_i - \bar{Y})(Y_i - \bar{Y})^T]= \Sigma(1+ \frac{1}{N}) + 2\mu \mu^T - \frac{2}{N}\Sigma - 2\mu \mu^T = \Sigma(1 - \frac{1}{N})$.  
Finally, $E[S] = E[\frac{1}{N-1} \sum_{i=1}^{N}(Y_i - \bar{Y})(Y_i - \bar{Y})^T] = \frac{1}{N-1} (N) \frac{N-1}{N} \Sigma = \Sigma$
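
The formulas for $\bar{Y}$ and $S$ translate directly into code. The sketch below simulates data with NumPy, computes both estimators from their definitions, and compares the hand-computed $S$ with `np.cov`, which also uses the $N-1$ divisor by default. The values of `mu`, `sigma`, and `N` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters for n = 3 repeated measures.
mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
N = 500                                            # number of subjects

# Each row of Y is one subject's vector Y_i, so Y has shape (N, n).
Y = rng.multivariate_normal(mu, sigma, size=N)

y_bar = Y.mean(axis=0)                             # sample mean vector
centered = Y - y_bar                               # rows are Y_i - Y_bar
S = centered.T @ centered / (N - 1)                # sum of outer products / (N - 1)

S_numpy = np.cov(Y, rowvar=False)                  # built-in equivalent

print(y_bar)
print(S)
```

With `rowvar=False`, `np.cov` treats each column as a variable and each row as an observation, which matches the layout of `Y` here.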


## Transformations of MVNs
Assume $Y \sim MVN(\mu, \Sigma)$, where $Y$ is $n \times 1$, and let $a$ and $c$ be $r \times n$ and $r \times 1$ matrices of constants, respectively.  
Then the transformation $Z_{r \times 1} = aY+c$ satisfies $Z \sim MVN(a\mu + c, a\Sigma a^T)$.
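
As a quick illustration, the mean and covariance of $Z$ can be computed with two matrix products. This is a minimal NumPy sketch; the function name `affine_mvn` and all numbers in the example are hypothetical.

```python
import numpy as np

def affine_mvn(mu, sigma, a, c):
    """Mean and covariance of Z = a Y + c when Y ~ MVN(mu, sigma)."""
    mu, sigma, a, c = (np.asarray(x, dtype=float) for x in (mu, sigma, a, c))
    return a @ mu + c, a @ sigma @ a.T

# Hypothetical example with n = 3 and r = 2.
mu = np.array([1.0, 2.0, 3.0])
sigma = np.diag([1.0, 4.0, 9.0])
a = np.array([[1.0, 1.0, 0.0],      # Z_1 = Y_1 + Y_2
              [0.0, 1.0, -1.0]])    # Z_2 = Y_2 - Y_3 + 10
c = np.array([0.0, 10.0])

mean_z, cov_z = affine_mvn(mu, sigma, a, c)
print(mean_z)   # [3. 9.]
print(cov_z)    # [[ 5.  4.] [ 4. 13.]]
```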

## Conditional Distributions of MVNs
Assume $Y \sim MVN(\mu, \Sigma)$. $Y$ can be written as $Y = \begin{pmatrix} Y_1{}_{\,q\times 1} \\ Y_2{}_{\,r\times 1} \end{pmatrix}$, where $\mu = \begin{pmatrix} \mu_1{}_{\,q\times 1} \\ \mu_2{}_{\,r\times 1} \end{pmatrix}$ and $\Sigma = \begin{pmatrix} \Sigma_{11}{}_{\,q\times q} & \Sigma_{12}{}_{\,q\times r} \\ \Sigma_{21}{}_{\,r\times q} & \Sigma_{22}{}_{\,r\times r} \end{pmatrix}$.  
The conditional distribution of $Y_1$ given $Y_2 = y_2$, written $Y_1 | Y_2 = y_2$, is $MVN(\mu_{Y_1 | Y_2}, \Sigma_{Y_1 | Y_2})$.  
$\mu_{Y_1 | Y_2} = E[Y_1 | Y_2 = y_2] = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1}(y_2 - \mu_2)$  
$\Sigma_{Y_1 | Y_2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$
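
The two formulas above can be sketched in NumPy as follows. The function name `conditional_mvn`, its argument `q` (the length of $Y_1$), and the example values are all hypothetical. The sketch uses `np.linalg.solve` instead of forming $\Sigma_{22}^{-1}$ explicitly, which is a common numerical choice rather than something these notes prescribe.

```python
import numpy as np

def conditional_mvn(mu, sigma, q, y2):
    """Mean and covariance of Y_1 | Y_2 = y2, where Y_1 is the first q elements."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    y2 = np.asarray(y2, dtype=float)

    mu1, mu2 = mu[:q], mu[q:]
    s11, s12 = sigma[:q, :q], sigma[:q, q:]
    s21, s22 = sigma[q:, :q], sigma[q:, q:]

    # w = Sigma_12 Sigma_22^{-1}; valid because Sigma_22 is symmetric.
    w = np.linalg.solve(s22, s12.T).T

    cond_mean = mu1 + w @ (y2 - mu2)
    cond_cov = s11 - w @ s21
    return cond_mean, cond_cov

# Hypothetical bivariate example with q = 1 and r = 1.
mu = [1.0, 2.0]
sigma = [[4.0, 2.0],
         [2.0, 2.0]]
cond_mean, cond_cov = conditional_mvn(mu, sigma, q=1, y2=[3.0])
print(cond_mean)   # 1 + (2/2) * (3 - 2) = 2
print(cond_cov)    # 4 - (2/2) * 2 = 2
```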

## Example: Dental Study
This study had 27 participants, 16 boys and 11 girls. The distance from the pituitary gland to the pterygomaxillary fissure was measured at ages 8, 10, 12, and 14. Note that a greater distance is better for orthodontic therapy.  
![](images/002_1.png)

We will use the following data:   
![](images/002_2.png)

Let $Y_{i1}$ be the distance measure for girl $i$ at 8 years old, and $Y_{i2}$ be the distance measure for girl $i$ at 10 years old.  
$Y_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} \end{pmatrix} \sim MVN\left\{\begin{pmatrix} 21.18 \\ 22.23 \end{pmatrix}, \begin{pmatrix} 4.51 & 3.35 \\ 3.35 & 3.62 \end{pmatrix} \right\}$  
Note: We used the sample mean and sample covariance because they're unbiased estimators.  
  
Now suppose I want to compare distances at age 8 with the change in distance from ages 8 to 10 years.  
Let $Z_i = \begin{pmatrix} Z_{i1} \\ Z_{i2} \end{pmatrix} = \begin{pmatrix} Y_{i1} \\ Y_{i2} - Y_{i1} \end{pmatrix}$.  
Reminder: $Z = aY + c \sim MVN(a\mu + c, a\Sigma a^T)$.  
$Z_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} - Y_{i1} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} Y_{i1} \\ Y_{i2} \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \end{pmatrix}$  

$E[Z_i] = a\mu + c = \begin{pmatrix} 21.18 \\ 1.05 \end{pmatrix}$  
$\text{Cov}(Z_i) = a\Sigma a^T = \begin{pmatrix} 4.51 & -1.16 \\ -1.16 & 1.42 \end{pmatrix} $  
From $\text{Cov}(Y_i)$ and $\text{Cov}(Z_i)$, we can compute the correlation matrices. We get $\text{Corr}(Y_{i1}, Y_{i2}) = 0.83$, which is high: a girl whose distance is above average at age 8 tends to have an above-average distance at age 10 as well.  
$\text{Corr}(Z_{i1}, Z_{i2}) = -0.46$. This means that girls with a small distance at age 8 tend to have a larger change from age 8 to 10, and girls with a large distance at age 8 tend to have a smaller change.
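
The numbers in this example can be reproduced with a few lines of NumPy. The sketch below starts from the rounded estimates shown above, so the last entry of $\text{Cov}(Z_i)$ comes out as 1.43 rather than 1.42; the small difference is presumably due to rounding in the displayed estimates. The helper name `corr_from_cov` is hypothetical.

```python
import numpy as np

# Rounded estimates for the girls at ages 8 and 10, as given above.
mu = np.array([21.18, 22.23])
sigma = np.array([[4.51, 3.35],
                  [3.35, 3.62]])

# Z_1 = Y_1 (distance at age 8), Z_2 = Y_2 - Y_1 (change from 8 to 10).
a = np.array([[1.0, 0.0],
              [-1.0, 1.0]])
c = np.zeros(2)

mean_z = a @ mu + c
cov_z = a @ sigma @ a.T

def corr_from_cov(cov):
    """Correlation between the two elements of a bivariate vector."""
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

corr_y = corr_from_cov(sigma)
corr_z = corr_from_cov(cov_z)

print(np.round(mean_z, 2))   # [21.18  1.05]
print(np.round(cov_z, 2))    # [[ 4.51 -1.16] [-1.16  1.43]]
print(round(corr_y, 2))      # 0.83
print(round(corr_z, 2))      # -0.46
```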