# week 3, session 1

# Multivariate normal distributions

In [44]:
import math
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker, colors
from scipy.stats import norm
from scipy import special
from scipy.stats import multivariate_normal
from ipywidgets import interact,FloatSlider

# Independent univariate normal distributions

A one-dimensional normal distribution is parameterized by its mean, $\mu$, and variance, $\sigma^2$. The mean describes the location of the variable, while the variance describes its scale, that is, the spread around the mean. The probability density function is

$$ 
\mathcal{N}(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2} }   \text{exp}\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
$$ 
In a two-dimensional random process, each sample is defined by two values, $x = (x_1,x_2)$. In this exercise, we imagine that each component follows its own independent normal distribution, such that 
$x_1\sim\mathcal{N}(\mu_1,\sigma^2_1)$ and $x_2\sim\mathcal{N}(\mu_2,\sigma^2_2)$.  

In the following questions, let $\mu_1 = 5, \mu_2 = 20, \sigma^2_1 = 2, \sigma^2_2 = 5$

### Questions
$\star$ Generate a dataset by sampling $N = 10000$ samples from the random process, such that each sample is defined by two values $x_1$ and $x_2$. 

$\star$ Visualize the sampled data in a 2-dimensional scatter plot.

$\star$ Use np.mean and np.var to compute the sample means and sample variances. Compare with $\mu_1, \mu_2$ and $\sigma^2_1, \sigma^2_2$. Comment on the result for different numbers of samples $N$.

$\star$ For each dimension, plot a normalized histogram of the sample values together with the density function given by the corresponding normal distribution. Comment on the results when varying the number of samples $N$.

$\star$ For a random sample $(x_1,x_2)$, what is the probability $\mathbb{P}(x_1\leq \mu_1 \cap x_2\leq \mu_2)$? How could you investigate this empirically?


### Hints

$\bullet$ Remember that in NumPy, the normal distribution is parameterized by its location (the mean) and scale (the standard deviation, not the variance). 

$\bullet$ You can use the command plt.axis('equal') to force a plot to have equal axis ratios. This might make the scatterplot easier to read.
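
The first hint can be illustrated with a minimal sketch. It assumes NumPy's `default_rng` generator and an arbitrary seed, neither of which is prescribed by the exercise; the key step is taking the square root of each variance before passing it as `scale`.

```python
import numpy as np

# Parameters from the exercise; the seed is an arbitrary choice for reproducibility.
mu1, mu2 = 5.0, 20.0
var1, var2 = 2.0, 5.0
N = 10000
rng = np.random.default_rng(0)

# NumPy expects the standard deviation (scale), so take the square root of the variance.
x1 = rng.normal(loc=mu1, scale=np.sqrt(var1), size=N)
x2 = rng.normal(loc=mu2, scale=np.sqrt(var2), size=N)

print("sample means:    ", np.mean(x1), np.mean(x2))
print("sample variances:", np.var(x1), np.var(x2))
```

Passing the variance directly as `scale` would still run without error, but the sample variances would then come out near $\sigma^4$ instead of $\sigma^2$.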


# Multivariate normal distribution


In $d$ dimensions, the multivariate normal probability density function is given by

$$
\mathcal{N}(\mathbf{x}; \mathbf{\mu},\mathbf{\Sigma}) = \frac{1}{ \sqrt{ (2\pi)^d |\mathbf{\Sigma}|}}\exp\left( -\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x}-\mathbf{\mu}) \right)
$$
where $\mathbf{\mu}$ is a $d$-dimensional vector and $\mathbf{\Sigma}$ is the $d\times d$ covariance matrix.

We still consider sampling in two dimensions, such that a sample is given by two values $x = (x_1,x_2)$, with $(\mu_1,\mu_2)$ representing the means and $(\sigma^2_1,\sigma^2_2)$ representing the variances in the respective dimensions.

Now consider the case where the two components are correlated, such that whenever $x_1$ is larger than $\mu_1$, $x_2$ tends to be larger than $\mu_2$, and whenever $x_1$ is smaller than $\mu_1$, $x_2$ tends to be smaller than $\mu_2$.  

In this case we cannot sample from two independent normal distributions, as they cannot describe the covariance between components. Instead, let $\mathbf{x}$ follow a 2-dimensional (multivariate) normal distribution with mean $\mathbf{\mu} = (\mu_1, \mu_2)$ and covariance matrix $\mathbf{\Sigma}$.


The covariance between components is defined as $\text{cov}[x_1,x_2] \equiv \mathbb{E}[(x_1 - \mu_1)(x_2-\mu_2)]$. The covariance matrix is then given by

$$\mathbf{\Sigma} = \begin{bmatrix}\sigma^2_{1} & \text{cov}[x_1,x_2]\\ \text{cov}[x_2,x_1] &\sigma^2_{2}\end{bmatrix}
= \begin{bmatrix}\sigma_{11} &\sigma_{12}\\ \sigma_{21} &\sigma_{22}\end{bmatrix}.$$

The terms $\sigma_{21}$ and $\sigma_{12}$ must be equal as they describe the covariance between the same components. The covariance matrix is hence always symmetric.

In this exercise, let $\mu_1,\mu_2,\sigma^2_1,\sigma^2_2$ be the same values as in the previous exercise, and set the covariance to $\sigma_{12} = \sigma_{21} = 2.5$.
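
As a small sketch of how these values fit together, the covariance matrix can be assembled and sanity-checked before sampling. The variable names mirror the contour-plot snippet in the questions below. The eigenvalue check is an addition made here: it relies on the standard fact that a valid covariance matrix is symmetric with non-negative eigenvalues.

```python
import numpy as np

sigma11, sigma22 = 2.0, 5.0   # variances from the previous exercise
sigma12 = sigma21 = 2.5       # covariance chosen in this exercise
Sigma = np.array([[sigma11, sigma12],
                  [sigma21, sigma22]])

# A valid covariance matrix is symmetric and has non-negative eigenvalues.
is_symmetric = np.allclose(Sigma, Sigma.T)
eigenvalues = np.linalg.eigvalsh(Sigma)
print("symmetric:", is_symmetric)
print("eigenvalues:", eigenvalues)
```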

### Questions

$\star$ Convince yourself that the two components sampled in the previous exercise are uncorrelated, as expected for independent variables, by computing the empirical covariance between them. 

$\star$ Generate a dataset with $N = 10000$ samples from the 2-dimensional normal $X\sim\mathcal{N}(\mathbf{\mu},\Sigma)$. 

$\star$ Use np.mean and np.cov to compute the sample mean and sample covariance matrix for the data. Compare with $\mathbf{\mu}$ and $\Sigma$ and comment on the results when varying $N$.


$\star$ Create a function that shows the following plots when provided with a dataset, and the $\mathbf{\mu}$ and $\mathbf{\Sigma}$ variables for the 2D normal from which the data was sampled.
-  The sampled data as a scatter plot. 
-  A contour plot for the 2D normal, such as:
        
        x, y = np.meshgrid(np.linspace(0,10,100),np.linspace(10,30,200))
        pos = np.empty(x.shape + (2,))
        pos[:, :, 0] = x; pos[:, :, 1] = y
        multinorm = multivariate_normal([mu1, mu2], [[sigma11, sigma12], [sigma21, sigma22]])
        plt.contour(x, y, multinorm.pdf(pos))
        plt.grid(True)
        plt.axis('equal')

- In a separate figure for each of the two components, plot the normalized histogram for the data. Add the plot of the corresponding marginal probability density functions, $p(x_1)$ and $p(x_2)$, to these figures.



$\star$ Use the plotting function to visualize the generated data. Comment on the figures in comparison with the results obtained when sampling the components independently in the previous exercise.

### Hints

$\bullet$ You can draw from the multivariate normal distribution using the numpy function np.random.multivariate_normal

$\bullet$ To make the figures easier to compare, you can plot them side-by-side using the plt.subplot features.
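
The sampling hint can be sketched as follows. This version assumes the newer `Generator.multivariate_normal` method (obtained through `np.random.default_rng`) instead of the legacy `np.random.multivariate_normal` function named in the hint; the seed is arbitrary.

```python
import numpy as np

mu = np.array([5.0, 20.0])
Sigma = np.array([[2.0, 2.5],
                  [2.5, 5.0]])
N = 10000
rng = np.random.default_rng(0)  # seed chosen arbitrarily

# Each row of `samples` is one draw (x1, x2); the array has shape (N, 2).
samples = rng.multivariate_normal(mu, Sigma, size=N)

sample_mean = np.mean(samples, axis=0)
# rowvar=False tells np.cov that columns are variables and rows are observations.
sample_cov = np.cov(samples, rowvar=False)

print("sample mean:", sample_mean)
print("sample covariance:\n", sample_cov)
```

Note that `np.cov` treats rows as variables by default, so omitting `rowvar=False` on an `(N, 2)` array would produce an `N`-by-`N` matrix instead of the intended 2-by-2 one.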



# Interpretation of the covariance

The correlation coefficient, $\rho$, quantifies the correlation between components. It is defined as the normalized covariance,
$$ \rho = \dfrac{\text{cov}[x_1,x_2]}{\sqrt{\sigma^2_1\sigma^2_2}}  = \dfrac{\sigma_{12}}{\sqrt{\sigma_{11}\sigma_{22}}}\quad,\qquad \text{where $\rho\in[-1,1]$} $$
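
As a short worked example, the correlation coefficient for the covariance matrix of the previous exercise can be computed by dividing the covariance by the square root of the product of the two variances. The variable names are chosen here for illustration.

```python
import numpy as np

# Covariance matrix from the previous exercise.
Sigma = np.array([[2.0, 2.5],
                  [2.5, 5.0]])

# rho = cov[x1, x2] / sqrt(var[x1] * var[x2])
rho = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
print("rho =", rho)  # approximately 0.79
```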

### Questions

$\star$ Use the plotting function to create similar visualizations for the 2-dimensional normal distribution but with different covariance matrices.
- For example, try to fix the variances, $\sigma^2_1$ and $\sigma^2_2$, while only changing the covariance. 
- Comment on how the orientation and shape of the ellipses (in the contour plots) depend on the covariance matrix. 
- Can you create a situation where $\rho = 1$ or $\rho = -1$? Comment on the relationship between $x_1$ and $x_2$ as $\rho$ approaches $1$ or $-1$.


# Marginals and conditionals of the 2D normal

A very convenient property of the 2-dimensional normal distribution is that, when one of the variables $y$ is fixed at a known value $v$, the conditional distribution of the other variable $x$ is again normal: $p(x|y=v) = \mathcal{N}(x;\mu_{x|y=v},\sigma^2_{x|y=v})$.

Here, the conditional mean and standard deviation of $x$ are given by:

$$ \mu_{x|y=v} = \mu_{x} + \rho \sigma_x \dfrac{v - \mu_y}{\sigma_y}  $$

$$ \sigma_{x|y=v} = \sigma_x \sqrt{1-\rho^2} $$
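
The two formulas translate directly into code. The helper below is a hypothetical function written for illustration (its name and signature are not part of the notebook); it takes standard deviations, not variances, as inputs. The example reuses the parameters of the earlier exercises with $x = x_1$ and $y = x_2$, and the conditioning value $v = 22$ is an arbitrary choice.

```python
import numpy as np

def conditional_params(v, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and standard deviation of p(x | y = v) for a bivariate normal.

    sigma_x and sigma_y are standard deviations, not variances.
    """
    mu_cond = mu_x + rho * sigma_x * (v - mu_y) / sigma_y
    sigma_cond = sigma_x * np.sqrt(1.0 - rho**2)
    return mu_cond, sigma_cond

# Example with the values used earlier: variances 2 and 5, covariance 2.5.
sigma_x, sigma_y = np.sqrt(2.0), np.sqrt(5.0)
rho = 2.5 / (sigma_x * sigma_y)
mu_cond, sigma_cond = conditional_params(v=22.0, mu_x=5.0, mu_y=20.0,
                                         sigma_x=sigma_x, sigma_y=sigma_y, rho=rho)
print("conditional mean:", mu_cond)
print("conditional standard deviation:", sigma_cond)
```

The conditional standard deviation does not depend on $v$: observing $y$ shifts the mean of $x$ but always narrows the distribution by the same factor $\sqrt{1-\rho^2}$.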


The two dimensional normal with $\mathbf{\mu} = 0$ and $\mathbf\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$ is known as the standard bivariate normal distribution with correlation coefficient $\rho$. The probability density function is then

$$ p(x_1,x_2) = \dfrac{1}{2\pi\sqrt{1-\rho^2}} \exp \left( -\frac{ 1 }{ 2(1-\rho^2)}(x_1^2 + x_2^2 - 2\rho x_1 x_2)   \right)  $$
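
One way to gain confidence in this closed form is to evaluate it against `scipy.stats.multivariate_normal` at a test point. This is only a sketch: the function name `standard_bivariate_pdf`, the value $\rho = 0.6$ and the test point are arbitrary choices made here.

```python
import numpy as np
from scipy.stats import multivariate_normal

def standard_bivariate_pdf(x1, x2, rho):
    """Closed-form density of the standard bivariate normal with correlation rho."""
    norm_const = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho**2))
    quad = (x1**2 + x2**2 - 2.0 * rho * x1 * x2) / (2.0 * (1.0 - rho**2))
    return norm_const * np.exp(-quad)

rho = 0.6                        # arbitrary illustrative value with |rho| < 1
point = np.array([0.3, -1.2])    # arbitrary test point

closed_form = standard_bivariate_pdf(point[0], point[1], rho)
reference = multivariate_normal(mean=[0.0, 0.0],
                                cov=[[1.0, rho], [rho, 1.0]]).pdf(point)
print(closed_form, reference)
```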


### Questions


$\star$ This exercise demonstrates how you can derive the marginals and conditionals of the density:
- Complete the square (see hint) in the expression:
$$ x_1^2 + x_2^2 - 2\rho x_1 x_2 + \underbrace{((\rho x_1)^2-(\rho x_1)^2)}_{0} $$
- Show mathematically that the joint is equal to $p(x_2|x_1)p(x_1)$ using,
$$ p(x_2 | x_1) = \mathcal{N}(x_2; \rho x_1, 1-\rho^2),\quad p(x_1) = \mathcal{N}(x_1;0,1).  $$
- What happens as $\rho$ gets close to $-1$ or $1$? 
- For what value of $\rho$ is $p(x_2|x_1)=p(x_2)$, making $X_1$ and $X_2$ independent?

$\star$ The following widget allows you to interactively change the covariance and the fixed value $v$ of $y$ for a standard bivariate normal distribution $p(x,y)$. 
- Comment on the relation between $p(x)$ and $p(x|y=v)$ for different $v$ and $\rho$ configurations.

### Hints
$\bullet$ The technique of $\textit{completing the square}$ is about recognizing terms that can be collected into a squared difference expression using,
$$ a^2 + b^2 - 2ab = (a-b)^2 $$
Look for a similar pattern in the expression.

$\bullet$ If in doubt about $p(x_2)$, try to walk through the exercises again with $x_1$ and $x_2$ switched - would anything change?
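
The factorization can also be checked numerically, which is a useful sanity test alongside the pen-and-paper derivation but does not replace it. The sketch below compares the joint density with the product $p(x_2|x_1)p(x_1)$ on a grid; the value of $\rho$ and the grid are arbitrary choices made here.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.6  # arbitrary illustrative value with |rho| < 1
grid = np.linspace(-3.0, 3.0, 25)
x1, x2 = np.meshgrid(grid, grid)

# Joint density of the standard bivariate normal evaluated on the grid.
joint = multivariate_normal(mean=[0.0, 0.0],
                            cov=[[1.0, rho], [rho, 1.0]]).pdf(np.dstack((x1, x2)))

# norm.pdf takes the standard deviation, so the variance 1 - rho^2 is square-rooted.
conditional = norm.pdf(x2, loc=rho * x1, scale=np.sqrt(1.0 - rho**2))
marginal = norm.pdf(x1, loc=0.0, scale=1.0)

max_abs_error = np.max(np.abs(joint - conditional * marginal))
print("largest deviation on the grid:", max_abs_error)
```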

In [80]:

def plotCondition(v,cov):    

    #define means, variances and covariance matrix
    mu_x = 0
    mu_y = 0

    sigma_xx = 1
    sigma_yy = 1
    sigma_xy = cov
    SIGMA = [ [sigma_xx, sigma_xy],[sigma_xy, sigma_yy]]  

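    #compute correlation coefficient and conditional mean and standard deviation
    sigma_x = np.sqrt(sigma_xx)  #standard deviations (sigma_xx and sigma_yy are variances)
    sigma_y = np.sqrt(sigma_yy)
    rho = sigma_xy/(sigma_x*sigma_y)
    print('correlation coefficient = ',rho)
    mu_ = mu_x+rho*sigma_x*(v-mu_y)/sigma_y
    sigma_ = sigma_x*np.sqrt(1-rho*rho)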

    #plot contour of p(x,y) and y = v as a line
    plt.figure(figsize=(10,10))
    x1 = np.linspace(-10,10, 100)
    ax1 = plt.subplot(221)
    x, y = np.meshgrid(x1,x1)
    pos = np.empty(x.shape + (2,))
    pos[:, :, 0] = x; pos[:, :, 1] = y
    multinorm = multivariate_normal([mu_x, mu_y], SIGMA)
    plt.contour(x, y, multinorm.pdf(pos),levels=np.logspace(-10,-1,10), cmap='RdBu_r',norm=colors.LogNorm())
    plt.hlines(v, -8,8, colors='k', linestyles='dashed', label = 'y=v')
    plt.setp(ax1.get_xticklabels(), visible=False)
    plt.ylabel('y')
    plt.legend()
    plt.axis([-10,10,-10,10])
    plt.axis('equal')
    plt.grid(True)

    #plot the marginal, p(x), and conditional, p(x|y=v) 
    ax2 = plt.subplot(223)
    
    h = plt.plot(x1,norm.pdf(x1,mu_,sigma_), label = 'p(x|y=v)')
    plt.vlines(mu_,0,norm.pdf(mu_,mu_,sigma_),color=h[0].get_color(),alpha=0.5)
    #norm.pdf expects a standard deviation, so take the square root of the variance
    h = plt.plot(x1,norm.pdf(x1,mu_x,np.sqrt(sigma_xx)), label = 'p(x)')
    plt.vlines(mu_x,0,norm.pdf(mu_x,mu_x,np.sqrt(sigma_xx)),color=h[0].get_color(),alpha=0.5)
    
    plt.scatter(v,0,color='k',marker='o',label='v',zorder=10,s=80)
    plt.ylim(0,0.9)
    plt.xlabel('x')
    plt.ylabel('density')
    plt.legend()
    plt.grid(True)

    
    #plot the conditional mean as a function of rho (unit variances, zero means)
    ax3 = plt.subplot(222)
    covs = np.linspace(-1,1,100)[1:-1]
    plt.plot(covs,covs*v)
    plt.scatter(cov,cov*v,color='k')
    plt.hlines(v,-1,1,color='k',linestyle='--')
    plt.xlabel(r'$\rho$')
    plt.ylabel('mean of $p(x|y=v)$')
    plt.axis([-1,1,-5,5])

    #plot the conditional standard deviation as a function of rho
    ax4 = plt.subplot(224)
    covs = np.linspace(-1,1,100)[1:-1]
    plt.plot(covs,np.sqrt(1-np.square(covs)))
    plt.scatter(cov,np.sqrt(1-np.square(cov)),color='k')
    plt.xlabel(r'${\rho}$')
    plt.ylabel('standard deviation of $p(x|y=v)$')
    plt.axis([-1,1,0,1.1])
    
interact(plotCondition,
    v=FloatSlider(min=-5.0, max=5.0, value=0, continuous_update=False),
    cov=FloatSlider(min=-.9, max=.9, value=0, continuous_update=False),
);
