In [None]:
import resources.workspace as ws
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.ion();

$
% START OF MACRO DEF
% DO NOT EDIT IN INDIVIDUAL NOTEBOOKS, BUT IN macros.py
%
\newcommand{\Reals}{\mathbb{R}}
\newcommand{\Expect}[0]{\mathbb{E}}
\newcommand{\NormDist}{\mathcal{N}}
%
\newcommand{\DynMod}[0]{\mathscr{M}}
\newcommand{\ObsMod}[0]{\mathscr{H}}
%
\newcommand{\mat}[1]{{\mathbf{{#1}}}}
%\newcommand{\mat}[1]{{\pmb{\mathsf{#1}}}}
\newcommand{\bvec}[1]{{\mathbf{#1}}}
%
\newcommand{\trsign}{{\mathsf{T}}}
\newcommand{\tr}{^{\trsign}}
\newcommand{\tn}[1]{#1}
\newcommand{\ceq}[0]{\mathrel{≔}}
%
\newcommand{\I}[0]{\mat{I}}
\newcommand{\K}[0]{\mat{K}}
\newcommand{\bP}[0]{\mat{P}}
\newcommand{\bH}[0]{\mat{H}}
\newcommand{\bF}[0]{\mat{F}}
\newcommand{\R}[0]{\mat{R}}
\newcommand{\Q}[0]{\mat{Q}}
\newcommand{\B}[0]{\mat{B}}
\newcommand{\C}[0]{\mat{C}}
\newcommand{\Ri}[0]{\R^{-1}}
\newcommand{\Bi}[0]{\B^{-1}}
\newcommand{\X}[0]{\mat{X}}
\newcommand{\A}[0]{\mat{A}}
\newcommand{\Y}[0]{\mat{Y}}
\newcommand{\E}[0]{\mat{E}}
\newcommand{\U}[0]{\mat{U}}
\newcommand{\V}[0]{\mat{V}}
%
\newcommand{\x}[0]{\bvec{x}}
\newcommand{\y}[0]{\bvec{y}}
\newcommand{\z}[0]{\bvec{z}}
\newcommand{\q}[0]{\bvec{q}}
\newcommand{\br}[0]{\bvec{r}}
\newcommand{\bb}[0]{\bvec{b}}
%
\newcommand{\bx}[0]{\bvec{\bar{x}}}
\newcommand{\by}[0]{\bvec{\bar{y}}}
\newcommand{\barB}[0]{\mat{\bar{B}}}
\newcommand{\barP}[0]{\mat{\bar{P}}}
\newcommand{\barC}[0]{\mat{\bar{C}}}
\newcommand{\barK}[0]{\mat{\bar{K}}}
%
\newcommand{\D}[0]{\mat{D}}
\newcommand{\Dobs}[0]{\mat{D}_{\text{obs}}}
\newcommand{\Dmod}[0]{\mat{D}_{\text{obs}}}
%
\newcommand{\ones}[0]{\bvec{1}}
\newcommand{\AN}[0]{\big( \I_N - \ones \ones\tr / N \big)}
%
% END OF MACRO DEF
$

Before discussing sequential (time-dependent) inference,
we need to know how to estimate unknowns based on a single observation (vector).
But before discussing *Bayes' rule*,
we should review the most useful of probability distributions, namely
# The Gaussian (Normal) distribution

Consider the Gaussian random variable $x \sim \mathcal{N}(\mu, \sigma^2)$.

Equivalently, we may write
$\begin{align}
p(x) = \mathcal{N}(x \mid \mu, \sigma^2)
\end{align}$
for its probability density function (**pdf**), which is given by
$$\begin{align}
\mathcal{N}(x \mid \mu, \sigma^2) = (2 \pi \sigma^2)^{-1/2} e^{-(x-\mu)^2/2 \sigma^2} \, , \tag{G1}
\end{align}$$
for $x \in (-\infty, +\infty)$.

**Exc 2.2:** Code it up (complete the code below)! Hints:
* Note that `**` is the exponentiation/power operator
* $e^x$ is available as `np.exp(x)`

In [None]:
def pdf_G1(x, mu, sigma2):
    "Univariate (scalar), Gaussian pdf"
    ### INSERT ANSWER HERE ###
    return pdf_values

In [None]:
# ws.show_answer('pdf_G1')

Computers generally represent functions numerically by their values on a grid
of points (nodes), an approach called "discretisation".

In [None]:
bounds = -20, 20
N = 201                         # num of grid points
grid1d = np.linspace(*bounds,N) # grid
dx = grid1d[1] - grid1d[0]      # grid spacing
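
Once a function is discretised, an integral over its domain becomes a sum over the grid. Below is a minimal sketch, assuming a plain Riemann sum (values times grid spacing) is accurate enough. The test function $e^{-x^2}$, whose exact integral is $\sqrt{\pi}$, and the helper name `riemann_integral` are our own illustrative choices, not part of the tutorial.

```python
import numpy as np

# Same grid layout as above, under separate names so this cell runs on its own
grid = np.linspace(-20, 20, 201)
h = grid[1] - grid[0]                  # grid spacing (0.2)

def riemann_integral(values, spacing):
    "Hypothetical helper: approximate an integral from values on a uniform grid."
    return np.sum(values) * spacing

f_values = np.exp(-grid**2)            # f(x) = exp(-x^2), exact integral sqrt(pi)
approx = riemann_integral(f_values, h)
print(approx, np.sqrt(np.pi))
```

Shrinking the bounds until the function is visibly cut off at the grid ends would degrade this approximation.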

**Exc 2.3:** The following code plots the Gaussian pdf you implemented above.
Play around with `mu` and `sigma2` to answer these questions:
 * How does the pdf curve change when `mu` changes?
 * How does the pdf curve change when you increase `sigma2`?
 * In a few words, describe the shape of the Gaussian pdf curve. Does this ring a bell for you? *Hint: it should be clear as a bell!*

In [None]:
values = []
@ws.interact(mu=bounds, sigma2=(1, 100), nRemembr=range(12))
def plot_pdf_G1(mu=0, sigma2=25, nRemembr=1):
    evalud = pdf_G1(grid1d, mu, sigma2)
    global values
    values = [evalud, *values[:nRemembr]]
    colors = plt.get_cmap('jet')(np.linspace(0, 1, len(values)))  # one colour per stored curve
    plt.figure(figsize=(6, 2))
    for line, c in zip(values, colors):
        plt.plot(grid1d, line, c=c)
    plt.xlim(*bounds)
    plt.ylim(0, .2)
    plt.show()

**Exc 2.4 (optional):** Recall the definition of the expectation (with respect to $p(x)$), namely
$$\Expect [f(x)] \mathrel{≔} \int  f(x) \, p(x) \, d x \,,$$
where the integral is over the whole domain of $x$.  
Recall $p(x) = \mathcal{N}(x \mid \mu, \sigma^2)$ from eqn (G1).  
Use pen, paper, and calculus to show that
 - (i) the first parameter, $\mu$, indicates its mean, i.e. that $$\mu = \Expect[x] \,.$$
   *Hint: you can rely on the result of (iii)*
 - (ii) the second parameter, $\sigma^2>0$, indicates its variance,
   i.e. that $$\sigma^2 = \mathbb{Var}(x) \mathrel{≔} \Expect[(x-\mu)^2] \,.$$
   *Hint: use $x^2 = x x$ to enable integration by parts.*
 - (iii) $\Expect[1] = 1$.  
   *Hint: Neither Bernoulli nor Laplace managed this,
   until Gauss did it by focusing on $(\Expect[1])^2$.
   For more help, watch [3Blue1Brown](https://www.youtube.com/watch?v=cy8r7WSuT1I&t=3m52s).*

In [None]:
# ws.show_answer('Gauss integrals')
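
Once you have done the derivations of Exc 2.4, you can check them numerically by approximating the three integrals with sums over a fine grid. A minimal sketch, assuming a Riemann sum on a uniform grid wide enough that the tails are negligible. `gauss_pdf` is a hypothetical stand-in written directly from eqn (G1), and the values of `mu_test` and `sigma2_test` are arbitrary.

```python
import numpy as np

def gauss_pdf(x, mu, sigma2):
    "Hypothetical stand-in for eqn (G1), so that this cell runs on its own."
    return (2 * np.pi * sigma2) ** (-0.5) * np.exp(-(x - mu) ** 2 / (2 * sigma2))

mu_test, sigma2_test = 2.0, 9.0                      # arbitrary test parameters
x_fine = np.linspace(mu_test - 60, mu_test + 60, 12001)  # 20 std devs on each side
dx_fine = x_fine[1] - x_fine[0]
p_fine = gauss_pdf(x_fine, mu_test, sigma2_test)

total = np.sum(p_fine) * dx_fine                     # approximates E[1]
mean = np.sum(x_fine * p_fine) * dx_fine             # approximates E[x]
var = np.sum((x_fine - mu_test) ** 2 * p_fine) * dx_fine  # approximates E[(x - mu)^2]
print(total, mean, var)
```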

**Exc 2.5:** Recall $p(x) = \mathcal{N}(x \mid \mu, \sigma^2)$ from eqn (G1).  
Use pen, paper, and calculus to answer the following questions,  
which derive some helpful mnemonics about the distribution.

 * (i) Find $x$ such that $p(x) = 0$.
 * (ii) Where is the mode (maximum) of the distribution?  
    I.e. find $x$ such that $\frac{d p}{d x}(x) = 0$.  
    *Hint: it's easier to analyse $\log p(x)$ rather than $p(x)$ itself.*
 * (iii) Where is the inflection point? I.e. where $\frac{d^2 p}{d x^2}(x) = 0$.
 * (iv) Some forms of "sensitivity analysis" (a basic form of uncertainty quantification) consist in evaluating $\frac{d^2 p}{d x^2}(x)$ at the mode.
Explain this by reference to the Gaussian shape.
*Hint: calculate and interpret $\frac{d^2 p}{d x^2}(\mu)$*
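
To check your answers to (ii) to (iv), the derivatives can be approximated by finite differences on a fine grid. A minimal sketch, assuming `np.gradient` (central differences in the interior) is accurate enough at spacing 0.001. `gauss_pdf` is again a hypothetical stand-in for eqn (G1), and the parameter values are arbitrary. Compare the printed numbers with your formulas rather than the other way around.

```python
import numpy as np

def gauss_pdf(x, mu, sigma2):
    "Hypothetical stand-in for eqn (G1)."
    return (2 * np.pi * sigma2) ** (-0.5) * np.exp(-(x - mu) ** 2 / (2 * sigma2))

mu_test, sigma2_test = 1.0, 4.0
x_fine = np.linspace(-15, 17, 32001)        # uniform grid, spacing 0.001
p = gauss_pdf(x_fine, mu_test, sigma2_test)
dp = np.gradient(p, x_fine)                 # approximates dp/dx
d2p = np.gradient(dp, x_fine)               # approximates d^2p/dx^2

i_mode = np.argmax(p)                       # grid point with the largest density
x_mode = x_fine[i_mode]
# Inflection points: grid intervals where d2p changes sign
sign_change = np.where(np.diff(np.sign(d2p)) != 0)[0]
x_inflect = x_fine[sign_change]
curv_at_mode = d2p[i_mode]

print("mode:", x_mode)
print("inflection points (approx.):", x_inflect)
print("d2p/dx2 at the mode:", curv_at_mode)
```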

### The multivariate (i.e. vector) case
Here's the pdf of the *multivariate* Gaussian (for any dimension $\ge 1$):
$$\begin{align}
\NormDist(\x \mid  \mathbf{\mu}, \mathbf{\Sigma})
&=
|2 \pi \mathbf{\Sigma}|^{-1/2} \, \exp\Big(-\frac{1}{2}\|\x-\mathbf{\mu}\|^2_\mathbf{\Sigma} \Big) \, , \tag{GM}
\end{align}$$
where $|.|$ represents the matrix determinant,  
and $\|.\|_\mathbf{W}$ denotes the weighted norm, defined by $\|\x\|^2_\mathbf{W} = \x\tr \mathbf{W}^{-1} \x$.  
In this multivariate case, $\mathbf{\Sigma}$ is called the *covariance* (matrix).

The following implements this pdf. Take a moment to digest the code, but don't worry if you don't understand it all. Hints:
 * `@` produces matrix multiplication (`*` in `Matlab`);
 * `*` produces array multiplication (`.*` in `Matlab`);
 * `axis=-1` makes `np.sum()` work along the last dimension of an ND-array.

In [None]:
from numpy.linalg import det, inv

def weighted_norm22(points, W):
    "Computes the squared norm x^T W^{-1} x of each vector (along the last axis of `points`)."
    return np.sum( (points @ inv(W)) * points, axis=-1)

def pdf_GM(points, mu, Sigma):
    "pdf -- Gaussian, Multivariate: N(x | mu, Sigma) for each x in `points`."
    c = np.sqrt(det(2*np.pi*Sigma))
    return 1/c * np.exp(-0.5*weighted_norm22(points - mu, Sigma))
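
The code above forms `inv(W)` and `det(...)` explicitly, which is fine for this 2-D illustration. A common alternative (a swapped-in technique, not the tutorial's) uses the Cholesky factor $\mat{L}$ of $\mathbf{\Sigma} = \mat{L} \mat{L}\tr$, so that $\|\x-\mathbf{\mu}\|^2_\mathbf{\Sigma} = \|\mat{L}^{-1}(\x-\mathbf{\mu})\|^2$ and $\log|\mathbf{\Sigma}| = 2\sum_i \log L_{ii}$, while $|2\pi\mathbf{\Sigma}| = (2\pi)^d |\mathbf{\Sigma}|$ in dimension $d$. A minimal sketch, assuming $\mathbf{\Sigma}$ is symmetric positive-definite. The names `pdf_GM_chol` and `pdf_GM_loop` are hypothetical; the loop version evaluates (GM) point by point, for comparison.

```python
import numpy as np

def pdf_GM_chol(points, mu, Sigma):
    """Hypothetical Cholesky-based version of (GM).
    `points` has shape (..., d); assumes Sigma is symmetric positive-definite."""
    points = np.asarray(points, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    d = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)                   # Sigma = L @ L.T, L lower-triangular
    diffs = (points - mu).reshape(-1, d).T          # shape (d, n): one column per point
    z = np.linalg.solve(L, diffs)                   # z = L^{-1} (x - mu), no explicit inverse
    norm2 = np.sum(z**2, axis=0)                    # ||x - mu||^2_Sigma for each point
    log_det_Sigma = 2 * np.sum(np.log(np.diag(L)))  # log|Sigma|
    log_c = 0.5 * (d * np.log(2 * np.pi) + log_det_Sigma)  # log of |2 pi Sigma|^{1/2}
    return np.exp(-0.5 * norm2 - log_c).reshape(points.shape[:-1])

def pdf_GM_loop(points, mu, Sigma):
    "Direct, point-by-point evaluation of (GM); `points` has shape (n, d)."
    Sigma_inv = np.linalg.inv(Sigma)
    c = np.sqrt(np.linalg.det(2 * np.pi * Sigma))
    out = []
    for x in points:
        r = x - mu
        out.append(np.exp(-0.5 * r @ Sigma_inv @ r) / c)
    return np.array(out)

# Compare the two on random points with an arbitrary SPD covariance
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Sigma_test = A @ A.T + 3 * np.eye(3)       # symmetric positive-definite by construction
mu_test = np.array([1.0, -2.0, 0.5])
pts = rng.standard_normal((5, 3))
print(np.allclose(pdf_GM_chol(pts, mu_test, Sigma_test),
                  pdf_GM_loop(pts, mu_test, Sigma_test)))
```

Like `pdf_GM`, the sketch accepts a grid of shape `(201, 201, 2)` and returns an array of shape `(201, 201)`.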

The following code plots the pdf as contour (iso-density) curves.

In [None]:
grid2d = np.dstack(np.meshgrid(grid1d, grid1d))

@ws.interact(corr=(-1, 1, .05), std_x=(1e-5, 10, 1))
def plot_Gaussian_contours(corr=0.7, std_x=1):
    # Form covariance matrix (C) from input and some constants
    var_x = std_x**2
    var_y = 1
    cv_xy = np.sqrt(var_x * var_y) * corr
    C = 25 * np.array([[var_x, cv_xy],
                       [cv_xy, var_y]])
    # Evaluate (compute)
    density_values = pdf_GM(grid2d, mu=0, Sigma=C)
    # Plot
    plt.figure(figsize=(4, 4))
    height = 1/np.sqrt(det(2*np.pi*C))
    plt.contour(grid1d, grid1d, density_values,
                levels=np.linspace(1e-4, height, 11))
    plt.axis('equal');
    plt.show()
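
In `plot_Gaussian_contours` the covariance is assembled entry by entry, with off-diagonal $\sqrt{\text{var}_x \, \text{var}_y}\,\rho$, where $\rho$ is the correlation. In matrix form this reads $\mat{C} = \mat{D}\,\mat{\Gamma}\,\mat{D}$, where $\mat{D}$ holds the standard deviations on its diagonal and $\mat{\Gamma}$ (our notation) is the correlation matrix, with ones on the diagonal and $\rho$ off it; the factor 25 in the code simply scales both standard deviations by 5. A minimal sketch of this conversion and its inverse; the helpers `cov_from_std_corr` and `corr_from_cov` are hypothetical.

```python
import numpy as np

def cov_from_std_corr(stds, corr):
    "Hypothetical helper: covariance = D @ corr @ D, with D = diag(stds)."
    D = np.diag(stds)
    return D @ corr @ D

def corr_from_cov(C):
    "Hypothetical helper: divide each entry C[i, j] by std_i * std_j."
    stds = np.sqrt(np.diag(C))
    return C / np.outer(stds, stds)

std_x, std_y, rho = 2.0, 1.0, 0.7
corr = np.array([[1.0, rho],
                 [rho, 1.0]])
# Factor 5 = sqrt(25), mirroring the scaling in plot_Gaussian_contours
C = cov_from_std_corr(5 * np.array([std_x, std_y]), corr)
cv_xy = np.sqrt(std_x**2 * std_y**2) * rho
C_plot = 25 * np.array([[std_x**2, cv_xy],
                        [cv_xy, std_y**2]])
print(np.allclose(C, C_plot))    # same matrix, built two ways
print(corr_from_cov(C))          # the scaling cancels, recovering corr
```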

**Exc 2.7:** How do the contours look? Try to understand why. Cases:
 * (a) correlation=0.
 * (b) correlation=0.99.
 * (c) correlation=0.5. (Note that we've used `plt.axis('equal')`).
 * (d) correlation=0.5, but with non-equal variances.
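
For Exc 2.7 it may help that the contours of (GM) are the curves where $\|\x - \mathbf{\mu}\|^2_\mathbf{\Sigma}$ is constant. These are ellipses whose axes point along the eigenvectors of $\mathbf{\Sigma}$, with half-lengths proportional to the square roots of its eigenvalues. A minimal sketch to inspect this for the cases above, assuming the same covariance construction as in `plot_Gaussian_contours` (the helper `ellipse_axes` is hypothetical). Eigenvector directions are only defined up to sign, so angles differing by 180° describe the same axis; when the two eigenvalues coincide, the contours are circles and any direction is an "axis".

```python
import numpy as np

def ellipse_axes(corr, std_x, std_y=1.0):
    """Hypothetical helper: relative half-lengths and directions (degrees) of the
    contour ellipses of a zero-mean 2-D Gaussian, covariance built as in the plot."""
    cv_xy = std_x * std_y * corr
    C = 25 * np.array([[std_x**2, cv_xy],
                       [cv_xy, std_y**2]])
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues; columns are eigenvectors
    half_lengths = np.sqrt(eigvals)        # semi-axes of the contour where ||x||^2_C = 1
    angles = np.degrees(np.arctan2(eigvecs[1], eigvecs[0]))  # direction of each column
    return half_lengths, angles

for corr, std_x in [(0, 1), (0.99, 1), (0.5, 1), (0.5, 3)]:
    half_lengths, angles = ellipse_axes(corr, std_x)
    print(f"corr={corr}, std_x={std_x}: half-lengths {half_lengths.round(2)}, "
          f"angles {angles.round(1)} deg")
```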

**Exc 2.8:** Play the [correlation game](http://guessthecorrelation.com/) (doesn't work right in Chrome) until you get a score (shown as gold coins) of 5 or more.

**Exc 2.9:**
* What's the difference between correlation and covariance?
* What's the difference between correlation (or covariance) and dependence?  
  *Hint: consider this [image](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#/media/File:Correlation_examples2.svg)*
* Does correlation imply causation?
* Can you use correlation in making predictions?
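
A concrete case for the question on dependence: if $x$ is symmetric about zero and $y = x^2$, then $y$ is completely determined by $x$, yet their covariance is $\Expect[x \, x^2] - \Expect[x]\Expect[x^2] = \Expect[x^3] = 0$, so their (Pearson) correlation vanishes. A minimal sketch estimating this from samples; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(100_000)   # symmetric about zero
y = x**2                           # y is a deterministic function of x
sample_corr = np.corrcoef(x, y)[0, 1]
print("sample correlation of x and x**2:", sample_corr)

# For comparison: an exact linear relationship gives correlation 1
lin_corr = np.corrcoef(x, 3 * x + 1)[0, 1]
print("sample correlation of x and 3*x + 1:", lin_corr)
```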

**Exc 2.10 (optional):** Why are we so fond of the Gaussian assumption?

In [None]:
# ws.show_answer('Why Gaussian')

### Next: [Bayesian inference](T3%20-%20Bayesian%20inference.ipynb)