# Gaussians

_A Visual Intro to ML — Chapter 2_

## Part 0: `python` preamble

In [28]:
from util import style
style()
from bokeh.plotting import gridplot, figure, show
import numpy as np

## Part 1: Introduction — $\mathcal{N}$
When you first saw the equation for the probability density function (PDF) of a Gaussian, were you taken aback by the number of symbols in it? I know I was. A Gaussian $\mathcal{N}(\mu, \sigma^2)$ has the following PDF:

$$ f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\sigma^2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Like, what the heck is that? Well, let's try to break it down.

## Part 2: Simplification — $Z$

Let's first consider a standard normal distribution $Z$, which has mean $\mu = 0$ and variance $\sigma^2 = 1$. That gives us a simpler PDF:

$$
f(x; \mu=0, \sigma^2=1)
= \frac{1}{\sqrt{2(1)^2\pi}}e^{-\frac{(x-0)^2}{2(1)^2}}
= \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}
$$
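If you'd rather not trust the algebra, here's a quick numerical sanity check. It's only a sketch: the helper names `gaussian_pdf` and `standard_normal_pdf` are made up for illustration, and the grid is an arbitrary choice.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma2=1.0):
    """General Gaussian PDF from Part 1 (hypothetical helper name)."""
    return 1.0 / np.sqrt(2.0 * sigma2 * np.pi) * np.exp(-(x - mu) ** 2 / (2.0 * sigma2))

def standard_normal_pdf(x):
    """Simplified PDF for mu = 0 and sigma^2 = 1 (hypothetical helper name)."""
    return 1.0 / np.sqrt(2.0 * np.pi) * np.exp(-0.5 * x ** 2)

# Arbitrary grid of points to compare the two formulas on
grid = np.arange(-4, 4, 0.01)
same = np.allclose(gaussian_pdf(grid, mu=0.0, sigma2=1.0), standard_normal_pdf(grid))
print(same)
```

If the two formulas really are the same thing, `same` should come out `True`.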

Plotted, it looks like this:

In [27]:
x = np.arange(-4, 4, 0.01)
y = 1/np.sqrt(2*np.pi)*np.exp(-0.5*x**2)
p = figure(title='Standard normal curve', x_axis_label='x', y_axis_label='f(x)', y_range=(0,0.5), height=350)
p.line(x, y, legend_label='f(x)')  # newer Bokeh releases use legend_label rather than legend
show(p);

Don't worry, we'll come back to different means and variances later. For now, let's just try to get to 

$$
\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}
$$

## Part 3: Let's start with $e^x$

I think at its core, the motivation behind a Gaussian distribution is: "How can we base a distribution around $e$?" I'm neither a mathematician nor a math historian, so what I think doesn't really matter. But humor me and let's follow that line of thought.

First, let's consider what $e^x$ looks like:

In [3]:
x = np.arange(-3, 3, 0.01)
y = np.exp(x)
p = figure(title='e^x', x_axis_label='x', y_axis_label='e^x', y_range=(0,20), height=350)
p.line(x, y, legend_label='e^x')  # newer Bokeh releases use legend_label rather than legend
show(p);

Very pretty. Recall how exponents work: $e^x$ asymptotically approaches $0$ as $x \rightarrow -\infty$, but it's always positive for any value of $x$. Think about $e^x$ for $x = 1000$ and $x = -1000$:

$$
\begin{aligned}
e^{1000} &= \text{very large}\\
e^{-1000} &= \frac{1}{e^{1000}} = \text{very small (but still > 0)}
\end{aligned}
$$
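Here's a small sketch of that idea in code. One caveat: 64-bit floats can't actually hold $e^{1000}$ (it overflows to `inf`) or $e^{-1000}$ (it underflows to `0.0`), so this uses the tamer exponents $\pm 100$ and $\pm 10$, which are picked purely for illustration.

```python
import numpy as np

# Exponents chosen for illustration; +/-1000 would overflow/underflow in float64
exponents = np.array([-100.0, -10.0, 0.0, 10.0, 100.0])
values = np.exp(exponents)

for a, v in zip(exponents, values):
    print(f"e^{a:g} = {v:.3e}")

# Every value is strictly positive, even the tiny ones
all_positive = bool(np.all(values > 0))
print(all_positive)
```

The math says $e^x > 0$ for every real $x$; the underflow to zero for very negative inputs is a limitation of floating point, not of $e^x$.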

This really isn't anything terribly special. $e \approx 2.7182...$, and we can make other curves that look like this with $2^x$, $3^x$, etc.:

In [4]:
x = np.arange(-3, 3, 0.01)
p = figure(
    title='a^x for various values of a',
    x_axis_label='x',
    y_axis_label='a^x',
    y_range=(0,20),
    height=350)
# legend_label replaces the older legend keyword in newer Bokeh releases
p.line(x, np.power(1.5, x), legend_label='1.5^x', color='darkgrey')
p.line(x, np.power(2, x), legend_label='2^x', color='navy')
p.line(x, np.power(2.5, x), legend_label='2.5^x', color='olive')
p.line(x, np.power(np.e, x), legend_label='e^x', color='firebrick')
p.line(x, np.power(3, x), legend_label='3^x', color='deeppink')
p.legend.location = 'top_left'
show(p);

But for now, we'll use $e^x$. We kind of have to, as that's what the Gaussian distribution uses. Plus $e$ is pretty cool; Wikipedia has a whole [series of articles](https://en.wikipedia.org/wiki/E_(mathematical_constant)) on $e$. (Don't go read them, just look, be in awe of the volume of information, and come right back, please.)

## Part 4: $e^x$, $e^{-x}$, $e^{|x|}$, $e^{-|x|}$

Let's start playing around with $e^x$. For example, can you guess what happens if we plot $e^{-x}$?

In [38]:
x = np.arange(-3, 3, 0.01)
p = figure(title='e^x and e^-x', x_axis_label='x', y_axis_label='f(x)', y_range=(0,4), height=350)
p.line(x, np.exp(x), legend_label='e^x')  # legend_label replaces legend in newer Bokeh
p.line(x, np.exp(-x), legend_label='e^-x', color='firebrick')
show(p);

We can think of this like taking the negative of the input: it mirrors the curve across an imaginary vertical line at $x = 0$.
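To check the mirroring claim numerically, here's a minimal sketch. It assumes a grid that is symmetric about $0$ (built with `np.linspace`, my choice for this check), so that reversing the array is the same as flipping the sign of the input.

```python
import numpy as np

# Symmetric grid: 601 points from -3 to 3, so reversing it negates it
grid = np.linspace(-3, 3, 601)

mirrored = np.exp(grid)[::-1]  # e^x read from right to left
flipped = np.exp(-grid)        # e^-x read from left to right

is_mirror_image = bool(np.allclose(flipped, mirrored))
print(is_mirror_image)
```

If $e^{-x}$ really is $e^x$ reflected across $x = 0$, the two arrays should agree up to floating-point rounding.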

We can grab the upper part of the plot by feeding only non-negative values into the exponent (i.e. plotting $e^{|x|}$), or grab the bottom part by feeding in only non-positive values (i.e. plotting $e^{-|x|}$). Let's do that now:

In [29]:
ab = figure(title='e^|x|', x_axis_label='x', y_axis_label='e^|x|', y_range=(0,4), width=300, height=350)
abn = figure(title='e^-|x|', x_axis_label='x', y_axis_label='e^-|x|', y_range=(0,4), width=300, height=350)
ab.line(x, np.exp(np.abs(x)), legend_label='e^|x|')  # legend_label replaces legend in newer Bokeh
abn.line(x, np.exp(-np.abs(x)), legend_label='e^-|x|')
p = gridplot([[ab, abn]])
show(p);

The one on the right, $e^{-|x|}$, looks almost like a bell curve, but it's too pointy. How do we smooth it out?
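One rough way to put a number on "too pointy" is to compare the slope just to the left of $0$ with the slope just to the right. This is only a sketch; the helper name `tent` and the step size `h` are made up for illustration.

```python
import numpy as np

def tent(x):
    """The pointy curve e^(-|x|) (hypothetical helper name)."""
    return np.exp(-np.abs(x))

h = 1e-6  # small step, chosen arbitrarily

# One-sided finite-difference slopes at x = 0
slope_left = (tent(0.0) - tent(-h)) / h   # approaching 0 from the left
slope_right = (tent(h) - tent(0.0)) / h   # approaching 0 from the right
print(slope_left, slope_right)
```

The two slopes come out close to $+1$ and $-1$: the curve changes direction abruptly at the peak, which is exactly the sharp corner we see in the plot.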

## Smoothness with $x^2$

Remember what the graph of $y = x^2$ looks like?

In [32]:
p = figure(title='x^2', x_axis_label='x', y_axis_label='x^2', y_range=(-0.5,2), height=350)
p.line(x, x**2, legend_label='x^2')  # legend_label replaces legend in newer Bokeh
show(p);

So here's an idea: what if instead of plotting $e^{|x|}$ and $e^{-|x|}$, we plotted $e^{x^2}$ and $e^{-x^2}$?

In [36]:
x1 = figure(title='e^|x| and e^-|x|', x_axis_label='x', y_axis_label='f(x)', y_range=(0,4), width=300, height=350)
x1.line(x, np.exp(np.abs(x)), legend_label='e^|x|', color='firebrick')
x1.line(x, np.exp(-np.abs(x)), legend_label='e^-|x|')
x2 = figure(title='e^(x^2) and e^(-x^2)', x_axis_label='x', y_axis_label='f(x)', y_range=(0,4), width=300, height=350)
x2.line(x, np.exp(x**2), legend_label='e^(x^2)', color='firebrick')
x2.line(x, np.exp(-x**2), legend_label='e^(-x^2)')

p = gridplot([[x1, x2]])
show(p);

The blue curve on the bottom of the right graph, which is $e^{-x^2}$, looks just like a bell curve! Let's plot it a bit bigger because I'm so excited:

In [37]:
p = figure(title='e^-(x^2)', x_axis_label='x', y_axis_label='e^-(x^2)', y_range=(0,1.1), height=350)
p.line(x, np.exp(-x**2), legend_label='e^-(x^2)')  # legend_label replaces legend in newer Bokeh
show(p);

**Holy cow!** That basically looks like a Gaussian! It turns out all we need to get the right basic shape is:

$$e^{-x^2}$$

The rest of it will just be massaging this to get the properties we want.
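As a quick check that squaring really did round off the peak, here's a sketch that measures the slope of $e^{-x^2}$ just to either side of $0$. The helper name `bell` and the step size `h` are assumptions made up for this example.

```python
import numpy as np

def bell(x):
    """The candidate bell shape e^(-x^2) (hypothetical helper name)."""
    return np.exp(-x ** 2)

h = 1e-4  # small step, chosen arbitrarily

# One-sided finite-difference slopes at x = 0
slope_left = (bell(0.0) - bell(-h)) / h
slope_right = (bell(h) - bell(0.0)) / h
print(slope_left, slope_right)
```

Both slopes are tiny (on the order of `h` itself), so the curve is essentially flat at the top, unlike the sharp corner of $e^{-|x|}$, where the one-sided slopes are about $+1$ and $-1$.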

## Massaging: Adding mass
TODO: do integral to motivate next few mutations

$e^{-\frac{1}{2}x^2}$

In [39]:
p = figure(title='e^(-1/2 x^2)', x_axis_label='x', y_axis_label='e^(-1/2 x^2)', y_range=(0,1.1), height=350)
p.line(x, np.exp(-0.5*x**2), legend_label='e^(-1/2 x^2)')  # legend_label replaces legend in newer Bokeh
show(p);
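To get a feel for what the factor in the exponent does, here's a rough sketch that estimates the area under $e^{-x^2/a}$ with a plain Riemann sum. The values $a \in \{1, 2, 4\}$, the grid, and the helper name `area_under` are all assumptions picked for illustration. The comparison column uses the known closed form $\int_{-\infty}^{\infty} e^{-x^2/a}\,dx = \sqrt{a\pi}$.

```python
import numpy as np

dx = 0.001
grid = np.arange(-20, 20, dx)  # wide enough that the tails are negligible

def area_under(a):
    """Riemann-sum estimate of the area under e^(-x^2 / a) (hypothetical helper)."""
    return np.sum(np.exp(-grid ** 2 / a)) * dx

for a in (1, 2, 4):
    peak = np.exp(-grid ** 2 / a).max()
    # Columns: a, peak height, estimated area, closed-form area sqrt(a * pi)
    print(a, peak, area_under(a), np.sqrt(a * np.pi))
```

The peak height stays at $1$ no matter what $a$ is, while the area (the "mass") grows as $a$ grows.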

TODO: show several graphs of how different -1/a factors smooth out the mass without changing the height of the peak

## Squashing peak with constant factors

$\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$

TODO: plot several different squashings with constant factors

In [12]:
x = np.arange(-4, 4, 0.01)
y = 1/np.sqrt(2*np.pi)*np.exp(-0.5*x**2)
p = figure(
    title='Standard normal curve',
    x_axis_label='x',
    y_axis_label='f(x)',
    y_range=(0,1.1),
    height=350)
p.line(x, y, legend_label='f(x)')  # legend_label replaces legend in newer Bokeh
show(p);
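Why $\frac{1}{\sqrt{2\pi}}$ in particular? A PDF has to have a total area of $1$, and the area under $e^{-\frac{1}{2}x^2}$ happens to be $\sqrt{2\pi}$. Here's a minimal numerical sketch of that, using a plain Riemann sum on an arbitrarily chosen grid:

```python
import numpy as np

dx = 0.001
grid = np.arange(-10, 10, dx)  # wide enough that the tails are negligible

unnormalized = np.exp(-0.5 * grid ** 2)
area = np.sum(unnormalized) * dx              # should be close to sqrt(2 * pi)

normalized = unnormalized / np.sqrt(2 * np.pi)
total_probability = np.sum(normalized) * dx   # should be close to 1

print(area, np.sqrt(2 * np.pi), total_probability)
```

So the constant factor squashes the peak by exactly the amount needed to make the total probability come out to $1$.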

## Shifting the mean

$$\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-\mu)^2} \text{ with } \mu \in \{-2, 0, 2\}$$

In [13]:
x = np.arange(-4, 4, 0.01)
m0 = 1/np.sqrt(2*np.pi)*np.exp(-0.5*x**2)
m2 = 1/np.sqrt(2*np.pi)*np.exp(-0.5*(x-2)**2)
mn2 = 1/np.sqrt(2*np.pi)*np.exp(-0.5*(x+2)**2)
p = figure(
    title='Normal curves with different means',
    x_axis_label='x',
    y_axis_label='f(x)',
    y_range=(0,1.1),
    height=350)
# legend_label replaces the older legend keyword in newer Bokeh releases
p.line(x, m0, legend_label='mean = 0')
p.line(x, m2, legend_label='mean = 2', color='firebrick')
p.line(x, mn2, legend_label='mean = -2', color='olive')
show(p);
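Here's a small sketch to confirm what the plot suggests: replacing $x$ with $x - \mu$ slides the peak to $x = \mu$ and leaves the height and the area alone. The helper name `shifted_pdf` and the grid are assumptions made up for this check.

```python
import numpy as np

dx = 0.001
grid = np.arange(-12, 12, dx)  # wide enough to hold all three curves

def shifted_pdf(x, mu):
    """Unit-variance normal PDF centered at mu (hypothetical helper name)."""
    return 1 / np.sqrt(2 * np.pi) * np.exp(-0.5 * (x - mu) ** 2)

for mu in (-2, 0, 2):
    curve = shifted_pdf(grid, mu)
    peak_x = grid[np.argmax(curve)]   # where the curve is highest
    area = np.sum(curve) * dx         # Riemann-sum estimate of the area
    print(mu, peak_x, curve.max(), area)
```

For each $\mu$, the peak lands at (about) $x = \mu$, and the peak height and area come out the same for all three curves.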

## Changing the variance

TODO: different variances
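As a rough sketch of what changing the variance does, here's the general PDF from Part 1 evaluated for a few values of $\sigma^2$. The values $0.5$, $1$, and $2$, the grid, and the helper name `gaussian_pdf` are all assumptions picked for illustration.

```python
import numpy as np

dx = 0.001
grid = np.arange(-30, 30, dx)  # wide enough that the tails are negligible

def gaussian_pdf(x, mu, sigma2):
    """General Gaussian PDF from Part 1 (hypothetical helper name)."""
    return 1 / np.sqrt(2 * sigma2 * np.pi) * np.exp(-(x - mu) ** 2 / (2 * sigma2))

for sigma2 in (0.5, 1.0, 2.0):
    curve = gaussian_pdf(grid, 0.0, sigma2)
    area = np.sum(curve) * dx                        # total probability
    measured_var = np.sum(grid ** 2 * curve) * dx    # variance about the mean 0
    print(sigma2, curve.max(), area, measured_var)
```

A larger $\sigma^2$ gives a lower, wider curve: the peak height is $\frac{1}{\sqrt{2\sigma^2\pi}}$, the area stays at $1$, and the measured variance matches the $\sigma^2$ we plugged in.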