original version: Jessica Hamrick and Tom Griffiths

----

In [None]:
import numpy as np
from matplotlib import pyplot as plt
# `interact` now lives in the standalone ipywidgets package (IPython.html.widgets was removed)
from ipywidgets import interact
%matplotlib inline

## Continuous Probability and Probability Densities

So far we have talked about probability spaces in which we have a finite (or countably infinite) number of sample points. In real life, however, many of the quantities that we wish to model are _real-valued_ -- examples include the time between earthquakes along the San Andreas fault, the location of a robot within an environment, and the angle of deflection of a photon on a sensor.

### From Discrete to Continuous Random Variables

To begin, consider the discrete random variable $X$ that represents the state a robot is in. Let us further assume that the robot can only be in a single state at a time, and that there are only 6 states the robot could possibly be in. We can summarize the probability mass function for $X$ in the following table:

| state | P(X=state) |
|:-------:|:------------:|
|    1    |      1/6     |
|    2    |      1/12     |
|    3    |      1/6     |
|    4    |      1/3     |
|    5    |      1/12     |
|    6    |      1/6     |

We can also represent these probabilities visually:

In [None]:
labels = np.arange(1,7)
probs = [1./6, 1./12, 1./6, 1./3, 1./12, 1./6]

fig, ax = plt.subplots()
ax.bar(labels, probs, align='center');
ax.set_xticks(labels)
ax.set_xlabel('State');
ax.set_ylabel('P(X = state)');

Now, let's imagine that we decide to be a bit more precise in the way we characterize the robot's states. Imagine that we distinguish between two modes, $a$ and $b$, for each of the 6 states we plotted earlier (i.e., we now distinguish between state $1a$ and state $1b$, where $P(X=1a) + P(X=1b)$ equals $P(X=1)$ from the first table). This doubles the size of our sample space:

| state | P(X=state) |
|:-----:|:----------:|
|   1a  |     1/6    |
|   1b  |      0     |
|   2a  |    1/24    |
|   2b  |    1/24    |
|   3a  |    1/24    |
|   3b  |     1/8    |
|   4a  |     1/9    |
|   4b  |     2/9    |
|   5a  |    1/12    |
|   5b  |      0     |
|   6a  |    1/18    |
|   6b  |     1/9    |

Visually, this looks like:

In [None]:
locs = np.arange(1,13)
labels = ['1a', '1b', '2a', '2b', '3a', '3b', '4a', '4b', '5a', '5b', '6a', '6b'] 
probs = [1./6, 0., 1./24, 1./24, 1./24, 1./8, 1./9, 2./9, 1./12, 0., 1./18, 1./9]

fig, ax = plt.subplots()
ax.bar(locs, probs, align='center');
ax.set_xticks(locs)
ax.set_xticklabels(labels);
ax.set_xlabel('State');
ax.set_ylabel('P(X = state)');

Comparing Table/Plot 1 to Table/Plot 2, notice that no refined state is more probable than the original state it was split from: each probability has either remained the same or decreased.
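
The relationship between the two tables can be checked with exact arithmetic. The sketch below is an illustration rather than part of the original notebook: it uses Python's `fractions` module, and the names `coarse` and `fine` are chosen here for convenience.

```python
from fractions import Fraction as F

# Probabilities from the first table (6 states).
coarse = {1: F(1, 6), 2: F(1, 12), 3: F(1, 6), 4: F(1, 3), 5: F(1, 12), 6: F(1, 6)}

# Probabilities from the second table (12 states, modes a and b).
fine = {
    '1a': F(1, 6),  '1b': F(0),
    '2a': F(1, 24), '2b': F(1, 24),
    '3a': F(1, 24), '3b': F(1, 8),
    '4a': F(1, 9),  '4b': F(2, 9),
    '5a': F(1, 12), '5b': F(0),
    '6a': F(1, 18), '6b': F(1, 9),
}

# Both tables are valid distributions: the probabilities sum to 1.
print(sum(coarse.values()), sum(fine.values()))

# Each pair of modes sums to the original state's probability,
# so neither mode can be more probable than the state it refines.
for state, p in coarse.items():
    p_a, p_b = fine['{}a'.format(state)], fine['{}b'.format(state)]
    print(state, p_a + p_b == p, max(p_a, p_b) <= p)
```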

For a more extreme example, imagine a uniform distribution over a discrete sample space. When there is only a single point in the sample space, the probability of that point must necessarily be 1. However, when we add a new point to the space, we must divide the probability mass between the original sample point and the new point; if we divide it equally, each sample point will now have a probability of $0.5$. As we continue adding sample points to the space, we can see that the probability of any individual sample point gets smaller and smaller. You can visualize this with the following slider:

In [None]:
@interact
def plot_pmf(num_sample_points=(1, 50, 1)):
    probs = np.ones(num_sample_points) * (1./num_sample_points)
    locs = np.arange(num_sample_points) + 1
    
    fig, ax = plt.subplots()
    ax.bar(locs, probs, align='center');
    ax.set_xticks(locs);
    ax.set_ylim([0, 1.]);
    ax.set_xlabel('x');
    ax.set_ylabel('P(X = x)');

This is an important insight:

> As we increase the number of points in our sample space, the probability associated with each individual point shrinks

We can extrapolate this notion to the case where there is an (uncountably) infinite number of states in our sample space. In this case, the probability of being in any single state shrinks to 0. State spaces of this type are called _continuous_ spaces, and the random variables defined on them are called (surprise!) continuous random variables.

### Continuous Random Variables

In contrast to discrete random variables, a continuous random variable can take any of an (uncountably) infinite number of values. This infinite sample space complicates things somewhat:

1. Whereas we could express the probability distributions of discrete random variables in the form of a table, we can no longer do so for continuous random variables, since this would entail creating a table with an uncountably infinite number of entries!
2. The probability that a continuous r.v. takes any _specific_ value is 0. Instead, we have to talk about the probability that a continuous random variable will fall within a _range_ of values. To this end, we call the probability functions for continuous random variables probability _density_ functions.
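
Both points can be seen in a quick simulation. The following is a minimal sketch, assuming a random variable that is uniform on $[0, 1)$; the sample size, the seed, the target value 0.25, and the interval $[0.2, 0.3]$ are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable
samples = rng.random(100000)        # draws from a uniform distribution on [0, 1)

# How often does a sample land on one specific value?
exact_hits = np.sum(samples == 0.25)

# How often does a sample land inside a range of values?
frac_in_range = np.mean((samples >= 0.2) & (samples <= 0.3))

print('samples exactly equal to 0.25: {}'.format(exact_hits))
print('fraction of samples in [0.2, 0.3]: {:.3f}'.format(frac_in_range))
```

The count of exact hits should be zero, while the fraction of samples in the range should be close to 0.1, the width of the interval.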


### Probability Density Functions
Unlike the probability mass functions for discrete random variables, which assign probability to isolated points, the probability density functions for continuous random variables are defined over a continuous sample space, and many common ones are _smooth_ curves. One example is the univariate **Gaussian (normal) distribution**, whose density is:

$$f(x) = \frac{1}{\sqrt{2 \pi} \sigma} \exp \left\{ -\frac{(x-\mu)^2}{2\sigma^2}\right\} $$

Below, we plot the univariate normal distribution with mean $\mu = 0$ and standard deviation $\sigma = 1$.

In [None]:
def gaussian(x, mu, sigma): 
    """
    Evaluate and return the Gaussian function with mean mu
    and standard deviation sigma at the values in x.
    """
    return np.exp(-( (x - mu) ** 2) / (2 * (sigma ** 2))) / (np.sqrt(2 * np.pi) * sigma)

In [None]:
# plot a normal distribution with mean 0 and standard deviation 1
x_vals = np.linspace(-3, 3, 120)
plt.plot(x_vals, gaussian(x_vals, 0, 1));
plt.xlabel('x');
plt.ylabel('f(x)');  # the curve is a density f(x), not a probability
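
Note that the height of the curve is a density, not a probability, so it can exceed 1 even though the total area under the curve is 1. The sketch below illustrates this with a narrow Gaussian; the choice $\sigma = 0.1$, the integration limits, and the helper name `midpoint_area` are assumptions made for this example.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def midpoint_area(f, a, b, n=10000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    midpoints = a + width * (np.arange(n) + 0.5)
    return np.sum(f(midpoints)) * width

sigma = 0.1
peak = gaussian(0.0, 0.0, sigma)   # density at the mean
# integrate over [-1, 1], which covers 10 standard deviations on each side
area = midpoint_area(lambda x: gaussian(x, 0.0, sigma), -1.0, 1.0)

print('peak density: {:.3f}'.format(peak))
print('area under the curve: {:.6f}'.format(area))
```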

Imagine a new continuous random variable $X$ whose values follow a normal distribution with mean 0 and standard deviation 1. We would like to calculate the probability that $X$ falls between the values 0 and 1. That is, we wish to calculate $P(0 \leq X \leq 1)$. Graphically, this corresponds to the shaded region in the plot below:

In [None]:
# shade the area under the curve representing P(0 <= X <= 1)
x_vals = np.linspace(-3, 3, 120)
plt.plot(x_vals, gaussian(x_vals, 0, 1));
plt.xlabel('x');
plt.ylabel('f(x)');
# use linspace so the shaded region includes the right endpoint x = 1
section = np.linspace(0, 1, 21)
plt.fill_between(section, gaussian(section, 0, 1));

In order to compute this value, we must integrate the Gaussian density over the interval $[0, 1]$. That is,

$$P(0 \leq X \leq 1) = \int_0^1 \frac{1}{\sqrt{2 \pi} \sigma} \exp \left\{ -\frac{(x-\mu)^2}{2\sigma^2}\right\}\ dx$$

where $\sigma = 1$ and $\mu = 0$. This is approximately equal to:

In [None]:
# computes P(0 <= X <= 1) for a standard normal r.v. X
from scipy.stats import norm
area = norm.cdf(1) - norm.cdf(0)
print('P(0 <= X <= 1): {}'.format(area))

Above we use the _cumulative distribution function_ (CDF) of the standard normal distribution, $F(x) = P(X \leq x)$, to evaluate this integral as $F(1) - F(0)$. For the purposes of this class, however, you will not be asked to evaluate any of these functions yourself (lucky you!). For those interested, have a look at the [error function](https://en.wikipedia.org/wiki/Error_function), which is quite helpful for integrating the Gaussian.
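
As a sanity check, the same probability can be obtained in two other ways. The sketch below is an illustration under stated assumptions, not part of the original notebook: it writes the standard normal CDF as $\frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$ using `math.erf` from the standard library, and it also approximates the integral with a midpoint Riemann sum whose 1000 subintervals are an arbitrary choice.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# 1. Closed form via the error function.
area_erf = std_normal_cdf(1.0) - std_normal_cdf(0.0)

# 2. Midpoint Riemann sum of the standard normal density over [0, 1].
n = 1000
width = 1.0 / n
midpoints = width * (np.arange(n) + 0.5)
density = np.exp(-midpoints ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
area_sum = np.sum(density) * width

print('erf:         {:.6f}'.format(area_erf))
print('Riemann sum: {:.6f}'.format(area_sum))
```

Both values should agree with the CDF-based result to several decimal places.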