In [1]:
%matplotlib inline
%config InlineBackend.figure_format = "retina"

# Probability theory

**Note that this lecture is currently incomplete; if you come back to the page and this message is gone, you can consider it complete.**

## Intro

### Recap of last time

In the [Introductory lecture](./intro), we X

What we'll cover here: **probability theory**.

### Today: Probability theory

abc

Before jumping in further, let's re-load that same dataset and, while we're at it, import the packages we'll need to make nice plots:

In [2]:
import xarray as xr

filepath_in = "../data/central-park-station-data.nc"
ds_central_park = xr.open_dataset(filepath_in)
precip_central_park = ds_central_park["precip"]
temp_central_park = ds_central_park["temp_avg"]

In [3]:
# First, import the matplotlib package that we'll use for plotting.
from matplotlib import pyplot as plt

# Then update the plotting aesthetics using my own custom package named "puffins"
# See: https://github.com/spencerahill/puffins
from puffins import plotting as pplt
plt.rcParams.update(pplt.plt_rc_params_custom)

## The 3 axioms

Recall the 3 axioms:

1. **Non-negativity**: the probability $P$ of any event $E$ is at least zero: $P(E)\geq0$
2. **Normalization**: the probability $P$ of the sample space $S$ is unity: $P(S)=1$.
3. **Additivity**: For two mutually exclusive events $E_1$ and $E_2$, the probability of their union $E_1\cup E_2$ is the sum of their individual probabilities: $P(E_1\cup E_2)=P(E_1)+P(E_2)$

In class, we used these to formally prove a couple things:

- All probabilities are bounded by 0 and 1: for any event $E$, $0\leq P(E) \leq 1$.
- Probability of the **complement**: If $E$ is an event and $E^C$ is its complement, then $P(E^C)=1-P(E)$

Use these to prove the following: $P((E_1\cup E_2)^C)=1-P(E_1\cup E_2)=1-[P(E_1)+P(E_2)-P(E_1\cap E_2)]$
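Before attempting the proof, it can help to check the identity numerically on a small example. The sketch below is only an illustration, not a proof: it assumes a fair six-sided die as the sample space, and the two events (`evens`, `at_least_4`) and the helper `prob` are hypothetical choices made for this example.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (an illustrative assumption).
sample_space = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a subset of the sample space), all outcomes equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


evens = frozenset({2, 4, 6})       # E_1: the roll is even
at_least_4 = frozenset({4, 5, 6})  # E_2: the roll is 4 or more (overlaps with E_1)

union = evens | at_least_4
intersection = evens & at_least_4
complement_of_union = sample_space - union

# Axiom 2: the sample space has probability one.
p_sample_space = prob(sample_space)

# Left- and right-hand sides of the identity to be proven.
lhs = prob(complement_of_union)
rhs = 1 - (prob(evens) + prob(at_least_4) - prob(intersection))
print(p_sample_space, lhs, rhs)
```

Because the two events overlap (both contain 4 and 6), simply adding $P(E_1)$ and $P(E_2)$ would double-count the intersection, which is why the $-P(E_1\cap E_2)$ term is needed.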

## Observations, random variables

We consider each **observation**, that is, each individual value in any dataset, to be the result of an experiment performed on nature.

A **random variable** is a function that maps the outcome of any observation to a real number.

## Discrete vs. continuous variables

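A **discrete** variable is one that can only take on a finite (or at most countably infinite) set of distinct values, such as the faces of a die or a count of rainy days.  A **continuous** variable is one that can take on any value within some interval, and therefore uncountably many values.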

In practice, physical quantities that are in reality continuous, like temperature, often end up effectively discrete, because they are only reported to a finite precision.  But unless this precision is quite coarse, we can usually still usefully treat them as if they really were continuous.
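As a rough illustration of this point, the sketch below rounds some made-up continuous values to one decimal place and counts how many distinct values remain. The data are synthetic (uniform random numbers between 0 and 30, standing in for temperatures) and are not the Central Park record.

```python
import random

rng = random.Random(0)  # fixed seed so the example is reproducible

# Synthetic stand-in for a continuous quantity: 1000 made-up "temperatures"
# drawn uniformly between 0 and 30 (NOT the Central Park data).
raw_values = [rng.uniform(0.0, 30.0) for _ in range(1000)]

# Reporting to one decimal place leaves at most 301 possible values
# (0.0, 0.1, ..., 30.0), no matter how many observations there are.
reported_values = [round(value, 1) for value in raw_values]

num_distinct_raw = len(set(raw_values))
num_distinct_reported = len(set(reported_values))
print(num_distinct_raw, num_distinct_reported)
```

The rounded values form a discrete set, but with a spacing of 0.1 over a range of 30 they are fine-grained enough that treating them as continuous loses very little.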

## Probability distributions

### Probability mass and density functions

For discrete random variables, the **probability mass function** specifies the probability of every possible outcome of that variable.  For example, the probability mass function of a fair 6-sided die would be 1/6 for each of the 6 faces, since they're all equally likely.

Notice in this die-roll case that the probability mass function, summed over all possible outcomes, adds up to exactly one. That is true for all probability mass functions.
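The sum-to-one property also holds when the outcomes are not equally likely. As a small sketch, the code below builds the probability mass function for the sum of two fair dice (an example chosen here for illustration) by enumerating all 36 equally likely pairs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die 1, die 2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability mass function: each possible sum mapped to its probability.
pmf_two_dice = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, p in pmf_two_dice.items():
    print(total, p)

# Summing the PMF over every possible outcome gives exactly one.
pmf_sum = sum(pmf_two_dice.values())
print("sum of PMF:", pmf_sum)
```

Here a sum of 7 is the most likely outcome (probability 1/6), while 2 and 12 are the least likely (1/36 each).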

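For *continuous* random variables, the analogous quantity is the **probability density function**. The probability that the variable falls within a given interval is the integral of the density over that interval, and the density integrates to exactly one over all possible values.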

### Cumulative distribution functions

The **cumulative distribution function** (CDF) of a random variable, whether continuous or discrete, gives, for each possible value, the probability that the variable is less than or equal to that value.  In other words, each value $x$ is the **quantile** corresponding to the cumulative probability $F(x)$.  As such, the CDF always ranges from 0 (for values less than the variable's minimum value, or as $x\rightarrow-\infty$ if there is no minimum value) to 1 (for values at or above the variable's maximum value, or as $x\rightarrow+\infty$ if there is no maximum value).

For discrete variables, the CDF is the sum of the probability mass function over all values less than or equal to the given value: 
$$F(x_j)=\sum_{i=1}^j p(x_i),$$
where $x_j$ is the value of interest, $p(x)$ is the probability mass function, and the values of $x$ are assumed to be ordered from the smallest value $x_1$ to the largest value $x_N$.
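The running sum in this formula can be sketched in a few lines. The example assumes a fair six-sided die, so the ordered values are the faces 1 through 6 and each has probability 1/6.

```python
from fractions import Fraction
from itertools import accumulate

# Ordered possible values and their probabilities for a fair die (illustrative choice).
values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * len(values)

# The CDF at each value is the running (cumulative) sum of the PMF up to that value.
cdf = list(accumulate(pmf))

for value, cumulative_prob in zip(values, cdf):
    print(f"P(X <= {value}) = {cumulative_prob}")
```

The CDF never decreases and reaches exactly 1 at the largest possible value.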

For continuous variables, the CDF is the *integral* of the probability density function: 
$$F(x)=P(X\leq x)=\int_{-\infty}^xp(u)\,\mathrm{d}u.$$
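To make the integral concrete, the sketch below approximates it with the trapezoid rule for one particular density, the standard normal, and compares the result against the known closed form written in terms of the error function. The function names, the lower limit of $-10$ standing in for $-\infty$, and the number of steps are all choices made for this illustration.

```python
import math


def standard_normal_pdf(u):
    """Density of the standard normal distribution (mean 0, standard deviation 1)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def cdf_by_trapezoid(x, lower=-10.0, num_steps=20_000):
    """Approximate the integral of the density from `lower` (standing in for -infinity) to x."""
    step = (x - lower) / num_steps
    interior = sum(standard_normal_pdf(lower + i * step) for i in range(1, num_steps))
    endpoints = 0.5 * (standard_normal_pdf(lower) + standard_normal_pdf(x))
    return step * (interior + endpoints)


def cdf_exact(x):
    """Closed-form standard normal CDF, expressed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-1.0, 0.0, 1.96):
    print(x, cdf_by_trapezoid(x), cdf_exact(x))
```

Truncating the lower limit at $-10$ is harmless here because the standard normal density is vanishingly small that far from zero; for a heavier-tailed density the cutoff would need more care.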



## Expectation and population mean

Conceptually, **expectation** (also known as "expected value") is simply a probability-weighted average over a random variable.  

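For a discrete variable, this is
$$E[g(X)]=\sum_{i=1}^N g(x_i)\,p(x_i),$$
where $g$ is some function, $x_i$ is the $i$th of the $N$ possible values of $X$, and $p(x_i)$ is its probability.  Choosing $g(X)=X$ gives the **population mean**, $\mu=E[X]$.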

For a continuous variable, the expectation is
$$E[g(X)]=\int_{-\infty}^{\infty}g(x)p(x)\,\mathrm{d}x.$$
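For a discrete variable, the probability-weighted sum is easy to evaluate directly. The sketch below again assumes a fair six-sided die; the helper `expectation` is a hypothetical name introduced for this example.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die (illustrative choice).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def expectation(g, pmf):
    """Probability-weighted average of g over all possible values."""
    return sum(g(x) * p for x, p in pmf.items())


mean = expectation(lambda x: x, pmf)                    # g(X) = X
mean_of_square = expectation(lambda x: x**2, pmf)       # g(X) = X^2
variance = expectation(lambda x: (x - mean) ** 2, pmf)  # g(X) = (X - mean)^2
print(mean, mean_of_square, variance)
```

Note that the expected value 7/2 is not itself a possible outcome of a single roll: expectation is an average over the distribution, not a prediction of any individual observation.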

## Theoretical distributions

### Discrete

#### Uniform

#### Binomial

### Continuous

#### Normal ("Gaussian")

The **normal distribution** is crucially important.  Its probability density is given by
$$p(x)=\frac{1}{\sqrt{2\pi}}\frac{1}{\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$
where
- $\mu$ is the mean
- $\sigma$ is the standard deviation

If $\mu=0$ and $\sigma=1$, the resulting distribution is called the **standard normal**:
$$p(x)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right).$$
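As a sanity check on the formula, the sketch below writes the density out term by term and compares it against `statistics.NormalDist` from the Python standard library. The parameter values are arbitrary and chosen only for illustration, and `normal_pdf` is a name introduced here.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density, written out term by term from the formula."""
    return (1.0 / math.sqrt(2.0 * math.pi)) * (1.0 / sigma) * math.exp(
        -((x - mu) ** 2) / (2.0 * sigma**2)
    )


# Arbitrary illustrative parameters (not taken from any dataset).
mu, sigma = 12.0, 9.0
reference = NormalDist(mu=mu, sigma=sigma)

# Largest disagreement between the hand-written formula and the library density.
max_abs_diff = max(
    abs(normal_pdf(x, mu, sigma) - reference.pdf(x)) for x in (-20.0, 0.0, 12.0, 30.0)
)

# With the defaults mu=0, sigma=1 this is the standard normal; its peak is 1/sqrt(2*pi).
standard_normal_peak = normal_pdf(0.0)

# Probability of falling within one standard deviation of the mean.
prob_within_one_sigma = reference.cdf(mu + sigma) - reference.cdf(mu - sigma)
print(max_abs_diff, standard_normal_peak, prob_within_one_sigma)
```

The probability of landing within one standard deviation of the mean comes out to roughly 68%, regardless of the particular $\mu$ and $\sigma$ chosen.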

## Central Limit Theorem

In essence: the sum of many independent random variables tends toward a Gaussian distribution, whether or not the variables themselves are Gaussian.

Formally:

Let $X_1$, ..., $X_N$ be independent and identically distributed ("IID") random variables, all with identical mean $\mu_X$ and identical (finite) variance $\sigma_X^2$.  *Note that, while they must be IID, their distribution does **not** have to be Gaussian.*  Then, with $\hat\mu_X=\frac{1}{N}\sum_{i=1}^NX_i$ denoting the sample mean, the random variable
$$Z=\frac{\hat\mu_X-\mu_X}{\sigma_X/\sqrt{N}}$$
converges in distribution to the standard normal distribution as $N\rightarrow\infty$.
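A quick simulation can make the theorem tangible. The sketch below draws many samples from an exponential distribution, which is strongly skewed and clearly not Gaussian, computes $Z$ for each sample, and checks that the resulting values behave roughly like a standard normal. The choice of distribution, the sample size of 50, the number of samples, and the random seed are all assumptions made for this illustration.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility

# Exponential distribution with rate 1: mean 1, standard deviation 1, strongly skewed.
mu_x, sigma_x = 1.0, 1.0
sample_size = 50    # N in the theorem
num_samples = 5000  # number of independent sample means to generate

z_values = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(sample_size)]
    sample_mean = statistics.fmean(sample)
    # Standardize the sample mean exactly as in the definition of Z.
    z_values.append((sample_mean - mu_x) / (sigma_x / math.sqrt(sample_size)))

z_mean = statistics.fmean(z_values)
z_std = statistics.pstdev(z_values)
# For a standard normal, about 95% of values fall within +/- 1.96.
frac_within_1p96 = sum(abs(z) < 1.96 for z in z_values) / num_samples
print(z_mean, z_std, frac_within_1p96)
```

With $N=50$ the agreement is already close but not perfect; the theorem is a statement about the limit $N\rightarrow\infty$, and a skewed parent distribution like this one leaves some residual skewness in $Z$ at finite $N$.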

## Conclusions

## Supplementary Materials