In [2]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Chapter 2 - Probability

This chapter introduces probability theory (and the differences between the frequentist and Bayesian interpretations), some common statistics, and examples of discrete and continuous distributions. It also presents transformations of random variables, Monte Carlo methods, and information theory.

## Rules of probability

### Sum rule
The probability of the union (disjunction) of two events (or assertions) is given by:

$$p(A\ or\ B) = p(A) + p(B) - p(A\ and\ B)$$
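As a quick check of this rule, the sketch below enumerates one roll of a fair six-sided die. The die, the events `A` and `B`, and the helper `prob` are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an illustrative choice).
omega = set(range(1, 7))

def prob(event):
    """Probability of an event (a subset of omega) when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is greater than 3

lhs = prob(A | B)                       # p(A or B), computed directly
rhs = prob(A) + prob(B) - prob(A & B)   # p(A) + p(B) - p(A and B)
print(lhs, rhs)
```

Subtracting $p(A\ and\ B)$ removes the outcomes 4 and 6, which would otherwise be counted twice.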

### Product rule
The joint probability of the events $A$ and $B$ is given by:

$$p(A\ and\ B) = p(A|B)p(B)$$ 

### Conditional
The probability of the event $A$ given that the event $B$ is true, defined when $p(B) > 0$, is given by:


$$p(A|B) = \frac{p(A\ and\ B)}{p(B)}$$
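The conditional probability and the product rule can be illustrated by counting outcomes. The following is a minimal sketch assuming a fair six-sided die; the events and helper names are hypothetical.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an illustrative choice).
omega = set(range(1, 7))

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))

A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is greater than 3

# p(A|B) by restricting the sample space to B and counting.
p_A_given_B = Fraction(len(A & B), len(B))

# The same quantity from the definition p(A and B) / p(B).
p_A_given_B_def = prob(A & B) / prob(B)

# Product rule: p(A and B) = p(A|B) p(B).
p_joint = p_A_given_B * prob(B)
print(p_A_given_B, p_A_given_B_def, p_joint)
```

Here conditioning on $B$ raises the probability of an even roll from $1/2$ to $2/3$, because two of the three outcomes in $B$ are even.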

### Bayes rule
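Bayes' rule is:

$$p(X|Y) = \frac{p(X)p(Y|X)}{p(Y)}$$

This follows from applying the product rule in both orders, $p(X\ and\ Y) = p(X|Y)p(Y) = p(Y|X)p(X)$, and dividing by $p(Y)$. The denominator can be computed by summing the joint probability over all values of $X$: $p(Y) = \sum_{x} p(Y|X=x)p(X=x)$.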
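To make the rule concrete, here is a small numerical sketch for a diagnostic test. All three input probabilities are made-up values chosen only for illustration; $X$ stands for "the condition is present" and $Y$ for "the test is positive".

```python
# Hypothetical numbers, chosen only for illustration.
p_x = 0.01               # prior p(X): the condition is present
p_y_given_x = 0.90       # likelihood p(Y|X): positive test when present
p_y_given_not_x = 0.05   # p(Y|not X): false positive rate

# Evidence p(Y), obtained by summing over both values of X.
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Posterior p(X|Y) from Bayes' rule.
p_x_given_y = p_x * p_y_given_x / p_y
print(f"p(Y) = {p_y:.4f}, p(X|Y) = {p_x_given_y:.4f}")
```

With these assumed inputs the posterior is only about 0.15, even though the test is fairly accurate, because the prior $p(X)$ is small.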

## Independence

The events $A$ and $B$ are independent if:
    
$$p(A\ and\ B) = p(A)p(B)$$

Note that this is the product rule in the special case $p(A|B) = p(A)$: knowing that $B$ occurred does not change the probability of $A$.
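Independence can be tested by enumeration. The sketch below assumes two fair dice and compares $p(A\ and\ B)$ with $p(A)p(B)$ for one independent pair and one dependent pair of events; the events are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice (an illustrative choice).
omega = list(product(range(1, 7), repeat=2))

def prob(predicate):
    """Probability that predicate(outcome) holds under equally likely outcomes."""
    return Fraction(sum(1 for o in omega if predicate(o)), len(omega))

def first_even(o):
    return o[0] % 2 == 0

def sum_is_7(o):
    return o[0] + o[1] == 7

def sum_is_8(o):
    return o[0] + o[1] == 8

def independent(e1, e2):
    """True if p(e1 and e2) equals p(e1) p(e2)."""
    joint = prob(lambda o: e1(o) and e2(o))
    return joint == prob(e1) * prob(e2)

print(independent(first_even, sum_is_7))   # independent pair
print(independent(first_even, sum_is_8))   # dependent pair
```

A sum of 7 is equally likely whatever the first die shows, so it carries no information about that die. A sum of 8 rules out a first roll of 1, which makes the two events dependent.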

## Continuous random variables

To deal with continuous random variables we define the cumulative distribution function (cdf) $F(x) = p(X\leq x)$. For any $a < b$, this means that:

$$p(a < X \leq b) = F(b) - F(a)$$
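The interval formula can be checked against a simulation. This sketch assumes a standard normal variable and uses `statistics.NormalDist` from the Python standard library; the interval, seed, and sample size are arbitrary choices.

```python
import random
from statistics import NormalDist

# Assumed distribution: standard normal (an illustrative choice).
X = NormalDist(mu=0.0, sigma=1.0)
a, b = -1.0, 1.0

# Exact interval probability from the cdf: F(b) - F(a).
p_exact = X.cdf(b) - X.cdf(a)

# Rough check: fraction of simulated draws that land in (a, b].
rng = random.Random(0)   # fixed seed for reproducibility
n_samples = 100_000
hits = sum(a < rng.gauss(0.0, 1.0) <= b for _ in range(n_samples))
p_sim = hits / n_samples

print(f"F(b) - F(a) = {p_exact:.4f}, simulated = {p_sim:.4f}")
```

The two numbers should agree to about two decimal places at this sample size.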

## Common Statistics

### Quantiles

The $\alpha$ quantile of a cdf $F$, denoted $F^{-1}(\alpha)$, is the value $x_\alpha$ such that

$$F(x_\alpha) = p(X \leq x_\alpha) = \alpha$$

The value $F^{-1}(0.5)$ is the **median** of the distribution.
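As a sketch of how quantiles relate to the cdf, the snippet below evaluates the inverse cdf of a standard normal with `statistics.NormalDist.inv_cdf` and confirms that applying $F$ recovers $\alpha$. The choice of distribution and of the $\alpha$ values is an assumption made for illustration.

```python
from statistics import NormalDist

# Assumed distribution: standard normal (an illustrative choice).
X = NormalDist(mu=0.0, sigma=1.0)

median = X.inv_cdf(0.5)      # the 0.5 quantile
x_975 = X.inv_cdf(0.975)     # the 0.975 quantile

# Round trip: F(F^{-1}(alpha)) should give back alpha.
for alpha in (0.025, 0.5, 0.975):
    x_alpha = X.inv_cdf(alpha)
    print(f"alpha = {alpha}: x_alpha = {x_alpha:+.4f}, F(x_alpha) = {X.cdf(x_alpha):.4f}")
```

For a symmetric distribution such as the normal, the median coincides with the mean, and the 0.025 and 0.975 quantiles are mirror images of each other.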

### Mean

The **mean** or **expected value** of a discrete distribution, commonly denoted by $\mu$, is defined as

$$\mathbb{E}[X] = \sum_{x\in\mathcal{X}}x~p(x)$$

For a continuous distribution with density $p(x)$, the mean is defined as

$$\mathbb{E}[X] = \int_{\mathcal{X}}x~p(x)~dx$$

### Variance

The **variance**, denoted by $\sigma^2$, is a measure of the "spread" of a distribution, defined as

$$\sigma^2 = \mathrm{var}[X] = \mathbb{E}[(X-\mu)^2]$$

where $\mu = \mathbb{E}[X]$. A useful result is

$$\mathbb{E}[X^2] = \sigma^2 + \mu^2$$
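This identity follows from expanding the square in the definition of the variance and using the linearity of expectation, with $\mathbb{E}[X] = \mu$:

$$\sigma^2 = \mathbb{E}[(X-\mu)^2] = \mathbb{E}[X^2] - 2\mu~\mathbb{E}[X] + \mu^2 = \mathbb{E}[X^2] - \mu^2$$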

The **standard deviation** is defined as $\mathrm{std}[X] = \sqrt{\sigma^2} = \sigma$.
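The definitions of the mean, variance, and standard deviation can be applied directly to a small discrete distribution. The sketch below assumes a fair six-sided die and uses exact fractions so that the relation between $\mathbb{E}[X^2]$, $\sigma^2$, and $\mu^2$ can be checked without rounding.

```python
from fractions import Fraction

# Assumed distribution: a fair six-sided die, p(x) = 1/6 for x = 1..6.
values = range(1, 7)
p = Fraction(1, 6)

mu = sum(x * p for x in values)                  # mean, E[X]
second_moment = sum(x**2 * p for x in values)    # E[X^2]
var = sum((x - mu) ** 2 * p for x in values)     # variance, E[(X - mu)^2]
std = float(var) ** 0.5                          # standard deviation

print(f"mu = {mu}, E[X^2] = {second_moment}, var = {var}, std = {std:.4f}")
print(second_moment == var + mu**2)
```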

## Common discrete distributions

* Bernoulli:

$$\mathrm{Ber}(x~|~\theta) =
    \left\{
    \begin{array}{ll}
		\theta  & \mbox{if } x = 1 \\
		1-\theta & \mbox{if } x = 0
	\end{array}
    \right.
$$

 

* Binomial:

$$\mathrm{Bin}(k~|~n,\theta) = \binom{n}{k}\theta^k(1-\theta)^{n-k}$$

* Multinomial:

$$\mathrm{Mu}(x~|~n,\theta) = \binom{n}{x_1,...,x_K}\prod_{j=1}^{K}\theta_{j}^{x_j}$$

where the **multinomial coefficient** is defined as

$$\binom{n}{x_1,...,x_K} = \frac{n!}{x_1! \dots x_K!}$$

* Poisson

$$\mathrm{Poi}(x~|~\lambda) = e^{-\lambda}\frac{\lambda^x}{x!}$$

* Empirical distribution

Given a dataset $\mathcal{D} = \{x_1, \dots, x_N\}$, the empirical distribution is defined as

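$$p_{\mathrm{emp}}(A) = \frac{1}{N}\sum_{i=1}^{N}\delta_{x_i}(A)$$

where $\delta_{x_i}(A)$ is 1 if $x_i \in A$ and 0 otherwise. More generally, each sample can carry a weight:

$$p_{\mathrm{emp}}(A) = \sum_{i=1}^{N}w_i \delta_{x_i}(A)$$

where $0\leq w_i \leq 1$ and $\sum_{i=1}^{N} w_i = 1$.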
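The probability mass functions above translate almost directly into code. The following is a minimal sketch using only the standard library. The function names are hypothetical, and `empirical_prob` assumes the equal-weight case in which every data point carries mass $1/N$.

```python
from math import comb, exp, factorial, prod

def binomial_pmf(k, n, theta):
    """Bin(k | n, theta). With n = 1 this reduces to the Bernoulli pmf."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def multinomial_pmf(x, theta):
    """Mu(x | n, theta) for a tuple of counts x, with n = sum(x)."""
    coef = factorial(sum(x)) // prod(factorial(x_j) for x_j in x)
    return coef * prod(t**x_j for t, x_j in zip(theta, x))

def poisson_pmf(x, lam):
    """Poi(x | lambda)."""
    return exp(-lam) * lam**x / factorial(x)

def empirical_prob(data, event):
    """Equal-weight empirical probability: the fraction of data points in the event."""
    return sum(1 for x_i in data if event(x_i)) / len(data)

# Each pmf should sum to 1 over its support (the Poisson sum is truncated).
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
print(sum(poisson_pmf(x, 4.0) for x in range(60)))

# With K = 2 categories the multinomial reduces to the binomial.
print(multinomial_pmf((3, 7), (0.3, 0.7)), binomial_pmf(3, 10, 0.3))

# Empirical probability of the event {x <= 2} for a small made-up dataset.
data = [1, 2, 2, 3, 5]
print(empirical_prob(data, lambda x: x <= 2))
```

For real work, a tested library implementation would normally be preferable to these hand-written versions; they are shown only to mirror the formulas.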

## Common continuous distributions

* Normal
* Laplace
* Gamma
* Exponential
* Chi-squared
* Beta
* Pareto
* Student *t*
* Multivariate Normal
* Multivariate Student *t*
* Dirichlet

## Joint probability distributions

## Transformation of random variables

## Monte Carlo Methods

## Information Theory