# Introduction to Probability Distributions

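A probability space is the tuple (Ω, ℱ, P) with
- a sample space Ω, the set of all possible outcomes, such as {H, T} for a single coin flip (head or tail)
- an event space ℱ, a collection of subsets of Ω (the events), such as {∅, {H}, {T}, {H, T}} for a single coin flip
- a probability function P defined on ℱ that assigns a probability to each event, such as P({H}) = P({T}) = 0.5 for a fair coin.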
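
As a minimal sketch of the three components, the snippet below builds a probability space for a single fair coin flip. It assumes the event space is the full power set of Ω; the names `omega`, `events`, `outcome_prob`, and `P` are chosen only for this illustration.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for a single coin flip.
omega = frozenset({"H", "T"})

def power_set(s):
    """Return all subsets of s as frozensets (the largest possible event space)."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

# Event space: every subset of omega, including the empty set and omega itself.
events = power_set(omega)

# Assumption: the coin is fair, so each outcome has probability 1/2.
outcome_prob = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def P(event):
    """Probability of an event: the sum of the probabilities of its outcomes."""
    return sum((outcome_prob[o] for o in event), Fraction(0))

# Print every event with its probability, smallest events first.
for e in sorted(events, key=lambda e: (len(e), sorted(e))):
    print(sorted(e), P(e))
```

Using `Fraction` keeps the probabilities exact, so checks such as `P(omega) == 1` hold without rounding error.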

## Probability Mass Function (PMF)

The PMF of a discrete random variable $X$ with support $S$ (the set of values $X$ can take) is
$$ p_X(x) = \begin{cases}
               P(X=x), &\ \text{if } x \in S \\
               0,      &\ \text{otherwise}
            \end{cases}
$$

It satisfies:
1. $ \sum_{x \in S}p_X(x) = 1 $
2. $ p_X(x) \geqslant 0 $, for all $x$ in $S$
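
The two properties can be checked on a small example. The sketch below assumes $X$ is the sum of two fair six-sided dice, an example chosen here rather than taken from the text; `pmf` and `support` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed example: X is the sum of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
counts = Counter(a + b for a, b in outcomes)      # number of outcomes per sum

def pmf(x):
    """p_X(x): probability that the sum equals x, and 0 for impossible values."""
    return Fraction(counts.get(x, 0), len(outcomes))

support = sorted(counts)                          # the values X can take: 2..12

print("sum of the PMF over its support:", sum(pmf(x) for x in support))
print("all values non-negative:", all(pmf(x) >= 0 for x in support))
```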

## Probability Density Function (PDF)

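The density function $f(x)$ of a continuous random variable $X$ describes how probability is spread over the values of $X$: the probability that $X$ falls in a set is the integral of $f$ over that set.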

![discrete](discrete_pdf.png)

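$f(x)$ satisfies
1. $ P(X \in A) = \int_A{f(x)}dx $, where $A \subseteq S$ and $S$ is the set of values $X$ can take.
2. $ \int_S{f(x)}dx = 1$
3. $f(x) \geqslant 0$ for $x \in S$ and $f(x) = 0$ for $x \notin S$. Unlike a probability, a density may exceed $1$.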

In particular, for an interval $(a, b]$,
$$ P(a < X \leqslant b) = \int_a^b{f(x)}dx $$
Intuitively, the probability is the area under $f(x)$ between $a$ and $b$.
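
As a rough numerical sketch of the area interpretation, the snippet below assumes an exponential distribution with rate 2 (picked only for this example) and approximates the integral with a midpoint rule. `RATE`, `pdf`, and `integrate` are illustrative names, not part of any library.

```python
import math

RATE = 2.0  # assumed rate parameter of an exponential distribution

def pdf(x):
    """Density of Exponential(RATE): RATE * exp(-RATE * x) for x >= 0, else 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# P(0.5 < X <= 1.5) as the area under the density between 0.5 and 1.5.
prob = integrate(pdf, 0.5, 1.5)

# Closed-form value for the exponential distribution, for comparison.
exact = math.exp(-RATE * 0.5) - math.exp(-RATE * 1.5)

# The density is negligible beyond x = 50, so this approximates the total area.
total = integrate(pdf, 0.0, 50.0)

print(prob, exact, total)
```

In this example `pdf(0.0)` equals 2, while every area computed from the density still lies between 0 and 1.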

## Cumulative Distribution Function (CDF)

In [Wiki](https://en.wikipedia.org/wiki/Cumulative_distribution_function), the CDF $F_X(x)$ is defined as
$$ F_X(x) = P(X \leqslant x) $$
This means
$$ P(a < X \leqslant b) = F_X(b) - F_X(a) $$
Where $F_X$ is differentiable, the PDF $f(x)$ can be expressed as
$$ f(x) = \frac{dF_X(x)}{dx} $$
and
$$ F_X(x) = \int_{-\infty}^x f(t)dt $$

$F(x)$ satisfies:
1. $F(x) \leqslant F(y)$ for $ -\infty < x < y < \infty $
2. $ \lim\limits_{x \to -\infty} F(x) = 0 $ and $ \lim\limits_{x \to \infty} F(x) = 1 $
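
These relations can be checked numerically. The sketch below assumes a standard normal distribution, whose CDF can be written with the error function `math.erf`; the function names are illustrative.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """PDF of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# P(a < X <= b) = F(b) - F(a), here with a = -1 and b = 1.
p = normal_cdf(1.0) - normal_cdf(-1.0)

# The PDF is the derivative of the CDF: compare a central difference with the PDF.
h = 1e-5
slope = (normal_cdf(0.3 + h) - normal_cdf(0.3 - h)) / (2 * h)

# Monotonicity on a grid of points from -5 to 5.
grid = [i / 10 for i in range(-50, 51)]
is_monotone = all(normal_cdf(u) <= normal_cdf(v) for u, v in zip(grid, grid[1:]))

print(p, slope, normal_pdf(0.3), is_monotone)
```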

## Complementary CDF (CCDF)

The CCDF is defined as $ \overline{F}_X(x) = 1 - F_X(x) = P(X > x) $. It is sometimes called the survival function, and it describes the tail of the distribution.
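
A small sketch of why the tail is often computed directly: assuming a standard normal distribution, subtracting the CDF from 1 in floating point loses the tiny tail probabilities, while the complementary error function `math.erfc` keeps them. The function names are illustrative.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_ccdf(x):
    """CCDF (survival function) of the standard normal, computed via erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Near the centre both routes agree closely.
print(1.0 - normal_cdf(1.0), normal_ccdf(1.0))

# Far in the tail, 1 - CDF is limited by floating-point rounding,
# while the erfc-based version still returns a small positive number.
print(1.0 - normal_cdf(10.0), normal_ccdf(10.0))
```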

## Expectation and Variance

The expectation of an RV X is the average of its values weighted by their probabilities. For a continuous RV with PDF $f(x)$:
$$ \mu = \mathbb{E}[X] = \int_\mathbb{R}xf(x)dx $$

Property: $$ \mathbb{E}[aX + b] = a\mathbb{E}[X] + b $$

The Variance of an RV X is $ Var(X) = \mathbb{E}\Big[(X - \mathbb{E}[X])^2\Big] $

The Standard Deviation of an RV X is $ \sigma = std(X) = \sqrt{Var(X)} $

Properties:
1. $ Var(X) = \mathbb{E}[X^2] - \Big(\mathbb{E}[X]\Big)^2 $
2. $ Var(a) = 0 $ for constant $a$
3. $ Var(aX + b) = a^2Var(X) $
4. Markov inequality: for a non-negative RV $X$ and any $a > 0$, $ P(X \geqslant a) \leqslant \frac{\mathbb{E}[X]}{a} $
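
The properties above can be verified exactly on a small discrete example, where the integral in the expectation becomes a sum over the PMF. The sketch assumes a fair six-sided die; `pmf`, `expectation`, and the constants `a`, `b`, and `threshold` are chosen only for illustration.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, with exact rational probabilities.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(g=lambda x: x):
    """E[g(X)] for the discrete distribution above: sum of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation()                              # E[X]
var = expectation(lambda x: (x - mean) ** 2)      # Var(X) = E[(X - E[X])^2]

# Linear transform Y = aX + b.
a, b = 3, 2
mean_y = expectation(lambda x: a * x + b)
var_y = expectation(lambda x: (a * x + b - mean_y) ** 2)

# Markov inequality for the non-negative die roll.
threshold = 5
tail = sum(p for x, p in pmf.items() if x >= threshold)   # P(X >= 5)
bound = mean / threshold                                   # E[X] / 5

print("E[X] =", mean, " Var(X) =", var)
print("E[aX+b] =", mean_y, " Var(aX+b) =", var_y)
print("P(X >= 5) =", tail, " Markov bound =", bound)
```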

### Covariance and Correlation
- https://en.wikipedia.org/wiki/Covariance
- https://en.wikipedia.org/wiki/Correlation

## Moments

### Definitions
The kth moment of an RV X is
$$ m_k = \mathbb{E}[X^k] = \int_{-\infty}^{\infty}x^kf(x)dx $$

The kth central moment (or moment about the mean) is
$$ \overline{m}_k = \mathbb{E}\Big[(X - \mu)^k\Big] = \int_{-\infty}^{\infty}(x - \mu)^kf(x)dx$$

The kth standard central moment (also called the standardized moment) is
$$ \tilde{m}_k = \frac{\overline{m}_k}{\sigma^k} $$
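
The three definitions translate directly into code for a discrete distribution, where the integrals become sums over the PMF. This is a sketch that assumes a fair six-sided die; the function names are illustrative.

```python
import math

# Assumed example: a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

def raw_moment(k):
    """k-th moment: E[X^k]."""
    return sum(x ** k * p for x, p in pmf.items())

mu = raw_moment(1)  # the mean is the first moment

def central_moment(k):
    """k-th central moment: E[(X - mu)^k]."""
    return sum((x - mu) ** k * p for x, p in pmf.items())

sigma = math.sqrt(central_moment(2))  # standard deviation

def standardized_moment(k):
    """k-th standard central moment: central moment divided by sigma^k."""
    return central_moment(k) / sigma ** k

for k in range(1, 5):
    print(k, raw_moment(k), central_moment(k), standardized_moment(k))
```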

### First 4 moments
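- The 1st moment is the expectation; the 1st central moment and the 1st standard central moment are both $0$. It measures the central value.
- The 2nd central moment is the variance (dispersion). A small variance means the values are concentrated close to the mean.
- The 3rd standard central moment is the skewness. It measures the asymmetry around the mean. A simpler related measure is $(mean - median) / std$.
    - positive - longer tail on the right
    - negative - longer tail on the left
- The 4th standard central moment is the kurtosis. It measures how heavy the tails are. A normal distribution has kurtosis $3$, so the excess kurtosis (kurtosis minus $3$) is often reported instead.
    - positive excess - heavy tails, sharp peak
    - negative excess - light tails, flat peak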
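
The sign conventions can be seen on small data sets. The sketch below computes sample skewness and excess kurtosis using population-style moments (dividing by $n$, an assumption made here for simplicity); the data sets are invented for illustration.

```python
import math

def standardized_moment(data, k):
    """k-th standard central moment of a sample, dividing by n throughout."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    central_k = sum((x - mean) ** k for x in data) / n
    return central_k / math.sqrt(variance) ** k

def skewness(data):
    return standardized_moment(data, 3)

def excess_kurtosis(data):
    return standardized_moment(data, 4) - 3.0

right_tailed = [1, 1, 2, 2, 3, 10]             # one large value stretches the right tail
left_tailed = [-x for x in right_tailed]       # mirror image: tail on the left
flat = [1, 2, 3, 4, 5, 6]                      # evenly spread, no tails
heavy_tailed = [0] * 8 + [-10, 10]             # mostly central with two extremes

print("skewness:", skewness(right_tailed), skewness(left_tailed))
print("excess kurtosis:", excess_kurtosis(flat), excess_kurtosis(heavy_tailed))
```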

More details:
- Explain moments: https://gregorygundersen.com/blog/2020/04/11/moments/
- https://stats.stackexchange.com/questions/123251/intuition-for-moments-about-the-mean-of-a-distribution
- https://stats.stackexchange.com/questions/17595/whats-so-moment-about-moments-of-a-probability-distribution
- https://mathoverflow.net/questions/3525/when-are-probability-distributions-completely-determined-by-their-moments

## Moment Generating Functions
- http://www.milefoot.com/math/stat/rv-moments.htm


## KL Divergence
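The KL divergence compares two probability distributions: it measures how much a distribution $P$ differs from a reference distribution $Q$.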
- https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
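
For discrete distributions the divergence is $D_{KL}(P \| Q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$. A minimal sketch, assuming both distributions are given as equal-length lists of probabilities over the same outcomes; `kl_divergence` and the two example distributions are illustrative.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions over the same outcomes."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue            # convention: a term with p(x) = 0 contributes 0
        if qi == 0:
            return math.inf     # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

fair = [0.5, 0.5]      # assumed example: a fair coin
biased = [0.9, 0.1]    # assumed example: a heavily biased coin

print(kl_divergence(fair, fair))      # identical distributions
print(kl_divergence(fair, biased))
print(kl_divergence(biased, fair))    # differs from the previous line: not symmetric
```

Because the result depends on the order of the arguments, the KL divergence is not a distance in the metric sense.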


## References:
- https://en.wikipedia.org/wiki/Kernel_(statistics)
- https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols