In [1]:
import numpy as np

# Probabilities

## Events

$$P(A\cup B) = P(A) + P(B) - P(A \cap B)$$
$$P(E|F) = \frac{P(E \cap F)}{P(F)}$$
$A$ and $B$ are independent $\iff P(A \cap B) = P(A)P(B)$

### Bayes formula

$$P(E|F) = \frac{P(F|E)P(E)}{P(F)}$$
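
As a small worked example, the sketch below applies Bayes formula to made-up numbers. The three input probabilities are hypothetical values chosen only for illustration, and $P(F)$ is obtained from the law of total probability, $P(F) = P(F|E)P(E) + P(F|\bar{E})P(\bar{E})$.

```python
# Hypothetical inputs, chosen only for illustration
p_e = 0.01              # P(E): prior probability of E
p_f_given_e = 0.95      # P(F|E)
p_f_given_not_e = 0.05  # P(F|not E)

# Law of total probability: P(F) = P(F|E)P(E) + P(F|not E)P(not E)
p_f = p_f_given_e * p_e + p_f_given_not_e * (1 - p_e)

# Bayes formula: P(E|F) = P(F|E)P(E) / P(F)
p_e_given_f = p_f_given_e * p_e / p_f

print('P(F)   =', p_f)
print('P(E|F) =', p_e_given_f)
```

Even with a high $P(F|E)$, the posterior $P(E|F)$ stays small here because the prior $P(E)$ is small.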

## Discrete random variable

$X$ is discrete: it can only take a countable number of values.  

### Probability Mass Function (PMF)

$$p(a) = P\{X = a\}$$
$$\forall i: \space p(x_i) \geq 0$$
$$\sum_{i=1}^\infty p(x_i) = 1$$


### Cumulative Distribution Function (CDF)

$$F(a) = P\{X \leq a\} = \sum_{x_i \leq a} p(x_i)$$
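
The PMF conditions and the discrete CDF can be checked on a toy example. The sketch below assumes a fair six-sided die; the names `values`, `pmf` and `F` are illustrative.

```python
import numpy as np

# Fair six-sided die (illustrative choice)
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

# CDF evaluated at each support point: cumulative sum of the PMF
cdf = np.cumsum(pmf)

def F(a):
    """P{X <= a}: sum of p(x_i) over all x_i <= a."""
    return pmf[values <= a].sum()

print('sum of PMF =', pmf.sum())
print('F(3) =', F(3))
```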

## Continuous Random Variable

$X$ is continuous: it can take an uncountably infinite number of real values.

### Probability Density Function (PDF)

$$P\{X \in B\} = \int_{B} f(x)dx$$
$$P\{a \leq X \leq b\} = \int_a^b f(x)dx$$
$$P \{ X \in (-\infty, +\infty) \} = \int_{-\infty}^{+\infty} f(x)dx = 1$$
$$P \{ X = a \} = 0$$

### Cumulative Distribution Function (CDF)

$$F(a) = P \{X < a \} = P \{ X \leq a \} = \int_{-\infty}^a f(x)dx$$
$$\frac{d}{da} F(a) = f(a)$$
$$P \{a \leq X \leq b \} = F(b) - F(a)$$
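
These relations can be verified numerically. The sketch below assumes an exponential distribution with rate `lam` (an illustrative choice, with $f(x) = \lambda e^{-\lambda x}$ and $F(x) = 1 - e^{-\lambda x}$ for $x \geq 0$), integrates the PDF with a midpoint rule, and differentiates the CDF with a central finite difference.

```python
import numpy as np

lam = 2.0  # rate of the exponential distribution (illustrative)

def f(x):
    # PDF for x >= 0
    return lam * np.exp(-lam * x)

def F(x):
    # CDF for x >= 0
    return 1.0 - np.exp(-lam * x)

a, b = 0.5, 1.5

# Midpoint rule for the integral of f over [a, b]
n = 100_000
edges = np.linspace(a, b, n + 1)
mid = 0.5 * (edges[:-1] + edges[1:])
integral = np.sum(f(mid)) * (b - a) / n

# Central finite difference approximating dF/da
h = 1e-6
deriv = (F(a + h) - F(a - h)) / (2 * h)

print('integral of f on [a, b] =', integral)
print('F(b) - F(a)             =', F(b) - F(a))
print('dF/da =', deriv, ' f(a) =', f(a))
```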

## Expectation

The expectation of a random variable is its average value, weighted by probability.

$$\mathbb{E}[X] = \sum_{i=1}^n x_ip(x_i) \text{  (discrete variable)}$$
$$\mathbb{E}[X] = \int_{-\infty}^{+\infty} xf(x)dx \text{  (continuous variable)}$$

The expectation of an expression $\mathbb{E}_{x \sim X}[g(x)]$ is the average value of $g$ when $x$ is drawn from the random variable $X$.

$$\mathbb{E}[g(X)] = \sum_{i=1}^n g(x_i)p(x_i) \text{  (discrete variable)}$$
$$\mathbb{E}[g(X)] = \int_{-\infty}^{+\infty} g(x)f(x)dx \text{  (continuous variable)}$$

### Properties

$$\mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y]$$
$$\mathbb{E}[\alpha X] = \alpha \mathbb{E}[X], \space \alpha \in \mathbb{R}$$
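
Linearity does not require independence. The sketch below checks both properties on sample means, with `y` deliberately built from `x`; the distributions and the value of `alpha` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.exponential(scale=2.0, size=n)
y = x**2 + rng.normal(size=n)  # deliberately dependent on x
alpha = -3.0

# Sample means are linear too, so both sides agree up to rounding error
print('E[X+Y]       =', np.mean(x + y))
print('E[X] + E[Y]  =', np.mean(x) + np.mean(y))
print('E[alpha X]   =', np.mean(alpha * x))
print('alpha E[X]   =', alpha * np.mean(x))
```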

## Variance

The variance measures how far the values of a random variable spread around its expected value.

$$\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$$

$$\text{Var}(\alpha X + \beta) = \alpha^2 \text{Var}(X), \space \alpha, \beta \in \mathbb{R}$$

Standard deviation $\sigma(X)$:
$$\sigma(X) = \sqrt{\text{Var}(X)}$$
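
The two expressions for the variance and the affine rule can be compared on samples. This is a minimal sketch; the normal distribution and the constants `alpha` and `beta` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)
alpha, beta = 3.0, -5.0

var_def = np.mean((x - np.mean(x))**2)      # E[(X - E[X])^2]
var_alt = np.mean(x**2) - np.mean(x)**2     # E[X^2] - (E[X])^2
var_affine = np.var(alpha * x + beta)       # Var(alpha X + beta)

print('Var(X), definition      =', var_def)
print('Var(X), E[X^2]-(E[X])^2 =', var_alt)
print('Var(alpha X + beta)     =', var_affine)
print('alpha^2 Var(X)          =', alpha**2 * var_def)
print('sigma(X)                =', np.sqrt(var_def))
```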

## Covariance

Covariance between 2 random variables $X$ and $Y$.
$$\text{cov}(X,Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$$
$$\text{cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y]$$


### Properties

$$\text{cov}(X,X) = \text{Var}(X)$$
$$\text{cov}(X,Y) = \text{cov}(Y,X)$$
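
A sample-based check of the two covariance formulas and of both properties is sketched below. The helper `cov` and the construction of `y` (so that the true covariance is 0.5) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)  # cov(X, Y) = 0.5 * Var(X) = 0.5

def cov(u, v):
    """Sample version of E[(U - E[U])(V - E[V])]."""
    return np.mean((u - u.mean()) * (v - v.mean()))

cov_alt = np.mean(x * y) - x.mean() * y.mean()  # E[XY] - E[X]E[Y]

print('cov(X,Y), definition =', cov(x, y))
print('cov(X,Y), E[XY] form =', cov_alt)
print('cov(X,X) =', cov(x, x), ' Var(X) =', np.var(x))
```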

## Discrete joint distributions

$$p(x,y) = P \{X=x, Y=y\}$$
$$p_X(x) = \sum_y p(x,y)$$
$$p_Y(y) = \sum_x p(x,y)$$
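
With a joint PMF stored as a table, the marginals are row and column sums. The table `p_xy` below is a hypothetical example with $X \in \{0, 1\}$ on the rows and $Y \in \{0, 1, 2\}$ on the columns.

```python
import numpy as np

# Hypothetical joint PMF: rows index x, columns index y
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.30, 0.10, 0.20]])

p_x = p_xy.sum(axis=1)  # p_X(x) = sum over y of p(x, y)
p_y = p_xy.sum(axis=0)  # p_Y(y) = sum over x of p(x, y)

print('p_X =', p_x)
print('p_Y =', p_y)
```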

## Continuous joint distributions

$$P \{X \in A, Y \in B\} = \int_B \int_A f(x,y)dxdy$$
$$f(a, b) = \frac{\partial^2}{\partial a \partial b} F(a,b)$$
$$P \{X \in A\} = \int_A \int_{-\infty}^{+\infty} f(x,y)dydx$$
$$P \{Y \in B\} = \int_B \int_{-\infty}^{+\infty} f(x,y)dxdy$$

## Conditional distributions

$$P_{X|Y}(x|y) = P\{X = x | Y=y\} = \frac{p(x,y)}{p_Y(y)} \text{ (discrete variables)}$$
$$f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)} \text{ (continuous variables)}$$
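
For the discrete case, the conditional PMF is the joint table divided by the marginal of the conditioning variable. The sketch below uses a hypothetical joint table `p_xy` (rows index $x$, columns index $y$).

```python
import numpy as np

# Hypothetical joint PMF: rows index x, columns index y
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.30, 0.10, 0.20]])

p_y = p_xy.sum(axis=0)  # marginal p_Y(y)

# p_{X|Y}(x|y) = p(x, y) / p_Y(y); broadcasting divides each column by p_Y(y)
p_x_given_y = p_xy / p_y

print(p_x_given_y)
print('column sums =', p_x_given_y.sum(axis=0))
```

Each column of `p_x_given_y` is a distribution over $x$ for a fixed $y$, so each column sums to 1.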

## Common distributions

### Normal distribution (Gaussian)

$$X \sim \mathcal{N}(\mu, \sigma^2)$$
Parameters:
- $\mu$: mean
- $\sigma^2 > 0$: variance


$$\text{PDF: } f(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

$$\text{CDF: } F(x) = \frac{1}{2}\left[1 + \text{erf}\left(\frac{x - \mu}{\sigma \sqrt{2}}\right)\right]$$
$$\mathbb{E}[X] = \mu$$
$$\text{Var}(X) = \sigma^2$$

$$ \text{erf}(x) = \frac{1}{\sqrt{\pi}} \int_{-x}^{x} e^{-t^2}dt$$
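
The PDF and CDF above can be evaluated with the standard library alone, since `math.erf` is available. The function names `normal_pdf` and `normal_cdf` are illustrative, and the parameter values are arbitrary.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def normal_cdf(x, mu, sigma):
    """P{X <= x} for X ~ N(mu, sigma^2), written with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = -1.3, 4.5

# Probability of falling within one standard deviation of the mean
within_one_sigma = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)

print('f(mu) =', normal_pdf(mu, mu, sigma))
print('F(mu) =', normal_cdf(mu, mu, sigma))
print('P{|X - mu| <= sigma} =', within_one_sigma)
```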

In [2]:
_box_muller = [None]
def norm_box_muller():    
    if _box_muller[0] is not None:
        res = _box_muller[0]
        _box_muller[0] = None
        return res
    
    # rand() samples [0, 1); taking log(1 - u1) keeps the argument in (0, 1]
    u1, u2 = np.random.rand(2)
    r = np.sqrt(-2*np.log(1 - u1))
    theta = 2*np.pi*u2
    x = r * np.cos(theta)
    y = r * np.sin(theta)
    _box_muller[0] = x
    return y

_marsagalia_polar = [None]
def norm_marsagalia_polar():
    if _marsagalia_polar[0] is not None:
        res = _marsagalia_polar[0]
        _marsagalia_polar[0] = None
        return res

    while True:
        x, y = 2 * np.random.rand(2) - 1
        s = x**2 + y**2
        if s < 1 and s>0:
            break
    
    f = np.sqrt((-2*np.log(s))/s)
    a, b = x*f, y*f
    _marsagalia_polar[0] = a
    return b
    

N = 1000000

x = np.random.randn(N) * 4.5 - 1.3
print('[NP]  mu =', np.mean(x))
print('[NP] std =', np.std(x))


x = np.empty(N)
for i in range(N): x[i] = 4.5 * norm_box_muller() - 1.3
print('[BM]  mu =', np.mean(x))
print('[BM] std =', np.std(x))

x = np.empty(N)
for i in range(N): x[i] = 4.5 * norm_marsagalia_polar() - 1.3
print('[MP]  mu =', np.mean(x))
print('[MP] std =', np.std(x))

[NP]  mu = -1.3009659425732243
[NP] std = 4.5022249987259455
[BM]  mu = -1.3036275867587594
[BM] std = 4.493453817736419
[MP]  mu = -1.304057230429059
[MP] std = 4.498215805410772


### Binomial distribution

$$X \sim B(n, p)$$
Parameters:
- $n$: number of trials
- $p \in [0, 1]$: success probability for each trial.

$P\{X = k\}$ is the probability of observing exactly $k$ successes in $n$ trials.

$$\text{PMF: } f(k) = \binom{n}{k} p^k(1-p)^{n-k}$$
$$\mathbb{E}[X] = np$$
$$\text{Var}(X) = np(1 - p)$$

$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$
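
The PMF, mean, and variance can be checked by direct enumeration. This sketch assumes Python 3.8+ for `math.comb`; the function name `binomial_pmf` and the values of `n` and `p` are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Moments computed from the PMF, to compare with np and np(1 - p)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean)**2 * pk for k, pk in enumerate(pmf))

print('sum of PMF =', sum(pmf))
print('mean =', mean, ' np =', n * p)
print('var  =', var, ' np(1-p) =', n * p * (1 - p))
```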

### Multinomial distribution

Parameters:
- $n$: number of trials
- $p_i$: probability of event $i$: $\sum p_i = 1$, $p_i \geq 0$

$X$ is a discrete vector of size $K$, where $X_i$ is the number of occurrences of event $i$ (so $\sum_{i=1}^K X_i = n$).

$$\text{PMF: } f(x) = \binom{n}{x_1 \dots x_K} \prod_{i=1}^K p_i^{x_i}$$

$$\mathbb{E}[X_i] = np_i$$
$$\text{Var}(X_i) = np_i(1 - p_i)$$
$$\text{Cov}(X_i, X_j) = -np_ip_j \space (i \neq j)$$

$$\binom{n}{k_1 \dots k_m}= \frac{n!}{\prod_{i=1}^m k_i!}$$
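
The PMF can be evaluated directly from the multinomial coefficient. The sketch below uses a hypothetical helper `multinomial_pmf` and checks that the probabilities of all count vectors with $n = 4$ and $K = 3$ sum to 1.

```python
import math

def multinomial_pmf(x, p):
    """P{X = x} for counts x (summing to n) and event probabilities p."""
    n = sum(x)
    # Multinomial coefficient n! / (x_1! ... x_K!), kept in exact integer arithmetic
    coef = math.factorial(n)
    for xi in x:
        coef //= math.factorial(xi)
    prob = 1.0
    for xi, pi in zip(x, p):
        prob *= pi**xi
    return coef * prob

p = [0.1, 0.6, 0.3]
value = multinomial_pmf((1, 2, 1), p)

# Enumerate every count vector (i, j, 4 - i - j) with n = 4
total = sum(multinomial_pmf((i, j, 4 - i - j), p)
            for i in range(5) for j in range(5 - i))

print('f((1, 2, 1)) =', value)
print('sum over all outcomes =', total)
```

With $K = 2$ the same function reduces to the binomial PMF.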

In [3]:
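def rand_multinomial(p):
    # Draws a single trial (n = 1): returns the index k of the event that occurred,
    # sampled by inverting the cumulative sums of p.
    s = 0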
    p2 = np.empty(len(p))
    for i in range(len(p)-1):
        s += p[i]
        p2[i] = s
    p2[-1] = 1
    
    u = np.random.rand()
    k = 0
    while u > p2[k]:
        k += 1
    return k

N = 1000000
x = np.empty(N, dtype=int)  # np.int was removed in NumPy 1.24; use the builtin int
p = [0.1, 0.6, 0.3]
for i in range(N):
    x[i] = rand_multinomial(p)
    
print('p[0]:', np.mean(x==0))
print('p[1]:', np.mean(x==1))
print('p[2]:', np.mean(x==2))

p[0]: 0.100111
p[1]: 0.599824
p[2]: 0.300065


### Multivariate Normal distribution

$$X \sim \mathcal{N}(\mu, \Sigma)$$
Parameters:
- $\mu \in \mathbb{R}^p$: mean
- $\Sigma \in \mathbb{R}^{p \times p}$: covariance matrix (symmetric positive semi-definite; the PDF below requires it to be positive definite)

$$\text{PDF: } f(x) = ((2\pi)^{p} \text{det}(\Sigma))^{-\frac{1}{2}} \exp(-\frac{1}{2} (x - \mu)^T \Sigma^{-1}(x-\mu))$$

$$\mathbb{E}[X] = \mu$$
$$\text{Var}(X) = \Sigma$$
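
The density can be evaluated without forming $\Sigma^{-1}$ explicitly, by solving a linear system instead. The function name `mvn_pdf` is illustrative, and the sketch assumes $\Sigma$ is positive definite.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, assuming sigma is positive definite."""
    p = len(mu)
    diff = x - mu
    # Quadratic form (x - mu)^T sigma^{-1} (x - mu), computed via a linear solve
    maha = diff @ np.linalg.solve(sigma, diff)
    norm = np.sqrt((2 * np.pi)**p * np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm

# Standard bivariate normal at the origin: the density is 1 / (2 pi)
print(mvn_pdf(np.zeros(2), np.zeros(2), np.eye(2)))
```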

In [4]:

rmu = np.array([0.5, -1.2, 4.6])
rsig = np.array([[0.4, 1.2, -1.8],[2.5,-2.8,-1.9],[-1.4,6.7,2.5]])
rsig = rsig.T @ rsig
N = 1000000

print('mu =', rmu)
print('sig=')
print(rsig)

X = np.random.multivariate_normal(rmu, rsig, size=N, check_valid='raise')
mu = np.mean(X, axis=0)
sig = 1/N * (X - mu.reshape(1,3)).T @ (X - mu.reshape(1,3))
print('[NP] mu =', mu)
print('[NP] sig=')
print(sig)


def normal_multivariate(mu, sig, size):
    N = size
    p = len(mu)
    X = np.empty((N,p))
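    # sig is symmetric, so eigh applies: sig = V diag(d) V^T with real eigenvalues d
    d, V = np.linalg.eigh(sig)
    # Q = V diag(sqrt(d)), hence Q Q^T = sig and Q z + mu ~ N(mu, sig) for z ~ N(0, I)
    Q = np.sqrt(d).reshape(1,p) * V 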
    
    
    for i in range(N):
        xn = np.random.randn(p)
        X[i] = Q @ xn + mu
    return X
    
X = normal_multivariate(rmu, rsig, size=N)
mu = np.mean(X, axis=0)
sig = 1/N * (X - mu.reshape(1,3)).T @ (X - mu.reshape(1,3))
print('mu =', mu)
print('sig=')
print(sig)

mu = [ 0.5 -1.2  4.6]
sig=
[[  8.37 -15.9   -8.97]
 [-15.9   54.17  19.91]
 [ -8.97  19.91  13.1 ]]
[NP] mu = [ 0.50222592 -1.20955802  4.59724489]
[NP] sig=
[[  8.35834937 -15.88065631  -8.96131016]
 [-15.88065631  54.14404598  19.89265238]
 [ -8.96131016  19.89265238  13.09290894]]
mu = [ 0.50484791 -1.2159041   4.59343395]
sig=
[[  8.38059383 -15.94549354  -8.98644397]
 [-15.94549354  54.29586645  19.97139495]
 [ -8.98644397  19.97139495  13.1232424 ]]


### Beta distribution

$$X \sim \text{Beta}(\alpha, \beta)$$

Parameters:
- $\alpha \in \mathbb{R} > 0$
- $\beta \in \mathbb{R} > 0$

The support is $x \in [0,1]$.

$$\text{PDF: } f(x) = \frac{x^{\alpha-1} (1-x)^{\beta - 1}}{B(\alpha,\beta)}$$

$$\text{where } B(\alpha,\beta) = \frac{\Gamma (\alpha) \Gamma(\beta)}{\Gamma (\alpha + \beta)}$$

$$\text{where } \Gamma(z) = \int_{0}^{+\infty} x^{z-1} e^{-x}dx$$  

$$\mathbb{E}[X] = \frac{\alpha}{\alpha + \beta}$$
$$\text{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

The beta distribution is the conjugate prior probability distribution of the Bernoulli, binomial, and geometric distributions.  
It is usually used to describe prior knowledge about the probability of success of an event.
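
The sketch below compares sample moments with the formulas above, then illustrates conjugacy with a Bernoulli likelihood: a $\text{Beta}(\alpha, \beta)$ prior combined with $k$ successes in $n$ trials gives a $\text{Beta}(\alpha + k, \beta + n - k)$ posterior. The prior parameters and the observed counts are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 5.0

samples = rng.beta(alpha, beta, size=200_000)
mean_theory = alpha / (alpha + beta)
var_theory = alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1))

print('mean: sample =', samples.mean(), ' theory =', mean_theory)
print('var:  sample =', samples.var(), ' theory =', var_theory)

# Conjugate update with hypothetical data: k successes out of n Bernoulli trials
k, n = 7, 10
alpha_post, beta_post = alpha + k, beta + (n - k)
post_mean = alpha_post / (alpha_post + beta_post)

print('posterior: Beta(', alpha_post, ',', beta_post, ') mean =', post_mean)
```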

### Dirichlet distribution

$$X \sim \text{Dir}(\alpha)$$

Parameters:
$\alpha \in \mathbb{R}^K$, $K \geq 2$, $\alpha_k > 0$

Input: $x \in \mathbb{R}^K$, with $x_k \in [0,1]$, and $\sum_{k=1}^Kx_k=1$

$$\text{PDF: } f(x) = \frac{1}{B(\alpha)} \prod_{i=1}^K x_i^{\alpha_i-1}$$

$$\text{where } B(\alpha) = \frac{\prod_{i=1}^K \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^K\alpha_i)}$$

$$\mathbb{E}[X_i] = \frac{\alpha_i}{\sum_{k=1}^K \alpha_k}$$
$$\text{Var}(X_i) = \frac{\alpha_i(\alpha_0 - \alpha_i)}{\alpha_0^2(\alpha_0 + 1)}$$
$$\text{where } \alpha_0 = \sum_{i=1}^K \alpha_i$$  

The Dirichlet distribution is a multivariate generalization of the beta distribution.  
It is the conjugate prior probability distribution of the categorical and multinomial distributions.
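
One standard way to sample from a Dirichlet distribution is to draw independent $\text{Gamma}(\alpha_i, 1)$ variables and normalize them to sum to 1. The sketch below uses that construction (it is not taken from the notes above) and compares sample moments with the formulas; the values in `alpha` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])
N = 200_000

# Independent Gamma(alpha_i, 1) draws, one column per component
g = rng.gamma(shape=alpha, scale=1.0, size=(N, len(alpha)))
# Normalizing each row yields a Dirichlet(alpha) sample
x = g / g.sum(axis=1, keepdims=True)

alpha0 = alpha.sum()
mean_theory = alpha / alpha0
var_theory = alpha * (alpha0 - alpha) / (alpha0**2 * (alpha0 + 1))

print('mean: sample =', x.mean(axis=0), ' theory =', mean_theory)
print('var:  sample =', x.var(axis=0), ' theory =', var_theory)
```

NumPy also exposes this distribution directly through `rng.dirichlet(alpha, size=N)`.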