In [1]:
import sys
sys.path.append('..')

import math
import numpy as np

import metrics

# Discrete random variable

A discrete random variable $X$ can only take a countable number of values.  

## Probability Mass Function (PMF)

$$p(a) = P\{X = a\}$$
$$\forall i: \space p(x_i) \geq 0$$
$$\sum_{i=1}^\infty p(x_i) = 1$$


## Cumulative Distribution Function (CDF)

$$F(a) = P\{X \leq a\} = \sum_{x_i \leq a} p(x_i)$$
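
As a small illustration of the PMF properties and the discrete CDF, here is a minimal sketch. The fair six-sided die and the names `values`, `pmf`, and `cdf` are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical example: a fair six-sided die.
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

# PMF properties: every p(x_i) is non-negative and they sum to 1.
assert np.all(pmf >= 0)
assert np.isclose(pmf.sum(), 1.0)

def cdf(a):
    """F(a) = P{X <= a}: sum of p(x_i) over all x_i <= a."""
    return pmf[values <= a].sum()

print(cdf(0))  # no value is <= 0
print(cdf(3))  # P{X <= 3}
print(cdf(6))  # all values are <= 6
```

The CDF is a step function: it only increases at the values $x_i$ that carry probability mass.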

# Continuous Random Variable

A continuous random variable $X$ can take an uncountably infinite number of real values.

## Probability Density Function (PDF):

$$P\{X \in B\} = \int_{B} f(x)dx$$
$$P\{a \leq X \leq b\} = \int_a^b f(x)dx$$
$$P \{ X \in ]-\infty; +\infty[ \} = 1$$
$$P \{ X = a \} = 0$$

## Cumulative Distribution Function (CDF)

$$F(a) = P \{X < a \} = P \{ X \leq a \} = \int_{-\infty}^a f(x)dx$$
$$\frac{d}{da} F(a) = f(a)$$
$$P \{a \leq X \leq b \} = F(b) - F(a)$$
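
The relation between the PDF and the CDF can be checked numerically. The sketch below assumes a standard normal distribution (an arbitrary choice for the example) and compares a midpoint-rule integral of the density with $F(b) - F(a)$. The helper names `pdf`, `cdf`, and `integrate` are illustrative.

```python
import math
import numpy as np

def pdf(x):
    """Density of the standard normal distribution."""
    return np.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def cdf(a):
    """Closed-form CDF of the standard normal, via the error function."""
    return 0.5 * (1 + math.erf(a / math.sqrt(2)))

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    mid = a + h * (np.arange(n) + 0.5)
    return np.sum(f(mid)) * h

a, b = -1.0, 2.0
prob_integral = integrate(pdf, a, b)  # integral of f over [a, b]
prob_cdf = cdf(b) - cdf(a)            # F(b) - F(a)
print(prob_integral, prob_cdf)

# The total mass is 1 (the tails beyond [-10, 10] are negligible here).
total_mass = integrate(pdf, -10.0, 10.0)
print(total_mass)
```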

# Joint distributions

## Discrete joint distributions

$$p(x,y) = P \{X=x, Y=y\}$$
$$p_X(x) = \sum_y p(x,y)$$
$$p_Y(y) = \sum_x p(x,y)$$
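
With a joint PMF stored as a table, the marginals are row and column sums. This is a minimal sketch; the table `joint` below is made up for the example.

```python
import numpy as np

# Hypothetical joint PMF p(x, y):
# rows index x in {0, 1}, columns index y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.05, 0.25]])
assert np.isclose(joint.sum(), 1.0)

p_x = joint.sum(axis=1)  # p_X(x): sum over y
p_y = joint.sum(axis=0)  # p_Y(y): sum over x
print(p_x)
print(p_y)
```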

## Continuous joint distributions

$$P \{X \in A, Y \in B\} = \int_B \int_A f(x,y)dxdy$$
$$f(a, b) = \frac{\partial^2}{\partial a \partial b} F(a,b)$$
$$P \{X \in A\} = \int_A \int_{-\infty}^{+\infty} f(x,y)dydx$$
$$P \{Y \in B\} = \int_B \int_{-\infty}^{+\infty} f(x,y)dxdy$$

# Conditional distributions

$$p_{X|Y}(x|y) = P\{X = x | Y=y\} = \frac{p(x,y)}{p_Y(y)} \text{ (discrete variables)}$$
$$f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)} \text{ (continuous variables)}$$
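
For discrete variables, the conditional PMF can be read off a joint table by dividing each column by the corresponding marginal. A minimal sketch, using a made-up table `joint`:

```python
import numpy as np

# Hypothetical joint PMF p(x, y):
# rows index x in {0, 1}, columns index y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.05, 0.25]])

p_y = joint.sum(axis=0)  # marginal p_Y(y)

# p_{X|Y}(x|y) = p(x, y) / p_Y(y): broadcasting divides column y by p_Y(y).
cond_x_given_y = joint / p_y
print(cond_x_given_y)

# For each fixed y, the conditional PMF over x sums to 1.
print(cond_x_given_y.sum(axis=0))
```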

# Dependence

Two random variables are independent if the value of one does not affect the probability distribution of the other.  

Two discrete random variables $X$ and $Y$ are independent if and only if:
$$p_{X,Y}(x,y) = p_X(x)p_Y(y) \space \forall x,y$$

Two continuous random variables $X$ and $Y$ are independent if and only if:
$$F_{X,Y}(x,y) = F_X(x)F_Y(y) \space \forall x,y$$

The same condition holds with the probability density $f$.  

If two random variables are not independent, they are called dependent.
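
For discrete variables given as a joint table, the factorization criterion can be tested directly: the table must equal the outer product of its marginals. The function `is_independent` and both tables are assumptions made for this sketch.

```python
import numpy as np

def is_independent(joint):
    """Check whether p(x, y) = p_X(x) p_Y(y) holds for every cell of the table."""
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return bool(np.allclose(joint, np.outer(p_x, p_y)))

# Built as a product of marginals, hence independent by construction.
indep = np.outer([0.4, 0.6], [0.5, 0.3, 0.2])

# Hypothetical table that does not factorize.
dep = np.array([[0.10, 0.20, 0.10],
                [0.30, 0.05, 0.25]])

print(is_independent(indep))
print(is_independent(dep))
```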

# Expectation

The expectation of a random variable is its average value, weighted by probability.

$$\mathbb{E}[X] = \sum_{i=1}^n x_ip(x_i) \text{  (discrete variable)}$$
$$\mathbb{E}[X] = \int_{-\infty}^{+\infty} xf(x)dx \text{  (continuous variable)}$$

The expectation of an expression, $\mathbb{E}_{x \sim X}[g(x)]$, is the average value of $g$ when $x$ is drawn from the random variable $X$.

$$\mathbb{E}[g(X)] = \sum_{i=1}^n g(x_i)p(x_i) \text{  (discrete variable)}$$
$$\mathbb{E}[g(X)] = \int_{-\infty}^{+\infty} g(x)f(x)dx \text{  (continuous variable)}$$

## Properties

$$\mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y]$$
$$\mathbb{E}[\alpha X] = \alpha \mathbb{E}[X], \space \alpha \in \mathbb{R}$$
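
Linearity does not require independence. The sample mean is linear too, so the properties can be checked on data where $Y$ depends on $X$. The distributions and the constant `alpha` below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(3.1, 1.5, size=1000)
y = x**2 + rng.normal(size=1000)  # y depends on x on purpose
alpha = -2.5

lhs_sum, rhs_sum = np.mean(x + y), np.mean(x) + np.mean(y)
lhs_scale, rhs_scale = np.mean(alpha * x), alpha * np.mean(x)

# Each pair agrees up to floating-point rounding.
print(lhs_sum, rhs_sum)
print(lhs_scale, rhs_scale)
```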

## Estimate

Let $x \in \mathbb{R}^N$ be a sample of size $N$ from the random variable $X$.  An estimate of the expectation (or mean) of $X$ is:
$$\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i$$

In [2]:
def mean(x):
    N = len(x)
    res = 0
    for i in range(N):
        res += x[i]
    return res / N

x = np.random.randn(137) * 1.5 + 3.1
print(mean(x))
print(np.mean(x))

3.1737433992156947
3.173743399215695


# Variance

The variance is a measure of how much the values of a random variable deviate from its expected value.

$$\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$$

$$\text{Var}(\alpha X + \beta) = \alpha^2 \text{Var}(X), \space \alpha, \beta \in \mathbb{R}$$

Standard deviation $\sigma(X)$:
$$\sigma(X) = \sqrt{\text{Var}(X)}$$  
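
The two expressions for the variance and the effect of an affine transformation can be verified on a sample. This is a sketch with arbitrary values for `alpha` and `beta`; it uses the $\frac{1}{N}$ sample moments, for which the identities hold exactly up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(3.1, 1.5, size=1000)
alpha, beta = -2.0, 7.0

# E[(X - E[X])^2] versus E[X^2] - E[X]^2
var_centered = np.mean((x - np.mean(x))**2)
var_moments = np.mean(x**2) - np.mean(x)**2
print(var_centered, var_moments)

# Var(alpha X + beta) = alpha^2 Var(X): the shift beta has no effect.
var_affine = np.var(alpha * x + beta)
print(var_affine, alpha**2 * np.var(x))

# Taking square roots, the standard deviation scales by |alpha|.
std_affine = np.std(alpha * x + beta)
print(std_affine, abs(alpha) * np.std(x))
```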

Let $x \in \mathbb{R}^N$ be a sample of size $N$ from the random variable $X$.  An estimate of the variance of $X$ is:
$$\text{Var}(x) = \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2$$

This estimate is biased: on average, it underestimates the variance. Bessel's correction removes the bias by dividing by $N-1$ instead of $N$:

$$\text{Var}(x) = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2$$
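
The bias is easiest to see on small samples. The simulation below is a sketch under assumed settings (normal data, $N = 5$, true variance $4$): it averages both estimators over many independent samples. The $\frac{1}{N}$ estimator should average to about $\frac{N-1}{N}\sigma^2$, the corrected one to about $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials, sigma2 = 5, 200_000, 4.0

# Each row is one independent sample of size N.
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))

biased = samples.var(axis=1, ddof=0).mean()    # divides by N
unbiased = samples.var(axis=1, ddof=1).mean()  # divides by N - 1

print(biased)    # close to (N - 1) / N * sigma2 = 3.2
print(unbiased)  # close to sigma2 = 4.0
```

The gap between the two estimators shrinks as $N$ grows, since $\frac{N-1}{N} \to 1$.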

In [3]:
def variance(x, bias=True):
    N = len(x)
    div = N if bias else N - 1
    mu = np.mean(x)
    return np.sum((x - mu)**2) / div

def std(x, bias=True):
    return np.sqrt(variance(x, bias))

x = np.random.randn(137) * 1.5 + 3.1
print(variance(x))
print(np.var(x))
print(std(x))
print(np.std(x))

print(np.cov(x.reshape(1, -1)))
print(variance(x, bias=False))

1.960040148361013
1.960040148361013
1.4000143386269346
1.4000143386269346
1.9744522082754326
1.974452208275432


# Covariance

The covariance is a measure of the joint variability of two random variables.  
A positive value indicates a positive linear relationship ($X$ tends to take large values when $Y$ takes large values, and small values when $Y$ takes small values).  
A negative value indicates a negative linear relationship ($X$ tends to take large values when $Y$ takes small values, and small values when $Y$ takes large values).  

The covariance between two random variables $X$ and $Y$ is:
$$\text{cov}(X,Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$$
$$\text{cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y]$$


## Properties

$$\text{cov}(X,X) = \text{Var}(X)$$
$$\text{cov}(X,Y) = \text{cov}(Y,X)$$
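
The two covariance formulas, the sign interpretation, and the properties above can be checked on synthetic data. The data and the helper names `cov_centered` and `cov_moments` are assumptions made for this sketch.

```python
import numpy as np

def cov_centered(x, y):
    """E[(X - E[X])(Y - E[Y])], estimated with 1/N sample averages."""
    return np.mean((x - x.mean()) * (y - y.mean()))

def cov_moments(x, y):
    """E[XY] - E[X]E[Y], estimated with 1/N sample averages."""
    return np.mean(x * y) - x.mean() * y.mean()

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y_pos = 2 * x + rng.normal(scale=0.5, size=500)   # increases with x
y_neg = -2 * x + rng.normal(scale=0.5, size=500)  # decreases with x

print(cov_centered(x, y_pos), cov_moments(x, y_pos))  # equal, positive
print(cov_centered(x, y_neg))                         # negative
print(cov_centered(x, x), np.var(x))                  # cov(X, X) = Var(X)
```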

## Estimate

Let $x$ and $y \in \mathbb{R}^N$ be samples from the respective random variables $X$ and $Y$. An estimate of the covariance between $X$ and $Y$ is:

$$\text{cov}(x,y) = \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})$$

As for the variance, we can correct the bias:
$$\text{cov}(x,y) = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})$$

## Covariance Matrix

Let $X = (X_1, \dots, X_n)$ be a random vector, where each entry $X_i$ is a random variable.  
We define the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ such that the entry $(i,j)$ is the covariance between $X_i$ and $X_j$:
$$\Sigma_{ij} = \text{cov}(X_i, X_j)$$
$$\Sigma_{ii} = \text{Var}(X_i)$$
$$\Sigma_{ij} = \Sigma_{ji}$$


It is also called the auto-covariance matrix or the variance-covariance matrix.  
It generalizes the notion of variance to multiple dimensions.  

Let $X \in \mathbb{R}^{n \times p}$ be a data matrix, where row $i$ is a sample of size $p$ of the random variable $X_i$.  
We can compute an estimate of the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$:
$$\Sigma_{ij} = \text{cov}(x_i, x_j) = \frac{1}{p} \sum_{k=1}^p (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)$$

When the matrix $X$ is centered (each row has mean 0), it simplifies to:
$$\Sigma = \frac{1}{p} \sum_{k=1}^p  x_{:,k} x_{:,k}^T = \frac{1}{p} X X^T$$

In [4]:
def covar(x, y, bias=True):
    N = len(x)
    div = N if bias else N - 1
    mux = np.mean(x)
    muy = np.mean(y)
    return np.sum((x - mux) * (y - muy)) / div

def covar_mat(X, bias=True):
    n = len(X)
    C = np.empty((n,n))
    for i in range(n):
        C[i,i] = variance(X[i], bias=bias)
    for i in range(n):
        for j in range(i+1,n):
            C[i,j] = covar(X[i], X[j], bias=bias)
            C[j,i] = C[i,j]
    return C

def covar_mat2(X, bias=True):
    p = X.shape[1]
    div = p if bias else p - 1
    # Center each row on a copy, so the caller's array is left untouched
    Xc = X - np.mean(X, axis=1, keepdims=True)
    return (Xc @ Xc.T) / div

x = np.random.randn(137) * 1.1 - 1.7
y = np.random.randn(137) * 0.3 + 4.1

print(covar(x, y))
print(np.cov(np.vstack((x,y)), bias=True)[0,1])

X = np.random.randn(108, 37) * 1.45 - 0.67
C1 = np.cov(X, bias=True)
C2 = covar_mat(X, bias=True)
print(metrics.tdist(C1, C2))
C1 = np.cov(X, bias=False)
C2 = covar_mat(X, bias=False)
print(metrics.tdist(C1, C2))

C1 = np.cov(X, bias=True)
C2 = covar_mat2(X, bias=True)
print(metrics.tdist(C1, C2))

0.0027510876759691025
0.0027510876759690977
9.207890622788475e-15
9.487829036289386e-15
3.933925374527707e-15


# Correlation

Correlation is a measure of how close two variables are to having a linear relationship with each other.  
The correlation is usually measured by a correlation coefficient. There exist several types of correlation coefficients.  

Two variables are said to be uncorrelated when their correlation coefficient is $0$. It means there is no increasing or decreasing linear trend between the two variables.  

## Correlation and dependence

Correlation can be seen as a more specific kind of dependence. Two variables can be uncorrelated (no linear trend between them) and yet be dependent.  
$X$ and $Y$ correlated implies that they are dependent (but the converse is false).  
Similarly, $X$ and $Y$ independent implies that they are uncorrelated (but the converse is false). 
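
A classic counterexample shows uncorrelated yet dependent variables: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The sketch below computes the exact covariance from the PMF and then checks that the joint probability does not factorize. The variable names are illustrative.

```python
import numpy as np

# X is uniform on {-1, 0, 1}; Y = X**2 is fully determined by X.
x_vals = np.array([-1, 0, 1])
p_x = np.full(3, 1 / 3)
y_vals = x_vals**2

e_x = np.sum(x_vals * p_x)
e_y = np.sum(y_vals * p_x)
e_xy = np.sum(x_vals * y_vals * p_x)
cov_xy = e_xy - e_x * e_y
print(cov_xy)  # zero covariance: X and Y are uncorrelated

# Dependence: P{X=0, Y=0} differs from P{X=0} P{Y=0}.
p_joint_00 = p_x[x_vals == 0].sum()  # Y=0 happens exactly when X=0
p_x0 = p_x[x_vals == 0].sum()
p_y0 = p_x[y_vals == 0].sum()
print(p_joint_00, p_x0 * p_y0)
```

The relationship between $X$ and $Y$ is perfectly deterministic but symmetric around $0$, so it has no linear component for the covariance to detect.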

## Pearson correlation coefficient

It is a measure of the linear correlation between 2 variables $X$ and $Y$, ranging from $-1$ to $1$, with $+1$ a total positive linear correlation, $-1$ a total negative linear correlation, and $0$ no linear correlation.  

![correlation_examples](https://upload.wikimedia.org/wikipedia/commons/0/02/Correlation_examples.png)

$$\rho(X,Y) = \frac{\text{cov}(X,Y)}{\sigma(X)\sigma(Y)}$$
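
To get a feel for the range of $\rho$, the sketch below evaluates a sample version of the coefficient on an exactly increasing linear relationship, an exactly decreasing one, and two independent samples. The helper `pearson` and the data are assumptions made for the example.

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson coefficient: covariance over the product of standard deviations."""
    xc = x - x.mean()
    yc = y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
noise = rng.normal(size=200)  # drawn independently of x

print(pearson(x, 3 * x + 1))     # increasing linear relationship: 1 up to rounding
print(pearson(x, -0.5 * x + 4))  # decreasing linear relationship: -1 up to rounding
print(pearson(x, noise))         # independent samples: close to 0

# Rescaling or shifting a variable (with a positive scale) leaves rho unchanged.
print(pearson(10 * x + 5, noise))
```

Unlike the covariance, the coefficient does not depend on the units of $X$ and $Y$, which makes values comparable across pairs of variables.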

## Estimate

Let $x$ and $y \in \mathbb{R}^N$ be samples from the respective random variables $X$ and $Y$. An estimate of the Pearson correlation coefficient between $X$ and $Y$ is:

$$\rho(x,y) =\frac{\text{cov}(x,y)}{\sigma(x)\sigma(y)} = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^N (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^N (y_i - \bar{y})^2}}$$

## Correlation Matrix

The idea is similar to the covariance matrix. We present the correlation matrix for the Pearson correlation coefficient.  
Let $X = (X_1, \dots, X_n)$ be a random vector, where each entry $X_i$ is a random variable.  
We define the correlation matrix $\Sigma \in \mathbb{R}^{n \times n}$ such that the entry $(i,j)$ is the correlation between $X_i$ and $X_j$:
$$\Sigma_{ij} = \rho(X_i, X_j)$$
$$\Sigma_{ii} = 1$$
$$\Sigma_{ij} = \Sigma_{ji}$$  

Let $X \in \mathbb{R}^{n \times p}$ be a data matrix, where row $i$ is a sample of size $p$ of the random variable $X_i$.  
We can compute an estimate of the correlation matrix $\Sigma \in \mathbb{R}^{n \times n}$:
$$\Sigma_{ij} = \rho(x_i, x_j)$$

When each row of $X$ has mean $0$ and standard deviation $1$, it simplifies to:
$$\Sigma = \frac{1}{p} \sum_{k=1}^p  x_{:,k} x_{:,k}^T = \frac{1}{p} X X^T$$

In [5]:
def corr(x, y):
    sx = std(x)
    sy = std(y)
    return covar(x,y) / (sx * sy)

def corr_mat(X):
    p = X.shape[1]
    # Standardize each row on a copy (mean 0, std 1), leaving the caller's array untouched
    Xs = X - np.mean(X, axis=1, keepdims=True)
    Xs = Xs / np.std(Xs, axis=1, keepdims=True)
    return (Xs @ Xs.T) / p
    

x = np.random.randn(137) * 1.1 - 1.7
y = np.random.randn(137) * 0.3 + 4.1

print(corr(x, y))
print(np.corrcoef(np.vstack((x,y)))[0,1])

X = np.random.randn(4, 37) * 1.45 - 0.67

C1 = np.corrcoef(X)
C2 = corr_mat(X)
print(metrics.tdist(C1, C2))

0.05647196527337517
0.0564719652733752
6.849675967350832e-16
