# Probability

## What is probability?

__Frequentist interpretation__ - probabilities represent long-run frequencies of events

__Bayesian interpretation__ - probability is used to represent a degree of belief about an uncertain event

## Probability axioms

1. __Nonnegativity__: $\mathbf{P}(A)\geq0$, for every event $A$
2. __Additivity__: $\mathbf{P}(A\cup B)=\mathbf{P}(A)+\mathbf{P}(B)$, if $A$ and $B$ are disjoint
3. __Normalization__: $\mathbf{P}(\Omega)=1$, where $\Omega$ is the entire sample space
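As a small illustration, the three axioms can be checked on a finite sample space. The sketch below assumes a uniform probability law on one roll of a six-sided die; the names `omega`, `prob`, `even` and `low_odd` are illustrative, not part of the notes.

```python
from fractions import Fraction

# Sample space of a single die roll (illustrative assumption).
omega = frozenset(range(1, 7))

def prob(event):
    """Uniform probability law: |A| / |Omega| for an event A inside omega."""
    return Fraction(len(event & omega), len(omega))

even = frozenset({2, 4, 6})
low_odd = frozenset({1, 3})  # disjoint from `even`

nonnegative = prob(even) >= 0
additive = prob(even | low_odd) == prob(even) + prob(low_odd)
normalized = prob(omega) == 1
```

Additivity is only checked here for one pair of disjoint events; it holds for this law because the sizes of disjoint sets add.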


## Random variables

-  A __random variable__ is a real-valued function of the experiment outcome.
-  A __function of a random variable__ defines another random variable
-  We can associate with each random variable certain "averages" of interest, such as the __mean__ and the __variance__
-  A random variable can be __conditioned__ on an event or on another random variable
-  There is a notion of __independence__ of a random variable from an event or from another random variable




### Discrete random variables

-  A __discrete random variable__ is a real-valued function of the outcome of the experiment that can take a finite or countably infinite number of values
-  A discrete random variable has an associated __probability mass function (PMF)__, which gives the probability of each numerical value that the random variable can take
-  A __function of a discrete random variable__ defines another discrete random variable, whose PMF can be obtained from the PMF of the original random variable



### Probability mass function
By the probability axioms, a PMF must satisfy the following properties:

1. $\,\,\forall{x}\in{X}\,,0\leq P(X=x)\leq1$

2. $\,\,\sum_{x\in{X}}P(X=x)=1$
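A minimal sketch of these two properties, and of how the PMF of a function of a discrete random variable can be obtained from the original PMF. It assumes a fair six-sided die; `is_valid_pmf` and `pmf_of_function` are hypothetical helper names.

```python
from collections import defaultdict
from fractions import Fraction

# PMF of a fair six-sided die (illustrative assumption).
pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}

def is_valid_pmf(pmf):
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    return all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1

def pmf_of_function(pmf, g):
    """PMF of Y = g(X): add up P(X = x) over all x that map to the same y."""
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)

# Y = 1 if the roll is odd, 0 if it is even.
pmf_y = pmf_of_function(pmf_x, lambda x: x % 2)
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.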

### Continuous random variables

A random variable $X$ is called __continuous__ if there is a nonnegative function $p$, called the __probability density function (PDF)__ of $X$, such that:

$\mathbf{P}(X\in{B})=\int_B{p(x)dx}$, for every (measurable) subset $B$ of the real line.

In particular,

$\mathbf{P}(a\leq X\leq b) = \int_a^b{p(x)dx}$

$\mathbf{P}(-\infty< X<\infty) = \int_{-\infty}^{\infty}{p(x)dx}=1$
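The interval formula can be checked numerically. The sketch below assumes an exponential density with a hypothetical rate of 2 and a simple midpoint rule; `density` and `integrate` are illustrative names, and the upper limit of 50 stands in for infinity.

```python
import math

def density(x, rate=2.0):
    """Exponential density with an assumed rate; zero for negative x."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

prob = integrate(density, 0.5, 1.5)      # approximates P(0.5 <= X <= 1.5)
exact = math.exp(-1.0) - math.exp(-3.0)  # closed form for this particular density
total = integrate(density, 0.0, 50.0)    # the tail beyond 50 is negligible here
```

For this density, `prob` should agree with `exact` and `total` should be close to 1, up to the error of the midpoint rule.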


## Marginal probability

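Sometimes we know the probability distribution over a set of variables and we want to know the probability distribution over just a subset of them. The distribution over the subset is called the __marginal probability distribution__, and we can calculate it with the __sum rule__: sum over the other variables in the discrete case, integrate over them in the continuous case.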

$\forall_{x\in{X}},\,P(X=x)=\sum_y P(X=x,Y=y)$

$p(x)=\int{p(x,y)dy}$
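A minimal sketch of the discrete sum rule, assuming a small made-up joint PMF stored as a dictionary keyed by `(x, y)`; the values and the helper `marginal` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF P(X = x, Y = y), keyed by (x, y).
joint = {
    ("sun", "warm"): Fraction(4, 10),
    ("sun", "cold"): Fraction(1, 10),
    ("rain", "warm"): Fraction(2, 10),
    ("rain", "cold"): Fraction(3, 10),
}

def marginal(joint, axis):
    """Sum the joint PMF over the other variable (axis 0 keeps X, axis 1 keeps Y)."""
    out = defaultdict(Fraction)
    for key, p in joint.items():
        out[key[axis]] += p
    return dict(out)

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
```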



## Conditional probability


We are often interested in the probability of some event, given that some other event has happened. This is called a __conditional probability__.


$P(Y=y|X=x)=\dfrac{P(Y=y,X=x)}{P(X=x)}$

The __chain rule of conditional probabilities__ factors a joint distribution into a product of conditional distributions:

$P(X^{(1)},...,X^{(n)})=P(X^{(1)})\prod_{i=2}^n P(X^{(i)}|X^{(1)},...,X^{(i-1)})$
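A minimal sketch of the conditional probability formula and of the chain rule for two variables, assuming a made-up joint PMF. The helper `conditional_y_given_x` is a hypothetical name, and it requires $P(X=x)>0$ because the ratio is undefined otherwise.

```python
from fractions import Fraction

# Hypothetical joint PMF P(X = x, Y = y), keyed by (x, y).
joint = {
    ("sun", "warm"): Fraction(4, 10),
    ("sun", "cold"): Fraction(1, 10),
    ("rain", "warm"): Fraction(2, 10),
    ("rain", "cold"): Fraction(3, 10),
}

def conditional_y_given_x(joint, x):
    """P(Y = y | X = x) = P(Y = y, X = x) / P(X = x); requires P(X = x) > 0."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    if p_x == 0:
        raise ValueError("conditioning event has probability zero")
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

p_y_given_sun = conditional_y_given_x(joint, "sun")

# Chain rule with n = 2: P(x, y) = P(x) * P(y | x).
p_sun = sum(p for (xi, _), p in joint.items() if xi == "sun")
chain_rule_holds = all(
    joint[("sun", y)] == p_sun * p for y, p in p_y_given_sun.items()
)
```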

## Independence and Conditional Independence

Two random variables $X$ and $Y$ are __independent__ if:

$\forall_{x\in{X},y\in{Y}},\,P(X=x,Y=y)=P(X=x)P(Y=y)$

Two random variables $X$ and $Y$ are __conditionally independent__ given a random variable $Z$ if:

$\forall_{x\in{X},y\in{Y},z\in{Z}},\,P(X=x,Y=y|Z=z)=P(X=x|Z=z)P(Y=y|Z=z)$
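The independence condition can be tested directly on a joint PMF by comparing every entry with the product of its marginals. The sketch below uses two made-up distributions and a hypothetical helper `is_independent`; the conditional version would apply the same check to the joint distribution of $X$ and $Y$ within each slice $Z=z$.

```python
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """True if P(X = x, Y = y) == P(X = x) * P(Y = y) for every pair (x, y)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(
        joint.get((x, y), 0) == p_x[x] * p_y[y] for x, y in product(xs, ys)
    )

# Two fair coin flips: every outcome has probability 1/4.
two_coins = {(a, b): Fraction(1, 4) for a, b in product("HT", repeat=2)}

# Hypothetical weather table in which the two variables are related.
weather = {
    ("sun", "warm"): Fraction(4, 10),
    ("sun", "cold"): Fraction(1, 10),
    ("rain", "warm"): Fraction(2, 10),
    ("rain", "cold"): Fraction(3, 10),
}
```

Here `is_independent(two_coins)` holds, while `is_independent(weather)` fails because $P(\text{sun},\text{warm})=4/10$ differs from $P(\text{sun})P(\text{warm})=3/10$.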


## Bayes' Rule

Combining the definition of conditional probability with the product and sum rules yields __Bayes' rule__.


$P(x\,|y)=\dfrac{P(x,\,y)}{P(y)}=\dfrac{P(x)P(y\,|x)}{P(y)}=\dfrac{P(x)P(y\,|x)}{\sum_{x'\in{X}}P(x')P(y\,|x')}$
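A worked instance of Bayes' rule, sketched with made-up numbers for a diagnostic test: a prior of 1/100 for the condition, a 9/10 chance of a positive result when it is present, and a 5/100 chance when it is absent. The function name `posterior` and all values are illustrative assumptions.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' rule: P(x | y) = P(x) P(y | x) / sum over x' of P(x') P(y | x')."""
    evidence = sum(prior[x] * likelihood[x] for x in prior)
    return {x: prior[x] * likelihood[x] / evidence for x in prior}

# Hypothetical prior P(x) and likelihood P(y = positive | x).
prior = {"disease": Fraction(1, 100), "healthy": Fraction(99, 100)}
likelihood = {"disease": Fraction(9, 10), "healthy": Fraction(5, 100)}

post = posterior(prior, likelihood)
```

With these numbers the posterior probability of the condition given a positive result is 2/13: the denominator sums over both hypotheses, and the large share of healthy people dominates it.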


## Expectations

The __expectation__ or expected value of some function $f(x)$ with respect to a probability distribution $P(X)$ is the average, or mean value, that $f$ takes on when $x$ is drawn from $P$.

For discrete random variables:

$\mathbb{E}_{X\sim P}[f(x)]=\sum_x{P(x)f(x)}$

For continuous random variables:

$\mathbb{E}_{X\sim p}[f(x)]=\int{p(x)f(x)dx}$

Expectations are linear:

$\mathbb{E}_X[\alpha f(x)+\beta g(x)]=\alpha \mathbb{E}_X[f(x)] + \beta \mathbb{E}_X[g(x)]$
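A minimal sketch of the discrete expectation and of linearity, assuming a fair six-sided die and the arbitrary choices $f(x)=x$, $g(x)=x^2$, $\alpha=2$, $\beta=3$; `expectation` is a hypothetical helper name.

```python
from fractions import Fraction

# PMF of a fair six-sided die (illustrative assumption).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, f):
    """E[f(x)] = sum over x of P(x) * f(x)."""
    return sum(p * f(x) for x, p in pmf.items())

def f(x):
    return x

def g(x):
    return x * x

alpha, beta = 2, 3
mean = expectation(pmf, f)
lhs = expectation(pmf, lambda x: alpha * f(x) + beta * g(x))
rhs = alpha * expectation(pmf, f) + beta * expectation(pmf, g)
```

The two sides `lhs` and `rhs` agree exactly because the sum distributes over the weighted terms.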


## Variance and Covariance 

The __variance__ gives a measure of how much the values of a function of a random variable $X$ vary as we sample different values $x$ from its probability distribution:

$\mathsf{Var}(f(x))=\mathbb{E}[(f(x)-\mathbb{E}[f(x)])^2]$

The __covariance__ gives some sense of how much two values are linearly related to each other, as well as the scale of these variables:

$\mathsf{Cov}(f(x),\,g(y))=\mathbb{E}[(f(x)-\mathbb{E}[f(x)])(g(y)-\mathbb{E}[g(y)])]$

The __covariance matrix__ of a random vector $\mathbf{x}\in \mathbb{R}^n$ is an $n\times n$ matrix, such that:

$\mathsf{Cov}(\mathbf{x})_{i,j}=\mathsf{Cov}(x_i, x_j)$

The diagonal elements of the covariance matrix give the variances:

$\mathsf{Cov}(x_i,x_i)=\mathsf{Var}(x_i)$
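The definitions above can be sketched for a two-component random vector with a made-up joint PMF. The helpers `expectation` and `covariance` are hypothetical names; each entry of the matrix is computed directly from the covariance definition, with $f$ and $g$ taken to be the coordinate functions.

```python
from fractions import Fraction

# Hypothetical joint PMF over numeric pairs (x_1, x_2).
joint = {
    (0, 0): Fraction(1, 2),
    (1, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 4),
}

def expectation(joint, f):
    """E[f(x)] for a random vector x with the given joint PMF."""
    return sum(p * f(v) for v, p in joint.items())

def covariance(joint, i, j):
    """Cov(x_i, x_j) = E[(x_i - E[x_i]) * (x_j - E[x_j])]."""
    mean_i = expectation(joint, lambda v: v[i])
    mean_j = expectation(joint, lambda v: v[j])
    return expectation(joint, lambda v: (v[i] - mean_i) * (v[j] - mean_j))

n = 2
cov_matrix = [[covariance(joint, i, j) for j in range(n)] for i in range(n)]
variances = [cov_matrix[i][i] for i in range(n)]  # diagonal entries
```

The matrix is symmetric because the product inside the expectation does not depend on the order of its two factors.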