# Probability

## Statistical inequalities

Inequalities are interesting because they set bounds on probabilities (and sometimes a bound is informative enough for our purposes).

The four most important inequalities:

### 1. Cauchy-Schwarz inequality

$$
|\mathbf{E}[XY]| \le \sqrt{\mathbf{E}[X^2]\mathbf{E}[Y^2]}
$$

If the variables are uncorrelated, this inequality is not very useful, since we already have an exact equality for that case: $\mathbf{E}[XY] = \mathbf{E}[X] \times \mathbf{E}[Y]$.

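As a consequence of this theorem, the correlation coefficient is bounded between -1 and 1: applying the inequality to the centred variables $X - \mathbf{E}[X]$ and $Y - \mathbf{E}[Y]$ gives $|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y$, so $|\rho| = \frac{|\mathrm{Cov}(X, Y)|}{\sigma_X \sigma_Y} \le 1$.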
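A quick numerical check of the inequality, written as a minimal Python sketch. The joint distribution is an arbitrary assumption for illustration ($X \sim \mathcal{N}(0, 1)$ and $Y = 2X$ plus independent standard normal noise); any pair with finite second moments would do. Sample averages stand in for the expectations, and the sample correlation is computed from the centred variables, so it should land in $[-1, 1]$ (here near the true value $2/\sqrt{5} \approx 0.894$).

```python
import math
import random

random.seed(0)

# Illustrative dependent pair (an assumption for this sketch): Y = 2X + noise
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

def mean(values):
    return sum(values) / len(values)

# Sample estimates of E[XY], E[X^2] and E[Y^2]
e_xy = mean([x * y for x, y in zip(xs, ys)])
e_x2 = mean([x * x for x in xs])
e_y2 = mean([y * y for y in ys])

lhs = abs(e_xy)
rhs = math.sqrt(e_x2 * e_y2)
print(f"|E[XY]| = {lhs:.4f} <= sqrt(E[X^2]E[Y^2]) = {rhs:.4f}")

# Correlation: Cauchy-Schwarz applied to the centred variables
mx, my = mean(xs), mean(ys)
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
sx = math.sqrt(mean([(x - mx) ** 2 for x in xs]))
sy = math.sqrt(mean([(y - my) ** 2 for y in ys]))
rho = cov / (sx * sy)
print(f"correlation = {rho:.4f}")
```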

### 2. Jensen's inequality

If _g_ is a convex function, then for any random variable _X_:

$$
\mathbf{E}[g(X)] \ge g(\mathbf{E}[X])
$$

<img src='img/jensens_inequality.png' />

The opposite holds for concave functions: the inequality flips, $\mathbf{E}[g(X)] \le g(\mathbf{E}[X])$.

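This theorem is useful when we need to swap the expectation and the function applied to the random variable: in general $\mathbf{E}[g(X)] \ne g(\mathbf{E}[X])$, but for a convex (or concave) _g_ we at least know in which direction the two sides differ.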
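A small Monte Carlo sketch of Jensen's inequality, assuming $X \sim \text{Uniform}(1, 3)$ purely for illustration. It uses the convex $g(x) = x^2$ and the concave $g(x) = \log x$; for the square, the gap $\mathbf{E}[X^2] - (\mathbf{E}[X])^2$ is exactly the variance, which is one way to see why variances are never negative.

```python
import math
import random

random.seed(1)

# Illustrative choice (assumption): X ~ Uniform(1, 3), so E[X] = 2
samples = [random.uniform(1, 3) for _ in range(100_000)]
e_x = sum(samples) / len(samples)

# Convex g(x) = x^2: E[g(X)] >= g(E[X]); the gap is the (sample) variance
e_sq = sum(x * x for x in samples) / len(samples)
print(f"E[X^2] = {e_sq:.4f} >= (E[X])^2 = {e_x ** 2:.4f}")

# Concave g(x) = log(x): the inequality flips, E[g(X)] <= g(E[X])
e_log = sum(math.log(x) for x in samples) / len(samples)
print(f"E[log X] = {e_log:.4f} <= log(E[X]) = {math.log(e_x):.4f}")
```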

### 3. Markov's inequality

For any $c \gt 0$ and any random var _X_: 

$$
P(|X| \ge c) \le \frac{\mathbf{E}[|X|]}{c}
$$

The strength of this inequality is its simplicity and generality, though it is only a crude upper bound.

Markov's inequality relates expected values to probabilities (for instance, if we know that $\mathbf{E}[|X|]$ is small, then we know that the probability of _X_ taking large values is also small).
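A sketch comparing Markov's bound with the actual tail probability, assuming $X \sim \text{Exponential}(1)$ (an illustrative choice: it is non-negative, so $\mathbf{E}[|X|] = \mathbf{E}[X] = 1$, and its tail $P(X \ge c) = e^{-c}$ is known in closed form). The gap between the empirical frequency and the bound shows how crude the bound is.

```python
import math
import random

random.seed(2)

# Illustrative choice (assumption): X ~ Exponential(1), so E[|X|] = 1
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]
e_abs = sum(abs(x) for x in samples) / n

results = {}
for c in (1, 2, 4):
    empirical = sum(abs(x) >= c for x in samples) / n
    bound = e_abs / c
    exact = math.exp(-c)  # closed-form P(X >= c) for Exponential(1)
    results[c] = (empirical, bound)
    print(f"c={c}: P(|X|>=c) ~ {empirical:.4f} (exact {exact:.4f}) <= bound {bound:.4f}")
```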

### 4. Chebyshev's inequality

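This theorem follows from Markov's inequality applied to the non-negative random variable $(X - \mu)^2$: $P(|X - \mu| \ge c) = P((X - \mu)^2 \ge c^2) \le \frac{\mathbf{E}[(X - \mu)^2]}{c^2} = \frac{\sigma^2}{c^2}$.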

For a random variable _X_ with mean $\mu$ and finite variance $\sigma^2$, and any $c \gt 0$:

$$
P(| X - \mu | > c) \le \frac{\sigma^2}{c^2}
$$

Setting $c = a \sigma$ (for $a \gt 0$) gives $P(| X - \mu | > a \sigma) \le \frac{1}{a^2}$.

This inequality relates variances (or standard deviations) to probabilities (for example, if we know the variance is small, we know that the probability of being far from the mean is also small).
As we move farther away from the mean (that is, as _a_ increases), the bound gets smaller, because $a^2$ is in the denominator on the right-hand side; it only says something non-trivial for $a \gt 1$.
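A similar sketch for Chebyshev's inequality, again assuming $X \sim \text{Exponential}(1)$ for illustration. The mean and standard deviation are estimated from the sample itself (population-style, dividing by $n$), so the empirical frequencies must respect the $1/a^2$ bound; they typically sit well below it.

```python
import random
import statistics

random.seed(3)

# Illustrative choice (assumption): X ~ Exponential(1), whose mu and sigma are both 1
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]
mu = statistics.fmean(samples)
sigma = statistics.pstdev(samples, mu)  # population-style standard deviation

results = {}
for a in (1.5, 2, 3):
    empirical = sum(abs(x - mu) > a * sigma for x in samples) / n
    bound = 1 / a ** 2
    results[a] = (empirical, bound)
    print(f"a={a}: P(|X - mu| > a*sigma) ~ {empirical:.4f} <= 1/a^2 = {bound:.4f}")
```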

## Limit theory

In the following theorems, we will be analysing hypothetical situations in which the sample size goes to infinity ($n \rightarrow \infty$); that is, in the limit.

### Convergence in probability

A sequence of random variables $Y_n$ converges (in probability) to a value _a_ if, as the sample size _n_ increases, the values of $Y_n$ get closer to _a_ and stay close to it. "Get and stay close" means that, if we draw a band of any width around _a_, the share of the probability mass of $Y_n$ that lies inside the band grows toward 1 (and so its mass outside the band shrinks toward 0):

<img src='img/convergence.png' />

Analytically, for every $\epsilon \gt 0$:

$$
\lim_{n \rightarrow \infty} P(|Y_n - a| \ge \epsilon) = 0
$$
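To make the definition concrete, here is a sketch using a sequence chosen only for illustration: $Y_n$ is the maximum of $n$ i.i.d. $\text{Uniform}(0, 1)$ draws, which converges to $a = 1$. For this sequence $P(|Y_n - 1| \ge \epsilon) = (1 - \epsilon)^n$, so the Monte Carlo estimates can be checked against the exact value as the mass outside the band shrinks toward 0.

```python
import random

random.seed(4)

# Illustrative sequence (assumption): Y_n = max of n i.i.d. Uniform(0, 1) draws, converging to a = 1
a, eps, trials = 1.0, 0.05, 2_000

def prob_outside_band(n):
    """Monte Carlo estimate of P(|Y_n - a| >= eps)."""
    outside = 0
    for _ in range(trials):
        y_n = max(random.random() for _ in range(n))
        outside += abs(y_n - a) >= eps
    return outside / trials

estimates = {}
for n in (1, 10, 50, 200):
    estimates[n] = prob_outside_band(n)
    exact = (1 - eps) ** n  # P(Y_n <= 1 - eps) in closed form
    print(f"n={n:>3}: P(|Y_n - a| >= eps) ~ {estimates[n]:.3f} (exact {exact:.3f})")
```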

## The Law of large numbers and Central limit theorem

Both theorems describe what happens to the sample mean when $n$ gets large (actually, goes to infinity).

We assume $X_1, X_2, ..., X_n$ are i.i.d. random variables with finite mean $\mu$ and finite variance $\sigma^2$, and write $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ for the sample mean.

### The Law of large numbers

This is a neat property because it justifies estimating proportions and probabilities by counting (without this theorem, we would have no theoretical justification for doing so).
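For example, a probability can be estimated as a relative frequency. The sketch below assumes two fair dice and counts how often their sum is 7 (exact probability $6/36 = 1/6$); the helper `estimate_by_counting` is a hypothetical name used only for this illustration.

```python
import random

random.seed(5)

# Illustrative event (assumption): the sum of two fair dice equals 7; exact probability 6/36 = 1/6
def estimate_by_counting(n_rolls):
    """Relative frequency of the event over n_rolls independent rolls of two dice."""
    hits = sum(random.randint(1, 6) + random.randint(1, 6) == 7 for _ in range(n_rolls))
    return hits / n_rolls

for n in (100, 10_000, 200_000):
    print(f"n={n:>7}: relative frequency = {estimate_by_counting(n):.4f} (exact {1 / 6:.4f})")
```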

#### Strong law of large numbers

The strong law of large numbers says that $\bar{X}_n \rightarrow \mu$ as $n \rightarrow \infty$, with probability $1$.

That is, the sample mean converges to $\mu$ (the true mean, the true expected value) with probability $1$.
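One simulated sample path illustrates (but of course cannot prove) this almost-sure convergence. The sketch assumes rolls of a fair six-sided die, so $\mu = 3.5$, and prints the running mean $\bar{X}_n$ at a few checkpoints.

```python
import random

random.seed(6)

# Illustrative distribution (assumption): a fair six-sided die, so mu = 3.5
mu = 3.5
running_total = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
running_means = {}

# Follow a single sample path and record the running mean at each checkpoint
for n in range(1, 100_001):
    running_total += random.randint(1, 6)
    if n in checkpoints:
        running_means[n] = running_total / n
        print(f"n={n:>6}: X_bar_n = {running_means[n]:.4f}, "
              f"|X_bar_n - mu| = {abs(running_means[n] - mu):.4f}")
```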

#### Weak law of large numbers

The theorem above is called the _Strong law of large numbers_. There is also a _Weak law of large numbers_, which is very similar in terms of intuition but makes a weaker statement: for any $c \gt 0$, $P(|\bar{X}_n - \mu| > c) \rightarrow 0$ as $n \rightarrow \infty$. In other words, $\bar{X}_n$ converges to $\mu$ in the sense of the convergence defined above (the strong law implies the weak one).
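The weak law is a statement about probabilities across many repetitions, so a sketch has to simulate many samples of each size $n$. Here $X \sim \text{Uniform}(0, 1)$ (so $\mu = 0.5$) and $c = 0.05$ are illustrative assumptions; a continuous distribution avoids ties exactly at the boundary $|\bar{X}_n - \mu| = c$.

```python
import random

random.seed(7)

# Illustrative distribution (assumption): X ~ Uniform(0, 1), so mu = 0.5
mu, c, trials = 0.5, 0.05, 1_000

def prob_far_from_mean(n):
    """Fraction of simulated samples of size n whose mean misses mu by more than c."""
    far = 0
    for _ in range(trials):
        x_bar = sum(random.random() for _ in range(n)) / n
        far += abs(x_bar - mu) > c
    return far / trials

estimates = {}
for n in (10, 100, 1_000):
    estimates[n] = prob_far_from_mean(n)
    print(f"n={n:>5}: P(|X_bar_n - mu| > c) ~ {estimates[n]:.3f}")
```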

#### Gambler's fallacy

Example: you are flipping a fair coin and it keeps landing heads (H-H-H). Because of the law of large numbers, you know that in the long run the proportion of heads will approach $0.5$, so you expect a run of tails to "compensate" and you change your bets to tails.

That is not how it works: the coin is memoryless, so the probability of tails on the next flip is still $0.5$. The law of large numbers does not correct past deviations; it swamps them. However long the initial run of heads is, it becomes negligible compared with the infinitely many flips that follow, so its effect on the proportion gets diluted.
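A simulation sketch of both points, assuming a fair coin: first, among sequences that start H-H-H, the fourth flip is still tails about half the time; second, after 10 forced heads, the proportion of heads drifts toward 0.5 as more fair flips are added, even though the difference heads minus tails is not pulled back toward 0 (its expected value stays at 10).

```python
import random

random.seed(8)

def flip():
    """One fair coin flip, returned as 'H' or 'T'."""
    return random.choice("HT")

# 1) Memorylessness: keep only sequences that start H-H-H and look at the 4th flip
fourth_flips = []
for _ in range(100_000):
    flips = [flip() for _ in range(4)]
    if flips[:3] == ["H", "H", "H"]:
        fourth_flips.append(flips[3])
p_tails = fourth_flips.count("T") / len(fourth_flips)
print(f"P(T on flip 4 | H-H-H) ~ {p_tails:.3f}")

# 2) Swamping: start from 10 forced heads, then add n fair flips.
#    The proportion of heads approaches 0.5, but heads - tails is not
#    pulled back toward 0 (its expected value stays at 10).
initial_heads = 10
proportions = {}
for n in (10, 1_000, 100_000):
    heads = initial_heads + sum(flip() == "H" for _ in range(n))
    total = initial_heads + n
    proportions[n] = heads / total
    print(f"after {n:>6} fair flips: proportion of heads = {proportions[n]:.4f}, "
          f"heads - tails = {2 * heads - total}")
```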

### Central limit theorem

The strong law of large numbers states that $\bar{X}_n$ converges to $\mu$ (as $n \rightarrow \infty$), but it says nothing about the distribution of $\bar{X}_n$, nor about how fast $\bar{X}_n$ gets close to $\mu$.

The central limit theorem says that the distribution of the standardized sample mean converges to a standard normal when $n \rightarrow \infty$:

$$
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \rightarrow \mathcal{N}(0, 1)
$$

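Though the theorem is stated for $n \rightarrow \infty$, in practice we use it as an approximation for finite, even fairly small, $n$: $\bar{X}_n$ is treated as approximately normal with mean $\mu$ and variance $\sigma^2 / n$.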
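A sketch of the approximation at finite $n$, assuming $X \sim \text{Exponential}(1)$ (a skewed distribution with $\mu = \sigma = 1$) and $n = 50$, both illustrative choices. It standardizes each simulated sample mean as in the theorem and compares a few empirical probabilities with the standard normal CDF $\Phi$, written here with `math.erf`.

```python
import math
import random
import statistics

random.seed(9)

# Illustrative distribution (assumption): X ~ Exponential(1), skewed, with mu = sigma = 1
mu, sigma = 1.0, 1.0
n, trials = 50, 10_000

def std_normal_cdf(z):
    """CDF of the standard normal, written with math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Standardized sample means: sqrt(n) * (X_bar_n - mu) / sigma
z_values = []
for _ in range(trials):
    x_bar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z_values.append(math.sqrt(n) * (x_bar - mu) / sigma)

cdf_estimates = {}
for z in (-1.0, 0.0, 1.0, 2.0):
    cdf_estimates[z] = sum(v <= z for v in z_values) / trials
    print(f"P(Z_n <= {z:+.1f}) ~ {cdf_estimates[z]:.4f}  vs  Phi({z:+.1f}) = {std_normal_cdf(z):.4f}")

# The standardized means should have spread close to 1 (sd of X_bar_n ~ sigma / sqrt(n))
print(f"sd of Z_n ~ {statistics.stdev(z_values):.3f}")
```

Because the exponential distribution is skewed, small discrepancies remain at $n = 50$ (most visibly near the centre); they shrink as $n$ grows.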