# Probability Axioms and Counting

## Probability Axioms

- The outcomes in the sample space ${\displaystyle \Omega}$ should be mutually exclusive, collectively exhaustive, and at the right granularity
- ${\displaystyle P(A) \geq 0}$ for every event $A$
- ${\displaystyle P(\Omega)=1}$
- ${\displaystyle P(A ∩ \Omega)= {P(A)}}$
- ${\displaystyle P(A) + P(A^c) = 1}$
- if $A \subseteq B$ then $P(A) \leq P(B)$
- ${\displaystyle P(A ∪ B) = P(A) + P(B) - P(A ∩ B)}$
- ${\displaystyle P(A ∩ B^c) = P(A) - P(A ∩ B)}$
- ${\displaystyle S ∩ (T ∪ U) = (S ∩ T) ∪ (S ∩ U)}$
- ${\displaystyle S ∪ (T ∩ U) = (S ∪ T) ∩ (S ∪ U)}$
- Bonferroni Inequality: ${\displaystyle P(A_1 ∩ A_2) \geq P(A_1) + P(A_2) - 1}$
- [De Morgan Laws](https://brilliant.org/wiki/de-morgans-laws/)
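
As a quick sanity check of the rules above, here is a minimal sketch that encodes one roll of a fair die as a finite sample space (an assumed example, not from the notes) and verifies the complement rule, inclusion-exclusion, Bonferroni and De Morgan with exact `Fraction` arithmetic. The helper `prob` and the events `A`, `B` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Assumed example: one roll of a fair die, six equally likely outcomes.
omega = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a subset of omega) under the uniform law."""
    return Fraction(len(event & omega), len(omega))


A = frozenset({2, 4, 6})  # "even"
B = frozenset({4, 5, 6})  # "at least 4"

# Complement rule: P(A) + P(A^c) = 1
assert prob(A) + prob(omega - A) == 1
# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# P(A ∩ B^c) = P(A) - P(A ∩ B)
assert prob(A & (omega - B)) == prob(A) - prob(A & B)
# Bonferroni: P(A ∩ B) >= P(A) + P(B) - 1
assert prob(A & B) >= prob(A) + prob(B) - 1
# De Morgan: (A ∪ B)^c = A^c ∩ B^c
assert omega - (A | B) == (omega - A) & (omega - B)

print(prob(A | B))  # 2/3
```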

## Conditioning Rules

### Multiplication Rules

$$
\begin{align}
P(A ∩ B) &= P(B)P(A | B) \\
         &= P(A)P(B | A) \\
P(A ∩ B ∩ C) &= P(A)P(B | A)P(C | A ∩ B)
\end{align}
$$
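
To see the three-event chain rule at work, the sketch below uses an assumed example: drawing 3 cards without replacement from a standard 52-card deck, with $A_i$ = "the $i$-th card is an ace". The chain-rule product is cross-checked against a direct count of 3-card hands.

```python
from fractions import Fraction
from math import comb

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2)
# Assumed example: A_i = "the i-th card drawn (without replacement) is an ace".
p_chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Cross-check by counting: hands made of 3 of the 4 aces, over all 3-card hands.
p_count = Fraction(comb(4, 3), comb(52, 3))

assert p_chain == p_count
print(p_chain)  # 1/5525
```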

### Total Probability Theorem

The probability of an event $A$ is the sum, over every possible scenario $B_n$, of the probability of $A$ under that scenario times the probability of that scenario happening. The scenarios $B_n$ must form a partition of $\Omega$ (disjoint and together covering all outcomes). The same applies to **expectations**:

$$
\begin{align}
P(A) &= \sum_{n} P(A ∩ B_{n}) \\
     &= \sum_{n} P(A \mid B_{n}) P(B_{n}) \\
E[X] &= \sum_{n} P(A_n) E[X \mid A_{n}] \\
E[X] &= \sum_{y} p_Y(y) E[X \mid Y = y]
\end{align}
$$
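
A small numeric sketch of the theorem, using a hypothetical two-urn setup (the probabilities are made up for illustration): the urns play the role of the partition $\{B_n\}$. The second part applies the expectation version to a fair die roll, partitioned into $A_1 = \{1,2,3\}$ and $A_2 = \{4,5,6\}$.

```python
from fractions import Fraction

# Hypothetical setup: pick urn "u1" with prob 1/3 or "u2" with prob 2/3, then draw one ball.
# The two urns form a partition {B_n} of the sample space.
p_urn = {"u1": Fraction(1, 3), "u2": Fraction(2, 3)}
p_red_given_urn = {"u1": Fraction(3, 4), "u2": Fraction(1, 4)}

# Total probability: P(red) = sum_n P(red | B_n) P(B_n)
p_red = sum(p_red_given_urn[u] * p_urn[u] for u in p_urn)
print(p_red)  # 5/12

# Total expectation for a fair die roll X, partitioned into A_1 = {1,2,3}, A_2 = {4,5,6}:
# E[X] = P(A_1) E[X | A_1] + P(A_2) E[X | A_2]
parts = [(1, 2, 3), (4, 5, 6)]
e_x = sum(Fraction(len(part), 6) * Fraction(sum(part), len(part)) for part in parts)
print(e_x)  # 7/2
```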

### Bayes Rules

$${\displaystyle P(A\mid B)={\frac {P(B\mid A)P(A)}{P(B)}}}$$

$${\displaystyle P(A\mid B)={\frac {P(B ∩ A)}{P(B)}}}$$
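
A minimal sketch of Bayes' rule on a diagnostic-test setup. All three input numbers are hypothetical, chosen only for illustration; the denominator $P(+)$ is obtained from the total probability theorem.

```python
from fractions import Fraction

# Hypothetical inputs, for illustration only:
p_d = Fraction(1, 100)                # prior P(D)
p_pos_given_d = Fraction(95, 100)     # P(+ | D)
p_pos_given_not_d = Fraction(5, 100)  # P(+ | D^c)

# Denominator via total probability: P(+) = P(+ | D) P(D) + P(+ | D^c) P(D^c)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: P(D | +) = P(+ | D) P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(p_d_given_pos)  # 19/118 (about 0.16)
```

Even with a fairly accurate test, the posterior stays small because the prior $P(D)$ is small.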

### Independence

$${\displaystyle P(A | B) = P(A)}$$

$${\displaystyle P(A ∩ B)=P(A)P(B)}$$

- If $A$ and $B$ are independent, then so are $A$ and $B^c$, $A^c$ and $B$, and $A^c$ and $B^c$
- Independent events $≠$ disjoint events! Two disjoint events with positive probabilities are never independent, since $P(A ∩ B) = 0 < P(A)P(B)$.
- Independence **does not imply** conditional independence
- **Independence intuitive definition:** knowing that one event occurred does not change the probability of the other.
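
The checks below sketch these facts on an assumed example: two fair dice, with $A$ = "first die is even", $B$ = "sum is 7", and $C$ = "first die is odd" (disjoint from $A$). The helpers `prob` and `independent` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Assumed example: ordered pairs from two fair dice, all 36 equally likely.
omega = set(product(range(1, 7), repeat=2))


def prob(event):
    return Fraction(len(event), len(omega))


def independent(e, f):
    """True when P(E ∩ F) = P(E) P(F)."""
    return prob(e & f) == prob(e) * prob(f)


A = {w for w in omega if w[0] % 2 == 0}     # first die is even
B = {w for w in omega if w[0] + w[1] == 7}  # sum is 7
C = {w for w in omega if w[0] % 2 == 1}     # first die is odd (disjoint from A)

assert independent(A, B)                  # 1/12 == 1/2 * 1/6
assert independent(A, omega - B)          # complements inherit independence
assert independent(omega - A, omega - B)
assert not independent(A, C)              # disjoint: P(A ∩ C) = 0, but P(A) P(C) = 1/4
```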


## Counting
- **Counting Principle**: if an experiment has $r$ stages and stage $i$ has $n_i$ possible choices, the total number of outcomes is the product $\prod_{i=1}^{r} n_i = n_1 n_2 \cdots n_r$
- **Permutations**: number of ways of ordering $n$ elements: $n!$
- **Number of all possible subsets** of a set of ${n}$ elements: $2^n$
- **Combinations**: number of $k$-element subsets of a given $n$-element set<br>

  $${\displaystyle \binom{n}{k} = {\frac {n!}{(n - k)!k!}}}$$


- **Binomial probabilities**: probability of obtaining $k$ heads in $n$ independent tosses of a coin with $P(\text{heads}) = p$:<br>

  $$P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$$


- Sampling $k$ items from $n$ items **with** replacement, order mattering: $n^k$
- Sampling $k$ items from $n$ items **without** replacement, order mattering: $n!/(n-k)! = n(n-1)\cdots(n-k+1)$
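- Partitions: the number of ways to split $n$ distinct items into $r$ labeled groups of sizes $n_1, \dots, n_r$ (with $n_1 + \dots + n_r = n$) is $\frac{n!}{n_1! n_2! \cdots n_r!}$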
- Convention: $0! = 1$
- Any **ordered** arrangement of objects is called a **permutation**. The number of different permutations of $N$ objects is $N!$. The number of different permutations of $n$ objects taken from $N$ objects is 
  
  $$\frac{N!}{(N-n)!}$$
  
  
- Any **unordered** arrangement of objects is called a **combination**. The number of different combinations of $n$ objects taken from $N$ objects is 
  
  $$\frac{N!}{(N-n)!n!}$$ 
    
  We typically denote this with $N \choose n$ ("N choose n")

- Vandermonde's identity: $$\sum_{r=0}^k {m \choose r}{n\choose k-r} = {m+n \choose k}$$
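
The counting formulas above can be spot-checked with Python's standard `math.comb`, `math.perm` and `math.factorial` (Python 3.8+). This is a small sketch; `n`, `k`, and the helpers `vandermonde_holds` and `binomial_pmf` are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from math import comb, factorial, perm

n, k = 5, 3

# Ordered sampling without replacement: n!/(n-k)! = n(n-1)...(n-k+1)
assert perm(n, k) == factorial(n) // factorial(n - k) == 5 * 4 * 3
# Combinations: n!/((n-k)! k!)
assert comb(n, k) == factorial(n) // (factorial(n - k) * factorial(k)) == 10
# Number of subsets of an n-element set: sum over sizes j of C(n, j) = 2^n
assert sum(comb(n, j) for j in range(n + 1)) == 2**n


def vandermonde_holds(m, n, k):
    """Check sum_r C(m, r) C(n, k - r) == C(m + n, k); math.comb returns 0 when r > m."""
    return sum(comb(m, r) * comb(n, k - r) for r in range(k + 1)) == comb(m + n, k)


assert all(vandermonde_holds(m, nn, kk) for m in range(6) for nn in range(6) for kk in range(8))


def binomial_pmf(k, n, p):
    """P(X = k) for k heads in n independent tosses with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# The binomial probabilities sum to 1 (exact arithmetic with Fraction).
assert sum(binomial_pmf(j, 10, Fraction(1, 3)) for j in range(11)) == 1
```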

<div class="alert alert-info"><h3>Example</h3> <br> What is the probability of choosing, from $m$ fruits and $n$ veggies, a selection of $r$ pieces of food of which exactly $k \leq r$ are fruits and the rest are veggies? Answer: $$\frac{\binom{m}{k}\binom{n}{r-k}}{\binom{m+n}{r}}$$<br>There are $\binom{m}{k}\binom{n}{r-k}$ ways to choose a set of $r$ foods with exactly $k$ fruits and the rest veggies, and a total of $\binom{m + n}{r}$ ways to choose a set of $r$ foods with no other restrictions. Since all selections are equally likely, the probability is the former divided by the latter.</div>
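
A brute-force sketch that confirms the answer for one set of assumed values ($m = 5$, $n = 7$, $r = 4$, $k = 2$) by enumerating every $r$-subset with `itertools.combinations`. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_exactly_k_fruits(m, n, r, k):
    """Formula from the example: C(m, k) C(n, r - k) / C(m + n, r)."""
    return Fraction(comb(m, k) * comb(n, r - k), comb(m + n, r))


def p_brute_force(m, n, r, k):
    # Items 0..m-1 are fruits and m..m+n-1 are veggies; enumerate every r-subset.
    subsets = list(combinations(range(m + n), r))
    hits = sum(1 for s in subsets if sum(1 for item in s if item < m) == k)
    return Fraction(hits, len(subsets))


# Assumed values: 5 fruits, 7 veggies, choose 4, exactly 2 fruits.
assert p_exactly_k_fruits(5, 7, 4, 2) == p_brute_force(5, 7, 4, 2)
print(p_exactly_k_fruits(5, 7, 4, 2))  # 14/33
```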


## Counting Useful Formulas

Infinite Sum Law (geometric series), valid for $|a| < 1$:

$$
{\displaystyle \sum_{k=0}^{\infty}a^{k}=\frac{1}{1-a}}
$$
Starting from one:

$$
{\displaystyle \sum_{k=1}^{\infty}a^{k}=\frac{a}{1-a}}
$$
Starting from $j$ (in general $\sum_{k=j}^{\infty}a^{k}=\frac{a^j}{1-a}$; shown here for $a = \frac{1}{2}$):

$$
{\displaystyle \sum_{k=j}^{\infty}2^{-k}=2^{-j+1}}
$$
Finite arithmetic sum:

$$
{\displaystyle 0 + 1 + \dots + n = \frac{n(n+1)}{2}}
$$

See also: [Geometric series formula](https://en.wikipedia.org/wiki/Geometric_series#Formula) and sequence convergence.
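
A quick numeric sketch of the closed forms: long partial sums of the geometric series (with an assumed $a = 0.3$) approach the formulas, and the arithmetic sum is checked exactly. `partial_sum` is a hypothetical helper.

```python
def partial_sum(a, start, stop):
    """Sum of a**k for k = start, ..., stop - 1."""
    return sum(a**k for k in range(start, stop))


a = 0.3  # assumed value with |a| < 1
# Partial sums approach the closed forms; the neglected tail (about a**200) is negligible.
assert abs(partial_sum(a, 0, 200) - 1 / (1 - a)) < 1e-12
assert abs(partial_sum(a, 1, 200) - a / (1 - a)) < 1e-12

# Tail starting at j with a = 1/2: sum_{k >= j} 2^{-k} = 2^{-j+1}
j = 3
assert abs(partial_sum(0.5, j, 200) - 2.0 ** (-j + 1)) < 1e-12

# Finite arithmetic sum: 0 + 1 + ... + n = n(n+1)/2, exact with integers
n = 100
assert sum(range(n + 1)) == n * (n + 1) // 2
```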


### Joint RVs
   

### Functions of RVs

Let $X$ be a uniform random variable on $[0,1]$ and let $Y=\frac{1}{X}$. Since $X > 0$ with probability 1, for $y \geq 1$:

$$F_{Y}(y) = P(Y\leq y) = P(\frac{1}{X} \leq y) = P(X\geq \frac{1}{y}) = 1 - \frac{1}{y}$$

and $F_Y(y) = 0$ for $y < 1$.
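
A Monte Carlo sketch comparing the empirical CDF of $Y = 1/X$ with $1 - 1/y$. The sample size, seed, and helper names are assumptions for illustration; estimates only match the formula up to sampling noise.

```python
import random


def empirical_cdf_of_reciprocal(y, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(1/X <= y) for X ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so 1.0 - rng.random() is in (0, 1] and never zero.
    hits = sum(1 for _ in range(n_samples) if 1.0 / (1.0 - rng.random()) <= y)
    return hits / n_samples


def cdf_reciprocal(y):
    """F_Y(y) = 1 - 1/y for y >= 1, and 0 otherwise."""
    return 1.0 - 1.0 / y if y >= 1 else 0.0


for y in (1.5, 2.0, 10.0):
    print(y, empirical_cdf_of_reciprocal(y), cdf_reciprocal(y))
```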

### Probability Integral Transform

Given any continuous random variable $X$ whose CDF $F_X$ is strictly increasing (so that $F^{-1}_X$ exists), define $Y=F_{X}(X)$. Then, for $y \in [0, 1]$:

$$ \begin{align}
F_Y (y) &= \operatorname{P}(Y\leq y) \\
        &= \operatorname{P}(F_X (X)\leq y) \\
        &= \operatorname{P}(X\leq F^{-1}_X (y)) \\
        &= F_X (F^{-1}_X (y)) \\
        &= y
\end{align} $$

$F_Y$ is just the CDF of a $\text{Uniform}(0, 1)$ random variable, so $Y$ is uniformly distributed on $[0,1]$. Running the transform in reverse gives pseudorandom number generation for $X$: all we need is a way to sample from $U[0,1]$ together with $F^{-1}_X$. For each $u$ sampled from $U[0,1]$, we compute $x = F^{-1}_X(u)$ as the corresponding sample from $X$.
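
A minimal sketch of inverse transform sampling, assuming an exponential target with rate $\lambda$, where $F_X(x) = 1 - e^{-\lambda x}$ and $F^{-1}_X(u) = -\ln(1-u)/\lambda$. It also runs the forward direction: applying $F_X$ to the samples should give values that look $\text{Uniform}(0,1)$. The rate, seed, and function names are illustrative assumptions.

```python
import math
import random


def exp_inverse_cdf(u, lam):
    """Inverse of F(x) = 1 - exp(-lam * x) for x >= 0."""
    return -math.log(1.0 - u) / lam


def sample_exponential(lam, n, seed=0):
    """Inverse transform sampling: draw u ~ U[0, 1) and return F^{-1}(u)."""
    rng = random.Random(seed)
    return [exp_inverse_cdf(rng.random(), lam) for _ in range(n)]


lam = 2.0  # assumed rate
xs = sample_exponential(lam, 100_000)
mean = sum(xs) / len(xs)
print(mean)  # should be close to 1 / lam = 0.5

# Forward direction (probability integral transform): F_X(X) should look Uniform(0, 1).
us = [1.0 - math.exp(-lam * x) for x in xs]
frac_below_quarter = sum(u <= 0.25 for u in us) / len(us)
print(frac_below_quarter)  # should be close to 0.25
```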

### Convolution

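For independent continuous random variables with densities $f$ and $g$, the density $h$ of their sum is the convolution:

$${\displaystyle h(z)=(f*g)(z)=\int _{-\infty }^{\infty }f(z-t)g(t)dt=\int _{-\infty }^{\infty }f(t)g(z-t)dt}$$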
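
The discrete analogue replaces the integral with a sum, $h(z) = \sum_t f(z-t)g(t)$. The sketch below (an assumed example, with a hypothetical helper `convolve_pmfs`) convolves the PMF of a fair die with itself to get the PMF of the sum of two independent dice.

```python
from fractions import Fraction


def convolve_pmfs(f, g):
    """Discrete convolution h(z) = sum_t f(z - t) g(t) for pmfs stored as {value: prob}."""
    h = {}
    for x, px in f.items():
        for y, py in g.items():
            # Each pair (x, y) contributes P(x) P(y) to the total z = x + y.
            h[x + y] = h.get(x + y, 0) + px * py
    return h


die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve_pmfs(die, die)  # pmf of the sum of two independent fair dice

print(two_dice[7])  # 1/6
assert sum(two_dice.values()) == 1
```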

### Order Statistics

Let $Y_n = \max(X_1, \dots, X_n)$. This is called the $n^{th}$ order statistic.

How is the $n^{th}$ order statistic distributed when $X_1, \dots, X_n$ are i.i.d. with CDF $F_X$? Step (1) uses the definition of $Y_n$ and step (2) uses independence:

$$
\begin{align}
F_n(y) = P(Y_n \leq y) &\overset{(1)}{=} P(X_1 \leq y, X_2 \leq y, \dots, X_n \leq y) \\
&\overset{(2)}{=} P(X_1 \leq y) P(X_2 \leq y) \cdots P(X_n \leq y) \\
&= F_X(y)^n
\end{align}
$$

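First order statistic, $Y_1 = \min(X_1, \dots, X_n)$:

$$
\begin{align}
P(Y_1 \geq y) &= P(X_1 \geq y) P(X_2 \geq y) \cdots P(X_n \geq y) = (1 - F_X(y))^n \\
F_1(y) &= 1 - (1 - F_X(y))^n \\
f_1(y) &= n(1 - F_X(y))^{n-1} f_X(y)
\end{align}
$$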
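
A simulation sketch of both results, assuming i.i.d. $\text{Uniform}(0,1)$ draws so that $F_X(y) = y$ on $[0,1]$. The values $n = 5$, $y = 0.7$, the seed, and the helper `simulate_extremes` are illustrative assumptions; estimates agree with the formulas only up to sampling noise.

```python
import random


def simulate_extremes(n, y, trials=100_000, seed=0):
    """Estimate P(max <= y) and P(min <= y) for n i.i.d. Uniform(0, 1) draws."""
    rng = random.Random(seed)
    max_hits = min_hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        max_hits += max(xs) <= y  # bool counts as 0 or 1
        min_hits += min(xs) <= y
    return max_hits / trials, min_hits / trials


n, y = 5, 0.7
p_max, p_min = simulate_extremes(n, y)
# For Uniform(0, 1), F_X(y) = y on [0, 1].
print(p_max, y**n)            # F_n(y) = F_X(y)^n
print(p_min, 1 - (1 - y)**n)  # F_1(y) = 1 - (1 - F_X(y))^n
```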

### St. Petersburg Paradox

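A fair coin is tossed until the first heads; if that happens on toss $n$ (probability $0.5^{n-1} \cdot 0.5$), the payoff is $Y = 2^n$. Then:

$$
\begin{align}
E(Y) &= \sum_{n=1}^{\infty} 2^n \, 0.5^{n-1} \cdot 0.5 \\
     &= \sum_{n=1}^{\infty} 2^n \left(\frac 12\right)^n \\
     &= \sum_{n=1}^{\infty} 1 \\
     &= \infty
\end{align}
$$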
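
A small sketch of why the sum diverges: truncating the game at `max_tosses` tosses (a hypothetical cap, not part of the original game) leaves an expectation equal to the number of terms kept, since every term contributes exactly 1.

```python
from fractions import Fraction


def truncated_expectation(max_tosses):
    """Partial sum of E[Y]: sum of 2^n * P(first heads on toss n) for n = 1..max_tosses."""
    half = Fraction(1, 2)
    return sum(2**n * half ** (n - 1) * half for n in range(1, max_tosses + 1))


# Each term equals 1, so the partial sums grow without bound.
print([int(truncated_expectation(N)) for N in (1, 10, 100)])  # [1, 10, 100]
```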

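If $Y = X_1 X_2$, then

$$E[Y] = E[X_{1}] E[X_{2}]$$

holds whenever $X_1$ and $X_2$ are independent. Independence is sufficient but not necessary: the identity holds exactly when $X_1$ and $X_2$ are uncorrelated, since $E[X_1 X_2] = E[X_1]E[X_2] + \operatorname{Cov}(X_1, X_2)$.

Unlike the case for expectations, the variance of a sum of random variables is not always the sum of their variances: $\operatorname{Var}(X_1 + X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + 2\operatorname{Cov}(X_1, X_2)$, so the variances add only when the covariance is zero (which is always the case when the variables are independent).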
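
A sketch of a classic counterexample, assumed for illustration: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The two are dependent ($P(X = 0, Y = 1) = 0$ while $P(X=0)P(Y=1) = 2/9$), yet $\operatorname{Cov}(X, Y) = 0$, so $E[XY] = E[X]E[Y]$ and the variances of $X$ and $Y$ add.

```python
from fractions import Fraction

# Assumed example: X uniform on {-1, 0, 1}; Y = X^2 is a function of X, so X and Y are dependent.
pmf_x = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}


def expect(g):
    """E[g(X)] under pmf_x."""
    return sum(p * g(x) for x, p in pmf_x.items())


def var(g):
    """Var(g(X)) = E[g(X)^2] - E[g(X)]^2."""
    return expect(lambda x: g(x) ** 2) - expect(g) ** 2


e_x = expect(lambda x: x)
e_y = expect(lambda x: x**2)
e_xy = expect(lambda x: x * x**2)
cov = e_xy - e_x * e_y

print(e_xy == e_x * e_y, cov)  # True 0

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y); here Cov = 0, so the variances add.
assert var(lambda x: x + x**2) == var(lambda x: x) + var(lambda x: x**2) + 2 * cov
```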


Law of Total Variance
$${\displaystyle \operatorname {Var} (Y)=\operatorname {E} [\operatorname {Var} (Y\mid X)]+\operatorname {Var} (\operatorname {E} [Y\mid X]).}$$

Law of Total Expectation (Law of Iterated Expectations)
$${\displaystyle \operatorname {E} (X)=\operatorname {E} (\operatorname {E} (X\mid Y)),}$$
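
Both laws can be checked exactly on a small two-stage experiment. In this assumed example, $X$ is a fair coin taking values 0 or 1; given $X = 0$, $Y$ is uniform on $\{1, 2, 3\}$, and given $X = 1$, $Y$ is uniform on $\{4, \dots, 8\}$. The sketch builds the PMF of $Y$ directly and compares it with the conditional decomposition; all names are hypothetical.

```python
from fractions import Fraction

# Assumed two-stage experiment.
p_x = {0: Fraction(1, 2), 1: Fraction(1, 2)}
y_given_x = {0: [1, 2, 3], 1: [4, 5, 6, 7, 8]}


def mean(values):
    return Fraction(sum(values), len(values))


def variance(values):
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


# Marginal pmf of Y, built directly from the two stages.
p_y = {}
for x, px in p_x.items():
    for y in y_given_x[x]:
        p_y[y] = p_y.get(y, 0) + px / len(y_given_x[x])

e_y = sum(p * y for y, p in p_y.items())
var_y = sum(p * (y - e_y) ** 2 for y, p in p_y.items())

# Conditional means and variances, one per value of X.
cond_mean = {x: mean(ys) for x, ys in y_given_x.items()}
cond_var = {x: variance(ys) for x, ys in y_given_x.items()}

# Law of iterated expectations: E[Y] = E[E[Y | X]]
assert e_y == sum(p_x[x] * cond_mean[x] for x in p_x)
# Law of total variance: Var(Y) = E[Var(Y | X)] + Var(E[Y | X])
e_cond_var = sum(p_x[x] * cond_var[x] for x in p_x)
var_cond_mean = sum(p_x[x] * (cond_mean[x] - e_y) ** 2 for x in p_x)
assert var_y == e_cond_var + var_cond_mean
print(e_y, var_y)  # 4 16/3
```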

### Minimum Distribution

If $X_1, \dots, X_n$ are i.i.d. with cdf $F(x)$, then the cdf of their minimum is $1-(1-F(x))^n$.
