# Probability Theory

## Combinatorics

### Permutations

Permutations are arrangements of objects in distinct sequences. The total number of distinct permutations of $n$ objects, of which $n_1, n_2, \cdots, n_k$ are alike (so that $n_1 + n_2 + \cdots + n_k = n$), is calculated as:

\begin{equation*}
\frac{n!}{n_1!n_2! \cdots n_k!}
\label{eq:1} \tag{1}
\end{equation*}

If we're just looking to order $n$ distinct objects, then the total number of permutations is $n!$. If we order only $k$ of the $n$ distinct objects, the count is $\frac{n!}{(n-k)!}$.
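
As a quick sanity check of the formula for permutations with alike objects, the sketch below counts the distinct orderings of a word and compares the result with brute-force enumeration. The helper name `count_arrangements` and the word `BANANA` are illustrative choices, not part of the notes above.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def count_arrangements(items):
    """Number of distinct orderings of items, treating equal items as alike."""
    total = factorial(len(items))
    # Divide n! by n_i! for each group of alike objects.
    for multiplicity in Counter(items).values():
        total //= factorial(multiplicity)
    return total


word = "BANANA"  # 6 letters: A appears 3 times, N twice, B once
print(count_arrangements(word))        # 6! / (3! 2! 1!) = 60
print(len(set(permutations(word))))    # brute-force enumeration, also 60
```

The brute-force line is only practical for short inputs, since it generates all $n!$ orderings before removing duplicates.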

### Combinations

Unlike permutations, combinations are unordered collections of objects. For $n$ distinct objects taken $k$ at a time without repetition, we can calculate the total number of combinations using the Binomial Coefficient:

\begin{equation*}
{n \choose k} = \frac{n!}{k!(n-k)!}
\label{eq:2} \tag{2}
\end{equation*}

Symmetry rule of the Binomial Coefficient:

\begin{equation*}
{n \choose k} = {n \choose {n-k}}
\label{eq:3} \tag{3}
\end{equation*}

Here's the short proof of that. Let $0 \leq k \leq n$. Then:

\begin{equation*}
\begin{split}
{n \choose k} &= \frac{n!}{k!(n-k)!} = \frac{n!}{(n-k)!k!} \\
&= \frac{n!}{(n-k)!(n-(n-k))!} = {n \choose n-k}
\end{split}
\label{eq:4} \tag{4}
\end{equation*}
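
A small check of the binomial coefficient and its symmetry rule, assuming Python 3.8 or later for `math.comb`. The values $n = 5$ and $k = 2$ are arbitrary.

```python
from itertools import combinations
from math import comb

n, k = 5, 2

# Enumerate every unordered selection of k objects out of n distinct objects.
subsets = list(combinations(range(n), k))
print(len(subsets), comb(n, k))  # 10 10

# Symmetry rule: choosing k objects to keep is the same as choosing n - k to drop.
symmetric = all(comb(n, j) == comb(n, n - j) for j in range(n + 1))
print(symmetric)  # True
```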

### Binomial Theorem

The Binomial Theorem expands a power of a binomial into a sum of terms weighted by binomial coefficients:

\begin{equation*}
(x+y)^n = \sum_{k=0}^{n} {n \choose k} x^k y^{n-k}
\label{eq:5} \tag{5}
\end{equation*}
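
The expansion can be checked numerically. The function name `binomial_expansion` below is a hypothetical helper that evaluates the right-hand side of the theorem term by term.

```python
from math import comb


def binomial_expansion(x, y, n):
    """Evaluate sum_{k=0}^{n} C(n, k) * x^k * y^(n-k)."""
    return sum(comb(n, k) * x**k * y ** (n - k) for k in range(n + 1))


print(binomial_expansion(2, 3, 5), (2 + 3) ** 5)  # 3125 3125
```

Setting $x = y = 1$ gives the familiar identity $\sum_{k=0}^{n} {n \choose k} = 2^n$.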

### Inclusion-Exclusion Principle

The probability of a union of events, where $E_1 E_2$ denotes the intersection $E_1 \cap E_2$, is:

\begin{equation*}
P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 E_2)
\label{eq:6} \tag{6}
\end{equation*}

More generally,

\begin{equation*}
P(E_1 \cup \cdots \cup E_n) = \sum_{i=1}^{n} P(E_i) - \sum_{i_1<i_2} P(E_{i_1} E_{i_2}) + \cdots + (-1)^{r+1} \sum_{i_1<i_2< \cdots <i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r}) + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)
\label{eq:7} \tag{7}
\end{equation*}
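
The general formula can be sketched for equally likely outcomes by summing over every group of $r$ events with alternating signs. The die events below (`even`, `prime`, `at_most_two`) are made-up examples, and `union_probability` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations


def prob(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


def union_probability(events, sample_space):
    """P(E_1 u ... u E_n) via inclusion-exclusion."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        # Sum over all index sets i_1 < i_2 < ... < i_r.
        for group in combinations(events, r):
            intersection = set.intersection(*group)
            total += sign * prob(intersection, sample_space)
    return total


omega = set(range(1, 7))  # one roll of a fair die
even, prime, at_most_two = {2, 4, 6}, {2, 3, 5}, {1, 2}
events = [even, prime, at_most_two]

direct = prob(set.union(*events), omega)
print(union_probability(events, omega) == direct)  # True
```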

## Conditional Probability

### Bayes' Theorem

Conditional probability is the probability that an event occurs given that another event has already occurred. It is defined as:

\begin{equation*}
P(E|F) = \frac{P(E F)}{P(F)}, P(F) \neq 0
\label{eq:8} \tag{8}
\end{equation*}

Bayes' Theorem follows from applying this definition to both $P(E|F)$ and $P(F|E)$:

\begin{equation*}
\begin{split}
P(E \cap F) &= P(F \cap E) \\
\Rightarrow P(F|E)P(E) &= P(E|F)P(F) \\
\Rightarrow P(E|F) &= \frac{P(F|E)P(E)}{P(F)} = \frac{P(E F)}{P(F)}
\end{split}
\label{eq:9} \tag{9}
\end{equation*}

### Multiple Events

Since $E$ and its complement $E^C$ partition the sample space, $P(F)$ can be expanded and Bayes' theorem becomes:

\begin{equation*}
P(E|F) = \frac{P(F|E)P(E)}{P(F|E)P(E) + P(F|E^C)P(E^C)}
\label{eq:10} \tag{10}
\end{equation*}

In general, for multiple events $E_j$ partitioning the sample space:

\begin{equation*}
P(E_i|F) = \frac{P(F|E_i)P(E_i)}{\sum_{j=1}^{N} P(F|E_j)P(E_j)}
\label{eq:11} \tag{11}
\end{equation*}

#### Example

I have two fair coins and one double-headed coin. I pick one coin at random. What's the chance of having picked the double-headed coin if 100 consecutive coin tosses yield 100 heads?

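Let $E$ be the event that the chosen coin is double-headed, $E^C$ be the event that it is fair, and $F$ be the event that 100 coin tosses yield 100 heads. Then $P(E) = \frac{1}{3}$, $P(F|E) = 1$, $P(E^C) = \frac{2}{3}$, and $P(F|E^C) = \left(\frac{1}{2}\right)^{100}$, so: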

\begin{equation*}
\begin{split}
P(E|F) &= \frac{P(F|E)P(E)}{P(F|E)P(E) + P(F|E^C)P(E^C)} \\
&= \frac{\left( 1\right) \left( \frac{1}{3}\right)}{\left(1\right) \left( \frac{1}{3}\right) + \left( \frac{1}{2} \right)^{100} \left(\frac{2}{3} \right)} \approx 1
\end{split}
\end{equation*}
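
The coin example can be reproduced with exact fractions. This is a minimal sketch of Bayes' theorem over a partition of hypotheses; the function name `posterior` and the dictionary keys are illustrative assumptions.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(E_i | F) for every hypothesis E_i in a partition.

    priors[h] is P(E_h) and likelihoods[h] is P(F | E_h).
    """
    # Denominator of Bayes' theorem: sum_j P(F | E_j) P(E_j).
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


tosses = 100
priors = {"double_headed": Fraction(1, 3), "fair": Fraction(2, 3)}
likelihoods = {"double_headed": Fraction(1), "fair": Fraction(1, 2) ** tosses}

result = posterior(priors, likelihoods)
print(float(result["double_headed"]))  # 1.0 after rounding to a float
```

The exact posterior is $\frac{2^{99}}{2^{99} + 1}$, which is why the float prints as 1.0.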

### Product Rule

As used in the proof above:

\begin{equation*}
P(EF) = P(F|E)P(E)
\label{eq:12} \tag{12}
\end{equation*}

More generally,

\begin{equation*}
P(E_1 E_2 \cdots E_n) = P(E_1)P(E_2|E_1)P(E_3|E_1 E_2) \cdots P(E_n|E_1 \cdots E_{n-1})
\label{eq:13} \tag{13}
\end{equation*}
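
As an illustration of the general product rule, the sketch below computes the probability of drawing only aces from a standard 52-card deck without replacement, multiplying one conditional probability per draw. The card scenario and the name `prob_all_aces` are assumptions chosen for the example.

```python
from fractions import Fraction
from math import comb


def prob_all_aces(draws, aces=4, deck=52):
    """P(first `draws` cards are all aces), built up with the product rule."""
    p = Fraction(1)
    for i in range(draws):
        # P(E_{i+1} | E_1 ... E_i): aces left over cards left.
        p *= Fraction(aces - i, deck - i)
    return p


print(prob_all_aces(3))                      # 1/5525
print(Fraction(comb(4, 3), comb(52, 3)))     # same value by counting combinations
```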

### Law of Total Probability

For mutually exclusive and exhaustive events $\{F_i\}$, where $1 \leq i \leq n$ (that is, events that partition the sample space), the probability of event $E$ occurring is:

\begin{equation*}
\begin{split}
P(E) &= P(EF_1) + P(EF_2) + \cdots + P(EF_n) \\
&= P(E|F_1)P(F_1) + P(E|F_2)P(F_2) + \cdots + P(E|F_n)P(F_n) \\
&= \sum_{i=1}^{n} P(E|F_i)P(F_i)
\end{split}
\label{eq:14} \tag{14}
\end{equation*}
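
A short sketch of the law of total probability, reusing the setting of two fair coins and one double-headed coin to find the probability that a single toss of a randomly chosen coin lands heads. The function name `total_probability` and the dictionary keys are hypothetical.

```python
from fractions import Fraction


def total_probability(priors, conditionals):
    """P(E) = sum_i P(E | F_i) P(F_i) for a partition {F_i}."""
    # The F_i must cover the whole sample space.
    if sum(priors.values()) != 1:
        raise ValueError("priors must sum to 1")
    return sum(conditionals[f] * priors[f] for f in priors)


priors = {"double_headed": Fraction(1, 3), "fair": Fraction(2, 3)}
p_heads_given = {"double_headed": Fraction(1), "fair": Fraction(1, 2)}

p_heads = total_probability(priors, p_heads_given)
print(p_heads)  # 2/3
```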

## Random Variables

### Expected Value

The expected value of a random variable, $E[X]$, is the weighted average of its possible outcomes. If random variable $X$ is **discrete** with probability $P(X = x)$, then:

\begin{equation*}
E [X] = \sum_{x} x P(X = x)
\label{eq:15} \tag{15}
\end{equation*}

If $X$ is **continuous** with a PDF of $f(x)$, then:

\begin{equation*}
E [X] = \int_{-\infty}^{\infty} xf(x)dx
\label{eq:16} \tag{16}
\end{equation*}
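
Both definitions can be tried out directly. The sketch below uses a fair die for the discrete case and, for the continuous case, approximates the integral with a midpoint Riemann sum over a bounded interval. The density $f(x) = 2x$ on $[0, 1]$ is a made-up example, and both function names are hypothetical.

```python
from fractions import Fraction


def expected_value_discrete(pmf):
    """E[X] for a PMF given as {outcome: probability}."""
    return sum(x * p for x, p in pmf.items())


def expected_value_continuous(pdf, lower, upper, steps=10_000):
    """Approximate E[X] = integral of x f(x) dx over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width  # midpoint of the i-th slice
        total += x * pdf(x) * width
    return total


die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value_discrete(die))  # 7/2

# Assumed density f(x) = 2x on [0, 1]; the exact answer is 2/3.
approx = expected_value_continuous(lambda x: 2 * x, 0.0, 1.0)
print(round(approx, 6))
```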

#### Linearity of expectation

Linearity of expectation states that the expected value of a sum of random variables equals the sum of their individual expected values, regardless of whether they are independent. For random variables $X$ and $Y$, this means:

\begin{equation*}
E [X+Y] = E[X] + E[Y]
\label{eq:17} \tag{17}
\end{equation*}

More generally,

\begin{equation*}
E \left[ \sum_{i=1}^{n} X_i \right] = \sum_{i=1}^{n} E[X_i]
\label{eq:18} \tag{18}
\end{equation*}

##### Example

You throw a fair coin one million times. What is the expected number of occurrences of HHHHHHTTTTTT?

The probability $p_i$ that HHHHHHTTTTTT occurs at a given starting position $i$ is $\frac{1}{2^{12}}$. For one million tosses, there are $n = 1000000 - 11$ possible starting positions for this 12-toss sequence. Let $X_i$ be $1$ if the sequence starting at toss $i$ equals HHHHHHTTTTTT and $0$ otherwise, so that $E[X_i] = p_i$. The $X_i$ are not independent, but linearity of expectation still applies:

\begin{equation*}
E \left[ \sum_{i=1}^{n} X_i \right] = \sum_{i=1}^{n} E[X_i] = n p_i = \frac{1000000-11}{2^{12}} \approx 244.14
\end{equation*}
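
The same argument can be sketched in code and checked by brute force on a small case. `expected_occurrences` applies the linearity formula, while `brute_force_expectation` averages the number of (possibly overlapping) occurrences over every equally likely toss sequence. Both names are hypothetical, and the brute-force version is only feasible for a handful of tosses.

```python
from fractions import Fraction
from itertools import product


def expected_occurrences(num_tosses, pattern):
    """Expected count of `pattern` in fair coin tosses, via linearity."""
    positions = max(num_tosses - len(pattern) + 1, 0)
    return Fraction(positions, 2 ** len(pattern))


def brute_force_expectation(num_tosses, pattern):
    """Average count of `pattern` over all 2^num_tosses sequences."""
    total = 0
    for seq in product("HT", repeat=num_tosses):
        s = "".join(seq)
        total += sum(s.startswith(pattern, i) for i in range(num_tosses))
    return Fraction(total, 2 ** num_tosses)


# Small case: the formula and the enumeration agree.
print(expected_occurrences(6, "HHT") == brute_force_expectation(6, "HHT"))  # True

# The example from the text.
print(float(expected_occurrences(1_000_000, "HHHHHHTTTTTT")))
```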

### Variance and Covariance

The variance of a random variable $X$ is:

\begin{equation*}
V(X) = \sigma_X^2
\label{eq:19} \tag{19}
\end{equation*}

where $\sigma_X$ is the standard deviation of $X$. In terms of the expected value, it can be rewritten as:

\begin{equation*}
V(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2
\label{eq:20} \tag{20}
\end{equation*}

The covariance of random variables $X$ and $Y$ is:

\begin{equation*}
C(X, Y) = E[XY] - E[X]E[Y]
\label{eq:21} \tag{21}
\end{equation*}

And the correlation of $X$ and $Y$ is:

\begin{equation*}
\rho(X, Y) = \frac{C(X, Y)}{\sqrt{V(X)V(Y)}} = \frac{C(X, Y)}{\sigma_X \sigma_Y}
\label{eq:22} \tag{22}
\end{equation*}

If $X$ and $Y$ are independent, then $C(X, Y) = 0$ and $\rho(X, Y) = 0$. The converse does not hold in general: uncorrelated variables can still be dependent.
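
To make these definitions concrete, the sketch below computes variance, covariance, and correlation from a small joint PMF stored as `{(x, y): probability}`. The two PMFs and all function names are made up for illustration. The correlation divides the covariance by the product of the standard deviations.

```python
from fractions import Fraction
from math import sqrt


def expectation(joint_pmf, g):
    """E[g(X, Y)] for a joint PMF given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in joint_pmf.items())


def variance(joint_pmf, pick):
    """V(Z) = E[Z^2] - E[Z]^2, where Z = pick(X, Y)."""
    mean = expectation(joint_pmf, pick)
    return expectation(joint_pmf, lambda x, y: pick(x, y) ** 2) - mean**2


def covariance(joint_pmf):
    """C(X, Y) = E[XY] - E[X]E[Y]."""
    e_x = expectation(joint_pmf, lambda x, y: x)
    e_y = expectation(joint_pmf, lambda x, y: y)
    return expectation(joint_pmf, lambda x, y: x * y) - e_x * e_y


def correlation(joint_pmf):
    """rho(X, Y) = C(X, Y) / (sigma_X * sigma_Y)."""
    v_x = variance(joint_pmf, lambda x, y: x)
    v_y = variance(joint_pmf, lambda x, y: y)
    return float(covariance(joint_pmf)) / sqrt(v_x * v_y)


# Dependent pair: X and Y tend to take the same value.
dependent = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 4),
             (1, 0): Fraction(1, 8), (0, 1): Fraction(1, 8)}
# Independent pair: two fair bits.
independent = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

print(covariance(dependent))    # 7/64
print(covariance(independent))  # 0
```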

#### Covariance Matrices

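A covariance matrix $\Sigma$ collects the pairwise covariances between variables, with the variances on the diagonal. It is symmetric and positive semidefinite, meaning $x^T \Sigma x \geq 0$ for every vector $x$ (equivalently, all of its eigenvalues are greater than or equal to zero); individual entries may still be negative. The determinant test checks this by computing the determinants of the growing submatrices in the upper left corner of the matrix (the leading principal minors). If they are all positive, the matrix is positive definite and is therefore a valid covariance matrix. If any of them is negative, the matrix is not positive semidefinite and cannot be a covariance matrix.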

##### Example

For three assets $A$, $B$, and $C$, the correlation coefficient is 0.9 between $A$ and $B$ and 0.8 between $B$ and $C$. With that in mind, can $A$ and $C$ have a correlation coefficient of 0.1?

\begin{equation*}
ABC = \begin{bmatrix}V_{A,A}&C_{A,B}&C_{A,C}\\C_{B,A}&V_{B,B}&C_{B,C}\\C_{C,A}&C_{C,B}&V_{C,C}\end{bmatrix} = \begin{bmatrix}1 & 0.9 & 0.1\\0.9 & 1 & 0.8\\0.1 & 0.8 & 1\end{bmatrix}
\end{equation*}

To solve this, we'll need to use the determinant test:

\begin{equation*}
\begin{split}
&\det(1) = 1 \\
&\det \left( \begin{bmatrix}1&0.9\\0.9&1\end{bmatrix} \right) = 0.19 \\
&\det(ABC) = (1)\begin{vmatrix}1&0.8\\0.8&1\end{vmatrix} - (0.9) \begin{vmatrix}0.9&0.8\\0.1&1\end{vmatrix} + (0.1) \begin{vmatrix}0.9&1\\0.1&0.8\end{vmatrix} = 0.36 - 0.738 + 0.062 = -0.316
\end{split}
\end{equation*}

$\det(ABC)$ is negative, so $ABC$ is not a positive semidefinite matrix. Therefore, $ABC$ cannot be a covariance matrix, and $A$ and $C$ cannot have a correlation coefficient of 0.1.
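
The determinant test can be sketched without external libraries by computing each leading principal minor with a cofactor expansion, using exact fractions to avoid rounding. The helper names below are hypothetical, and the alternative value 0.5 for the correlation between $A$ and $C$ is an assumption added only for contrast.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, entry in enumerate(matrix[0]):
        # Drop the first row and the current column to form the minor.
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def leading_principal_minors(matrix):
    """Determinants of the growing upper-left submatrices."""
    return [det([row[:k] for row in matrix[:k]]) for k in range(1, len(matrix) + 1)]


def correlation_matrix(rho_ab, rho_bc, rho_ac):
    one = Fraction(1)
    return [[one, rho_ab, rho_ac],
            [rho_ab, one, rho_bc],
            [rho_ac, rho_bc, one]]


ab, bc = Fraction(9, 10), Fraction(8, 10)
minors = leading_principal_minors(correlation_matrix(ab, bc, Fraction(1, 10)))
print([float(m) for m in minors])   # the last entry is negative

# With 0.5 instead of 0.1, every leading minor is positive (positive definite).
contrast = leading_principal_minors(correlation_matrix(ab, bc, Fraction(1, 2)))
print(all(m > 0 for m in contrast))  # True
```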

### Random Variable Functions

Quantity                                    | Discrete                        | Continuous
--------------------------------------------|---------------------------------|---------------------------
Cumulative Distribution Function (CDF)      | $F(a) = P(X \leq a)$            | $F(a) = \int_{-\infty}^{a} f(x) \, dx$
Probability Mass/Density Function (PMF/PDF) | $P(x) = P(X = x)$               | $f(x) = \frac{d}{dx} F(x)$
$E[X]$                                      | $\sum_{P(x)>0} x P(x)$          | $\int_{-\infty}^{\infty} x f(x) \, dx$
$E[g(X)]$                                   | $\sum_{P(x)>0} g(x) P(x)$       | $\int_{-\infty}^{\infty} g(x) f(x) \, dx$
$V(X)$                                      | $E[X^2] - E[X]^2$               | $E[X^2] - E[X]^2$
$\sigma_X$                                  | $\sqrt{V(X)}$                   | $\sqrt{V(X)}$