## Discrete probability distributions

### Key terms

First, it is important to define a few terms.

Definitions:
- **Event**: An outcome (or set of outcomes) of a random experiment, such as a coin landing heads
- **Random Variable**: A function that maps each probabilistic **event** to a number

### Learning random variables with examples

<u> Example 1: Fair Coin toss</u> 

$\Omega = \{H, T\}$
$$P(H) = P(T) = \frac{1}{2}$$

In other words, for a fair coin the probability of getting heads on a single toss is 0.5, the same as the probability of getting tails.

A random variable with a binary outcome, where one outcome occurs with probability $p$ and the other with probability $1-p$, is known as a **Bernoulli** random variable.

Using this example, we can try to understand what expected value means:

The **expected value** of a random variable is its probability-weighted average (mean), usually written as $E[X]$.

If the sample space contains $n$ events:

$$\Omega = \{ E_1, E_2, \dots, E_n \}$$

then the expected value is defined as:
$$E[X] = P(E_1) E_1 + P(E_2) E_2 + \dots + P(E_n) E_n = \sum_{i=1}^n P(E_i) E_i$$

Additionally, the expectation operation is <u>linear</u>. This holds for any two random variables, whether or not they are *independent* (independence means that the value of one random variable does not affect the distribution of the other):

$$E[X_1 + X_2] = E[X_1] + E[X_2]$$

In the coin toss example, encoding heads as $X = 1$ and tails as $X = 0$, with $P(H) = p$:
$$E[X] = p \cdot 1 + (1-p) \cdot 0 = p$$
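
As a quick check, the definition of the expected value can be sketched in Python. The helper name `expected_value` and the encoding heads = 1, tails = 0 are assumptions made for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expected_value(pmf):
    """Sum of value * probability over a {value: probability} mapping."""
    return sum(value * prob for value, prob in pmf.items())

p = Fraction(1, 2)                 # fair coin
bernoulli_pmf = {1: p, 0: 1 - p}   # assumed encoding: heads = 1, tails = 0
mean = expected_value(bernoulli_pmf)
print(mean)  # 1/2
```

Changing `p` to any other probability should return that same value, matching $E[X] = p$.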

Another important concept is the **variance**, which measures how spread out the random variable is around its mean. By definition:

$$Var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2$$

We already know how to evaluate $E[X]$, but how do we evaluate $E[X^2]$?

$$E[X^2] = \sum_{i=1}^n P(E_i) E_i^2$$

In the context of a Bernoulli variable:

$$E[X^2] = p \cdot 1^2 + (1-p) \cdot 0^2 = p$$
$$E[X]^2 = p^2 $$

Hence, the variance of a Bernoulli random variable is:
$$Var[X] = p - p^2 = p(1-p)$$
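
The result $p(1-p)$ can be checked numerically for a few values of $p$. This is a minimal sketch; the helper name `variance` and the chosen probabilities are illustrative assumptions.

```python
from fractions import Fraction

def variance(pmf):
    """Var[X] = E[X^2] - E[X]^2 for a {value: probability} mapping."""
    mean = sum(x * q for x, q in pmf.items())
    mean_sq = sum(x**2 * q for x, q in pmf.items())
    return mean_sq - mean**2

# Compare the definition against the closed form p(1 - p).
for p in (Fraction(1, 2), Fraction(1, 4), Fraction(9, 10)):
    assert variance({1: p, 0: 1 - p}) == p * (1 - p)

bernoulli_var = variance({1: Fraction(1, 2), 0: Fraction(1, 2)})
print(bernoulli_var)  # 1/4
```

The variance is largest for a fair coin ($p = \frac{1}{2}$) and shrinks toward 0 as $p$ approaches 0 or 1.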

<u> Example 2: Fair Die Roll</u>

$\Omega = \{1,2,3,4,5,6\}$
$$P(1) = P(2) = \dots = P(6) = \frac{1}{6}$$ 

Again, each event (or value on the roll) has identical probability. The value of a die roll is also known as a **Discrete uniform** random variable.
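
The same definitions of expected value and variance apply to the die roll. The sketch below computes both with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
prob = Fraction(1, 6)  # every face is equally likely

die_mean = sum(face * prob for face in faces)
die_var = sum(face**2 * prob for face in faces) - die_mean**2
print(die_mean, die_var)  # 7/2 35/12
```

Note that the mean, 3.5, is not itself a value the die can show; the expected value is a long-run average, not a most likely outcome.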


<u> Example 3: Number of heads in 3 fair coins</u> 

We already discussed that the outcome of a single fair coin toss is a Bernoulli random variable with $p=\frac{1}{2}$, in other words $X \sim \text{Bern}(p)$.
All of the possible outcomes when you toss 3 fair coins are:
$$\Omega = \{HHH, TTT, HHT, HTH, THH, HTT, THT, TTH\}$$

If we care about the order of the coin tosses, i.e. HHT is different from HTH, then each outcome has a probability of $\frac{1}{8}$. However, if the coins are identical and we cannot tell which coin produced which head, all that we care about is the *total number of heads*, which is a **Binomial** random variable.

Let $N$ represent the total number of heads from the 3 tosses, and $X_i$ the number of heads (0 or 1) on the $i$th coin. Then $$N = X_1 + X_2 + X_3$$
Hence, the sample space for the number of heads is:

$\Omega = \{0, 1, 2, 3\}$

$$P(N=n) = \binom{3}{n} p^n (1-p)^{3-n}$$

How then can we calculate the mean value of the number of heads? 

$$E[N] = E[X_1 + X_2 + X_3] = E[X_1] + E[X_2] + E[X_3] = \frac{3}{2}$$
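
To make the link between the ordered outcomes and the Binomial formula concrete, the sketch below enumerates all 8 outcomes, counts heads, and compares the result with the formula for $P(N=n)$. Counting outcomes as equally likely is only valid for a fair coin ($p = \frac{1}{2}$); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

p = Fraction(1, 2)
n_coins = 3

# Enumerate ordered outcomes such as ('H', 'H', 'T'); each has probability 1/8.
head_counts = Counter(outcome.count("H") for outcome in product("HT", repeat=n_coins))
enumerated_pmf = {k: Fraction(c, 2**n_coins) for k, c in head_counts.items()}

# Binomial formula: C(3, k) * p^k * (1 - p)^(3 - k)
formula_pmf = {
    k: comb(n_coins, k) * p**k * (1 - p) ** (n_coins - k)
    for k in range(n_coins + 1)
}

mean_heads = sum(k * q for k, q in formula_pmf.items())
print(enumerated_pmf == formula_pmf, mean_heads)  # True 3/2
```

The binomial coefficient $\binom{3}{n}$ is exactly the number of ordered outcomes that contain $n$ heads.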


To calculate the variance of a Binomial random variable, we must understand how to evaluate the variance of a sum of random variables.

As mentioned before, random variables can be independent or dependent, so we need the **covariance**, which measures how two random variables vary together:

$$Var[X_1+X_2] = Var[X_1] + Var[X_2] + 2 Cov[X_1, X_2]$$
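
For reference, the covariance is commonly defined as follows (the shorthand $\mu_i = E[X_i]$ is introduced here for brevity):

$$Cov[X_1, X_2] = E[(X_1 - \mu_1)(X_2 - \mu_2)] = E[X_1 X_2] - E[X_1] E[X_2]$$

The identity above then follows by expanding the square and using linearity of expectation:

$$Var[X_1+X_2] = E[((X_1 - \mu_1) + (X_2 - \mu_2))^2] = E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] + 2E[(X_1 - \mu_1)(X_2 - \mu_2)]$$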

If $X_1, X_2$ are independent random variables, then $Cov[X_1, X_2] = 0$ and the variances simply add.

Hence, the variance of the number of heads for 3 fair coin tosses is:

$$Var[X_1 + X_2 + X_3] = 3 Var[X_1] = 3p(1-p) = 3 \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{3}{4}$$
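
The same value can be recovered directly from the Binomial probabilities without using the sum-of-variances argument. This is a sketch; `binomial_pmf` and `variance` are illustrative helper names.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Return {k: P(N = k)} for the number of successes in n trials."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def variance(pmf):
    """Var[X] = E[X^2] - E[X]^2 for a {value: probability} mapping."""
    mean = sum(x * q for x, q in pmf.items())
    return sum(x**2 * q for x, q in pmf.items()) - mean**2

n, p = 3, Fraction(1, 2)
var_heads = variance(binomial_pmf(n, p))
print(var_heads)  # 3/4
assert var_heads == n * p * (1 - p)
```

The same comparison should hold for other choices of `n` and `p`, since the general result is $Var[N] = np(1-p)$.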

<u> Example 4: Random count of discrete events </u>

One of the most important discrete random variables is the **Poisson** random variable. 

The Poisson random variable, $X$, models the number of events happening within a fixed time interval, assuming events arrive at a constant mean rate (unchanged by when the last event arrived). Here $\lambda$ is the expected number of events in the interval.

$$P(X=n) = \frac{\lambda^n e^{-\lambda}}{n!}$$

It is a useful distribution for modeling counts such as the number of customer arrivals, the number of server jobs arriving in a given hour, or the number of neuron spikes during a fixed period of time.
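
The formula for $P(X=n)$ can be evaluated directly. The sketch below uses a hypothetical rate of 4 arrivals per interval and truncates the infinite support at 60 terms, which is an approximation; it checks that the probabilities sum to roughly 1 and that their weighted average is close to $\lambda$.

```python
from math import exp, factorial

def poisson_pmf(n, lam):
    """P(X = n) for a Poisson random variable with mean lam."""
    return lam**n * exp(-lam) / factorial(n)

lam = 4.0  # hypothetical rate: 4 arrivals per interval on average

# Truncate the infinite sum; terms beyond n = 60 are negligible for this rate.
probs = [poisson_pmf(n, lam) for n in range(60)]
total = sum(probs)
approx_mean = sum(n * q for n, q in enumerate(probs))
print(round(total, 6), round(approx_mean, 6))  # 1.0 4.0
```

For much larger rates the truncation point would need to grow, since more of the probability mass sits at higher counts.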

To derive the mean, we can again apply the definition of $E[X]$:

$$E[X] = 1 \cdot  \frac{\lambda^1 e^{-\lambda}}{1!} +  2 \cdot  \frac{\lambda^2 e^{-\lambda}}{2!} + \dots =  e^{-\lambda} \sum_{n=0}^\infty n  \frac{\lambda^n}{n!} = e^{-\lambda} \sum_{n=1}^\infty \frac{\lambda^n}{(n-1)!} = e^{-\lambda} \sum_{n'=0}^\infty \frac{\lambda^{(n'+1)}}{(n')!} = e^{-\lambda} e^{\lambda} \lambda = \lambda$$
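
The variance can be derived with the same index-shifting trick. One common route, sketched here, is to evaluate $E[X(X-1)]$ first, since the factor $n(n-1)$ cancels neatly against $n!$:

$$E[X(X-1)] = e^{-\lambda} \sum_{n=2}^\infty n(n-1) \frac{\lambda^n}{n!} = e^{-\lambda} \sum_{n=2}^\infty \frac{\lambda^n}{(n-2)!} = e^{-\lambda} \lambda^2 \sum_{n'=0}^\infty \frac{\lambda^{n'}}{(n')!} = \lambda^2$$

Then $E[X^2] = E[X(X-1)] + E[X] = \lambda^2 + \lambda$, and applying $Var[X] = E[X^2] - E[X]^2$:

$$Var[X] = \lambda^2 + \lambda - \lambda^2 = \lambda$$

So for a Poisson random variable the mean and the variance are both equal to $\lambda$.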

## Important Continuous Distributions

<u>Normal Distribution</u>

Perhaps the most important continuous distribution used commonly to describe naturally occurring data is the normal distribution (also known as the Gaussian Distribution).

<u>t-distribution</u>

<u>Exponential Distribution</u>