# Probability distributions

## Discrete probability distributions

### Bernoulli distribution

A Bernoulli random variable indicates whether the outcome of a single experiment is a success or a failure; that is, it can only take the value 1 or 0, with probabilities _p_ and _(1-p)_, respectively.

It is important to note that the probability of success _remains constant across experiments_!

Thus, the distribution is:

$$
P(X=x) = p^x \times (1-p)^{1-x}
$$

#### Mean

Its mean is, according to the definition:
$$
\mu = \sum x p(x) = 0 p(0) + 1 p(1) = p(1) = p^1 \times (1-p)^0 = p
$$

#### Variance
Following the definition:
$$
\sigma^2 = \mathbf{E}[(x-\mu)^2] = \sum (x-\mu)^2 p(x) = (-p)^2 p(0) + (1-p)^2 p(1) = p^2(1-p) + (1-p)^2p = p(1-p) (p + (1-p)) = p(1-p)
$$
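
As a quick numerical check of the two results above, here is a minimal Python sketch that computes the mean and variance straight from their definitions. The helper names (`bernoulli_pmf`, `moments_from_pmf`) and the value `p = 0.3` are illustrative assumptions, not part of the original notes.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}; zero elsewhere."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)


def moments_from_pmf(support, pmf):
    """Mean and variance computed directly from the definitions."""
    mean = sum(x * pmf(x) for x in support)
    var = sum((x - mean) ** 2 * pmf(x) for x in support)
    return mean, var


p = 0.3  # arbitrary illustrative probability of success
mean, var = moments_from_pmf([0, 1], lambda x: bernoulli_pmf(x, p))
print(mean, var)  # should be close to p and p * (1 - p)
```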

#### Fundamental bridge

It is interesting to note that the expectation of a Bernoulli random variable is simply the probability of success. This lets us relate expectations to probabilities, a link known as the _Fundamental bridge_ (Ref: https://www.youtube.com/watch?v=LX2q356N2rU).

This is useful because we can reduce the computation of expectations of much more complicated random variables to expectations of Bernoulli variables, by means of linearity of expectation, the fundamental bridge, and symmetry. __See the Hypergeometric distribution section below.__

### Binomial distribution

A Binomial random variable counts the number of successes in _n_ repeated independent experiments. We can think of the Binomial random variable (_X_) as a sum of Bernoulli random variables ($I_i$), which act as indicator variables:
$$
X = I_1 + I_2 + I_3 + ... + I_n
$$

The Binomial random variable is like running a Bernoulli experiment several times (e.g. getting _x_ heads out of _n_ coin flips).

Thus, its probability can be thought of as the probability of one particular sequence with _x_ successes and _n-x_ failures, $p^x (1-p)^{n-x}$, multiplied by the number of ways of ordering that sequence, $\binom{n}{x}$. Its distribution is:
$$
P(X=x) = \binom{n}{x} p^x (1-p)^{n-x}
$$


#### Binomial theorem
This distribution owes its name to the Binomial theorem, which states:
$$
(p+q)^n = \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x}
$$

With this theorem we can prove that the Binomial distribution sums to 1 ($\sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = (p + (1-p))^n = 1^n = 1$).
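
The normalization argument can be checked numerically. The sketch below is only an illustration: `binomial_pmf` is a hypothetical helper name, and `n = 10`, `p = 0.3` are arbitrary values.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3  # arbitrary illustrative parameters
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(total)  # should be 1 up to floating-point rounding
```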

#### Mean
The mean of the Binomial is easy to get from the Bernoulli mean, using linearity of expectation:
$$
\mathbf{E}[X] = \mathbf{E}[I_1 + I_2 + I_3 + ... + I_n] = \mathbf{E}[I_1] + \mathbf{E}[I_2] + ... + \mathbf{E}[I_n] = n \times p
$$

The mean can also be computed using the definition of the expectation and the Binomial theorem:
$$
\begin{aligned}
\mathbf{E}[X] &= \sum_{x=0}^{n} x \, p(x) = \sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=1}^{n} x \binom{n}{x} p^x (1-p)^{n-x} \\
&= \sum_{x=1}^{n} x \frac{n!}{x! (n-x)!} p^x (1-p)^{n-x} = \sum_{x=1}^{n} x \frac{n!}{x (x-1)! (n-x)!} p^x (1-p)^{n-x} \\
&= np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)! (n-x)!} p^{x-1} (1-p)^{n-x} = np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} (1-p)^{n-x} \\
&= \text{(setting } j=x-1 \text{)} = np \sum_{j=0}^{n-1} \binom{n-1}{j} p^{j} (1-p)^{(n-1)-j} \\
&= \text{(using the Binomial theorem)} = np \, (p + (1-p))^{n-1} = np
\end{aligned}
$$

#### Variance
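Again using the Bernoulli results, together with the fact that the variance of a sum of independent random variables is the sum of their variances:
$$
\mathbf{VAR}[X] = \mathbf{VAR}[I_1 + I_2 + I_3 + ... + I_n] = \mathbf{VAR}[I_1] + \mathbf{VAR}[I_2] + ... + \mathbf{VAR}[I_n] = n \times p \times (1-p)
$$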
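
To see the "sum of indicators" view in action, here is a small simulation sketch. It builds each Binomial draw by adding up Bernoulli indicators and compares the sample mean and variance with $np$ and $np(1-p)$. The function name `binomial_draw`, the seed, and the parameter values are assumptions chosen for illustration.

```python
import random
from statistics import fmean, pvariance


def binomial_draw(n: int, p: float, rng: random.Random) -> int:
    """One Binomial draw built as a sum of n independent Bernoulli indicators."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))


rng = random.Random(0)  # fixed seed so the run is reproducible
n, p, trials = 20, 0.25, 20_000  # arbitrary illustrative values
samples = [binomial_draw(n, p, rng) for _ in range(trials)]

sample_mean = fmean(samples)
sample_var = pvariance(samples)
print(sample_mean, n * p)            # sample mean vs. n * p
print(sample_var, n * p * (1 - p))   # sample variance vs. n * p * (1 - p)
```

The agreement is only approximate, since the sample moments fluctuate around the theoretical values.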

These parameters can also be derived without relying on the Bernoulli variables (Ref: https://www.youtube.com/watch?v=8fqkQRjcR1M).

### Multinomial distribution

Building on the Binomial case, now let's suppose that each experiment can have more than 2 outcomes. We will now have _k_ possible outcomes, where outcome _i_ occurs $x_i$ times ($\sum_i x_i = n$) with probability $p_i$ ($\sum_i p_i = 1$).

As before, the probabilities remain unchanged across experiments!

The distribution is:
$$
P(X_1=x_1, X_2=x_2, ..., X_k=x_k) = \binom{n}{x_1,x_2,...,x_k} p_1^{x_1} \times p_2^{x_2} \times ... \times p_k^{x_k}
$$
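
The formula can be evaluated with a few lines of Python. This is a minimal sketch: `multinomial_pmf` is a hypothetical helper, and the counts and probabilities in the example are made up for illustration.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs) -> float:
    """P(X_1 = x_1, ..., X_k = x_k) for n = sum(counts) independent trials."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)  # builds n! / (x_1! x_2! ... x_k!)
    return coef * prod(p**x for p, x in zip(probs, counts))


# 4 trials with three possible outcomes of probability 0.5, 0.3 and 0.2
example = multinomial_pmf((2, 1, 1), (0.5, 0.3, 0.2))
print(example)
```

With only two outcomes the function reduces to the Binomial distribution, which is a handy sanity check.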

#### Mean
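One way to obtain it, sketched here as a standard result rather than a full derivation: if we focus on a single outcome _i_ and treat every other outcome as a failure, the count $X_i$ taken alone is a Binomial random variable with parameters _n_ and $p_i$. The Binomial mean then applies to each component:
$$
\mathbf{E}[X_i] = n \times p_i, \quad i = 1, 2, ..., k
$$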


#### Variance
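As a sketch of the standard result: each count $X_i$ taken alone behaves like a Binomial random variable with parameters _n_ and $p_i$ (outcome _i_ versus everything else), so its variance is:
$$
\mathbf{VAR}[X_i] = n \times p_i \times (1-p_i)
$$
The counts are not independent, because they must add up to _n_; for $i \neq j$ their covariance is $-n \times p_i \times p_j$.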

### Hypergeometric distribution

While the Binomial requires independent experiments (sampling with replacement, so that the probability of success remains constant), the Hypergeometric does not: it models sampling without replacement, so the probability of success changes from draw to draw.

The probability of a Hypergeometric outcome is obtained by counting: the number of ways of getting the desired outcome divided by the total number of possible samples.

#### Example: Computing probabilities using a Hypergeometric distribution
There are 8 red balls, 3 yellow balls, and 9 white balls. What is the probability of sampling 6 balls without replacement and obtaining 2 red, 1 yellow, and 3 white? We can either count combinations, or multiply the probability of one specific ordered sequence (each draw conditioned on all the previous ones) by the number of orderings:
$$
\begin{aligned}
p &= \frac{\text{Ways of choosing the desired balls}}{\text{Total combinations}} = \frac{ \binom{8}{2} \binom{3}{1} \binom{9}{3} }{ \binom{20}{6} } \\
&= \binom{6}{2,1,3} \times P(X_1=r) \times P(X_2=r \mid X_1=r) \times P(X_3=y \mid X_1=r, X_2=r) \times P(X_4=w \mid \dots) \times P(X_5=w \mid \dots) \times P(X_6=w \mid \dots) \\
&= \frac{6!}{2!1!3!} \times \frac{8}{20} \times \frac{7}{19} \times \frac{3}{18} \times \frac{9}{17} \times \frac{8}{16} \times \frac{7}{15} = 0.182
\end{aligned}
$$
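
Both routes to the answer in this example can be checked with a short Python sketch. The variable names are illustrative only.

```python
from math import comb, factorial

# Route 1: count favourable combinations over all combinations
counting = comb(8, 2) * comb(3, 1) * comb(9, 3) / comb(20, 6)

# Route 2: probability of one ordered sequence (r, r, y, w, w, w),
# times the number of distinct orderings of that sequence
orderings = factorial(6) // (factorial(2) * factorial(1) * factorial(3))
one_sequence = (8 / 20) * (7 / 19) * (3 / 18) * (9 / 17) * (8 / 16) * (7 / 15)
sequential = orderings * one_sequence

print(round(counting, 3), round(sequential, 3))  # 0.182 0.182
```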

#### Example: Expectation of a Hypergeometric random variable using the Fundamental bridge

_X_ is the number of aces drawn in a 5-card hand, and we need to find its expectation.

If we define an indicator random variable for each card in the hand ($I_j = 1$ if the _j_-th card is an ace, 0 otherwise; that is, a Bernoulli random variable), then:
$$
\mathbf{E}[X] = \text{by indicator random vars} = \mathbf{E}[I_1 + I_2 + I_3 + I_4 + I_5] = \text{by linearity and symmetry} = 5 \times \mathbf{E}[I_1] = \text{by fund. bridge} = 5 \times P(Ace) = 5 \times \frac{4}{52}
$$

Note that the indicators are not independent here (there is no replacement), but linearity of expectation does not require independence.
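
The indicator shortcut can be compared against the brute-force expectation computed from the Hypergeometric probabilities. The sketch below uses exact fractions; `aces_pmf` and its default arguments are hypothetical names introduced for illustration.

```python
from fractions import Fraction
from math import comb


def aces_pmf(k: int, hand: int = 5, aces: int = 4, deck: int = 52) -> Fraction:
    """P(exactly k aces in the hand), obtained by counting combinations."""
    return Fraction(comb(aces, k) * comb(deck - aces, hand - k), comb(deck, hand))


# Expectation from the definition: sum over k of k * P(X = k)
expected = sum(k * aces_pmf(k) for k in range(0, 5))
print(expected)  # 5/13, the same value as 5 * 4/52
```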

### Poisson distribution

Poisson random variables measure the number of times an event occurs within a certain range (time interval, distance, area, volume, etc); e.g. "number of rats per $m^2$", "number of cars passing every 10 minutes", "number of chips on a cookie".

The results of experiments can be _weakly dependent_: that is, knowing that _A_ happened might not tell you much about _C_, but knowing that _A_ and _B_ happened might give you some information about _C_. Also, compared to the Binomial, the probability of each event does not have to be fixed.

One way of thinking of Poisson random variables (the number of events that occur) is that there is a large number of trials, each of which can lead to a success or a failure, but each with a very low probability of success. For example, we can subdivide an area into very small pieces (so that there is a very large number of them), and each one will have a low probability of something happening in that exact piece (Ref: https://www.youtube.com/watch?v=TD1N4hxqMzY).
This is where the Poisson can be related to the Binomial distribution: it is the limit where $n \rightarrow \infty$ and $p \rightarrow 0$ while the expected number of events $np = \lambda t$ stays fixed. Here $\lambda$ is the average rate of events per unit of range and $t$ is the size of the range, which gives the distribution:

$$
P(X=x; \lambda t) = \frac{e^{-\lambda t}(\lambda t)^x}{x!}
$$
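
The Binomial limit can be illustrated numerically. In this sketch, `rate` stands for the product $\lambda t$, and the helper names and the values `rate = 2.0`, `x = 3` are assumptions chosen for the example.

```python
from math import comb, exp, factorial


def poisson_pmf(x: int, rate: float) -> float:
    """P(X = x) = e^(-rate) * rate^x / x!, where rate plays the role of lambda * t."""
    return exp(-rate) * rate**x / factorial(x)


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


rate, x = 2.0, 3  # arbitrary illustrative values
for n in (10, 100, 10_000):
    # keep n * p fixed at `rate` while n grows and p shrinks
    print(n, binomial_pmf(x, n, rate / n))
print("Poisson", poisson_pmf(x, rate))
```

As _n_ grows, the Binomial probabilities should approach the Poisson value.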

#### Mean
The mean of the Poisson is $\lambda t$.
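
A short derivation sketch from the definition of the expectation, using the series $\sum_{j=0}^{\infty} \frac{a^j}{j!} = e^{a}$:
$$
\mathbf{E}[X] = \sum_{x=0}^{\infty} x \frac{e^{-\lambda t}(\lambda t)^x}{x!} = \lambda t \, e^{-\lambda t} \sum_{x=1}^{\infty} \frac{(\lambda t)^{x-1}}{(x-1)!} = \lambda t \, e^{-\lambda t} e^{\lambda t} = \lambda t
$$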

#### Variance

The variance of the Poisson is also $\lambda t$.
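
A derivation sketch: it is easier to compute $\mathbf{E}[X(X-1)]$ first, again using the series $\sum_{j=0}^{\infty} \frac{a^j}{j!} = e^{a}$, and then use the Poisson mean $\mathbf{E}[X] = \lambda t$:
$$
\begin{aligned}
\mathbf{E}[X(X-1)] &= \sum_{x=0}^{\infty} x (x-1) \frac{e^{-\lambda t}(\lambda t)^x}{x!} = (\lambda t)^2 e^{-\lambda t} \sum_{x=2}^{\infty} \frac{(\lambda t)^{x-2}}{(x-2)!} = (\lambda t)^2 \\
\mathbf{VAR}[X] &= \mathbf{E}[X^2] - \mathbf{E}[X]^2 = \mathbf{E}[X(X-1)] + \mathbf{E}[X] - \mathbf{E}[X]^2 = (\lambda t)^2 + \lambda t - (\lambda t)^2 = \lambda t
\end{aligned}
$$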

## Continuous probability distributions

### Gamma distribution

### Exponential distribution

### Beta distribution