# Probability
---

## Import Python Libraries

# import Python libraries
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from scipy import stats

## Left Align Cell Contents

%%html
<style>
table {float:left}
</style>

---

## Simple Probability

Probability is the likelihood that an event will occur.  
In statistics, probability is used to measure how likely a statistical result is.

Probability is written in terms of:

- A fraction such as 1/4
- A percentage such as 25%
- A value between 0 and 1 (inclusive) such as 0.25

A probability of 0 means an event can never occur or is impossible.  
A probability of 1 means an event is certain to occur or is guaranteed.  

The probability of an event is denoted P(event). When every outcome is equally likely, the simple probability of an event is given by:

$ P(\text{event}) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}} $

The set of all possible outcomes is called the **sample space**; the total number of outcomes is its size.  

The simple probability formula only applies to cases where each possible outcome is **equally likely**.  
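
As a minimal sketch of the formula, assuming a fair six-sided die (so every outcome is equally likely), the probability of rolling an even number can be found by counting outcomes. The variable names are illustrative only.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed fair)
sample_space = {1, 2, 3, 4, 5, 6}

# Event: the roll is even
event = {outcome for outcome in sample_space if outcome % 2 == 0}

# Number of favorable outcomes divided by total number of outcomes
p_even = Fraction(len(event), len(sample_space))
print(p_even, float(p_even))
```

Using `Fraction` keeps the result exact, which makes it easy to compare with the fraction form of a probability.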






---

## Experimental and Theoretical Probability

The simple probability formula gives the theoretical probability of an event (with equally likely outcomes).  

However, probability can also be estimated by running experiments.  
An experiment is an action that produces an outcome, for example flipping a coin.  
Each flip of the coin is a single experiment (or trial).  
The experimental probability of an event is the number of experiments in which the event occurred divided by the total number of experiments.  
As the number of experiments grows, the experimental probability tends to converge on the theoretical probability.  
This property of experimental probability is called the **law of large numbers**.
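
The convergence can be illustrated with a small simulation. This is a sketch that assumes a fair coin and uses Python's pseudo-random generator with an arbitrary seed; the checkpoints are chosen only for illustration.

```python
import random

rng = random.Random(0)  # arbitrary seed so the run is repeatable

checkpoints = (10, 100, 1_000, 10_000, 100_000)
estimates = {}
heads = 0

for n in range(1, max(checkpoints) + 1):
    # Each loop iteration is one experiment: a single fair coin flip
    heads += rng.random() < 0.5
    if n in checkpoints:
        # Experimental probability of heads after n flips
        estimates[n] = heads / n

for n, estimate in estimates.items():
    print(f"{n:>7} flips: P(heads) is approximately {estimate:.4f}")
```

The early estimates can be far from 0.5, while the estimates after many flips should sit close to the theoretical value.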

---

## Addition Rule

Sometimes we need to calculate the probability that at least one of 2 events occurs in a single experiment.  

For 2 events, A and B, we have the **addition rule**:  

P(A or B) = P(A) + P(B) - P(A and B)

P(A) counts every outcome in which A happens, and P(B) counts every outcome in which B happens.  
Outcomes in which A and B happen at the same time are therefore counted twice in P(A) + P(B), so we need to subtract the probability of A and B happening at the same time, P(A and B), from the sum. 

However, if events A and B cannot occur at the same time, that is, they are **mutually exclusive**, then P(A and B) is 0 and P(A or B) is just the sum of P(A) and P(B).

P(A or B) is called the **union** of A and B.  
P(A and B) is called the **intersection** of A and B.
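
The addition rule can be checked by counting. The sketch below assumes a standard 52-card deck and a single random draw, with A = "the card is a heart" and B = "the card is a king"; the helper `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely outcomes

hearts = {card for card in deck if card[1] == "hearts"}  # event A
kings = {card for card in deck if card[0] == "K"}        # event B


def prob(event):
    """Simple probability of an event within the deck."""
    return Fraction(len(event), len(deck))


# Union counted directly from the sample space
p_union_counted = prob(hearts | kings)

# Union from the addition rule: P(A) + P(B) - P(A and B)
p_union_rule = prob(hearts) + prob(kings) - prob(hearts & kings)

print(p_union_counted, p_union_rule)
```

The king of hearts belongs to both events, which is why the intersection has to be subtracted once.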


---

## Multiplication Rule

**Independent events** are events that have no effect on each other, such as the outcomes of successive coin flips.  

To calculate the probability that multiple independent events all occur (a joint occurrence) we use the **multiplication rule**:  

P(A and B) = P(A) * P(B)  
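
As an illustration, assume a fair coin and a fair die that are flipped and rolled together. The sketch compares the multiplication rule with a direct count over the joint sample space.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Multiplication rule: P(heads and six) = P(heads) * P(six)
p_heads = Fraction(1, len(coin))
p_six = Fraction(1, len(die))
p_rule = p_heads * p_six

# Check by counting outcomes in the joint sample space
joint_space = list(product(coin, die))
favorable = [outcome for outcome in joint_space if outcome == ("H", 6)]
p_counted = Fraction(len(favorable), len(joint_space))

print(p_rule, p_counted)
```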


---

## Conditional Probability

**Dependent events** are events that have an effect on each other, such as selecting a card from a deck without replacing it and then selecting another card. The first event, selecting a card, changes the probability of which card will be selected the second time because the total number of cards in the deck has been reduced by the first event.  

For dependent events, A and B, the multiplication rule becomes:  

P(A and B) = P(A) * P(B | A)

P(B | A) is the probability of B given that A has occurred and is called a **conditional probability**.  

If the conditional probability P(A | B) = P(A), then the events A and B are independent since the probability of A given B has occurred is the same as the probability of A, meaning B has no effect on A.
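
For the card example, a sketch of the rule for dependent events, assuming a standard 52-card deck and two draws without replacement, with A = "first card is an ace" and B = "second card is an ace":

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# P(A and B) = P(A) * P(B | A)
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card are gone
p_rule = p_first_ace * p_second_ace_given_first

# Check by counting every ordered two-card draw without replacement
draws = list(permutations(deck, 2))
both_aces = [d for d in draws if d[0][0] == "A" and d[1][0] == "A"]
p_counted = Fraction(len(both_aces), len(draws))

print(p_rule, p_counted)
```

If the first card were replaced before the second draw, P(B | A) would equal P(B) and the events would be independent.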

---

## Bayes' Theorem

Bayes' Theorem gives the probability of an event A given that a related event B has been observed, in terms of the reverse conditional probability.  


$$ P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)} $$

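$ P(A) $ is the **prior** probability of A, before B is taken into account.  
$ P(B \mid A) $ is the **likelihood** of observing B given that A has occurred, and $ P(B) $ is the overall probability of B.  
$ P(A \mid B) $ is the **posterior** probability: the probability of A updated by the knowledge that B has occurred.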
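
A worked sketch may help. All numbers below are hypothetical and chosen only for illustration: A is "a person has a condition" and B is "a test for the condition is positive". P(B) is not given directly, so the sketch assumes it can be built from the law of total probability, P(B) = P(B | A) P(A) + P(B | not A) P(not A).

```python
# Hypothetical inputs, chosen only for illustration
p_a = 0.01              # P(A): probability of the condition
p_b_given_a = 0.95      # P(B | A): positive test when the condition is present
p_b_given_not_a = 0.05  # P(B | not A): positive test when the condition is absent

# P(B) from the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(f"P(B) = {p_b:.4f}")
print(f"P(A | B) = {p_a_given_b:.4f}")
```

With these assumed inputs P(A | B) comes out at roughly 0.16, far below P(B | A), because the condition itself is rare.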

---

## Sample Space

The sample space is the set of all possible outcomes from observing a phenomenon or conducting an experiment.  
Flipping a coin and rolling a die are examples of experiments.  

The sample space for flipping a coin is (H, T) where H is heads and T is tails.  
The sample space for rolling a die is (1, 2, 3, 4, 5, 6).

---

## Event

An event is the set of outcomes that the experimenter is trying to observe.  
For example, the event might be that a coin comes up heads on the first of two flips, or that a die comes up even after being rolled.  

The event space is a subset of the sample space.  

The sample space for flipping a coin twice is (HH, HT, TH, TT) and the event space for heads coming up first after two flips is (HH, HT).  
The sample space for rolling a die is (1, 2, 3, 4, 5, 6) and the event space for the die coming up even after one roll is (2, 4, 6).
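
The two-flip example can be enumerated directly. This sketch assumes a fair coin, so that every outcome in the sample space is equally likely and the simple probability formula applies.

```python
from fractions import Fraction
from itertools import product

# Sample space for flipping a coin twice: HH, HT, TH, TT
sample_space = {"".join(flips) for flips in product("HT", repeat=2)}

# Event space: heads comes up on the first flip
heads_first = {outcome for outcome in sample_space if outcome[0] == "H"}

# The event space is a subset of the sample space
is_subset = heads_first <= sample_space

p_heads_first = Fraction(len(heads_first), len(sample_space))
print(sorted(sample_space), sorted(heads_first), is_subset, p_heads_first)
```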

---

## Random Variables

A random variable is a function that maps from the sample space of an experiment to the real numbers.  

Types:
- Discrete, can only take a countable set of distinct values (for example, integers)
- Continuous, can take any real value within a certain interval (such as [0, 1])
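
To make the idea of a mapping concrete, here is a sketch of a discrete random variable defined on the sample space of two coin flips. The function name `number_of_heads` is hypothetical.

```python
from itertools import product

# Sample space for two coin flips
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]


def number_of_heads(outcome):
    """Random variable X: maps an outcome such as 'HT' to a number."""
    return outcome.count("H")


# The value of X for every outcome in the sample space
values = {outcome: number_of_heads(outcome) for outcome in sample_space}
print(values)
```

X is discrete here because it can only take the values 0, 1 or 2.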



---

## Probability Distribution

For a discrete random variable the probability distribution is the probability for each possible value the random variable can take.  

For a continuous random variable, the distribution is described by a probability density curve. The total area under the curve is 1, which corresponds to the probability that the variable takes some value. The probability of any single exact value is 0 since there are infinitely many possible values. However, the probability of a range of values can be calculated as the area under the probability density curve between the values that define the range.
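
Both cases can be sketched in a few lines. The discrete part assumes two fair coin flips; the continuous part assumes a standard normal density (mean 0, standard deviation 1) purely as an example of a density curve, using `statistics.NormalDist` from the standard library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import NormalDist

# Discrete: distribution of the number of heads in two fair coin flips
outcomes = list(product("HT", repeat=2))
counts = Counter(outcome.count("H") for outcome in outcomes)
distribution = {value: Fraction(count, len(outcomes)) for value, count in counts.items()}
total = sum(distribution.values())  # the probabilities sum to 1

# Continuous: probability of a range is the area under the density curve,
# computed here as a difference of cumulative probabilities
z = NormalDist(mu=0, sigma=1)
p_within_one = z.cdf(1) - z.cdf(-1)

print(dict(sorted(distribution.items())), total)
print(f"P(-1 <= Z <= 1) is approximately {p_within_one:.4f}")
```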

---

## Expected Value

The expected value of a discrete random variable is the mean of the random variable.  

This is sometimes referred to as the long term average, as it is the average value that one would expect to see after many experiments have been performed.  

To calculate the expected value, weight each value in the probability distribution by the probability of that value and sum up the weighted values.  


$ E(X) = \sum_{i=1}^{n} x_i \cdot p(x_i) $


To calculate the variance of a discrete random variable, multiply the squared difference between each value and the expected value by the probability of that value, then sum the results.

$ \text{Var}(X) = \sum_{i=1}^{n} (x_i - E(X))^2 \cdot p(x_i) $

The standard deviation of a discrete random variable is the square root of the variance.
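
Applying the formulas to one roll of a die gives a compact example. The sketch assumes a fair six-sided die, so each face has probability 1/6.

```python
from fractions import Fraction
from math import sqrt

# Probability distribution of a fair six-sided die (assumed fair)
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# E(X): each value weighted by its probability, then summed
expected_value = sum(x * p for x, p in distribution.items())

# Var(X): squared distance from E(X), weighted by probability, then summed
variance = sum((x - expected_value) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance
std_dev = sqrt(variance)

print(expected_value, variance, round(std_dev, 4))
```

The expected value is 7/2, which is not itself a face of the die; it is a long term average rather than a value any single roll can produce.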

---

## Combinations of Random Variables

Random variables can be combined by taking their sum or difference.  
If the variables are independent, the mean (expected value), variance and standard deviation of the combination can be calculated from the formulas below.  
If the variables are not independent of each other, only the formulas for the mean still apply; the variance formulas do not.  


|          | Combination | Mean | Variance |
|----------|-------------|------|----------|
| Sum | $$ S = X + Y $$ | $$ \mu_S = \mu_X + \mu_Y $$ | $$ \sigma^2_S = \sigma^2_X + \sigma^2_Y $$ |
| Difference | $$ D = X - Y $$ | $$ \mu_D = \mu_X - \mu_Y $$ | $$ \sigma^2_D = \sigma^2_X + \sigma^2_Y $$ |



The standard deviation of a combination is the square root of the variance of the combination.  

Sums and differences of independent, normally distributed random variables are also normally distributed.
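
For independent variables the variances add for both the sum and the difference. The sketch below checks this exactly for two independent fair dice, X and Y, by enumerating all 36 equally likely pairs; `mean_and_variance` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product


def mean_and_variance(values):
    """Mean and variance of a list of equally likely values."""
    values = list(values)
    p = Fraction(1, len(values))
    mean = sum(v * p for v in values)
    variance = sum((v - mean) ** 2 * p for v in values)
    return mean, variance


faces = range(1, 7)
pairs = list(product(faces, repeat=2))  # independent rolls of X and Y

mean_x, var_x = mean_and_variance(faces)
mean_sum, var_sum = mean_and_variance(x + y for x, y in pairs)
mean_diff, var_diff = mean_and_variance(x - y for x, y in pairs)

print("single die:", mean_x, var_x)
print("sum:       ", mean_sum, var_sum)
print("difference:", mean_diff, var_diff)
```

Subtracting Y does not cancel its spread: the difference is just as variable as the sum, even though its mean is 0.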

---

## Permutations and Combinations

A permutation is an arrangement of a set of things where the order matters.  

The number of permutations of $ k $ items chosen from $ n $ items, $ P(n, k) $, is given by:

$$ P(n, k) = \frac{n!}{(n-k)!} $$

where $ n $ is the total number of items, and $ k $ is the number of items to choose.


A combination is a selection of things from a set where the order doesn't matter.

The number of combinations of $ k $ items chosen from $ n $ items, $ C(n, k) $, is given by:

$$ C(n, k) = \binom{n}{k} = \frac{n!}{k!(n-k)!} $$

where $ n $ is the total number of items, and $ k $ is the number of items to choose.


The number of combinations is often written as the binomial coefficient $ \binom{n}{k} $, which represents the number of ways to choose $ k $ items from $ n $ items without regard to the order of selection.
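
Python's standard library exposes both counts as `math.perm` and `math.comb` (Python 3.8 and later). The sketch below, using an arbitrary five-item list, cross-checks them against explicit listings from `itertools`.

```python
from itertools import combinations, permutations
from math import comb, perm

items = ["a", "b", "c", "d", "e"]  # arbitrary example items
n, k = len(items), 3

# Counts from the formulas n! / (n - k)! and n! / (k! (n - k)!)
n_permutations = perm(n, k)
n_combinations = comb(n, k)

# Cross-check by listing every arrangement and every selection
listed_permutations = list(permutations(items, k))
listed_combinations = list(combinations(items, k))

print(n_permutations, len(listed_permutations))
print(n_combinations, len(listed_combinations))
```

Each selection of 3 items can be ordered in 3! = 6 ways, which is why there are 6 times as many permutations as combinations.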


---

## Binomial Random Variables

A binomial random variable counts the number of successes in a series of trials that each have only 2 possible outcomes.  
The probabilities of the two outcomes do not have to be equal.

A random variable X is a **binomial random variable** if:
- Each trial is independent
- Each trial can be labelled a success or failure
- There are a fixed number of trials
- The probability of success for each trial is constant

A classic example of a binomial random variable is the number of heads in a fixed number of coin flips.

For $ n $ trials with probability of success $ p $ on each trial:

Mean: $ \mu_X = E(X) = np $

Variance: $ \sigma_X^2 = np(1 - p) $

Standard deviation: $ \sqrt{\sigma_X^2} = \sigma_X = \sqrt{np(1 - p)}$ 

---

## Binomial Probability

The binomial probability formula is given by:

$ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} $ 

where:
- $ n $ is the number of trials
- $ k $ is the number of successful trials
- $ p $ is the probability of success on a single trial
- $ \binom{n}{k} $ is the binomial coefficient, calculated as $ \frac{n!}{k!(n-k)!} $
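
The formula translates almost directly into code. This sketch assumes 10 flips of a fair coin with heads counted as success; `binomial_pmf` is a hypothetical helper, not a library function. It also recovers the mean np and variance np(1 - p) from the full distribution.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # assumed: 10 flips of a fair coin

# Probability of exactly 3 heads
p_three_heads = binomial_pmf(3, n, p)

# Full distribution, then mean and variance from their definitions
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(p_three_heads)
print(mean, n * p)
print(variance, n * p * (1 - p))
```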


---

## Poisson Process

A **Poisson process** has the following characteristics:
- Each experiment counts the number of times an event occurs over an interval of some measurement (time, distance, area, etc.)
- For each interval of the same size (second, inch, square foot, etc.) the mean number of events is the same
- The number of events in each interval is independent of the other intervals
- The intervals do not overlap
- The probability of the event occurring is proportional to the size of the interval

The probability of observing $ k $ events in a Poisson process with an average rate of $ \lambda $ events per interval is given by:

$$ P(X = k) = \frac{{\lambda^k e^{-\lambda}}}{{k!}} $$

where:
- $ P(X = k) $ is the probability of observing $ k $ events,
- $ \lambda $ is the average rate of events,
- $ k $ is the number of events,
- $ e $ is the base of the natural logarithm (approximately equal to 2.71828),
- $ k! $ is the factorial of $ k $.


The Poisson distribution can be used to approximate the binomial distribution when the number of binomial trials, $ n $, is at least 20 and the probability of success, $ p $, is at most 0.05. In that case the rate is taken to be $ \lambda = np $.

The approximate probability of $ k $ successes in $ n $ trials given by the Poisson formula is:

$$ P(X = k) = \frac{{(np)^k e^{-np}}}{{k!}} $$
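
The quality of the approximation can be inspected numerically. The sketch below assumes n = 100 trials and p = 0.02 (values picked to satisfy the rule of thumb above); both helper functions are hypothetical and written from the formulas in these notes.

```python
from math import comb, exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with average rate lam."""
    return lam**k * exp(-lam) / factorial(k)


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.02  # assumed values: n >= 20 and p <= 0.05
lam = n * p       # rate used by the Poisson approximation

differences = []
for k in range(6):
    exact = binomial_pmf(k, n, p)
    approx = poisson_pmf(k, lam)
    differences.append(abs(exact - approx))
    print(f"k={k}: binomial={exact:.4f}  poisson={approx:.4f}")

print(f"largest difference: {max(differences):.4f}")
```

For these assumed values the two columns should agree to about two decimal places.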

---

## Bernoulli Random Variables

A **Bernoulli random variable** is a binomial random variable with exactly 1 trial, where success is defined as 1, failure is defined as 0, and $ p $ is the probability of success.

Mean: $ \mu = (1 - p)(0) + p(1) = p$ 

Variance: $ \sigma^2 = (1 - p)(0 - \mu)^2 + p(1 - \mu)^2 = p(1 -p) $

Standard deviation: $\sqrt{\sigma^2} = \sigma = \sqrt{p(1 - p)} $
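
The same results follow from the general expected value and variance formulas. This sketch uses a hypothetical success probability of 3/10 and exact fractions so the comparison with p(1 - p) is exact.

```python
from fractions import Fraction
from math import sqrt

p = Fraction(3, 10)  # hypothetical probability of success

# Distribution of a Bernoulli random variable: 0 for failure, 1 for success
distribution = {0: 1 - p, 1: p}

# Mean and variance from the general discrete formulas
mean = sum(x * px for x, px in distribution.items())
variance = sum((x - mean) ** 2 * px for x, px in distribution.items())
std_dev = sqrt(variance)

print(mean, variance, round(std_dev, 4))
```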

---