# Probability

* the world is full of uncertainties that we want to measure

* probability is the theoretical study of measuring how certain we are that an event will happen

## Understanding Probability

* probability is how strongly we believe an event will happen

  * usually expressed as a percentage or as a number between 0.0 and 1.0

* P(X) = 0.7

  * notation for probability: the probability of event X happening is 0.7

  * X - the event we are interested in

* **Probability vs. Likelihood**

  * `Probability` quantifies predictions of events that have yet to happen (future)

    * the probabilities of all mutually exclusive outcomes that together cover every possibility must sum to 1

  * `Likelihood` measures the frequency of events that have already happened (past)

    * likelihoods do not need to sum to 1

* complement

  * probability of an event must be between 0 and 1

  * since an event either happens or does not, P(A) + P(not A) = 1, so we can calculate the probability of the event not happening as

    * P(not A) = 1 - P(A)

      * this is referred to as the complement rule

* odds

  * probability can be expressed as odds

    * for example, P(X) = 0.7 can be expressed as odds of 7/3, 7:3 or 2.333

    * to convert odds to probability

      * $ P(X) = \frac{O(X)}{1 + O(X)} $

    * probability to odds

      * $ O(X) = \frac{P(X)}{1 - P(X)} $

  * odds are used in gambling/betting contexts because they are intuitive to understand

    * for example, odds of 2:1 mean the event is 2 times more likely to happen than not

      * expressed as a percentage this is about 66.67%, but 2:1 is easier to understand
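
The two conversion formulas above can be checked with a short sketch. The helper names `odds_from_probability` and `probability_from_odds` are hypothetical, and `fractions.Fraction` is used only as an assumption to keep the arithmetic exact (so 0.7 is exactly 7/10). The last line also applies the complement rule.

```python
from fractions import Fraction

def odds_from_probability(p):
    """Convert a probability P(X) into odds O(X) = P(X) / (1 - P(X))."""
    return p / (1 - p)

def probability_from_odds(o):
    """Convert odds O(X) back into a probability P(X) = O(X) / (1 + O(X))."""
    return o / (1 + o)

# Fractions keep the arithmetic exact, so 0.7 is represented as 7/10
p = Fraction(7, 10)
o = odds_from_probability(p)
print(o)                         # 7/3
print(float(o))                  # about 2.333
print(probability_from_odds(o))  # 7/10

# odds of 2:1 expressed as a probability
print(probability_from_odds(Fraction(2, 1)))  # 2/3

# complement rule: P(not X) = 1 - P(X)
print(1 - p)  # 3/10
```

The same functions also accept plain floats, at the cost of small rounding errors.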

### Probability vs Statistics

* `probability` is purely theoretical: it describes how likely an event is to happen, and no data is required

* `statistics`, on the other hand, uses data to discover probabilities and provides many tools to describe data

## Probability Math

* the probability of a single event is known as the `marginal probability`, P(X)

### Joint Probabilities

* probability of two or more separate events occurring together (simultaneously)

* think of the AND operator

* for independent events, the joint probability is the product of the individual event probabilities

  * $P(A \space AND \space B) = P(A) \times P(B)$

  * this is referred to as the `product rule`
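
A minimal sketch of the product rule. It assumes a fair coin and a fair six-sided die, chosen here only as an illustration of two independent events. `joint_probability` is a hypothetical helper, and `Fraction` keeps the result exact.

```python
from fractions import Fraction

def joint_probability(*probabilities):
    """Product rule for independent events: P(A AND B AND ...) = P(A) * P(B) * ..."""
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result

p_heads = Fraction(1, 2)  # fair coin lands heads
p_six = Fraction(1, 6)    # fair die shows a 6

p_heads_and_six = joint_probability(p_heads, p_six)
print(p_heads_and_six)         # 1/12
print(float(p_heads_and_six))  # about 0.0833
```

If the events were not independent, multiplying the marginal probabilities like this would give the wrong answer.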

### Union Probabilities

* probability of at least one of several events occurring

* think of the OR operator

* mutually exclusive events

  * mutually exclusive events are events that cannot occur simultaneously

    * for example, rolling a 1 and a 6 on a single roll of a die cannot happen

    * the union probability of mutually exclusive events is the sum of the individual event probabilities

      * $ P(A \space OR \space B) = P(A) + P(B) $

* non mutually exclusive events

  * events that can occur simultaneously

    * for example, rolling a 1 on a die or getting heads on a coin toss

  * `sum rule of probability`

    * $ P(A \space OR \space B) = P(A) + P(B) - P(A \space AND \space B) $

      * we subtract $ P(A \space AND \space B) $ to remove double-counted events

    * this rule applies to mutually exclusive events too; for them $ P(A \space AND \space B) $ is simply zero
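
The sum rule can be sketched in a few lines. `union_probability` is a hypothetical helper whose default of zero for P(A AND B) covers the mutually exclusive case. The fair die and coin are illustrative assumptions. In the second case the joint term comes from the product rule, because the coin and die are independent.

```python
from fractions import Fraction

def union_probability(p_a, p_b, p_a_and_b=Fraction(0)):
    """Sum rule: P(A OR B) = P(A) + P(B) - P(A AND B).

    For mutually exclusive events P(A AND B) is 0, so the default reduces
    to the plain sum P(A) + P(B).
    """
    return p_a + p_b - p_a_and_b

# mutually exclusive: a single die roll shows a 1 OR a 6
print(union_probability(Fraction(1, 6), Fraction(1, 6)))  # 1/3

# not mutually exclusive: a coin lands heads OR a die shows a 6
# (independent events, so P(A AND B) comes from the product rule)
p_heads, p_six = Fraction(1, 2), Fraction(1, 6)
print(union_probability(p_heads, p_six, p_heads * p_six))  # 7/12
```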

### Conditional Probability and Bayes' Theorem

* conditional probability is the probability of A occurring given that B has occurred

  * $ P(A \space GIVEN \space B) \space or \space P(A|B) $

* a common mistake is to assume that P(A|B) and P(B|A) are equal; in general they are not

* example

  * $ P(COFFEE | CANCER) = 0.85 $

    * the probability that a person drinks coffee given they have cancer is 85%

    * this does not mean there is an 85% chance of getting cancer if you drink coffee

      * $ P(CANCER | COFFEE) \neq 0.85 $

    * $ P(COFFEE) = 0.65 \space and \space P(CANCER) = 0.005 $

      * we need to take into account how rare cancer is, not just how common drinking coffee is

    * because coffee drinking is common and cancer is rare, the probability of having cancer given that a person drinks coffee is very low

      * this can be calculated with Bayes' theorem

      * $ P(A|B) =\frac{P(B|A)P(A)}{P(B)} $

      * memory trick - think of $P(A|B) \space as \space P(A)/P(B)$, then we just need to multiply the numerator by $P(B|A)$

      * so $ P(CANCER | COFFEE) = \frac{0.85 \times 0.005}{0.65} \approx 0.0065 $
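
A minimal sketch of the Bayes' theorem calculation, using the numbers from the final step of the coffee example (P(COFFEE|CANCER) = 0.85, P(CANCER) = 0.005, P(COFFEE) = 0.65). The function name `bayes` is a hypothetical helper, not a library API.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_coffee_given_cancer = 0.85
p_cancer = 0.005
p_coffee = 0.65

# flip P(COFFEE|CANCER) into P(CANCER|COFFEE)
p_cancer_given_coffee = bayes(p_coffee_given_cancer, p_cancer, p_coffee)
print(f"{p_cancer_given_coffee:.4f}")  # 0.0065
```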

### Joint and Union Conditional Probabilities

* for example, consider we want to find the probability that someone drinks coffee AND has cancer

  * we could apply the simple product rule $ P(COFFEE) \times P(CANCER)$, but that assumes the two events are independent

  * if we already know $P(COFFEE|CANCER)$, then it makes sense to use $P(COFFEE|CANCER)$ instead of $ P(COFFEE) $

    * this is because we are already talking about cancer patients

  * this means our product rule becomes

    * $ P(A \space AND \space B) = P(A|B) \times P(B) $, which is equal to

    * $ P(B \space AND \space A) = P(B|A) \times P(A) $

  * the way to reason about this is that if A and B are independent then $P(A|B) = P(A)$, and the formula reduces to the simple product rule

  * the conditional probability sum rule is

    * $ P(A \space OR \space B) = P(A) + P(B) - P(A|B) \times P(B) $
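
The conditional product and sum rules can be sketched with the values from the coffee/cancer Bayes' theorem calculation (P(COFFEE) = 0.65, P(CANCER) = 0.005, P(COFFEE|CANCER) = 0.85). The naive product is shown only for comparison. It would be correct only if coffee drinking and cancer were independent.

```python
p_coffee = 0.65
p_cancer = 0.005
p_coffee_given_cancer = 0.85

# naive product rule, only valid if coffee and cancer were independent
naive_joint = p_coffee * p_cancer

# conditional product rule: P(COFFEE AND CANCER) = P(COFFEE|CANCER) * P(CANCER)
joint = p_coffee_given_cancer * p_cancer

# conditional sum rule: P(COFFEE OR CANCER) = P(COFFEE) + P(CANCER) - P(COFFEE|CANCER) * P(CANCER)
union = p_coffee + p_cancer - joint

print(f"naive joint: {naive_joint:.5f}")  # 0.00325
print(f"joint:       {joint:.5f}")        # 0.00425
print(f"union:       {union:.5f}")        # 0.65075
```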

## Binomial Distribution

* measures how likely it is to get `k` successes out of `n` trials, given a probability `p` of success on each trial

* for example, if each trial has a 90% probability of success and we run 10 trials, the binomial distribution tells us how likely it is to see 8 successes (80%) or fewer

  * p = 90%, n = 10

    * then there is about a 26.4% chance of 8 or fewer successes (k <= 8)
    * about a 38.7% chance of exactly 9 successes
    * about a 34.9% chance of exactly 10 successes
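
The percentages above can be reproduced with a small from-scratch sketch of the binomial PMF and CDF. The helpers `binomial_pmf` and `binomial_cdf` are hypothetical names, and `math.comb` requires Python 3.8+. `scipy.stats.binom.pmf(k, n, p)` and `scipy.stats.binom.cdf(k, n, p)` should give the same values.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """Probability of k or fewer successes: sum of the PMF from 0 to k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.9
print(f"P(k <= 8) = {binomial_cdf(8, n, p):.4f}")   # 0.2639
print(f"P(k = 9)  = {binomial_pmf(9, n, p):.4f}")   # 0.3874
print(f"P(k = 10) = {binomial_pmf(10, n, p):.4f}")  # 0.3487
```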

## Beta Distribution

* shows how likely different underlying probabilities of an event are, given `alpha` successes and `beta` failures

* this answers a different question than the binomial distribution. In the binomial distribution, `p` is fixed and we are finding the likelihood of `k` successes from `n` trials

* here we don't know the probability; we have `alpha` successes and `beta` failures, and we are trying to find how likely each underlying probability `p` is to have produced them

* example
  * to find the probability that the underlying success rate is 90% or higher given 8 successes out of 10 trials (alpha = 8, beta = 2), we find the area under the beta distribution between 0.9 and 1

  ![Image Beta Distribution](img/02.probability-1204064215.png)
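
The area described in the example can be approximated without any third-party library. This is a sketch under stated assumptions: the density is written out with `math.gamma`, and the area is integrated numerically with Simpson's rule. Simpson's rule is our choice here; the notes do not specify a method. `beta_pdf` and `area_under` are hypothetical helpers. With SciPy installed, `1 - scipy.stats.beta.cdf(0.9, 8, 2)` should give the same number.

```python
from math import gamma

def beta_pdf(x, alpha, beta):
    """Density of the beta distribution at x for alpha successes and beta failures."""
    norm = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)

def area_under(f, lo, hi, steps=1000):
    """Approximate the integral of f from lo to hi with Simpson's rule (steps must be even)."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        # Simpson weights alternate 4, 2, 4, ... for the interior points
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

alpha, beta = 8, 2  # 8 successes, 2 failures
p_90_or_higher = area_under(lambda x: beta_pdf(x, alpha, beta), 0.9, 1.0)
print(f"{p_90_or_higher:.4f}")  # 0.2252
```

So, given 8 successes and 2 failures, there is roughly a 22.5% chance that the underlying success rate is 90% or higher.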


## Exercise

```python
from scipy.stats import binom

# 1. 30% chance of rain and 40% chance of the umbrella arriving on time, so the joint probability is
print(f"Exercise 1: {0.3 * 0.4 * 100}%")

# 2. 30% chance of rain, so 70% chance of no rain; 40% chance of the umbrella arriving,
#    so by the sum rule the probability of no rain OR the umbrella arriving is
print(f"Exercise 2: {((0.7 + 0.4) - (0.7 * 0.4)) * 100}%")

# 3. P(R) = 0.3, P(U) = 0.4, P(U|R) = 0.2, so P(R AND U) = P(R) x P(U|R)
print(f"Exercise 3: {0.3 * 0.2 * 100}%")

# 4. total passengers (n) = 137, probability a passenger shows up (p) = 0.6
#    assuming the question asks how likely it is that at least 50 passengers do not show up:
#    that is the same as at most 137 - 50 = 87 showing up, so we need the CDF, not the PMF
print(f"Exercise 4: {binom.cdf(87, 137, 0.6) * 100:.2f}%")
```

```
Exercise 1: 12.0%
Exercise 2: 82.0%
Exercise 3: 6.0%
Exercise 4: 82.21%
```
