# Probability


Probability is a value between 0 and 1 that measures how likely an event is to occur.

- The probability of a fair coin landing heads is .5 (1 / 2)

Writing probability as a percentage 
- multiply by 100.
e.g.
0.5 * 100 == 50% probability.

$P(E_{heads}) = .5 $


Or, for multiple coins:

The probability of getting exactly *k* heads when flipping *n* coins, where *p* is the probability of heads on a single flip:
$
\begin{equation*}
P(E)   = {n \choose k} p^k (1-p)^{ n-k}
\end{equation*}
$
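
The formula above can be evaluated directly. The following is a minimal sketch using `math.comb` from the standard library; the function name `binomial_pmf` is illustrative and not part of the original notes.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 2 heads in 3 flips of a fair coin: 3 * 0.25 * 0.5
print(binomial_pmf(2, 3))  # 0.375
```

Summing `binomial_pmf(k, n, p)` over every `k` from 0 to `n` should give 1, which is a quick sanity check on the formula.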


**Trials**

- The act of flipping a coin or rolling a die is called a *trial*
- A coin flip is an *independent event*. Each flip has no impact on the next flip.
- Trials have no memory. If a fair coin comes up heads 5 times in a row, that has no impact on the next toss.

**Experiments and Sample Space**
- Each trial of flipping a coin can be called an *experiment*
- Each mutually exclusive outcome is called a *simple event*
- The *Sample Space* is the set of every possible *simple event*

In the case of rolling a 6 sided die, each face that can come up is a Simple Event.

The Simple Events can be labeled 'Event 1, Event 2, ... Event N':  
$E_{1} = 1, E_{2} = 2,  E_{3} = 3, E_{4} = 4, E_{5} = 5, E_{6} = 6 $


The total *Sample Space*  
S = {$E_{1}, E_{2}, E_{3}, E_{4}, E_{5}, E_{6}$}

The probability that a fair die will roll a six is:
- the simple event $E_{6} = 6$ (one event)
- the total Sample Space - S = {$E_{1}, E_{2}, E_{3}, E_{4}, E_{5}, E_{6}$} (6 possible outcomes)
- So the Probability is P(roll 6) = 1/6

1 / 6 ≈ 0.167  
.167 * 100 = 16.7% - so about a 16.7% chance.

$P(E_{any side}) \approx .167 $

Exercise:  
- A company makes trumpet valves, and the rate of valve failure is 1 in 50.
- If each trumpet has 3 valves, what is the probability that a trumpet has a defective valve?

1. First: Calculate the probability of a single valve being defective

$P(E_{defectiveValve})$ = 1/50 = .02 = 2% 

2. Then calculate the probability of having a defective trumpet.  
Since there are 3 valves per trumpet (3 chances for the event to occur), a quick estimate multiplies the rate of valve failure by the number of valves:

$P(E_{defective trumpet}) \approx 3 \times P(E_{defectiveValve})$ = .06, or 6%.

3 * .02 == .06 or 6%

This slightly overstates the answer because it counts trumpets with two or three bad valves more than once. Assuming the valves fail independently, the exact value is the complement of "all three valves are good":

$P(E_{defective trumpet}) = 1 - (1 - .02)^{3} = 1 - .941192 = .058808$

There is about a 6% (more precisely 5.9%) probability that a trumpet has at least one defective valve.
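
As a check on the trumpet exercise, the sketch below compares the multiply-by-3 estimate with the probability of at least one failure among three valves. It assumes the valves fail independently, which the exercise does not state; the variable names are illustrative.

```python
p_valve = 1 / 50          # failure rate of one valve
valves_per_trumpet = 3

# Quick estimate: add up the three individual failure chances.
approx = valves_per_trumpet * p_valve

# Exact under independence: 1 minus the chance that all three valves are good.
exact = 1 - (1 - p_valve) ** valves_per_trumpet

print(f"approximate: {approx:.4f}")
print(f"exact:       {exact:.4f}")
```

The two numbers stay close while the per-valve failure rate is small, and drift apart as it grows.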

## Permutations

Permutations can be *With Repetition* or *Without Repetition* and use different formulas.


What is a permutation?
- a *permutation* of a set of objects is an *arrangement of the objects in some set or order*

In the set of objects (a, b, c)
- how many ways can I arrange the objects in different orders?
- use *every possible permutation of letters*
 
 a,b,c
 a,c,b
 b,c,a
 b,a,c
 c,a,b
 c,b,a
 
 For a simple example like (a,b,c) you can calculate the number of possible permutations with n! ('n factorial')
 - Take the total number of items and set to *n*
 - 3! = 3 * 2 * 1 = *6 permutations*
 
 **subsets**
 You can also take a *subset of items* in the list of *n* for permutations
 
 *Without Repetition*
 
 The number of permutations of a set of *n* objects taken *r* at a time (permutations without repetition) is:
 
 ${}_{n}P_{r} = \frac{n!}{(n - r)!} $
 
 - If you were trying to compare against the entire alphabet in groups of 3 letters: *n* = 26, *r* = 3
 
 Examples:
 - Website requires a 4 character password
 - Characters can be lowercase letters or digits 0-9
 - You may not repeat a letter or number
 - How many different passwords can there be?
 
26 letters + 10 numbers = 36. *n* = 36

4 non-repeating characters = 4. *r* = 4


${}_{36}P_{4} = \frac{36!}{(36 - 4)!}$

Handling factorials in both numerator and denominator:

$\frac{36 \times 35 \times 34 \times 33 \times 32 \times 31 \times \dots}{32 \times 31 \times \dots}$

Since both sets contain 32 and below, you can strike them:

$\require{enclose} \frac{36 \times 35 \times 34 \times 33 \times \enclose{horizontalstrike}{32 \times 31 \times \dots}}{\enclose{horizontalstrike}{32 \times 31 \times \dots}}$

This is equal to saying (36 x 35 x 34 x 33) = 1,413,720 permutations

In [22]: 36 * 35 * 34 * 33  
Out[22]: 1413720

*With Repetition*

The number of arrangements of *n* objects taken *r* at a time *with repetition* is simpler: $n^{r}$

26 letters + 10 numbers = 36. *n* = 36, 4 characters (repeats allowed) = 4. *r* = 4

$36^{4}$ 

In [23]: 36 ** 4  
Out[23]: 1679616

1,679,616 permutations

Another Example:

How many License Plates can be made with 4 digits using 0-9 *with repetition*?
 
$10^{4}$

In [24]:  10**4  
Out[24]: 10000

10,000 permutations


Wrap up:  
- Total permutations of set *N*: $n!$
- Permutations taken *r* at a time for given set *N* with No Repetition: ${}_{n}P_{r} = \frac{n!}{(n - r)!} $
- Permutations taken *r* at a time for given set *N* With Repetition: $n^{r}$
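
The three counting formulas in the wrap-up can be reproduced with the standard library. This is a small sketch using `math.factorial`, `math.perm` (Python 3.8+) and `itertools`; the password numbers are the ones from the example above.

```python
from itertools import permutations, product
from math import factorial, perm

n, r = 36, 4
print(factorial(3))   # all orderings of 3 items: 6
print(perm(n, r))     # no repetition: 1413720
print(n ** r)         # with repetition: 1679616

# Cross-check the formulas by enumeration on a small set.
letters = "abc"
assert len(list(permutations(letters))) == factorial(3)
assert len(list(permutations(letters, 2))) == perm(3, 2)
assert len(list(product(letters, repeat=2))) == 3 ** 2
```

`itertools.product(..., repeat=r)` is the enumeration that matches $n^{r}$, since `itertools.permutations` never reuses an element.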


#### in Python

**permutations** 

```
from itertools import permutations 
  
#Get all permutations of [1, 2, 3]  
perm = permutations([1, 2, 3])  
  
# Print the obtained permutations   
for i in list(perm):  
    print(i)  
```  

**permutations with length**  
```
from itertools import permutations   
  
# Get all permutations of length 2  
perm = permutations([1, 2, 3], 2)  
  
# Print the obtained permutations  
for i in list(perm):  
    print(i) 
```


## Combinations

*Unordered arrangements of objects* are called *Combinations*  
- A group of people selected for a team are in the same group, regardless of order
- A pizza that's half-pepperoni/half-sausage is the same as a half-sausage/half-pepperoni



*Combinations without repetition*

The number of combinations of a set of *n* objects taken *r* at a time has the formula 

${}_{n}C_{r} = \frac{n!}{r!(n - r)!}$


Get all 3 letter permutations of 'A, B, C, D, E':

From the previous formula ${}_{5}P_{3} = \frac{5!}{(5 - 3)!} = \frac{120} {2} = 60$

ABC, ACB, BAC, BCA, CAB, CBA <-- notice that the row contains the same letters  
ABD, ADB, BAD, BDA, DAB, DBA  
ABE, AEB, BAE, BEA, EAB, EBA  
ACD, ADC, CAD, CDA, DAC, DCA  
ACE, AEC, CAE, CEA, EAC, ECA  
ADE, AED, DAE, DEA, EAD, EDA  
BCD, BDC, CBD, CDB, DBC, DCB  
BCE, BEC, CBE, CEB, EBC, ECB  
BDE, BED, DBE, DEB, EBD, EDB  
CDE, CED, DCE, DEC, ECD, EDC  


Since it doesn't matter what order they're in, each row can be collapsed to its first entry:  
ABC  
ABD  
ABE  
ACD  
ACE  
ADE  
BCD  
BCE  
BDE  
CDE

How many 3 letter combinations can come from the set of letters (A, B, C, D, E)?

- ${}_{5}C_{3} = \frac{5!}{3!(5 - 3)!} = \frac{5!}{3! * 2!} = \frac{120}{12} = 10$


Example 1:  
- for a study you choose 4 people at random from a group of 10 people
- how many ways can this be done?

${}_{10}C_{4} = \frac{10!}{4!(10 - 4)!} = \frac{10!}{4! * 6!} = \frac{5040}{24} = 210$


Example 1 restated:  
- 4 toppings are chosen from a total of 10 toppings
- how many combinations of pizza (each topping once)?
- 210


*Combinations with Repetition* 

${}_{n+r-1}C_{r}  = \frac{(n + r - 1)!}{r!(n - 1)!}$ 

Pizza Example 2 (repetition):  
- 4 toppings are chosen from a total of 10 toppings, and a topping may be chosen more than once
- ex: Use 3x pepperoni and 1x tomatoes

${}_{n+r-1}C_{r}  = \frac{(n + r - 1)!}{r!(n - 1)!} = \frac{13!}{4!(9!)} = 715$


**Table of types with formulas**  

|Order Matters? |Repetition| Formula |In Excel|
|---|:---|:---|:---|
|Yes (permutation)|No|  ${}_{n}P_{r} = \frac{n!}{(n - r)!} $| =PERMUT(n,r)|
|No (combination)| No| ${}_{n}C_{r} = \frac{n!}{r!(n - r)!}$| =COMBIN(n,r)|
|Yes (permutation)| Yes| $n^{r}$| =PERMUTATIONA(n,r)|
|No (combination)| Yes| ${}_{n+r-1}C_{r}  = \frac{(n + r - 1)!}{r!(n - 1)!}$| =COMBINA(n,r)|
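
The two combination counts for the pizza examples can be checked in Python. This is a sketch using `math.comb` (Python 3.8+), with the toppings represented by hypothetical labels 0 through 9.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

toppings = range(10)   # 10 hypothetical toppings, labelled 0-9
r = 4

without_rep = comb(10, r)            # nCr
with_rep = comb(10 + r - 1, r)       # (n+r-1)Cr
print(without_rep, with_rep)         # 210 715

# Enumerate to confirm both formulas.
assert len(list(combinations(toppings, r))) == without_rep
assert len(list(combinations_with_replacement(toppings, r))) == with_rep
```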


#### In Python:


**Combinations without repetition**  
```
from itertools import combinations  
  
# Get all combinations of [1, 2, 3]  
# and length 2  
comb = combinations([1, 2, 3], 2)  
  
# Print the obtained combinations  
for i in list(comb):  
    print(i)  
```

**Combinations with duplicates in input**  

```
from itertools import combinations  
  
# Get all combinations of [1, 1, 3]   
# and length 2  
comb = combinations([1, 1, 3], 2)  
  
# Print the obtained combinations  
for i in list(comb):  
    print(i)  
```

**Combinations with replacement**   

```
from itertools import combinations_with_replacement  
  
# Get all combinations of [1, 2, 3] and length 2  
comb = combinations_with_replacement([1, 2, 3], 2)  
  
# Print the obtained combinations  
for i in list(comb):  
    print(i) 
```

## Intersections, Unions and Complements


These concepts help explain how probabilities interact with each other. They are foundational notions for future lectures on *dependent events* and *conditional probability*.


**Intersection** describes the Sample Space where *two events both occur*

Example:

Consider a box of patterned, colored balls. Different colors with different patterns.

<img src="img/colored_and_patterned_balls.png" width="400" height="200">

3 of the balls are both Red and Striped.

<img src="img/balls_venn_diagram.png" width="400" height="200">

What is the probability of choosing a Red, Striped Ball from a blind selection?

- Assign *A* as the event of Red Balls
- Assign *B* as the event of Striped Balls

then the intersection of A and B is shown as: $A \cap B$.
- order doesn't matter. *intersection* of $A \cap B$ is the same as $B \cap A$

As read, $P(A \cap B)$ is "the probability of A and B occurring" or "the probability at the intersection of A and B"

For this example $P(A \cap B) = \frac{3}{15} = .2$ or a 20% chance.


**Union** describes the Sample Space where *A or B* occurs. $A \cup B$
- order doesn't matter. *union* of $A \cup B$ is the same as $B \cup A$

The probability is described as:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ 


For this example, if the Probability of A was getting a Red Ball and B was getting a Striped Ball:

$P(A \cup B) = \frac{9}{15} + \frac{9}{15} - \frac{3}{15}= \frac{15}{15} = 1.0 $ or 100% chance.

You get a Red or Striped ball every time. (checked balls are only Red)


**Complements** of an event choose everything *outside the event*. The symbol is $\bar{A}$

- The Complement Rule states that the sum of the probabilities of an event and its complement must equal 1

The Probability of "Not A" (not picking a Red Ball) is "1 minus the probability of A occurring":  
$P(\bar{A}) = 1 - P(A) = \frac{15}{15} - \frac{9}{15} = \frac{6}{15} = .40$ or a 40% chance.
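
Intersection, union and complement map directly onto Python set operations. The sketch below builds a hypothetical box that matches the counts used in the text (15 balls, 9 red, 9 striped, 3 both); the `"other"` labels are placeholders for whatever colors and patterns the remaining balls have.

```python
from fractions import Fraction

# Each ball is a (color, pattern) pair.
balls = (
    [("red", "striped")] * 3
    + [("red", "other")] * 6
    + [("other", "striped")] * 6
)
# Use indices so identical-looking balls stay distinct inside the sets.
everything = set(range(len(balls)))
A = {i for i in everything if balls[i][0] == "red"}
B = {i for i in everything if balls[i][1] == "striped"}

def prob(event):
    """Probability of an event when every ball is equally likely."""
    return Fraction(len(event), len(everything))

print(prob(A & B))           # intersection: 1/5
print(prob(A | B))           # union: 1
print(prob(everything - A))  # complement of A: 2/5
```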





## Independent and Dependent Events


*Replacement*:  
- With Replacement: the events are Independent (the chances don't change)
- Without Replacement: the events are Dependent (the chances change)


**Independent** events occur when the outcome of one event has *no effect* on the outcome of another event
- flipping a fair coin twice 
- the probability of heads on both of two flips of a fair coin is: $P(H_{1}H_{2}) = P(H_{1}) * P(H_{2})$

$P(H_{1}) * P(H_{2}) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$

This can be described with an outcome table for two tosses. 

There are 4 equally likely possibilities, so the probability of the two tosses having any one particular outcome is 1 out of 4. 

|1st toss|2nd Toss|
|---|---|
|H|H|
|H|T|
|T|H|
|T|T|

**Dependent** events occur when the outcome of the first event *does* affect the probability of a second event.

(Will be covered in more detail in the section on *Conditional Probability*)

- the common example is to draw colored marbles from a bag *without replacement* 
- once a marble is pulled from a bag it will not be replaced.
- A bag has 2 blue marbles and 3 red marbles. Two marbles are drawn: what is the probability that they are both red?

<img src="img/probability-marbles-tree2.svg" width="300" height="200">

- the color of the first marble changes the probability of drawing the second marble

The first draw is easy:
- $P(R_{1}) = \frac{3}{5}$
- Since this changes the marbles in the bag, the probability of drawing a *second* red marble is written as: 
- "The probability of a second red marble *given* that the first marble was red" $P(R_{2}|R_{1})$
- The vertical bar **|** is read as "given": $P(R_{2}$ given $R_{1})$

In this example $P(R_{2}|R_{1}) = \frac{2}{4}$ since there are 2 reds and 2 blues left

So the probability of 2 red marbles can be written as:

$P(R_{1} \cap R_{2}) = P(R_{1}) * P(R_{2}|R_{1})$

or

$\frac{3}{5} * \frac{2}{4} = \frac{6}{20} = .3$ or a 30% chance of drawing two red marbles
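
The marble calculation can be verified two ways: by multiplying the two steps, and by listing every ordered pair of draws. This is a sketch with exact fractions; the list `bag` is an illustrative encoding of the 3 red and 2 blue marbles.

```python
from fractions import Fraction
from itertools import permutations

bag = ["R", "R", "R", "B", "B"]

# Multiplication of the two steps: P(R1) * P(R2 | R1).
p_first_red = Fraction(3, 5)
p_second_red_given_first = Fraction(2, 4)
p_both_red = p_first_red * p_second_red_given_first

# Brute-force check: every ordered draw of 2 distinct marbles is equally likely.
draws = list(permutations(range(len(bag)), 2))
both_red = [d for d in draws if bag[d[0]] == "R" and bag[d[1]] == "R"]

print(p_both_red)                            # 3/10
print(Fraction(len(both_red), len(draws)))   # 3/10
```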





## Conditional Probability

*Conditional Probability* is the probability of event A given that event B has occurred.

Written as $P(A|B)$

Going back to the probability of getting red marbles, which are dependent events 

$P(R_{1} \cap R_{2}) = P(R_{1}) * P(R_{2}|R_{1})$

- $P(R_{2}|R_{1})$ is the *conditional*

Rearranging the formula gives $P(A|B) = \frac{P(A \cap B)}{P(B)}$

The Probability of A given B equals the Probability of A and B divided by the Probability of B. 


Exercise:  
A company finds out that out of every 100 projects:
- 48 are completed on-time
- 62 are completed under-budget
- 16 are completed both on-time and under-budget

Given that the project is completed on-time what is the probability that it was completed under-budget?

$P(\text{under-budget} | \text{on-time}) = \frac{P(\text{on-time} \cap \text{under-budget})}{P(\text{on-time})}$

- P(A) = probability completed under-budget = $62/100 = .62$ or 62%
- P(B) = probability completed on-time = $48/100 = .48$ or 48%  
- $P(A \cap B) =$ probability completed both on-time and under-budget = $16/100$ = .16 or 16%

Under-budget, given on-time:

- $P(A|B) = \frac{P(A \cap B)} {P(B)} =  \frac{.16} {.48} = .33$ or 33% 
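
The project exercise also shows that conditioning is not symmetric. The sketch below computes both directions with exact fractions from the counts given in the exercise; the variable names are illustrative.

```python
from fractions import Fraction

total = 100
on_time = 48
under_budget = 62
both = 16

p_a = Fraction(under_budget, total)   # P(A): under-budget
p_b = Fraction(on_time, total)        # P(B): on-time
p_a_and_b = Fraction(both, total)     # P(A and B)

p_a_given_b = p_a_and_b / p_b         # under-budget, given on-time
p_b_given_a = p_a_and_b / p_a         # on-time, given under-budget

print(p_a_given_b)   # 1/3
print(p_b_given_a)   # 8/31
```

Dividing by the probability of the condition is what distinguishes the two answers: the numerator is the same intersection in both cases.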


## Some Addition and Multiplication Rules

**Addition Rule** $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

What if we took the Projects Example and asked: What is the Probability of a project completing on-time OR under budget?

$\frac{48}{100}+ \frac{62}{100} - \frac{16}{100} = .48 + .62 - .16 = .94$ or 94%

**Addition Rule for Mutually Exclusive Events** 
- When two events cannot *both* happen they are *mutually exclusive*

Example:  
Health Plan Enrollments
- 32 employees have chosen Plan A
- 62 employees have chosen Plan B
- They cannot choose both so they are *Mutually Exclusive*
- Since they can't overlap, just strike out the part that includes both $\require{enclose} P(A \cup B) = P(A) + P(B) \enclose{horizontalstrike}{- P(A \cap B)}$
- So the rule now becomes $P(A \cup B) = P(A) + P(B)$


**Multiplication Rule** The probability of A and B occurring is $P(A \cap B) = P(A) * P(B|A)$ 

Example:
- in a standard deck of 52 cards what is the probability of drawing 4 aces in 4 draws without replacement?

$P(A \cap B \cap C \cap D) = P(A) * P(B|A) * P(C|AB) * P(D|ABC)$


$ = \frac{4}{52} * \frac{3}{51} * \frac{2}{50} * \frac{1}{49} = \frac{24}{6497400} = \frac{1}{270725} $

Capturing this in python and getting the decimal value:  
In [90]: 24/6497400  
Out[90]: 3.6937852063902484e-06

Capture as a variable called 'out'  
In [91]: out = 24/6497400

print out the decimal with f strings:  
In [109]: f'{out:.20f}'  
Out[109]: '0.00000369378520639025'


Get Fraction from Decimal:  
from fractions import Fraction  
In [115]: Fraction(out).limit_denominator()  
Out[115]: Fraction(1, 270725)

or $\frac{1}{270725}$
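
An alternative that avoids the float round trip is to keep the whole calculation in exact fractions. This sketch multiplies the four conditional probabilities and cross-checks the result by counting four-card hands with `math.comb`.

```python
from fractions import Fraction
from math import comb

# Multiply the four conditional probabilities exactly.
p_four_aces = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50) * Fraction(1, 49)
print(p_four_aces)                # 1/270725

# Same answer by counting: one all-ace hand out of C(52, 4) four-card hands.
print(Fraction(1, comb(52, 4)))   # 1/270725
```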


### True Positives and False Negatives

You have a test for a disease. 

*Tests correctly working*
- Disease is present and the test indicates the disease is present **True Positive**
- Disease is absent and the test indicates the disease is absent **True Negative**

*Tests failing*
- Disease is present and the test indicates the disease is absent **False Negative**
- Disease is absent and the test indicates the disease is present **False Positive**

||Predicted *Negative*|Predicted *Positive*|
|---|---|---|
|Actual *Negative*|True Negative|False Positive|
|Actual *Positive*|False Negative|True Positive|


<img src="img/true_positive_false_negative.png" width="300" height="200">

## Bayes Theorem 

**Bayes' Theorem** describes the probability of an event, based on conditions that might be related to the event. Bayes' Theorem allows us to use previously known information to assess the likelihood of another related event.

Bayes’s theorem is named after Reverend Thomas Bayes (1701?–1761 - an English statistician, philosopher and Presbyterian minister), who first used conditional probability to provide an algorithm (his Proposition 9) that uses evidence to calculate limits on an unknown parameter, published as An Essay towards solving a Problem in the Doctrine of Chances (1763).


In *Conditional Probability* we saw that $P(A|B) = \frac{P(A \cap B)}{P(B)}$ provided that $P(B) > 0$

So $P(B | A) = \frac{P(B\cap A)}{P(A)} = \frac{P(A\cap B)}{P(A)}$ provided that $P(A) > 0$

Both formulas contain the same intersection, $P(A \cap B)$. 

Solving the second formula for $P(A \cap B) = P(B|A)P(A)$ and substituting it into the first gives Bayes Theorem $ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $

- The probability of A, given that B has occurred, is equal to the probability of B, given that A has occurred, multiplied by the probability of A, divided by the probability of B, provided that P(B) > 0. 
- This joins together both "B if A has occurred" and "A if B has occurred"
- Bayes Theorem is used to determine the probability of a *Parameter* given a certain event.

Usually Bayes Theorem is displayed in one of two ways:

$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $ or $ P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\text{not } A)P(\text{not } A)}$ given that $P(B) > 0$


 

**Exercise 1:**  
- A manufacturing company finds that 1 out of every 500 products is defective (.002, or 0.2%)
- The company buys a diagnostic tool that correctly classifies a part 99% of the time (not 100%), whether the part is defective or not
- If the part is diagnosed as defective, what's the probability that it's really defective?

The Question:

If a part is diagnosed as defective what is the probability that it really is defective?

- $ P(A)$ The probability of being defective
- $ P(B)$ The probability of testing as defective
- $ P(A|B)$ The probability of being defective if the tool indicated a defect
- $ P(B|A)$ The probability of the tool indicating a defect if the product is defective

True Positive rate - $P(B|A)$  
True Negative rate - $P(-B|-A)$  
False Positive rate - $P(B|- A)$  
False Negative rate - $P(-B|A)$

- $ P(A|B)$ - ? we're solving for this
- $ P(B|A)$ The accuracy rate of the diagnostic tool (.99) or 99%
- $ P(A)$  The defect rate (.002) (1 / 500) or 0.2%
- $ P(B)$ - ? we need to calculate this

 

1. **First calculate $P(B)$**
- P(B) - Probability of testing positive  = P(true positive) + P(false positive)

Calculate this with:  
**$P(B) = P(B|A) * P(A) + P(B|- A) * P(-A)$**
- The probability of testing defective ($P(B)$) is the probability of testing defective given that the part was actually defective ($P(B|A)$) 
- Multiplied by the probability of an actual defect ($P(A)$) 
- Add the probability that it tested defective but was not defective (False Positive) ($P(B|- A)$)
- Multiplied by the probability that it was not defective ($P(-A)$)


- $ P(B|A)$ The accuracy rate of the diagnostic tool (.99) or 99%
- $ P(A)$  The defect rate (.002) (1 / 500) or 0.2%

Use the Complement Rule (the Probability of "Not A" is "1 minus the probability of A occurring")

*Calculate $P(B|-A)$:*
- The tool is also 99% accurate on good parts, so $P(-B|-A) = .99$ and $P(B|-A)  = 1 - P(-B|-A) = 1 - .99 =$ **.01** - The probability of getting a false positive.

*Calculate $P(-A)$*:
- $P(-A) = 1 - P(A) = 1 - .002 =$ **.998** - The probability that a part is not defective.

*Calculating $P(B)$*:
- $P(B) = P(B|A) * P(A) + P(B|- A) * P(-A) = .99 * .002 + .01 * .998  =$ **.01196**

- $ P(A|B)$ = ??
- $ P(B|A)$ = .99
- $ P(A)$ = .002
- $ P(B)$ = .01196
- $ P(-A)$ = .998
- $ P(B|-A)$ = .01 

*So, plugging in the numbers to calculate $P(A|B)$*:

numerator:  
In [3]: .99*.002  
Out[3]: 0.00198  
denominator:  
In [2]: .99 * .002 + .01 * .998  
Out[2]: 0.011960000000000002

The expanded formula:
- $P(A|B) = \frac{P(B|A) * P(A)} {P(B|A) * P(A) + P(B|-A) * P(-A)}$

with the real numbers:

- $P(A|B) = \frac{.99 * .002} {.99 * .002 + .01 * .998} = \frac{0.00198}{0.01196} = .1656$ or about **16.6%** 

In [118]: 0.00198 / 0.011960000000000002  
Out[118]: 0.16555183946488292


So a part that tests positive has only about a **16.6% chance of actually being defective** 
- $ P(A|B)$ = .166 or **16.6%** Probability of being defective given a positive test
- $ P(B|A)$ = .99 Probability of testing defective given an actual defect (True Positive rate)
- $ P(A)$ = .002  Probability of Being Defective
- $ P(B)$ = .01196 Probability of Testing Defective

**Exercise 2:** What if a second test on the same part also returns positive (shows a defect)?

- Fill in the details from the first run through.
- The result of the first test becomes the new prior: $P(A)$ is now the .165 calculated above (rounded down from .1656), so the probability goes up. This assumes the second test's errors are independent of the first.

- $ P(A|B)$ = ??
- $ P(B|A)$ = .99
- $ P(A)$ = Changes from .002 to .165
- $ P(B)$ = Changes from .01196 to .1717 (recalculated below as the denominator)
- $ P(-A)$ = Changes from .998 to .835
- $ P(B|-A)$ = .01 

So:
- $P(A|B) = \frac{P(B|A) * P(A)} {P(B|A) * P(A) + P(B|-A) * P(-A)}$

with the real numbers:

numerator:  
In [4]: .99*.165  
Out[4]: 0.16335

denominator:  
In [6]: .99 * .165 + .01 * .835  
Out[6]: 0.1717

total:  
In [7]: 0.16335 / 0.1717  
Out[7]: 0.9513686662783926

- $\require{enclose} P(A|B) = \frac{.99 *  \enclose{horizontalstrike}{.002} .165} {.99 * \enclose{horizontalstrike}{.002} .165 + .01 * \enclose{horizontalstrike}{.998} .835} = \frac{0.16335}{0.1717} = 0.9513$ or **95%**

So the probability gets much higher and closer to $ P(B|A)$ - the diagnostic tool's accuracy rate
- $P(A|B) = \frac{.99 * .165} {.99 * .165 + .01 * .835} = .951$ or **95.1% probability that the part is defective**



**Exercise 3: Try it a third time**
- $ P(A|B)$ = ??
- $ P(B|A)$ = .99
- $ P(A)$ = Changes from .165 to .951
- $ P(B)$ = Changes from .1717 to .94198 (recalculated below as the denominator)
- $ P(-A)$ = Changes from .835 to 0.049
- $ P(B|-A)$ = .01 

numerator:  
In [8]: .99*.951  
Out[8]: 0.9414899999999999

$ P(-A)$  
In [9]: 1 - .951  
Out[9]: 0.049000000000000044

denominator:  
In [10]: .99 * .951 + .01 * .049  
Out[10]: 0.9419799999999999

Outcome:  
In [11]: 0.9414899999999999 / 0.9419799999999999  
Out[11]: 0.9994798191044396

- $ P(A|B)$ = **99.95%** chance that the part is actually defective
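
The repeated-test exercises amount to feeding each posterior back in as the next prior. The loop below is a sketch of that idea; it assumes the three test results are independent given the true state of the part, and the helper `update` is an illustrative name.

```python
def update(prior: float, true_positive_rate: float = 0.99,
           false_positive_rate: float = 0.01) -> float:
    """One Bayes update after a positive test result."""
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence

belief = 0.002   # base defect rate before any test
history = []
for test_number in range(1, 4):
    belief = update(belief)
    history.append(belief)
    print(f"after positive test {test_number}: {belief:.4f}")
```

Carrying full precision gives roughly 0.166, 0.952 and 0.9995, slightly different from the hand calculation, which rounds the posterior between steps.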



## Other Bayesian notes:

- base rate neglect
- remember your priors when analyzing

**base rate fallacy**
The base rate fallacy, also called base rate neglect or base rate bias, is a reasoning error. If presented with related base rate information (i.e. generic, general information) and specific information (information pertaining only to a certain case), the mind tends to ignore the former and focus on the latter. Base rate neglect is a specific form of the more general extension neglect. 

Base-rate neglect refers to the phenomenon whereby people ignore or undervalue the base-rate (prior) probability, typically in favor of less informative, but more intuitively appealing information about an individual case (Kahneman & Tversky, 1973).

also known as: neglecting base rates, base rate neglect; closely related: prosecutor's fallacy 

**prior probability distribution**
In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into account. For example, the prior could be the probability distribution representing the relative proportions of voters who will vote for a particular politician in a future election. The unknown quantity may be a parameter of the model or a latent variable rather than an observable variable. 

Bayes' theorem calculates the renormalized pointwise product of the prior and the likelihood function, to produce the posterior probability distribution, which is the conditional distribution of the uncertain quantity given the data.

Similarly, the prior probability of a random event or an uncertain proposition is the unconditional probability that is assigned before any relevant evidence is taken into account. 



### Bayesian Odds

Bayes' Rule can be expressed in terms of odds: 
Posterior Odds = Prior Odds * Likelihood Ratio


Odds, a:b, and probability

Odds are commonly written as the ratio of two numbers separated by a colon. For example, if P(A) = 2/3, the odds would be 2, but this would most likely be written as 2:1.

The relation between odds, a:b, and probability, p is as follows: 

$ a:b=p:(1-p)$

$p=\frac{a}{a+b} $

Suppose you have a box that has a 5% chance of containing a diamond. You also have a diamond detector that beeps half of the time if there is a diamond, and one fourth of the time if there is not. You wave the diamond detector over the box and it beeps.

- The prior odds of the box containing a diamond are 1:19. (5% = 1 in 20, so 1 chance that it contains a diamond against 19 chances that it's empty)
- The likelihood ratio of a beep is 1/2:1/4 = 2:1. 
- The posterior odds are 1:19 * 2:1 = 2:19. 
- This corresponds to about a probability of 2/21, which is about 0.095 or 9.5%. 

||Contains Diamond|Doesn't contain diamond|
|---|---|---|
|Prior Odds Ratio|1 |19 | 
|Likelihood Ratio|2|1|
||__|__|
|Posterior Odds Ratio|2|19|
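
The diamond example can be worked in odds form with exact fractions. This is a sketch; `odds_to_probability` is an illustrative helper that applies $p=\frac{a}{a+b}$.

```python
from fractions import Fraction

def odds_to_probability(a, b) -> Fraction:
    """Convert odds a:b into the probability a / (a + b)."""
    return Fraction(a) / (Fraction(a) + Fraction(b))

prior_odds = (Fraction(1), Fraction(19))          # diamond : no diamond
likelihoods = (Fraction(1, 2), Fraction(1, 4))    # P(beep | diamond), P(beep | none)

# Multiply each side of the prior odds by its likelihood.
posterior_odds = (prior_odds[0] * likelihoods[0], prior_odds[1] * likelihoods[1])
p_diamond = odds_to_probability(*posterior_odds)

print(p_diamond)                    # 2/21
print(round(float(p_diamond), 3))   # 0.095
```

Only the ratio between the two sides matters, so multiplying by 1/2 and 1/4 gives the same answer as multiplying by the simplified likelihood ratio 2:1.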


Bayesian Proportionality

 Bayesian Thinking allows us to keep account of priors and likelihood information to predict a posterior probability.

Imagine a management consultation firm hires only two types of employees: IT and business consultants. You meet an employee who is very shy, but don't know their job role. 

If your guess is IT using only shyness as an attribute, then you have fallen for an inherent cognitive bias: Base Rate Neglect. Base Rate Neglect occurs when we do not take into account the underlying proportion of a group in the population. To answer the question we need to find out the proportion of IT consultants to business consultants. In this case, for every 1 IT person the firm hires 10 business consultants, for a ratio of 1 to 10.

Another assumption could be made about shyness as an attribute. It would be fair to assume shyness is more common among IT consultants than among business consultants (geeks vs people persons). Let's assume 75% of IT professionals are in fact shy, compared with about 15% of business consultants.

Use the proportion of employees in the firm as the prior odds. Then use the shyness as an attribute as the Likelihood. 

The figure below demonstrates when we take a product of the two, we get posterior odds.

<img src="bayesian_proportions.png" width="400" height="200">

 
 
||IT Consultant|Business Consultant|
|---|---|---|
|Prior Odds Ratio|1 |10 | 
|Likelihood Ratio|75|15|
||__|__|
|Posterior Odds Ratio|1|2|
 
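 The posterior odds are 1 * 75 : 10 * 15 = 75 : 150, or 1:2. A shy employee is still twice as likely to be a business consultant, so the probability of IT is $\frac{1}{1+2} = \frac{1}{3}$.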
 

#### Principles of Bayesian Thinking
**Rule 1 – Remember your priors!**

As we saw earlier, it is easy to fall into the base rate neglect trap. The underlying proportion in the population is often neglected, and we as human beings tend to focus only on the attribute. Think of priors as the underlying or background knowledge: an additional piece of information on top of the likelihood. The product of the priors and the likelihood determines the posterior odds/probability.

**Rule 2 – Question your existing belief**

This is somewhat tricky and counter-intuitive to grasp, but question your priors. Present yourself with a hypothesis: what if your priors were irrelevant or even wrong? How would that affect your posterior probability? Would the new posterior probability be any different from the existing one?

**Rule 3 – Update incrementally**

We live in a dynamic world where evidence and attributes are constantly shifting. It is okay to believe in well-tested priors and likelihoods in the present moment, but always ask: do my priors and likelihoods still hold true today? In other words, update your beliefs incrementally as new information or evidence surfaces. A good example of this is the shifting sentiment of the financial markets: what holds true today may not hold tomorrow. Hence, the priors and likelihoods must also be incrementally updated.


### Posterior probabilities of hypotheses and Bayes factors


**Prior Odds**  
$O[H_{1}:H_{2}] = \frac {P(H_{1})}{P(H_{2})}$

**Ratio of posterior probabilities of the hypotheses**  

$PO[H_{1}:H_{2}] = \frac {P(H_{1}| data)}{P(H_{2}|data)}$
- $PO[H_{1}:H_{2}]$ The probability of $H_{1}$ given data divided by the probability of  $H_{2}$ given data

Posterior odds expanded:

$PO[H_{1}:H_{2}] = \frac {P(H_{1}| data)}{P(H_{2}|data)} = \frac{P(data|H_{1}) * P(H_{1})/P(data)}{P(data|H_{2}) * P(H_{2})/P(data)}$

The probability of data in the numerator and denominator cancel out, leaving us with:

$\frac{P(data|H_{1}) * P(H_{1})}{P(data|H_{2}) * P(H_{2})}$

- reorganize that as the ratio of the probability of the data given $H_{1}$ to the probability of the data given $H_{2}$, multiplied by the ratio of the prior probabilities of the two hypotheses.

$\frac{P(data|H_{1})}{P(data|H_{2})} * \frac{P(H_{1})}{P(H_{2})}$

The first half is called The Bayes Factor:

$\frac{P(data|H_{1})}{P(data|H_{2})}$

and the second half is the Prior Odds:

$\frac{P(H_{1})}{P(H_{2})}$

In other words, Posterior Odds is the product of the Bayes Factor and the Prior Odds.

- *Bayes Factor* quantifies the evidence in the data for Hypothesis-1 vs Hypothesis-2
- in a *discrete case* this is just the ratio of the likelihoods of the observed data under the two hypotheses.
- in a *continuous case* it's the ratio of the *marginal likelihoods* $BF[H_{1}:H_{2}] = \frac{\int P(data | \theta, H_{1}) \, P(\theta | H_{1}) \, d\theta}{\int P(data | \theta, H_{2}) \, P(\theta | H_{2}) \, d\theta} $

HIV Testing With ELISA Example:  
*Hypotheses*:  
$H_{1}$ - patient does not have HIV  
$H_{2}$ - patient does have HIV

*Priors*:  
$P(H_{1})$ - 0.99852  
$P(H_{2})$ - 0.00148

Prior Odds:  
In [12]: 0.99852 / 0.00148  
Out[12]: 674.6756756756756


*Posteriors*:  
$P(H_{1}|+)$ = .8788551  
$P(H_{2}|+)$ = .1211449

Posterior Odds:  
In [13]: .8788551 / .1211449  
Out[13]: 7.254577782473715
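
Since Posterior Odds = Bayes Factor * Prior Odds, the Bayes factor for the ELISA example can be recovered by dividing the two odds computed above. This is a sketch using the prior and posterior values listed in the example; the variable names are illustrative.

```python
prior_h1, prior_h2 = 0.99852, 0.00148      # no HIV, HIV
post_h1, post_h2 = 0.8788551, 0.1211449    # after a positive test

prior_odds = prior_h1 / prior_h2
posterior_odds = post_h1 / post_h2

# Posterior odds = Bayes factor * prior odds, so divide to recover the factor.
bf_h1_vs_h2 = posterior_odds / prior_odds
bf_h2_vs_h1 = 1 / bf_h1_vs_h2

print(f"BF[H1:H2] = {bf_h1_vs_h2:.4f}")
print(f"BF[H2:H1] = {bf_h2_vs_h1:.1f}")
```

The factor in favor of $H_{2}$ comes out at roughly 93, meaning the positive result is about 93 times more likely if the patient has HIV than if not, even though the posterior odds still favor $H_{1}$ because of the very large prior odds.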


Interpreting the Bayes Factor:  
Jeffreys - 1961
- if the Bayes Factor is between 1 and 3, the evidence against H2 is not worth more than a bare mention
- 3-20 the evidence is positive
- 20-150 the evidence is strong
- \>150 - very strong



## Test

EXERCISE #2 - Probability
1. What is probability?
A value between 0 and 1 that measures how likely an event is to occur

2. What is the probability that a flip of a fair coin will come up heads?
.50 or 1/2

3. If a fair coin comes up tails four times in a row, what is the probability that it will come up heads on the fifth toss?
1/2 or .50

4. In a pool of 24 workers, one is a corporate spy. The pool is divided into teams, with four workers per team. If you were to randomly pick one of these teams, what is the probability that the team you picked would contain a spy?
1 in 6 or about 16.7%


PERMUTATIONS
5. Write out all the possible three-digit permutations of the numbers 7,8,9, then show mathematically how many permutations there should be.

(7, 8, 9)
(7, 9, 8)
(8, 7, 9)
(8, 9, 7)
(9, 7, 8)
(9, 8, 7)

 $n! = 3! = 6$


6. Next, write out all the possible two-digit permutations of the numbers 7,8,9, with repetition, then show mathematically how many permutations there should be.

(7, 7)
(7, 8)
(7, 9)
(8, 7)
(8, 8)
(8, 9)
(9, 7)
(9, 8)
(9, 9)

$n^{r} = 3^{2} = 9$



COMBINATIONS
7. Of the five projects you completed last year, only three will fit on your resume. How many different combinations of projects are there?

${}_{n}C_{r} = \frac{n!}{r!(n - r)!}$

${}_{5}C_{3} = \frac{5!}{3!(5 - 3)!} = \frac{120}{12}  = 10$
 

INTERSECTIONS, UNIONS & COMPLEMENTS
8. In the Venn Diagram below, if A represents triangles, and B represents hollow figures, express the Intersection (A and B), the Union (A or B) and the Complement of A (not A) mathematically.

<img src="img/intersections_venn_diagram.png" width="400" height="200">

- 4x black triangles in A
- 2x empty triangles in A/B
- 2x empty circles in B
- 2x empty squares in B

triangles = 6
hollow figures = 6

Intersection:
- what's the chance that a figure is both a triangle and hollow?  
$P(A \cap B) = \frac{2}{10} = .2$ or a 20% chance.

Union: 
- A or B occurs (100%)  
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$  
$P(A \cup B) = \frac{6}{10} + \frac{6}{10} - \frac{2}{10} = \frac{10}{10} = 1$ or 100% chance.

Complement:  
$P(\bar{A}) = 1 - P(A)$ - the probability of 'Not A'  
$P(\bar{A}) = 1 - P(A) = \frac{10}{10} - \frac{6}{10} = \frac{4}{10} = .40$ or a 40% chance.

CONDITIONAL PROBABILITY
9. A company conducts a historical study of employees. Out of 250 hires, 180 are still with the company, 120 had been offered promotions, and 75 are both with the company and had been offered promotions. Given that an employee was offered a promotion, what is the probability that they are still with the company?

- 250 hires
- 180 are still with the company
- 120 had been offered promotions
- 75 are both with the company and had been offered promotions.

- P(A) = probability still with company = $180/250 = .72$ or 72%  
- P(B) = probability offered promotion = $120/250 = .48$ or 48%
- $P(A \cap B) =$ still with company and offered promotion = .30 or 30%

So the probability that an employee is still with the company, given that they were offered a promotion:  
- $P(A|B) = \frac{P(A \cap B)} {P(B)} =  \frac{.30} {.48} = .625$


BAYES THEOREM
10. A spam filtering program uses Bayes Theorem to assign a probability that an incoming email that contains the word "special" is spam (an unwanted message) as opposed to ham (a valid message). The following assumptions are made:
```
- The initial probability 𝑃(𝑆) that an incoming message is spam is 0.50 (ie, it is equally likely to be spam as ham)
- During a learning phase where known spam and ham messages were tested for the occurrence of the word, it is found that 4% of spam messages contain the word, and 1% of ham messages have it. 
What is the probability that a message containing the word is spam? 
```

$P(S|W) = \frac{P(W|S) * P(S)} {P(W)} = \frac{P(W|S) * P(S)} {P(W|S)*P(S) + P(W|H) * P(H)} $
 
- 𝑃(𝑊|𝑆) = probability of containing the word if spam - .04 or 4%
- 𝑃(𝑆) = probability of being spam .50
- 𝑃(𝑊|𝐻) = probability of containing the word if ham - .01 or 1%
- 𝑃(𝐻) = probability of being ham - .50


- 𝑃(𝑊) = probability of containing the word - .025 (calculated below as the denominator)
- 𝑃(𝑆|𝑊) = probability of being spam if containing the word - 80%

$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $

$P(S|W) =\frac{.04 * .50} {.04 * .50 + .01 * .50} = \frac{.02} {.025} = 0.80 $ or an 80% chance

In [73]: num = (0.04 * 0.50)

In [75]: den = (.04 * .50) + (.01 * .50)

In [77]: den  
Out[77]: 0.025

In [79]: num  
Out[79]: 0.02

In [80]: den  
Out[80]: 0.025

In [81]: num / den  
Out[81]: 0.7999999999999999
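
The float result 0.7999999999999999 is a rounding artifact. As a sketch, the same calculation with `fractions.Fraction` gives the exact answer; the variable names are illustrative.

```python
from fractions import Fraction

p_spam = Fraction(1, 2)
p_ham = 1 - p_spam
p_word_given_spam = Fraction(4, 100)
p_word_given_ham = Fraction(1, 100)

# Total probability of seeing the word in any message.
p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(p_word)              # 1/40
print(p_spam_given_word)   # 4/5
```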






