## Introduction

* The key concept in Bayesian statistics is Bayes' theorem.
* It can be derived easily once conditional probability is understood.
* A probability is a value between 0 and 1 that represents a degree of belief in a prediction.
* Conditional probability is a probability based on background information.
* P(A|B) is the probability that A occurs given that B has occurred.
* Conjoint probability is the probability that two things are both true, P(A and B).
* For independent events only, P(A and B) = P(A)P(B). This doesn't hold for dependent events.
* For independent events, whether or not A has occurred doesn't change the chance that B occurs: P(B|A) = P(B).
* In general, the probability of a conjunction is P(A and B) = P(A|B)P(B).
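As a quick check of the last three bullets, here is a minimal sketch that counts outcomes for two fair dice. The dice example and the events A and B are illustrative assumptions, not part of these notes.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice (hypothetical example).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """P(event | given), found by counting equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))

a = lambda o: o[0] + o[1] >= 10    # A: the total is at least 10
b = lambda o: o[0] == 6            # B: the first die shows 6
both = lambda o: a(o) and b(o)     # A and B

p_a_and_b = prob(both)
print(p_a_and_b)                   # 1/12
print(prob(a, given=b) * prob(b))  # 1/12, matches P(A|B)P(B)
print(prob(a) * prob(b))           # 1/36, differs because A and B are dependent
```

The general rule P(A and B) = P(A|B)P(B) holds here, while the shortcut P(A)P(B) does not, since knowing the first die is a 6 makes a high total more likely.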

## Cookie problem

Two bowls of cookies. Bowl A has 30 plain and 10 chocolate, Bowl B has 20 of each.

Suppose you choose a bowl at random and take a cookie which is plain. What is the probability that you chose bowl A?

Conditional probability: P(Bowl A | plain)

It's not obvious how to compute this. However, the reverse is easy to answer: P(plain | bowl A) = 3/4

The two conditional probabilities are not the same, but we can use Bayes' theorem to go from one to the other.

## Bayes' theorem

* Conjunction is commutative: P(A and B) = P(B and A)

* P(A and B) = P(A|B)P(B) and P(B and A) = P(B|A)P(A)

* P(A|B)P(B) = P(B|A)P(A)

* Which can be written: P(A|B) = P(B|A)P(A) / P(B) (Bayes!)

## Solving the cookie problem

* P(bowl A | plain) = P(plain | bowl A) P(bowl A) / P(plain)

* P(plain | bowl A) = 0.75

* P(bowl A) = 0.5

* P(plain) = (30 + 20) / (30 + 10 + 20 + 20) = 5/8

* So, P(bowl A | plain) = (0.75 * 0.5) / 0.625 = 0.6 = 3/5 
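The same arithmetic can be sketched in Python with exact fractions. The variable names are my own. Pooling the cookies to get P(plain) only works here because both bowls are equally likely and hold the same number of cookies.

```python
from fractions import Fraction

# Cookie counts from the problem statement.
bowls = {
    "A": {"plain": 30, "chocolate": 10},
    "B": {"plain": 20, "chocolate": 20},
}

p_bowl_a = Fraction(1, 2)  # the bowl is chosen at random
p_plain_given_a = Fraction(bowls["A"]["plain"], sum(bowls["A"].values()))

# P(plain): plain cookies over all cookies, pooled across both bowls.
total_plain = sum(bowl["plain"] for bowl in bowls.values())
total_cookies = sum(sum(bowl.values()) for bowl in bowls.values())
p_plain = Fraction(total_plain, total_cookies)

# Bayes' theorem: P(bowl A | plain) = P(plain | bowl A) P(bowl A) / P(plain)
p_a_given_plain = p_plain_given_a * p_bowl_a / p_plain
print(p_a_given_plain)  # 3/5
```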

## The diachronic interpretation

* Diachronic means that something is happening over time. We can use Bayes' theorem to see how the probability of our hypothesis changes over time given new data.

* We can rewrite the theorem for our hypothesis, H, and the data, D, as:

P(H|D) = P(H) P(D|H) / P(D)

* P(H) is the probability of the hypothesis before we see the data, or the *prior*
* P(H|D) is the probability of the hypothesis after we see the data, or the *posterior* (what we want to compute)
* P(D|H) is the probability of the data under the hypothesis, or the *likelihood*
* P(D) is the probability of the data under any hypothesis, or the *normalising constant*

*"Today's posterior is tomorrow's prior"* - https://jimgrange.wordpress.com/2016/01/18/pesky-priors/

* Prior can sometimes be computed using background information, otherwise can be subjective (people might disagree).
* Likelihood is often easy to compute, based on population statistics.
* Normalising constant can be tricky...

* Set (or suite) of hypotheses must be:
    * Mutually exclusive: only one hypothesis (max) can be true
    * Collectively exhaustive: out of all options, one must be true
    
* P(D) can be computed using the law of total probability: if there are two (or more) mutually exclusive ways something could happen, you can add up their probabilities like this:

P(D) = P(bowl A) P(D|bowl A) + P(bowl B) P(D|bowl B)

* This gives 1/2 * 3/4 + 1/2 * 1/2 = 5/8 (as we had before)
* Law of total probability: https://www.statology.org/law-of-total-probability/
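A minimal sketch of the diachronic update, with P(D) computed by the law of total probability. The function name `posterior` is hypothetical. The second update assumes the first cookie is put back, so the likelihoods stay the same; that assumption is mine, not part of the cookie problem.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return (posteriors, P(D)) for a suite of hypotheses.

    Assumes the hypotheses are mutually exclusive and collectively exhaustive.
    """
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    p_data = sum(unnormalised.values())  # law of total probability
    return {h: u / p_data for h, u in unnormalised.items()}, p_data

priors = {"bowl A": Fraction(1, 2), "bowl B": Fraction(1, 2)}
likelihoods = {"bowl A": Fraction(3, 4), "bowl B": Fraction(1, 2)}  # P(plain | bowl)

post, p_plain = posterior(priors, likelihoods)
print(p_plain)          # 5/8
print(post["bowl A"])   # 3/5

# "Today's posterior is tomorrow's prior": a second plain cookie,
# assuming the first one was replaced (illustrative assumption).
post2, _ = posterior(post, likelihoods)
print(post2["bowl A"])  # 9/13
```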

## The M&M problem

* Old M&M's: Brown 0.3, Yellow 0.2, Red 0.2, Green 0.1, Orange 0.1 and Tan 0.1.
* New M&M's: Blue 0.24, Brown 0.13, Yellow 0.14, Red 0.13, Green 0.2 and Orange 0.16.

* Two bags, one old and one new. One M&M is drawn from each bag: a yellow one from bag 1 and a green one from bag 2. What is the probability that the yellow M&M came from the old bag?

* We can use the table method to solve this.
* First we state our hypotheses:
    * Hyp A: Bag 1 is old, bag 2 is new
    * Hyp B: Bag 1 is new, bag 2 is old

| Hypothesis     | Prior P(H)     | Likelihood P(D\|H)    |P(H)P(D\|H)    | Posterior P(H\|D)    |
| :------------- | :----------: | :----------: |:----------: | -----------: |
| A| 1/2   | (20)(20) |200   | 20/27   |
| B | 1/2 | (10)(14) |70 | 7/27 |

* Priors are set at 50% for each.
* Likelihoods are written as percentages rather than probabilities, so each product is too large by a factor of 10,000, but this cancels when dividing by the normalising constant.
* P(H)P(D|H) is the product of the prior and the likelihood.
* Normalising constant is the sum of the P(H)P(D|H) column, in this case 200 + 70 = 270.
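The table method can be sketched in a few lines of Python. The colour mixes are in percent, with new-mix yellow at 14 to match the (10)(14) likelihood in the table. The code assumes the yellow M&M came from bag 1 and the green one from bag 2, which is what the table's likelihoods imply.

```python
from fractions import Fraction

# Colour mixes in percent.
old_mix = {"brown": 30, "yellow": 20, "red": 20, "green": 10, "orange": 10, "tan": 10}
new_mix = {"blue": 24, "brown": 13, "yellow": 14, "red": 13, "green": 20, "orange": 16}

# Each hypothesis assigns a mix to (bag 1, bag 2).
hypotheses = {
    "A": (old_mix, new_mix),  # bag 1 is old, bag 2 is new
    "B": (new_mix, old_mix),  # bag 1 is new, bag 2 is old
}
prior = Fraction(1, 2)

# Data (assumed labelling): yellow from bag 1, green from bag 2.
unnormalised = {
    name: prior * bag1["yellow"] * bag2["green"]
    for name, (bag1, bag2) in hypotheses.items()
}
normalising_constant = sum(unnormalised.values())
posteriors = {name: u / normalising_constant for name, u in unnormalised.items()}

print(unnormalised["A"], unnormalised["B"])  # 200 70
print(posteriors["A"], posteriors["B"])      # 20/27 7/27
```

Because the same factor of 10,000 appears in every row, working in percentages leaves the posteriors unchanged.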

## Monty Hall problem

* The Monty Hall problem is a counterintuitive probability scenario.
* 3 closed doors. A car behind one door and less valuable prizes behind the other two. Need to guess which door has the car.
* You choose door A. Then door B or C will be opened to reveal a lesser prize. You have the choice to stick with door A or switch to the other remaining closed door. In this case, door B is opened.
* It seems that it would make no difference, but in fact you should switch: switching wins with probability 2/3, versus 1/3 for sticking.
* We can break this down into 3 hypotheses about which door has the car, A, B or C.


| Hypothesis     | Prior P(H)     | Likelihood P(D\|H)    |P(H)P(D\|H)    | Posterior P(H\|D)    |
| :------------- | :----------: | :----------: |:----------: | -----------: |
| A| 1/3   | 1/2 | 1/6   | 1/3   |
| B | 1/3 | 0 |0 | 0 |
| C | 1/3 | 1 |1/3 | 2/3 |

* Priors are straightforward as the prizes are placed at random.
* Likelihoods are trickier. If the car is behind A, the host can open either B or C and picks at random, so the probability that he opens B is 1/2.
* If the car is behind B, he can't open B, so the chance it's opened is 0.
* If the car is behind C, he has to open B, otherwise he would reveal the car, so opening B has probability 1.
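Since the result is counterintuitive, a simulation is a useful sanity check on the table. This is a rough sketch under the assumptions above: the player always picks A, and the host picks at random when both B and C are available. The function name, trial count and seed are arbitrary choices.

```python
import random

def simulate(trials=100_000, seed=0):
    """Estimate P(car behind C | host opened B) when the player picked A."""
    rng = random.Random(seed)
    opened_b = 0
    car_behind_c = 0
    for _ in range(trials):
        car = rng.choice(["A", "B", "C"])
        # The host opens B or C, never the car; random choice if both are free.
        opened = rng.choice([door for door in ("B", "C") if door != car])
        if opened == "B":  # keep only the games that match our data
            opened_b += 1
            car_behind_c += (car == "C")
    return car_behind_c / opened_b

estimate = simulate()
print(estimate)  # should be close to 2/3
```

Only the games where door B was opened are counted, which mirrors conditioning on the data D in the table.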

* Note, there is another variation where the host always opens door B if possible, and only opens C if the car is behind B. In this case the table looks like this:

| Hypothesis     | Prior P(H)     | Likelihood P(D\|H)    |P(H)P(D\|H)    | Posterior P(H\|D)    |
| :------------- | :----------: | :----------: |:----------: | -----------: |
| A| 1/3   | 1 | 1/3   | 1/2   |
| B | 1/3 | 0 |0 | 0 |
| C | 1/3 | 1 |1/3 | 1/2 |

* So this small difference removes any advantage from switching doors, because door B being opened reveals no information about the location of the car. Note, this only works if the labels (A, B and C) are private and not known to host and the player.
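Both tables differ only in the likelihood for hypothesis A, so they can be reproduced by one small function. This is a sketch: the function and parameter names are hypothetical, and the parameter is the probability that the host opens B when the car is behind A.

```python
from fractions import Fraction

def posterior_after_b_opened(p_open_b_if_car_a):
    """Posterior over the car's location, given the player chose A and B was opened."""
    prior = Fraction(1, 3)
    likelihood = {
        "A": Fraction(p_open_b_if_car_a),  # depends on the host's habit
        "B": Fraction(0),                  # the host never reveals the car
        "C": Fraction(1),                  # the host is forced to open B
    }
    unnormalised = {door: prior * like for door, like in likelihood.items()}
    total = sum(unnormalised.values())
    return {door: u / total for door, u in unnormalised.items()}

standard = posterior_after_b_opened(Fraction(1, 2))  # host picks at random
variant = posterior_after_b_opened(1)                # host always opens B if he can

print(standard["A"], standard["C"])  # 1/3 2/3
print(variant["A"], variant["C"])    # 1/2 1/2
```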

* Other Bayes problems can be found here: https://allendowney.blogspot.com/2011/10/all-your-bayes-are-belong-to-us.html