# **Chapter 2: Bayes' Theorem**
___

# **The Cookie Problem**
---
* Suppose we have two bowls of cookies:
  * ``Bowl 1`` contains 30 vanilla cookies and 10 chocolate cookies.
  * ``Bowl 2`` contains 20 vanilla cookies and 20 chocolate cookies.
  
    Now suppose you choose one of the bowls at random and, without looking, choose a cookie at random. If the cookie is vanilla, what is the probability that it came from ``Bowl 1``?

What we want is a conditional probability: the probability that we chose from ``Bowl 1`` given that we got a vanilla cookie, P(B<sub>1</sub>|V).

* But what the problem statement gives us is the probability of getting a vanilla cookie given that we chose from ``Bowl 1``, P(V | B<sub>1</sub>), and given that we chose from ``Bowl 2``, P(V | B<sub>2</sub>).

Bayes' theorem tells us how they are related:

\begin{align}
        P(B_1 \mid V) = \frac{P(B_1)\,P(V\mid B_1)} {P(V)}
    \end{align}

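The term on the left is what we want. The terms on the right are:
* P(B<sub>1</sub>), the prior probability of choosing ``Bowl 1``, which is 1/2 because we choose a bowl at random.
* P(V | B<sub>1</sub>), the probability of getting a vanilla cookie from ``Bowl 1``, which is 30/40 = 3/4.
* P(V), the probability of drawing a vanilla cookie from either bowl, which is 5/8 by the law of total probability:
\begin{align}
        P(V) &= P(B_1)\,P(V\mid B_1) + P(B_2)\,P(V\mid B_2) \\
             &= (1/2)(3/4) + (1/2)(1/2) = 5/8
    \end{align}

Putting these together, P(B<sub>1</sub>|V) = (1/2)(3/4) / (5/8) = 3/5.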
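The same calculation can be done with exact fractions. This is a minimal sketch using Python's standard `fractions` module. The variable names are illustrative and do not come from chapter 1:

```python
from fractions import Fraction

# Cookie Problem: B1 = "chose Bowl 1", B2 = "chose Bowl 2", V = "cookie is vanilla"
p_b1, p_b2 = Fraction(1, 2), Fraction(1, 2)                  # priors
p_v_given_b1, p_v_given_b2 = Fraction(3, 4), Fraction(1, 2)  # likelihoods

# Law of total probability for the denominator
p_v = p_b1 * p_v_given_b1 + p_b2 * p_v_given_b2

# Bayes' theorem
p_b1_given_v = p_b1 * p_v_given_b1 / p_v

print(p_v)           # 5/8
print(p_b1_given_v)  # 3/5
```

Using `Fraction` instead of floats keeps results like 3/5 exact. The decimal equivalent is 0.6.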

* The following code builds on the `prob()` and `conditional()` functions from the chapter 1 examples. The programming language used here is ``python3``.


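```python
import pandas as pd

def prob(A):
    """Fraction of True values in a boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """Probability of `proposition` among the rows where `given` is True."""
    return prob(proposition[given])

# One row per cookie: Bowl 1 = 30 vanilla + 10 chocolate, Bowl 2 = 20 + 20
cookies = pd.DataFrame({
    'bowl': [1] * 40 + [2] * 40,
    'flavor': ['vanilla'] * 30 + ['chocolate'] * 10 + ['vanilla'] * 20 + ['chocolate'] * 20,
})
bowl1, bowl2 = cookies['bowl'] == 1, cookies['bowl'] == 2
vanilla = cookies['flavor'] == 'vanilla'

pv = prob(vanilla)  # 50 vanilla cookies out of 80 = 0.625
total_probability = (prob(bowl1) * conditional(vanilla, given=bowl1)
                     + prob(bowl2) * conditional(vanilla, given=bowl2))
```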
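As a cross-check, Bayes' theorem should agree with simply counting cookies. This is a minimal sketch that builds a one-row-per-cookie DataFrame. That layout is an assumption made for illustration; the original problem does not specify it. Counting rows works here only because both bowls hold 40 cookies. Under that condition, choosing a bowl at random and then a cookie at random is the same as choosing one of the 80 cookies uniformly.

```python
import pandas as pd

def prob(A):
    return A.mean()

def conditional(proposition, given):
    return prob(proposition[given])

# Hypothetical layout: one row per cookie, 40 cookies in each bowl
cookies = pd.DataFrame({
    'bowl': [1] * 40 + [2] * 40,
    'flavor': ['vanilla'] * 30 + ['chocolate'] * 10 + ['vanilla'] * 20 + ['chocolate'] * 20,
})
bowl1 = cookies['bowl'] == 1
vanilla = cookies['flavor'] == 'vanilla'

# Direct count: of the 50 vanilla cookies, 30 are in Bowl 1
direct = conditional(bowl1, given=vanilla)

# Bayes' theorem, computing each term with the chapter 1 helpers
via_bayes = prob(bowl1) * conditional(vanilla, given=bowl1) / prob(vanilla)

print(direct, via_bayes)
```

Both values should come out to 0.6, up to floating-point rounding.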



# **Diachronic Bayes**
___
> Another way to think about Bayes' theorem: it gives us a way to update the probability of a hypothesis, H, given some body of data D. Diachronic means to change over time; in this case the probability of the hypothesis changes as we see new data.

### **Rewriting Bayes' theorem with H and D yields (produces):**

\begin{align}
        P(H \mid D) = \frac{P(H)\,P(D\mid H)} {P(D)}
    \end{align}


  * P(H) is the probability of the hypothesis before we see the data, called the prior probability, or just ``prior``.
  * P(H|D) is the probability of the hypothesis after we see the data, called the ``posterior``.
  * P(D|H) is the probability of the data under the hypothesis, called the ``likelihood``.
  * P(D) is the ``total probability of the data``, under any hypothesis.
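To make the "diachronic" idea concrete, here is a minimal sketch in which one update's posterior becomes the next update's prior. The helper `bayes_update` is a hypothetical name used only for illustration. The second draw assumes the first cookie is put back and a second one is drawn from the same bowl. That scenario is not part of the original problem.

```python
from fractions import Fraction

def bayes_update(priors, likelihoods):
    """Return normalized posteriors for mutually exclusive, exhaustive hypotheses."""
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnorm)  # P(D), the total probability of the data
    return [u / total for u in unnorm]

# Hypotheses: [Bowl 1, Bowl 2]; P(vanilla | bowl) for each
vanilla_likelihood = [Fraction(3, 4), Fraction(1, 2)]

prior = [Fraction(1, 2), Fraction(1, 2)]
after_one = bayes_update(prior, vanilla_likelihood)      # [3/5, 2/5]
# Assumption: the cookie is replaced and a second cookie from the same bowl is vanilla
after_two = bayes_update(after_one, vanilla_likelihood)  # [9/13, 4/13]

print(after_one, after_two)
```

Each vanilla cookie shifts more belief toward Bowl 1, since vanilla is more likely under that hypothesis.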


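Sometimes we can compute the ``prior`` from background information about the problem.

In other cases the prior is subjective: different people may have different background information, or interpret the same information differently, so they may reasonably choose different priors.

The ``likelihood`` is usually the easiest part to compute. In the Cookie Problem, we know how many cookies of each flavor are in each bowl, so we can compute the probability of the data under each hypothesis directly. The prior P(H), by contrast, does not depend on the data: in the Cookie Problem it is 1/2 for each bowl because the bowl is chosen at random.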

Most often we simplify things by specifying a set of hypotheses that are:
* Mutually exclusive, meaning that at most one of them can be ``true`` (e.g. with two hypotheses, [True, False] or [False, False] is allowed, but not [True, True]).
* Collectively exhaustive, meaning that at least one of them must be ``true``. Combined with mutual exclusivity, this means exactly one is true.

When these conditions apply, we can compute $P(D)$ using the law of total probability. For example, with two hypotheses, $H_1$ and $H_2$:


\begin{align}
        P(D) = P(H_1)\,P(D\mid H_1) + P(H_2)\,P(D\mid H_2)
    \end{align}


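In ``python3`` the above equation translates directly. Here it is with the Cookie Problem values, where H<sub>1</sub> and H<sub>2</sub> are the two bowls and D is drawing a vanilla cookie:
```python
# H1 = Bowl 1, H2 = Bowl 2, D = "the cookie is vanilla"
prior_H1, prior_H2 = 1/2, 1/2            # P(H1), P(H2)
likelihood_H1, likelihood_H2 = 3/4, 1/2  # P(D|H1), P(D|H2)

total_probability = prior_H1 * likelihood_H1 + prior_H2 * likelihood_H2
print(total_probability)  # 0.625
```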

And more generally, with any number of hypotheses:

\begin{equation}
  P(D) = \sum_{i=1}^{n} P(H_i)\, P(D \mid H_i), \quad n \in \mathbb{N}
\end{equation}


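In ``python3``, the sum becomes a loop over the hypotheses:
```python
# One prior P(H_i) and one likelihood P(D|H_i) per hypothesis; these are the
# Cookie Problem values, but the lists can hold any number n of hypotheses
priors = [1/2, 1/2]
likelihoods = [3/4, 1/2]

total_probability = 0
for prior, likelihood in zip(priors, likelihoods):
    total_probability += prior * likelihood
print(total_probability)  # 0.625
```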
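Before applying the law of total probability, it can help to check that the hypotheses really are mutually exclusive and collectively exhaustive. This is a minimal sketch in the style of the chapter 1 boolean Series. Each hypothesis is a boolean Series with one entry per possible outcome, and we count how many hypotheses are true for each outcome. The one-row-per-cookie layout is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical one-row-per-cookie layout: 40 cookies in each bowl
cookies = pd.DataFrame({'bowl': [1] * 40 + [2] * 40})
bowl1 = cookies['bowl'] == 1   # hypothesis H1: the cookie came from Bowl 1
bowl2 = cookies['bowl'] == 2   # hypothesis H2: the cookie came from Bowl 2

# How many hypotheses are true for each cookie
n_true = bowl1.astype(int) + bowl2.astype(int)

mutually_exclusive = bool((n_true <= 1).all())       # never both true
collectively_exhaustive = bool((n_true >= 1).all())  # always at least one true
print(mutually_exclusive, collectively_exhaustive)   # True True

# Consequently the priors sum to 1
print(bowl1.mean() + bowl2.mean())                   # 1.0
```

If either check failed, the sum over hypotheses would no longer equal P(D).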

# **Bayes Tables**
___
A convenient tool for doing a Bayesian update is a 'Bayes table'.
Let's start with an empty ``pandas DataFrame`` that has one row per hypothesis:

```python
import pandas as pd

table = pd.DataFrame(index=['Bowl1', 'Bowl2'])

table['prior'] = 1/2, 1/2
table
```

|       | prior |
|-------|-------|
| Bowl1 | 0.5   |
| Bowl2 | 0.5   |


```python
table['likelihood'] = 3/4, 1/2  # P(vanilla | Bowl1), P(vanilla | Bowl2)
table
```

|       | prior | likelihood |
|-------|-------|------------|
| Bowl1 | 0.5   | 0.75       |
| Bowl2 | 0.5   | 0.5        |


* Before we see the data, the chances are 50/50. That is the ``prior``, based on the information that we chose one of two bowls at random.
* The ``likelihoods`` show that the data (a vanilla cookie) is more probable under the ``Bowl1`` hypothesis than under ``Bowl2``, so after the update ``Bowl1`` should be the more likely hypothesis.
* Notice that the ``likelihood`` values don't add up to 1. That's okay: each one is the probability of the data under a different hypothesis, not a distribution over the hypotheses.
* The next step is the same as with Bayes' theorem: we multiply the ``priors`` by the ``likelihoods``.


Let's do that:

```python
table['unnorm'] = table['prior'] * table['likelihood']
table
```

|       | prior | likelihood | unnorm |
|-------|-------|------------|--------|
| Bowl1 | 0.5   | 0.75       | 0.375  |
| Bowl2 | 0.5   | 0.5        | 0.25   |


The results are labeled ``'unnorm'`` because they are "unnormalized posteriors". Each of them is the product of a `prior` and a `likelihood`, where B<sub>i</sub> is the hypothesis that the cookie came from Bowl<sub>i</sub>:
\begin{equation}
    P(B_i) P(D|B_i)
 \end{equation}

which is the numerator (the top of the fraction, e.g. 2 in 2/3) of Bayes' theorem. If we add them up, we get:


 \begin{equation}
    P(B_1) P(D|B_1) \, + \, P(B_2) P(D|B_2)
 \end{equation}


which is the denominator (the bottom of the fraction, e.g. 3 in 2/3) of Bayes' theorem, $P(D)$.



 So we can compute the total probability of the data like this:

```python
prob_data = table['unnorm'].sum()
prob_data
```

```
0.625
```

* Notice that we get 5/8, which is what we got by computing $P(D)$ directly.

And we can compute the posterior probabilities like this:

```python
table['posterior'] = table['unnorm'] / prob_data
table
```

|       | prior | likelihood | unnorm | posterior |
|-------|-------|------------|--------|-----------|
| Bowl1 | 0.5   | 0.75       | 0.375  | 0.6       |
| Bowl2 | 0.5   | 0.5        | 0.25   | 0.4       |


When we add up the unnormalized posteriors and divide each one by the sum, we force the posteriors to add up to 1. This process is called "normalization", which is why the total probability of the data is also called the "normalizing constant".
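Because normalization divides by the sum, multiplying every likelihood by the same constant changes the normalizing constant but not the posteriors. This is a minimal sketch demonstrating that. The `update` helper repeats the multiply, sum, divide steps above, and the constant 4 is arbitrary.

```python
import pandas as pd

def update(table):
    """Multiply priors by likelihoods, sum, and divide (the steps above)."""
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return prob_data

table = pd.DataFrame(index=['Bowl1', 'Bowl2'])
table['prior'] = 1/2, 1/2
table['likelihood'] = 3/4, 1/2

# Same hypotheses, but every likelihood scaled by the same constant
scaled = table.copy()
scaled['likelihood'] = table['likelihood'] * 4

prob_data = update(table)
prob_scaled = update(scaled)
print(prob_data, prob_scaled)        # 0.625 2.5: the normalizing constants differ
print(table['posterior'].tolist())   # [0.6, 0.4]
print(scaled['posterior'].tolist())  # [0.6, 0.4]: the posteriors do not
```

Only the ratios between likelihoods matter for the posterior. That is why the scaled "likelihoods" (3 and 2) still give the same answer, even though they are no longer probabilities.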

# **The Dice Problem**
___

Recall that a Bayes table works with any number of hypotheses, because the total probability of the data is a sum over all of them:

\begin{equation}
  P(D) = \sum_{i=1}^{n} P(H_i)\, P(D \mid H_i), \quad n \in \mathbb{N}
\end{equation}


* Suppose I choose one of the following dice at random, roll it, and report that the outcome is a 1. What is the probability that I chose the 6-sided die?
    * 6-sided
    * 8-sided
    * 12-sided

Let's make a Bayes table, using integers (the number of sides) to represent the hypotheses:

```python
from fractions import Fraction

table2 = pd.DataFrame(index=[6, 8, 12])
table2['prior'] = Fraction(1, 3)  # three dice, each equally likely to be chosen
table2['likelihood'] = Fraction(1, 6), Fraction(1, 8), Fraction(1, 12)  # P(rolling a 1 | die)
table2
```

|    | prior | likelihood |
|----|-------|------------|
| 6  | 1/3   | 1/6        |
| 8  | 1/3   | 1/8        |
| 12 | 1/3   | 1/12       |


Once we have the table ready with prior and likelihood, the rest of the steps are always the same. So let's put them in a function:

```python
def update(table):
    """Compute the posterior probabilities"""
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return prob_data

prob_data = update(table2)
table2
```

|    | prior | likelihood | unnorm | posterior |
|----|-------|------------|--------|-----------|
| 6  | 1/3   | 1/6        | 1/18   | 4/9       |
| 8  | 1/3   | 1/8        | 1/24   | 1/3       |
| 12 | 1/3   | 1/12       | 1/36   | 2/9       |

```python
prob_data
```

```
Fraction(1, 8)
```
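The likelihood column above is just P(rolling a 1 | n-sided die) = 1/n. A small helper makes that explicit and also handles outcomes a die cannot produce. As a hypothetical variation not in the original problem, suppose the outcome had been a 7. That rules out the 6-sided die entirely. The helper name `die_likelihood` is illustrative, and `update` repeats the function defined above so the block runs on its own.

```python
import pandas as pd
from fractions import Fraction

def update(table):
    """Compute the posterior probabilities (same steps as above)."""
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return prob_data

def die_likelihood(sides, outcome):
    """P(outcome | sides-sided die): each face 1..sides has probability 1/sides."""
    return Fraction(1, sides) if 1 <= outcome <= sides else Fraction(0)

table_7 = pd.DataFrame(index=[6, 8, 12])
table_7['prior'] = Fraction(1, 3)
table_7['likelihood'] = [die_likelihood(n, 7) for n in table_7.index]  # 0, 1/8, 1/12

prob_data_7 = update(table_7)
print(prob_data_7)                   # 5/72
print(table_7['posterior'].tolist())
```

A likelihood of 0 drives the posterior of the 6-sided die to 0. The remaining probability is split between the 8-sided and 12-sided dice, 3/5 and 2/5.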

# **The Monty Hall Problem**
___

Let's use a Bayes table to solve one of the most contentious (likely to cause an argument) problems in probability.

The Monty Hall Problem is based on a game show called *Let's Make a Deal*. Here's how the game is played:

* The host, Monty Hall, shows you three closed doors (numbered 1, 2, and 3) and tells you that there is a prize behind each door.
* One prize is valuable (something like a car); the other two are less valuable (for humorous purposes, goats).
* The object of the game is to guess which door has the valuable item. If you guess right, you can keep the prize.

Suppose you pick Door 1. Before opening it, Monty opens a door other than your choice, say Door 3, and shows you that there is a goat behind it. Then Monty offers you the option to stick with your original choice or switch to the remaining unopened door.

To maximize your chance of winning the car, should you stick with Door 1 or switch to Door 2?

To answer the question, we have to make some assumptions about the behaviour of the host:

1. Monty always opens a door and offers you the option to switch.
2. He never opens the door you picked or the door with the car.
3. If you choose the door with the car, he chooses one of the other doors at random.

Under these assumptions, you are better off switching. If you stick with Door 1, you win only 1/3 of the time (the same as your prior chance), but if you switch, you win 2/3 of the time.

You might think that with two doors left the chance of winning is 50% either way, so it doesn't matter whether you stick or switch. But that is wrong.

Let's use a Bayes table to see why.

We'll start with three hypotheses, one for each door that might hide the car, each with a ``prior`` of 1/3:

```python
table3 = pd.DataFrame(index=['Door 1', 'Door 2', 'Door 3'])
table3['prior'] = Fraction(1, 3)  # before we have seen anything behind the doors
table3
```

|        | prior |
|--------|-------|
| Door 1 | 1/3   |
| Door 2 | 1/3   |
| Door 3 | 1/3   |


The data is that Monty opened Door 3 and revealed a goat. So let's consider the probability of the data under each hypothesis.

* If the car is behind Door 1, Monty chooses Door 2 or 3 at random, so the probability he opens Door 3 is 1/2.
* If the car is behind Door 2, Monty has to open Door 3, so the probability of the data under this hypothesis is 1.
* If the car is behind Door 3, Monty does not open it, so the probability of the data under this hypothesis is 0.

Those are our ``likelihoods``:

```python
table3['likelihood'] = Fraction(1, 2), 1, 0
table3
```

|        | prior | likelihood |
|--------|-------|------------|
| Door 1 | 1/3   | 1/2        |
| Door 2 | 1/3   | 1          |
| Door 3 | 1/3   | 0          |


Now that we have our ``priors`` and ``likelihoods``, we can use ``update()`` to compute the posterior probabilities:

```python
update(table3)
table3
```

|        | prior | likelihood | unnorm | posterior |
|--------|-------|------------|--------|-----------|
| Door 1 | 1/3   | 1/2        | 1/6    | 1/3       |
| Door 2 | 1/3   | 1          | 1/3    | 2/3       |
| Door 3 | 1/3   | 0          | 0      | 0         |
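A quick Monte Carlo simulation can back up the table. This is a sketch that assumes the car is placed uniformly at random and Monty follows assumptions 1 to 3. It plays many games with Python's standard `random` module; the function name `play` and the seed are arbitrary. It counts every game, not only those where Monty opens Door 3. By symmetry, the win rates should still land close to the table's posteriors: about 1/3 for sticking and 2/3 for switching.

```python
import random

def play(switch, rng):
    """Play one round under assumptions 1-3; return True if you win the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)   # car placed uniformly at random (the 1/3 prior)
    pick = 1                  # you always start with Door 1
    # Monty opens a door that is neither your pick nor the car,
    # choosing at random when two such doors are available
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 100_000
stick_rate = sum(play(False, rng) for _ in range(n)) / n
switch_rate = sum(play(True, rng) for _ in range(n)) / n
print(f"stick: {stick_rate:.3f}  switch: {switch_rate:.3f}")
```

With 100,000 games per strategy, sampling noise is well under one percentage point.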


As this example shows, our intuition for probability is not always reliable. Bayes' theorem can help by providing a divide-and-conquer strategy:

1. First, write down the hypotheses and the data.
2. Next, figure out the prior probabilities.
3. Finally, compute the likelihood of the data under each hypothesis.

The Bayes table does the rest.


$P(D \mid H)$ answers the question: if our assumption is the hypothesis $H_i$, what is the chance of seeing the data $D$?

# **Conclusion**
___
A Bayes table makes it easy to compute the total probability of the data, and from it the posterior probabilities, especially when there are more than two hypotheses.