# Week 10: Naive Bayes

## Bayes’ Theorem 

### $P(A|B) = \frac{P(A) P(B|A)}{P(B)}$

$P(B) = (P(A) * P(B|A)) + (P(\bar{A}) * P(B|\bar{A}))$

Recall the multiplication rule for the intersection of dependent events:<br />
$P(A \cap B) = P(A) * P(B|A)$
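The intersection can be written with either event first. Setting the two forms equal gives Bayes' Theorem, and splitting $B$ into its part inside $A$ and its part inside $\bar{A}$ gives the formula for $P(B)$ above. A short derivation using only these rules:

$$
\begin{aligned}
P(A \cap B) &= P(A)\,P(B|A) = P(B)\,P(A|B) \\
\Rightarrow\; P(A|B) &= \frac{P(A)\,P(B|A)}{P(B)} \\
P(B) &= P(A \cap B) + P(\bar{A} \cap B) = P(A)\,P(B|A) + P(\bar{A})\,P(B|\bar{A})
\end{aligned}
$$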

### Sunrise Story

### Frequentist vs Bayesian

* The frequentist approach treats probability as the expected frequency of an event over a large number of repeated experiments
* The Bayesian approach treats probability as a measure of belief based on prior knowledge, updated as new evidence arrives

The Master Algorithm: https://www.amazon.com/Master-Algorithm-Ultimate-Learning-Machine-ebook/dp/B012271YB2

### Sample Problem

We have a box and a crate full of apples and oranges, 100 fruit in all: 40 apples (40% of our fruit) and 60 oranges (60% of our fruit). 70% of the apples are in the box, and 25% of the oranges are also in the box. You are at the market selling apples and oranges, and a customer takes a fruit from the box. What's the probability that it is an apple?

**Bayes' Approach**

Using Bayes' Theorem, we want the probability of picking an apple given that the fruit comes from the box. Let $A$ be the event that a fruit is an apple and $\overline{A}$ the event that it is an orange. Let $B$ be the event that the fruit is in the box and $\overline{B}$ the event that it is in the crate. Here's our equation once again:

$P(A|B) = \large{\frac{P(A)P(B|A)}{P(B)}}$

$P(A|B)$ reads the probability A given B, or the probability of an apple given the box. 
* $P(A)$ = 0.40 (the number of apples / the number of fruit)
* $P(B|A)$ = 0.70 (the fraction of apples that are in the box)

All we need now is $P(B)$:<br />
$P(B) = (P(A) * P(B|A)) + (P(\bar{A}) * P(B|\bar{A}))$

We know that $(P(A) * P(B|A))$ = .40 * .70 = .28

What is $(P(\bar{A}) * P(B|\bar{A}))$?<br />
$P(\bar{A})$ = 1 - .40 = .60, the proportion of fruit that are oranges<br />

What fraction of the oranges are in the box? $P(B|\bar{A})$ = .25<br />
So $(P(\bar{A}) * P(B|\bar{A}))$ = .60 * .25 = .15

$P(B)$ = .28 + .15 = .43

What is $P(A|B)$?

```python
# answer: P(A|B) = P(A) * P(B|A) / P(B)
print((.4 * .7) / .43)
```

0.6511627906976744

### Tree Diagrams (Probability Trees)

https://www.mathsisfun.com/data/probability-events-conditional.html<br />
https://en.wikipedia.org/wiki/Tree_diagram_(probability_theory)

### Tree Approach


<pre>
                  0.75  0.6 * 0.75 = 0.45
            crate/ 
                /
            0.6 
           /    \
 oranges  /   box\
         /        0.25  0.6 * 0.25 = 0.15
        /
        \
         \        0.30  0.4 * 0.30 = 0.12
 apples   \ crate/
           \    /
            0.4 
                \
              box\
                  0.70  0.4 * 0.70 = 0.28

</pre>

Our root node is our fruit, which splits into apples and oranges. Each apple and orange node then splits according to how much of that fruit is in the crate and how much is in the box: 75% of the oranges are in the crate and 25% are in the box, while 30% of the apples are in the crate and 70% are in the box (the branches leaving each node add up to 100%). Multiplying along each path gives the probability at each leaf. Because there are 60 oranges, 60 * 0.75 = 45 oranges are in the crate and 60 * 0.25 = 15 are in the box. Of the 40 apples, 40 * 0.30 = 12 are in the crate and 40 * 0.70 = 28 are in the box. So 45 + 12 = 57% of the fruit is in the crate and 15 + 28 = 43% is in the box. The box contains 15 oranges and 28 apples (43 items in all), so the probability of getting an apple from the box is 28 / 43, or roughly 65%.

<img style="float: left;" src="https://raw.githubusercontent.com/gitmystuff/INFO5810/main/Week_3-Prob_Dists_HypTesting/tree_diagram.JPG">
<hr style="width: 100%; visibility: hidden;" />

https://en.m.wikipedia.org/wiki/Tree_diagram_(probability_theory)
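To connect the tree to the frequentist view, here is a minimal Monte Carlo sketch using only the Python standard library. It assumes each trial draws one fruit at random, so the fruit is an apple with probability .40 and then lands in the box with probability .70 (apple) or .25 (orange). The function name, trial count, and seed are arbitrary choices for illustration. The fraction of box picks that turn out to be apples should come out close to 28 / 43.

```python
import random

def simulate_apple_given_box(n_trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(apple | box) as a relative frequency over many random picks."""
    rng = random.Random(seed)
    box_count = 0      # picks that came from the box
    apple_in_box = 0   # picks that were apples from the box
    for _ in range(n_trials):
        is_apple = rng.random() < 0.40          # P(A) = .40
        p_box = 0.70 if is_apple else 0.25      # P(B|A) or P(B|not A)
        if rng.random() < p_box:                # the fruit is in the box
            box_count += 1
            if is_apple:
                apple_in_box += 1
    return apple_in_box / box_count

estimate = simulate_apple_given_box()
print(f"simulated P(apple | box) = {estimate:.3f}; exact 28/43 = {28 / 43:.3f}")
```

With more trials the estimate settles closer to the exact tree answer, which is the frequentist idea of probability as a long-run frequency.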

### Contingency Table of Probabilities

In our apples and oranges discussion there are four probabilities that can be used to create a contingency table as shown below:

* $P(\bar{A})$ = .60
* $P(A)$ = .40
* $(P(\bar{A})P(B|\bar{A}))$ = .15
* $(P(A)P(B|A))$ = .28

<pre>
                 crate  |   box   |  total
                 --------------------------
      oranges     .45   |   .15   |  .60
                 --------------------------
      apples      .12   |   .28   |  .40
                 --------------------------
      total       .57   |   .43   | 1.00
</pre>

What percentage of fruit are apples? 40 / 100<br />
What percentage of fruit are in the box? 43 / 100<br />
What percentage of apples are in the box? 28 / 40<br />
What's the probability that a fruit picked from the box is an apple?

```python
# What's the probability that a fruit picked from the box is an apple?
print(28 / 43)
```

0.6511627906976745
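The same answer can be read off the table of counts. This sketch stores the four cells as counts out of 100 fruit and uses Python's `fractions.Fraction` so the probabilities stay exact; the dictionary layout and the `marginal` helper are illustrative choices, not part of the original notebook. It also checks that Bayes' Theorem and the direct table lookup agree.

```python
from fractions import Fraction

# Counts from the apples-and-oranges example (100 fruit in all)
counts = {
    ("orange", "crate"): 45,
    ("orange", "box"): 15,
    ("apple", "crate"): 12,
    ("apple", "box"): 28,
}
total = sum(counts.values())  # 100

def marginal(index: int, value: str) -> int:
    """Sum one row (index 0 = fruit) or one column (index 1 = location)."""
    return sum(n for key, n in counts.items() if key[index] == value)

apples = marginal(0, "apple")  # 40
in_box = marginal(1, "box")    # 43

p_apple = Fraction(apples, total)                               # P(A) = 2/5
p_box = Fraction(in_box, total)                                 # P(B) = 43/100
p_box_given_apple = Fraction(counts[("apple", "box")], apples)  # P(B|A) = 7/10
p_apple_given_box = Fraction(counts[("apple", "box")], in_box)  # P(A|B) = 28/43

# Bayes' Theorem reproduces the answer read directly from the table
assert p_apple * p_box_given_apple / p_box == p_apple_given_box
print(p_apple_given_box, float(p_apple_given_box))
```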

### Bayesian Inference

* $P(A)$ a.k.a. the **prior**: A prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into account. https://en.m.wikipedia.org/wiki/Prior_probability
* $P(B)$ a.k.a. the **marginal likelihood**: A marginal likelihood function, or integrated likelihood, is a likelihood function in which some parameter variables have been marginalized. In the context of Bayesian statistics, it may also be referred to as the evidence or model evidence. https://en.m.wikipedia.org/wiki/Marginal_likelihood
* $P(A|B)$ a.k.a. the **posterior**: This is what we want to know: the probability of a hypothesis given the observed evidence. https://en.m.wikipedia.org/wiki/Bayesian_inference
* $P(B|A)$ a.k.a. the **likelihood**: Indicates the compatibility of the evidence with the given hypothesis. https://en.m.wikipedia.org/wiki/Bayesian_inference

Bayesian inference. (February 8, 2022). In *Wikipedia*. https://en.m.wikipedia.org/wiki/Bayesian_inference 
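The four terms above map directly onto a small function. This is a sketch for the two-outcome case only ($A$ or $\overline{A}$); the name `bayes_posterior` and its parameter names are illustrative, not from any library.

```python
def bayes_posterior(prior: float, likelihood: float, likelihood_not: float) -> float:
    """Return the posterior P(A|B) from the prior P(A), P(B|A), and P(B|not A)."""
    # Marginal likelihood (evidence) P(B) by the law of total probability
    evidence = prior * likelihood + (1 - prior) * likelihood_not
    # Posterior = prior * likelihood / evidence
    return prior * likelihood / evidence

# Apples (A) and the box (B): P(A) = .40, P(B|A) = .70, P(B|not A) = .25
posterior = bayes_posterior(prior=0.40, likelihood=0.70, likelihood_not=0.25)
print(round(posterior, 2))  # 0.65
```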

### Updating Our Prior

**Difference Between Conditional Probability and Bayes' Theorem**<br />
https://www.investopedia.com/terms/b/bayes-theorem.asp

The next day, the same customer comes to you wanting fruit. What's the probability that this same customer will want an apple from a box?

https://www.analyticsvidhya.com/blog/2017/03/conditional-probability-bayes-theorem/ 

$P(A|B) = \large{\frac{P(A)P(B|A)}{P(B)}}$

**First Pick**
* $A$ = apple
* $\overline{A}$ = orange
* $B$ = box
* $\overline{B}$ = crate
* $P(A)$ = .4
* $P(\overline{A})$ = .6
* $P(B|A)$ = .7
* $P(B|\overline{A})$ = .25
* $P(A)$ * $P(B|A)$ = .4 * .7 = .28
* $P(\overline{A})$ * $P(B|\overline{A})$ = .6 * .25 = .15

$P(B) = (P(A) * P(B|A)) + (P(\overline{A}) * P(B|\overline{A}))$<br />
$P(B)$ = (.4 * .7) + (.6 * .25) = .28 + .15 = .43<br />
$P(A|B)$ = .28 / .43 = .65 (rounded)

There is a 65% chance of picking an apple out of the box.

**Second Pick**
* $A$ = apple
* $\overline{A}$ = orange
* $B$ = box
* $\overline{B}$ = crate
* $P(A)$ = .65 (the posterior from the first pick becomes the new prior)
* $P(\overline{A})$ = 1 - .65 = .35
* $P(B|A)$ = .7
* $P(B|\overline{A})$ = .25
* $P(A)$ * $P(B|A)$ = .65 * .7 = .455
* $P(\overline{A})$ * $P(B|\overline{A})$ = .35 * .25 = .0875

$P(B) = (P(A) * P(B|A)) + (P(\overline{A}) * P(B|\overline{A}))$<br />
$P(B)$ = (.65 * .7) + ((1 - .65) * .25) = (.65 * .7) + (.35 * .25) = .455 + .0875 = .5425<br />
$P(A|B)$ = .455 / .5425 = .84 (rounded)<br />
There is an 84% chance of the same customer wanting an apple out of a box.
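Repeating the update in a loop makes the "posterior becomes the next prior" step explicit. This sketch assumes, as the text does, that $P(B|A)$ and $P(B|\overline{A})$ stay at .7 and .25 from one day to the next; `update` is a hypothetical helper name.

```python
def update(prior: float, p_b_given_a: float = 0.70, p_b_given_not_a: float = 0.25) -> float:
    """One Bayesian update: return P(A|B) given the current prior P(A)."""
    evidence = prior * p_b_given_a + (1 - prior) * p_b_given_not_a  # P(B)
    return prior * p_b_given_a / evidence

prior = 0.40
history = []
for pick in (1, 2):
    prior = update(prior)  # today's posterior becomes tomorrow's prior
    history.append(prior)
    print(f"pick {pick}: P(apple | box) = {prior:.2f}")
```

Because the loop carries the unrounded first posterior (about .651) rather than .65, the second value comes out near .839, which still rounds to .84.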

## More Info

https://www.mathsisfun.com/data/bayes-theorem.html

## Grilling

We want to predict whether our neighbor will grill outside today. The prediction is based on several factors listed below: given today's conditions, we want the probability that our neighbor grills. Here are the factors:

**forecast**
* sunny
* overcast
* rainy

**temperature**
* hot
* mild
* cold

**worked this day**
* yes
* no

**grilled**
* yes
* no

We've collected data for 30 days now and arranged our data into the following tables. For example, when it was sunny, there were 10 days that our neighbor grilled and 2 days our neighbor didn't grill.

<pre>

<strong>forecast</strong>


              grilled
          |  yes  |  no
_____________________________

sunny     |   10  |  2
_____________________________

overcast  |   6   |  8
_____________________________

rainy     |   0   |  4
_____________________________

total     |   16  |  14  |  30


<strong>temperature</strong>

              grilled
          |  yes  |  no
_____________________________

hot       |   5   |  7
_____________________________

mild      |   10  |  3
_____________________________

cold      |   1   |  4
_____________________________

total     |   16  |  14  |  30


<strong>worked this day</strong>

              grilled
          |  yes  |  no
_____________________________

yes       |   8   |  12
_____________________________

no        |   8   |  2
_____________________________

total     |   16  |  14  |  30


<strong>grilled</strong>

              grilled
          |  yes  |  no
_____________________________

total     |   16  |  14  |  30

</pre>

Today is sunny, hot, and our neighbor worked. Will our neighbor grill?

Organize the data based on today:

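grilled = yes (16 days)
* Sunny = 10 out of 16 days
* Hot = 5 out of 16 days
* Worked = 8 out of 16 days
* Grilled = 16 out of 30 days

grilled = no (14 days)
* Sunny = 2 out of 14 days
* Hot = 7 out of 14 days
* Worked = 12 out of 14 days
* Did not grill = 14 out of 30 days

Each likelihood is divided by the number of days in its class (16 grilling days or 14 non-grilling days), because naive Bayes needs $P(sunny|grilled)$, the fraction of grilling days that were sunny, and not $P(grilled|sunny)$, which would be 10 out of 12.

Here's another look at Bayes' Theorem:

$P(A|B) = \large{\frac{P(A)P(B|A)}{P(B)}}$<br />
where<br />
$P(B) = (P(A) * P(B|A)) + (P(\overline{A}) * P(B|\overline{A}))$

Our question: What is the probability of grilling given today?

day = sunny, hot, and worked

Some definitions:
* $P(A)$: the prior probability of grilling, 16/30
* $P(\overline{A})$: the prior probability of not grilling, 14/30
* $P(B|A)$: the likelihood of today's conditions on a grilling day
* $P(B|\overline{A})$: the likelihood of today's conditions on a non-grilling day
* $P(B)$: the marginal likelihood (evidence), the probability of a day like today across all the days we observed

Naive Bayes assumes the factors are independent given the class, so each likelihood is the product of the individual factor likelihoods. Here's a solution for $P(B)$:

today = sunny, hot, and worked

P(grilled) * P(today|yes):<br />
16/30 * (10/16 * 5/16 * 8/16) = 0.0521

P(not grilled) * P(today|no):<br />
14/30 * (2/14 * 7/14 * 12/14) = 0.0286

P(B) = 0.0521 + 0.0286 = 0.0807

$\large{\frac{P(grilled) * P(today|yes)}{P(today)}}$ = $\large{\frac{0.0521}{0.0807}}$ is about 65%

$\large{\frac{P(not\ grilled) * P(today|no)}{P(today)}}$ = $\large{\frac{0.0286}{0.0807}}$ is about 35%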

Today is sunny, hot, and our neighbor worked today. Will our neighbor be grilling?
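Here is a minimal naive Bayes sketch that works directly from the count tables above. It uses the class-conditional likelihood $P(value|class)$, i.e. each count divided by the number of days in that class (16 or 14), and assumes the three factors are independent given the class. The dictionary names and the `naive_bayes` function are hypothetical; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Counts copied from the tables above: (grilled = yes, grilled = no)
forecast = {"sunny": (10, 2), "overcast": (6, 8), "rainy": (0, 4)}
temperature = {"hot": (5, 7), "mild": (10, 3), "cold": (1, 4)}
worked = {"yes": (8, 12), "no": (8, 2)}
class_totals = (16, 14)     # days grilled, days not grilled
n_days = sum(class_totals)  # 30

def naive_bayes(day: dict) -> dict:
    """Return P(grilled = yes/no | day) under the naive independence assumption."""
    tables = {"forecast": forecast, "temperature": temperature, "worked": worked}
    scores = {}
    for i, label in enumerate(("yes", "no")):
        score = Fraction(class_totals[i], n_days)  # prior P(class)
        for feature, value in day.items():
            # likelihood P(value | class) = count / days in that class
            score *= Fraction(tables[feature][value][i], class_totals[i])
        scores[label] = score
    evidence = sum(scores.values())  # P(today), the marginal likelihood
    return {label: s / evidence for label, s in scores.items()}

today = {"forecast": "sunny", "temperature": "hot", "worked": "yes"}
posterior = naive_bayes(today)
for label, p in posterior.items():
    print(f"P(grilled = {label} | today) = {float(p):.2f}")
```

Note that the rainy row has a zero count for grilling, so any rainy day gets $P(grilled = yes) = 0$ no matter the other factors; Laplace (add-one) smoothing is the usual remedy, not shown here.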

## Walk or Drive

* https://medium.com/@abhishek.km23/naive-bayes-classifier-calculation-of-prior-likelihood-evidence-posterior-74d7d27eec24
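* 10 data points are labeled walks
* 20 data points are labeled drives
* To classify a new data point $x$ that falls in between the two groups, draw a circle around it; the 4 points inside the circle (3 that walk and 1 that drives) are treated as similar to $x$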

#### p(walks|x)
* prior = p(walks) = 10 / 30
* likelihood = p(x|walks) = 3 / 10
* marginal likelihood = p(x) = 4 / 30
* posterior = (10/30 * 3/10) / (4/30) = .75

#### p(drives|x)
* prior = p(drives) = 20 / 30
* likelihood = p(x|drives) = 1 / 20
* marginal likelihood = p(x) = 4 / 30
* posterior = (20/30 * 1/20) / (4/30) = .25, which matches 1 - .75
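The same walk/drive arithmetic as a short sketch. It assumes the neighborhood counts summarized above (4 similar points: 3 walkers out of 10, 1 driver out of 20, 30 points in all); the `posterior` function and its parameter names are hypothetical.

```python
from fractions import Fraction

def posterior(class_count: int, similar_in_class: int, similar_total: int, n: int) -> Fraction:
    """p(class | x) = p(class) * p(x | class) / p(x) for the neighborhood example."""
    prior = Fraction(class_count, n)                       # e.g. p(walks) = 10/30
    likelihood = Fraction(similar_in_class, class_count)   # e.g. p(x|walks) = 3/10
    marginal = Fraction(similar_total, n)                  # p(x) = 4/30
    return prior * likelihood / marginal

n = 30
p_walks = posterior(class_count=10, similar_in_class=3, similar_total=4, n=n)
p_drives = posterior(class_count=20, similar_in_class=1, similar_total=4, n=n)
print(p_walks, p_drives)  # 3/4 1/4
```

Since both classes share the same marginal likelihood p(x), the larger numerator (prior times likelihood) decides the class; here x is classified as walks.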