# Naive Bayes

### Introduction

To estimate the probability that people like a movie, Netflix can simply count. If $n(E)$ is the number of viewers who liked the movie out of $n$ viewers, then:

$P(E) = \displaystyle{\lim_{n \to \infty}} \frac{n(E)}{n}$

In practice, though, Netflix puts this in context by conditioning on what it already knows, using conditional probability:

$P(E|F) = \frac{P(EF)}{P(F)}$
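
As a rough sketch of the two formulas above, we can estimate $P(E)$ and $P(E|F)$ by counting. The records below, the field names `liked` and `watches_comedy`, and the choice of $F$ as "usually watches comedies" are all hypothetical and made up purely for illustration. With a finite sample, the counts only approximate the limit in the first formula.

```python
# Hypothetical viewing records (made up for illustration only).
# E = viewer liked the movie, F = viewer usually watches comedies.
records = [
    {"liked": True,  "watches_comedy": True},
    {"liked": True,  "watches_comedy": True},
    {"liked": False, "watches_comedy": True},
    {"liked": True,  "watches_comedy": False},
    {"liked": False, "watches_comedy": False},
    {"liked": False, "watches_comedy": False},
    {"liked": False, "watches_comedy": False},
    {"liked": False, "watches_comedy": False},
]

n = len(records)
n_E = sum(r["liked"] for r in records)                          # count of E
n_F = sum(r["watches_comedy"] for r in records)                 # count of F
n_EF = sum(r["liked"] and r["watches_comedy"] for r in records) # count of E and F

p_E = n_E / n              # P(E)  = n(E) / n
p_F = n_F / n              # P(F)
p_EF = n_EF / n            # P(EF)
p_E_given_F = p_EF / p_F   # P(E|F) = P(EF) / P(F)

print(f"P(E) = {p_E:.3f}, P(E|F) = {p_E_given_F:.3f}")
```

In this toy sample, conditioning on $F$ raises the estimate from 3 out of 8 to 2 out of 3.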

### Background for Bayes

Let's bring our conversation about probability into the context of classification metrics.  Let's say that, in the diagram below, the circle F is all of our predicted positive observations, and the circle E represents all of our actual positive observations.

* $F$: Predicted positive
* $E$: Actual positive 

Let's say that we want to calculate our precision.  Remember that $precision = \frac{TP}{TP + FP}.$

<img src="./conditional-probability.png" width="60%">

In other words, precision is the smaller wedge (the overlap of E and F) divided by the full circle F.

$Precision = \frac{P(TP)}{P(\text{Predicted Positive})} = \frac{P(FE)}{P(F)}$

And of course, we can break $F$ down into our true positives and false positives:

$Precision = \frac{P(TP)}{P(TP) + P(FP)} = \frac{P(FE)}{P(FE) + P(FE^c)}$

And just to state what precision is in terms of probability: it is the probability that an observation is actually positive, given that we predicted positive.

$Precision = P(E|F) = \frac{P(TP)}{P(TP) + P(FP)} = \frac{P(FE)}{P(FE) + P(FE^c)}$
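
To check that the count-based and probability-based definitions of precision agree, here is a small sketch. The confusion-matrix counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical confusion-matrix counts (made up for illustration).
tp, fp, fn, tn = 30, 10, 20, 40
n = tp + fp + fn + tn

p_FE = tp / n        # P(FE): predicted positive and actually positive
p_FEc = fp / n       # P(FE^c): predicted positive but actually negative
p_F = p_FE + p_FEc   # P(F): predicted positive

precision_from_counts = tp / (tp + fp)   # TP / (TP + FP)
precision_from_probs = p_FE / p_F        # P(FE) / P(F) = P(E|F)

# Both routes give the same value, up to floating-point rounding.
print(math.isclose(precision_from_counts, precision_from_probs))
```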

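Once here, we can apply the chain rule (also called the product rule), which comes from rearranging the definition of conditional probability.  That is, we can replace: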
* $P(FE) = P(F|E)*P(E)$ and 
* $P(FE^c) = P(F|E^c)*P(E^c)$.

So this gives us the following:

$P(E|F) = \frac{P(TP)}{P(TP) + P(FP)} = \frac{P(FE)}{P(FE) + P(FE^c)} = \frac{P(F|E)*P(E)}{P(F|E)*P(E) + P(F|E^c)*P(E^c)}$
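
The final expression can be sketched as a small function. The name `bayes_posterior` and the confusion-matrix counts are assumptions made for illustration. The check at the end confirms that the formula recovers the precision we would get by counting directly.

```python
import math

def bayes_posterior(p_f_given_e, p_e, p_f_given_not_e):
    """Return P(E|F) from P(F|E), P(E), and P(F|E^c)."""
    numerator = p_f_given_e * p_e
    denominator = numerator + p_f_given_not_e * (1 - p_e)
    return numerator / denominator

# Hypothetical confusion-matrix counts (made up for illustration).
tp, fp, fn, tn = 30, 10, 20, 40
n = tp + fp + fn + tn

p_e = (tp + fn) / n               # P(E): actually positive
p_f_given_e = tp / (tp + fn)      # P(F|E): recall
p_f_given_not_e = fp / (fp + tn)  # P(F|E^c): false positive rate

posterior = bayes_posterior(p_f_given_e, p_e, p_f_given_not_e)
direct_precision = tp / (tp + fp)

print(math.isclose(posterior, direct_precision))
```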

This is called Bayes' Theorem.  It is important because we will often know one conditional probability, such as $P(E|F)$ (precision), and want the reverse, such as $P(F|E)$ (recall), or vice versa.
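
As a minimal sketch of going from precision to recall, we can use the equivalent form $P(F|E) = \frac{P(E|F)*P(F)}{P(E)}$, which follows from the definition of conditional probability. The helper name `flip_conditional` and the three input values are hypothetical.

```python
import math

def flip_conditional(p_a_given_b, p_b, p_a):
    """Return P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Hypothetical values (made up for illustration).
precision = 0.75             # P(E|F): actual positive given predicted positive
p_predicted_positive = 0.40  # P(F)
p_actual_positive = 0.50     # P(E)

# Recall is P(F|E): predicted positive given actual positive.
recall = flip_conditional(precision, p_predicted_positive, p_actual_positive)
print(round(recall, 2))
```

Note that we need both base rates, $P(F)$ and $P(E)$, to make the conversion. Precision alone is not enough to recover recall.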

### Working through an example

Now let's see some numbers.  Let's again define the following:

* $F$: Predicted positive
* $E$: Actual positive 

$P(F | E) = P(\text{Predicted positive} | \text{Actual positive})$

### HIV Testing Example

> The idea is that new evidence does not determine beliefs in a vacuum, but rather should update prior beliefs.

<img src="./bayes-3blue.png" width="30%">

Use Bayes' theorem when we have some *hypothesis*, observe some evidence, and then want the probability that the hypothesis is true given that evidence.

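* $P(H)$: the prior
> Belief in the hypothesis before seeing the evidence.

* $P(E|H)$: the likelihood
> Probability of the evidence given a specific hypothesis (e.g. the probability of being shy given librarian).

* $P(E | H^c)$
> Probability of the evidence given that the hypothesis is not true.

* $P(H|E)$: the posterior
> Belief after seeing the evidence.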

A test is 98% effective at detecting HIV and has a false positive rate of 1%.  In addition, 0.5% of the population has HIV.  What is the probability that we have HIV if we test positive?

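* Let E = you test positive for HIV with the test.
* Let F = you actually have HIV.

Note that the letters are swapped relative to the classification sections above, where $F$ was the prediction and $E$ was the actual condition.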

> False positive rate is defined $\frac{\text{false positive}}{\text{condition negative}}$.

* $P(F) = .005$
* $P(E|F^c) = .01$ 
* $P(F^c) = .995$
* $P(E|F) = .98$ (Recall)

$P(F|E) = \frac{\text{True Positive}}{\text{TP} + \text{FP}}$

$P(F|E) = \frac{.98*.005}{.98*.005 + .01*.995} \approx .33$

In [3]:
.98*.005/(0.0049 + 0.00995)

0.32996632996632996

In [2]:
.01*.995

0.00995
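
Another way to see the same result is to count people instead of multiplying probabilities. The sketch below imagines a population of 100,000 (a hypothetical, round number chosen for convenience) and applies the rates from the problem statement. The variable names are ours.

```python
# Rates from the problem statement.
p_hiv = 0.005               # P(F): prevalence
sensitivity = 0.98          # P(E|F): test positive given HIV
false_positive_rate = 0.01  # P(E|F^c): test positive given no HIV

# Hypothetical population size, chosen only to get round counts.
population = 100_000

have_hiv = population * p_hiv                   # about 500 people
no_hiv = population - have_hiv                  # about 99,500 people

true_positives = have_hiv * sensitivity         # about 490 people
false_positives = no_hiv * false_positive_rate  # about 995 people

# Of everyone who tests positive, what share actually has HIV?
p_hiv_given_positive = true_positives / (true_positives + false_positives)
print(round(p_hiv_given_positive, 4))  # about 0.33
```

Even with a fairly accurate test, the false positives from the large HIV-negative group outnumber the true positives, which is why the answer is only about one in three.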

### Intuition of why it works

<img src="./bayes-intuition.png">

[Bayes Video](https://youtu.be/wB0z0nQebNc?list=PLcmJYc2muOR9H96hGlUBV2DkviVZFmHAh&t=3768)

* General idea with Bayes:
    * Conditional probability: we have a belief, and we update that belief given more information
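
To make "update the belief" concrete, here is a sketch where the posterior from one positive test becomes the prior for a second positive test. This assumes the two test results are independent given the person's true HIV status, which is an assumption we are adding and not something stated in the problem. The function name `update` is also ours.

```python
def update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return the posterior P(H|E) after seeing evidence E once."""
    numerator = p_evidence_given_h * prior
    return numerator / (numerator + p_evidence_given_not_h * (1 - prior))

sensitivity = 0.98          # P(E|H): positive test given HIV
false_positive_rate = 0.01  # P(E|H^c): positive test given no HIV

belief = 0.005              # prior: prevalence in the population
history = [belief]

# Two positive tests in a row (assumed conditionally independent).
for _ in range(2):
    belief = update(belief, sensitivity, false_positive_rate)
    history.append(belief)

for step, b in enumerate(history):
    print(f"after {step} positive test(s): {b:.4f}")
```

Under that independence assumption, the belief moves from 0.5% to roughly 33% after one positive test, and to roughly 98% after a second.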

<img src="./conf-bayes.png" width="70%">

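The same approach gives the probability that we have HIV given a negative test, using $P(E^c|F) = 1 - .98 = .02$ and $P(E^c|F^c) = 1 - .01 = .99$:

$P(F|E^c) = \frac{P(E^c | F) P(F)}{P(E^c|F)P(F) + P(E^c | F^c)P(F^c)} \approx .0001$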

In [4]:
(.02*.005)/((.02*.005) + (.99*.995))
# 0.00010150738466223418

0.00010150738466223418
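
The same calculation can be written with named quantities, so it is clearer where .02 and .99 come from. This is just a sketch, and the variable names are ours.

```python
# Rates from the problem statement.
p_hiv = 0.005               # P(F)
sensitivity = 0.98          # P(E|F)
false_positive_rate = 0.01  # P(E|F^c)

# Complements: probability of a negative test in each group.
p_neg_given_hiv = 1 - sensitivity             # P(E^c|F), about 0.02
p_neg_given_no_hiv = 1 - false_positive_rate  # P(E^c|F^c) = 0.99

numerator = p_neg_given_hiv * p_hiv
denominator = numerator + p_neg_given_no_hiv * (1 - p_hiv)
p_hiv_given_negative = numerator / denominator  # P(F|E^c)

print(f"{p_hiv_given_negative:.6f}")
```

So a negative test lowers our belief from the prior of .005 to about .0001.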

### Spam problem

<img src="./email-spam.png" width="40%">

### Resources

[ML Mastery Naive Bayes Classifier](https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/)

[Naive Bayes - Python Data Science](https://jakevdp.github.io/PythonDataScienceHandbook/05.05-naive-bayes.html)

[Sklearn Naive Bayes](https://scikit-learn.org/stable/modules/naive_bayes.html)