# Lecture 2: Naive Bayes
***

<img src="files/figs/bayes.jpg" width="1201" height="50">

<!---
![my_image](files/figs/bayes.jpg)
-->


<a id='prob1'></a>

### Problem 1: Bayes' Law and the Monty Hall Problem
***


>Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Monty_open_door.svg/2000px-Monty_open_door.svg.png" width="801" height="400">

**A**: What does your intuition say?  Is it in your best interest to switch? 

**B**: Use Bayes' Rule to compute your probability of winning if you switch and if you do not switch.
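
One way to sanity-check a hand computation for part **B** is a quick Monte Carlo simulation. The sketch below is not part of the original problem; the function names `play` and `win_rate` are made up for illustration, and it assumes the host picks uniformly at random when both unchosen doors hide goats.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    # Assumption: if two such doors exist, the host chooses uniformly at random.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning under a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

The two printed frequencies should land close to the two probabilities you derive with Bayes' Rule.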

<a id='prob2'></a>

### Problem 2: Naive Bayes on Symbols
***

> This problem was adapted from [Naive Bayes and Text Classification I: Introduction and Theory](https://arxiv.org/abs/1410.5329) by Sebastian Raschka

Consider the following training set of 12 symbols which have been labeled as either + or -: 

<br>

<img src="files/figs/shapes.png?raw=true" width="500">

<!---
![](files/figs/shapes.png?raw=true)
-->

Answer the following questions: 


**A**: What are the general features associated with each training example? 

In the next part, we'll use Naive Bayes to classify the following test example: 

<img src="files/figs/bluesquare.png" width="200">

OK, so this symbol actually appears in the training set, but let's pretend that it doesn't.  

The decision rule can be defined as 

>Classify ${\bf x}$ as + if <br>
>$p(+ ~|~ {\bf x} = [blue,~ square]) \geq p(- ~|~ {\bf x} = [blue, ~square])$ <br>
>else classify sample as -
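
To see which quantities this rule depends on, it can help to expand each posterior with Bayes' Rule. The evidence $p({\bf x})$ is the same for both classes, so it drops out of the comparison, and the Naive Bayes assumption (features are conditionally independent given the class) factors the likelihood. For a generic class $c \in \{+, -\}$, a notation introduced here only for compactness:

$$
p(c ~|~ {\bf x} = [blue,~square]) = \frac{p({\bf x} ~|~ c)~p(c)}{p({\bf x})} \propto p(c)~p(blue ~|~ c)~p(square ~|~ c)
$$
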

**B**: What are the Maximum Likelihood Estimates of the priors $p(+)$ and $p(-)$? 
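
As a rough illustration of how the MLE of a prior is just a relative frequency, here is a minimal sketch. The `labels` list is hypothetical and is NOT the 12 symbols in the figure; `mle_priors` is a made-up helper name.

```python
from collections import Counter

# Hypothetical labels for illustration only; NOT the 12 symbols in the figure.
labels = ["+", "+", "-", "+", "-", "+"]

def mle_priors(labels):
    """MLE of p(class): the fraction of training examples carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}

priors = mle_priors(labels)
print(priors)
```

Swapping in the labels you read off the training figure gives the estimates asked for in part **B**.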


**C**: Identify and compute estimates of the class-conditional probabilities required to predict the class of ${\bf x} = [blue,~square]$.

**D**: Using the estimates computed above, compute the **posterior** scores for each label, and find the Naive Bayes prediction of the label for ${\bf x} = [blue,~square]$. 
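
The scoring step in part **D** could be sketched in code roughly as follows. The `train` list is a hypothetical toy set, NOT the data in the figure, and `nb_scores` is a made-up helper name. The scores are unnormalized posteriors, i.e. prior times the product of the class-conditional estimates, which is enough for comparing the two classes.

```python
from collections import Counter

# Hypothetical (color, shape, label) examples; NOT the 12 symbols in the figure.
train = [
    ("blue", "square", "+"),
    ("blue", "circle", "+"),
    ("green", "square", "+"),
    ("green", "circle", "+"),
    ("blue", "square", "-"),
    ("blue", "circle", "-"),
    ("blue", "square", "-"),
]

def nb_scores(train, color, shape):
    """Return p(c) * p(color|c) * p(shape|c) for each class c, using MLE counts."""
    class_counts = Counter(label for _, _, label in train)
    n = len(train)
    scores = {}
    for c, n_c in class_counts.items():
        n_color = sum(1 for col, _, lab in train if lab == c and col == color)
        n_shape = sum(1 for _, shp, lab in train if lab == c and shp == shape)
        scores[c] = (n_c / n) * (n_color / n_c) * (n_shape / n_c)
    return scores

scores = nb_scores(train, "blue", "square")
# Decision rule from the text: predict + when its score is at least as large.
prediction = "+" if scores["+"] >= scores["-"] else "-"
print(scores, prediction)
```

Dividing each score by the sum of both scores would turn them into proper posterior probabilities, but the prediction is unchanged by that normalization.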

**E**: If you haven't already, compute the class-conditional probability scores $\hat{p}({\bf x} = [blue,~square] ~|~ +)$ and $\hat{p}({\bf x} = [blue,~square] ~|~ -)$ under the Naive Bayes assumption.  How can you reconcile these values with the final prediction that would be made? 

<a id='prob3'></a>

### Problem 3: Laplace Smoothing 
***

Consider the same training set from Problem 2, but suppose we see the following test example: 
    
<img src="figs/greencircle.png" width="200">

Before you get too far into trying to predict the label of the green circle, look carefully at the training set.  Notice that there are no green shapes labeled - in the training set, so when we try to compute the class-conditional probability $p(green ~|~ -)$ we'll get a zero probability.  To fix this, you'll implement Laplace smoothing. Notice that this is a little different than the SPAM vs HAM example shown in the video.  We actually have two very different features in shapes and colors. We'll apply Laplace Smoothing to the shape and color class-conditional probabilities separately. 

**A**: What would the general formula for the estimate of $p(shape ~|~ class)$ with Laplace Smoothing look like for the given training set?  What is the *vocabulary* in the shape case?  

**B**: What would the general formula for the estimate of $p(color ~|~ class)$ with Laplace Smoothing look like for the given training set?  What is the *vocabulary* in the color case?  

**C**: Predict the label for the green circle using the Laplace-smoothed class-conditional probability formulas.  Don't forget to apply Laplace Smoothing to the priors as well! 
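
A minimal sketch of add-one (Laplace) smoothing, assuming the standard form in which 1 is added to each count and the vocabulary size is added to the denominator. The helper name `laplace_estimate` and all counts below are hypothetical and are not taken from the figure.

```python
def laplace_estimate(count, total, vocab_size):
    """Add-one estimate: (count + 1) / (total + |V|)."""
    return (count + 1) / (total + vocab_size)

# Hypothetical class with 5 training symbols, none of them green,
# and a color vocabulary of size 2 ({blue, green}).
p_green = laplace_estimate(count=0, total=5, vocab_size=2)
p_blue = laplace_estimate(count=5, total=5, vocab_size=2)

# The same helper smooths a prior: here the "vocabulary" is the set of classes.
# Hypothetical: 5 of 12 training symbols in this class, 2 classes overall.
p_class = laplace_estimate(count=5, total=12, vocab_size=2)

print(p_green, p_blue, p_class)
```

Note that the smoothed estimate for the unseen color is small but nonzero, and the smoothed estimates over the vocabulary still sum to one.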

[[Problem 3 Answers]](#prob3ans)

<br><br>

<a id='prob4'></a>

### Problem 4: Unknown Features
***

Once again consider the training set from Problem 2, but suppose we see the following test example: 
    
<img src="figs/yellowsquare.png" width="200">

OK, this is a weird one.  Up until this point, we've never seen the color *yellow*, and thus don't include it in the color vocabulary.  One way that we could handle this is to add *yellow* to the color vocabulary, and then recompute the class-conditional probabilities with *yellow* included. 

But what happens when the next test example is a *pink* circle (or worse, a triangle)? We'd rather not continue to modify our probability estimates whenever we see a shape or color that we haven't seen before.  One solution is to assume we'll see weird things in the future and combine all of the unseen possibilities into a single UNK feature. If we do this, then our class-conditional probabilities become 

$$
p(feature ~|~ class) = \frac{\#~instances~of~feature~in~class + 1}{\#~total~symbols~in~class + |V| + 1}
$$

where the vocabulary $V$ is the same vocabulary defined by the training set. 
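
The UNK formula above can be sketched as a small helper. The name `unk_estimate` and the counts are hypothetical and are not taken from the figure; an unseen feature value such as *yellow* simply has a count of zero.

```python
def unk_estimate(count, total, vocab_size):
    """Smoothed estimate with one extra UNK slot: (count + 1) / (total + |V| + 1)."""
    return (count + 1) / (total + vocab_size + 1)

# Hypothetical class with 5 training symbols: 3 blue and 2 green,
# so the training vocabulary is {blue, green} and |V| = 2.
p_blue = unk_estimate(count=3, total=5, vocab_size=2)
p_green = unk_estimate(count=2, total=5, vocab_size=2)
p_unk = unk_estimate(count=0, total=5, vocab_size=2)  # any unseen color

print(p_blue, p_green, p_unk)
```

The extra 1 in the denominator reserves probability mass for UNK, so the estimates for the known vocabulary plus UNK still sum to one.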

**A**: Predict the label of the yellow square.  

<a id='prob1ans'></a>

<br><br><br><br>
<br><br><br><br>
<br><br><br><br>
<br><br><br><br>
### Helper Functions 
***

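```python
from IPython.display import HTML, Image

# HTML(...) is kept as the last expression in the cell so that the
# notebook actually renders it and the style override takes effect.
HTML("""
<style>
.MathJax nobr>span.math>span{border-left-width:0 !important};
</style>
""")
```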