# DSCI 6003 4.2 Practicum: Naive Bayes

In this exercise you will implement Naive Bayes classification in Python. 

## Background

- Naive Bayes relies primarily on Bayes' theorem:

  $$p(y|x) = \frac{p(x|y) \times p(y)}{p(x)}$$

  <br>

  where 

  - $p(y|x)$ is the probability of observing a particular label / class given the data (posterior)
  - $p(x|y)$ is the probability of observing the data given a particular label / class (likelihood)
  - $p(y)$ is the probability of observing a particular label / class (prior)
  - $p(x)$ is the probability of observing the data (evidence)

  <br>

- Because $p(x)$ does not depend on the class $y$, it is the same constant for every class we compare, and therefore we can ignore the term and rewrite the formulation for Naive Bayes as:

  $$p(y|x) \propto p(x|y) \times p(y)$$

  <br>
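
As a small illustration of why dropping $p(x)$ is harmless, the sketch below compares the unnormalized scores $p(x|y) \times p(y)$ with the fully normalized posterior. The class names and probabilities are made-up values for illustration only, not part of the exercise data.

```python
# Hypothetical numbers for two classes; not taken from the exercise data.
priors = {"spam": 0.4, "ham": 0.6}          # p(y)
likelihoods = {"spam": 0.05, "ham": 0.01}   # p(x|y) for one fixed observation x

# Unnormalized posterior: p(x|y) * p(y)
unnormalized = {label: likelihoods[label] * priors[label] for label in priors}

# p(x) is the sum of the unnormalized scores over all classes
evidence = sum(unnormalized.values())
posterior = {label: score / evidence for label, score in unnormalized.items()}

# Dividing every score by the same p(x) cannot change which class wins
best_unnormalized = max(unnormalized, key=unnormalized.get)
best_posterior = max(posterior, key=posterior.get)
print(best_unnormalized, best_posterior)
```

Both rankings pick the same class, so for classification it is enough to compare the unnormalized scores.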

- In more concrete terms, Naive Bayes makes the "naive" assumption that the features are conditionally independent given the class, so the likelihood of observing the data is the product of the likelihoods of observing each feature:

  $$p(x|y) = p(x_1|y) \cdot p(x_2|y) \cdot p(x_3|y) \cdots p(x_n|y)$$
  
  <br>
  
- We compute the likelihood from existing data and set the prior based on the class distribution.
- Based on the likelihood and prior, we can then compute the probability of observing a certain class given that we have observed feature $i$ two times and feature $i+1$ three times:

  $$p(y|x) \propto p(x_i|y)^2 \times p(x_{i+1}|y)^3 \times p(y)$$

  <br>

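- Taking the log of the above formulation turns the product into a sum, and the proportionality into an equality up to an additive constant $C$ that does not depend on $y$:

  $$\log p(y|x) = 2\log p(x_i|y) + 3\log p(x_{i+1}|y) + \log p(y) + C$$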
  
  <br>
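
The two-feature example can be checked numerically. This is a minimal sketch with made-up probabilities (the values `0.2`, `0.1`, and `0.5` are assumptions chosen only for illustration); it shows that the log form gives the same score as the product form. The log form is generally preferred because products of many small probabilities can underflow to zero in floating point.

```python
import math

# Hypothetical values, chosen only for illustration.
p_feature_i = 0.2       # p(x_i | y)
p_feature_next = 0.1    # p(x_{i+1} | y)
prior = 0.5             # p(y)

# Product form: feature i seen twice, feature i+1 seen three times
score = p_feature_i ** 2 * p_feature_next ** 3 * prior

# Log form: exponents become multipliers, products become sums
log_score = (2 * math.log(p_feature_i)
             + 3 * math.log(p_feature_next)
             + math.log(prior))

# Exponentiating the log score recovers the product-form score
print(score, math.exp(log_score))
```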
  
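- The general form for computing the log posterior, where $x_i$ is the number of times feature $i$ was observed and $C$ is an additive constant that does not depend on $y$, is:

  $$\log p(y|x) = \sum_{i=1}^n x_i \log p(x_i|y) + \log p(y) + C$$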

  <br>
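
One way the general form might be turned into code is sketched below. The function name `log_posterior_scores`, the class labels, and all probabilities are hypothetical and only illustrate the formula; the exercise may expect a different interface.

```python
import math

def log_posterior_scores(x, feature_probs, class_priors):
    """Return {label: sum_i x_i * log p(x_i|y) + log p(y)} for each label."""
    scores = {}
    for label, probs in feature_probs.items():
        # Weighted sum of log-likelihoods, one term per feature
        log_likelihood = sum(count * math.log(p) for count, p in zip(x, probs))
        scores[label] = log_likelihood + math.log(class_priors[label])
    return scores

# Hypothetical inputs: 2 classes, 3 features.
# Each list holds p(x_i | y) for features 1..3 and sums to 1.
feature_probs = {"a": [0.7, 0.2, 0.1], "b": [0.1, 0.3, 0.6]}
class_priors = {"a": 0.5, "b": 0.5}

# Feature counts of the observation to classify
x = [2, 3, 0]

scores = log_posterior_scores(x, feature_probs, class_priors)
predicted = max(scores, key=scores.get)
print(predicted)
```

The predicted class is simply the label with the largest score; the scores themselves are not normalized probabilities.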
  
- To compute the likelihood of observing a certain feature given a class, $p(x_i|y)$:

  $$p(x_i|y) = \frac{S_{y,i} + \alpha}{S_y + \alpha p}$$
  
  where 
  - $p$ is the number of features
  - $\alpha$ is a smoothing term that prevents zero probabilities (and therefore an undefined $\log 0$) for features never observed in a class; it is usually set to 1
  - $S_{y,i}$ is the sum of the values of the $i^{th}$ feature over all the datapoints in class $y$
  - $S_y$ is the sum of the values of all the features over all the datapoints in class $y$
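
The smoothed likelihood and the class prior can be estimated from count data roughly as follows. This is a minimal sketch, not a reference solution: the function name `fit_naive_bayes`, the toy matrix `X`, and the labels `y` are assumptions made up for illustration.

```python
from collections import Counter

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate class priors and smoothed feature likelihoods from count data."""
    n_features = len(X[0])  # p in the formula above
    class_counts = Counter(y)

    # Prior: fraction of datapoints that belong to each class
    priors = {label: n / len(y) for label, n in class_counts.items()}

    likelihoods = {}
    for label in class_counts:
        rows = [row for row, row_label in zip(X, y) if row_label == label]
        # S_{y,i}: total of feature i over the datapoints in this class
        feature_sums = [sum(row[i] for row in rows) for i in range(n_features)]
        # S_y: total of all features over the datapoints in this class
        total = sum(feature_sums)
        likelihoods[label] = [
            (s + alpha) / (total + alpha * n_features) for s in feature_sums
        ]
    return priors, likelihoods

# Hypothetical toy data: 3 datapoints, 3 count features, 2 classes
X = [[2, 1, 0], [1, 0, 0], [0, 1, 3]]
y = ["a", "a", "b"]

priors, likelihoods = fit_naive_bayes(X, y)
print(priors)
print(likelihoods)
```

With $\alpha = 1$, the third feature in class `a` gets probability $1/7$ rather than $0$ even though it was never observed there, and the likelihoods within each class still sum to 1.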