<br>
<br>
<br>
<br>

# DAV 6150 Module 10: Naive Bayes Classifiers
<br>
<br>
<br>

# Naive Bayes Classifiers Explained

__Naive Bayes Classifiers__ are __supervised learning algorithms__ that assume that __all features of an observation are conditionally independent of one another, given the observation's class__.


Naive Bayes Classifiers use __Bayes Theorem__ to estimate the probability that a previously unseen observation belongs to each possible class, and then assign the observation to the most probable class.


Bayes Theorem allows us to answer the question: __"How much should you trust your evidence?"__

## Bayes Theorem

## $ P(A|B) = \frac{P(B|A) P(A)}{P(B)}$

__Explanation__: Find the probability of an event $A$ happening (our __hypothesis__) given that $B$ (our __evidence__) has already occurred.


__P(A|B)__: Represents the __posterior probability__, i.e., our updated estimate of the probability of $A$ after observing that $B$ has occurred


__P(A)__: Represents the __prior probability__, i.e., the probability of $A$ based on all available prior information, before the evidence $B$ is taken into account


__P(B|A)__: Represents the __likelihood__, i.e., the probability of observing the evidence $B$ if $A$ is true


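__P(B)__: Represents the __normalizing constant__ (the marginal probability of the evidence), i.e., the total probability of observing $B$ across all possible hypotheses, which ensures that the posterior probabilities sum (or, for continuous variables, integrate) to $1$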
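A minimal sketch of how the four terms above fit together, using a hypothetical two-hypothesis problem (the prior and likelihood values are invented for illustration, and `posteriors` is a hypothetical helper name). The normalizing constant $P(B)$ is computed by summing $P(B|A)P(A)$ over every hypothesis, which is what makes the resulting posteriors sum to $1$.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes Theorem to every hypothesis.

    priors:      dict mapping hypothesis -> P(A)
    likelihoods: dict mapping hypothesis -> P(B | A) for the observed evidence B
    Returns a dict mapping hypothesis -> P(A | B).
    """
    # Numerator of Bayes Theorem for each hypothesis: P(B | A) * P(A)
    joint = {a: likelihoods[a] * priors[a] for a in priors}
    # Normalizing constant P(B): total probability of the evidence over all hypotheses
    evidence = sum(joint.values())
    return {a: joint[a] / evidence for a in joint}


# Hypothetical numbers: 1% of patients are sick (A = "sick");
# a test flags 90% of sick patients and 5% of healthy ones (B = "positive test").
priors = {"sick": 0.01, "healthy": 0.99}
likelihoods = {"sick": 0.90, "healthy": 0.05}

result = posteriors(priors, likelihoods)
print(round(result["sick"], 3))  # 0.154
```

With these assumed numbers, a positive test raises the probability of being sick from 1% to only about 15%, because the low prior outweighs the fairly accurate test: a concrete answer to "How much should you trust your evidence?"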


__Conditional Independence__: Naive Bayes classifiers (not Bayes Theorem itself) assume that all explanatory variables are __conditionally independent__ of one another given the class, i.e., once the class is known, the value of any given explanatory variable is __NOT__ dependent on the value of any other explanatory variable within a given observation. This means that __each explanatory variable contributes to the classifier's output independently of the others__: given the class, the joint likelihood of an observation's features is simply the product of the individual feature likelihoods.


While this __simplifying assumption__ keeps the Naive Bayes approach computationally simple, it is very often __not representative of the actual content of a given data set__, since explanatory variables are frequently correlated with one another.


Nevertheless, in practice the Naive Bayes approach often performs competitively with more complex types of machine learning models, even when the independence assumption is violated.



## How it Works

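- Each feature is assumed to be conditionally independent of every other feature given the class; the Gaussian variant additionally assumes that each continuous feature is normally distributed within each class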


- The posterior probability of a given classification value is proportional to the prior probability of that class multiplied by the product of the conditional probabilities of each explanatory variable's observed value given that class. See, e.g., the "play golf?" scenario shown in the Module 10 Naive Bayes intro video (https://www.youtube.com/watch?v=CPqOCI0ahss&list=PL_Nji0JOuXg2udXfS6nhK3CkIYLDtHNLp&index=8), which is re-created in tabular format here: https://scienceprog.com/simple-explanation-of-naive-bayes-classifier/. With a single explanatory variable (here, Sunny), this reduces to Bayes Theorem:

$P(\text{Play} | \text{Sunny}) = \frac{P(\text{Sunny} | \text{Play}) \, P(\text{Play})}{P(\text{Sunny})}$
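The counting logic described above can be sketched in plain Python. This is a minimal sketch on a small, hypothetical weather table (invented for illustration, not the table from the linked video or article), and the function name `naive_bayes_posterior` is also hypothetical. With only the `outlook` feature it reproduces the single-feature formula above; adding `windy` shows the naive product of per-feature probabilities.

```python
from collections import Counter


def naive_bayes_posterior(rows, labels, query):
    """Posterior P(class | query) for categorical features under the naive assumption.

    rows:   list of dicts mapping feature name -> categorical value (training data)
    labels: list of class labels, one per row
    query:  dict mapping feature name -> observed value for a new observation
    """
    n = len(labels)
    class_counts = Counter(labels)
    scores = {}
    for c, c_count in class_counts.items():
        score = c_count / n  # prior P(class)
        for feature, value in query.items():
            # Count training rows of class c whose feature has the observed value
            matches = sum(1 for r, y in zip(rows, labels) if y == c and r[feature] == value)
            score *= matches / c_count  # P(feature value | class), estimated by counting
        scores[c] = score
    # Dividing by the total plays the role of P(evidence); assumes at least one non-zero score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical 8-day weather table (invented; not the table from the linked video/article)
rows = [
    {"outlook": "Sunny", "windy": "No"},
    {"outlook": "Sunny", "windy": "Yes"},
    {"outlook": "Sunny", "windy": "No"},
    {"outlook": "Overcast", "windy": "Yes"},
    {"outlook": "Rainy", "windy": "Yes"},
    {"outlook": "Rainy", "windy": "No"},
    {"outlook": "Rainy", "windy": "Yes"},
    {"outlook": "Sunny", "windy": "Yes"},
]
labels = ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"]

# One feature: identical to P(Sunny | Play) * P(Play) / P(Sunny) = (3/5 * 5/8) / (4/8)
print(round(naive_bayes_posterior(rows, labels, {"outlook": "Sunny"})["Yes"], 3))  # 0.75

# Two features: the naive product P(Play) * P(Sunny | Play) * P(Windy | Play), normalized
print(round(naive_bayes_posterior(rows, labels, {"outlook": "Sunny", "windy": "Yes"})["Yes"], 3))  # 0.545
```

Windy days are rarer among the "Yes" rows of this hypothetical table, so adding that evidence lowers the probability of playing from 0.75 to about 0.55.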


## Types of Naive Bayes Classifiers

- __Gaussian__: Used when the explanatory variables are continuous numeric values and follow a normal (a.k.a., "Gaussian") distribution


- __Multinomial__: Mostly used for document classification problems. Uses word frequency counts as the explanatory variables


- __Bernoulli__: Similar to the multinomial algorithm, but the explanatory variables are boolean values instead of frequency counts, e.g., does a word occur within a document?
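For the __Gaussian__ variant, the per-feature probability $P(x_i | \text{class})$ is replaced by a normal probability density whose mean and variance are estimated from that class's training rows. Below is a minimal from-scratch sketch for a single hypothetical continuous feature (temperature), with invented values; scikit-learn's `GaussianNB` follows the same idea but also adds a small `var_smoothing` term to the variances.

```python
import math
import statistics


def gaussian_pdf(x, mean, var):
    """Normal (Gaussian) probability density of x given a mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical temperatures (degrees F) recorded on days of each class
temps = {"Yes": [70.0, 72.0, 68.0, 75.0], "No": [85.0, 88.0, 80.0]}
n_total = sum(len(v) for v in temps.values())
priors = {c: len(v) / n_total for c, v in temps.items()}

x = 74.0  # temperature of a new, unseen day
scores = {}
for c, values in temps.items():
    mean = statistics.mean(values)
    var = statistics.pvariance(values)  # maximum-likelihood (population) variance
    scores[c] = priors[c] * gaussian_pdf(x, mean, var)

total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}
print(max(posterior, key=posterior.get))  # Yes
```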


## Advantages

- Fast and scalable: requires relatively little in the way of CPU + RAM resources + scales linearly with the number of explanatory variables + observations.


- Relatively easy to implement, understand and interpret


- Can be very effective when applied to small data sets that have a relatively large number of features (e.g., images, text, speech data), since the strong conditional independence assumption gives the model low variance (at the cost of higher bias), making it less prone to overfitting.


- Can readily handle missing data values within a given feature


- If the conditional independence assumption holds true for the data, a Naive Bayes model tends to perform as well as or better than other types of classification algorithms (e.g., logistic regression)
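To illustrate the missing-data advantage listed above: because each feature contributes its own factor, a missing value can simply be left out of the product (its conditional probabilities sum to $1$ over all possible values, so omitting the factor is the same as averaging over them). A minimal sketch, assuming the per-class conditional probabilities have already been estimated; the numbers and the helper name `posterior_skip_missing` are hypothetical, and this works at the probability level rather than through any library API.

```python
def posterior_skip_missing(priors, cond_probs, query):
    """Naive Bayes posterior that drops the factor for any missing feature.

    priors:     dict class -> P(class)
    cond_probs: dict class -> dict feature -> dict value -> P(value | class)
    query:      dict feature -> observed value, or None when the value is missing
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for feature, value in query.items():
            if value is None:
                continue  # missing value: its factor sums to 1 over all values, so omit it
            score *= cond_probs[c][feature][value]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical, already-estimated probabilities
priors = {"Yes": 0.6, "No": 0.4}
cond_probs = {
    "Yes": {"outlook": {"Sunny": 0.5, "Rainy": 0.5}, "windy": {"Yes": 0.25, "No": 0.75}},
    "No": {"outlook": {"Sunny": 0.25, "Rainy": 0.75}, "windy": {"Yes": 0.5, "No": 0.5}},
}

# "windy" is missing for this observation, so only the outlook factor is used
print(round(posterior_skip_missing(priors, cond_probs, {"outlook": "Sunny", "windy": None})["Yes"], 3))  # 0.75
```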


## Disadvantages

- The conditional independence assumption prevents the model from capturing any interactions between explanatory variables


- Gaussian Naive Bayes can perform poorly if numeric explanatory variables do not actually follow a normal distribution within each class


- Can easily produce an ineffective model if the training data is skewed / not sufficiently representative of the class distributions found within the overall population / data set.


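- The __Zero Frequency__ problem: If a categorical feature value in the testing data set was never observed together with a given class in the training data set, its estimated conditional probability for that class is $0$, which forces the entire product, and therefore the posterior probability for that class, to $0$ regardless of the other features. This is typically addressed with additive (Laplace) smoothing.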
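A zero count for a feature value within a class makes its estimated probability $0$, which wipes out the whole product for that class. The standard remedy is additive (Laplace) smoothing: add a pseudo-count $\alpha$ to every count so that no estimate is exactly $0$. A minimal sketch; the helper name `smoothed_likelihood` and the counts are hypothetical.

```python
def smoothed_likelihood(count, class_count, n_values, alpha=1.0):
    """Additive (Laplace) estimate of P(feature value | class).

    count:       training rows of this class that have the feature value
    class_count: training rows of this class
    n_values:    number of distinct values the feature can take
    alpha:       smoothing strength; alpha=0 gives the raw (unsmoothed) frequency
    """
    return (count + alpha) / (class_count + alpha * n_values)


# Hypothetical: "Overcast" never appeared among 3 "No" training rows,
# and the outlook feature can take 3 values (Sunny, Overcast, Rainy).
print(smoothed_likelihood(0, 3, 3, alpha=0.0))  # 0.0, zeroes out the whole product
print(smoothed_likelihood(0, 3, 3, alpha=1.0))  # 1/6: small but non-zero
```

scikit-learn's `MultinomialNB`, `BernoulliNB` and `CategoricalNB` expose this pseudo-count as their `alpha` parameter (default `1.0`).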


## Common Applications

- Medical diagnostics


- Text classification, e.g., sentiment analysis (i.e., classifying text according to the “emotional tone” of its content) and spam filtering


- Recommender Systems (e.g., Amazon, Google Search, Netflix, etc.)


- Real-time predictions


- Multi-class predictions


## How to Implement a Naive Bayes Classifier in Python

The __scikit-learn__ library includes pre-built classes for each of the types of Naive Bayes classifiers mentioned above:

- __Gaussian__: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html


- __Multinomial__: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html


- __Bernoulli__: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html
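A minimal sketch of fitting each of the three scikit-learn classes on the kind of data it expects. The toy arrays are hypothetical, chosen only so that each model has an obvious answer; `fit`, `predict` and the `alpha` parameter are part of the documented scikit-learn API.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy data: 6 observations, 2 classes (0 and 1)
y = np.array([0, 0, 0, 1, 1, 1])

# Gaussian: continuous numeric features
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.1, 4.0], [3.0, 4.2], [2.9, 3.8]])
gnb = GaussianNB().fit(X_cont, y)

# Multinomial: non-negative counts (e.g., word frequencies per document)
X_counts = np.array([[3, 0, 1], [2, 0, 0], [4, 1, 0], [0, 3, 2], [0, 4, 1], [1, 2, 3]])
mnb = MultinomialNB(alpha=1.0).fit(X_counts, y)

# Bernoulli: binary features (e.g., does a word occur within a document?)
X_bin = (X_counts > 0).astype(int)
bnb = BernoulliNB(alpha=1.0).fit(X_bin, y)

print(gnb.predict([[1.1, 2.0]]))  # [0]
print(mnb.predict([[0, 2, 1]]))   # [1]
print(bnb.predict([[0, 1, 1]]))   # [1]
```

Each fitted model also provides `predict_proba`, which returns the normalized posterior probability of every class rather than only the most probable one.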


## Sentiment Analysis Using Naive Bayes

The Module 10 assigned readings include an example of sentiment analysis on movie reviews that applies Naive Bayes concepts via the Python NLTK library: http://blog.chapagain.com.np/python-nltk-sentiment-analysis-on-movie-reviews-natural-language-processing-nlp/

An additional example is provided in the supplemental __Sentiment Analysis__ notebook available within Canvas.
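The linked NLTK example is not reproduced here. As an alternative sketch that swaps in scikit-learn rather than NLTK, the pipeline below converts text into word-frequency counts with `CountVectorizer` and classifies them with `MultinomialNB`. The six review snippets and their labels are hypothetical, invented for illustration rather than taken from a real movie-review corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled snippets, invented for illustration
reviews = [
    "a wonderful, moving film with great acting",
    "great story and wonderful characters",
    "I loved it, truly great",
    "a boring, predictable mess",
    "terrible acting and a boring plot",
    "I hated it, truly terrible",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# CountVectorizer turns each review into word-frequency counts; MultinomialNB models those counts
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)

print(model.predict(["what a great, wonderful film", "boring and terrible"]))  # ['pos' 'neg']
```

Words never seen during training (such as "what") are simply ignored by `CountVectorizer`, and `MultinomialNB`'s default smoothing keeps unseen word/class combinations from zeroing out a prediction.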