# Introduction to Naive Bayes 
This notebook covers:
* What is the Naive Bayes algorithm?
* What are the pros and cons of Naive Bayes?
* Which applications can use Naive Bayes?
* Naive Bayes implementation with scikit-learn.
* Tips to improve a Naive Bayes model.

### 1. What is the Naive Bayes algorithm?
It is a classification algorithm based on [Bayes’ Theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem) with an assumption of independence among predictors.
In simple terms, a Naive Bayes classifier assumes that, within a class, the presence of a particular feature is unrelated to the presence of any other feature ([Independence Rule](https://www.mathsisfun.com/data/probability-events-independent.html)), which is why it is called "naive".

A Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, Naive Bayes can be competitive with, and on some problems outperform, far more sophisticated classification methods.

Bayes' theorem provides a way of calculating the posterior probability P(c|x) from the prior P(c), the evidence P(x), and the likelihood P(x|c). Look at the equation below:

<img src="asset/Bayes_Theorm.png" width="600" height="600" style="float: left;">
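
As a small worked example of the theorem, the sketch below computes P(c|x) = P(x|c) P(c) / P(x) for two classes. All numbers, class names, and the helper `posterior` are hypothetical and chosen only for illustration. When several features are passed in, their likelihoods are simply multiplied, which is the "naive" independence assumption.

```python
# Hypothetical priors P(c) and per-feature likelihoods P(x_i | c).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}


def posterior(features):
    """Return P(c | features) for every class c using Bayes' theorem."""
    joint = {}
    for c, prior in priors.items():
        p = prior
        for f in features:
            # naive assumption: features are independent given the class
            p *= likelihoods[c][f]
        joint[c] = p  # P(x | c) * P(c)
    evidence = sum(joint.values())  # P(x)
    return {c: p / evidence for c, p in joint.items()}


post = posterior(["free"])
print(post)  # spam is about 0.769, ham is about 0.231
```

Here the evidence P(x) only rescales the scores so that they sum to 1, so the predicted class is the one with the largest P(x|c) P(c).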


### 2. What are the pros and cons of Naive Bayes?
**Pros:**
* It is **easy and fast** to predict the class of a test data set. It also performs well in multi-class prediction.

* When the assumption of independence holds, a Naive Bayes classifier can perform better than other models such as logistic regression, and it needs less training data.

* It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).

**Cons:**
* If a categorical variable has a category in the test data set that was not observed in the training data set, the model assigns it a probability of 0 (zero) and is unable to make a meaningful prediction. This is often known as “Zero Frequency”. To solve this, we can use a smoothing technique. One of the simplest smoothing techniques is called Laplace estimation.

* Naive Bayes is also known as a **bad estimator** of probabilities, so the outputs of `predict_proba` should not be taken too seriously.

* Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.

* For text-related tasks, it **ignores context and semantics**; it predicts based only on the occurrence of words.
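
To make the zero-frequency problem and its fix concrete, here is a minimal sketch of additive (Laplace) smoothing for one categorical feature within one class. The counts and the helper name `smoothed_likelihoods` are made up for illustration. scikit-learn exposes the same idea through the `alpha` parameter of `MultinomialNB`, `BernoulliNB`, and `CategoricalNB`.

```python
import numpy as np


def smoothed_likelihoods(counts, alpha=1.0):
    """Estimate P(category | class) from counts with additive smoothing.

    alpha=0 gives the raw frequencies; alpha=1 is Laplace smoothing.
    """
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)


# Hypothetical counts of three categories inside one class;
# the second category was never observed during training.
counts = [3, 0, 1]

raw = smoothed_likelihoods(counts, alpha=0.0)
laplace = smoothed_likelihoods(counts, alpha=1.0)

print(raw)      # the unseen category gets probability 0
print(laplace)  # every category gets a small non-zero probability
```

Because Naive Bayes multiplies the per-feature likelihoods, a single zero wipes out the whole product for that class; smoothing keeps every factor strictly positive.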


### 3. Which applications can use Naive Bayes?

* **Real time Prediction**: Naive Bayes is an eager learning classifier and it is **fast**. Thus, it can be used for making predictions in real time.

* **Multi class Prediction**: This algorithm is also well known for multi-class prediction. It can predict the probability of each class of the target variable.

* **Text classification / Spam Filtering / Sentiment Analysis**: Naive Bayes classifiers are widely used in text classification, where they handle multi-class problems well and the independence assumption works reasonably in practice, and they often achieve results competitive with other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiments).

* **Recommendation System**: A Naive Bayes classifier and [Collaborative Filtering](https://en.wikipedia.org/wiki/Collaborative_filtering) can be combined to build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource.
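
The spam filtering use case can be sketched in a few lines. The example below is only an illustration: the six training messages and their labels are invented, and a real filter would need far more data. It combines `CountVectorizer` (word counts) with `MultinomialNB` in a scikit-learn pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, invented for this example.
messages = [
    "win free money now",
    "free prize claim now",
    "claim your free money",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
    "project meeting notes attached",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer turns each message into word counts;
# MultinomialNB models those counts per class (alpha=1.0 is Laplace smoothing).
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
spam_filter.fit(messages, labels)

new_messages = ["free money prize", "team meeting tomorrow"]
preds = spam_filter.predict(new_messages)
print(list(zip(new_messages, preds)))
```

Note that the model only sees which words occur and how often, not their order, which is the "ignores context" limitation listed among the cons.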

### 4. Naive Bayes implementation with scikit-learn.
[scikit-learn](http://scikit-learn.org/stable/modules/naive_bayes.html) (a Python library) can be used to build a Naive Bayes model in Python. This notebook looks at three of the Naive Bayes models that scikit-learn provides:

* **Gaussian**: It is used for classification with continuous features and assumes that **features follow a normal distribution**.

* **Multinomial**: It is used for discrete counts. For example, in a text classification problem, instead of recording only whether a word occurs in the document (a Bernoulli trial), we count how often the word occurs in the document. You can think of it as “number of times outcome number x_i is observed over the n trials”.

* **Bernoulli**: The Bernoulli model is useful if your feature vectors are binary (i.e. zeros and ones). One application would be text classification with a ‘bag of words’ model where the 1s and 0s are “word occurs in the document” and “word does not occur in the document” respectively.
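
As a rough illustration of which input each variant expects, the sketch below fits all three on tiny made-up arrays (the data is hypothetical). `MultinomialNB` uses the counts directly, `BernoulliNB` thresholds them to 0/1 through its `binarize` parameter (default `0.0`), and `GaussianNB` is fitted on continuous values.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Hypothetical word-count features (rows: documents, columns: words).
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 2, 1], [0, 3, 0]])

# Multinomial: models how often each word occurs.
mnb = MultinomialNB().fit(X_counts, y)

# Bernoulli: models only whether each word occurs;
# counts greater than binarize=0.0 are treated as 1.
bnb = BernoulliNB(binarize=0.0).fit(X_counts, y)

# Hypothetical continuous measurements for the Gaussian variant.
X_cont = np.array([[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.9]])
gnb = GaussianNB().fit(X_cont, y)

mnb_pred = mnb.predict([[1, 0, 0]])
bnb_pred = bnb.predict([[1, 0, 0]])
gnb_pred = gnb.predict([[5.0, 3.3]])
print(mnb_pred, bnb_pred, gnb_pred)
```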

Based on your data set, you can choose any of the models discussed above. Below is an example of the Gaussian model with the [iris dataset](http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html).

In the implementation below, we use K-Fold [Cross-Validation](http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation) to evaluate model performance.

**Golden Rule**: never use your test data for training or model selection; evaluate on it only once, at the end.

<img src="asset/Cross_Validation.png" width="400" height="400" style="float: left;">




In [52]:
# import libs
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import KFold, train_test_split

In [53]:
def BuildGaussianNBModel(train_features, train_labels):
    """Fit a Gaussian Naive Bayes classifier and return the fitted model."""
    gnb = GaussianNB()
    gnb.fit(train_features, train_labels)
    return gnb

In [72]:
# define K-Fold cross-validation
kf = KFold(n_splits=3, shuffle=True, random_state=42)
# load the data and hold out a test set; the rest is used for training and validation
iris = datasets.load_iris()
data = iris.data
label = iris.target
data_train, data_test, label_train, label_test = train_test_split(data, label, test_size=0.25, random_state=42)
for k_fold_num, (train_idxs, validation_idxs) in enumerate(kf.split(data_train), start=1):
    # train the model on this fold's training portion
    train_features = data_train[train_idxs]
    train_labels = label_train[train_idxs]
    model = BuildGaussianNBModel(train_features, train_labels)
    # validate the model on this fold's held-out portion
    validation_features = data_train[validation_idxs]
    validation_labels = label_train[validation_idxs]
    n_validation = validation_features.shape[0]
    preds = model.predict(validation_features)
    accuracy = (validation_labels == preds).sum() / n_validation
    print('K-{0} accuracy: {1}'.format(k_fold_num, accuracy))

K-1 accuracy: 0.9736842105263158
K-2 accuracy: 0.918918918918919
K-3 accuracy: 0.918918918918919
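
The manual loop above can also be written more compactly with scikit-learn's `cross_val_score`. The sketch below is an alternative, not the code that produced the output above; it rebuilds the same split and `KFold` settings so that it runs on its own, and the variable name `scores` is introduced here.

```python
from sklearn import datasets
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
data_train, data_test, label_train, label_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42
)

# same fold definition as in the manual loop
kf = KFold(n_splits=3, shuffle=True, random_state=42)

# fits a fresh GaussianNB on each fold and returns one accuracy per fold
scores = cross_val_score(GaussianNB(), data_train, label_train, cv=kf, scoring="accuracy")
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

Reporting the mean over folds gives a single number for comparing models before the test set is touched.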


In [75]:
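# evaluate once on the held-out test set
# note: `model` is the classifier fitted in the last cross-validation fold
n_test = data_test.shape[0]
preds = model.predict(data_test)
accuracy = (label_test == preds).sum() / n_test
print('test accuracy: {0}'.format(accuracy))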

test accuracy: 1.0
