# What is a classifier?

A classifier is a machine learning model that assigns an object to one of several discrete classes based on its features.

# Principle of Naive Bayes Classifier:

A Naive Bayes classifier is a probabilistic machine learning model used for classification tasks. It is built on Bayes' theorem, combined with the "naive" assumption that the predictors are independent of one another given the class.
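For reference, the theorem and the decision rule that follows from it can be written out as below. This assumes $n$ features $x_1, \dots, x_n$ and a class variable $y$; the symbol $\hat{y}$ for the predicted class is introduced here only for illustration.

$$P(y|x_1,\dots,x_n)=\frac{P(y)\,P(x_1,\dots,x_n|y)}{P(x_1,\dots,x_n)}$$

Under the naive assumption that the features are independent given the class, the likelihood factorises, and because the denominator does not depend on $y$ it can be dropped when comparing classes:

$$P(x_1,\dots,x_n|y)=\prod_{i=1}^{n}P(x_i|y) \quad\Rightarrow\quad \hat{y}=\arg\max_{y}\;P(y)\prod_{i=1}^{n}P(x_i|y)$$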

# Types of Naive Bayes Classifier:

## Multinomial Naive Bayes:
This is mostly used for document classification problems, i.e., deciding whether a document belongs to a category such as sports, politics, or technology. The features/predictors used by the classifier are the frequencies of the words present in the document.
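A minimal sketch of this idea with scikit-learn's `CountVectorizer` and `MultinomialNB` might look as follows. The four-document corpus and its labels are invented for illustration and are not part of this notebook's dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus, invented for illustration only
docs = [
    "the team won the football match",
    "a late goal decided the game",
    "the election results were announced",
    "parliament passed the new law",
]
labels = ["sports", "sports", "politics", "politics"]

# Turn each document into a vector of word counts (the word frequencies)
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)

# Fit multinomial naive Bayes on those counts
model = MultinomialNB()
model.fit(X_counts, labels)

# Classify an unseen document, reusing the vocabulary learned above
new_doc = ["football match goal"]
prediction = model.predict(vectorizer.transform(new_doc))
print(prediction[0])
```

The words in the new document occur only in the sports examples, so the sports class receives the higher score.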


## Bernoulli Naive Bayes:
This is similar to multinomial naive Bayes, but the predictors are boolean variables. The features used to predict the class variable take only the values yes or no, for example whether a word occurs in the text or not.
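As a rough sketch, the same pipeline can be switched to presence/absence features by passing `binary=True` to `CountVectorizer` and using `BernoulliNB`. The tiny spam/ham corpus below is made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Toy messages, invented for illustration only
messages = [
    "win money now",
    "free money offer",
    "meeting at noon",
    "lunch at noon tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# binary=True records only whether a word occurs, not how often
vectorizer = CountVectorizer(binary=True)
X_bool = vectorizer.fit_transform(messages)

# Bernoulli naive Bayes models each word as a yes/no feature
model = BernoulliNB()
model.fit(X_bool, labels)

prediction = model.predict(vectorizer.transform(["free money"]))
print(prediction[0])
```

Unlike the multinomial variant, the Bernoulli model also takes the absence of a word into account as evidence for or against each class.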

## Gaussian Naive Bayes:
When the predictors take continuous rather than discrete values, we assume that, within each class, these values are sampled from a Gaussian distribution.

$$P(x_i|y)=\frac{1}{\sqrt{2\pi \sigma^2_y}} e^{\left(\frac{-\left(x_i-\mu_y\right)^2}{2\sigma_y^2}\right)}$$
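The formula can be evaluated directly with NumPy. The sketch below assumes a single feature and two classes with invented training values; `gaussian_likelihood` is a helper name chosen here, not a library function.

```python
import numpy as np

def gaussian_likelihood(x, mu, var):
    """Evaluate P(x_i | y) for a class with mean mu and variance var."""
    return np.exp(-((x - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Toy one-feature training values per class, invented for illustration
samples = {0: np.array([1.0, 2.0, 3.0]), 1: np.array([6.0, 7.0, 8.0])}
n_total = sum(len(values) for values in samples.values())

x_new = 2.5
scores = {}
for label, values in samples.items():
    prior = len(values) / n_total            # P(y)
    mu, var = values.mean(), values.var()    # per-class mean and variance
    scores[label] = prior * gaussian_likelihood(x_new, mu, var)

# Normalise the scores so that they sum to one (the posteriors)
evidence = sum(scores.values())
posterior = {label: score / evidence for label, score in scores.items()}
predicted = max(posterior, key=posterior.get)
print(predicted, posterior)
```

With more than one feature, the per-feature likelihoods would be multiplied together for each class before applying the prior.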

# Import libraries

In [61]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score

# Importing the Dataset

In [75]:
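# Load the data; the displayed columns are Age, EstimatedSalary and Purchased
dataset = pd.read_csv("Social_Network_ads.csv")
# Features: columns 1 up to (not including) the last; as the X_train output further down shows, this keeps only EstimatedSalary
X = dataset.iloc[:, 1:-1]
# Target: the last column (Purchased)
y = dataset.iloc[:, -1]
dataset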

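| | Age | EstimatedSalary | Purchased |
|---|---|---|---|
| 0 | 19 | 19000 | 0 |
| 1 | 35 | 20000 | 0 |
| 2 | 26 | 43000 | 0 |
| 3 | 27 | 57000 | 0 |
| 4 | 19 | 76000 | 0 |
| ... | ... | ... | ... |
| 395 | 46 | 41000 | 1 |
| 396 | 51 | 23000 | 1 |
| 397 | 50 | 20000 | 1 |
| 398 | 36 | 33000 | 0 |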


# Splitting

In [76]:
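# Hold out 25% of the rows as a test set; no random_state is set, so the split changes between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
X_train.head()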

| | EstimatedSalary |
|---|---|
| 13 | 18000 |
| 47 | 54000 |
| 38 | 72000 |
| 173 | 43000 |
| 326 | 72000 |
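If a repeatable split is wanted, one option is to pass `random_state` (and, optionally, `stratify`) to `train_test_split`. The sketch below uses synthetic stand-in arrays rather than the ads dataset, and the seed value 0 is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the ads dataset): 100 rows, 60/40 class balance
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([0] * 60 + [1] * 40)

def split(seed):
    """Split 75/25 with a fixed seed, keeping the class ratio in both parts."""
    return train_test_split(
        X_demo, y_demo, test_size=0.25, random_state=seed, stratify=y_demo
    )

X_tr, X_te, y_tr, y_te = split(0)
X_tr2, X_te2, y_tr2, y_te2 = split(0)

# The same seed reproduces the same rows in the test set
print(np.array_equal(X_te, X_te2))
print(X_tr.shape, X_te.shape)
```

Fixing the seed makes metrics such as the accuracy comparable from one run to the next.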


# Feature Scaling

In [77]:
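sc = StandardScaler()
# Fit the scaler on the training set only, then reuse its mean and standard deviation on the test set
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)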
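To make the scaling step concrete, the sketch below checks that `StandardScaler` applies z = (x - mean) / std using statistics learned from the training data only. The salary values are illustrative, and the names `train_demo` and `test_demo` are introduced here for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative salary values standing in for a training and a test set
train_demo = np.array([[18000.0], [54000.0], [72000.0], [43000.0]])
test_demo = np.array([[60000.0]])

# Learn the mean and standard deviation from the training data only
scaler = StandardScaler().fit(train_demo)
scaled = scaler.transform(test_demo)

# Manual z-score using the same training statistics (population std, ddof=0)
manual = (test_demo - train_demo.mean(axis=0)) / train_demo.std(axis=0)

print(np.allclose(scaled, manual))
```

Reusing the training statistics on the test set keeps information about the test data from leaking into preprocessing.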

# Training the Naive Bayes Classifier


In [78]:
classifier=GaussianNB()
classifier.fit(X_train,y_train)

GaussianNB()

# Prediction

In [79]:
y_pred=classifier.predict(X_test)
y_pred

array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0], dtype=int64)

# Confusion Matrix and Accuracy

In [80]:
cm=confusion_matrix(y_test,y_pred)
cm

array([[65,  2],
       [17, 16]], dtype=int64)

In [81]:
accuracy_score(y_test,y_pred)

0.81
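The accuracy can be recovered by hand from the confusion matrix, which also gives precision and recall for the positive class. The sketch below copies the matrix shown above and relies on scikit-learn's convention that rows are true labels and columns are predicted labels.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted)
cm = np.array([[65, 2],
               [17, 16]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()   # fraction of all test rows classified correctly
precision = tp / (tp + fp)        # of the predicted purchasers, how many were right
recall = tp / (tp + fn)           # of the actual purchasers, how many were found

print(accuracy, precision, recall)
```

On this particular split the recall for class 1 is 16/33, roughly 0.48, so the accuracy alone hides the fact that about half of the actual purchasers are missed.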

# Applications 

Naive Bayes algorithms are mostly used in sentiment analysis, spam filtering, recommendation systems, etc. They are fast and easy to implement, but their biggest disadvantage is the requirement that the predictors be independent. In most real-life cases the predictors are dependent, which hinders the performance of the classifier.