<h1> Classifying Iris Flower Dataset Using Naive Bayes Classifier </h1>

<h2> Naive Bayes Classifier </h2>

Naive Bayes classifiers are a family of classification algorithms based on Bayes’ Theorem. They all share a common principle: every pair of features is assumed to be conditionally independent of each other, given the class.

The fundamental Naive Bayes assumption is that each feature makes a contribution to the outcome that is:

* Independent: We assume that no pair of features is dependent, given the class.
* Equal: Each feature is given the same weight (or importance). No attribute is treated as irrelevant, and all are assumed to contribute equally to the outcome.
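
Written out, the independence assumption lets the class posterior factor into one term per feature. As a sketch, with $y$ the class and $x_1, \dots, x_n$ the feature values (this notation is introduced here for illustration):

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)} \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

$$
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The denominator is the same for every class, so it can be dropped when choosing the most probable class $\hat{y}$.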

<h2>Pros and Cons of Naive Bayes</h2>
<b>Pros:</b>
<ul>
  <li>The algorithm is fast to train and fast at predicting the class of a test dataset.</li>
  <li>It handles multi-class prediction problems naturally.</li>
  <li>When the feature-independence assumption holds, a Naive Bayes classifier can perform better than other models while needing less training data.</li>
  <li>It performs well with categorical input variables. For numerical variables it has to assume a distribution (typically a normal distribution), which may not hold in practice.</li>
</ul>
<b>Cons:</b>
<ul>
  <li>If a categorical variable in the test dataset has a category that wasn’t present in the training dataset, the Naive Bayes model assigns it zero probability and can’t make a meaningful prediction for it. This is known as the ‘zero frequency’ problem, and it is addressed with a smoothing technique such as Laplace smoothing.</li>
  <li>It assumes that all the features are independent. In real-world data, you’ll rarely find a set of fully independent features.</li>
</ul>
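
The zero-frequency problem and its fix can be illustrated with a few lines of plain Python. This is a minimal sketch of additive (Laplace) smoothing for a single categorical feature; the function name `smoothed_likelihoods` and the colour data are hypothetical and only serve the example.

```python
from collections import Counter

def smoothed_likelihoods(values, categories, alpha=1.0):
    """Estimate P(value | class) for one categorical feature.

    values: feature values observed in the training rows of a single class
    categories: every category the feature can take
    alpha: smoothing strength (alpha=0 gives the raw frequencies)
    """
    counts = Counter(values)
    total = len(values) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}

# Hypothetical training data for one class: "green" never appears.
train_colours = ["red", "red", "blue", "red"]
all_colours = ["red", "blue", "green"]

raw = smoothed_likelihoods(train_colours, all_colours, alpha=0.0)
smooth = smoothed_likelihoods(train_colours, all_colours, alpha=1.0)

print(raw["green"])     # 0.0, which would zero out the whole product of likelihoods
print(smooth["green"])  # 1/7 (about 0.143): small but non-zero
```

With `alpha=1.0` every category gets one extra pseudo-count, so the estimates still sum to 1 and no category ends up with zero probability.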

<h2>What is a Naive Bayes Classifier Used For?</h2>

Naive Bayes classifiers have a wide range of applications but are mostly used for multi-class classification.

<ol>
  <li><em>Real-time Prediction:</em> Naive Bayes is an eager learning classifier and it is fast, so it can be used for making predictions in real time.</li>
  <li><em>Multi-class Prediction:</em> The algorithm is also well known for multi-class prediction: it can predict the probability of each class of the target variable.</li>
  <li><em>Text Classification / Spam Filtering / Sentiment Analysis:</em> Naive Bayes classifiers are widely used in text classification, where they often perform well compared with other algorithms. As a result, they are common in spam filtering and sentiment analysis.</li>
  <li><em>Recommendation System:</em> A Naive Bayes classifier and collaborative filtering can be combined to build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource.</li>
</ol>

<h2>Steps to Build a Naive Bayes Classifier</h2>

<ol>
    <li>Import the Dataset</li>
    <li>Split the Dataset into Training and Testing Values</li>
    <li>Create Naive Bayes Model</li>
    <li>Predict the Output Values</li>
    <li>Check the Error Rate</li>
</ol>

<h3>1. Import the Dataset</h3>

In [3]:
#Importing the IRIS dataset from scikit-learn 

from sklearn.datasets import load_iris

In [4]:
#Load the IRIS dataset into a variable called iris

iris = load_iris()

In [62]:
#X contains all the features (feature matrix: sepal length, sepal width, petal length and petal width) of the IRIS dataset
X = iris.data

#y contains all the target labels (response vector: Setosa, Versicolor, or Virginica) of the IRIS dataset
y = iris.target
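
Before looking at the raw arrays, it can help to check their shapes and the names behind the numeric columns and labels. A short sketch using attributes of the object returned by `load_iris`:

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 flowers, 4 measurements each
print(y.shape)             # (150,): one class label per flower
print(iris.feature_names)  # column order of X
print(iris.target_names)   # entry i is the species name for label i
```

The labels 0, 1 and 2 in `y` correspond to the entries of `iris.target_names` in that order.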

In [63]:
#Display the features
X

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [64]:
#Display the target labels
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

<h3>2. Split the Dataset into Training and Testing Values</h3>

In [65]:
#Split the dataset into testing and training data

from sklearn.model_selection import train_test_split

In [66]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2, random_state=1)
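
With `test_size=0.2`, 30 of the 150 samples are held out for testing. The sketch below checks the resulting shapes and also shows an optional variant, `stratify=y`, which the original split does not use; it keeps the class proportions the same in the training and test parts.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as above: 20% of the 150 rows go to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(Counter(y_test.tolist()))     # class counts depend on the shuffle

# Optional variant: stratify=y preserves the class proportions in both parts.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)
print(Counter(y_test_strat.tolist()))  # 10 samples of each class
```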

<h3>3. Create Naive Bayes Model</h3>

In [67]:
#Import the Gaussian Naive Bayes Classifier

from sklearn.naive_bayes import GaussianNB

In [68]:
#Create and fit the model

gnb = GaussianNB()
gnb.fit(X_train,y_train)

GaussianNB()
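
To make clearer what fitting a Gaussian Naive Bayes model involves, here is a minimal from-scratch sketch in plain Python. It is not scikit-learn's implementation: the function names, the toy data and the small constant added to the variances are assumptions made for illustration. The idea is to estimate a prior, a mean and a variance per class and feature, then pick the class with the highest log prior plus summed log likelihoods.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Return {class: (prior, means, variances)} estimated from the training rows."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    for label, members in by_class.items():
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        # Population variance per feature; a tiny constant avoids division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*members), means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log of the normal density with the given mean and variance, evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, row):
    """Pick the class with the largest log prior + summed log likelihoods."""
    scores = {
        label: math.log(prior)
        + sum(log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
        for label, (prior, means, variances) in model.items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy data: two features, two well-separated classes.
rows = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [4.0, 5.1], [4.2, 4.9], [3.8, 5.0]]
labels = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(rows, labels)
print(predict(model, [1.1, 2.0]))  # 0
print(predict(model, [4.1, 5.0]))  # 1
```

Working with log probabilities instead of multiplying raw densities avoids numerical underflow when there are many features.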

<h3>4. Predict the Output Values</h3>

In [69]:
#Predict the class for test features

y_pred = gnb.predict(X_test)
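
Besides hard class labels, the fitted model can also report a probability for each class through `predict_proba`. The sketch below repeats the split and fit from the previous steps so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
gnb = GaussianNB().fit(X_train, y_train)

# One row per test sample, one column per class, in the order of gnb.classes_.
proba = gnb.predict_proba(X_test)
print(proba.shape)         # (30, 3)
print(proba[:3].round(3))  # each row sums to 1

# predict() reports the most probable class for each sample.
y_pred = gnb.predict(X_test)
print(y_pred[:3])
```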

In [73]:
#Creating a dataframe of the actual and predicted values of the IRIS dataset
import pandas as pd

df= pd.DataFrame({'Actual':y_test,'Predicted':y_pred})
df

   Actual  Predicted
0       0          0
1       1          1
2       1          1
3       0          0
4       2          2
5       1          1
6       2          2
7       0          0
8       0          0
9       2          2


<h3>5. Check the Error Rate</h3>

In [70]:
#Import metrics to check the accuracy and error of the model

from sklearn import metrics

In [71]:
#Testing the accuracy of the model

acc = metrics.accuracy_score(y_test, y_pred)*100

print("Accuracy of the Gaussian Naive Bayes Model is:", acc)

Accuracy of the Gaussian Naive Bayes Model is: 96.66666666666667
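
A single accuracy figure does not show which classes get confused with each other. A confusion matrix gives that breakdown; the sketch below rebuilds the model from the earlier steps so that it runs on its own, then derives the accuracy from the matrix.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

# Rows are the actual classes, columns are the predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Correct predictions sit on the diagonal, so accuracy = diagonal sum / total.
print(cm.trace() / cm.sum())
print(accuracy_score(y_test, y_pred))
```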


In [72]:
#Testing the error of the model
#Note: these are regression metrics; here they treat the class labels 0, 1 and 2 as numbers

print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))

print('Max Error:', metrics.max_error(y_test, y_pred))

Mean Squared Error: 0.03333333333333333
Mean Absolute Error: 0.03333333333333333
Max Error: 1
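
These three values are consistent with a single misclassification among the 30 test samples, where the predicted label differs from the actual label by 1. The arithmetic can be checked with plain Python; the label lists below are hypothetical stand-ins, not the notebook's actual test split.

```python
# Hypothetical labels: 30 test samples, exactly one prediction off by one class index.
y_true = [0] * 10 + [1] * 10 + [2] * 10
y_hat = list(y_true)
y_hat[15] = 2  # a class-1 sample predicted as class 2

n = len(y_true)
errors = [abs(t - p) for t, p in zip(y_true, y_hat)]

accuracy = sum(e == 0 for e in errors) / n
mse = sum(e ** 2 for e in errors) / n
mae = sum(errors) / n

print(accuracy * 100)  # about 96.67
print(mse, mae)        # both equal 1/30, about 0.0333
print(max(errors))     # 1
```

Because the only non-zero error has magnitude 1, squaring it changes nothing, which is why the mean squared error and mean absolute error coincide. For class labels these numbers depend on how the classes happen to be numbered, so accuracy is the more meaningful measure here.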


<h3>Final Outcome</h3>

From the above results, we can see that we have successfully built a Naive Bayes classifier that reaches 96.67% accuracy on the held-out test set, which means it correctly classified 29 of the 30 test samples. This suggests that the classifier can be expected to assign the correct class to a new, unseen sample in roughly 97% of cases.

The dataframe above shows the actual and predicted class values for the test samples of the IRIS Dataset.
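
An accuracy measured on one 30-sample split can vary with the choice of `random_state`. As an optional extra check that is not part of the steps above, k-fold cross-validation averages the accuracy over several splits. A minimal sketch with `cross_val_score`, assuming 5 folds:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: each sample is used for testing exactly once.
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(scores)         # one accuracy value per fold
print(scores.mean())  # average accuracy across the folds
```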