# Naive Bayes Classification

## What is Naive Bayes Classification?

Naive Bayes is a classification technique based on Bayes’ Theorem that assumes the predictors are independent of one another given the class. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of other features, the classifier treats each of them as contributing independently to the probability that the fruit is an apple, which is why it is called ‘Naive’.
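
The independence assumption can be written out explicitly. As a sketch, with $C$ denoting the class and $x_1, \dots, x_n$ the feature values (this notation is assumed here for illustration), Bayes’ Theorem gives

$$
P(C \mid x_1, \dots, x_n) = \frac{P(C)\, P(x_1, \dots, x_n \mid C)}{P(x_1, \dots, x_n)}
$$

and the naive assumption replaces the joint likelihood with a product of per-feature terms:

$$
P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)
$$

The predicted class is the one that maximises the right-hand side. In the fruit example, the terms for "red", "round", and "about 3 inches" would each be estimated separately for the class "apple" and then multiplied together.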

A Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, it can perform competitively with, and on some problems better than, far more sophisticated classification methods.

(Credit - Analytics Vidhya)

## How to Implement Naive Bayes Classification?

### Importing the libraries

In [3]:
# Data processing libraries
import numpy as np
import pandas as pd

# Machine learning library
from sklearn.preprocessing import LabelEncoder  # Encodes categorical labels as integers
from sklearn.metrics import accuracy_score  # Metric for model evaluation
from sklearn.model_selection import train_test_split  # Splits the dataset into train and test sets

from sklearn.naive_bayes import GaussianNB  # Gaussian Naive Bayes classifier

### Getting the data

In [4]:
# Raw string so that backslashes in the Windows path are not treated as escape sequences
iris_dataset = pd.read_csv(r"C:\Users\jagan\OneDrive\Documents\Machine Learning - Projects\Iris\iris_dataset.csv")

In [5]:
iris_dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_length    150 non-null float64
sepal_width     150 non-null float64
petal_length    150 non-null float64
petal_width     150 non-null float64
class           150 non-null object
dtypes: float64(4), object(1)
memory usage: 5.9+ KB


In [6]:
iris_dataset.head()

|   | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


### Converting the class labels into numerical values

In [7]:
labelencoder_species = LabelEncoder()  # Maps each species name to an integer code
iris_dataset['class'] = labelencoder_species.fit_transform(iris_dataset['class'])

In [8]:
iris_dataset.head()  # Iris-setosa - 0 ; Iris-versicolor - 1 ; Iris-virginica - 2

|   | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
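
To make the encoding step less opaque, here is a minimal pure-Python sketch of the idea behind it: each distinct label gets an integer according to its position in sorted order. The helper names `fit_label_mapping`, `encode`, and `decode` are hypothetical and exist only for this illustration; they are not part of scikit-learn.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer, following sorted order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


def encode(labels, mapping):
    """Replace every label with its integer code."""
    return [mapping[label] for label in labels]


def decode(codes, mapping):
    """Turn integer codes back into the original labels."""
    inverse = {index: label for label, index in mapping.items()}
    return [inverse[code] for code in codes]


# Illustrative input only; the notebook encodes the full 'class' column.
species = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]
mapping = fit_label_mapping(species)
codes = encode(species, mapping)
print(mapping)  # {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
print(codes)    # [2, 0, 1, 0]
```

The mapping matches the one noted in the comment of the cell above. On the fitted encoder itself, `labelencoder_species.classes_` lists the labels in code order, and `labelencoder_species.inverse_transform(...)` converts codes back to species names.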


### Getting columns into X (feature variables) and y (target variable)

In [9]:
X = iris_dataset.iloc[:, 0:4].values  # The four measurement columns
y = iris_dataset.iloc[:, 4].values  # The encoded class column

In [10]:
X[1:5]

array([[4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

In [11]:
y[1:5]

array([0, 0, 0, 0], dtype=int64)

### Dividing the dataset into test & train

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
print('There are {} samples in the training set and {} samples in the test set'.format(X_train.shape[0], X_test.shape[0]))

There are 105 samples in the training set and 45 samples in the test set
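
The split above is random, so the rows that land in each set differ from run to run. The following is a rough pure-Python sketch of a shuffled split, written under the assumption that the test count is the ceiling of `test_size` times the number of rows. The function `shuffled_split` and its `seed` parameter are hypothetical names used only here, not the scikit-learn implementation.

```python
import math
import random


def shuffled_split(rows, test_size=0.30, seed=None):
    """Shuffle row indices and cut them into a train part and a test part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # A fixed seed gives a repeatable shuffle
    n_test = math.ceil(test_size * len(rows))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(150))  # Stand-in for the 150 iris rows
train_a, test_a = shuffled_split(rows, test_size=0.30, seed=42)
train_b, test_b = shuffled_split(rows, test_size=0.30, seed=42)
print(len(train_a), len(test_a))  # 105 45
print(train_a == train_b)         # True: same seed, same split
```

In scikit-learn the analogous knob is the `random_state` argument of `train_test_split`. Passing a fixed integer there should make the split, and therefore the accuracy reported further down, repeatable across runs. The `stratify=y` argument can additionally keep the class proportions equal in both sets.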


## Run Naive Bayes Classification

In [13]:
classifier = GaussianNB()  # Gaussian Naive Bayes classifier from the scikit-learn library

In [14]:
classifier.fit(X_train, y_train)  # Fit the classifier on the training set

GaussianNB(priors=None)
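
For intuition about what fitting a Gaussian Naive Bayes model involves, here is a simplified from-scratch sketch. It estimates a prior plus a per-feature mean and variance for each class, then predicts the class with the highest log posterior. It is not scikit-learn's code: the function names, the made-up toy data, and the `var_floor` guard against zero variance are all assumptions made for this illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior plus per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # One tuple of values per feature
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_posterior(model, row):
    """Unnormalised log P(class | row) for every class."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density, added per feature (the naive assumption)
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, rows):
    """Pick the class with the highest log posterior for each row."""
    predictions = []
    for row in rows:
        scores = log_posterior(model, row)
        predictions.append(max(scores, key=scores.get))
    return predictions


# Made-up toy data: two features, two well-separated classes.
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [5.0, 6.1], [5.2, 5.9], [4.8, 6.0]]
y_toy = [0, 0, 0, 1, 1, 1]
toy_model = fit_gaussian_nb(X_toy, y_toy)
print(predict(toy_model, [[1.1, 2.0], [5.1, 6.0]]))  # [0, 1]
```

Working in log space turns the product of per-feature likelihoods into a sum, which avoids numerical underflow when there are many features.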

In [15]:
# Predicting the classes for the test set
y_pred = classifier.predict(X_test)

In [16]:
# Accuracy on the test set
score_test = accuracy_score(y_test, y_pred)
print(score_test)

0.9555555555555556
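
Accuracy is simply the fraction of test samples that were predicted correctly, so a score of 0.9555... on 45 test samples corresponds to 43 correct predictions and 2 mistakes. The sketch below reproduces that arithmetic in plain Python and also counts which classes get confused. The labels are made up for illustration and are not the notebook's actual predictions, and `accuracy` and `confusion_counts` are hypothetical helper names.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


# Made-up labels: 45 samples, 2 of them misclassified.
y_true_demo = [0] * 15 + [1] * 15 + [2] * 15
y_pred_demo = [0] * 15 + [1] * 14 + [2] + [2] * 14 + [1]
print(accuracy(y_true_demo, y_pred_demo))
print(confusion_counts(y_true_demo, y_pred_demo))
```

A single accuracy number hides which classes are being mixed up. For the real predictions, `confusion_matrix` from `sklearn.metrics` gives the same kind of breakdown as `confusion_counts` does here.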


### Naive Bayes Classifier Parameter details
http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html