### Naive Bayes Classification using Scikit-learn

• Suppose you are a product manager and want to classify customer reviews into positive and negative classes.

• Or, as a loan manager, you want to identify which loan applicants are safe and which are risky. As a healthcare analyst, you want to predict which patients are likely to develop diabetes.

• All of these examples pose the same kind of problem: classifying reviews, loan applicants, and patients.

• Naive Bayes is one of the simplest and fastest classification algorithms, which makes it suitable for large volumes of data.

• The Naive Bayes classifier has been used successfully in applications such as spam filtering, text classification, sentiment analysis, and recommender systems.

• It uses Bayes' theorem to predict the class of an unseen instance.

### Classification Workflow

• Whenever you perform classification, the first step is to understand the problem and identify the potential features and the label.

• Features are the characteristics or attributes that affect the value of the label.

• For example, in the case of loan distribution, bank managers identify a customer's occupation, income, age, location, previous loan history, transaction history, and credit score.

• These characteristics are known as features, and they help the model classify customers.

• Classification has two phases: a learning phase and an evaluation phase.

• In the learning phase, the classifier trains its model on a given dataset; in the evaluation phase, the classifier's performance is tested.

• Performance is evaluated using metrics such as accuracy, error, precision, and recall.
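
• As a minimal sketch of how these metrics are computed for a binary classifier, the snippet below counts true/false positives and negatives by hand. The `y_true` and `y_pred` lists are hypothetical values made up for illustration; they do not come from any dataset in this tutorial.

```python
# Hypothetical true labels and model predictions (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

pairs = list(zip(y_true, y_pred))

# Count the four outcomes of a binary classification.
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)   # share of correct predictions
error = 1 - accuracy                # share of wrong predictions
precision = tp / (tp + fp)          # of predicted positives, how many are right
recall = tp / (tp + fn)             # of actual positives, how many are found

print("Accuracy:", accuracy)
print("Error:", error)
print("Precision:", precision)
print("Recall:", recall)
```

• In practice, scikit-learn's `sklearn.metrics` module offers `accuracy_score`, `precision_score`, and `recall_score`, which compute the same quantities.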

### What is the Naive Bayes Classifier?

• Naive Bayes is a statistical classification technique based on Bayes' theorem.

• It is one of the simplest supervised learning algorithms.

• The Naive Bayes classifier is fast and, in many applications, accurate and reliable.

• Naive Bayes classifiers scale well and can achieve high accuracy on large datasets.

• The Naive Bayes classifier assumes that the effect of a particular feature on a class is independent of the other features.

• For example, whether a loan applicant is desirable depends on their income, previous loan and transaction history, age, and location.

• Even if these features are interdependent, they are still treated as independent of one another.

• This assumption simplifies computation, which is why the method is called naive.

• This assumption is called class conditional independence.

• Bayes' theorem states:

                                    P(A|B) = P(B|A) P(A) / P(B)

Where,

    • P(A): the probability of hypothesis A being true (regardless of the data). This is known as the prior probability of A.

    • P(B): the probability of the data B (regardless of the hypothesis). This is known as the evidence, or marginal probability.

    • P(A|B): the probability of hypothesis A given the data B. This is known as the posterior probability.
    
    • P(B|A): the probability of the data B given that hypothesis A is true. This is known as the likelihood.
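
• To connect Bayes' theorem with class conditional independence, take the hypothesis A to be a class y and the data B to be the observed feature values x_1, ..., x_n (these symbols are notation introduced here for illustration). Under the independence assumption, the likelihood factorizes into one term per feature, and because the denominator is the same for every class, the predicted class is the one that maximizes the numerator:

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
\qquad\Longrightarrow\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```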

### How does the Naive Bayes classifier work?

• Let's understand how Naive Bayes works through an example of weather conditions and playing sports.

• You need to calculate the probability of playing sports.

• Now, you need to classify whether players will play or not, based on the weather condition.

• Naive Bayes classifier calculates the probability of an event in the following steps:

    Step 1: Calculate the prior probability of each class label.

    Step 2: Find the likelihood of each attribute value for each class.

    Step 3: Put these values into Bayes' formula and calculate the posterior probability.

    Step 4: See which class has the higher posterior probability; the input is assigned to that class.
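
• The four steps can be sketched by hand for a single attribute (weather), using the 14-row weather/play toy data of this tutorial. The helper name `posterior` is hypothetical and chosen only for this illustration. The sketch uses plain frequency counts with no smoothing, and it assumes the queried weather value appears at least once in the data.

```python
from collections import Counter

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']


def posterior(weather_value):
    """Return P(class | weather_value) for every class (hypothetical helper)."""
    class_counts = Counter(play)
    pair_counts = Counter(zip(weather, play))

    # Step 1: prior probability of each class label.
    priors = {c: class_counts[c] / len(play) for c in class_counts}

    # Step 2: likelihood of the attribute value within each class.
    likelihoods = {c: pair_counts[(weather_value, c)] / class_counts[c]
                   for c in class_counts}

    # Step 3: Bayes' formula; the evidence P(weather_value) normalizes the result.
    unnormalized = {c: likelihoods[c] * priors[c] for c in class_counts}
    evidence = sum(unnormalized.values())
    return {c: unnormalized[c] / evidence for c in class_counts}


post = posterior('Sunny')

# Step 4: pick the class with the higher posterior probability.
prediction = max(post, key=post.get)
print(post)
print("Prediction:", prediction)
```

• For 'Sunny', the posterior works out to 0.4 for 'Yes' and 0.6 for 'No', so this single-attribute sketch predicts 'No'.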

### Classifier Building in Scikit-learn

<h3><B>Naive Bayes Classifier</B></h3>

<h4><B>Defining Dataset</B></h4>

• In this example, you can use a dummy dataset with three columns: weather, temperature, and play.

• The first two (weather and temperature) are the features, and play is the label.

In [4]:
# Assigning features and label variables
weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny',
'Rainy','Sunny','Overcast','Overcast','Rainy']

temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']

play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']

### Encoding Features

• First, you need to convert these string labels into numbers.

• For example, 'Overcast', 'Rainy', and 'Sunny' are encoded as 0, 1, and 2.

• This is known as label encoding.

• Scikit-learn provides the LabelEncoder class, which encodes labels with values between 0 and one less than the number of discrete classes.

In [5]:
# Import the preprocessing module, which provides LabelEncoder
from sklearn import preprocessing

# Create a LabelEncoder
le = preprocessing.LabelEncoder()

# Convert string labels into numbers
weather_encoded = le.fit_transform(weather)
print(weather_encoded)

[2 2 0 1 1 1 0 2 2 1 2 0 0 1]


• Similarly, you can encode the temp and play columns.

In [8]:
# Convert string labels into numbers
# (each fit_transform call refits le, replacing its previous mapping)
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)
print("Temp:", temp_encoded)
print("Play:", label)

Temp: [1 1 1 2 0 0 0 2 0 2 2 2 1 2]
Play: [0 0 1 1 1 0 1 0 1 1 1 1 1 0]
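
• Because a single LabelEncoder is refit for every column, it only remembers the last mapping it learned. One possible variation, sketched below, keeps one encoder per column so that each mapping can be inspected and reversed later. The variable names `weather_encoder` and `play_encoder` are assumptions introduced for this illustration.

```python
from sklearn.preprocessing import LabelEncoder

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

# One encoder per column, so no mapping is overwritten.
weather_encoder = LabelEncoder()
play_encoder = LabelEncoder()

weather_encoded = weather_encoder.fit_transform(weather)
label = play_encoder.fit_transform(play)

# classes_ lists the original strings in the order of their integer codes.
print(list(weather_encoder.classes_))

# inverse_transform maps integer codes back to the original strings.
print(play_encoder.inverse_transform([1, 0]))
```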


• Now combine both features (weather and temp) into a single variable (a list of tuples).

In [9]:
# Combine weather and temp into a single list of tuples
features = list(zip(weather_encoded, temp_encoded))
features

[(2, 1),
 (2, 1),
 (0, 1),
 (1, 2),
 (1, 0),
 (1, 0),
 (0, 0),
 (2, 2),
 (2, 0),
 (1, 2),
 (2, 2),
 (0, 2),
 (0, 1),
 (1, 2)]

### Generating Model

• Generate a model using the Naive Bayes classifier in the following steps:

    • Create a Naive Bayes classifier

    • Fit the classifier to the dataset
    
    • Perform prediction

In [10]:
# Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

# Create a Gaussian Classifier
model = GaussianNB()

# Train the model using the training sets
model.fit(features,label)

# Predict output
predicted = model.predict([[1, 2]])  # weather 1: Rainy, temp 2: Mild
print("Predicted Value:", predicted)

Predicted Value: [1]


• Here, 1 corresponds to 'Yes', meaning the players can play.
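
• Besides the predicted class, it can be useful to look at the class probabilities behind the decision. The sketch below refits the same Gaussian model on the encoded data and calls `predict_proba`; it is written to be self-contained, so the encoded features and labels are typed in directly rather than taken from earlier cells.

```python
from sklearn.naive_bayes import GaussianNB

# Encoded (weather, temp) pairs and play labels from the dataset above.
features = [(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0),
            (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]
label = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

model = GaussianNB()
model.fit(features, label)

sample = [[1, 2]]  # weather 1: Rainy, temp 2: Mild

predicted = model.predict(sample)
# One row per sample; columns follow the order of model.classes_.
probabilities = model.predict_proba(sample)

print("Classes:", model.classes_)
print("Predicted Value:", predicted)
print("Class probabilities:", probabilities)
```

• The column with the larger probability corresponds to the class returned by `predict`.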