
**Naive Bayes** is a probabilistic algorithm used mainly for classification. It applies Bayes' theorem to compute conditional probabilities.

**How does the Naive Bayes classifier work?**

**Why is Naive Bayes naive?**

- Naive Bayes' underlying assumption is that the predictors (attributes / independent variables) are independent of each other. 
- This is a big assumption because it is easy to show that there is often at least some correlation between variables in real life. 
- It is precisely this assumption of independence that makes Bayes classification “naive.”

In exchange, Naive Bayes is fast: because it never has to model multi-dimensional correlations, it scales easily to many predictors.

**Conditional probabilities**
- To understand Naive Bayes, we first need to understand conditional probabilities.
- Consider the following example.
  - Assume we have a bucket filled with red and black balls. 
  - In total, there are 15 balls: 7 red and 8 black.
  - The probability of randomly picking a red ball out of the bucket is 7/15. You can write it as P(red) = 7/15.
  - If we were to draw balls one at a time without replacing them, what is the probability of getting a black ball on a second attempt after drawing a red one on the first attempt?
  - You can see that the above question is worded to provide us with the condition that needs to be satisfied first before the second attempt is made.
  - That condition says that a red ball must be drawn during the first attempt.
  - As stated earlier, the probability of getting a red ball on the first attempt, P(red), is 7/15. That leaves 14 balls in the bucket: 6 red and 8 black. Hence, the probability of getting a black ball next is 8/14 = 4/7.
  - We can write this as a conditional probability: P(black|red) = 4/7 (the probability of black given red).
  - The probability of drawing red and then black is P(red and black) = P(red) * P(black|red) = 7/15 * 8/14 = 4/15.
  - Similarly, the probability of drawing black and then red is P(black and red) = P(black) * P(red|black) = 8/15 * 7/14 = 4/15.
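
The arithmetic in this example can be checked with exact fractions. The following is a minimal sketch using Python's standard `fractions` module; the variable names are illustrative and not part of the original notebook.

```python
from fractions import Fraction

red, black = 7, 8
total = red + black

# First draw: probability of picking a red ball
p_red = Fraction(red, total)                    # 7/15

# Second draw without replacement, given that the first ball was red:
# 14 balls remain, 8 of them black
p_black_given_red = Fraction(black, total - 1)  # 8/14 = 4/7

# Joint probability of red first, then black
p_red_then_black = p_red * p_black_given_red    # 4/15

# Joint probability of black first, then red
p_black = Fraction(black, total)
p_red_given_black = Fraction(red, total - 1)
p_black_then_red = p_black * p_red_given_black  # 4/15

print(p_black_given_red, p_red_then_black, p_black_then_red)
```

Both joint probabilities come out equal, which is what makes it possible to swap the direction of the condition in Bayes' theorem.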

**Bayes’ theorem**
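- Bayes' theorem lets us calculate the conditional probability of an event when we know the conditional probability in the reverse direction.
- Using the example above, we would write it as follows:

  P(Black|Red) = [P(Black) * P(Red|Black)] / P(Red)
  
  Substituting the values gives (8/15 * 7/14) / (7/15) = 4/7, which matches the direct calculation above. The numerator alone is the joint probability, 4/15.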
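
To see the formula evaluated step by step, here is a small sketch with exact fractions. The numerator on its own is the joint probability 4/15; dividing by P(Red) = 7/15 turns it into the conditional probability. The variable names are assumptions made for illustration.

```python
from fractions import Fraction

p_black = Fraction(8, 15)            # P(Black): 8 of the 15 balls are black
p_red = Fraction(7, 15)              # P(Red): 7 of the 15 balls are red
p_red_given_black = Fraction(7, 14)  # P(Red|Black): 7 red among the 14 other balls

# Numerator of Bayes' theorem: the joint probability of black and red
joint = p_black * p_red_given_black

# Bayes' theorem: divide the joint probability by the evidence P(Red)
p_black_given_red = joint / p_red

print(joint, p_black_given_red)
```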

**Naive Bayes classifier**
  
    P(A|B) = [P(B|A) * P(A)] / P(B)

    where,

    P(A) : prior probability, the probability of A before any additional information is obtained

    P(B) : evidence, the overall probability of observing B

    P(A|B) : posterior probability, the revised probability of A once B has been observed (the conditional probability of A given that B has already occurred)

    P(B|A) : likelihood, the probability of observing B given that A is true

    Posterior = Likelihood * Prior / Evidence


  - Example,
    - P(Fire) is the Prior
    - P(Smoke|Fire) is the Likelihood
    - P(Smoke) is the Evidence
    - Then according to formula,
      
      P(Fire|Smoke) = P(Smoke|Fire)*P(Fire)/P(Smoke)

  - Another Example,
    - P(Cloud) is the Prior
    - P(Rain|Cloud) is the Likelihood
    - P(Rain) is the Evidence
    - Then according to formula,
      
      P(Cloud|Rain) = P(Rain|Cloud)*P(Cloud)/P(Rain)

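  - The assumption here is that the input variables are independent of each other given the class.
  - Using the formula above, the classifier calculates the posterior probability of each class for a given instance. Because of the independence assumption, the likelihood of all the variables given the class label is simply the product of the individual per-variable likelihoods.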
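
The classifier described above can be sketched in a few lines of plain Python. This is a simplified illustration, not scikit-learn's implementation: it assumes Gaussian likelihoods for each feature, and the function names, the toy data, and the `1e-9` variance floor are all made up for the example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and the per-feature mean/variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    params = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps every variance strictly positive (assumed value)
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        params[label] = (n / len(X), means, variances)
    return params


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior(params, x):
    """Return P(class | x) for every class."""
    scores = {}
    for label, (prior, means, variances) in params.items():
        # Naive assumption: the joint likelihood is the product of
        # the per-feature likelihoods
        likelihood = 1.0
        for xi, m, v in zip(x, means, variances):
            likelihood *= gaussian_pdf(xi, m, v)
        scores[label] = prior * likelihood
    evidence = sum(scores.values())  # P(x), the normalizing constant
    return {label: s / evidence for label, s in scores.items()}


# Made-up toy data: two features, two classes
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.2], [3.2, 3.9], [2.9, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]

params = fit_gaussian_nb(X, y)
probs = posterior(params, [1.1, 2.0])
predicted = max(probs, key=probs.get)
print(predicted, probs)
```

Production implementations usually sum log-probabilities instead of multiplying raw likelihoods, which avoids numerical underflow when there are many features.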

**Why Naive Bayes is called Naive**

- "Naive" means showing a lack of experience, wisdom, or judgement.
- Naive Bayes is naive because it assumes that the features of the dataset are independent of each other.
- This is a strong assumption that rarely holds for real data, hence the name.

**Disadvantages**
- The main disadvantage is the assumption of independent features.
- In practice, it is almost impossible to find a set of predictors that are entirely independent.
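
The Iris data used later in this notebook illustrates the point: its features are far from independent. The sketch below, which assumes `numpy` and `scikit-learn` are installed, computes the Pearson correlation between petal length and petal width.

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()

# Pearson correlation matrix of the four features (columns are variables)
corr = np.corrcoef(data.data, rowvar=False)

names = list(data.feature_names)
i = names.index("petal length (cm)")
j = names.index("petal width (cm)")

print(f"corr({names[i]}, {names[j]}) = {corr[i, j]:.2f}")
```

A correlation this strong violates the independence assumption, yet Naive Bayes can still classify the data well, which is typical of how the method behaves in practice.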






In [1]:
#  Libraries

import warnings

import numpy as np
import pandas as pd
from sklearn import datasets, metrics
# Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score

warnings.filterwarnings("ignore")



In [2]:
# IRIS Dataset
iriss = datasets.load_iris()
iris = pd.DataFrame(iriss.data, columns=iriss.feature_names)
iris['species'] = iriss.target
# Strip spaces and parentheses from the column names,
# e.g. "sepal length (cm)" becomes "sepallengthcm"
iris.columns = iris.columns.str.replace(r"[ ()]", "", regex=True)
iris.head()

   sepallengthcm  sepalwidthcm  petallengthcm  petalwidthcm  species
0            5.1           3.5            1.4           0.2        0
1            4.9           3.0            1.4           0.2        0
2            4.7           3.2            1.3           0.2        0
3            4.6           3.1            1.5           0.2        0
4            5.0           3.6            1.4           0.2        0


In [3]:
# Target Column Distribution
iris['species'].value_counts()

2    50
1    50
0    50
Name: species, dtype: int64

In [4]:
# Mean of each independent column, grouped by the dependent column (species)
iris.groupby('species').mean().round(2)

         sepallengthcm  sepalwidthcm  petallengthcm  petalwidthcm
species
0                 5.01          3.43           1.46          0.25
1                 5.94          2.77           4.26          1.33
2                 6.59          2.97           5.55          2.03


In [5]:
# Independent Variables
Independent_Variable_Base_Set = iris[iris.columns[0:4]]
Independent_Variable_Base_Set.head()

   sepallengthcm  sepalwidthcm  petallengthcm  petalwidthcm
0            5.1           3.5            1.4           0.2
1            4.9           3.0            1.4           0.2
2            4.7           3.2            1.3           0.2
3            4.6           3.1            1.5           0.2
4            5.0           3.6            1.4           0.2


In [6]:
# Dependent Variable (kept as a one-column DataFrame)
Dependent_Variable = iris[['species']]
Dependent_Variable.head()

   species
0        0
1        0
2        0
3        0
4        0


In [7]:
# Split the Dataset

# Model 1
# Let's start with a single variable (sepal length)
Independent_Variable_Set_v1 = iris[iris.columns[0:1]]
X_train, X_test, y_train, y_test = train_test_split(
    Independent_Variable_Set_v1, Dependent_Variable, test_size=0.3, random_state=21
)

# Create a Gaussian Naive Bayes classifier
model = GaussianNB()

# Train the model on the training set
# (ravel() turns the one-column target into the 1-D array sklearn expects)
result = model.fit(X_train, y_train.values.ravel())

# Model Prediction (predictions are integer class labels, so no rounding is needed)
print("Sample Prediction of Model 1")
pred = result.predict(X_test)
model_prediction = pd.DataFrame(pred, columns=['Prediction'])
print(model_prediction['Prediction'].head())

# Test Set Target Column Distribution
print("\nTest Set Distribution")
print(y_test['species'].value_counts())

# Predicted Set Target Column Distribution
print("\nPredicted Set Distribution")
print(model_prediction['Prediction'].value_counts())

Sample Prediction of Model 1
0    1
1    0
2    0
3    0
4    1
Name: Prediction, dtype: int64

Test Set Distribution
1    16
2    15
0    14
Name: species, dtype: int64

Predicted Set Distribution
1    19
0    14
2    12
Name: Prediction, dtype: int64


In [8]:
# Model Performance

# Macro averaging: compute the metric for each class, then take the unweighted mean
# (the "macro avg" row of the classification report).

# Micro averaging: pool the true positives / false positives / false negatives of all
# classes, then compute the metric once. For single-label multiclass data, micro-averaged
# precision, recall and F1 all equal the accuracy.

# Built-in round() works whether the metric comes back as a Python float or a NumPy float
y_pred = model_prediction[['Prediction']]
model_1_accuracy = round(accuracy_score(y_test, y_pred), 2)
print("Model 1 Performance")
print("\nModel 1, Accuracy :", model_1_accuracy)
model_1_precision = round(precision_score(y_test, y_pred, average="micro"), 2)
print("Model 1, Precision :", model_1_precision)
model_1_recall = round(recall_score(y_test, y_pred, average="micro"), 2)
print("Model 1, Recall :", model_1_recall)
model_1_fscore = round(f1_score(y_test, y_pred, average="micro"), 2)
print("Model 1, F1 Score :", model_1_fscore)
print("\nConfusion Matrix, Model 1")
model_1_cm = confusion_matrix(y_test, y_pred)
print(model_1_cm)
print("\nClassification Report, Model 1")
model_1_cr = classification_report(y_test, y_pred)
print(model_1_cr)

print("Inference : Good Fit, can we make it better ?")

Model 1 Performance

Model 1, Accuracy : 0.67
Model 1, Precision : 0.67
Model 1, Recall : 0.67
Model 1, F1 Score : 0.67

Confusion Matrix, Model 1
[[13  1  0]
 [ 1 10  5]
 [ 0  8  7]]

Classification Report, Model 1
              precision    recall  f1-score   support

           0       0.93      0.93      0.93        14
           1       0.53      0.62      0.57        16
           2       0.58      0.47      0.52        15

    accuracy                           0.67        45
   macro avg       0.68      0.67      0.67        45
weighted avg       0.67      0.67      0.66        45

Inference : Good Fit, can we make it better ?
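
The identical accuracy, precision, recall and F1 values reported for Model 1 are a consequence of micro averaging. The sketch below recomputes macro and micro averages by hand from the confusion matrix printed for Model 1 (rows are true classes, columns are predicted classes, following scikit-learn's convention). It uses plain Python only, and the variable names are illustrative.

```python
# Confusion matrix of Model 1, copied from the output above
cm = [[13, 1, 0],
      [1, 10, 5],
      [0, 8, 7]]

n_classes = len(cm)
tp = [cm[k][k] for k in range(n_classes)]
# False positives: predicted as class k (column k) but belonging to another class
fp = [sum(cm[r][k] for r in range(n_classes)) - tp[k] for k in range(n_classes)]
# False negatives: belonging to class k (row k) but predicted as another class
fn = [sum(cm[k]) - tp[k] for k in range(n_classes)]

# Macro averaging: F1 for each class, then the unweighted mean
f1_per_class = [2 * tp[k] / (2 * tp[k] + fp[k] + fn[k]) for k in range(n_classes)]
macro_f1 = sum(f1_per_class) / n_classes

# Micro averaging: pool the counts over all classes, then compute each metric once
micro_precision = sum(tp) / (sum(tp) + sum(fp))
micro_recall = sum(tp) / (sum(tp) + sum(fn))
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

accuracy = sum(tp) / sum(sum(row) for row in cm)

print(round(macro_f1, 2), round(micro_f1, 2), round(accuracy, 2))
```

Every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the pooled counts are equal and the micro-averaged metrics all collapse to the accuracy.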


In [9]:
# Model 2
# Let's build a model with all four variables

X_train, X_test, y_train, y_test = train_test_split(
    Independent_Variable_Base_Set, Dependent_Variable, test_size=0.3, random_state=21
)
# Train the model on the training set (refitting replaces the Model 1 parameters)
result = model.fit(X_train, y_train.values.ravel())

# Model Prediction (predictions are integer class labels, so no rounding is needed)
print("Sample Prediction of Model 2")
pred = result.predict(X_test)
model_prediction = pd.DataFrame(pred, columns=['Prediction'])
print(model_prediction['Prediction'].head())

# Test Set Target Column Distribution
print("\nTest Set Distribution")
print(y_test['species'].value_counts())

# Predicted Set Target Column Distribution
print("\nPredicted Set Distribution")
print(model_prediction['Prediction'].value_counts())

Sample Prediction of Model 2
0    1
1    0
2    0
3    0
4    1
Name: Prediction, dtype: int64

Test Set Distribution
1    16
2    15
0    14
Name: species, dtype: int64

Predicted Set Distribution
2    17
1    14
0    14
Name: Prediction, dtype: int64


In [10]:
# Model Performance

# Macro averaging: compute the metric for each class, then take the unweighted mean
# (the "macro avg" row of the classification report).

# Micro averaging: pool the true positives / false positives / false negatives of all
# classes, then compute the metric once. For single-label multiclass data, micro-averaged
# precision, recall and F1 all equal the accuracy.

# Built-in round() works whether the metric comes back as a Python float or a NumPy float
y_pred = model_prediction[['Prediction']]
model_2_accuracy = round(accuracy_score(y_test, y_pred), 2)
print("Model 2 Performance")
print("\nModel 2, Accuracy :", model_2_accuracy)
model_2_precision = round(precision_score(y_test, y_pred, average="micro"), 2)
print("Model 2, Precision :", model_2_precision)
model_2_recall = round(recall_score(y_test, y_pred, average="micro"), 2)
print("Model 2, Recall :", model_2_recall)
model_2_fscore = round(f1_score(y_test, y_pred, average="micro"), 2)
print("Model 2, F1 Score :", model_2_fscore)
print("\nConfusion Matrix, Model 2")
model_2_cm = confusion_matrix(y_test, y_pred)
print(model_2_cm)
print("\nClassification Report, Model 2")
model_2_cr = classification_report(y_test, y_pred)
print(model_2_cr)

print("Inference : Model 2 outperforms Model 1 on the held-out test set; high test accuracy alone does not indicate overfitting")

Model 2 Performance

Model 2, Accuracy : 0.96
Model 2, Precision : 0.96
Model 2, Recall : 0.96
Model 2, F1 Score : 0.96

Confusion Matrix, Model 2
[[14  0  0]
 [ 0 14  2]
 [ 0  0 15]]

Classification Report, Model 2
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       1.00      0.88      0.93        16
           2       0.88      1.00      0.94        15

    accuracy                           0.96        45
   macro avg       0.96      0.96      0.96        45
weighted avg       0.96      0.96      0.96        45

Inference : Model 2 outperforms Model 1 on the held-out test set; high test accuracy alone does not indicate overfitting
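
Overfitting usually shows up as a large gap between training and held-out accuracy, so it is worth measuring both before drawing a conclusion. The following is a minimal sketch of such a check, assuming `scikit-learn` is installed. It reuses the same split parameters as the cells above, and the 5-fold cross-validation is an addition that is not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21
)

clf = GaussianNB().fit(X_train, y_train)

# Accuracy on the data the model was trained on vs. data it has never seen
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)

# 5-fold cross-validation gives an estimate that depends less on one particular split
cv_scores = cross_val_score(GaussianNB(), X, y, cv=5)

print(f"train accuracy : {train_acc:.2f}")
print(f"test accuracy  : {test_acc:.2f}")
print(f"5-fold CV mean : {cv_scores.mean():.2f}")
```

If the training accuracy were much higher than the test and cross-validation scores, that would point to overfitting; similar values suggest the model generalizes.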
