# Naive Bayes Classifier Tutorial with Python

Hello friends,

**Naive Bayes** is a family of supervised machine learning algorithms based on Bayes’ Theorem.  
It is called "naive" because it assumes that the features are conditionally independent of one another given the class label.  
That assumption rarely holds in real-world data, yet the classifier often works surprisingly well in practice.
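
To make the assumption concrete, here is the standard formulation written out. The notation is introduced here for illustration: $y$ is the class label and $x_1, \dots, x_n$ are the feature values.

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
$$

Under the naive assumption the likelihood factorizes, and because the denominator does not depend on $y$, the predicted class is:

$$
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
\qquad\Longrightarrow\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The variants listed in the next section differ only in the distribution they assume for each $P(x_i \mid y)$.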

---

## Types of Naive Bayes Classifiers
1. **Gaussian Naive Bayes** – Assumes features follow a normal (Gaussian) distribution. Best for continuous data (like Age, Salary).  
2. **Multinomial Naive Bayes** – Works well with count data or text classification (like word frequencies).  
3. **Bernoulli Naive Bayes** – Assumes binary features (0/1). Useful for yes/no features.
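
As a rough illustration of the list above, the sketch below fits each variant on the kind of feature it is designed for. The data is synthetic and the names (`X_cont`, `X_count`, `X_bin`, `train_acc`) are made up for this example; it is not the dataset used later in the tutorial.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y_toy = np.array([0] * 50 + [1] * 50)

# Continuous features: class 1 is shifted by +2 in both columns.
X_cont = rng.normal(loc=0.0, scale=1.0, size=(100, 2)) + 2.0 * y_toy[:, None]

# Count features: each class has a high rate in a different column.
rates = np.where(y_toy[:, None] == 1, [1.0, 5.0], [5.0, 1.0])
X_count = rng.poisson(rates)

# Binary features: whether each count exceeds 2.
X_bin = (X_count > 2).astype(int)

models = {
    "GaussianNB": (GaussianNB(), X_cont),
    "MultinomialNB": (MultinomialNB(), X_count),
    "BernoulliNB": (BernoulliNB(), X_bin),
}

train_acc = {}
for name, (model, X_toy) in models.items():
    model.fit(X_toy, y_toy)
    # Training accuracy only; this is a sanity check, not an evaluation.
    train_acc[name] = model.score(X_toy, y_toy)
    print(f"{name}: training accuracy = {train_acc[name]:.2f}")
```

Each model sees only the representation that matches its likelihood assumption, which is the main thing to decide when choosing among the three.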

---

## Real-world Applications
- Spam email detection  
- Sentiment analysis  
- Document/text classification  
- Medical diagnosis  


In [4]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing dataset utilities
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.preprocessing import StandardScaler

# Import Naive Bayes models
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB


In [5]:
# Importing the dataset
dataset = pd.read_csv("../data/logistic_classification.csv")
X = dataset.iloc[:, [2, 3]].values   # Age & Estimated Salary
y = dataset.iloc[:, -1].values       # Purchased (0 or 1)

# Splitting the dataset into Training and Test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

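# Feature scaling. GaussianNB estimates a per-feature mean and variance, so it is
# largely insensitive to standardization. Scaling matters more for BernoulliNB,
# which binarizes each feature at 0 (the training mean after standardization).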
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
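
A small sketch of how little standardization changes GaussianNB's predictions. It uses synthetic data with hypothetical names (`X_demo`, `y_demo`), not the tutorial's CSV. One caveat: scikit-learn's `var_smoothing` term is proportional to the largest feature variance, so when feature scales differ by many orders of magnitude the raw and standardized fits can disagree slightly.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
y_demo = np.array([0] * 100 + [1] * 100)

# Two continuous features on different, but moderate, scales.
X_demo = np.column_stack([
    rng.normal(30 + 10 * y_demo, 8.0),
    rng.normal(5 + 3 * y_demo, 2.0),
])

X_demo_std = StandardScaler().fit_transform(X_demo)

# Fit one model on the raw features and one on the standardized features.
pred_raw = GaussianNB().fit(X_demo, y_demo).predict(X_demo)
pred_std = GaussianNB().fit(X_demo_std, y_demo).predict(X_demo_std)

agreement = np.mean(pred_raw == pred_std)
print(f"Fraction of identical predictions: {agreement:.3f}")
```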


In [6]:
# Training Naive Bayes Models

# Gaussian Naive Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred_gnb = gnb.predict(X_test)

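# Multinomial Naive Bayes (left commented out: MultinomialNB requires non-negative
# features, and the standardized X_train contains negative values)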
# mnb = MultinomialNB()
# mnb.fit(X_train, y_train)
# y_pred_mnb = mnb.predict(X_test)

# Bernoulli Naive Bayes
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
y_pred_bnb = bnb.predict(X_test)


In [8]:
# Evaluation of models

def evaluate_model(name, y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    acc = accuracy_score(y_true, y_pred)
    print(f"Model: {name}")
    print("Confusion Matrix:\n", cm)
    print("Accuracy:", acc)
    print("-"*40)
    return acc

acc_gnb = evaluate_model("GaussianNB", y_test, y_pred_gnb)
# acc_mnb = evaluate_model("MultinomialNB", y_test, y_pred_mnb)
acc_bnb = evaluate_model("BernoulliNB", y_test, y_pred_bnb)

print("Summary of Accuracies:")
print("GaussianNB Accuracy:", acc_gnb)
# print("MultinomialNB Accuracy:", acc_mnb)
print("BernoulliNB Accuracy:", acc_bnb)


Model: GaussianNB
Confusion Matrix:
 [[55  3]
 [ 4 18]]
Accuracy: 0.9125
----------------------------------------
Model: BernoulliNB
Confusion Matrix:
 [[55  3]
 [11 11]]
Accuracy: 0.825
----------------------------------------
Summary of Accuracies:
GaussianNB Accuracy: 0.9125
BernoulliNB Accuracy: 0.825
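
Accuracy alone hides where the two models differ. The sketch below recomputes accuracy from the confusion matrices printed above and adds precision and recall for the positive class (Purchased = 1). It assumes scikit-learn's layout, with rows as true labels and columns as predictions; the helper name `summarize` is made up for this example.

```python
import numpy as np

# Confusion matrices copied from the output above.
confusion_matrices = {
    "GaussianNB": np.array([[55, 3], [4, 18]]),
    "BernoulliNB": np.array([[55, 3], [11, 11]]),
}

def summarize(cm):
    """Return accuracy, precision, and recall for the positive class."""
    tn, fp, fn, tp = cm.ravel()
    return {
        "accuracy": (tp + tn) / cm.sum(),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

for name, cm in confusion_matrices.items():
    stats = summarize(cm)
    print(name, {key: round(float(value), 3) for key, value in stats.items()})
```

Both models make the same 3 false-positive errors, but BernoulliNB misses 11 of the 22 actual buyers (recall 0.5) while GaussianNB misses 4 (recall about 0.82).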


## Summary & Key Takeaways

- **GaussianNB** usually works best for continuous features like Age and Salary.  
- **MultinomialNB** is more suited to discrete counts (e.g., text word counts).  
- **BernoulliNB** is effective when features are binary (yes/no).  
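
One detail worth knowing when BernoulliNB is given continuous inputs: scikit-learn's `BernoulliNB` has a `binarize` parameter (default `0.0`) that turns every value above the threshold into 1 and everything else into 0 before fitting. The sketch below, on synthetic data with hypothetical names, checks that the default behaves the same as binarizing by hand.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)

# Synthetic continuous features centered on 0, similar to standardized data.
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

# Default: BernoulliNB thresholds the inputs at 0.0 internally.
bnb_default = BernoulliNB().fit(X_demo, y_demo)

# Manual: binarize first, then tell BernoulliNB the input is already binary.
X_binary = (X_demo > 0).astype(float)
bnb_binary = BernoulliNB(binarize=None).fit(X_binary, y_demo)

same = np.array_equal(
    bnb_default.predict(X_demo),
    bnb_binary.predict(X_binary),
)
print("Default and manual binarization agree:", same)
```

With standardized features this means each one is reduced to "above or below its training mean", which discards most of the information in a continuous variable.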

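On this dataset, **GaussianNB performed better**: 91.25% accuracy on the 80-sample test set, versus 82.5% for BernoulliNB. This is likely because BernoulliNB binarizes the continuous Age and Salary features.  
Naive Bayes is a simple, fast, and surprisingly effective algorithm for many classification problems.  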
