
**Naive Bayes**:

---

## **What is Naive Bayes?**

**Naive Bayes** is a family of **probabilistic machine learning algorithms** for **classification tasks**, based on Bayes' theorem.
It is called “Naive” because it **assumes that all features are independent** of each other given the class. This is a simplification, but it works surprisingly well in practice ✅.
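In symbols (standard notation, not taken from the original notebook), take a class $y$ and features $x_1, \dots, x_n$. Bayes' theorem plus the naive independence assumption gives the posterior and the decision rule:

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)},
\qquad
\hat{y} = \arg\max_{y} \, P(y) \prod_{i=1}^{n} P(x_i \mid y).
$$

The denominator is the same for every class, so it can be dropped when choosing the most probable class.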

---

### **Key Points**

1. **Probabilistic Model** 📊

   * It uses Bayes' theorem to compute the probability of each class given the observed feature values.
   * It predicts the class with the **highest probability**.

2. **Types of Naive Bayes**:

   * **GaussianNB** → For numeric data (like age, salary)
   * **MultinomialNB** → For count data (like word counts in text)
   * **BernoulliNB** → For binary features (yes/no, 0/1)

3. **Advantages** ✅

   * Very fast to train and simple to implement
   * Scales well to large datasets
   * Often performs surprisingly well even when the independence assumption does not hold

4. **Disadvantages** ⚠️

   * Assumes features are independent given the class, which is rarely exactly true
   * Cannot capture interactions between features
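The two steps in the list above (score every class, then pick the highest) can be worked through by hand. Below is a minimal sketch with made-up numbers. The priors and word likelihoods are hypothetical values for a two-word spam filter. Each email is described by whether it contains each word, which is a Bernoulli-style model.

```python
# Hypothetical toy numbers, chosen only to illustrate the arithmetic
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {  # P(word present | class)
    "spam": {"free": 0.5, "win": 0.4},
    "ham": {"free": 0.05, "win": 0.1},
}


def posterior(email):
    """Return normalized class probabilities for a dict of word -> present (bool)."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word, present in email.items():
            p = likelihoods[label][word]
            # Naive assumption: multiply per-word terms as if independent given the class
            score *= p if present else (1 - p)
        scores[label] = score
    total = sum(scores.values())  # shared denominator P(x)
    return {label: s / total for label, s in scores.items()}


probs = posterior({"free": True, "win": True})
prediction = max(probs, key=probs.get)
print(probs)       # approximately {'spam': 0.964, 'ham': 0.036}
print(prediction)  # spam
```

The unnormalized scores are 0.4 × 0.5 × 0.4 = 0.08 for spam and 0.6 × 0.05 × 0.1 = 0.003 for ham. Dividing each by their sum gives the posteriors.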

---

### **Example Use Cases**

* Email **spam detection** 📨
* Job salary classification 💼
* Text sentiment analysis 😀😡
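Spam detection is the classic MultinomialNB use case. This is a minimal sketch built on a tiny hypothetical four-message corpus, far too small for a real filter. It converts the messages to word counts with scikit-learn's `CountVectorizer` and then classifies two new messages.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus
texts = [
    "win free money now",
    "free offer win a prize",
    "meeting at noon tomorrow",
    "lunch meeting with the team",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each message into a vector of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)

# Fit a multinomial Naive Bayes model on the counts
model = MultinomialNB(alpha=1.0)
model.fit(X_counts, labels)

# New messages must go through the same fitted vectorizer
new_messages = ["free money prize", "team meeting tomorrow"]
predictions = model.predict(vectorizer.transform(new_messages))
print(predictions)  # ['spam' 'ham']
```

`alpha=1.0` is the default and means Laplace smoothing. It stops a word that never appeared in one class from driving that class's probability to zero.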

---
- **GaussianNB** → best for numeric features
- **MultinomialNB** → for text/count features
- **BernoulliNB** → for binary features



In [None]:
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris


In [None]:
# load the dataset
iris = load_iris()
X = iris.data
y = iris.target

# split the data into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
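The split above is random but not stratified, so the test set ends up with 10, 9 and 11 samples of the three classes (see the `support` column in the reports below). If you want equal class proportions, `train_test_split` accepts `stratify=y`. This sketch uses new, hypothetical variable names (`X_train_s` and so on) so it does not overwrite the notebook's split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# stratify=y preserves the class proportions of y in both splits
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Count samples per class in each split
print(np.bincount(y_train_s))  # [40 40 40]
print(np.bincount(y_test_s))   # [10 10 10]
```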



1. **Gaussian Naive Bayes (GaussianNB)**
   - Used for **numeric features**.
   - Assumes that, within each class, each feature follows a **normal (Gaussian) distribution**.
   - Example: Age, Salary, Height, Weight.
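For reference, this is the standard Gaussian likelihood, as described in the scikit-learn documentation. A mean $\mu_{y,i}$ and a variance $\sigma_{y,i}^2$ are estimated from the training data for each class $y$ and feature $i$:

$$
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\!\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right)
$$

To predict, plug these densities into $P(y)\prod_i P(x_i \mid y)$ and take the class with the largest value.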

In [None]:
# initialize the model
gnb = GaussianNB()

# train the model
gnb.fit(X_train, y_train)

# predict on the test set
y_pred = gnb.predict(X_test)

# evaluate the model
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}")
print(f"Classification Report: \n{classification_report(y_test, y_pred)}")


Accuracy: 1.0
Confusion Matrix: 
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Classification Report: 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
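To see what GaussianNB computes, here is a minimal from-scratch sketch of the same model in NumPy. Some notes on how it is set up:

* It assumes scikit-learn's default `var_smoothing=1e-9`, a tiny amount added to every variance for numerical stability.
* It re-creates the same split so it runs on its own.
* The helper names `fit_gaussian_nb` and `predict_proba_gaussian_nb` are hypothetical.

At the end it compares its predictions and probabilities with `GaussianNB`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class priors and per-class feature means and variances."""
    classes = np.unique(y)
    # Mirror scikit-learn's default: add a fraction of the largest feature variance
    epsilon = var_smoothing * X.var(axis=0).max()
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + epsilon
    return classes, priors, means, variances


def predict_proba_gaussian_nb(params, X):
    """Return P(class | x) for each row of X, one column per class."""
    classes, priors, means, variances = params
    log_joint = np.empty((X.shape[0], len(classes)))
    for k in range(len(classes)):
        # Log Gaussian density of every feature, summed over features
        log_density = (
            -0.5 * np.log(2 * np.pi * variances[k])
            - (X - means[k]) ** 2 / (2 * variances[k])
        )
        log_joint[:, k] = np.log(priors[k]) + log_density.sum(axis=1)
    # Normalize with the log-sum-exp trick to avoid underflow
    log_joint -= log_joint.max(axis=1, keepdims=True)
    probs = np.exp(log_joint)
    return probs / probs.sum(axis=1, keepdims=True)


params = fit_gaussian_nb(X_train, y_train)
manual_proba = predict_proba_gaussian_nb(params, X_test)
manual_pred = params[0][manual_proba.argmax(axis=1)]

sk_model = GaussianNB().fit(X_train, y_train)
print(np.array_equal(manual_pred, sk_model.predict(X_test)))       # True
print(np.allclose(manual_proba, sk_model.predict_proba(X_test)))  # True
```

The sketch works in log space rather than multiplying densities directly. With many features, a product of many small densities would underflow to zero.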



2. **Multinomial Naive Bayes (MultinomialNB)**
   - Used for **count data** or **frequency-based features**.
   - Commonly used in **text classification** (e.g., word counts in documents).
   - Example: Document classification, spam detection.
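MultinomialNB treats feature values as counts, so scikit-learn requires them to be non-negative. The iris measurements in the next cell are continuous lengths rather than counts. They are all positive, however, so the model accepts them. Negative values, such as those produced by standardizing features, are rejected. This minimal sketch uses hypothetical data to show it.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical data with one negative value
X_neg = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
y_neg = np.array([0, 1, 0])

try:
    MultinomialNB().fit(X_neg, y_neg)
    rejected = False
except ValueError as err:
    # scikit-learn refuses negative inputs for MultinomialNB
    rejected = True
    print(f"ValueError: {err}")

print(rejected)  # True
```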

In [None]:
# initialize the model
mnb = MultinomialNB()

# train the model
mnb.fit(X_train, y_train)

# predict on the test set
y_pred = mnb.predict(X_test)

# evaluate the model
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}")
print(f"Classification Report: \n{classification_report(y_test, y_pred)}")

Accuracy: 0.9
Confusion Matrix: 
[[10  0  0]
 [ 0  9  0]
 [ 0  3  8]]
Classification Report: 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.75      1.00      0.86         9
           2       1.00      0.73      0.84        11

    accuracy                           0.90        30
   macro avg       0.92      0.91      0.90        30
weighted avg       0.93      0.90      0.90        30
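BernoulliNB is imported at the top but never used. It expects binary features. Its default `binarize=0.0` would turn every positive iris measurement into 1, leaving nothing to learn from. This minimal sketch binarizes the data first. The per-feature median threshold is an assumption chosen purely for illustration. No accuracy is reported here; run the cell to see it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Assumed binarization: 1 if a feature is above its training-set median, else 0
medians = np.median(X_train, axis=0)
X_train_bin = (X_train > medians).astype(int)
X_test_bin = (X_test > medians).astype(int)

# binarize=None because the inputs are already 0/1
bnb = BernoulliNB(binarize=None)
bnb.fit(X_train_bin, y_train)
y_pred_bnb = bnb.predict(X_test_bin)

print(f"Accuracy: {accuracy_score(y_test, y_pred_bnb)}")
```

Thresholding keeps only one bit of each measurement. That makes it a lossy way to feed continuous data to BernoulliNB; the model is a more natural fit for yes/no features.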




