
# Naive Bayes Algorithm


## What is the Naive Bayes Algorithm?

Naive Bayes is a **supervised learning** algorithm based on **Bayes' Theorem**.

It is used for **classification** problems. It assumes that the input features are **independent** of each other given the class (the "naive" assumption), and it often works well even when that assumption holds only approximately.

---

## Real-World Example

Imagine you're building a spam filter. You want to classify emails as:

📩 **Spam** or ✅ **Not Spam**

Based on words in the email:

* If it contains "free", "win", "cash" → more likely **spam**
* If it contains "project", "meeting", "report" → more likely **not spam**

---

## Bayes' Theorem

The algorithm is based on this formula:

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

Where:

* **P(A|B)** = Probability of class A given feature B (posterior)
* **P(B|A)** = Probability of feature B given class A (likelihood)
* **P(A)** = Probability of class A (prior)
* **P(B)** = Probability of feature B (evidence)
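
To make the formula concrete, here is a small worked example in Python. The probabilities are hypothetical numbers chosen only to illustrate the arithmetic; they do not come from any dataset in this notebook. Here A is "the email is spam" and B is "the email contains the word free".

```python
# Hypothetical numbers, chosen only to illustrate the arithmetic.
p_spam = 0.4                # P(A): prior probability that an email is spam
p_free_given_spam = 0.5     # P(B|A): likelihood of "free" appearing in a spam email
p_free_given_not_spam = 0.1 # P(B|not A): likelihood of "free" in a non-spam email

# P(B): evidence, the total probability of seeing "free" across both classes
p_free = p_free_given_spam * p_spam + p_free_given_not_spam * (1 - p_spam)

# P(A|B): posterior probability of spam given that "free" appears
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(round(p_spam_given_free, 3))  # 0.769
```

Seeing "free" raises the probability of spam from the prior of 0.4 to roughly 0.77 under these assumed numbers.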

---

## 🧪 How Naive Bayes Works (Simplified Steps)

1. **Calculate Prior Probabilities**
   Example:

   ```
   P(Spam) = No. of spam emails / Total emails
   P(Not Spam) = No. of not spam emails / Total emails
   ```

2. **Calculate Likelihood for Each Feature**
   Example:

   ```
   P("free" | Spam) = No. of spam emails with "free" / Total spam emails
   ```

3. **Apply Bayes' Theorem** to get `P(Class | Features)` for each class.

4. **Choose the class with the highest probability**.
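
The four steps above can be sketched in plain Python. This is a simplified illustration, not how scikit-learn implements it: the toy emails are hypothetical, only the words present in an email are scored, and no smoothing is applied, so a word never seen in a class drives that class's score to zero.

```python
from collections import Counter

# Hypothetical toy training set: (set of words in the email, label)
emails = [
    ({"free", "win"}, "spam"),
    ({"free", "cash"}, "spam"),
    ({"win", "cash"}, "spam"),
    ({"project", "meeting"}, "not spam"),
    ({"report", "free"}, "not spam"),
]

# Step 1: prior P(class) = emails in the class / total emails
class_counts = Counter(label for _, label in emails)
priors = {label: n / len(emails) for label, n in class_counts.items()}


# Step 2: likelihood P(word | class) = emails in the class containing the word / emails in the class
def likelihood(word, label):
    with_word = sum(1 for words, lab in emails if lab == label and word in words)
    return with_word / class_counts[label]


# Step 3: unnormalised posterior, P(class) times the product of P(word | class).
# The evidence P(features) is the same for every class, so it can be skipped.
def score(words, label):
    result = priors[label]
    for word in words:
        result *= likelihood(word, label)
    return result


# Step 4: choose the class with the highest score
def predict(words):
    return max(priors, key=lambda label: score(words, label))


print(predict({"free"}))     # spam
print(predict({"project"}))  # not spam
```

For `{"free"}` the scores are 0.6 × 2/3 = 0.4 for spam and 0.4 × 1/2 = 0.2 for not spam, so spam wins.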

---

## ✅ Types of Naive Bayes

| Type               | Use Case                                                                                                |
| ------------------ | ------------------------------------------------------------------------------------------------------- |
| **Gaussian NB**    | Continuous features that are assumed to follow a Gaussian distribution (bell curve) within each class. |
| **Multinomial NB** | Discrete count features, such as word counts in text classification (e.g., spam detection).            |
| **Bernoulli NB**   | Binary features (0/1), such as whether a word is present.                                               |
| **Categorical NB** | Categorical features (e.g., sunny, rainy).                                                              |
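
As an illustration of the Multinomial NB row, the sketch below classifies short texts by word counts. The four training sentences are a hypothetical toy corpus made up for this example; `CountVectorizer` is used here as one common way to turn text into counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus, made up for illustration only
texts = [
    "win free cash now",
    "free cash prize win",
    "project meeting report",
    "meeting about the project report",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Turn each text into a vector of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)

# Fit Multinomial NB on the count features (default Laplace smoothing, alpha=1.0)
clf = MultinomialNB()
clf.fit(X_counts, labels)

# Classify unseen emails using the same vocabulary
new_emails = ["free cash", "project report"]
predictions = clf.predict(vectorizer.transform(new_emails))
print(predictions.tolist())  # ['spam', 'not spam']
```

The other variants (`GaussianNB`, `BernoulliNB`, `CategoricalNB`) follow the same `fit` / `predict` pattern; what changes is the kind of feature each one expects.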

---

## 💡 Key Assumption

> Naive Bayes assumes that all features are **independent** of each other, given the class.
> That's why it's called **naive**.
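
Written out, the independence assumption lets the likelihood of all features factor into a product of per-feature likelihoods. Using $C$ for the class and $x_1, \dots, x_n$ for the features (notation introduced here for illustration), Bayes' Theorem becomes:

$$
P(C \mid x_1, \dots, x_n) = \frac{P(C) \prod_{i=1}^{n} P(x_i \mid C)}{P(x_1, \dots, x_n)}
$$

The denominator is the same for every class, so the predicted class is the one that maximizes the numerator:

$$
\hat{y} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
$$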

---


## ✅ Advantages

* Simple and fast to train and to predict with
* Scales well to large datasets
* Performs well on text classification (e.g., spam detection)

## ❌ Limitations

* Assumes features are independent given the class
* Can perform poorly when features are strongly correlated or when the relationship between features and class is very complex






In [25]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report


In [26]:
df = pd.read_csv("naive_bayes_dataset.csv")
df.head()

Unnamed: 0,Weather,Temperature,Play
0,Sunny,Hot,No
1,Sunny,Hot,No
2,Overcast,Hot,Yes
3,Rain,Mild,Yes
4,Rain,Cool,Yes


In [27]:
le_weather = LabelEncoder()
le_temp = LabelEncoder()
le_play = LabelEncoder()

In [28]:
df['Weather'] = le_weather.fit_transform(df['Weather'])
df['Temperature'] = le_temp.fit_transform(df['Temperature'])
df['Play'] = le_play.fit_transform(df['Play'])

In [29]:
df.head()

Unnamed: 0,Weather,Temperature,Play
0,2,1,0
1,2,1,0
2,0,1,1
3,1,2,1
4,1,0,1
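
The integer codes in the table above come from `LabelEncoder` assigning codes in sorted order of the category names. The sketch below reproduces this for the five Weather values shown in the first `df.head()` output, and shows how to recover the original strings. It is a standalone illustration and does not use the notebook's `df`. Note that the scikit-learn documentation describes `LabelEncoder` as intended for target labels; `OrdinalEncoder` is the usual choice for feature columns, although the codes come out the same in this case.

```python
from sklearn.preprocessing import LabelEncoder

# The five Weather values shown in the first df.head() output
weather_values = ["Sunny", "Sunny", "Overcast", "Rain", "Rain"]

le = LabelEncoder()
codes = le.fit_transform(weather_values)
print(codes.tolist())  # [2, 2, 0, 1, 1]

# classes_ holds the sorted unique labels; a label's position is its integer code
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'Overcast': 0, 'Rain': 1, 'Sunny': 2}

# inverse_transform converts the integer codes back to the original strings
decoded = le.inverse_transform(codes)
print(decoded.tolist())  # ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain']
```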


In [30]:
X = df[['Weather', 'Temperature']]
y = df['Play']


In [31]:
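# Example: how a train/test split divides the rows of a dataset

#  X1 | X2 | Y
# --------------
#  3    5    9     <- training rows (used to fit the model)
#  3    5    9
#  3    5    9
#  3    5    9
#  3    5    9
# ------------------
#  3    5    9     <- test rows (held out for evaluation)
#  3    5    9
#  3    5    9
#  3    5    9


# A common choice is 80% of the rows for training and 20% for testing.
# The split in the next cell uses 70% / 30% (test_size=0.3).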

In [32]:
# Step 4: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)



In [33]:
# Step 5: Train Naive Bayes model
model = CategoricalNB()

model.fit(X_train, y_train)

In [34]:
# Step 6: Make predictions
y_pred = model.predict(X_test)
y_pred

array([1, 1, 1, 1, 1])

In [35]:
y_test

Unnamed: 0,Play
0,0
1,0
2,1
10,1
13,0


In [36]:
# Step 7: Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.4

Classification Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00         3
           1       0.40      1.00      0.57         2

    accuracy                           0.40         5
   macro avg       0.20      0.50      0.29         5
weighted avg       0.16      0.40      0.23         5
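
The report above shows 0.00 precision for class 0 because the model never predicted class 0 on this five-row test set, which makes that precision undefined; scikit-learn then emits an `UndefinedMetricWarning`. The sketch below re-creates the metrics from the `y_test` and `y_pred` values printed above (copied by hand into lists, so it runs on its own) and passes `zero_division=0` to report 0.0 without the warning. With only five test rows, the accuracy figure is a very rough estimate.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Values copied from the y_test and y_pred outputs shown above
y_true = [0, 0, 1, 1, 0]
y_hat = [1, 1, 1, 1, 1]

acc = accuracy_score(y_true, y_hat)
print(acc)  # 0.4

# Rows are true classes, columns are predicted classes.
# The first column is all zeros: class 0 was never predicted.
cm = confusion_matrix(y_true, y_hat)
print(cm.tolist())  # [[0, 3], [0, 2]]

# zero_division=0 reports 0.0 for the undefined precision instead of warning
report = classification_report(y_true, y_hat, zero_division=0)
print(report)
```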





# Naive Bayes Example 2

In [37]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

In [38]:
from sklearn.datasets import load_iris
data = load_iris()
data

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
  

In [39]:
X = data.data
y = data.target

In [40]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [41]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=67)

In [42]:
# Create a Gaussian Naive Bayes model
model = GaussianNB()

In [43]:
# Train the model
model.fit(X_train, y_train)

In [44]:
# Make predictions on the test set
y_pred = model.predict(X_test)
y_pred

array([0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 0, 1, 0, 1,
       1, 0, 0, 2, 1, 2, 1, 2, 0, 2, 2, 2, 2, 0, 1, 0, 1, 2, 2, 2, 1, 1,
       1])

In [45]:
y_test

array([0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 0, 1, 0, 1,
       1, 0, 0, 2, 1, 2, 1, 2, 0, 2, 2, 2, 2, 0, 1, 0, 1, 2, 2, 2, 1, 1,
       1])

In [46]:
# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.9777777777777777

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      0.95      0.97        20
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45
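
A confusion matrix is a compact way to see which classes were mixed up, such as the one mismatch between `y_pred` and `y_test` visible in the outputs above. The sketch below repeats the same pipeline in a self-contained form and prints the matrix. Exact counts may vary slightly across library versions, so no expected output is shown.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data, split, and model as in the cells above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=67
)
model = GaussianNB().fit(X_train, y_train)

# Rows are true classes, columns are predicted classes;
# off-diagonal entries count misclassified flowers
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
```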

