# 📊 Naive Bayes Classifier

## 🌟 Real-Life Example: The Weather & Golf Decision

Imagine you're **Tom**, and every morning you decide whether to play golf based on the weather. You've kept a diary for 14 days:

| Day | Outlook   | Temperature | Humidity | Windy | Play Golf? |
|-----|-----------|-------------|----------|-------|------------|
| 1   | Rainy     | Hot         | High     | No    | ❌ No      |
| 2   | Rainy     | Hot         | High     | Yes   | ❌ No      |
| 3   | Overcast  | Hot         | High     | No    | ✅ Yes     |
| 4   | Sunny     | Mild        | High     | No    | ✅ Yes     |
| 5   | Sunny     | Cool        | Normal   | No    | ✅ Yes     |
| 6   | Sunny     | Cool        | Normal   | Yes   | ❌ No      |
| 7   | Overcast  | Cool        | Normal   | Yes   | ✅ Yes     |
| 8   | Rainy     | Mild        | High     | No    | ❌ No      |
| 9   | Rainy     | Cool        | Normal   | No    | ✅ Yes     |
| 10  | Sunny     | Mild        | Normal   | No    | ✅ Yes     |
| 11  | Rainy     | Mild        | Normal   | Yes   | ✅ Yes     |
| 12  | Overcast  | Mild        | High     | Yes   | ✅ Yes     |
| 13  | Overcast  | Hot         | Normal   | No    | ✅ Yes     |
| 14  | Sunny     | Mild        | High     | Yes   | ❌ No      |

**Today's weather**: Sunny, Hot, Normal humidity, No wind  
**Question**: Should Tom play golf today?

---

## 🧠 Step 1: What Would a Smart Person Do?

A smart person would look at their diary and ask:
- "On **sunny** days, how often did I play golf?"
- "When it was **hot**, how often did I play?"
- "With **normal humidity**, what happened?"
- "When it was **not windy**, what was the outcome?"

Then they'd combine all this information to make a decision.

**This is exactly what Naive Bayes does!**

---
## 📐 Bayes' Theorem

### Definition of Conditional Probability:
$$
P(A|B) = \frac{P(A \cap B)}{P(B)}
$$

$$
P(B|A) = \frac{P(A \cap B)}{P(A)}
$$

### Bayes' Theorem:
$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

$$
P(B|A) = \frac{P(A|B) \cdot P(B)}{P(A)}
$$

**Where:**
- $P(A|B)$: Probability of A given B (posterior)
- $P(B|A)$: Probability of B given A (likelihood)  
- $P(A)$: Prior probability of A (prior)
- $P(B)$: Total probability of B (evidence)
- $P(A \cap B)$: Probability of both A and B occurring together
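As a quick sanity check, the sketch below applies Bayes' theorem to counts read directly off Tom's diary table, taking A = "played golf" and B = "outlook is Sunny". The variable names are illustrative and not part of the original walkthrough.

```python
from fractions import Fraction

# Counts read off the 14-day diary table above.
total_days = 14
yes_days = 9          # days Tom played golf
sunny_days = 5        # days with a Sunny outlook
sunny_and_yes = 3     # Sunny days on which Tom played

p_yes = Fraction(yes_days, total_days)                 # prior P(A)
p_sunny = Fraction(sunny_days, total_days)             # evidence P(B)
p_sunny_given_yes = Fraction(sunny_and_yes, yes_days)  # likelihood P(B|A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny

# Direct definition: P(A|B) = P(A and B) / P(B)
direct = Fraction(sunny_and_yes, total_days) / p_sunny

print(p_yes_given_sunny, direct)  # both are 3/5
```

Both routes give 3/5, which matches counting directly: Tom played on 3 of the 5 sunny days.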

---

## 🔢 Step 2: Let's Count From the Diary

### Overall Statistics:
- **Total days**: 14
- **Played golf (Yes)**: 9 days → P(Yes) = 9/14 ≈ 0.64
- **Didn't play (No)**: 5 days → P(No) = 5/14 ≈ 0.36

### For "Sunny" Outlook:
- **Sunny days**: 5 total (Days 4, 5, 6, 10, 14)
- **Sunny + Played**: 3 days (Days 4, 5, 10) → P(Sunny | Yes) = 3/9 ≈ 0.33
   - (sunny days on which golf was played) / (total days golf was played)
   - i.e. the probability that a day was sunny, given that golf was played.

- **Sunny + Didn't Play**: 2 days (Days 6, 14) → P(Sunny | No) = 2/5 = 0.40
   - i.e. the probability that a day was sunny, given that golf was not played.

### For "Hot" Temperature:
- **Hot days**: 4 total (Days 1, 2, 3, 13)
- **Hot + Played**: 2 days (Days 3, 13) → P(Hot | Yes) = 2/9 ≈ 0.22
- **Hot + Didn't Play**: 2 days (Days 1, 2) → P(Hot | No) = 2/5 = 0.40

### For "Normal" Humidity:
- **Normal humidity days**: 7 total
- **Normal + Played**: 6 days → P(Normal | Yes) = 6/9 ≈ 0.67
- **Normal + Didn't Play**: 1 day → P(Normal | No) = 1/5 = 0.20

### For "No Wind":
- **No wind days**: 8 total
- **No wind + Played**: 6 days → P(No Wind | Yes) = 6/9 ≈ 0.67
- **No wind + Didn't Play**: 2 days → P(No Wind | No) = 2/5 = 0.40
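The counting above can be reproduced in a few lines of Python. This is a minimal sketch: the `diary` list is transcribed from the table, and helper names such as `likelihood` are made up for illustration.

```python
from collections import Counter

# Tom's diary: (outlook, temperature, humidity, windy, played)
diary = [
    ("Rainy", "Hot", "High", "No", "No"),
    ("Rainy", "Hot", "High", "Yes", "No"),
    ("Overcast", "Hot", "High", "No", "Yes"),
    ("Sunny", "Mild", "High", "No", "Yes"),
    ("Sunny", "Cool", "Normal", "No", "Yes"),
    ("Sunny", "Cool", "Normal", "Yes", "No"),
    ("Overcast", "Cool", "Normal", "Yes", "Yes"),
    ("Rainy", "Mild", "High", "No", "No"),
    ("Rainy", "Cool", "Normal", "No", "Yes"),
    ("Sunny", "Mild", "Normal", "No", "Yes"),
    ("Rainy", "Mild", "Normal", "Yes", "Yes"),
    ("Overcast", "Mild", "High", "Yes", "Yes"),
    ("Overcast", "Hot", "Normal", "No", "Yes"),
    ("Sunny", "Mild", "High", "Yes", "No"),
]
feature_names = ["outlook", "temperature", "humidity", "windy"]

# How many days fall in each class (Yes / No).
class_counts = Counter(row[-1] for row in diary)

# pair_counts[(feature, value, label)] = days with that value and that outcome.
pair_counts = Counter(
    (name, value, row[-1])
    for row in diary
    for name, value in zip(feature_names, row[:-1])
)

def likelihood(feature, value, label):
    """P(feature = value | label), estimated by raw counting."""
    return pair_counts[(feature, value, label)] / class_counts[label]

today = {"outlook": "Sunny", "temperature": "Hot", "humidity": "Normal", "windy": "No"}
for name, value in today.items():
    print(name, value,
          round(likelihood(name, value, "Yes"), 2),
          round(likelihood(name, value, "No"), 2))
```

Each printed row should match the P(value | Yes) and P(value | No) figures listed above for today's weather.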

---

## 🧮 Step 3: The "Naive" Assumption

Here's the key insight: **Naive Bayes assumes each weather condition is independent of the others, given the class (Yes or No)**.

This means that, within the "Yes" days (and likewise within the "No" days):
- Whether it's sunny doesn't affect whether it's hot
- Humidity doesn't depend on wind
- Each condition gives separate evidence

**In reality, this isn't true** (sunny days are often hot), but it makes the math simple and often still works well!

So instead of calculating P(Sunny AND Hot AND Normal AND No Wind | Yes), we calculate:

P(Sunny | Yes) × P(Hot | Yes) × P(Normal | Yes) × P(No Wind | Yes)
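In the notation of Bayes' theorem, the same assumption can be written for a generic class $C$ and features $x_1, \dots, x_n$ (symbols introduced here for illustration only):

$$
P(x_1, x_2, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
$$

Plugging this into Bayes' theorem gives the quantity that is compared across classes:

$$
P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)
$$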


---

## 🎯 Step 4: Calculate the Scores

### Score for "Play Golf = YES":

P(Yes) × P(Sunny | Yes) × P(Hot | Yes) × P(Normal | Yes) × P(No Wind | Yes)

= 0.64 × 0.33 × 0.22 × 0.67 × 0.67

≈ 0.64 × 0.033

≈ 0.021


### Score for "Play Golf = NO":

P(No) × P(Sunny | No) × P(Hot | No) × P(Normal | No) × P(No Wind | No)

= 0.36 × 0.40 × 0.40 × 0.20 × 0.40

= 0.36 × 0.0128

≈ 0.0046
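To double-check the arithmetic without rounding the intermediate values, here is a small sketch using exact fractions. The numbers are the priors and likelihoods from Step 2; only the variable names are new.

```python
from fractions import Fraction as F

# Prior × likelihoods for today's weather (Sunny, Hot, Normal, No wind).
score_yes = F(9, 14) * F(3, 9) * F(2, 9) * F(6, 9) * F(6, 9)
score_no = F(5, 14) * F(2, 5) * F(2, 5) * F(1, 5) * F(2, 5)

print(score_yes, float(score_yes))  # 4/189, about 0.0212
print(score_no, float(score_no))    # 4/875, about 0.0046
```

The exact values agree with the rounded hand calculation to the precision shown.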



---

## 🏆 Step 5: Make the Decision

- **YES score**: 0.021
- **NO score**: 0.0046

Since **0.021 > 0.0046**, Naive Bayes predicts: **✅ PLAY GOLF!**
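The two scores are not probabilities yet, because they do not sum to 1. A minimal sketch of the final step, assuming we simply divide each score by their sum to get posterior probabilities:

```python
# Unnormalized scores: prior × likelihoods for today's weather.
scores = {
    "Yes": (9 / 14) * (3 / 9) * (2 / 9) * (6 / 9) * (6 / 9),
    "No": (5 / 14) * (2 / 5) * (2 / 5) * (1 / 5) * (2 / 5),
}

# Dividing by the sum turns the scores into posterior probabilities.
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}

# The prediction is the class with the largest score (or posterior).
prediction = max(scores, key=scores.get)

print({k: round(v, 3) for k, v in posteriors.items()})  # {'Yes': 0.822, 'No': 0.178}
print(prediction)  # Yes
```

Normalizing never changes which class wins, which is why comparing the raw scores is enough to make the decision.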

---

## 📚 Connecting to Technical Terms

### What We Just Did = Naive Bayes Algorithm

1. **Bayes' Theorem**: 
   - We scored each class with P(Features | Class) × P(Class), which is proportional to P(Class | Features)
   - The denominator P(Features) is the same for every class, so it can be dropped when comparing classes

2. **"Naive" Assumption**: 
   - We assumed P(Feature1 AND Feature2 | Class) = P(Feature1 | Class) × P(Feature2 | Class)
   - This independence assumption is why it's called "naive"

3. **Prior Probability**: 
   - P(Yes) = 0.64 and P(No) = 0.36 are called "priors"
   - They represent our initial belief before seeing today's weather

4. **Likelihood**: 
   - P(Sunny | Yes) = 0.33 is the "likelihood"
   - It tells us how likely sunny weather is given that we played golf

5. **Posterior Probability**: 
   - The final scores (0.021 and 0.0046) are proportional to "posterior probabilities"
   - We choose the class with the highest posterior

---

## 🔄 How This Works for Any Problem

### General Formula:
For any new example with features [F1, F2, F3, ..., Fn]:

Score for Class C = P(C) × P(F1 | C) × P(F2 | C) × ... × P(Fn | C)
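The general formula translates almost directly into code. The sketch below is one possible way to write it, not a reference implementation. The log-space variant is an addition not covered in the text above: it is a common trick to avoid numerical underflow when many small probabilities are multiplied, and it assumes every likelihood is greater than zero.

```python
import math

def class_score(prior, likelihoods):
    """P(C) * P(F1|C) * ... * P(Fn|C), the score from the formula above."""
    return prior * math.prod(likelihoods)

def class_log_score(prior, likelihoods):
    """Same score in log space; sums avoid underflow with many features."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)

def predict(priors, likelihoods_by_class):
    """Return the class whose log score is highest."""
    return max(
        priors,
        key=lambda c: class_log_score(priors[c], likelihoods_by_class[c]),
    )

# Golf example: today's likelihoods for each class, taken from Step 2.
priors = {"Yes": 9 / 14, "No": 5 / 14}
likelihoods_by_class = {
    "Yes": [3 / 9, 2 / 9, 6 / 9, 6 / 9],
    "No": [2 / 5, 2 / 5, 1 / 5, 2 / 5],
}
print(predict(priors, likelihoods_by_class))  # Yes
```

Because the logarithm is monotonic, the class with the highest log score is also the class with the highest plain score.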



### For Different Data Types:

#### **Text Classification (Spam Detection)**:
- Features = words in email
- P("FREE" | Spam) = how often "FREE" appears in spam emails
- P("meeting" | Ham) = how often "meeting" appears in legitimate emails

#### **Medical Diagnosis**:
- Features = symptoms (fever, cough, headache)
- P(Fever | Flu) = how often fever occurs with flu
- P(Cough | Common Cold) = how often cough occurs with common cold

#### **Product Reviews**:
- Features = words in review
- P("excellent" | Positive) = how often "excellent" appears in positive reviews
- P("terrible" | Negative) = how often "terrible" appears in negative reviews

---

## ⚠️ The Zero Probability Problem

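### What if a feature value never appeared with a class?
In Tom's diary, every **Overcast** day was a golf day, so P(Overcast | No) = 0/5 = 0. The same thing happens in text problems when a word never appears in one of the classes.

**Problem**: If any P(Feature | Class) = 0, the entire score for that class becomes 0, no matter what the other features say!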

### Solution: Laplace Smoothing
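Instead of using raw counts, we add 1 to every count and add the number of possible values of that feature to the denominator:

**Original**: P(Sunny | Yes) = 3/9 ≈ 0.33  
**With smoothing**: P(Sunny | Yes) = (3+1)/(9+3) = 4/12 ≈ 0.33

Here the "+3" in the denominator is the number of possible outlook values (Sunny, Rainy, Overcast). The value barely changes in this case, but a zero count such as P(Overcast | No) = 0/5 becomes (0+1)/(5+3) = 0.125.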

This ensures no probability is ever zero!
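A minimal sketch of add-one smoothing as a function. The name `smoothed_likelihood` and the `alpha` parameter are illustrative; `alpha=1` corresponds to the "add 1" rule described above.

```python
from fractions import Fraction

def smoothed_likelihood(count, class_total, n_values, alpha=1):
    """(count + alpha) / (class_total + alpha * n_values)."""
    return Fraction(count + alpha, class_total + alpha * n_values)

# Outlook has 3 possible values: Sunny, Rainy, Overcast.
print(smoothed_likelihood(3, 9, 3))  # P(Sunny | Yes)   -> 1/3
print(smoothed_likelihood(0, 5, 3))  # P(Overcast | No) -> 1/8

# The smoothed probabilities over all outlook values still sum to 1.
# Counts on "No" days: Sunny 2, Rainy 3, Overcast 0.
no_day_counts = {"Sunny": 2, "Rainy": 3, "Overcast": 0}
total = sum(smoothed_likelihood(c, 5, 3) for c in no_day_counts.values())
print(total)  # 1
```

In the diary, Overcast never occurs on a "No" day, so the raw estimate would be 0. With smoothing it becomes 1/8 instead.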

---

## 🎯 Why Naive Bayes Works So Well

### The Secret Sauce:
1. **It doesn't need perfect independence** – even if features are somewhat related, the ranking of the class scores is often still right
2. **It focuses on the biggest patterns** – small errors in individual probabilities often don't matter, as long as the right class still gets the highest score
3. **It's incredibly fast** – just counting and multiplying
4. **It works with small data** – you don't need millions of examples

### When It's a Good Fit:
- **Text classification**: Treating words as independent given the class is a rough approximation, but it tends to work well in practice
- **Real-time decisions**: Email spam filtering, chatbot responses
- **Baseline models**: Quick first attempt before complex models

---

## 🚀 Key Takeaways

- **Naive Bayes = Smart counting + Simple math**
- **"Naive" = assumes features are independent** (simplification that works)
- **Works by calculating scores for each class** and picking the highest
- **A strong baseline for text problems** like spam detection and sentiment analysis
- **Handles the "never seen before" problem** with Laplace smoothing
- **Fast, simple, and surprisingly accurate**

> **Remember**: Naive Bayes is like your friend who makes decisions by looking at past patterns and saying "Based on what I've seen before, this is most likely what will happen!" 🎯

### Types of Naive Bayes
| Type | Feature type | Example data |
|------|--------------|--------------|
| GaussianNB | Continuous | Real-valued measurements |
| MultinomialNB | Discrete counts | Word frequencies |
| BernoulliNB | Binary | Word presence (0/1) |

# 1. Gaussian Naive Bayes

In [22]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.datasets import load_wine

In [23]:
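# Load the wine dataset: 178 samples, 13 numeric features, 3 classes
data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["Target"] = data.target

# Separate the features (X) from the class label (y)
X = df.drop("Target", axis=1)
y = df["Target"]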

In [24]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

In [25]:
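# Fit a Gaussian (mean and variance) per feature for each class
model = GaussianNB()
model.fit(X_train, y_train)

# Predict the class of each held-out wine sample
y_pred = model.predict(X_test)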

In [26]:
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
cr = classification_report(y_test, y_pred)

print(f"Confusion Matrix:\n {cm}")
print(f"\nAccuracy Score: {acc:.3f}")
print(f"\nClassification Report: {cr}")

Confusion Matrix:
 [[14  0  0]
 [ 0 14  0]
 [ 0  0  8]]

Accuracy Score: 1.000

Classification Report:               precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       1.00      1.00      1.00        14
           2       1.00      1.00      1.00         8

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36



# 2. Multinomial Naive Bayes

Here we build a spam classifier using Multinomial Naive Bayes, a classic starting point for spam filters.

### 🧠 Step 1: Real-world idea

We want to predict whether a message is spam or not spam.

Example:

| Message                        | Label          |
| ------------------------------ | -------------- |
| "Win money now!!!"             | spam           |
| "Your project meeting at 10am" | ham (not spam) |
| "Free laptop just for you"     | spam           |
| "Can we talk later?"           | ham            |


### 🧩 Step 2: Concept

The algorithm looks for which words are common in spam vs ham messages.

It calculates probabilities like:
- P(word = “win” | spam)
- P(word = “win” | ham)

Then, when a new message comes in (say "Win a free phone now"), it multiplies those probabilities for all the words in the message, together with the prior P(spam) or P(ham), and picks whichever class gives the higher score.
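To make the idea concrete, here is a hand-rolled sketch of the word-probability calculation in plain Python. It is not how scikit-learn implements `MultinomialNB` internally. The tokenizer, the add-one smoothing, and the choice to skip unseen words are assumptions made for this illustration, and the toy messages are the same eight used in the training cell of this section.

```python
import math
import re
from collections import Counter

train = [
    ("Win money now", "spam"),
    ("Free cash prize", "spam"),
    ("Click to claim your reward", "spam"),
    ("Earn extra income today", "spam"),
    ("Meeting at 10am", "ham"),
    ("Let's have lunch tomorrow", "ham"),
    ("Project submission due", "ham"),
    ("Are you free tonight?", "ham"),
]

def tokenize(text):
    # Lowercase words of 2+ characters (an assumed, simple tokenizer).
    return re.findall(r"\b\w\w+\b", text.lower())

# Count words per class and messages per class.
word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in train:
    word_counts[label].update(tokenize(text))
    doc_counts[label] += 1
vocab = set(word_counts["spam"]) | set(word_counts["ham"])

def word_likelihood(word, label, alpha=1):
    """P(word | label) with add-one (Laplace) smoothing."""
    total = sum(word_counts[label].values())
    return (word_counts[label][word] + alpha) / (total + alpha * len(vocab))

def log_score(text, label):
    """log P(label) + sum of log P(word | label) over known words."""
    score = math.log(doc_counts[label] / len(train))
    for word in tokenize(text):
        if word in vocab:  # words never seen in training are skipped
            score += math.log(word_likelihood(word, label))
    return score

message = "Win a free phone now"
prediction = max(word_counts, key=lambda label: log_score(message, label))
print(prediction)  # spam
```

Here "win" and "now" appear only in spam messages, so their smoothed likelihood is higher under spam (2/43) than under ham (1/42), which tips the decision.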

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.feature_extraction.text import CountVectorizer

In [31]:
messages = [
    "Win money now",
    "Free cash prize",
    "Click to claim your reward",
    "Earn extra income today",
    "Meeting at 10am",
    "Let's have lunch tomorrow",
    "Project submission due",
    "Are you free tonight?",
]

labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

In [None]:
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

""" 
WHAT HAPPENED HERE?

#############################################
PHASE-1 : fit() phase → VOCABULARY GENERATION
#############################################

1. CountVectorizer.fit() scans all messages
2. Lowercases each message and splits it into individual words (tokenization);
   with the default settings, tokens shorter than 2 characters are dropped
3. Collects all unique words across all messages
4. Sorts them alphabetically
5. Stores this vocabulary internally in the vectorizer object

- Now vectorizer.vocabulary_ contains the full word-to-index mapping:

{'win': 25, 'money': 15, 'now': 16, 'free': 9, 'cash': 3, 'prize': 17, 'click': 5, 'to': 21, 'claim': 4, 'your': 27, 'reward': 19, 'earn': 7, 'extra': 8, 'income': 11, 'today': 22, 'meeting': 14, 'at': 2, '10am': 0, 'let': 12, 'have': 10, 'lunch': 13, 'tomorrow': 23, 'project': 18, 'submission': 20, 'due': 6, 'are': 1, 'you': 26, 'tonight': 24}

#############################################################
PHASE-2 : transform() phase → DOCUMENT-TERM MATRIX GENERATION
#############################################################

1. CountVectorizer.transform() uses the vocabulary created in phase-1
2. For each message, counts how many times each word appears
3. Creates the numerical matrix (sparse matrix format)

- Now X contains our document-term matrix

👇 Click the link below to see what the document-term matrix looks like 👇

"""

click here 👉 [Document-term Matrix](./assets/document_Matrix.xlsx)
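For readers who cannot open the spreadsheet, the two phases can be imitated in plain Python. This is a rough sketch of what `CountVectorizer` does with its default settings, not its actual implementation. It builds a dense list-of-lists instead of a sparse matrix, and the regular expression is an approximation of the default tokenizer.

```python
import re

messages = [
    "Win money now",
    "Free cash prize",
    "Click to claim your reward",
    "Earn extra income today",
    "Meeting at 10am",
    "Let's have lunch tomorrow",
    "Project submission due",
    "Are you free tonight?",
]

def tokenize(text):
    # Approximates the defaults: lowercase, tokens of 2+ word characters.
    return re.findall(r"\b\w\w+\b", text.lower())

# "fit": collect the unique tokens, sort them, and number them.
unique_tokens = sorted({token for m in messages for token in tokenize(m)})
vocabulary = {word: index for index, word in enumerate(unique_tokens)}

# "transform": one row per message, one column per vocabulary word.
matrix = []
for m in messages:
    row = [0] * len(vocabulary)
    for token in tokenize(m):
        row[vocabulary[token]] += 1
    matrix.append(row)

print(len(vocabulary))                            # 28
print(vocabulary["win"])                          # 25
print([i for i, c in enumerate(matrix[0]) if c])  # [15, 16, 25]
```

The first message, "Win money now", has non-zero counts only in the columns for "money" (15), "now" (16), and "win" (25), matching the word-to-index mapping listed above.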

In [32]:
# Hold out 25% of the data (2 of the 8 messages) for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42
)

In [33]:
model = MultinomialNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

In [34]:
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
cr = classification_report(y_test, y_pred)

print(f"Confusion Matrix:\n {cm}")
print(f"\nAccuracy Score: {acc:.3f}")
print(f"\nClassification Report: {cr}")

Confusion Matrix:
 [[1 0]
 [1 0]]

Accuracy Score: 0.500

Classification Report:               precision    recall  f1-score   support

         ham       0.50      1.00      0.67         1
        spam       0.00      0.00      0.00         1

    accuracy                           0.50         2
   macro avg       0.25      0.50      0.33         2
weighted avg       0.25      0.50      0.33         2



**Note**: scikit-learn also prints an `UndefinedMetricWarning` here (trimmed from this output). The model never predicted `spam` on the test set, so spam precision is undefined and reported as 0.00. With only 2 test messages, these scores say very little about the model.


#### Testing our model

In [35]:
sample = [
    "Win a free laptop now",
    "Let's schedule a meeting",
    "Click here to grab your free cash prize",
]

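# Reuse the fitted vectorizer so the columns match the training vocabulary;
# words it has never seen (e.g. "laptop") are simply ignored
sample_vector = vectorizer.transform(sample)
print(f"\nPredictions: {model.predict(sample_vector)}")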


Predictions: ['spam' 'ham' 'spam']
