### Naïve Bayes

### What is Bayes’ Theorem?

It’s a way to calculate the probability of an event given new evidence, by combining that evidence with prior knowledge of how likely the event was in the first place.

Real-Life Example (Cricket fan 🎯)

Suppose:

30% of people in a city are cricket fans → $P(A) = 0.3$.

80% of cricket fans wear jerseys → $P(B \mid A) = 0.8$.

20% of non-fans also wear jerseys → $P(B \mid \neg A) = 0.2$.

Total chance someone wears a jersey = mix of both groups (fans are 30% of people, non-fans the other 70%):

$$P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A) = 0.8 \cdot 0.3 + 0.2 \cdot 0.7 = 0.38$$

👉 Now if you see a person wearing a jersey (B), what’s the chance they are a cricket fan (A)?

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{0.8 \cdot 0.3}{0.38} \approx 0.63$$

So seeing a jersey raises the chance that the person is a cricket fan from the 30% prior to about 63%.
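The same arithmetic can be checked in a few lines of plain Python. This is just a sketch of the cricket example; the variable names are our own.

```python
# Cricket example: A = "is a cricket fan", B = "wears a jersey"
p_a = 0.3              # P(A): prior share of fans in the city
p_b_given_a = 0.8      # P(B | A): fans who wear jerseys
p_b_given_not_a = 0.2  # P(B | not A): non-fans who wear jerseys

# Law of total probability: mix both groups, weighted by their size
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: turn P(B | A) into P(A | B)
p_a_given_b = p_b_given_a * p_a / p_b

print(f"P(B)     = {p_b:.2f}")          # P(B)     = 0.38
print(f"P(A | B) = {p_a_given_b:.2f}")  # P(A | B) = 0.63
```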

### Topic 2: Types of Naïve Bayes

Naïve Bayes has different “flavors” depending on the type of data you’re working with.

### 1️⃣ Gaussian Naïve Bayes

Used when features are continuous numbers (like age, height, weight).

Assumes that, within each class, every feature follows a bell curve (Gaussian/Normal distribution).

👉 Real-life example:
Classify whether a student will pass/fail based on:

Hours studied (continuous)

Exam score (continuous)

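Here, the algorithm learns patterns like:

“Students who pass usually study ~5–10 hrs, those who fail ~0–3 hrs.”

For each class, it fits a bell curve (mean and variance) to every feature, then assigns a new student to the class whose curves make their values most likely, weighted by how common that class is.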
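The bell-curve reasoning can be sketched from scratch in plain Python. The training numbers below are hypothetical, made up for illustration, and the helper names `log_gaussian` and `predict` are our own; in practice scikit-learn’s `GaussianNB` does this job. Log-probabilities are added instead of multiplying raw probabilities, which avoids numeric underflow.

```python
import math
from statistics import mean, pvariance

# Hypothetical toy data (made up for illustration): (hours studied, exam score)
train = {
    "pass": [(5, 60), (6, 65), (7, 70), (8, 75), (9, 80), (10, 85)],
    "fail": [(0, 20), (1, 30), (2, 35), (3, 40), (2, 25), (1, 30)],
}

def log_gaussian(x, mu, var):
    """Log of the normal (bell-curve) density N(mu, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

# Training: per class, store the log prior and (mean, variance) of each feature
n_total = sum(len(rows) for rows in train.values())
model = {}
for label, rows in train.items():
    columns = list(zip(*rows))  # one tuple of values per feature
    model[label] = (
        math.log(len(rows) / n_total),
        [(mean(col), pvariance(col)) for col in columns],
    )

def predict(sample):
    """Return the class with the largest log prior + sum of per-feature log likelihoods."""
    scores = {}
    for label, (log_prior, stats) in model.items():
        # "Naive" step: features are treated as independent, so their log likelihoods add
        scores[label] = log_prior + sum(
            log_gaussian(x, mu, var) for x, (mu, var) in zip(sample, stats)
        )
    return max(scores, key=scores.get)

print(predict((7.5, 72)))  # pass
print(predict((1.5, 30)))  # fail
```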

### 2️⃣ Multinomial Naïve Bayes

Used when features are counts of things (discrete).

Especially good for text classification.

👉 Real-life example (Spam Filter):

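Email 1: “Buy now! Free offer!” → contains words like “buy” and “free”, which are common in spam.

Email 2: “Meeting schedule attached” → contains words like “meeting” and “schedule”, which are common in normal (ham) email.

During training, the algorithm counts how often each word appears in spam vs. ham emails → if a new email contains many words like “buy”/“free” → higher spam probability.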

This is how Naïve Bayes is often used for spam detection or sentiment analysis.
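The counting idea can also be sketched from scratch. This toy version trains on just the two example emails above, uses a deliberately simple regex tokenizer, and applies add-one (Laplace) smoothing, which matches the default `alpha=1.0` of scikit-learn’s `MultinomialNB`. The helper names (`tokenize`, `log_score`, `classify`) are hypothetical.

```python
import math
import re
from collections import Counter

# Toy training emails, taken from the two examples above
train = [
    ("Buy now! Free offer!", "spam"),
    ("Meeting schedule attached", "ham"),
]

def tokenize(text):
    """Lowercase and split into words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

# Count words per class, and how many emails each class has
word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in train:
    word_counts[label].update(tokenize(text))
    doc_counts[label] += 1

vocab = set().union(*word_counts.values())
ALPHA = 1.0  # add-one smoothing so a word unseen in one class doesn't zero it out

def log_score(text, label):
    """log P(label) + sum over words of log P(word | label)."""
    counts = word_counts[label]
    total = sum(counts.values())
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    for word in tokenize(text):
        if word in vocab:  # words never seen in training are ignored
            score += math.log((counts[word] + ALPHA) / (total + ALPHA * len(vocab)))
    return score

def classify(text):
    return max(word_counts, key=lambda label: log_score(text, label))

print(classify("Free offer, buy today"))  # spam
print(classify("Schedule the meeting"))   # ham
```

In the first message, “today” is not in the vocabulary and is skipped, while “free”, “offer” and “buy” each count toward spam.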

✅ Quick Summary:

Gaussian NB → for continuous numbers (age, salary, exam marks).

Multinomial NB → for counts, especially text data (word frequency).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Sample dataset (spam/ham emails)
emails = [
    "Win money now",                # spam
    "Limited time offer",           # spam
    "Buy cheap meds online",        # spam
    "Meeting schedule attached",    # ham
    "Let’s catch up tomorrow",      # ham
    "Project deadline is near",     # ham
]

labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Step 1: Convert text into word counts (one column per vocabulary word).
# Note: fitting on all emails lets the vocabulary include words from the test
# emails too; in a real project, fit the vectorizer on the training split only.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Step 2: Train/Test split
# With only 6 emails, test_size=0.3 holds out 2 emails and trains on the other 4.
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

# Step 3: Train Naïve Bayes (Multinomial); the default alpha=1.0 applies add-one smoothing
model = MultinomialNB()
model.fit(X_train, y_train)

# Step 4: Predict labels for the held-out emails
y_pred = model.predict(X_test)

# Step 5: Accuracy = fraction of held-out emails labelled correctly
print("Accuracy:", accuracy_score(y_test, y_pred))

# Step 6: Test on new messages
# transform() reuses the learned vocabulary; unseen words such as "free" are ignored
new_emails = ["Free money offer", "Schedule a meeting tomorrow"]
X_new = vectorizer.transform(new_emails)
print("Predictions:", model.predict(X_new))


Accuracy: 0.0
Predictions: ['ham' 'ham']
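The `Accuracy: 0.0` above means both held-out emails were misclassified. With only four training emails the model has seen very few spam words, and “free” is not in the vocabulary at all. One option for a dataset this small, sketched here as an alternative rather than the notebook’s method, is to skip the split, train on all six emails, and only check the new messages. Wrapping `CountVectorizer` and `MultinomialNB` in a scikit-learn pipeline means the vocabulary is learned from exactly the data passed to `fit()`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Win money now",                # spam
    "Limited time offer",           # spam
    "Buy cheap meds online",        # spam
    "Meeting schedule attached",    # ham
    "Let’s catch up tomorrow",      # ham
    "Project deadline is near",     # ham
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# One fit() call learns the vocabulary and the per-class word probabilities
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

new_emails = ["Free money offer", "Schedule a meeting tomorrow"]
print(model.predict(new_emails))  # ['spam' 'ham']
```

“Free” is still ignored as an unknown word, but “money” and “offer” now appear in the spam training emails, which is enough to flip the first prediction to spam.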
