```{contents}
```

## Intuition

Naive Bayes is a **"vote counting" machine** powered by probability.

* You look at the features of a new sample (like words in an email).
* For each possible class (e.g., *spam* vs *ham*), you ask:
  *“If this sample belonged to this class, how likely would I see these features?”*
* Multiply those likelihoods by how common that class is overall (*prior*).
* Whichever class gives the highest probability wins.

👉 Even when features are correlated, it often still classifies well, because the decision depends only on **which class scores highest**, not on the scores being accurate probabilities.

---

### Mathematical Intuition

Bayes’ theorem:

$$
P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)}
$$

We only care about the class with the **maximum posterior probability**. The denominator $P(X)$ is the same for every class, so it does not change which class wins and can be dropped:

$$
\hat{y} = \arg\max_y P(X|y) \cdot P(y)
$$

Now apply the **naive assumption** (feature independence):

$$
P(X|y) = P(x_1, x_2, \dots, x_n | y) \approx \prod_{i=1}^n P(x_i | y)
$$

So the classifier becomes:

$$
\hat{y} = \arg\max_y P(y) \prod_{i=1}^n P(x_i | y)
$$

* $P(y)$ → how frequent the class is (prior).
* $P(x_i|y)$ → how often feature $x_i$ appears in class $y$ (likelihood).
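
The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `predict`, the dictionary layout, and the toy probabilities are all assumptions made for the example. It sums log-probabilities instead of multiplying raw probabilities, a common implementation choice that gives the same argmax while avoiding numerical underflow when there are many features.

```python
import math

def predict(priors, likelihoods, features):
    """Return (best_label, log_scores) for log P(y) + sum_i log P(x_i | y)."""
    log_scores = {}
    for label, prior in priors.items():
        score = math.log(prior)
        for feature in features:
            # Assumes every probability is strictly positive (no smoothing here).
            score += math.log(likelihoods[label][feature])
        log_scores[label] = score
    best_label = max(log_scores, key=log_scores.get)
    return best_label, log_scores

# Hypothetical toy probabilities, chosen only to exercise the function.
toy_priors = {"spam": 0.5, "ham": 0.5}
toy_likelihoods = {
    "spam": {"free": 0.6, "meeting": 0.1},
    "ham": {"free": 0.2, "meeting": 0.5},
}

label, log_scores = predict(toy_priors, toy_likelihoods, ["free", "meeting"])
print(label)
```

With these toy numbers the unnormalized scores are $0.5 \cdot 0.6 \cdot 0.1 = 0.03$ for spam and $0.5 \cdot 0.2 \cdot 0.5 = 0.05$ for ham, so the sketch picks ham.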

---

### Example Intuition with Math

Suppose we want to classify an email with the words: *“win lottery”*.

From training data:

* $P(\text{spam}) = 0.4$, $P(\text{ham}) = 0.6$.
* $P(\text{win}|\text{spam}) = 0.8$, $P(\text{win}|\text{ham}) = 0.1$.
* $P(\text{lottery}|\text{spam}) = 0.7$, $P(\text{lottery}|\text{ham}) = 0.05$.

Compute:

$$
P(\text{spam}|\text{“win lottery”}) \propto 0.4 \cdot 0.8 \cdot 0.7 = 0.224
$$

$$
P(\text{ham}|\text{“win lottery”}) \propto 0.6 \cdot 0.1 \cdot 0.05 = 0.003
$$

📌 Prediction: **Spam**, because 0.224 > 0.003.
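
The same arithmetic can be checked in Python. The sketch below uses the numbers from this example; the variable names are illustrative. It also divides each score by their sum, which restores the $P(X)$ term that was dropped earlier and turns the scores into posteriors that add up to 1.

```python
# Probabilities taken from the worked example above.
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"win": 0.8, "lottery": 0.7},
    "ham": {"win": 0.1, "lottery": 0.05},
}
words = ["win", "lottery"]

# Unnormalized score per class: P(y) * product of P(word | y).
scores = {}
for label in prior:
    score = prior[label]
    for word in words:
        score *= likelihood[label][word]
    scores[label] = score

# Normalizing by the sum of the scores gives proper posteriors.
total = sum(scores.values())
posterior = {label: score / total for label, score in scores.items()}

prediction = max(scores, key=scores.get)
print(scores)      # roughly {'spam': 0.224, 'ham': 0.003}
print(posterior)   # roughly {'spam': 0.987, 'ham': 0.013}
print(prediction)  # spam
```

Normalizing does not change the winner; it only rescales both scores by the same constant.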

---

### Why it works despite being naive

* Even if "win" and "lottery" are correlated, multiplying their probabilities counts the overlapping evidence twice, which usually pushes the score further toward the class the evidence already favors.
* The absolute values may be wrong (typically overconfident), but the **relative comparison is often good enough for classification**.
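
A small experiment illustrates both points. The sketch below reuses the spam/ham numbers from the example and simulates an extreme case of correlation by counting "lottery" twice, as if a second feature were an exact copy of it. The helper name `posterior` and the duplicated-feature setup are assumptions made for illustration.

```python
PRIOR = {"spam": 0.4, "ham": 0.6}
LIKELIHOOD = {
    "spam": {"win": 0.8, "lottery": 0.7},
    "ham": {"win": 0.1, "lottery": 0.05},
}

def posterior(words):
    """Normalized naive Bayes posterior over classes for a list of words."""
    scores = {}
    for label in PRIOR:
        score = PRIOR[label]
        for word in words:
            score *= LIKELIHOOD[label][word]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

# Independent view: each word counted once.
once = posterior(["win", "lottery"])
# Perfectly correlated copy: "lottery" is counted twice.
twice = posterior(["win", "lottery", "lottery"])

print(once["spam"])   # about 0.987
print(twice["spam"])  # about 0.999
print(max(once, key=once.get) == max(twice, key=twice.get))  # True
```

The duplicated feature inflates the spam posterior, so the probability itself is less trustworthy, but the predicted class is unchanged. This is only one constructed case; strongly correlated features that point in opposite directions can still flip a prediction.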

---

⚡ **Summary:**

* **Intuition:** Pick the class that best explains the observed features.
* **Math intuition:** Bayes’ theorem + independence assumption → product of simple probabilities.
* **Outcome:** Fast, effective classifier, especially for text/NLP.

