# Naive Bayes Algorithm - Detailed Explanation

The Naive Bayes algorithm is used to solve **classification problems**, both **binary** and **multi-class**.  
To understand it, we first need a few basic **probability concepts**, especially **conditional probability**, which leads to **Bayes Theorem**.

---

## 1. Probability Concepts

### 1.1 Independent Events
Two events are **independent** when the outcome of one **does not affect** the probability of the other.

**Example:** Rolling a die twice.  
Possible outcomes of each roll: \(1,2,3,4,5,6\)

$$
P(\text{1}) = \frac{1}{6}, \quad P(\text{2}) = \frac{1}{6}, \dots
$$

**Reason:** The result of the first roll does not change the probabilities for the second roll, so every outcome still has probability \(\frac{1}{6}\).
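
As a quick check of independence, the sketch below enumerates all 36 equally likely outcomes of two rolls of a fair die and confirms that the joint probability equals the product of the individual probabilities. The helper `prob` and the variable names are illustrative assumptions, not part of the original explanation.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling a fair die twice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over (first, second)."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

p_first_is_1 = prob(lambda o: o[0] == 1)           # 1/6
p_second_is_2 = prob(lambda o: o[1] == 2)          # 1/6
p_both = prob(lambda o: o[0] == 1 and o[1] == 2)   # 1/36

# Independence: the joint probability factorizes into the two marginals.
print(p_both == p_first_is_1 * p_second_is_2)  # True
```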

---

### 1.2 Dependent Events
Two events are **dependent** when the outcome of one **affects** the probability of the other.

**Example:** Drawing marbles from a bag.

- Bag contains 3 orange and 2 yellow marbles.  
- Event 1: Draw an orange marble

$$
P(\text{Orange}) = \frac{3}{5}
$$

- Event 2: Draw a yellow marble **after removing an orange marble**

$$
P(\text{Yellow | Orange}) = \frac{2}{4} = \frac{1}{2}
$$

**Combined probability** (dependent events):

$$
P(\text{Orange and Yellow}) = P(\text{Orange}) \cdot P(\text{Yellow | Orange}) = \frac{3}{5} \cdot \frac{1}{2} = \frac{3}{10}
$$

**General formula for dependent events:**

$$
P(A \text{ and } B) = P(A) \cdot P(B|A)
$$

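Here \(P(B|A)\) is the **conditional probability** of \(B\) given that \(A\) has occurred; the formula itself is known as the multiplication rule.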
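
The marble example can be checked by brute force. The sketch below lists every ordered pair of marbles drawn without replacement and counts the favorable cases; the list `bag` and the other variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

# Bag from the example: 3 orange ("O") and 2 yellow ("Y") marbles.
bag = ["O", "O", "O", "Y", "Y"]

# Every ordered pair of distinct marbles drawn without replacement (5 * 4 = 20).
draws = list(permutations(range(len(bag)), 2))

orange_first = [d for d in draws if bag[d[0]] == "O"]
orange_then_yellow = [d for d in orange_first if bag[d[1]] == "Y"]

p_orange = Fraction(len(orange_first), len(draws))                             # 3/5
p_yellow_given_orange = Fraction(len(orange_then_yellow), len(orange_first))  # 1/2
p_orange_and_yellow = Fraction(len(orange_then_yellow), len(draws))           # 3/10

# Multiplication rule: P(A and B) = P(A) * P(B|A)
print(p_orange_and_yellow == p_orange * p_yellow_given_orange)  # True
```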

---

## 2. Bayes Theorem

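The joint probability \(P(A \text{ and } B)\) can be written both as \(P(A) \cdot P(B|A)\) and as \(P(B) \cdot P(A|B)\). Equating the two and dividing by \(P(B)\) gives **Bayes Theorem**: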

$$
P(A|B) = \frac{P(A) \cdot P(B|A)}{P(B)}
$$

Where:

- \(P(A|B)\) = Probability of event \(A\) given \(B\) has occurred  
- \(P(A)\) = Probability of event \(A\)  
- \(P(B|A)\) = Probability of event \(B\) given \(A\) has occurred  
- \(P(B)\) = Probability of event \(B\)

This theorem is the **foundation of the Naive Bayes algorithm**.
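
To see the theorem in action, the sketch below reuses the marble bag from Section 1.2 and reverses the conditioning: given that the second marble is yellow, how likely is it that the first was orange? The choice of events and the use of the law of total probability for \(P(B)\) are additions made for illustration.

```python
from fractions import Fraction

# A = "first marble is orange", B = "second marble is yellow"
# (bag of 3 orange and 2 yellow marbles, drawn without replacement).
p_a = Fraction(3, 5)              # P(A)
p_b_given_a = Fraction(2, 4)      # P(B|A): 2 yellow among the 4 remaining
p_b_given_not_a = Fraction(1, 4)  # P(B|not A): 1 yellow among the 4 remaining

# Law of total probability gives the denominator P(B).
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

# Bayes Theorem: P(A|B) = P(A) * P(B|A) / P(B)
p_a_given_b = p_a * p_b_given_a / p_b
print(p_b, p_a_given_b)  # 2/5 3/4
```

The result matches intuition: once one yellow marble is reserved for the second draw, 3 of the other 4 marbles are orange.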

---

## 3. Naive Bayes in Machine Learning

In ML, we have:

- **Input features** (independent variables): \(X_1, X_2, X_3, \dots, X_n\)  
- **Target** (dependent variable): \(Y\)

We want the probability of each class \(Y\) given the features, which Bayes Theorem expresses as:

$$
P(Y|X_1, X_2, X_3, \dots, X_n) = \frac{P(Y) \cdot P(X_1, X_2, \dots, X_n | Y)}{P(X_1, X_2, \dots, X_n)}
$$

### 3.1 Naive Assumption

Naive Bayes assumes the features are **conditionally independent given the class** \(Y\), so the joint likelihood factorizes:

$$
P(X_1, X_2, \dots, X_n | Y) = P(X_1|Y) \cdot P(X_2|Y) \cdot \dots \cdot P(X_n|Y)
$$

Thus, the prediction formula becomes:

$$
P(Y|X_1, X_2, \dots, X_n) \propto P(Y) \cdot \prod_{i=1}^{n} P(X_i|Y)
$$

> The denominator \(P(X_1, X_2, \dots, X_n)\) is the same for every class, so it can be ignored when comparing classes.
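
The proportional formula translates almost directly into code. The following is a minimal sketch, not a reference implementation: the function names `naive_bayes_scores` and `predict`, the nested-dictionary layout, and the spam/ham numbers in the usage lines are all hypothetical and chosen only for illustration.

```python
from math import prod

def naive_bayes_scores(priors, likelihoods, observation):
    """Return the unnormalized score P(Y) * prod_i P(X_i | Y) for each class.

    priors:      {class_label: P(Y)}
    likelihoods: {class_label: {feature_name: {value: P(X_i = value | Y)}}}
    observation: {feature_name: observed_value}
    """
    return {
        label: prior * prod(
            likelihoods[label][feature][value]
            for feature, value in observation.items()
        )
        for label, prior in priors.items()
    }

def predict(priors, likelihoods, observation):
    """Pick the class with the largest score; the shared denominator is ignored."""
    scores = naive_bayes_scores(priors, likelihoods, observation)
    return max(scores, key=scores.get)

# Hypothetical numbers, for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_link": {True: 0.7, False: 0.3}},
    "ham": {"has_link": {True: 0.2, False: 0.8}},
}
print(predict(priors, likelihoods, {"has_link": True}))  # spam
```

Because only the ranking of the scores matters for the prediction, the function never computes \(P(X_1, X_2, \dots, X_n)\).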

---

## 4. Example: Tennis Dataset

### Dataset Features:

- Outlook: Sunny, Overcast, Rain  
- Temperature: Hot, Mild, Cool  
- Output: Play Tennis (Yes/No)

### Step 1: Calculate Probabilities

#### 4.1 Target Probability

$$
P(\text{Yes}) = \frac{9}{14}, \quad P(\text{No}) = \frac{5}{14}
$$

#### 4.2 Conditional Probabilities (example for Outlook)

| Outlook   | P(Outlook \| Yes) | P(Outlook \| No) |
|-----------|-------------------|------------------|
| Sunny     | 2/9               | 3/5              |
| Overcast  | 4/9               | 0/5              |
| Rain      | 3/9               | 2/5              |

For Temperature:

| Temperature | P(Temp \| Yes) | P(Temp \| No) |
|-------------|----------------|---------------|
| Hot         | 2/9            | 2/5           |
| Mild        | 4/9            | 2/5           |
| Cool        | 3/9            | 1/5           |
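
Each table entry is simply a count divided by the class total. The sketch below rebuilds the conditional probabilities from counts read off the numerators of the tables above, assuming 9 Yes rows and 5 No rows as implied by the priors in 4.1. The dictionary layout and the helper `conditional` are illustrative assumptions.

```python
from fractions import Fraction

# Counts per class, read off the numerators in the tables above.
counts = {
    "Outlook": {
        "Sunny": {"Yes": 2, "No": 3},
        "Overcast": {"Yes": 4, "No": 0},
        "Rain": {"Yes": 3, "No": 2},
    },
    "Temperature": {
        "Hot": {"Yes": 2, "No": 2},
        "Mild": {"Yes": 4, "No": 2},
        "Cool": {"Yes": 3, "No": 1},
    },
}
class_totals = {"Yes": 9, "No": 5}

def conditional(feature, value, label):
    """P(feature = value | label) as count / class total."""
    return Fraction(counts[feature][value][label], class_totals[label])

# Sanity check: for each feature and class, the probabilities sum to 1.
for feature, table in counts.items():
    for label in class_totals:
        assert sum(conditional(feature, v, label) for v in table) == 1

print(conditional("Outlook", "Sunny", "No"))  # 3/5
```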

---

### Step 2: Predict for new data

**Test Data:** Outlook = Sunny, Temperature = Hot

#### Probability of Yes:

$$
\begin{aligned}
P(\text{Yes | Sunny, Hot}) &\propto P(\text{Yes}) \cdot P(\text{Sunny | Yes}) \cdot P(\text{Hot | Yes}) \\
&= \frac{9}{14} \cdot \frac{2}{9} \cdot \frac{2}{9} \\
&= \frac{4}{126} \approx 0.0317
\end{aligned}
$$

#### Probability of No:

$$
\begin{aligned}
P(\text{No | Sunny, Hot}) &\propto P(\text{No}) \cdot P(\text{Sunny | No}) \cdot P(\text{Hot | No}) \\
&= \frac{5}{14} \cdot \frac{3}{5} \cdot \frac{2}{5} \\
&= \frac{6}{70} \approx 0.0857
\end{aligned}
$$
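
The two products can be verified with exact fractions. This is a small sketch using the values from Step 1; the variable names are illustrative.

```python
from fractions import Fraction

# Prior times the two conditional probabilities, for each class.
score_yes = Fraction(9, 14) * Fraction(2, 9) * Fraction(2, 9)
score_no = Fraction(5, 14) * Fraction(3, 5) * Fraction(2, 5)

# Fraction reduces automatically: 4/126 = 2/63 and 6/70 = 3/35.
print(score_yes, round(float(score_yes), 4))  # 2/63 0.0317
print(score_no, round(float(score_no), 4))    # 3/35 0.0857
```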

---

### Step 3: Normalize Probabilities

$$
P(\text{Yes | Sunny, Hot}) = \frac{0.0317}{0.0317 + 0.0857} \approx 0.27
$$

$$
P(\text{No | Sunny, Hot}) = \frac{0.0857}{0.0317 + 0.0857} \approx 0.73
$$

**Conclusion:**  
- Output = **No** (do not play tennis), because the probability of No (about 0.73) is higher than the probability of Yes (about 0.27).
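
The normalization step can be sketched as follows, starting from the unnormalized scores of Step 2 written in reduced form. The dictionary names are illustrative assumptions.

```python
from fractions import Fraction

# Unnormalized scores from Step 2 (2/63 = 4/126, 3/35 = 6/70).
scores = {"Yes": Fraction(2, 63), "No": Fraction(3, 35)}

# Dividing by the sum turns the scores into probabilities that add up to 1.
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}
prediction = max(posteriors, key=posteriors.get)

for label, p in posteriors.items():
    print(label, p, round(float(p), 2))  # Yes 10/37 0.27, then No 27/37 0.73
print(prediction)  # No
```

Normalizing does not change which class wins; it only makes the two numbers readable as probabilities.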

---

## 5. Summary

1. Naive Bayes uses **Bayes Theorem** to calculate probabilities.  
2. Assumes the features are **conditionally independent** given the class.  
3. Computes conditional probabilities \(P(X_i|Y)\) for every feature and every class.  
4. Multiplies them with the class probability \(P(Y)\) and predicts the class with the highest result.  
5. Can be used for **binary** and **multi-class classification**.

---

**Key formulas:**  

- Bayes Theorem:  
$$
P(A|B) = \frac{P(A) \cdot P(B|A)}{P(B)}
$$

- Naive Bayes Classification:  
$$
P(Y|X_1, \dots, X_n) \propto P(Y) \cdot \prod_{i=1}^{n} P(X_i|Y)
$$
