# Naive Bayes Algorithm Variants

In this discussion, we will cover the **three main variants of Naive Bayes**:

1. **Bernoulli Naive Bayes**  
2. **Multinomial Naive Bayes**  
3. **Gaussian Naive Bayes**

These variants matter because the right choice depends on the type of features in your dataset (binary, count, or continuous).

---

## 1. Bernoulli Naive Bayes

**When to use:**  
Use Bernoulli Naive Bayes when the **features follow a Bernoulli distribution** (binary outcomes: 0 or 1).

**Bernoulli distribution:**  
- Only two outcomes: success/failure, yes/no, pass/fail, heads/tails.  
- Example: a coin toss results in either heads (1) or tails (0).

**Example Dataset Features:**

| Sample | F1  | F2   | F3     |
|--------|-----|------|--------|
| 1      | Yes | Pass | Male   |
| 2      | Yes | Fail | Female |
| 3      | No  | Pass | Male   |
| 4      | Yes | Fail | Female |

- Output: Binary classification (Yes/No) or multi-class classification.  
- If most features are binary, **Bernoulli Naive Bayes** should be used.

**Key Idea:**  

$$
P(Y|X_1, X_2, \dots, X_n) \propto P(Y) \cdot \prod_{i=1}^{n} P(X_i|Y)
$$

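Here, each feature \(X_i\) is binary (0 or 1), and its likelihood is modeled by a Bernoulli distribution whose parameter \(p_{iY} = P(X_i = 1 \mid Y)\) is estimated separately for each class:

$$
P(X_i|Y) = p_{iY}^{X_i} \, (1 - p_{iY})^{1 - X_i}
$$

So an absent feature (\(X_i = 0\)) still contributes a factor \(1 - p_{iY}\) to the product.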
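The Bernoulli model can be sketched in plain Python as below. This is a minimal, illustrative implementation, not a library API: the function names `fit_bernoulli_nb` and `predict_bernoulli_nb`, the Laplace smoothing parameter `alpha`, the 0/1 encoding of the example table (Yes/Pass/Male = 1), and the class labels are all hypothetical assumptions added for the example.

```python
import math

# Hypothetical 0/1 encoding of the example table: Yes/Pass/Male = 1, No/Fail/Female = 0
X = [[1, 1, 1], [1, 0, 0], [0, 1, 1], [1, 0, 0]]
# Hypothetical class labels (the example table does not give any)
y = ["Yes", "No", "Yes", "No"]


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate P(Y) and p_iY = P(X_i = 1 | Y) with Laplace smoothing `alpha`."""
    priors, probs = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        # Smoothed fraction of class-c rows in which feature i equals 1
        probs[c] = [
            (sum(row[i] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(len(X[0]))
        ]
    return priors, probs


def predict_bernoulli_nb(x, priors, probs):
    """Return the class maximizing log P(Y) + sum_i log P(X_i | Y)."""
    def log_score(c):
        total = math.log(priors[c])
        for xi, p in zip(x, probs[c]):
            # Bernoulli likelihood: p when the feature is 1, (1 - p) when it is 0
            total += math.log(p if xi == 1 else 1 - p)
        return total

    return max(priors, key=log_score)


priors, probs = fit_bernoulli_nb(X, y)
print(predict_bernoulli_nb([1, 1, 1], priors, probs))
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow when there are many features, and `alpha` keeps a feature value never seen for a class from forcing that class's score to zero.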

---

## 2. Multinomial Naive Bayes

**When to use:**  
Use Multinomial Naive Bayes when the **input features are in the form of counts**, typically for **text data**.

**Example Problem:** Spam Classification  
- Input: Email body (text)  
- Output: Spam or Ham (Not Spam)

**Text Conversion:**  
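Text data must be converted into **numerical vectors** before applying the algorithm. Common techniques include:

1. **Bag of Words (BoW)**
2. **TF-IDF**
3. **Word2Vec**

Multinomial Naive Bayes expects non-negative, count-like features, so Bag of Words counts are the natural fit; TF-IDF weights (non-negative, fractional counts) also work in practice. Word2Vec produces dense real-valued embeddings that can be negative, so it is not a good match for the multinomial model.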
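The Bag of Words step can be sketched in a few lines of plain Python. The tokenizer (a lowercase regular expression that keeps only runs of letters and digits) and the function names `tokenize`, `build_vocabulary`, and `bag_of_words` are hypothetical choices for illustration; real pipelines often also handle stop words, stemming, and n-grams.

```python
import re
from collections import Counter


def tokenize(doc):
    # Simplifying assumption: lowercase and keep runs of letters/digits only
    return re.findall(r"[a-z0-9]+", doc.lower())


def build_vocabulary(docs):
    """Sorted list of every distinct token in the corpus."""
    return sorted({token for doc in docs for token in tokenize(doc)})


def bag_of_words(doc, vocab):
    """Count vector: entry j is how often vocab[j] occurs in `doc`."""
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocab]


emails = ["You have $1 million lottery", "Krish, you have done a good job"]
vocab = build_vocabulary(emails)
vectors = [bag_of_words(email, vocab) for email in emails]
print(vocab)
print(vectors)
```

Each email becomes a vector of word counts over the shared vocabulary, which is exactly the kind of input the multinomial model expects.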

**Example:**

| Email Body                    | Output |
|-------------------------------|--------|
| "You have $1 million lottery" | Spam   |
| "Krish, you have done a good job" | Ham    |

- Count-based features: Number of occurrences of each word.  
- Suitable for **discrete features** where each value represents a count.

**Key Idea:**  

$$
P(Y|X_1, X_2, \dots, X_n) \propto P(Y) \cdot \prod_{i=1}^{n} P(X_i|Y)
$$

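- \(X_i\) represents the **count of word \(i\)** in the document. Each occurrence contributes one factor of \(\theta_{iY} = P(\text{word } i \mid Y)\), so the product is effectively \(\prod_{i=1}^{n} \theta_{iY}^{X_i}\) (up to a multinomial coefficient that does not depend on \(Y\)).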
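A minimal sketch of the multinomial model on the two example emails follows. The function names, the Laplace smoothing parameter `alpha`, the regex tokenizer, and the choice to skip words never seen in training are all illustrative assumptions; with only two training emails the result demonstrates the mechanics, not a useful spam filter.

```python
import math
import re
from collections import Counter


def tokenize(doc):
    # Simplifying assumption: lowercase and keep runs of letters/digits only
    return re.findall(r"[a-z0-9]+", doc.lower())


def fit_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate P(Y) and Laplace-smoothed theta_iY = P(word i | Y)."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    priors, theta = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, label in zip(docs, labels) if label == c]
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(w for d in class_docs for w in tokenize(d))
        total = sum(counts.values())
        # (count of word in class c + alpha) / (all words in class c + alpha * |V|)
        theta[c] = {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}
    return priors, theta


def predict_multinomial_nb(doc, priors, theta):
    """Return the class maximizing log P(Y) + sum_i X_i * log theta_iY."""
    counts = Counter(tokenize(doc))

    def log_score(c):
        total = math.log(priors[c])
        for word, n in counts.items():
            if word in theta[c]:  # words never seen in training are skipped
                total += n * math.log(theta[c][word])
        return total

    return max(priors, key=log_score)


emails = ["You have $1 million lottery", "Krish, you have done a good job"]
labels = ["Spam", "Ham"]
priors, theta = fit_multinomial_nb(emails, labels)
print(predict_multinomial_nb("You won a million lottery", priors, theta))
```

Multiplying each log probability by the count `n` is the log-space form of raising \(\theta_{iY}\) to the power \(X_i\).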

---

## 3. Gaussian Naive Bayes

**When to use:**  
Use Gaussian Naive Bayes when the **features are continuous** and approximately follow a **Gaussian (Normal) distribution**.

**Gaussian (Normal) Distribution:** a symmetric, bell-shaped curve described by its mean \(\mu\) and standard deviation \(\sigma\).

**Example Dataset:** Iris Dataset  
- Features: Sepal length, Sepal width, Petal length, Petal width (continuous values)  
- Output: Flower species (multi-class classification)

**Feature Examples (continuous values):**

| Sepal Length (cm) | Sepal Width (cm) | Petal Length (cm) | Petal Width (cm) | Species |
|-------------------|------------------|-------------------|------------------|---------|
| 5.1          | 3.5         | 1.4          | 0.2         | Setosa  |
| 7.0          | 3.2         | 4.7          | 1.4         | Versicolor |

- Continuous features such as age, height, and weight are suitable for Gaussian Naive Bayes.  
- The algorithm estimates the **mean** and **standard deviation** of each feature separately for each class and uses them to compute the likelihood of a feature value.

**Likelihood of a continuous feature \(X_i\) given class \(Y\)** (a probability density, not a probability):

$$
P(X_i|Y) = \frac{1}{\sqrt{2\pi\sigma_{iY}^2}} \exp\Bigg(-\frac{(X_i - \mu_{iY})^2}{2\sigma_{iY}^2}\Bigg)
$$

Where:

- \(\mu_{iY}\) = mean of feature \(X_i\) over the training samples of class \(Y\)  
- \(\sigma_{iY}\) = standard deviation of feature \(X_i\) over the training samples of class \(Y\)
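The Gaussian likelihood formula translates directly into code. The function name `gaussian_likelihood` and the mean/standard deviation values in the usage line are hypothetical, chosen only to illustrate the calculation.

```python
import math


def gaussian_likelihood(x, mu, sigma):
    """Gaussian density N(mu, sigma^2) evaluated at x."""
    variance = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


# Illustrative values only: assume a feature has mean 1.5 and std 0.2 within some class
print(gaussian_likelihood(1.4, mu=1.5, sigma=0.2))
```

Because this is a density, the result can exceed 1 when \(\sigma\) is small (here it is about 1.76); only the relative sizes across classes matter for classification.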

**Key Idea:**  

$$
P(Y|X_1, X_2, \dots, X_n) \propto P(Y) \cdot \prod_{i=1}^{n} P(X_i|Y)
$$

- Here, \(X_i\) is continuous, and each likelihood \(P(X_i|Y)\) is computed with the Gaussian density formula.
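Putting the pieces together, a Gaussian Naive Bayes classifier can be sketched with only the standard library. The toy data, the class names "A" and "B", the function names, and the small `eps` added to each standard deviation are hypothetical assumptions for illustration (this is not the Iris dataset). The sketch uses the population standard deviation (`statistics.pstdev`).

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data: two continuous features per sample, two classes
X = [[1.0, 0.2], [1.2, 0.3], [0.9, 0.2], [4.0, 1.3], [4.4, 1.5], [4.2, 1.4]]
y = ["A", "A", "A", "B", "B", "B"]


def fit_gaussian_nb(X, y, eps=1e-9):
    """Per class: the prior, plus the mean and std of every feature."""
    params = {}
    for c in sorted(set(y)):
        columns = list(zip(*[x for x, label in zip(X, y) if label == c]))
        params[c] = {
            "prior": sum(1 for label in y if label == c) / len(y),
            "means": [mean(col) for col in columns],
            # eps guards against a zero std (e.g. a constant feature) causing division by zero
            "stds": [pstdev(col) + eps for col in columns],
        }
    return params


def log_gaussian(x, mu, sigma):
    """Log of the Gaussian density; summing logs replaces multiplying densities."""
    return -math.log(math.sqrt(2 * math.pi) * sigma) - (x - mu) ** 2 / (2 * sigma ** 2)


def predict_gaussian_nb(x, params):
    """Return the class maximizing log P(Y) + sum_i log P(X_i | Y)."""
    def log_score(c):
        p = params[c]
        return math.log(p["prior"]) + sum(
            log_gaussian(xi, mu, s) for xi, mu, s in zip(x, p["means"], p["stds"])
        )

    return max(params, key=log_score)


params = fit_gaussian_nb(X, y)
print(predict_gaussian_nb([1.1, 0.25], params))
```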

---

## 4. Choosing the Right Variant

| Variant      | Feature Type                     | Example Use Case        |
|-------------|----------------------------------|------------------------|
| Bernoulli   | Binary (0/1)                     | Pass/Fail, Yes/No      |
| Multinomial | Count-based / Text               | Spam Classification    |
| Gaussian    | Continuous (Real Values)         | Iris Dataset, Age, Height |

**Guideline:**  
- If most features are binary → Bernoulli Naive Bayes  
- If features are counts or text → Multinomial Naive Bayes  
- If features are continuous → Gaussian Naive Bayes  
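The guideline above can be expressed as a small helper. This is a hypothetical heuristic, not a standard function: it checks that *all* values (rather than *most*) are binary or non-negative integers, which is a simplification of the guideline.

```python
def suggest_variant(X):
    """Hypothetical heuristic mirroring the guideline: all 0/1 -> Bernoulli,
    all non-negative integers -> Multinomial, anything else -> Gaussian."""
    values = [v for row in X for v in row]
    # Check binary first: 0/1 data are also non-negative integers
    if all(v in (0, 1) for v in values):
        return "Bernoulli"
    if all(v >= 0 and float(v).is_integer() for v in values):
        return "Multinomial"
    return "Gaussian"


print(suggest_variant([[1, 0, 1], [0, 1, 1]]))
print(suggest_variant([[3, 0, 2], [1, 5, 0]]))
print(suggest_variant([[5.1, 3.5, 1.4, 0.2]]))
```

The order of the checks matters: binary data would also pass the count test, so the more specific Bernoulli case is tested first.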

---

**Formula Reference:**

- Bayes Theorem:  

$$
P(A|B) = \frac{P(A) \cdot P(B|A)}{P(B)}
$$

- Naive Bayes Classification:  

$$
P(Y|X_1, \dots, X_n) \propto P(Y) \cdot \prod_{i=1}^{n} P(X_i|Y)
$$

