# 🎯 Naive Bayes Algorithm

**Classification algorithm (binary & multi-class) based on Bayes’ Theorem**

It uses conditional probability to compute the probability of each class given the input features, then predicts the most probable class.

---
## 📊 Probability Foundation

### Types of Events

#### 1. Independent Events
- The outcome of one event doesn't affect the probability of another
- **Example:** Rolling a die repeatedly
  - Outcomes: {1, 2, 3, 4, 5, 6}
  - P(1) = P(2) = ... = P(6) = 1/6 on every roll
  - The probabilities stay the same no matter what earlier rolls produced

#### 2. Dependent Events
- One event affects probability of subsequent events
- **Example:** Bag of marbles (3 orange, 2 yellow)
  - Event 1: Remove orange → P(orange) = 3/5
  - Event 2: Remove yellow (4 marbles left) → P(yellow | orange removed) = 2/4 = 1/2

> **📘 Definition: Conditional Probability**
>
> Probability of event A given event B has occurred.
>
> $$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

**Combined Probability for Dependent Events:**

$$P(\text{orange AND yellow}) = P(\text{orange}) \times P(\text{yellow}|\text{orange}) = \frac{3}{5} \times \frac{1}{2} = \frac{3}{10}$$

**Generic Formula:**

$$P(A \cap B) = P(A) \times P(B|A)$$
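
The marble example can be checked with exact fractions. This is a minimal sketch using Python's standard `fractions` module; the variable names are illustrative and not part of the original notes.

```python
from fractions import Fraction

# Bag from the example above: 3 orange and 2 yellow marbles
orange, yellow = 3, 2
total = orange + yellow

# P(orange) on the first draw
p_orange = Fraction(orange, total)

# P(yellow | orange removed): one orange marble is gone, so 4 marbles remain
p_yellow_given_orange = Fraction(yellow, total - 1)

# Product rule: P(A and B) = P(A) * P(B|A)
p_orange_and_yellow = p_orange * p_yellow_given_orange

print(p_orange, p_yellow_given_orange, p_orange_and_yellow)  # 3/5 1/2 3/10
```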

## 🔬 Bayes’ Theorem – Step-by-Step Derivation

| Step | Description | Formula |
|:----:|:------------|:--------|
| 1 | Start with symmetry of joint probability | $P(A \cap B) = P(B \cap A)$ |
| 2 | Expand using conditional probability | $P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$ |
| 3 | Rearrange to isolate $P(A \mid B)$ | $\displaystyle P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$ |

### Final Bayes’ Theorem

> 🔴 **Important: Bayes’ Theorem**
>
> $$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

| Component | Description |
|:----------|:------------|
| $P(A \mid B)$ | Posterior: probability of A given that B occurred |
| $P(A)$ | Prior: probability of A before observing B |
| $P(B)$ | Evidence: marginal probability of B |
| $P(B \mid A)$ | Likelihood: probability of B given that A occurred |
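
As an illustration, Bayes’ Theorem can be applied to the marble bag to "reverse" the condition: given that the second marble drawn is yellow, how likely is it that the first was orange? The sketch below is an assumption-laden example, not part of the original notes; the helper `bayes_posterior` is a hypothetical name, and the evidence $P(B)$ is obtained with the law of total probability.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# A = first marble is orange, B = second marble is yellow
p_a = Fraction(3, 5)               # prior P(A)
p_b_given_a = Fraction(2, 4)       # likelihood P(B|A): 2 yellow among 4 left
p_b_given_not_a = Fraction(1, 4)   # P(B | first was yellow): 1 yellow among 4 left

# Evidence P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = bayes_posterior(p_a, p_b_given_a, p_b)
print(p_b, p_a_given_b)  # 2/5 3/4
```

The result agrees with a direct count: once one yellow marble is known to be the second draw, the first draw comes from the other four marbles, three of which are orange.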


---
## 🤖 Naive Bayes for Machine Learning

**Problem Setup:**
- Input features (independent variables): $X_1, X_2, X_3, \ldots, X_n$
- Target (dependent variable): $Y$ (output, e.g. Yes/No or Class A/B/C)

**Goal:** Predict $Y$ given features

> **🔴 Important: Naive Bayes Formula**
>
> $$P(Y|X_1, X_2, X_3) = \frac{P(Y) \times P(X_1, X_2, X_3|Y)}{P(X_1, X_2, X_3)}$$
>
> **Expanded (assuming the features are conditionally independent given $Y$):**
>
> $$P(Y|X_1, X_2, X_3) = \frac{P(Y) \times P(X_1|Y) \times P(X_2|Y) \times P(X_3|Y)}{P(X_1, X_2, X_3)}$$

> **📝 Note:** The denominator does not depend on $Y$, so it is the same for every class and can be dropped when comparing classes:
>
> $$P(Y|X_1, X_2, X_3) \propto P(Y) \times P(X_1|Y) \times P(X_2|Y) \times P(X_3|Y)$$
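
The proportional form translates almost directly into code. The sketch below assumes the per-class probabilities are already known; the function name `naive_bayes_posteriors` and the numbers are hypothetical and serve only to show the mechanics.

```python
from math import prod

def naive_bayes_posteriors(priors, likelihoods):
    """
    priors:      {class: P(class)}
    likelihoods: {class: [P(x_1|class), ..., P(x_n|class)]} for one sample
    Returns the normalized posteriors {class: P(class | x_1, ..., x_n)}.
    """
    # Unnormalized score: P(Y) * product of P(X_i | Y)
    scores = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    # Dividing by the sum of scores plays the role of the shared denominator
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical numbers, chosen only to illustrate the call
priors = {"A": 0.5, "B": 0.5}
likelihoods = {"A": [0.8, 0.5, 0.5], "B": [0.2, 0.5, 0.5]}

posteriors = naive_bayes_posteriors(priors, likelihoods)
prediction = max(posteriors, key=posteriors.get)
print(posteriors, prediction)
```

Normalizing by the sum of the scores recovers proper probabilities without ever computing $P(X_1, X_2, X_3)$ explicitly.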

---
## 📝 Manual Calculation Example

### Dataset: Tennis Playing Prediction

| Outlook | Temp | Humidity | Wind | Play |
|:--------|:-----|:---------|:-----|:-----|
| Sunny | Hot | High | Weak | No |
| Overcast | Hot | High | Weak | Yes |
| Rain | Mild | High | Weak | Yes |
| ... | ... | ... | ... | ... |

### Calculate Feature Probabilities

**Outlook Feature:**

| Value | Yes Count | No Count | P(Value\|Yes) | P(Value\|No) |
|:------|:---------:|:--------:|:-------------:|:------------:|
| Sunny | 2 | 3 | 2/9 | 3/5 |
| Overcast | 4 | 0 | 4/9 | 0/5 |
| Rain | 3 | 2 | 3/9 | 2/5 |

**Temperature Feature:**

| Value | Yes Count | No Count | P(Value\|Yes) | P(Value\|No) |
|:------|:---------:|:--------:|:-------------:|:------------:|
| Hot | 2 | 2 | 2/9 | 2/5 |
| Mild | 4 | 2 | 4/9 | 2/5 |
| Cool | 3 | 1 | 3/9 | 1/5 |

**Output Probabilities (14 total records):**
- Yes: 9 records → P(Yes) = 9/14
- No: 5 records → P(No) = 5/14

### Prediction for Test Data

**Test Input:** Outlook = Sunny, Temperature = Hot

**Calculate P(Yes | Sunny, Hot) (unnormalized):**

$$P(\text{Yes}|\text{Sunny, Hot}) \propto P(\text{Yes}) \times P(\text{Sunny}|\text{Yes}) \times P(\text{Hot}|\text{Yes})$$

$$= \frac{9}{14} \times \frac{2}{9} \times \frac{2}{9} = \frac{2}{63} \approx 0.032$$

**Calculate P(No | Sunny, Hot) (unnormalized):**

$$P(\text{No}|\text{Sunny, Hot}) \propto P(\text{No}) \times P(\text{Sunny}|\text{No}) \times P(\text{Hot}|\text{No})$$

$$= \frac{5}{14} \times \frac{3}{5} \times \frac{2}{5} = \frac{3}{35} \approx 0.086$$

**Normalize to Percentages:**

$$P(\text{Yes}|\text{Sunny, Hot}) = \frac{0.032}{0.032 + 0.086} \approx 0.27 = \boxed{27\%}$$

$$P(\text{No}|\text{Sunny, Hot}) = \frac{0.086}{0.032 + 0.086} \approx 0.73 = \boxed{73\%}$$

> **💡 Prediction Result**
>
> **NO** (73% > 27%) → Person will **NOT** play tennis
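
The hand calculation can be reproduced from the count tables with exact fractions. This is a sketch only: the dictionaries below copy the Yes/No counts shown above, and the helper name `score` is hypothetical.

```python
from fractions import Fraction

# Counts copied from the tables above: {value: (yes_count, no_count)}
outlook = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
temperature = {"Hot": (2, 2), "Mild": (4, 2), "Cool": (3, 1)}
n_yes, n_no = 9, 5
n_total = n_yes + n_no

def score(outlook_value, temp_value):
    """Unnormalized scores for Yes and No as exact fractions."""
    o_yes, o_no = outlook[outlook_value]
    t_yes, t_no = temperature[temp_value]
    yes = Fraction(n_yes, n_total) * Fraction(o_yes, n_yes) * Fraction(t_yes, n_yes)
    no = Fraction(n_no, n_total) * Fraction(o_no, n_no) * Fraction(t_no, n_no)
    return yes, no

yes, no = score("Sunny", "Hot")

# Normalize so the two values sum to 1
p_yes = yes / (yes + no)
p_no = no / (yes + no)

print(yes, no)                                  # 2/63 3/35
print(f"{float(p_yes):.2%} {float(p_no):.2%}")  # 27.03% 72.97%
```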

---
## 🔀 Three Variants of Naive Bayes

### 1. Bernoulli Naive Bayes

> **📘 Definition**
>
> Used when features follow a **Bernoulli distribution** (binary: 0/1, Yes/No, Pass/Fail, Male/Female)

**Characteristics:**
- Binary features only
- Works well with sparse matrices (mostly 0s, with a few 1s)
- Text classification with binary word presence

**Example Dataset:**

| F1 (Pass?) | F2 (Male?) | F3 (Yes?) | Output |
|:----------:|:----------:|:---------:|:------:|
| 1 | 1 | 1 | Class A |
| 0 | 0 | 0 | Class B |
| 1 | 1 | 0 | Class A |

### 2. Multinomial Naive Bayes

> **📘 Definition**
>
> Used when features are **discrete counts**, such as word counts in **text data**. Commonly used in NLP problems.

**Use Cases:**
- Spam classification
- Sentiment analysis
- Document categorization

**Example: Spam Detection**

| Email Body | Output |
|:-----------|:------:|
| "You have $1 million lottery" | Spam |
| "Krish, you have done good job" | Ham |

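**Text → Numerical Conversion:**
- Bag of Words (BoW)
- TF-IDF (Term Frequency-Inverse Document Frequency)
- Word2Vec (dense embeddings that can be negative, which Multinomial Naive Bayes does not accept, so they are typically paired with Gaussian Naive Bayes instead)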
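
The spam example can be sketched with a Bag of Words pipeline. This assumes scikit-learn's `CountVectorizer` with default settings; the two training emails come from the table above, while the messages passed to `predict` are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# The two emails from the table above
emails = ["You have $1 million lottery", "Krish, you have done good job"]
labels = ["Spam", "Ham"]

# Bag of Words: each column is a word, each value is how often it occurs
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

model = MultinomialNB()
model.fit(X, labels)

# Hypothetical unseen messages, vectorized with the same vocabulary
new_messages = ["million lottery", "good job Krish"]
predictions = model.predict(vectorizer.transform(new_messages))
print(predictions)
```

Note that `transform` (not `fit_transform`) is used on new messages so that the columns line up with the vocabulary learned during training.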

### 3. Gaussian Naive Bayes

> **📘 Definition**
>
> Used when features follow **Gaussian (Normal) distribution**. Features are continuous numerical values.

**Characteristics:**
- Continuous features (age, height, weight, temperature)
- Bell curve distribution
- Can handle slightly skewed distributions

**Example Datasets:**
- Iris: sepal length, sepal width, petal length, petal width → classify species (Setosa, Versicolor, Virginica)
- Medical: age, height, weight → predict overweight (Yes/No)
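
To make the Gaussian assumption concrete, the sketch below computes the class scores by hand on Iris: each feature is modeled with a per-class mean and variance, and the log of the normal density replaces the count-based likelihoods. It is a simplified illustration (no variance smoothing, evaluated on the training data), not a reimplementation of scikit-learn's `GaussianNB`; the helper `log_posterior` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Per-class prior, feature means, and feature variances
priors = np.array([np.mean(y == c) for c in classes])
means = np.array([X[y == c].mean(axis=0) for c in classes])
variances = np.array([X[y == c].var(axis=0) for c in classes])

def log_posterior(x):
    """Unnormalized log P(class | x) for every class, assuming Gaussian features."""
    log_likelihood = -0.5 * np.sum(
        np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1
    )
    return np.log(priors) + log_likelihood

# Predict the class with the highest log score for each sample
manual_pred = np.array([classes[np.argmax(log_posterior(x))] for x in X])
manual_accuracy = np.mean(manual_pred == y)
print(f"Manual Gaussian NB training accuracy: {manual_accuracy:.4f}")
```

Working in log space turns the product of likelihoods into a sum, which avoids numerical underflow when many small probabilities are multiplied.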

> **💡 Tip: Choosing the Right Variant**
>
> | Feature Type | Variant |
> |:-------------|:--------|
> | Binary (0/1) | **Bernoulli** |
> | Text data | **Multinomial** |
> | Continuous | **Gaussian** |
> | Mixed | Choose based on majority type |

---
## 💻 Python Implementation

### Gaussian Naive Bayes (Iris Dataset)

```python
# Import Libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```

```python
# Load iris dataset
X, y = load_iris(return_X_y=True)

# Train-test split (70-30)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
```

```text
Training samples: 105
Testing samples: 45
```

```python
# Initialize Gaussian Naive Bayes
gnb = GaussianNB()

# Fit model
gnb.fit(X_train, y_train)

# Predict
y_pred = gnb.predict(X_test)
```

```python
# Evaluation
print("="*50)
print("MODEL EVALUATION")
print("="*50)

# Accuracy
print(f"\nAccuracy: {accuracy_score(y_test, y_pred):.4f}")

# Confusion Matrix
print(f"\nConfusion Matrix:\n{confusion_matrix(y_test, y_pred)}")

# Classification Report
print(f"\nClassification Report:\n{classification_report(y_test, y_pred)}")
```

```text
==================================================
MODEL EVALUATION
==================================================

Accuracy: 0.9778

Confusion Matrix:
[[19  0  0]
 [ 0 12  1]
 [ 0  0 13]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
```
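
The reported metrics follow directly from the confusion matrix. The sketch below copies the matrix from the output above and recomputes accuracy, recall, and precision with NumPy; it is an illustration of the arithmetic rather than a replacement for `classification_report`.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class)
cm = np.array([[19, 0, 0],
               [0, 12, 1],
               [0, 0, 13]])

accuracy = np.trace(cm) / cm.sum()         # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)      # correct / samples of each true class
precision = np.diag(cm) / cm.sum(axis=0)   # correct / samples predicted as each class

print(f"accuracy:  {accuracy:.4f}")
print("recall:   ", np.round(recall, 2))
print("precision:", np.round(precision, 2))
```

The single off-diagonal entry (one class-1 sample predicted as class 2) is what lowers the recall of class 1 and the precision of class 2.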



> **📝 Note:** High accuracy (about 98% here) is expected on Iris because it is a small, well-separated dataset

### Other Variants

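```python
# Bernoulli Naive Bayes
from sklearn.naive_bayes import BernoulliNB

# BernoulliNB binarizes features at 0.0 by default, so every positive
# Iris measurement becomes 1 and the model loses almost all information
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
print(f"Bernoulli NB Accuracy: {bnb.score(X_test, y_test):.4f}")
```

```text
Bernoulli NB Accuracy: 0.2889
```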
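
The low Bernoulli score has a simple explanation that can be checked directly. The sketch below repeats the same split as above and shows that, with the default `binarize=0.0`, every Iris feature maps to 1, so all test samples look identical to the model and receive the same prediction.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Every Iris measurement is positive, so thresholding at 0.0 yields all 1s
all_positive = bool((X > 0).all())

bnb = BernoulliNB()  # default binarize=0.0
bnb.fit(X_train, y_train)

# Identical binarized inputs produce identical predictions
predicted_classes = np.unique(bnb.predict(X_test))

print(all_positive)
print(predicted_classes)
```

This is why Bernoulli Naive Bayes is a poor match for continuous features unless a meaningful threshold is chosen.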


```python
# Multinomial Naive Bayes
from sklearn.naive_bayes import MultinomialNB

# MultinomialNB requires non-negative features; Iris measurements qualify
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
print(f"Multinomial NB Accuracy: {mnb.score(X_test, y_test):.4f}")
```

```text
Multinomial NB Accuracy: 0.9556
```


---
## 📋 Practice Assignment

**Dataset:** Seaborn Tips

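```python
import seaborn as sns
import pandas as pd

# Load dataset
df = sns.load_dataset('tips')
df.head()
```

| | total_bill | tip | sex | smoker | day | time | size |
|--:|-----------:|----:|:----|:-------|:----|:-----|-----:|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |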


**Tasks:**
- [ ] Predict: `time` (Dinner/Lunch) OR `smoker` (Yes/No)
- [ ] Convert categorical to numerical (One-Hot or Label Encoding)
- [ ] Features: `total_bill`, `tip`, `sex`, `day`
- [ ] Apply Gaussian Naive Bayes
- [ ] Evaluate performance
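
As a hint for the encoding task only, the sketch below applies one-hot and label encoding to the five preview rows shown above. It deliberately stops before modeling; the small hand-built DataFrame stands in for the full dataset, and the Lunch/Dinner mapping is one possible choice.

```python
import pandas as pd

# First five rows of the tips dataset, copied from the preview above
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No", "No", "No", "No", "No"],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
    "time": ["Dinner", "Dinner", "Dinner", "Dinner", "Dinner"],
    "size": [2, 3, 3, 2, 4],
})

# One-hot encode the categorical feature columns
features = pd.get_dummies(
    df[["total_bill", "tip", "sex", "day"]], columns=["sex", "day"]
)

# Label-encode the target: Lunch -> 0, Dinner -> 1
target = df["time"].map({"Lunch": 0, "Dinner": 1})

print(list(features.columns))
print(target.tolist())
```

On the full dataset, `day` and `time` take more than one value, so the one-hot step produces additional columns.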

```python
# Your solution here
# Step 1: Prepare features and target

# Step 2: Encode categorical variables

# Step 3: Train-test split

# Step 4: Train Gaussian Naive Bayes

# Step 5: Evaluate
```

## ✅ Key Takeaways

💡 **Remember**

| Concept | Description |
|:--------|:------------|
| **Independence** | Naive Bayes assumes features are conditionally independent given the class |
| **Foundation** | Based on Bayes’ Theorem: $P(Y \mid X) = \frac{P(Y)\,P(X \mid Y)}{P(X)}$ |
| **Variants** | Bernoulli (binary), Multinomial (counts, e.g. text), Gaussian (continuous) |
| **Speed** | Fast training & prediction |
| **Small Data** | Works well on small datasets |
| **Text** | Excellent for text classification with BoW / TF-IDF |
