# 🎯 <span style="color:#3498db; font-weight:bold;">Understanding Probability Concepts in Bayesian Classification</span>

Before learning **Bayesian Classification**, we need to understand the fundamental probability concepts:  
- **Joint Probability**
- **Conditional Probability**
- **Marginal Probability**

These concepts are the building blocks of **Bayes’ Theorem**, which is the foundation of the **Naïve Bayes Classifier**.

---

## 📊 <span style="color:#e74c3c;">Example Dataset: Employees' Gender and Job Class</span>
We have data on **employees**, described by **two variables**:  
- **Gender (M, F)**
- **Class (C1, C2, C3)**

| Gender | C1  | C2  | C3  | Total |
|--------|----|----|----|------|
| **M**  | 30 | 15 | 55 | 100  |
| **F**  | 10 | 15 | 25 | 50   |
| **Total** | 40 | 30 | 80 | 150  |

From this table:
- There are **150 employees in total**.
- **100 males** and **50 females**.
- Employees are categorized into **three job classes** (**C1, C2, C3**).

---

## 🔹 <span style="color:#16a085;">1. Joint Probability</span>
**Definition:**  
**Joint probability** measures the likelihood of **two events happening together**.

$$
P(A \cap B) = \frac{\text{Number of occurrences of (A, B)}}{\text{Total observations}}
$$

### 💡 <span style="color:#f39c12;">Example from Dataset</span>
- Probability that a randomly chosen employee is **Male and belongs to Class C1**:

$$
P(M \cap C1) = \frac{30}{150} = 0.20
$$

- Probability that an employee is **Female and in Class C3**:

$$
P(F \cap C3) = \frac{25}{150} = 0.1667
$$
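
As a minimal sketch, the two joint probabilities above can be recomputed from the count table in plain Python. The names `counts`, `total`, and `joint` are illustrative choices, not part of the original material.

```python
from fractions import Fraction

# Counts from the employee table: counts[gender][job_class]
counts = {
    "M": {"C1": 30, "C2": 15, "C3": 55},
    "F": {"C1": 10, "C2": 15, "C3": 25},
}

# Total number of observations (all cells added together)
total = sum(sum(row.values()) for row in counts.values())

def joint(gender, job_class):
    """P(gender AND job_class) = count in that cell / total observations."""
    return Fraction(counts[gender][job_class], total)

print(float(joint("M", "C1")))            # 0.2
print(round(float(joint("F", "C3")), 4))  # 0.1667
```

Using `Fraction` keeps the arithmetic exact until the final conversion to a decimal.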

---

## 🔹 <span style="color:#9b59b6;">2. Marginal Probability</span>
**Definition:**  
**Marginal probability** is the probability of **a single event occurring**, regardless of any other event.

$$
P(A) = \frac{\text{Total occurrences of A}}{\text{Total observations}}
$$

### 💡 <span style="color:#f39c12;">Example from Dataset</span>
- Probability that an employee is **Male**:

$$
P(M) = \frac{100}{150} = 0.6667
$$

- Probability that an employee belongs to **Class C2**:

$$
P(C2) = \frac{30}{150} = 0.20
$$
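
The marginal probabilities can be sketched the same way by summing a whole row or a whole column of the count table. The helper names `marginal_gender` and `marginal_class` are hypothetical and used only for illustration.

```python
from fractions import Fraction

# Counts from the employee table: counts[gender][job_class]
counts = {
    "M": {"C1": 30, "C2": 15, "C3": 55},
    "F": {"C1": 10, "C2": 15, "C3": 25},
}
total = sum(sum(row.values()) for row in counts.values())

def marginal_gender(gender):
    """P(gender): sum the row for that gender, ignoring job class."""
    return Fraction(sum(counts[gender].values()), total)

def marginal_class(job_class):
    """P(job_class): sum the column for that class, ignoring gender."""
    return Fraction(sum(row[job_class] for row in counts.values()), total)

print(round(float(marginal_gender("M")), 4))  # 0.6667
print(float(marginal_class("C2")))            # 0.2
```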

---

## 🔹 <span style="color:#c0392b;">3. Conditional Probability</span>
**Definition:**  
**Conditional probability** is the probability of **event A occurring, given that event B is known to have occurred**. No time ordering is implied: B only needs to be known, not to happen first.

$$
P(A | B) = \frac{P(A \cap B)}{P(B)}
$$

### 💡 <span style="color:#f39c12;">Example from Dataset</span>
- Probability that an employee is **Male, given they belong to Class C1**:

$$
P(M | C1) = \frac{P(M \cap C1)}{P(C1)}
$$

$$
P(M | C1) = \frac{30}{40} = 0.75
$$

- Probability that an employee is **Female, given they belong to Class C3**:

$$
P(F | C3) = \frac{P(F \cap C3)}{P(C3)}
$$

$$
P(F | C3) = \frac{25}{80} = 0.3125
$$
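
A short sketch of the same calculation in Python, dividing the joint probability by the marginal of the conditioning event. The function name `gender_given_class` is an illustrative assumption.

```python
from fractions import Fraction

# Counts from the employee table: counts[gender][job_class]
counts = {
    "M": {"C1": 30, "C2": 15, "C3": 55},
    "F": {"C1": 10, "C2": 15, "C3": 25},
}
total = sum(sum(row.values()) for row in counts.values())

def gender_given_class(gender, job_class):
    """P(gender | job_class) = P(gender AND job_class) / P(job_class)."""
    p_joint = Fraction(counts[gender][job_class], total)
    p_class = Fraction(sum(row[job_class] for row in counts.values()), total)
    return p_joint / p_class

print(float(gender_given_class("M", "C1")))  # 0.75
print(float(gender_given_class("F", "C3")))  # 0.3125
```

Note that the total of 150 cancels out, which is why the worked example can divide the cell count directly by the column total.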

---

## 🔁 <span style="color:#8e44ad;">4. Expressing One Probability in Terms of Another</span>
Each type of probability can be rewritten in terms of the others.

### 🟢 **Marginal Probability Using Joint Probability**
Marginal probability can be found by **summing** joint probabilities.

$$
P(A) = \sum_B P(A \cap B)
$$

**Example:**  
To get the probability of **C1**:

$$
P(C1) = P(M \cap C1) + P(F \cap C1)
$$

$$
P(C1) = \frac{30}{150} + \frac{10}{150} = \frac{40}{150} = 0.2667
$$

---

### 🔵 **Joint Probability Using Conditional and Marginal Probability**
Using the **definition of conditional probability**:

$$
P(A \cap B) = P(A | B) \cdot P(B)
$$

**Example:**  
To find $ P(M \cap C1) $:

$$
P(M \cap C1) = P(M | C1) \cdot P(C1)
$$

$$
P(M \cap C1) = 0.75 \times 0.2667 = 0.20
$$

---

### 🟠 **Conditional Probability Using Joint and Marginal Probability**
Rearranging the joint probability equation:

$$
P(A | B) = \frac{P(A \cap B)}{P(B)}
$$

**Example:**  
To find $ P(F | C3) $:

$$
P(F | C3) = \frac{P(F \cap C3)}{P(C3)}
$$

$$
P(F | C3) = \frac{0.1667}{0.5333} = 0.3125
$$

---

# 🔮 <span style="color:#2c3e50;">5. Bayes' Theorem</span>
Now, using the relationships we established, we derive **Bayes' Rule**, which is the basis for Bayesian Classification.

$$
P(A | B) = \frac{P(B | A) P(A)}{P(B)}
$$

This equation allows us to **reverse conditional probabilities**, meaning we can compute **P(A | B)** from **P(B | A)**.

## 🔮 <span style="color:#2c3e50;">Derivation of Bayes' Theorem</span>

### 🔹 **Starting from the Definition of Conditional Probability**
By definition, the conditional probability of **A given B** is:

$$
P(A | B) = \frac{P(A \cap B)}{P(B)}
$$

Similarly, the conditional probability of **B given A** is:

$$
P(B | A) = \frac{P(A \cap B)}{P(A)}
$$

Both equations contain the same joint probability **P(A ∩ B)**. Solving each one for it and equating the results gives:

$$
P(A | B) \cdot P(B) = P(B | A) \cdot P(A)
$$

Rearranging for **P(A | B):**

$$
P(A | B) = \frac{P(B | A) P(A)}{P(B)}
$$

This is **Bayes' Theorem**: it expresses **P(A | B)** in terms of **P(B | A)** and the two marginal probabilities.
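
As a quick numerical check of the rule, here is a minimal sketch using the employee table with A = Male and B = Class C1. The function name `bayes_posterior` is hypothetical.

```python
from fractions import Fraction

def bayes_posterior(likelihood, prior, evidence):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Values read from the employee table (A = Male, B = Class C1)
p_c1_given_m = Fraction(30, 100)  # P(C1 | M): 30 of the 100 males are in C1
p_m = Fraction(100, 150)          # P(M)
p_c1 = Fraction(40, 150)          # P(C1)

p_m_given_c1 = bayes_posterior(p_c1_given_m, p_m, p_c1)
print(float(p_m_given_c1))  # 0.75
```

The result agrees with the value of $ P(M | C1) $ obtained directly from the table in the conditional probability section.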

---

# 📌 <span style="color:#16a085;">Understanding the Terms in Bayes' Theorem</span>

<img src="https://drive.google.com/uc?id=1KBETj61Usb_ra3QCvjM5KyNG6_uPqrVa"/>

- **$P(A)$ - Prior Probability**  
  - This is the initial probability of **A** before considering evidence **B**.  
  - It represents what we already know about **A** independently.  

- **$P(A | B)$ - Posterior Probability**  
  - This is the probability of **A given B**.  
  - It is called the **posterior probability** because it is calculated after incorporating the new evidence **B**.  

- **$P(B | A)$ - Likelihood**  
  - This is the probability of **B given A**.  
  - It represents how likely we are to observe **B**, assuming **A** is true.  

- **$P(B)$ - Evidence - Normalizing Constant**  
  - This is the **marginal probability of the evidence B**, taken over all possible values of **A**.  
  - It acts as a **scaling factor** that makes the posterior probabilities sum to 1.  

---

# 💡 <span style="color:#f39c12;">Example: Predicting Gender Given Job Class</span>
Using our employee dataset, let’s compute:

**What is the probability that an employee is Male given they belong to Class C1?**  
That is, we want to find **P(M | C1)**.

From our dataset:
- **Prior Probability** of Male:  
  $$ P(M) = \frac{100}{150} = 0.6667 $$

- **Likelihood** of Class C1 given Male:  
  $$ P(C1 | M) = \frac{30}{100} = 0.30 $$

- **Marginal Probability** of Class C1:  
  $$ P(C1) = \frac{40}{150} = 0.2667 $$

Applying **Bayes' Theorem**:

$$
P(M | C1) = \frac{P(C1 | M) P(M)}{P(C1)}
$$

$$
P(M | C1) = \frac{0.30 \times 0.6667}{0.2667}
$$

$$
P(M | C1) = \frac{0.20}{0.2667} = 0.75
$$

- 🔹 **Conclusion:** Given that an employee belongs to **Class C1**, there is a **75% probability** that they are **Male**.

---

<img src="https://drive.google.com/uc?id=13knr-2KjZ944PNEGdWkMBgx-8dlJEw3L"/>

---


# 📌 **Example 1:**
### **Problem Statement**
We have two bags:

- **Bag I** contains **4 white** and **6 black** balls.
- **Bag II** contains **4 white** and **3 black** balls.
- A bag is chosen **at random**, and **one ball is drawn**.
- The ball drawn is **black**.
- **What is the probability that it came from Bag I?**

---

### **Step 1: Define Events**
- **Let $ A_1 $ be the event of choosing Bag I**.
- **Let $ A_2 $ be the event of choosing Bag II**.
- **Let $ B $ be the event of drawing a black ball**.

We are given:
- **Since any bag is equally likely to be chosen**:  
  $$
  P(A_1) = P(A_2) = 0.5
  $$

- **Probability of drawing a black ball from Bag I**:  
  $$
  P(B | A_1) = \frac{6}{10} = 0.6
  $$

- **Probability of drawing a black ball from Bag II**:  
  $$
  P(B | A_2) = \frac{3}{7} \approx 0.4286
  $$

---

### **Step 2: Compute Total Probability of Drawing a Black Ball ($ P(B) $)**
Using the **law of total probability**:

$$
P(B) = P(B | A_1) P(A_1) + P(B | A_2) P(A_2)
$$

Substituting values:

$$
P(B) = (0.6 \times 0.5) + (0.4286 \times 0.5)
$$

$$
P(B) = 0.3 + 0.2143 = 0.5143
$$

---

### **Step 3: Compute $ P(A_1 | B) $ Using Bayes’ Theorem**
Applying Bayes’ Rule:

$$
P(A_1 | B) = \frac{P(B | A_1) P(A_1)}{P(B)}
$$

Substituting values:

$$
P(A_1 | B) = \frac{(0.6 \times 0.5)}{0.5143}
$$

$$
P(A_1 | B) = \frac{0.3}{0.5143} \approx 0.5833
$$

🔹 **Conclusion:**  
If a **black ball** is drawn, the probability that it came from **Bag I** is **58.33%** (exactly $ \frac{7}{12} $).
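
The three steps can be sketched in a few lines of Python with exact fractions. The dictionary names `priors` and `p_black_given_bag` are illustrative assumptions.

```python
from fractions import Fraction

# Step 1: priors and per-bag probabilities of drawing a black ball
priors = {"Bag I": Fraction(1, 2), "Bag II": Fraction(1, 2)}
p_black_given_bag = {"Bag I": Fraction(6, 10), "Bag II": Fraction(3, 7)}

# Step 2: law of total probability, P(B) = sum over bags of P(B | bag) * P(bag)
p_black = sum(p_black_given_bag[bag] * priors[bag] for bag in priors)

# Step 3: Bayes' theorem for Bag I
p_bag1_given_black = p_black_given_bag["Bag I"] * priors["Bag I"] / p_black

print(p_black, round(float(p_black), 4))                        # 18/35 0.5143
print(p_bag1_given_black, round(float(p_bag1_given_black), 4))  # 7/12 0.5833
```

Working with fractions avoids the small rounding drift that appears when intermediate results are truncated to four decimals.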

---

# 📌 **Example 2:**
We want to find the probability that **it rained on Sunday, given that it rained on Monday**.

---

### **Step 1: Define Events**
- **Let $ A $ be the event that it rains on Sunday.**
- **Let $ B $ be the event that it rains on Monday.**

We are given:
- **Prior probability of rain on Sunday**:
  $$
  P(A) = 0.40
  $$
- **Probability of rain on Monday given it rained on Sunday**:
  $$
  P(B | A) = 0.10
  $$
- **Probability of rain on Monday given it did NOT rain on Sunday**:
  $$
  P(B | A') = 0.80
  $$

---

### **Step 2: Compute Total Probability of Rain on Monday ($ P(B) $)**
Using the **law of total probability**:

$$
P(B) = P(B | A) P(A) + P(B | A') P(A')
$$

Since **$ P(A') = 1 - P(A) = 0.60 $**, we get:

$$
P(B) = (0.10 \times 0.40) + (0.80 \times 0.60)
$$

$$
P(B) = 0.04 + 0.48 = 0.52
$$

---

### **Step 3: Compute $ P(A | B) $ Using Bayes’ Theorem**
Applying Bayes’ Rule:

$$
P(A | B) = \frac{P(B | A) P(A)}{P(B)}
$$

Substituting values:

$$
P(A | B) = \frac{(0.10 \times 0.40)}{0.52}
$$

$$
P(A | B) = \frac{0.04}{0.52} = 0.0769
$$

🔹 **Conclusion:**  
If it **rained on Monday**, the probability that it **also rained on Sunday** is **7.69%**.
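
The same calculation as a minimal Python sketch, using the complement of A in the law of total probability. The variable names are illustrative.

```python
# Given quantities from Step 1
p_a = 0.40              # P(A): rain on Sunday
p_b_given_a = 0.10      # P(B | A): rain on Monday given rain on Sunday
p_b_given_not_a = 0.80  # P(B | A'): rain on Monday given no rain on Sunday

# Step 2: law of total probability over A and its complement A'
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Step 3: Bayes' theorem
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_b, 2))          # 0.52
print(round(p_a_given_b, 4))  # 0.0769
```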

---

# 🎯 **Bayes’ Theorem in Machine Learning**

Bayes' Theorem is widely used in **Machine Learning**, particularly in **classification problems**, to **update probabilities** based on new data. One of the most common applications of Bayes' Theorem in ML is the **Naïve Bayes Classifier**.

---

## 🔹 **How is Bayes’ Theorem Used in Machine Learning?**
In **classification problems**, we aim to assign an instance **$ \mathbf{x} $** to a particular class **$ C_k $**, based on given features.

Using **Bayes' Theorem**, we can compute the probability that a given instance **belongs to class $ C_k $**, given its feature values:

$$
P(C_k | \mathbf{x}) = \frac{P(C_k) P(\mathbf{x} | C_k)}{P(\mathbf{x})}
$$

where:
- **$ P(C_k | \mathbf{x}) $** = **Posterior Probability** (Probability of class $ C_k $ given the features $ \mathbf{x} $)
- **$ P(C_k) $** = **Prior Probability** (Initial belief about class $ C_k $)
- **$ P(\mathbf{x} | C_k) $** = **Likelihood** (Probability of observing the feature set $ \mathbf{x} $ if the instance belongs to $ C_k $)
- **$ P(\mathbf{x}) $** = **Evidence** (Total probability of $ \mathbf{x} $ occurring across all classes)

Since **$ P(\mathbf{x}) $** is a normalizing constant and does not depend on $ C_k $, we focus on:

$$
P(C_k | \mathbf{x}) \propto P(C_k) P(\mathbf{x} | C_k)
$$

---

## 🔹 **Example: Playing Golf Decision Based on Weather Conditions**
Imagine we want to predict whether a person will **play golf ($Y$) or not ($N$)** based on the weather conditions.

### **Features Affecting the Decision:**
1. **Outlook** (Sunny, Overcast, Rainy)
2. **Temperature** (Hot, Mild, Cool)
3. **Humidity** (High, Normal)
4. **Wind** (Weak, Strong)

### **How Each Term Maps to the Problem**
- **$ P(Y) $**: The probability of playing golf before considering the weather conditions (**Prior Probability**).  
- **$ P(\text{Outlook = Sunny} | Y) $**: The probability of a **Sunny day** given that the person plays golf (**Likelihood**).  
- **$ P(\text{Outlook = Sunny}) $**: The overall probability of a **Sunny day** occurring (**Evidence**).  
- **$ P(Y | \text{Outlook = Sunny}) $**: The updated probability of **playing golf** given that it is **Sunny** (**Posterior Probability**).  

Using Bayes' Theorem, we update our belief about whether the person will play golf based on the weather.

---

# 🔹 **Concept of Independence & Naïve Assumption**
The key assumption in **Naïve Bayes** is that **all features $ x_i $ are conditionally independent given the class $ C_k $**.

### **Example: Naïve Assumption in the Golf Problem**
Suppose we want to compute:

$$
P(Y | \text{Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Weak})
$$

A classifier that made no simplifying assumption would have to estimate the full joint likelihood of all four weather conditions given the class, which captures how the conditions interact:

$$
P(\text{Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Weak} | Y)
$$

Applying the **Naïve Assumption (Conditional Independence)**, each feature contributes independently given the class, so this joint likelihood factorizes into:

$$
P(\text{Outlook=Sunny} | Y) P(\text{Temperature=Hot} | Y) P(\text{Humidity=High} | Y) P(\text{Wind=Weak} | Y)
$$

This **simplifies** the calculation, but the assumption rarely holds exactly because weather conditions often **depend on each other**. Even so, **Naïve Bayes often performs well in practice**, especially for text classification.

---

# 🔹 **Final Naïve Bayes Formula**
Using the **Naïve Assumption**, the final probability of a class $ C_k $ given feature set $ \mathbf{x} $ is:

$$
P(C_k | \mathbf{x}) \propto P(C_k) \prod_{i=1}^{n} P(x_i | C_k)
$$

where:
- **$ P(C_k) $** = Prior probability of class $ C_k $
- **$ P(x_i | C_k) $** = Probability of feature $ x_i $ given class $ C_k $

---

# 🔹 **Final Prediction Rule**
To classify a new instance $ \mathbf{x} $, we compute:

$$
y_{\text{pred}} = \arg\max_{C_k} P(C_k) \prod_{i=1}^{n} P(x_i | C_k)
$$
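
A minimal sketch of this rule in Python follows. It swaps the product for a sum of logarithms, which leaves the arg max unchanged (the logarithm is monotonic) and avoids numerical underflow when many small probabilities are multiplied. The function `predict` and the toy model below are hypothetical, with numbers invented purely for illustration, and the sketch assumes every probability is strictly positive.

```python
import math

def predict(priors, likelihoods, x):
    """Return (best_class, scores) where each score is
    log P(C_k) + sum_i log P(x_i | C_k)."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for i, value in enumerate(x):
            # likelihoods[c][i] maps each value of feature i to P(value | c)
            score += math.log(likelihoods[c][i][value])
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical two-class, two-feature model (invented numbers)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{"yes": 0.8, "no": 0.2}, {"short": 0.7, "long": 0.3}],
    "ham":  [{"yes": 0.1, "no": 0.9}, {"short": 0.4, "long": 0.6}],
}

label, scores = predict(priors, likelihoods, ("yes", "short"))
print(label)  # spam
```

Here the unnormalized products are 0.4 × 0.8 × 0.7 = 0.224 for `spam` and 0.6 × 0.1 × 0.4 = 0.024 for `ham`, so `spam` wins.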

🔹 **Conclusion:**  
- **Naïve Bayes is simple, fast, and effective**, especially for **text classification** (spam detection, sentiment analysis, etc.).  
- The **Naïve assumption** of **feature independence** may not always hold, but Naïve Bayes still performs well in practice.


# 🎯 **Naïve Bayes Classification: Numerical Example**

We will apply **Naïve Bayes Classification** to predict whether a person will **play golf ($ Y $) or not ($ N $)** given the **weather conditions**.

<img src="https://drive.google.com/uc?id=1lf6VOdjQDbwu4l8z_mWUEb_xYaZw8MQb"/>

---

## 🔹 **Step 1: Understanding the Formula**
Using **Bayes' Theorem**, the probability of class **$ y $** given features **$ \mathbf{X} = (x_1, x_2, ..., x_n) $** is:

$$
P(y | \mathbf{X}) = \frac{P(y) \prod_{i=1}^{n} P(x_i | y)}{P(\mathbf{X})}
$$

Since **$ P(\mathbf{X}) $** is the same for all classes, we can ignore it:

$$
P(y | \mathbf{X}) \propto P(y) \prod_{i=1}^{n} P(x_i | y)
$$

We compute **$ P(Y | \mathbf{X}) $** and **$ P(N | \mathbf{X}) $**, and classify based on the higher probability.

---

## 🔹 **Step 2: Given Data from the Dataset**
We have computed probabilities from the dataset:

<img src="https://drive.google.com/uc?id=1XljrXkHEXeI1cDEneFw1mfUi_R9cOUTY"/>

- **Class Probabilities:**
  - $ P(Y) = \frac{9}{14} $
  - $ P(N) = \frac{5}{14} $

- **Conditional Probabilities from the Dataset:**
  - **For $ Y $ (Play Golf = Yes):**
    - $ P(\text{Sunny} | Y) = \frac{2}{9} $
    - $ P(\text{Hot} | Y) = \frac{2}{9} $
    - $ P(\text{Normal} | Y) = \frac{6}{9} $
    - $ P(\text{Weak} | Y) = \frac{6}{9} $
  
  - **For $ N $ (Play Golf = No):**
    - $ P(\text{Sunny} | N) = \frac{3}{5} $
    - $ P(\text{Hot} | N) = \frac{2}{5} $
    - $ P(\text{Normal} | N) = \frac{1}{5} $
    - $ P(\text{Weak} | N) = \frac{2}{5} $

---

## 🔹 **Step 3: Compute Posterior Probabilities**
We now classify **today’s weather: (Sunny, Hot, Normal, Weak)**.

### **Calculate $ P(Y | \text{today}) $**
Using:

$$
P(Y | \text{today}) \propto P(Y) P(\text{Sunny} | Y) P(\text{Hot} | Y) P(\text{Normal} | Y) P(\text{Weak} | Y)
$$

Substituting values:

$$
P(Y | \text{today}) \propto \left(\frac{9}{14}\right) \times \left(\frac{2}{9}\right) \times \left(\frac{2}{9}\right) \times \left(\frac{6}{9}\right) \times \left(\frac{6}{9}\right)
$$

$$
P(Y | \text{today}) \propto \frac{9 \times 2 \times 2 \times 6 \times 6}{14 \times 9 \times 9 \times 9 \times 9}
$$

$$
P(Y | \text{today}) \propto \frac{1296}{91854} \approx 0.01411
$$

---

### **Calculate $ P(N | \text{today}) $**
Using:

$$
P(N | \text{today}) \propto P(N) P(\text{Sunny} | N) P(\text{Hot} | N) P(\text{Normal} | N) P(\text{Weak} | N)
$$

Substituting values:

$$
P(N | \text{today}) \propto \left(\frac{5}{14}\right) \times \left(\frac{3}{5}\right) \times \left(\frac{2}{5}\right) \times \left(\frac{1}{5}\right) \times \left(\frac{2}{5}\right)
$$

$$
P(N | \text{today}) \propto \frac{5 \times 3 \times 2 \times 1 \times 2}{14 \times 5 \times 5 \times 5 \times 5}
$$

$$
P(N | \text{today}) \propto \frac{60}{8750} \approx 0.00686
$$

---

## 🔹 **Step 4: Compare Probabilities**
Since:

$$
P(Y | \text{today}) > P(N | \text{today})
$$

We classify **today’s weather** as:

$$
\textbf{Play Golf (Yes)}
$$

---

# 🏆 **Conclusion**
Using **Naïve Bayes Classification**, we predicted that the person will **play golf today** based on the given weather conditions.

🚀 **Next Step:** Implement this in Python using **scikit-learn**!


## 🔹 **Normalize Probability**
The unnormalized scores become proper probabilities once they are divided by $ P(\text{today}) $, the total probability of today's weather conditions occurring:

$$
P(\text{today}) = P(Y) P(\text{Sunny} | Y) P(\text{Hot} | Y) P(\text{Normal} | Y) P(\text{Weak} | Y) + P(N) P(\text{Sunny} | N) P(\text{Hot} | N) P(\text{Normal} | N) P(\text{Weak} | N)
$$

Substituting the values:

$$
P(\text{today}) = \frac{8}{567} + \frac{6}{875} \approx 0.01411 + 0.00686
$$

$$
P(\text{today}) \approx 0.02097
$$

Now, we compute the **actual probabilities** by normalizing:

$$
P(Y | \text{today}) = \frac{P(Y) P(\text{Sunny} | Y) P(\text{Hot} | Y) P(\text{Normal} | Y) P(\text{Weak} | Y)}{P(\text{today})}
$$

$$
P(Y | \text{today}) = \frac{0.01411}{0.02097} \approx 0.6729
$$

$$
P(N | \text{today}) = \frac{P(N) P(\text{Sunny} | N) P(\text{Hot} | N) P(\text{Normal} | N) P(\text{Weak} | N)}{P(\text{today})}
$$

$$
P(N | \text{today}) = \frac{0.00686}{0.02097} \approx 0.3271
$$

---

## 🔹 **Final Decision**
Since:

$$
P(Y | \text{today}) > P(N | \text{today})
$$

We classify **today’s weather** as:

$$
\textbf{Play Golf (Yes)}
$$

Now, we have fully normalized probabilities that sum to 1:

$$
P(Y | \text{today}) + P(N | \text{today}) = 1
$$
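
The whole calculation can be recomputed exactly from the fractions listed in Step 2. This is a minimal sketch; the names `priors`, `likelihoods`, `scores`, and `posteriors` are illustrative, and `math.prod` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import prod

# Class priors and per-feature likelihoods for (Sunny, Hot, Normal, Weak)
priors = {"Y": Fraction(9, 14), "N": Fraction(5, 14)}
likelihoods = {
    "Y": [Fraction(2, 9), Fraction(2, 9), Fraction(6, 9), Fraction(6, 9)],
    "N": [Fraction(3, 5), Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)],
}

# Unnormalized scores: P(class) * product of P(feature | class)
scores = {c: priors[c] * prod(likelihoods[c]) for c in priors}

# Evidence P(today) is the sum of the scores; dividing by it normalizes them
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

for c in scores:
    print(c, round(float(scores[c]), 5), round(float(posteriors[c]), 4))
# Y 0.01411 0.6729
# N 0.00686 0.3271
```

Because both scores share the same denominator, normalizing changes the numbers but never the winning class.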

---


In [None]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# Creating the dataset
data = {
    'Outlook': ['Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Sunny',
                'Rainy', 'Overcast', 'Overcast', 'Sunny'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild',
                    'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'Normal',
                 'Normal', 'High', 'Normal', 'High'],
    'Windy': ['False', 'True', 'False', 'False', 'False', 'True', 'True', 'False', 'False', 'False',
              'True', 'True', 'False', 'True'],
    'Play Golf': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes',
                  'Yes', 'Yes', 'Yes', 'No']
}

df = pd.DataFrame(data)

# Encoding categorical variables
encoder = LabelEncoder()
for col in df.columns:
    df[col] = encoder.fit_transform(df[col])

# Splitting features and target variable
X = df.drop(columns=['Play Golf'])
y = df['Play Golf']

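# Training the Multinomial Naïve Bayes model.
# Caveat: MultinomialNB interprets the label-encoded integers as counts, so its
# probabilities will not match the hand calculation above. scikit-learn's
# CategoricalNB is the variant intended for categorical features like these.
model = MultinomialNB()
model.fit(X, y)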

# Predicting on training data
y_pred = model.predict(X)

# Generating classification report
print("Classification Report:\n", classification_report(y, y_pred, target_names=['No', 'Yes']))


Classification Report:
               precision    recall  f1-score   support

          No       1.00      0.40      0.57         5
         Yes       0.75      1.00      0.86         9

    accuracy                           0.79        14
   macro avg       0.88      0.70      0.71        14
weighted avg       0.84      0.79      0.76        14



# 🎯 **Scikit-learn Implementations of Naïve Bayes**
Scikit-learn provides five implementations of Naïve Bayes, each differing in **assumptions about the feature distribution**:

1. **Gaussian Naïve Bayes (`GaussianNB`)** – Assumes features follow a **normal (Gaussian) distribution**.
2. **Multinomial Naïve Bayes (`MultinomialNB`)** – Used for **count-based** data (e.g., text classification).
3. **Bernoulli Naïve Bayes (`BernoulliNB`)** – Used for **binary features** (e.g., presence/absence of words).
4. **Complement Naïve Bayes (`ComplementNB`)** – A variant of `MultinomialNB`, designed for **imbalanced datasets**.
5. **Categorical Naïve Bayes (`CategoricalNB`)** – Used for **categorical features** encoded as integer category codes (e.g., the weather attributes in the golf example).

---

## 🔹 **Gaussian Naïve Bayes**
Gaussian Naïve Bayes assumes that **features are continuous** and follow a **normal distribution**:

<img src="https://drive.google.com/uc?id=1qBMsnlUtmGywIJhHYf6jEnxEZQTqhamB" width=600>


$$
P(x = v | C_k) = \frac{1}{\sqrt{2 \pi \sigma_k^2}} \exp \left( -\frac{(v - \mu_k)^2}{2 \sigma_k^2} \right)
$$

where:
- $ \mu_k $ is the **mean** of the feature for class $ C_k $.
- $ \sigma_k^2 $ is the **variance** of the feature for class $ C_k $.
- $ v $ is the observed value of the feature.

The parameters $ \mu_k $ and $ \sigma_k^2 $ are estimated using **maximum likelihood estimation**.
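
The density formula translates directly into a small helper. This is a minimal sketch; the function name `gaussian_pdf` is an illustrative choice.

```python
import math

def gaussian_pdf(v, mean, variance):
    """Gaussian density at v for the given mean and variance."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    exponent = -((v - mean) ** 2) / (2.0 * variance)
    return coefficient * math.exp(exponent)

# Sanity check: the standard normal density at its mean is 1 / sqrt(2 * pi)
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```

The returned value is a density rather than a probability, so it can be larger than 1 when the variance is small.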

---

## 🔹 **Gender Classification Example**
### **Problem Statement**
We want to classify whether a given person is **male or female** based on three continuous features:
- **Height (feet)**
- **Weight (lbs)**
- **Foot Size (inches)**

<img src="https://drive.google.com/uc?id=14hW9hH3rg0sZYGHnTFqGueiLzWjnkST-" width=300/>

| Gender  | Height (feet) | Weight (lbs) | Foot Size (inches) |
|---------|--------------|-------------|------------------|
| Male    | 6.00         | 180         | 12              |
| Male    | 5.92 (5'11")| 190         | 11              |
| Male    | 5.58 (5'7") | 170         | 12              |
| Male    | 5.92 (5'11")| 165         | 10              |
| Female  | 5.00        | 100         | 6               |
| Female  | 5.50 (5'6") | 150         | 8               |
| Female  | 5.42 (5'5") | 130         | 7               |
| Female  | 5.75 (5'9") | 150         | 9               |

---

## 🔹 **Step 1: Compute Class Statistics**
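To classify a new sample, we first compute the **mean and variance** of each feature for each class. The variances below are **sample variances** (dividing by $ n - 1 $), which differ slightly from the maximum likelihood estimates (dividing by $ n $):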

| Gender  | Mean Height | Variance Height | Mean Weight | Variance Weight | Mean Foot Size | Variance Foot Size |
|---------|------------|----------------|-------------|----------------|---------------|----------------|
| Male    | 5.855      | 0.035          | 176.25      | 122.916        | 11.25         | 0.916         |
| Female  | 5.4175     | 0.097          | 132.5       | 558.333        | 7.5           | 1.6667        |
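
The height columns of this table can be checked with Python's `statistics` module, whose `variance` function computes the sample variance (dividing by $ n - 1 $). The list names below are illustrative.

```python
from statistics import mean, variance

# Heights (feet) from the training table
male_heights = [6.00, 5.92, 5.58, 5.92]
female_heights = [5.00, 5.50, 5.42, 5.75]

# statistics.variance is the sample variance (n - 1 in the denominator)
print(round(mean(male_heights), 3), round(variance(male_heights), 3))      # 5.855 0.035
print(round(mean(female_heights), 4), round(variance(female_heights), 3))  # 5.4175 0.097
```

The weight and foot-size columns follow from the same two calls applied to the corresponding lists.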

We assume **equiprobable classes**, meaning:

$$
P(\text{Male}) = P(\text{Female}) = 0.5
$$

---

## 🔹 **Step 2: Classifying a New Sample**
A new person has the following features:

| Height (feet) | Weight (lbs) | Foot Size (inches) |
|--------------|-------------|------------------|
| 6.00        | 130         | 8               |

We now compute the **posterior probability** for each gender.

---

## 🔹 **Step 3: Compute Likelihoods**
The likelihood of the given height, weight, and foot size under **each gender** is calculated using the **Gaussian probability density function**:

For **Male:**
$$
P(\text{Height} | \text{Male}) = \frac{1}{\sqrt{2\pi \times 0.035}} \exp \left( -\frac{(6.00 - 5.855)^2}{2 \times 0.035} \right)
$$

For **Female:**
$$
P(\text{Height} | \text{Female}) = \frac{1}{\sqrt{2\pi \times 0.097}} \exp \left( -\frac{(6.00 - 5.4175)^2}{2 \times 0.097} \right)
$$

Similarly, we compute **$ P(\text{Weight} | \text{Gender}) $** and **$ P(\text{Foot Size} | \text{Gender}) $**.

---

## 🔹 **Step 4: Compute Posterior Probability**
Since the evidence **$ P(\text{Sample}) $** is the same for both classes, we only compare the numerators:

For **Male:**
$$
P(\text{Male} | \text{Sample}) \propto P(\text{Male}) P(\text{Height} | \text{Male}) P(\text{Weight} | \text{Male}) P(\text{Foot Size} | \text{Male})
$$

For **Female:**
$$
P(\text{Female} | \text{Sample}) \propto P(\text{Female}) P(\text{Height} | \text{Female}) P(\text{Weight} | \text{Female}) P(\text{Foot Size} | \text{Female})
$$

From our calculations (the likelihood columns are Gaussian **density** values, so an entry such as 1.578 can exceed 1):


| Gender  | $ P(\text{Height} \mid \text{Gender}) $ | $ P(\text{Weight} \mid \text{Gender}) $ | $ P(\text{Foot Size} \mid \text{Gender}) $ | Posterior Numerator |
|---------|------------------------------|------------------------------|------------------------------|----------------------|
| Male    | 1.578                        | $ 5.9867 \times 10^{-6} $    | 0.0013                       | $ 6.19707 \times 10^{-9} $ |
| Female  | 0.223                        | 0.0167                       | 0.2866                       | $ 0.00053 $          |


Since:

$$
P(\text{Female} | \text{Sample}) > P(\text{Male} | \text{Sample})
$$

we classify the **sample as Female**.
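
Steps 1 to 4 can be reproduced end to end in plain Python. This is a minimal sketch under the same assumptions as the hand calculation (equal priors of 0.5 and sample variances); the names `train`, `sample`, `scores`, and `gaussian_pdf` are illustrative.

```python
import math
from statistics import mean, variance

def gaussian_pdf(v, mu, var):
    """Gaussian density at v for mean mu and variance var."""
    return math.exp(-((v - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Training data from the table, grouped by class and feature
train = {
    "Male": {
        "height": [6.00, 5.92, 5.58, 5.92],
        "weight": [180, 190, 170, 165],
        "foot": [12, 11, 12, 10],
    },
    "Female": {
        "height": [5.00, 5.50, 5.42, 5.75],
        "weight": [100, 150, 130, 150],
        "foot": [6, 8, 7, 9],
    },
}
sample = {"height": 6.00, "weight": 130, "foot": 8}
prior = 0.5  # equiprobable classes, as assumed in Step 1

# Posterior numerator: prior * product of per-feature Gaussian densities
scores = {}
for gender, features in train.items():
    score = prior
    for name, values in features.items():
        score *= gaussian_pdf(sample[name], mean(values), variance(values))
    scores[gender] = score

print(scores)
print(max(scores, key=scores.get))  # Female
```

The two numerators come out near $ 6.2 \times 10^{-9} $ for Male and $ 5.4 \times 10^{-4} $ for Female, so the comparison is not close.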

---

# 🏆 **Final Conclusion**
Using **Gaussian Naïve Bayes**, we determined that the given person is **Female** based on height, weight, and foot size.

🚀 **Next Step:** Implement this in Python using **Scikit-learn**!


In [None]:
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Creating the dataset
data = {
    'Height': [6.00, 5.92, 5.58, 5.92, 5.00, 5.50, 5.42, 5.75],
    'Weight': [180, 190, 170, 165, 100, 150, 130, 150],
    'Foot_Size': [12, 11, 12, 10, 6, 8, 7, 9],
    'Gender': ['Male', 'Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Female']
}

df = pd.DataFrame(data)

# Encoding categorical variable (Gender: Male=1, Female=0)
df['Gender'] = df['Gender'].map({'Male': 1, 'Female': 0})

# Splitting features and target variable
X = df[['Height', 'Weight', 'Foot_Size']]
y = df['Gender']

# Training the Gaussian Naïve Bayes model
model = GaussianNB()
model.fit(X, y)

# New sample to classify
# Use a DataFrame with the training column names so the feature names match
sample_data = pd.DataFrame([[6, 130, 8]], columns=['Height', 'Weight', 'Foot_Size'])

# Predicting the class
prediction = model.predict(sample_data)
predicted_class = 'Male' if prediction[0] == 1 else 'Female'
print("Predicted Gender:", predicted_class)

# Probability estimates for each class
probabilities = model.predict_proba(sample_data)
print("Probability (Male):", probabilities[0][1])
print("Probability (Female):", probabilities[0][0])

# Predicting on training data
y_pred = model.predict(X)

# Generating classification report
print("Classification Report:\n", classification_report(y, y_pred, target_names=['Female', 'Male']))


Predicted Gender: Female
Probability (Male): 1.5442663163060025e-07
Probability (Female): 0.999999845573368
Classification Report:
               precision    recall  f1-score   support

      Female       1.00      1.00      1.00         4
        Male       1.00      1.00      1.00         4

    accuracy                           1.00         8
   macro avg       1.00      1.00      1.00         8
weighted avg       1.00      1.00      1.00         8

