<a href="https://colab.research.google.com/github/samiha-mahin/A-Machine-Learning-Models-Repo/blob/main/Probabilistic_Generative_%26_Probabilistic_Discriminative_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Probabilistic Generative Models** and **Probabilistic Discriminative Models**

## 🔵 1. Probabilistic Generative Models

### 📌 What it means:

These models **learn how each class generates the data**. They estimate the **joint probability**:

$$
P(x, y) = P(x | y) \cdot P(y)
$$

Then use Bayes’ Theorem to compute the **posterior**:

$$
P(y | x) = \frac{P(x | y) \cdot P(y)}{P(x)}
$$
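As a rough illustration of the two formulas above, the sketch below computes $P(y | x)$ for a one-dimensional input from class-conditional Gaussians and class priors. The class labels, means, standard deviations, and priors are made-up numbers chosen only for the example.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posterior(x, class_params, priors):
    """Return P(y | x) for every class via Bayes' theorem.

    class_params: {label: (mean, std)}  (hypothetical class-conditional Gaussians)
    priors:       {label: P(y)}
    """
    # Joint probability P(x, y) = P(x | y) * P(y) for each class
    joint = {y: gaussian_pdf(x, *class_params[y]) * priors[y] for y in priors}
    # Evidence P(x) = sum over classes of P(x | y) * P(y)
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}

# Illustrative numbers only (not taken from any dataset)
class_params = {"A": (0.0, 1.0), "B": (2.0, 1.0)}
priors = {"A": 0.5, "B": 0.5}

post = posterior(1.0, class_params, priors)
print(post)
```

Because `x = 1.0` lies exactly halfway between the two class means and the priors and spreads are equal, both classes should come out equally likely here.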

### ✅ Key Features:

* Model the **distribution of inputs** for each class.
* Can **generate new data** samples (hence “generative”).
* Work well when the assumed class-conditional distributions (e.g. Gaussian) are a good fit for the data and the classes are well separated.
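The "generate new data" point can be sketched in a few lines: fit a Gaussian to the points of one class, then sample from it. This is a minimal sketch that assumes Gaussian class-conditionals; the toy data and the helper name `fit_gaussian` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labelled data: two hypothetical 2-D classes
X_by_class = {
    0: rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2)),
    1: rng.normal(loc=[4.0, 4.0], scale=0.5, size=(200, 2)),
}

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix of one class."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

# One (mean, covariance) pair per class: this is the model of P(x | y)
class_models = {label: fit_gaussian(X) for label, X in X_by_class.items()}

# "Generate" new class-1 points by sampling from the fitted P(x | y = 1)
mu1, cov1 = class_models[1]
new_points = rng.multivariate_normal(mu1, cov1, size=5)
print(new_points)
```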

---

### 📊 Common Generative Models:

| Model                                     | Description                                                                      | Usage Example                                         |
| ----------------------------------------- | -------------------------------------------------------------------------------- | ----------------------------------------------------- |
| **Gaussian Naive Bayes**                  | Assumes features are independent within each class and normally distributed      | Continuous features (e.g. sensor readings); text and spam filtering usually use the multinomial variant |
| **LDA (Linear Discriminant Analysis)**    | Assumes each class has a Gaussian distribution with **shared covariance** matrix | Image recognition, early-stage medical classification |
| **QDA (Quadratic Discriminant Analysis)** | Same as LDA but allows **each class to have its own covariance** matrix          | Better for non-linear boundaries                      |
| **Gaussian Mixture Models (GMM)**         | Unsupervised generative model (uses EM algorithm) to model data                  | Clustering with soft labels, speaker recognition      |

---

### 📌 Equation Example:

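Given two classes whose class-conditional densities are Gaussian with a **shared** covariance matrix $\Sigma$:

$$
p(x | C_k) = \mathcal{N}(x | \mu_k, \Sigma)
$$

Then the posterior for class $C_1$ is a logistic sigmoid $\sigma$ of a linear function of $x$: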

$$
P(C_1 | x) = \sigma(w^T x + w_0)
$$

Where:

* $w = \Sigma^{-1} (\mu_1 - \mu_2)$
* $w_0 = -\frac{1}{2} \mu_1^T \Sigma^{-1} \mu_1 + \frac{1}{2} \mu_2^T \Sigma^{-1} \mu_2 + \ln \frac{P(C_1)}{P(C_2)}$

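This shows how a **generative model can end up with a logistic (sigmoid) posterior**. The approach is still generative, because $w$ and $w_0$ are computed from the estimated class means, covariance, and priors rather than fitted directly.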
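The identity can be checked numerically. The sketch below computes $P(C_1 | x)$ twice: once with Bayes' theorem on the two Gaussian densities, and once with the closed form $\sigma(w^T x + w_0)$. All parameter values are arbitrary numbers picked for illustration.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative parameters (shared covariance, unequal priors)
mu1 = np.array([1.0, 2.0])
mu2 = np.array([-1.0, 0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
p1, p2 = 0.3, 0.7
x = np.array([0.2, 1.0])

# Route 1: Bayes' theorem on the class-conditional densities
num = gaussian_density(x, mu1, Sigma) * p1
bayes = num / (num + gaussian_density(x, mu2, Sigma) * p2)

# Route 2: closed-form sigmoid with w and w_0 as defined above
Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)
w0 = -0.5 * mu1 @ Sigma_inv @ mu1 + 0.5 * mu2 @ Sigma_inv @ mu2 + np.log(p1 / p2)
closed = sigmoid(w @ x + w0)

print(bayes, closed)  # the two values should agree up to floating-point error
```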

---

## 🔴 2. Probabilistic Discriminative Models

### 📌 What it means:

These models **directly model the posterior** probability:

$$
P(y | x)
$$

without learning how the data was generated.
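For contrast with the generative route, here is a minimal sketch of a discriminative model: logistic regression fitted by plain gradient descent on the log-loss. It never estimates class means or covariances; it only adjusts `w` and `b` so that $\sigma(w^T x + b)$ matches the labels. The toy data, learning rate, and step count are assumptions made for the example, not tuned values.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, y, lr=0.1, n_steps=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)            # current estimate of P(y = 1 | x)
        grad_w = X.T @ (p - y) / len(y)   # gradient of the mean log-loss w.r.t. w
        grad_b = np.mean(p - y)           # gradient w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: two hypothetical 2-D classes centred at (-2, -2) and (2, 2)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(100, 2)),
    rng.normal(2.0, 1.0, size=(100, 2)),
])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = fit_logistic(X, y)
probs = sigmoid(X @ w + b)                # modelled posterior P(y = 1 | x)
accuracy = np.mean((probs > 0.5) == y)
print("Training accuracy:", accuracy)
```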

---

### ✅ Key Features:

* Focus on the **decision boundary** directly.
* Often more **accurate** than generative models when the goal is just classification and enough training data is available.
* Cannot generate new data, because $P(x | y)$ is never modeled.

---

### 📊 Common Discriminative Models:

| Model                      | Description                                                           | Usage Example                            |
| -------------------------- | --------------------------------------------------------------------- | ---------------------------------------- |
| **Logistic Regression**    | Models $P(y \mid x)$ using the sigmoid (logistic) function            | Binary classification, medical diagnosis |
| **Softmax Regression**     | Extension of logistic regression for multi-class problems             | Handwritten digit recognition            |
| **Neural Networks**        | Powerful discriminative models that learn complex decision boundaries | Deep learning, image classification      |
| **Support Vector Machine** | Maximizes margin between classes, often used with kernels; outputs a score rather than $P(y \mid x)$, so it is discriminative but not probabilistic by itself | Text classification, bioinformatics |

---

## 🟠 Quick Summary Table:

| Feature                         | Generative Model                        | Discriminative Model         |
| ------------------------------- | --------------------------------------- | ---------------------------- |
| Learns $P(x \mid y)$ and $P(y)$ | ✅ Yes                                   | ❌ No                         |
| Learns $P(y \mid x)$ directly   | ❌ No (uses Bayes)                       | ✅ Yes                        |
| Can generate data               | ✅ Yes                                   | ❌ No                         |
| Model example                   | LDA, QDA, Naive Bayes, GMM              | Logistic Regression, SVM, NN |
| Better for                      | Small data, inference, generative tasks | Classification accuracy      |


---

## ✅ Which One Should You Use?

| Situation                         | Model to Prefer       |
| --------------------------------- | --------------------- |
| Simple and interpretable task     | Logistic Regression   |
| Well-separated Gaussian data      | LDA or QDA            |
| High-dimensional sparse text data | Naive Bayes           |
| Complex nonlinear classification  | Neural Networks / SVM |


---

# **QDA (Quadratic Discriminant Analysis)**

## 🔵 What is QDA?

### ✅ QDA = **Quadratic Discriminant Analysis**

QDA is a **probabilistic generative model** used for **classification**. It assumes:

* Each class generates data following a **multivariate Gaussian distribution**.
* Each class has its **own covariance matrix**.

This makes QDA more flexible than LDA (Linear Discriminant Analysis), which assumes **shared covariance**.

---

## 🔁 The Math Behind QDA:

For class $C_k$, we assume:

$$
p(x | C_k) = \mathcal{N}(x | \mu_k, \Sigma_k)
$$

Then, the posterior (using Bayes' theorem) is:

$$
P(C_k | x) = \frac{P(C_k) \cdot \mathcal{N}(x | \mu_k, \Sigma_k)}{\sum_j P(C_j) \cdot \mathcal{N}(x | \mu_j, \Sigma_j)}
$$
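The posterior above can be computed from scratch in a few lines. The sketch below estimates $\mu_k$, $\Sigma_k$, and $P(C_k)$ per class and then normalizes the joint probabilities in log space for numerical stability. The function names (`fit_qda`, `log_gaussian`, `qda_posterior`) and the toy data are invented for illustration and are not part of any library.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate (mean, covariance, prior) separately for every class."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        params[k] = (Xk.mean(axis=0), np.cov(Xk, rowvar=False), len(Xk) / len(X))
    return params

def log_gaussian(X, mu, Sigma):
    """Row-wise log N(x | mu, Sigma)."""
    d = X.shape[1]
    diff = X - mu
    # Squared Mahalanobis distance of each row to mu
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(Sigma), diff)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def qda_posterior(X, params):
    """P(C_k | x) for each row of X; columns follow sorted class labels."""
    classes = sorted(params)
    log_joint = np.column_stack([
        np.log(params[k][2]) + log_gaussian(X, params[k][0], params[k][1])
        for k in classes
    ])
    # Subtract the row maximum before exponentiating to avoid underflow
    log_joint -= log_joint.max(axis=1, keepdims=True)
    joint = np.exp(log_joint)
    return joint / joint.sum(axis=1, keepdims=True)

# Toy data: a wide class and a tight class (hypothetical numbers)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 1.0, size=(150, 2)),
    rng.normal([5.0, 5.0], 0.5, size=(150, 2)),
])
y = np.array([0] * 150 + [1] * 150)

params = fit_qda(X, y)
post = qda_posterior(X, params)
accuracy = np.mean(post.argmax(axis=1) == y)
print("Training accuracy:", accuracy)
```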

The decision boundary between two classes is the set of points where their log-posteriors are equal:

$$
\log P(C_1 | x) = \log P(C_2 | x)
$$

Because the covariances $\Sigma_k$ differ between classes, the quadratic terms $x^T \Sigma_k^{-1} x$ do not cancel, so the boundary is a **quadratic surface**.
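To make the quadratic term explicit, it helps to write the log of the posterior's numerator, dropping the constant shared by all classes. The symbol $\delta_k$ and the constant $c$ are notation introduced here for convenience:

$$
\delta_k(x) = \ln P(C_k) - \frac{1}{2} \ln |\Sigma_k| - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)
$$

Setting $\delta_1(x) = \delta_2(x)$ and expanding gives:

$$
-\frac{1}{2} x^T \left( \Sigma_1^{-1} - \Sigma_2^{-1} \right) x + \left( \Sigma_1^{-1} \mu_1 - \Sigma_2^{-1} \mu_2 \right)^T x + c = 0
$$

with

$$
c = -\frac{1}{2} \mu_1^T \Sigma_1^{-1} \mu_1 + \frac{1}{2} \mu_2^T \Sigma_2^{-1} \mu_2 - \frac{1}{2} \ln \frac{|\Sigma_1|}{|\Sigma_2|} + \ln \frac{P(C_1)}{P(C_2)}
$$

When $\Sigma_1 = \Sigma_2 = \Sigma$, the first term and the log-determinant ratio vanish, and what remains is the linear boundary $w^T x + w_0 = 0$ of LDA.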

---

## 🔍 Key Differences: LDA vs QDA

| Feature             | LDA                                  | QDA                           |
| ------------------- | ------------------------------------ | ----------------------------- |
| Covariance matrix   | Shared across all classes ($\Sigma$) | Unique per class ($\Sigma_k$) |
| Decision boundary   | Linear                               | Quadratic                     |
| Flexibility         | Less flexible                        | More flexible                 |
| Risk of overfitting | Lower                                | Higher (especially with little data) |

---


## 📈 Visualization (How the Boundary Looks)

QDA produces **curved/quadratic boundaries** between classes. This helps when:

* The class clouds are **not linearly separable**
* One class is more spread out or rotated than the other

![Visual Concept — QDA](https://scikit-learn.org/stable/_images/sphx_glr_plot_lda_qda_001.png)

*(Blue vs red: LDA = straight line, QDA = curved line)*
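A quick way to see the difference without plotting is to compare both classifiers on data where only the spread differs. In this sketch the two classes share the same mean, so no straight line can separate them, while a curved boundary can. The synthetic data and the helper `make_data` are assumptions made for the example.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)

def make_data(n):
    """Two classes with the same mean but very different spread."""
    X0 = rng.normal(0.0, 0.5, size=(n, 2))   # tight cluster
    X1 = rng.normal(0.0, 2.0, size=(n, 2))   # wide cluster around it
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

X_train, y_train = make_data(500)
X_test, y_test = make_data(500)

lda_acc = LinearDiscriminantAnalysis().fit(X_train, y_train).score(X_test, y_test)
qda_acc = QuadraticDiscriminantAnalysis().fit(X_train, y_train).score(X_test, y_test)

print(f"LDA accuracy: {lda_acc:.2f}")
print(f"QDA accuracy: {qda_acc:.2f}")
```

On data like this, LDA should stay close to chance level while QDA does much better, since the only signal is in the covariances.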

---

## ✅ When to Use QDA?

| Use QDA When...                                        |
| ------------------------------------------------------ |
| Classes are **not linearly separable**                 |
| You suspect **different covariance shapes per class**  |
| You have **enough data** to estimate multiple matrices |

### ⚠️ Avoid QDA if:

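* You have **few samples**: QDA estimates a separate covariance matrix for every class, so the number of parameters grows quickly with the number of features.
* Your classes are already **well separated by a linear boundary**; LDA is the safer choice.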
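The parameter-count concern can be made concrete with a small calculation. The sketch below counts the free parameters in the class means and covariance matrices (class priors are left out, since both models estimate the same number of them). The function name `n_parameters` is made up for this example.

```python
def n_parameters(n_features, n_classes, shared_covariance):
    """Free parameters in the class means and covariance matrices.

    shared_covariance=True corresponds to LDA, False to QDA.
    """
    d, k = n_features, n_classes
    mean_params = k * d                    # one mean vector per class
    cov_params = d * (d + 1) // 2          # one symmetric d x d matrix
    n_cov_matrices = 1 if shared_covariance else k
    return mean_params + n_cov_matrices * cov_params

for d in (2, 10, 50):
    lda = n_parameters(d, n_classes=3, shared_covariance=True)
    qda = n_parameters(d, n_classes=3, shared_covariance=False)
    print(f"features={d:>2}  LDA={lda:>5}  QDA={qda:>5}")
```

The gap widens roughly with the square of the number of features, which is why QDA needs noticeably more data per class in high dimensions.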

---

## 🧠 Summary:

| Aspect            | QDA Explanation                                                           |
| ----------------- | ------------------------------------------------------------------------- |
| Type              | Probabilistic Generative Classifier                                       |
| Covariance        | Different per class                                                       |
| Shape of Boundary | Quadratic                                                                 |
| Pros              | More flexible than LDA                                                    |
| Cons              | Prone to overfitting with small data                                      |
| Real-world Use    | Complex medical diagnosis, speech recognition, handwriting classification |

---



```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X = iris.data
y = iris.target

# Use only 2 classes: Setosa (0) and Versicolor (1)
mask = y < 2
X = X[mask][:, :2]  # only first two features
y = y[mask]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale the data (fit the scaler on the training set only)
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# Apply QDA
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train_std, y_train)
y_pred = qda.predict(X_test_std)

# Accuracy
print("QDA Accuracy:", accuracy_score(y_test, y_pred))
```

```
QDA Accuracy: 1.0
```
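Since QDA is a generative model, the fitted class means, priors, and covariance matrices can be inspected, and `predict_proba` returns the posterior $P(C_k | x)$ rather than just a label. The sketch below is a simplified variant of the example above: it skips the train/test split and scaling, which is a shortcut taken here for brevity, and passes `store_covariance=True` so that scikit-learn keeps the per-class covariance matrices.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Same two-class, two-feature subset of Iris as above
iris = load_iris()
mask = iris.target < 2
X = iris.data[mask][:, :2]
y = iris.target[mask]

# store_covariance=True makes the per-class covariances available after fitting
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)

print("Class means:\n", qda.means_)
print("Class priors:", qda.priors_)
for k, cov in enumerate(qda.covariance_):
    print(f"Covariance of class {k}:\n", cov)

# Posterior probabilities for the first three samples
proba = qda.predict_proba(X[:3])
print("Posteriors:\n", proba)
```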
