# Module 1: Introduction to Scikit-Learn

## Part 8: Quadratic Discriminant Analysis (QDA)

In this part, we will explore Quadratic Discriminant Analysis (QDA), a classification algorithm that relaxes the equal covariance assumption of Linear Discriminant Analysis (LDA). QDA is useful when the class covariances are different, and it can capture more complex decision boundaries.

### 8.1 Understanding Quadratic Discriminant Analysis (QDA)

Quadratic Discriminant Analysis (QDA) is a supervised classification algorithm that separates two or more classes using a discriminant function that is quadratic in the features. Unlike Linear Discriminant Analysis (LDA), which assumes that all classes share one covariance matrix, QDA estimates a separate covariance matrix for each class. This extra flexibility is valuable when class covariances differ significantly.

QDA also differs from LDA in how it can be used. Both are classifiers that model each class with a Gaussian distribution, but LDA can additionally serve as a supervised dimensionality reduction technique, projecting the data onto a lower-dimensional space that preserves class separability. QDA has no equivalent projection because each class has its own covariance matrix, so it is used purely for classification.

In QDA, each class is characterized by its own mean and covariance matrix, and the resulting decision boundaries are quadratic curves or surfaces (for example ellipses, parabolas, or hyperbolas in two dimensions). This lets QDA separate classes that differ in spread or orientation, where a single straight line or hyperplane would not be enough.
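For reference, the standard form of the class score that produces these quadratic boundaries is sketched below. The notation is introduced here and is not taken from the text above: $\mu_k$, $\Sigma_k$, and $\pi_k$ denote the mean, covariance matrix, and prior probability of class $k$.

$$
\delta_k(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert - \tfrac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k) + \log \pi_k
$$

A point $x$ is assigned to the class with the largest $\delta_k(x)$. Because $\Sigma_k$ differs between classes, the term that is quadratic in $x$ does not cancel when two scores are compared, so the boundary $\delta_k(x) = \delta_j(x)$ is a quadratic curve. If all classes shared one covariance matrix, that term would cancel and the boundary would be linear, which is the LDA case.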

Overall, QDA is a good fit for classification tasks in which the class covariance structures differ substantially, so that a linear decision boundary would misclassify many points.

### 8.2 Training and Evaluation

To train a QDA model, we need a labeled dataset: the feature values and the corresponding target variable. The model learns by estimating, for each class, the mean, the covariance matrix, and the prior probability from the training data.
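To make the estimation step concrete, here is a minimal sketch that computes the per-class means, covariance matrices, and priors by hand with NumPy and compares the resulting predictions with Scikit-Learn's. The dataset and the names `X_demo`, `y_demo`, and `quadratic_scores` are made up for illustration, and the code is a simplified illustration rather than the library's actual implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Hypothetical toy data: two Gaussian classes with different spreads
rng = np.random.default_rng(0)
X0 = rng.normal(size=(200, 2)) @ np.array([[1.0, 0.0], [0.0, 3.0]])
X1 = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 1.0]]) + np.array([2.0, 2.0])
X_demo = np.vstack([X0, X1])
y_demo = np.repeat([0, 1], 200)

# "Training": estimate mean, covariance, and prior for each class
classes = np.unique(y_demo)
means = np.array([X_demo[y_demo == k].mean(axis=0) for k in classes])
covs = np.array([np.cov(X_demo[y_demo == k], rowvar=False) for k in classes])
priors = np.array([np.mean(y_demo == k) for k in classes])

def quadratic_scores(X):
    """Score per class: -0.5*log|S| - 0.5*(x-m)^T S^-1 (x-m) + log(prior)."""
    scores = []
    for mu, sigma, prior in zip(means, covs, priors):
        diff = X - mu
        # Squared Mahalanobis distance of every row to the class mean
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)
        _, logdet = np.linalg.slogdet(sigma)
        scores.append(-0.5 * logdet - 0.5 * maha + np.log(prior))
    return np.column_stack(scores)

# Predict the class with the highest score and compare with scikit-learn
manual_pred = classes[np.argmax(quadratic_scores(X_demo), axis=1)]
qda_demo = QuadraticDiscriminantAnalysis().fit(X_demo, y_demo)
agreement = np.mean(manual_pred == qda_demo.predict(X_demo))
print("Agreement with scikit-learn:", agreement)
```

The hand-written version is only meant to show what is being estimated; in practice the library class is the better choice because it also handles numerical details such as regularization.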

Once trained, we can evaluate the model's performance using evaluation metrics suitable for classification tasks, such as accuracy, precision, recall, F1-score, or area under the ROC curve (AUC-ROC).

Scikit-Learn provides the `QuadraticDiscriminantAnalysis` class for performing QDA. The example below fits it alongside LDA on the same data and plots both decision boundaries:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Create a synthetic dataset: two Gaussian samples with different covariance matrices
n, dim = 300, 2
C = np.array([[0.0, -1.0], [2.5, 0.7]]) * 2.0
X = np.r_[
    np.dot(np.random.randn(n, dim), C),                       # class 0
    np.dot(np.random.randn(n, dim), C.T) + np.array([1, 4]),  # class 1, shifted
]
y = np.hstack((np.zeros(n), np.ones(n)))

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create an LDA model
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
# Predict the target values for testing data using LDA
y_pred_lda = lda.predict(X_test)
# Calculate accuracy for LDA
accuracy_lda = accuracy_score(y_test, y_pred_lda)
print("LDA Accuracy:", accuracy_lda)

# Create a QDA model
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train, y_train)
# Predict the target values for testing data using QDA
y_pred_qda = qda.predict(X_test)
# Calculate accuracy for QDA
accuracy_qda = accuracy_score(y_test, y_pred_qda)
print("QDA Accuracy:", accuracy_qda)

# Create a meshgrid for decision boundary plotting
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
X_grid = np.c_[xx.ravel(), yy.ravel()]

# Calculate Z for decision boundaries
zz_lda = lda.predict(X_grid).reshape(xx.shape)
zz_qda = qda.predict(X_grid).reshape(xx.shape)

# Create scatter plots for the original data
plt.figure(figsize=(12, 5))
plt.subplot(1, 3, 1)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', edgecolors='k')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Original Data')

plt.subplot(1, 3, 2)
plt.contourf(xx, yy, zz_lda, cmap='viridis', alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', edgecolors='k')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('LDA Decision Boundary')

plt.subplot(1, 3, 3)
plt.contourf(xx, yy, zz_qda, cmap='viridis', alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', edgecolors='k')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('QDA Decision Boundary')
plt.tight_layout()
plt.show()

This code applies Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) to a synthetic dataset of two Gaussian clusters with different covariance matrices. The first cluster is generated by transforming standard normal samples with the matrix `C`, the second with its transpose `C.T` followed by a shift of (1, 4). As a result, the clusters differ both in their spread along each feature and in the correlation between features. LDA assumes a single shared covariance matrix, so its linear boundary cannot account for this difference, whereas QDA's quadratic boundary can.
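The difference between the two covariance matrices can be checked directly. The sketch below relies on the fact that if the rows of `Z` are standard normal, the rows of `Z @ A` have covariance `A.T @ A`. It then compares these theoretical values with the estimates stored by QDA when `store_covariance=True`. The larger sample size and the names `X_big`, `y_big`, and `qda_big` are choices made for this illustration only.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

C = np.array([[0.0, -1.0], [2.5, 0.7]]) * 2.0

# For Z ~ N(0, I), the rows of Z @ A have covariance A.T @ A
cov_class0 = C.T @ C   # class 0 is generated with C
cov_class1 = C @ C.T   # class 1 is generated with C.T
print("Theoretical covariance, class 0:\n", cov_class0)
print("Theoretical covariance, class 1:\n", cov_class1)

# Draw a larger, seeded sample so the estimates are stable (illustrative choice)
rng = np.random.default_rng(0)
n_big = 5000
X_big = np.vstack([
    rng.standard_normal((n_big, 2)) @ C,
    rng.standard_normal((n_big, 2)) @ C.T + np.array([1, 4]),
])
y_big = np.repeat([0, 1], n_big)

# store_covariance=True keeps the per-class covariance estimates on the model
qda_big = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X_big, y_big)
print("Estimated covariance, class 0:\n", qda_big.covariance_[0])
print("Estimated covariance, class 1:\n", qda_big.covariance_[1])
```

The two theoretical matrices have different variances along each axis and off-diagonal terms of opposite sign, which is exactly the situation the equal-covariance assumption of LDA rules out.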

The dataset is split into training (70%) and testing (30%) sets, and both the LDA and QDA models are fitted on the training set.

LDA is fitted first and produces a linear decision boundary. In the run reported here, its accuracy on the test data was approximately 63.33%. QDA, which allows more flexible decision boundaries because it uses class-specific covariance matrices, achieved a higher accuracy of around 78.89% on the same test data. Because the data is drawn with `np.random.randn` without a fixed seed, the exact figures will vary from run to run.
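Accuracy is only one of the metrics mentioned earlier. As a rough sketch, the other metrics can be computed for a QDA model as shown below. The block regenerates similar data with a fixed seed so that it runs on its own; the seed and the names `X_m`, `y_m`, and `qda_m` are assumptions made for this illustration, so its numbers will not match the accuracies quoted above.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Regenerate two Gaussian classes with different covariances (seeded for repeatability)
rng = np.random.default_rng(0)
n = 300
C = np.array([[0.0, -1.0], [2.5, 0.7]]) * 2.0
X_m = np.vstack([
    rng.standard_normal((n, 2)) @ C,
    rng.standard_normal((n, 2)) @ C.T + np.array([1, 4]),
])
y_m = np.repeat([0, 1], n)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_m, y_m, test_size=0.3, random_state=42, stratify=y_m
)

qda_m = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)
y_hat = qda_m.predict(X_te)
# Probability of class 1, needed for the ROC curve
p_hat = qda_m.predict_proba(X_te)[:, 1]

precision = precision_score(y_te, y_hat)
recall = recall_score(y_te, y_hat)
f1 = f1_score(y_te, y_hat)
auc = roc_auc_score(y_te, p_hat)

print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
print("AUC-ROC:", auc)
```

Precision, recall, and F1 are computed from the hard class predictions, while AUC-ROC uses the predicted probabilities, which is why `predict_proba` is called separately.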

### 8.3 Summary

Quadratic Discriminant Analysis (QDA) is a useful classification algorithm that relaxes the equal covariance assumption of LDA and allows for different class covariances. It can capture more complex decision boundaries and is particularly useful when the class distributions have different covariance structures.