# Linear Discriminant Analysis (LDA) Step-by-Step Example

This notebook provides a detailed, step-by-step guide to applying Linear Discriminant Analysis (LDA) on a dataset. We'll cover each step with explanations, code, and visualizations.

## Step 1: What is LDA?

Linear Discriminant Analysis (LDA) is a supervised technique that can be used both as a classifier and for dimensionality reduction. Unlike PCA, which is unsupervised and finds the directions of maximum variance without looking at labels, LDA uses the class labels to find the directions that best separate the classes.

It does this by projecting the data onto a lower-dimensional space that maximizes the ratio of between-class scatter to within-class scatter, so the class means end up far apart relative to the spread within each class.
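To make "class separability" concrete, here is a minimal NumPy sketch of the Fisher criterion that LDA is usually described as optimizing: the ratio of between-class scatter to within-class scatter along a direction `w`. The toy data and the helper names `scatter_matrices` and `fisher_ratio` are assumptions made for illustration; they are not part of scikit-learn or of the rest of this notebook.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return the within-class (S_W) and between-class (S_B) scatter matrices."""
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        S_W += centered.T @ centered              # spread around the class mean
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += len(X_c) * (diff @ diff.T)         # spread of class means
    return S_W, S_B

def fisher_ratio(w, S_W, S_B):
    """Between-class over within-class scatter along direction w."""
    return float(w @ S_B @ w) / float(w @ S_W @ w)

# Hypothetical toy data: two classes separated along the first axis
rng = np.random.default_rng(0)
X_toy = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2)),
    rng.normal(loc=[4.0, 0.0], scale=1.0, size=(50, 2)),
])
y_toy = np.repeat([0, 1], 50)

S_W, S_B = scatter_matrices(X_toy, y_toy)

# The direction with the largest Fisher ratio is the leading eigenvector of S_W^{-1} S_B
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
w_best = np.real(eigvecs[:, np.argmax(np.real(eigvals))])

ratio_best = fisher_ratio(w_best, S_W, S_B)
ratio_axis2 = fisher_ratio(np.array([0.0, 1.0]), S_W, S_B)
print("Fisher ratio along the LDA direction:", ratio_best)
print("Fisher ratio along the second axis:  ", ratio_axis2)
```

The direction found this way should score at least as high as any other direction, including the second coordinate axis, which carries almost no class information in this toy setup.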

## Step 2: Importing Required Libraries

We'll start by importing the necessary libraries for this analysis, including NumPy, Pandas, Matplotlib, and Scikit-learn.


In [None]:
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt


## Step 3: Loading and Understanding the Dataset

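For this demonstration, we'll use the Iris dataset, which contains 150 samples of iris flowers (50 per species) described by 4 features each: sepal length, sepal width, petal length, and petal width, all in centimeters. The goal is to classify each flower as one of three species: setosa, versicolor, or virginica.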


In [None]:
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X, columns=iris.feature_names)
df['species'] = y

df.head()
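Before modeling, it can help to confirm the dataset's size, class balance, and that nothing is missing. The snippet below is a small self-contained check; it reloads the data under its own variable names (`X_check`, `y_check`, chosen here for illustration) so it does not interfere with the cells around it.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_check = load_iris()
X_check, y_check = iris_check.data, iris_check.target

# Number of samples and features
print("Shape:", X_check.shape)

# Samples per species
class_counts = np.bincount(y_check)
for name, count in zip(iris_check.target_names, class_counts):
    print(f"{name}: {count} samples")

# Count of missing values across all features
n_missing = int(np.isnan(X_check).sum())
print("Missing values:", n_missing)
```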


## Step 4: Splitting the Dataset

We'll hold out 30% of the samples as a test set and train on the remaining 70%, so that the LDA model is evaluated on data it has not seen. Fixing `random_state` makes the split reproducible.


In [None]:
# Hold out 30% of the samples for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
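A plain random split does not guarantee that each species is equally represented in both sets. As an optional variation (not what the cell above does), `train_test_split` accepts a `stratify` argument that preserves class proportions. The variable names below (`X_tr`, `X_te`, and so on) are illustrative and kept separate from the notebook's own split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_all, y_all = load_iris(return_X_y=True)

# stratify=y_all keeps the species proportions the same in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=0, stratify=y_all
)

print("Training samples per class:", np.bincount(y_tr))
print("Test samples per class:    ", np.bincount(y_te))
```

With a small, balanced dataset like Iris the difference is usually minor, but stratification tends to matter more when classes are imbalanced.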


## Step 5: Data Standardization

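Unlike PCA, LDA with scikit-learn's default settings (no shrinkage) is largely insensitive to the scale of the features: rescaling a feature rescales its discriminant coefficient but, up to numerical precision, leaves the projections and predictions unchanged. We'll still standardize the features here, which is a common preprocessing habit and makes the learned coefficients comparable across features.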


In [None]:
# Fit the scaler on the training data only, then reuse its statistics on the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
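To see how much standardization actually changes the outcome on this dataset, the sketch below fits scikit-learn's default LDA twice, once on the raw features and once on standardized features, and compares the test-set predictions. It is self-contained and uses its own variable names (assumptions for illustration), so it does not touch `X_train` or `X_test` above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.3, random_state=0)

# Model 1: raw features
pred_raw = LinearDiscriminantAnalysis().fit(X_tr, y_tr).predict(X_te)

# Model 2: standardized features (scaler fitted on the training split only)
demo_scaler = StandardScaler().fit(X_tr)
lda_scaled = LinearDiscriminantAnalysis().fit(demo_scaler.transform(X_tr), y_tr)
pred_scaled = lda_scaled.predict(demo_scaler.transform(X_te))

agreement = np.mean(pred_raw == pred_scaled)
print(f"Fraction of identical predictions: {agreement:.2f}")
```

In exact arithmetic, LDA without shrinkage gives the same decisions under any per-feature rescaling, so the two models are expected to agree apart from floating-point effects. This would not necessarily hold if shrinkage were enabled.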


## Step 6: Applying LDA

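Now we'll apply LDA to reduce the dimensionality of the data. LDA can produce at most min(number of features, number of classes - 1) discriminants; with 4 features and 3 classes that maximum is 2, so we'll reduce the data from 4 features to 2 linear discriminants.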


In [None]:
lda = LDA(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)
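After fitting, it is worth checking the shape of the projected data and how much of the between-class variance each discriminant captures. The sketch below does this on the full Iris dataset with a separate model (`lda_demo`, a name chosen here for illustration), using the fitted estimator's `explained_variance_ratio_` attribute.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X_all)

# Upper bound on the number of discriminants: min(n_features, n_classes - 1)
max_components = min(X_all.shape[1], len(np.unique(y_all)) - 1)
print("Maximum number of discriminants:", max_components)

lda_demo = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda_demo.fit_transform(X_std, y_all)

print("Projected shape:", X_proj.shape)
ratios = lda_demo.explained_variance_ratio_
print("Explained variance ratio per discriminant:", ratios)
```

On Iris, the first discriminant typically accounts for nearly all of the between-class variance, which is why the classes separate mostly along the horizontal axis in a 2D LDA plot.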


## Step 7: Visualizing the LDA Results

We can visualize the data in the new 2D space defined by the two linear discriminants. This will show how well LDA has separated the different classes.


In [None]:
plt.figure(figsize=(8, 6))
for species in np.unique(y_train):
    mask = y_train == species  # rows belonging to the current species
    plt.scatter(X_train_lda[mask, 0], X_train_lda[mask, 1], label=iris.target_names[species])
plt.xlabel('Linear Discriminant 1')
plt.ylabel('Linear Discriminant 2')
plt.title('LDA: Iris Training Set')
plt.legend(loc='best')
plt.grid(True)
plt.show()
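A scatter plot gives a visual impression of separation; a rough numeric complement is to look at where each class is centered in the discriminant space and how spread out it is. The sketch below computes per-class centroids and standard deviations on the full dataset. It is self-contained, and names such as `centroids` and `spreads` are illustrative rather than part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

iris_demo = load_iris()
X_all, y_all = iris_demo.data, iris_demo.target

# Project the standardized data onto the two linear discriminants
X_std = StandardScaler().fit_transform(X_all)
X_proj = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_std, y_all)

# Mean position and spread of each species in the 2D discriminant space
centroids = np.array([X_proj[y_all == k].mean(axis=0) for k in np.unique(y_all)])
spreads = np.array([X_proj[y_all == k].std(axis=0) for k in np.unique(y_all)])

for name, center, spread in zip(iris_demo.target_names, centroids, spreads):
    print(f"{name:<12} centroid={np.round(center, 2)} std={np.round(spread, 2)}")
```

Centroids that lie several standard deviations apart along the first discriminant indicate classes that a linear boundary can separate easily.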


## Step 8: Model Evaluation

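Finally, we'll evaluate the LDA model as a classifier by predicting the species for the test set and comparing the predictions with the true labels. Note that `lda.predict` takes the standardized 4-feature data (`X_test`), not the 2D projection (`X_test_lda`).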


In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Predicting the test set results
y_pred = lda.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred, target_names=iris.target_names)

print(f'Accuracy: {accuracy:.2f}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(class_report)
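A single 70/30 split gives an accuracy estimate that depends on which samples happen to land in the test set. As a hedged complement to the evaluation above, the sketch below runs 5-fold cross-validation on a pipeline that standardizes and then fits LDA, so the scaler is re-fitted inside each fold. The choice of 5 folds and the name `cv_scores` are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)

# Scaling lives inside the pipeline, so each fold is scaled using only its own training part
model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())

# For classifiers, an integer cv uses stratified folds by default
cv_scores = cross_val_score(model, X_all, y_all, cv=5)

print("Accuracy per fold:", cv_scores)
print(f"Mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```

Reporting the mean and spread across folds gives a more stable picture than a single number from one split.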
