Introduction
Linear discriminant analysis (LDA) is a dimensionality reduction technique used for classification problems with continuous independent variables. This technique can be used in a wide variety of applications including image recognition, marketing, biological classification, and more. It works by using linear algebra techniques to find a subspace within the space of independent variables that simplifies classification problems.

How LDA Works
Let’s start with a simple example. Here’s a classification problem with two continuous independent variables and two classes: a red class and a blue class.

![image.png](attachment:image.png)

In this example you could easily draw a line that would separate most of the red data points from the blue ones. But could you simplify the problem by reducing the data to a single dimension? If we project the data onto a 1-dimensional subspace, we might be able to separate red from blue with only one dimension instead of two.

Here are two different ways of projecting the data onto 1-dimensional subspaces.

In the first example, there’s no good way to separate red from blue. But in the second example you could easily choose a good decision boundary. This second subspace isn’t just any subspace; it’s the one that does the best job of separating red from blue, and it was obtained by using LDA.

This is what LDA does in general. It finds the best possible subspace for a given classification problem. When there are only two classes, LDA finds a subspace that maximizes the ratio of variance between the classes to variance within the classes. This means that it finds a subspace where classes are far apart from each other, but the observations within each individual class are close to each other. LDA can also be applied when there are more than two classes, but this is slightly more complicated.
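To make the two-class criterion concrete, here is a minimal NumPy sketch. The data is synthetic and invented for illustration, and the helper names `fisher_ratio` and `scatter` are hypothetical. It uses the classic closed form for two classes, where the best direction is proportional to the inverse within-class scatter matrix times the difference of the class means, and compares the resulting ratio with the ratio along the x-axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (made up for illustration): both classes share
# the same elongated covariance and differ only in their means.
cov = np.array([[3.0, 2.5], [2.5, 3.0]])
red = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200)
blue = rng.multivariate_normal(mean=[2.0, -2.0], cov=cov, size=200)


def fisher_ratio(w, a, b):
    """Between-class variance over within-class variance along direction w."""
    w = w / np.linalg.norm(w)
    pa, pb = a @ w, b @ w  # 1-dimensional projections of each class
    between = (pa.mean() - pb.mean()) ** 2
    within = pa.var() + pb.var()
    return between / within


def scatter(a):
    """Scatter matrix of one class: sum of outer products of centered rows."""
    centered = a - a.mean(axis=0)
    return centered.T @ centered


# Within-class scatter is the sum of the per-class scatter matrices.
S_w = scatter(red) + scatter(blue)

# Two-class solution: w is proportional to S_w^{-1} (mean_red - mean_blue).
w_lda = np.linalg.solve(S_w, red.mean(axis=0) - blue.mean(axis=0))

ratio_lda = fisher_ratio(w_lda, red, blue)
ratio_x = fisher_ratio(np.array([1.0, 0.0]), red, blue)
print(f"ratio along LDA direction: {ratio_lda:.2f}")
print(f"ratio along x-axis:        {ratio_x:.2f}")
```

Because both classes have the same number of points here, the direction `w_lda` maximizes `fisher_ratio` exactly, so no other direction should give a larger value on this data.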

Once LDA finds the best subspace, data points can be projected onto that subspace. This yields a data set with fewer dimensions that often retains much of the predictive power of the original. LDA can also be used as a classifier by itself with the additional step of computing a decision boundary.
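As a small illustration of the projection step with more than two classes, the sketch below uses scikit-learn on synthetic data from `make_blobs` (the data and the variable names are assumptions for this example only). With three classes, LDA can produce at most two discriminant directions, so five features are reduced to two.

```python
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic data: 300 points, 5 features, 3 classes (illustrative only)
X_demo, y_demo = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# With n_components left at its default, LDA keeps
# min(n_classes - 1, n_features) = 2 directions.
lda_demo = LinearDiscriminantAnalysis()
X_proj = lda_demo.fit_transform(X_demo, y_demo)

print(X_demo.shape)  # (300, 5)
print(X_proj.shape)  # (300, 2)
```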

Notice that LDA is similar to Principal Component Analysis (PCA). Both LDA and PCA are dimensionality reduction tools. They both reduce dimensions in a similar way: by using linear algebra to find optimal subspaces for a statistical or machine learning problem.

A key difference between LDA and PCA is how they choose the subspace. PCA is unsupervised: it uses linear algebra techniques to find the subspace that captures the most variance in a data set, without looking at class labels. The directions of greatest variance are not necessarily the directions that separate the classes. LDA, on the other hand, is supervised: it uses the class labels to find the subspace that is best for classification problems.
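The difference can be seen on a toy data set. The sketch below is an illustration on synthetic data, not part of the bean example; the `separation` helper is a hypothetical measure defined only for this comparison. The classes are constructed so that the direction of greatest variance is not the direction that separates them.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Two synthetic classes that are stretched along the diagonal [1, 1]
# but differ in mean along the anti-diagonal [1, -1].
cov = [[3.0, 2.5], [2.5, 3.0]]
X_a = rng.multivariate_normal([0.0, 0.0], cov, size=300)
X_b = rng.multivariate_normal([2.0, -2.0], cov, size=300)
X_toy = np.vstack([X_a, X_b])
y_toy = np.array([0] * 300 + [1] * 300)


def separation(z, labels):
    """Squared distance between class means over summed class variances."""
    z = np.ravel(z)
    z0, z1 = z[labels == 0], z[labels == 1]
    return (z0.mean() - z1.mean()) ** 2 / (z0.var() + z1.var())


# PCA ignores the labels; LDA uses them.
z_pca = PCA(n_components=1).fit_transform(X_toy)
z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X_toy, y_toy)

sep_pca = separation(z_pca, y_toy)
sep_lda = separation(z_lda, y_toy)
print(f"PCA separation: {sep_pca:.2f}")
print(f"LDA separation: {sep_lda:.2f}")
```

On this data the PCA projection follows the long axis of the clouds, where the two classes overlap almost completely, while the LDA projection keeps them apart.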

While LDA and PCA are similar, LDA can only be used in classification problems where the dependent variable is discrete.

In [2]:
# Import libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Load data
beans = pd.read_csv('Dry_Bean.csv')
X = beans.drop('Class', axis=1)
y = beans['Class']

# Create an LDA model
lda = LinearDiscriminantAnalysis(n_components=1)

# Fit lda to X and y, then project X onto the 1-dimensional subspace
X_new = lda.fit_transform(X, y)

# Create a logistic regression model
lr = LogisticRegression()

# Fit lr to X_new and y
lr.fit(X_new, y)

# Model accuracy
lr_acc = lr.score(X_new, y)
print(lr_acc)



0.6549114686650503
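The accuracy above is measured on the same data the models were fit to. A common alternative is to hold out a test set. The sketch below shows one way to do that, assuming scikit-learn's built-in wine data set as a stand-in for the bean data (so the numbers are not comparable to the result above) and a pipeline so that LDA is fit on the training split only.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in data set (assumption): 178 samples, 13 features, 3 classes
X_wine, y_wine = load_wine(return_X_y=True)

# Hold out 30% of the rows, keeping class proportions similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X_wine, y_wine, test_size=0.3, stratify=y_wine, random_state=0
)

# LDA reduces the features to one dimension, then logistic regression classifies
model = make_pipeline(
    LinearDiscriminantAnalysis(n_components=1),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```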


LDA as a Classifier
We can also use LDA itself as a classifier. In this case, scikit-learn will project the data onto a subspace and then find a decision boundary that is perpendicular to that subspace. Put simply, it will do LDA and use the result to find a linear decision boundary.

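Since we’ve already fit lda to X and y, we can check the accuracy of the model with the score() method. Note that in scikit-learn, n_components only affects transform(); predict() and score() use the full set of discriminant functions, so the accuracy below is not limited to the single component used for the logistic regression above.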

In [3]:
print(lda.score(X, y))

0.9050033061494379
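The classifier does not depend on `n_components`, which is one reason this score can differ so much from the logistic regression fit on a single component. The sketch below checks this on scikit-learn's built-in iris data set (an assumption made so the example is self-contained; it is not the bean data).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X_iris, y_iris = load_iris(return_X_y=True)

# Same data, different n_components
lda_1 = LinearDiscriminantAnalysis(n_components=1).fit(X_iris, y_iris)
lda_2 = LinearDiscriminantAnalysis(n_components=2).fit(X_iris, y_iris)

# transform() respects n_components...
print(lda_1.transform(X_iris).shape)  # (150, 1)
print(lda_2.transform(X_iris).shape)  # (150, 2)

# ...but predict() gives the same labels either way
same_predictions = np.array_equal(lda_1.predict(X_iris), lda_2.predict(X_iris))
print(same_predictions)  # True
```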


In [None]:
# Import libraries
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Load data
beans = pd.read_csv('Dry_Bean.csv')
X = beans.drop('Class', axis=1)
y = beans['Class']

# Create LDA model (n_components is omitted because it only affects transform())
lda = LinearDiscriminantAnalysis()

# Fit the LDA classifier to X and y
lda.fit(X, y)

# Print LDA classifier accuracy on the training data
lda_acc = lda.score(X, y)
print(lda_acc)
