
In [None]:
# LDA

# PCA is an unsupervised dimensionality reduction technique, while LDA is a supervised one.
# PCA ignores the class labels.
# In simple words, PCA summarizes the feature set without relying on the output.
# PCA finds the directions of maximum variance in the dataset.
# In a large feature set, many features are near-duplicates of other features
# or are highly correlated with them.
# Such features are largely redundant.
# PCA combines these correlated features into a new feature set (the principal components)
# whose features are uncorrelated with each other
# and ordered by the amount of variance they capture.
# Since the variance of the features doesn't depend on the output,
# PCA doesn't take the output labels into account.
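The point about correlated features can be checked directly. The sketch below is an illustration only, not part of the original notebook: it compares the feature correlation matrix of the standardized Iris data before and after PCA. Keeping all four components and the variable names `corr_before` and `corr_after` are choices made for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Only the features are loaded; the labels are never used.
X = datasets.load_iris().data
X_scaled = StandardScaler().fit_transform(X)

# Correlation between the original (standardized) features.
corr_before = np.corrcoef(X_scaled, rowvar=False)

# Keep all four components so that nothing is discarded in this comparison.
pca = PCA(n_components=4)
X_pca = pca.fit_transform(X_scaled)

# Correlation between the principal components.
corr_after = np.corrcoef(X_pca, rowvar=False)

print(np.round(corr_before, 2))
print(np.round(corr_after, 2))
print(pca.explained_variance_ratio_)
```

The off-diagonal entries of the second matrix should be numerically zero, and the explained variance ratios are sorted from largest to smallest.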

# Unlike PCA, LDA reduces the dimensions of the feature set while retaining the information
# that discriminates between the output classes.
# LDA tries to find a decision boundary around each cluster of a class.
# It then projects the data points onto new dimensions so that the clusters are as far apart
# from each other as possible and the individual points within a cluster
# are as close to the centroid of the cluster as possible.
# The new dimensions are ranked by their ability to maximize the distance
# between the clusters and minimize the distance between the data points within a cluster and
# its centroid. These new dimensions form the linear discriminants of the feature set.
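The "far apart between clusters, tight within clusters" idea can be written down with two scatter matrices. The following is a minimal NumPy sketch of the classical Fisher formulation, assuming the discriminants are taken as eigenvectors of `inv(S_W) @ S_B`. scikit-learn's default solver computes them differently, so this is an illustration rather than the library's implementation. The names `S_W`, `S_B` and `ratio` are introduced here.

```python
import numpy as np
from sklearn import datasets

X, y = datasets.load_iris(return_X_y=True)
n_features = X.shape[1]
overall_mean = X.mean(axis=0)

S_W = np.zeros((n_features, n_features))  # within-class scatter
S_B = np.zeros((n_features, n_features))  # between-class scatter
for label in np.unique(y):
    X_c = X[y == label]
    mean_c = X_c.mean(axis=0)
    centered = X_c - mean_c
    # Spread of the points around their own class centroid.
    S_W += centered.T @ centered
    # Spread of the class centroid around the overall mean, weighted by class size.
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += X_c.shape[0] * (diff @ diff.T)

# Directions that maximize between-class scatter relative to within-class scatter.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
eigvals = eigvals.real
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order].real

# Share of the discriminative information carried by each direction.
ratio = eigvals / eigvals.sum()
print(np.round(ratio, 4))
```

With three classes, only two eigenvalues are meaningfully different from zero, which is why LDA can return at most two discriminants for Iris.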

In [1]:
# Libraries 
from sklearn import datasets 
import pandas as pd
import seaborn as sns
import numpy as np


In [3]:
# Load the Datasets 

# Iris Dataset for Classification
# Load Dataset
iris = datasets.load_iris()
# Convert to DataFrame
iris_pd = pd.DataFrame(iris.data)
# Feature Name
iris_pd.columns = iris.feature_names
# Target Variable
iris_pd["Class"] = iris.target
dataset = iris_pd
dataset.head()

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),Class
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


In [4]:
# Data Preprocessing

# Once the dataset is loaded into a pandas DataFrame,
# the first step is to divide it into features and corresponding labels
# and then divide the result into training and test sets.

X = dataset.iloc[:, 0:4].values
y = dataset.iloc[:, 4].values

# Inference : 
# The above script assigns the first four columns of the dataset (the feature set) to X,
# while the values in the fifth column (the labels) are assigned to y.

In [5]:
# Training and Test sets (80:20)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
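A quick check of what the split produces can be useful. The sketch below repeats the same call and, as an assumed variant that the original notebook does not use, adds `stratify=y` so that each class is equally represented in the test set. The `_s` suffixed names are introduced for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Same call as in the cell above: a plain random 80:20 split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Assumed variant: stratify=y keeps the class proportions equal in both subsets.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print(X_train.shape, X_test.shape)
print(np.bincount(y_test))    # class counts in the random test set
print(np.bincount(y_test_s))  # class counts in the stratified test set
```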

In [6]:
# Feature Scaling

# As with PCA, we standardize the features before applying LDA.
# The scaler is fitted on the training set only and then applied to the test set.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
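To see what the scaler does, the sketch below (an illustration with names such as `X_train_std` chosen here) checks that the training features end up with zero mean and unit standard deviation, while the test features, which are transformed with the training statistics, are only approximately standardized.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)  # learns mean and std from the training set
X_test_std = sc.transform(X_test)        # reuses the training mean and std

print(np.round(X_train_std.mean(axis=0), 3), np.round(X_train_std.std(axis=0), 3))
print(np.round(X_test_std.mean(axis=0), 3), np.round(X_test_std.std(axis=0), 3))
```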

In [7]:
# LDA

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components=1)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)

# Inference : 
# Like PCA, LDA takes an n_components parameter,
# which is the number of linear discriminants that we want to retrieve.
# In this case we set n_components to 1, since we first want to check the performance
# of our classifier with a single linear discriminant.
# Finally we call fit_transform and transform to actually retrieve the linear discriminants.
# Notice that for LDA, fit_transform takes two arguments: X_train and y_train.
# For PCA, fit_transform only requires one argument, X_train.
# The transform method takes only the features in both cases.
# This reflects the fact that LDA takes the output class labels into account
# while selecting the linear discriminants, while PCA doesn't depend on the output labels.
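The difference in the two call signatures, and the upper limit on `n_components` for LDA, can be sketched as follows. This is an illustration on the unscaled Iris data. The `try`/`except` reflects the behaviour of recent scikit-learn releases, which reject more than `min(n_classes - 1, n_features)` components, and older releases may only warn instead. The names `lda_full` and `too_many_accepted` are introduced here.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

X, y = datasets.load_iris(return_X_y=True)

# PCA is fitted on the features alone.
X_pca = PCA(n_components=1).fit_transform(X)

# LDA needs the labels as a second argument.
X_lda = LDA(n_components=1).fit_transform(X, y)
print(X_pca.shape, X_lda.shape)

# With 3 classes, LDA can produce at most 2 discriminants.
lda_full = LDA(n_components=2).fit(X, y)
print(lda_full.explained_variance_ratio_)

# Asking for 3 discriminants is expected to fail in recent scikit-learn versions.
try:
    LDA(n_components=3).fit(X, y)
    too_many_accepted = True
except ValueError as err:
    too_many_accepted = False
    print(err)
```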

In [9]:
# Training and Making Predictions

# Since we want to compare the performance of LDA with one linear discriminant
# to the performance of PCA with one principal component,
# we will use the same Random Forest classifier that we used to
# evaluate the PCA-reduced features.

from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(max_depth=2, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
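The PCA run referred to above is not part of this notebook. One way to put both reductions side by side is sketched below, assuming the same split, scaling and classifier settings for both. The helper `evaluate` is a hypothetical name introduced for this example, and no particular scores are claimed here.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

def evaluate(reducer):
    """Reduce to one dimension, fit the random forest, return test accuracy."""
    # PCA accepts y but ignores it; LDA uses it to find the discriminant.
    Z_train = reducer.fit_transform(X_train_std, y_train)
    Z_test = reducer.transform(X_test_std)
    clf = RandomForestClassifier(max_depth=2, random_state=0)
    clf.fit(Z_train, y_train)
    return accuracy_score(y_test, clf.predict(Z_test))

acc_lda = evaluate(LDA(n_components=1))
acc_pca = evaluate(PCA(n_components=1))
print('LDA (1 discriminant):', acc_lda)
print('PCA (1 component):   ', acc_pca)
```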

In [10]:
# Evaluating the Performance

# As always, the last step is to evaluate performance of the algorithm 
# with the help of a confusion matrix and find the accuracy of the prediction. 
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy: ' + str(accuracy_score(y_test, y_pred)))

# Inference : 
# With one linear discriminant, the classifier achieved an accuracy of 100% on the 30 test samples.

[[11  0  0]
 [ 0 13  0]
 [ 0  0  6]]
Accuracy: 1.0
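A single 30-sample test set is a small basis for a performance claim. As a hedged cross-check that the original notebook does not include, the same steps can be wrapped in a `Pipeline` and scored with 5-fold cross-validation, so that scaling and LDA are refitted on each training fold. The step names and the choice of `cv=5` are assumptions of this sketch.

```python
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)

# Scaling, LDA and the classifier are fitted together on each training fold,
# so no information from the held-out fold leaks into the preprocessing.
model = Pipeline([
    ('scale', StandardScaler()),
    ('lda', LDA(n_components=1)),
    ('forest', RandomForestClassifier(max_depth=2, random_state=0)),
])

scores = cross_val_score(model, X, y, cv=5)
print(scores)
print('Mean accuracy:', scores.mean())
```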


In [None]:
# PCA vs LDA: What to Choose for Dimensionality Reduction?
# When the samples are spread evenly across the classes, LDA almost always performs better than PCA
# as a preprocessing step for classification.
# However, if the class distribution is highly skewed (imbalanced),
# PCA is advisable, since LDA can be biased towards the majority class.

# Finally, PCA has the advantage that it can be applied to labeled as well as unlabeled data,
# since it doesn't rely on the output labels.
# LDA, on the other hand, needs the output classes to find the linear discriminants
# and hence requires labeled data.
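The labeled versus unlabeled distinction shows up directly in the API. In the sketch below, `X_unlabeled` is simply the Iris feature matrix with the labels set aside, used here as a stand-in for unlabeled data. PCA fits it without complaint, while calling LDA's `fit` without labels raises a `TypeError` because `y` is a required argument.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

X, y = datasets.load_iris(return_X_y=True)
X_unlabeled = X  # pretend the labels are not available

# PCA works on features alone.
X_pca = PCA(n_components=2).fit_transform(X_unlabeled)
print(X_pca.shape)

# LDA cannot be fitted without labels.
try:
    LDA(n_components=2).fit(X_unlabeled)
    lda_needs_labels = False
except TypeError as err:
    lda_needs_labels = True
    print('LDA without labels failed:', err)
```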