In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm



## LDA vs PCA
- Linear Discriminant Analysis (LDA) is most commonly used as a `dimensionality reduction` technique in the pre-processing step for pattern-classification and machine learning applications.
- The general LDA approach is very similar to PCA, but in addition to finding the component axes that `maximize the variance of our data (PCA)`, we are also interested in the axes that `maximize the separation between multiple classes (LDA)`.
- Both are `linear transformation` techniques that are commonly used for dimensionality reduction.
- PCA can be described as an `unsupervised` algorithm: it ignores class labels and finds the directions that `maximize the variance in a dataset`.
- LDA is `supervised` and computes the `directions` (linear discriminants) that maximize the separation between classes.
- PCA: component axes that maximize the variance.
- LDA: component axes that maximize class separation.


### Summarizing the LDA approach in 5 steps
- Compute the $d$-dimensional mean vectors for the different classes from the dataset.
- Compute the scatter matrices (between-class and within-class scatter matrix).
- Compute the eigenvectors ($\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_d$) and corresponding eigenvalues ($\lambda_1, \lambda_2, \ldots, \lambda_d$) for the scatter matrices.
- Sort the eigenvectors by decreasing eigenvalues and choose the $k$ eigenvectors with the largest eigenvalues to form a $d \times k$ matrix $W$ (where every column represents an eigenvector).
- Use this $d \times k$ eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication $Y = X \times W$ (where $X$ is an $n \times d$ matrix representing the $n$ samples, and $Y$ holds the transformed $n \times k$ samples in the new subspace).
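
The five steps above can be sketched in plain NumPy on the Iris data. This is a minimal illustration rather than a reference implementation: it assumes the common formulation in which the eigen-decomposition is applied to $S_W^{-1} S_B$ (within-class scatter $S_W$, between-class scatter $S_B$), and names such as `mean_vectors`, `S_W`, `S_B` and `W` are chosen here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]
classes = np.unique(y)

# Step 1: d-dimensional mean vector of each class (plus the overall mean)
mean_vectors = {c: X[y == c].mean(axis=0) for c in classes}
overall_mean = X.mean(axis=0)

# Step 2: within-class (S_W) and between-class (S_B) scatter matrices
S_W = np.zeros((n_features, n_features))
S_B = np.zeros((n_features, n_features))
for c in classes:
    X_c = X[y == c]
    centered = X_c - mean_vectors[c]
    S_W += centered.T @ centered
    diff = (mean_vectors[c] - overall_mean).reshape(-1, 1)
    S_B += X_c.shape[0] * (diff @ diff.T)

# Step 3: eigenvectors and eigenvalues of S_W^{-1} S_B (assumed formulation)
eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
eig_vals, eig_vecs = eig_vals.real, eig_vecs.real  # drop zero imaginary parts

# Step 4: sort by decreasing eigenvalue and keep the top k eigenvectors
k = 2
order = np.argsort(eig_vals)[::-1]
W = eig_vecs[:, order[:k]]  # d x k projection matrix

# Step 5: project the samples onto the new subspace, Y = X W
Y = X @ W
print(Y.shape)
print(eig_vals[order[:k]] / eig_vals.sum())  # share of each kept eigenvalue
```

With three classes, at most two eigenvalues are meaningfully different from zero, which is why `k = 2` already captures essentially all of the class-separating information here.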

### Preparing the sample data set
- About the Iris dataset

In [7]:
from sklearn.datasets import load_iris
iris = load_iris()
dfX = pd.DataFrame(iris.data,columns=iris.feature_names)
dfy = pd.DataFrame(iris.target,columns=['y'])
df = pd.concat([dfX,dfy],axis=1)
df.tail()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | y |
|---|---|---|---|---|---|
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | 2 |


In [11]:
from sklearn.model_selection import train_test_split
train_X,test_X,train_y,test_y = train_test_split(iris.data,iris.target,test_size=0.2,random_state=0)

### Normality assumptions
- It should be mentioned that LDA assumes `normally distributed data`, features that are statistically independent, and `identical covariance` matrices for every class. However, this only applies to LDA as a classifier; LDA for dimensionality reduction can also work reasonably well if those `assumptions are violated`.
- `“linear discriminant analysis frequently achieves good performances in the tasks of face and object recognition, even though the assumptions of common covariance matrix among groups and normality are often violated (Duda, et al., 2001)” (Tao Li, et al., 2006).`
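
One rough way to get a feel for how well these assumptions hold on Iris is to look at the per-class covariance matrices and run a per-feature normality test. The sketch below is only an informal check, not a formal validation procedure; the choice of the Shapiro-Wilk test (`scipy.stats.shapiro`) and the variable name `class_covs` are assumptions made here for illustration.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Per-class covariance matrices; LDA as a classifier assumes they are equal
class_covs = {c: np.cov(X[y == c], rowvar=False) for c in np.unique(y)}
for c, cov in class_covs.items():
    print(iris.target_names[c], "feature variances:", np.round(np.diag(cov), 3))

# Shapiro-Wilk test per class and feature; small p-values hint at non-normality
for c in np.unique(y):
    for j, name in enumerate(iris.feature_names):
        stat, p_value = stats.shapiro(X[y == c, j])
        print(f"{iris.target_names[c]:<12}{name:<20} p = {p_value:.3f}")
```

Clearly different variances across classes, or many small p-values, would suggest that the classifier assumptions are only approximately met, which matters less when LDA is used purely for dimensionality reduction.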

## A comparison of PCA and LDA
- In order to compare the feature subspace obtained via LDA, we will use the `PCA` class from the scikit-learn machine-learning library (see the scikit-learn documentation for `sklearn.decomposition.PCA`).

## LDA via scikit-learn
Now that we have seen how Linear Discriminant Analysis works using a step-by-step approach, there is also a more convenient way to achieve the same result via the LDA class implemented in the scikit-learn machine-learning library.

In [29]:
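from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
# Fit LDA on the training split and project it onto 2 linear discriminants
lda = LDA(n_components=2)
# Store the projection under a new name so train_X stays intact and the cell can be re-run
train_X_lda = lda.fit_transform(train_X,train_y)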
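
A fitted LDA object can also project held-out data with the same transformation and act as a classifier. The following self-contained sketch repeats the train/test split from above; the names `lda_model`, `X_train_lda` and `X_test_lda` are illustrative and not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

lda_model = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda_model.fit_transform(X_train, y_train)  # fit on training data only
X_test_lda = lda_model.transform(X_test)                 # reuse the fitted projection

print(X_train_lda.shape, X_test_lda.shape)
print("test accuracy:", lda_model.score(X_test, y_test))
```

Fitting on the training split only and then calling `transform` on the test split keeps the test labels out of the projection.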

In [30]:
lda.explained_variance_ratio_,sum(lda.explained_variance_ratio_)

(array([0.99128129, 0.00871871]), 1.0)

### `explained_variance_ratio_`
Percentage of variance explained by each of the selected components. If `n_components` is not set then all components are stored and the sum of explained variances is equal to 1.0. Only available when eigen or svd solver is used.
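
As a small illustration of this behaviour, the sketch below fits LDA once without `n_components` and once with `n_components=1`. It assumes the default `svd` solver, and the names `lda_all` and `lda_one` are chosen here for illustration. With 3 classes, LDA can produce at most 2 components.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# n_components not set: all available components (n_classes - 1 = 2) are kept
lda_all = LinearDiscriminantAnalysis().fit(X, y)
# n_components=1: only the first discriminant is kept
lda_one = LinearDiscriminantAnalysis(n_components=1).fit(X, y)

ratio_all = lda_all.explained_variance_ratio_
ratio_one = lda_one.explained_variance_ratio_
print(ratio_all, ratio_all.sum())  # ratios of all components sum to 1.0
print(ratio_one, ratio_one.sum())  # a single ratio, below 1.0
```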

## Comparison of LDA and PCA 2D projection of Iris dataset
- The Iris dataset represents 3 kinds of Iris flowers with 4 attributes.
- `PCA` applied to this data identifies the combination of attributes that accounts for the `most variance` in the data.
- `LDA` tries to identify attributes that account for the `most variance between classes`. In particular, LDA, in contrast to PCA, is a `supervised method, using known class labels`.

In [37]:
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
X,y = iris.data,iris.target

# PCA is unsupervised: it is fitted on X only
pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

# LDA is supervised: it also needs the class labels y
lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X,y).transform(X)

print('PCA explained variance ratio (first two components): %s'
      % str(pca.explained_variance_ratio_))
print('LDA explained variance ratio (first two components): %s'
      % str(lda.explained_variance_ratio_))

PCA explained variance ratio (first two components): [0.92461621 0.05301557]
LDA explained variance ratio (first two components): [0.99147248 0.00852752]
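
To actually look at the two 2D projections side by side, a plotting sketch along the following lines could be used. The figure layout, titles and variable names (`X_pca`, `X_lda`) are illustrative choices, not taken from the notebook.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
X, y = iris.data, iris.target

# 2D projections: PCA ignores the labels, LDA uses them
X_pca = PCA(n_components=2).fit_transform(X)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
panels = zip(axes, (X_pca, X_lda), ("PCA of Iris dataset", "LDA of Iris dataset"))
for ax, X_proj, title in panels:
    # One scatter call per class so that each class gets its own colour and legend entry
    for label, name in enumerate(iris.target_names):
        mask = y == label
        ax.scatter(X_proj[mask, 0], X_proj[mask, 1], alpha=0.8, label=name)
    ax.set_title(title)
    ax.legend(loc="best")
plt.tight_layout()
plt.show()
```

In the LDA panel the three species would be expected to separate mainly along the first axis, in line with the explained variance ratios printed above.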
