<a id="Dimensionality Reduction"></a>

# Lab 09 - Dimensionality Reduction

***
In this lab session we will learn
   * Data Compression via PCA (Principal Component Analysis)
   * PCA and other methods using scikit-learn functions
   * Imputer from sklearn preprocessing


__Dimensionality reduction summarizes the information content of a high-dimensional dataset, which is usually difficult to understand and visualize, by transforming it into a lower-dimensional space. This approach is also referred to as feature extraction or data compression [1].__

[1]. Sebastian Raschka. 2015. Python Machine Learning. Packt Publishing.

# Linear Reduction

## 1. PCA
* Unsupervised linear transformation
* Applications include dimensionality reduction, exploratory data analysis, de-noising signals, etc.
* Finds the directions of maximum variance in high-dimensional data and projects the data onto a new subspace
* The number of components can be equal to or less than the original number of features

In Tuesday's class, you had the opportunity to apply PCA and LDA to the iris and blob datasets respectively. Now, let's apply PCA and LDA to the **wine dataset** using functions from scikit-learn. Before delving into the built-in functions, it is worth looking at
* how PCA works internally, by computing the covariance matrix and its eigenvalues
* the procedure for selecting the number of components for higher-dimensional datasets

In [None]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
%matplotlib inline
from sklearn.datasets import load_wine

In [None]:
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df.head()

> ## 1.1 Principal Component Analysis - Internal working of PCA (without scikit-learn)

The mathematical approach for PCA was detailed in class; refer to the week09 lecture notes.

The steps to implement PCA are summarized below:
1. Standardize the dataset
2. Construct the covariance matrix
3. Obtain the eigenvectors and eigenvalues by decomposing the covariance matrix
4. Select the 'n' eigenvectors that correspond to the 'n' largest eigenvalues, where n is the number of dimensions to reduce to
5. Construct a projection matrix from the top 'n' eigenvectors
6. Transform the dataset with the projection matrix onto the new 'n'-dimensional feature subspace
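
In symbols, the steps above can be summarized roughly as follows. The notation is an assumption for illustration and may differ from the lecture notes: $X$ is the standardized $m \times d$ data matrix, $\lambda_j$ and $v_j$ are the eigenvalues and eigenvectors sorted so that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$, and $n \le d$ is the number of components kept.

$$
\Sigma = \frac{1}{m-1} X^\top X, \qquad
\Sigma v_j = \lambda_j v_j, \qquad
W = [\, v_1 \; v_2 \; \cdots \; v_n \,] \in \mathbb{R}^{d \times n}, \qquad
Z = X W
$$

The $1/(m-1)$ factor matches the default of `np.cov`, and the formula for $\Sigma$ relies on the columns of $X$ having zero mean after standardization.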

* __Standardize the dataset__

In [None]:
# standardize the dataset
x = StandardScaler().fit_transform(df)

# Use pandas dataframe for easy handling
X = pd.DataFrame(x, columns = wine.feature_names )
X.head()

* __Covariance matrix__

In [None]:
import numpy as np
CovMatrix = np.cov(X.T)

* __Eigen Vectors and Values__

In [None]:
eigen_val, eigen_vec = np.linalg.eig(CovMatrix)

* __Select the 'n' eigenvectors that correspond to the 'n' largest eigenvalues__
* __Construct a projection matrix from the top 'n' eigenvectors__
* __Transform the dataset with the projection matrix onto the new 'n'-dimensional feature subspace__

In [None]:
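# pair each eigenvalue with its eigenvector (the columns of eigen_vec)
eigen_pairs = [(np.abs(eigen_val[i]), eigen_vec[:, i]) for i in range(len(eigen_val))]
# sort by eigenvalue only; sorting whole tuples would compare the arrays if two eigenvalues tie
eigen_pairs.sort(key=lambda pair: pair[0], reverse=True)
# print(eigen_pairs)
# projection matrix built from the two leading eigenvectors
w = np.hstack((eigen_pairs[0][1][:, np.newaxis], eigen_pairs[1][1][:, np.newaxis]))
# print(w)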

PCA_X = pd.DataFrame(X.dot(w))
PCA_X.columns = ['PC 1', 'PC 2']
species = pd.DataFrame(wine.target, columns =['class'])
PCA_X = pd.concat([PCA_X, species], axis = 1)

* __Visualizing PC1 and PC2 gives us a few insights__

In [None]:
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1) 
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 Component PCA', fontsize = 20)


targets = [0 , 1, 2]
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = PCA_X['class'] == target
    ax.scatter(PCA_X.loc[indicesToKeep, 'PC 1'], PCA_X.loc[indicesToKeep, 'PC 2'], c=color, s = 50)
ax.legend(targets)
ax.grid()

> ## 1.2 How many dimensions should we keep?

Common criteria for choosing the number of dimensions:

* Cumulative percentage of variance: keep enough components to explain 70% to 95% of the total variance
* Scree plot: plot the eigenvalues and look for the elbow
* The broken stick model
* Size of variance: keep components whose variance (eigenvalue) is greater than 0.7
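
The broken stick and size-of-variance criteria are not exercised in the cell below, so here is a minimal sketch of both. It assumes the usual broken stick formulation, in which the expected share of the k-th of p components is (1/p) * sum(1/i for i = k..p), and it keeps only the leading components that beat that share. The function names and the example eigenvalues are hypothetical.

```python
import numpy as np

def broken_stick(p):
    """Expected variance share of the k-th largest of p pieces of a randomly broken unit stick."""
    inv = 1.0 / np.arange(1, p + 1)
    # b_k = (1/p) * sum_{i=k}^{p} 1/i, computed with a reversed cumulative sum
    return np.cumsum(inv[::-1])[::-1] / p

def suggest_n_components(eigenvalues, min_variance=0.7):
    """Return (broken stick count, size-of-variance count) for the given eigenvalues."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    ratio = ev / ev.sum()
    # broken stick rule: count leading components whose share exceeds the expected share
    beats = ratio > broken_stick(len(ev))
    n_stick = len(ev) if beats.all() else int(np.argmin(beats))
    # size-of-variance rule: count eigenvalues above the cut-off
    n_size = int((ev > min_variance).sum())
    return n_stick, n_size

# hypothetical eigenvalues of a 5-feature standardized dataset (they sum to 5)
example = [2.5, 1.2, 0.8, 0.3, 0.2]
print(suggest_n_components(example))
```

With the wine data, `suggest_n_components(eigen_val)` could be compared against the cumulative variance plot produced below; the criteria do not have to agree.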

In [None]:
print (eigen_val)

tot = sum(eigen_val)
var_exp = [(i/tot) for i in sorted(eigen_val, reverse = True)]
cum_var_exp = np.cumsum(var_exp)

plt.plot(range(1,14), var_exp,'-*')
plt.step(range(1,14), cum_var_exp, c='r', where='mid')

In [None]:
cum_var_exp

## Scikit-learn method for PCA

Now implement PCA using the scikit-learn modules.
* Set the number of components to 2 for easy visualization
* Plot PC1 against PC2
* Use a different color for each class of wine
* Do we need the target labels (y) here?

## <font color='red'>1.3 Repeat section 1.1 using PCA from scikit-learn</font>

1. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

2. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler.fit_transform
3. https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

In [None]:
# import PCA here 
from sklearn.decomposition import PCA


# standardize the dataset, use link 1
x = StandardScaler().fit_transform(df)

# Use pandas dataframe for easy handling
X = pd.DataFrame(x, columns=wine.feature_names)
# X.head()


# for simplicity, let's opt for only 2 components
pca_sci = PCA(n_components=2)

# use fit_transform
PCs = pca_sci.fit_transform(X)


# PCA exposes the eigenvalues and their ratios as attributes; use dir(pca_sci) to list the available attributes
print(pca_sci.explained_variance_, pca_sci.explained_variance_ratio_)

# Collect the arrays and lists into pandas objects for easy handling
PCs_df = pd.DataFrame(data = PCs, columns = ['PC 1', 'PC 2'])
species = pd.DataFrame(wine.target, columns =['class'])
PCs_df = pd.concat([PCs_df, species], axis = 1)
print (PCs_df.head())

In [None]:
# You can use this piece of code for plotting the scatter plot
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1) 
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 Component PCA', fontsize = 20)

targets = [0 , 1, 2]
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = PCs_df['class'] == target
    ax.scatter(PCs_df.loc[indicesToKeep, 'PC 1'], PCs_df.loc[indicesToKeep, 'PC 2'], c=color, s = 50)
ax.legend(targets)
ax.grid()

### <font color='red'> 1.3.1 What percentage of the variance do PC1 and PC2 together explain? Will only 2 components reproduce the original dataset faithfully?</font>

Look into the attributes section: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

In [None]:
print(pca_sci.explained_variance_ratio_, pca_sci.explained_variance_ratio_.sum())

**Together, the first two principal components contain only 55.4% of the variance: the first principal component contains 36.2% and the second 19.2%. Two components alone therefore do not reproduce the original dataset faithfully.**

### <font color='red'>1.3.2 Without using a loop, can we step through the number of components and obtain the cumulative variance for each?</font> 

In [None]:
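# Modify the code below
pca1 = PCA()  # keep all components
PCs = pca1.fit_transform(X)
np.cumsum(pca1.explained_variance_ratio_)  # cumulative variance for 1, 2, ... components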

### <font color='red'>1.3.3 Choosing the right dimension </font> 

* We know that we can set the dimension in PCA using `n_components = 2`, `3`, ..., up to `n-1`, where n is the number of dimensions. We can settle on a dimension by inspecting the scree plot or the variance ratios.
* There is another way to configure PCA: by choosing the desired variance. This is done by setting `n_components` to a fraction between 0 and 1.
* Find how many components are needed to preserve a variance of at least 60%, 75% and 95%.
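
To see what a fractional `n_components` amounts to, the sketch below maps a target variance to a component count from a list of explained variance ratios. It imitates the documented behaviour (keep the fewest components whose cumulative variance exceeds the requested fraction); it is not scikit-learn's own code. The helper name and the ratios are hypothetical.

```python
import numpy as np

def components_for_variance(explained_variance_ratio, target):
    """Fewest leading components whose cumulative variance ratio exceeds `target`."""
    cum = np.cumsum(explained_variance_ratio)
    # index of the first cumulative value strictly greater than target, as a count
    n = int(np.searchsorted(cum, target, side="right")) + 1
    return min(n, len(cum))

# hypothetical ratios, sorted in decreasing order as PCA reports them
ratios = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])
for target in (0.60, 0.75, 0.95):
    print(target, components_for_variance(ratios, target))
```

On the wine data, the same helper could be fed `pca1.explained_variance_ratio_` from section 1.3.2 and checked against the shape returned by `PCA(n_components=0.60)`.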

In [None]:
# 60% 
pca2 = PCA(n_components=0.60)
X_reduced = pca2.fit_transform(X)
X_reduced.shape

In [None]:
# 75% 


In [None]:
# 95% 


## 2. LDA

LDA features
* Linear transformation technique
* Also known as **Fisher's LDA**
* Used for feature extraction
* Increases computational efficiency
* Reduces overfitting
* In contrast to PCA, which finds orthogonal component axes of maximum variance, LDA finds the feature subspace that optimizes class separability
* Supervised


For the internal workings of LDA, refer to the section "Supervised data compression via Linear Discriminant Analysis (LDA)" in the textbook by Sebastian Raschka, 2015, "Python Machine Learning", Packt Publishing.
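
As a rough illustration of those internals, the sketch below builds the within-class and between-class scatter matrices and takes the leading eigenvectors of $S_W^{-1} S_B$. It is a simplified NumPy version written for this lab, not scikit-learn's implementation; the function name and the toy blobs are hypothetical.

```python
import numpy as np

def fisher_lda(X, y, n_components):
    """Project X onto directions maximizing between-class over within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_W += centered.T @ centered
        diff = (mean_c - overall_mean)[:, np.newaxis]
        S_B += Xc.shape[0] * (diff @ diff.T)
    # eigen-decomposition of S_W^{-1} S_B; at most (n_classes - 1) eigenvalues are non-zero
    eig_val, eig_vec = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eig_val.real)[::-1][:n_components]
    W = eig_vec[:, order].real
    return X @ W, eig_val.real[order]

rng = np.random.default_rng(0)
# three hypothetical Gaussian blobs in 4 dimensions, separated along the first two axes
centers = np.array([[0, 0, 0, 0], [6, 0, 0, 0], [0, 6, 0, 0]], dtype=float)
X_toy = np.vstack([rng.normal(c, 1.0, size=(50, 4)) for c in centers])
y_toy = np.repeat([0, 1, 2], 50)
Z, top_eigenvalues = fisher_lda(X_toy, y_toy, n_components=2)
print(Z.shape, top_eigenvalues)
```

Because the labels `y` enter the scatter matrices directly, the method is supervised, and with three classes only two discriminants carry information.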

Also see: 

http://scikit-learn.org/stable/modules/lda_qda.html

http://scikit-learn.org/0.16/modules/generated/sklearn.lda.LDA.html

Note: 

In some scikit-learn versions the import "from sklearn.lda import LDA" does not work; in that case, import `LinearDiscriminantAnalysis` from `sklearn.discriminant_analysis` instead.


## <font color='red'> 2.1 Using the links above, implement LDA with the scikit-learn module </font> 

* Use the Singular Value Decomposition (**SVD**) solver
* For simplicity, choose two components
* Implement it first for the wine dataset

In [None]:
y = pd.DataFrame(wine.target, columns =['class'])

In [None]:
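from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# older scikit-learn releases: from sklearn.lda import LDA
# use the SVD solver and keep two components
lda = LDA(solver='svd', n_components=2)

# LDA is supervised, so the class labels are passed along with the features
X_lda = lda.fit_transform(X, y['class'])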

In [None]:
LDA_df = pd.DataFrame(data = X_lda, columns = ['LD1', 'LD2'])
LDA_df = pd.concat([LDA_df, species], axis = 1)
print (LDA_df.head())

In [None]:
# You can use the same piece of code used above for PCA to draw the scatter plot
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1) 
ax.set_xlabel('LD1', fontsize = 15)
ax.set_ylabel('LD2', fontsize = 15)
ax.set_title('2 Component Linear Discriminant', fontsize = 20)

targets = [0 , 1, 2]
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = LDA_df['class'] == target
    ax.scatter(LDA_df.loc[indicesToKeep, 'LD1'], LDA_df.loc[indicesToKeep, 'LD2'], c=color, s = 50)
ax.legend(targets)
ax.grid()

## <font color='red'>2.2 Compare PCA and LDA. Which method represents the dataset more faithfully with only two components?</font>


In [None]:
print('LDA:', lda.explained_variance_ratio_.sum(), 'PCA:', pca_sci.explained_variance_ratio_.sum())

**LDA seems to produce a good result, with its two components together accounting for almost 99% of the explained variance ratio.**

# 3. Nonlinear Methods for Dimensionality reduction

In class you learned about dimensionality reduction for nonlinear datasets.

Refer to section 3.1 on multidimensional scaling (**MDS**) and section 3.2 on manifold learning, particularly Locally Linear Embedding (**LLE**), Isometric Feature Mapping (**ISOMAP**) and Hessian eigenmaps.

We will now apply those nonlinear dimensionality reduction concepts to a nonlinear dataset: the S-curve manifold.


http://scikit-learn.org/stable/modules/manifold.html#manifold


An example of applying manifold learning to the digits dataset is available at: http://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html#sphx-glr-auto-examples-manifold-plot-lle-digits-py
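
MDS is mentioned above but not exercised in the cells that follow, so here is a minimal sketch of classical (Torgerson) MDS in NumPy. This is an assumption about which variant is meant: it is not the SMACOF algorithm that `sklearn.manifold.MDS` uses. The function name and the toy points are hypothetical.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points from a matrix of pairwise Euclidean distances D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eig_val, eig_vec = np.linalg.eigh(B)       # B is symmetric; eigenvalues come out ascending
    idx = np.argsort(eig_val)[::-1][:n_components]
    scale = np.sqrt(np.clip(eig_val[idx], 0, None))
    return eig_vec[:, idx] * scale

rng = np.random.default_rng(1)
points = rng.normal(size=(30, 2))              # hypothetical 2-D points
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
Y_mds = classical_mds(D, n_components=2)
# the embedding matches the original configuration up to rotation, reflection and translation
D_rec = np.linalg.norm(Y_mds[:, None, :] - Y_mds[None, :, :], axis=-1)
print(np.allclose(D, D_rec))
```

With plain Euclidean distances as input, classical MDS gives the same embedding as PCA up to sign; ISOMAP differs by replacing those distances with geodesic distances estimated along a neighborhood graph.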


### Apply the nonlinear methods studied in class to the S-curve dataset

In [None]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.ticker import NullFormatter

from sklearn import manifold, datasets

Axes3D

n_points = 5000
# make_s_curve is exposed directly under sklearn.datasets (the samples_generator module is gone in newer releases)
X, color = datasets.make_s_curve(n_points, random_state=10)
n_neighbors = 10
n_components = 2

In [None]:
fig = plt.figure(figsize=(15, 8))
plt.suptitle("S-Curve dataset", fontsize=20)
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.Spectral)
ax.view_init(10, -60)

### We now have the S-curve dataset. Apply linear transformation methods such as PCA and observe the significance of nonlinear and geometric methods.

### <font color='Red'>Do the following: </font>
* ### <font color='Red'> Apply PCA for desired variance of 90% </font>
* ### <font color='Red'> Apply PCA for n_components = 2 </font>
* ### <font color='Red'> Apply LLE </font>
* ### <font color='Red'> Apply ISOMAP </font>
* ### <font color='Red'> Apply Hessian EigenMaps </font>

## PCA

In [None]:
# import PCA here 
from sklearn.decomposition import PCA

# maintain the desired variance of 90%
pca_S = PCA(n_components=0.90)
Y = pca_S.fit_transform(X)


print (Y.shape)

# plot the first component along a horizontal line
fig, ax = plt.subplots()
ax.scatter(Y[:, 0], np.zeros(n_points) + 0.01, c=color, cmap=plt.cm.Spectral)

In [None]:
pca = PCA(n_components=n_components)

Y = pca.fit_transform(X)


# plot the first two principal components
fig, ax = plt.subplots()
ax.scatter(Y[:, 0], Y[:, 1], c=color, cmap=plt.cm.Spectral)

### Important Note: 
### PCA is an unsupervised method and does not use class labels when maximizing the variance, unlike LDA, which is supervised.

## LLE

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.LocallyLinearEmbedding.html

In [None]:
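# use 10 neighbors and 2 components
lle = manifold.LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=n_components,
                                      method='standard', random_state=10)
Y = lle.fit_transform(X)

fig, ax = plt.subplots()
ax.scatter(Y[:, 0], Y[:, 1], c=color, cmap=plt.cm.Spectral)
ax.axis('tight')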

## ISOMAP

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.Isomap.html

In [None]:
iso = manifold.Isomap(n_neighbors=n_neighbors, n_components=n_components)
Y = iso.fit_transform(X)


fig, ax = plt.subplots()
ax.scatter(Y[:, 0], Y[:, 1], c=color, cmap=plt.cm.Spectral)
ax.axis('tight')

## Hessian EigenMaps

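Hessian eigenmaps are available through the LLE class by setting `method='hessian'`, which requires `n_neighbors > n_components * (n_components + 3) / 2`. Refer to the link below.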

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.LocallyLinearEmbedding.html

In [None]:
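hlle = manifold.LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=n_components,
                                       method='hessian', random_state=10)
Y = hlle.fit_transform(X)

fig, ax = plt.subplots()
ax.scatter(Y[:, 0], Y[:, 1], c=color, cmap=plt.cm.Spectral)
ax.axis('tight')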

## Extra lab work:

### Repeat the above PCA and LDA steps for the cancer dataset and apply an SVM to the reduced Xtrain and Xtest
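
One possible starting point for the PCA part is sketched below. It assumes that "cancer dataset" refers to scikit-learn's `load_breast_cancer`, and the split ratio, kernel and number of components are arbitrary illustrative choices. The pipeline fits the scaler and PCA on the training split only and then applies them to the test split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# assumption: the breast cancer dataset bundled with scikit-learn
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, stratify=cancer.target, random_state=0
)

# standardize -> reduce to 2 principal components -> classify with an RBF SVM
model = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel="rbf"))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy with 2 principal components: {accuracy:.3f}")
```

For the LDA variant, `PCA(n_components=2)` would be swapped for `LinearDiscriminantAnalysis(n_components=1)`, since a two-class problem yields at most one discriminant.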