# Dimensionality Reduction

### Principal Component Analysis (PCA)
 - Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space.
 - Statistical procedure that applies an orthogonal transformation to the data.
 - Converts possibly correlated features (predictors) into linearly uncorrelated features called principal components.
 - Number of principal components <= min(number of samples, number of features).
 - First principal component explains the largest possible variance.
 - Each subsequent component has the highest variance subject to the restriction that it must be orthogonal to the preceding components.
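 - Each component is a direction vector (an eigenvector of the data's covariance matrix); together the components form an orthogonal basis.
 - Sensitive to feature scaling, so features are usually standardized before PCA is applied.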
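
The properties listed above can be checked on a small example. The following is a minimal sketch, not a reference implementation: it runs PCA through NumPy's SVD on made-up toy data (the arrays `x` and `X`, the seed, and the choice `k = 2` are all assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 3 features, the first two strongly correlated.
x = rng.normal(size=200)
X = np.column_stack([
    x,
    2.0 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

# PCA works on mean-centered data.
X_centered = X - X.mean(axis=0)

# Thin SVD: the rows of Vt are the principal axes (unit length, mutually orthogonal).
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance explained by each component, ordered from largest to smallest.
explained_variance = S**2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project the data onto the first k principal components.
k = 2
scores = X_centered @ Vt[:k].T

print(explained_ratio)
print(scores.shape)
```

The projected columns in `scores` are uncorrelated with each other, and their variances match the first `k` entries of `explained_variance`. Because the second feature has a larger scale than the others, it dominates the first component, which is the scaling sensitivity noted in the list.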
 
 
### Linear Discriminant Analysis (LDA)
 - Most commonly used as a dimensionality reduction technique in the pre-processing step for pattern-classification and machine learning applications.
 - Goal is to project a dataset onto a lower-dimensional space with good class-separability in order to avoid overfitting ("curse of dimensionality") and also reduce the computational costs.
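 - Supervised: uses the class labels to locate linear boundaries between the classes.
 - Projects data points onto a line (or, with more than two classes, onto at most (number of classes - 1) axes).
 - Each class is summarized by its centroid (class mean); a point is assigned to the class whose centroid is nearest in the projected space.
 - Chooses the component axes that maximize class separation, that is, between-class variance relative to within-class variance.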
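
To make the projection idea concrete, here is a minimal two-class sketch of Fisher's linear discriminant written directly in NumPy. It is an illustration under stated assumptions, not the general multi-class LDA procedure: the toy data (`X0`, `X1`), the seed, and the midpoint `threshold` rule are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two Gaussian classes in 2-D.
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(100, 2))

# Class centroids.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

# Within-class scatter matrix (sum of the per-class scatter matrices).
Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)

# Fisher direction: maximizes between-class separation relative to within-class scatter.
w = np.linalg.solve(Sw, mu1 - mu0)
w /= np.linalg.norm(w)

# Project every point onto the line spanned by w.
z0, z1 = X0 @ w, X1 @ w

# Assumed decision rule: split at the midpoint of the projected centroids.
threshold = 0.5 * (z0.mean() + z1.mean())
accuracy = 0.5 * ((z0 < threshold).mean() + (z1 > threshold).mean())

print(w)
print(accuracy)
```

The two-dimensional points are reduced to a single coordinate each (`z0`, `z1`), and the classes remain well separated along that one axis.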

***

## Linear Algebra Refresher

Matrix multiplication, worked through on two 2 x 2 matrices:
$$A=\begin{bmatrix} 1. & 2. \\ 10. & 20. \end{bmatrix}$$

$$B=\begin{bmatrix} 1. & 2. \\ 100. & 200. \end{bmatrix}$$

\begin{align}
A \times B & = \begin{bmatrix} 1. & 2. \\ 10. & 20. \end{bmatrix} \times \begin{bmatrix} 1. & 2. \\ 100. & 200. \end{bmatrix} \\ & = \begin{bmatrix} 201. & 402. \\ 2010. & 4020. \end{bmatrix}
\end{align}

By parts:
$$A \times B = \begin{bmatrix} 1. \times 1. + 2. \times 100. & 1. \times 2. + 2. \times 200. \\
10. \times 1. + 20. \times 100. & 10. \times 2. + 20. \times 200. \end{bmatrix}$$
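
The element-by-element expansion can be written as explicit loops. This is a small sketch for illustration only; `matmul_by_parts` is a hypothetical helper name, and in practice NumPy's built-in `@` operator or `np.dot` does the same job.

```python
import numpy as np

A = np.array([[1., 2.], [10., 20.]])
B = np.array([[1., 2.], [100., 200.]])


def matmul_by_parts(A, B):
    """Multiply two matrices entry by entry: C[i, j] = sum over k of A[i, k] * B[k, j]."""
    n, m = A.shape
    m2, p = B.shape
    if m != m2:
        raise ValueError("inner dimensions must match")
    C = np.zeros((n, p))
    for i in range(n):          # row of A
        for j in range(p):      # column of B
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(m))
    return C


C = matmul_by_parts(A, B)
print(C)
# Same result as the built-in matrix product.
print(np.allclose(C, A @ B))  # True
```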


**Libraries**

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import seaborn as sns
sns.set()

In [2]:
A = [[1., 2.], [10., 20.]]
B = [[1., 2.], [100., 200.]]

In [4]:
A, B

([[1.0, 2.0], [10.0, 20.0]], [[1.0, 2.0], [100.0, 200.0]])

In [5]:
np.dot(A, B)

array([[ 201.,  402.],
       [2010., 4020.]])