# Dimensionality Reduction

Dimensionality reduction summarizes high-dimensional data in a lower-dimensional space. There are three main approaches:
1. Principal component analysis (PCA) - Used for unsupervised data compression.
2. Linear discriminant analysis (LDA) - Used for supervised dimensionality reduction that maximizes class separability.
3. Kernel principal component analysis (kernel PCA) - Used for nonlinear dimensionality reduction.

## Principal Component Analysis (PCA)

PCA finds the directions of maximum variance in high-dimensional data and projects the data onto a lower-dimensional subspace spanned by those directions. The resulting principal components are ordered by the variance they capture, largest first.

The algorithm itself works as follows:
1. Standardize the $d$-dimensional dataset.
2. Construct the covariance matrix.
3. Decompose the covariance matrix into its eigenvectors and eigenvalues.
4. Select the top $k$ eigenvectors that correspond to the $k$ largest eigenvalues.
5. Construct a projection matrix from the selected $k$ eigenvectors.
6. Transform the $d$-dimensional dataset using the projection matrix. 
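
The steps above can be sketched directly in NumPy. This is a minimal illustration rather than how sklearn implements PCA; the function name `pca_project` and the synthetic `X_demo` data are made up for the example, and it assumes no feature has zero variance.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n_samples x d) onto its top-k principal components."""
    # 1. Standardize each feature to zero mean and unit variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features (d x d).
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigendecomposition. eigh is meant for symmetric matrices and
    #    returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Reorder so the largest eigenvalues come first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Projection matrix W (d x k) built from the top-k eigenvectors.
    W = eigvecs[:, :k]
    # 6. Transform the data into the k-dimensional subspace.
    return X_std @ W, eigvals

# Hypothetical demo data: 100 samples, 5 features, two of them correlated.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
X_demo[:, 1] = 2 * X_demo[:, 0] + rng.normal(scale=0.1, size=100)

X_proj, eigvals = pca_project(X_demo, k=2)
print(X_proj.shape)             # (100, 2)
print(eigvals / eigvals.sum())  # share of variance captured by each component
```

Dividing each eigenvalue by the sum of all eigenvalues gives the fraction of total variance that component explains, which is a common way to choose $k$.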

In sklearn it looks something like this:
```
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

stds = StandardScaler()
X_train_stds = stds.fit_transform(X_train)
X_test_stds = stds.transform(X_test)

pca = PCA(n_components=2) # n_components=None results in all components being returned.
X_train_pca = pca.fit_transform(X_train_stds)
X_test_pca = pca.transform(X_test_stds)
```

## Linear Discriminant Analysis (LDA)

Like PCA, LDA is a linear transformation. In this case, however, the goal is to maximize class separability rather than variance. LDA is often the better choice when class labels are available, but PCA can perform better when each class has only a small number of samples.

The algorithm itself works as follows:
1. Standardize the $d$-dimensional dataset.
2. For each class, compute the $d$-dimensional mean vector.
3. Construct the between-class scatter matrix, and the within-class scatter matrix.
4. Compute the eigenvectors and eigenvalues of the matrix $S_W^{-1} S_B$, where $S_W$ is the within-class scatter matrix and $S_B$ is the between-class scatter matrix.
5. Select the top $k$ eigenvectors that correspond to the $k$ largest eigenvalues.
6. Construct a projection matrix from the selected $k$ eigenvectors.
7. Transform the $d$-dimensional dataset using the projection matrix. 
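
As a rough sketch of the steps above, the following NumPy code builds the two scatter matrices and projects onto the leading discriminants. The function name `lda_project` and the synthetic three-class data are assumptions made for illustration; this is not how sklearn implements LDA internally.

```python
import numpy as np

def lda_project(X, y, k):
    """Project standardized X (n_samples x d) onto k linear discriminants."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)           # per-class mean vector
        centered = X_c - mean_c
        S_W += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    # Eigendecomposition of S_W^{-1} S_B; solve() avoids forming the inverse.
    # The product is not symmetric, so use eig and drop any tiny
    # imaginary parts introduced by floating-point error.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]               # projection matrix (d x k)
    return X @ W, eigvals[order]

# Hypothetical demo data: three Gaussian classes in four dimensions.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [3, 3, 0, 0], [0, 3, 3, 0]], dtype=float)
X_demo = np.vstack([rng.normal(loc=m, size=(50, 4)) for m in means])
y_demo = np.repeat([0, 1, 2], 50)
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)  # standardize

X_lda, eigvals = lda_project(X_demo, y_demo, k=2)
print(X_lda.shape)  # (150, 2)
```

Because $S_B$ is a sum of one rank-one matrix per class, at most (number of classes - 1) eigenvalues are nonzero, so LDA can return at most that many useful discriminants.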

LDA assumes that the data are normally distributed, that the features are statistically independent, and that the classes have identical covariance matrices. Even when these assumptions are violated, LDA can still produce a useful projection, although falling back to PCA may be advisable.

In sklearn it looks something like this:
```
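from sklearn.preprocessing import StandardScaler
# sklearn.lda was removed; LDA now lives in sklearn.discriminant_analysis.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA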

stds = StandardScaler()
X_train_stds = stds.fit_transform(X_train)
X_test_stds = stds.transform(X_test)

lda = LDA(n_components=2) # n_components can be at most n_classes - 1; None keeps as many as possible.
X_train_lda = lda.fit_transform(X_train_stds, y_train)
X_test_lda = lda.transform(X_test_stds)
```

## Kernel Principal Component Analysis

When the classes are not linearly separable, kernel PCA uses a kernel function to implicitly map the data into a higher-dimensional space where they become linearly separable, and then applies PCA in that space.

```
from sklearn.decomposition import KernelPCA
# gamma is the RBF kernel coefficient; 15 is an example value and should be tuned per dataset.
scikit_kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
X_skernpca = scikit_kpca.fit_transform(X_train)
```
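
To show roughly what happens under the hood, here is a minimal NumPy sketch of RBF kernel PCA. The function name `rbf_kernel_pca` and the concentric-circle demo data are assumptions for illustration, and `gamma=15` is simply reused from the snippet above. The projections are scaled by the square root of the eigenvalues, which is one common convention.

```python
import numpy as np

def rbf_kernel_pca(X, gamma, k):
    """Return the top-k kernel principal components of the training data X."""
    # Pairwise squared Euclidean distances between all samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    # RBF kernel matrix: K_ij = exp(-gamma * ||x_i - x_j||^2).
    K = np.exp(-gamma * sq_dists)
    # Center the kernel matrix, i.e. center the data in feature space.
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecomposition of the symmetric centered kernel matrix;
    # eigh returns ascending order, so reverse it.
    eigvals, eigvecs = np.linalg.eigh(K_c)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Projected training samples: eigenvectors scaled by sqrt(eigenvalue).
    # The clip guards against tiny negative eigenvalues from rounding.
    return eigvecs[:, :k] * np.sqrt(np.maximum(eigvals[:k], 0.0))

# Hypothetical demo data: two noisy concentric circles (not linearly separable).
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=200)
radii = np.repeat([1.0, 3.0], 100)
X_demo = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X_demo += rng.normal(scale=0.05, size=X_demo.shape)

X_kpca = rbf_kernel_pca(X_demo, gamma=15, k=2)
print(X_kpca.shape)  # (200, 2)
```

Note that the kernel matrix is n x n, so memory and compute grow quadratically or worse with the number of training samples; this sketch also only projects the training data, not new points.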