
# Principal Component Analysis (PCA) with Whitening Overview

This notebook provides an overview of Principal Component Analysis (PCA), focusing on the whitening transformation, its mathematical foundation, and a basic implementation on the Iris dataset.



## Background

### Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique that applies an orthogonal linear transformation to map the data into a new coordinate system. The axes of this system are ordered by variance: the projection of the data onto the first axis (the first principal component) has the greatest variance, the projection onto the second axis has the second greatest variance, and so on.

### Whitening in PCA

Whitening is a preprocessing step that decorrelates the data and rescales it so that every transformed feature has unit variance. It is often applied after PCA: the PCA projection already decorrelates the features, so only the rescaling remains. Whitening removes linear redundancy between features and can help algorithms that assume, or benefit from, uncorrelated and equally scaled inputs.

### Applications of PCA and Whitening

PCA is widely used for data compression, noise reduction, and visualization. Whitening is particularly useful in neural network training and independent component analysis.



## Mathematical Foundation

### PCA

Given a dataset \( X \) with \( n \) samples as rows, \( d \) features as columns, and zero mean in every column, PCA involves the following steps:

1. **Covariance Matrix**: Compute the covariance matrix \( \Sigma \):

\[
\Sigma = \frac{1}{n} X^T X
\]

2. **Eigen Decomposition**: Compute the eigenvalues and eigenvectors of \( \Sigma \):

\[
\Sigma v = \lambda v
\]

3. **Principal Components**: The eigenvectors corresponding to the largest eigenvalues are the principal directions. Projecting the data onto these directions gives the principal components:

\[
Z = X W
\]

where \( W \) is the matrix whose columns are the selected eigenvectors of \( \Sigma \).
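
The three steps above can be sketched directly in NumPy. This is a minimal illustration, not a replacement for a library implementation: the toy data, the random seed, and the choice of `k = 2` retained components are assumptions made only for the example.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 3 correlated features (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))
X = X - X.mean(axis=0)  # center the columns so the zero-mean assumption holds

n = X.shape[0]

# Step 1: covariance matrix, Sigma = (1/n) X^T X
Sigma = (X.T @ X) / n

# Step 2: eigen decomposition; eigh handles symmetric matrices and
# returns eigenvalues in ascending order, so sort them in descending order
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 3: keep the k leading eigenvectors as columns of W and project
k = 2
W = eigvecs[:, :k]
Z = X @ W

# The covariance of Z should be diagonal, with the k largest eigenvalues on it
cov_Z = (Z.T @ Z) / n
```

Inspecting `cov_Z` is a quick sanity check: its off-diagonal entries should be numerically zero, which is the sense in which the projected features are uncorrelated.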

### Whitening

Whitening rescales the principal components so that the resulting features, which are already uncorrelated, also have unit variance. This is achieved by dividing each principal component by the square root of its eigenvalue:

\[
Z_{\text{whitened}} = Z \Lambda^{-\frac{1}{2}}
\]

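where \( \Lambda \) is the diagonal matrix of the eigenvalues corresponding to the columns of \( W \). Because the covariance of \( Z \) is \( \Lambda \), the covariance of \( Z_{\text{whitened}} \) is \( \Lambda^{-\frac{1}{2}} \Lambda \Lambda^{-\frac{1}{2}} = I \).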
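
As a rough numerical check of the whitening formula, the sketch below whitens a small synthetic dataset by hand and computes the covariance of the result. The toy data and random seed are assumptions for illustration, and all components are kept so that the full identity matrix can be compared.

```python
import numpy as np

# Hypothetical toy data: 500 samples, 3 correlated features (illustration only)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))
X = X - X.mean(axis=0)  # center the columns
n = X.shape[0]

# Eigen decomposition of the covariance matrix
Sigma = (X.T @ X) / n
eigvals, W = np.linalg.eigh(Sigma)

# PCA projection, then whitening: divide each column of Z by sqrt(eigenvalue).
# Broadcasting makes this equivalent to Z @ diag(eigvals ** -0.5).
Z = X @ W
Z_whitened = Z / np.sqrt(eigvals)

# The covariance of the whitened data should be close to the identity
cov_whitened = (Z_whitened.T @ Z_whitened) / n
```

Note that the division becomes numerically unstable when an eigenvalue is close to zero, so this plain version assumes a covariance matrix of full rank.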



## Implementation in Python

We'll implement PCA with and without whitening using Scikit-Learn on the Iris dataset.


In [None]:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Apply PCA without whitening
pca = PCA(n_components=2, whiten=False)
X_pca = pca.fit_transform(X)

# Apply PCA with whitening
pca_whitened = PCA(n_components=2, whiten=True)
X_pca_whitened = pca_whitened.fit_transform(X)

# Plot the results
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.title("PCA without Whitening")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")

plt.subplot(1, 2, 2)
plt.scatter(X_pca_whitened[:, 0], X_pca_whitened[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.title("PCA with Whitening")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")

plt.tight_layout()
plt.show()
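
The plots show the change in scale, but the effect can also be checked numerically. The sketch below refits the same two models and compares the covariance matrices of their outputs. It uses `np.cov`, which divides by \( n - 1 \) by default; scikit-learn's `PCA` uses the same normalization for its explained variances, so the whitened covariance should be close to the identity matrix.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Same settings as in the cell above
X_pca = PCA(n_components=2, whiten=False).fit_transform(X)
X_pca_whitened = PCA(n_components=2, whiten=True).fit_transform(X)

# rowvar=False treats columns as variables (features) and rows as samples
cov_plain = np.cov(X_pca, rowvar=False)
cov_whitened = np.cov(X_pca_whitened, rowvar=False)

print("Covariance without whitening:")
print(np.round(cov_plain, 3))
print("Covariance with whitening:")
print(np.round(cov_whitened, 3))
```

Without whitening the covariance is diagonal with unequal entries (the two largest eigenvalues); with whitening both diagonal entries are rescaled to one.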



## Conclusion

This notebook provided an overview of Principal Component Analysis (PCA), focusing on the whitening transformation. We implemented PCA with and without whitening using scikit-learn on the Iris dataset and compared the transformed data. PCA alone yields uncorrelated components whose variances differ; whitening additionally rescales each component to unit variance, which can be beneficial in certain machine learning tasks.
