# 11 - PCA from Scratch

A complete, step-by-step walkthrough of Principal Component Analysis (PCA), with the mathematical intuition and a NumPy implementation.

## 1. What is PCA?
PCA is an unsupervised dimensionality reduction technique.

Goals:
- Reduce dimensionality
- Preserve maximum variance
- Remove redundancy by replacing correlated features with uncorrelated components

## 2. Mathematical Idea
PCA finds orthogonal directions (principal components) along which the data varies the most.
These directions are the eigenvectors of the covariance matrix, and each eigenvalue equals the variance of the data along its eigenvector.
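
The link between variance and eigenvectors can be sketched with a short derivation. The symbols $\Sigma$ (covariance matrix), $w$ (a unit direction), and $\lambda$ (a Lagrange multiplier) are notation assumed here, not taken from the text above. The variance of the centered data projected onto $w$ is $w^\top \Sigma w$, so the first principal component solves:

$$
\max_{w} \; w^\top \Sigma\, w \quad \text{subject to} \quad w^\top w = 1
$$

Setting the gradient of the Lagrangian $w^\top \Sigma\, w - \lambda\,(w^\top w - 1)$ to zero gives:

$$
2\,\Sigma\, w - 2\,\lambda\, w = 0 \quad\Longrightarrow\quad \Sigma\, w = \lambda\, w
$$

So any maximizer is an eigenvector of $\Sigma$, and the variance it captures is $w^\top \Sigma\, w = \lambda\, w^\top w = \lambda$. The largest eigenvalue therefore marks the first principal component.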

In [1]:
import numpy as np

X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0],
              [2.3, 2.7],
              [2.0, 1.6],
              [1.0, 1.1],
              [1.5, 1.6],
              [1.1, 0.9]])

## 3. Step 1: Mean Centering

PCA measures variance around the mean, so each feature's mean is subtracted first.

In [2]:
X_mean = np.mean(X, axis=0)
X_centered = X - X_mean
X_mean

array([1.81, 1.91])
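
A quick sanity check for this step, written as a standalone sketch: after centering, every column should average to zero up to floating-point error. The name `col_means` is introduced here only for illustration.

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Subtract the per-feature mean (axis=0 averages over the samples).
X_centered = X - X.mean(axis=0)

# The centered columns should have (numerically) zero mean.
col_means = X_centered.mean(axis=0)
print(np.allclose(col_means, 0.0))
```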

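## 4. Step 2: Covariance Matrix

The covariance matrix records how each pair of features varies together. `np.cov` treats rows as variables by default, so pass `rowvar=False` (or transpose the data) when samples are stored in rows.

In [3]:
cov_matrix = np.cov(X_centered, rowvar=False)  # 2x2 matrix, normalized by n - 1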
cov_matrix

array([[0.61655556, 0.61544444],
       [0.61544444, 0.71655556]])
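
To make the formula behind `np.cov` explicit, here is a minimal sketch that builds the same matrix by hand as $X_c^\top X_c / (n - 1)$, where $X_c$ is the centered data. The name `cov_manual` is illustrative only.

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
X_centered = X - X.mean(axis=0)
n = X.shape[0]

# Sample covariance: sum of outer products of centered rows, divided by n - 1.
cov_manual = X_centered.T @ X_centered / (n - 1)

# np.cov uses the same n - 1 normalization by default.
print(np.allclose(cov_manual, np.cov(X, rowvar=False)))
```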

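## 5. Step 3: Eigen Decomposition

Each column of `eig_vecs` is a unit-length eigenvector, paired with the entry of `eig_vals` at the same index.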

In [4]:
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)
eig_vals, eig_vecs

(array([0.0490834 , 1.28402771]),
 array([[-0.73517866, -0.6778734 ],
        [ 0.6778734 , -0.73517866]]))
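
Because a covariance matrix is symmetric, `np.linalg.eigh` is a reasonable alternative to `np.linalg.eig`: it is specialized for symmetric matrices and returns real eigenvalues in ascending order. The sketch below uses it and checks the defining property $\Sigma v = \lambda v$ along with orthonormality. The signs of the eigenvectors may differ from the output above, since an eigenvector is only defined up to sign.

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
cov_matrix = np.cov(X, rowvar=False)

# eigh assumes a symmetric input and returns eigenvalues in ascending order.
eig_vals, eig_vecs = np.linalg.eigh(cov_matrix)

# Each column v of eig_vecs should satisfy cov_matrix @ v == lambda * v.
for lam, v in zip(eig_vals, eig_vecs.T):
    assert np.allclose(cov_matrix @ v, lam * v)

# The eigenvectors of a symmetric matrix form an orthonormal basis.
is_orthonormal = np.allclose(eig_vecs.T @ eig_vecs, np.eye(2))
print(eig_vals, is_orthonormal)
```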

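## 6. Step 4: Sort Eigenvalues

`np.linalg.eig` does not guarantee any ordering of the eigenvalues, so sort them in descending order and reorder the eigenvector columns to match.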

In [5]:
idx = np.argsort(eig_vals)[::-1]
eig_vals = eig_vals[idx]
eig_vecs = eig_vecs[:, idx]
eig_vals

array([1.28402771, 0.0490834 ])

## 7. Step 5: Projection

Keep the top `k` eigenvectors as the columns of `W` and project the centered data onto them.

In [6]:
k = 1
W = eig_vecs[:, :k]
X_pca = X_centered @ W
X_pca[:5]

array([[-0.82797019],
       [ 1.77758033],
       [-0.99219749],
       [-0.27421042],
       [-1.67580142]])
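
The projection can be inverted approximately by mapping the scores back with `W.T` and adding the mean. The sketch below does this for `k = 1` and checks that the variance lost in the round trip equals the discarded eigenvalue. The names `X_rec` and `lost_variance` are illustrative, and `np.linalg.eigh` is used here for the decomposition.

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
X_mean = X.mean(axis=0)
X_centered = X - X_mean
n = X.shape[0]

# Eigendecomposition, sorted so the largest eigenvalue comes first.
eig_vals, eig_vecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

# Project onto the first component, then map back to the original space.
W = eig_vecs[:, :1]
X_pca = X_centered @ W
X_rec = X_pca @ W.T + X_mean

# Variance of the residual (n - 1 normalization) matches the dropped eigenvalue.
lost_variance = np.sum((X - X_rec) ** 2) / (n - 1)
print(np.isclose(lost_variance, eig_vals[1]))
```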

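## 8. Explained Variance

Each eigenvalue is the variance captured by its component, so dividing by the total gives the fraction of variance each component explains.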

In [7]:
explained_variance = eig_vals / np.sum(eig_vals)
explained_variance

array([0.96318131, 0.03681869])
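
One common use of these ratios is picking `k`. The sketch below accumulates the ratios and takes the smallest `k` whose cumulative share reaches a target. The 95% `threshold` is an arbitrary value assumed for illustration, and the eigenvalues are copied from the sorted output in Step 4.

```python
import numpy as np

# Sorted eigenvalues as printed in Step 4 (largest first).
eig_vals = np.array([1.28402771, 0.0490834])

explained_variance = eig_vals / eig_vals.sum()
cumulative = np.cumsum(explained_variance)

# Hypothetical target: keep enough components to explain 95% of the variance.
threshold = 0.95
# searchsorted gives the first index where cumulative >= threshold.
k = int(np.searchsorted(cumulative, threshold) + 1)
print(k)
```

On this dataset the first component alone explains about 96.3% of the variance, so a single component clears the assumed threshold.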

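## 9. PCA using SVD (Preferred in Practice)

The right singular vectors of the centered data matrix are the principal directions, so PCA can be computed without forming the covariance matrix. Singular vectors are determined only up to sign, so the scores can come out negated relative to the eigendecomposition route.

In [8]:
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)  # thin SVD: U is 10x2, not 10x10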
X_pca_svd = X_centered @ Vt.T[:, :1]
X_pca_svd[:5]

array([[-0.82797019],
       [ 1.77758033],
       [-0.99219749],
       [-0.27421042],
       [-1.67580142]])
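
The two routes are tied together by $\lambda_i = s_i^2 / (n - 1)$, where $s_i$ are the singular values of the centered data. The sketch below checks that relation, aligns the signs of the singular vectors with the eigenvectors, and shows that the scores can also be read off as `U * S`. The names `eig_vals_from_svd`, `signs`, and `Vt_aligned` are illustrative.

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
X_centered = X - X.mean(axis=0)
n = X.shape[0]

# Thin SVD: rows of Vt are the principal directions, S is sorted descending.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
eig_vals_from_svd = S ** 2 / (n - 1)

# Reference eigendecomposition, reordered to descending eigenvalues.
eig_vals, eig_vecs = np.linalg.eigh(np.cov(X, rowvar=False))
eig_vals, eig_vecs = eig_vals[::-1], eig_vecs[:, ::-1]

# Flip each singular vector so it points the same way as its eigenvector.
signs = np.sign(np.sum(Vt.T * eig_vecs, axis=0))
Vt_aligned = Vt * signs[:, None]

print(np.allclose(eig_vals_from_svd, eig_vals))
print(np.allclose(Vt_aligned.T, eig_vecs))
print(np.allclose(U * S, X_centered @ Vt.T))  # scores without a matrix product
```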

## 10. Interview & ML Facts
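- PCA is unsupervised: it uses only the features, never the labels
- The first principal component maximizes the variance of the projected data; each later component does the same subject to being orthogonal to the earlier ones
- Keeping only `k` components produces an approximation of the centered data with rank at most `k`
- SVD-based PCA works on the centered data directly and avoids forming the covariance matrix, which generally makes it more numerically stable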
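
The rank claim can be checked directly. This sketch projects the centered data onto one component, maps it back, and compares ranks with `np.linalg.matrix_rank`. The names `X_approx`, `rank_before`, and `rank_after` are illustrative.

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
X_centered = X - X.mean(axis=0)

# First principal direction as a column vector.
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
W = Vt[:1].T

# Project onto one component and map back: every row now lies on one line.
X_approx = X_centered @ W @ W.T

rank_before = np.linalg.matrix_rank(X_centered)
rank_after = np.linalg.matrix_rank(X_approx)
print(rank_before, rank_after)
```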

## 11. Summary
PCA projects data onto directions of maximum variance for efficient representation.
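
As a recap, the steps above can be collected into one small function. This is a minimal sketch built on the SVD route, not a replacement for a library implementation. The function name `pca_svd` and its return values are hypothetical choices made for this example.

```python
import numpy as np

def pca_svd(X, k):
    """Hypothetical helper: return (scores, components, explained_ratio)."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)                      # Step 1: center
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                                  # top-k directions (rows)
    scores = X_centered @ components.T                   # projected data
    variances = S ** 2 / (X.shape[0] - 1)                # eigenvalues of the covariance
    explained_ratio = variances[:k] / variances.sum()
    return scores, components, explained_ratio

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

scores, components, explained_ratio = pca_svd(X, k=1)
print(scores.shape, explained_ratio)
```

The scores agree with the earlier sections up to an overall sign, because singular vectors are only defined up to sign.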