
## 🧠 PCA Math Intuition (Step by Step)

### 1. Problem Setup

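* We have **data with several features (dimensions)**. To keep the picture simple, the running example is 2D, with features x and y.
* Goal: reduce dimensions while keeping as much **variance (information)** as possible.
* Why variance? PCA treats variance as a proxy for information: an axis along which the data is widely "spread out" distinguishes the points well, while an axis along which all points nearly coincide tells us little.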

---

### 2. Projection

* Suppose we want to reduce 2D → 1D.
* Pick a direction (a **unit vector** `u`).
* Project each data point (vector `p₁`, `p₂`, …) onto `u`.
* Projection formula for the `i`-th point:

  $$
  p_i' = p_i \cdot u
  $$

  where `·` is the dot product.
* This gives one scalar per point: its signed position along `u`, which can be negative.
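
The projection step can be sketched in a few lines of NumPy. This is a minimal illustration, not a full PCA: the four data points and the candidate direction `(1, 1)` are made-up values, and the names `P`, `u`, and `projections` are chosen here for the example.

```python
import numpy as np

# Toy 2D data, one point per row (values are made up for illustration).
P = np.array([[2.0, 1.0],
              [3.0, 4.0],
              [5.0, 5.0],
              [6.0, 8.0]])

# A candidate direction, normalized so that ||u|| = 1.
u = np.array([1.0, 1.0])
u = u / np.linalg.norm(u)

# Dot product of every point with u: one scalar per point.
projections = P @ u
print(projections)
```

Each entry of `projections` is the coordinate of one point along `u`; the 2D points have become a list of 1D values.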

---

### 3. Variance as the Objective

* After projection, we compute the **variance of these scalar projections** (the overlined term is their mean):

  $$
  \text{Var}(u) = \frac{1}{n}\sum_{i=1}^n \big((p_i \cdot u) - \overline{p \cdot u}\big)^2
  $$
* The **best direction `u`** is the one that **maximizes this variance**.
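* So PCA's optimization problem is the following, where the unit-length constraint stops us from inflating the variance simply by scaling `u`: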

  $$
  \max_{u}\ \text{Var}(u) \quad \text{subject to } \|u\|=1
  $$
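
To see that the projected variance really depends on the direction, one can scan a grid of angles by brute force. The sketch below assumes the same made-up toy data as before; `projected_variance` is a hypothetical helper written for this example, and the 1-degree grid is an arbitrary choice.

```python
import numpy as np

def projected_variance(P, u):
    """Variance (1/n convention) of the rows of P projected onto direction u."""
    u = u / np.linalg.norm(u)          # enforce the unit-length constraint
    s = P @ u                          # scalar projections
    return np.mean((s - s.mean()) ** 2)

# Toy 2D data (made-up values).
P = np.array([[2.0, 1.0],
              [3.0, 4.0],
              [5.0, 5.0],
              [6.0, 8.0]])

# Candidate directions from 0 to 180 degrees in 1-degree steps.
angles = np.linspace(0.0, np.pi, 181)
variances = [projected_variance(P, np.array([np.cos(a), np.sin(a)]))
             for a in angles]

best_angle = angles[int(np.argmax(variances))]
print(np.degrees(best_angle), max(variances))
```

A scan like this only works as a sanity check in 2D; in higher dimensions the set of directions is far too large to sample, which is why a closed-form answer is needed.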

---

### 4. Why Not Guess Directions? → Eigenvectors

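* We can't test infinitely many candidate directions `u` one by one.
* Linear algebra gives the answer in closed form via the **covariance matrix** and its **eigen decomposition**: the maximizing `u` turns out to be an eigenvector of that matrix.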
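
The link between the variance objective and eigenvectors can be made explicit with a short derivation. It assumes the points `p_i` have been mean-centered (so the mean projection is zero) and writes `Σ` for the covariance matrix defined in the next section; the multiplier `λ` is introduced here for the constraint.

$$
\text{Var}(u) = \frac{1}{n}\sum_{i=1}^n (p_i \cdot u)^2
= u^T \Big(\frac{1}{n}\sum_{i=1}^n p_i p_i^T\Big) u
= u^T \Sigma u
$$

Maximizing under the constraint `‖u‖ = 1` with a Lagrange multiplier gives:

$$
\mathcal{L}(u, \lambda) = u^T \Sigma u - \lambda\,(u^T u - 1),
\qquad
\nabla_u \mathcal{L} = 2\Sigma u - 2\lambda u = 0
\;\Rightarrow\; \Sigma u = \lambda u
$$

So every stationary direction is an eigenvector of `Σ`, and at such a direction `Var(u) = uᵀΣu = λ`. The maximum variance is therefore reached at the eigenvector with the largest eigenvalue.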

---

### 5. Covariance Matrix

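* Build the covariance matrix `Σ` from the features:

  $$
  \Sigma = \frac{1}{n} X^T X
  $$

  where `X` is the `n × d` data matrix (one point per row) after mean-centering each column.
* `Σ[i,j]` tells how features `i` and `j` vary together; the diagonal entry `Σ[i,i]` is the variance of feature `i`.
* `Σ` is symmetric and summarizes the shape of the data spread.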
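
As a concrete check, the covariance matrix can be computed directly from the formula and compared with NumPy's built-in. This sketch reuses the made-up toy data; `X_raw`, `X`, and `Sigma` are names chosen for the example.

```python
import numpy as np

# Toy data: n = 4 points (rows), d = 2 features (columns). Made-up values.
X_raw = np.array([[2.0, 1.0],
                  [3.0, 4.0],
                  [5.0, 5.0],
                  [6.0, 8.0]])

# Mean-center each feature (column).
X = X_raw - X_raw.mean(axis=0)
n = X.shape[0]

# Covariance matrix with the 1/n convention used in the text.
Sigma = (X.T @ X) / n

# NumPy's version: rowvar=False means columns are features,
# bias=True selects the 1/n normalization instead of 1/(n-1).
Sigma_np = np.cov(X_raw, rowvar=False, bias=True)

print(Sigma)
print(np.allclose(Sigma, Sigma_np))
```

Many libraries default to the `1/(n-1)` normalization instead. That rescales every eigenvalue by the same factor but leaves the eigenvectors, and hence the principal directions, unchanged.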

---

### 6. Eigen Decomposition

* Solve:

  $$
  \Sigma v = \lambda v
  $$
* Where:

  * `v` = **eigenvector** (direction in feature space).
  * `λ` = **eigenvalue** (amount of variance captured along `v`).
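
The eigen decomposition itself is one library call. The sketch below uses `np.linalg.eigh`, which is intended for symmetric matrices such as `Σ`, and then checks numerically that each eigenvalue equals the variance of the data projected onto its eigenvector. The data is the same made-up toy set.

```python
import numpy as np

X_raw = np.array([[2.0, 1.0],
                  [3.0, 4.0],
                  [5.0, 5.0],
                  [6.0, 8.0]])
X = X_raw - X_raw.mean(axis=0)        # mean-centered data
Sigma = (X.T @ X) / X.shape[0]        # covariance matrix (1/n convention)

# eigh returns eigenvalues in ascending order and unit-length
# eigenvectors as the COLUMNS of eigvecs.
eigvals, eigvecs = np.linalg.eigh(Sigma)

for lam, v in zip(eigvals, eigvecs.T):
    # X is centered, so the variance of the projections is their mean square.
    var_along_v = np.mean((X @ v) ** 2)
    print(lam, var_along_v, np.allclose(Sigma @ v, lam * v))
```

For each pair, the first two printed numbers agree and the final flag is `True`, which is the statement `Σv = λv` together with "eigenvalue = variance along `v`".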

---

### 7. Selecting Principal Components

* Sort eigenvalues:

  $$
  \lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_d
  $$
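* The corresponding eigenvectors, in the same order, are the **Principal Components**.
* PC1 = direction of maximum variance (largest `λ`).
* PC2 = direction of greatest remaining variance among all directions orthogonal to PC1 (second-largest `λ`), and so on.
* Because `Σ` is symmetric, its eigenvectors can be chosen mutually orthogonal, and the data's coordinates along different PCs are **uncorrelated**.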
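
Sorting and the "uncorrelated" claim can both be checked numerically. In this sketch (same made-up data), `explained_ratio` and `score_cov` are names introduced for the example: the first is each component's share of the total variance, the second is the covariance matrix of the data expressed in PC coordinates.

```python
import numpy as np

X_raw = np.array([[2.0, 1.0],
                  [3.0, 4.0],
                  [5.0, 5.0],
                  [6.0, 8.0]])
X = X_raw - X_raw.mean(axis=0)
Sigma = (X.T @ X) / X.shape[0]

eigvals, eigvecs = np.linalg.eigh(Sigma)

# eigh sorts ascending; reorder so the largest eigenvalue comes first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]           # column j is now PC(j+1)

# Share of the total variance captured by each component.
explained_ratio = eigvals / eigvals.sum()

# Coordinates of the data along the PCs, and their covariance.
scores = X @ eigvecs
score_cov = (scores.T @ scores) / X.shape[0]

print(explained_ratio)
print(np.round(score_cov, 10))        # off-diagonal entries are ~0
```

The off-diagonal entries of `score_cov` vanish (up to floating-point error) and its diagonal reproduces the sorted eigenvalues, which is what "uncorrelated components" means here.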

---

### 8. Dimensionality Reduction

* To reduce to `k` dimensions:

  * Take top `k` eigenvectors.
  * Form the `d × k` projection matrix `W = [v_1, v_2, …, v_k]`, with these eigenvectors as its columns.
  * Transform data:

    $$
    Z = XW
    $$
* `Z` is the `n × k` representation of the mean-centered data `X`. Among all linear projections onto `k` orthonormal directions, it retains the most variance.
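
Putting the steps together, a minimal end-to-end sketch might look like the following. The function name `pca_reduce` and its return values are hypothetical choices for this example, the data is the same made-up toy set, and the reconstruction line at the end is an extra added here to show what is lost.

```python
import numpy as np

def pca_reduce(X_raw, k):
    """Project the rows of X_raw onto the top-k principal components.

    Returns (Z, W, mean): reduced data, projection matrix, feature means.
    """
    mean = X_raw.mean(axis=0)
    X = X_raw - mean                           # mean-center
    Sigma = (X.T @ X) / X.shape[0]             # covariance (1/n convention)
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    W = eigvecs[:, top]                        # d x k projection matrix
    Z = X @ W                                  # n x k reduced data
    return Z, W, mean

X_raw = np.array([[2.0, 1.0],
                  [3.0, 4.0],
                  [5.0, 5.0],
                  [6.0, 8.0]])

Z, W, mean = pca_reduce(X_raw, k=1)

# Map the 1D representation back into the original 2D space.
X_approx = Z @ W.T + mean
mse = np.mean(np.sum((X_raw - X_approx) ** 2, axis=1))
print(Z.ravel(), mse)
```

The variance of `Z` equals the largest eigenvalue, and the mean squared reconstruction error equals the sum of the discarded eigenvalues (here, just the smaller one). For real workloads a library implementation is usually preferable to this sketch.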

---

✅ **Summary in words**:
PCA finds the **directions (principal components)** where data varies the most, using eigen decomposition of the covariance matrix of the mean-centered data. The top eigenvectors form the new axes, and projecting onto them gives compressed data that retains as much variance as any linear projection to that many dimensions can.

