# **4.3 Least Squares Approximations**

- When $Ax = b$ has no solution, multiply both sides by $A^T$ and solve the normal equations $A^TA\hat{x} = A^Tb$
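
The recipe above can be sketched in plain Python. This is an illustrative sketch, not a reference implementation: the helper names (`transpose`, `matmul`, `solve`, `least_squares`) are hypothetical, `fractions.Fraction` is used so the answer is exact for integer data, and the example system ($C = 6$, $C + D = 0$, $C + 2D = 0$, which has no exact solution) is made up for illustration.

```python
from fractions import Fraction


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Matrix product X Y: entry (i, j) is row i of X dotted with column j of Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def solve(M, v):
    """Solve the square system M y = v by Gaussian elimination with exact Fractions."""
    n = len(M)
    aug = [[Fraction(a) for a in row] + [Fraction(vi)] for row, vi in zip(M, v)]
    for k in range(n):
        # Exact arithmetic, so any nonzero pivot works.
        pivot = next((r for r in range(k, n) if aug[r][k] != 0), None)
        if pivot is None:
            raise ValueError("A^T A is singular: the columns of A are dependent")
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[k])]
    # Back substitution on the upper triangular system.
    y = [Fraction(0)] * n
    for k in reversed(range(n)):
        tail = sum(aug[k][j] * y[j] for j in range(k + 1, n))
        y[k] = (aug[k][n] - tail) / aug[k][k]
    return y


def least_squares(A, b):
    """Multiply Ax = b by A^T and solve the normal equations A^T A x_hat = A^T b."""
    At = transpose(A)
    AtA = matmul(At, A)
    Atb = [sum(a * bi for a, bi in zip(row, b)) for row in At]
    return solve(AtA, Atb)


# C = 6, C + D = 0, C + 2D = 0 has no exact solution.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
x_hat = least_squares(A, b)
print(x_hat)  # [Fraction(5, 1), Fraction(-3, 1)]
```

Forming $A^TA$ explicitly mirrors the notes; for large or ill-conditioned floating-point problems a QR-based solver is usually preferred, but that is outside this sketch.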


### **Minimizing the Error**
- **Goal:** minimize the squared length of the error, $\|e\|^2 = \|b - Ax\|^2$
  - **Geometry:** the best $\hat{x}$ satisfies $e \perp C(A)$  
  - **Algebra:** $A^TA\hat{x} = A^Tb$  
  - **Calculus:** derivative of $\|Ax - b\|^2$ is zero at $\hat{x}$

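- The solution $A\hat{x} = p$ leaves the least possible error $e = b - p$. For every $x$, write $Ax - b = (Ax - p) - e$; since $Ax - p$ lies in $C(A)$ and $e \perp C(A)$, the Pythagorean theorem gives  
  $$
  \|Ax - b\|^2 = \|Ax - p\|^2 + \|e\|^2
  $$
  The first term is zero exactly when $Ax = p$, so no $x$ does better than $\hat{x}$.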

- The gradient of $\|Ax - b\|^2$ is $2(A^TAx - A^Tb)$, so all partial derivatives are zero exactly when $A^TA\hat{x} = A^Tb$
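
The three views can be checked against each other numerically. This sketch assumes plain Python with exact `Fraction` arithmetic, hypothetical helper names, and the illustrative system $C = 6$, $C + D = 0$, $C + 2D = 0$. It solves the $2 \times 2$ normal equations by Cramer's rule (a choice made here for brevity), then confirms that the gradient $2A^T(Ax - b)$ vanishes at $\hat{x}$ and that the Pythagorean split holds at sample points $x$.

```python
from fractions import Fraction

# Illustrative data (an assumption): C = 6, C + D = 0, C + 2D = 0.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]


def apply(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


def sq_norm(v):
    return sum(vi * vi for vi in v)


def gradient(x):
    """Gradient of ||Ax - b||^2, which is 2 A^T (Ax - b)."""
    r = sub(apply(A, x), b)
    return [2 * sum(row[j] * ri for row, ri in zip(A, r)) for j in range(len(A[0]))]


# Algebra: solve the 2x2 normal equations A^T A x = A^T b by Cramer's rule.
AtA = [[sum(row[j] * row[k] for row in A) for k in range(2)] for j in range(2)]
Atb = [sum(row[j] * bi for row, bi in zip(A, b)) for j in range(2)]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x_hat = [Fraction(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1], det),
         Fraction(AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0], det)]

p = apply(A, x_hat)  # projection of b onto C(A)
e = sub(b, p)        # least squares error, perpendicular to C(A)


def pythagoras_holds(x):
    """Geometry: check ||Ax - b||^2 == ||Ax - p||^2 + ||e||^2 at the point x."""
    Ax = apply(A, x)
    return sq_norm(sub(Ax, b)) == sq_norm(sub(Ax, p)) + sq_norm(e)


print(x_hat, gradient(x_hat), sq_norm(e))
print(pythagoras_holds([0, 0]), pythagoras_holds([7, 1]))
```

Because every quantity is a `Fraction` or an integer, the equality checks are exact rather than approximate.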


### **The Big Picture for Least Squares**
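- When $Ax = b$ has no solution, decompose  
  $$
  b = p + e
  $$
  where $p = A\hat{x}$ lies in the column space $C(A)$ and $e = b - p$ lies in the left nullspace $N(A^T)$, perpendicular to $C(A)$.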
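
For a matrix with a single column $a$, the normal equations $a^Ta\,\hat{x} = a^Tb$ give $\hat{x} = a^Tb / a^Ta$, which makes the split $b = p + e$ easy to check by hand. A minimal sketch, assuming integer data and exact `Fraction` arithmetic; `split` is a hypothetical helper name and the vectors are illustrative. With $a = (1, 1, 1)$, $\hat{x}$ is the average of the entries of $b$, and $e$ holds the deviations from that average.

```python
from fractions import Fraction


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def split(a, b):
    """Split b = p + e, where A has the single column a (integer entries assumed)."""
    x_hat = Fraction(dot(a, b), dot(a, a))  # normal equation: (a.a) x_hat = a.b
    p = [x_hat * ai for ai in a]            # p = A x_hat lies in C(A)
    e = [bi - pi for bi, pi in zip(b, p)]   # e = b - p lies in N(A^T)
    return p, e


a = [1, 1, 1]
b = [1, 2, 6]
p, e = split(a, b)
print(p, e)                  # values 3, 3, 3 and -2, -1, 3
print(dot(p, e), dot(a, e))  # prints 0 0: e is perpendicular to p and to C(A)
```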


### **Fitting a Straight Line**
- Given $m > 2$ points $(t_i, b_i)$, fit a line $C + Dt$  
- Errors: $e_i = b_i - (C + Dt_i)$  
- Minimize
  $$
  E = e_1^2 + \cdots + e_m^2
  $$

- The $m$ equations $C + Dt_i = b_i$ form $Ax = b$; with two unknowns $(C, D)$, $A$ has $n = 2$ columns:
  $$
  A =
  \begin{bmatrix}
  1 & t_1 \\
  1 & t_2 \\
  \vdots & \vdots \\
  1 & t_m
  \end{bmatrix},
  \qquad
  x =
  \begin{bmatrix}
  C \\ D
  \end{bmatrix}
  $$

- Unless all $m$ points lie on one line $b = C + Dt$, $b \notin C(A)$ and $Ax = b$ has no solution, so we need least squares  
- Solve the normal equations:
  $$
  A^TA\hat{x} = A^Tb
  $$

- Compute $A^TA$:
  $$
  A^TA =
  \begin{bmatrix}
  m & \sum t_i \\
  \sum t_i & \sum t_i^2
  \end{bmatrix}
  $$

- Compute $A^Tb$:
  $$
  A^Tb =
  \begin{bmatrix}
  \sum b_i \\
  \sum t_i b_i
  \end{bmatrix}
  $$

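- Least squares solution (valid when the columns of $A$ are independent, i.e. at least two $t_i$ differ, so that $A^TA$ is invertible):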
  $$
  \hat{x} = (A^TA)^{-1}A^Tb
  $$

- Expanded normal equations:
  $$
  \begin{bmatrix}
  m & \sum t_i \\
  \sum t_i & \sum t_i^2
  \end{bmatrix}
  \begin{bmatrix}
  C \\ D
  \end{bmatrix}
  =
  \begin{bmatrix}
  \sum b_i \\ \sum t_i b_i
  \end{bmatrix}
  $$

- Residuals:
  $$
  e = b - A\hat{x}
  $$
  $e$ is perpendicular to the columns of $A$, because the normal equations give  
  $$
  A^Te = A^Tb - A^TA\hat{x} = 0
  $$

- Error function:
  $$
  E(x) = \|Ax - b\|^2 = \sum_{i=1}^m (C + Dt_i - b_i)^2
  $$
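
The expanded normal equations are a $2 \times 2$ system, so they can be solved in closed form from the four sums $\sum t_i$, $\sum t_i^2$, $\sum b_i$, $\sum t_i b_i$. The sketch below uses Cramer's rule for that step (a choice made here, not necessarily how the notes intend it to be solved), exact `Fraction` arithmetic, and hypothetical helper names `fit_line` and `residuals`; the data points are illustrative. It also checks $A^Te = 0$, i.e. $\sum e_i = 0$ and $\sum t_i e_i = 0$.

```python
from fractions import Fraction


def fit_line(t, b):
    """Best-fit line C + D t from the expanded normal equations (Cramer's rule)."""
    m = len(t)
    if m != len(b) or m < 2:
        raise ValueError("need at least two (t_i, b_i) pairs")
    St = sum(t)                                 # sum t_i
    Stt = sum(ti * ti for ti in t)              # sum t_i^2
    Sb = sum(b)                                 # sum b_i
    Stb = sum(ti * bi for ti, bi in zip(t, b))  # sum t_i b_i
    det = m * Stt - St * St                     # det(A^T A); zero iff all t_i are equal
    if det == 0:
        raise ValueError("all t_i are equal, so A^T A is singular")
    C = Fraction(Sb * Stt - St * Stb) / det
    D = Fraction(m * Stb - St * Sb) / det
    return C, D


def residuals(t, b, C, D):
    """Errors e_i = b_i - (C + D t_i)."""
    return [bi - (C + D * ti) for ti, bi in zip(t, b)]


t = [0, 1, 2]
b = [6, 0, 0]
C, D = fit_line(t, b)
e = residuals(t, b, C, D)
print(C, D)                                          # 5 -3
print(e)                                             # errors 1, -2, 1
print(sum(e), sum(ti * ei for ti, ei in zip(t, e)))  # A^T e = 0: prints 0 0
print(sum(ei * ei for ei in e))                      # minimum E = 6
```

Any other choice of $(C, D)$ gives a larger $E$; for example $C = 6$, $D = -6$ fits the first two points exactly but leaves $E = 36$.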


**Key Ideas**
1. Least squares minimizes  
   $$
   \|Ax - b\|^2 = x^TA^TAx - 2x^TA^Tb + b^Tb
   $$
2. The minimizer satisfies  
   $$
   A^TA\hat{x} = A^Tb
   $$
3. Fitting a line uses the normal equations to solve for $(C, D)$  
4. Best-fit heights: $p = A\hat{x}$  
   Errors: $e = b - p$, with $A^Te = 0$  
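5. When $m > n$, $Ax = b$ is usually unsolvable; $A^TA\hat{x} = A^Tb$ gives the least squares solution, which minimizes the sum of squared errors $\|e\|^2$ (equivalently the mean squared error $\|e\|^2/m$, which has the same minimizer)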
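
Key Idea 1 can be spot-checked numerically. This sketch (plain Python, hypothetical helper names) compares $\|Ax - b\|^2$ with $x^TA^TAx - 2x^TA^Tb + b^Tb$ on random integer data, so both sides are exact integers and must agree exactly; the sizes ($m = 4$, $n = 2$) and the seed are arbitrary choices.

```python
import random


def sq_norm_residual(A, x, b):
    """Left side: ||Ax - b||^2 computed directly from the residual."""
    r = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
    return sum(ri * ri for ri in r)


def expanded_form(A, x, b):
    """Right side: x^T A^T A x - 2 x^T A^T b + b^T b."""
    n = len(x)
    AtA = [[sum(row[j] * row[k] for row in A) for k in range(n)] for j in range(n)]
    Atb = [sum(row[j] * bi for row, bi in zip(A, b)) for j in range(n)]
    quadratic = sum(x[j] * AtA[j][k] * x[k] for j in range(n) for k in range(n))
    linear = sum(x[j] * Atb[j] for j in range(n))
    return quadratic - 2 * linear + sum(bi * bi for bi in b)


rng = random.Random(0)  # fixed seed so the check is reproducible
m, n = 4, 2
for _ in range(100):
    A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(m)]
    x = [rng.randint(-5, 5) for _ in range(n)]
    b = [rng.randint(-5, 5) for _ in range(m)]
    assert sq_norm_residual(A, x, b) == expanded_form(A, x, b)
print("identity holds on 100 random integer examples")
```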