# ECE 3 - Lab 9

# Least Squares Introduction








$\color{#EF5645}{\text{Definition}}$: Given an $m \times n$ matrix $A$ and an $m$-vector $b$, the **Least Squares** problem is the problem of choosing an $n$-vector $x$ to minimize:
$$||Ax - b||^2.$$

- If $\hat x$ is a solution of the linear equation $Ax = b$, then $\hat x$ is a solution of the least squares problem. The converse is not true. (Explain why.)
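
As a small numerical illustration of why the converse fails, the sketch below uses a hypothetical $3 \times 2$ system (the matrix and vector are made up for this example, not taken from the lab). The first two equations force $x_1 = x_2 = 1$, which contradicts the third, so $Ax = b$ has no solution, yet a least squares solution still exists and leaves a nonzero residual.

```python
import numpy as np

# Hypothetical overdetermined system (3 equations, 2 unknowns), chosen for illustration.
# Rows 1 and 2 force x1 = 1 and x2 = 1, but row 3 demands x1 + x2 = 0: inconsistent.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# np.linalg.lstsq returns a minimizer of ||Ax - b||^2 as its first output.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = A @ x_hat - b

print("x_hat =", x_hat)
print("||A x_hat - b||^2 =", residual @ residual)
```

The squared residual norm is strictly positive here, so $\hat x$ solves the least squares problem without solving $Ax = b$.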

## Exercise 1
Show that, when the columns of $A$ are linearly independent, $$\hat x = (A^T A)^{-1} A^Tb = A^\dagger b$$ is the unique solution to the least squares problem.

## Solution
$||Ax - b||^2 = x^TA^TAx - b^TAx - x^TA^Tb + b^Tb$.

Its derivative with respect to $x$ is $2x^TA^TA - 2b^TA$.

Setting the derivative to $0$, we get $x^TA^TA = b^TA$.

Take the transpose of both sides of the equation: $A^T A x = A^T b$.

When the columns of $A$ are linearly independent, $A^TA$ is invertible (explain why).

Thus $\hat x = (A^T A)^{-1} A^Tb$.
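
The formula can be sanity-checked numerically. The following is a minimal sketch, assuming a randomly generated tall matrix (whose columns are linearly independent with probability 1); the sizes and the seed are arbitrary choices for illustration. It compares the closed-form expression with NumPy's pseudo-inverse and least squares routines, and checks that the gradient $2A^T(A\hat x - b)$ vanishes at $\hat x$.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed, for reproducibility only
A = rng.standard_normal((6, 3))       # hypothetical tall matrix (m = 6, n = 3)
b = rng.standard_normal(6)

# Closed-form solution from the normal equations A^T A x = A^T b
x_formula = np.linalg.inv(A.T @ A) @ A.T @ b

# Two library routes to the same minimizer
x_pinv = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Gradient of ||Ax - b||^2 evaluated at the candidate solution
grad = 2 * A.T @ (A @ x_formula - b)

print(np.allclose(x_formula, x_pinv), np.allclose(x_formula, x_lstsq))
print("gradient norm:", np.linalg.norm(grad))
```

In practice `np.linalg.lstsq` (or `np.linalg.solve` on the normal equations) is usually preferred over forming an explicit inverse; the explicit inverse is used here only to mirror the formula above.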

## Exercise 2
If $A$ has $QR$ factorization $A = QR$, show that $A\hat{x} = QQ^Tb$.

## Solution
$A\hat{x} = QR(R^TQ^TQR)^{-1}R^TQ^Tb = QR(R^TR)^{-1}R^TQ^Tb = QRR^{-1}R^{-T}R^TQ^Tb = QQ^Tb$, using $Q^TQ = I$ and the invertibility of $R$.
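
The identity can also be checked numerically. This is a sketch under the assumption that `np.linalg.qr` in its default (reduced) mode plays the role of the $QR$ factorization above; the random matrix, its size, and the seed are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)        # arbitrary seed, for reproducibility only
A = rng.standard_normal((5, 2))       # hypothetical tall matrix with independent columns
b = rng.standard_normal(5)

# Reduced QR: Q is 5x2 with orthonormal columns, R is 2x2 upper triangular
Q, R = np.linalg.qr(A)

# Least squares solution via QR: solve R x = Q^T b
x_hat = np.linalg.solve(R, Q.T @ b)

lhs = A @ x_hat          # A x_hat
rhs = Q @ Q.T @ b        # projection of b onto the range of A

print(np.allclose(lhs, rhs))
```

Note that $QQ^T$ is not the identity here, since $Q$ is tall; it projects $b$ onto the column space of $A$.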

# Least Squares Data Fitting


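Given the data
$$x^{(1)}, \ldots, x^{(N)}, \quad y^{(1)}, \ldots, y^{(N)},$$

we try to find a function $\hat{f}$ that maps $x$ to $y$, of the form
$$\hat{f}(x) = \theta_1 f_1(x) + \cdots + \theta_p f_p(x),$$
where $f_1, \ldots, f_p$ are chosen basis functions and $\theta_1, \ldots, \theta_p$ are the parameters to fit.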

To measure how good the $\hat{f}$ we found is, we use the **residual** (or error) at each data point. The best $\hat{f}$ is the one giving the smallest error.
$$r_i = y^{(i)} - \hat{y}^{(i)}, \quad \text{where } \hat{y}^{(i)} = \hat{f}(x^{(i)}).$$

The problem is called "least squares" because we minimize the sum of the squared residuals rather than the residuals themselves, so that positive and negative residuals cannot cancel each other.

Formulated as a least squares problem:

Define the $N \times p$ matrix $A$ with elements $A_{ij} = f_j(x^{(i)})$, such that $\hat y = A \theta$. The least squares data fitting problem amounts to choosing $\theta$ that minimizes:
$$||A\theta - y||^2.$$
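
The construction $A_{ij} = f_j(x^{(i)})$ can be sketched in code as follows. The helper name `design_matrix`, the straight-line basis, and the toy data are all hypothetical, introduced only to illustrate how $A$ is assembled column by column from the basis functions.

```python
import numpy as np

def design_matrix(basis, x):
    """Return the N x p matrix with entries A[i, j] = basis[j](x[i])."""
    x = np.asarray(x, dtype=float)
    # Each basis function is evaluated on all data points and becomes one column.
    return np.column_stack([f(x) for f in basis])

# Hypothetical basis for a straight-line fit: f_1(x) = 1, f_2(x) = x
basis = [lambda x: np.ones_like(x), lambda x: x]

# Toy data lying exactly on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

A = design_matrix(basis, x)
theta, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes ||A theta - y||^2

print(A.shape)
print(theta)
```

Because the toy data lie exactly on a line, the fitted parameters recover the intercept and slope and the residual is zero; with noisy data the same code returns the best fit in the least squares sense.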

## Exercise 3
Given data points $(-0.5,6), (0,3), (1.1,0), (1.6,0),(2.5,3.2)$ and $\hat{f}(x) = \theta_1x^2 + \theta_2x + \theta_3$, use Python to find the optimal $\theta$.

In [None]:
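import numpy as np

# Observed outputs y^(i), stored as a column vector
y = np.array([[6], [3], [0], [0], [3.2]])
# Row i of A is [x_i^2, x_i, 1], matching f_hat(x) = theta_1 x^2 + theta_2 x + theta_3
A = np.array([[(-0.5)**2, -0.5, 1],
              [0**2,        0,  1],
              [1.1**2,     1.1, 1],
              [1.6**2,     1.6, 1],
              [2.5**2,     2.5, 1]])
# Normal-equations solution: theta = (A^T A)^{-1} A^T y
theta = np.linalg.inv(A.T @ A) @ A.T @ y
theta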

array([[ 2.05490805],
       [-5.06380544],
       [ 2.97919598]])
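
As a cross-check, the same fit can be obtained without forming an explicit inverse. The sketch below is one possible alternative, not part of the original solution: it uses `np.linalg.lstsq` and `np.polyfit` (which returns polynomial coefficients from the highest degree to the lowest, matching the order $\theta_1, \theta_2, \theta_3$ here) and then inspects the residuals.

```python
import numpy as np

# Data points from the exercise
x = np.array([-0.5, 0.0, 1.1, 1.6, 2.5])
y = np.array([6.0, 3.0, 0.0, 0.0, 3.2])

# Columns correspond to the basis functions x^2, x, 1
A = np.column_stack([x**2, x, np.ones_like(x)])

theta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
theta_polyfit = np.polyfit(x, y, 2)   # degree-2 fit, highest-degree coefficient first

# Residuals r_i = y_i - y_hat_i at the fitted parameters
residuals = y - A @ theta_lstsq

print(theta_lstsq)
print(theta_polyfit)
print("sum of squared residuals:", residuals @ residuals)
```

Both routes should agree with the `theta` printed above up to rounding. At the optimum the residual vector is orthogonal to every column of $A$, which is another way of stating the normal equations $A^TA\theta = A^Ty$.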