#### Recap: `least squares`

For the overdetermined system $Ax=b$ with $A\in \mathbf{R}^{m \times n}$, $m>n$, and $A$ of `full rank`

The least squares problem finds $x_{ls}$ that `minimizes` $$\|Ax-b\|^2$$

and the analytical solution, obtained from the normal equations $A^TAx=A^Tb$, is

$$x_{ls}=\boxed{(A^TA)^{-1}A^Tb}$$
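As a quick sanity check of the formula, here is a minimal sketch on a small, well-conditioned line-fitting problem. The data values are made up for illustration. It compares the normal-equations solution with NumPy's `lstsq` and checks that the residual is orthogonal to the columns of $A$, which is the property the formula encodes.

```python
import numpy as np

# Hypothetical data: fit a line c0 + c1*t to 5 points (m=5, n=2)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(t), t])   # full column rank
b = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Normal equations: solve (A^T A) x = A^T b instead of forming the inverse
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's least squares solver
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

# At the minimizer the residual is orthogonal to every column of A
r = b - A @ x_ne

print(x_ne)
print(x_ref)
print(A.T @ r)   # should be close to zero
```

On a well-conditioned problem like this one the two solutions should agree to rounding error.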

#### `Conditioning` problems

Conditioning refers to the sensitivity of the solutions $x$ and $y=Ax$ to perturbations in the data $A$ and $b$

Three dimensionless parameters are used

`Condition number`

$$\kappa(A)=\frac{\sigma_{\max}}{\sigma_{\min}}=\|A\|\|A^+\|$$

where $A^+$ is the pseudoinverse, and $\sigma_{\max}$ and $\sigma_{\min}$ are the largest and smallest singular values of $A$, respectively

This is a generalized version of condition number for square matrices (which would be $\kappa(A)=\|A\|\|A^{-1}\|$)

$1\leq \kappa(A) < \infty$ for a full-rank $A$ (a rank-deficient matrix has $\sigma_{\min}=0$, so $\kappa(A)=\infty$)
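The two expressions for $\kappa(A)$ can be checked numerically. The sketch below uses a random tall matrix (an arbitrary choice for illustration) and compares the singular value ratio, the product $\|A\|\|A^+\|$ in the 2-norm, and `np.linalg.cond`, whose default is also the 2-norm.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # hypothetical tall, full-rank matrix

# Singular values are returned in descending order
sigma = np.linalg.svd(A, compute_uv=False)
kappa_svd = sigma[0] / sigma[-1]

# ||A||_2 * ||A^+||_2, with the pseudoinverse computed explicitly
kappa_pinv = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.pinv(A), 2)

# NumPy's built-in 2-norm condition number (works for rectangular matrices)
kappa_np = np.linalg.cond(A)

print(kappa_svd, kappa_pinv, kappa_np)
```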

`Closeness of fit`

$$\theta=\cos^{-1}\frac{\|y\|}{\|b\|}$$

$0 \leq \theta \leq \frac{\pi}{2}$
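Because the residual $r=b-y$ is orthogonal to $y$, we have $\|b\|^2=\|y\|^2+\|r\|^2$, so $\theta$ can equally be written as $\sin^{-1}(\|r\|/\|b\|)$. The sketch below checks this on random data (chosen only for illustration). The $\sin^{-1}$ form is likely the safer one numerically when the fit is very close, since $\cos\theta$ is then within rounding distance of 1.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 4))   # hypothetical data
b = rng.standard_normal(20)

x, *_ = np.linalg.lstsq(A, b, rcond=None)
y = A @ x        # projection of b onto range(A)
r = b - y        # residual, orthogonal to y

theta_cos = np.arccos(np.linalg.norm(y) / np.linalg.norm(b))
theta_sin = np.arcsin(np.linalg.norm(r) / np.linalg.norm(b))

print(theta_cos, theta_sin)
print(y @ r)     # close to zero
```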

`Deviation` of $y$

$$\eta=\frac{\|A\|\|x\|}{\|y\|}=\frac{\|A\|\|x\|}{\|Ax\|}\leq\frac{\sigma_{\max}\|x\|}{\sigma_{\min}\|x\|}=\kappa(A)$$

This follows because the `smallest gain` of $A$ equals its smallest singular value

$$\min_{x\neq 0} \frac{\|Ax\|}{\|x\|} = \sigma_{\min}(A) \;\Rightarrow\; \|Ax\|\geq \sigma_{\min} \|x\|$$

So, we have

$1\leq \eta \leq \kappa(A)$
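The bounds on $\eta$ can be illustrated numerically. This sketch uses a random matrix and random vectors (all hypothetical) and also evaluates $\eta$ at the first and last right singular vectors, where the lower and upper bounds should be attained.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))   # hypothetical full-rank matrix

_, sigma, Vt = np.linalg.svd(A, full_matrices=False)
kappa = sigma[0] / sigma[-1]

def eta_of(x):
    # eta = ||A|| ||x|| / ||Ax||, with ||A|| the 2-norm (largest singular value)
    return sigma[0] * np.linalg.norm(x) / np.linalg.norm(A @ x)

# eta for many random directions stays inside [1, kappa]
etas = np.array([eta_of(rng.standard_normal(4)) for _ in range(1000)])

# The extremes occur at the right singular vectors
eta_min = eta_of(Vt[0])    # direction of largest gain  -> eta = 1
eta_max = eta_of(Vt[-1])   # direction of smallest gain -> eta = kappa

print(etas.min(), etas.max(), kappa)
print(eta_min, eta_max)
```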

With these parameters, we can compute condition numbers describing sensitivities of $y$ and $x$ to perturbations in $b$ and $A$

$$\begin{bmatrix}
 & y & x \\ b & \frac{1}{\cos \theta} & \frac{\kappa(A)}{\eta\cos \theta} \\
 A & \frac{\kappa(A)}{\cos \theta} & \kappa(A) + \frac{\kappa(A)^2\tan \theta}{\eta}
\end{bmatrix}$$

#### Example from NLA book

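The setup fits a degree-14 polynomial to a function sampled at 100 points and compares the final entry of $x$ across methods; $b$ is scaled so that the true value of that entry is `1`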

In [73]:
import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(formatter={'float': '{: 0.4f}'.format})

plt.style.use('dark_background')
# color: https://matplotlib.org/stable/gallery/color/named_colors.htm

In [74]:
m = 100
n = 15

t = np.linspace(0, 1, m)

# Construct Vandermonde matrix
A = np.vander(t, n, increasing=True)
b = np.exp(np.sin(4 * t)) / 2006.787453080206

print(A.shape)
print(b.shape)

(100, 15)
(100,)


In [75]:
# Use NumPy's solver to obtain the conditioning parameters to sufficient accuracy
x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
y = A @ x

# Condition number
kappa = np.linalg.cond(A)
print(f"kappa = {kappa:.4e}")

# Closeness of fit (theta)
theta = np.arcsin(np.linalg.norm(b - y) / np.linalg.norm(b))
print(f"theta = {theta:.4e}")

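# Deviation (eta)
# Note: np.linalg.norm(A) on a 2-D array is the Frobenius norm; the definition
# of eta uses the 2-norm, which would be np.linalg.norm(A, 2)
eta = (np.linalg.norm(A) * np.linalg.norm(x)) / np.linalg.norm(y)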
print(f"eta = {eta:.4e}")

kappa = 2.2718e+10
theta = 3.7461e-06
eta = 2.3732e+05


In [76]:
b_y = 1 / np.cos(theta)
b_x = kappa / (eta * np.cos(theta))

A_y = kappa / np.cos(theta)
A_x = kappa + (kappa**2 * np.tan(theta)) / eta

print(f"b_y = {b_y:.4e}")
print(f"b_x = {b_x:.4e}")
print(f"A_y = {A_y:.4e}")
print(f"A_x = {A_x:.4e}")

b_y = 1.0000e+00
b_x = 9.5727e+04
A_y = 2.2718e+10
A_x = 3.0864e+10


#### Analytical equation

In [77]:
x_ls = np.linalg.inv(A.T @ A) @ A.T @ b
print(f"x[15] = {x_ls[14]:.10f}")

x[15] = -0.3520387166


We see that the analytical equation is clearly an unstable method for solving general least squares problems: the computed value is not even close to `1`
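One way to understand the failure is that forming $A^TA$ squares the condition number: $\kappa(A^TA)=\kappa(A)^2$ in the 2-norm. The sketch below checks this on a synthetic matrix with prescribed singular values (a hypothetical construction, mild enough to be verified in double precision), then plugs in the value of `kappa` printed earlier in this notebook to show that $\kappa^2$ times machine epsilon is far above 1 for the Vandermonde problem.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic 50x5 matrix with singular values 1, 1e-1, ..., 1e-4
U, _ = np.linalg.qr(rng.standard_normal((50, 5)))
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))
s = np.logspace(0, -4, 5)
A_demo = U @ np.diag(s) @ V.T

kappa_A = np.linalg.cond(A_demo)                  # about 1e4
kappa_gram = np.linalg.cond(A_demo.T @ A_demo)    # about 1e8

print(f"kappa(A)^2   = {kappa_A**2:.4e}")
print(f"kappa(A^T A) = {kappa_gram:.4e}")

# For the Vandermonde matrix, kappa was printed above as 2.2718e+10
eps = np.finfo(float).eps
kappa_vander = 2.2718e10
print(f"kappa^2 * eps = {kappa_vander**2 * eps:.4e}")   # much larger than 1
```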

#### `QR factorization` and back substitution

We can also use reduced QR factorization and back substitution to solve the least squares problem

* $A=QR$ with `Householder`, `Givens`, or `MGS`
* Update $b\leftarrow Q^Tb$
* Solve upper triangular system $Rx=b$ for $x$ with `back substitution`
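The three steps above can be sketched with NumPy's built-in `np.linalg.qr` standing in for a hand-written factorization. The matrix and right-hand side are random placeholders, and the names `A_demo`, `b_demo`, and `c` are introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
A_demo = rng.standard_normal((12, 3))   # hypothetical data
b_demo = rng.standard_normal(12)

# Step 1: reduced QR factorization (Q is 12x3, R is 3x3 upper triangular)
Q, R = np.linalg.qr(A_demo, mode="reduced")

# Step 2: transform the right-hand side
c = Q.T @ b_demo

# Step 3: back substitution on R x = c, from the last row upwards
n = R.shape[0]
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (c[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]

# Compare with NumPy's least squares solver
x_ref, *_ = np.linalg.lstsq(A_demo, b_demo, rcond=None)
print(x)
print(x_ref)
```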

##### Householder

In [78]:
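def householder(A):
    # Full QR by Householder reflections: Q is m x m, R is m x n
    m, n = A.shape
    R = A.copy()
    Q = np.identity(m)

    for i in range(n):
        x = R[i:, i]
        # Reflection vector v = sign(x_0) * ||x|| * e_1 + x, with sign(0) taken as +1
        # (np.sign(0) is 0, which would drop the shift entirely)
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        v /= np.linalg.norm(v)

        # Apply the reflector I - 2 v v^T to the trailing block of R and to Q
        R[i:, i:] -= 2 * np.outer(v, v) @ R[i:, i:]
        Q[i:, :] -= 2 * np.outer(v, v) @ Q[i:, :]

    # The loop accumulates the product of reflectors, which is Q^T
    return Q.T, R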

def back_substitution(R, b):
    m, n = R.shape
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(R[i, i + 1:], x[i + 1:])) / R[i, i]
    return x

In [79]:
Q, R = householder(A)
b_qr = Q.T @ b
x_qr = back_substitution(R, b_qr)
print(f"x[15] = {x_qr[14]:.10f}")

x[15] = 0.9999999830


##### Givens rotations

In [80]:
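def givens_rotation(A):
    m, n = A.shape
    Q = np.identity(m)
    for col in range(n):
        for row in range(col+1, m):
            # Zero out A[row, col] by rotating rows `col` and `row`
            if A[row, col] != 0:
                alpha = np.sqrt(A[col, col]**2 + A[row, col]**2)
                c = A[col, col] / alpha
                s = A[row, col] / alpha

                # Identity matrix with a 2x2 rotation embedded in rows/columns col and row
                G = np.identity(m)
                G[row, col] = -s
                G[col, row] = s
                G[col, col] = c
                G[row, row] = c

                # G @ A creates a new array, so the caller's A is left unchanged
                A = G @ A
                Q = Q @ G.T

    return Q, A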

In [81]:
Q, R = givens_rotation(A)
b_givens = Q.T @ b
x_givens = back_substitution(R, b_givens)
print(f"x[15] = {x_givens[14]:.10f}")

x[15] = 1.0000000564


##### Modified Gram-Schmidt

In [82]:
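def gram_schmidt(A, modified=True):
    # Reduced QR: Q is m x n with orthonormal columns, R is n x n upper triangular
    n = A.shape[1]
    Q = np.zeros_like(A)
    R = np.zeros((n, n))

    for i in range(n):
        q = A[:, i].copy()

        for j in range(i):
            if modified:
                # Modified: project the partially orthogonalized vector q
                R[j, i] = np.dot(Q[:, j], q)
            else:
                # Classical: project the original column of A
                R[j, i] = np.dot(Q[:, j], A[:, i])
            q -= R[j, i] * Q[:, j]

        R[i, i] = np.sqrt(np.dot(q, q))
        q /= R[i, i]
        Q[:, i] = q

    return Q, R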

In [83]:
Q, R = gram_schmidt(A)
b_mgs = Q.T @ b
x_mgs = back_substitution(R, b_mgs)
print(f"x[15] = {x_mgs[14]:.10f}")

x[15] = 1.0172653298


We see that both Householder and Givens work well, while the MGS result is much poorer, agreeing with the true value to only about two digits. However, it can be stabilized by using an augmented system of equations, see NLA Theorem 19.2
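A minimal sketch of the augmented approach, as I read it: run MGS on the augmented matrix $[A \;\; b]$ and read both $R$ and $Q^Tb$ off the resulting triangular factor, so that $Q^Tb$ is never formed from the computed $Q$ explicitly. The function names `mgs` and `lstsq_mgs_augmented` are hypothetical and self-contained here, and the problem setup repeats the Vandermonde example from this notebook.

```python
import numpy as np

def mgs(A):
    """Reduced QR by modified Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        q = A[:, i].astype(float)          # astype returns a copy
        for j in range(i):
            R[j, i] = Q[:, j] @ q          # project the running vector (modified variant)
            q -= R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(q)
        Q[:, i] = q / R[i, i]
    return Q, R

def lstsq_mgs_augmented(A, b):
    """Least squares via MGS applied to the augmented matrix [A b]."""
    n = A.shape[1]
    _, R2 = mgs(np.column_stack([A, b]))
    R = R2[:n, :n]        # triangular factor of A
    Qtb = R2[:n, n]       # plays the role of Q^T b
    # Back substitution on R x = Qtb
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (Qtb[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# Same setup as the example in this notebook
m, n = 100, 15
t = np.linspace(0, 1, m)
A = np.vander(t, n, increasing=True)
b = np.exp(np.sin(4 * t)) / 2006.787453080206

x_aug = lstsq_mgs_augmented(A, b)
print(f"x[15] = {x_aug[14]:.10f}")
```

The only change from plain MGS is where $Q^Tb$ comes from; the factorization loop itself is the same.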

Overall, one might as well use the simplest and cheapest stable option, which is standard `Householder`