#### Least squares

For the overdetermined system $Ax=b$, $A\in \mathbf{R}^{m \times n}$, $m>n$, $A$ of **`full rank`**

The least squares problem finds $x_{ls}$ that `minimizes` $$\|Ax-b\|^2$$

and its closed-form solution, obtained from the normal equations, is

$$x_{ls}=(A^TA)^{-1}A^Tb$$
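As a quick sanity check of this expression, the sketch below solves the normal equations on a small, well-conditioned random problem and compares the result with `np.linalg.lstsq`. The names `A_demo`, `b_demo` and the seed are assumptions made for illustration only. Forming $A^TA$ squares the condition number, so this route is not suitable for ill-conditioned problems.

```python
import numpy as np

rng = np.random.default_rng(0)            # hypothetical seed
A_demo = rng.standard_normal((20, 3))     # m > n, full rank with probability 1
b_demo = rng.standard_normal(20)

# Normal equations: solve (A^T A) x = A^T b rather than forming the inverse
x_normal = np.linalg.solve(A_demo.T @ A_demo, A_demo.T @ b_demo)

# Reference solution from NumPy's least squares routine
x_ref, *_ = np.linalg.lstsq(A_demo, b_demo, rcond=None)

print(np.allclose(x_normal, x_ref))
```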

We would like to see how different QR factorization-based algorithms perform in solving an ill-conditioned least squares problem

#### Conditioning parameters

Conditioning refers to sensitivity of `solutions` $x$ and $y=Ax$ to perturbations in `data` $A$ and $b$

Three dimensionless parameters are used

`Condition number`

$$\kappa(A)=\frac{\sigma_{\max}}{\sigma_{\min}}=\|A\|\|A^+\|$$

where $A^+$ is the pseudoinverse, $\sigma_{\max}$ and $\sigma_{\min}$ are largest and smallest singular values, respectively

This is a generalized version of condition number for square matrices (which would be $\kappa(A)=\|A\|\|A^{-1}\|$)
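The two expressions for $\kappa(A)$ can be checked numerically. The following is a minimal sketch on a hypothetical random matrix `A_demo`, using 2-norms throughout.

```python
import numpy as np

rng = np.random.default_rng(1)            # hypothetical seed
A_demo = rng.standard_normal((8, 3))

# Ratio of largest to smallest singular value
sigma = np.linalg.svd(A_demo, compute_uv=False)   # returned in descending order
kappa_svd = sigma[0] / sigma[-1]

# Product of the 2-norms of A and its pseudoinverse
kappa_pinv = np.linalg.norm(A_demo, 2) * np.linalg.norm(np.linalg.pinv(A_demo), 2)

# NumPy's built-in 2-norm condition number (also valid for rectangular A)
kappa_np = np.linalg.cond(A_demo)

print(kappa_svd, kappa_pinv, kappa_np)
```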

$1\leq \kappa(A) < \infty$ (finite because $A$ has full rank)

`Closeness of fit`

$$\theta=\cos^{-1}\frac{\|y\|}{\|b\|}$$

$0 \leq \theta \leq \frac{\pi}{2}$

The third parameter, $\eta$, measures how much $\|y\|$ falls short of its `maximum possible value` $\|A\|\|x\|$

$$\eta=\frac{\|A\|\|x\|}{\|y\|}=\frac{\|A\|\|x\|}{\|Ax\|}\leq\frac{\sigma_{\max}\|x\|}{\sigma_{\min}\|x\|}=\kappa(A)$$

The upper bound follows because the smallest gain of $A$ is its smallest singular value

$$\min_{x\neq 0} \frac{\|Ax\|}{\|x\|} = \sigma_{\min}(A) \rightarrow \|Ax\|\geq \sigma_{\min} \|x\|$$

So, we have

$1\leq \eta \leq \kappa(A)$

With these parameters, we can compute (relative) `condition numbers` describing sensitivities of $y$ and $x$ to perturbations in $b$ and $A$

$$\begin{bmatrix}
 & y & x \\ b & \frac{1}{\cos \theta} & \frac{\kappa(A)}{\eta\cos \theta} \\
 A & \frac{\kappa(A)}{\cos \theta} & \kappa(A) + \frac{\kappa(A)^2\tan \theta}{\eta}
\end{bmatrix}$$

#### A bit more details

Recall that the relative condition number is defined as

$$\kappa = \lim_{\delta\rightarrow 0}\sup_{\|\delta x\|\leq\delta}\left(\frac{\|\delta f\|}{\|f(x)\|}\left.\right/\frac{\|\delta x\|}{\|x\|}\right)$$

If $f$ is differentiable, this becomes

$$\kappa = \frac{\|J(x)\|}{\|f(x)\|/\|x\|}$$
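As a worked instance of this formula, consider the linear map $f(x)=Ax$, whose Jacobian is $J(x)=A$. Then

$$\kappa_{x\rightarrow y} = \frac{\|A\|}{\|Ax\|/\|x\|}=\frac{\|A\|\|x\|}{\|Ax\|}=\eta$$

so $\eta$ is itself the relative condition number of computing $y=Ax$ from $x$, consistent with $1\leq \eta \leq \kappa(A)$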

##### Sensitivity $b\rightarrow y$

Since $y=Ax, x=A^+b$, we have $y=AA^+b=A(A^TA)^{-1}A^Tb$

Notice that $A(A^TA)^{-1}A^T$ is a projection matrix and we know, for a nonzero vector $v\in \mathbf{R}^m$, there are only three possibilities

* if $v\in R(A)$, $A(A^TA)^{-1}A^Tv=v$ and therefore, eigenvalue is 1
* if $v\in R(A)^{\perp}$, $A(A^TA)^{-1}A^Tv=0$ and therefore, eigenvalue is 0
* if $v$ has components in both $R(A)$ and $R(A)^{\perp}$
    * we write $v=v_{R(A)}+v_{R(A)^{\perp}}$, then $A(A^TA)^{-1}A^Tv=v_{R(A)}$
    * this does not introduce any new eigenvalue since $v$ is not an eigenvector here

Therefore, there are no eigenvalues other than 1 and 0 for $A(A^TA)^{-1}A^T$

Because this projection matrix is symmetric, its 2-norm equals its largest eigenvalue in absolute value, so

$$\|A(A^TA)^{-1}A^T\|=1$$
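These properties of the projector are easy to confirm numerically. The sketch below builds $A(A^TA)^{-1}A^T$ for a hypothetical random `A_demo` and inspects its eigenvalues and 2-norm.

```python
import numpy as np

rng = np.random.default_rng(2)            # hypothetical seed
A_demo = rng.standard_normal((6, 2))

# Orthogonal projector onto R(A), formed without an explicit inverse
P = A_demo @ np.linalg.solve(A_demo.T @ A_demo, A_demo.T)

# Symmetrize against round-off before calling the symmetric eigenvalue solver
eigvals = np.sort(np.linalg.eigvalsh((P + P.T) / 2))

print(np.round(eigvals, 12))      # m - n values near 0 and n values near 1
print(np.linalg.norm(P, 2))       # spectral norm, 1 up to round-off
```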

Now, we can write sensitivity of $y$ to perturbations in $b$ as

$$\kappa_{b\rightarrow y}=\frac{\|A(A^TA)^{-1}A^T\|}{\|y\|/\|b\|}=\frac{1}{\cos \theta}$$

##### Sensitivity $b\rightarrow x$

$$\kappa_{b\rightarrow x}=\frac{\|A^+\|}{\|x\|/\|b\|}=\|A^+\|\frac{\|b\|}{\|y\|}\frac{\|y\|}{\|x\|}=\|A^+\|\frac{1}{\cos \theta}\frac{\|A\|}{\eta}=\frac{\kappa(A)}{\eta \cos \theta}$$
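The two $b$-sensitivities can be checked empirically: for any perturbation $\delta b$, the relative change in $y$ and $x$ divided by the relative change in $b$ should not exceed $\kappa_{b\rightarrow y}$ and $\kappa_{b\rightarrow x}$. The following is a minimal sketch on a hypothetical well-conditioned problem (`A_demo`, `b_demo`, the seed and the perturbation size are all assumptions).

```python
import numpy as np

rng = np.random.default_rng(3)            # hypothetical seed
A_demo = rng.standard_normal((10, 3))
b_demo = rng.standard_normal(10)

A_pinv = np.linalg.pinv(A_demo)
x = A_pinv @ b_demo
y = A_demo @ x

# Conditioning parameters, all in the 2-norm
kappa = np.linalg.cond(A_demo)
theta = np.arccos(np.linalg.norm(y) / np.linalg.norm(b_demo))
eta = np.linalg.norm(A_demo, 2) * np.linalg.norm(x) / np.linalg.norm(y)

kappa_b_y = 1 / np.cos(theta)
kappa_b_x = kappa / (eta * np.cos(theta))

# Largest amplification observed over random perturbations of b
worst_y, worst_x = 0.0, 0.0
for _ in range(1000):
    db = 1e-6 * rng.standard_normal(10)
    rel_b = np.linalg.norm(db) / np.linalg.norm(b_demo)
    dx = A_pinv @ db                      # x depends linearly on b
    dy = A_demo @ dx
    worst_y = max(worst_y, np.linalg.norm(dy) / np.linalg.norm(y) / rel_b)
    worst_x = max(worst_x, np.linalg.norm(dx) / np.linalg.norm(x) / rel_b)

print(worst_y <= kappa_b_y, worst_x <= kappa_b_x)
```

Random perturbations rarely hit the worst-case direction, so the observed amplifications are typically strictly below the bounds.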

For the derivation of the other two sensitivities, $A\rightarrow y$ and $A\rightarrow x$, see the NLA book

#### Problem setup

This is a problem from the NLA book, where $A$ is ill-conditioned

We compare the final entry of the computed $x$ across algorithms; its true value is `1`

In [64]:
import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
np_seed = 41

plt.style.use('dark_background')
# color: https://matplotlib.org/stable/gallery/color/named_colors.htm

In [65]:
m = 100
n = 15

t = np.linspace(0, 1, m)

# Vandermonde
A_1 = np.vander(t, n, increasing=True)
cond_num = np.linalg.cond(A_1)
print(f"Condition number: {cond_num:.4e}")
b_1 = np.exp(np.sin(4 * t)) / 2006.787453080206

print(A_1.shape)
print(b_1.shape)

Condition number: 2.2718e+10
(100, 15)
(100,)


We first use the NumPy solver to get a sufficiently accurate result

`np.linalg.lstsq` relies on an SVD-based LAPACK routine (`gelsd`) rather than QR, so its result serves as an independent reference for estimating the conditioning parameters

In [66]:
# Use NumPy solver to obtain to sufficient accuracy the conditioning parameters
x, _, _, _ = np.linalg.lstsq(A_1, b_1, rcond=None)
y = A_1 @ x

# Condition number
kappa = np.linalg.cond(A_1)
print(f"kappa = {kappa:.4e}")

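# Closeness of fit (theta)
# arcsin of the relative residual equals arccos(||y|| / ||b||), but is more accurate for small theta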
theta = np.arcsin(np.linalg.norm(b_1 - y) / np.linalg.norm(b_1))
print(f"theta = {theta:.4e}")

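# Deviation (eta)
# Note: np.linalg.norm on a 2-D array defaults to the Frobenius norm, which
# upper-bounds the 2-norm used in the definition; pass ord=2 for the spectral norm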
eta = (np.linalg.norm(A_1) * np.linalg.norm(x)) / np.linalg.norm(y)
print(f"eta = {eta:.4e}")

kappa = 2.2718e+10
theta = 3.7461e-06
eta = 2.3732e+05


In [67]:
b_y = 1 / np.cos(theta)
b_x = kappa / (eta * np.cos(theta))

A_y = kappa / np.cos(theta)
A_x = kappa + (kappa**2 * np.tan(theta)) / eta

print(f"b_y = {b_y:.4e}")
print(f"b_x = {b_x:.4e}")
print(f"A_y = {A_y:.4e}")
print(f"A_x = {A_x:.4e}")

b_y = 1.0000e+00
b_x = 9.5727e+04
A_y = 2.2718e+10
A_x = 3.0864e+10


#### QR factorization and back substitution

We now explore `reduced` QR factorization (since $m>n$) and back substitution to solve least squares

* $A=QR$ with `Householder`, `Givens`, or `MGS`
* Update $b\leftarrow Q^Tb$
* Solve upper triangular system $Rx=b$ for $x$ with `back substitution`
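The three steps above can be sketched end to end as follows. This is only an illustration on a hypothetical random problem: `np.linalg.qr` stands in for the hand-written factorizations, and `A_demo`, `b_demo` and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)            # hypothetical seed
A_demo = rng.standard_normal((12, 4))
b_demo = rng.standard_normal(12)

# Step 1: reduced QR, Q is 12x4 and R is 4x4 upper triangular
Q, R = np.linalg.qr(A_demo, mode="reduced")

# Step 2: b <- Q^T b
c = Q.T @ b_demo

# Step 3: back substitution on R x = c
n = R.shape[1]
x_qr = np.zeros(n)
for i in range(n - 1, -1, -1):
    x_qr[i] = (c[i] - R[i, i + 1:] @ x_qr[i + 1:]) / R[i, i]

# Compare with NumPy's least squares routine
x_ref, *_ = np.linalg.lstsq(A_demo, b_demo, rcond=None)
print(np.allclose(x_qr, x_ref))
```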

##### Householder

In [68]:
def householder(A):
    m, n = A.shape
    R = A.copy()
    Q = np.identity(m)

    # For fat matrices, we only process up to the mth column
    for i in range(min(m, n)):
        x = R[i:, i]
        if np.allclose(x, 0):
            continue  # Skip if x is a zero vector

        v = x.copy()
        sng = np.sign(x[0]) if x[0] != 0 else 1.0 # Per convention in NLA
        v[0] += sng * np.linalg.norm(x)
        v /= np.linalg.norm(v)

        # Since all entries in R[i:, :i] are zero from previous iteration
        # applying transformation to R[i:, i:] would suffice
        R[i:, i:] -= 2 * np.outer(v, v) @ R[i:, i:]

        # If Q is needed explicitly
        Q[i:, :] -= 2 * np.outer(v, v) @ Q[i:, :]

    return Q.T, R

def back_substitution(R, b):
    m, n = R.shape
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(R[i, i + 1:], x[i + 1:])) / R[i, i]
    return x

In [69]:
Q, R = householder(A_1)
print(Q.shape)
print(R.shape)
b_hh = Q[:, :A_1.shape[1]].T @ b_1
x_hh = back_substitution(R[:A_1.shape[1],:], b_hh)
print(f"x[15] = {x_hh[14]:.10f}")

(100, 100)
(100, 15)
x[15] = 0.9999999830


##### Givens rotations

In [70]:
def givens_rotation(A):
    m, n = A.shape
    Q = np.identity(m)
    for col in range(min(m, n)):
        for row in range(col+1, m):
            if A[row, col] != 0:
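                # Rotation that zeroes A[row, col] against the pivot A[col, col]
                alpha = np.sqrt(A[col, col]**2 + A[row, col]**2)
                c = A[col, col] / alpha
                s = A[row, col] / alpha

                # Embed the 2x2 rotation [[c, s], [-s, c]] at indices (col, row)
                G = np.identity(m)
                G[row, col] = -s
                G[col, row] = s
                G[col, col] = c
                G[row, row] = c

                # Apply the rotation to A and accumulate Q = G_1^T G_2^T ...
                A = G @ A
                Q = Q @ G.T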

    return Q, A

In [71]:
Q, R = givens_rotation(A_1)
print(Q.shape)
print(R.shape)
b_givens = Q[:, :A_1.shape[1]].T @ b_1
x_givens = back_substitution(R[:A_1.shape[1],:], b_givens)
print(f"x[15] = {x_givens[14]:.10f}")

(100, 100)
(100, 15)
x[15] = 1.0000000564


##### Modified Gram-Schmidt

In [72]:
def general_gram_schmidt(A, modified=True):
    _, k = A.shape  # Get number of vectors (columns) in A
    Q = []  # Start with empty list, as we don't know how many q's are there
    R = np.zeros((0, k))  # Same here

    for i in range(k):
        # Loop over all a_i
        q = A[:, i].copy()

        # This skips when i=0
        for j in range(len(Q)):
            if modified:
                R[j, i] = np.dot(Q[j], q)
            else:
                R[j, i] = np.dot(Q[j], A[:, i])
            q -= R[j, i] * Q[j]

        # Compute norm of new q
        norm_q = np.sqrt(np.dot(q, q))

        # Only add q to Q if it is not small
        if norm_q > 1e-10:  # Tolerance
            q /= norm_q
            Q.append(q)

            # Expand R to include new row corresponding to new q
            new_row = np.zeros((1, k))
            new_row[0, i] = norm_q
            R = np.vstack([R, new_row])

    Q = np.column_stack(Q)  # Convert to array

    return Q, R

In [73]:
Q, R = general_gram_schmidt(A_1) # Reduced QR by construction
print(Q.shape)
print(R.shape)
b_mgs = Q.T @ b_1
x_mgs = back_substitution(R, b_mgs)
print(f"x[15] = {x_mgs[14]:.10f}")

(100, 15)
(15, 15)
x[15] = 0.9796910998


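Householder and Givens both recover the final entry to about 7 digits, while MGS is accurate to fewer than 2 digits. However, MGS can be stabilized by factoring the augmented matrix $[A\ b]$, so that $Q^Tb$ is formed implicitly rather than by an explicit product with $Q$, see NLA Theorem 19.2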
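A minimal sketch of this stabilization, assuming the approach is to run MGS on the augmented matrix $[A\ b]$ and read $Q^Tb$ off the last column of the resulting $R$. Here `mgs_qr` is a hypothetical helper written for this sketch (it assumes linearly independent columns and is not the `general_gram_schmidt` function above), and the problem data is rebuilt so the block is self-contained.

```python
import numpy as np

def mgs_qr(A):
    """Modified Gram-Schmidt reduced QR; assumes linearly independent columns."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        v = A[:, i].copy()
        for j in range(i):
            # Project against the running remainder v, not the original column
            R[j, i] = Q[:, j] @ v
            v -= R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(v)
        Q[:, i] = v / R[i, i]
    return Q, R

# Same problem as above: Vandermonde matrix and scaled right-hand side
m, n = 100, 15
t = np.linspace(0, 1, m)
A = np.vander(t, n, increasing=True)
b = np.exp(np.sin(4 * t)) / 2006.787453080206

# Factor [A b]; the last column of R holds Q^T b in its first n entries
_, R_aug = mgs_qr(np.column_stack([A, b]))
R_n = R_aug[:n, :n]
Qtb = R_aug[:n, n]

# Back substitution on R x = Q^T b
x_aug = np.zeros(n)
for i in range(n - 1, -1, -1):
    x_aug[i] = (Qtb[i] - R_n[i, i + 1:] @ x_aug[i + 1:]) / R_n[i, i]

print(f"x[15] = {x_aug[-1]:.10f}")
```

The design point is that the explicitly computed $Q$ from MGS is not accurately orthogonal for ill-conditioned $A$, so the product $Q^Tb$ is avoided altogether.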

#### Note

Since the matrix is very ill-conditioned, it is numerically close to rank-deficient, and many different $x$ can reconstruct $b$ with similar accuracy

Therefore, we should not compare the solution $x$ from a QR-based solver with that from other solvers, since there is no guarantee they lead to the same "types" of solutions

This example does not show that QR finds the "true" solution $x$, since a (numerically) underdetermined system has no unique solution. The purpose is to compare the numerical stability of different QR algorithms