#### Recap: `least squares`

For the overdetermined system $Ax=b$ with $A\in \mathbf{R}^{m \times n}$, $m>n$, `full rank`

The least squares problem finds $x_{ls}$ that `minimizes` $$\|Ax-b\|^2$$

and, since $A$ has full rank, the solution has the closed form (the normal equations)

$$x_{ls}=\boxed{(A^TA)^{-1}A^Tb}$$
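
As a quick sanity check of the closed form, the sketch below builds a small, well-conditioned random problem and compares the normal-equations solution with `np.linalg.lstsq`. The sizes, seed, and `*_demo` names are illustrative assumptions and are not part of the example studied later.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility only
A_demo = rng.standard_normal((20, 3))   # m > n, full column rank with probability 1
b_demo = rng.standard_normal(20)

# Closed form: solve (A^T A) x = A^T b instead of forming the inverse explicitly
x_normal = np.linalg.solve(A_demo.T @ A_demo, A_demo.T @ b_demo)

# Reference solution from NumPy's least squares solver
x_ref, *_ = np.linalg.lstsq(A_demo, b_demo, rcond=None)

# At the minimizer the residual is orthogonal to the columns of A
grad = A_demo.T @ (A_demo @ x_normal - b_demo)

print(np.allclose(x_normal, x_ref))
print(np.linalg.norm(grad))
```

On a well-conditioned problem like this one the two solutions agree; the sections that follow look at what happens when $A$ is ill-conditioned.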

#### `Conditioning` problems

Conditioning refers to the sensitivity of the solution $x$ and the fitted vector $y=Ax$ to perturbations in the data $A$ and $b$

Three dimensionless parameters are used

`Condition number`

$$\kappa(A)=\frac{\sigma_{\max}}{\sigma_{\min}}=\|A\|\|A^+\|$$

where $A^+$ is the pseudoinverse, $\sigma_{\max}$ and $\sigma_{\min}$ are largest and smallest singular values, respectively

This is a generalized version of condition number for square matrices (which would be $\kappa(A)=\|A\|\|A^{-1}\|$)

$1\leq \kappa(A) \leq \infty$
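
The two expressions for $\kappa(A)$ can be compared numerically. This is a minimal sketch on a small random rectangular matrix (the sizes and seed are assumptions made for illustration), using `np.linalg.svd`, `np.linalg.pinv`, and `np.linalg.cond`.

```python
import numpy as np

rng = np.random.default_rng(1)
A_demo = rng.standard_normal((30, 4))

# Ratio of largest to smallest singular value (returned in descending order)
sigma = np.linalg.svd(A_demo, compute_uv=False)
kappa_svd = sigma[0] / sigma[-1]

# ||A||_2 * ||A^+||_2, with the pseudoinverse from np.linalg.pinv
kappa_pinv = np.linalg.norm(A_demo, 2) * np.linalg.norm(np.linalg.pinv(A_demo), 2)

# np.linalg.cond defaults to the 2-norm condition number
kappa_np = np.linalg.cond(A_demo)

print(kappa_svd, kappa_pinv, kappa_np)
```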

`Closeness of fit`

$$\theta=\cos^{-1}\frac{\|y\|}{\|b\|}$$

$0 \leq \theta \leq \frac{\pi}{2}$
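
Because the residual $b-y$ is orthogonal to $y$, we have $\|b\|^2=\|y\|^2+\|b-y\|^2$, so $\theta$ can equivalently be written as $\sin^{-1}(\|b-y\|/\|b\|)$. The sketch below checks that the two forms agree on a small random problem (sizes and seed are illustrative assumptions). The arcsine form is the better choice numerically when $\theta$ is tiny, since the arccosine loses accuracy when its argument is close to 1.

```python
import numpy as np

rng = np.random.default_rng(2)
A_demo = rng.standard_normal((20, 3))
b_demo = rng.standard_normal(20)

x_demo, *_ = np.linalg.lstsq(A_demo, b_demo, rcond=None)
y_demo = A_demo @ x_demo      # projection of b onto range(A)
r_demo = b_demo - y_demo      # residual, orthogonal to y

theta_cos = np.arccos(np.linalg.norm(y_demo) / np.linalg.norm(b_demo))
theta_sin = np.arcsin(np.linalg.norm(r_demo) / np.linalg.norm(b_demo))

print(theta_cos, theta_sin)
```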

`Deviation` of $y$

$$\eta=\frac{\|A\|\|x\|}{\|y\|}=\frac{\|A\|\|x\|}{\|Ax\|}\leq\frac{\sigma_{\max}\|x\|}{\sigma_{\min}\|x\|}=\kappa(A)$$

The inequality holds because the `smallest gain` of $A$ is its smallest singular value

$$\min_{x\neq 0} \frac{\|Ax\|}{\|x\|} = \sigma_{\min}(A) \rightarrow \|Ax\|\geq \sigma_{\min} \|x\|$$

So, we have

$1\leq \eta \leq \kappa(A)$
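
The bounds on $\eta$ can be checked directly. This is a small sketch with an arbitrary random $A$ and $x$ (assumed sizes and seed); note that the matrix norm in $\eta$ is the 2-norm, which in NumPy requires passing `2` explicitly.

```python
import numpy as np

rng = np.random.default_rng(3)
A_demo = rng.standard_normal((25, 5))
x_demo = rng.standard_normal(5)
y_demo = A_demo @ x_demo

sigma = np.linalg.svd(A_demo, compute_uv=False)
kappa = sigma[0] / sigma[-1]

# ord=2 gives the spectral norm sigma_max; the default for a matrix is Frobenius
eta = np.linalg.norm(A_demo, 2) * np.linalg.norm(x_demo) / np.linalg.norm(y_demo)

print(f"1 <= {eta:.4f} <= {kappa:.4f}")
```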

With these parameters, we can compute condition numbers describing sensitivities of $y$ and $x$ to perturbations in $b$ and $A$

$$\begin{bmatrix}
 & y & x \\ b & \frac{1}{\cos \theta} & \frac{\kappa(A)}{\eta\cos \theta} \\
 A & \frac{\kappa(A)}{\cos \theta} & \kappa(A) + \frac{\kappa(A)^2\tan \theta}{\eta}
\end{bmatrix}$$

#### Example from NLA book

We compare the computed final entry $x_{15}$ across methods; $b$ is scaled so that the true value of this entry is `1`

In [None]:
import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(formatter={'float': '{: 0.4f}'.format})

plt.style.use('dark_background')
# color: https://matplotlib.org/stable/gallery/color/named_colors.htm

In [None]:
m = 100
n = 15

t = np.linspace(0, 1, m)

# Construct Vandermonde matrix
A = np.vander(t, n, increasing=True)
b = np.exp(np.sin(4 * t)) / 2006.787453080206

print(A.shape)
print(b.shape)

(100, 15)
(100,)


In [None]:
# Use NumPy's least squares solver to get the conditioning parameters to sufficient accuracy
x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
y = A @ x

# Condition number
kappa = np.linalg.cond(A)
print(f"kappa = {kappa:.4e}")

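# Closeness of fit (theta): arcsin(||b - y|| / ||b||) equals arccos(||y|| / ||b||)
# but is more accurate when theta is tiny
theta = np.arcsin(np.linalg.norm(b - y) / np.linalg.norm(b))
print(f"theta = {theta:.4e}")

# Deviation (eta)
# Note: np.linalg.norm(A) defaults to the Frobenius norm for a matrix, not the
# 2-norm (sigma_max) in the definition of eta; np.linalg.norm(A, 2) would match
# the definition and give a somewhat smaller eta than the value printed here
eta = (np.linalg.norm(A) * np.linalg.norm(x)) / np.linalg.norm(y)
print(f"eta = {eta:.4e}")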

kappa = 2.2718e+10
theta = 3.7461e-06
eta = 2.3732e+05


In [None]:
b_y = 1 / np.cos(theta)
b_x = kappa / (eta * np.cos(theta))

A_y = kappa / np.cos(theta)
A_x = kappa + (kappa**2 * np.tan(theta)) / eta

print(f"b_y = {b_y:.4e}")
print(f"b_x = {b_x:.4e}")
print(f"A_y = {A_y:.4e}")
print(f"A_x = {A_x:.4e}")

b_y = 1.0000e+00
b_x = 9.5727e+04
A_y = 2.2718e+10
A_x = 3.0864e+10


#### Analytical equation

In [None]:
x_ls = np.linalg.inv(A.T @ A) @ A.T @ b
print(f"x[15] = {x_ls[14]:.10f}")

x[15] = -0.3520387166


The closed-form normal-equations formula returns $-0.35$ against a true value of $1$, so it is clearly an unstable method for solving general least squares problems

#### `QR factorization` and back substitution

We can also use a reduced QR factorization followed by back substitution to solve the least squares problem

* $A=QR$ with `Householder`, `Givens`, or `MGS`
* Update $b\leftarrow Q^Tb$
* Solve upper triangular system $Rx=b$ for $x$ with `back substitution`
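
The three steps above can be sketched with NumPy's built-in `np.linalg.qr` before turning to hand-written factorizations. The block rebuilds the Vandermonde problem under `*_demo` names (chosen here only to keep the snippet self-contained) and prints the final entry of the solution, which should be close to the true value of 1.

```python
import numpy as np

m_demo, n_demo = 100, 15
t_demo = np.linspace(0, 1, m_demo)
A_demo = np.vander(t_demo, n_demo, increasing=True)
b_demo = np.exp(np.sin(4 * t_demo)) / 2006.787453080206

# Step 1: reduced QR, Q is m x n with orthonormal columns, R is n x n upper triangular
Q_demo, R_demo = np.linalg.qr(A_demo, mode="reduced")

# Step 2: b <- Q^T b
c_demo = Q_demo.T @ b_demo

# Step 3: back substitution on R x = Q^T b, from the last row upwards
x_qr = np.zeros(n_demo)
for i in range(n_demo - 1, -1, -1):
    x_qr[i] = (c_demo[i] - R_demo[i, i + 1:] @ x_qr[i + 1:]) / R_demo[i, i]

print(f"x[15] = {x_qr[-1]:.10f}")
```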

##### Householder

In [None]:
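def householder(A):
    """Full QR via Householder reflections; returns Q (m x m) and R (m x n)."""
    m, n = A.shape
    R = A.copy()
    Q = np.identity(m)  # accumulates Q^T = H_n ... H_1

    for i in range(n):
        x = R[i:, i]
        # Reflect x onto a multiple of e_1, choosing the sign that avoids cancellation
        # (np.sign(0) is 0, which would break the reflector, so treat x[0] == 0 as positive)
        sign = 1.0 if x[0] >= 0 else -1.0
        v = sign * np.linalg.norm(x) * np.eye(x.shape[0])[:,0] + x
        v /= np.linalg.norm(v)

        # Apply H = I - 2 v v^T to the trailing block of R and to Q^T
        R[i:, i:] -= 2 * np.outer(v, v) @ R[i:, i:]
        Q[i:, :] -= 2 * np.outer(v, v) @ Q[i:, :]

    return Q.T, R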

def back_substitution(R, b):
    m, n = R.shape
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(R[i, i + 1:], x[i + 1:])) / R[i, i]
    return x

In [None]:
Q, R = householder(A)
b_hh = Q[:, :A.shape[1]].T @ b
x_hh = back_substitution(R[:A.shape[1],:], b_hh)
print(f"x[15] = {x_hh[14]:.10f}")

x[15] = 0.9999999830


##### Givens rotations

In [None]:
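def givens_rotation(A):
    """Full QR via Givens rotations; returns Q (m x m) and R (m x n)."""
    m, n = A.shape
    Q = np.identity(m)
    for col in range(n):
        for row in range(col+1, m):
            if A[row, col] != 0:
                # Rotation in the (col, row) plane that zeros A[row, col]
                alpha = np.sqrt(A[col, col]**2 + A[row, col]**2)
                c = A[col, col] / alpha
                s = A[row, col] / alpha

                G = np.identity(m)
                G[row, col] = -s
                G[col, row] = s
                G[col, col] = c
                G[row, row] = c

                # A = G @ A rebinds the local name, so the caller's A is untouched
                A = G @ A
                Q = Q @ G.T

    return Q, A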

In [None]:
Q, R = givens_rotation(A)
b_givens = Q[:, :A.shape[1]].T @ b
x_givens = back_substitution(R[:A.shape[1],:], b_givens)
print(f"x[15] = {x_givens[14]:.10f}")

x[15] = 1.0000000564


##### Modified Gram-Schmidt

In [None]:
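def gram_schmidt(A, modified=True):
    """Reduced QR via modified (default) or classical Gram-Schmidt; returns Q (m x n), R (n x n)."""
    n = A.shape[1]
    Q = np.zeros_like(A)
    R = np.zeros((n, n))

    for i in range(n):
        q = A[:, i].copy()

        for j in range(i):
            if modified:
                # MGS: project against the partially orthogonalized vector q
                R[j, i] = np.dot(Q[:, j], q)
            else:
                # CGS: project against the original column
                R[j, i] = np.dot(Q[:, j], A[:, i])
            q -= R[j, i] * Q[:, j]

        R[i, i] = np.sqrt(np.dot(q, q))
        q /=  R[i, i]
        Q[:, i] = q

    return Q, R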

In [None]:
Q, R = gram_schmidt(A)
b_mgs = Q.T @ b
x_mgs = back_substitution(R, b_mgs)
print(f"x[15] = {x_mgs[14]:.10f}")

x[15] = 1.0172653298


Householder and Givens both recover $x_{15}$ to about 7 digits, while MGS with an explicitly formed $Q^Tb$ is accurate to only about 2 digits. However, MGS can be stabilized by using an augmented system of equations, in which $Q^Tb$ is formed implicitly; see NLA Theorem 19.2
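
One common way to realize the augmented approach is to run MGS on the matrix $[A \;\; b]$ and read $Q^Tb$ off the last column of the resulting $R$, instead of multiplying by the computed $Q$. The sketch below follows that reading; the helper name `mgs` and the `*_demo` variables are introduced here for illustration only.

```python
import numpy as np

def mgs(A):
    """Modified Gram-Schmidt reduced QR (columns assumed linearly independent)."""
    n = A.shape[1]
    Q = np.zeros_like(A, dtype=float)
    R = np.zeros((n, n))
    for i in range(n):
        q = A[:, i].astype(float)      # astype returns a copy
        for j in range(i):
            R[j, i] = Q[:, j] @ q      # project against the updated q (MGS)
            q -= R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(q)
        Q[:, i] = q / R[i, i]
    return Q, R

m_demo, n_demo = 100, 15
t_demo = np.linspace(0, 1, m_demo)
A_demo = np.vander(t_demo, n_demo, increasing=True)
b_demo = np.exp(np.sin(4 * t_demo)) / 2006.787453080206

# Factor the augmented matrix [A | b]; the last column of R then holds Q^T b
_, R_aug = mgs(np.column_stack([A_demo, b_demo]))
R_demo = R_aug[:n_demo, :n_demo]
c_demo = R_aug[:n_demo, n_demo]

# Back substitution on R x = Q^T b
x_aug = np.zeros(n_demo)
for i in range(n_demo - 1, -1, -1):
    x_aug[i] = (c_demo[i] - R_demo[i, i + 1:] @ x_aug[i + 1:]) / R_demo[i, i]

print(f"x[15] = {x_aug[-1]:.10f}")
```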

Overall, one might as well use the simplest and cheapest of these, the standard `Householder` triangularization

#### Cholesky/LDLT factorization

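We can also solve the least squares problem with a Cholesky decomposition of the normal equations, provided $A^TA$ is PD (which holds in exact arithmetic when $A$ has full column rank)

$$\min_x\|Ax-b\| \Longleftrightarrow A^TAx=A^Tb \Longleftrightarrow Bx=c \Longleftrightarrow LL^Tx=c$$

where $B=A^TA$ and $c=A^Tb$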

Then

* solve $Ly=c$ using `forward` substitution
* solve $L^Tx=y$ using `backward` substitution
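
The two triangular solves can be sketched with NumPy's built-in `np.linalg.cholesky` on a small, well-conditioned random problem (sizes, seed, and names are illustrative assumptions). `np.linalg.solve` is a general solver and is used here only for brevity; it does not exploit the triangular structure.

```python
import numpy as np

rng = np.random.default_rng(4)
A_demo = rng.standard_normal((40, 5))
b_demo = rng.standard_normal(40)

B_demo = A_demo.T @ A_demo    # Gram matrix, SPD when A has full column rank
c_demo = A_demo.T @ b_demo

L_demo = np.linalg.cholesky(B_demo)          # lower triangular, B = L L^T

y_demo = np.linalg.solve(L_demo, c_demo)     # forward step:  L y = c
x_chol = np.linalg.solve(L_demo.T, y_demo)   # backward step: L^T x = y

# Reference solution for comparison
x_ref, *_ = np.linalg.lstsq(A_demo, b_demo, rcond=None)
print(np.linalg.norm(x_chol - x_ref))
```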

Or, if $A^TA$ is symmetric and nonsingular (with nonzero pivots, since no pivoting is used here), we use the LDLT factorization

* $A^TA=LDL^T$
* Forward substitution for $Ly=A^Tb$
* Scaling for $Dz=y$
* Back substitution for $L^Tx=z$

In [None]:
def cholesky_factorization(A):
    m = A.shape[0]
    l_mat = A.copy().astype(float)

    for k in range(m):
        if l_mat[k, k] <= 0:
            raise ValueError('Matrix is not positive definite')

        # Schur complement update of the trailing block, then scale column k;
        # the next iteration repeats this on the smaller trailing block
        l_mat[k+1:, k+1:] -= np.outer(l_mat[k+1:, k], l_mat[k+1:, k]) / l_mat[k, k]
        l_mat[k:, k] /= np.sqrt(l_mat[k, k])

    return np.tril(l_mat)

def ldlt_factorization(A):
    m = A.shape[0]
    l_mat = A.copy().astype(float)
    d_mat = np.zeros(m)

    for k in range(m):
        d_mat[k] = l_mat[k, k]
        if l_mat[k, k] == 0:
            raise ValueError('Matrix is singular')

        l_mat[k+1:, k+1:] -= np.outer(l_mat[k+1:, k], l_mat[k+1:, k]) / l_mat[k, k]
        l_mat[k:, k] /= l_mat[k, k]

    return np.tril(l_mat), np.diag(d_mat)

def forward_substitution(L, b):
    m, n = L.shape
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - np.dot(L[i, :i], x[:i])) / L[i, i]
    return x

In [None]:
try:
    L = cholesky_factorization(A.T@A)
except ValueError as e:
    print(e)

Matrix is not positive definite


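Cholesky fails: in exact arithmetic $A^TA$ is PD because $A$ has full rank, but in floating point a non-positive pivot appears, so the computed $A^TA$ is numerically not PD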

So, we try LDLT factorization

In [None]:
try:
    L, D = ldlt_factorization(A.T@A)
    y = forward_substitution(L, A.T@b)
    z = y / np.diag(D)
    x = back_substitution(L.T, z)
    print(f"x[15] = {x[14]:.10f}")
except ValueError as e:
    print(e)

x[15] = 0.7396039357


The result ($0.74$ against a true value of $1$) is terrible

#### LU with partial pivoting

We can also try LU with partial pivoting on $A^TA$

We first find $P$, $L$, $U$ such that $PA^TA=LU$

Then, we rewrite $A^TAx=A^Tb$ as $LUx=PA^Tb$ and

* solve $Ly=PA^Tb$ using `forward` substitution
* solve $Ux=y$ using `backward` substitution

In [None]:
def lu_factorization(A):
    m, n = A.shape
    u_mat = A.copy().astype(float)
    l_mat = np.identity(m)
    p_mat = np.identity(m)

    for k in range(m-1):
        # Find pivot
        pivot = np.argmax(np.abs(u_mat[k:, k])) + k

        if pivot != k:
            # Swap rows in u, p, and l
            u_mat[[k, pivot], :] = u_mat[[pivot, k], :]
            p_mat[[k, pivot], :] = p_mat[[pivot, k], :]
            l_mat[[k, pivot], :k] = l_mat[[pivot, k], :k]

        for j in range(k + 1, m):
            l_mat[j, k] = u_mat[j, k] / u_mat[k, k]
            # Subtract a multiple of the kth row from the jth row
            u_mat[j, k:] -= l_mat[j, k] * u_mat[k, k:]

    return p_mat, l_mat, u_mat

In [None]:
p, l, u = lu_factorization(A.T@A)
y = forward_substitution(l, p@A.T@b)
x = back_substitution(u, y)
print(f"x[15] = {x[14]:.10f}")

x[15] = -0.5849236441


The result ($-0.58$ against a true value of $1$) is also terrible

#### Ill-conditionedness?

To see whether the poor performance of Cholesky/LDLT/LU is related to the ill-conditioning of $A^TA$, we repeat the experiment on a well-conditioned random problem

In [None]:
np.random.seed(42)

A_rand = np.random.randn(100, 15)
x = np.random.randn(15)
b = A_rand @ x

Cholesky

In [None]:
L = cholesky_factorization(A_rand.T @ A_rand)
c_ch = A_rand.T @ b
y_ch = forward_substitution(L, c_ch)
x_ch = back_substitution(L.T, y_ch)
print(x_ch)
print(x)
print(np.linalg.norm(x_ch - x))

[ 0.7784 -0.5512 -0.8182 -0.0034 -0.1702 -0.4532  0.6964  0.9553  0.0884
  1.4775 -1.1417 -0.1937 -0.7168 -1.8665 -0.0827]
[ 0.7784 -0.5512 -0.8182 -0.0034 -0.1702 -0.4532  0.6964  0.9553  0.0884
  1.4775 -1.1417 -0.1937 -0.7168 -1.8665 -0.0827]
2.2499811537835648e-15


LDLT

In [None]:
L, D = ldlt_factorization(A_rand.T @ A_rand)
y = forward_substitution(L, A_rand.T @ b)
z = y / np.diag(D)
x_ldlt = back_substitution(L.T, z)
print(x_ldlt)
print(x)
print(np.linalg.norm(x_ldlt - x))

[ 0.7784 -0.5512 -0.8182 -0.0034 -0.1702 -0.4532  0.6964  0.9553  0.0884
  1.4775 -1.1417 -0.1937 -0.7168 -1.8665 -0.0827]
[ 0.7784 -0.5512 -0.8182 -0.0034 -0.1702 -0.4532  0.6964  0.9553  0.0884
  1.4775 -1.1417 -0.1937 -0.7168 -1.8665 -0.0827]
2.2304773745768605e-15


LU with partial pivoting

In [None]:
p, l, u = lu_factorization(A_rand.T @ A_rand)
y = forward_substitution(l, p @ A_rand.T @ b)
x_lu = back_substitution(u, y)
print(x_lu)
print(x)
print(np.linalg.norm(x_lu - x))

[ 0.7784 -0.5512 -0.8182 -0.0034 -0.1702 -0.4532  0.6964  0.9553  0.0884
  1.4775 -1.1417 -0.1937 -0.7168 -1.8665 -0.0827]
[ 0.7784 -0.5512 -0.8182 -0.0034 -0.1702 -0.4532  0.6964  0.9553  0.0884
  1.4775 -1.1417 -0.1937 -0.7168 -1.8665 -0.0827]
7.670464870296537e-16


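All three methods recover $x$ to roughly machine precision on this well-conditioned problem

Therefore, the failures above come from forming $A^TA$, which squares the condition number: $\kappa(A^TA)=\kappa(A)^2$. For the Vandermonde matrix, $\kappa(A)\approx 2.3\times 10^{10}$ gives $\kappa(A^TA)\approx 5\times 10^{20}$, far beyond $1/\epsilon_{\text{machine}}\approx 4.5\times 10^{15}$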
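
A quick numerical check of how forming $A^TA$ affects conditioning, using $\kappa_2(A^TA)=\kappa_2(A)^2$ (the singular values of $A^TA$ are the squares of those of $A$). The sketch compares a Gaussian random matrix with the Vandermonde matrix; the seed and variable names are assumptions made for illustration.

```python
import numpy as np

eps = np.finfo(float).eps

# Well-conditioned case: a Gaussian random matrix (seed chosen arbitrarily)
rng = np.random.default_rng(5)
A_good = rng.standard_normal((100, 15))
kappa_good = np.linalg.cond(A_good)
kappa_gram_good = np.linalg.cond(A_good.T @ A_good)

# Ill-conditioned case: the 100 x 15 Vandermonde matrix on [0, 1]
t_demo = np.linspace(0, 1, 100)
A_bad = np.vander(t_demo, 15, increasing=True)
kappa_bad = np.linalg.cond(A_bad)

print(f"random:      kappa(A)^2 = {kappa_good**2:.3e}, kappa(A^T A) = {kappa_gram_good:.3e}")
print(f"Vandermonde: kappa(A)^2 = {kappa_bad**2:.3e}, 1/eps = {1 / eps:.3e}")
```

For the Vandermonde case the sketch reports $\kappa(A)^2$ rather than a condition number computed from the floating-point $A^TA$, since the latter matrix is numerically singular and its computed condition number is not reliable.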

As a result, for this least squares problem we should stay with QR (Householder) if we want a factorization-based method

| **Method**        | **Solves Least Squares?** | **Notes on Numerical Stability**          |
|-------------------|---------------------------|-------------------------------------------|
| **Householder**   | **Yes**                   | Numerically stable (does not form $A^TA$) |
| **Cholesky/LDLT** | **Yes** (via $A^TA$)      | Less stable due to forming $A^TA$         |
| **LU**            | **Yes** (via $A^TA$)      | Less stable due to forming $A^TA$         |
