---
# Section 3.3: Solution of the Least Squares Problem
---

The least squares problem is

$$
\min_x \|b - Ax\|_2
$$

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $m > n$ (i.e., the linear system $Ax = b$ is overdetermined).

---

> ### Theorem: ($QR$ for an $m \times n$ matrix $A$)
>
> Let $A \in \mathbb{R}^{m \times n}$, $m > n$. Then:
> 1. There exist an orthogonal matrix $Q \in \mathbb{R}^{m \times m}$ and
>
>   $$R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} \in \mathbb{R}^{m \times n}$$
>
>   with $\hat{R} \in \mathbb{R}^{n \times n}$ upper-triangular such that
>   $A = QR$.
>
> 2. The matrix $A$ has **full column rank** (i.e., $\mathrm{rank}(A) = n$) if and only if $\hat{R}$ is nonsingular.

---
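
The theorem can be checked numerically on a small example. The following is a minimal sketch, assuming a hand-picked $5 \times 2$ matrix; the names `Qfull` and `R` are chosen here for illustration, and the full $m \times m$ factor is formed explicitly only to display the structure (in practice $Q$ is kept in factored form).

```julia
using LinearAlgebra

m, n = 5, 2
A = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0; 9.0 11.0]

F = qr(A)
Qfull = F.Q * Matrix{Float64}(I, m, m)   # explicit m×m orthogonal factor
Rhat = F.R                                # n×n upper-triangular block
R = [Rhat; zeros(m - n, n)]               # m×n factor with a zero block below Rhat

norm(Qfull' * Qfull - I)                  # ≈ 0, so Q is orthogonal
norm(A - Qfull * R)                       # ≈ 0, so A = QR
all(!iszero, diag(Rhat))                  # nonzero diagonal, so Rhat is nonsingular
```

Since `Rhat` is triangular, it is nonsingular exactly when its diagonal entries are all nonzero, which matches `rank(A) == n` for this matrix.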

### Using $QR$ to solve the least squares problem

First factor $A = QR$. Then,

$$
\|b - Ax\|_2 = \left\|Q^Tb - Rx\right\|_2.
$$
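
The identity above rests on the fact that multiplying by an orthogonal matrix preserves the 2-norm. Written out, using $QQ^T = I$, $Q^TQ = I$, and $A = QR$:

$$
\|b - Ax\|_2^2 = (b - Ax)^T Q Q^T (b - Ax) = \left\|Q^T(b - Ax)\right\|_2^2 = \left\|Q^Tb - Q^TQRx\right\|_2^2 = \left\|Q^Tb - Rx\right\|_2^2.
$$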

Let $c = Q^Tb$ and partition $c$ conformably with $R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}$; i.e.,

$$
c = \begin{bmatrix} \hat{c} \\ d \end{bmatrix},
$$

where $\hat{c} \in \mathbb{R}^n$ and $d \in \mathbb{R}^{m-n}$.


Then

$$
\|b - Ax\|_2^2 = \left\|\begin{bmatrix} \hat{c} - \hat{R}x \\ d \end{bmatrix}\right\|_2^2
= \left\|\hat{c} - \hat{R}x\right\|_2^2 + \|d\|_2^2 \geq \|d\|_2^2, \qquad \text{for all $x \in \mathbb{R}^n$.}
$$

Therefore,

$$
\min_x \|b - Ax\|_2 \geq \|d\|_2.
$$

---

## Full column rank case

When $A$ has full column rank, the matrix $\hat{R}$ is nonsingular by part 2 of the theorem.

Let $x$ be the **unique** solution of the upper-triangular system 

$$\hat{R}x = \hat{c}.$$

Then

$$
\|b - Ax\|_2^2 
= \left\|\hat{c} - \hat{R}x\right\|_2^2 + \|d\|_2^2 = \|d\|_2^2.
$$

Thus, the vector $x$ achieves the minimum possible value and is the **unique** solution of the least squares problem.

---

## Algorithm to solve least squares when $A$ has full column rank

1. Compute the $QR$ decomposition of $A$, but do not form $Q$.
2. Compute $c = Q^Tb$.
3. Solve $\hat{R} x = \hat{c}$, where $\hat{c} = c[1:n]$.
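
The three steps above can be sketched as a small function. This is an illustration under stated assumptions rather than a library routine: `lsq_qr` is a hypothetical name, the $3 \times 2$ test problem is made up for the example, and full column rank is assumed without being checked.

```julia
using LinearAlgebra

# Hypothetical helper (not part of LinearAlgebra): least squares via QR,
# assuming A has full column rank.
function lsq_qr(A::AbstractMatrix, b::AbstractVector)
    m, n = size(A)
    F = qr(A)                            # Step 1: QR with Q kept in factored form
    c = F.Q' * b                         # Step 2: c = Qᵀb without forming Q
    x = UpperTriangular(F.R) \ c[1:n]    # Step 3: back substitution for R̂x = ĉ
    resnorm = norm(c[n+1:m])             # ‖d‖₂, the minimal residual norm
    return x, resnorm
end

# Made-up example: fit a line through the points (0,1), (1,2), (2,4).
A = [1.0 0.0; 1.0 1.0; 1.0 2.0]
b = [1.0, 2.0, 4.0]
x, resnorm = lsq_qr(A, b)
```

Returning $\|d\|_2$ alongside $x$ gives the residual norm at no extra cost, since $d$ is already available from step 2.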


---

## Example

In [None]:
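using LinearAlgebra

m, n = 5, 2
A = rand(m, n)            # random m×n matrix (full column rank with probability 1)
x = rand(n)               # "true" coefficients
b = A*x + 0.1*randn(m)    # noisy right-hand side
A\b                       # reference least squares solution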

In [None]:
F = qr(A)

Q = F.Q

c = Q'*b  # Done efficiently without forming Q

In [None]:
Rhat = F.R            # n×n upper-triangular factor
xhat = Rhat\c[1:n]    # solve R̂x = ĉ

In [None]:
norm(b - A*xhat)

In [None]:
d = c[n+1:m]
norm(d)   # agrees with norm(b - A*xhat) up to rounding

---

## Rank-deficient case

If $A \in \mathbb{R}^{m \times n}$, $m > n$, does not have full column rank, i.e., its columns are linearly dependent and

$$
\mathrm{rank}(A) = r < n,
$$

then the $QR$ decomposition is

$$
\hat{A} = Q R, \qquad R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix},
$$

where $R_{11} \in \mathbb{R}^{r \times r}$ is upper-triangular and nonsingular, and $\hat{A}$ is $A$ with its columns permuted.

The pivoted $QR$ factorization of $A$ reorders the columns of $A$ so that

$$
|r_{11}| \geq |r_{22}| \geq \cdots \geq |r_{nn}|.
$$

This reordering can be represented as a vector $p$ or as a permutation matrix $P$ such that

$$
A[:,p] = QR,\qquad AP = QR.
$$

The matrix $P$ is the identity matrix $I$ with columns reordered according to $p$,

$$
P = I[:,p].
$$
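
As a small worked illustration (the values are chosen here for the example and do not come from the computations in this section), take $n = 3$ and $p = (3, 1, 2)$. Writing $a_j$ for the $j$th column of $A$,

$$
P = I[:,p] = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \qquad
AP = \begin{bmatrix} a_3 & a_1 & a_2 \end{bmatrix} = A[:,p].
$$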

---

## Pivoted `qr`

In [None]:
?qr

In [None]:
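B = rand(1:10, 5, 2)*diagm([1, 10])   # 5×2 integer matrix, second column scaled by 10
A = Float64[B B*rand(1:3, 2, 2)]      # 5×4: last two columns are combinations of the first two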

In [None]:
r = rank(A)

In [None]:
# Column-pivoted QR; ColumnNorm() requires Julia ≥ 1.7 (older versions use Val(true))
Q, R, p = qr(A, ColumnNorm())

In [None]:
R11 = R[1:r, 1:r]   # leading r×r nonsingular block

In [None]:
R12 = R[1:r, r+1:end]   # remaining columns of the first r rows

In [None]:
A[:,p] - Q*R

In [None]:
m, n = size(A)
Id = Matrix(I, n, n)
P = Id[:,p]

In [None]:
A*P - Q*R
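
The ordering $|r_{11}| \geq |r_{22}| \geq \cdots \geq |r_{nn}|$ also suggests a simple way to estimate the rank from the diagonal of $R$. The following is a minimal sketch, not the method used by `rank` (which is based on singular values): the test matrix, the threshold `tol`, and the name `r_est` are assumptions made for illustration, and `ColumnNorm()` assumes Julia 1.7 or later.

```julia
using LinearAlgebra

# Illustrative 5×4 matrix of rank 2: the last two columns are
# linear combinations of the first two.
B = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0; 9.0 10.0]
A = [B B*[1.0 2.0; 3.0 1.0]]

F = qr(A, ColumnNorm())          # column-pivoted QR
rdiag = abs.(diag(F.R))          # |r_11|, |r_22|, ..., |r_nn|

# Assumed threshold: treat diagonal entries below tol as numerically zero.
tol = sqrt(eps()) * rdiag[1]
r_est = count(>(tol), rdiag)     # estimated rank
```

The choice of `tol` is a judgment call; a threshold that is too small can count rounding noise on the diagonal as genuine rank.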

---

## Rank-deficient least squares

Suppose that the pivoted $QR$ decomposition of $A$ is $AP = QR$, and that $\mathrm{rank}(A) = r < n$, so that

$$
R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix},
$$

where $R_{11}$ is $r \times r$, upper-triangular, and nonsingular.

Then

$$
\|b - Ax\|_2^2 = \|\hat{c} - R_{11}\hat{x}_1 - R_{12}\hat{x}_2\|_2^2 + \|d\|_2^2 \geq \|d\|_2^2,
$$

where 

$$
\hat{x} = P^T x = \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix}, \qquad 
\begin{bmatrix} \hat{c} \\ d \end{bmatrix}  = Q^T b.
$$
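
The first equality follows as in the full-rank case. Since $P$ is a permutation matrix, $PP^T = I$, so $AP = QR$ gives $A = QRP^T$ and

$$
\|b - Ax\|_2^2 = \left\|Q^Tb - RP^Tx\right\|_2^2
= \left\| \begin{bmatrix} \hat{c} \\ d \end{bmatrix} - \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} \right\|_2^2
= \|\hat{c} - R_{11}\hat{x}_1 - R_{12}\hat{x}_2\|_2^2 + \|d\|_2^2.
$$

Here $\hat{c}, \hat{x}_1 \in \mathbb{R}^r$, $\hat{x}_2 \in \mathbb{R}^{n-r}$, and $d \in \mathbb{R}^{m-r}$.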

Also, $\|b - Ax\|_2^2 = \|d\|_2^2$ when

$$
R_{11}\hat{x}_1 + R_{12}\hat{x}_2 = \hat{c}.
$$

Thus, the entries of $\hat{x}_2$ are free (independent) variables, and $\hat{x}_1$ is determined by them (dependent variables):

$$
\hat{x}_1 = R_{11}^{-1}\left(\hat{c} - R_{12} \hat{x}_2\right).
$$

---

### Sparse solution:

$$
\hat{x}_2 = 0 \quad \Rightarrow \quad \hat{x}_1 = R_{11}^{-1}\hat{c}
$$

In [None]:
b = rand(m)

c = Q'b
ĉ = c[1:r]           # first r entries of Qᵀb

x = zeros(n)
x[p[1:r]] = R11\ĉ    # x̂₁ = R₁₁⁻¹ĉ, placed back in the original column order
x

In [None]:
norm(b - A*x)
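
The sparse-solution steps can be packaged into a small function. This is a sketch under stated assumptions: `basic_solution` is a hypothetical name (not part of `LinearAlgebra`), the rank is estimated from the diagonal of $R$ with an assumed relative threshold `rtol`, the test data are made up, and `ColumnNorm()` assumes Julia 1.7 or later.

```julia
using LinearAlgebra

# Hypothetical helper: least squares solution with at most r nonzeros,
# obtained by setting the free variables x̂₂ to zero.
function basic_solution(A, b; rtol = sqrt(eps()))
    n = size(A, 2)
    F = qr(A, ColumnNorm())                  # AP = QR with column pivoting
    R, p = F.R, F.p
    rdiag = abs.(diag(R))
    r = count(>(rtol * rdiag[1]), rdiag)     # estimated rank (assumed threshold)
    chat = (F.Q' * b)[1:r]                   # first r entries of Qᵀb
    x = zeros(n)
    x[p[1:r]] = UpperTriangular(R[1:r, 1:r]) \ chat   # x̂₁ = R₁₁⁻¹ĉ, x̂₂ = 0
    return x
end

B = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0; 9.0 10.0]
A = [B B*[1.0 2.0; 3.0 1.0]]                 # rank 2 by construction
b = [1.0, 0.0, 2.0, 0.0, 3.0]
x = basic_solution(A, b)
```

One way to confirm optimality is to check that the residual `b - A*x` is orthogonal to the columns of `A`, i.e., that `A' * (b - A*x)` is zero up to rounding.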

---

### Least-norm solution:

Let $S$ be the set of optimal solutions of the least-squares problem:

$$
S = \operatorname*{argmin}_x \|b - Ax\|_2.
$$

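For every $x \in S$ we have $\|x\|_2 = \|\hat{x}\|_2$, since $\hat{x} = P^Tx$ is a permutation of $x$, and $\hat{x}_1 = R_{11}^{-1}(\hat{c} - R_{12}\hat{x}_2)$. The least-norm solution is therefore found by solving the following least squares problem in $\hat{x}_2$, where the vector inside the last norm is $\begin{bmatrix} \hat{x}_1 \\ -\hat{x}_2 \end{bmatrix}$ and so has the same norm as $\hat{x}$:

$$
\min_{x \in S} \|x\|_2 = \min_{\hat{x}_2} \|\hat{x}\|_2 
= \min_{\hat{x}_2} \left\| \begin{bmatrix} R_{11}^{-1}\hat{c} \\ 0 \end{bmatrix} - 
\begin{bmatrix} R_{11}^{-1} R_{12} \\ I \end{bmatrix} \hat{x}_2 \right\|_2,
$$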

which has a unique solution $\hat{x}_2$ since the matrix 

$$
\begin{bmatrix} R_{11}^{-1} R_{12} \\ I \end{bmatrix}
$$

has full column rank.
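
To see why, write $W = R_{11}^{-1}R_{12}$ (a symbol introduced here only for brevity) and suppose the matrix maps some $z \in \mathbb{R}^{n-r}$ to zero. The lower block row forces $z = 0$:

$$
\begin{bmatrix} W \\ I \end{bmatrix} z = \begin{bmatrix} Wz \\ z \end{bmatrix} = 0 \quad \Rightarrow \quad z = 0.
$$

Hence its $n - r$ columns are linearly independent, and the full column rank case treated earlier applies to this smaller least squares problem.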

In [None]:
# The least-squares solution given by \ is the least-norm solution

x = A\b

In [None]:
# Using the formulas from above
x = zeros(n)
x[p[r+1:n]] = [R11\R12; I]\[R11\ĉ; zeros(n-r)]   # x̂₂: solve the small least squares problem
x[p[1:r]] = R11\(ĉ - R12*x[p[r+1:n]])             # x̂₁ = R₁₁⁻¹(ĉ - R₁₂x̂₂)
x

In [None]:
norm(b - A*x)
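
As a cross-check on the formulas above, the least-norm solution should agree with the pseudoinverse solution `pinv(A)*b` and should be no longer than the sparse solution. The sketch below is self-contained; the test data, the rank threshold, and the ASCII names `chat`, `xhat1`, `xhat2` are choices made here for illustration, and `ColumnNorm()` assumes Julia 1.7 or later.

```julia
using LinearAlgebra

B = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0; 9.0 10.0]
A = [B B*[1.0 2.0; 3.0 1.0]]                 # 5×4, rank 2 by construction
b = [1.0, 0.0, 2.0, 0.0, 3.0]
m, n = size(A)

F = qr(A, ColumnNorm())                      # AP = QR with column pivoting
R, p = F.R, F.p
rdiag = abs.(diag(R))
r = count(>(sqrt(eps()) * rdiag[1]), rdiag)  # estimated rank (assumed threshold)

R11 = UpperTriangular(R[1:r, 1:r])
R12 = R[1:r, r+1:n]
chat = (F.Q' * b)[1:r]

# Sparse solution: free variables set to zero.
x_basic = zeros(n)
x_basic[p[1:r]] = R11 \ chat

# Least-norm solution: small least squares problem for the free variables.
M = [R11 \ R12; Matrix{Float64}(I, n - r, n - r)]
xhat2 = M \ [R11 \ chat; zeros(n - r)]
xhat1 = R11 \ (chat - R12 * xhat2)
x_min = zeros(n)
x_min[p] = [xhat1; xhat2]                    # undo the column permutation

x_pinv = pinv(A; rtol = sqrt(eps())) * b     # pseudoinverse solution for comparison
```

Both `x_basic` and `x_min` attain the same residual norm; they differ only in which optimal solution they select.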

---