---
# Section 3.3: Solution of the Least Squares Problem
---

The least squares problem is

$$
\min_x \|b - Ax\|_2
$$

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $m > n$ (i.e., the linear system $Ax = b$ is overdetermined).

---

> ### Theorem: ($QR$ for $m \times n$ matrix $A$)
>
> Let $A \in \mathbb{R}^{m \times n}$, $m > n$. Then:
> 1. There exists an orthogonal matrix $Q \in \mathbb{R}^{m \times m}$ and 
> $$R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} \in \mathbb{R}^{m \times n}$$
> with $\hat{R} \in \mathbb{R}^{n \times n}$ upper-triangular, such that $A = QR$.
> 2. The matrix $A$ has **full column rank** (i.e., $\mathrm{rank}(A) = n$) if and only if $\hat{R}$ is nonsingular.
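
A minimal sketch of the theorem's first statement, written for Julia 1.x with the `LinearAlgebra` standard library (the notebook cells below use the older pre-1.0 API such as `qrfact`). It materializes the full $m \times m$ factor $Q$ and the padded $R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}$ and checks $A = QR$; the sizes and the random seed are arbitrary assumptions.

```julia
using LinearAlgebra, Random

Random.seed!(0)                          # arbitrary seed, for reproducibility
m, n = 5, 3
A = randn(m, n)                          # full column rank with probability 1

F = qr(A)                                # Householder QR, stored in compact form
Q = F.Q * Matrix{Float64}(I, m, m)       # full m×m orthogonal factor
Rhat = F.R                               # n×n upper-triangular block R̂
R = [Rhat; zeros(m - n, n)]              # R = [R̂; 0] is m×n

println("‖QᵀQ - I‖ = ", norm(Q' * Q - I))    # ≈ 0: Q is orthogonal
println("‖A - QR‖ = ", norm(A - Q * R))      # ≈ 0: A = QR
println("R̂ upper-triangular: ", istriu(Rhat))
# nonzero smallest |r̂ᵢᵢ| ⇔ R̂ nonsingular ⇔ rank(A) = n
println("min |r̂ᵢᵢ| = ", minimum(abs.(diag(Rhat))))
```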

---

### Using $QR$ to solve the least squares problem

First factor $A = QR$. Since $Q$ is orthogonal, multiplication by $Q^T$ preserves the 2-norm, so

$$
\|b - Ax\|_2 = \left\|Q^T(b - Ax)\right\|_2 = \left\|Q^Tb - Rx\right\|_2.
$$

Let $c = Q^Tb$ and partition $c$ conformably with $R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}$, i.e.,

$$
c = \begin{bmatrix} \hat{c} \\ d \end{bmatrix},
$$

where $\hat{c} \in \mathbb{R}^n$ and $d \in \mathbb{R}^{m-n}$.

Then

$$
\|b - Ax\|_2^2 = \left\|\begin{bmatrix} \hat{c} - \hat{R}x \\ d \end{bmatrix}\right\|_2^2
= \left\|\hat{c} - \hat{R}x\right\|_2^2 + \|d\|_2^2 \geq \|d\|_2^2, \qquad \text{for all $x \in \mathbb{R}^n$.}
$$

Therefore,

$$
\min_x \|b - Ax\|_2 \geq \|d\|_2.
$$
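
The norm identity and the lower bound $\|d\|_2$ can be checked numerically for an arbitrary trial vector $x$. A sketch in Julia 1.x, with random data whose sizes and seed are assumptions; `F.Q' * b` applies $Q^T$ through the stored Householder reflectors without forming $Q$ and returns all $m$ entries of $c$.

```julia
using LinearAlgebra, Random

Random.seed!(1)                          # arbitrary seed
m, n = 6, 2
A = randn(m, n)
b = randn(m)

F = qr(A)
c = F.Q' * b                             # c = Qᵀb, all m entries, Q not formed explicitly
chat, d = c[1:n], c[n+1:m]               # partition conformably with R = [R̂; 0]

x = randn(n)                             # an arbitrary trial vector
lhs = norm(b - A * x)
rhs = norm([chat - F.R * x; d])          # ‖Qᵀb - Rx‖₂ written blockwise
println(lhs ≈ rhs)                       # true: Qᵀ preserves the 2-norm
println(lhs >= norm(d))                  # true: ‖d‖₂ bounds the residual from below
```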

---

## Full column rank case

When $A$ has full column rank, the matrix $\hat{R}$ is nonsingular.

Let $x$ be the **unique** solution of the upper-triangular system 
$$\hat{R}x = \hat{c}.$$

Then
$$
\|b - Ax\|_2^2 
= \left\|\hat{c} - \hat{R}x\right\|_2^2 + \|d\|_2^2 = \|d\|_2^2.
$$

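Thus $x$ attains the lower bound $\|d\|_2$. It is the **unique** solution of the least squares problem: any minimizer must make $\|\hat{c} - \hat{R}x\|_2 = 0$, i.e., solve $\hat{R}x = \hat{c}$, and this system has exactly one solution.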
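
Solving $\hat{R}x = \hat{c}$ takes $O(n^2)$ operations by back substitution, working from the last equation upward. A minimal hand-written version in Julia; `backsub` is a hypothetical helper name, and in practice `UpperTriangular(Rhat) \ chat` from `LinearAlgebra` does the same job.

```julia
using LinearAlgebra

# Hypothetical helper: solve the upper-triangular system R x = c by back substitution.
function backsub(R::AbstractMatrix, c::AbstractVector)
    n = length(c)
    T = float(promote_type(eltype(R), eltype(c)))
    x = zeros(T, n)
    for i in n:-1:1                      # last equation first
        iszero(R[i, i]) && throw(SingularException(i))
        s = c[i]
        for j in i+1:n
            s -= R[i, j] * x[j]          # subtract the already-known terms
        end
        x[i] = s / R[i, i]
    end
    return x
end

Rhat = [2.0 1.0 -1.0;
        0.0 3.0  2.0;
        0.0 0.0  4.0]
chat = [1.0, 7.0, 8.0]
x = backsub(Rhat, chat)
println(x)                                 # [1.0, 1.0, 2.0]
println(x ≈ UpperTriangular(Rhat) \ chat)  # true
```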

---

## Algorithm to solve least squares when $A$ has full column rank

1. Compute the $QR$ decomposition of $A$.
2. Compute $c = Q^Tb$.
3. Solve $\hat{R} x = \hat{c}$ by back substitution, where $\hat{c} = c[1:n]$.
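
The three steps translate almost line for line into Julia 1.x. A sketch, assuming $A$ has full column rank and $m > n$; `lsq_qr` is a hypothetical name, and it also returns $\|d\|_2$, the minimal residual norm. The test data (seed, sizes, true coefficients, noise level) are arbitrary.

```julia
using LinearAlgebra, Random

# Hypothetical helper; assumes A is m×n with m > n and full column rank.
function lsq_qr(A::AbstractMatrix, b::AbstractVector)
    m, n = size(A)
    F = qr(A)                            # step 1: A = QR
    c = F.Q' * b                         # step 2: c = Qᵀb (Q applied implicitly)
    x = UpperTriangular(F.R) \ c[1:n]    # step 3: back substitution for R̂x = ĉ
    return x, norm(c[n+1:m])             # solution and minimal residual ‖d‖₂
end

Random.seed!(2)                          # arbitrary seed
A = randn(5, 2)
b = A * [1.0, -2.0] + 0.1 * randn(5)     # noisy right-hand side
x, resid = lsq_qr(A, b)
println(x)
println(norm(b - A * x) ≈ resid)         # true: the residual equals ‖d‖₂
println(x ≈ A \ b)                       # true: agrees with the built-in solver
```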


---

## Example

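```julia
m, n = 5, 2
A = rand(m, n)            # random tall matrix (full column rank with probability 1)
x = rand(n)
b = A*x + .1*randn(m)     # noisy right-hand side, so Ax = b is generally inconsistent
A\b                       # built-in least squares solution
```

```
2-element Array{Float64,1}:
 -0.24218
  0.775855
```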

```julia
F = qrfact(A)   # QR factorization object (pre-1.0 Julia; Julia 1.x uses F = qr(A) and F.Q)

Q = F[:Q]
c = Q'*b  # Done efficiently without forming Q
```

```
5-element Array{Float64,1}:
 -0.813606
 -0.511264
 -0.0147933
  0.097828
 -0.143254
```

```julia
Rhat = F[:R]          # n×n upper-triangular factor R̂
xhat = Rhat\c[1:n]    # solve R̂x = ĉ
```

```
2-element Array{Float64,1}:
 -0.24218
  0.775855
```

```julia
norm(b - A*xhat)      # residual of the QR solution
```

```
0.17410042783558716
```

```julia
d = c[n+1:m]
norm(d)               # equals the residual norm above, up to rounding
```

```
0.1741004278355871
```

---

```julia
A = rand(-3:3, 4, 2)   # small random integer matrix
```

```
4x2 Array{Int64,2}:
 -1  -3
  2   1
  1   1
 -2   2
```

```julia
b = rand(-3:3, 4)
```

```
4-element Array{Int64,1}:
 -1
 -1
 -2
  0
```
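
The two cells above generate a small integer problem but stop before solving it. A sketch in Julia 1.x that applies the QR algorithm to exactly the values printed above and cross-checks the result against the normal equations $A^TAx = A^Tb$ solved in exact rational arithmetic (used here only as an independent check).

```julia
using LinearAlgebra

A = [-1 -3; 2 1; 1 1; -2 2]              # the 4×2 matrix printed above
b = [-1, -1, -2, 0]                      # the right-hand side printed above
m, n = size(A)

F = qr(float(A))                         # QR of the Float64 copy of A
c = F.Q' * float(b)                      # c = Qᵀb
xhat = UpperTriangular(F.R) \ c[1:n]     # solve R̂x = ĉ
d = c[n+1:m]

# Independent check: normal equations AᵀAx = Aᵀb in exact rational arithmetic
xexact = Rational{Int}.(A' * A) \ Rational{Int}.(A' * b)
println(xexact)                          # entries -45//146 and 3//73
println(xhat ≈ float.(xexact))           # true
println(norm(d)^2 ≈ 741 / 146)           # true: the minimal squared residual
```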

---

## Rank-deficient case

If $A \in \mathbb{R}^{m \times n}$, $m > n$, does not have full column rank, i.e., its columns are linearly dependent and

$$
\mathrm{rank}(A) = r < n,
$$

then the pivoted $QR$ decomposition has the form

$$
\hat{A} = Q R, \qquad R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix},
$$

where $R_{11} \in \mathbb{R}^{r \times r}$ is upper-triangular and nonsingular, $R_{12} \in \mathbb{R}^{r \times (n-r)}$, and $\hat{A}$ is $A$ with its columns permuted. In floating-point arithmetic, the zero blocks of the computed $R$ are zero only up to rounding error.

The pivoted $QR$ factorization of $A$ reorders the columns of $A$ so that

$$
|r_{11}| \geq |r_{22}| \geq \cdots \geq |r_{nn}|.
$$

This reordering can be represented as a vector $p$ or as a permutation matrix $P$ such that

$$
A[:,p] = QR,\qquad AP = QR.
$$

The matrix $P$ is the identity matrix $I$ with columns reordered according to $p$,

$$
P = I[:,p].
$$
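
In Julia 1.x the pivoted factorization is requested with `qr(A, ColumnNorm())` (available from Julia 1.7; earlier 1.x versions used `Val(true)`), and the result exposes both forms of the permutation as `F.p` and `F.P`. A sketch on a random full-rank matrix (seed and sizes are assumptions) checking $A[:,p] = AP = QR$ and the ordering of $|r_{kk}|$:

```julia
using LinearAlgebra, Random

Random.seed!(3)                          # arbitrary seed
A = randn(5, 4)
n = size(A, 2)

F = qr(A, ColumnNorm())                  # pivoted QR (Julia ≥ 1.7)
p = F.p                                  # permutation vector
P = Matrix{Float64}(I, n, n)[:, p]       # P = I[:, p]
Q = Matrix(F.Q)                          # thin 5×4 orthonormal factor
R = F.R

println(A[:, p] ≈ A * P)                 # true: the same column reordering
println(A * P ≈ Q * R)                   # true: AP = QR
println(P == F.P)                        # true: F.P is I[:, p]
println(issorted(abs.(diag(R)), rev = true))  # true: |r₁₁| ≥ |r₂₂| ≥ ...
```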

---

## Pivoted `qr`

```julia
?qr
```

```
qr(A [,pivot=Val{false}][;thin=true]) -> Q, R, [p]
```

Compute the (pivoted) QR factorization of `A` such that either `A = Q*R` or `A[:,p] = Q*R`. Also see `qrfact`. The default is to compute a thin factorization. Note that `R` is not extended with zeros when the full `Q` is requested.


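```julia
B = rand(1:10, 5, 2)*diagm([1, 10])   # 5×2 integer matrix, second column scaled by 10
A = Float64[B B*rand(1:3, 2, 2)]      # last two columns are combinations of B's columns, so rank(A) ≤ 2
```

```
5x4 Array{Float64,2}:
 5.0  30.0   95.0  105.0
 2.0  20.0   62.0   66.0
 8.0  70.0  218.0  234.0
 9.0  20.0   69.0   87.0
 1.0  90.0  271.0  273.0
```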

```julia
r = rank(A)
```

```
2
```

```julia
pivot = Val{true}
Q, R, p = qr(A, pivot)   # pivoted thin QR: A[:,p] = Q*R
```

```
(
5x4 Array{Float64,2}:
 -0.269111  -0.265556    0.906628   -0.183003
 -0.169156  -0.0468373  -0.0345385   0.346534
 -0.599734  -0.261581   -0.107119    0.649655
 -0.222978  -0.730388   -0.386874   -0.507417
 -0.69969    0.570434   -0.125248   -0.408534,

4x4 Array{Float64,2}:
 -390.173  -371.797   -120.869        -9.18823    
    0.0      19.0343     9.51716      -9.51716    
    0.0       0.0       -3.02013e-14   1.42028e-14
    0.0       0.0        0.0           4.82736e-15,

[4,3,2,1])
```

```julia
R11, R12 = R[1:2, 1:2], R[1:2, 3:4]   # blocks of R for r = 2
```

```
(
2x2 Array{Float64,2}:
 -390.173  -371.797 
    0.0      19.0343,

2x2 Array{Float64,2}:
 -120.869    -9.18823
    9.51716  -9.51716)
```

```julia
A[:,p] - Q*R   # zero up to rounding
```

```
5x4 Array{Float64,2}:
 2.84217e-14  -1.42109e-14  3.55271e-14  2.66454e-15
 1.42109e-14   0.0          3.55271e-15  1.11022e-15
 0.0          -2.84217e-14  0.0          8.88178e-16
 1.42109e-14   1.42109e-14  7.10543e-15  0.0        
 0.0           0.0          1.42109e-14  0.0        
```

```julia
m, n = size(A)
I = eye(n)     # n×n identity matrix
P = I[:,p]     # permutation matrix P = I[:,p]
```

```
4x4 Array{Float64,2}:
 0.0  0.0  0.0  1.0
 0.0  0.0  1.0  0.0
 0.0  1.0  0.0  0.0
 1.0  0.0  0.0  0.0
```

```julia
A*P - Q*R   # same as A[:,p] - Q*R
```

```
5x4 Array{Float64,2}:
 2.84217e-14  -1.42109e-14  3.55271e-14  2.66454e-15
 1.42109e-14   0.0          3.55271e-15  1.11022e-15
 0.0          -2.84217e-14  0.0          8.88178e-16
 1.42109e-14   1.42109e-14  7.10543e-15  0.0        
 0.0           0.0          1.42109e-14  0.0        
```
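
Because the computed trailing entries of $R$ are zero only up to rounding (about $10^{-14}$ in the output above), a numerical rank is read off the diagonal of $R$ by comparing each $|r_{kk}|$ with a tolerance relative to $|r_{11}|$. A sketch in Julia 1.x; the exactly rank-2 integer matrix and the tolerance $\sqrt{\varepsilon}\,|r_{11}|$ are assumptions chosen for illustration, not a canonical threshold.

```julia
using LinearAlgebra

B = [1.0 2.0; 3.0 1.0; 0.0 4.0; 2.0 2.0; 5.0 1.0]
A = [B B * [1.0 2.0; 1.0 3.0]]           # 5×4; columns 3 and 4 are combinations of B's columns
n = size(A, 2)

F = qr(A, ColumnNorm())                  # pivoted QR (Julia ≥ 1.7)
R = F.R
tol = sqrt(eps()) * abs(R[1, 1])         # assumed relative tolerance
r = count(k -> abs(R[k, k]) > tol, 1:n)  # numerical rank
println("numerical rank r = ", r)        # numerical rank r = 2

R11 = R[1:r, 1:r]                        # r×r, upper-triangular and nonsingular
R12 = R[1:r, r+1:n]
R22 = R[r+1:n, r+1:n]                    # zero up to rounding
println("‖R22‖ = ", norm(R22))
```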

---

## Rank-deficient least squares

Suppose that the pivoted $QR$ decomposition of $A$ is $AP = QR$, and that $\mathrm{rank}(A) = r < n$, so that

$$
R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix},
$$

where $R_{11}$ is $r \times r$, upper-triangular, and nonsingular.

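Since $P$ is orthogonal, $Ax = AP\,P^Tx = QR\hat{x}$ with $\hat{x} = P^Tx$, and multiplication by the orthogonal matrix $Q^T$ preserves the 2-norm. Therefore

$$
\|b - Ax\|_2^2 = \|Q^Tb - R\hat{x}\|_2^2 = \|\hat{c} - R_{11}\hat{x}_1 - R_{12}\hat{x}_2\|_2^2 + \|d\|_2^2 \geq \|d\|_2^2,
$$

where 

$$
\hat{x} = P^T x = \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix}, \qquad 
\begin{bmatrix} \hat{c} \\ d \end{bmatrix}  = Q^T b,
$$

with $\hat{x}_1, \hat{c} \in \mathbb{R}^r$.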

Also, $\|b - Ax\|_2^2 = \|d\|_2^2$ when

$$
R_{11}\hat{x}_1 + R_{12}\hat{x}_2 = \hat{c}.
$$

Thus $\hat{x}_2$ are the independent (free) variables and $\hat{x}_1$ are the dependent variables: every choice of $\hat{x}_2$ gives a minimizer, namely the one with

$$
\hat{x}_1 = R_{11}^{-1}\left(\hat{c} - R_{12} \hat{x}_2\right).
$$
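
Every choice of the free block $\hat{x}_2$ gives a minimizer. A Julia 1.x sketch with a hypothetical helper `lsq_from_free`; the exactly rank-2 integer test matrix, the right-hand side, and the rank $r = 2$ (known by construction) are assumptions. It builds $x$ from $\hat{x}_2$ using $\hat{x} = P^Tx$, i.e. $\hat{x} = x[p]$, and checks that two different choices give the same residual:

```julia
using LinearAlgebra

# Hypothetical helper: the minimizer x determined by a choice of x̂₂ (r = rank(A)).
function lsq_from_free(F, b, r, x2)
    n = size(F.R, 2)
    p = F.p
    R11 = UpperTriangular(F.R[1:r, 1:r])
    R12 = F.R[1:r, r+1:n]
    chat = (F.Q' * b)[1:r]
    x = zeros(n)
    x[p[r+1:n]] = x2                         # x̂₂ = x[p[r+1:n]]
    x[p[1:r]] = R11 \ (chat - R12 * x2)      # x̂₁ = R₁₁⁻¹(ĉ - R₁₂x̂₂)
    return x
end

B = [1.0 2.0; 3.0 1.0; 0.0 4.0; 2.0 2.0; 5.0 1.0]
A = [B B * [1.0 2.0; 1.0 3.0]]               # 5×4 with rank 2
b = [1.0, -1.0, 2.0, 0.5, 3.0]               # arbitrary right-hand side
F = qr(A, ColumnNorm())                      # pivoted QR (Julia ≥ 1.7)

x0 = lsq_from_free(F, b, 2, zeros(2))        # x̂₂ = 0 (the sparse solution)
x1 = lsq_from_free(F, b, 2, [1.0, -2.0])     # another choice of x̂₂
println(norm(b - A * x0) ≈ norm(b - A * x1)) # true: same minimal residual
println(x0 ≈ x1)                             # false: the minimizer is not unique
```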

---

### Sparse solution:

Setting the free variables to zero gives a solution with at most $r$ nonzero entries:

$$
\hat{x}_2 = 0 \quad \Rightarrow \quad \hat{x}_1 = R_{11}^{-1}\hat{c}
$$

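```julia
b = rand(m)

c = Q'b      # Q is the thin 5×4 factor, so c holds the first n entries of Qᵀb
ĉ = c[1:r]

x = zeros(n)
x[p[1:r]] = R11\ĉ   # x̂₁ = R₁₁⁻¹ĉ and x̂₂ = 0, where x̂ = x[p]
x
```

```
4-element Array{Float64,1}:
  0.0      
  0.0      
 -0.0338594
  0.0358564
```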

```julia
norm(b - A*x)
```

```
0.35612769417740486
```

---

### Least-norm solution:

Let $S$ be the set of optimal solutions of the least-squares problem:

$$
S = \operatorname*{argmin}_x \|b - Ax\|_2.
$$

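Since $\|x\|_2 = \|P^Tx\|_2 = \|\hat{x}\|_2$, the least-norm solution is found by solving

$$
\min_{x \in S} \|x\|_2 = \min_{\hat{x}_2 \in \mathbb{R}^{n-r}} \left\| \begin{bmatrix} R_{11}^{-1}\hat{c} \\ 0 \end{bmatrix} - 
\begin{bmatrix} R_{11}^{-1} R_{12} \\ I \end{bmatrix} \hat{x}_2 \right\|_2,
$$

where the vector inside the norm is $\begin{bmatrix} \hat{x}_1 \\ -\hat{x}_2 \end{bmatrix}$, which has the same norm as $\hat{x}$. This is a least squares problem in $\hat{x}_2$ with a unique solution, since the matrix 

$$
\begin{bmatrix} R_{11}^{-1} R_{12} \\ I \end{bmatrix}
$$

has full column rank (its lower block is the identity).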
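
The least-norm formula can be cross-checked against the pseudoinverse, since $A^+b$ is the minimum-norm least squares solution. A sketch in Julia 1.x on an exactly rank-2 integer test matrix; the matrix, the right-hand side, the known rank $r = 2$, and the `pinv` tolerance are assumptions for illustration.

```julia
using LinearAlgebra

B = [1.0 2.0; 3.0 1.0; 0.0 4.0; 2.0 2.0; 5.0 1.0]
A = [B B * [1.0 2.0; 1.0 3.0]]                 # 5×4 with rank 2
b = [1.0, -1.0, 2.0, 0.5, 3.0]
n, r = size(A, 2), 2                           # rank known by construction

F = qr(A, ColumnNorm())                        # pivoted QR (Julia ≥ 1.7)
p = F.p
R11 = UpperTriangular(F.R[1:r, 1:r])
R12 = F.R[1:r, r+1:n]
chat = (F.Q' * b)[1:r]

# Minimize ‖[R₁₁⁻¹ĉ; 0] - [R₁₁⁻¹R₁₂; I] x̂₂‖₂ over x̂₂ (a full-column-rank problem)
W = [R11 \ R12; Matrix{Float64}(I, n - r, n - r)]
g = [R11 \ chat; zeros(n - r)]
x2 = W \ g

x = zeros(n)
x[p[r+1:n]] = x2
x[p[1:r]] = R11 \ (chat - R12 * x2)

xbasic = zeros(n)                              # sparse solution (x̂₂ = 0) for comparison
xbasic[p[1:r]] = R11 \ chat

println(x ≈ pinv(A; rtol = sqrt(eps())) * b)   # true: matches the pseudoinverse solution
println(norm(x) < norm(xbasic))                # true: shorter than the sparse solution
```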

```julia
# The least-squares solution given by \ is the least-norm solution

x = A\b
```

```
4-element Array{Float64,1}:
  0.0204399
 -0.0125941
 -0.0173424
  0.0235374
```

```julia
# Using the formulas from above
x = zeros(n)
x[p[r+1:n]] = [R11\R12; eye(n-r)]\[R11\ĉ; zeros(n-r)]   # x̂₂ from the small least squares problem
x[p[1:r]] = R11\(ĉ - R12*x[p[r+1:n]])                   # x̂₁ = R₁₁⁻¹(ĉ - R₁₂x̂₂)
x
```

```
4-element Array{Float64,1}:
  0.0204399
 -0.0125941
 -0.0173424
  0.0235374
```

```julia
norm(b - A*x)
```

```
0.35612769417740486
```

---