## Chapter 9
# Orthogonalization

In [1]:
import sys
sys.path.append('../')

from vecutil import zero_vec
from orthogonalization import project_along

def project_orthogonal(b, vlist):
    for v in vlist:
        b = b - project_along(b, v)
    return b

**Problem 9.2.2:** Using hand-calculation, show the steps carried out when `project_orthogonal` is called with `b=[1,1,1]` and `vlist=[v_1,v_2]` where `v_1 = [0,2,2]` and `v_2 = [0,1,-1]`.

$\boldsymbol{b} = [1,1,1]$

After the first loop,

$\boldsymbol{b} = \boldsymbol{b} - \boldsymbol{v}_1 \frac{\boldsymbol{v}_1 \cdot \boldsymbol{b}}{\boldsymbol{v}_1 \cdot \boldsymbol{v}_1} = [1,1,1] - [0,2,2]\frac{4}{8} = [1,1,1] - [0,1,1] = [1,0,0]$

After the second loop,

$\boldsymbol{b} = \boldsymbol{b} - \boldsymbol{v}_2 \frac{\boldsymbol{v}_2 \cdot \boldsymbol{b}}{\boldsymbol{v}_2 \cdot \boldsymbol{v}_2} = [1,0,0] - [0,1,-1]\frac{0}{2} = [1,0,0] - [0,0,0] = [1,0,0]$

In [2]:
# verify:
from vecutil import list2vec

project_orthogonal(list2vec([1,1,1]), [list2vec([0,2,2]), list2vec([0,1,-1])])

Vec({0, 1, 2},{0: 1.0, 1: 0.0, 2: 0.0})

In [3]:
def orthogonalize(vlist):
    vstarlist = []
    for v in vlist:
        vstarlist.append(project_orthogonal(v, vstarlist))
    return vstarlist

**Problem 9.3.4:** Using hand-calculation, show the steps carried out when `orthogonalize` is applied to $[\boldsymbol{v}_1, \boldsymbol{v}_2, \boldsymbol{v}_3]$ where $\boldsymbol{v}_1 = [1,0,2]$, $\boldsymbol{v}_2 = [1,0,2]$, and $\boldsymbol{v}_3 = [2,0,0]$.

On the first iteration, `vstarlist` is empty, so we set $\boldsymbol{v^*}_1 = \boldsymbol{v}_1$.

Next, we compute $\boldsymbol{v^*}_2$ as the projection of $\boldsymbol{v}_2$ orthogonal to $\boldsymbol{v^*}_1$:

$\boldsymbol{v^*}_2 = \boldsymbol{v}_2 - \boldsymbol{v^*}_1 \frac{\langle \boldsymbol{v}_2, \boldsymbol{v^*}_1\rangle}{\langle \boldsymbol{v^*}_1, \boldsymbol{v^*}_1\rangle} = [1,0,2] - [1,0,2] \cdot 1 = [0,0,0]$

Next, compute $\boldsymbol{v^*}_3$ as the projection of $\boldsymbol{v}_3$ orthogonal to $[\boldsymbol{v^*}_1, \boldsymbol{v^*}_2]$:

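The loop in `project_orthogonal` first subtracts the projection along $\boldsymbol{v^*}_1$:

$\boldsymbol{v^*}_{3,1} = \boldsymbol{v}_3 - \boldsymbol{v^*}_1 \frac{\langle \boldsymbol{v}_3, \boldsymbol{v^*}_1\rangle}{\langle \boldsymbol{v^*}_1, \boldsymbol{v^*}_1\rangle} = [2,0,0] - [1,0,2] \cdot \frac{2}{5} = [\frac{8}{5},0,-\frac{4}{5}]$

It then subtracts the projection along $\boldsymbol{v^*}_2 = [0,0,0]$. The formula would divide by $\langle \boldsymbol{v^*}_2, \boldsymbol{v^*}_2\rangle = 0$, so the projection along the zero vector is taken to be the zero vector (the run below confirms that `project_along` handles this case without a division-by-zero error). Therefore

$\boldsymbol{v^*}_3 = \boldsymbol{v^*}_{3,1} - [0,0,0] = [\frac{8}{5},0,-\frac{4}{5}]$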
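To double-check the hand calculation, here is a minimal pure-Python sketch of the two procedures. It assumes vectors are plain lists (not the book's `Vec` class) and that the projection along a vector whose squared norm is below an illustrative tolerance of `1e-20` is the zero vector; the `_sketch` names are hypothetical and are not the `orthogonalization` module's code.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def project_along_sketch(b, v, tol=1e-20):
    """Projection of b along v; taken to be the zero vector when v is (nearly) zero."""
    vv = dot(v, v)
    sigma = dot(b, v) / vv if vv > tol else 0.0
    return [sigma * x for x in v]

def project_orthogonal_sketch(b, vlist):
    """Subtract from b its projection along each vector of vlist, in order."""
    for v in vlist:
        p = project_along_sketch(b, v)
        b = [x - y for x, y in zip(b, p)]
    return b

def orthogonalize_sketch(vlist):
    """Each new vector is projected orthogonal to all previously computed ones."""
    vstarlist = []
    for v in vlist:
        vstarlist.append(project_orthogonal_sketch(v, vstarlist))
    return vstarlist

vstars = orthogonalize_sketch([[1, 0, 2], [1, 0, 2], [2, 0, 0]])
print(vstars)  # second vector is zero; third is approximately [1.6, 0.0, -0.8]
```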

In [4]:
# verify:
orthogonalize([list2vec([1,0,2]), list2vec([1,0,2]), list2vec([2,0,0])])

[Vec({0, 1, 2},{0: 1, 1: 0, 2: 2}),
 Vec({0, 1, 2},{0: 0.0, 1: 0.0, 2: 0.0}),
 Vec({0, 1, 2},{0: 1.6, 1: 0.0, 2: -0.8})]

In [5]:
def find_orthogonal_complement(U_basis, W_basis):
    Lstar = orthogonalize(U_basis + W_basis)
    return [L for L in Lstar[len(U_basis):] if not L.is_almost_zero()]

## 9.10 Review questions

**What does it mean to normalize a vector?**

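Normalizing a nonzero vector $\boldsymbol{v}$ means dividing it by its norm $\|\boldsymbol{v}\|$ (that is, dividing each entry by $\|\boldsymbol{v}\|$), which gives a vector of norm 1 pointing in the same direction.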
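A minimal sketch of normalization with plain Python lists (the name `normalize` is illustrative, not from the book's modules):

```python
import math

def normalize(v):
    """Return v divided by its Euclidean norm; v must be nonzero."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

u = normalize([3, 4])
print(u)  # [0.6, 0.8]
```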

**What does it mean for several vectors to be mutually orthogonal?**

For a set of vectors to be mutually orthogonal, each vector in the set must be orthogonal to every other vector in the set.

**What are orthonormal vectors? What is an orthonormal basis?**

A set of vectors is orthonormal if the vectors are mutually orthogonal and each has norm 1. An orthonormal basis for a vector space is a basis consisting of orthonormal vectors.

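**How can one find the vector in Span $\{\boldsymbol{v}_1, ..., \boldsymbol{v}_n\}$ closest to $\boldsymbol{b}$?**

Compute the projection $\boldsymbol{b}^{\perp}$ of $\boldsymbol{b}$ orthogonal to Span $\{\boldsymbol{v}_1, ..., \boldsymbol{v}_n\}$ and subtract it from $\boldsymbol{b}$; the result $\boldsymbol{b} - \boldsymbol{b}^{\perp}$ is the closest vector. Computing $\boldsymbol{b}^{\perp}$ with `project_orthogonal` requires mutually orthogonal generators, so if the $\boldsymbol{v}_i$ are not mutually orthogonal, run `orthogonalize` on them first.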
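A minimal sketch of that recipe with plain lists, assuming the generators may not be mutually orthogonal (so they are orthogonalized first) and skipping (numerically) zero vectors; the function names are hypothetical. The example span is the $xy$-plane, so the closest vector to $[1,2,3]$ should be $[1,2,0]$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_orthogonal(b, vlist, tol=1e-20):
    """Projection of b orthogonal to the mutually orthogonal vectors in vlist."""
    for v in vlist:
        vv = dot(v, v)
        if vv > tol:  # projection along a zero vector contributes nothing
            sigma = dot(b, v) / vv
            b = [x - sigma * y for x, y in zip(b, v)]
    return b

def orthogonalize(vlist):
    vstars = []
    for v in vlist:
        vstars.append(project_orthogonal(v, vstars))
    return vstars

def closest_in_span(b, vlist):
    """b minus its projection orthogonal to Span vlist."""
    b_perp = project_orthogonal(b, orthogonalize(vlist))
    return [x - y for x, y in zip(b, b_perp)]

p = closest_in_span([1, 2, 3], [[1, 0, 0], [1, 1, 0]])
print(p)  # [1.0, 2.0, 0.0]
```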

**How does one find the projection of a vector $\boldsymbol{b}$ orthogonal to several mutually orthogonal vectors $\boldsymbol{v}_1, ..., \boldsymbol{v}_n$?**

Subtract from $\boldsymbol{b}$ the projection of $\boldsymbol{b}$ along each $\boldsymbol{v}_i$ in turn, as in `project_orthogonal` above.

**How does one find vectors that (i) span the same space as $\boldsymbol{v}_1, ..., \boldsymbol{v}_n$ and that (ii) are mutually orthogonal?**

Set $\boldsymbol{v^*}_1 = \boldsymbol{v}_1$. Then, for $i = 2, ..., n$, let $\boldsymbol{v^*}_i$ be the projection of $\boldsymbol{v}_i$ orthogonal to the vectors computed so far, $\boldsymbol{v^*}_1, ..., \boldsymbol{v^*}_{i-1}$, using the procedure from the previous question. This is the procedure `orthogonalize` above.

**What is a column-orthogonal matrix? An orthogonal matrix?**

A column-orthogonal matrix is one with orthonormal columns.

An orthogonal matrix is a square column-orthogonal matrix; its rows are then orthonormal as well.

**What is the inverse of an orthogonal matrix?**

The inverse of an orthogonal matrix is the transpose of the matrix.
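A quick numerical illustration, using a 2D rotation matrix as an example of an orthogonal matrix (the angle `0.3` and the helper names are arbitrary choices): both $Q^TQ$ and $QQ^T$ come out as the identity up to rounding, so $Q^T$ acts as $Q^{-1}$.

```python
import math

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

theta = 0.3
# A rotation matrix is a standard example of an orthogonal matrix.
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
QtQ = matmul(transpose(Q), Q)
QQt = matmul(Q, transpose(Q))
print(QtQ)  # approximately [[1, 0], [0, 1]]
```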

**How can you use matrix-vector multiplication to find the coordinate representation of a vector in terms of an orthonormal basis?**

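Let $Q$ be the matrix whose columns are the orthonormal basis vectors. The coordinate representation $\boldsymbol{x}$ of $\boldsymbol{b}$ satisfies $Q\boldsymbol{x} = \boldsymbol{b}$. Multiplying both sides by $Q^T$ and using $Q^TQ = I$ (the columns of $Q$ are orthonormal):

$Q\boldsymbol{x} = \boldsymbol{b}$

$Q^TQ\boldsymbol{x} = Q^T\boldsymbol{b}$

$I\boldsymbol{x} = Q^T\boldsymbol{b}$

$\boldsymbol{x} = Q^T\boldsymbol{b}$

So the coordinate representation is the matrix-vector product $Q^T\boldsymbol{b}$.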
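A small sketch of $\boldsymbol{x} = Q^T\boldsymbol{b}$ with plain lists. The orthonormal columns $[0.6, 0.8, 0]$ and $[0, 0, 1]$ are an illustrative choice; $\boldsymbol{b}$ is built as $2\boldsymbol{q}_1 + 5\boldsymbol{q}_2$, so its coordinates should come back as $[2, 5]$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Orthonormal columns of Q (illustrative choice).
q_cols = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]

# Build b = 2*q1 + 5*q2, so its coordinate representation should be [2, 5].
coeffs = [2.0, 5.0]
b = [sum(c * q[i] for c, q in zip(coeffs, q_cols)) for i in range(3)]

# x = Q^T b: entry i is the dot product of column i of Q with b.
x = [dot(q, b) for q in q_cols]
print(x)  # approximately [2.0, 5.0]
```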

**What is the $QR$ factorization of a matrix?**

The $QR$ factorization of an $m \times n$ matrix $M$ with linearly independent columns is $M = QR$, where $Q$ is a column-orthogonal matrix whose columns form an orthonormal basis for Col $M$, and $R$ is an $n \times n$ upper-triangular matrix.

**How can the $QR$ factorization be used to solve a matrix equation?**

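To solve $A\boldsymbol{x} = \boldsymbol{b}$ with $A = QR$, multiply both sides by $Q^T$: since $Q^TQ = I$, this gives $R\boldsymbol{x} = Q^T\boldsymbol{b}$, which can be solved by backward substitution because $R$ is upper-triangular. (If $\boldsymbol{b}$ is not in Col $A$, the same computation gives the least-squares solution; see below.)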

**How can the $QR$ factorization be computed?**

The $QR$ factorization of $A$ can be computed by orthogonalizing the columns of $A$ and normalizing the results, which gives an orthonormal basis for Col $A$ (the columns of $Q$), and by recording the upper-triangular matrix of coefficients $R$ such that $A = QR$: column $j$ of $R$ is the coordinate representation of column $j$ of $A$ in terms of the columns of $Q$.
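A minimal sketch of this computation (classical Gram-Schmidt) on plain lists, assuming linearly independent columns; it is not the book's `QR.factor`. Here $R_{ij} = \boldsymbol{q}_i \cdot \boldsymbol{a}_j$ for $i < j$ and $R_{jj}$ is the norm of the part of $\boldsymbol{a}_j$ orthogonal to the earlier $\boldsymbol{q}_i$. The example uses the matrix from Problem 9.11.11, part 1.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qr_gram_schmidt(cols):
    """QR factorization of a matrix given as a list of linearly independent columns.

    Returns (q_cols, R): q_cols holds the orthonormal columns of Q, and R is an
    upper-triangular list of rows; column j of A equals the sum over i <= j of
    R[i][j] * q_cols[i].
    """
    n = len(cols)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        v = list(a)
        for i, q in enumerate(q_cols):
            R[i][j] = dot(q, a)                       # coordinate of a along q_i
            v = [x - R[i][j] * y for x, y in zip(v, q)]
        R[j][j] = math.sqrt(dot(v, v))                # norm of the orthogonal part
        q_cols.append([x / R[j][j] for x in v])       # normalize to get q_j
    return q_cols, R

A_cols = [[6, 2, 3], [6, 0, 3]]
q_cols, R = qr_gram_schmidt(A_cols)
print(R)  # approximately [[7.0, 6.43], [0.0, 1.92]]
```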

**How can the $QR$ factorization be used to solve a least-squares problem?**

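The least-squares problem asks for $\boldsymbol{\hat{x}}$ minimizing $\|A\boldsymbol{x} - \boldsymbol{b}\|$. The closest vector to $\boldsymbol{b}$ in Col $A$ is its projection $\boldsymbol{b}^{\|\text{Col}\,A} = QQ^T\boldsymbol{b}$, so $\boldsymbol{\hat{x}}$ satisfies $QR\boldsymbol{\hat{x}} = QQ^T\boldsymbol{b}$. Multiplying by $Q^T$ gives $R\boldsymbol{\hat{x}} = Q^T\boldsymbol{b}$, which is solved by backward substitution.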
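Putting the pieces together, a plain-Python sketch of least squares via $QR$ (Gram-Schmidt, then $R\boldsymbol{\hat{x}} = Q^T\boldsymbol{b}$ by back-substitution). The helper names are hypothetical; the test data are from Problem 9.11.13, part 2, whose exact answer is $[2.5, \frac{8}{3}]$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qr_columns(cols):
    """Classical Gram-Schmidt QR: orthonormal q_cols and upper-triangular R (rows)."""
    n = len(cols)
    q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        v = list(a)
        for i, q in enumerate(q_cols):
            R[i][j] = dot(q, a)
            v = [x - R[i][j] * y for x, y in zip(v, q)]
        R[j][j] = math.sqrt(dot(v, v))
        q_cols.append([x / R[j][j] for x in v])
    return q_cols, R

def back_substitute(R, c):
    n = len(R)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

def least_squares(cols, b):
    """x_hat minimizing |A x - b|, where A is given by its independent columns."""
    q_cols, R = qr_columns(cols)
    qtb = [dot(q, b) for q in q_cols]   # Q^T b
    return back_substitute(R, qtb)

A_cols = [[3, 4, 5], [1, 1, 1]]
b = [10, 13, 15]
x_hat = least_squares(A_cols, b)
print(x_hat)  # approximately [2.5, 2.667]
```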

**How can solving a least-squares problem help in fitting data to a line or a quadratic?**

A linear model hypothesizes that each observation $(x_i, y_i)$ satisfies $y_i \approx f(x_i)$, where $f(x) = a + cx$.

The prediction error for the $i^{th}$ observation is $|f(x_i) - y_i|$ and the total sum-of-squares prediction error is $\sum\limits_{i}(f(x_i) - y_i)^2$.

If $A$ is the matrix whose rows are $(1, x_1), (1, x_2), ..., (1, x_n)$, then the dot-product of row $i$ with the vector $(a, c)$ is $a + cx_i$, the prediction for the $i^{th}$ observation. Thus the vector of predictions is $A(a, c)$, the vector of prediction errors is $A(a, c) - (y_1, y_2, ..., y_n)$, and the squared norm of this vector is the total sum-of-squares prediction error. Choosing $(a, c)$ to minimize it, which is a least-squares problem, finds the line that best fits the data.

The same argument finds the best-fitting quadratic $f(x) = a + cx + dx^2$, except that the rows of $A$ are $(1, x_i, x_i^2)$,
$A = \begin{bmatrix}1 &x_1 &x_1^2\\1 &x_2 &x_2^2\\\vdots &\vdots &\vdots\\1 &x_n &x_n^2\end{bmatrix}$, and the unknown vector $(a, c, d)$ has three entries.
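A sketch of polynomial fitting by least squares: column $k$ of the design matrix holds $x_i^k$, so row $i$ is $(1, x_i, x_i^2, ...)$. The solver is a compact Gram-Schmidt $QR$ plus back-substitution on plain lists, and the function names are hypothetical. The sample data lie exactly on $1 + 2x + 3x^2$, so the fit should recover the coefficients $[1, 2, 3]$ (lowest degree first).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def least_squares(cols, b):
    """Minimize |A x - b| via Gram-Schmidt QR; A is given by its independent columns."""
    n = len(cols)
    q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        v = list(a)
        for i, q in enumerate(q_cols):
            R[i][j] = dot(q, a)
            v = [x - R[i][j] * y for x, y in zip(v, q)]
        R[j][j] = math.sqrt(dot(v, v))
        q_cols.append([x / R[j][j] for x in v])
    c = [dot(q, b) for q in q_cols]          # Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):             # back-substitution on R x = Q^T b
        x[i] = (c[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

def fit_polynomial(xs, ys, degree):
    """Coefficients [a0, a1, ..., a_degree] of the best-fitting polynomial."""
    cols = [[x ** k for x in xs] for k in range(degree + 1)]
    return least_squares(cols, ys)

xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]    # data lying exactly on 1 + 2x + 3x^2
coeffs = fit_polynomial(xs, ys, 2)
print(coeffs)  # approximately [1.0, 2.0, 3.0]
```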

**How can solving a least-squares problem help to get more accurate output?**

If we have exactly as many measurements as unknowns, we can solve for $\boldsymbol{x}$ directly. However, real-world measurements are usually noisy, so that exact solution inherits the noise. If we take _more observations_ (samples, measurements, individuals) than unknowns, the system generally has no exact solution, but the least-squares solution, which best matches all the observations together, tends to average out the noise and so is typically more accurate.

**What is the orthogonal complement?**

Let $U$ be a subspace of a vector space $W$. The orthogonal complement of $U$ with respect to $W$ is the set $V = \{\boldsymbol{w} \in W : \boldsymbol{w} \cdot \boldsymbol{u} = 0 \text{ for every } \boldsymbol{u} \in U\}$ of vectors in $W$ that are orthogonal to every vector in $U$. It is itself a subspace of $W$, and the only vector it shares with $U$ is the zero vector.

**What is the connection between orthogonal complement and direct sum?**

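If $V$ is the orthogonal complement of $U$ with respect to $W$, then $U \oplus V = W$ (Lemma 9.6.5): every vector in $W$ can be written in exactly one way as a vector in $U$ plus a vector in $V$.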

## 9.11 Problems

### Orthogonal Complement

**Problem 9.11.1:** Find generators for the orthogonal complement of $U$ with respect to $W$ where

1. $U = $ Span $\{[0,0,3,2]\}$ and $W = $ Span $\{[1,2,-3,-1],[1,2,0,1],[3,1,0,-1],[-1,-2,3,1]\}$.

    

In [6]:
orthogonalize([list2vec([1,2,-3,-1]), list2vec([1,2,0,1]), list2vec([3,1,0,-1]), list2vec([-1,-2,3,1])])

[Vec({0, 1, 2, 3},{0: 1, 1: 2, 2: -3, 3: -1}),
 Vec({0, 1, 2, 3},{0: 0.7333333333333334, 1: 1.4666666666666668, 2: 0.8, 3: 1.2666666666666666}),
 Vec({0, 1, 2, 3},{0: 2.2432432432432434, 1: -0.5135135135135137, 2: 0.810810810810811, 3: -1.2162162162162162}),
 Vec({0, 1, 2, 3},{0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0})]

We can see that only three vectors are needed to span $W$ (the fourth output is the zero vector). Performing the operations of `orthogonalize` by hand gives $W = $ Span $\{[1,2,-3,-1],[\frac{11}{15},\frac{22}{15},\frac{4}{5},\frac{19}{15}],[\frac{83}{37},-\frac{19}{37},\frac{30}{37},-\frac{45}{37}]\}$.

In [7]:
orthogonalize([list2vec([0,0,3,2]), list2vec([1,2,-3,-1]), list2vec([1,2,0,1]), list2vec([3,1,0,-1]), list2vec([-1,-2,3,1])])

[Vec({0, 1, 2, 3},{0: 0, 1: 0, 2: 3, 3: 2}),
 Vec({0, 1, 2, 3},{0: 1.0, 1: 2.0, 2: -0.4615384615384617, 3: 0.6923076923076923}),
 Vec({0, 1, 2, 3},{0: 0.0, 1: 0.0, 2: 1.1102230246251565e-16, 3: 0.0}),
 Vec({0, 1, 2, 3},{0: 2.2432432432432434, 1: -0.5135135135135134, 2: 0.810810810810811, 3: -1.2162162162162162}),
 Vec({0, 1, 2, 3},{0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0})]

Therefore the orthogonal complement is spanned by $[1,2,-\frac{6}{13},\frac{9}{13}]$ and $[\frac{83}{37},-\frac{19}{37},\frac{30}{37},-\frac{45}{37}]$ (the remaining outputs after the first are zero up to rounding error).

2\. $U = $ Span $\{[3,0,1]\}$ and $W = $ Span $\{[1,0,0],[1,0,1]\}$.


For this and the rest, we will simply use the `find_orthogonal_complement` procedure developed above.

In [8]:
find_orthogonal_complement([list2vec([3,0,1])],[list2vec([1,0,0]), list2vec([1,0,1])])

[Vec({0, 1, 2},{0: 0.10000000000000009, 1: 0.0, 2: -0.3})]

Thus the orthogonal complement is spanned by $[1,0,-3]$ (ten times the output). This is also easy to find directly: it lies in $W$ and satisfies $[3,0,1] \cdot [1,0,-3] = 0$.

3\. $U = $ Span $\{[-4,3,1,-2],[-2,2,3,-1]\}$ and $W = \mathbb{R}^4$

In [9]:
find_orthogonal_complement([list2vec([-4,3,1,-2]),list2vec([-2,2,3,-1])], [list2vec([1,0,0,0]),list2vec([0,1,0,0]),list2vec([0,0,1,0]),list2vec([0,0,0,1])])

[Vec({0, 1, 2, 3},{0: 0.41899441340782123, 1: 0.3910614525139665, 2: -0.07821229050279327, 3: -0.29050279329608936}),
 Vec({0, 1, 2, 3},{0: -1.1102230246251565e-16, 1: 0.33333333333333315, 2: -0.06666666666666665, 3: 0.46666666666666673})]

Converting the floating-point output to fractions (the first entry of the second vector, $-1.1 \times 10^{-16}$, is rounding error for 0), the orthogonal complement is spanned by $[\frac{75}{179},\frac{70}{179},-\frac{14}{179},-\frac{52}{179}]$ and $[0,\frac{1}{3},-\frac{1}{15},\frac{7}{15}]$.

**Problem 9.11.2:** Explain why each statement cannot be true.

1. $U = $ Span $\{[0,0,1],[1,2,0]\}$ and $W = $ Span $\{[1,0,0],[1,0,1]\}$, and there is a vector space $V$ that is the orthogonal complement of $U$ in $W$.

  The orthogonal complement of $U$ in $W$ is only defined when $U$ is a subspace of $W$, and here it is not: every vector in $W$ has second entry 0, but $[1,2,0] \in U$ does not.
  
1. $U = $ Span $\{[3,2,1],[5,2,-3]\}$ and $W = $ Span $\{[1,0,0],[1,0,1],[0,1,1]\}$, and the orthogonal complement $V$ of $U$ in $W$ contains the vector $[2, -3, 1]$.
    Every vector in the orthogonal complement $V$ must be orthogonal to _every_ vector in $U$. However, $[3,2,1]\cdot[2,-3,1] = 6 - 6 + 1 = 1 \neq 0$.

**Problem 9.11.3:** Let $A = \begin{bmatrix}-4 &-1 &-3 &-2\\0 &4 &0 &-1\end{bmatrix}$. Use orthogonal complement to find a basis for the null space of $A$.

Find the orthogonal complement of Row $A$:

In [10]:
find_orthogonal_complement([list2vec([-4,-1,-3,-2]),list2vec([0,4,0,-1])], [list2vec([1,0,0,0]),list2vec([0,1,0,0]),list2vec([0,0,1,0]),list2vec([0,0,0,1])])

[Vec({0, 1, 2, 3},{0: 0.4624505928853755, 1: -0.07114624505928856, 2: -0.4031620553359684, 3: -0.2845849802371541}),
 Vec({0, 1, 2, 3},{0: 0.0, 1: 0.038461538461538505, 2: -0.11538461538461539, 3: 0.15384615384615388})]

This is a numerical solution. Scaling each vector to clear denominators gives the exact basis $\{[39,-6,-34,-24],[0,1,-3,4]\}$ for Null $A$.
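Scaling the two printed vectors to integer entries (done by hand here, so worth checking) gives $[39,-6,-34,-24]$ and $[0,1,-3,4]$. A quick exact check with plain integer lists that $A$ maps both to zero:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A_rows = [[-4, -1, -3, -2],
          [0, 4, 0, -1]]
candidates = [[39, -6, -34, -24], [0, 1, -3, 4]]

for v in candidates:
    # A v has one entry per row of A: the dot product of that row with v.
    print(v, [dot(row, v) for row in A_rows])  # each product should be [0, 0]
```

The two vectors are clearly not multiples of each other, and Null $A$ has dimension $4 - 2 = 2$, so they form a basis.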

**Problem 9.11.4:** Find a normal for each of the following lines in $\mathbb{R}^2$.

1. $\{\alpha [3, 2] : \alpha \in \mathbb{R}\}$

  Any nonzero vector $\boldsymbol{n}$ that is orthogonal to $[3,2]$ is a normal.  
  Choose $\boldsymbol{n} = [-2,3]$.

2. $\{\alpha [3, 5] : \alpha \in \mathbb{R}\}$

  Choose $\boldsymbol{n} = [-5,3]$.

**Problem 9.11.5:** Find a normal for each of the following planes in $\mathbb{R}^3$.

1. Span $\{[0,1,0],[0,0,1]\}$

  A normal is any nonzero vector $\boldsymbol{n}$ orthogonal to every vector in the plane; it suffices for $\boldsymbol{n}$ to be orthogonal to each vector in the given basis.  
  Choose $\boldsymbol{n} = [1,0,0]$.

1. Span $\{[2,1,-3],[-2,1,1]\}$

  Choose $\boldsymbol{n} = [1,1,1]$
  
1. affine hull of $[3,1,4]$, $[5,2,6]$, and $[2,3,5]$.

  A vector perpendicular to the translate of this plane that passes through the origin is also perpendicular to the original plane. Thus, this is equivalent to finding a normal for Span $\{[5,2,6]-[3,1,4],[2,3,5]-[3,1,4]\} = $ Span $\{[2,1,2],[-1,2,1]\}$.
  Choose $\boldsymbol{n} = [18,24,-30]$ (or, dividing by $-6$, $[-3,-4,5]$).
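For a plane in $\mathbb{R}^3$, the cross product of two direction vectors is a quick way to get a normal. This is a shortcut specific to three dimensions, not the orthogonal-complement method of this chapter; the helper names are illustrative. Its result $[-3,-4,5]$ is $-\frac{1}{6}$ times $[18,24,-30]$.

```python
def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two vectors in R^3; the result is orthogonal to both."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

p, q, r = [3, 1, 4], [5, 2, 6], [2, 3, 5]
d1, d2 = sub(q, p), sub(r, p)   # [2, 1, 2] and [-1, 2, 1]
n = cross(d1, d2)
print(n)                        # [-3, -4, 5]
print(dot(n, d1), dot(n, d2))   # 0 0
```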

**Problem 9.11.6:** For each of the following vectors in $\mathbb{R}^2$, give a mathematical description of a line that has this vector as the normal.

1. $[0,7]$

  $\{\alpha [1, 0] : \alpha \in \mathbb{R}\}$

1. $[1,2]$

  $\{\alpha [-2, 1] : \alpha \in \mathbb{R}\}$

**Problem 9.11.7:** For each of the following vectors, provide a set of vectors that span a plane in $\mathbb{R}^3$ for which the normal is the given vector.

1. $[0,1,1]$

  $\{[1,0,0],[0,1,-1]\}$

1. $[0,1,0]$

  $\{[1,0,0],[0,0,1]\}$

**Problem 9.11.8:** In this problem, you will give an alternative proof of the Rank Theorem, a proof that works for matrices over the reals.

**Theorem:** For a matrix $A$ over the reals, the row rank equals the column rank.

Your proof should proceed as follows:
* The orthogonal complement of Row $A$ is Null $A$.
* Using the connection between orthogonal complement and direct sum (Lemma 9.6.5) and the Direct Sum Dimension Corollary (Corollary 6.3.9), show that

  dim Row $A$ + dim Null $A$ = number of columns of $A$


* Using the Kernel-Image Theorem (Theorem 6.4.7), show that

  dim Col $A$ + dim Null $A$ = number of columns of $A$


* Combine these equations to obtain the theorem.

**Proof:**

Let $A$ be an $R \times C$ matrix.

A vector $\boldsymbol{x}$ is in Null $A$ exactly when the dot-product of every row of $A$ with $\boldsymbol{x}$ is zero, that is, when $\boldsymbol{x}$ is orthogonal to every vector in Row $A$. So the orthogonal complement of Row $A$ with respect to $\mathbb{R}^C$ is Null $A$.

It follows that Row $A \oplus$ Null $A = \mathbb{R}^C$. (Lemma 9.6.5)

By the Direct Sum Dimension Corollary (Corollary 6.3.9), $\dim$ Row $A + \dim$ Null $A = \dim \mathbb{R}^C = C$.

(This proves the first item.)


Define $f : \mathbb{R}^C \to \mathbb{R}^R$ by $f(\boldsymbol{x}) = A\boldsymbol{x}$. By the Kernel-Image Theorem, $\dim$ Ker $f + \dim$ Im $f = \dim \mathbb{R}^C$. The kernel of $f$ is Null $A$, and the image of $f$ is Col $A$. Thus,

$\dim$ Null $A + \dim$ Col $A = \dim \mathbb{R}^C = C$

(This proves the second item.)

Combining these results,

$\dim$ Row $A = C - \dim$ Null $A = \dim$ Col $A$.

$QED$
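A small numerical check of the theorem (an illustration, not a substitute for the proof). It computes the rank by counting pivots in Gaussian elimination over exact fractions, which gives $\dim$ Row $M$; applying it to the transpose gives $\dim$ Col $M$. The helper names are hypothetical; the examples are the matrix from Problem 9.11.3 and a rank-deficient matrix.

```python
from fractions import Fraction

def rank(M):
    """dim Row M: number of pivots found by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[-4, -1, -3, -2],
     [0, 4, 0, -1]]
B = [[1, 2, 3],
     [2, 4, 6]]
for M in (A, B):
    # dim Row M versus dim Col M (= dim Row of the transpose)
    print(rank(M), rank(transpose(M)))
```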

**Problem 9.11.9:** Write a module `orthonormalization` that defines a procedure `orthonormalize(L)` with the following spec:

* _input:_ a list `L` of linearly independent `Vec`s
* _output:_ a list `Lstar` of `len(L)` orthonormal `Vec`s such that, for `i = 1, ..., len(L)`, the first `i` `Vec`s of `Lstar` and the first `i` `Vec`s of `L` span the same space.

Your procedure should follow this outline:

1. Call `orthogonalize(L)`
1. Compute the list of norms of the resulting vectors, and
1. Return the list resulting from normalizing each of the vectors resulting from Step 1.

When the input consists of the list of `Vec`s corresponding to `[4,3,1,2], [8,9,-5,-5], [10,1,-1,5]`, your procedure should return the list of `Vec`s corresponding approximately to `[0.74,0.55,0.18,0.37], [0.19,0.40,-0.57,-0.69],[0.53,-0.65,-0.51,0.18]`.

(Of course, I am doing this inline here instead of defining a module.)

In [11]:
def orthonormalize(L):
    Lstar_not_normalized = orthogonalize(L)
    norms = [(Lstar_i * Lstar_i)**0.5 for Lstar_i in Lstar_not_normalized]
    return [Lstar_i / norm for Lstar_i, norm in zip(Lstar_not_normalized, norms)]

In [12]:
orthonormalize([list2vec([4,3,1,2]), list2vec([8,9,-5,-5]), list2vec([10,1,-1,5])])

[Vec({0, 1, 2, 3},{0: 0.7302967433402214, 1: 0.5477225575051661, 2: 0.18257418583505536, 3: 0.3651483716701107}),
 Vec({0, 1, 2, 3},{0: 0.1867707814860146, 1: 0.4027244975792189, 2: -0.5661489313794816, 3: -0.6945538436511166}),
 Vec({0, 1, 2, 3},{0: 0.5275409009423367, 1: -0.6531216993058959, 2: -0.5123087286340884, 3: 0.18075511139121447})]

**Problem 9.11.10:** Write a procedure `aug_orthonormalize(L)` in your `orthonormalization` module with the following spec:

* _input:_ a list `L` of `Vec`s
* _output:_ a pair `Qlist, Rlist` of lists of `Vec`s such that
  - `coldict2mat(L)` equals `coldict2mat(Qlist)` times `coldict2mat(Rlist)`, and
  - `Qlist = orthonormalize(L)`

Your procedure should start by calling the procedure `aug_orthogonalize(L)` defined in the module `orthogonalization`. I suggest that your procedure also use a subroutine `adjust(v, multipliers)` with the following spec:

* _input:_ a `Vec` `v` with domain $\{0,1,2,...,n - 1\}$ and an $n$-element list `multipliers` of scalars
* _output:_ a `Vec` `w` with the same domain as `v` such that `w[i] = multipliers[i] * v[i]`

Here is an example for testing `aug_orthonormalize(L)`:
```
>>> L = [list2vec(v) for v in [[4,3,1,2],[8,9,-5,-5],[10,1,-1,5]]]
>>> print(coldict2mat(L))
    0  1  2
  ---------
0 | 4  8 10
1 | 3  9  1
2 | 1 -5 -1
3 | 2 -5  5
>>> Qlist, Rlist = aug_orthonormalize(L)
>>> print(coldict2mat(Qlist))

          0      1      2
    ---------------------
0  |   0.73  0.187  0.528
1  |  0.548  0.403 -0.653
2  | 0.183  -0.566 -0.512
3  | 0.365 -0.695   0.181

>>> print(coldict2mat(Rlist))

         0      1      2
    --------------------
0  |  5.48   8.03   9.49
1  |     0   11.4 -0.636
2  |     0      0   6.04

>>> print(coldict2mat(Qlist) * coldict2mat(Rlist))

      0  1  2
    ---------
0  |  4  8 10
1  |  3  9  1
2  |  1 -5 -1
3  |  2 -5  5
```

In [13]:
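from vec import Vec
from orthogonalization import aug_orthogonalize  # suggested by the problem; not used below
from matutil import mat2coldict, coldict2mat

# Not actually needed: aug_orthonormalize below computes R directly as Q^T L.
def adjust(v, multipliers):
    """Return w with w[i] = multipliers[i] * v[i] for each label i in v.D = {0, ..., n-1}."""
    return Vec(v.D, {i: multipliers[i] * v[i] for i in v.D})

def aug_orthonormalize(L):
    Qlist = orthonormalize(L)
    # Q has orthonormal columns, so L = QR implies R = Q^T L.
    Qt = coldict2mat(Qlist).transpose()
    R = Qt * coldict2mat(L)
    Rcols = mat2coldict(R)
    # Return R's columns in the same order as the input list L.
    Rlist = [Rcols[j] for j in range(len(L))]
    return Qlist, Rlist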

In [14]:
from matutil import coldict2mat

L = [list2vec(v) for v in [[4,3,1,2],[8,9,-5,-5],[10,1,-1,5]]]
Qlist, Rlist = aug_orthonormalize(L)
print(coldict2mat(Qlist))
print(coldict2mat(Rlist))
print(coldict2mat(Qlist) * coldict2mat(Rlist))


           0      1      2
     ---------------------
 0  |   0.73  0.187  0.528
 1  |  0.548  0.403 -0.653
 2  |  0.183 -0.566 -0.512
 3  |  0.365 -0.695  0.181


               0        1      2
     ---------------------------
 0  |       5.48     8.03   9.49
 1  |   2.22E-16     11.4 -0.636
 2  |  -2.22E-16 4.44E-16   6.04


       0  1  2
     ---------
 0  |  4  8 10
 1  |  3  9  1
 2  |  1 -5 -1
 3  |  2 -5  5



**Problem 9.11.11:** Compute the $QR$ factorization for the following matrices. You can use a calculator or computer for the arithmetic.

1. $\begin{bmatrix}6 &6\\2 &0\\3 &3\end{bmatrix}$

Step 1: Find orthogonal basis

$v_1 = [6,2,3]$

$\begin{align}v_2 &= [6,0,3] - [6,2,3]\frac{[6,2,3]\cdot[6,0,3]}{[6,2,3]\cdot[6,2,3]}\\
&= [6,0,3] - [6,2,3]\frac{45}{49}\\
&= [6,0,3] - [\frac{270}{49},\frac{90}{49},\frac{135}{49}]\\
&= [\frac{24}{49},-\frac{90}{49},\frac{12}{49}]\end{align}$

Step 2: Divide by norm (normalize)

$\|v_1\| = \sqrt{6^2+2^2+3^2} = \sqrt{49} = 7$

$\|v_2\| = \sqrt{(\frac{24}{49})^2+(-\frac{90}{49})^2+(\frac{12}{49})^2} = \sqrt{\frac{24^2+(-90)^2+12^2}{49^2}} = \sqrt{\frac{49\cdot180}{49^2}} = \sqrt{\frac{36\cdot5}{49}} = \frac{6\sqrt{5}}{7}$


$v_{1_{norm}} = \frac{[6,2,3]}{7} = [\frac{6}{7},\frac{2}{7},\frac{3}{7}]$

$v_{2_{norm}} = \frac{[\frac{24}{49},-\frac{90}{49},\frac{12}{49}]}{\frac{6\sqrt{5}}{7}} = [\frac{24}{42\sqrt{5}},-\frac{90}{42\sqrt{5}},\frac{12}{42\sqrt{5}}] = [\frac{4}{7\sqrt{5}},-\frac{15}{7\sqrt{5}},\frac{2}{7\sqrt{5}}]$

These two vectors form an orthonormal basis for Col $A$, and they are the columns of $Q$:

$Q = \begin{bmatrix}\frac{6}{7} &\frac{4}{7\sqrt{5}}\\\frac{2}{7} &-\frac{15}{7\sqrt{5}}\\\frac{3}{7} &\frac{2}{7\sqrt{5}}\end{bmatrix}$

Step 3: Factorize

$R = Q^TA = \begin{bmatrix}\frac{6}{7} &\frac{2}{7} &\frac{3}{7}\\\frac{4}{7\sqrt{5}} &-\frac{15}{7\sqrt{5}} &\frac{2}{7\sqrt{5}}\end{bmatrix} \begin{bmatrix}6 &6\\2 &0\\3 &3\end{bmatrix} = \begin{bmatrix}7 &\frac{45}{7}\\0 &\frac{6\sqrt{5}}{7}\end{bmatrix} \approx \begin{bmatrix}7 &6.43\\0 &1.92\end{bmatrix}$

In [15]:
# Verify...
L = [list2vec(v) for v in [[6,2,3],[6,0,3]]]
Qlist, Rlist = aug_orthonormalize(L)
print(coldict2mat(Qlist))
print(coldict2mat(Rlist))
print(coldict2mat(Qlist) * coldict2mat(Rlist))


           0      1
     --------------
 0  |  0.857  0.256
 1  |  0.286 -0.958
 2  |  0.429  0.128


               0    1
     ----------------
 0  |          7 6.43
 1  |  -1.33E-15 1.92


       0        1
     ------------
 0  |  6        6
 1  |  2 8.88E-16
 2  |  3        3



2\. $\begin{bmatrix}2 &3\\2 &1\\1 &1\end{bmatrix}$

(I'm going to skip some of the manual steps above for this one).

$Q = \begin{bmatrix}\frac{2}{3} &\frac{1}{\sqrt{2}}\\\frac{2}{3} &-\frac{1}{\sqrt{2}}\\\frac{1}{3} &0\end{bmatrix}$

$R = \begin{bmatrix}3 &3\\0 &\sqrt{2}\end{bmatrix} \approx \begin{bmatrix}3 &3\\0 &1.41\end{bmatrix}$

In [16]:
# Verify...
L = [list2vec(v) for v in [[2,2,1],[3,1,1]]]
Qlist, Rlist = aug_orthonormalize(L)
print(coldict2mat(Qlist))
print(coldict2mat(Rlist))
print(coldict2mat(Qlist) * coldict2mat(Rlist))


           0      1
     --------------
 0  |  0.667  0.707
 1  |  0.667 -0.707
 2  |  0.333      0


       0    1
     --------
 0  |  3    3
 1  |  0 1.41


       0 1
     -----
 0  |  2 3
 1  |  2 1
 2  |  1 1



**Problem 9.11.12:** Write and test a procedure `QR_solve(A, b)`. Assuming the columns of `A` are linearly independent, this procedure should return the vector $\boldsymbol{\hat{x}}$ that minimizes $\|\boldsymbol{b} - A\boldsymbol{\hat{x}}\|$.

The procedure should use
* `triangular_solve(rowlist, label_list, b)`, and
* `factor(A)` defined in the module `QR`, which in turn uses the procedure `aug_orthonormalize(L)` from Problem 9.11.10.

You should try your procedure on the examples given in Problem 9.11.13 and on the following example:
```
>>> A = Mat(({'a', 'b', 'c'}, {'A', 'B'}), {('a', 'A'): -1, ('a', 'B'): 2, ('b', 'A'): 5, ('b', 'B'): 3, ('c', 'A'): 1, ('c', 'B'): -2})
>>> print(A)
        A  B
     -------
 a  |  -1  2
 b  |   5  3
 c  |   1 -2

>>> Q, R = QR_factor(A)
>>> print(Q)

            0     1
     --------------
 a  |  -0.192  0.68
 b  |   0.962 0.272
 c  |   0.192 -0.68

>>> print(R)

         A    B
     ----------
 0  |  5.2 2.12
 1  |    0 3.54
 
>>> b = Vec({'a', 'b', 'c'}, {'a': 1, 'b': -1})
>>> x = QR_solve(A, b)
>>> x
Vec({'A', 'B'},{'A': -0.269...,'B': 0.115...})
```

A good way to test your solution is to verify that the residual is (approximately) orthogonal to the columns of `A`:
```
>>> A.transpose() * (b - A*x)
Vec({'A', 'B'},{'A': -2.22e-16, 'B': 4.44e-16})
```

In [17]:
from triangular import triangular_solve
from QR import factor
from matutil import mat2rowdict

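def QR_solve(A, b):
    """Least-squares x_hat minimizing |b - A x_hat|: solve R x = Q^T b by back-substitution."""
    Q, R = factor(A)
    Rrows = mat2rowdict(R)  # rows of R are labeled 0, ..., n-1
    # Column order assumed to match the one used inside factor() (sorted by repr).
    return triangular_solve([Rrows[i] for i in range(len(Rrows))], sorted(A.D[1], key=repr), Q.transpose() * b)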

In [18]:
from mat import Mat
A = Mat(({'a', 'b', 'c'}, {'A', 'B'}), {('a', 'A'): -1, ('a', 'B'): 2, ('b', 'A'): 5, ('b', 'B'): 3, ('c', 'A'): 1, ('c', 'B'): -2})
print(A)

Q, R = factor(A)
print(Q)
print(R)


        A  B
     -------
 a  |  -1  2
 b  |   5  3
 c  |   1 -2


            0     1
     --------------
 a  |  -0.192  0.68
 b  |   0.962 0.272
 c  |   0.192 -0.68


              A    B
     ---------------
 0  |       5.2 2.12
 1  |  4.44E-16 3.54



In [19]:
b = Vec({'a', 'b', 'c'}, {'a': 1, 'b': -1})
x = QR_solve(A, b)
x

Vec({'A', 'B'},{'B': 0.11538461538461535, 'A': -0.2692307692307692})

In [20]:
A.transpose() * (b - A*x)

Vec({'A', 'B'},{'A': -2.220446049250313e-16, 'B': 4.440892098500626e-16})

**Problem 9.11.13:** In each of the following parts, you are given a matrix $A$ and a vector $\boldsymbol{b}$. You are also given the approximate $QR$ factorization of $A$. You are to

* find a vector $\boldsymbol{\hat{x}}$ that minimizes $\|A\boldsymbol{\hat{x}} - \boldsymbol{b}\|^2$,
* prove to yourself that the columns of $A$ are (approximately) orthogonal to the residual
* calculate the value of $\|A\boldsymbol{\hat{x}} - \boldsymbol{b}\|$.


1. $A = \begin{bmatrix}8 &1\\6 &2\\0 &6\end{bmatrix}$ and $\boldsymbol{b} = [10,8,6]$

  $A = \underbrace{\begin{bmatrix}0.8 &-0.099\\0.6 &0.132\\0 &0.986\end{bmatrix}}_{Q}\underbrace{\begin{bmatrix}10 &2\\0 &6.08\end{bmatrix}}_{R}$

1. $A = \begin{bmatrix}3 &1\\4 &1\\5 &1\end{bmatrix}$ and $\boldsymbol{b} = [10,13,15]$

  $A = \underbrace{\begin{bmatrix}0.424 &0.808\\0.566 &0.115\\0.707 &-0.577\end{bmatrix}}_{Q}\underbrace{\begin{bmatrix}7.07 &1.7\\0 &0.346\end{bmatrix}}_{R}$

In [31]:
# 1)
from matutil import listlist2mat

A = listlist2mat([[8,1],[6,2],[0,6]])
b = list2vec([10,8,6])

x_hat = QR_solve(A, b)
print('x_hat: ', x_hat)

residual = b - A*x_hat
print('\nresidual: ', residual)
print('\ninner-product of columns of A with residual:', A.transpose() * residual)
print('\n|A*x_hat - b|:',((A * x_hat - b) * (A * x_hat - b)) ** 0.5)

x_hat:  
    0     1
-----------
 1.08 0.984

residual:  
    0      1      2
-------------------
 0.35 -0.467 0.0973

inner-product of columns of A with residual: 
         0         1
--------------------
 -3.55E-15 -1.78E-15

|A*x_hat - b|: 0.5918363542992867


In [32]:
# 2)

A = listlist2mat([[3,1],[4,1],[5,1]])
b = list2vec([10,13,15])

x_hat = QR_solve(A, b)
print('x_hat: ', x_hat)

residual = b - A*x_hat
print('\nresidual: ', residual)
print('\ninner-product of columns of A with residual:', A.transpose() * residual)
print('\n|A*x_hat - b|:',((A * x_hat - b) * (A * x_hat - b)) ** 0.5)

x_hat:  
   0    1
---------
 2.5 2.67

residual:  
      0     1      2
--------------------
 -0.167 0.333 -0.167

inner-product of columns of A with residual: 
        0        1
------------------
 1.95E-14 3.55E-15

|A*x_hat - b|: 0.4082482904638623


**Problem 9.11.14:** For each of the following, find a vector $\boldsymbol{\hat{x}}$ that minimizes $\|A\boldsymbol{\hat{x}} - \boldsymbol{b}\|$. Use the algorithm based on the $QR$ factorization.

1. $A = \begin{bmatrix}8 &1\\6 &2\\0 &6\end{bmatrix}$ and $\boldsymbol{b} = [10,8,6]$

  Already done above (is that cheating? hehe).

2. $A = \begin{bmatrix}3 &1\\4 &1\end{bmatrix}$ and $\boldsymbol{b} = [10,13]$

In [33]:
A = listlist2mat([[3,1],[4,1]])
b = list2vec([10,13])

x_hat = QR_solve(A, b)
print('x_hat: ', x_hat)

x_hat:  
 0 1
----
 3 1


**Problem 9.11.15:** Use Python to find the values for parameters $a$ and $b$ defining the line $y = ax + b$ that best approximates the relationship between age ($x$) and height ($y$).

In [34]:
from read_data import read_vectors

age_height_vectors = read_vectors('age-height.txt')
len(age_height_vectors)

12

In [36]:
len(age_height_vectors[0].f.keys())

2

In [42]:
# we want each row in the matrix to be an individual, with values (1, x).

from matutil import rowdict2mat
A = rowdict2mat([Vec({'b', 'x'}, {'b': 1, 'x': age_height_vector['age']}) for age_height_vector in age_height_vectors])
print(A)

y = list2vec([age_height_vector['height'] for age_height_vector in age_height_vectors])
print(y)


        b  x
      ------
  0  |  1 18
  1  |  1 19
 10  |  1 28
 11  |  1 29
  2  |  1 20
  3  |  1 21
  4  |  1 22
  5  |  1 23
  6  |  1 24
  7  |  1 25
  8  |  1 26
  9  |  1 27


    0  1   10   11    2    3    4    5    6    7    8    9
----------------------------------------------------------
 76.1 77 82.8 83.5 78.1 78.2 78.8 79.7 79.9 81.1 81.2 81.8


In [44]:
estimated_params = QR_solve(A, y)
print(estimated_params)


    b     x
-----------
 64.9 0.635


In [45]:
residual = y - A*estimated_params
print('residual:', residual) # Looks pretty low!

residual: 
      0       1     10    11     2       3       4     5      6     7      8      9
-----------------------------------------------------------------------------------
 -0.258 0.00734 0.0927 0.158 0.472 -0.0626 -0.0976 0.167 -0.267 0.298 -0.237 -0.272


In [46]:
A.transpose() * residual # verify orthogonality

Vec({'b', 'x'},{'b': 9.947598300641403e-14, 'x': 2.2168933355715126e-12})

**Problem 9.11.16:** Try using the least-squares approach on the problem addressed in the machine learning lab. Compare the quality of the solution with that you obtained using gradient descent.

In [50]:
sys.path.append('../chapter_8')
from cancer_data import read_training_data

A, b = read_training_data('../chapter_8/train.data')

In [52]:
x = QR_solve(A, b)

In [53]:
# Redefine evaluation methods:

def signum(u):
    return Vec(u.D, {k: 1 if v >= 0 else -1 for k, v in u.f.items()})

def fraction_wrong(A, b, w):
    return 0.5 * (1 - (signum(A * w) * b) / len(b.D))

def loss(A, b, w):
    error = (A * w - b)
    return error * error

In [54]:
print('Final loss:\t', loss(A, b, x))
print('Final fraction wrong:\t', fraction_wrong(A, b, x))

Final loss:	 77.50051096407529
Final fraction wrong:	 0.04666666666666669


In [55]:
validation_A, validation_b = read_training_data('../chapter_8/validate.data')

print('Validation loss:\t', loss(validation_A, validation_b, x))
print('Validation fraction wrong:\t', fraction_wrong(validation_A, validation_b, x))

Validation loss:	 57.379426896553504
Validation fraction wrong:	 0.03076923076923077


The best results on the validation set for gradient descent after 2,000 iterations were:
```
Validation loss:	 152.62023375699934
Validation fraction wrong:	 0.04999999999999999
```

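This approach does much better, which is unsurprising: the least-squares solution exactly minimizes the same squared-error loss on the training data that gradient descent only approximately minimized in 2,000 iterations, and here it also does better on the validation set.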