## Introduction 
* LU is better for actually solving linear systems
* QR is better for revealing information about the system of equations
* If a matrix $Q$ is orthogonal, then it can only rotate or reflect a vector, so lengths are preserved
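
A minimal sketch of the last point, assuming an arbitrary 2×2 rotation as the orthogonal matrix (the angle and the vector are illustrative choices, not taken from the notes):

```julia
using LinearAlgebra

θ = π / 6                                  # arbitrary rotation angle (assumption)
Q = [cos(θ) -sin(θ); sin(θ) cos(θ)]        # 2×2 rotation, an orthogonal matrix
x = [3.0, 4.0]                             # arbitrary test vector

@show Q' * Q ≈ Matrix{Float64}(I, 2, 2)    # orthogonality: QᵀQ = I
@show norm(Q * x) ≈ norm(x)                # the length of x is preserved
```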

### Applications of QR factorization
1. Solving least-squares 
2. Used in computing eigenvalues and singular values 
3. Used in iterative methods for solving systems of equations 

## Householder reflections

![alt text](qr-choices.png "Q can be square or rectangular")

* Method of choice for $Q$ square (i.e. I and III above)
* What operations must we perform on $A$ to recover $R$? The product of these elementary operations must be $Q^T$ i.e. $Q^TA=R$

### Constructing the reflection

* We begin with the first column of $R$, i.e. what matrix must we pre-multiply the first column of $A$ by to retrieve some multiple of $e_1$, the unit vector in the 1-direction?
* The idea is to find a reflection that maps $a_1$ to $\pm\Vert a_1\Vert e_1$
* A reflection is defined by a plane, which is itself defined by a normal vector

What is the formula for this reflection?
1. Project $x$ onto the normal vector $v$: $y = \frac{v^T x}{v^T v} v$
2. The target is $x - 2 y = x - 2\frac{v^T x}{v^T v} v = (I - 2\frac{v v^T}{v^T v})x$

Hence the reflection operator $P$ is $I - 2\frac{v v^T}{v^T v}$

Now how do we choose $v$ such that we reflect $x$ onto $\Vert x \Vert e_1$?
We want $Px = x - \beta v v^T x = \Vert x \Vert e_1$, where $\beta = 2/(v^T v)$. Rearranging gives $\beta (v^T x) v = x - \Vert x \Vert e_1$, so $v$ must be parallel to $x - \Vert x \Vert e_1$. Only the direction of $v$ matters (rescaling $v$ leaves $P$ unchanged), so we take $v = x - \Vert x \Vert e_1$, for which $\beta (v^T x) = 1$.
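
The choice of $v$ can be checked numerically. The following is a small sketch with an arbitrary example vector. The reflector is formed as an explicit matrix only for illustration; practical codes usually apply it without forming $P$.

```julia
using LinearAlgebra

x  = [-3.0, 4.0, 12.0]              # arbitrary example vector with ‖x‖ = 13
e1 = [1.0, 0.0, 0.0]

v = x - norm(x) * e1                # normal vector from the derivation
P = I - 2 * (v * v') / (v' * v)     # reflection operator I - 2 v vᵀ / (vᵀ v)

@show P * x                         # close to [13, 0, 0], i.e. ‖x‖ e₁
@show 2 * (v' * x) / (v' * v)       # β (vᵀ x), close to 1
```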

### Roundoff errors
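* The choice $v = x - \Vert x \Vert e_1$ gives $v_1 = x_1 - \Vert x \Vert$, which suffers large cancellation (roundoff) errors when $x_1 > 0$ and $x$ is nearly parallel to $e_1$
* To avoid this, reflect onto $-\operatorname{sign}(x_1)\Vert x \Vert e_1$ instead, so that $v_1 = x_1 + \operatorname{sign}(x_1)\Vert x \Vert$ adds two numbers of the same sign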
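
A sketch of the sign choice, using a hypothetical helper `householder_vector` (not part of `src/geqrf.jl`) and an example vector that is nearly parallel to $e_1$:

```julia
using LinearAlgebra

# Hypothetical helper: Householder vector with the cancellation-free sign,
# v = x + sign(x₁)‖x‖e₁, treating sign(0) as +1.
function householder_vector(x::AbstractVector{<:Real})
    v = float.(x)                      # working copy in floating point
    s = x[1] >= 0 ? 1.0 : -1.0
    v[1] += s * norm(x)                # same-sign addition, no cancellation
    return v
end

x  = [1.0, 1e-9, 0.0]                  # nearly parallel to e₁ (illustrative)
v  = householder_vector(x)
Px = x - 2 * (v' * x) / (v' * v) * v   # apply the reflection without forming P

@show Px                               # close to -‖x‖ e₁
@show x[1] - norm(x)                   # the naive v₁, dominated by cancellation
```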

## Givens rotations

* Useful when you only need to make small changes to a matrix (editing just a few entries at a time) 
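
As a rough illustration (the matrix and the entry being zeroed are arbitrary choices), a Givens rotation acts on just two rows and can be built by hand as follows:

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 5.0; 4.0 6.0]    # arbitrary example matrix

# Rotation in the (2, 3) plane chosen to zero out A[3, 1].
a, b = A[2, 1], A[3, 1]
r = hypot(a, b)                    # √(a² + b²), computed without overflow
c, s = a / r, b / r
G = [c s; -s c]                    # 2×2 rotation block

A[2:3, :] = G * A[2:3, :]          # only rows 2 and 3 are modified
@show A                            # A[2, 1] is now r, A[3, 1] is (numerically) zero
```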

## Gram-Schmidt
* Used when we want a tall skinny $Q$ rather than a square one 
* Column $k$ of $A$ lies in the span of the first $k$ columns of $Q$ (the coefficients are given by the $k$-th column of $R$)
* For column 1 this means $a_1 = r_{11} q_1$. Now $q_1$ must be unit size so we take $q_1 = a_1/\Vert a_1 \Vert$ and $r_{11} = \Vert a_1 \Vert$ 
* Thus in general we have $a_k = r_{kk}q_k + \sum_{i=1}^{k-1} r_{ik}q_i$ and we don't know $r_{ik}$ (the $k$-th column vector of $R$) and $q_k$ at the $k$-th step.
* Because the columns of $Q$ are orthonormal though we know that for $i<k$ $r_{ik}$ is given by: $r_{ik} = q_{i}^T a_k$
* To find $r_{kk}$ we rearrange the above as: $z=r_{kk}q_k = a_k - \sum_{i=1}^{k-1} r_{ik}q_i$. $q_k$ is a unit vector so this means $r_{kk}=\Vert z \Vert$ and furthermore $q_k = z/r_{kk}$
* Key to making this algorithm stable is updating the entries of the $k$-th column of $A$ as the entries in $R$ are computed (rather than all at once at the end of the $k$-th step)
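
The recurrences above can be sketched directly in code. The function `cgs_sketch` below is a hypothetical illustration of the classical variant, where every $r_{ik}$ is computed from the original column $a_k$. It is not the `geqrfGS!` routine from `src/geqrf.jl`, whose implementation is not shown in these notes.

```julia
using LinearAlgebra

# Hypothetical sketch of classical Gram-Schmidt returning thin Q and square R.
function cgs_sketch(A::AbstractMatrix{<:Real})
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for k in 1:n
        z = float.(A[:, k])                    # start from a_k
        for i in 1:k-1
            R[i, k] = dot(Q[:, i], A[:, k])    # r_ik = q_iᵀ a_k (original column)
            z -= R[i, k] * Q[:, i]             # remove the component along q_i
        end
        R[k, k] = norm(z)                      # r_kk = ‖z‖
        Q[:, k] = z / R[k, k]                  # q_k = z / r_kk
    end
    return Q, R
end

A = [1.0 2.0; 3.0 4.0; 5.0 7.0]                # arbitrary full-rank example
Q, R = cgs_sketch(A)
@show Q * R ≈ A
@show Q' * Q ≈ Matrix{Float64}(I, 2, 2)
```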

### Classical Gram-Schmidt

In [71]:
using LinearAlgebra
include("../src/geqrf.jl")

A = [1.0 2 3; 4 5 6; 7 8 9]; # simple example 
R = geqrfGS!(A)
print(A*R) # A now holds Q, so Q*R recovers the original matrix

[1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 9.0]

In [72]:
using Printf

function pretty_printing(A)
    for i = 1:size(A,1)
        for j = 1:size(A,2)
            @printf("%10.2e ",A[i,j]) # field width of 10 characters,
                # 2 decimal places, scientific (e) notation
        end
        @printf("\n")
    end
end

pretty_printing (generic function with 1 method)

Unfortunately classical Gram-Schmidt is unstable, as shown by the following example: 

In [73]:
ϵ = sqrt(eps(Float64)); # sqrt of unit roundoff error 
A = [1 1 1; ϵ 0 0; 0 ϵ 0; 0 0 ϵ];
pretty_printing(A)

R = geqrfGS!(A)
Q = A
@show dot(Q[:,2],Q[:,3]) # columns of Q are not orthogonal!
@show norm(Q[:,1]);

  1.00e+00   1.00e+00   1.00e+00 
  1.49e-08   0.00e+00   0.00e+00 
  0.00e+00   1.49e-08   0.00e+00 
  0.00e+00   0.00e+00   1.49e-08 
dot(Q[:, 2], Q[:, 3]) = 0.4999999999999999
norm(Q[:, 1]) = 1.0


This is because the first column mathematically has norm $\sqrt{1+\epsilon^2}$, slightly greater than 1, but in floating-point arithmetic this rounds to exactly 1.

In [74]:
ϵ = 2*sqrt(eps(Float64)); # slightly larger than before 
A = [1 1 1; ϵ 0 0; 0 ϵ 0; 0 0 ϵ];
pretty_printing(A)

R = geqrfGS!(A)
Q = A
@show dot(Q[:,2],Q[:,3]) # columns of Q are orthogonal now!
@show norm(Q[:,1]);

  1.00e+00   1.00e+00   1.00e+00 
  2.98e-08   0.00e+00   0.00e+00 
  0.00e+00   2.98e-08   0.00e+00 
  0.00e+00   0.00e+00   2.98e-08 
dot(Q[:, 2], Q[:, 3]) = -4.996003610813204e-16
norm(Q[:, 1]) = 1.0


We can improve the stability of the algorithm by projecting the $j$-th column of $A$ to be orthogonal to the prior columns of $Q$ *while* we are finding the entries of $R$ in column $j$ (modified Gram-Schmidt):

In [83]:
ϵ = sqrt(eps(Float64)); # sqrt of unit roundoff error 
A = [1 1 1; ϵ 0 0; 0 ϵ 0; 0 0 ϵ];
pretty_printing(A)

R = geqrfMGS!(A)
Q = A
@show dot(Q[:,2],Q[:,3]) # columns of Q are orthogonal now!
@show norm(Q[:,1]);

  1.00e+00   1.00e+00   1.00e+00 
  1.49e-08   0.00e+00   0.00e+00 
  0.00e+00   1.49e-08   0.00e+00 
  0.00e+00   0.00e+00   1.49e-08 
dot(Q[:, 2], Q[:, 3]) = 1.1102230246251565e-16
norm(Q[:, 1]) = 1.0
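
For reference, the modified variant might look roughly like the sketch below. `mgs_sketch` is a hypothetical stand-in and not the `geqrfMGS!` routine from `src/geqrf.jl`. The only change from the classical recurrence is that each $r_{ik}$ is computed from the partially updated column rather than from the original $a_k$.

```julia
using LinearAlgebra

# Hypothetical sketch of modified Gram-Schmidt returning thin Q and square R.
function mgs_sketch(A::AbstractMatrix{<:Real})
    Q = float.(A)                              # working copy, overwritten by the q_k
    n = size(Q, 2)
    R = zeros(n, n)
    for k in 1:n
        for i in 1:k-1
            R[i, k] = dot(Q[:, i], Q[:, k])    # uses the partially updated column
            Q[:, k] -= R[i, k] * Q[:, i]       # update immediately
        end
        R[k, k] = norm(Q[:, k])
        Q[:, k] /= R[k, k]
    end
    return Q, R
end

ϵ = sqrt(eps(Float64))
A = [1 1 1; ϵ 0 0; 0 ϵ 0; 0 0 ϵ]
Q, R = mgs_sketch(A)
@show dot(Q[:, 2], Q[:, 3])                    # expected to be near roundoff level
@show Q * R ≈ A
```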


## Least-squares problems 
* The case where there are more equations than unknowns (so $A$ is tall and skinny)
* What $x$ makes $A x$ closest to $b$ in the 2-norm?
* One way is the *method of normal equations*: solve $A^T A x = A^T b$.
* Not good if $A$ is ill-conditioned, though, since the condition number of $A^T A$ is the square of that of $A$: $\kappa(A^T A) = \kappa(A)^2$
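
A quick numerical check of how forming $A^T A$ affects conditioning, using an illustrative matrix with nearly dependent columns (the value of `δ` is an arbitrary choice):

```julia
using LinearAlgebra

δ = 1e-4                             # arbitrary small parameter
A = [1.0 1.0; δ 0.0; 0.0 δ]          # columns are nearly linearly dependent

@show cond(A)                        # 2-norm condition number of A
@show cond(A' * A)                   # roughly cond(A)^2
@show cond(A)^2
```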

### QR method for least-squares problems
* The idea is analogous to that for the method of normal equations, except we note that $R(A) = R(Q)$ (here $R(\cdot)$ denotes the range, not the triangular factor) and thus start from $Q^T A x = Q^T b$, since we know $Q$ is well-conditioned.
* $A = QR$ and $Q^T Q = I$, so $Q^T A = R$ and this becomes $R x = Q^T b$
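
A minimal sketch of this approach using Julia's built-in `qr` from `LinearAlgebra` (rather than the routines in `src/geqrf.jl`); the data are made up for illustration:

```julia
using LinearAlgebra

A = [1.0 0.0; 1.0 1.0; 1.0 2.0; 1.0 3.0]    # illustrative 4×2 design matrix
b = [1.0, 2.9, 5.1, 7.0]                    # illustrative right-hand side

F = qr(A)
Q = Matrix(F.Q)                             # thin Q (4×2)
R = F.R                                     # 2×2 upper-triangular factor
x_qr = UpperTriangular(R) \ (Q' * b)        # solve R x = Qᵀ b by back substitution

x_ne = (A' * A) \ (A' * b)                  # normal equations, for comparison
@show x_qr ≈ x_ne
@show x_qr ≈ A \ b
```

On a well-conditioned problem like this one the two approaches agree; the difference is expected to show up only as $A$ becomes ill-conditioned.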

### Rank-deficient A

* In this case there are infinitely many solutions that satisfy the normal equations.
* Instead we use the SVD to find the unique solution that has minimal 2-norm
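
A sketch of the minimum-norm solution via the SVD, assuming a small rank-1 example and an arbitrary relative cutoff for deciding which singular values count as zero:

```julia
using LinearAlgebra

A = [1.0 2.0; 2.0 4.0; 3.0 6.0]      # rank 1: second column is twice the first
b = [1.0, 2.0, 2.0]

F = svd(A)
tol = 1e-10 * F.S[1]                 # assumed cutoff for "zero" singular values
r = count(>(tol), F.S)               # numerical rank
x_min = F.V[:, 1:r] * ((F.U[:, 1:r]' * b) ./ F.S[1:r])

x_other = x_min + [2.0, -1.0]        # [2, -1] lies in the null space of A
@show norm(A * x_min - b) ≈ norm(A * x_other - b)   # same residual
@show norm(x_min) < norm(x_other)                   # but a smaller 2-norm
```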