# 18.S096 pset 2

Due Wednesday, January 24, 2pm.

# Problem 1: Backsubstitution

An [upper-triangular system of equations](https://en.wikipedia.org/wiki/Triangular_matrix) can be solved efficiently ($O(n^2)$ time) by *backsubstitution*: starting at the last row (= one equation in one unknown) and working upwards, eliminating one variable at a time.
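Written out for $UX = B$, with $U$ of size $m\times m$ and $j$ indexing the columns of $B$ (matching the variable names in `backsolve`): the last row involves only $X_{mj}$. Each row above it contains just one new unknown once the rows below it are solved:

$$X_{ij} = \frac{1}{U_{ii}}\left(B_{ij} - \sum_{k=i+1}^{m} U_{ik}X_{kj}\right), \qquad i = m, m-1, \ldots, 1.$$

Row $i$ needs $m-i$ multiply-adds. Each right-hand side therefore costs $\sum_{i=1}^{m}(m-i) = m(m-1)/2$ multiply-adds plus $m$ divisions, which is the $O(n^2)$ cost quoted above (with $n = m$).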

Here is "textbook" code that implements backsubstitution for an upper-triangular matrix `U` and a matrix or vector `B` of right-hand sides, i.e. solving $UX=B$ for $X$:

```julia
function backsolve(U::UpperTriangular, B::Union{AbstractMatrix,AbstractVector})
    m = LinAlg.checksquare(U)   # U must be square (m×m)
    m == size(B,1) || throw(DimensionMismatch("matrix sizes don't match"))
    n = size(B, 2)              # number of right-hand sides (1 for a vector)
    X = similar(B, typeof(zero(eltype(B))/oneunit(eltype(U)))) # allocate result array
    for j = 1:n, i = m:-1:1     # for each column j, from the last row upward
        s = zero(eltype(X))
        for k = i+1:m           # sum of the already-solved terms in row i
            s += U[i,k] * X[k,j]
        end
        X[i,j] = (B[i,j] - s) / U[i,i]
    end
    return X
end
```

Let's check that it gives the same answer as `\` in a couple of cases:

```julia
U = UpperTriangular(rand(3,3))
```

```
3×3 UpperTriangular{Float64,Array{Float64,2}}:
 0.567182  0.851563  0.0675331
  ⋅        0.664635  0.463687
  ⋅         ⋅        0.771664
```

```julia
x = rand(3)
```

```
3-element Array{Float64,1}:
 0.0172276
 0.094541
 0.15564
```

```julia
U \ x
```

```
3-element Array{Float64,1}:
 0.00405888
 0.00153189
 0.201694
```

```julia
backsolve(U, x)
```

```
3-element Array{Float64,1}:
 0.00405888
 0.00153189
 0.201694
```

```julia
B = rand(3,4)
U \ B ≈ backsolve(U, B)
```

```
true
```

But for matrices of significant size it is **vastly slower**, especially with a matrix of right-hand sides:

```julia
using BenchmarkTools
U = UpperTriangular(rand(1000,1000))
b = rand(1000)
B = rand(1000,1000)
@btime backsolve($U,$b)
@btime (\)($U,$b)
@btime backsolve($U,$B)
@btime (\)($U,$B);
```

[1m[36mINFO: [39m[22m[36mRecompiling stale cache file /Users/stevenj/.julia/lib/v0.6/BenchmarkTools.ji for module BenchmarkTools.
[39m

```
  874.866 μs (1 allocation: 7.94 KiB)
  311.113 μs (2 allocations: 7.95 KiB)
  969.399 ms (2 allocations: 7.63 MiB)
  9.551 ms (3 allocations: 7.63 MiB)
```


That is roughly 3x slower for a vector, but roughly 100x slower for an $m\times m$ matrix of right-hand sides, where backsubstitution costs $O(m^3)$ operations.
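A back-of-the-envelope count based on the timings above shows how far apart the two are. This is our own estimate, ignoring lower-order terms. Each right-hand side costs about $m^2/2$ multiply-adds, i.e. about $m^2$ floating-point operations. With $m = 1000$ and $m$ right-hand sides:

$$m \cdot m^2 = 10^9\ \text{flops}, \qquad \frac{10^9\ \text{flops}}{0.969\ \text{s}} \approx 1\ \text{Gflop/s}\ (\texttt{backsolve}), \qquad \frac{10^9\ \text{flops}}{9.55\ \text{ms}} \approx 105\ \text{Gflop/s}\ (\backslash).$$

So the naive loops sustain only about 1% of the arithmetic rate that the built-in solver reaches on the same machine.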

## Assignment: Speed up `backsolve`

* Annotations like `@inbounds` (which removes array bounds checks) and `@simd` (which lets the compiler reorder and vectorize a loop) are obviously the first things to try.

* Think about a blocked or recursive/cache-oblivious algorithm.  If you divide the matrices into half-size blocks, can you write down a backsubstitution algorithm acting on the blocks?  For example, divide the matrices as $$B = UX = \begin{pmatrix} U_1 & U_2 \\ 0 & U_3 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}$$ and see if the blocked algorithm (backsolving recursively on the blocks, with some base case) speeds things up.

* Can you parallelize it, especially for a large matrix?  (i.e. solve for different columns of `X` in parallel.)
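For the first suggestion above (`@inbounds` and `@simd`), here is one possible sketch. It is written for Julia 1.x, where the `LinAlg` module used above is named `LinearAlgebra` and must be loaded explicitly. The name `backsolve_simd` is hypothetical. Besides adding the two annotations, it also reorders the loops, which is our own assumption and not part of the assignment. The `backsolve` above reads `U[i,k]` along a row, which is strided in Julia's column-major storage. This version instead finishes unknown `k` and then subtracts its contribution from the rows above it, so the innermost loop walks down column `k` contiguously.

```julia
using LinearAlgebra

# Sketch: column-oriented backsubstitution with @inbounds/@simd (Julia 1.x).
function backsolve_simd(U::UpperTriangular, B::AbstractVecOrMat)
    m = LinearAlgebra.checksquare(U)
    m == size(B, 1) || throw(DimensionMismatch("matrix sizes don't match"))
    T = typeof(zero(eltype(B)) / oneunit(eltype(U)))
    X = copyto!(similar(B, T), B)   # start from X = B and eliminate in place
    A = parent(U)                   # underlying storage; only the upper triangle is read
    @inbounds for j in axes(X, 2)   # axes(X, 2) is 1:1 when B is a vector
        for k in m:-1:1
            xk = X[k, j] / A[k, k]  # unknown k is now fully determined
            X[k, j] = xk
            # Remove unknown k from the rows above it; A[1:k-1, k] is a
            # contiguous column in memory, so this loop vectorizes well.
            @simd for i in 1:k-1
                X[i, j] -= A[i, k] * xk
            end
        end
    end
    return X
end
```

Its answers can be checked against `\` just as `backsolve` was above. Whether the reordering actually pays off should be confirmed with the same `@btime` benchmarks.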
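For the blocked suggestion above, multiplying out the $2\times 2$ block system gives $U_3 X_2 = B_2$ and $U_1 X_1 + U_2 X_2 = B_1$. So one can:

1. Solve for $X_2$ recursively.
2. Update $B_1 \leftarrow B_1 - U_2 X_2$, which is a matrix multiplication.
3. Solve $U_1 X_1 = B_1 - U_2 X_2$ recursively.

The sketch below is one possible realization. It assumes Julia 1.3 or later, for the five-argument `mul!(C, A, B, α, β)`, which computes `C = A*B*α + C*β` in place. The function names and the base-case size `base = 64` are hypothetical choices, not tuned values.

```julia
using LinearAlgebra

# Base case: column-oriented backsubstitution, overwriting X (which holds B on
# entry) with the solution of A*X = B, reading only the upper triangle of A.
function backsolve_base!(A::AbstractMatrix, X::AbstractMatrix)
    m = size(A, 1)
    @inbounds for j in axes(X, 2), k in m:-1:1
        xk = X[k, j] / A[k, k]
        X[k, j] = xk
        @simd for i in 1:k-1
            X[i, j] -= A[i, k] * xk   # remove unknown k from the rows above
        end
    end
    return X
end

# Recursive step on the block system [U1 U2; 0 U3] * [X1; X2] = [B1; B2].
# A holds U (only its upper triangle is read); X holds B on entry, X on exit.
function backsolve_rec!(A::AbstractMatrix, X::AbstractMatrix, base::Int)
    m = size(A, 1)
    m <= base && return backsolve_base!(A, X)
    h = m ÷ 2                          # U1 is h×h, U3 is (m-h)×(m-h)
    U1 = view(A, 1:h, 1:h)
    U2 = view(A, 1:h, h+1:m)
    U3 = view(A, h+1:m, h+1:m)
    X1 = view(X, 1:h, :)
    X2 = view(X, h+1:m, :)
    backsolve_rec!(U3, X2, base)       # solve U3 * X2 = B2
    mul!(X1, U2, X2, -1, 1)            # X1 ← B1 - U2 * X2, in place
    backsolve_rec!(U1, X1, base)       # solve U1 * X1 = B1 - U2 * X2
    return X
end

# Hypothetical drop-in alternative to backsolve with the same argument checks.
function backsolve_blocked(U::UpperTriangular, B::AbstractVecOrMat; base::Int = 64)
    base >= 1 || throw(ArgumentError("base must be at least 1"))
    m = LinearAlgebra.checksquare(U)
    m == size(B, 1) || throw(DimensionMismatch("matrix sizes don't match"))
    T = typeof(zero(eltype(B)) / oneunit(eltype(U)))
    X = copyto!(similar(B, T), B)                      # X starts as a copy of B
    Xm = X isa AbstractVector ? reshape(X, m, 1) : X   # treat a vector as m×1
    backsolve_rec!(parent(U), Xm, base)
    return X
end
```

For large $m$, most of the arithmetic ends up in the `mul!` update. For `Float64` data in these strided views, that update is typically handed to BLAS. The base-case size is a tuning parameter worth benchmarking.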
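For the last suggestion above, the columns of $X$ are independent problems, so a simple approach is to hand different columns to different threads. This sketch uses `Threads.@threads` in Julia 1.x. It only runs in parallel if Julia is started with several threads, e.g. `julia -t 4` (Julia 1.5 or later) or via the `JULIA_NUM_THREADS` environment variable. The names `backsolve_threaded` and `backsolve_column!` are hypothetical. The per-column kernel is a column-oriented backsubstitution, which is our own choice rather than something the assignment specifies.

```julia
using LinearAlgebra
using Base.Threads: @threads

# Column-oriented backsubstitution for one right-hand side, in place:
# x holds b on entry and the solution of A*x = b (upper triangle of A) on exit.
function backsolve_column!(A::AbstractMatrix, x::AbstractVector)
    m = size(A, 1)
    @inbounds for k in m:-1:1
        xk = x[k] / A[k, k]
        x[k] = xk
        @simd for i in 1:k-1
            x[i] -= A[i, k] * xk
        end
    end
    return x
end

function backsolve_threaded(U::UpperTriangular, B::AbstractMatrix)
    m = LinearAlgebra.checksquare(U)
    m == size(B, 1) || throw(DimensionMismatch("matrix sizes don't match"))
    T = typeof(zero(eltype(B)) / oneunit(eltype(U)))
    X = copyto!(similar(B, T), B)
    A = parent(U)
    # Columns of X are independent, so each iteration (possibly on a different
    # thread) reads shared A but writes only to its own column of X.
    @threads for j in 1:size(X, 2)
        backsolve_column!(A, view(X, :, j))
    end
    return X
end
```

With a single thread this simply runs the columns in sequence and still gives the correct answer. For a single right-hand-side vector it offers no parallelism at all.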