In [1]:
using DrWatson;
@quickactivate "NumericalAnalysis"

# Motivation for Solving Linear Systems

We now begin our study of numerical methods for linear systems. Recall that a system of linear equations (also called a linear system) with $m$ equations and $n$ unknowns takes the form

$$
\begin{align*}
a_{11}x_{1} + a_{12}x_{2} + \cdots + a_{1n}x_{n} &= b_{1} \\
a_{21}x_{1} + a_{22}x_{2} + \cdots + a_{2n}x_{n} &= b_{2} \\
                                                 &\vdots \\
a_{m1}x_{1} + a_{m2}x_{2} + \cdots + a_{mn}x_{n} &= b_{m}
\end{align*}
$$

The $mn$ values $a_{ij}$ are the given coefficients, the $n$ values $x_{j}$ are the unknowns, and the $m$ values $b_{i}$ are the specified right-hand side values.

It is convenient to write a linear system in matrix-vector notation as

$$Ax = b,$$

where $A$ is the $m\times n$ coefficient matrix with entries $a_{ij}$, $x$ and $b$ are column vectors with $n$ and $m$ entries, respectively, and $Ax$ denotes the matrix-vector product. To solve a system of linear equations means to find, for a given matrix $A$ and vector $b$, a vector $x$ that satisfies $Ax=b$. Solving linear systems is a recurring theme in applied mathematics and science.

> It has been estimated that the solution of a linear system of equations enters in at some stage in about 75 percent of all scientific problems. (Dahlquist and Björck)

Note that a large portion of a linear algebra course is devoted to deriving general criteria for when a linear system $Ax=b$ has a solution and, if it does, when that solution is unique. For example, you can view $A$ as a linear transformation from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$ and then examine the range and nullspace of $A$. The point is that you are already armed with powerful theoretical tools for studying the well-posedness of the abstract mathematical problem of solving $Ax=b$. This course considers practical algorithms for obtaining a numerical solution to linear systems.

You may have two questions: 

1. Why does the problem of solving systems of linear equations come up so frequently?
2. Why do we need to learn new methods (new as in ones not already covered in a linear algebra course) for solving such problems? 

We will provide partial answers to these questions now; as we proceed through the course you will learn much more about both, especially the second one.

As you learned in linear algebra, the concept of linearity is abstract and general. Any finite-dimensional linear problem has a matrix representation and can therefore be reduced to solving a system of linear equations. Even infinite-dimensional linear problems can often be well approximated by finite-dimensional ones, typically via some type of discretization (which introduces errors that we must assess and control). For example, the famous [heat equation](https://en.wikipedia.org/wiki/Heat_equation) is a linear partial differential equation (PDE) that models the flow of heat over time throughout some spatial domain. The corresponding steady-state problem describes the temperature distribution over the spatial domain in the limit as time goes to infinity, and solving it means solving a so-called [elliptic PDE](https://en.wikipedia.org/wiki/Elliptic_partial_differential_equation) that is linear. Discretizing a linear elliptic PDE results in a square system of linear equations that is typically large ($10000 \times 10000$ or larger). [This presentation](https://www.youtube.com/watch?v=rRCGNvMdLEY&t=29s) by the former NFL player [John Urschel](https://en.wikipedia.org/wiki/John_Urschel) is an excellent illustration of these ideas.
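To see how a discretized PDE turns into a linear system, here is a minimal sketch under simplifying assumptions that are not part of the lecture: the one-dimensional steady-state problem $-u''(x) = f(x)$ on $(0,1)$ with $u(0)=u(1)=0$, discretized with centered finite differences on $n$ equally spaced interior points. The right-hand side $f(x) = \pi^{2}\sin(\pi x)$ and the grid size $n = 99$ are illustrative choices; the exact solution is then $u(x) = \sin(\pi x)$. The resulting matrix is tridiagonal, so it is stored with the SparseArrays standard library.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical 1D model problem: -u''(x) = f(x) on (0,1), u(0) = u(1) = 0,
# with f chosen so that the exact solution is u(x) = sin(πx).
n = 99                      # number of interior grid points (an arbitrary choice)
h = 1 / (n + 1)             # grid spacing
xs = h .* (1:n)             # interior grid points
f = π^2 .* sin.(π .* xs)    # right-hand side sampled at the grid points

# Centered second difference: (-u[i-1] + 2u[i] - u[i+1]) / h^2 ≈ -u''(x_i)
A = spdiagm(-1 => fill(-1 / h^2, n - 1),
             0 => fill( 2 / h^2, n),
             1 => fill(-1 / h^2, n - 1))

u = A \ f                   # solve the n×n sparse linear system
max_err = maximum(abs.(u .- sin.(π .* xs)))

println("stored nonzeros: ", nnz(A), " of ", n^2, " entries")
println("max error vs exact solution: ", max_err)
```

Only about $3n$ of the $n^{2}$ entries are nonzero, which is one reason storage and efficiency, and not just accuracy, matter for large discretized problems. In two or three spatial dimensions the same idea produces much larger systems.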

As further motivation, the famous [Schrödinger equation](https://en.wikipedia.org/wiki/Schr%C3%B6dinger_equation#:~:text=The%20Schr%C3%B6dinger%20equation%20is%20a,of%20a%20quantum%2Dmechanical%20system.&text=Those%20two%20parameters%20are%20sufficient,Newton's%20law%20is%20Schr%C3%B6dinger's%20equation.) from [quantum mechanics](https://en.wikipedia.org/wiki/Quantum_mechanics) is another example of a linear PDE. Even when a problem is nonlinear, a key step in solving it is often to linearize it, in the hope that the solution of the linearized problem will provide a good approximation to the solution of the original nonlinear one; in fact, this often turns out to be the case. We will say a little more about this later, but data science, machine learning, and deep learning in particular are also major sources of problems whose solution requires solving a system of linear equations at one step or another. See the recent book [Linear Algebra and Learning from Data](https://math.mit.edu/~gs/learningfromdata/) for more on the relationship between computational linear algebra and data science.

A short answer to question 2 is that, as we have already indicated, conditioning and stability matter. Furthermore, when solving large systems of linear equations, efficiency matters, and it may even be important to account for how much memory is required to store the problem data on a computer.
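To make "conditioning matters" concrete, here is a small experiment, a sketch that is not part of the original lecture. It uses the Hilbert matrix $H_{ij} = 1/(i+j-1)$, a standard example of an ill-conditioned matrix, with an illustrative size $n = 10$. The right-hand side is built so that the exact solution is (up to rounding in forming $b$) the vector of ones. One should expect a tiny scaled residual together with a much larger error in $x$, a gap that inspecting the residual alone does not reveal.

```julia
using LinearAlgebra

n = 10
H = [1.0 / (i + j - 1) for i in 1:n, j in 1:n]   # Hilbert matrix: a classic ill-conditioned example
x_true = ones(n)                                 # intended exact solution
b = H * x_true                                   # right-hand side (rounded when formed)
x = H \ b                                        # solve with backslash

κ = cond(H)                                       # 2-norm condition number
rel_err = norm(x - x_true) / norm(x_true)         # relative error in the computed solution
rel_res = norm(b - H * x) / (opnorm(H) * norm(x)) # residual scaled by the sizes of H and x

println("cond(H)           = ", κ)
println("relative error    = ", rel_err)
println("relative residual = ", rel_res)
```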

Hopefully we have convinced you of the relevance of solving linear systems and the need to do so using numerical algorithms. Thus we begin our brief tour through the vast world of numerical linear algebra. To learn more or get a sense of the vastness of this field, see [Matrix Computations](https://jhupbooks.press.jhu.edu/title/matrix-computations) by Golub and Van Loan.  

# Square Linear Systems

We begin by studying the problem of solving $Ax=b$ when $A$ is $n\times n$, so that $A$ has the same number of rows as columns. In this case the linear system is called a square linear system. Written out in full, such a system looks like

$$
\begin{align*}
a_{11}x_{1} + a_{12}x_{2} + \cdots + a_{1n}x_{n} &= b_{1} \\
a_{21}x_{1} + a_{22}x_{2} + \cdots + a_{2n}x_{n} &= b_{2} \\
                                                 &\vdots \\
a_{n1}x_{1} + a_{n2}x_{2} + \cdots + a_{nn}x_{n} &= b_{n}
\end{align*}
$$

or

$$
\left[\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{array}\right] \left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array} \right] = \left[\begin{array}{c} b_{1} \\ b_{2} \\ \vdots \\ b_{n} \end{array} \right]
$$

Now is a good time to review any linear algebra on which you might be a bit rusty. As a place to start, [this video](https://www.youtube.com/watch?v=bRM3zrzZYg8&list=PLvUvOH0OYx3BcZivtXMIwP6hKoYv0YvGn&index=5) together with Appendix A of the textbook is recommended.

Julia provides many powerful tools for solving systems of linear equations. In particular, the [LinearAlgebra.jl package](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/) makes matrix computations fast and easy. For example, suppose we want to solve the linear system

$$
\left[\begin{array}{cccc} -1.0 & 3.2 & 2.5 & -4.0\\ 1.5 & -3.0 & 7.0 & 2.1 \\ -3.2 & -9.8 & 7.5 & 13.0 \\ -1.0 & 0.0 & 6.9 & -10.0 \end{array}\right] \left[\begin{array}{c} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \end{array} \right] = \left[\begin{array}{c} -1.0 \\ 1.0 \\ 0.0 \\ -1.0 \end{array} \right]
$$

We can easily do this in Julia using the **backslash operator** as follows.

In [2]:
using LinearAlgebra # load the LinearAlgebra.jl package

In [3]:
A = [-1.0 3.2 2.5 -4.0; 1.5 -3.0 7.0 2.1; -3.2 -9.8 7.5 13.0; -1.0 0.0 6.9 -10.0] # define A
b = [-1.0, 1.0, 0.0, -1.0] # define right-hand side vector b
x = A\b # apply backslash

4-element Vector{Float64}:
  0.46227392289927544
 -0.1019010408349391
 -0.013260324659117384
  0.04462298369528145

We have obtained a solution which can be checked by computing $Ax$:

In [4]:
A*x # compute matrix/vector multiplication Ax 

4-element Vector{Float64}:
 -0.9999999999999999
  0.9999999999999998
  2.220446049250313e-16
 -0.9999999999999999

The vector $b-Ax$ is called the **residual**; it equals the zero vector exactly when $x$ is a solution. Let's compute the residual for our example system:

In [5]:
b - A*x # compute the residual vector

4-element Vector{Float64}:
 -1.1102230246251565e-16
  2.220446049250313e-16
 -2.220446049250313e-16
 -1.1102230246251565e-16

We obtain a residual that is very close to the zero vector (in a sense that we will make precise later) but not exactly equal to it, because the computations are carried out in floating-point arithmetic. Here is what we now need to discuss:

1. What is backslash doing? That is, what are the underlying algorithms?
2. There is clearly some error. How do we assess and control it?
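One common way to quantify "very close to zero", previewing ideas we will develop later, is to take a norm of the residual and scale it by the sizes of $A$ and $x$. The sketch below uses `norm` (vector 2-norm) and `opnorm` (matrix 2-norm) from LinearAlgebra; this particular scaling is one reasonable choice, not the only one. For this system the scaled residual should come out no larger than about machine epsilon, `eps()` $\approx 2.2 \times 10^{-16}$.

```julia
using LinearAlgebra

A = [-1.0 3.2 2.5 -4.0; 1.5 -3.0 7.0 2.1; -3.2 -9.8 7.5 13.0; -1.0 0.0 6.9 -10.0]
b = [-1.0, 1.0, 0.0, -1.0]
x = A \ b

r = b - A * x                               # residual vector
abs_res = norm(r)                           # Euclidean (2-)norm of the residual
rel_res = norm(r) / (opnorm(A) * norm(x))   # residual scaled by the sizes of A and x

println("norm of residual  = ", abs_res)
println("relative residual = ", rel_res, "  (eps() = ", eps(), ")")
println("A*x ≈ b: ", A * x ≈ b)             # isapprox with its default relative tolerance
```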

You can start to get some insight into backslash by consulting its help documentation.

In [6]:
?\ # help documentation on backslash

```
\(x, y)
```

Left division operator: multiplication of `y` by the inverse of `x` on the left. Gives floating-point results for integer arguments.

# Examples

```jldoctest
julia> 3 \ 6
2.0

julia> inv(3) * 6
2.0

julia> A = [4 3; 2 1]; x = [5, 6];

julia> A \ x
2-element Vector{Float64}:
  6.5
 -7.0

julia> inv(A) * x
2-element Vector{Float64}:
  6.5
 -7.0
```

---

```
\(A, B)
```

Matrix division using a polyalgorithm. For input matrices `A` and `B`, the result `X` is such that `A*X == B` when `A` is square. The solver that is used depends upon the structure of `A`.  If `A` is upper or lower triangular (or diagonal), no factorization of `A` is required and the system is solved with either forward or backward substitution. For non-triangular square matrices, an LU factorization is used.

For rectangular `A` the result is the minimum-norm least squares solution computed by a pivoted QR factorization of `A` and a rank estimate of `A` based on the R factor.

When `A` is sparse, a similar polyalgorithm is used. For indefinite matrices, the `LDLt` factorization does not use pivoting during the numerical factorization and therefore the procedure can fail even for invertible matrices.

# Examples

```jldoctest
julia> A = [1 0; 1 -2]; B = [32; -4];

julia> X = A \ B
2-element Vector{Float64}:
 32.0
 18.0

julia> A * X == B
true
```

---

```
(\)(F::QRSparse, B::StridedVecOrMat)
```

Solve the least squares problem $\min\|Ax - b\|^2$ or the linear system of equations $Ax=b$ when `F` is the sparse QR factorization of $A$. A basic solution is returned when the problem is underdetermined.

# Examples

```jldoctest
julia> A = sparse([1,2,4], [1,1,1], [1.0,1.0,1.0], 4, 2)
4×2 SparseMatrixCSC{Float64, Int64} with 3 stored entries:
 1.0   ⋅
 1.0   ⋅
  ⋅    ⋅
 1.0   ⋅

julia> qr(A)\fill(1.0, 4)
2-element Vector{Float64}:
 1.0
 0.0
```


We highlight the following information from the backslash help file:

> For input matrices A and B, the result X is such that A*X == B when A is square. The solver that is used depends upon the structure of A. If A is upper or lower triangular (or diagonal), no factorization of A is required and the system is solved with either forward or backward substitution. For non-triangular square matrices, an LU factorization is used.

What is this about forward and backward substitution and $LU$ factorization? This is what we will take up in the next lecture. For now, let's foreshadow what is to come.
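The help text says that backslash checks for triangular structure before choosing a solver. Here is a minimal sketch of what that looks like from the user's side, using a small hypothetical upper triangular matrix `U` (not from the lecture) chosen so that the exact solution is $(1, 1, 2)$. Wrapping the matrix in the `UpperTriangular` type from LinearAlgebra tells `\` to use back substitution directly; for a plain `Matrix`, per the help text, backslash performs the triangular check itself.

```julia
using LinearAlgebra

# Hypothetical upper triangular system U x = c, chosen so that x = [1, 1, 2]
U = [2.0 1.0 -1.0;
     0.0 3.0  2.0;
     0.0 0.0  4.0]
c = [1.0, 7.0, 8.0]

println("istriu(U): ", istriu(U))   # true: all entries below the diagonal are zero

x1 = UpperTriangular(U) \ c   # wrapper type dispatches to back substitution
x2 = U \ c                    # plain Matrix: backslash detects the triangular structure

println("x1 = ", x1)
println("x2 = ", x2)
```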

## Foreshadowing

Given a matrix $A$, a matrix factorization of $A$ is simply a finite sequence of matrices whose product equals $A$. That is, a matrix factorization of $A$ is a sequence $A_{1}$, $A_{2}$, $\ldots$, $A_{N}$ of matrices that satisfy

$$A_{1}A_{2}\cdots A_{N} = A.$$
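For a concrete instance of this definition, `lu` from LinearAlgebra computes an $LU$ factorization. One caveat: Julia's `lu` uses row pivoting by default, so what it returns satisfies $PA = LU$ for a permutation matrix $P$ (equivalently $A = P^{T}LU$, a product of $N = 3$ factors). The sketch below reuses the $4\times 4$ matrix from the backslash example and only checks the factorization; it does not show how the factors are computed.

```julia
using LinearAlgebra

A = [-1.0 3.2 2.5 -4.0; 1.5 -3.0 7.0 2.1; -3.2 -9.8 7.5 13.0; -1.0 0.0 6.9 -10.0]

F = lu(A)                      # LU factorization with partial (row) pivoting
L, U, P = F.L, F.U, F.P        # unit lower triangular, upper triangular, permutation

println("L unit lower triangular: ", istril(L) && all(diag(L) .== 1))
println("U upper triangular:      ", istriu(U))
println("P*A ≈ L*U:               ", P * A ≈ L * U)
println("A ≈ P'*L*U:              ", A ≈ P' * L * U)
```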

You have already been exposed to matrix factorizations, although perhaps not explicitly. In linear algebra you performed row operations to put a matrix into reduced row echelon form. Each row operation can be carried out by multiplying on the left by a special invertible matrix that we will call a row operation matrix. If row operation matrices $E_{1}, E_{2}, \ldots, E_{k}$ reduce $A$ to $R$, so that $E_{k}\cdots E_{2}E_{1}A = R$, then $A = E_{1}^{-1}E_{2}^{-1}\cdots E_{k}^{-1}R$, which is a matrix factorization of $A$. Thus, row reduction amounts to computing a matrix factorization. This already suggests that matrix factorizations might be helpful in solving linear systems, which is indeed the case. Let's flesh this out a little more.
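The claim that row operations are matrix multiplications can be checked numerically. The sketch below (an illustration, not the lecture's code) builds the row operation matrix $E$ that subtracts multiples of row 1 of the example matrix from rows 2 through 4, assuming $a_{11} \neq 0$ (here $a_{11} = -1$). Multiplying by $E$ zeros out the first column below the diagonal, and the matrix with the multipliers' signs flipped undoes those row operations, giving the factorization $A = E^{-1}(EA)$.

```julia
using LinearAlgebra

A = [-1.0 3.2 2.5 -4.0; 1.5 -3.0 7.0 2.1; -3.2 -9.8 7.5 13.0; -1.0 0.0 6.9 -10.0]
n = size(A, 1)

# Row operation matrix E: row i of E*A is row i of A minus m_i times row 1,
# with multiplier m_i = a_{i1} / a_{11} (assumes A[1, 1] != 0).
E = Matrix{Float64}(I, n, n)
for i in 2:n
    E[i, 1] = -A[i, 1] / A[1, 1]
end

EA = E * A   # first column of EA is zero below the diagonal

# Undo the row operations by adding the multiples back: flip the signs of the multipliers.
Einv = Matrix{Float64}(I, n, n)
for i in 2:n
    Einv[i, 1] = A[i, 1] / A[1, 1]
end

println("first column of E*A: ", EA[:, 1])
println("Einv * E ≈ I:        ", Einv * E ≈ Matrix{Float64}(I, n, n))
println("Einv * (E*A) ≈ A:    ", Einv * EA ≈ A)
```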

If the inverse $A^{-1}$ exists, the solution to $Ax=b$ is $x=A^{-1}b$. In general it is a lot of work to invert a matrix. However, some classes of matrices are relatively easy to invert. Now suppose that we can factor $A$ as $A=LU$, where, fortunately, the factors $L$ and $U$ are easy to invert. Then the system $Ax=b$ is equivalent to the system $LUx=b$. Introducing the intermediate vector $y=Ux$, we can split $LUx = b$ into two subsystems:

1. $Ly=b$, which has solution $y=L^{-1}b$, and

2. $Ux=y$, which has solution $x=U^{-1}y=U^{-1}L^{-1}b$.

Since $U^{-1}L^{-1} = (LU)^{-1} = A^{-1}$, the $x$ we have just found is exactly $A^{-1}b$, the solution to the original system $Ax=b$.
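The two-step solve can be sketched with the factors returned by `lu`. Because Julia's `lu` pivots rows, $PA = LU$, so the first subsystem becomes $Ly = Pb$ rather than $Ly = b$; in code, $Pb$ is `b[F.p]`. The `LowerTriangular` and `UpperTriangular` wrappers make `\` use forward and back substitution. This is an illustration under those assumptions, not a description of exactly what backslash does internally.

```julia
using LinearAlgebra

A = [-1.0 3.2 2.5 -4.0; 1.5 -3.0 7.0 2.1; -3.2 -9.8 7.5 13.0; -1.0 0.0 6.9 -10.0]
b = [-1.0, 1.0, 0.0, -1.0]

F = lu(A)                           # P*A = L*U, with the row permutation stored as F.p
y = LowerTriangular(F.L) \ b[F.p]   # step 1: solve L y = P b by forward substitution
x = UpperTriangular(F.U) \ y        # step 2: solve U x = y by back substitution

println("x = ", x)
println("matches A \\ b: ", x ≈ A \ b)

# The factorization can be reused for another (hypothetical) right-hand side:
b2 = [1.0, 2.0, 3.0, 4.0]
x2 = F \ b2
println("residual norm for b2: ", norm(b2 - A * x2))
```

A practical payoff of splitting the work this way is that the factorization is computed once and can be reused: `F \ b2` solves $Ax = b_{2}$ for a new right-hand side using only the two triangular solves.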

Next, for square matrices $A$, we will derive a matrix factorization $A=LU$ in which $L$ and $U$ are relatively easy to invert (although in practice we do not actually compute the inverses; we solve the two subsystems directly). This will turn out to be equivalent to the Gaussian elimination you are already familiar with.

As an exercise, can you think of a matrix with a structure that makes the corresponding linear system relatively easy to solve?