In [None]:
import numpy as np
import matplotlib.pyplot as plt

# We'll be using several functions from scipy.linalg here.
import scipy.linalg

# Solving Systems of Equations

Let's start with a simple example, where we want to solve the following equations simultaneously:

$$
3x_1 + 5x_2 + 2x_3 = 11\\
2x_1 - 3x_2 - 3x_3 = -1\\
x_1 + x_2 + x_3 = 2
$$

This would be equivalent to solving the matrix equation

$$
\mathbf{Mx}=\mathbf{b}
$$
where
$$
\mathbf{M} = \begin{pmatrix} 3 & 5 & 2 \\ 2 & -3 & -3 \\ 1 & 1 & 1 \end{pmatrix},\quad 
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} 11 \\ -1 \\ 2 \end{pmatrix}.
$$

In this case of a system with just three equations, this is not difficult to do by hand, but often we'll need to be able to solve systems with an arbitrarily large number of equations and variables.
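
As a quick check of the matrix form, the following sketch builds $\mathbf{M}$ and $\mathbf{b}$ for the example as NumPy arrays and tests whether $\mathbf{x} = (1, 2, -1)$, the solution worked out by hand in the Gaussian elimination section, satisfies $\mathbf{Mx}=\mathbf{b}$. The variable names are chosen just for this illustration.

```python
import numpy as np

# Coefficient matrix and right-hand side of the three-equation example
M = np.array([[3, 5, 2],
              [2, -3, -3],
              [1, 1, 1]], dtype=float)
b = np.array([11, -1, 2], dtype=float)

# Candidate solution (x1, x2, x3) = (1, 2, -1)
x = np.array([1, 2, -1], dtype=float)

# M @ x evaluates the left-hand side of all three equations at once
print("M @ x =", M @ x)
print("Matches b:", np.allclose(M @ x, b))
```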

More generally, a system of $n$ equations in $n$ unknowns, such as

$$
m_{11} x_1 + m_{12} x_2 + \cdots + m_{1n}x_n = b_1 \\
m_{21} x_1 + m_{22} x_2 + \cdots + m_{2n}x_n = b_2 \\
\qquad \vdots \qquad \qquad \qquad \qquad \vdots  \\
m_{n1} x_1 + m_{n2} x_2 + \cdots + m_{nn}x_n = b_n
$$
would be equivalent to solving the matrix equation

$$
\mathbf{Mx}=\mathbf{b}
$$
where
$$
\mathbf{M} = \begin{pmatrix}m_{11} & m_{12} & \cdots & m_{1n} \\
                            m_{21} & m_{22} & \cdots & m_{2n} \\
                            \vdots & \vdots & \ddots & \vdots \\
                            m_{n1} & m_{n2} & \cdots & m_{nn}
         \end{pmatrix},\quad 
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.
$$

So we need a systematic way to do this.

When might it not be possible to do this?

## Gaussian Elimination

The usual approach to solving this kind of linear system is to use an algorithm called Gaussian Elimination. This can be done with the following procedure:

- Combine the vector $\mathbf{b}$ with the matrix of coefficients $\mathbf{M}$ to form the augmented matrix $\mathbf{\tilde{M}}$ as

$$ \mathbf{\tilde{M}} = \begin{pmatrix}m_{11} & m_{12} & \cdots & m_{1n} & b_1\\
                            m_{21} & m_{22} & \cdots & m_{2n} & b_2\\
                            \vdots & \vdots & \ddots & \vdots & \vdots\\
                            m_{n1} & m_{n2} & \cdots & m_{nn} & b_n \end{pmatrix} $$
                            
- Perform row reduction on this matrix such that $\mathbf{M}$ is transformed to an upper triangular form:

$$ \mathbf{\tilde{M}'} = \begin{pmatrix}m_{11}' & m_{12}' & \cdots & m_{1n}' & b_1'\\
                            0 & m_{22}' & \cdots & m_{2n}' & b_2'\\
                            \vdots & \vdots & \ddots & \vdots & \vdots\\
                            0 & 0 & \cdots & m_{nn}' & b_n' \end{pmatrix} $$

- Row reduction consists of the following operations, which are repeated as necessary.
    1. Multiply any row by a non-zero constant.
    2. Add any row to another row.
    3. Swap the order of any two rows.

Let's do this now for the simple example we gave above:

We start with
$$
\mathbf{\tilde{M}}_0 = \begin{pmatrix} 3 & 5 & 2 & 11\\ 2 & -3 & -3 & -1\\ 1 & 1 & 1 & 2\end{pmatrix}
$$

We want to have zeros below some diagonal elements, so let's use the following procedure:
- Given a diagonal element $m_{ii}$, to set the element $m_{ji}$ below it to zero, where $j>i$, multiply row $i$ by $-m_{ji}/m_{ii}$ and add it to row $j$.

Here we want to have zeros below the 3 in the first row. In the second row the value in that spot is 2, so we can multiply the first row by -2/3 and add it to the second to give a zero in that position. And similarly, we can multiply the first row by -1/3 and add it to the third to get a zero in that position in the third row:
$$
\mathbf{\tilde{M}}_1 = \begin{pmatrix} 3 & 5 & 2 & 11\\ 0 & -19/3 & -13/3 & -25/3\\ 0 & -2/3 & 1/3 & -5/3\end{pmatrix}
$$

Now we want to get a zero in the second position in row 3 where we have $-2/3$, below the $-19/3$ in row 2. So as before we can multiply row 2 by $-2/3 \times 3/19 = -2/19$ and add it to row 3 to give
$$
\mathbf{\tilde{M}}_u = \begin{pmatrix} 3 & 5 & 2 & 11\\ 0 & -19/3 & -13/3 & -25/3\\ 0 & 0 & 15/19 & -15/19\end{pmatrix}
$$

Now we can stop as the matrix is in upper triangular form, so we have completed the row reduction step.

- Back substitution is now used to generate the full solution.
- If we write this as a set of equations again, it's clear we can solve these directly, one at a time, starting from the last one and working backwards, as we will have only one unknown each time.

In the case of our example, we have the following set of equations:
$$
3 x_1 + 5 x_2 + 2 x_3 = 11\\
-19/3 x_2 - 13/3 x_3 = -25/3\\
15/19 x_3 = -15/19
$$

- Equation 3 tells us $x_3 = -1$.
- Then equation 2 becomes $-19/3 x_2 = -25/3 - 13/3 = -38/3$, so $x_2 = 2$.
- Then equation 1 becomes $3 x_1 = 11 - 10 + 2 = 3$, so $x_1 = 1$.
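
The row reduction and back substitution steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a robust solver: the function name `gauss_solve` is hypothetical, and it assumes every pivot is non-zero, so no row swaps are performed.

```python
import numpy as np

def gauss_solve(M, b):
    """Solve Mx = b by Gaussian elimination without row swaps (sketch)."""
    n = len(b)
    # Build the augmented matrix [M | b] in floating point
    A = np.hstack([np.asarray(M, dtype=float),
                   np.asarray(b, dtype=float).reshape(n, 1)])

    # Row reduction: zero out the entries below each diagonal element
    for i in range(n):
        for j in range(i + 1, n):
            factor = -A[j, i] / A[i, i]  # assumes A[i, i] is non-zero
            A[j] += factor * A[i]

    # Back substitution, starting from the last equation
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (A[i, -1] - A[i, i + 1:n] @ x[i + 1:]) / A[i, i]
    return x

M = [[3, 5, 2], [2, -3, -3], [1, 1, 1]]
b = [11, -1, 2]
x_gauss = gauss_solve(M, b)
print("x =", x_gauss)
```

For the example system this should reproduce the hand-derived solution $x_1 = 1$, $x_2 = 2$, $x_3 = -1$ to within rounding error.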


## LU Factorization

In the process of Gaussian Elimination, we have in effect multiplied the original matrix $\mathbf{M}_0$ by another matrix, which we'll call $\mathbf{N}$ for now, which transforms it to an upper triangular form $\mathbf{U}$.

For example, in the first step with our example matrix:
$$\mathbf{M}_1 = \mathbf{N}_0 \mathbf{M}_0 =
\begin{pmatrix} 1 & 0 & 0\\ -2/3 & 1 & 0\\ -1/3 & 0 & 1\end{pmatrix}\cdot
\begin{pmatrix} 3 & 5 & 2\\ 2 & -3 & -3\\ 1 & 1 & 1\end{pmatrix} = 
\begin{pmatrix} 3 & 5 & 2\\ 0 & -19/3 & -13/3\\ 0 & -2/3 & 1/3\end{pmatrix}$$

The next step is:
$$\mathbf{M}_u = \mathbf{N}_1 \mathbf{M}_1 =
\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -2/19 & 1 \end{pmatrix}\cdot
\begin{pmatrix} 3 & 5 & 2\\ 0 & -19/3 & -13/3\\ 0 & -2/3 & 1/3\end{pmatrix} = 
\begin{pmatrix} 3 & 5 & 2\\ 0 & -19/3 & -13/3\\ 0 & 0 & 15/19\end{pmatrix}$$

If we take the product of our $\mathbf{N}$ matrices, we obtain the matrix that would transform $\mathbf{M}$ to an upper triangular form in a single step.
$$\mathbf{U} = \mathbf{N}_1 \mathbf{N}_0 \mathbf{M}_0$$
So clearly we can write
$$ \mathbf{M}_0 = \mathbf{N}_0^{-1} \mathbf{N}_1^{-1} \mathbf{U} = \mathbf{LU}$$

As the matrices $\mathbf{N}_0$ and $\mathbf{N}_1$ are fairly simple we can invert them by inspection.

$\mathbf{N}_1$ corresponds to multiplying the second row by $-2/19$ and adding it to the third, so the inverse would be to multiply the second row by $2/19$ and add it to the third.
$$\mathbf{N}_1^{-1} =
\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 2/19 & 1 \end{pmatrix}$$

$\mathbf{N}_0$ corresponds to multiplying the first row by $-2/3$ and adding it to the second, and multiplying the first row by $-1/3$ and adding it to the third, so the inverse would instead multiply by $2/3$ and $1/3$ so that
$$\mathbf{N}_0^{-1} =
\begin{pmatrix} 1 & 0 & 0\\ 2/3 & 1 & 0\\ 1/3 & 0 & 1 \end{pmatrix}$$

And the product of these is also easier as we can simply combine the non-zero off-diagonal elements to give us
$$\mathbf{L} =
\begin{pmatrix} 1 & 0 & 0\\ 2/3 & 1 & 0\\ 1/3 & 2/19 & 1 \end{pmatrix}$$
which as we can see is lower triangular.

So altogether we have
$$\begin{pmatrix} 3 & 5 & 2\\ 2 & -3 & -3\\ 1 & 1 & 1\end{pmatrix} =
\begin{pmatrix} 1 & 0 & 0\\ 2/3 & 1 & 0\\ 1/3 & 2/19 & 1\end{pmatrix} \cdot 
\begin{pmatrix} 3 & 5 & 2\\ 0 & -19/3 & -13/3\\ 0 & 0 & 15/19\end{pmatrix}$$
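
The factorization can be checked numerically. The sketch below enters $\mathbf{M}_0$, $\mathbf{N}_0$ and $\mathbf{N}_1$ from the worked example, forms $\mathbf{U} = \mathbf{N}_1\mathbf{N}_0\mathbf{M}_0$ and $\mathbf{L} = \mathbf{N}_0^{-1}\mathbf{N}_1^{-1}$, and confirms that their product recovers the original matrix. Using `np.linalg.inv` here is only for illustration; in practice the inverses would be written down by inspection as described above.

```python
import numpy as np

M0 = np.array([[3, 5, 2], [2, -3, -3], [1, 1, 1]], dtype=float)

# Elimination matrices from the two row-reduction steps
N0 = np.array([[1, 0, 0], [-2/3, 1, 0], [-1/3, 0, 1]])
N1 = np.array([[1, 0, 0], [0, 1, 0], [0, -2/19, 1]])

# Upper triangular factor and the corresponding lower triangular factor
U = N1 @ N0 @ M0
L = np.linalg.inv(N0) @ np.linalg.inv(N1)

print("L =\n", L)
print("U =\n", U)
print("L @ U reproduces M0:", np.allclose(L @ U, M0))
```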

The LU decomposition is most useful for when we want to solve $\mathbf{Mx}=\mathbf{b}$ for many different vectors $\mathbf{b}$.

In this situation we can write
$$\mathbf{Mx}=\mathbf{LUx}=\mathbf{L}(\mathbf{Ux})=\mathbf{b}$$

From this we first set $\mathbf{Ux}=\mathbf{y}$ and solve $\mathbf{Ly}=\mathbf{b}$ by forward substitution. Then use $\mathbf{y}$ to solve for $\mathbf{x}$ in $\mathbf{Ux}=\mathbf{y}$ by back substitution. If we already know $\mathbf{L}$ and $\mathbf{U}$ this takes far fewer operations than performing Gaussian elimination again: the two substitution steps take $O(n^2)$ operations, compared to $O(n^3)$ for the elimination. This procedure, using LU decomposition to solve a linear system where we obtain $\mathbf{L}$ with 1s along the diagonal, is sometimes referred to as Doolittle's method.

Let's look at our first example again, but say our solution vector is now 
$$\mathbf{b} = \begin{pmatrix}2\\-1\\2 \end{pmatrix}.$$
We can first solve $\mathbf{Ly}=\mathbf{b}$:
$$\begin{pmatrix} 1 & 0 & 0\\ 2/3 & 1 & 0\\ 1/3 & 2/19 & 1 \end{pmatrix} \begin{pmatrix} y_1\\y_2\\y_3 \end{pmatrix}
=\begin{pmatrix}2\\-1\\2 \end{pmatrix} $$

Which gives us $y_1 = 2$, $y_2 = -1 - 4/3 = -7/3$, $y_3 = 2 -2/3 +(2/19)(7/3)=30/19$.

And now we solve $\mathbf{Ux}=\mathbf{y}$:
$$\begin{pmatrix}3 & 5 & 2\\ 0 & -19/3 & -13/3\\ 0 & 0 & 15/19 \end{pmatrix} \begin{pmatrix} x_1\\x_2\\x_3 \end{pmatrix} =\begin{pmatrix}2\\-7/3\\30/19 \end{pmatrix} $$

Which gives us:
- $x_3 = 2$
- $x_2 = (-7/3 + 26/3)(-3/19) = -1$
- $x_1 = (2+5-4)/3=1$.
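
The two substitution steps can be sketched as below, reusing the $\mathbf{L}$ and $\mathbf{U}$ found for the example matrix. The function names `forward_substitution` and `back_substitution` are hypothetical helpers written for this illustration, and both assume non-zero diagonal elements.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve Ly = b for lower triangular L, working from the first row down."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve Ux = y for upper triangular U, working from the last row up."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1, 0, 0], [2/3, 1, 0], [1/3, 2/19, 1]])
U = np.array([[3, 5, 2], [0, -19/3, -13/3], [0, 0, 15/19]])

# The same factorization is reused for both right-hand sides
solutions = []
for b in ([11, -1, 2], [2, -1, 2]):
    y = forward_substitution(L, np.array(b, dtype=float))
    solutions.append(back_substitution(U, y))
    print("b =", b, "-> x =", solutions[-1])
```

Only the cheap substitution steps are repeated for each new $\mathbf{b}$, which is the point of keeping the factorization.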

## Partial Pivoting

In our Gaussian elimination scheme as outlined above, the diagonal element $m_{ii}$ is called the *pivot*. We multiplied the elements of a row by a constant inversely proportional to the pivot. Clearly, there will be an issue if the pivot is zero; in this case we would need to swap the row with another.

Less obviously, we will also have an issue if the pivot is small relative to other elements of the matrix. This would make the constant we multiply the row by very large, and we would end up summing numbers of very different magnitudes, leading to a loss of precision in our code (see e.g. [here](https://gitlab.com/eamonn.murray/IntroToScientificComputing/tree/master/systems#real-numbers-floating-point-format)).

To alleviate this, we use what is known as partial pivoting, where we swap rows such that we place the element with largest absolute value at the pivot.
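
The effect of a small pivot can be seen in a two-equation system. The sketch below is an illustrative example that is not taken from the text above: it solves $\epsilon x_1 + x_2 = 1$, $x_1 + x_2 = 2$ with $\epsilon = 10^{-20}$, first using $\epsilon$ as the pivot and then with the rows swapped. The helper `eliminate_2x2` is a hypothetical function written just for this demonstration.

```python
import numpy as np

eps = 1e-20
M = np.array([[eps, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0])

def eliminate_2x2(M, b):
    """Solve a 2x2 system by elimination, using M[0, 0] as the pivot."""
    factor = -M[1, 0] / M[0, 0]
    m22 = M[1, 1] + factor * M[0, 1]
    b2 = b[1] + factor * b[0]
    x2 = b2 / m22
    x1 = (b[0] - M[0, 1] * x2) / M[0, 0]
    return np.array([x1, x2])

# Tiny pivot: the multiplier is about -1e20 and swamps the other terms
x_no_pivot = eliminate_2x2(M, b)

# Swap the two rows first, so the larger element becomes the pivot
x_pivot = eliminate_2x2(M[::-1], b[::-1])

print("Without pivoting:", x_no_pivot)
print("With pivoting:   ", x_pivot)
```

The true solution is very close to $x_1 = x_2 = 1$. In double precision the unpivoted version loses $x_1$ entirely and returns $x_1 = 0$, while the pivoted version recovers both values.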

Say we want to calculate the LU decomposition of the following matrix $\mathbf{M_0}$:

$$\mathbf{M_0} = \begin{pmatrix} 1 & -1 & 2 \\ -4 & 4 & 1 \\ -2 & 4 & 3 \end{pmatrix}$$

First we look at the first column, where row 2 has the element with the largest absolute value, so we swap row 2 and row 1.

$$\mathbf{M_0}' = \begin{pmatrix} -4 & 4 & 1 \\ 1 & -1 & 2 \\ -2 & 4 & 3 \end{pmatrix} = \mathbf{P_{12}M_0}$$
where $\mathbf{P_{12}}$ is the permutation matrix that swaps row 1 and row 2:
$$\mathbf{P_{12}} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

After the first step of Gaussian elimination we have now
$$ \mathbf{M_1} = \begin{pmatrix} -4 & 4 & 1 \\ 0 & 0 & 9/4 \\ 0 & 2 & 5/2 \end{pmatrix} = \mathbf{N_0 M_0'}$$

where
$$ \mathbf{N_0} = \begin{pmatrix} 1 & 0 & 0 \\ 1/4 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix} $$

Now we move on to the second diagonal element. Of the remaining rows, row 3 has the element with the largest absolute value in the second column, and the value in the second row is zero in any case, so we would not be able to continue without a permutation. So we can multiply by the matrix $\mathbf{P_{23}}$, which swaps rows 2 and 3, and we'll have obtained an upper triangular matrix.

$$\mathbf{P_{23}} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$
and
$$\mathbf{P_{23}M_1} = \mathbf{U} = \begin{pmatrix} -4 & 4 & 1 \\ 0 & 2 & 5/2 \\ 0 & 0 & 9/4 \end{pmatrix}$$

So combining the various operations we have
$$ \mathbf{P_{23}N_0P_{12}M_0} = \mathbf{U}.$$

Permutation matrices that swap a single pair of rows, such as $\mathbf{P_{12}}$ and $\mathbf{P_{23}}$, have the useful property that they are their own inverse, so we can write
$$ \mathbf{P_{23}N_0P_{23}^{-1}P_{23}P_{12}M_0} = (\mathbf{P_{23}N_0P_{23}})\mathbf{P_{23}P_{12}M_0} = \mathbf{U}$$

Pre-multiplying by a permutation matrix exchanges rows, while post-multiplying exchanges columns, so
$$\mathbf{P_{23}N_0P_{23}} = \begin{pmatrix} 1 & 0 & 0 \\ -1/2 & 1 & 0 \\ 1/4 & 0 & 1 \end{pmatrix}$$

And we can write our factorization as
$$ \mathbf{P M_0} = \mathbf{P_{23}P_{12}M_0} = (\mathbf{P_{23}N_0P_{23}})^{-1}\mathbf{U} = \mathbf{LU}$$
where $\mathbf{P} = \mathbf{P_{23}P_{12}}$ and $\mathbf{L} = (\mathbf{P_{23}N_0P_{23}})^{-1}$.
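
The pivoted factorization can be verified numerically from the matrices in this example. The sketch below enters $\mathbf{M_0}$, $\mathbf{P_{12}}$, $\mathbf{P_{23}}$ and $\mathbf{N_0}$ as given, forms $\mathbf{U}$, the combined permutation $\mathbf{P_{23}P_{12}}$ and $\mathbf{L} = (\mathbf{P_{23}N_0P_{23}})^{-1}$, and checks that the permuted matrix equals $\mathbf{LU}$.

```python
import numpy as np

M0 = np.array([[1, -1, 2], [-4, 4, 1], [-2, 4, 3]], dtype=float)

# Row-swap matrices and the elimination matrix from the worked example
P12 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
P23 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
N0 = np.array([[1, 0, 0], [1/4, 1, 0], [-1/2, 0, 1]])

# Apply the operations in the order used in the text
U = P23 @ N0 @ P12 @ M0

# Combined permutation and the lower triangular factor
P = P23 @ P12
L = np.linalg.inv(P23 @ N0 @ P23)

print("P =\n", P)
print("L =\n", L)
print("U =\n", U)
print("P @ M0 equals L @ U:", np.allclose(P @ M0, L @ U))
```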

## A Python Implementation

That all likely seems pretty complicated. Let's try writing a Python/NumPy implementation of this, and hopefully you'll see it's not so bad:

In [None]:
def pivot_matrix(M):
    """Return the permuted matrix PM and pivot matrix P for M."""
    
    # Start from the identity matrix
    P = np.identity(len(M))
    # PM will be calculated at the same time, starting from M
    PM = M.copy()

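    # For each column, move the row with the largest absolute value on or
    # below the diagonal onto the diagonal, applying the same swap to P.
    # Note the pivots are chosen from M before any elimination is done;
    # this matches the worked example above, but not every matrix.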
    for i in range(len(M)):
        max_row = i + np.argmax(np.absolute(PM[i:, i]))
        if i != max_row:
            # Swap the rows
            P[i], P[max_row] = P[max_row], P[i].copy()
            PM[i], PM[max_row] = PM[max_row], PM[i].copy()

    return PM, P

def lu_decomp(M):
    """Find the LU decomposition of square matrix M into PM = LU.
    
    The function returns P, L and U.
    """

    # Initialize L as the identity and U with zeros
    L = np.identity(len(M))
    U = np.zeros((len(M), len(M)))
    
    # Create the pivot matrix P
    PM, P = pivot_matrix(M)

    # Perform the LU decomposition on the permuted matrix
    for j in range(len(M)):

        # LaTeX: u_{ij} = m_{ij} - \sum_{k=1}^{i-1} u_{kj} l_{ik}
        for i in range(j+1):
            s1 = sum(U[k, j] * L[i, k] for k in range(i))
            U[i, j] = PM[i, j] - s1

        # LaTeX: l_{ij} = \frac{1}{u_{jj}} (m_{ij} - \sum_{k=1}^{j-1} u_{kj} l_{ik} )
        for i in range(j, len(M)):
            s2 = sum(U[k, j] * L[i, k] for k in range(j))
            L[i, j] = (PM[i, j] - s2) / U[j, j]

    return (P, L, U)

In [None]:
# This is the matrix used in the pivot example above
M1 = np.array([[1, -1, 2], [-4, 4, 1], [-2, 4, 3]])

P1, L1, U1 = lu_decomp(M1)
print("P =", P1, "\n\nL =", L1, "\n\nU =", U1)

In [None]:
%timeit lu_decomp(M1)

### Doing this with SciPy

The [linalg](https://docs.scipy.org/doc/scipy/reference/linalg.html) module in SciPy has many useful linear algebra routines. These call Fortran LAPACK library routines and are significantly faster than a pure Python implementation.

To perform an LU factorization, you can use the [`lu_factor`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lu_factor.html) function.

The output of this can then be passed to the [`lu_solve`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lu_solve.html) function to use this factorization with a solution vector.

Note there is also the [`lu`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lu.html) function that returns the factorization in a more usable format, if you want to do something with the factorization yourself rather than simply use it with the `lu_solve` function.
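
The compact output of `lu_factor` can be unpacked by hand. As described in the SciPy documentation, the returned array holds $\mathbf{U}$ in its upper triangle and the below-diagonal part of $\mathbf{L}$ in its lower triangle (the unit diagonal of $\mathbf{L}$ is not stored), while the pivot array records that row `i` was interchanged with row `piv[i]`. The sketch below rebuilds the factors and the row ordering for the pivoting example; the variable names are chosen for this illustration.

```python
import numpy as np
import scipy.linalg

M = np.array([[1, -1, 2], [-4, 4, 1], [-2, 4, 3]], dtype=float)
lu, piv = scipy.linalg.lu_factor(M)

n = len(M)
# L has an implicit unit diagonal; U includes the stored diagonal
L = np.tril(lu, k=-1) + np.identity(n)
U = np.triu(lu)

# Replay the recorded row interchanges to get the final row ordering
perm = np.arange(n)
for i, p in enumerate(piv):
    perm[i], perm[p] = perm[p], perm[i]

print("Row order:", perm)
print("Permuted M equals L @ U:", np.allclose(M[perm], L @ U))

# scipy.linalg.lu returns explicit matrices with M = P @ L @ U
P_full, L_full, U_full = scipy.linalg.lu(M)
print("M equals P @ L @ U:", np.allclose(P_full @ L_full @ U_full, M))
```

Note that `scipy.linalg.lu` uses the convention $\mathbf{M} = \mathbf{PLU}$, so its permutation matrix is the transpose of the $\mathbf{P}$ in $\mathbf{PM} = \mathbf{LU}$ used above.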

In [None]:
M1 = np.array([[1, -1, 2], [-4, 4, 1], [-2, 4, 3]])
LU1, P1 = scipy.linalg.lu_factor(M1)
print("LU =", LU1, "\n\n P indices = ", P1)

In [None]:
%timeit scipy.linalg.lu_factor(M1)

In [None]:
M1 = np.array([[3, 5, 2], [2, -3, -3], [1, 1, 1]])

LU1, P1 = scipy.linalg.lu_factor(M1)

b1 = np.array([11, -1, 2])
print(scipy.linalg.lu_solve((LU1, P1), b1))

b2 = np.array([2, -1, 2])
print(scipy.linalg.lu_solve((LU1, P1), b2))

There is also the [`scipy.linalg.solve`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve.html) function, and the [`numpy.linalg.solve`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html) function, which also use LU decomposition to solve a linear system. The SciPy version offers many more options that allow you to choose a more optimal implementation when solving for e.g. symmetric or Hermitian matrices.

These functions do not, however, return the factorized matrices. This makes them more suitable for cases where you want to solve for a single solution vector $\mathbf{b}$.

In [None]:
print(np.linalg.solve(M1,b1))
print(scipy.linalg.solve(M1,b1))
print()
print(np.linalg.solve(M1,b2))
print(scipy.linalg.solve(M1,b2))

In [None]:
%timeit np.linalg.solve(M1, b1)
%timeit scipy.linalg.solve(M1, b1)

For small matrices, the overhead associated with processing the additional options in the SciPy implementation may make it somewhat slower than the NumPy equivalent. By default, both routines use the same LAPACK library routine to solve the system, so will take roughly the same time for larger matrices. We can confirm this by testing with large random matrices (which hopefully have a solution).

In [None]:
dimrand = 2000
Mrand = np.random.rand(dimrand, dimrand)
brand = np.random.rand(dimrand)
%timeit np.linalg.solve(Mrand, brand)
%timeit scipy.linalg.solve(Mrand, brand)

## Scaling

If you examine our Python implementation above, you'll see we have a triple nested loop. This means we likely have $O(n^3)$ scaling. Let's use `%timeit -o` to generate output we can save and try to see this ourselves. This generates a "TimeitResult" object which has a few attributes available for examining the output. We'll save the average of each run (these are in seconds).

In [None]:
solve_timing = []
size = np.arange(10, 1000, 100)
for s in size:
    Mrand = np.random.rand(s, s)
    brand = np.random.rand(s)
    timer_out = %timeit -o scipy.linalg.solve(Mrand, brand)
    solve_timing.append(timer_out.average)

In [None]:
# Set a scaling so the final points match up
scale = solve_timing[-1] / size[-1]**3
# Plot our timing results
plt.plot(size, np.array(solve_timing), "bo")
# And plot the scaling factor times size^3
plt.plot(size, scale * size**3, 'r-')
plt.xlabel("Matrix Size")
plt.ylabel("Time for solution (s)")
plt.show()
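
The cubic scaling can also be estimated without timing anything, by counting operations. The sketch below counts the element updates made during row reduction of an $n\times n$ matrix and fits a straight line to the log-log data; the slope estimates the scaling exponent. The function `elimination_updates` is a hypothetical helper, and the count is a simplified model that ignores back substitution and the right-hand side.

```python
import numpy as np

def elimination_updates(n):
    """Count element updates in row reduction of an n x n matrix.

    For pivot row i, each of the n-1-i rows below it has its n-i
    entries from column i onwards updated.
    """
    return sum((n - 1 - i) * (n - i) for i in range(n))

sizes = [50, 100, 200, 400, 800]
counts = [elimination_updates(n) for n in sizes]

# The slope of log(count) against log(n) estimates the exponent
exponent, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
print("Estimated scaling exponent:", exponent)
```

The fitted exponent should come out very close to 3, consistent with the timing plot.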

## Banded Matrices

One class of problem that comes up in many other methods is the banded matrix. 
Banded matrices only have non-zero elements along a diagonal band, consisting of the main diagonal and zero or more diagonals on either side. 
Let's denote the number of non-zero lower and upper diagonals as $k_l$ and $k_u$ respectively.
- A matrix with $k_l = k_u = 0$ is a diagonal matrix.
- An $n\times n$ matrix with $k_l = 0$ and $k_u = n-1$ is an upper triangular matrix.
- A matrix with $k_l = k_u = 1$ is called a tridiagonal matrix. This is a matrix where the only non-zero elements are along the main diagonal, and along the diagonals above and below it such as the following.

$$ \begin{pmatrix} 
m_{11} & m_{12} & 0 & 0 & \cdots & 0 \\
m_{21} & m_{22} & m_{23} & 0 & \cdots & 0 \\
0 & m_{32} & m_{33} & m_{34} & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & \dots & m_{N-1, N-2} & m_{N-1, N-1} & m_{N-1, N} \\
0 & \cdots & \dots & 0 & m_{N, N-1} & m_{N, N}
\end{pmatrix} $$
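
To make the definitions of $k_l$ and $k_u$ concrete, the sketch below measures them for a few small matrices by finding how far the furthest non-zero element lies below and above the main diagonal. The function name `bandwidths` is a hypothetical helper written for this illustration.

```python
import numpy as np

def bandwidths(M):
    """Return (k_l, k_u) for a square matrix M."""
    rows, cols = np.nonzero(M)
    if len(rows) == 0:
        return 0, 0
    # rows - cols > 0 below the diagonal, cols - rows > 0 above it
    k_l = max(int(np.max(rows - cols)), 0)
    k_u = max(int(np.max(cols - rows)), 0)
    return k_l, k_u

n = 5
diagonal = np.diag(np.ones(n))
upper = np.triu(np.ones((n, n)))
tridiagonal = (np.diag(np.full(n, 2.0))
               + np.diag(np.full(n - 1, -1.0), 1)
               + np.diag(np.full(n - 1, -1.0), -1))

for name, mat in [("diagonal", diagonal),
                  ("upper triangular", upper),
                  ("tridiagonal", tridiagonal)]:
    print(name, "-> (k_l, k_u) =", bandwidths(mat))
```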

For example, any problem with a 1D system where we approximate some piece of it as only interacting with its left and right neighbours can usually be cast as the solution of a tridiagonal matrix problem. (And by extension, if we also included interactions with next-nearest neighbours it would be a banded matrix with $k_l = k_u = 2$.)

For these types of matrices there are better approaches we can use than doing e.g. a full LU factorisation using Gaussian elimination. If the matrix is diagonal, we can obtain the solutions directly. The usual method used for tridiagonal matrices is known as the Thomas algorithm, which is composed of a forward sweep, which converts the matrix to upper triangular form in a single pass, followed by a backward substitution sweep to produce the result.
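
A minimal sketch of the Thomas algorithm is given below, storing the matrix as three 1D arrays. The function name `thomas_solve` and its argument layout are assumptions made for this illustration. No pivoting is done, so it assumes the divisors in the forward sweep never vanish, which holds for example for diagonally dominant matrices.

```python
import numpy as np

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system.

    lower and upper hold the n-1 elements below and above the main
    diagonal, diag holds the n diagonal elements.
    """
    n = len(diag)
    c = np.zeros(n - 1)  # modified upper diagonal
    d = np.zeros(n)      # modified right-hand side

    # Forward sweep: eliminate the lower diagonal in a single pass
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom

    # Backward substitution sweep
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

n = 10
diag = np.full(n, 2.0)
off = np.full(n - 1, -1.0)
rhs = np.zeros(n)
rhs[4] = rhs[5] = -1.0

x_thomas = thomas_solve(off, diag, off, rhs)

# Compare against a dense solve of the same system
full = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
print("Matches dense solve:", np.allclose(x_thomas, np.linalg.solve(full, rhs)))
```

Both sweeps visit each row once, so the work grows linearly with the matrix size.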

When working with a banded matrix, it is wasteful to store that full matrix when most elements are 0. For example, if you're working with a tridiagonal matrix, it is better to create 3 arrays, 1 for the diagonal elements and 2 for the off-diagonal elements.

### Banded matrices in Python

The [SciPy Linear algebra submodule](https://docs.scipy.org/doc/scipy/reference/linalg.html) contains the function [`solve_banded`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve_banded.html) that can be used to efficiently find the solution of a banded matrix as described above. This is, as before, in fact an interface to a LAPACK (Fortran linear algebra library) function, so will be quite fast.

In [None]:
help(scipy.linalg.solve_banded)

As you can see from the help output, using this is a little more involved than in the previous cases where we could just pass our matrix and solution vector as arguments. This is because, as mentioned above, it's wasteful to store the full matrix if most elements are zero, so we need to ensure we're giving our matrix data in the correct format.

Let's go through an example and hopefully this will become clear.

Say we have a 10x10 tridiagonal matrix with $2$ along the diagonal ($m_{i,i} = 2$ for $0\le i\le 9$) and $-1$ along the diagonals directly above and below it ($m_{i, i+1} = m_{i+1, i} = - 1$ for $0 \le i \le 8$). And let's set the elements of the solution vector $b$ to be all zero, except $b_4=b_5=-1$. Now we need to generate a set of appropriate arguments for the `solve_banded` function:
- `(l, u)` will be `(1, 1)` since these correspond to our $k_l$ and $k_u$ above which are both 1 for a tridiagonal matrix.
- `ab` stores our tridiagonal matrix in a compact form. For our 10x10 tridiagonal matrix this will be a 3x10 array, as follows, where a $*$ indicates an entry that won't be used.

$$
\begin{pmatrix}
 * & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
 2 &  2 &  2 &  2 &  2 &  2 &  2 &  2 &  2 &  2 \\
-1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 &  * \end{pmatrix}
$$

- `b` stores our solution vector as before.
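
If a banded matrix already exists in full form, it can be packed into the `ab` layout using the rule given in the `solve_banded` documentation, `ab[u + i - j, j] == a[i, j]`. The sketch below shows this for a small tridiagonal matrix with different values above and below the diagonal, so the row ordering is visible. The helper `to_banded` is a hypothetical function written for this illustration, and the matrix values are arbitrary.

```python
import numpy as np
import scipy.linalg

def to_banded(M, l, u):
    """Pack the band of a full square matrix M into the `ab` layout."""
    n = M.shape[0]
    ab = np.zeros((l + u + 1, n))  # unused corner entries are left as 0
    for i in range(n):
        for j in range(max(0, i - l), min(n, i + u + 1)):
            ab[u + i - j, j] = M[i, j]
    return ab

n = 6
# 4 on the diagonal, 1 above it and -2 below it
M = (np.diag(np.full(n, 4.0))
     + np.diag(np.full(n - 1, 1.0), 1)
     + np.diag(np.full(n - 1, -2.0), -1))
b = np.arange(1.0, n + 1)

ab = to_banded(M, 1, 1)
print("ab =\n", ab)

x_banded = scipy.linalg.solve_banded((1, 1), ab, b)
x_full = np.linalg.solve(M, b)
print("Banded and full solutions agree:", np.allclose(x_banded, x_full))
```

The first row of `ab` holds the upper diagonal (shifted right by one), the middle row the main diagonal, and the last row the lower diagonal.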

Now let's write the code to solve this system.

In [None]:
m_packed = np.empty((3, 10)) # Initialize our 3x10 array
m_packed[0] = -1 # We can just set the whole row to -1.
m_packed[1] = 2 # Set row that stores the diagonals.
m_packed[2] = -1 # Again we can just set the whole row to -1.
# Let's output this to make sure it looks correct.
print("m_packed = \n", m_packed)

b = np.zeros(10) # And similarly for b, we'll initialize it to zeros.
# And now manually set the non-zero elements
b[4] = -1
b[5] = -1

print("Solution =", scipy.linalg.solve_banded((1,1), m_packed, b))

In [None]:
# We can compare this to the solution we get from the full matrix:

# We first create arrays for the diagonal and subdiagonals
m_diag = np.full(10, 2)
m_subdiag = np.full(9, -1)

# Now we'll use np.diag to convert these into full matrices and add them.
# This makes constructing this kind of matrix a little easier.
m_full = np.diag(m_diag, 0) + np.diag(m_subdiag, -1) + np.diag(m_subdiag, 1)
# Let's output this to make sure it looks correct.
print("m_full = \n", m_full)

# We can use b as it is already.
print("Solution =", np.linalg.solve(m_full, b))

We've already seen that the full solver scales as $O(n^3)$. What about this banded matrix solver?

Let's test the scaling for random tridiagonal matrices of increasing size in the same way we did earlier.

In [None]:
bsolve_timing = []
# Note we've scaled up our sizes by a factor of 10 compared to earlier
bsize = np.arange(100, 10001, 1000)
for s in bsize:
    Mrand = np.random.rand(3, s)
    brand = np.random.rand(s)
    timer_out = %timeit -o scipy.linalg.solve_banded((1, 1), Mrand, brand)
    bsolve_timing.append(timer_out.average)

In [None]:
plt.plot(bsize, np.array(bsolve_timing), "bo")
plt.xlabel("Matrix Size")
plt.ylabel("Time for solution (s)")
plt.show()

It's clear that a tridiagonal matrix can be solved significantly faster. The tridiagonal matrix with nominally 10,000x10,000 elements (29,998 non-zero elements) is solved almost three orders of magnitude faster than a 1,000x1,000 regular matrix. And the tridiagonal matrix solver scales linearly, while the full matrix solution goes like $n^3$.

If we were solving some matrix equation as part of a simulation of some system, we would be able to investigate significantly larger systems if we can use a banded matrix. In a system involving some set of interacting parts, this might involve truncating the interaction so that only neighbouring parts interact with each other.

In [None]:
# Timing for large matrix and large tridiagonal matrix with similar numbers of non-zero elements
Mrand = np.random.rand(1000, 1000)
brand = np.random.rand(1000)
%timeit scipy.linalg.solve(Mrand, brand)

Mrand = np.random.rand(3, 333334)
brand = np.random.rand(333334)
%timeit scipy.linalg.solve_banded((1, 1), Mrand, brand)