# Linear Algebra

A lot of the time, it's easier or more convenient to express the math for neural networks, and for most statistical models and algorithms, via linear algebra (sometimes called matrix algebra). There are also computational advantages: algorithms written in terms of matrix operations are usually much faster than the equivalent for loops, so there are real benefits to being familiar with linear algebra.

This section of the guide won't cover too much linear algebra - just the parts you need to know to understand neural networks and their estimation. Nonetheless, I'd always recommend further pursuit of linear algebra as it served me well during my PhD research on many occasions.

## Some definitions

A *matrix* is an array of numbers. Here are some:

$$ \boldsymbol{A} = \begin{bmatrix} 2 & 4 \\ 5 & 1 \end{bmatrix} \\[5pt]
\boldsymbol{B} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \\[5pt]
\boldsymbol{C} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} $$

A matrix has $n$ rows and $m$ columns. We write this as $n \times m$ and call this the *order* of the matrix. If $n=m$, we call the matrix a *square matrix*.

If either $n$ or $m$ is 1, the matrix is a vector. We can have row vectors:

$$ \boldsymbol{v} = \begin{bmatrix} 1 & 2 & 3\end{bmatrix} $$

And we can have column vectors:

$$ \boldsymbol{u} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} $$

A quick notational convention: vectors and matrices are usually written in bold. Matrices are usually upper case, while vectors are usually lower case. We don't have to mark row vectors with the $'$ symbol, but doing so helps make the orientation of the vector clear (more on this symbol below).
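
To make these definitions concrete, here is a minimal sketch of how the matrices and vectors above might be stored in NumPy. The choice of NumPy is my assumption; nothing in this section depends on it. The `.shape` attribute reports the order as `(rows, columns)`.

```python
import numpy as np

# The matrices from the definitions above
A = np.array([[2, 4],
              [5, 1]])
B = np.array([[1, 2, 3],
              [4, 5, 6]])
C = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# A row vector (1 x 3) and a column vector (3 x 1), stored as 2-D arrays
v = np.array([[1, 2, 3]])
u = np.array([[1], [2], [3]])

for name, arr in [("A", A), ("B", B), ("C", C), ("v", v), ("u", u)]:
    print(name, arr.shape)

# A matrix is square when it has as many rows as columns
is_square = A.shape[0] == A.shape[1]
print(is_square)  # True
```

Storing vectors as 2-D arrays keeps the row/column distinction explicit. NumPy also allows plain 1-D arrays, which have no orientation at all.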

## The Dot Product

Before we talk about matrices, it's useful for a moment to talk about the dot product of two vectors. Let's take our $\boldsymbol{u}$ and $\boldsymbol{v}$ vectors from above, and take their dot product:

$$ \boldsymbol{u} \cdot \boldsymbol{v} $$

```{note}
For now, we are ignoring whether these are row or column vectors. We'll be returning to the question of how to treat row vs column vectors later on.
```

The output of the dot product is given by:

$$ \boldsymbol{u} \cdot \boldsymbol{v} = 1 \times 1 + 2 \times 2 + 3 \times 3 = 14 $$

There is an important rule for the dot product of two vectors: they must both be of the same length. We can write a general formula for the dot product of two $n$-length vectors as:

$$ \boldsymbol{a} \cdot \boldsymbol{b} = \sum^n_{i=1} a_i \times b_i $$
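
As a quick check of this formula, here is a small Python sketch. The helper `dot` is a hypothetical name of my own that writes the sum out term by term; `np.dot` is NumPy's built-in equivalent.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([1, 2, 3])


def dot(a, b):
    """Dot product written as the explicit sum over i of a_i * b_i."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


print(dot(u, v))     # 14
print(np.dot(u, v))  # 14
```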

We'll now turn to matrices, where we'll quickly find that the dot product is very useful.

## A Motivating Example

So why is it useful to organise our data in matrices and vectors? Consider the following system of linear equations:

$$ 4\beta_1 + 8\beta_2 + 3\beta_3 = 47 $$
$$ 2\beta_1 + 3\beta_2 + 2\beta_3 = 19 $$
$$ 7\beta_1 + 6\beta_2 + 5\beta_3 = 57 $$

We could plausibly solve for $\beta_1$, $\beta_2$, and $\beta_3$ by repeatedly rearranging the equations and substituting them into one another. This is easy enough when we have two unknown values, still fine if we have three unknown values, but becomes increasingly time-consuming and inefficient if we have lots more.

So how do matrices and vectors help?

Well, we can start by organising all of the above into a matrix and a couple of vectors:

$$ \boldsymbol{X} = \begin{bmatrix}
    4 & 8 & 3 \\
    2 & 3 & 2 \\
    7 & 6 & 5
\end{bmatrix} $$

$$ \boldsymbol{y} = \begin{bmatrix}
    47 \\ 19 \\ 57
\end{bmatrix} $$

$$ \boldsymbol{\beta} = \begin{bmatrix}
    \beta_1 \\ \beta_2 \\ \beta_3
\end{bmatrix} $$

which then allows us to re-write our problem as:

$$ \boldsymbol{X}\boldsymbol{\beta} = \boldsymbol{y} $$

It would be nice if we could somehow manipulate this to solve for $\boldsymbol{\beta}$. Well, it turns out that we can. We can rewrite the above as:

$$ \boldsymbol{\beta} = \boldsymbol{X}^{-1}\boldsymbol{y} $$

But what do we need to do to get there? Before we can solve this, we need to introduce some ways of manipulating matrices.
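
As a preview of where this is heading, here is a minimal NumPy sketch of the same problem. It assumes NumPy is available and uses `np.linalg.solve`, which solves $\boldsymbol{X}\boldsymbol{\beta} = \boldsymbol{y}$ directly, alongside `np.linalg.inv`, which forms the inverse explicitly. Rather than quoting the answer, the sketch checks it by substituting it back into the system.

```python
import numpy as np

X = np.array([[4.0, 8.0, 3.0],
              [2.0, 3.0, 2.0],
              [7.0, 6.0, 5.0]])
y = np.array([47.0, 19.0, 57.0])

# Solve X @ beta = y without forming the inverse
beta = np.linalg.solve(X, y)

# The same answer via the explicit inverse, beta = X^{-1} y
beta_via_inverse = np.linalg.inv(X) @ y

print(beta)
print(np.allclose(X @ beta, y))             # True
print(np.allclose(beta, beta_via_inverse))  # True
```

In practice a solver is usually preferred over an explicit inverse for speed and numerical stability, but both express the same idea.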

## Matrix Multiplication

Let's go back to our $\boldsymbol{B}$ and $\boldsymbol{C}$ matrices from earlier:

$$ \boldsymbol{B} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \\[5pt]
\boldsymbol{C} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} $$

Note that $\boldsymbol{B}$ has order $2 \times 3$ and $\boldsymbol{C}$ has order $3 \times 2$. To multiply two matrices, the number of columns in the first matrix must match the number of rows in the second matrix. So in this case, $\boldsymbol{B}\boldsymbol{C}$ is a valid matrix multiplication, and so is $\boldsymbol{C}\boldsymbol{B}$.

But *this won't always be true*! Just because one matrix product exists does not mean the other one will.

So, say we want to compute $\boldsymbol{B}\boldsymbol{C}$. What do we actually do? Well, we use the dot products from earlier. A general rule for computing the $(i,j)$th element of the output matrix is that it is the dot product of the $i$th row of the first matrix and the $j$th column of the second matrix.
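
Written as a formula, this rule looks as follows. Here $\boldsymbol{P}$ and $\boldsymbol{Q}$ are placeholder names of my own for an $n \times m$ matrix and an $m \times l$ matrix, with elements $p_{ik}$ and $q_{kj}$:

$$ (\boldsymbol{P}\boldsymbol{Q})_{ij} = \sum^m_{k=1} p_{ik} \times q_{kj} $$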

So, for $\boldsymbol{B}\boldsymbol{C}$:

$$ \boldsymbol{B}\boldsymbol{C} =
    \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
    \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}
    =
    \begin{bmatrix} (1, 2, 3) \cdot (1, 3, 5) & (1, 2, 3) \cdot (2, 4, 6) \\
    (4, 5, 6) \cdot (1, 3, 5) & (4, 5, 6) \cdot (2, 4, 6) \end{bmatrix}
$$

which is solved as:

$$ \boldsymbol{B}\boldsymbol{C} = \begin{bmatrix} 22 & 28 \\ 49 & 64 \end{bmatrix} $$

One thing worth noting: if the first matrix is order $n \times m$, and the second matrix is order $m \times l$, then the output matrix will be order $n \times l$. So we can add as many rows to the first matrix as we want, and as many columns to the second matrix as we want, and the product will still be defined.
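
The row-by-column rule can be sketched in a few lines of Python. The function `matmul` below is a hypothetical helper written for illustration; in practice NumPy's `@` operator does the same job far more efficiently.

```python
import numpy as np

B = np.array([[1, 2, 3],
              [4, 5, 6]])
C = np.array([[1, 2],
              [3, 4],
              [5, 6]])


def matmul(P, Q):
    """Multiply P (n x m) by Q (m x l) one dot product at a time."""
    n, m = P.shape
    m2, l = Q.shape
    if m != m2:
        raise ValueError("columns of P must match rows of Q")
    out = np.zeros((n, l), dtype=np.result_type(P, Q))
    for i in range(n):
        for j in range(l):
            # (i, j)th element: i-th row of P dotted with j-th column of Q
            out[i, j] = np.dot(P[i, :], Q[:, j])
    return out


print(matmul(B, C))                          # [[22 28], [49 64]] as a 2 x 2 array
print(np.array_equal(matmul(B, C), B @ C))   # True
print((B @ C).shape)                         # (2, 2)
print((C @ B).shape)                         # (3, 3)
```

The last two lines show the order rule at work: a $2 \times 3$ matrix times a $3 \times 2$ matrix gives a $2 \times 2$ result, while the reverse gives a $3 \times 3$ result.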

```{note}
Unlike regular algebra, there is no guarantee that $\boldsymbol{A}\boldsymbol{B} = \boldsymbol{B}\boldsymbol{A}$ (or even that both exist).
```
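
Here is a small NumPy sketch of both cases, reusing $\boldsymbol{A}$ and $\boldsymbol{B}$ from the definitions and the product $\boldsymbol{B}\boldsymbol{C}$ from the worked example (called `P` here, a name of my own). It relies on NumPy raising a `ValueError` when the shapes of two matrices do not line up.

```python
import numpy as np

A = np.array([[2, 4],
              [5, 1]])
B = np.array([[1, 2, 3],
              [4, 5, 6]])
P = np.array([[22, 28],
              [49, 64]])  # the product BC from the worked example

# Both orders exist for two 2 x 2 matrices, but the results differ
print(A @ P)
print(P @ A)
print(np.array_equal(A @ P, P @ A))  # False

# AB exists (2 x 2 times 2 x 3), but BA does not (3 columns vs 2 rows)
print((A @ B).shape)  # (2, 3)
try:
    B @ A
    reverse_defined = True
except ValueError:
    reverse_defined = False
print(reverse_defined)  # False
```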

## Transpose of a matrix

Before continuing, it's worth taking a moment to talk about the *transpose* of a matrix. This is written as either $\boldsymbol{A}^T$ or as $\boldsymbol{A}'$, both of which are read as "the transpose of $\boldsymbol{A}$".

Transposing a matrix just means flipping it over the diagonal, so that its rows become its columns. So, for example, for our matrix $\boldsymbol{B}$:

$$ \boldsymbol{B} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} $$

The transpose is given by:

$$ \boldsymbol{B}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} $$
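
In NumPy (my choice of library for illustration), the transpose is available as the `.T` attribute. A quick sketch:

```python
import numpy as np

B = np.array([[1, 2, 3],
              [4, 5, 6]])

B_T = B.T
print(B_T)
print(B.shape, B_T.shape)        # (2, 3) (3, 2)

# Element (i, j) of B becomes element (j, i) of the transpose
print(B[0, 2] == B_T[2, 0])      # True

# Transposing twice gets us back to where we started
print(np.array_equal(B_T.T, B))  # True
```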

```{note}
Note that we can treat row vectors as matrices with one row, and column vectors as matrices with one column. Consider $\boldsymbol{a}$ and $\boldsymbol{b}$ as column vectors of equal length $n$.

Transposing a column vector produces a row vector, and vice versa.

Then $\boldsymbol{a}^T\boldsymbol{b}$ gives the dot product of the two vectors, and is sometimes called the *inner product*. It will *always* produce a scalar.

By contrast, $\boldsymbol{a}\boldsymbol{b}^T$ will produce a matrix of order $n \times n$, and is called the *outer product* of two vectors.
```
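
A short NumPy sketch of the inner and outer products, using two column vectors with the values 1, 2, 3 as an example of my own choosing. One caveat: when vectors are stored as 2-D arrays, NumPy returns the inner product as a $1 \times 1$ array rather than a bare scalar, so `.item()` is used to pull the number out.

```python
import numpy as np

# Two column vectors of length n = 3
a = np.array([[1], [2], [3]])
b = np.array([[1], [2], [3]])

inner = a.T @ b  # (1 x 3) times (3 x 1) gives 1 x 1
outer = a @ b.T  # (3 x 1) times (1 x 3) gives 3 x 3

print(inner.shape, inner.item())  # (1, 1) 14
print(outer.shape)                # (3, 3)
print(outer)
```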

## Identity Matrix

An *identity matrix* is a square matrix where all diagonal elements are 1, and all other elements are 0. For example, the $3 \times 3$ identity matrix is:

$$ \boldsymbol{I} = \begin{bmatrix}
    1 & 0 & 0 \\
    0 & 1 & 0 \\
    0 & 0 & 1
\end{bmatrix} $$

Sometimes this might be written as $\boldsymbol{I}_3$, but sometimes the size will be left implicit.

The identity matrix is a generalisation of the role that the number 1 has in scalar math. This is because, for any matrix $\boldsymbol{A}$ and an identity matrix $\boldsymbol{I}$ of compatible size,

$$ \boldsymbol{A}\boldsymbol{I} = \boldsymbol{A} $$
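
This is easy to check numerically. The sketch below assumes NumPy, where `np.eye(k)` builds a $k \times k$ identity matrix. Note that for a non-square matrix, the identity needs a different size depending on which side it multiplies from.

```python
import numpy as np

A = np.array([[2, 4],
              [5, 1]])
B = np.array([[1, 2, 3],
              [4, 5, 6]])

I2 = np.eye(2, dtype=int)
I3 = np.eye(3, dtype=int)

# Square case: multiplying by the identity on either side changes nothing
print(np.array_equal(A @ I2, A))  # True
print(np.array_equal(I2 @ A, A))  # True

# Non-square case: B is 2 x 3, so I3 goes on the right and I2 on the left
print(np.array_equal(B @ I3, B))  # True
print(np.array_equal(I2 @ B, B))  # True
```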

## The Inverse of a Matrix and Identity Matrices

The *inverse* is a generalisation of the concept of division from scalar numbers to matrices. Note that *only square matrices have inverses* (and not every square matrix has one). In general, the inverse of a square matrix is defined as follows:

$$ \boldsymbol{A}^{-1}\boldsymbol{A} = \boldsymbol{A}\boldsymbol{A}^{-1} = \boldsymbol{I} $$
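
Before working out how to find an inverse by hand, we can at least check this definition numerically. The sketch below assumes NumPy and lets `np.linalg.inv` compute the inverse of our matrix $\boldsymbol{A}$ for us. `np.allclose` is used rather than exact equality because the result is subject to floating-point rounding.

```python
import numpy as np

A = np.array([[2.0, 4.0],
              [5.0, 1.0]])

A_inv = np.linalg.inv(A)
print(A_inv)

# Multiplying in either order should recover the 2 x 2 identity matrix
print(np.allclose(A_inv @ A, np.eye(2)))  # True
print(np.allclose(A @ A_inv, np.eye(2)))  # True
```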

So how do we get $\boldsymbol{A}^{-1}$ from $\boldsymbol{A}$?


