# Linear algebra review

Review of some of the linear algebra required for machine learning.

For matrix and vector usage in python, we will import the `numpy` package.

In [1]:
# Basic matrix operators
import numpy as np

# Matrix inverse
from numpy.linalg import inv

## Definitions

### Matrix

A rectangular array of numbers, with dimensions written as `rows x columns`.

- "A 4 x 2 matrix" has 4 rows and 2 columns
- The set of all such matrices can also be written as $\mathbb{R}^{4\times2}$

> If a matrix is $m \times m$ it is called a "square matrix"

Denoting elements of a matrix:

$$
A = \begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9 \\
10 & 11 & 12
\end{bmatrix}
$$

Denoting the matrix in python:

In [2]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
])
A

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

Syntax for an item of the matrix:

$$
\begin{align}
A_{ij} &= \text{entry at $i$th row and $j$th column} \\
A_{42} &= 11
\end{align}
$$

Indexing in Python works the same way, except that indices start at 0, so $A_{42}$ becomes `A[3, 1]`:

In [3]:
A[3, 1]

11
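As a small sketch of how the `rows x columns` convention and the 1-based math notation map onto NumPy, the snippet below reads the dimensions from `A.shape` and wraps the index shift in a helper. The `entry` function is a hypothetical name introduced here for illustration only.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
])

# shape is (rows, columns), matching the "rows x columns" convention
rows, columns = A.shape


def entry(matrix, i, j):
    """Return the entry at 1-based row i and column j (hypothetical helper)."""
    return matrix[i - 1, j - 1]


print(rows, columns)   # 4 3
print(entry(A, 4, 2))  # 11, the same value as A[3, 1]
```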

### Vector

A matrix with a single column ($n \times 1$), described by its number of rows as `n`-dimensional.

- "A 4-dimensional vector" is 4 rows
- Also could be $\mathbb{R}^{4}$

Denoting the elements of a vector:

$$
y = \begin{bmatrix}
1 \\
2 \\
3
\end{bmatrix}
$$

Denoting the vector in python:

In [4]:
y = np.array([1, 2, 3])
y

array([1, 2, 3])

Syntax for an item in the vector:

$$
\begin{align}
y_{i} &= \text{entry at $i$th row} \\
y_{2} &= 2
\end{align}
$$

Items in python (again, same but 0-indexed):

In [5]:
y[1]

2
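One detail worth noting: `np.array([1, 2, 3])` is a 1-D array rather than a literal $3 \times 1$ column matrix. The sketch below shows one way to get an explicit column when the shape matters; whether you need it depends on the code consuming the vector.

```python
import numpy as np

y = np.array([1, 2, 3])   # 1-D array with 3 entries
y_col = y.reshape(-1, 1)  # explicit 3 x 1 column matrix

print(y.shape)      # (3,)
print(y_col.shape)  # (3, 1)
print(y_col[1, 0])  # 2, the same entry as y[1]
```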

## Notation

| Value | Meaning |
| :-: | :-- |
| $\begin{bmatrix}a & b\\c & d\end{bmatrix}$ | matrix - multiple columns |
| $\begin{bmatrix}a \\ b \\ c \\ d\end{bmatrix}$ | vector - an n x 1 matrix |

> Generally uppercase letters refer to matrices, whereas lowercase refers to numbers, scalars, and vectors.

## Matrix addition

$$
\begin{bmatrix}1 & 0 \\ 2 & 5 \\ 3 & 1\end{bmatrix} + \begin{bmatrix}4 & 0.5 \\ 2 & 5 \\ 0 & 1\end{bmatrix} = A
$$

$A$ is the result of adding each element of the first matrix to the element in the same position of the second:

$$
\begin{bmatrix}1 & 0 \\ 2 & 5 \\ 3 & 1\end{bmatrix} +
\begin{bmatrix}4 & 0.5 \\ 2 & 5 \\ 0 & 1\end{bmatrix} =
\begin{bmatrix}1+4 & 0+0.5 \\ 2+2 & 5+5 \\ 3+0 & 1+1\end{bmatrix} =
\begin{bmatrix}5 & 0.5 \\ 4 & 10 \\ 3 & 2\end{bmatrix}
$$

> You can only add matrices of the same dimensions (in the above case, we are adding two $3\times2$ matrices).
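The addition above can be reproduced in NumPy with the `+` operator, as in this minimal sketch. The variable names are chosen here for illustration. The second half shows that adding a $3\times2$ array to a $2\times3$ array raises a `ValueError`; note that NumPy broadcasting does accept some other shape combinations, so checking `.shape` first is a reasonable habit.

```python
import numpy as np

X = np.array([[1, 0], [2, 5], [3, 1]])
Y = np.array([[4, 0.5], [2, 5], [0, 1]])

total = X + Y  # element-wise sum of two 3 x 2 matrices
print(total)

# Shapes (3, 2) and (2, 3) cannot be added.
Z = np.array([[1, 2, 3], [4, 5, 6]])
try:
    X + Z
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)  # True
```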

## Matrix-scalar multiplication

A scalar is a single real number.

$$
3\times\begin{bmatrix}1 & 0 \\ 2 & 5 \\ 3 & 1\end{bmatrix} = A
$$

$A$ is the result of multiplying each matrix element by the scalar:

$$
3\times
\begin{bmatrix}1 & 0 \\ 2 & 5 \\ 3 & 1\end{bmatrix} =
\begin{bmatrix}1\times3 & 0\times3 \\ 2\times3 & 5\times3 \\ 3\times3 & 1\times3\end{bmatrix} =
\begin{bmatrix}3 & 0 \\ 6 & 15 \\ 9 & 3\end{bmatrix}
$$

Division by $n$ is the same as multiplying by $\frac{1}{n}$:

$$
\begin{bmatrix}4 & 0 \\ 6 & 3\end{bmatrix} / 4 =
\begin{bmatrix}4 & 0 \\ 6 & 3\end{bmatrix}\times\frac{1}{4} =
\begin{bmatrix}4\times\frac{1}{4} & 0\times\frac{1}{4} \\ 6\times\frac{1}{4} & 3\times\frac{1}{4}\end{bmatrix} =
\begin{bmatrix}1 & 0 \\ \frac{3}{2} & \frac{3}{4}\end{bmatrix}
$$
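In NumPy, both operations are written with the ordinary `*` and `/` operators. The sketch below reuses the matrices from the two examples above; the variable names are illustrative.

```python
import numpy as np

M = np.array([[1, 0], [2, 5], [3, 1]])
scaled = 3 * M  # every element multiplied by 3

N = np.array([[4, 0], [6, 3]])
divided = N / 4          # every element divided by 4
times_quarter = N * 0.25  # the same thing, written as a multiplication

print(scaled)
print(divided)
print(np.array_equal(divided, times_quarter))  # True
```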


## Matrix-vector multiplication

To do this, multiply each row of the matrix element by element with the vector, then add the products together. Each row of the matrix produces one entry of the result:

$$
\begin{bmatrix} \color{red}1 & \color{red}3 \\ \color{orange}4 & \color{orange}0 \\ \color{green}2 & \color{green}1\end{bmatrix}
\times
\begin{bmatrix}\color{purple}1 \\ \color{purple}5\end{bmatrix}
=
\begin{bmatrix}
\color{red}1\times\color{purple}1 + \color{red}3\times\color{purple}5 \\
\color{orange}4\times\color{purple}1 + \color{orange}0\times\color{purple}5 \\
\color{green}2\times\color{purple}1 + \color{green}1\times\color{purple}5
\end{bmatrix}
=
\begin{bmatrix} 16 \\ 4 \\ 7\end{bmatrix}
$$

The result of multiplying an $m \times n$ matrix with an $n \times 1$ matrix ($n$-dimensional vector) is an $m \times 1$ matrix ($m$-dimensional vector).

> The number of columns in the matrix must match the number of rows in the vector ($n$).
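The worked example above can be checked with the `@` operator. As a sketch, the snippet also spells out the row-by-row rule by hand so the two results can be compared; the variable names are illustrative.

```python
import numpy as np

M = np.array([[1, 3], [4, 0], [2, 1]])  # 3 x 2 matrix
v = np.array([1, 5])                    # 2-dimensional vector

product = M @ v  # entries 16, 4, 7

# The same rule by hand: multiply each row by v element-wise, then sum.
manual = np.array([np.sum(row * v) for row in M])

print(product)
print(np.array_equal(product, manual))  # True
```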

### Application

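Matrix-vector multiplication is a convenient way to apply a hypothesis, such as the [linear hypothesis](./terms/linear_hypothesis.ipynb), to a set of houses at once. Each row of the data matrix is $[1, x]$, so the leading $1$ multiplies the intercept and $x$ multiplies the slope.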

House sizes:
- 2104
- 1416
- 1534
- 852

$$
h_\theta(x) = -40 + 0.25x
$$

$$
\begin{bmatrix}1 & 2104 \\ 1 & 1416 \\ 1 & 1534 \\ 1 & 852\end{bmatrix}
\times
\begin{bmatrix}-40 \\ 0.25\end{bmatrix}
=
\begin{bmatrix}486 \\ 314 \\ 343.5 \\ 173\end{bmatrix}
$$

- $486 = h_\theta(2104)$
- $314 = h_\theta(1416)$
- $343.5 = h_\theta(1534)$
- $173 = h_\theta(852)$

Calculating this way is typically much faster in NumPy than running a Python for loop over each data point, because the arithmetic runs in optimized compiled code (this matters most for larger [training sets](./terms/training_set.ipynb)).

In [6]:
data_matrix = np.array([
    [1, 2104],
    [1, 1416],
    [1, 1534],
    [1, 852],
])

parameters = np.array([
    [-40],
    [0.25],
])

data_matrix @ parameters

array([[486. ],
       [314. ],
       [343.5],
       [173. ]])
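To make the comparison with a for loop concrete, here is a sketch that computes the same predictions both ways and confirms they agree. The function name `h` and the use of `np.column_stack` to build the data matrix are choices made here for illustration; no timing is measured.

```python
import numpy as np

sizes = np.array([2104, 1416, 1534, 852])


def h(x):
    """The hypothesis h(x) = -40 + 0.25x from the example above."""
    return -40 + 0.25 * x


# Loop version: one prediction per house.
loop_predictions = np.array([h(x) for x in sizes])

# Matrix version: prepend a column of 1s so the intercept is included.
data_matrix = np.column_stack([np.ones(len(sizes)), sizes])
parameters = np.array([-40, 0.25])
matrix_predictions = data_matrix @ parameters

print(np.allclose(loop_predictions, matrix_predictions))  # True
```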

## Matrix-matrix multiplication

Take two matrices:

$$
A = \begin{bmatrix}1 & 3 & 2 \\ 4 & 0 & 1\end{bmatrix},
B = \begin{bmatrix}1 & 3 \\ 0 & 1 \\ 5 & 2\end{bmatrix}
$$

Multiply them together:

$$
\begin{bmatrix}1 & 3 & 2 \\ 4 & 0 & 1\end{bmatrix}
\times
\begin{bmatrix}\color{green}1 & \color{red}3 \\ \color{green}0 & \color{red}1 \\ \color{green}5 & \color{red}2\end{bmatrix}
$$

First, multiply $A$ by the first column of $B$:

$$
\begin{bmatrix}1 & 3 & 2 \\ 4 & 0 & 1\end{bmatrix}
\times
\color{green}{\begin{bmatrix}1 \\ 0 \\ 5\end{bmatrix}}
=
\color{blue}{\begin{bmatrix}11 \\ 9\end{bmatrix}}
$$

Second, multiply $A$ by the second column of $B$:

$$
\begin{bmatrix}1 & 3 & 2 \\ 4 & 0 & 1\end{bmatrix}
\times
\color{red}{\begin{bmatrix}3 \\ 1 \\ 2\end{bmatrix}}
=
\color{orange}{\begin{bmatrix}10 \\ 14\end{bmatrix}}
$$

Finally, join the two resulting vectors into a matrix:

$$
C = \begin{bmatrix} \color{blue}{11} & \color{orange}{10} \\ \color{blue}{9} & \color{orange}{14} \end{bmatrix}
$$

The result of multiplying an $m \times n$ matrix with an $n \times o$ matrix is an $m \times o$ matrix.

> The number of columns in the first matrix must match the number of rows in the second ($n$).

The $i^{th}$ column of the matrix $C$ is obtained by multiplying $A$ with the $i^{th}$ column of $B$ (for $i = 1, 2, \dots, o$).
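The column-by-column procedure can be sketched directly in NumPy and compared against `A @ B`. This is only meant to mirror the steps above; in practice `A @ B` alone is what you would write.

```python
import numpy as np

A = np.array([[1, 3, 2], [4, 0, 1]])    # 2 x 3
B = np.array([[1, 3], [0, 1], [5, 2]])  # 3 x 2

# Multiply A by each column of B, then join the resulting vectors as columns.
columns = [A @ B[:, i] for i in range(B.shape[1])]
C = np.column_stack(columns)

print(C)                         # rows [11, 10] and [9, 14]
print(np.array_equal(C, A @ B))  # True
```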

### Application

Matrix-matrix multiplication is a convenient way to apply a group of competing hypotheses, like the [linear hypothesis](./terms/linear_hypothesis.ipynb), to a set of houses:

House sizes:
- 2104
- 1416
- 1534
- 852

$$
\begin{align}
h_\theta(x) &= -40 + 0.25x \\
h_\theta(x) &= 200 + 0.1x \\
h_\theta(x) &= -150 + 0.4x
\end{align}
$$

$$
\begin{bmatrix}1 & 2104 \\ 1 & 1416 \\ 1 & 1534 \\ 1 & 852\end{bmatrix}
\times
\begin{bmatrix}-40 & 200 & -150 \\ 0.25 & 0.1 & 0.4\end{bmatrix}
=
\begin{bmatrix}486 & 410.4 & 691.6 \\ 314 & 341.6 & 416.4 \\ 343.5 & 353.4 & 463.6 \\ 173 & 285.2 & 190.8\end{bmatrix}
$$

- first column = predicted prices with first hypothesis
- second column = predicted prices with second hypothesis
- third column = predicted prices with third hypothesis

Computing all three hypotheses in one matrix product is again typically much faster in NumPy than nested Python for loops over data points and hypotheses (especially for larger [training sets](./terms/training_set.ipynb)).

In [7]:
data_matrix = np.array([
    [1, 2104],
    [1, 1416],
    [1, 1534],
    [1, 852],
])

parameters = np.array([
    [-40, 200, -150],
    [0.25, 0.1, 0.4],
])

data_matrix @ parameters

array([[486. , 410.4, 691.6],
       [314. , 341.6, 416.4],
       [343.5, 353.4, 463.6],
       [173. , 285.2, 190.8]])

## Matrix multiplication properties

### Matrix multiplication _is not_ commutative

Multiplying scalars, or a matrix by a scalar, is "commutative": the order of the operands does not matter. This is **not** true for matrix multiplication in general. Even the dimensions of the result can depend on the order, and one of the two products may not be defined at all.

$$
A \times B \neq B \times A
$$
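A quick numerical check illustrates both points. The matrices below are arbitrary examples chosen for this sketch.

```python
import numpy as np

# Square case: both products exist but differ.
A = np.array([[1, 2], [4, 5]])
B = np.array([[1, 1], [0, 2]])
AB = A @ B  # rows [1, 5] and [4, 14]
BA = B @ A  # rows [5, 7] and [8, 10]
print(np.array_equal(AB, BA))  # False

# Non-square case: even the shapes of the two products differ.
P = np.ones((2, 3))
Q = np.ones((3, 2))
print((P @ Q).shape)  # (2, 2)
print((Q @ P).shape)  # (3, 3)
```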

### Matrix multiplication _is_ associative

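When multiplying three or more scalars, you can group the multiplications in any way without changing the result. The same **is** true for matrices, as long as the left-to-right order of the matrices is kept.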

$$
A \times B \times C = A \times (B \times C) = (A \times B) \times C
$$
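The grouping rule can be checked on a small example. The matrices are arbitrary integer examples picked for this sketch; with floating-point entries the two groupings can differ by rounding error, so `np.allclose` would be the safer comparison there.

```python
import numpy as np

A = np.array([[1, 2], [4, 5]])
B = np.array([[1, 1], [0, 2]])
C = np.array([[2, 0], [1, 3]])

left_first = (A @ B) @ C   # group the first two matrices
right_first = A @ (B @ C)  # group the last two matrices

print(left_first)                               # rows [7, 15] and [22, 42]
print(np.array_equal(left_first, right_first))  # True
```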

### Identity matrix

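$1$ is the multiplicative "identity": for any number $z$, $1 \times z = z \times 1 = z$. The matrix equivalent is the identity matrix $I$, which has 1s on the diagonal and 0s everywhere else: for any matrix $A$, $A \times I = I \times A = A$.

> Note that when $A$ is not square, $I$ has a different size in each case: for an $m \times n$ matrix $A$, it must be $n \times n$ in $A \times I$ and $m \times m$ in $I \times A$.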

$$
2\times2: \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}
$$

$$
3\times3: \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{bmatrix}
$$

In [8]:
A = np.array([
    [1, 2],
    [4, 5]
])

B = np.array([
    [1, 1],
    [0, 2],
])

I = np.identity(2)

In [9]:
I @ A

array([[1., 2.],
       [4., 5.]])

In [10]:
A @ I

array([[1., 2.],
       [4., 5.]])
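The cells above use a square matrix, so a single $2 \times 2$ identity works on both sides. As a sketch of the non-square case, the $2 \times 3$ matrix below (an arbitrary example) needs a $2 \times 2$ identity on the left and a $3 \times 3$ identity on the right.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # 2 x 3, not square

left = np.identity(2) @ A   # I must be 2 x 2 to match the rows of A
right = A @ np.identity(3)  # I must be 3 x 3 to match the columns of A

print(np.array_equal(left, A))   # True
print(np.array_equal(right, A))  # True
```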

## Matrix inverse

The inverse of a scalar $s$ is a value $i$ such that $s \times i = 1$, the identity (for example, $3 \times 3^{-1} = 1$). Not every scalar has one: $0$ has no inverse.

Only square ($m \times m$) matrices can have an inverse, and not all of them do: an all-$0$ matrix, for example, has none. Square matrices without an inverse are called **"singular"** or **"degenerate"** matrices.

If $A$ is an $m \times m$ matrix, and if it has an inverse:

$$
A \times A^{-1} = A^{-1} \times A = I
$$

For example:

$$
\begin{bmatrix}3 & 4 \\ 2 & 16\end{bmatrix} \times \begin{bmatrix}0.4 & -0.1 \\ -0.05 & 0.075\end{bmatrix} = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix} = I_{2 \times 2}
$$

In [11]:
A = np.array([
    [3, 4],
    [2, 16],
])
A

array([[ 3,  4],
       [ 2, 16]])

In [12]:
A_inv = inv(A)
A_inv

array([[ 0.4  , -0.1  ],
       [-0.05 ,  0.075]])

In [13]:
A @ A_inv

array([[1., 0.],
       [0., 1.]])
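For a singular matrix, `numpy.linalg.inv` raises `LinAlgError` instead of returning a result. The sketch below uses the all-zero matrix mentioned earlier; the `has_inverse` flag is just an illustrative name.

```python
import numpy as np
from numpy.linalg import LinAlgError, inv

singular = np.zeros((2, 2))  # all-zero matrix, which has no inverse

try:
    inv(singular)
    has_inverse = True
except LinAlgError:
    # NumPy signals a singular matrix by raising LinAlgError.
    has_inverse = False

print(has_inverse)  # False
```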

## Matrix transpose

$$
A = \begin{bmatrix}1 & 2 & 0 \\ 3 & 5 & 9\end{bmatrix} \quad A^T = \begin{bmatrix}1 & 3 \\ 2 & 5 \\ 0 & 9\end{bmatrix}
$$

To get the transpose, row 1 of $A$ becomes column 1 of $A^T$, row 2 becomes column 2, and so on (essentially, swap the row and column index of each item). The matrix goes from $m \times n$ to $n \times m$.

$$
A_{ij} = A^T_{ji}
$$

In [14]:
A = np.array([
    [1, 2, 0],
    [3, 5, 9],
])
A

array([[1, 2, 0],
       [3, 5, 9]])

In [15]:
np.transpose(A)

array([[1, 3],
       [2, 5],
       [0, 9]])
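NumPy arrays also expose the transpose as the `.T` attribute, which is equivalent to `np.transpose` for 2-D arrays. The sketch below uses it to check the rule $A_{ij} = A^T_{ji}$ entry by entry; the explicit loop is for illustration only.

```python
import numpy as np

A = np.array([
    [1, 2, 0],
    [3, 5, 9],
])
A_T = A.T  # shorthand for np.transpose(A)

# Check A[i, j] == A_T[j, i] for every entry.
all_match = all(
    A[i, j] == A_T[j, i]
    for i in range(A.shape[0])
    for j in range(A.shape[1])
)

print(A.shape, A_T.shape)  # (2, 3) (3, 2)
print(all_match)           # True
```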