# Introduction to Linear Algebra 

## Matrices

### Conventions of Representation 

In the last chapter, we introduced a convention of representing vectors as an ordered, vertical arrangement of numbers within square brackets.

$$\vec{a}=\begin{bmatrix} 3\\4\end{bmatrix}; \quad 
\vec{b}=\begin{bmatrix} 2\\-1\end{bmatrix}; \quad 
\vec{c}=\begin{bmatrix} -3\\-3\end{bmatrix}; \quad 
\vec{d}=\begin{bmatrix} -2\\3\end{bmatrix}; \quad
\vec{e}=\begin{bmatrix} 2\\3\\5\end{bmatrix}
$$

This is part of a broader convention of arranging numbers in an orderly, rectangular, table-like structure called a __matrix__. A variable containing a vector is represented with a small letter and an arrow above it. A variable containing a matrix is conventionally represented with a capital letter. For example, $M$ is a matrix containing 12 numbers arranged in a particular way.

$$M=\begin{bmatrix} 1 & 3 & -2 & 4\\2 & 3 & -1 & -1\\ -1 & 1 & 3 & 4\end{bmatrix}$$

As the table-like arrangement of numbers suggests, a matrix can be organised into rows and columns. This allows us to assess its size. The matrix $M$ has 3 rows and 4 columns, thus it is described as a $3\times4$ matrix. The convention is to always start with the number of rows. 

$$M_{3\times4}=\underset{\;\\\mathbf{3 \, rows \, \times \, 4 \, \text{columns}}}{\begin{bmatrix} 1 & 3 & -2 & 4\\2 & 3 & -1 & -1\\ -1 & 1 & 3 & 4\end{bmatrix}}$$

With this in mind, we can think of vectors as single column matrices. 
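As a minimal sketch, assuming the NumPy library (introduced properly later in this chapter) is available, the size convention can be checked in Python. The `shape` attribute of an array reports the number of rows first and the number of columns second; the variable names below are only illustrative.

```python
import numpy as np

# The 3x4 matrix M from the text
M = np.array([[ 1, 3, -2,  4],
              [ 2, 3, -1, -1],
              [-1, 1,  3,  4]])

# The vector a from the text, written as a single-column matrix
a = np.array([[3],
              [4]])

print(M.shape)  # (3, 4): 3 rows, 4 columns
print(a.shape)  # (2, 1): 2 rows, 1 column
```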

### Why a Matrix?

There is not a single precise meaning associated with matrices. They are first and foremost a mathematical structure for arranging numbers. We can use numbers to count but also to measure. We can use vectors to describe the direction and magnitude of a force but also the features of some object we are trying to encode computationally. Consequently, we can use this rectangular arrangement of numbers for many things. We can, for example, concisely represent a system of linear equations, or conveniently represent any linear transformation of space. These specific uses, however, do not exhaust what this structure may ultimately facilitate. It is rather that we have found a way of using an ordered table of numbers to accomplish something more efficiently or to describe something in a way which ends up illuminating it from a new perspective. When this happens, as it does in linear algebra, we tend to associate a specific meaning with mathematical objects such as matrices and forget that meaning is a matter of the perspective taken.

### Matrices in Linear Algebra

#### Linear Transformation

In linear algebra, matrices extend the idea of manipulating vectors in space, towards manipulating the space itself (including all the possible vectors in it). To explain this, we will use the idea of a __linear transformation__. We can think of a linear transformation as a function (with some properties), which can transform an input defined in terms of vectors into an output defined in terms of vectors.

<img src="img/linear-algebra18.png" alt="drawing" width="500"/>

 In linear algebra, space itself is defined in terms of vectors. We can, for example, make a linear combination of any two linearly independent 2-D vectors to reach any point in 2-D space. Hence a linear transformation can transform a whole space into a different one.

What are the properties of a linear transformation? The most obvious one is the linearity. It is an aspect that mathematics defines formally, in terms of operations, without much regard to its spatial aspect. This leads to a multitude of possible geometric interpretations. Let's describe one: <br>
Imagine a two-dimensional plane. The plane itself has no shape (you can think of it as an empty canvas), so to illustrate its two principal directions, we need some kind of helping device. One such device is the coordinate axes. Another such device is a grid, subdividing the space with two sets of lines, parallel to the coordinate axes. The axes lie in the centre of this grid. A linear transformation of space implies that the coordinate centre remains fixed at $(0,0)$, and the grid lines of the transformed space remain parallel to each other and equally spaced.

<img src="img/linear-algebra19.png" alt="drawing" width="800"/>
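For reference, the formal definition alluded to above can be sketched as follows. The symbols are chosen here only for illustration: $T$ stands for the transformation, $\vec{u}$ and $\vec{v}$ for arbitrary vectors, and $c$ for a scalar. A transformation is linear when it preserves vector addition and scalar multiplication:

$$T(\vec{u}+\vec{v}) = T(\vec{u})+T(\vec{v}); \quad T(c\,\vec{v}) = c\,T(\vec{v})$$

Setting $c=0$ in the second property gives $T(\vec{0})=\vec{0}$, which is the algebraic counterpart of the coordinate centre remaining fixed.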

#### Using Basis Vectors to Describe a Linear Transformation

If the linearity is guaranteed, an easier and much more convenient way to describe what a linear transformation accomplishes is to track how it transforms the basis vectors. In the case of the 2-dimensional space, we start with the basis vectors $\boldsymbol{\hat{\imath}}$ and $\boldsymbol{\hat{\jmath}}$. A linear transformation transforms the space such that $\boldsymbol{\hat{\imath}}$ and $\boldsymbol{\hat{\jmath}}$ (usually) end up at a new location in space in terms of the starting condition. We can call these transformed vectors $\boldsymbol{\hat{\imath}}^{\;new}$ and $\boldsymbol{\hat{\jmath}}^{\; new}$.

![SegmentLocal](img/linear-algebra20.gif "segment")

Numerically expressing this linear transformation is easier than one might think. We simply need to capture the coordinates of the transformed basis vectors $\boldsymbol{\hat{\imath}}^{\;new}$ and $\boldsymbol{\hat{\jmath}}^{\; new}$ in terms of the original space and put them in a matrix! Since we have two vectors, our matrix will have 2 columns. The first vector corresponds to the first column (we count columns from left to right). Because our vectors are two-dimensional, our matrix will have 2 rows. The first spatial dimension starts at the top (we encode dimensions from the top towards the bottom).

<img src="img/linear-algebra21.png" alt="drawing" width="800"/>

To capture the transformation described in the image above we define a matrix $M$ as follows:

$$M=\begin{bmatrix} -1&-2\\3&-1\end{bmatrix}$$
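The construction can be sketched in Python, assuming NumPy (introduced later in this chapter). The names `i_new` and `j_new` are illustrative; `np.column_stack` places each given vector in its own column, which mirrors the rule that the first transformed basis vector becomes the first column.

```python
import numpy as np

# Coordinates of the transformed basis vectors in terms of the original space
i_new = np.array([-1, 3])
j_new = np.array([-2, -1])

# Put the vectors side by side as columns to obtain the transformation matrix
M = np.column_stack([i_new, j_new])
print(M)
# [[-1 -2]
#  [ 3 -1]]

# Reading a column back recovers the corresponding transformed basis vector
print(M[:, 0])  # [-1  3]
```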

### Some Common Linear Transformations

Some linear transformations are used very often, especially when applied in computer graphics. A good example is using matrices to rotate a space in a certain direction. 

#### Rotate 90 Degrees Clockwise

<img src="img/linear-algebra22.png" alt="drawing" width="800"/>

$$M_{rotate 90+}=\begin{bmatrix} 0&1\\-1&0\end{bmatrix}$$

#### Rotate 90 Degrees Counter-clockwise

<img src="img/linear-algebra23.png" alt="drawing" width="800"/>

$$M_{rotate 90-}=\begin{bmatrix} 0&-1\\1&0\end{bmatrix}$$

#### Shear Transformation

With this transformation, all the vector components in the direction of $\boldsymbol{\hat{\imath}}$ remain unchanged, but the ones in the $\boldsymbol{\hat{\jmath}}$ direction get pushed towards the $\boldsymbol{\hat{\imath}}$. 

<img src="img/linear-algebra24.png" alt="drawing" width="800"/>

$$M_{shear}=\begin{bmatrix} 1&1\\0&1\end{bmatrix}$$

#### Identity Matrix

A transformation matrix that does not transform the space at all is a matrix whose columns contain the basis vectors. This idea might not sound very useful, yet such a matrix is extremely important in mathematics. It is called an __identity matrix__, and it is usually symbolised with the capital $I$. An identity matrix plays a role similar to that of the number one in arithmetic: multiplying any number by one does not change the number.

<img src="img/linear-algebra25.png" alt="drawing" width="800"/>

$$I=\begin{bmatrix} 1&0\\0&1\end{bmatrix}$$

In the case of three dimensions, an identity matrix contains 3 vectors of 3 dimensions:

$$I_{3}=\begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{bmatrix}$$

Notice an important feature: in both cases, the diagonal elements are ones, and all the rest are zeros.
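A short sketch of the identity matrix in Python, assuming NumPy. The function `np.eye` builds an identity matrix of a given size, and `np.dot` performs the matrix-vector product that is explained later in this chapter; the vector `v` is an arbitrary example.

```python
import numpy as np

# Identity matrices for two and three dimensions
I2 = np.eye(2, dtype=int)
I3 = np.eye(3, dtype=int)
print(I3)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]]

# Applying the identity transformation leaves a vector unchanged
v = np.array([3, 4])
print(np.dot(I2, v))  # [3 4]
```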

### Elementary Matrix Operations

#### How a Linear Transformation Affects an Arbitrary Vector

Let us return to the transformation matrix $M$ we created earlier:
$$M=\begin{bmatrix} -1&-2\\3&-1\end{bmatrix}$$

The matrix $M$ is sufficient to compute where any vector of the original space will land in the transformed space. The process is illustrated in the images below. We start in the space defined by the basis vectors $\boldsymbol{\hat{\imath}}$ and $\boldsymbol{\hat{\jmath}}$ (Fig. 1). The vector $\vec{p}$ is an arbitrary vector in this space whose coordinates are $-0.5$ and $-1.5$. We can describe $\vec{p}$ as a linear combination of the basis vectors: $\vec{p}=-0.5\boldsymbol{\hat{\imath}}-1.5\boldsymbol{\hat{\jmath}}$.

Now we transform the space. To define the transformation we change the direction and orientation of the basis vectors. We end up with new vectors $\boldsymbol{\hat{\imath}}^{\;new}$ and $\boldsymbol{\hat{\jmath}}^{\; new}$. To encode this transformation in a matrix $M$, we capture the new coordinates of these vectors: $\boldsymbol{\hat{\imath}}^{\;new}$ has coordinates $-1$ and $3$; $\boldsymbol{\hat{\jmath}}^{\;new}$ has coordinates $-2$ and $-1$ (Fig. 2).

<img src="img/linear-algebra26.png" alt="drawing" width="800"/>

To compute where the vector $\vec{p}$ lands in the transformed space, we multiply $\vec{p}$'s first coordinate $-0.5$ by the first column of the transformation matrix $M$, and $\vec{p}$'s second coordinate $-1.5$ by the second column, and add the two results. This gives us the vector $\vec{p}^{\;new}$ whose coordinates in terms of the old space are $3.5$ and $0$ (Fig. 3). Once we remove the old space and its basis vectors from the picture, we are left with the new basis vectors $\boldsymbol{\hat{\imath}}^{\;new}$, $\boldsymbol{\hat{\jmath}}^{\;new}$ and the vector $\vec{p}^{\;new}$ (Fig. 4). If we represent the vector $\vec{p}^{\;new}$ as a linear combination of $\boldsymbol{\hat{\imath}}^{\;new}$ and $\boldsymbol{\hat{\jmath}}^{\;new}$, we notice something very interesting. The linear combination is parametrised by the same scalars $-0.5$ and $-1.5$ as in the space we started with (Fig. 1). The coordinates of $\vec{p}^{\;new}$ are still $-0.5$ and $-1.5$, only in terms of the new basis vectors (in terms of the old basis vectors, its coordinates are $3.5$ and $0$). The consequence of the linearity of the transformation is that as we transform the space, the basis vectors of the space change, but the scalars parametrising every vector in this space remain constant.

<img src="img/linear-algebra27.png" alt="drawing" width="800"/>
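The claim that the scalars remain constant can be checked numerically. The sketch below assumes NumPy and uses `np.linalg.solve` to find the scalars `s` for which `s[0]` times the first column of $M$ plus `s[1]` times the second column equals $\vec{p}^{\;new}$; the variable names are illustrative.

```python
import numpy as np

# Columns of M are the transformed basis vectors
M = np.array([[-1, -2],
              [ 3, -1]])

# Coordinates of the transformed vector in terms of the old space
p_new = np.array([3.5, 0.0])

# Solve M s = p_new for the scalars of the linear combination
scalars = np.linalg.solve(M, p_new)
print(scalars)  # approximately -0.5 and -1.5, the original coordinates of p
```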

#### Multiplying a Vector by a Matrix

The calculation we performed to find the landing coordinates of $\vec{p}^{\;new}$ was, in fact, the same as __multiplying__ a vector $\vec{p}$ by a matrix $M$. To correctly perform the calculation, the transformation matrix needs to be on the left of the vector that it multiplies. What we did can be described algebraically as follows:

\begin{align}
\vec{p}^{\;new}&=M \times \vec{p} \\\\
\vec{p}^{\;new}&=\begin{bmatrix} -1&-2\\3&-1 \end{bmatrix}\times \begin{bmatrix} -0.5\\-1.5\end{bmatrix} \\\\
\vec{p}^{\;new}&=-0.5 \begin{bmatrix} -1\\3 \end{bmatrix} -1.5\begin{bmatrix} -2\\-1 \end{bmatrix} \\\\
\vec{p}^{\;new}&=\begin{bmatrix} 0.5\\-1.5 \end{bmatrix}+\begin{bmatrix} 3\\1.5 \end{bmatrix} \\\\
\vec{p}^{\;new}&=\begin{bmatrix} 3.5\\0 \end{bmatrix}
\end{align}
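The derivation above can be reproduced as a short sketch, assuming NumPy. It computes the product twice: once as a linear combination of the columns of $M$, as in the derivation, and once with NumPy's `np.dot`, so the two results can be compared.

```python
import numpy as np

M = np.array([[-1, -2],
              [ 3, -1]])
p = np.array([-0.5, -1.5])

# Scale each column of M by the matching coordinate of p and add the results
by_columns = p[0] * M[:, 0] + p[1] * M[:, 1]

# The same product computed by NumPy
by_dot = np.dot(M, p)

print(by_columns)                       # coordinates 3.5 and 0
print(np.allclose(by_columns, by_dot))  # True
```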

#### Matrix Multiplication

##### Composing Linear Transformations

A single matrix can be used to describe a transformation of space. A __product__ of two matrices can be used to compose two linear transformations into one! Let's say that we wanted to first rotate a vector by 90 degrees counter-clockwise and then perform a shear operation. We could start with a vector $\vec{q}$ and multiply it by the matrix $M_{rotate90-}$. This operation would yield a vector $\vec{q_{rotated}}$.


$$\vec{q_{rotated}}=M_{rotate90-} \times \vec{q}$$

Now we can multiply the vector $\vec{q_{rotated}}$ by the matrix $M_{shear}$ and get the final vector $\vec{q_{final}}$.

$$\vec{q_{final}}=M_{shear} \times \vec{q_{rotated}}$$

Matrix multiplication allows us to compute a new matrix $M_{rotate-shear}$ whose effect on the vector $\vec{q}$ is that of applying both the rotation and the shear transformation at the same time! The order of operations matters: applying the shear first and then rotating gives a different result from rotating first and then applying the shear. The operation that comes last should be on the left of the operation that precedes it.

\begin{align}
M_{rotate-shear}&=M_{shear}\times M_{rotate90-} \\\\
M_{rotate-shear}&=\begin{bmatrix} 1&1\\0&1\end{bmatrix} \times \begin{bmatrix} 0&-1\\1&0\end{bmatrix} \\\\
M_{rotate-shear}&=\begin{bmatrix} 1&-1\\1&0\end{bmatrix}
\end{align}

![SegmentLocal](img/linear-algebra28.gif "segment")
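The composition can be sketched in Python, assuming NumPy. The vector `q` is an arbitrary example chosen here for illustration. The sketch compares the two-step computation with the single composed matrix and also shows that reversing the order gives a different matrix.

```python
import numpy as np

M_rotate = np.array([[0, -1],
                     [1,  0]])  # 90 degrees counter-clockwise
M_shear = np.array([[1, 1],
                    [0, 1]])
q = np.array([2, 1])            # arbitrary example vector

# Two steps: rotate first, then shear
q_rotated = np.dot(M_rotate, q)
q_final = np.dot(M_shear, q_rotated)

# One step: the last operation goes on the left
M_rotate_shear = np.dot(M_shear, M_rotate)
print(M_rotate_shear)
# [[ 1 -1]
#  [ 1  0]]
print(np.array_equal(np.dot(M_rotate_shear, q), q_final))  # True

# Reversed order (shear first, then rotate) is a different transformation
M_shear_rotate = np.dot(M_rotate, M_shear)
print(np.array_equal(M_shear_rotate, M_rotate_shear))      # False
```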

##### Matrix Multiplication Rules

In the previous step, we skipped the process of actually computing the matrix product and directly showed the result. This is because matrix multiplication can be quite numerically cumbersome, and its computation can obscure the idea of what it stands for in linear algebra—composing linear transformations. Here we introduce the multiplication rules. Let's start with an example of a single matrix, $A$:

$$
A=\begin{bmatrix}
a_{11}&a_{12}&a_{13}&a_{14} \\
a_{21}&a_{22}&a_{23}&a_{24} \\
a_{31}&a_{32}&a_{33}&a_{34} \\
\end{bmatrix} \\
$$

$A$ is a $3\times4$ matrix containing 12 elements distributed in 3 rows and 4 columns. Each element $a_{ij}$ is indexed by two numbers. The first one $i$ defines the element's row and the second one $j$ the element's column. Thus the name $a_{23}$ indicates an element in the second row and the third column.

Let's introduce another matrix B:

$$
B=\begin{bmatrix}
b_{11}&b_{12} \\
b_{21}&b_{22} \\
b_{31}&b_{32} \\
b_{41}&b_{42} \\
\end{bmatrix} \\
$$

$B$ is a $4\times2$ matrix, containing 8 elements distributed in 4 rows and 2 columns. The element indexing rules are the same as in the case of $A$.

To compute a matrix product, matrices need to be compatible in size. Moreover, the compatibility then defines the size of the product:

__To compute the matrix product $A\times B$, the number of columns of matrix A needs to match the number of rows of matrix B. If this criterion is met, the product matrix will have the same number of rows as the matrix A and the same number of columns as the matrix B.__

Since matrix $A$ is a $\color{red}3\times \boldsymbol{4}$ matrix and B is a $\boldsymbol{4} \times \color{blue}2$ matrix, two matrices are compatible to compute the product $A \times B$, and the product matrix will be a $\color{red}3 \times \color{blue}2$ matrix.

\begin{align}
C_{\color{red}3\times \color{blue}2} &= A_{\color{red}3\times\boldsymbol{4}} \times B_{\boldsymbol{4}\times\color{blue}2} \\\\
C &= \begin{bmatrix}
c_{11} & c_{12} \\
c_{21} & c_{22} \\
c_{31} & c_{32}
\end{bmatrix}
\end{align}
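The size rule can be written down as a small sketch in plain Python. The function name `product_shape` is hypothetical and introduced here only for illustration; shapes are given as `(rows, columns)` pairs.

```python
def product_shape(shape_a, shape_b):
    """Return the shape of the product A x B, or raise if the sizes are incompatible."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    # The number of columns of A must match the number of rows of B
    if cols_a != rows_b:
        raise ValueError(
            f"cannot multiply: A has {cols_a} columns but B has {rows_b} rows"
        )
    # The product takes its row count from A and its column count from B
    return (rows_a, cols_b)


print(product_shape((3, 4), (4, 2)))  # (3, 2)
```

Calling `product_shape((4, 2), (3, 4))` raises a `ValueError`, because 2 columns do not match 3 rows.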

What is left is to compute the actual elements $c_{ij}$, and for that, we will need the dot product operation. The index $_{\color{red}2 \color{blue}1}$ tells us that to compute the element $c_{\color{red}2 \color{blue}1}$ we need to dot multiply the <span style="color:red">__second__ </span>row of the matrix $A$ by the <span style="color:blue">__first__</span> column of the matrix $B$. To compute the element $c_{\color{red}3 \color{blue}2}$ we need to dot multiply the <span style="color:red">__third__ </span>row of the matrix $A$ by the <span style="color:blue">__second__</span> column of the matrix $B$. In other words, to compute $c_{\color{red}2 \color{blue}1}$ we need to multiply:

<img src="img/linear-algebra29.png" alt="drawing" width="500"/>

Notice that any column or row of a matrix is one-dimensional, so we can think of it as a vector and represent it accordingly:

\begin{align}
c_{21} &= \begin{bmatrix} a_{21} \\ a_{22} \\ a_{23} \\ a_{24}\end{bmatrix} \cdot
\begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \\ b_{41}\end{bmatrix} \\[2ex]
c_{21} &=a_{21}b_{11}+a_{22}b_{21}+a_{23}b_{31}+a_{24}b_{41}
\end{align}

To compute $c_{\color{red}3 \color{blue}2}$ we need to multiply:

<img src="img/linear-algebra30.png" alt="drawing" width="500"/>

And in terms of a dot product of vectors:

\begin{align}
c_{32} &= \begin{bmatrix} a_{31} \\ a_{32} \\ a_{33} \\ a_{34}\end{bmatrix} \cdot
\begin{bmatrix} b_{12} \\ b_{22} \\ b_{32} \\ b_{42}\end{bmatrix} \\[2ex]
c_{32} &=a_{31}b_{12}+a_{32}b_{22}+a_{33}b_{32}+a_{34}b_{42}
\end{align}

The animation below shows how to compute the rest of the elements:

![SegmentLocal](img/linear-algebra31.gif "segment")

Here it should be clear why matrix multiplication requires that $A$ and $B$ are compatible in size. If matrix $A$ had 3 columns instead of 4, and the matrix $B$ remained the same, we would need to dot multiply two vectors of different lengths. That is simply not possible, as the dot product requires two vectors with the same number of elements.

To compute all the elements of matrix $C$, we need to compute 6 dot products of two 4-dimensional vectors. This is why matrix multiplication is better left to computers.

$$
C=
\begin{bmatrix} a_{11}b_{11}+a_{12}b_{21}+a_{13}b_{31}+a_{14}b_{41} &  a_{11}b_{12}+a_{12}b_{22}+a_{13}b_{32}+a_{14}b_{42} \\
a_{21}b_{11}+a_{22}b_{21}+a_{23}b_{31}+a_{24}b_{41} &  a_{21}b_{12}+a_{22}b_{22}+a_{23}b_{32}+a_{24}b_{42} \\
a_{31}b_{11}+a_{32}b_{21}+a_{33}b_{31}+a_{34}b_{41} &  a_{31}b_{12}+a_{32}b_{22}+a_{33}b_{32}+a_{34}b_{42} \\
\end{bmatrix}
$$
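The row-by-column rule can be sketched as two nested loops, assuming NumPy. The helper `matmul_by_dot_products` is hypothetical and written only to mirror the rule above; the matrices are filled with placeholder values. In practice one would call NumPy's own routine, which the sketch uses as a reference.

```python
import numpy as np


def matmul_by_dot_products(A, B):
    """Compute A x B element by element: c_ij = (row i of A) . (column j of B)."""
    n_rows, n_inner = A.shape
    n_inner_b, n_cols = B.shape
    if n_inner != n_inner_b:
        raise ValueError("columns of A must match rows of B")
    C = np.zeros((n_rows, n_cols), dtype=np.result_type(A, B))
    for i in range(n_rows):
        for j in range(n_cols):
            # Dot product of the i-th row of A with the j-th column of B
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C


A = np.arange(1, 13).reshape(3, 4)  # placeholder 3x4 matrix with values 1..12
B = np.arange(1, 9).reshape(4, 2)   # placeholder 4x2 matrix with values 1..8

C = matmul_by_dot_products(A, B)
print(C.shape)                          # (3, 2)
print(np.array_equal(C, np.dot(A, B)))  # True
```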

#### Matrix Addition and Subtraction

Matrix addition and subtraction do not have a straightforward intuitive meaning. They can be performed, but we need to think of them purely in algebraic terms. Similarly to vectors, to be able to add or subtract two matrices, they need to be of the same size. If the matrix $A$ is a $3\times4$ matrix, then the matrix $B$ needs to be a $3\times4$ matrix as well if we wish to compute $A+B$ and $A-B$. If the matrices are of the same size, both addition and subtraction  are trivial. We simply need to add or subtract individual elements of both matrices that have matching indices. 

$$A_{2\times3}=\begin{bmatrix}
a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23}\\
\end{bmatrix}; \quad
B_{2\times3} = \begin{bmatrix}
b_{11} & b_{12} & b_{13}\\
b_{21} & b_{22} & b_{23}\\
\end{bmatrix} \\[2em]
A+B = \begin{bmatrix}
a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13}\\
a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23}\\
\end{bmatrix} \\[2em]
A-B = \begin{bmatrix}
a_{11}-b_{11} & a_{12}-b_{12} & a_{13}-b_{13}\\
a_{21}-b_{21} & a_{22}-b_{22} & a_{23}-b_{23}\\
\end{bmatrix}
$$

#### Multiplying a Matrix by a Scalar

Another common operation is to multiply a matrix by a scalar. This is also very straightforward. Every element of the new matrix is the product of the corresponding element of the old matrix and the given scalar.

$$A_{2\times3}=\begin{bmatrix}
a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23}\\
\end{bmatrix}; \\[2em]
pA = \begin{bmatrix}
p\,a_{11} & p\,a_{12} & p\,a_{13}\\
p\,a_{21} & p\,a_{22} & p\,a_{23}\\
\end{bmatrix}
$$

### Elementary Matrix Operations in Python

To represent matrices in the Python programming language, we will use the NumPy library. To import it, we write the following line of code:

In [6]:
import numpy as np

Let us define matrices $A$, $B$, $C$ and $D$ as follows:

$$
A=\begin{bmatrix}
4 & 2 & -1 & 5 \\
2 & 1 & 3 & -3 \\
-2 & -3 & 1 & 4 \\
\end{bmatrix}; \quad
B=\begin{bmatrix}
-1 & 1 \\
3 & -3 \\
4 & -2 \\
5 & 5 \\
\end{bmatrix}; \quad
C=\begin{bmatrix}
-1 & 3  \\
3 & -1 \\
\end{bmatrix}; \quad
D=\begin{bmatrix}
2 & 1  \\
-1 & 2 \\
\end{bmatrix}
$$

Now we can use NumPy's data structure called array to represent matrices:

In [10]:
A = np.array([[4,   2, -1,  5],
              [2,   1,  3, -3],
              [-2, -3,  1,  4]])

In [14]:
B = np.array([[-1,  1],
              [ 3, -3],
              [ 4, -2],
              [ 5,  5]])

In [15]:
C = np.array([[-1,  3],
              [ 3, -1]])

In [20]:
D = np.array([[ 2, 1],
              [-1, 2]])

To make sure we encoded them properly, we can use the command print:

In [17]:
print (A)

[[ 4  2 -1  5]
 [ 2  1  3 -3]
 [-2 -3  1  4]]


In [18]:
print (B)

[[-1  1]
 [ 3 -3]
 [ 4 -2]
 [ 5  5]]


In [19]:
print (C)

[[-1  3]
 [ 3 -1]]


In [21]:
print (D)

[[ 2  1]
 [-1  2]]


#### Matrix Addition and Subtraction in Python

We can only add two matrices if they are of the same size. Thus attempting to compute $A+B$ will result in an error:

In [22]:
A+B

ValueError: operands could not be broadcast together with shapes (3,4) (4,2) 

What we can add are the matrices $C$ and $D$:

In [24]:
print (C+D)

[[1 4]
 [2 1]]


We can also add any matrix to itself:

In [25]:
print (A+A)

[[ 8  4 -2 10]
 [ 4  2  6 -6]
 [-4 -6  2  8]]


#### Multiplying a Matrix by a Scalar in Python

Multiplying a matrix by a scalar in Python is quite straightforward. To multiply the matrix $A$ by 5, we write:

In [37]:
print (5*A)

[[ 20  10  -5  25]
 [ 10   5  15 -15]
 [-10 -15   5  20]]


In [38]:
print (-5*A)

[[-20 -10   5 -25]
 [-10  -5 -15  15]
 [ 10  15  -5 -20]]


#### Matrix Multiplication in Python

We can only multiply two matrices if their sizes are compatible. We can, for example, compute $A\times B$. To apply matrix multiplication we will use the NumPy command `np.dot`:

In [29]:
print (np.dot(A,B))

[[ 23  25]
 [ -2 -22]
 [ 17  25]]


Matrix multiplication is not commutative. Here the product $B\times A$ is not even defined, because $B$ has 2 columns while $A$ has 3 rows:

In [30]:
print (np.dot(B,A))

ValueError: shapes (4,2) and (3,4) not aligned: 2 (dim 1) != 3 (dim 0)

However, we can find the product $B\times C$ or $B\times D$:

In [31]:
print (np.dot(B,C))

[[  4  -4]
 [-12  12]
 [-10  14]
 [ 10  10]]


In [32]:
print (np.dot(B,D))

[[-3  1]
 [ 9 -3]
 [10  0]
 [ 5 15]]


Since $C$ and $D$ are both $2\times2$ matrices, both $C\times D$ and $D\times C$ can be computed. The two results differ, which illustrates that the matrix product is not commutative:

In [33]:
print (np.dot(C,D))

[[-5  5]
 [ 7  1]]


In [34]:
print (np.dot(D,C))

[[ 1  5]
 [ 7 -5]]
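As a side note, for two-dimensional arrays such as the ones used here, NumPy also supports Python's `@` operator, which gives the same result as `np.dot`. A minimal sketch, redefining $C$ and $D$ so that it runs on its own:

```python
import numpy as np

C = np.array([[-1,  3],
              [ 3, -1]])
D = np.array([[ 2, 1],
              [-1, 2]])

# The @ operator performs matrix multiplication
print(C @ D)
# [[-5  5]
#  [ 7  1]]

print(np.array_equal(C @ D, np.dot(C, D)))  # True: same result as np.dot
print(np.array_equal(C @ D, D @ C))         # False: the order matters
```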
