<a href="https://colab.research.google.com/github/ncssmmlclub/Lectures/blob/master/into-to-matrix-math.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Matrix and Vector Mathematics

Today, we'll be discussing how vectors and matrices work!


## 1.1 What even are Vectors and Matrices?

**_First of all, what is a vector?_**

A vector is a quantity with a direction and a magnitude (pictured below).

<img src="https://i.ibb.co/JnG1FWB/vector.png" alt="vector" border="0">

In machine learning, vectors can represent a wide variety of data, including points in a parameter space, coordinates, and changes to either of these.

Note that vectors don't have a specific start position. Sometimes, when discussing coordinates we will use the vector to represent a position as a certain direction and magnitude from the origin, but the vector itself does not specify an origin.

When we discuss vectors, we typically refer to column vectors, which are also $n \times 1$ matrices and are represented as such:

$$ \vec{v} = \begin{bmatrix}a\\b\\c\\d\end{bmatrix}$$

where $a$, $b$, $c$, and $d$ are components of the vector (and matrix).

Sometimes, however, we will use row vectors, which are $1 \times n$ matrices. A row vector is the **transpose** of the corresponding column vector and is represented like so:

$$ \vec{v}^\top = \begin{bmatrix}a & b & c & d\end{bmatrix}$$

In the literature, vectors are usually written in lowercase and denoted in boldface ($\mathbf{v}$) or with an arrow ($\vec{v}$).
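As a minimal sketch of how this looks in NumPy (the values are made up for illustration), a column vector can be stored as an array with $n$ rows and one column, and its transpose is the matching row vector:

```python
import numpy as np

# A column vector is a 4 x 1 matrix: four rows, one column.
v = np.array([[1], [2], [3], [4]])

# Its transpose is the corresponding 1 x 4 row vector.
v_row = v.T

print(v.shape)      # (4, 1)
print(v_row.shape)  # (1, 4)
```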

**_Secondly, what is a matrix?_**

A matrix is a rectangular grid of numbers. Matrices are usually used to represent **transformations** and parameter spaces (as well as changes to that data).

Matrices are often denoted with boldface uppercase characters as such:

$$\mathbf{A} = \begin{bmatrix}A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{bmatrix}$$

In this matrix, $A_{ij}$ (where $i$ and $j$ are indices from 1 to 3) denotes the component in row $i$ and column $j$. Notice that, unlike Python, we are indexing from 1.

Matrices come in all shapes and sizes, and we describe a matrix's shape by its number of rows and columns: a matrix with $m$ rows and $n$ columns is an $m \times n$ matrix.

For example, $\mathbf{A}$ is a $3 \times 3$ matrix.
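To make the shape and indexing conventions concrete, here is a small sketch in NumPy. The entries are arbitrary example values; the point is only that `shape` reports (rows, columns) and that the 1-based math indices shift down by one in Python:

```python
import numpy as np

# Example values chosen only for illustration.
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(A.shape)  # (3, 3): 3 rows, 3 columns

# The math notation A_{11} (row 1, column 1) is A[0, 0] in NumPy,
# and A_{23} (row 2, column 3) is A[1, 2].
a_11 = A[0, 0]
a_23 = A[1, 2]
print(a_11, a_23)  # 1 6
```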

##### MiniQuiz

1. What is the size of this matrix?

$$\mathbf{C} = \begin{bmatrix}1 & 0 & -4 & 10 & 0.2 & 6 & 2+2i \\ 2 & 4 & 2.6 & 0.8 & -4.2 & \sqrt[4]{-1} & 1.5\\ 3 & 10 & -\pi & e & 2.1 & \alpha & -\sqrt{17}/5\\ 5 & 8 & 2 & 1 & \pi^{-1} & \sqrt{2} & 1.1\end{bmatrix}$$

## 1.2 What can I do with Vectors and Matrices?

Just like we can with numbers and variables, we can combine vectors and matrices in fun ways to do interesting things.

In regular arithmetic, we can add, subtract, and multiply numbers. Can we do the same with matrices and vectors?

Indeed we can, but we have certain constraints.

#### Addition

**Vectors**

To add vectors, they both need to be of the same size. If $\vec{a}$ and $\vec{b}$ are of size $4 \times 1$ and $5 \times 1$, you cannot add them together, because their components cannot be paired up one-to-one.

However, if they are of the same size, addition is relatively straightforward:

$$ \vec{a} = \begin{bmatrix}a_x\\a_y\\a_z\end{bmatrix}$$

$$ \vec{b} = \begin{bmatrix}b_x\\b_y\\b_z\end{bmatrix}$$

$$ \vec{a} + \vec{b} = \begin{bmatrix}a_x+b_x\\a_y+b_y\\a_z+b_z\end{bmatrix}$$

Tada! It was that simple!
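Here is a quick sketch of the same rule in NumPy, using made-up values. It adds two vectors of the same size, then shows that NumPy raises a `ValueError` for the $4 \times 1$ and $5 \times 1$ case described above:

```python
import numpy as np

a = np.array([[1], [2], [3]])
b = np.array([[10], [20], [30]])

# Same shape (3 x 1): components are added pairwise.
total = a + b
print(total.ravel().tolist())  # [11, 22, 33]

# Different sizes (4 x 1 and 5 x 1): NumPy refuses to add them.
p = np.zeros((4, 1))
q = np.zeros((5, 1))
try:
    p + q
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```

One caveat: NumPy's broadcasting rules will silently combine some mismatched shapes (for example, a $3 \times 1$ array plus a $1 \times 3$ array gives a $3 \times 3$ array), which is not vector addition in the sense used here.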

But what does that mean geometrically?

Well, if vectors represent directions and magnitudes, the sum of two vectors tells you the total direction and magnitude travelled if you travel along the first vector and then the second vector.

<img src="https://i.ibb.co/1GkFWyZ/addvec.png" alt="addvec" border="0">

Let's try matrices now!

**Matrices**

Just like vectors, matrices must be of the same size to be added together. Addition is again done component by component:

$$\mathbf{A} = \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}$$

$$\mathbf{B} = \begin{bmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix}$$

$$\mathbf{A} + \mathbf{B} = \begin{bmatrix}A_{11} + B_{11} & A_{12} + B_{12}\\ A_{21} + B_{21} & A_{22} + B_{22}\end{bmatrix}$$


##### MiniQuiz

If we have a row vector $\begin{bmatrix}1 & 2 & 3\end{bmatrix}$ and another row vector $\begin{bmatrix}-7 & 6 & 2 & 8\end{bmatrix}$, what is their sum?

What about the sum of row vector $\begin{bmatrix}8 & -9 & 2 & 16\end{bmatrix}$ and $\begin{bmatrix}-7 & 6 & 2 & 8\end{bmatrix}$?

#### Subtraction

Subtraction is the exact same as addition except we negate the second operand's components.

So, for vectors:

$$ \vec{a} - \vec{b} = \begin{bmatrix}a_x-b_x\\a_y-b_y\\a_z-b_z\end{bmatrix}$$

And for matrices:

$$\mathbf{A} - \mathbf{B} = \begin{bmatrix}A_{11} - B_{11} & A_{12} - B_{12}\\ A_{21} - B_{21} & A_{22} - B_{22}\end{bmatrix}$$

For vectors, this can be seen geometrically as finding the displacement from one position to another: $\vec{a} - \vec{b}$ is the vector that points from the tip of $\vec{b}$ to the tip of $\vec{a}$.

<img src="https://i.ibb.co/1GkFWyZ/addvec.png" alt="addvec" border="0">

#### Multiplication

Aha! For vectors and matrices, multiplication is defined in SOOOOO MANY WAYS!

Let's start off with **scalar multiplication**.

Firstly, what is a scalar?

A scalar is just a regular number: it has a magnitude but no direction.

So, let's scale!

For vectors:

$$ \lambda\vec{a} = \begin{bmatrix}\lambda a_x \\ \lambda a_y \\ \lambda a_z \end{bmatrix}$$

For matrices:

$$\lambda\mathbf{A} = \begin{bmatrix} \lambda A_{11} & \lambda A_{12}\\ \lambda A_{21} & \lambda A_{22}\end{bmatrix}$$

Remember, $\lambda$ is a scalar, not a vector or matrix.

Next up on our list is the **Hadamard product**, also called element-wise multiplication.

To perform the Hadamard product, our vectors/matrices must be of the exact same size, just like with addition and subtraction.

Since the Hadamard product is just element-wise multiplication, you just have to multiply the corresponding components of the vectors/matrices!

So, for vectors:

$$ \vec{a} \circ \vec{b} = \begin{bmatrix}a_xb_x\\a_yb_y\\a_zb_z\end{bmatrix}$$

And for matrices:

$$\mathbf{A} \circ \mathbf{B} = \begin{bmatrix}A_{11}B_{11} & A_{12}B_{12}\\ A_{21}B_{21} & A_{22}B_{22}\end{bmatrix}$$
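In NumPy, the plain `*` operator between two arrays of the same shape computes the Hadamard product. The sketch below uses arbitrary example values:

```python
import numpy as np

a = np.array([[1], [2], [3]])
b = np.array([[4], [5], [6]])

# For NumPy arrays, * multiplies matching components (Hadamard product).
print((a * b).ravel().tolist())  # [4, 10, 18]

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])
hadamard = A * B
print(hadamard.tolist())  # [[10, 40], [90, 160]]
```

Keep in mind that `*` is not matrix multiplication: for these same two matrices, `A @ B` gives a different result.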

Third is the **dot product**!

The dot product is performed between two vectors and is pretty much the sum of all the components in the Hadamard product.

The dot product is also commutative, so $\vec{a}\cdot\vec{b} = \vec{b}\cdot\vec{a}$

So:

$$ \vec{a} \cdot \vec{b} = a_xb_x + a_yb_y + a_zb_z$$

For a pair of vectors with $n$ components each:

$$ \vec{a} \cdot \vec{b} = \sum_{i=1}^n a_ib_i$$

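Geometrically, the dot product equals $\lVert\vec{a}\rVert\,\lVert\vec{b}\rVert\cos\theta$, where $\theta$ is the angle between the two vectors. In other words, it is the length of the projection of one vector onto the other, multiplied by the length of the other vector. It is the basis for a type of similarity measurement called cosine similarity.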

<img src="https://i.ibb.co/qFh0RHM/dotproduct.png" alt="dotproduct" border="0">
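Cosine similarity can be sketched in a few lines: divide the dot product by the product of the two magnitudes, which leaves just the cosine of the angle between the vectors. The helper name `cosine_similarity` and the test vectors are illustrative choices, not part of the original lecture:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between 1-D vectors u and v (hypothetical helper)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])   # same direction as a
c = np.array([0.0, 3.0])   # perpendicular to a
d = np.array([-1.0, 0.0])  # opposite direction to a

print(cosine_similarity(a, b))  # 1.0
print(cosine_similarity(a, c))  # 0.0
print(cosine_similarity(a, d))  # -1.0
```

Vectors pointing the same way score 1, perpendicular vectors score 0, and opposite vectors score -1, regardless of their lengths.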

Next would be the **cross product**, but it's pretty much irrelevant to machine learning, so I'll just put a link to the Wikipedia page. It also takes a horrendously long time to explain, so there's that too.

https://en.wikipedia.org/wiki/Cross_product

Last, but probably one of the most important, **matrix multiplication**!

To multiply matrices, they must be of a very specific shape.

If the first matrix is of shape $m \times n$, then the second matrix must be of shape $n \times p$, where $p$ is an arbitrary number. The resulting matrix will then be of shape $m \times p$.

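So, I've always found matrix multiplications really strange to describe, but I imagine them as a set of dot products between the rows of the first matrix and the columns of the second: the entry in row $i$ and column $j$ of the result is the dot product of row $i$ of the first matrix with column $j$ of the second.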

We can walk through it at this link: http://matrixmultiplication.xyz/
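The "set of dot products" picture can also be written out directly. The function below is a deliberately slow teaching sketch (the name `matmul_by_dot_products` and the example matrices are made up for illustration); in practice you would use NumPy's built-in `@`, which the last line uses as a check:

```python
import numpy as np

def matmul_by_dot_products(A, B):
    """Multiply A (m x n) by B (n x p) using one dot product per output entry."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("inner dimensions must match")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            # Entry (i, j) is the dot product of row i of A and column j of B.
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape 2 x 3
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])         # shape 3 x 2

C = matmul_by_dot_products(A, B)
print(C.shape)                   # (2, 2)
print(np.array_equal(C, A @ B))  # True
```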

It is important to note that matrix multiplications are NOT commutative.
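A small example makes this easy to check. With the two arbitrary matrices below, multiplying in one order swaps the columns of `A`, while the other order swaps its rows:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = A @ B  # swaps the columns of A
BA = B @ A  # swaps the rows of A
print(AB.tolist())  # [[2, 1], [4, 3]]
print(BA.tolist())  # [[3, 4], [1, 2]]
print(np.array_equal(AB, BA))  # False
```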

Matrix multiplication has several important uses in Machine Learning.

First of all, it represents linear transformations.

When we do a matrix multiplication between a matrix and an appropriately-dimensioned vector (because vectors can be viewed as matrices, too), we apply a combination of stretching, squeezing, rotating, shearing, and reflecting (as well as the occasional orthogonal projection).

Here's a good example:

<img src="https://i.ibb.co/72wv75M/matrixtransformation.gif" alt="matrixtransformation" border="0">

In this example, we are demonstrating a combination of a rotation and a shear transformation.
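To try this numerically, here is a rough sketch with a 90-degree rotation and a horizontal shear. These matrices are illustrative choices, not necessarily the ones used in the animation:

```python
import numpy as np

rotate_90 = np.array([[0, -1],
                      [1,  0]])  # counterclockwise rotation by 90 degrees
shear_x = np.array([[1, 1],
                    [0, 1]])     # horizontal shear

v = np.array([[1], [1]])

rotated = rotate_90 @ v
sheared = shear_x @ v
# Applying the shear first and then the rotation is one combined matrix.
combined = (rotate_90 @ shear_x) @ v

print(rotated.ravel().tolist())   # [-1, 1]
print(sheared.ravel().tolist())   # [2, 1]
print(combined.ravel().tolist())  # [-1, 2]
```

Multiplying the two transformation matrices together first gives a single matrix that performs both steps at once.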

Secondly, and probably more importantly, it represents a linear system of equations.


Example:
$$ \mathbf{A}\vec{v} = \vec{c}$$

If $\mathbf{A} = \begin{bmatrix}1 & 2 \\ 3 & 4 \end{bmatrix}$, $\vec{v} = \begin{bmatrix}x \\ y \end{bmatrix}$ and $\vec{c} = \begin{bmatrix}1 \\ 2\end{bmatrix}$, then:

$$\mathbf{A}\vec{v} = \begin{bmatrix}1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix}x \\ y \end{bmatrix} = \begin{bmatrix}1x+2y\\3x+4y\end{bmatrix} = \begin{bmatrix}1 \\ 2\end{bmatrix} = \vec{c}$$

This is equivalent to this linear system of equations:

$$1x+2y=1$$
$$3x+4y=2$$
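As a sketch of how this system could be solved in code, NumPy's `np.linalg.solve` takes the matrix $\mathbf{A}$ and the right-hand side $\vec{c}$ and returns $\vec{v}$. Solving by hand gives $x = 0$ and $y = 0.5$, which the code reproduces up to floating-point rounding:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
c = np.array([[1.0],
              [2.0]])

# Solve A v = c for v = [x, y]^T.
v = np.linalg.solve(A, c)
x, y = v.ravel()
print(x, y)  # approximately 0.0 and 0.5

# Check the answer by multiplying back.
print(np.allclose(A @ v, c))  # True
```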

Using matrix multiplication, we can accelerate numerous tasks!

##### MiniQuiz

1. Can you give me the dot product of these two vectors: $\begin{bmatrix}1\\2\\3\end{bmatrix}$ and $\begin{bmatrix}-2\\3\\5\end{bmatrix}$?
<br/>
<br/>
2. How about the scalar multiplication of this scalar and this vector: $3$ and $\begin{bmatrix}-2\\3\\5\end{bmatrix}$?
3. Ooh, what about the matrix multiplication of this row vector and this column vector: $\begin{bmatrix}-2 & 3 & 5\end{bmatrix}$ and $\begin{bmatrix}-2\\3\\5\end{bmatrix}$?


#### Division

Sadly, outside of scalar division (which is really just multiplication by the reciprocal), division isn't actually a real thing with matrices and vectors. However, closely related is matrix inversion, which is likely out of the scope of an "Introduction to Matrices and Vectors".

#### Transpose

Not only do vectors and matrices have arithmetic operations, they also have a whole boatload of other operations! But for this intro, we'll just discuss transposes.

Transposes allow you to "flip" the matrix/vector over its main diagonal.

The transpose of a row vector is the corresponding column vector (and vice versa).

For example:

This column vector, $\begin{bmatrix}-2\\3\\5\end{bmatrix}$, is the transpose of this row vector: $\begin{bmatrix}-2 & 3 & 5\end{bmatrix}$.

When you take the transpose of a matrix or vector of shape $(m \times n)$, the result will have shape $(n \times m)$.

Example:

$$\mathbf{A} = \begin{bmatrix} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\end{bmatrix}$$
<br/>
<br/>
$$\mathbf{A}^\top = \begin{bmatrix} A_{11} & A_{21} \\ A_{12} & A_{22}\\ A_{13} & A_{23}\end{bmatrix}$$

### Codifying it all!

Now that we've covered all these ways to do arithmetic, let's do them in Python!

In [None]:
import numpy as np

# Create some random scalars, vectors and matrices

l = np.random.randint(2, 5)

a = np.random.randint(-10, 10, size=(3, 1))
b = np.random.randint(-10, 10, size=(3, 1))

A = np.random.randint(-5, 5, size=(3, 3))
B = np.random.randint(-5, 5, size=(3, 3))

In [None]:
# Addition

c = a + b

print("Vector a")
print(a)
print("Vector b")
print(b)
print("Vector c")
print(c)

Vector a
[[ 4]
 [-7]
 [-9]]
Vector b
[[-3]
 [-1]
 [ 1]]
Vector c
[[ 1]
 [-8]
 [-8]]


In [None]:
# Subtraction

d = c - a # You should expect b because c = a + b

print("Vector c")
print(c)
print("Vector a")
print(a)
print("Vector d")
print(d)

Vector c
[[ 1]
 [-8]
 [-8]]
Vector a
[[ 4]
 [-7]
 [-9]]
Vector d
[[-3]
 [-1]
 [ 1]]


In [None]:
# Scalar Multiplication

e = l * a  # Scaling vector a by the scalar l

print("Scalar l")
print(l)
print("Vector a")
print(a)
print("Vector e")
print(e)

Scalar l
2
Vector a
[[ 4]
 [-7]
 [-9]]
Vector e
[[  8]
 [-14]
 [-18]]


In [None]:
# Dot Product

# In NumPy we can use np.dot or ndarray.dot to compute dot products. For two 1-D arrays, it returns their dot product as a scalar.

# Our vectors are (3 x 1) column vectors, so we first squeeze them (remove the dimension with length 1) to get 1-D arrays.

c = np.dot(a.squeeze(1), b.squeeze(1))

print("Vector a")
print(a)
print("Vector b")
print(b)
print("Scalar c")
print(c)

Vector a
[[ 4]
 [-7]
 [-9]]
Vector b
[[-3]
 [-1]
 [ 1]]
Scalar c
-14


In [None]:
# Matrix Multiplication

# Matrix-Matrix

# Python has the "@" operator for matrix multiplication, and NumPy arrays support it.

C = A @ B

print("Matrix A")
print(A)
print("Matrix B")
print(B)
print("Matrix C")
print(C)

Matrix A
[[-3  4  2]
 [-4 -1  4]
 [-1  4 -1]]
Matrix B
[[-3 -5 -3]
 [-3 -1  2]
 [-2 -2 -5]]
Matrix C
[[ -7   7   7]
 [  7  13 -10]
 [ -7   3  16]]


In [None]:
# Matrix-Vector

c = A @ b

print("Matrix A")
print(A)
print("Vector b")
print(b)
print("Vector c")
print(c)

Matrix A
[[-3  4  2]
 [-4 -1  4]
 [-1  4 -1]]
Vector b
[[-3]
 [-1]
 [ 1]]
Vector c
[[ 7]
 [17]
 [-2]]


In [None]:
# Vector-Vector

# Our vectors have shape (3 x 1), so they can't be matrix-multiplied as they are.
# However, if you transpose the first vector, you get shape (1 x 3), so you can multiply.

# To transpose, NumPy has the ndarray.T attribute

c = a.T @ b

print("Vector a (transposed)")
print(a.T)
print("Vector b")
print(b)
print("Matrix c")
print(c)  # Technically it's not a scalar but a (1 x 1) matrix, though it's practically a scalar.

# Notice that this is equivalent to the dot product of a and b.

Vector a (transposed)
[[ 4 -7 -9]]
Vector b
[[-3]
 [-1]
 [ 1]]
Matrix c
[[-14]]


In [None]:
# Transpose

# Let's do some more transpose demonstrations!

A = np.random.randint(0, 5, size=(3, 3))

print("Matrix A")
print(A)
print("Matrix A Transpose")
print(A.T)

B = np.random.randint(0, 5, size=(3, 4))

print("Matrix B")
print(B)
print("Matrix B Transpose")
print(B.T)

Matrix A
[[1 0 3]
 [2 2 3]
 [1 3 4]]
Matrix A Transpose
[[1 2 1]
 [0 2 3]
 [3 3 4]]
Matrix B
[[0 1 0 3]
 [2 0 4 3]
 [1 0 3 3]]
Matrix B Transpose
[[0 2 1]
 [1 0 0]
 [0 4 3]
 [3 3 3]]
