<a href="https://colab.research.google.com/github/jonkrohn/ML-foundations/blob/master/notebooks/1-intro-to-linear-algebra.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro to Linear Algebra

This topic, *Intro to Linear Algebra*, is the first in the *Machine Learning Foundations* series. 

It is essential because linear algebra lies at the heart of most machine learning approaches and is especially predominant in deep learning, the branch of ML at the forefront of today’s artificial intelligence advances. Through the measured exposition of theory paired with interactive examples, you’ll develop an understanding of how linear algebra is used to solve for unknown values in high-dimensional spaces, thereby enabling machines to recognize patterns and make predictions. 

The content covered in *Intro to Linear Algebra* is itself foundational for all the other topics in the Machine Learning Foundations series, and it is especially relevant to *Linear Algebra II*.

Over the course of studying this topic, you'll: 

* Understand the fundamentals of linear algebra, a ubiquitous approach for solving for unknowns within high-dimensional spaces.
* Develop a geometric intuition of what’s going on beneath the hood of machine learning algorithms, including those used for deep learning.
* Be able to more intimately grasp the details of machine learning papers as well as all of the other subjects that underlie ML, including calculus, statistics, and optimization algorithms.

**Note that this Jupyter notebook is not intended to stand alone. It is the companion code to a lecture or to videos from Jon Krohn's [Machine Learning Foundations](https://github.com/jonkrohn/ML-foundations) series, which offer detail on the following:**

*Segment 1: Data Structures for Algebra*

* What Linear Algebra Is  
* A Brief History of Algebra 
* Tensors 
* Scalars 
* Vectors and Vector Transposition
* Norms and Unit Vectors
* Basis, Orthogonal, and Orthonormal Vectors
* Arrays in NumPy  
* Matrices 
* Tensors in TensorFlow and PyTorch

*Segment 2: Common Tensor Operations* 

* Tensor Transposition
* Basic Tensor Arithmetic
* Reduction
* The Dot Product
* Solving Linear Systems

*Segment 3: Matrix Properties*

* The Frobenius Norm
* Matrix Multiplication
* Symmetric and Identity Matrices
* Matrix Inversion
* Diagonal Matrices
* Orthogonal Matrices


## Segment 1: Data Structures for Algebra

**Slides used to begin segment, with focus on introducing what linear algebra is, including hands-on paper-and-pencil exercises.**

### Scalars (Rank 0 Tensors) in Base Python

In [0]:
x = 25
x

In [0]:
type(x) # if we'd like more specificity (e.g., int16, uint8), we need NumPy or another numeric library

In [0]:
y = 3

In [0]:
py_sum = x + y
py_sum

In [0]:
type(py_sum)

int

In [0]:
x_float = 25.0
float_sum = x_float + y
float_sum

28.0

In [0]:
type(float_sum)

float
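As the comment on `type(x)` above notes, base Python's `int` doesn't let you pick a fixed bit width. A minimal sketch of how NumPy lets you choose one explicitly (the variable names are illustrative):

```python
import numpy as np

x_np = np.int16(25)       # 16-bit signed integer scalar
y_np = np.int16(3)
np_sum = x_np + y_np      # stays int16 because both operands are int16

print(np_sum, np_sum.dtype, np_sum.itemsize)  # itemsize is the size in bytes
```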

### Scalars in TensorFlow (ver 2.0 or later)

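Tensors are created with a wrapper, all of which [you can read about here](https://www.tensorflow.org/guide/tensor):  

* `tf.Variable`
* `tf.constant`
* `tf.placeholder` (TensorFlow 1.x only; in TensorFlow 2 it survives only as `tf.compat.v1.placeholder`)
* `tf.SparseTensor`

Most widely used is `tf.Variable` (unlike `tf.constant`, its value can be updated in place, as trainable model parameters require), which we'll use here. 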

Also, a full list of tensor data types is available [here](https://www.tensorflow.org/api_docs/python/tf/dtypes/DType).

In [0]:
import tensorflow as tf

In [0]:
x_tf = tf.Variable(25, dtype=tf.int16)
x_tf

<tf.Variable 'Variable:0' shape=() dtype=int16, numpy=25>

In [0]:
x_tf.shape

TensorShape([])

In [0]:
y_tf = tf.Variable(3, dtype=tf.int16)

In [0]:
x_tf + y_tf

<tf.Tensor: shape=(), dtype=int16, numpy=28>

In [0]:
tf_sum = tf.add(x_tf, y_tf)
tf_sum

<tf.Tensor: shape=(), dtype=int16, numpy=28>

In [0]:
tf_sum.numpy() # note that NumPy operations automatically convert tensors to NumPy arrays, and vice versa

28

In [0]:
type(tf_sum.numpy())

numpy.int16

In [0]:
tf_float = tf.Variable(25, dtype=tf.float16)
tf_float

<tf.Variable 'Variable:0' shape=() dtype=float16, numpy=25.0>

### Scalars in PyTorch

* PyTorch tensors are designed to be pythonic, i.e., to feel and behave like NumPy arrays
* The advantage of PyTorch tensors relative to NumPy arrays is that they can easily be used for operations on GPU (see [here](https://pytorch.org/tutorials/beginner/examples_tensor/two_layer_net_tensor.html) for an example) 
* As with TF tensors, in PyTorch we can perform operations and easily convert to and from NumPy arrays
* Documentation on PyTorch tensors, including available data types, is [here](https://pytorch.org/docs/stable/tensors.html)

In [0]:
import torch

In [0]:
x_pt = torch.tensor(25, dtype=torch.float16)
x_pt

tensor(25., dtype=torch.float16)

In [0]:
x_pt.shape

torch.Size([])

**Return to slides here.**

### Vectors (Rank 1 Tensors) in NumPy

In [0]:
import numpy as np 

In [0]:
x = np.array([25, 2, 5], np.int16) # dtype argument is optional
x

array([25,  2,  5], dtype=int16)

In [0]:
len(x)

3

In [0]:
x.shape

(3,)

In [0]:
type(x)

numpy.ndarray

In [0]:
x[0] # zero-indexed

25

In [0]:
type(x[0])

numpy.int16

### Vector Transposition

In [0]:
# Transposing a 1-D array has no effect...
x_t = x.T
x_t

array([25,  2,  5], dtype=int16)

In [0]:
x_t.shape

(3,)

In [0]:
# ...but can transpose a matrix with a dimension of length 1, which is mathematically equivalent: 
x_t = np.matrix(x).T
x_t

matrix([[25],
        [ 2],
        [ 5]], dtype=int16)

In [0]:
x_t.shape # this is a column vector as it has 3 rows and 1 column

(3, 1)

In [0]:
# Column vector can be transposed back to a row vector, now 2-D with shape (1, 3): 
x_t.T 

matrix([[25,  2,  5]], dtype=int16)

In [0]:
x_t.T.shape

(1, 3)
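`np.matrix` still works, but NumPy's documentation recommends regular 2-D arrays instead. A sketch of getting the same column and row vectors with `reshape()`, assuming the same `x` as above (`x[:, np.newaxis]` would give the same column):

```python
import numpy as np

x = np.array([25, 2, 5], dtype=np.int16)

x_col = x.reshape(-1, 1)   # -1 lets NumPy infer the number of rows: shape (3, 1)
x_row = x_col.T            # transposing the column gives a row: shape (1, 3)

print(x_col.shape, x_row.shape)
```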

### Zero Vectors

A zero vector has no effect when added to another vector of the same length.

In [0]:
z = np.zeros(3) # dtype argument is optional; defaults to float64
z

array([0., 0., 0.])
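A quick sketch confirming the "no effect" claim by adding the zero vector to the `x` from earlier. Note that the values are unchanged but the dtype is upcast to `float64`, since `np.zeros()` returns floats by default:

```python
import numpy as np

x = np.array([25, 2, 5])
z = np.zeros(3)          # float64 zero vector of the same length as x

result = x + z           # int + float upcasts the result to float64
print(result)
print(np.array_equal(result, x))   # True: every value is unchanged
```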

### Vectors in TensorFlow and PyTorch

In [0]:
x_tf = tf.Variable([25, 2, 5], dtype=tf.int16)
x_tf

<tf.Variable 'Variable:0' shape=(3,) dtype=int16, numpy=array([25,  2,  5], dtype=int16)>

In [0]:
x_pt = torch.tensor([25, 2, 5], dtype=torch.int16)
x_pt

tensor([25,  2,  5], dtype=torch.int16)

**Return to slides here.**

### $L^2$ Norm

In [0]:
x

array([25,  2,  5], dtype=int16)

In [0]:
(25**2 + 2**2 + 5**2)**(1/2)

25.573423705088842

In [0]:
np.linalg.norm(x)

25.573423705088842

So, if units in this 3-dimensional vector space are meters, then the vector $x$ has a length of about 25.6 m.
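The hand calculation above is the general formula $\lVert x \rVert_2 = \sqrt{\sum_{i} x_i^2}$ applied to three elements. A vectorized sketch that works for a vector of any length; casting to float first is a precaution (our choice, not required for this small `x`) so that squaring larger values can't overflow the `int16` dtype:

```python
import numpy as np

x = np.array([25, 2, 5], dtype=np.int16)

x_f = x.astype(np.float64)         # avoid int16 overflow when squaring larger values
l2 = np.sqrt(np.sum(x_f ** 2))     # square each element, sum, then take the square root

print(l2)
print(np.isclose(l2, np.linalg.norm(x)))
```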

**Return to slides here.**

### $L^1$ Norm

In [0]:
x

array([25,  2,  5], dtype=int16)

In [0]:
np.abs(25) + np.abs(2) + np.abs(5)

32
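The $L^1$ norm is $\lVert x \rVert_1 = \sum_{i} |x_i|$. Since every element of `x` is positive, the absolute values above change nothing, so this sketch uses a hypothetical vector `v` with a negative entry to show why they matter, and also shows `np.linalg.norm()` with `ord=1`:

```python
import numpy as np

v = np.array([25, -2, 5])            # hypothetical vector with a negative element

l1_manual = np.sum(np.abs(v))        # sum of absolute values: 25 + 2 + 5
l1_linalg = np.linalg.norm(v, 1)     # ord=1 selects the L1 norm for vectors
plain_sum = np.sum(v)                # without abs(), the -2 would subtract instead

print(l1_manual, l1_linalg, plain_sum)
```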

### Squared $L^2$ Norm

In [0]:
x

array([25,  2,  5], dtype=int16)

In [0]:
(25**2 + 2**2 + 5**2)

654

In [0]:
# We'll cover tensor multiplication soon, but to prove the point quickly: 
np.dot(x.T, x)

654
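Because $x^Tx = \sum_{i} x_i^2$, the squared $L^2$ norm is exactly the $L^2$ norm squared. A quick sketch checking this, using `@`, NumPy's matrix-multiplication operator, which computes the dot product when both operands are 1-D:

```python
import numpy as np

x = np.array([25, 2, 5])

sq_l2 = x @ x                                    # 625 + 4 + 25
print(sq_l2)
print(np.isclose(sq_l2, np.linalg.norm(x) ** 2)) # same as squaring the L2 norm
```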

**Return to slides here.**

### Max Norm

In [0]:
x

array([25,  2,  5], dtype=int16)

In [0]:
np.max([np.abs(25), np.abs(2), np.abs(5)])

25
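The max norm is $\lVert x \rVert_\infty = \max_{i} |x_i|$. As with the $L^1$ norm, the absolute values only matter when negative entries appear, so this sketch uses a hypothetical vector `v` and also shows `np.linalg.norm()` with `ord=np.inf`:

```python
import numpy as np

v = np.array([25, -30, 5])               # hypothetical vector with a large negative element

max_manual = np.max(np.abs(v))           # largest absolute value: 30, not 25
max_linalg = np.linalg.norm(v, np.inf)   # ord=np.inf selects the max norm for vectors

print(max_manual, max_linalg, np.max(v))
```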

### Matrices (Rank 2 Tensors) in NumPy

In [0]:
# Use array() with nested brackets: 
X = np.array([[25, 2], [5, 26], [3, 7]])
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [0]:
X.shape

(3, 2)

In [0]:
X.size

6

In [0]:
# Select left column of matrix X (zero-indexed)
X[:,0]

array([25,  5,  3])

In [0]:
# Select middle row of matrix X: 
X[1,:]

array([ 5, 26])

In [0]:
# Another slicing-by-index example: 
X[0:2, 0:2]

array([[25,  2],
       [ 5, 26]])

### Matrices in TensorFlow

In [0]:
X_tf = tf.Variable([[25, 2], [5, 26], [3, 7]], dtype=tf.int16)
X_tf

<tf.Variable 'Variable:0' shape=(3, 2) dtype=int16, numpy=
array([[25,  2],
       [ 5, 26],
       [ 3,  7]], dtype=int16)>

In [0]:
tf.rank(X_tf)

<tf.Tensor: shape=(), dtype=int32, numpy=2>

In [0]:
tf.shape(X_tf)

<tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 2], dtype=int32)>

In [0]:
X_tf[1,:]

<tf.Tensor: shape=(2,), dtype=int16, numpy=array([ 5, 26], dtype=int16)>

### Matrices in PyTorch

In [0]:
X_pt = torch.tensor([[25, 2], [5, 26], [3, 7]], dtype=torch.int16)
X_pt

tensor([[25,  2],
        [ 5, 26],
        [ 3,  7]], dtype=torch.int16)

In [0]:
X_pt.shape # more pythonic

torch.Size([3, 2])

In [0]:
X_pt[1,:]

tensor([ 5, 26], dtype=torch.int16)

**Return to slides here.**

### Higher-Rank Tensors

As an example, rank 4 tensors are common for images, where each dimension corresponds to: 

1. Number of images in training batch, e.g., 32
2. Image height in pixels, e.g., 28 for [MNIST digits](http://yann.lecun.com/exdb/mnist/)
3. Image width in pixels, e.g., 28
4. Number of color channels, e.g., 3 for full-color (RGB) images, or 1 for grayscale images such as MNIST digits

In [0]:
images_tf = tf.zeros([32, 28, 28, 3])

In [0]:
# images_tf

In [0]:
images_pt = torch.zeros([32, 28, 28, 3])

In [0]:
# images_pt
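A NumPy sketch of the same rank-4 batch, showing how to check its rank and pull out a single image. One detail worth flagging: the shape above puts channels last, which is the default layout for Keras layers in TensorFlow, whereas PyTorch's convolutional layers (e.g., `torch.nn.Conv2d`) expect channels first, i.e., (batch, channels, height, width). `np.transpose()` (or `.permute()` on a PyTorch tensor) reorders the axes:

```python
import numpy as np

images = np.zeros((32, 28, 28, 3))                # (batch, height, width, channels)

print(images.ndim)                                # rank = number of axes
print(images[0].shape)                            # one image: (height, width, channels)

images_chw = np.transpose(images, (0, 3, 1, 2))   # -> (batch, channels, height, width)
print(images_chw.shape)
```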

**Return to slides here.**

## Segment 2: Common Tensor Operations

### Tensor Transposition

In [0]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [0]:
X.T

array([[25,  5,  3],
       [ 2, 26,  7]])

In [0]:
tf.transpose(X_tf)

<tf.Tensor: shape=(2, 3), dtype=int16, numpy=
array([[25,  5,  3],
       [ 2, 26,  7]], dtype=int16)>

In [0]:
X_pt.T # more pythonic!

tensor([[25,  5,  3],
        [ 2, 26,  7]], dtype=torch.int16)

### Basic Arithmetical Properties

Adding a scalar to a tensor, or multiplying a tensor by a scalar, applies the operation to every element, and the tensor's shape is retained: 

In [0]:
X*2

array([[50,  4],
       [10, 52],
       [ 6, 14]])

In [0]:
X+2

array([[27,  4],
       [ 7, 28],
       [ 5,  9]])

In [0]:
X*2+2

array([[52,  6],
       [12, 54],
       [ 8, 16]])

In [0]:
X_tf*2+2  # operators are overloaded; could alternatively use tf.multiply() or tf.add()

<tf.Tensor: shape=(3, 2), dtype=int16, numpy=
array([[52,  6],
       [12, 54],
       [ 8, 16]], dtype=int16)>

In [0]:
tf.add(tf.multiply(X_tf, 2), 2)

<tf.Tensor: shape=(3, 2), dtype=int16, numpy=
array([[52,  6],
       [12, 54],
       [ 8, 16]], dtype=int16)>

In [0]:
X_pt*2+2

tensor([[52,  6],
        [12, 54],
        [ 8, 16]], dtype=torch.int16)

In [0]:
torch.add(torch.mul(X_pt, 2), 2)

tensor([[52,  6],
        [12, 54],
        [ 8, 16]], dtype=torch.int16)

If two tensors have the same size, operations are often by default applied element-wise. This is **not matrix multiplication**, which we'll cover later, but is rather called the **Hadamard product** or simply the **element-wise product**. 

The mathematical notation is $A \odot X$.
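Entry by entry, the definition is $(A \odot X)_{i,j} = A_{i,j} X_{i,j}$. A sketch that spells this out with explicit loops (for illustration only; in practice the vectorized `*` operator does the same job) and checks it against `np.multiply()`, using the same `X` and `A = X + 2` as the cells that follow:

```python
import numpy as np

X = np.array([[25, 2], [5, 26], [3, 7]])
A = X + 2

hadamard = np.empty_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        hadamard[i, j] = A[i, j] * X[i, j]   # multiply matching entries only

print(hadamard)
print(np.array_equal(hadamard, np.multiply(A, X)))
```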

In [0]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [0]:
A = X+2
A

array([[27,  4],
       [ 7, 28],
       [ 5,  9]])

In [0]:
A + X

array([[52,  6],
       [12, 54],
       [ 8, 16]])

In [0]:
A * X

array([[675,   8],
       [ 35, 728],
       [ 15,  63]])

In [0]:
A_tf = X_tf + 2

In [0]:
A_tf + X_tf

<tf.Tensor: shape=(3, 2), dtype=int16, numpy=
array([[52,  6],
       [12, 54],
       [ 8, 16]], dtype=int16)>

In [0]:
A_tf * X_tf

<tf.Tensor: shape=(3, 2), dtype=int16, numpy=
array([[675,   8],
       [ 35, 728],
       [ 15,  63]], dtype=int16)>

In [0]:
A_pt = X_pt + 2

In [0]:
A_pt + X_pt

tensor([[52,  6],
        [12, 54],
        [ 8, 16]], dtype=torch.int16)

In [0]:
A_pt * X_pt

tensor([[675,   8],
        [ 35, 728],
        [ 15,  63]], dtype=torch.int16)

### Reduction

Calculating the sum across all elements of a tensor is a common operation. For example: 

* For vector ***x*** of length *n*, we calculate $\sum_{i=1}^{n} x_i$
* For matrix ***X*** with *m* by *n* dimensions, we calculate $\sum_{i=1}^{m} \sum_{j=1}^{n} X_{i,j}$
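The double sum visits every entry of the matrix exactly once. A loop-based sketch of $\sum_{i=1}^{m} \sum_{j=1}^{n} X_{i,j}$ (Python indices start at 0 rather than 1), which should agree with NumPy's built-in reduction:

```python
import numpy as np

X = np.array([[25, 2], [5, 26], [3, 7]])
m, n = X.shape

total = 0
for i in range(m):          # outer sum over rows
    for j in range(n):      # inner sum over columns
        total += X[i, j]

print(total, X.sum())
```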

In [0]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [0]:
X.sum()

68

In [0]:
tf.reduce_sum(X_tf)

<tf.Tensor: shape=(), dtype=int16, numpy=68>

In [0]:
torch.sum(X_pt)

tensor(68)

In [0]:
# Can also be done along one specific axis alone, e.g.:
X.sum(axis=0) # collapse the rows (axis 0), leaving one total per column

array([33, 35])

In [0]:
X.sum(axis=1) # collapse the columns (axis 1), leaving one total per row

array([27, 31, 10])

In [0]:
tf.reduce_sum(X_tf, 0)

<tf.Tensor: shape=(2,), dtype=int16, numpy=array([33, 35], dtype=int16)>

In [0]:
torch.sum(X_pt, 1)

tensor([27, 31, 10])

Many other operations can be applied with reduction along all or a selection of axes, e.g.:

* maximum
* minimum
* mean
* product

They're fairly straightforward and used less often than summation, so you're welcome to look them up in library docs if you ever need them.
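For reference, a short NumPy sketch of those four reductions on the same `X`, applied both along an axis and across all elements (TensorFlow's counterparts include `tf.reduce_max()`, `tf.reduce_min()`, `tf.reduce_mean()` and `tf.reduce_prod()`):

```python
import numpy as np

X = np.array([[25, 2], [5, 26], [3, 7]])

col_max = X.max(axis=0)     # largest entry in each column
overall_min = X.min()       # smallest entry overall
row_mean = X.mean(axis=1)   # mean of each row
overall_prod = X.prod()     # product of all six entries

print(col_max, overall_min, row_mean, overall_prod)
```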

### The Dot Product

If we have two vectors (say, ***x*** and ***y***) with the same length *n*, we can calculate the dot product between them. This is denoted in several different ways, including the following: 

* $x \cdot y$
* $x^Ty$
* $\langle x,y \rangle$

Regardless of which notation you use (I prefer the first), the calculation is the same: we calculate products in an element-wise fashion and then sum the products, reducing them to a single scalar value. That is, $x \cdot y = \sum_{i=1}^{n} x_i y_i$
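A sketch of the formula in NumPy, with a second vector `y` made up for illustration. An explicit element-wise multiply followed by a sum, `np.dot()` and the `@` operator all give the same scalar; in PyTorch, `torch.dot()` plays the same role for 1-D tensors:

```python
import numpy as np

x = np.array([25, 2, 5])
y = np.array([0, 1, 2])        # hypothetical second vector of the same length

manual = np.sum(x * y)         # element-wise products, then sum: 0 + 2 + 10
print(manual, np.dot(x, y), x @ y)
```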

The dot product is ubiquitous in deep learning: it is performed at every artificial neuron in a deep neural network, which may contain millions of these neurons (or orders of magnitude more).
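To make the neuron connection concrete: a typical artificial neuron takes the dot product of its inputs with its weights, adds a bias, and passes the result through an activation function. A minimal sketch with a ReLU activation; the inputs, weights and bias are all hypothetical values chosen only for illustration:

```python
import numpy as np

inputs = np.array([0.5, -1.0, 2.0])     # hypothetical input features
weights = np.array([0.4, 0.1, 0.3])     # hypothetical learned weights
bias = 0.2                              # hypothetical learned bias

z = np.dot(weights, inputs) + bias      # weighted sum: 0.2 - 0.1 + 0.6 + 0.2
activation = max(z, 0.0)                # ReLU: keep positive values, zero out negatives
print(z, activation)
```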

**Return to slides here.**

## Segment 3: Matrix Properties

### Frobenius Norm

In [0]:
X = np.array([[1, 2], [3, 4]])
X

array([[1, 2],
       [3, 4]])

In [0]:
(1**2 + 2**2 + 3**2 + 4**2)**(1/2)

5.477225575051661

In [0]:
np.linalg.norm(X) # same function as for vector L2 norm

5.477225575051661
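The cell above applies $\lVert X \rVert_F = \sqrt{\sum_{i,j} X_{i,j}^2}$: square every entry, sum, and take the square root. A vectorized sketch of the same calculation, plus `np.linalg.norm()` with its explicit `'fro'` option, which for a 2-D array matches the default used above:

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])

fro_manual = np.sqrt(np.sum(X ** 2))    # sqrt(1 + 4 + 9 + 16) = sqrt(30)
fro_linalg = np.linalg.norm(X, 'fro')   # explicit Frobenius norm
print(fro_manual, fro_linalg)
```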

**Return to slides here.**

### Matrix Multiplication (with a Vector)

In [0]:
A = np.array([[3, 4], [5, 6], [7, 8]])
A

array([[3, 4],
       [5, 6],
       [7, 8]])

In [0]:
B = np.array([1, 2])
B

array([1, 2])

In [0]:
np.dot(A, B) # even though technically dot products are between vectors only

array([11, 17, 23])

In [0]:
A_tf = tf.Variable([[3, 4], [5, 6], [7, 8]])
A_tf

<tf.Variable 'Variable:0' shape=(3, 2) dtype=int32, numpy=
array([[3, 4],
       [5, 6],
       [7, 8]], dtype=int32)>

In [0]:
B_tf = tf.Variable([1, 2])
B_tf

<tf.Variable 'Variable:0' shape=(2,) dtype=int32, numpy=array([1, 2], dtype=int32)>

In [0]:
tf.linalg.matvec(A_tf, B_tf)

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([11, 17, 23], dtype=int32)>

In [0]:
A_pt = torch.tensor([[3, 4], [5, 6], [7, 8]])
A_pt

tensor([[3, 4],
        [5, 6],
        [7, 8]])

In [0]:
B_pt = torch.tensor([1, 2])
B_pt

tensor([1, 2])

In [0]:
torch.matmul(A_pt, B_pt) # like np.dot(), automatically infers dims in order to perform dot product, matvec, or matrix multiplication

tensor([11, 17, 23])
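Each element of a matrix-vector product is the dot product of one row of the matrix with the vector, which is why the result has as many elements as the matrix has rows. A sketch confirming this for the `A` and `B` used above:

```python
import numpy as np

A = np.array([[3, 4], [5, 6], [7, 8]])
B = np.array([1, 2])

row_dots = np.array([np.dot(row, B) for row in A])  # 3*1 + 4*2, 5*1 + 6*2, 7*1 + 8*2
print(row_dots)
print(A @ B)                                        # same result via the @ operator
```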

**Return to slides here.**

### Symmetric Matrices

In [0]:
X_sym = np.array([[0, 1, 2], [1, 7, 8], [2, 8, 9]])
X_sym

array([[0, 1, 2],
       [1, 7, 8],
       [2, 8, 9]])

In [0]:
X_sym.T == X_sym

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
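The comparison above returns a whole matrix of Booleans. A sketch that reduces the check to a single answer with `np.array_equal()`, alongside a hypothetical non-symmetric matrix for contrast (for floating-point matrices, `np.allclose()` is the safer choice, since tiny rounding differences can break exact equality):

```python
import numpy as np

X_sym = np.array([[0, 1, 2], [1, 7, 8], [2, 8, 9]])
X_not = np.array([[0, 1], [2, 3]])      # hypothetical non-symmetric matrix

is_sym = np.array_equal(X_sym, X_sym.T)
is_not_sym = np.array_equal(X_not, X_not.T)
print(is_sym, is_not_sym)
```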

**Return to slides here.**

### Identity Matrices

In [0]:
I = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
I

tensor([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]])

In [0]:
x_pt = torch.tensor([25, 2, 5])
x_pt

tensor([25,  2,  5])

In [0]:
torch.matmul(I, x_pt)

tensor([25,  2,  5])
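Rather than typing out every entry, libraries can build an identity matrix of any size: `np.eye()` in NumPy, or `torch.eye()` in PyTorch. A sketch confirming that multiplying by the identity leaves a vector, or a matrix on either side, unchanged:

```python
import numpy as np

I = np.eye(3, dtype=int)                         # 3x3 identity matrix
x = np.array([25, 2, 5])
M = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

print(I @ x)
print(np.array_equal(I @ M, M), np.array_equal(M @ I, M))
```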

**Return to slides here.**

### Matrix Multiplication (with Two Matrices)

In [0]:
B = np.array([[1, 9], [2, 0]])
B

array([[1, 9],
       [2, 0]])

In [0]:
np.dot(A, B)

array([[11, 27],
       [17, 45],
       [23, 63]])

In [0]:
B_tf = tf.convert_to_tensor(B, dtype=tf.int32)
B_tf

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 9],
       [2, 0]], dtype=int32)>

In [0]:
tf.matmul(A_tf, B_tf)

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[11, 27],
       [17, 45],
       [23, 63]], dtype=int32)>

In [0]:
B_pt = torch.from_numpy(B) # much cleaner than TF conversion
B_pt

tensor([[1, 9],
        [2, 0]])

In [0]:
# another neat way to create the same tensor with transposition: 
B_pt = torch.tensor([[1, 2], [9, 0]]).T
B_pt

tensor([[1, 9],
        [2, 0]])

In [0]:
torch.matmul(A_pt, B_pt) # no need to change functions, unlike in TF

tensor([[11, 27],
        [17, 45],
        [23, 63]])

Note that matrix multiplication is not "commutative": in general, $AB \neq BA$. Here $BA$ isn't even defined, because $B$ has 2 columns while $A$ has 3 rows, so uncommenting the following line will throw a size-mismatch error:

In [0]:
# torch.matmul(B_pt, A_pt)
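Even when both products are defined, they usually differ. A sketch with two made-up 2x2 matrices for which $PQ$ and $QP$ both exist but are not equal:

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])   # hypothetical matrices chosen for illustration
Q = np.array([[0, 1], [1, 0]])

PQ = P @ Q    # Q on the right swaps P's columns
QP = Q @ P    # Q on the left swaps P's rows
print(PQ)
print(QP)
print(np.array_equal(PQ, QP))
```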

**Return to slides here.**

In [0]:
M_q = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
M_q

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])

In [0]:
V_q = torch.tensor([[-1, 1, -2], [0, 1, 2]]).T
V_q

tensor([[-1,  0],
        [ 1,  1],
        [-2,  2]])

In [0]:
torch.matmul(M_q, V_q)

tensor([[ -3,   5],
        [ -9,  14],
        [-15,  23]])

### Matrix Inversion

In [0]:
X = np.array([[4, 2], [-5, -3]])
X

array([[ 4,  2],
       [-5, -3]])

In [0]:
Xinv = np.linalg.inv(X)
Xinv

array([[ 1.5,  1. ],
       [-2.5, -2. ]])

In [0]:
y = np.array([4, -7])
y

array([ 4, -7])

In [0]:
w = np.dot(Xinv, y)
w

array([-1.,  4.])
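The vector `w` solves the linear system $Xw = y$, which we can verify by multiplying back. In practice, `np.linalg.solve()` is generally preferred over forming the inverse explicitly, as it is typically faster and numerically more stable. A sketch comparing the two on the same system:

```python
import numpy as np

X = np.array([[4, 2], [-5, -3]])
y = np.array([4, -7])

w_inv = np.linalg.inv(X) @ y        # approach used above: invert, then multiply
w_solve = np.linalg.solve(X, y)     # solves Xw = y without forming the inverse

print(w_inv, w_solve)
print(np.allclose(X @ w_solve, y))  # substituting w back in recovers y
```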

**Return to slides here.**

### Matrix Inversion Where There Is No Solution

In [0]:
X = np.array([[-4, 1], [-8, 2]])
X

array([[-4,  1],
       [-8,  2]])

In [0]:
# Uncommenting the following line results in a "singular matrix" error
# Xinv = np.linalg.inv(X)
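The error arises because this matrix is singular: its second row is exactly twice its first, so its determinant is zero and its rank is 1 rather than 2. A sketch of detecting this before attempting inversion, and of catching NumPy's `LinAlgError` if you try anyway:

```python
import numpy as np

X = np.array([[-4, 1], [-8, 2]])

det = np.linalg.det(X)              # (-4)(2) - (1)(-8) = 0
rank = np.linalg.matrix_rank(X)     # 1: the rows are linearly dependent
print(det, rank)

try:
    np.linalg.inv(X)
    inverted = True
except np.linalg.LinAlgError as err:
    inverted = False
    print("Cannot invert:", err)
```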