## Sections
* Linear Algebra Notation
* Central Theory of Linear Algebra
* Vector Basics
* Matrix Basics
* Linear Transformations
* Span and Linear Independence
* Dot Product
* Cross Product
* Outer Product
* Determinants
* Matrix Multiplication
* Systems of Linear Equations
* Eigenvectors, Eigenvalues
* Covariance Matrix
* Spectral Theorem

---

## Linear Algebra Notation

$i^{th}, j^{th}$: Used to refer to an index of a matrix $[i, j]$ or to whole rows $i$ or whole columns $j$.

$\mathbf{A}^T$: The transpose of matrix $\mathbf{A}$

$\text{det}(\mathbf{A})$: The determinant of matrix $\mathbf{A}$

$\text{rank}(\mathbf{A})$: The rank of matrix $\mathbf{A}$

$\text{tr}(\mathbf{A})$: The trace of matrix $\mathbf{A}$

$\mathbf{A}^{-1}$: The inverse of matrix $\mathbf{A}$

$\mathbf{Ax} = \mathbf{b}$: A system of linear equations where $\mathbf{A}$ is a matrix, $\mathbf{x}$ is a column vector of variables, and $\mathbf{b}$ is a column vector of constants

$\mathbf{x} \cdot \mathbf{y}$ or $\langle x, y \rangle$: The inner product (dot product) of vectors $\mathbf{x}$ and $\mathbf{y}$, takes in two vectors and returns a scalar

$\mathbf{x} \times \mathbf{y}$: The cross product of vectors $\mathbf{x}$ and $\mathbf{y}$, takes in two vectors and returns a vector

$\mathbf{x} \otimes \mathbf{y}$: The outer product of vectors $\mathbf{x}$ and $\mathbf{y}$, takes in two vectors and returns a matrix

$||\mathbf{x}||$: The norm (length) of column vector $\mathbf{x}$

$\mathbf{I}$: The identity matrix

$\mathbf{0}$: The zero matrix

$\text{span}({\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n})$: The span of column vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$

$\text{null}(\mathbf{A})$: The null space of matrix $\mathbf{A}$


---

## Central Theory of Linear Algebra
* All of linear algebra is built on two operations: vector addition and scalar multiplication.
* All 2D vectors can be represented as linear combinations (sums of scalar multiples) of the two basis vectors *i* and *j*.

In [45]:
using LinearAlgebra

# Basis vectors.
i = [1, 0]
j = [0, 1]

# Vector addition.
[1, 2] + [3, 4]

# Vector scaling.
[4, 6] * 2

2-element Vector{Int64}:
  8
 12
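
As a small sketch of the basis-vector claim above, any 2D vector can be rebuilt by scaling *i* and *j* and adding the results. The coefficients `a` and `b` are illustrative values, not part of the original notes.

```julia
# Standard basis vectors of the 2D plane.
i = [1, 0]
j = [0, 1]

# Hypothetical coefficients chosen for illustration.
a, b = 3, -2

# Scale each basis vector, then add the results.
v = a * i + b * j
println("a*i + b*j = $v")
```

The coefficients are exactly the coordinates of the resulting vector.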

---

## Vector Basics
* The magnitude (length) of a vector is calculated as: $||v||_2 = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}$
* The magnitude is also called the L2 norm or Euclidean norm: $||v||_2 = (\sum^n_{i=1}|v_i|^2)^{\frac{1}{2}}$
* We can also calculate the L1 norm or Manhattan norm: $||v||_1 = \sum^n_{i=1}|v_i|$
    * This can be thought of in 2D as drawing a line along the x-axis and then a line along the y-axis to the point. The sum of the lengths of the lines is the L1 norm.
* The angle between two vectors is calculated as: $\theta = \cos^{-1}(\frac{v \cdot w}{||v|| \cdot ||w||})$

In [46]:
A = [2, 1, 4]
B = [1, 3, 0]

# Calculating vector length by hand.
v = sqrt(A[1]^2 + A[2]^2 + A[3]^2)
println("Vector length calculated by hand: $v")

# Calculating vector length using norm() in Julia.
v = norm(A)
println("Vector length calculated by norm(): $v")

# Calculating the angle between A and B.
Θ = acos(dot(A, B) / (norm(A) * norm(B))) # Outputted in radians, we could use acosd() to output in degrees.
Θ = Θ * 180 / pi # Converting to degrees.
println("Angle between A and B: $Θ")

Vector length calculated by hand: 4.58257569495584
Vector length calculated by norm(): 4.58257569495584
Angle between A and B: 69.81620322790722
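
The L1 norm from the bullets above is not computed in the cell, so here is a minimal sketch of it. It assumes the same vector `A` and uses the two-argument form `norm(x, p)` from `LinearAlgebra`.

```julia
using LinearAlgebra

A = [2, 1, 4]

# L1 (Manhattan) norm by hand: the sum of the absolute values of the entries.
l1_by_hand = sum(abs.(A))

# norm(x, p) computes the p-norm; p = 1 gives the L1 norm.
l1_builtin = norm(A, 1)

println("L1 norm calculated by hand: $l1_by_hand")
println("L1 norm calculated by norm(A, 1): $l1_builtin")
```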


---

## Matrix Basics
* Matrix transpose flips a matrix over its diagonal (switches the row and column indices of a matrix).
    * This is formally defined as: $(A^T)_{ij} = A_{ji}$
* A square matrix whose transpose is equal to itself is called a symmetric matrix: $A = A^T$
* A square matrix whose transpose is equal to its inverse is called an orthogonal matrix: $A^{-1} = A^T$
* The identity matrix $I$ is a square matrix with ones on the diagonal and zeros elsewhere.
* A matrix $A$ is invertible if there is a matrix $A^{-1}$ (its inverse) that, when multiplied by $A$, gives the identity matrix: $A^{-1}A = AA^{-1} = I$
    * Because the identity matrix $I$ is square, invertible matrices must be square.

In [47]:

A = [2 1 4; 3 3 1]

# Transpose of a matrix.
At = A'
println("Base Matrix: $A")
println("Transposed Base Matrix: $At")

# The multiplication of a matrix with its transpose yields a symmetric matrix.
# This is proven by showing that A*A' is its own transpose.
(A*A')' == (A')'*A' == A*A'

# Symmetric matrix example.
S = [1 2 3; 2 4 5; 3 5 6]
S == S'

# The identity matrix.
i = [1 0 0; 0 1 0; 0 0 1]
println("Identity Matrix: $i")

# Orthogonal matrix example.
# The inverse of an orthogonal matrix is its transpose.
# This is checked by showing that inv(A) is equal to A'.
A = [0 1; -1 0]
inv(A) == A'

# Invertible matrix example.
out = A*inv(A)
println("Invertible Matrix A * inv(A): $out")


Base Matrix: [2 1 4; 3 3 1]
Transposed Base Matrix: [2 3; 1 3; 4 1]
Identity Matrix: [1 0 0; 0 1 0; 0 0 1]
Invertible Matrix A * inv(A): [1.0 0.0; 0.0 1.0]


---

## Linear Transformations
* Linear transformation is the process of multiplying vectors by a matrix in order to transform them into new vectors.
* Linear transformation can be thought of as a way to move around space. Every time that you see a matrix you can interpret it as a transformation of space.
    * For example, rotating the standard 2D grid by 45 degrees. 
* The linear transformation of any vector can be described/calculated in terms of how the basis vectors *i* and *j* are transformed by the linear transformation.
* Matrices are the language that describes linear transformation.
    * Columns of the matrix represent how the basis vectors are transformed, they can be thought of as the coordinates of the transformed basis vectors.
    * You can transform vectors between dimensions with nonsquare matrices.
* Types of linear transformations: rotate, reflect, scale, shear, project.
* A linear transformation stretches or squeezes the grid; the degree of this stretch/squeeze is encoded in the determinant of its matrix.
    * This stretch/squeeze can be represented by how much area the transformed basis vectors enclose (before the transformation, *i* and *j* enclose a unit square of area 1).

In [48]:
vec = [2, 2]
println("Base vector: $vec")

# Reflection in the y-axis.
A = [-1 0; 0 1]
out = A * vec
println("Reflection in the y-axis: $out")

# Reflection in the x-axis.
A = [1 0; 0 -1]
out = A * vec
println("Reflection in the x-axis: $out")

# Horizontal expansion.
A = [2 0; 0 1]
out = A * vec
println("Horizontal expansion: $out")

# Horizontal shear.
A = [1 4; 0 1]
out = A * vec
println("Horizontal shear: $out")

# Transforming a vector from 2d to 3d space with a nonsquare matrix.
A = [1 2; 3 4; 5 6]
out = A * vec
println("Transformation from 2d to 3d w/ nonsquare matrix: $out")


Base vector: [2, 2]
Reflection in the y-axis: [-2, 2]
Reflection in the x-axis: [2, -2]
Horizontal expansion: [4, 2]
Horizontal shear: [10, 2]
Transformation from 2d to 3d w/ nonsquare matrix: [6, 14, 22]
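
The 45 degree rotation mentioned in the notes can be sketched the same way. This example is an illustration rather than part of the original cell; it uses the standard counterclockwise rotation matrix, and the names `R`, `i_new`, and `j_new` are chosen here.

```julia
using LinearAlgebra

# Counterclockwise rotation of the plane by θ degrees.
θ = 45
R = [cosd(θ) -sind(θ); sind(θ) cosd(θ)]

# The columns of R are where the basis vectors i and j land.
i_new = R * [1, 0]
j_new = R * [0, 1]

# Any vector is transformed into the same combination of the transformed basis vectors.
v = [2, 2]
println(R * v ≈ v[1] * i_new + v[2] * j_new)

# A rotation does not stretch or squeeze area, so its determinant is 1.
println(det(R) ≈ 1)
```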


---

## Span and Linear Independence
* **Vector Space** = Set of vectors that is closed under two operations, vector addition and scalar multiplication.
* **Span** = Given a set of vectors {v1, v2, ..., vn} in a vector space V, the span is the set of all linear combinations of the vectors. 
    * The span is itself a vector space that represents all possible linear combinations of the given vectors.
    * The dimension of the span equals the number of vectors in the set, unless some of the vectors are linearly dependent (for example, two of them fall on the same line).
* **Spanning Set** = Set of vectors that can be combined linearly to produce any vector in the vector space. A set of vectors that spans the entire vector space.
* **Linear Independence** = A set of vectors is linearly independent if no vector in the set can be expressed as a linear combination of other vectors in the set.
* **Basis** = A basis for a vector space V is a set of linearly independent vectors that can be used to represent any vector in V.
    * A vector space may have many different bases, but all bases have the same number of vectors, which is the dimension of the vector space.
    * For example, every basis of 3D space contains 3 vectors.
* **Dimension** = Number of vectors in the basis.
    * The dimension of a span equals the number of linearly independent vectors among the vectors that generate it.
* **Rank (Matrix Rank)** = Number of dimensions in the output of a transformation.
    * A linear transformation that transforms space to a line has rank 1.
        * Note that in a rank-1 matrix all rows are multiples of one another, and so are all columns; this is what transforms space to a line.
    * A linear transformation that transforms space to a plane has rank 2.
* **Column Space** = Span of the vectors that make up all columns in a matrix.
* **Row Space** = Span of the vectors that make up all rows in a matrix.
* **Null Space** = All possible solutions to the equation *Ax* = 0.
    * All vectors *x* that become null (land on the zero vector) when transformed by *A*.

In [49]:
# Any vector in the span of two vectors can be expressed as a linear combination
# of the two vectors. For example, [2, 3, 0] is in the span of [1, 0, 0] and [0, 1, 0].
[2, 3, 0] == 2 * [1, 0, 0] + 3 * [0, 1, 0]

# The basis for 3D space i, j, k that can be used to represent any vector in 3D space.
# Because there are 3 vectors, the dimension of the space is 3.
i = [1, 0, 0]
j = [0, 1, 0]
k = [0, 0, 1]

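# The columns of A are linearly independent, so the null space of A contains only the zero vector.
# Instead we calculate the null space of the transpose A' (2x3) and check that A'x = 0.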
A = [1 2; 3 4; 5 6]
x = nullspace(A')
out = A'x
println("Output of Ax round to 0 (e-16): $out")


Output of Ax round to 0 (e-16): [-4.440892098500626e-16; -8.881784197001252e-16;;]
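
Rank is defined in the notes but not computed, so the following is a minimal sketch using `rank()` and `nullspace()` from `LinearAlgebra`. The matrices are made up for illustration.

```julia
using LinearAlgebra

# The columns are multiples of each other, so the plane is squished onto a line.
A = [1 2; 2 4]
println("Rank of A: $(rank(A))")

# Independent columns: the output still fills the whole plane.
B = [1 2; 3 4]
println("Rank of B: $(rank(B))")

# The dimension of the null space is (number of columns) - rank.
N = nullspace(A)
println("Null space dimension of A: $(size(N, 2))")
```

Each column of `N` is a basis vector of the null space, which is why the count of columns gives its dimension.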


---

## Dot Product 
* Also referred to as the *inner product* $\langle x, y \rangle$.
* **Dot Product** = Length vector A * length vector B * cos(angle between the vectors).
    * $A \cdot B = ||A|| \times ||B|| \times \cos(\theta)$
    * Geometrically, the dot product is equivalent to projecting one vector onto another, taking the length of the projection, and multiplying it by the length of the other vector.
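    * When the angle between the vectors is less than 90 degrees (they point in broadly the same direction), the projection and thus the dot product are positive.
    * When the angle between the vectors is greater than 90 degrees (they point in broadly opposite directions), the dot product is negative.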
    * When the vectors are perpendicular (orthogonal), the projection onto the other vector has a length of 0, thus the dot product is 0.
    * The dot product of a vector with itself is equal to the square of its magnitude (length).


In [50]:
# Dot Product
A = [1, 4, 2]
B = [2, 4, 2]

# The dot product is calculated as the sum of the products of the corresponding
# entries of the two vectors. 
out = (1 * 2) + (4 * 4) + (2 * 2)
println("Dot product via sum of products: $out")

# Calculating the dot product as ||A|| * ||B|| * cos(theta).
Θ = acosd(dot(A, B) / (norm(A) * norm(B))) 
out = norm(A) * norm(B) * cosd(Θ)
println("Dot product via cos(Θ) approach: $out")

# Dot product via the dot() function.
out = dot(A, B)
println("Dot product via dot() function: $out")

# The dot product of a vector with itself is equal to the square of its magnitude
# (length). The length of the vector [2, 2] is sqrt(8), approximately 2.83.
C = [2, 2]
sqrt(dot(C, C)) == norm(C)

# The dot product of orthogonal vectors is 0.
A = [10, 0]
B = [0, 10]
out = dot(A, B)
println("Dot product of orthogonal vectors: $out")

Dot product via sum of products: 22
Dot product via cos(Θ) approach: 22.0
Dot product via dot() function: 22
Dot product of orthogonal vectors: 0
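
The projection interpretation can be checked with a case where the projection is easy to read off. In this sketch, with vectors chosen for illustration, `B` lies on the x-axis, so the projection of `A` onto `B` is simply the x-component of `A`.

```julia
using LinearAlgebra

A = [3, 4]
B = [2, 0]  # lies along the x-axis

# Because B lies on the x-axis, the projection of A onto B has length A[1].
proj_len = A[1]

# Projection length times the length of B gives the dot product.
out = proj_len * norm(B)
println("Projection length * ||B||: $out")
println("dot(A, B): $(dot(A, B))")
```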


---


## Cross Product
* **Cross Product** = Takes two vectors and produces a third vector that is orthogonal to both of the original vectors.
    * The magnitude of the cross product, $||v \times w||$, is the area of the parallelogram formed by the two vectors.
    * The cross product of a vector with itself is the zero vector.
    * In 2D, the signed area of this parallelogram is the determinant of the matrix whose columns are v and w, because the determinant measures how much area changes as a result of a transformation. In 3D, each component of $v \times w$ is a 2x2 determinant built from the components of v and w.


In [51]:

A = [1, 4, 2]
B = [2, 4, 2]

# Formula for calculating the cross product.
out = [A[2] * B[3] - A[3] * B[2], A[3] * B[1] - A[1] * B[3], A[1] * B[2] - A[2] * B[1]]
println("Cross product calculated by hand: $out")

# Cross product using the cross function.
out = cross(A, B)
println("Cross product calculated using cross(): $out")

# The cross product of A and B is orthogonal to both A and B.
out = dot(cross(A, B), A)
println("dot(cross(A, B), A): $out")

out = dot(cross(A, B), B)
println("dot(cross(A, B), B): $out")

# The cross product of a vector with itself is 0.
out = cross(A, A)
println("Cross product of a vector with itself: $out")

Cross product calculated by hand: [0, 2, -4]
Cross product calculated using cross(): [0, 2, -4]
dot(cross(A, B), A): 0
dot(cross(A, B), B): 0
Cross product of a vector with itself: [0, 0, 0]
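
The area interpretation can be checked numerically. This is a sketch that compares the length of the cross product with the parallelogram area $||A|| \times ||B|| \times \sin(\theta)$; the variable names are chosen for illustration.

```julia
using LinearAlgebra

A = [1, 4, 2]
B = [2, 4, 2]

# Area of the parallelogram as the length of the cross product.
area_cross = norm(cross(A, B))

# Area of the parallelogram as ||A|| * ||B|| * sin(θ).
θ = acos(dot(A, B) / (norm(A) * norm(B)))
area_trig = norm(A) * norm(B) * sin(θ)

println(area_cross ≈ area_trig)
```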


---

## Outer Product
* Also referred to as the *tensor product* $\mathbf{x} \otimes \mathbf{y}$
* **Outer Product** = Given an *n*-dimensional vector *u* and an *m*-dimensional vector *v*, returns an *n x m* matrix.

$$u \otimes v = \begin{bmatrix} u_1v_1 & u_1v_2 & \cdots & u_1v_m \\ u_2v_1 & u_2v_2 & \cdots & u_2v_m \\ \vdots & \vdots & \ddots & \vdots \\ u_nv_1 & u_nv_2 & \cdots & u_nv_m \end{bmatrix}$$

* The output matrix provides information about the pairwise relationship between the elements of the two input vectors.


In [1]:
# Outer product example in julia.
A = [1, 2, 3]
B = [4, 5, 6]
A * B'

3×3 Matrix{Int64}:
  4   5   6
  8  10  12
 12  15  18

---

## Determinants
* **Determinant** = Scalar value that measures how much area (volume in higher dimensions) changes as a result of a linear transformation with a square matrix.
    * The determinant can be thought of as the "area scaling factor" that a linear transformation using the matrix would induce.
    * Can only be computed for square matrices.
    * Negative determinants represent a flip of orientation.
    * When the determinant is 0, the transformed basis vectors are linearly dependent. In 2D, the transformed *i* is a scalar multiple of the transformed *j* (they fall on the same line), so the area of the grid is now 0.

In [52]:

# The determinant of a 2x2 matrix [a b; c d] is given by the formula ad - bc.
A = [1 2; 3 4]
out = 1 * 4 - 2 * 3
println("2x2 determinant calculated by hand: $out")

out = det(A)
println("2x2 determinant calculated via det(): $out")

# The determinant of a horizontal expansion by a factor of 2 is simply 2. 
A = [2 0; 0 1]
out = det(A)
println("Determinant of a horizontal expansion by 2: $out")

# The determinant of larger matrices can be manually calculated using Laplace expansion.
# Here det() is used to obtain the value that such an expansion would give.
A = [1 2 3; 4 1 6; 7 8 1]
out = det(A)
println("Determinant of A calculated via Laplace expansion: $out")

# The determinant of a matrix is also the product of the eigenvalues of the matrix.
vals = eigvals(A)
out = vals[1] * vals[2] * vals[3]
println("Determinant of A calculated using eigenvalues: $out")

# When the columns are linearly dependent, the determinant is 0.
A = [1 2 3; 2 4 6; 4 8 12]
out = det(A)
println("Determinant of a linearly dependent matrix: $out")


2x2 determinant calculated by hand: -2
2x2 determinant calculated via det(): -2.0
Determinant of a horizontal expansion by 2: 2.0
Determinant of A calculated via Laplace expansion: 104.0
Determinant of A calculated using eigenvalues: 103.99999999999994
Determinant of a linearly dependent matrix: 0.0
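
The cell above calls `det()` rather than writing out a Laplace expansion, so here is a sketch of the expansion along the first row of the same 3x3 matrix. The helper `det2` is a hypothetical name introduced only for this illustration.

```julia
using LinearAlgebra

A = [1 2 3; 4 1 6; 7 8 1]

# Determinant of the 2x2 matrix [a b; c d] (hypothetical helper for illustration).
det2(a, b, c, d) = a * d - b * c

# Expand along the first row: each entry times the determinant of its minor,
# with alternating signs (+, -, +).
laplace = A[1, 1] * det2(A[2, 2], A[2, 3], A[3, 2], A[3, 3]) -
          A[1, 2] * det2(A[2, 1], A[2, 3], A[3, 1], A[3, 3]) +
          A[1, 3] * det2(A[2, 1], A[2, 2], A[3, 1], A[3, 2])

println("Determinant via Laplace expansion by hand: $laplace")
println("Matches det(A): $(laplace ≈ det(A))")
```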


---

## Matrix Multiplication
* Matrix multiplication is simply a composition of linear transformations. 
    * Each matrix represents a linear transformation, and the multiplication of both matrices tells you what happens to the basis vectors when both linear transformations are applied.
    * For example, two matrices may represent a grid being rotated 90 degrees and then being sheared.
    * The order in which matrices are multiplied matters, a 90 degree rotation followed by a shear results in a different matrix than a shear followed by a 90 degree rotation.
    * Matrix multiplication is not commutative, $AB \neq BA$.
* The inner dimensions of the matrices must match, the outer dimensions of the resulting matrix will be the outer dimensions of the two input matrices.
    * $A_{m \times n} \times B_{n \times p} = C_{m \times p}$



In [53]:
vec = [2, 2]

# A * B applies B first and then A (the rightmost matrix acts on the vector first):
# stretch in the horizontal direction (B), then reflect over the y-axis (A).
A = [-1 0; 0 1]
B = [2 0; 0 1]
composition = A * B
out = composition * vec 
println("Result of composition of linear transformations: $out")

Result of composition of linear transformations: [-4, 2]
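
The two matrices in the cell above are both diagonal, so they happen to commute. The rotation and shear example from the notes shows the order dependence more clearly. This is a sketch with illustrative matrices `R` (90 degree counterclockwise rotation) and `S` (horizontal shear).

```julia
# 90 degree counterclockwise rotation and a horizontal shear.
R = [0 -1; 1 0]
S = [1 1; 0 1]

# S * R applies the rotation first, then the shear.
rotate_then_shear = S * R

# R * S applies the shear first, then the rotation.
shear_then_rotate = R * S

println("Rotate then shear: $rotate_then_shear")
println("Shear then rotate: $shear_then_rotate")
println("Same matrix? $(rotate_then_shear == shear_then_rotate)")
```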


---

## Systems of Linear Equations
* The geometric interpretation of solving the fundamental equation *Ax* = *b* is the following: We are looking for a vector *x* that the matrix *A* transforms into vector *b*.
    * *Ax* = *b* is solving for which original vector *x* lands on *b* when the matrix *A* transforms the space.
* *A* is a coefficient matrix, *x* is the vector of variables that we want to solve for, and *b* is the vector of constants on the right hand side of the equation. 
* If the matrix *A* is invertible (its determinant does not equal 0), then we can solve for *x* using the form: $x = A^{-1}b$.
    * If det(A) does not equal 0 then the columns of A are linearly independent and the transformation using A does not squish all space into a lower dimension, thus there is only one vector *x* that can be transformed into vector *b* via *A*.
    * If det(A) = 0 then the transformation from *A* squishes space into a lower dimension and it is likely that there is no solution. There will only be a solution if *b* falls upon the lower dimension line/plane/etc., and in that case there are infinitely many solutions.


In [54]:
# Take the system of equations:
# x + 2y + 2z = 5
# 3x - 2y + z = -6
# 2x + y - z = 1

# This can be represented as:
A = [1 2 2; 3 -2 1; 2 1 -1]
# x = [x, y, z]
b = [5, -6, 1]

println("A matrix: $A")
println("b output: $b")

# The vector x is solved for by multiplying the inverse of A by b.
coefs = inv(A) * b

# Check outputs.
out = (1 * coefs[1]) + (2 * coefs[2]) + (2 * coefs[3])
println("b[1] estimate: $out")

out = (3 * coefs[1]) + (-2 * coefs[2]) + (1 * coefs[3])
println("b[2] estimate: $out")

out = (2 * coefs[1]) + (1 * coefs[2]) + (-1 * coefs[3])
println("b[3] estimate: $out")


A matrix: [1 2 2; 3 -2 1; 2 1 -1]
b output: [5, -6, 1]
b[1] estimate: 5.000000000000001
b[2] estimate: -6.0
b[3] estimate: 1.0000000000000009
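
As an alternative sketch for the same system, Julia's backslash operator solves *Ax* = *b* directly without forming the inverse explicitly, which is generally the preferred approach numerically. The residual check below is an illustration, not part of the original cell.

```julia
using LinearAlgebra

A = [1 2 2; 3 -2 1; 2 1 -1]
b = [5, -6, 1]

# A \ b solves Ax = b without explicitly computing inv(A).
x = A \ b

# The residual Ax - b should be close to the zero vector.
println("Solution x: $x")
println("Residual norm: $(norm(A * x - b))")
```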


---

## Eigenvectors, Eigenvalues
* **Eigenvector** = Vector that does not get knocked off its span (the 1D line that houses all scalar multiples of the vector) when a matrix transformation *A* is applied to the vector.
    * $Av = \lambda v$ where *v* is the eigenvector and *lambda* is the eigenvalue.
    * $\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 0 \end{bmatrix}$
* **Eigenvalue** = The factor with which the vector is stretched or squished during the transformation. 
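* Eigenvectors and eigenvalues can be used to describe a large matrix compactly. If $M$ is a large matrix, we can look for a vector $o$ and a scalar $n$ such that multiplying $o$ by $M$ only rescales $o$ by $n$.
    * A single pair ($o$ and $n$) captures how $M$ acts along one direction; regenerating the full matrix requires a complete set of eigenvectors and eigenvalues (see the Spectral Theorem section for the symmetric case).
    * $M \times o = o \times n$ where $o$ is the eigenvector and $n$ is the eigenvalue.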

In [None]:
A = [2 0; 0 3]

# Find the eigenvectors and eigenvalues of A.
out = eigvecs(A)
println("eigenvectors of A: $out")

out = eigvals(A)
println("eigenvalues of A: $out")

# For each eigenvector/eigenvalue pair, A * eigvec = eigvec * eigval.
A * eigvecs(A)[:, 1] == eigvecs(A)[:, 1] * eigvals(A)[1]
A * eigvecs(A)[:, 2] == eigvecs(A)[:, 2] * eigvals(A)[2]
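
The diagonal matrix above has eigenvectors that sit on the axes. A non-diagonal matrix makes the idea less trivial, so here is a sketch with an illustrative symmetric matrix. It uses `≈` rather than `==` because the results are floating point.

```julia
using LinearAlgebra

A = [2 1; 1 2]

vals = eigvals(A)
vecs = eigvecs(A)

# Each column of vecs is an eigenvector: A only rescales it by its eigenvalue.
for k in 1:2
    v = vecs[:, k]
    println("A * v ≈ λ * v for pair $k: $(A * v ≈ vals[k] * v)")
end

# The eigenvalues sum to the trace and multiply to the determinant.
println(sum(vals) ≈ tr(A))
println(prod(vals) ≈ det(A))
```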

---

## Covariance Matrix

* **Covariance Matrix** = Matrix of the form $\frac{1}{n-1}X^TX$, where $X$ is the mean-centered data matrix with $n$ observations (rows). It holds the covariance between each pair of variables (columns) in a dataset.
* Recall, covariance measures the direction and degree to which two variables vary together: $\text{cov}(X, Y) = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
* The $(i, j)^{th}$ entry of $X^TX$ (row $i$, column $j$) tells us how similar the $i^{th}$ feature is to the $j^{th}$ feature.
* $X^TX$ is a symmetric matrix. After centering and scaling, the diagonal entries are the variance of each feature, and the off-diagonal entries are the covariance between each pair of features.
* **Fact**: All eigenvalues of a real symmetric matrix are real, and all eigenvalues of a matrix of the form $X^TX$ are $\geq 0$ (it is positive semidefinite).
* To calculate the covariance matrix:
    * 1) Center the data by subtracting the mean from each feature.
    * 2) Compute the matrix product $X^TX$ of the centered data.
    * 3) Divide the result by the number of observations (rows) minus 1.

In [2]:
using LinearAlgebra
using Statistics

# Calculate covariance matrix using built-in cov() function.
A = [2 2 3; 4 2 6; 2 8 14]
cov(A)

# Calculate covariance matrix by hand:

# Center the data by subtracting the mean from each column.
Adm = A .- mean(A, dims=1)

# Calculate the covariance matrix.
CovMat = (Adm' * Adm) / (size(A)[1] - 1)

# Show that the diagonal entries are the variance of each feature.
var(A[:, 1]) # Variance of the first column.
CovMat[1, 1] # Diagonal entry of the covariance matrix.

var(A[:, 2]) # Variance of the second column.
CovMat[2, 2] # Diagonal entry of the covariance matrix.

# Show that the off-diagonal entries are the covariance between features.
cov(A[:, 1], A[:, 2]) # Covariance between the first and second columns.
CovMat[1, 2] # Off-diagonal entry of the covariance matrix.


-2.0
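
The eigenvalue fact above can be checked on the same data. This is a sketch: wrapping the matrix in `Symmetric` selects the symmetric eigen-solver so the eigenvalues come back real, and the tolerance `1e-10` is an assumed allowance for floating point round-off.

```julia
using LinearAlgebra
using Statistics

A = [2 2 3; 4 2 6; 2 8 14]
C = cov(A)

# The covariance matrix equals its own transpose.
println(C ≈ C')

# Eigenvalues of a covariance matrix are nonnegative, up to round-off.
vals = eigvals(Symmetric(C))
println(all(vals .>= -1e-10))
```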

---

## Spectral Theorem
* **Spectral Theorem**: Every real symmetric matrix has an eigen decomposition: $A = Q \cdot D \cdot Q^T$.
    * Where $A$ is a symmetric matrix.
    * $Q$ is an orthogonal matrix of eigenvectors, each column is an eigenvector.
    * $D$ is a diagonal matrix where the diagonal entries are the eigenvalues of $A$.
    * $Q^T$ is the transpose of $Q$.
* The eigen decomposition of $A$ tells us that $A$ acts like a diagonal matrix once we change to the basis of its eigenvectors (the eigenvalues are on the diagonal of $D$).
    * The eigenvectors in $Q$ tell us how to rotate (or reflect) the basis vectors so that $A$ becomes diagonal.
    * The eigenvalues in $D$ tell us how much $A$ stretches space along each eigenvector.
    

In [3]:
# Spectral Theorem example in Julia.
A = [2 0; 0 3]

Q = eigvecs(A)
D = Diagonal(eigvals(A))

Q * D * Q' == A


true
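
The example above uses a matrix that is already diagonal. As a further sketch, the decomposition can be checked on an illustrative non-diagonal symmetric matrix, using `eigen()` from `LinearAlgebra` and `≈` to allow for floating point round-off.

```julia
using LinearAlgebra

A = [2 1; 1 2]

# eigen() returns the eigenvalues and eigenvectors together.
F = eigen(Symmetric(A))
Q = F.vectors
D = Diagonal(F.values)

# Q is orthogonal: its transpose is its inverse.
println(Q' * Q ≈ Matrix{Float64}(I, 2, 2))

# Q * D * Q' rebuilds the original matrix.
println(Q * D * Q' ≈ A)
```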

---