# Introduction to Matrices for Machine Learning

1. What is a Matrix?
2. Defining a Matrix in NumPy.
3. Matrix Arithmetic.
    * Matrix Addition.
    * Matrix Subtraction.
    * Matrix Multiplication (Hadamard Product).
    * Matrix Division.
4. Dot Product.
    * Matrix-Matrix Multiplication.
    * Matrix-Vector Multiplication.
5. Matrix-Scalar Operations.
    * Addition.
    * Subtraction.
    * Multiplication.
    * Division.
6. Matrix Types.
    * Square Matrix.
    * Symmetric Matrix.
    * Triangular Matrix.
    * Diagonal Matrix.
    * Identity Matrix.
    * Orthogonal Matrix.
7. Matrix Operations.
    * Transpose.
    * Inversion.
    * Trace.
    * Determinant.
    * Rank.

## 1. What is a Matrix?

A matrix is a two-dimensional array of scalars with one or more columns and one or more rows.

## 2. Defining a Matrix in NumPy.

In [1]:
from numpy import array

A = [[1, 2, 3], [4, 5, 6]]
A = array(A)
print(A)

[[1 2 3]
 [4 5 6]]


## 3. Matrix Arithmetic.

### Matrix Addition

In [2]:
A = array([[1, 2, 3], [4, 5, 6]])
B = array([[1, 2, 3], [4, 5, 6]])
C = A + B
print(C)

[[ 2  4  6]
 [ 8 10 12]]


### Matrix Subtraction

In [3]:
A = array([[1, 2, 3], [4, 5, 6]])
B = array([[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]])
C = A - B
print(C)

[[0.5 1.5 2.5]
 [3.5 4.5 5.5]]


### Matrix Multiplication (Hadamard Product)

Two matrices with the same size can be multiplied together, and this is often called element-wise matrix multiplication or the Hadamard product.

This is not the operation usually meant by the term matrix multiplication, so it is often written with a distinct operator, such as a small circle “∘”.

In [4]:
A = array([[1, 2, 3], [4, 5, 6]])
B = array([[1, 2, 3], [4, 5, 6]])
C = A * B
print(C)

[[ 1  4  9]
 [16 25 36]]


### Matrix Division

In [5]:
A = array([[1, 2, 3], [4, 5, 6]])
B = array([[1, 2, 3], [4, 5, 6]])
C = A / B
print(C)

[[1. 1. 1.]
 [1. 1. 1.]]


In [6]:
C = A // B
print(C)

[[1 1 1]
 [1 1 1]]


In [7]:
C = A % B
print(C)

[[0 0 0]
 [0 0 0]]


## 4. Dot Product.

### Matrix-Matrix Multiplication

In [8]:
A = array([[1, 2], [3, 4], [5, 6]])
B = array([[1, 2], [3, 4]])
C = A.dot(B)
print(C)

[[ 7 10]
 [15 22]
 [23 34]]


### Matrix-Vector Multiplication

In [9]:
A = array([[1, 2], [3, 4], [5, 6]])
v = array([0.5, 0.5])
C = A.dot(v)
print(C)

[1.5 3.5 5.5]


## 5. Matrix-Scalar Operations.

### Addition

In [10]:
A = array([[1, 2], [3, 4], [5, 6]])
s = 2
C = A + s
print(C)

[[3 4]
 [5 6]
 [7 8]]


### Subtraction

In [11]:
s = 0.5
C = A - s
print(C)

[[0.5 1.5]
 [2.5 3.5]
 [4.5 5.5]]


### Multiplication

In [12]:
s = 2
C = A * s
print(C)

[[ 2  4]
 [ 6  8]
 [10 12]]


### Division

In [13]:
C = A / s
print(C)

[[0.5 1. ]
 [1.5 2. ]
 [2.5 3. ]]


In [14]:
C = A // s
print(C)

[[0 1]
 [1 2]
 [2 3]]


In [15]:
C = A % s
print(C)

[[1 0]
 [1 0]
 [1 0]]


## 6. Matrix Types.

### Square Matrix

* A square matrix is a matrix where the number of rows (n) equals the number of columns (m).
* The square matrix is contrasted with the rectangular matrix where the number of rows and columns are not equal.
* The size of the matrix is called the order, so an order 4 square matrix is 4 x 4.
* The vector of values along the diagonal of the matrix from the top left to the bottom right is called the main diagonal.
* Square matrices are readily added and multiplied together and are the basis of many simple linear transformations, such as rotations (as in the rotations of images).
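
As a small illustration of the rotation mentioned above, the sketch below builds a 2 x 2 rotation matrix and applies it to a vector. The angle, the vector, and the names `theta`, `R`, and `rotated` are chosen only for this example.

```python
from numpy import array, cos, sin, pi

# 2 x 2 rotation matrix for a 90-degree counter-clockwise rotation
theta = pi / 2
R = array([[cos(theta), -sin(theta)],
           [sin(theta),  cos(theta)]])

# rotate the unit vector that points along the x-axis
v = array([1.0, 0.0])
rotated = R.dot(v)

# rounding hides the tiny floating-point error in cos(pi / 2)
print(rotated.round(6))
```

The rotated vector points along the y-axis, up to floating-point rounding.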

### Symmetric Matrix

* A symmetric matrix is a type of square matrix where the top-right triangle is the same as the bottom-left triangle.
* *It is no exaggeration to say that symmetric matrices S are the most important matrices the world will ever see – in the theory of linear algebra and also in the applications.*
* A symmetric matrix is always square and equal to its own transpose.
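
The transpose property gives a simple way to test for symmetry. The sketch below compares a matrix with its transpose; the matrices `S` and `M` are made-up examples.

```python
from numpy import array, array_equal

# a symmetric matrix: values mirror across the main diagonal
S = array([[1, 2, 3],
           [2, 4, 5],
           [3, 5, 6]])
is_symmetric = array_equal(S, S.T)
print(is_symmetric)

# a square matrix that is not symmetric
M = array([[1, 2], [3, 4]])
print(array_equal(M, M.T))
```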

### Triangular Matrix

* A triangular matrix is a type of square matrix that has all of its non-zero values in the upper-right or lower-left of the matrix, with the remaining elements filled with zero values.
* A triangular matrix with values only on and above the main diagonal is called an **upper triangular matrix**, whereas a triangular matrix with values only on and below the main diagonal is called a **lower triangular matrix**.

In [16]:
M = array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
print(M)

[[1 2 3]
 [1 2 3]
 [1 2 3]]


In [17]:
from numpy import tril

lower = tril(M)
print(lower)

[[1 0 0]
 [1 2 0]
 [1 2 3]]


In [18]:
from numpy import triu

upper = triu(M)
print(upper)

[[1 2 3]
 [0 2 3]
 [0 0 3]]


### Diagonal Matrix

* A diagonal matrix is one where values outside of the main diagonal have a zero value, where the main diagonal is taken from the top left of the matrix to the bottom right.
* A diagonal matrix is often denoted with the variable D and may be represented as a full matrix or as a vector of values on the main diagonal.
* A diagonal matrix **does not have to be square**.

In [19]:
from numpy import diag

M = array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
print(M)

[[1 2 3]
 [1 2 3]
 [1 2 3]]


In [20]:
# extract diagonal vector
d = diag(M)
print(d)

[1 2 3]


In [21]:
# create diagonal matrix from vector
D = diag(d)
print(D)

[[1 0 0]
 [0 2 0]
 [0 0 3]]


### Identity Matrix

* An identity matrix is a square matrix that leaves a vector unchanged when the vector is multiplied by it.
* The values of an identity matrix are known. All of the scalar values along the main diagonal (top-left to bottom-right) have the value one, while all other values are zero.

In [22]:
from numpy import identity

I = identity(3)
print(I)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


Alone, the identity matrix is not that interesting, although it is a component in other important matrix operations, such as **matrix inversion**.
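
The defining property listed above (multiplying by the identity matrix changes nothing) can be checked directly. This is a minimal sketch; the vector `v` and matrix `A` are arbitrary example values.

```python
from numpy import array, identity, array_equal

I = identity(3)

# multiplying a vector by the identity matrix returns the same vector
v = array([2.0, 4.0, 6.0])
Iv = I.dot(v)
print(Iv)

# the same holds for a matrix of matching size
A = array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(array_equal(I.dot(A), A))
```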

### Orthogonal Matrix

* Two vectors are orthogonal **when their dot product equals zero**. If, in addition, each vector has a length of 1, they are called orthonormal.
* **An Orthogonal matrix is a type of square matrix** whose columns and rows are orthonormal unit vectors, i.e. perpendicular and with a length or magnitude of 1.
* *An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal.*
* An Orthogonal matrix is often denoted as **uppercase “Q”**.
* Multiplication by an orthogonal matrix preserves lengths.

Q^T . Q = Q . Q^T = I

* Orthogonal matrices are used a lot for linear transformations, such as reflections and permutations.

In [23]:
from numpy.linalg import inv

Q = array([[1, 0], [0, -1]])
print(Q)
print(Q.T)

[[ 1  0]
 [ 0 -1]]
[[ 1  0]
 [ 0 -1]]


In [24]:
# inverse equivalence
V = inv(Q)
print(V)

[[ 1.  0.]
 [-0. -1.]]


In [25]:
# identity equivalence
I = Q.dot(Q.T)
print(I)

[[1 0]
 [0 1]]


* Orthogonal matrices are useful tools because their inverse is simply their transpose, which is computationally cheap and numerically stable to calculate.
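
The length-preserving property can be sketched with the same reflection matrix Q used above. The vector `v` is an arbitrary example, and `norm` from `numpy.linalg` is used here to measure vector length.

```python
from numpy import array, allclose
from numpy.linalg import norm, inv

# reflection across the x-axis (an orthogonal matrix)
Q = array([[1, 0], [0, -1]])
v = array([3.0, 4.0])

# length before and after multiplying by Q
length_before = norm(v)
length_after = norm(Q.dot(v))
print(length_before, length_after)

# the inverse of Q matches its transpose
print(allclose(inv(Q), Q.T))
```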

## 7. Matrix Operations.

### Transpose

* A defined matrix can be transposed, which creates a new matrix with the number of columns and rows flipped.
* This is denoted by the superscript “T” next to the matrix.

In [26]:
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
C = A.T
print(C)

[[1 2]
 [3 4]
 [5 6]]
[[1 3 5]
 [2 4 6]]


### Inversion

* Matrix inversion is a process that finds another matrix that, when multiplied with the original matrix, results in an **identity matrix**.
* The operation of inverting a matrix is indicated by a -1 superscript next to the matrix; for example, A^-1. The result of the operation is referred to as the inverse of the original matrix.
* A matrix is invertible if there exists another matrix that, when multiplied with it, results in the identity matrix. Not all matrices are invertible. A square matrix that is not invertible is referred to as **singular**.

In [27]:
from numpy.linalg import inv

A = array([[1.0, 2.0], [3.0, 4.0]])
B = inv(A)
print(B)

[[-2.   1. ]
 [ 1.5 -0.5]]


In [28]:
# Identity matrix
I = A.dot(B)
print(I)

[[1.00000000e+00 1.11022302e-16]
 [0.00000000e+00 1.00000000e+00]]


* Matrix inversion is used as an operation in solving systems of equations framed as matrix equations where we are interested in finding vectors of unknowns. A good example is in **finding the vector of coefficient values in linear regression**.
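
To make the linear regression example concrete, the sketch below solves for the coefficients with the normal equation b = (X^T . X)^-1 . X^T . y. The data are hypothetical, generated from y = 1 + 2x, and the first column of ones in `X` is an assumed intercept term.

```python
from numpy import array
from numpy.linalg import inv

# hypothetical data: a column of ones for the intercept, then x = 0, 1, 2, 3
X = array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
# targets generated from y = 1 + 2x
y = array([1.0, 3.0, 5.0, 7.0])

# normal equation: b = (X^T . X)^-1 . X^T . y
b = inv(X.T.dot(X)).dot(X.T).dot(y)
print(b)
```

The recovered coefficients are approximately the intercept 1 and slope 2. In practice a least-squares solver such as `numpy.linalg.lstsq` is usually preferred to forming the inverse explicitly.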

### Trace

* A trace of a square matrix is the sum of the values on the main diagonal of the matrix (top-left to bottom-right).
* The operation of calculating a trace on a square matrix is described using the notation **“tr(A)”** where A is the square matrix on which the operation is being performed.

In [29]:
from numpy import trace

A = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = trace(A)
print(B)

15


In [30]:
# Extracting main diagonal from matrix and summing all values
C = diag(A).sum()
print(C)

15


* Alone, the trace operation is not interesting, but it offers a simpler notation and it is used as an element in other key matrix operations.

### Determinant

* The determinant of a square matrix is a scalar representation of the volume of the matrix.
* *The determinant describes the relative geometry of the vectors that make up the rows of the matrix. More specifically, the determinant of a matrix A tells you the volume of a box with sides given by rows of A.*
* It is denoted by the **“det(A)”** notation or |A|, where A is the matrix on which we are calculating the determinant.
* The determinant of a square matrix is calculated from the elements of the matrix. More technically, **the determinant is the product of all the eigenvalues of the matrix**.
* The intuition for the determinant is that it describes the way a matrix will scale another matrix when they are multiplied together. For example, **a determinant of 1 preserves the space of the other matrix. A determinant of 0 indicates that the matrix cannot be inverted**.

In [31]:
from numpy.linalg import det

A = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = det(A)
print(B)

-9.51619735392994e-16


* Like the trace operation, alone, the determinant operation is not interesting, but it offers a simpler notation and it is used as an element in other key matrix operations.
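
Two of the claims above can be checked numerically: that the determinant equals the product of the eigenvalues, and that a determinant of 0 marks a matrix that cannot be inverted. This is a minimal sketch; the 2 x 2 matrix `A` is a made-up example, and `S` is the 3 x 3 matrix used above, whose printed determinant is zero up to floating-point rounding.

```python
from numpy import array, prod, allclose
from numpy.linalg import det, eigvals

# triangular example matrix, so its eigenvalues are the diagonal values 2 and 3
A = array([[2.0, 0.0], [1.0, 3.0]])
d = det(A)
eig_product = prod(eigvals(A))
print(d, eig_product)

# the 3 x 3 matrix from the cell above: determinant is zero up to rounding error
S = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
is_singular = allclose(det(S), 0.0)
print(is_singular)
```

Comparing against zero with a tolerance, as `allclose` does, is safer than testing `det(S) == 0` because of rounding error.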

### Rank

* The rank of a matrix is the number of linearly independent rows or columns in the matrix.
* The rank of a matrix A is often denoted with the function notation **rank(A)**.
* An intuition for rank is to consider it the number of dimensions spanned by all of the vectors within a matrix. For example, **a rank of 0 suggests all vectors span a point, a rank of 1 suggests all vectors span a line, a rank of 2 suggests all vectors span a two-dimensional plane**.
* The rank is estimated numerically, often using a matrix decomposition method. A common approach is to use the Singular-Value Decomposition or SVD for short.
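
The SVD-based approach can be sketched by counting the singular values that are meaningfully greater than zero. The tolerance `tol` below is an assumed value chosen for this example; `matrix_rank` picks its own default tolerance.

```python
from numpy import array
from numpy.linalg import svd, matrix_rank

# the second row is twice the first, so only one row is independent
M = array([[1.0, 2.0], [2.0, 4.0]])

# singular values only (no U or V matrices)
s = svd(M, compute_uv=False)
print(s)

# count singular values above an assumed tolerance
tol = 1e-10
rank_from_svd = int((s > tol).sum())
print(rank_from_svd, matrix_rank(M))
```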

### a. Vector rank

In [32]:
from numpy.linalg import matrix_rank

v1 = array([1,2,3])
v1_rank = matrix_rank(v1)
print(v1_rank)

1


In [33]:
# zero rank
v2 = array([0,0,0,0,0])
v2_rank = matrix_rank(v2)
print(v2_rank)

0


### b. Matrix rank

In [34]:
M0 = array([[0,0],[0,0]])
mr0 = matrix_rank(M0)
print(mr0)

0


In [35]:
# rank 1
M1 = array([[1,2],[2,4]])
mr1 = matrix_rank(M1)
print(mr1)

1


In [36]:
# rank 2
M2 = array([[1,2],[3,4]])
mr2 = matrix_rank(M2)
print(mr2)

2
