# Linear Algebra

This notebook translates the linear algebra revision lecture into Python, using NumPy.

In [1]:
import numpy as np

In [2]:
np.random.seed(42)

Note that Python uses 0-based indexing, and so does the code in this notebook. The formulas keep the usual 1-based mathematical convention.

## Basic Notation

$A \in \mathbb{R}^{m \times n}$ is a matrix with $m$ rows and $n$ columns

In [3]:
m = 5
n = 3
A = np.random.randn(m, n)
A

array([[ 0.49671415, -0.1382643 ,  0.64768854],
       [ 1.52302986, -0.23415337, -0.23413696],
       [ 1.57921282,  0.76743473, -0.46947439],
       [ 0.54256004, -0.46341769, -0.46572975],
       [ 0.24196227, -1.91328024, -1.72491783]])

$x \in \mathbb{R}^{n}$ is a vector with $n$ entries

In [4]:
x = np.random.randn(n)
x

array([-0.56228753, -1.01283112,  0.31424733])

$a_{ij}$ is the entry in the $i$th row and $j$th column of $A$

In [5]:
i = 2
j = 1
a_ij = A[i, j]  # tuple indexing is the idiomatic NumPy form of A[i][j]
a_ij

0.76743472915290878

## Matrix Multiplication

The product of two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ is the matrix

$C = AB \in \mathbb{R}^{m \times p}$ where

$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$

In [6]:
p = 6
B = np.random.randn(n, p)
B

array([[-0.90802408, -1.4123037 ,  1.46564877, -0.2257763 ,  0.0675282 ,
        -1.42474819],
       [-0.54438272,  0.11092259, -1.15099358,  0.37569802, -0.60063869,
        -0.29169375],
       [-0.60170661,  1.85227818, -0.01349722, -1.05771093,  0.82254491,
        -1.22084365]])

In [7]:
C = np.matmul(A, B)
C

array([[-0.76547819,  0.48285148,  0.87840781, -0.84915915,  0.64934202,
        -1.45808819],
       [-1.11459697, -2.61064038,  2.50489606, -0.18418579,  0.05090089,
        -1.81578833],
       [-1.56925562, -3.01479942,  1.43759548,  0.42834307, -0.74047335,
        -1.90068169],
       [ 0.03985168, -1.68032411,  1.33487931,  0.19600514, -0.06809894,
        -0.06925218],
       [ 1.85974361, -3.74897788,  2.58008658,  1.0510195 , -0.25329297,
         2.31921156]])

In [11]:
# Sanity check: build C entry by entry from the summation formula
C_manual = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        for k in range(n):
            C_manual[i][j] += A[i][k] * B[k][j]

C_manual

array([[-0.76547819,  0.48285148,  0.87840781, -0.84915915,  0.64934202,
        -1.45808819],
       [-1.11459697, -2.61064038,  2.50489606, -0.18418579,  0.05090089,
        -1.81578833],
       [-1.56925562, -3.01479942,  1.43759548,  0.42834307, -0.74047335,
        -1.90068169],
       [ 0.03985168, -1.68032411,  1.33487931,  0.19600514, -0.06809894,
        -0.06925218],
       [ 1.85974361, -3.74897788,  2.58008658,  1.0510195 , -0.25329297,
         2.31921156]])

In [9]:
print(A.shape)
print(B.shape)
print(C.shape)

(5, 3)
(3, 6)
(5, 6)
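
As a side note, the shape rule above (inner dimensions must agree, outer dimensions give the result shape) can be checked directly. The sketch below is self-contained and uses illustrative names (`rng`, `A_demo`, `B_demo`) that are not part of the original lecture. It also uses the `@` operator, which for 2-D arrays computes the same product as `np.matmul`.

```python
import numpy as np

# Local generator so the notebook's global seed is left untouched (illustrative choice)
rng = np.random.RandomState(0)
A_demo = rng.randn(5, 3)  # m x n
B_demo = rng.randn(3, 6)  # n x p

# For 2-D arrays the @ operator and np.matmul compute the same product
C_matmul = np.matmul(A_demo, B_demo)
C_operator = A_demo @ B_demo
same_result = np.allclose(C_matmul, C_operator)
print(same_result, C_operator.shape)

# If the inner dimensions disagree, the product is undefined and NumPy raises ValueError
try:
    A_demo @ A_demo  # (5, 3) times (5, 3): inner dimensions 3 and 5 differ
    shape_error_raised = False
except ValueError:
    shape_error_raised = True
print(shape_error_raised)
```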


### Vector-Vector Products

Given two vectors $x, y \in \mathbb{R}^{n}$, the quantity $x^{T}y$, called the inner product or dot product, is a real number given by:

$x^{T}y = \sum_{i=1}^{n} x_{i} y_{i} \in \mathbb{R}$

In [13]:
x = np.random.randn(n)
y = np.random.randn(n)
print(x)
print(y)

[ 0.2088636  -1.95967012 -1.32818605]
[ 0.19686124  0.73846658  0.17136828]


In [14]:
np.dot(x, y)

-1.6336427091601982

In [17]:
# Sanity check
dot_prod = 0
for i in range(n):
    dot_prod += x[i] * y[i]
    
dot_prod

-1.6336427091601982

Given $x \in \mathbb{R}^{m}$ and $y \in \mathbb{R}^{n}$ (not necessarily of the same size),

$xy^T \in \mathbb{R}^{m \times n}$ is called the outer product of the vectors. Its entries are $(xy^T)_{ij} = x_{i} y_{j}$.

In [18]:
x = np.random.randn(m)
y = np.random.randn(n)
print(x)
print(y)

[-0.11564828 -0.3011037  -1.47852199 -0.71984421 -0.46063877]
[ 1.05712223  0.34361829 -1.76304016]


In [19]:
np.outer(x, y)

array([[-0.12225437, -0.03973886,  0.20389257],
       [-0.31830341, -0.10346474,  0.53085791],
       [-1.56297846, -0.5080472 ,  2.60669364],
       [-0.76096331, -0.24735164,  1.26911425],
       [-0.48695148, -0.15828391,  0.81212465]])
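
A possible sanity check for the outer product, in the same spirit as the dot product check: entry $(i, j)$ of $xy^T$ is $x_i y_j$, so the result can be rebuilt either element by element or by broadcasting a column against a row. The names `x_demo` and `y_demo` are illustrative and not part of the original notebook.

```python
import numpy as np

rng = np.random.RandomState(1)  # local generator, illustrative choice
x_demo = rng.randn(5)
y_demo = rng.randn(3)

outer_np = np.outer(x_demo, y_demo)

# Element by element: entry (i, j) is x_i * y_j
outer_manual = np.array([[xi * yj for yj in y_demo] for xi in x_demo])

# Broadcasting: a (5, 1) column times a (1, 3) row gives a (5, 3) array
outer_broadcast = x_demo[:, np.newaxis] * y_demo[np.newaxis, :]

print(np.allclose(outer_np, outer_manual), np.allclose(outer_np, outer_broadcast))
```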

### Matrix-Vector Products

Given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $x \in \mathbb{R}^{n}$, their product is a vector

$y = Ax \in \mathbb{R}^{m}$

In [22]:
A.shape # m x n

(5, 3)

In [25]:
x = np.random.randn(A.shape[1])
x

array([ 0.61167629,  1.03099952,  0.93128012])

In [31]:
x.shape # n

(3,)

In [29]:
y = np.matmul(A, x)
y

array([ 0.7644573 ,  0.47214214,  1.31997971, -0.57963717, -3.43097012])

In [30]:
y.shape # should be m

(5,)
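
There are two common ways to read $y = Ax$, sketched below under the assumption of small random inputs (`A_demo` and `x_demo` are illustrative names): each $y_i$ is the dot product of row $i$ of $A$ with $x$, and $y$ as a whole is a linear combination of the columns of $A$ weighted by the entries of $x$.

```python
import numpy as np

rng = np.random.RandomState(2)  # local generator, illustrative choice
A_demo = rng.randn(5, 3)
x_demo = rng.randn(3)

y_demo = A_demo @ x_demo

# Row view: y_i is the dot product of the i-th row of A with x
y_rows = np.array([np.dot(row, x_demo) for row in A_demo])

# Column view: y is a linear combination of the columns of A, weighted by x
y_cols = sum(x_demo[k] * A_demo[:, k] for k in range(A_demo.shape[1]))

print(np.allclose(y_demo, y_rows), np.allclose(y_demo, y_cols))
```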

### Matrix-Matrix Products

This was already covered above. Some basic properties:

  * Matrix multiplication is associative, i.e. $ABC = (AB)C = A(BC)$
  * Matrix multiplication is distributive, i.e. $A(B + C) = AB + AC$
  * Matrix multiplication is **not** commutative, i.e. $AB \ne BA$ (in general)  
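
The three properties above can be spot-checked numerically. This is only a check on one hand-picked example, not a proof; the matrices `P`, `Q` and `R` are illustrative and were chosen so that `P @ Q` and `Q @ P` visibly differ.

```python
import numpy as np

# Small hand-picked matrices (illustrative, not from the lecture)
P = np.array([[1.0, 2.0], [3.0, 4.0]])
Q = np.array([[0.0, 1.0], [1.0, 0.0]])
R = np.array([[2.0, 0.0], [0.0, 3.0]])

associative = np.allclose((P @ Q) @ R, P @ (Q @ R))
distributive = np.allclose(P @ (Q + R), P @ Q + P @ R)
commutative = np.allclose(P @ Q, Q @ P)

print(associative, distributive, commutative)
# P @ Q swaps the columns of P, while Q @ P swaps the rows of P
print(P @ Q)
print(Q @ P)
```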

## Operations and Properties

### The Identity Matrix and Diagonal Matrices

The identity matrix $I_{n} \in \mathbb{R}^{n \times n}$ is a square matrix with ones on the diagonal and zeros everywhere else

In [36]:
I = np.identity(n)
I

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

It has the property that for all $A \in \mathbb{R}^{m \times n}$:
    
$AI_{n} = A = I_{m}A$

In [38]:
A

array([[ 0.49671415, -0.1382643 ,  0.64768854],
       [ 1.52302986, -0.23415337, -0.23413696],
       [ 1.57921282,  0.76743473, -0.46947439],
       [ 0.54256004, -0.46341769, -0.46572975],
       [ 0.24196227, -1.91328024, -1.72491783]])

In [39]:
np.matmul(A, np.identity(A.shape[1]))

array([[ 0.49671415, -0.1382643 ,  0.64768854],
       [ 1.52302986, -0.23415337, -0.23413696],
       [ 1.57921282,  0.76743473, -0.46947439],
       [ 0.54256004, -0.46341769, -0.46572975],
       [ 0.24196227, -1.91328024, -1.72491783]])

In [40]:
np.matmul(np.identity(A.shape[0]), A)

array([[ 0.49671415, -0.1382643 ,  0.64768854],
       [ 1.52302986, -0.23415337, -0.23413696],
       [ 1.57921282,  0.76743473, -0.46947439],
       [ 0.54256004, -0.46341769, -0.46572975],
       [ 0.24196227, -1.91328024, -1.72491783]])

A diagonal matrix is a matrix where all non-diagonal elements are 0

In [43]:
D = np.identity(n) * np.random.randn(n)
D

array([[ 0.97554513, -0.        , -0.        ],
       [ 0.        , -0.47917424, -0.        ],
       [ 0.        , -0.        , -0.18565898]])
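
The `-0.` entries in the output above come from multiplying the zeros of the identity by negative numbers. An alternative sketch is to use `np.diag`, which builds a diagonal matrix from a 1-D array and extracts the diagonal from a 2-D array. The values in `d` below are arbitrary illustrative numbers.

```python
import numpy as np

d = np.array([2.0, -1.0, 0.5])  # arbitrary diagonal entries (illustrative)

# 1-D input: np.diag builds the diagonal matrix diag(d)
D_demo = np.diag(d)
print(D_demo)

# 2-D input: np.diag extracts the diagonal again
d_back = np.diag(D_demo)

# Multiplying by a diagonal matrix scales each entry of a vector
x_demo = np.array([1.0, 2.0, 3.0])
scaled = D_demo @ x_demo
print(np.allclose(scaled, d * x_demo))
```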

### The Transpose

The transpose of a matrix results from "flipping" the rows and columns

In [45]:
A.T

array([[ 0.49671415,  1.52302986,  1.57921282,  0.54256004,  0.24196227],
       [-0.1382643 , -0.23415337,  0.76743473, -0.46341769, -1.91328024],
       [ 0.64768854, -0.23413696, -0.46947439, -0.46572975, -1.72491783]])

A square matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T$, and anti-symmetric if $A = -A^T$

For any matrix $A \in \mathbb{R}^{n \times n}$,

$A + A^T$ is symmetric and $A - A^T$ is anti-symmetric.

It follows that any square matrix can be represented as the sum of a symmetric and an anti-symmetric matrix:

$A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$

In [46]:
A = np.random.randn(n, n)
A

array([[-1.10633497, -1.19620662,  0.81252582],
       [ 1.35624003, -0.07201012,  1.0035329 ],
       [ 0.36163603, -0.64511975,  0.36139561]])

In [47]:
0.5*(A + A.T) + 0.5*(A - A.T)

array([[-1.10633497, -1.19620662,  0.81252582],
       [ 1.35624003, -0.07201012,  1.0035329 ],
       [ 0.36163603, -0.64511975,  0.36139561]])
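
The cell above only confirms that the two halves add back up to $A$. A slightly fuller check, sketched here with an illustrative random matrix `M`, also verifies that the first half is symmetric and the second is anti-symmetric (which forces its diagonal to be zero).

```python
import numpy as np

rng = np.random.RandomState(4)  # local generator, illustrative choice
M = rng.randn(3, 3)

S = 0.5 * (M + M.T)  # symmetric part
K = 0.5 * (M - M.T)  # anti-symmetric part

is_symmetric = np.allclose(S, S.T)
is_antisymmetric = np.allclose(K, -K.T)
reconstructs = np.allclose(S + K, M)
zero_diagonal = np.allclose(np.diag(K), 0.0)

print(is_symmetric, is_antisymmetric, reconstructs, zero_diagonal)
```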

### The Trace

The trace of a square matrix $A \in \mathbb{R}^{n \times n}$, denoted $\mathrm{tr}(A)$, is the sum of its diagonal elements:

In [49]:
A.trace()

-0.81694949007794815

In [50]:
tr = 0
for i in range(n):
    tr += A[i][i]
    
tr

-0.81694949007794815

### Norms

The norm of a vector $||x||$ is informally a measure of the length of the vector.

For example, we have the Euclidean or $l_{2}$ norm:

$||x||_{2} = \sqrt{\sum_{i=1}^{n} x_{i}^2}$

In [52]:
x

array([ 0.61167629,  1.03099952,  0.93128012])

In [57]:
np.linalg.norm(x, ord=2)

1.5180219227593836

In [55]:
np.sqrt(np.dot(x, x))  # x is 1-D, so no transpose is needed

1.5180219227593836

Other examples are the $l_1$ norm:

$||x||_{1} = \sum_{i=1}^{n} |x_{i}|$

In [59]:
np.linalg.norm(x, ord=1)

2.5739559304530175

and the infinity norm:

$||x||_{\infty} = \max_{i} |x_{i}|$

In [60]:
np.linalg.norm(x, ord=np.inf)

1.0309995224959509

All of the above belong to the family of $l_p$ norms, which are parameterized by a real number $p \geq 1$ (the infinity norm is the limiting case $p \to \infty$), and defined as

$$||x||_{p} = \left( \sum_{i=1}^{n} |x_{i}|^p \right)^\frac{1}{p}$$
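
The general $l_p$ definition can be written out directly and compared against `np.linalg.norm`. The helper `lp_norm` below is a hypothetical function written for this illustration, not a NumPy API, and `x_demo` is an arbitrary example vector.

```python
import numpy as np

def lp_norm(x, p):
    """l_p norm from the definition: (sum_i |x_i|^p)^(1/p), for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x_demo = np.array([3.0, -4.0, 12.0])  # arbitrary example vector

# The definition agrees with NumPy for several values of p
for p in (1, 2, 3):
    print(p, lp_norm(x_demo, p), np.linalg.norm(x_demo, ord=p))

# For large p the value approaches the infinity norm, max_i |x_i|
print(lp_norm(x_demo, 50), np.linalg.norm(x_demo, ord=np.inf))
```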

Another important norm is the Frobenius norm, defined for matrices $A \in \mathbb{R}^{m \times n}$:

$||A||_{F} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2} = \sqrt{\mathrm{tr}(A^T A)}$
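
As with the vector norms, the Frobenius norm can be sanity-checked against NumPy. The sketch below uses an illustrative random matrix `M` and compares `np.linalg.norm` with the trace formula and with the square root of the sum of squared entries.

```python
import numpy as np

rng = np.random.RandomState(5)  # local generator, illustrative choice
M = rng.randn(4, 3)

fro_numpy = np.linalg.norm(M, ord='fro')
fro_trace = np.sqrt(np.trace(M.T @ M))   # sqrt(tr(M^T M))
fro_entries = np.sqrt(np.sum(M ** 2))    # sqrt of the sum of squared entries

print(fro_numpy, fro_trace, fro_entries)
```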

### Linear Independence and Rank