# Ch4 - Linear Algebra

This is the branch of math that deals with **vector spaces**. It underlies a significant number of data science and machine learning concepts and techniques.

### Vectors

These are objects that can be added together or multiplied by scalars, both of which form new vectors.

They are points in some finite-dimensional space, and they're a good way to represent numeric data, such as the 3D vector `(height, weight, age)` or the 4D vector `(exam1, exam2, exam3, exam4)`.

The simplest "from-scratch" way to represent vectors is as `list`s of numbers:

In [1]:
# 3d vector
height_weight_age = [70,140,40] # inches, lbs, years

The problem with this representation is that Python lists aren't vectors, so they provide no built-in way to perform vector arithmetic. We need to build those tools ourselves.
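To see why, here's a quick check of what Python's built-in operators actually do to lists: `+` concatenates and `*` repeats, and neither is vector arithmetic. The `Vector` type alias is an illustrative assumption, not something the rest of this chapter's code depends on.

```python
from typing import List

# Hypothetical type alias: a "vector" is just a list of numbers
Vector = List[float]

v: Vector = [1, 2]
w: Vector = [3, 5]

concatenated = v + w   # list concatenation, NOT vector addition
repeated = v * 2       # list repetition, NOT scalar multiplication

print(concatenated)    # [1, 2, 3, 5]
print(repeated)        # [1, 2, 1, 2]
```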

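Vectors add *component-wise* (element-wise): if `v` and `w` have the same length, the first element of their sum is `v[0] + w[0]`, the second is `v[1] + w[1]`, and so on. `zip` pairs up the corresponding elements: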

In [2]:
v = [1,2]
w = [3,5]
list(zip(v,w))

[(1, 3), (2, 5)]

In [3]:
def vector_add(v,w):
    """Adds corresponding elements"""
    return [v_i + w_i
            for v_i, w_i in zip(v,w)]

def vector_subtract(v,w):
    """Subtracts corresponding elements"""
    return [v_i - w_i
            for v_i, w_i in zip(v,w)]

print(vector_add(v,w)," ",vector_subtract(v,w))

[4, 7]   [-2, -3]
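One caveat: `zip` stops at the shorter input, so `vector_add([1, 2], [3])` silently returns `[4]` instead of failing. Here is a minimal sketch of a stricter variant, assuming you'd rather get an error for mismatched lengths (the name `vector_add_checked` is hypothetical):

```python
def vector_add_checked(v, w):
    """Adds corresponding elements, refusing vectors of different lengths"""
    if len(v) != len(w):
        raise ValueError(f"vectors must be the same length: {len(v)} != {len(w)}")
    return [v_i + w_i for v_i, w_i in zip(v, w)]

print(vector_add_checked([1, 2], [3, 5]))  # [4, 7]

try:
    vector_add_checked([1, 2], [3])
except ValueError as err:
    print(err)  # vectors must be the same length: 2 != 1
```

On Python 3.10+, passing `strict=True` to `zip` raises a `ValueError` in the same situation.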


It's also sometimes useful to sum a list of vectors component-wise, i.e. to create a new vector whose 1st element is the sum of all the 1st elements, whose 2nd element is the sum of all the 2nd elements, and so on. The easiest way to do this is to add one vector at a time:

In [4]:
def vector_sum(vects):
    """Sums all corresponding elements"""
    # init new vector with 1st vector in list
    result = vects[0]
    # for each vector in the list, add it to the totals
    for vec in vects[1:]:
        result = vector_add(result,vec)
    return result

In [5]:
v = [1,2]
w = [3,5]
z = [2,3]

vector_sum([v,w,z])

[6, 10]

In [6]:
# the above is the same as reducing a list via 'vector_add()'
# re-write the above more briefly via higher-order functions
from functools import reduce

def vector_sum2(vects):
    return reduce(vector_add, vects)

vector_sum2([v,w,z]) # probably more clever than helpful

[6, 10]
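Another option, sketched here as an alternative rather than the approach used in this chapter, is to regroup the vectors with `zip(*vects)` so that one tuple holds all the 1st elements, the next holds all the 2nd elements, and so on, then sum each tuple. `vector_sum3` is a hypothetical name.

```python
def vector_sum3(vects):
    """Sums corresponding elements by grouping them with zip(*vects)"""
    # zip(*vects) yields (v[0], w[0], z[0]), (v[1], w[1], z[1]), ...
    return [sum(components) for components in zip(*vects)]

print(vector_sum3([[1, 2], [3, 5], [2, 3]]))  # [6, 10]
```

Unlike `vector_sum`, this returns `[]` for an empty list instead of raising an `IndexError`.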

In [7]:
# multiply (each element of a) vector by a scalar
def vector_mult(v,s):
    """v is a vector, s is a scalar"""
    return [s*element for element in v]

vector_mult([1,2,3],2)

[2, 4, 6]

This lets us compute the component-wise mean of a list of same-sized vectors:

In [8]:
def vec_means(vecs):
    """Compute vector whose ith element = mean of the
    ith element of the input vectors"""
    # get # of vectors
    n = len(vecs)
    return vector_mult(vector_sum(vecs),1/n)

vec_means([v,w,z]) # [6/3, 10/3]

[2.0, 3.333333333333333]

The **dot product** of two vectors is the sum of their component-wise products. When `w` has length 1, it measures how far vector `v` extends in the direction of `w`. For example, if `w = [1,0]`, then `dot_prod(v,w)` is just the first component of `v`, since the second component is cancelled out by the `0` in `w`. Another way to say this: it's the length of the vector you get by **projecting** `v` onto `w`.

In [9]:
# perform dot product = sum of component-wise products
def dot_prod(v,w):
    # zip pairs up corresponding elements of v and w;
    # multiply each pair and sum the products
    return sum(v_i*w_i
               for v_i,w_i in zip(v,w))

dot_prod(v,w)

13
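To make the projection idea concrete: "how far `v` extends along `w`" equals the dot product exactly when `w` has length 1. For a general `w`, dividing by `w`'s magnitude gives that length (the *scalar projection*), and scaling `w` appropriately gives the projected vector itself. A minimal self-contained sketch; the helper names are hypothetical.

```python
from math import sqrt

def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def scalar_projection(v, w):
    """Signed length of v's 'shadow' along the direction of w"""
    return dot(v, w) / sqrt(dot(w, w))

def vector_projection(v, w):
    """The vector you get by projecting v onto w"""
    scale = dot(v, w) / dot(w, w)
    return [scale * w_i for w_i in w]

print(dot([1, 2], [1, 0]))                # 1: just the 1st component of v
print(scalar_projection([1, 2], [2, 0]))  # 1.0: w's length no longer matters
print(vector_projection([1, 2], [2, 0]))  # [1.0, 0.0]
```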

In [11]:
## use dot product to easily compute SUM OF SQUARES of a vector
def vector_SS(v):
    """v1*v1 + ... + vn*vn"""
    return dot_prod(v,v)

vector_SS(w) # 9 + 25 == 3**2 + 5**2

34

In [12]:
## use SS to compute magnitude (length)
from math import sqrt

def vector_mag(v):
    return sqrt(vector_SS(v))

vector_mag(w)

5.830951894845301

In [15]:
## distance between 2 vectors = sqrt[(v1 - w1)**2 + ... + (vn - wn)**2]
## start with the squared distance (everything under the square root)
def squared_distance(v,w):
    # sum of squares of the element-wise differences
    return vector_SS(vector_subtract(v,w))

print(v,w,vector_subtract(v,w),squared_distance(v,w)) # (-2)**2 + (-3)**2 = 4 + 9

[1, 2] [3, 5] [-2, -3] 13


In [16]:
def distance(v,w):
    return sqrt(squared_distance(v,w))

distance(v,w)

3.605551275463989

In [17]:
## equivalent to above:
def distance2(v,w):
    return vector_mag(vector_subtract(v,w))

print(v,w,vector_subtract(v,w),distance2(v,w))

[1, 2] [3, 5] [-2, -3] 3.605551275463989
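As a cross-check, Python 3.8+ provides both calculations in the standard library: `math.hypot` accepts any number of coordinates and returns the magnitude, and `math.dist` returns the Euclidean distance between two points of the same dimension. These should agree with `vector_mag` and `distance` up to floating-point rounding.

```python
import math

v = [1, 2]
w = [3, 5]

print(math.hypot(*w))   # magnitude of w, i.e. sqrt(3**2 + 5**2)
print(math.dist(v, w))  # distance between v and w, i.e. sqrt(13)
```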


### Matrices

A matrix is a 2D collection of numbers. We'll represent matrices as `list`s of `list`s, where each inner list is a *row* of the matrix and all rows have the same length.

For a matrix `A`, the element in the *i*th row and *j*th column is `A[i][j]`.

In [20]:
# 2x3
A = [[1,2,3],
     [4,5,6]]

# 3x2
B = [[1,2],
     [3,4],
     [5,6]]

print(A,B)

[[1, 2, 3], [4, 5, 6]] [[1, 2], [3, 4], [5, 6]]


In [21]:
# get # of rows
print(len(A))

# get # of cols (elements in 1st row)
print(len(A[0]))

2
3


In [22]:
def mtx_shape(A):
    num_rows = len(A)
    num_cols = len(A[0])
    return num_rows,num_cols

mtx_shape(B)

(3, 2)

We can think of an `n x k` matrix (*n* rows, *k* columns) as having rows that are vectors of length `k` and columns that are vectors of length `n`:

In [26]:
def get_row(A,i):
    return A[i]

def get_col(A,j):
    return [Ai[j] for Ai in A] # take the jth element of each row Ai

print(get_row(A,1),get_col(A,1))

[4, 5, 6] [2, 5]
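Since columns are just vectors of length `n`, all of a matrix's columns can be collected at once by *transposing* it, i.e. turning its columns into rows. A short sketch using `zip(*A)` (the name `transpose` is hypothetical here); column `j` of `A` is then row `j` of the transpose, matching what `get_col(A, j)` returns.

```python
def transpose(A):
    """Returns the k x n matrix whose rows are the columns of the n x k matrix A"""
    # zip(*A) groups the jth element of every row together, as a tuple
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3],
     [4, 5, 6]]

A_T = transpose(A)
print(A_T)     # [[1, 4], [2, 5], [3, 6]]
print(A_T[1])  # [2, 5], same as column 1 of A
```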


In [32]:
## create a matrix given its shape and a function that generates each element, via a nested list comprehension
def make_matrix(rows,cols,entry_f):
    """Returns a rows*cols matrix whose
    (i,j)th entry is entry_f(i,j)"""
    return [[entry_f(i,j)
              for j in range(cols)] # given i, create list = [entry_f(i,0), entry_f(i,1) ...]
             for i in range(rows)]  # create one list for each i

## make 5x5 identity matrix w/ above
def is_diagonal(i,j):
    """Entry (i,j) of an identity matrix: 1 on the diagonal, 0 elsewhere"""
    return 1 if i == j else 0

identity_5 = make_matrix(5,5,is_diagonal)
identity_5

[[1, 0, 0, 0, 0],
 [0, 1, 0, 0, 0],
 [0, 0, 1, 0, 0],
 [0, 0, 0, 1, 0],
 [0, 0, 0, 0, 1]]

We can use a matrix to represent a data set consisting of multiple vectors, simply by treating each vector as a row of the matrix.

If we had the heights, weights, and ages of 1,000 people, we could put them in a 1000x3 matrix (the sample below shows just 3 rows):

In [33]:
data = [[70, 170, 40],  # height, weight, age
        [65, 120, 26],
        [77, 250, 19]]
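With one row per person, each column holds a single measurement across everyone, so per-feature summaries become column operations. A small sketch computing the average of each column for the three sample rows above (the `column_means` name is an illustrative assumption):

```python
data = [[70, 170, 40],  # height, weight, age
        [65, 120, 26],
        [77, 250, 19]]

def column_means(matrix):
    """Mean of each column: one value per feature"""
    num_rows = len(matrix)
    # zip(*matrix) yields the columns: all heights, all weights, all ages
    return [sum(col) / num_rows for col in zip(*matrix)]

print(column_means(data))  # roughly [70.67, 180.0, 28.33]
```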

We can use an `n x k` matrix to represent a **linear function** that maps *k*-dimensional vectors to *n*-dimensional vectors.
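Concretely, an `n x k` matrix `A` sends a length-`k` vector `x` to the length-`n` vector whose *i*th element is the dot product of row *i* of `A` with `x`. A minimal sketch of that matrix-vector product (the names are hypothetical, not defined elsewhere in this chapter), using the 2x3 matrix `A` from above to map 3D vectors to 2D ones:

```python
def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def matrix_vector_mult(A, x):
    """Applies the linear function represented by A (n x k) to x (length k)"""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have the same length as x")
    # ith output element = dot product of row i with x
    return [dot(row, x) for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]

print(matrix_vector_mult(A, [1, 0, 0]))  # [1, 4]: picks out column 0
print(matrix_vector_mult(A, [1, 1, 1]))  # [6, 15]: the row sums
```

Because each output element is a dot product, the result satisfies linearity: applying `A` to `x + y` gives the same vector as applying it to `x` and `y` separately and adding the results.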

We can also use matrices to represent **binary relationships**, such as a matrix *A* where `A[i][j]` is 1 if nodes `i` and `j` in a network graph are connected and 0 otherwise.

In [35]:
## friendship network from ch1: a list of edges (pairs of connected users)
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

## the same network as a binary (adjacency) matrix: entry [i][j] is 1 if users i and j are friends
# user                 0  1  2  3  4  5  6  7  8  9
friendships_binary = [[0, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # user 0
                      [1, 0, 1, 1, 0, 0, 0, 0, 0, 0],  # user 1
                      [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],  # user 2
                      [0, 1, 1, 0, 1, 0, 0, 0, 0, 0],  # user 3
                      [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],  # user 4
                      [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # user 5
                      [0, 0, 0, 0, 0, 1, 0, 0, 1, 0],  # user 6
                      [0, 0, 0, 0, 0, 1, 0, 0, 1, 0],  # user 7
                      [0, 0, 0, 0, 0, 0, 1, 1, 0, 1],  # user 8
                      [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]]  # user 9
print(friendships_binary)

[[0, 1, 1, 0, 0, 0, 0, 0, 0, 0], [1, 0, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 1, 1, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 1, 1, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]]
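Typing the matrix out by hand is error-prone. Here is a sketch that builds it from the `friendships` edge list instead, assuming the relationship is symmetric (if 0 is friends with 1, then 1 is friends with 0) and that users are numbered 0 through 9; `build_adjacency` is a hypothetical helper.

```python
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

def build_adjacency(edges, num_nodes):
    """num_nodes x num_nodes matrix with 1 where an edge exists, 0 elsewhere"""
    # the comprehension creates independent rows; [[0] * n] * n would
    # create n references to the SAME row list
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1  # friendship goes both ways
    return matrix

adjacency = build_adjacency(friendships, 10)
print(adjacency[5])  # [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]
```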


If there are very few connections, this is a much less efficient representation, since we end up storing lots of 0's (a **sparse matrix**). However, with the matrix representation it's much quicker to check whether 2 nodes are connected: it takes a single **matrix lookup** instead of (potentially) inspecting every edge:

In [37]:
# are user 2 and user 3 connected?
friendships_binary[2][3]

1
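A common alternative for sparse graphs, sketched here rather than taken from this chapter, stores only the edges that exist, as a dictionary mapping each user to a set of friends. Memory then grows with the number of edges rather than with the square of the number of users, and checking a connection is still a set-membership test, which is fast on average.

```python
from collections import defaultdict

friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

# map each user to the set of users they're connected to
friends = defaultdict(set)
for i, j in friendships:
    friends[i].add(j)
    friends[j].add(i)  # friendship goes both ways

print(3 in friends[2])     # True: users 2 and 3 are connected
print(9 in friends[0])     # False
print(sorted(friends[5]))  # [4, 6, 7]
```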

Similarly, to find the connections a node *does* have, you only need to inspect the row (or, equivalently, the column, since the matrix is symmetric) corresponding to that node:

In [None]:
list(enumerate(friendships_binary[5]))

In [43]:
# indices of the users that user 5 is connected to
friends_of_five = [i
                   # enumerate pairs each user index i with a 1/0 (T/F) connection flag
                   for i, is_friend in enumerate(friendships_binary[5])
                   if is_friend]
friends_of_five

[4, 6, 7]
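Because each entry is 0 or 1, summing a node's row counts its connections (its *degree*) without building the list first. A small sketch over the same matrix; the `degree` helper is hypothetical.

```python
friendships_binary = [[0, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # user 0
                      [1, 0, 1, 1, 0, 0, 0, 0, 0, 0],  # user 1
                      [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],  # user 2
                      [0, 1, 1, 0, 1, 0, 0, 0, 0, 0],  # user 3
                      [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],  # user 4
                      [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # user 5
                      [0, 0, 0, 0, 0, 1, 0, 0, 1, 0],  # user 6
                      [0, 0, 0, 0, 0, 1, 0, 0, 1, 0],  # user 7
                      [0, 0, 0, 0, 0, 0, 1, 1, 0, 1],  # user 8
                      [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]]  # user 9

def degree(adjacency, node):
    """Number of connections a node has = sum of its row"""
    return sum(adjacency[node])

print(degree(friendships_binary, 5))                      # 3, matching len([4, 6, 7])
print([degree(friendships_binary, i) for i in range(10)])  # [2, 3, 3, 3, 2, 3, 2, 2, 3, 1]
```

Since every friendship appears twice in a symmetric matrix, the degrees sum to twice the number of edges (24 for the 12 friendships here).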