# Data Science - Linear Algebra

Linear algebra is the branch of mathematics that deals with vector spaces.

Vectors are objects that can be added together to form new vectors and multiplied by scalars to form new vectors. **Concretely, vectors are points in some finite-dimensional space.**

For example, if you have the heights, weights, and ages of a large number of people, you can treat your data as three-dimensional vectors (height, weight, age). If you’re teaching a class with four exams, you can treat student grades as four-dimensional vectors (exam1, exam2, exam3, exam4).

In [None]:
# vector in three-dimensional space

height_weight_age = [70, 170, 40] # [inches, pounds, years]

### Vectors add component-wise

(If two vectors are not the same length, then we’re not allowed to add them.)

In [1]:
def vector_add(v, w):
    return [vi + wi for vi, wi in zip(v,w)]

def vector_subtract(v, w):
    return [vi - wi for vi, wi in zip(v,w)]
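
Note that `zip` silently stops at the end of the shorter input, so the functions above do not themselves reject vectors of different lengths. A minimal sketch of a stricter variant follows; `checked_vector_add` is a hypothetical name used only for illustration.

```python
def checked_vector_add(v, w):
    """Add two vectors component-wise, refusing mismatched lengths."""
    if len(v) != len(w):
        raise ValueError("vectors must be the same length")
    return [vi + wi for vi, wi in zip(v, w)]

print(checked_vector_add([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```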

In [None]:
# create a new vector whose first element is the sum of all the first elements, 
# whose second element is the sum of all the second elements, and so on.

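from functools import partial, reduce

def vector_sum(vectors):
    result = vectors[0]  # start with the first vector
    for vector in vectors[1:]:  # then add each of the remaining vectors
        result = vector_add(result, vector)
    return result

# equivalent version using reduce
def vector_sum(vectors):
    return reduce(vector_add, vectors)

# equivalent version using partial to pre-fill reduce's first argument
vector_sum = partial(reduce, vector_add)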

**`reduce(fun, seq)` applies the two-argument function `fun` cumulatively to the elements of `seq`, from left to right, reducing the sequence to a single value.** [source](https://www.geeksforgeeks.org/reduce-in-python/)
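
As a small illustration of how `reduce` folds a list of vectors into one, the sketch below redefines `vector_add` so it runs on its own; the sample `vectors` are made up for the example.

```python
from functools import reduce

def vector_add(v, w):
    return [vi + wi for vi, wi in zip(v, w)]

vectors = [[1, 2], [3, 4], [5, 6]]

# reduce computes vector_add(vector_add([1, 2], [3, 4]), [5, 6])
total = reduce(vector_add, vectors)
print(total)  # [9, 12]
```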

In [2]:
def scalar_multiply(c, v):
    return [c * vi for vi in v]

def vector_mean(vectors):
    n = len(vectors)
    return scalar_multiply(1/n, vector_sum(vectors))
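
A self-contained sketch of the component-wise mean, using two made-up (height, weight, age) vectors. The helper functions are repeated here so the block runs on its own.

```python
from functools import reduce

def vector_add(v, w):
    return [vi + wi for vi, wi in zip(v, w)]

def scalar_multiply(c, v):
    return [c * vi for vi in v]

def vector_mean(vectors):
    n = len(vectors)
    return scalar_multiply(1 / n, reduce(vector_add, vectors))

people = [[70, 170, 40], [66, 150, 30]]  # illustrative data only
print(vector_mean(people))  # [68.0, 160.0, 35.0]
```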

**The dot product of two vectors is the sum of their component-wise products.**

In [3]:
def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def sum_of_squares(v):
    # v_1 * v_1 + ... + v_n * v_n, i.e. the dot product of v with itself
    return dot(v, v)

import math

def magnitude(v):
    return math.sqrt(sum_of_squares(v))

def squared_distance(v, w):
    return sum_of_squares(vector_subtract(v, w))

def distance(v, w):
    return math.sqrt(squared_distance(v, w))
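
A quick worked check of these definitions, assuming the sum of squares of `v` is the dot product of `v` with itself. The 3-4-5 right triangle makes the expected values easy to verify by hand.

```python
import math

def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def magnitude(v):
    return math.sqrt(dot(v, v))

def distance(v, w):
    difference = [vi - wi for vi, wi in zip(v, w)]
    return magnitude(difference)

print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
print(magnitude([3, 4]))          # 5.0
print(distance([1, 2], [4, 6]))   # 5.0
```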

Using lists as vectors is great for exposition but terrible for performance.

In production code, you would want to use the NumPy library, which includes a high-performance array class with all sorts of arithmetic operations included.
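
For comparison, here is a minimal sketch of the same operations with NumPy arrays. It assumes NumPy is installed and imported as `np`; the sample values are illustrative.

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])

print(v + w)                  # component-wise addition
print(2 * v)                  # scalar multiplication
print(np.dot(v, w))           # dot product
print(np.linalg.norm(v - w))  # Euclidean distance between v and w
```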

## Matrices

A matrix is a two-dimensional collection of numbers. **We will represent matrices as lists of lists, with each inner list having the same size and representing a row of the matrix.**

In [4]:
A = [[1,2,3],[4,5,6]] # A has 2 rows and 3 columns
B = [[1,2],[3,4],[5,6]] # B has 3 rows and 2 columns

In [5]:
def shape(A):
    num_rows = len(A)
    num_cols = len(A[0]) if A else 0
    return num_rows, num_cols

In [10]:
def get_row(A, i):
    return A[i]

def get_column(A, j):
    return [A_i[j] for A_i in A]
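
Applied to the 2 x 3 matrix `A` defined earlier, these helpers behave as sketched below. The definitions are repeated so the block runs on its own.

```python
def shape(A):
    num_rows = len(A)
    num_cols = len(A[0]) if A else 0
    return num_rows, num_cols

def get_row(A, i):
    return A[i]

def get_column(A, j):
    return [A_i[j] for A_i in A]

A = [[1, 2, 3], [4, 5, 6]]

print(shape(A))          # (2, 3)
print(get_row(A, 0))     # [1, 2, 3]
print(get_column(A, 1))  # [2, 5]
```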

In [11]:
def make_matrix(num_rows, num_cols, entry_fn):
    return [[entry_fn(i,j) for j in range(num_cols)] for i in range(num_rows)]

def is_diagonal(i, j):
    return 1 if i == j else 0

In [12]:
identity_matrix = make_matrix(4, 4, is_diagonal)
print(identity_matrix)

[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


Matrices will be important because:

1. A matrix can represent a dataset consisting of multiple vectors, by treating each vector as a row of the matrix.
2. An n x k matrix can represent a linear function that maps k-dimensional vectors to n-dimensional vectors.
3. A matrix can represent binary relationships.
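
To make the second point concrete, here is a minimal sketch of applying a matrix to a vector: each output component is the dot product of one row with the input. `matrix_vector_multiply` is a hypothetical helper name, not one defined elsewhere in these notes.

```python
def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def matrix_vector_multiply(A, v):
    """Map a k-dimensional vector v to an n-dimensional one using an n x k matrix A."""
    return [dot(row, v) for row in A]

A = [[1, 2, 3], [4, 5, 6]]  # 2 x 3: maps 3-dimensional vectors to 2-dimensional ones
print(matrix_vector_multiply(A, [1, 0, 1]))  # [4, 10]
```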

In [13]:
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

# can be represented as a matrix whose entry [i][j] is 1 if users i and j are friends, and 0 otherwise

          # user 0  1  2  3  4  5  6  7  8  9

friendships =  [[0, 1, 1, 0, 0, 0, 0, 0, 0, 0], # user 0
                [1, 0, 1, 1, 0, 0, 0, 0, 0, 0], # user 1
                [1, 1, 0, 1, 0, 0, 0, 0, 0, 0], # user 2
                [0, 1, 1, 0, 1, 0, 0, 0, 0, 0], # user 3
                [0, 0, 0, 1, 0, 1, 0, 0, 0, 0], # user 4
                [0, 0, 0, 0, 1, 0, 1, 1, 0, 0], # user 5
                [0, 0, 0, 0, 0, 1, 0, 0, 1, 0], # user 6
                [0, 0, 0, 0, 0, 1, 0, 0, 1, 0], # user 7
                [0, 0, 0, 0, 0, 0, 1, 1, 0, 1], # user 8
                [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]] # user 9

In [15]:
[i for i, is_friend in enumerate(friendships[5]) if is_friend]

[4, 6, 7]
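
The matrix form can also be built from the list of pairs rather than typed out by hand. The sketch below assumes friendships are symmetric; `friendship_pairs` and `make_friendship_matrix` are hypothetical names used only for illustration.

```python
friendship_pairs = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
                    (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

def make_friendship_matrix(pairs, num_users):
    """Build a symmetric 0/1 matrix from a list of (i, j) friend pairs."""
    matrix = [[0] * num_users for _ in range(num_users)]
    for i, j in pairs:
        matrix[i][j] = 1
        matrix[j][i] = 1  # friendship goes both ways
    return matrix

friend_matrix = make_friendship_matrix(friendship_pairs, 10)

print(friend_matrix[0][2] == 1)  # True: users 0 and 2 are friends
print(friend_matrix[0][8] == 1)  # False: users 0 and 8 are not friends
print([i for i, is_friend in enumerate(friend_matrix[5]) if is_friend])  # [4, 6, 7]
```

Checking whether two users are connected is then a single matrix lookup, instead of a scan through the whole list of pairs.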