## Chapter 4: Linear Algebra

For data science, you need to understand _vector spaces_. 

### Vectors

_Vectors_ can be added together to make new vectors, or multiplied by _scalars_ to form new vectors. 

Each category of data about something (e.g. height, weight, age) is a _dimension_; if you're looking at the height, weight, and age of people, each person is a _three-dimensional vector_. A class with four exams would have each student's grades as a four-dimensional vector.

**A vector is a list [array] of numbers**

In [2]:
height_weight_age = [70,   # inches
                     170,  # pounds
                     40    # years
                    ]

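But a plain Python list doesn't support vector arithmetic (`+` concatenates two lists instead of adding their elements), so we need to build or use a tool for that. An easy way to do this is the `zip` function, which pairs up corresponding elements: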

In [4]:
def vector_add(first, second):
    """adds corresponding elements"""
    return [first_i + second_i for first_i, second_i in zip(first, second)]


def vector_subtract(first, second):
    """subtracts corresponding elements"""
    return [first_i - second_i for first_i, second_i in zip(first, second)]
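
To see these in action, here is a quick check on two made-up vectors (`v` and `w` are just illustrative values). It also shows a caveat worth knowing: `zip` stops at the shorter input, so mismatched lengths get silently truncated rather than raising an error.

```python
def vector_add(first, second):
    """Add corresponding elements of two lists."""
    return [first_i + second_i for first_i, second_i in zip(first, second)]


def vector_subtract(first, second):
    """Subtract corresponding elements of two lists."""
    return [first_i - second_i for first_i, second_i in zip(first, second)]


# Illustrative values only
v = [1, 2, 3]
w = [4, 5, 6]

print(vector_add(v, w))       # [5, 7, 9]
print(vector_subtract(v, w))  # [-3, -3, -3]

# zip stops at the shorter list, so the third element is dropped here
print(vector_add([1, 2, 3], [10, 20]))  # [11, 22]
```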

Sometimes we also want a **_componentwise sum_** of a list of vectors. This is where you create a new vector whose first element is the sum of all the first elements, the second the sum of all second elements, etc.

In [10]:
def vector_sum(vectors):
    """sum all corresponding elements"""
    
    result = vectors[0]
    
    for vector in vectors[1:]:
        result = vector_add(result, vector)
    return result
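
A quick sanity check of `vector_sum` on made-up data. The `zip(*vectors)` variant at the end (`vector_sum_zip` is a hypothetical name) is an alternative sketch rather than the version above; it should give the same result by grouping the ith elements together and summing each group.

```python
def vector_add(first, second):
    """Add corresponding elements of two lists."""
    return [first_i + second_i for first_i, second_i in zip(first, second)]


def vector_sum(vectors):
    """Sum all corresponding elements, one vector at a time."""
    result = vectors[0]
    for vector in vectors[1:]:
        result = vector_add(result, vector)
    return result


def vector_sum_zip(vectors):
    """Alternative sketch: zip(*vectors) yields the tuple of ith elements."""
    return [sum(components) for components in zip(*vectors)]


# Illustrative values only
vectors = [[1, 2], [3, 4], [5, 6]]

print(vector_sum(vectors))      # [9, 12]
print(vector_sum_zip(vectors))  # [9, 12]
```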

We also want to multiply a vector by a scalar, which in turn lets us compute the componentwise mean of a list of vectors:

In [5]:
def vector_multiply(number, vector):
    return [number * vector_i for vector_i in vector]


def vector_mean(vectors):
    """compute the vector whose ith element is the mean of the
       ith elements of the input vectors"""
    n = len(vectors)
    return vector_multiply(1/n, vector_sum(vectors))
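
As a small usage sketch: the first person below is the `height_weight_age` vector from earlier, and the second person is made up purely for illustration. This assumes Python 3, where `1/n` is true division.

```python
def vector_add(first, second):
    return [first_i + second_i for first_i, second_i in zip(first, second)]


def vector_sum(vectors):
    result = vectors[0]
    for vector in vectors[1:]:
        result = vector_add(result, vector)
    return result


def vector_multiply(number, vector):
    """Multiply every element of the vector by a scalar."""
    return [number * vector_i for vector_i in vector]


def vector_mean(vectors):
    """Vector whose ith element is the mean of the ith input elements."""
    n = len(vectors)
    return vector_multiply(1 / n, vector_sum(vectors))


people = [[70, 170, 40],   # height, weight, age from the earlier cell
          [66, 130, 30]]   # hypothetical second person

print(vector_multiply(2, [1, 2, 3]))  # [2, 4, 6]
print(vector_mean(people))            # [68.0, 150.0, 35.0]
```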

The **_dot product_** of two vectors is the sum of their componentwise products. If `w` has magnitude 1, the dot product tells us how far `v` extends in the `w` direction, i.e. the length of the vector you'd get if you projected `v` onto `w`.

In [9]:
def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))
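
A few illustrative cases (all values made up) show both readings of the dot product: the plain arithmetic, the projection onto a unit vector, and the fact that perpendicular vectors have a dot product of zero.

```python
def dot(v, w):
    """v_1 * w_1 + ... + v_n * w_n"""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


# Plain arithmetic: 1*4 + 2*5 + 3*6
print(dot([1, 2, 3], [4, 5, 6]))  # 32

# [1, 0] has magnitude 1, so this is how far [3, 4] extends along the x-axis
print(dot([3, 4], [1, 0]))        # 3

# Perpendicular vectors: no extension in each other's direction
print(dot([1, 0], [0, 1]))        # 0
```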

The **_sum of squares_** of a vector is its _dot product_ with itself, and the square root of that is the vector's **magnitude** (its length).

In [8]:
import math


def sum_of_squares(vector):
    """v_1 * v_1 + ... + v_n * v_n """
    return dot(vector, vector)

def magnitude(vector):
    return math.sqrt(sum_of_squares(vector))
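
The classic 3-4-5 right triangle makes a convenient check: the vector `[3, 4]` (an illustrative value) should have a sum of squares of 25 and a magnitude of 5.

```python
import math


def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def sum_of_squares(vector):
    """v_1 * v_1 + ... + v_n * v_n"""
    return dot(vector, vector)


def magnitude(vector):
    """Length of the vector: square root of its sum of squares."""
    return math.sqrt(sum_of_squares(vector))


print(sum_of_squares([3, 4]))  # 25
print(magnitude([3, 4]))       # 5.0
```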

All of this allows us to compute the **_squared distance_** between two vectors and, by taking its square root, the _distance_ between them:

In [6]:
def squared_distance(v, w):
    """(v_1-w_1) ** 2 + ... (v_n - w_n) ** 2"""
    return sum_of_squares(vector_subtract(v, w))


def distance(v, w):
    return math.sqrt(squared_distance(v, w))

This could also be written:

In [7]:
def distance(v, w):
    return magnitude(vector_subtract(v, w))
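
To check that the two definitions agree, here is a sketch that keeps both under separate, hypothetical names (`distance_via_squares` and `distance_via_magnitude`) and compares them on two made-up points.

```python
import math


def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def vector_subtract(first, second):
    return [first_i - second_i for first_i, second_i in zip(first, second)]


def sum_of_squares(vector):
    return dot(vector, vector)


def magnitude(vector):
    return math.sqrt(sum_of_squares(vector))


def squared_distance(v, w):
    return sum_of_squares(vector_subtract(v, w))


def distance_via_squares(v, w):
    """First definition: square root of the squared distance."""
    return math.sqrt(squared_distance(v, w))


def distance_via_magnitude(v, w):
    """Second definition: magnitude of the difference vector."""
    return magnitude(vector_subtract(v, w))


# Illustrative points: the difference is [-3, -4], so the distance is 5
v, w = [1, 1], [4, 5]
print(squared_distance(v, w))        # 25
print(distance_via_squares(v, w))    # 5.0
print(distance_via_magnitude(v, w))  # 5.0
```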

All this has apparently just been for illustration, though, as using lists for vectors gives horrible performance; the author says you want to use `numpy`'s `array` class instead.
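
For comparison, here is a rough sketch of the same operations with NumPy arrays. It assumes `numpy` is installed; the values are made up, and this is only one way to map the list-based helpers above onto NumPy calls.

```python
import numpy as np

# Illustrative values only
v = np.array([1.0, 1.0])
w = np.array([4.0, 5.0])

total = v + w                       # elementwise addition, like vector_add
scaled = 2 * v                      # scalar multiplication, like vector_multiply
mean = np.mean([v, w], axis=0)      # componentwise mean, like vector_mean
dot_vw = np.dot(v, w)               # dot product: 1*4 + 1*5
dist = np.linalg.norm(w - v)        # Euclidean distance between v and w

print(total.tolist())   # [5.0, 6.0]
print(scaled.tolist())  # [2.0, 2.0]
print(mean.tolist())    # [2.5, 3.0]
print(float(dot_vw))    # 9.0
print(float(dist))      # 5.0
```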

### Matrices

A _matrix_ is a two-dimensional collection of numbers. Here it's represented as a list of lists, where each inner list is one row and all rows have the same length.

Matrices are represented by capital letters:

In [14]:
A = [[1, 2, 3],
     [4, 5, 6]]


B = [[1, 2],
     [3, 4],
     [5, 6]]

print('A[1][2]:', A[1][2])
print('B[2][0]:', B[2][0])

A[1][2]: 6
B[2][0]: 5


When we talk about the **shape** of a matrix, we describe the number of rows and columns:

In [16]:
def shape(A):
    num_rows = len(A)
    num_cols = len(A[0]) if A else 0
    return num_rows, num_cols

shape(A)

(2, 3)

A matrix with `n` rows and `k` columns is an `n x k` matrix. This means each row is a vector of length `k` (`k` elements) and each column is a vector of length `n` (`n` elements).  
`A` could be called a `2x3` matrix. Each row is a vector of length `3` and each column is a vector of length `2`.

In [17]:
def get_row(A, i):
    return A[i]


def get_column(A, j):
    return [A_i[j] for A_i in A]  # the jth element of each row A_i

print(get_row(A, 0))
print(get_column(B, 1))

[1, 2, 3]
[2, 4, 6]


A nested list comprehension can create a matrix of a given shape.

In [19]:
def make_matrix(num_rows, num_cols, entry_fn):
    return [[entry_fn(i, j) for j in range(num_cols)] for i in range(num_rows)]


def is_diagonal(i, j):
    return 1 if i == j else 0

make_matrix(5, 5, is_diagonal)

[[1, 0, 0, 0, 0],
 [0, 1, 0, 0, 0],
 [0, 0, 1, 0, 0],
 [0, 0, 0, 1, 0],
 [0, 0, 0, 0, 1]]
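
Any function of `(i, j)` can serve as `entry_fn`. As a small sketch, the two lambdas below are made-up examples: one fills the matrix with zeros, the other with the sum of the row and column indices.

```python
def make_matrix(num_rows, num_cols, entry_fn):
    """Build a num_rows x num_cols matrix whose (i, j) entry is entry_fn(i, j)."""
    return [[entry_fn(i, j) for j in range(num_cols)] for i in range(num_rows)]


zeros = make_matrix(2, 3, lambda i, j: 0)
index_sums = make_matrix(2, 3, lambda i, j: i + j)

print(zeros)       # [[0, 0, 0], [0, 0, 0]]
print(index_sums)  # [[0, 1, 2], [1, 2, 3]]
```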

So, matrices can:
1. represent a data set consisting of multiple vectors
2. represent linear functions that map `k`-dimensional vectors to `n`-dimensional vectors
3. represent binary relationships like graphs.
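
The three uses listed above can be sketched roughly as follows. All data here is made up, and `matrix_vector_multiply` is a hypothetical helper built on the `dot` function from earlier; it is one straightforward way to apply a matrix to a vector.

```python
def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


# 1. A data set: each row is one person's [height, weight, age] vector
data = [[70, 170, 40],
        [65, 120, 26],
        [77, 250, 19]]
heights = [row[0] for row in data]  # first column

# 2. A linear function: a 2 x 3 matrix maps 3-dimensional vectors
#    to 2-dimensional vectors
def matrix_vector_multiply(A, v):
    """Hypothetical helper: dot each row of A with v."""
    return [dot(row, v) for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]
mapped = matrix_vector_multiply(A, [1, 0, 1])

# 3. A binary relationship: friendships[i][j] is 1 if i and j are connected
friendships = [[0, 1, 1, 0],
               [1, 0, 1, 0],
               [1, 1, 0, 1],
               [0, 0, 1, 0]]
friends_of_2 = [j for j, is_friend in enumerate(friendships[2]) if is_friend]

print(heights)                  # [70, 65, 77]
print(mapped)                   # [4, 10]
print(friendships[0][2] == 1)   # True
print(friends_of_2)             # [0, 1, 3]
```

The adjacency-matrix form makes "are `i` and `j` connected?" a single lookup, at the cost of scanning a whole row to list one node's connections.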

I think this is an extremely disappointing attempt at explaining linear algebra.