<img src= "./resources/title.png">
<img src= "./resources/muchmath.png" style = "width: 600px;">

# Opening:

[How does linear regression work?](https://docs.google.com/document/d/1_MehWTBkYtN-Qiu3QPczXcm7_pQ-DNcbhZ8i1_klvlE/edit?usp=sharing)

## Before we talk about regularization and how it helps prevent overfitting, we need to be able to read a new language: linear algebra.

### Objective:
Students will be able to **define** linear algebra's role in data science and **describe** a few _key concepts_ and _rules_.

## Wait, why do we care about linear algebra?

### Linear algebra is used everywhere in machine learning:

#### Regression

We'll cover that today.

#### Text Analytics
It is used to model complicated things like language. </br>
Some of you may have heard of "vectorizing text" when talking about NLP.

Converting words and text into vectors and matrices allows us to see how "close" and "far apart" words are from each other in meaning and connection.

<img src = "./resources/Word-Vectors.png">

#### Image compression and recognition

In its most basic form, an image is a three-dimensional matrix.

An $n$ by $m$ by $3$ matrix, to be precise.

Here $n$ and $m$ are the dimensions of the image in pixels, and each pixel is an array of three numbers representing its color code.

<img src = "./resources/images.gif">

#### Recommendation engines 
Recommendation engines can make much more sophisticated recommendations by using linear algebra in conjunction with user and content data.

**Quick thought exercise** - what would the matrix of user and content data look like?

<img src = "./resources/netflix.png">

### Let's start with...
<img src= "./resources/algebra.png">

### Algebra is from high school, but that doesn't make it simple

#### Problem 1:
Solve for $x$</br>

$20  = 5 + 3x$

#### Problem 2:
Solve for $x$</br>

$20 - 7x = 6x - 6$

#### Problem 3
Solve for $x$ and $y$</br>

$-2(x - 1) + 4y = 5$

#### Problem 4
Solve for $x_1$ and $x_2$</br>

$4x_1 + 2x_2 = 8$</br>

$5x_1 + 3x_2 = 9$

### What are all these problems doing?

solving for the _unknown_

### Order of magnitude

Now Problem 4 might be doable by hand, but what if instead of 2 equations we had 5? 20? 200? 50,000?

### Fortunately for us we have

<img src= "https://media0.giphy.com/media/JlxFcvNuzlPYA/giphy.gif?cid=790b7611c4a4fc74c05cd06fe2c8cc00860e04b6f8049e52&rid=giphy.gif">

## Computers!


### But there is a problem:

| people | computers|
|--------|----------|
|can read equations like sentences | can't really do that |

### Linear algebra solves that problem by turning this:

$4x_1 + 2x_2 = 8$</br>

$5x_1 + 3x_2 = 9$

### into this:

$
\begin{bmatrix}4 & 2 \\ 5 & 3 \end{bmatrix}*\begin{bmatrix}x_1\\x_2\end{bmatrix} = \begin{bmatrix}8\\9\end{bmatrix}
$ 
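
Once the system is in matrix form, a computer can solve it directly. The following is a minimal sketch using NumPy's `np.linalg.solve`; the variable names `A`, `rhs`, and `solution` are chosen here for illustration and do not appear elsewhere in this lecture.

```python
import numpy as np

# Coefficient matrix and right-hand side taken from the system above
A = np.array([[4, 2],
              [5, 3]])
rhs = np.array([8, 9])

# np.linalg.solve finds the vector x that satisfies A @ x = rhs
solution = np.linalg.solve(A, rhs)
print(solution)  # x_1 = 3 and x_2 = -2, up to floating-point rounding
```

The same call works whether the system has 2 equations or 200, which is the point of writing it as matrices.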

***

### Exercise: how would we rewrite the set of equations in each problem in linear algebra notation?

#### Problem 1

$x_0 + 2x_1 = 10$</br>

$3x_0 + x_1 = 9$

#### Problem 2

$x_0 + 2x_1 = 10$</br>

$3x_0 + x_1 = 9$</br>

$32x_0 - 6x_1 = 24$

#### Problem 3
$x_0 + 2x_1 = 10$</br>

$3x_0 + x_1 + 5x_2= 22$</br>

$32x_0 - 6x_1 -4x_2= 7$


### We should probably learn some vocabulary for what we are using

#### Scalar

$ 2 $

#### Vector $\vec{v}$

$\begin{bmatrix}8\\9\end{bmatrix}$

Now what if I told you that **both** $a$ and $b$ are vectors?

$a = \begin{bmatrix}8\\9\end{bmatrix} \\              
b = \begin{bmatrix}8 & 9\end{bmatrix}$

How are they alike?

#### Matrix
$ \begin{bmatrix}4 & 2 \\ 5 & 3 \end{bmatrix} $

#### Tensor

$ \left[ \begin{array}{ccc} 
         \begin{bmatrix}4 & 2 \\ 5 & 3 \end{bmatrix} &
         \begin{bmatrix}6 & -4 \\ 2 & 8 \end{bmatrix} \\ 
         \begin{bmatrix}-1 & 5 \\ 0 & 1 \end{bmatrix} & 
         \begin{bmatrix}9 & -2 \\ -5 & 4/5 \end{bmatrix}  \end{array} \right]$

#### Or put differently:

<img src = "./resources/datadogs.jpg">

### Specific definitions of Data Types for Linear Algebra

* **Scalars** only have magnitude.

* A **vector** is an array with **magnitude and direction**.
  - The coordinates of a vector represent where the tip of the vector would be if you travelled from the origin
  - The **magnitude** of a vector would be its length in space.

* **Matrices** can be interpreted differently in different contexts, but they are often used to represent multiple simultaneous vectors.

* **Tensors** are made up of matrices with the same dimensions.

* A vector or matrix can be multiplied by a scalar to create a change in **scale** and/or **direction**.
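
As a small illustration of magnitude and scalar multiplication, here is a sketch with a made-up vector `v` (the values are arbitrary and chosen so the length comes out to a round number).

```python
import numpy as np

v = np.array([3, 4])

# Magnitude (length) of the vector: sqrt(3**2 + 4**2)
magnitude = np.linalg.norm(v)

# A positive scalar stretches the vector; a negative scalar also flips its direction
scaled = 2 * v
flipped = -1 * v

print(magnitude)                       # 5.0
print(scaled, np.linalg.norm(scaled))  # [6 8] 10.0
print(flipped)                         # [-3 -4]
```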


## Quick code break!
For linear algebra, `NumPy` is your favorite package.

Vectors, matrices and tensors are represented by NumPy arrays. **Not lists!!!** <br>

We can use the `.shape` attribute of an array to explore the dimensions of these data structures.

#### Make some objects:

In [None]:
import numpy as np

In [None]:
vector = np.array([1, 2, 3, 4, 5, 6])
matrix1 = np.array([[1, 2, 3], [4, 5, 6]])
matrix2 = np.array([[1, 2], [3, 4], [5, 6]])
tensor = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

#### Print them out and find their shape

In [None]:
print(vector)
print('vector shape:', vector.shape, '\n')

In [None]:
print(matrix1)
print('matrix1 shape:', matrix1.shape, '\n')

In [None]:
print(matrix2)
print('matrix2 shape:', matrix2.shape, '\n')

In [None]:
print(tensor)
print('tensor shape:', tensor.shape, '\n')

#### Question: How would you index or subset a vector, matrix, or tensor?
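
As a hint on syntax only, here is a sketch using a separate, made-up array called `grid`. NumPy takes one index per axis, separated by commas, so a tensor would need three.

```python
import numpy as np

# Hypothetical example array, separate from the objects defined above
grid = np.array([[10, 20, 30],
                 [40, 50, 60]])

print(grid[0])     # first row: [10 20 30]
print(grid[1, 2])  # row 1, column 2: 60
print(grid[:, 0])  # every row, column 0: [10 40]
```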

#### Exercise:

Index each object to return the **6** for each one.

## Okay, let's get back to the....

<img src= "./resources/linear.png">

## part of Linear Algebra

### What are Linear Equations?

Linear equations only have **linear variables**. This means our unknowns are only multiplied by a scalar and raised to a power of only **one**, such as:

$ x - 2y = 1$

$3ex + 2\pi y = 0$

**Not linear:**

$ x^2 - 2\ln{y} = 4$

$0.5x + 2y^x = 11$

$e^x + 2x=2$

## Linear Regression is built upon Linear Algebra
A linear regression can be interpreted as the solution to a system of linear equations: each observation just corresponds to a linear equation, and the **coefficients** are the linear unknowns we're solving for! 

We're representing each **observation** as a **linear combination of features**.

Our prediction equation for a linear regression typically looks something like:

$ y_{pred} = \beta_{0} + \beta_{1}x_1 + \beta_{2}x_2 + ... + \beta_{n}x_n $

### In matrix notation that can also be:

$ y = Xb $, so we are solving for $b$.

Where:
- $y$ is the vector of observed target values
- $X$ is your matrix of feature values, one row per observation
- $b$ is the vector of coefficients

Okay, specifically we are solving for $\hat{b}$:

$ \hat{y} = X\hat{b}$

chosen to minimize:


$MSE = (\frac{1}{n})\sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^{2}$
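
To make the notation concrete, here is a minimal sketch that computes $\hat{y} = X\hat{b}$ and the MSE. The data and the candidate coefficients `b_hat` are made up for illustration; nothing here is fitted.

```python
import numpy as np

# Made-up data: a column of ones for the intercept plus one feature
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0, 8.0])

# A candidate coefficient vector (intercept 1, slope 2), chosen by hand
b_hat = np.array([1.0, 2.0])

y_hat = X @ b_hat                # predictions: 3, 5, 7
mse = np.mean((y - y_hat) ** 2)  # (0 + 0 + 1) / 3
print(y_hat, mse)
```

Fitting a regression amounts to searching for the `b_hat` that makes `mse` as small as possible.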

## Pause

<img src= "https://i0.wp.com/timemanagementninja.com/wp-content/uploads/2014/02/Pause-Button-Key.jpg?w=600&ssl=1">

## That was a lot. Let's make sure everyone followed that knowledge drop.

## Linear Algebra powers the majority of machine learning algorithms we will learn in this course
<img src= "./resources/linearalgebra.png">

Next lecture we will review an example of regression using linear algebra, but this lecture is about terminology.</br>
You will see a few recurring types of matrices and vectors across algorithms, so next:

## MVPs of Linear Algebra
<img src = "./resources/mvp.jpeg">

### 1. Identity Matrix
An identity matrix is a square matrix with 1's along the main diagonal (top left to bottom right) and 0's everywhere else. When a matrix is multiplied by an identity matrix of compatible size, the result is the same matrix (think of it as the linear algebra equivalent of multiplying by 1).

<img src = "./resources/identity_matrix.svg">

In [None]:
np.eye(3)

In [None]:
i_3 = np.identity(3)
print(i_3)
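
The "multiplying by 1" behavior can be checked directly. This is a small sketch with an arbitrary 2-by-2 matrix `A`, not one of the objects defined earlier.

```python
import numpy as np

A = np.array([[4, 2],
              [5, 3]])
I = np.eye(2)

# Multiplying by the identity on either side leaves A unchanged
print(np.array_equal(A @ I, A))  # True
print(np.array_equal(I @ A, A))  # True
```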

### 2. Matrix Inverse
When a matrix is multiplied by its **inverse**, the result is the identity matrix.

<img src = "./resources/inverse.webp">

The order of multiplication does not matter for a matrix and its inverse:

$$A \cdot A^{-1} = A^{-1} \cdot A = I$$



In [None]:
# original matrix
x = np.array([[4,8,10],[3,9,12],[5,10,15]])

In [None]:
# inverse of x
inv_x = np.linalg.inv(x)
print(inv_x, '\n')

In [None]:
# see if it produces the identity matrix:
print(np.round(x.dot(inv_x)))

In [None]:
print(matrix1)
print(matrix2)
matrix1.dot(matrix2)

In [None]:
np.matmul(matrix2,matrix1)

In [None]:
np.dot(matrix2,matrix1)

### 2.a Do all matrices have an inverse?  Nope. 

    An n-by-n square matrix A is called invertible if there exists an n-by-n square matrix B such that

<div style="text-align:center"><span style="color:blue; font-family:Georgia; font-size:1.5em;">AB = BA = I</span></div>

    where I is the identity matrix. A and B are inverses of each other.
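
Here is a sketch of what happens when no such B exists. The matrix `singular` is a made-up example whose second row is exactly twice its first; NumPy signals the failure by raising `np.linalg.LinAlgError`.

```python
import numpy as np

# The second row is exactly twice the first, so this matrix has no inverse
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])

try:
    np.linalg.inv(singular)
    inverted = True
except np.linalg.LinAlgError as err:
    inverted = False
    print('No inverse:', err)
```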

#### Wait, what was that last function `x.dot` ?
### 3. Dot product

The dot product of matrices is also commonly known as **Matrix Multiplication**. Unless otherwise stated, _multiplication_ refers to this kind of multiplication.


\begin{equation}
\begin{bmatrix}
a_{1,1} & a_{1,2} \\
a_{2,1} & a_{2,2}
\end{bmatrix}
\times
\begin{bmatrix}
b_{1,1} & b_{1,2} \\
b_{2,1} & b_{2,2}
\end{bmatrix}
=
\begin{bmatrix}
a_{1,1}\times b_{1,1} + a_{1,2}\times b_{2,1} & a_{1,1}\times b_{1,2} + a_{1,2}\times b_{2,2} \\
a_{2,1}\times b_{1,1} + a_{2,2}\times b_{2,1} & a_{2,1}\times b_{1,2} + a_{2,2}\times b_{2,2}
\end{bmatrix}
\end{equation}

<img src= "./resources/matrix_mult.png" style="width: 400px;">
https://www.mathsisfun.com/algebra/matrix-multiplying.html

#### Dot product rules:
- We take each **row** (horizontal) of the first matrix and each **column** (vertical) of the second matrix, multiply them element by element, and sum the results.
- Order matters: in general $AB ≠ BA$. Grouping does not: $(AB)C = A(BC)$.
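
The row-times-column rule and the effect of swapping the order can be seen in a short sketch. The matrices `A` and `B` are arbitrary examples picked so the arithmetic is easy to follow by eye.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# Top-left entry of AB: first row of A times first column of B, then summed
top_left = A[0, 0] * B[0, 0] + A[0, 1] * B[1, 0]
print(top_left, (A @ B)[0, 0])  # 2 2

print(A @ B)  # rows [2 1] and [4 3]
print(B @ A)  # rows [3 4] and [1 2], a different result
```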

#### Exercise:

Let's do one small dot product by hand! (this is the most matrix math you will be asked to do)

$\begin{bmatrix}8\\5\\6\end{bmatrix} * \begin{bmatrix}3 & 4 & 2 \end{bmatrix}  = ?$

### 4. Transpose

The _transpose_ of matrix $X$, written $X^{T}$, is $X$ with its rows and columns swapped, so its shape is in reverse order.

$a = \begin{bmatrix}8\\9\end{bmatrix} \\              
a^T = \begin{bmatrix}8 & 9\end{bmatrix}$

Calling `.transpose()` on an array **reverses** the shape order of a matrix.

In [None]:
# the original shape of matrix1
print(matrix1)
print('matrix1 shape:', matrix1.shape, '\n')

In [None]:
# transposed
print(matrix1.transpose(), '\n')
print('matrix1.transpose() shape:', matrix1.transpose().shape)

There is also the shorthand attribute `.T`:

In [None]:
print(matrix1.T)

#### Exercise
(again, by hand!)

What would be the transpose of the following matrix?

$\begin{bmatrix}8 & 2\\5 & 3\\6&4\end{bmatrix} $

### Why do we care about these MVPs?
![gif](https://media1.giphy.com/media/QA7C1yuI0QZtBbxxM4/giphy.gif)

#### Get into shape
Matrix math cares about the _shape_ of the matrices involved.

What have we seen so far with matrix multiplication? What shape does each matrix need to be for matrix multiplication to work?

#### Addition and subtraction - same shape

$ \vec{v} = \begin{bmatrix}v_{1} \\v_{2}\end{bmatrix}, \quad \vec{w} = \begin{bmatrix}w_{1} \\w_{2}\end{bmatrix} $

$ \vec{v} + \vec{w} = \begin{bmatrix}v_{1} + w_{1} \\v_{2} + w_{2}\end{bmatrix} $


To do the **complex** math behind the scenes and get matrices into the correct shape for the right formula, matrices are transformed using transposes, identity matrices, and plenty of dot products.
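
As a sketch of how a transpose gets a matrix "into shape", the example below uses a made-up 2-by-3 matrix `m`. Multiplying it by itself fails with a `ValueError`, while multiplying by its transpose works.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)

# Addition needs matching shapes
print((m + m).shape)       # (2, 3)

# (2, 3) @ (2, 3) fails: the inner dimensions (3 and 2) do not match
try:
    m @ m
    worked = True
except ValueError:
    worked = False
print(worked)              # False

# Transposing the second operand gives (2, 3) @ (3, 2), which works
print((m @ m.T).shape)     # (2, 2)
```

Note that NumPy will also add some arrays of different shapes through broadcasting; that is a NumPy convenience rather than a rule of linear algebra.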

# Exit Ticket

You will see linear algebra again very soon.

To close this lecture, let's end with a knowledge check in the form of an exit ticket

[QUICK QUIZ HERE!!](https://forms.gle/D6jscCFJWgNk2qHB6)

### Additional Resources
* 3 Blue 1 Brown:  https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_a
* Matrix approach to Linear Regression: http://www.stat.columbia.edu/~fwood/Teaching/w4315/Fall2009/lecture_11
* [link to fun desmos interaction](https://www.desmos.com/calculator/yovo2ro9me)
* [Link to good video on scalars and vectors](https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
* [What is X^T * X?](https://stats.stackexchange.com/questions/267948/intuitive-explanation-of-the-xtx-1-term-in-the-variance-of-least-square/267963)

https://www.desmos.com/calculator/y08wwbjwid

## Only if there is time:

(or for those advanced folks who have sped through the rest of the content already)

## Linear Regression with Linear Algebra (OLS!)

In this example, we'll work through a linear regression problem with the Auto dataset. We want to predict the **mpg** using *cylinders, displacement, horsepower, weight, acceleration and year*.

We're representing each **observation** as a **linear combination of features**.

Our prediction equation for a linear regression typically looks something like:

$ y_{pred} = \beta_{0} + \beta_{1}x_1 + \beta_{2}x_2 + ... + \beta_{n}x_n $

Represented in matrix form:

$ y = Xb $, so we are solving for $b$.

In [None]:
import pandas as pd
import numpy as np
car_df = pd.read_csv('http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.csv',na_values='?').dropna()
car_df.head()

In [None]:
# .copy() so that adding a column later does not trigger a SettingWithCopyWarning
X_df = car_df[['cylinders','displacement','horsepower','weight','acceleration','year']].copy()
y = car_df['mpg']
X_df.head()

In [None]:
# add a column of ones so the model can fit an intercept term
X_df['constant'] = 1

In [None]:
X_df.head()

$ y = Xb + 0 $  --> $ y = Xb $

We want to solve for $b$! The natural first idea is to multiply both sides by the inverse of $X$.

Let's try to compute $ X^{-1} $


In [None]:
np.linalg.inv(X_df.values)

We get: 

    LinAlgError: Last 2 dimensions of the array must be square.

We can only calculate the inverse of a **square** matrix. So with $X$ not being square, how can we solve this system using the data that we have? (No spoilers.)


 $$b = (X^{T}X)^{-1}X^{T}y$$ 
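
One way to arrive at this formula, sketched here under the assumption that $X^{T}X$ is invertible: multiply both sides of $y = Xb$ on the left by $X^{T}$. The product $X^{T}X$ is square, so it can be inverted even though $X$ cannot.

$$X^{T}y = X^{T}Xb$$

$$(X^{T}X)^{-1}X^{T}y = (X^{T}X)^{-1}X^{T}Xb = Ib = b$$

With more observations than coefficients there is usually no $b$ that satisfies $y = Xb$ exactly. The $b$ given by this formula is the one that minimizes the squared error between $y$ and $Xb$.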



Let's apply this to our data.

In [None]:
x = X_df.values
xt = x.T

# We create a square matrix that we can invert
xtx = xt @ x
xtx_inv = np.linalg.inv(xtx)

product = xtx_inv @ xt

b = product @ y.values
print(b)

Now we have our coefficients! They correspond to each of the columns in `X_df` in order. Let's compare this to our `sklearn` model.

In [None]:
list(zip(X_df.columns, b))

In [None]:
# comparing with sklearn

from sklearn.linear_model import LinearRegression
lr = LinearRegression()

skl_X = X_df.drop(columns = 'constant')
lr.fit(skl_X,y)

In [None]:
print('constant: ', lr.intercept_)
print('coefficients: ', lr.coef_)
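
If the dataset is not at hand, the same kind of check can be run on synthetic data. The sketch below is not part of the Auto example: the data, `true_b`, and the variable names are made up, and `np.linalg.lstsq` stands in as an independent least-squares solver for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 observations, 2 random features, plus a constant column
features = rng.normal(size=(50, 2))
X = np.column_stack([features, np.ones(50)])
true_b = np.array([2.0, -1.0, 0.5])
y = X @ true_b  # noise-free, so the true coefficients are recoverable

# Normal equation, as in the cells above
b_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# NumPy's built-in least-squares solver
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b_normal, true_b))   # True
print(np.allclose(b_normal, b_lstsq))  # True
```

In practice, solvers such as `lstsq` are generally preferred over explicitly inverting $X^{T}X$, since the explicit inverse can be numerically unstable when features are highly correlated.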

### Now try it yourself by adding the remaining variable
No copy pasting<br>
Write your linear algebra to gain experience<br>
How do the coefficients change?<br>