<img src= "./resources/title.png">

### The Sequel:

4. Systems of Linear Equations
5. Linear Regression with Linear Algebra


# Part 4: Systems of Linear Equations

One of the most common applications of matrix operations is in solving systems of linear equations. 
***
### Sidebar: What are Linear Equations?

Linear equations contain only **linear terms**: each unknown is multiplied by a scalar and raised to the power of **one**, and nothing else is done to it. For example:

$ x - 2y = 1$

$3ex + 2\pi y = 0$

**Not linear:**

$ x^2 - 2\ln{y} = 4$

$0.5x + 2y^x = 11$

$e^x + 2x=2$

***

Suppose we want to solve the following system of linear equations for `x` and `y`:

`x - 2y = 1`

`3x + 2y = 11`

First, we need to represent these equations as matrices/vectors.

$\begin{pmatrix}1 & -2 \\3 & 2 \end{pmatrix} \cdot \begin{pmatrix}x \\ y \end{pmatrix} = \begin{pmatrix}1 \\11\end{pmatrix}$ 

In [None]:
import numpy as np

In [None]:
a = np.array([[1,-2],[3,2]])
b = np.array([[1],[11]])


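In matrix notation the system is $A \cdot X = B$. To isolate $X$, multiply both sides on the left by $A^{-1}$:

$ A \cdot X = B $

$ A^{-1} \cdot A \cdot X = A^{-1} \cdot B $

$ I \cdot X = A^{-1} \cdot B $

$ X = A^{-1} \cdot B $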

In [None]:
a = np.array([[1,-2],[3,2]])
b = np.array([[1],[11]])
inv_a = np.linalg.inv(a)

# X = A^{-1} . B gives the values of x and y
inv_a.dot(b)
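
As a quick check, the result can be substituted back into the original system. The sketch below also uses `np.linalg.solve`, a NumPy routine that solves $A \cdot X = B$ directly without forming the inverse. The variable names `x_inv` and `x_solve` are illustrative and not part of the original notebook.

```python
import numpy as np

a = np.array([[1, -2], [3, 2]])
b = np.array([[1], [11]])

# Solution via the explicit inverse, as in the cell above
x_inv = np.linalg.inv(a).dot(b)

# Same system solved directly, without forming the inverse
x_solve = np.linalg.solve(a, b)

print(x_solve.ravel())                   # the values of x and y
print(np.allclose(a.dot(x_solve), b))    # substituting back reproduces b
```

Both approaches should give x = 3 and y = 1, which satisfies both equations (3 - 2 = 1 and 9 + 2 = 11).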

# Part 5: Linear Regression with Linear Algebra (OLS!)

A linear regression can be interpreted as the solution to a system of linear equations: each observation just corresponds to a linear equation, and the **coefficients** are the linear unknowns we're solving for! 

We're representing each **observation** as a **linear combination of features**.

Our prediction equation for a linear regression typically looks something like:

$ y_{pred} = \beta_{0} + \beta_{1}x_1 + \beta_{2}x_2 + ... + \beta_{n}x_n $

Represented in matrix form:

$ y = Xb $, where $b$ is the vector of coefficients ($\beta_{0}, \beta_{1}, ..., \beta_{n}$), so we are solving for $b$.
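
To make the matrix form concrete, here is a minimal sketch with made-up numbers (the arrays `features` and `b` are hypothetical and not taken from any dataset). It assumes the intercept $\beta_{0}$ is handled by adding a column of ones to $X$, so that a single matrix product reproduces the prediction equation for every observation at once.

```python
import numpy as np

# Hypothetical data: 3 observations, 2 features each
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])

# Hypothetical coefficients: [beta_0, beta_1, beta_2]
b = np.array([0.5, 2.0, -1.0])

# Prepend a column of ones so beta_0 acts as the intercept
X = np.column_stack([np.ones(len(features)), features])

# Matrix form: one product gives every prediction
y_matrix = X.dot(b)

# Equation form: beta_0 + beta_1*x_1 + beta_2*x_2, one row at a time
y_rowwise = np.array([b[0] + b[1] * x1 + b[2] * x2 for x1, x2 in features])

print(y_matrix)
print(np.allclose(y_matrix, y_rowwise))
```

Where the column of ones sits in $X$ only changes the position of the intercept within $b$, not the predictions.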

In this example, we'll work through a linear regression problem with the Auto dataset. We want to predict the **mpg** using *cylinders, displacement, horsepower, weight, acceleration and year*.

In [None]:
import pandas as pd
car_df = pd.read_csv('http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.csv',na_values='?').dropna()
car_df.head()

In [None]:
# .copy() so that adding a column later does not trigger a SettingWithCopyWarning
X_df = car_df[['cylinders','displacement','horsepower','weight','acceleration','year']].copy()
y = car_df['mpg']
X_df.head()

In [None]:
# add a column of ones so the model can fit an intercept term
X_df['constant'] = 1

In [None]:
X_df.head()

$ y = Xb + 0 $  --> $ y = Xb $

We want to solve for $b$! As we did before, to solve for $b$ we need to multiply both sides by the inverse of $X$.

Let's try to compute $ X^{-1} $:


In [None]:
np.linalg.inv(X_df.values)

We get: 

    LinAlgError: Last 2 dimensions of the array must be square.

We can only calculate an inverse of a **square** matrix. Why?

***

### Sidebar (again): Some Linear Algebra Theory 

In linear algebra, an **invertible matrix** is defined as follows:

    An n-by-n square matrix A is called invertible if there exists an n-by-n square matrix B such that

<div style="text-align:center"><span style="color:blue; font-family:Georgia; font-size:1.5em;">AB = BA = I</span></div>

    where I is the identity matrix. A and B are inverses of each other.
    
***
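
The definition can be checked numerically. This sketch reuses the 2-by-2 matrix from Part 4 and confirms that the matrix returned by `np.linalg.inv` satisfies both products. `np.allclose` is used because floating-point results are only approximately equal to the identity. The 3-by-2 matrix of ones and the `raised` flag are just illustrative.

```python
import numpy as np

A = np.array([[1, -2], [3, 2]])
B = np.linalg.inv(A)
I = np.eye(2)

print(np.allclose(A.dot(B), I))  # AB = I
print(np.allclose(B.dot(A), I))  # BA = I

# A non-square matrix cannot satisfy the definition; NumPy raises LinAlgError
raised = False
try:
    np.linalg.inv(np.ones((3, 2)))
except np.linalg.LinAlgError as err:
    raised = True
    print("LinAlgError:", err)
```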
    

By this definition, we can only find the inverse of square matrices. So with $X$ not being square (it has many more rows than columns), how can we solve this system using the data that we have? (No spoilers.)

<center>
 $b = (X^{T}X)^{-1}X^{T}y$ 
</center>
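
A short sketch of where this formula comes from, under the assumption that $X^{T}X$ is invertible (which holds when the columns of $X$ are linearly independent). $X$ is not square, but $X^{T}X$ always is, so we first multiply both sides by $X^{T}$ and then proceed as in Part 4:

$ y = Xb $

$ X^{T}y = X^{T}Xb $

$ (X^{T}X)^{-1}X^{T}y = (X^{T}X)^{-1}(X^{T}X)b $

$ (X^{T}X)^{-1}X^{T}y = I \cdot b = b $

With more observations than coefficients there is usually no $b$ that satisfies every equation exactly. The $b$ obtained this way is the least-squares solution: the one that minimizes the sum of squared differences between $y$ and $Xb$.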


Let's apply this to our data.

In [None]:
# X transposed, and the square matrix X^T X
xt = (X_df.values).T
xtx = np.matmul(xt, X_df.values)

# (X^T X)^{-1} X^T
product = np.matmul(np.linalg.inv(xtx), xt)

# b = (X^T X)^{-1} X^T y
b = np.matmul(product, y.values)
print(b)
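
Forming the inverse explicitly works here, but it is not the only way to get $b$. As a hedged alternative, the sketch below compares the formula above with `np.linalg.lstsq`, NumPy's least-squares solver, on small synthetic data. The arrays `X_demo` and `y_demo` and the coefficients `true_b` are invented for illustration, so the example runs without downloading the Auto dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic design matrix: 50 observations, 2 features plus a constant column
X_demo = np.column_stack([rng.normal(size=(50, 2)), np.ones(50)])
true_b = np.array([2.0, -1.0, 0.5])  # hypothetical coefficients
y_demo = X_demo.dot(true_b) + rng.normal(scale=0.1, size=50)

# Normal equation, written as in the cell above
xt = X_demo.T
b_normal = np.linalg.inv(xt.dot(X_demo)).dot(xt).dot(y_demo)

# Least-squares solver; returns (solution, residuals, rank, singular values)
b_lstsq, _, _, _ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(b_normal)
print(np.allclose(b_normal, b_lstsq))
```

The two results should agree closely on well-conditioned data like this. A dedicated solver is generally the safer choice when columns of $X$ are nearly collinear, since inverting $X^{T}X$ can then amplify rounding errors.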

Now we have our coefficients! They correspond to the columns in `X_df`, in order. Let's pair them up, then compare them to the coefficients from a `sklearn` model.

In [None]:
list(zip(X_df.columns, b))

In [None]:
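# compare with sklearn's LinearRegression

from sklearn.linear_model import LinearRegression
lr = LinearRegression()

# LinearRegression fits its own intercept by default, so drop our constant column
skl_X = X_df.drop(columns = 'constant')
lr.fit(skl_X,y)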

In [None]:
print('constant: ', lr.intercept_)
print('coefficients: ', lr.coef_)

### Additional Resources
* [3Blue1Brown](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_a)
* [Matrix approach to linear regression](http://www.stat.columbia.edu/~fwood/Teaching/w4315/Fall2009/lecture_11)
* [Fun Desmos interaction](https://www.desmos.com/calculator/yovo2ro9me)
* [Good video on scalars and vectors](https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
* [What is X^T * X?](https://stats.stackexchange.com/questions/267948/intuitive-explanation-of-the-xtx-1-term-in-the-variance-of-least-square/267963)