[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jaidevd/linalg-numpy/blob/master/03_linear_systems.ipynb)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

### The single most important equation in linear systems

$$\mathbf{y} = \mathbf{A}\mathbf{x}$$

### Or

$$\mathbf{Y} = \mathbf{A}\mathbf{X}$$

### Where $\mathbf{x}$ is the input, $\mathbf{y}$ is the output, or observations, and $\mathbf{A}$ is a matrix of coefficients.
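As a minimal sketch of this equation in NumPy, the product below maps an input vector to an output vector and then recovers the input from the output. The values of `A` and `x` are made up purely for illustration.

```python
import numpy as np

# Hypothetical 2x2 coefficient matrix and input vector (illustrative values)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 2.0])

# Forward problem: compute the output y from a known input x
y = A @ x

# Inverse problem: recover x from the observations y
x_recovered = np.linalg.solve(A, y)
```

Most of what follows is the inverse problem: `A` and `y` are known, and `x` has to be found.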

--------------
# Linear System of Equations

### Question: Why does it take two points to define a line?

In [None]:
# pick any two points, at random, between 0 and 10

# First point - P
px, py = np.random.randint(0, high=11, size=(2,))

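# Second point - Q
# Redraw until Q's x-coordinate differs from P's; otherwise the line is
# vertical and cannot be written as y = mx + c
qx, qy = np.random.randint(0, high=11, size=(2,))
while qx == px:
    qx, qy = np.random.randint(0, high=11, size=(2,))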

fig, ax = plt.subplots()
ax.scatter([px, qx], [py, qy], s=100)
ax.annotate('P', [px, py], fontsize='xx-large')
ax.annotate('Q', [qx, qy], fontsize='xx-large')
ax.axis([-1, 12, -1, 12])
ax.set_aspect('auto')

### Assume that the two points are joined by a line
$$y = mx + c$$
### i.e.
$$p_{y} = mp_{x} + c$$
### and
$$q_{y} = mq_{x} + c$$


### Exercise: Arrange the equations above in the form

$$\mathbf{d} = \mathbf{A}\mathbf{b}$$

### What are $\mathbf{A}$, $\mathbf{b}$ and $\mathbf{d}$?

In [None]:
np.linalg.solve?

### Exercise: Construct the matrices $\mathbf{A}$, $\mathbf{b}$ and $\mathbf{d}$ with NumPy and solve for the slope and the intercept of the line


In [None]:
### Put the slope in the variable `m` and the intercept in a variable `c`.
### Then run the next cell to check your solution
# enter code here

In [None]:
xx = np.linspace(0, 10, 100)
yy = m * xx + c
fig, ax = plt.subplots()
ax.scatter([px, qx], [py, qy], s=100)
ax.annotate('P', [px, py], fontsize='xx-large')
ax.annotate('Q', [qx, qy], fontsize='xx-large')
ax.plot(xx, yy)

# What you just solved was a trivial form of linear regression!
-----------------
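One way to see the connection to regression is sketched here with two fixed, hypothetical points rather than the random `P` and `Q` used earlier. Solving the 2x2 system exactly and fitting a degree-1 polynomial by least squares with `np.polyfit` should give the same slope and intercept.

```python
import numpy as np

# Hypothetical points with distinct x-coordinates (not the random P and Q)
p = (1.0, 3.0)
q = (4.0, 9.0)

# Each row encodes one equation of the form y = m * x + c * 1
A = np.array([[p[0], 1.0],
              [q[0], 1.0]])
d = np.array([p[1], q[1]])

# Exact solution of the 2x2 system
m_solve, c_solve = np.linalg.solve(A, d)

# Least-squares fit of a degree-1 polynomial through the same two points
m_fit, c_fit = np.polyfit([p[0], q[0]], [p[1], q[1]], deg=1)
```

If the two points share an x-coordinate, the rows of `A` are identical, the matrix is singular, and `np.linalg.solve` raises `LinAlgError`.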

# Types of Linear Systems

* ## Ideal System:
  - ### number of equations = number of unknowns
  - ### Unique solution (when the coefficient matrix is invertible)

* ## Underdetermined System:
  - ### number of equations < number of unknowns
  - ### Infinitely many solutions (or none, if the equations are inconsistent)

* ## Overdetermined System:
  - ### number of equations > number of unknowns
  - ### Generally no exact solution; look for a best-fit (least-squares) solution instead
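The three cases can be sketched with small made-up systems. The matrices and right-hand sides below are illustrative only. `np.linalg.solve` handles the square case, and `np.linalg.lstsq` is used for the other two.

```python
import numpy as np

# Square system (2 equations, 2 unknowns): unique solution if A is invertible
A_square = np.array([[1.0, 1.0],
                     [1.0, -1.0]])
y_square = np.array([3.0, 1.0])
x_square = np.linalg.solve(A_square, y_square)

# Underdetermined (1 equation, 2 unknowns): infinitely many solutions;
# lstsq returns the one with the smallest Euclidean norm
A_under = np.array([[1.0, 1.0]])
y_under = np.array([2.0])
x_under, *_ = np.linalg.lstsq(A_under, y_under, rcond=None)

# Overdetermined (3 equations, 2 unknowns): no exact solution here;
# lstsq returns the x that minimizes ||A x - y||^2
A_over = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
y_over = np.array([1.0, 1.0, 3.0])
x_over, residuals, rank, _ = np.linalg.lstsq(A_over, y_over, rcond=None)
```

In the overdetermined case, `residuals` holds the sum of squared errors that remains after the best fit. It is nonzero here because no single `x` satisfies all three equations.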

# Application: Linear Regression
## We want to fit a straight line through the following dataset:

In [None]:
import pandas as pd
df = pd.read_csv('data/hwg.csv')
fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(df['Height'], df['Weight'], alpha=0.2)

### Question: What type of system of equations is this? Ideal, underdetermined or overdetermined?

### Each y-coordinate, $y_{i}$, can be defined as:
### $$y_{i} = x_{i}\beta + \epsilon_{i}$$
### where $\epsilon_{i}$ is the error in the $i$-th observation.

## Ordinary Least Squares solution
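### Optimal solution: find the $\beta$ that minimizes the sum of squared errors:

### $$S(\beta) = \|\mathbf{y} -\mathbf{X}\beta\|^2$$

### The optimal $\beta$ is:
### $$\hat{\beta} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$$
### Here $\mathbf{X}$ is the matrix whose rows are the inputs $x_{i}$, and $\mathbf{y}$ is the vector of observations.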
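
A brief sketch of where this formula comes from, writing the matrix of inputs as $\mathbf{X}$ and assuming $\mathbf{X}^{T}\mathbf{X}$ is invertible: expand the squared norm, set its gradient with respect to $\beta$ to zero, and solve the resulting normal equations.

$$S(\beta) = (\mathbf{y} - \mathbf{X}\beta)^{T}(\mathbf{y} - \mathbf{X}\beta)$$

$$\nabla_{\beta} S(\beta) = -2\,\mathbf{X}^{T}(\mathbf{y} - \mathbf{X}\beta) = 0$$

$$\mathbf{X}^{T}\mathbf{X}\,\hat{\beta} = \mathbf{X}^{T}\mathbf{y}$$

$$\hat{\beta} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$$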

In [None]:
np.transpose?

In [None]:
np.linalg.inv?

In [None]:
np.dot?

In [None]:
X = np.c_[np.ones((df.shape[0],)), df['Height'].values]
Y = df['Weight'].values.reshape(-1, 1)

### Exercise: Use the formula above to find the optimal $\beta$, given `X` and `Y` as defined above.
### Place your solution in a variable named `beta`,
### then run the cell below to check your solution

In [None]:
# enter code here

In [None]:
fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(df['Height'], df['Weight'], alpha=0.2)
ax.plot(X[:, 1], np.dot(X, beta).ravel(), 'g')
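
A result like `beta` can be cross-checked against NumPy's own least-squares routine. The sketch below uses synthetic data as a stand-in for `data/hwg.csv` (the numbers are made up). It also solves the normal equations with `np.linalg.solve` rather than the explicit inverse the exercise asks for, a swapped-in variant that avoids forming the inverse.

```python
import numpy as np

# Synthetic stand-in for the height/weight data (values are made up)
rng = np.random.default_rng(0)
height = rng.uniform(60.0, 75.0, size=200)
weight = -100.0 + 3.0 * height + rng.normal(0.0, 5.0, size=200)

# Design matrix with a column of ones for the intercept
X = np.c_[np.ones(height.shape[0]), height]
Y = weight.reshape(-1, 1)

# Normal-equation solution: solve (X^T X) beta = X^T Y without an explicit inverse
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Reference solution from NumPy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The two estimates should agree up to floating-point error
print(np.allclose(beta_normal, beta_lstsq, rtol=1e-6))
```

The first entry of each estimate is the intercept and the second is the slope, matching the column order of `X`.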