## Exercise 1: Line of Best Fit

In school, you probably worked with scatterplots and drew a line of best fit through the data, one that did not stray too far from the points. Drawing the line this way minimizes the **error** of the line of best fit.

It is important to choose the model that produces the smallest error when fitted to your data, since a lower error generally means more reliable outputs.

![image.png](../assets/ex6-line-best-fit.png)

There are many error functions to choose from when modelling data. In this exercise, we will look at *Ordinary Least Squares*. As the picture shows, the vertical distance between a data point and the line of best fit is called the error. If we square each error and take the average of the squared errors, we obtain the **Mean Square Error (MSE)** of the model.

For example, given

- a line of best fit $\hat y_i = mx_i$,
- data points $(-2,5)$, $(0,0)$, $(3,-6)$,

we can compute the MSE:

$$ MSE = \frac{1}{n}\sum_{i=1}^n (y_i - \hat y_i)^2$$

where $n$ is the number of data points, $y_i$ is the $y$ coordinate of the $i$-th data point, and $\hat y_i$ is the output the line of best fit predicts for that point's $x$ coordinate, $x_i$. Substituting $\hat y_i = mx_i$ gives:

$$ MSE = \frac{1}{n}\sum_{i=1}^n (y_i - mx_i)^2$$

Let's plug our data points into the error function for the model.

$$ MSE = \frac {1}{3}[(5-(-2)m)^2 + (0-(0)m)^2 + (-6-3m)^2]$$
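As a quick sanity check, the same error can be computed directly from the data points rather than by hand. The sketch below is only an illustration; the names `xs`, `ys`, and `mse` are made up for this example and are not part of the exercise.

```python
import numpy as np

# Data points from the example above
xs = np.array([-2.0, 0.0, 3.0])
ys = np.array([5.0, 0.0, -6.0])

def mse(m):
    """Mean squared error of the line y_hat = m * x on the data points."""
    residuals = ys - m * xs
    return np.mean(residuals ** 2)

print(mse(-2.0))  # residuals are 1, 0, 0, so this is 1/3
print(mse(0.0))   # residuals are 5, 0, -6, so this is 61/3
```

Trying a few values of `m` this way gives a feel for how the error changes with the slope before searching for the minimum.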

### Exercise 1.1

First, expand the equation for the MSE that we found above by hand and simplify it. Then write it as a function in the cell below and use code to figure out which value of $m$ minimizes it.

Remember to format it as:

```
def f(x):
    return...
```

When defining your function, use `x` instead of $m$.

In [1]:
import numpy as np
# scipy.misc.derivative was removed from recent SciPy releases; the derivative is approximated by hand in Exercise 1.3 instead.

In [2]:
# exercise 1.1

def f(x):
    return 1/3*(61+56*x+13*x**2)

m = np.random.uniform(-1000, 1000)  # random starting guess
for i in range(1000000):
    x = np.random.uniform(-1000, 1000)  # random candidate
    if f(x) < f(m):  # keep the candidate if it lowers the error
        m = x

print(m)

-2.1577444880480243


In [3]:
# The value of m that minimizes the MSE is approximately -2.15.
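
The random search only lands near the minimum. Because the expanded MSE is a quadratic in $m$, its minimum can also be located exactly at the vertex of the parabola. A minimal sketch, assuming the coefficients of the function `f` above (the names `a`, `b_coef`, and `m_exact` are illustrative only):

```python
# Coefficients of f(x) = (13*x**2 + 56*x + 61) / 3, written as a*x**2 + b*x + c
a = 13 / 3
b_coef = 56 / 3

# A parabola that opens upward has its minimum at x = -b / (2a)
m_exact = -b_coef / (2 * a)
print(m_exact)  # -28/13, approximately -2.1538
```

Note that $-28/13$ is also the first entry of the `W` array built in Exercise 1.2.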

### Exercise 1.2

1. Use the `linspace` function to create 29 evenly spaced points from -28 to 10 and save the result as `W`.
2. After that, divide every element in `W` by 13 and save this new result as `W`.

In [4]:
# exercise 1.2

W = np.linspace(-28,10,29)
W /= 13
W

array([-2.15384615, -2.04945055, -1.94505495, -1.84065934, -1.73626374,
       -1.63186813, -1.52747253, -1.42307692, -1.31868132, -1.21428571,
       -1.10989011, -1.00549451, -0.9010989 , -0.7967033 , -0.69230769,
       -0.58791209, -0.48351648, -0.37912088, -0.27472527, -0.17032967,
       -0.06593407,  0.03846154,  0.14285714,  0.24725275,  0.35164835,
        0.45604396,  0.56043956,  0.66483516,  0.76923077])

### Exercise 1.3

Evaluate `f(W)` and `fprime(W)` in the cell below and determine the value in `W` that makes `fprime` equal to 0 (or very close to it!).

Reuse the loop that you wrote in Exercise 1.1. That value is the $m$ for which our line of best fit has the smallest error.

NOTE: Make your print statement read "The value of m that gives the smallest error is..."

In [8]:
# exercise 1.3
m = W[0]
for w in W:
    if f(w) < f(m):
        m = w
print(f"The value of m that gives the smallest error is: {m}")

The value of m that gives the smallest error is: -2.1538461538461537


In [9]:
# exercise 1.3

def fprime(x):
    # forward-difference approximation of the derivative of f
    h = 1e-5
    return (f(x+h) - f(x))/h

m = W[0]
for w in W:
    # keep the point whose derivative is closest to zero
    if abs(fprime(w)) < abs(fprime(m)):
        m = w
print(f"The value of m that gives the smallest error is: {m}")

The value of m that gives the smallest error is: -2.1538461538461537
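
The same search can also be written without an explicit loop. The sketch below is one possible alternative, not the required solution: it evaluates a central-difference approximation of the derivative on all of `W` at once and picks the entry where its absolute value is smallest. The helper name `fprime_central` is an assumption made for this example.

```python
import numpy as np

def f(x):
    return 1/3 * (61 + 56*x + 13*x**2)

def fprime_central(x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

W = np.linspace(-28, 10, 29) / 13

# Index of the grid point whose derivative is closest to zero
best = np.argmin(np.abs(fprime_central(W)))
print(f"The value of m that gives the smallest error is: {W[best]}")
```

Because NumPy applies `f` element-wise, `fprime_central(W)` returns one derivative estimate per grid point, and `np.argmin` replaces the comparison inside the loop.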


## Exercise 2: Multivariate Calculus + Linear Algebra

Up to now, we've looked at functions of a single variable, but what if our function has more than one variable and we want to take a derivative?

Going back to the error function from the line of best fit exercise, what if we wanted to fit the line

$$\hat y_i = mx_i + b$$

to the points $(-3,7)$, $(-2,5)$ and $(-1,3)$?

This gives the Mean Square Error function:

$$ f(m,b) = \frac{1}{n}\sum_{i=1}^n (y_i - mx_i - b)^2$$
$$f(m,b) = \frac {1}{3}[(7+3m-b)^2 + (5+2m-b)^2 + (3+m-b)^2]$$

Say we want to find the values of $m$ and $b$ that minimize this function. In this case, we use **partial derivatives**: a partial derivative is a derivative with respect to one of the variables while holding the other constant. Taking the partial derivatives of the above function with respect to $m$ and $b$, we get:

$$\frac{\partial f(m,b)}{\partial m} = \frac{2}{3}[(7+3m-b)(3) + (5+2m-b)(2) + (3+m-b)] $$

$$\frac{\partial f(m,b)}{\partial b} = \frac{2}{3}[(7+3m-b)(-1) + (5+2m-b)(-1) + (3+m-b)(-1)] $$

> To better understand how we obtained these derivatives by hand, watch [this video](https://youtu.be/TgIl15Nlg_U) for a more detailed explanation.

Let's clean up the above equations:

$$\frac{\partial f(m,b)}{\partial m} = \frac{2}{3}[34 + 14m - 6b] $$

$$\frac{\partial f(m,b)}{\partial b} = \frac{2}{3}[-15 -6m + 3b] $$
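
One way to gain confidence in the simplified partial derivatives is to compare them against a numerical approximation. The sketch below does this at an arbitrary test point; the names `f_mb`, `df_dm`, `df_db`, `m0`, and `b0` are assumptions for this example, and the test point has no special meaning.

```python
def f_mb(m, b):
    """MSE of the line y_hat = m*x + b on the points (-3,7), (-2,5), (-1,3)."""
    return ((7 + 3*m - b)**2 + (5 + 2*m - b)**2 + (3 + m - b)**2) / 3

def df_dm(m, b):
    """Simplified partial derivative with respect to m."""
    return 2/3 * (34 + 14*m - 6*b)

def df_db(m, b):
    """Simplified partial derivative with respect to b."""
    return 2/3 * (-15 - 6*m + 3*b)

# Central differences: vary one variable while holding the other constant
m0, b0, h = 0.5, -1.0, 1e-6
num_dm = (f_mb(m0 + h, b0) - f_mb(m0 - h, b0)) / (2 * h)
num_db = (f_mb(m0, b0 + h) - f_mb(m0, b0 - h)) / (2 * h)

print(num_dm, df_dm(m0, b0))  # the two values should agree closely
print(num_db, df_db(m0, b0))  # the two values should agree closely
```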

Since we want a minimum, we set both partial derivatives equal to 0. Multiplying both sides by $\frac{3}{2}$ then gives a familiar system of equations:

$$34 + 14m - 6b = 0$$

$$-15-6m+3b = 0$$

Converting to matrix form, we get:

$$\begin{bmatrix} 34 & 14 & -6 \\ -15 & -6 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ m \\ b \end{bmatrix} = \begin {bmatrix} 0\\ 0\end{bmatrix} $$

From here, we can use our standard matrix operations to solve for values of $m$ and $b$.

We can rewrite the above equation as:

$$\begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix} = \begin {bmatrix} -34\\ 15\end{bmatrix} $$

**EXTRA** Try to work out by hand how I was able to convert the first matrix equation into the second.

Use the cell below to write code that solves the above matrix equation for the values of $m$ and $b$ that minimize our error.

In [11]:
# exercise 2

Matrix = np.array([[14,-6],[-6,3]])
Solutions = np.array([-34,15])
x = np.linalg.solve(Matrix, Solutions)
m = x[0]
b = x[1]
print(f"m is {m} and b is {b}.")

m is -2.0 and b is 1.0.
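
As a cross-check, NumPy's built-in least-squares polynomial fit should arrive at the same slope and intercept. This is a minimal sketch and not part of the exercise; the names `xs`, `ys`, `m_fit`, and `b_fit` are illustrative.

```python
import numpy as np

# The three data points from this exercise
xs = np.array([-3.0, -2.0, -1.0])
ys = np.array([7.0, 5.0, 3.0])

# Degree-1 least-squares fit; coefficients are returned highest power first
m_fit, b_fit = np.polyfit(xs, ys, 1)
print(m_fit, b_fit)

# The three points lie on the fitted line, so the MSE is essentially zero
print(np.mean((ys - (m_fit * xs + b_fit))**2))
```

The error is essentially zero here because the three points happen to be collinear; with noisier data the minimum MSE would be positive.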


_Using this method, we can fit more complicated models that have more than one parameter to our data for better results!_