In [9]:
import pathlib
import sklearn.metrics
import sklearn.linear_model

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import linear_regression as lr

# 1. Linear Regression

From school you know functions of the form $y = f(x)$, in particular linear functions $y = mx + b$. Here $x \in \mathbb{R}$ and $y \in \mathbb{R}$ are variables, whereas $m$ and $b$ are parameters. In our first task, we study how to *learn* the relationship between $x$ and $y$ in a dataset with a linear machine learning (ML) model.

## 1.1 Load and Visualize the Dataset
Let's load and plot our dataset first!

In [None]:
# Load the dataset
X, y = lr.load_data()
print(X, y)

# Plot the dataset
lr.scatter_data(X, y)

Our dataset is a list of $N=200$ data points $(x^{(i)}, y^{(i)})$ with $i=1,2,\dots,N$. The data look roughly **linear but noisy**. We can express this belief with the following formula:
$$ y^{(i)} = f(x^{(i)}) + \varepsilon^{(i)} = m x^{(i)} + b + \varepsilon^{(i)},$$
where $\varepsilon^{(i)}$ is the random noise on the $i$-th data point and $(m, b)$ are unknown parameters.
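To make this assumed data-generating process concrete, here is a minimal sketch that draws a synthetic dataset of the same form. The "true" parameters `m_true` and `b_true`, the noise level `sigma`, the input range, and the choice of Gaussian noise are all hypothetical values picked for illustration; they are not the parameters behind `lr.load_data()`.

```python
import numpy as np

# Hypothetical "true" parameters and noise level (illustration only)
m_true, b_true, sigma = 2.0, -1.0, 0.5
N = 200

rng = np.random.default_rng(seed=0)

# Inputs as a column vector of shape (N, 1), the layout scikit-learn expects for X
X_syn = rng.uniform(-3.0, 3.0, size=(N, 1))

# Assumption: independent Gaussian noise eps^(i) for every data point
eps = rng.normal(loc=0.0, scale=sigma, size=N)

# y^(i) = m x^(i) + b + eps^(i)
y_syn = m_true * X_syn[:, 0] + b_true + eps

print(X_syn.shape, y_syn.shape)  # (200, 1) (200,)
```

Each run with the same seed gives the same dataset; changing `sigma` makes the point cloud tighter or noisier around the line.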

### Q1.1
1. How large is the dataset?
2. What is $x^{(i)}$? What is $y^{(i)}$?
3. How many features does each data point have? 

## 1.2 Fitting Parameters Manually
We believe our data can be **approximated** by a linear function, but we do not know the parameters $m$ and $b$.
Let's begin with a simple approach: we use a **linear model** $\hat{f}$ of the form
$$ \hat{y}^{(i)} = \hat{f}(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$$
and just guess the model parameters $\theta = (\theta_0, \theta_1)$.
The variable $\hat{y}^{(i)}$ is called the **estimated outcome**.
We want it to be close to the **actual outcome** $y^{(i)}$.
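Because the model is an affine function of $x$, NumPy can evaluate it for all data points at once. A minimal sketch with hypothetical parameter guesses and a few made-up inputs:

```python
import numpy as np

# Hypothetical parameter guesses (not fitted to anything)
theta = np.array([1.0, 0.5])  # (theta_0, theta_1)

# A few made-up example inputs
x = np.array([0.0, 2.0, 4.0])

# y_hat^(i) = theta_0 + theta_1 * x^(i), evaluated element-wise for all i
y_hat = theta[0] + theta[1] * x
print(y_hat.tolist())  # [1.0, 2.0, 3.0]
```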

In [None]:
# Choose your model parameters (replace these placeholder guesses with your own)
theta_0 = 0.0
theta_1 = 1.0

# The linear regression model: y_hat = theta_0 + theta_1 * x
def linear_regression_model(x):
    return theta_0 + theta_1 * x

# Plot the dataset and your model fit
lr.plot_prediction(X, y, linear_regression_model)

### Q1.2
1. What are the parameters of our model?
2. What is $\hat{y}^{(i)}$?
3. Can our dataset be modeled with a linear function $f(x)$ **exactly**? What would that mean in terms of $\hat{y}^{(i)}$ and $y^{(i)}$?

## 1.3 How Good is our Fit?
We found some model parameters with the time-honored **trial and error** method, fitting the model to our data **visually**. However, instead of relying on our eyes, let's **measure** how well our model fits the data. In ML we use **error metrics** to quantify the prediction performance of a model. The prediction error for data point $i$ is given by
$$e^{(i)} = y^{(i)} - \hat{y}^{(i)}.$$
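A tiny worked example with made-up numbers shows the sign convention: a positive error means the model predicted too low, a negative error means it predicted too high, and zero means a perfect prediction for that point.

```python
import numpy as np

# Made-up actual and estimated outcomes for three data points
y_true = np.array([3.0, 5.0, 2.0])
y_hat = np.array([2.5, 5.5, 2.0])

# e^(i) = y^(i) - y_hat^(i)
e = y_true - y_hat
print(e.tolist())  # [0.5, -0.5, 0.0]
```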
Let's plot our prediction error for a few data points!

In [None]:
lr.plot_prediction_error(X, y, linear_regression_model)

Our error metric is based on the prediction error and is called the **Mean Absolute Error (MAE)**:
$$ \text{MAE}(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} |e^{(i)}| = \frac{1}{N} \sum_{i=1}^{N} |y^{(i)} - \hat{y}^{(i)}|$$
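Written out in NumPy, the MAE is simply the mean of the absolute prediction errors. This sketch, on made-up values, checks a hand-written version against `sklearn.metrics.mean_absolute_error`, which the notebook uses next. The helper name `mae` is hypothetical; the `ravel()` calls let it accept predictions of shape `(N, 1)`, such as those produced from a column-vector `X`.

```python
import numpy as np
import sklearn.metrics

def mae(y, y_hat):
    """Mean absolute error: (1/N) * sum_i |y^(i) - y_hat^(i)|."""
    y = np.asarray(y, dtype=float).ravel()
    y_hat = np.asarray(y_hat, dtype=float).ravel()
    return np.mean(np.abs(y - y_hat))

# Made-up actual and estimated outcomes
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.5, 2.0, 6.0])

# Absolute errors are 0.5, 0.5, 0.0, 1.0, so the MAE is 2.0 / 4 = 0.5
print(mae(y_true, y_hat))  # 0.5
print(sklearn.metrics.mean_absolute_error(y_true, y_hat))  # 0.5
```

Because every error enters with its absolute value, positive and negative errors cannot cancel each other out.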

In [None]:
# Let's predict with our own model
y_pred = linear_regression_model(X)
mae_own = sklearn.metrics.mean_absolute_error(y, y_pred)
print(f"Our MAE is {mae_own}")

## 1.4 Automatically Fitting Parameters with ML
To fit a linear function to the data automatically, we can use an ML model called Linear Regression (LR).
Let's see which parameters $\theta$ are found by the LR model and the MAE it achieves on our dataset.

In [None]:
# Fit a linear regression (LR) model
reg = sklearn.linear_model.LinearRegression()
reg.fit(X, y)

# Print the learned parameters
print(f"The learned parameters are theta={[reg.intercept_, *reg.coef_]}")
y_pred = reg.predict(X)
# Pass the prediction function, as we did with our own model above
lr.plot_prediction(X, y, reg.predict)

# Print the MAE of the LR model
mae_lr = sklearn.metrics.mean_absolute_error(y, y_pred)
print(f"The MAE of the LR model is {mae_lr}")
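scikit-learn's `LinearRegression` fits its parameters by ordinary least squares, i.e. it minimizes the sum of squared errors $\sum_i (e^{(i)})^2$ rather than the MAE. A minimal sketch of the same fit with NumPy's least-squares solver, on synthetic data with hypothetical true parameters $\theta_0 = -1$ and $\theta_1 = 2$, illustrates that both routes should arrive at the same $\theta$:

```python
import numpy as np
import sklearn.linear_model

rng = np.random.default_rng(seed=1)

# Synthetic data with hypothetical true parameters theta_0 = -1, theta_1 = 2
X_demo = rng.uniform(-3.0, 3.0, size=(200, 1))
y_demo = -1.0 + 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=200)

# Design matrix: a column of ones (for theta_0) next to the inputs (for theta_1)
A = np.hstack([np.ones((X_demo.shape[0], 1)), X_demo])

# Least squares: theta = argmin ||A theta - y||^2
theta_lstsq, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

# The same fit with scikit-learn
reg_demo = sklearn.linear_model.LinearRegression().fit(X_demo, y_demo)
theta_sklearn = np.array([reg_demo.intercept_, *reg_demo.coef_])

# Both estimates should agree with each other and lie near (-1, 2)
print(theta_lstsq, theta_sklearn)
```

Since the LR model optimizes squared errors, its MAE is not guaranteed to be the smallest achievable MAE, although on roughly linear data it is usually close.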

### Q1.4
1. Explain what the MAE measures.
2. What do we *learn* when we train our model?
3. Is your model similar to the LR model?