# Simple Linear Regression

We start our section on regression with the simplest model, simple linear regression.

## What we will accomplish in this notebook

In this notebook we will:
- Introduce the simple linear regression model,
- Discuss and visualize its assumptions,
- Demonstrate how to fit the model theoretically and practically.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from seaborn import set_style

## First we import the model class
from sklearn.linear_model import LinearRegression

set_style("whitegrid")

## The model

In simple linear regression (SLR) we have a variable we would like to predict, $y$, and a single feature $x$. The form of $f$ in the supervised learning framework we have discussed is as follows:

$$
y = f(x) + \epsilon = \beta_0 + \beta_1 x + \epsilon,
$$

where $\beta_0, \beta_1 \in \mathbb{R}$ are constants we must estimate and we assume that $\epsilon \sim N(0,\sigma^2)$ is a normally distributed error term independent of $x$.

### Visualizing the model

Let's think about what this model is saying about the outcome variable, $y$. For help we will look at the picture drawn below.

<img src="lecture_3_assets/slr_curves.png" width="60%"></img>

Above we see both the systematic part and the random error. For a given value of $x$ you can find the theoretically possible values for $y$ by going to the line $\beta_0 + \beta_1 x$ and randomly drawing an error term from the normal distribution centered on the line. We can also see one of our key assumptions at play: no matter what the value of $x$, our errors are drawn from the same exact bell curve.
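
The picture can also be mimicked numerically. The following is a minimal sketch, assuming made-up values for $\beta_0$, $\beta_1$ and $\sigma$ (the names `beta_0`, `beta_1`, `sigma` and the seed are illustrative only). For a few fixed values of $x$ it draws many values of $y$ and checks that they center on the line with the same spread each time.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration
beta_0, beta_1, sigma = 1.0, 2.0, 0.5

rng = np.random.default_rng(440)

# For each fixed x, draw many y values:
# the point on the line plus a normally distributed error
summary = {}
for x_value in [0.0, 0.5, 1.0]:
    errors = rng.normal(loc=0.0, scale=sigma, size=50_000)
    y_draws = beta_0 + beta_1 * x_value + errors
    summary[x_value] = (y_draws.mean(), y_draws.std())
    print(
        f"x = {x_value}: mean of y = {y_draws.mean():.3f}, "
        f"std of y = {y_draws.std():.3f}"
    )
```

The sample means should track $\beta_0 + \beta_1 x$, while the sample standard deviations should all sit near the single value $\sigma$, which is the "same bell curve" assumption.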

You can look at a 3D version of the same diagram [here](https://www.desmos.com/3d/09db6f9c8d).

If our assumptions hold, we can derive some nice properties of the estimates and predictions produced by fitting this model. We may touch on these in our problem session and/or the homework.

### Fitting the model

Given $n$ observations of pairs $(x_i,y_i)$, $i = 1,\dots,n$, how do we fit this model? What do we need to estimate? Remember that our goal is to find an estimate of $f$ called $\hat{f}$. For SLR this means that we need to estimate $\beta_0$ and $\beta_1$, i.e. we need to find $\hat{\beta_0}$ and $\hat{\beta_1}$.

#### Minimizing mean square error (MSE)

We find $\hat{\beta_0}$ and $\hat{\beta_1}$ by minimizing a <i>loss function</i>, namely the mean square error (MSE), which is given by:

$$
\operatorname{MSE}(\beta) = \frac{1}{n}\sum_{i=1}^n (y_i - f_\beta(x_i))^2.
$$

For the particular case of SLR this is:

$$
\operatorname{MSE}(\beta_0,\beta_1) = \frac{1}{n}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2.
$$

The MSE is the average squared difference between the model's estimate and the actual value. For a measure of the typical error that is on the same scale as $y$, you can take the square root of the MSE, known as the root mean square error (RMSE).
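
As a small illustration, the MSE and RMSE of a candidate line can be computed directly with `numpy`. This is a sketch under stated assumptions: the helper names `mse` and `rmse` and the toy values below are made up for this example and are not part of any library.

```python
import numpy as np


def mse(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


def rmse(y_true, y_pred):
    """Square root of the MSE, on the same scale as y."""
    return np.sqrt(mse(y_true, y_pred))


# Made-up toy observations
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
y_toy = np.array([1.0, 3.5, 4.5, 7.0])

# Predictions from a candidate line with beta_0 = 1 and beta_1 = 2
y_pred_toy = 1.0 + 2.0 * x_toy

print(mse(y_toy, y_pred_toy))   # residuals are 0, 0.5, -0.5, 0, so the MSE is 0.125
print(rmse(y_toy, y_pred_toy))
```

Trying other values of the intercept and slope in `y_pred_toy` shows how the MSE changes as the candidate line moves.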

You can look at a 2D visualization of the MSE at [this link](https://www.desmos.com/calculator/ewqexkfjm1) and a 3D visualization of the MSE at [this link](https://www.desmos.com/3d/72e4cb5e40).

You can do some mathematics to find the values $\hat{\beta_0}$ and $\hat{\beta_1}$ that minimize the MSE. This was covered in math hour. The five-second summary is that you can either:

* Use calculus:  take the gradient of the MSE with respect to the parameters and set it equal to zero.
* Use linear algebra:  use dot products to project $\vec{y}$ onto the subspace spanned by $\vec{1}$ and $\vec{x}$.
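
As a brief sketch of the calculus route, setting both partial derivatives of the SLR form of the MSE to zero gives

$$
\frac{\partial \operatorname{MSE}}{\partial \beta_0} = -\frac{2}{n}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0,
$$

$$
\frac{\partial \operatorname{MSE}}{\partial \beta_1} = -\frac{2}{n}\sum_{i=1}^n x_i\,(y_i - \beta_0 - \beta_1 x_i) = 0.
$$

These are two linear equations in the two unknowns $\beta_0$ and $\beta_1$. They also say that the residual vector is orthogonal to $\vec{1}$ and to $\vec{x}$, which is the condition the projection in the linear algebra route imposes.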

Either way you do it we find that

$$
\hat{\beta_0} = \overline{y} - \hat{\beta_1} \overline{x}, \text{ and}
$$

$$
\hat{\beta_1} = \frac{\sum_{i=1}^n \left( x_i - \overline{x}\right)\left( y_i - \overline{y} \right)}{\sum_{i=1}^n \left(x_i - \overline{x} \right)^2} = \frac{\text{cov}(x,y)}{\text{var}(x)},
$$

where $\overline{x}$ and $\overline{y}$ are the means of $x$ and $y$ respectively, $\text{cov}$ denotes the sample covariance and $\text{var}$ denotes the sample variance.
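
The formulas can be checked numerically. The following is a minimal sketch on simulated data (the variable names, seed, and true coefficients are made up for illustration). It computes the estimates from the sums, from the sample covariance and variance, and from a least squares projection using `np.linalg.lstsq`.

```python
import numpy as np

# Simulated data with made-up true values beta_0 = 1, beta_1 = 2
rng = np.random.default_rng(216)
x_demo = rng.random(200)
y_demo = 1.0 + 2.0 * x_demo + 0.5 * rng.standard_normal(200)

# Estimates straight from the sums in the formulas
x_bar, y_bar = x_demo.mean(), y_demo.mean()
beta_1_hat = np.sum((x_demo - x_bar) * (y_demo - y_bar)) / np.sum(
    (x_demo - x_bar) ** 2
)
beta_0_hat = y_bar - beta_1_hat * x_bar

# Same slope from sample covariance over sample variance
# (ddof=1 in both, so the 1/(n-1) factors cancel)
beta_1_from_cov = np.cov(x_demo, y_demo, ddof=1)[0, 1] / np.var(x_demo, ddof=1)

# Linear algebra route: least squares onto the columns 1 and x
design = np.column_stack([np.ones_like(x_demo), x_demo])
lstsq_solution, *_ = np.linalg.lstsq(design, y_demo, rcond=None)

print(beta_0_hat, beta_1_hat)
print(lstsq_solution)
```

All three routes should agree up to floating point error, and the estimates should land near, but not exactly on, the true values used to simulate the data.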

<i>Note:</i> MSE is used as the default loss function for simple linear regression for a number of reasons stemming from its roots as a statistical regression technique. Importantly, MSE is differentiable with respect to $\beta_i$ and is a convex function. As seen in math hour we also minimize the MSE when performing a maximum likelihood estimate of the parameters.  However, MSE is not the only loss function people consider in this type of model. Check out the corresponding `Optional Extra Practice` notebook to learn about mean absolute error (MAE).
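
The maximum likelihood connection can be sketched briefly, assuming the observations are independent and the errors follow the $N(0,\sigma^2)$ model above. The log-likelihood of the sample is

$$
\log L(\beta_0,\beta_1,\sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{n}{2\sigma^2}\operatorname{MSE}(\beta_0,\beta_1).
$$

For any fixed $\sigma^2 > 0$, only the last term depends on $\beta_0$ and $\beta_1$, so maximizing the likelihood over these two parameters is the same as minimizing the MSE.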

## Implementing SLR in `sklearn`

While we can code up coefficient estimates for SLR using the formulae we just derived, we can also use `sklearn`'s `LinearRegression` model object.

Here is the documentation for `LinearRegression`, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html">https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html</a>. We will show how to fit the model with some randomly generated data, but in the next notebook we will work with some real data.

In [None]:
## Making some data
np.random.seed(321)
X = np.random.random(100)
y = 2 * X + 1 + 0.5 * np.random.randn(100)

In [None]:
plt.figure(figsize=(6, 5))

plt.scatter(X, y)

plt.xlabel("$X$", fontsize=12)
plt.ylabel("$y$", fontsize=12)

plt.show()

`sklearn` is <b>the</b> Python machine learning model package. We will use it frequently throughout these notebooks. `sklearn` models follow a common pattern, which we will now demonstrate.

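##### Import the model class

We already did this in the first code cell with `from sklearn.linear_model import LinearRegression`.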

##### Make a model object

In [None]:
## Now we make an instance of the model
## To do this just call the name of the model class, LinearRegression()
## Sometimes there are optional arguments; here copy_X and fit_intercept both default to True
## copy_X=True means X is copied before fitting rather than possibly overwritten
## fit_intercept=True means the intercept, beta_0_hat, is estimated
slr = LinearRegression()

In [None]:
slr

##### `fit`ting the model

In [None]:
X.reshape(-1, 1).shape

In [None]:
X

In [None]:
X.reshape(-1, 1)

In [None]:
## Now we fit the model
## this is typically model.fit(X, y)
## NOTE! X has to be a 2D array, think matrix or column vector
## Thus we must use .reshape(-1,1), see the Python Prep numpy notebook
slr.fit(X.reshape(-1, 1), y)
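
`sklearn` expects the feature array to have shape `(n_samples, n_features)`, which is why the 1D array is reshaped into a single column before fitting. A short `numpy`-only sketch of what `.reshape(-1, 1)` does, using a made-up array `x_1d`:

```python
import numpy as np

# A made-up 1D array of five observations
x_1d = np.arange(5, dtype=float)

# -1 asks numpy to infer the number of rows; 1 fixes a single column
x_col = x_1d.reshape(-1, 1)

# An equivalent way to add the column axis
x_col_alt = x_1d[:, np.newaxis]

print(x_1d.shape)   # (5,)
print(x_col.shape)  # (5, 1)
```

Both forms give five rows and one column, matching five samples of one feature.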

##### Making `predict`ions

In [None]:
## model.predict will tell us what the model says
## for an array of input values
slr.predict([[0], [1], [2], [3]])

Those are the basic steps for almost every `sklearn` model we will work with. However, models typically have attributes and methods that are unique to them. We will review a few of those for `LinearRegression` below.

##### Simple linear regression content

In [None]:
## We can look at beta_0_hat with .intercept_
slr.intercept_

In [None]:
## We can look at beta_1_hat with .coef_
## (an array with one entry per feature, so a single entry here)
slr.coef_
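
As a sanity check, the fitted `intercept_` and `coef_` should match the closed-form estimates from earlier, and `predict` should return the intercept plus the slope times the input. The following is a self-contained sketch on freshly simulated data (the names `x_check`, `y_check`, `model_check` and the seed are made up for this example, so the numbers will differ from the cells above).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Freshly simulated data, same recipe as above but a separate sample
rng = np.random.default_rng(321)
x_check = rng.random(100)
y_check = 2 * x_check + 1 + 0.5 * rng.standard_normal(100)

model_check = LinearRegression().fit(x_check.reshape(-1, 1), y_check)

# Closed-form estimates from the formulas in the fitting section
x_bar, y_bar = x_check.mean(), y_check.mean()
slope_by_hand = np.sum((x_check - x_bar) * (y_check - y_bar)) / np.sum(
    (x_check - x_bar) ** 2
)
intercept_by_hand = y_bar - slope_by_hand * x_bar

# predict should equal intercept_ + coef_[0] * x
x_new = np.array([[0.0], [1.0], [2.0]])
manual_predictions = model_check.intercept_ + model_check.coef_[0] * x_new.ravel()

print(model_check.intercept_, intercept_by_hand)
print(model_check.coef_[0], slope_by_hand)
print(model_check.predict(x_new), manual_predictions)
```

Each printed pair should agree up to floating point error.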

In [None]:
## Plotting the model with our sample

## y = 2*X + 1 + .5*np.random.randn(100)
plt.figure(figsize=(8, 5))

plt.scatter(X, y, alpha=0.7, label="Sample")

plt.plot(
    np.linspace(0, 1, 100),
    slr.predict(np.linspace(0, 1, 100).reshape(-1, 1)),
    "k",
    label=r"Model $\hat{f}$",
)

plt.plot(np.linspace(0, 1, 100), 2 * np.linspace(0, 1, 100) + 1, "r--", label="$f$")

plt.legend(fontsize=10)
plt.xlabel("$X$", fontsize=12)
plt.ylabel("$y$", fontsize=12)

plt.show()

Now you know the basics about simple linear regression and `LinearRegression` in `sklearn`!

--------------------------

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph.D., 2023.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md)