# Prelab 06: A review of residuals and fitting

In [None]:
%reset -f
%matplotlib widget
import data_entry2
import numpy as np
import matplotlib.pyplot as plt
import fit_plot


This prelab focuses on revisiting and reinforcing concepts from Lab 05. We start with a review of residual plots.

## 6.1 Review of residuals

**Your turn #1a:** Given experimental data and a model for that data, what is a residual?

**Your turn #1b:** What are the properties of a residual plot that inform us that the model is a good fit to the data?

##### **Answer for #1a:**

A residual is defined as the difference between experimental data and a model, i.e., $r_i = y_i - f(x_i)$.
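As a minimal sketch of this definition, the snippet below computes residuals for a straight-line model. The four data points and the parameter values are made up for illustration and do not come from any lab dataset.

```python
import numpy as np

# Hypothetical measurements (made up for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Assumed model parameters for y = m*x + b
m, b = 2.0, 1.0

# Model prediction at each measured x value
y_prediction = m * x + b

# Residual = data - model, one value per data point
residuals = y - y_prediction
print(residuals)
```

Each residual keeps the units of $y$, so the residual plot shares its vertical units with the original scatter plot.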

##### **Answer for #1b:**

A model that is a good fit to the data will have a residual plot with:
- No obvious trend
- A roughly equal scatter of points above and below the $x$-axis (the $r = 0$ line)
- And if the uncertainties are well characterized, we will also see
    + About 68% of error bars crossing the $x$-axis
    + Nearly all (about 95%) of doubled error bars crossing the $x$-axis
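The error-bar criteria can also be checked numerically rather than by eye. The sketch below assumes a hypothetical helper, `fraction_crossing_zero`, which is not part of `fit_plot` or `data_entry2`, and uses made-up residuals and uncertainties.

```python
import numpy as np

def fraction_crossing_zero(residuals, uncertainties, scale=1.0):
    """Fraction of error bars (half-length scale * uncertainty) that reach r = 0."""
    residuals = np.asarray(residuals, dtype=float)
    uncertainties = np.asarray(uncertainties, dtype=float)
    # An error bar crosses r = 0 when |residual| is no larger than its half-length
    crosses = np.abs(residuals) <= scale * uncertainties
    return np.mean(crosses)

# Hypothetical residuals and uncertainties (made up for illustration only)
residuals = np.array([0.1, -0.3, 0.05, 0.25, -0.1])
del_y = np.full(5, 0.2)

frac_single = fraction_crossing_zero(residuals, del_y)             # 3 of 5 cross
frac_double = fraction_crossing_zero(residuals, del_y, scale=2.0)  # all 5 cross
print(frac_single, frac_double)
```

With only a handful of points these fractions fluctuate a lot, so they should be read as rough guides and not as strict pass/fail thresholds.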

## 6.2 Use residuals to diagnose model fits

Now that we have reminded ourselves of how to use residuals to diagnose the goodness of fit of a model to a given set of experimental data, let's use these criteria to diagnose some fits. Below, we have three data sets to which we have tried to fit the model of a straight line with intercept; $y = mx + b$. We provide a `slope` and `intercept` for an initial fit and ask you to diagnose what needs to be changed to improve the quality of the fit.

In [None]:
# Run me to load our three data sets
# Make sure to hit the "Generate Vectors" button!

de1 = data_entry2.sheet("lab06_prelab_data.csv")

### Your turn #2 (Dataset 1)
Answer the questions below related to the fit shown and then for part c, update the fitting parameters to get a better fit. 

![dataset 1](https://i.ibb.co/r7ShPLk/data1.png)

#### **Your turn #2a:** Which feature(s) of the **residual** plot above indicates a problem with the fit of the model to the data?

##### **Answer #2a**

All of the residuals are negative: every point lies below the $r = 0$ line.

#### **Your turn #2b:** What should be changed about the model to improve the fit?

##### **Answer #2b**

The $y$-intercept of the model is too large, so we should decrease it.
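A small numerical sketch of this reasoning follows. It uses noise-free, made-up data generated from a line with slope 1 and intercept 1 (an assumption chosen to match the parameter values quoted in this exercise, not the actual dataset 1) and a model whose intercept is set to 1.1.

```python
import numpy as np

# Hypothetical noise-free data from y = 1*x + 1 (not the lab's dataset 1)
x = np.linspace(0.0, 10.0, 6)
y = 1.0 * x + 1.0

# Trial model with an intercept that is too large
slope, intercept = 1.0, 1.1
residuals = y - (slope * x + intercept)

# Every residual sits about 0.1 below the r = 0 line
print(residuals)
print(np.all(residuals < 0))
```

Because the model lies above every data point by the same amount, the residuals are shifted down uniformly, with no dependence on $x$.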

#### **Your turn #2c:** Run the `fit_plot.line` interactive fitting session below and update the fitting parameters to `slope = 1` and `intercept = 1.1` using the text entry boxes in the widget. Use the interactive fitting widget to update the fit to show a good fit in the scatter plot and residuals. What are your new slope and/or $y$-intercept?

In [None]:
# Use the text boxes to manually update the values to
# slope = 1, intercept = 1.1

fit_plot.line("dataset 1", xVec, y_1Vec, del_y_1Vec)

##### **Answer #2c:**

Changing the intercept from 1.1 to 1 in the code improves the model in the desired fashion.

### Your turn #3 (Dataset 2)

Answer the questions below related to the fit shown and then for part c, update the fitting parameters to get a better fit. 

![dataset 2](https://i.ibb.co/5sLwRZh/data2.png)

#### **Your turn #3a:** Which feature(s) of the residual plot above indicates a problem with the fit of the model to the data?

##### **Answer #3a**

We see an (upwards) linear trend in the residuals plot.

#### **Your turn #3b:** What should be changed about the model to improve the fit?

##### **Answer #3b**

The slope of the model is too small, so we should increase it.
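To see why, here is a short sketch in an assumed notation that does not appear in the lab materials: suppose the data follow a line with true slope $m_0$ and intercept $b_0$, plus random scatter $\epsilon_i$. The residuals for a trial model $f(x) = mx + b$ are then

$$r_i = y_i - f(x_i) = (m_0 x_i + b_0 + \epsilon_i) - (m x_i + b) = (m_0 - m)\,x_i + (b_0 - b) + \epsilon_i.$$

If $m < m_0$, the residuals rise linearly with $x$ with slope $m_0 - m > 0$, which is the upward trend described in the answer above. A wrong intercept alone ($m = m_0$, $b \neq b_0$) only shifts every residual by the constant $b_0 - b$.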

#### **Your turn #3c:** Run the `fit_plot.line` interactive fitting session below and update the fitting parameters to `slope = 1.98` and `intercept = 3` using the text entry boxes in the widget. Use the interactive fitting widget to update the fit to show a good fit in the scatter plot and residuals. What are your new slope and/or $y$-intercept?

##### **Answer #3c**

Changing the slope from 1.98 to 2 in the code improves the model in the desired fashion.

##### **Cell to generate plots for dataset 2**

In [None]:
# Use the text boxes to manually update the values to
# slope = 1.98, intercept = 3

fit_plot.line("dataset 2", xVec, y_2Vec, del_y_2Vec)

### Your turn #4 (Dataset 3)

**Note:** For this dataset, the issue with the model is a bit harder to fix compared to the other two examples; for this part it is alright to skip the step of fixing the model in the code, and just comment on what appears to be wrong and how one could feasibly improve the fit of the model to the data.

![dataset 3](https://i.ibb.co/r0n2b5T/data3.png)

#### **Your turn #4a:** Which feature(s) of the **residual** plot above indicates a problem with the fit of the model to the data?

##### **Answer #4a**

We notice from the residuals plot that there is an upward parabolic trend.

#### **Your turn #4b:** What should be changed about the model to improve the fit?

##### **Answer #4b**

This parabolic trend tells us that the current linear model is not a good fit for the data; we should add a (positive) quadratic term.
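The sketch below illustrates this with made-up, noise-free quadratic data (not the lab's dataset 3). It uses `np.polyfit` only as a convenient way to obtain the least-squares straight line; that function is an assumption of this example and is not the fitting method used in this prelab.

```python
import numpy as np

# Hypothetical noise-free data from a quadratic (made up for illustration)
x = np.linspace(0.0, 10.0, 11)
y = 0.01 * x**2 + 1.0 * x + 3.0

# Least-squares straight line through the data: returns [slope, intercept]
m, b = np.polyfit(x, y, deg=1)

# Residuals of the best straight line
residuals = y - (m * x + b)

# Signs at the left end, the middle, and the right end:
# positive, negative, positive, i.e. an upward-opening parabola
print(np.sign(residuals[[0, 5, 10]]))
```

Even the best possible straight line leaves a curved pattern in the residuals, so adjusting the slope and intercept cannot remove it.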

#### **Your turn #4c (optional):** Play around with the interactive fitting below to convince yourself that no matter which combination of slope and intercept you try, a linear model will not result in a good fit.

In [None]:
# Use the text boxes to manually update the values to
# slope = 1, intercept = 3

fit_plot.line("dataset 3", xVec, y_3Vec, del_y_3Vec)

## 6.3 Beyond linear models: quadratic models

### 6.3.1 Definition

Data that is expected to be linear may, in reality, be more complicated. In that case, the assumption of linearity fails to account for other significant (non-linear) effects. Therefore, it is beneficial to understand how to modify our modeling and plotting code to include non-linear effects.

**Quadratic models** are represented by a quadratic function, usually written as (standard form):

$$y = f(x) = ax^2 + bx + c,$$

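where $a$ is the **quadratic coefficient**, $b$ is the **linear coefficient**, and $c$ is the **constant term**. Note that $b$ here is **not** the same parameter as in the linear model $y = mx + b$: there $b$ is the $y$-intercept, whereas here $b$ multiplies $x$ and $c$ is the $y$-intercept.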

The graph of a quadratic function is a **parabola**.
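As a small sketch, the standard form can be wrapped in a function. The name `quadratic_model` is a hypothetical helper introduced here for illustration; the parameter values are the same kind of trial values used for manual fitting.

```python
import numpy as np

def quadratic_model(x, a, b, c):
    """Return a*x**2 + b*x + c for a scalar or NumPy array x."""
    return a * x**2 + b * x + c

x = np.array([0.0, 1.0, 2.0])

# Model values at x = 0, 1, 2: approximately 3, 4.01 and 5.04
curve = quadratic_model(x, a=0.01, b=1.0, c=3.0)
print(curve)

# With a = 0 the parabola reduces to a straight line with slope b and intercept c
line = quadratic_model(x, a=0.0, b=1.0, c=3.0)
print(line)
```

Setting `a = 0` recovers the linear model, which is why a small quadratic coefficient acts as a gentle correction to a fit that is already nearly linear.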

### 6.3.2 Example

*In this example, we adapt the plotting code from section 5.5 in Prelab 05 and use dataset 3 from this prelab. We introduce the Matplotlib function `plt.subplots()`, which lets us create multiple plots in one figure.*

*We recommend using (and adapting) either of these plotting codes (linear or quadratic model version) for Lab 06 and future labs.*

💡 The Matplotlib function `plt.subplots()` returns:
- `fig`: a Python figure (equivalent to the one returned by `plt.figure()`)
- `axs`: an array of Axes objects (i.e. the subplots themselves). The shape of the resulting array is given by the parameters `nrows` and `ncols` (number of rows and columns of the subplot grid, respectively) of `plt.subplots()`. For example: `plt.subplots(nrows=3, ncols=2)` returns an array of Axes of shape $(3,2)$; then `axs[2,0]` refers to the subplot located in row 3 and column 1 of the subplot grid (❗remember that Python starts counting at 0).
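A quick sketch of how the returned array is indexed. The figures here are empty and are closed immediately, since the goal is only to inspect the shape of `axs`; the titles are placeholders.

```python
import matplotlib.pyplot as plt

# A 3-by-2 grid of subplots; squeeze=False always returns a 2D array of Axes
fig, axs = plt.subplots(nrows=3, ncols=2, squeeze=False)
print(axs.shape)  # (3, 2)

# Row 3, column 1 of the grid (indices start at 0)
axs[2, 0].set_title("row 3, column 1")

# With a single column, squeeze=False still gives a 2D array,
# so the two subplots are addressed as axs2[0, 0] and axs2[1, 0]
fig2, axs2 = plt.subplots(nrows=2, ncols=1, squeeze=False)
print(axs2.shape)  # (2, 1)

# Close the empty figures so they are not displayed
plt.close(fig)
plt.close(fig2)
```

Without `squeeze=False`, a single-column grid would come back as a 1D array and `axs2[1, 0]` would raise an error, which is why two-index notation is only safe when that argument is set.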



💡 Simple LaTeX equations are supported by Matplotlib: the string needs to be placed inside a pair of dollar signs `$`. Note that LaTeX symbols containing a backslash need to be written with two backslashes. For example: `$\alpha$` (LaTeX) --> `"$\\alpha$"` (Matplotlib).
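The doubled backslash is a Python string rule, not a Matplotlib one. As a brief sketch, the snippet below shows that a Python raw string (prefix `r`) is an equivalent way to write the same label; the variable names are only illustrative.

```python
# Two ways to write the same label text in Python
escaped = "$\\alpha$"  # backslash escaped by doubling it
raw = r"$\alpha$"      # raw string: backslashes are kept as typed

print(escaped == raw)  # True
print(escaped)
```

Either form can be passed to functions such as `set_xlabel()` or used in a `label=` argument.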

First, we will attempt to fit data from dataset 3 with a linear model. Then we will use a quadratic model (non-linear) and update the plotting code accordingly.

#### Best-fitting attempt using a linear model

In [None]:
# LINEAR MODEL

# Model – step 1: find the range of x values from the experimental data.
x_data = xVec
y_data = y_3Vec
del_x_data = del_xVec
del_y_data = del_y_3Vec
x_min = np.min(x_data)  # find the smallest x value
x_max = np.max(x_data)  # find the largest x value

# Model – step 2: generate an array of model x values between x_min and x_max
# for which we want to plot the model y values.
x_model = np.linspace(
    start=x_min, stop=x_max, num=200
    )  # return 200 evenly spaced values

# Model – step 3: calculate the model y values at each of the model x values.
# Choose best-fitting values for the linear model.
m = 1.07
b = 2.94
y_model = m * x_model + b

# Model – step 4: plot the model on the graph of the experimental data.
# Create a figure and a set of subplots and reference them in the
# variables "fig" and "axs".
fig, axs = plt.subplots(
    nrows=2, ncols=1, squeeze=False,
    height_ratios=[1.75, 1], figsize=(6, 8)
    )  # do not change

data_label = "dataset 3"
graph_title = "Dataset 3 with the best-fitting curve from a linear model"
x_label = "$x$ (s)"
y_label = "$y$ (m)"
axs[0, 0].errorbar(
    x=x_data, y=y_data, yerr=del_y_data,
    fmt='bo', markersize=3, label=data_label
    )  # plot experimental data
axs[0, 0].set_title(graph_title)
axs[0, 0].set_xlabel(x_label)
axs[0, 0].set_ylabel(y_label)

model_label = "model ($y = mx + b$)"
axs[0, 0].plot(x_model, y_model, "r-", label=model_label)  # plot model data
# Add a legend (you can change the location as needed)
axs[0, 0].legend(loc='upper left')

# Residuals – step 1: calculate the model predictions y_prediction for each of
# the measured x_data values.
y_prediction = m * x_data + b

# Residuals – step 2: calculate the residuals.
residuals = y_data - y_prediction

# Residuals – step 3: plot the residuals against the measured x_data values.
residual_graph_title = "Corresponding residuals"
residual_y_label = "residual = data - model (m)"
axs[1, 0].errorbar(
    x=x_data, y=residuals, yerr=del_y_data,
    fmt='bo', markersize=3, label=data_label
    )
axs[1, 0].set_title(residual_graph_title)
axs[1, 0].set_xlabel(x_label)  # reuse the x-label from the scatter plot
axs[1, 0].set_ylabel(residual_y_label)

# Residuals – step 4: add a horizontal line at r=0 to the plot.
axs[1, 0].hlines(y=0, xmin=x_min, xmax=x_max, color='k', label="$r = 0$")
# Add a legend (you can change the location as needed)
axs[1, 0].legend(loc='lower left', fontsize='small')

plt.tight_layout()  # adjust the padding between and around subplots
plt.show()

The parabolic trend in the residuals highlights the need for a quadratic model.

#### Best-fitting attempt using a quadratic model

In [None]:
# QUADRATIC MODEL

# Model – step 1: find the range of x values from the experimental data.
x_data = xVec
y_data = y_3Vec
del_x_data = del_xVec
del_y_data = del_y_3Vec
x_min = np.min(x_data)  # find the smallest x value
x_max = np.max(x_data)  # find the largest x value

# Model – step 2: generate an array of model x values between x_min and x_max
# for which we want to plot the model y values.
x_model = np.linspace(
    start=x_min, stop=x_max, num=200
    )  # return 200 evenly spaced values

# Model – step 3: calculate the model y values at each of the model x values.
# Choose best-fitting values for the quadratic model.
a = 0.01
b = 1
c = 3
y_model = a * x_model**2 + b * x_model + c

# Model – step 4: plot the model on the graph of the experimental data.
# Create a figure and a set of subplots and reference them in the
# variables "fig" and "axs".
fig, axs = plt.subplots(
    nrows=2, ncols=1, squeeze=False,
    height_ratios=[1.75, 1], figsize=(6, 8)
    )  # do not change

data_label = "dataset 3"
graph_title = "Dataset 3 with the best-fitting curve from a quadratic model"
x_label = "$x$ (s)"
y_label = "$y$ (m)"

axs[0, 0].errorbar(
    x=x_data, y=y_data, yerr=del_y_data,
    fmt='bo', markersize=3, label=data_label
    )  # plot experimental data
axs[0, 0].set_title(graph_title)
axs[0, 0].set_xlabel(x_label)
axs[0, 0].set_ylabel(y_label)

model_label = "model ($y = ax^2 + bx + c$)"
axs[0, 0].plot(x_model, y_model, "r-", label=model_label)  # plot model data
# Add a legend (you can change the location as needed)
axs[0, 0].legend(loc='upper left')

# Residuals – step 1: calculate the model predictions y_prediction for each of
# the measured x_data values.
y_prediction = a * x_data**2 + b * x_data + c

# Residuals – step 2: calculate the residuals.
residuals = y_data - y_prediction

# Residuals – step 3: plot the residuals against the measured x_data values.
residual_graph_title = "Corresponding residuals"
residual_y_label = "residual = data - model (m)"
axs[1, 0].errorbar(
    x=x_data, y=residuals, yerr=del_y_data,
    fmt='bo', markersize=3, label=data_label
    )
axs[1, 0].set_title(residual_graph_title)
axs[1, 0].set_xlabel(x_label)  # reuse the x-label from the scatter plot
axs[1, 0].set_ylabel(residual_y_label)

# Residuals – step 4: add a horizontal line at r=0 to the plot.
axs[1, 0].hlines(y=0, xmin=x_min, xmax=x_max, color='k', label="$r = 0$")
# Add a legend (you can change the location as needed)
axs[1, 0].legend(loc='upper left', fontsize='small')

plt.tight_layout()  # adjust the padding between and around subplots
plt.show()

## Your turn #5: Preparing your Lab 06 notebook
In this final set of tasks, you will prepare your Lab 06 notebook for data collection and analysis.

1. Open the Lab 06 Instructions on Canvas and take a few minutes to read through them so that you have a sense of how you will be spending your time during the lab.
2. In Part B, you will notice that we will be reanalyzing the data from Lab 05. Add some code to your Part B notebook to read your Lab 05 data into the Lab 06 notebook and launch the `fit_plot.line()` interactive fitting widget. Update the fit parameters to match your best fit from Lab 05. We will be learning a new tool to help us improve our fits even further, and we will want to use your previous best fit as a starting point.
3. Also in Part B (Step 3), you will use your Matplotlib graphing skills to create nice, well-labeled plots, as you did in Section 6.3 of this prelab. Copy and paste the code you need to create these plots. 

You should now be ready for this lab.

# Submit

Steps for submission:

1. Click: Run => Run_All_Cells
2. Read through the notebook to ensure all the cells executed correctly and without error.
3. File => Save_and_Export_Notebook_As => HTML
4. Inspect your downloaded HTML document.
5. Upload the HTML document to the lab submission assignment on Canvas.

In [None]:
# The following function will display tables based on the data currently
# stored in your data_entry2 spreadsheets. Please do not modify this cell.
display_sheets()