# Lab 06 Prelab - A review of residuals and fitting

In [1]:
%reset -f
import data_entry2
import numpy as np
import matplotlib.pyplot as plt
import fit_plot
%matplotlib widget

This prelab focuses on revisiting and reinforcing concepts from Lab 05. We start with a review of residual plots.

## Review: How to read residuals

**Your turn #1a:** Given experimental data and a model for that data, what is a residual?

**Your turn #1b:** What are the properties of a residual plot that inform us that the model is a good fit to the data?

##### **Answer for #1a:**

A residual is defined as the difference between experimental data and a model, i.e., $r_i = y_i - \text{model}(x_i)$ or $r_i = y_i - f(x_i)$.
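As a minimal sketch of this definition, the cell below computes residuals for a small made-up data set and a straight-line model. The arrays `x` and `y` and the parameters `m` and `b` are hypothetical values chosen for illustration; they are not taken from the lab data.

```python
import numpy as np

# Hypothetical measurements (illustration only, not lab data)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Hypothetical straight-line model f(x) = m*x + b
m = 2.0
b = 1.0
y_model = m * x + b  # model prediction at each x_i

# r_i = y_i - f(x_i)
residuals = y - y_model
print(residuals)  # approximately 0.1, -0.1, 0.2, -0.2
```

A positive residual means the data point lies above the model, and a negative residual means it lies below.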

##### **Answer for #1b:**

A model that is a good fit to the data will have a residual plot with:
- No obvious trend
- A roughly equal scatter of points above and below the horizontal axis (residual = 0)
- And if the uncertainties are well characterized, we will also see
    + About 68% of error bars crossing the residual = 0 line
    + Nearly all (about 95%) of doubled error bars crossing the residual = 0 line
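These criteria can also be checked numerically. The sketch below counts how many points lie above and below zero and what fraction of error bars (and doubled error bars) cross the residual = 0 line. The `residuals` and `dy` arrays are hypothetical values for illustration only.

```python
import numpy as np

# Hypothetical residuals and uncertainties (illustration only)
residuals = np.array([0.05, -0.12, 0.08, -0.03, 0.25,
                      -0.07, 0.02, -0.15, 0.11, -0.04])
dy = np.full(residuals.shape, 0.1)

# Scatter about zero: count points above and below the residual = 0 line
n_above = np.sum(residuals > 0)
n_below = np.sum(residuals < 0)

# An error bar crosses zero when |residual| <= uncertainty
frac_1 = np.mean(np.abs(residuals) <= dy)      # single error bars
frac_2 = np.mean(np.abs(residuals) <= 2 * dy)  # doubled error bars

print(n_above, n_below)  # 5 5
print(frac_1, frac_2)    # 0.6 0.9
```

With only ten points we should not expect to land exactly on 68% and 95%, so values such as 60% and 90% are still consistent with well-characterized uncertainties.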

## Diagnosing model fits using residuals

Now that we have reminded ourselves of how to use residuals to diagnose the goodness of fit of a model to a given set of experimental data, let's use these criteria to diagnose some fits. Below, we have three data sets to which we have tried to fit the model of a straight line with an intercept, $y_{\text{model}} = mx + b$. We provide a `slope` and `intercept` for an initial fit and ask you to diagnose what needs to be changed to improve the quality of the fit.

In [3]:
# Run me to load our three data sets
# Make sure to hit the "Generate Vectors" button!

de1 = data_entry2.sheet("lab06_prelab_data.csv")

Sheet name: lab06_prelab_data.csv



### Your turn #2 (Dataset 1)
Answer the questions below related to the fit shown and then for part c, update the fitting parameters to get a better fit. 

![data set 1](https://i.ibb.co/r7ShPLk/data1.png)

#### **Your turn #2a:** Which feature(s) of the RESIDUALS plot above indicates a problem with the fit of the model to the data?

##### **Answer #2a**

We see a trend in the residuals where all of the residuals are negative, lying below the residuals = 0 line.

#### **Your turn #2b:** What should be changed about the model to improve the fit?

##### **Answer #2b**

The y-intercept of the model is too large, so we should decrease it.

#### **Your turn #2c:** Run the `fit_plot.line` interactive fitting session below and update the fitting parameters to `slope = 1` and `intercept = 1.1` using the text entry boxes in the widget. Use the interactive fitting widget to update the fit to show a good fit in the scatter plot and residuals. What are your new slope and/or y-intercept?

In [None]:
# Use the text boxes to manually update the values to
# slope = 1, intercept = 1.1

fit_plot.line("Data set 1", xVec, y1Vec, dy1Vec)

##### **Answer #2c:**

Decreasing the intercept from 1.1 to 1 in the widget improves the fit in the desired fashion.
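To see why a constant offset in the residuals points to the intercept, here is a small sketch using made-up data (not Data set 1). The data are generated from a line with intercept 1 plus a little hypothetical scatter, and the model starts with intercept 1.1. The mean of the residuals then gives a rough estimate of how far to shift the intercept.

```python
import numpy as np

# Hypothetical data: true line y = 1*x + 1 with small made-up scatter
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scatter = np.array([0.02, -0.01, 0.00, 0.01, -0.02])
y = 1.0 * x + 1.0 + scatter

# Initial model with an intercept that is too large
slope = 1.0
intercept = 1.1
residuals = y - (slope * x + intercept)
print(np.all(residuals < 0))  # True: every residual is below zero

# The average residual estimates the needed shift in the intercept
mean_offset = np.mean(residuals)
new_intercept = intercept + mean_offset
print(round(new_intercept, 3))  # 1.0
```

This is only a quick estimate. The interactive widget lets you confirm by eye that the residuals scatter about zero after the change.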

### Your turn #3 (Dataset 2)

Answer the questions below related to the fit shown and then for part c, update the fitting parameters to get a better fit. 

![data set 2](https://i.ibb.co/5sLwRZh/data2.png)

#### **Your turn #3a:** Which feature(s) of the RESIDUALS plot above indicates a problem with the fit of the model to the data?

##### **Answer #3a**

We see an (upwards) linear trend in the residuals plot.

#### **Your turn #3b:** What should be changed about the model to improve the fit?

##### **Answer #3b**

The slope of the model is too small, so we should increase it.

#### **Your turn #3c:** Run the `fit_plot.line` interactive fitting session below and update the fitting parameters to `slope = 1.98` and `intercept = 3` using the text entry boxes in the widget. Use the interactive fitting widget to update the fit to show a good fit in the scatter plot and residuals. What are your new slope and/or y-intercept?

##### **Answer #3c**

Increasing the slope from 1.98 to 2 in the widget improves the fit in the desired fashion.
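A linear trend in the residuals can be turned into a rough slope correction. The sketch below uses made-up data (not Data set 2) generated from a line with slope 2, starts the model at slope 1.98, and fits a straight line to the residuals with `np.polyfit`. The slope of that line estimates how much the model slope should change.

```python
import numpy as np

# Hypothetical data: true line y = 2*x + 3 with small made-up scatter
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scatter = np.array([0.01, -0.02, 0.02, -0.02, 0.01])
y = 2.0 * x + 3.0 + scatter

# Initial model with a slope that is too small
slope = 1.98
intercept = 3.0
residuals = y - (slope * x + intercept)

# Fit a straight line to the residuals; polyfit returns [slope, offset]
trend_slope, trend_offset = np.polyfit(x, residuals, 1)
new_slope = slope + trend_slope
print(round(trend_slope, 3))  # 0.02 (upward trend)
print(round(new_slope, 3))    # 2.0
```

A positive `trend_slope` means the model slope is too small, and a negative one means it is too large.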

##### **Cell to generate plots for data set 2**

In [None]:
# Use the text boxes to manually update the values to
# slope = 1.98, intercept = 3

fit_plot.line("Data set 2", xVec, y2Vec, dy2Vec)

### Your turn #4 (Data set 3)

**Note:** For this data set, the issue with the model is a bit harder to fix compared to the other two examples; for this part it is alright to skip the step of fixing the model in the code, and just comment on what appears to be wrong and how one could feasibly improve the fit of the model to the data.

![data set 3](https://i.ibb.co/r0n2b5T/data3.png)

#### **Your turn #4a:** Which feature(s) of the RESIDUALS plot above indicates a problem with the fit of the model to the data?

##### **Answer #4a**

We notice from the residuals plot that there is an upward parabolic trend.

#### **Your turn #4b:** What should be changed about the model to improve the fit?

##### **Answer #4b**

This parabolic trend tells us that the current linear model is not a good fit for the data; we should add a (positive) quadratic term.
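One way to check for curvature numerically is to fit a quadratic to the residuals of the linear model. The sketch below uses noise-free made-up data (not Data set 3) generated with a $0.01 x^2$ term. The x-range and the parameter values are assumptions for illustration.

```python
import numpy as np

# Hypothetical noise-free data with a small quadratic term
x = np.linspace(0, 10, 11)
y = 0.01 * x**2 + 1.0 * x + 3.0

# Residuals from a straight-line model
slope = 1.07
intercept = 2.93
residuals = y - (slope * x + intercept)

# Fit a quadratic to the residuals; polyfit returns highest power first
curvature, linear_term, constant_term = np.polyfit(x, residuals, 2)
print(round(curvature, 4))  # 0.01

# Upward parabola: positive at both ends, negative in the middle
print(residuals[0] > 0, residuals[5] < 0, residuals[-1] > 0)  # True True True
```

A `curvature` that is clearly different from zero suggests that changing the slope and intercept alone cannot flatten the residuals.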

#### **Your turn #4c (optional):** Play around with the interactive fitting below to convince yourself that no matter which combination of slope and intercept you try, a linear model will not result in a good fit.

In [None]:
# Use the text boxes to manually update the values to
# slope = 1, intercept = 3

fit_plot.line("Data set 3", xVec, y3Vec, dy3Vec)

## Using models that are more complicated than `y = mx + b` linear models

Frequently, data that we expect to be linear will be more complicated than that. Often this happens when the assumption of linearity has overlooked other important effects.

As such, it is good to know how to modify our modelling and plotting code to include nonlinear effects. 

Below we include a linear model version of our plotting code from Prelab 05, and then afterward a nonlinear version to show how you update the code to make a more complicated model. Our more complicated model includes a $0.01 x^2$ term, which involves adding a third fitting parameter `par3` and then updating the two places where we use the model to calculate y-values:
- `ypoints = par3 * xpoints**2 + xpoints * slope + intercept`
- `ymodel = par3 * xdata**2 + slope * xdata + intercept`

In the nonlinear version of the plotting code, look for the modified code where there are comment lines using "##########" to help them stand out.
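An optional alternative, sketched below, is to define the model once as a function so that the model line and the residuals cannot get out of sync. The function name `model` and the made-up arrays standing in for `xVec` and `y3Vec` are assumptions for illustration; the plotting cells in this prelab do not use this function.

```python
import numpy as np

def model(x, slope, intercept, par3=0.0):
    """Quadratic model; leaving par3 = 0 recovers y = mx + b."""
    return par3 * x**2 + slope * x + intercept

# Hypothetical stand-ins for xVec and y3Vec (illustration only)
xdata = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
scatter = np.array([0.03, -0.02, 0.01, -0.03, 0.02, -0.01])
ydata = 0.01 * xdata**2 + 1.0 * xdata + 3.0 + scatter

slope = 1.0
intercept = 3.0
par3 = 0.01

# Model line for the scatter plot: 200 points between the data limits
xpoints = np.linspace(np.min(xdata), np.max(xdata), 200)
ypoints = model(xpoints, slope, intercept, par3)

# Model prediction at each data point, then the residuals
ymodel = model(xdata, slope, intercept, par3)
residualsVec = ydata - ymodel
print(np.round(residualsVec, 2))
```

With this approach, changing the model means editing only the body of `model`, and both `ypoints` and `ymodel` pick up the change.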

### *The best fit using a linear model. The obvious parabolic structure in the residuals suggests a different model is needed*

In [None]:
# Run me for a (poor) linear fit to data set 3

# Scatter step 1: Define the variables we will be plotting, as well as labels and titles
# Plotting variables
xdata = xVec
ydata = y3Vec
dydata = dy3Vec

# Labels and titles
data_label = "Data set 3"
model_label = "y = mx + b"
graph_title = "Data set 3"
x_label = "x (s)"
y_label = "y (m)"
residuals_title = "Residuals for Data set 3 linear fit"
residuals_y_label = "Residual = data - model (m)"

# Model parameters
######## SLOPE AND INTERCEPT CHOSEN TO SHOW CLEARLY THAT THE DATA ARE NONLINEAR #######
slope = 1.07 
intercept = 2.93

# Scatter step 2: find the limits of the data:
xmin = np.min(xdata) # use the np.min function to find the smallest x-value
xmax = np.max(xdata) # same for max
# print (xmin, xmax)  # uncomment to see what the limits are

# Scatter step 3: generate a bunch of x points between xmin and xmax to help us plot the model line
xpoints = np.linspace(xmin, xmax, 200) # gives 200 evenly spaced points between xmin and xmax
# print(xpoints) # uncomment to see the x values that were generated.

# Scatter step 4: calculate the y points to plot the model line
ypoints = xpoints * slope + intercept # this calculates the model y-values at all 200 points.

# Scatter step 5: plot the model line. We plot this as a red line "r-" :
plt.figure()
plt.plot(xpoints, ypoints, "r-", label = model_label)

# Scatter step 6: Plot the data, with the previous details from before
plt.errorbar(xdata, ydata, dydata, fmt="bo", markersize = 3, label=data_label)
plt.title(graph_title)
plt.xlabel(x_label)
plt.ylabel(y_label)
plt.legend()
plt.show()

# Residuals step 2: Calculate the model prediction for each of our data points from xdata
ymodel = slope * xdata + intercept # y = mx + b at each data point, x_i

# Residuals step 3: Calculate the residuals vector
residualsVec = ydata - ymodel

# Residuals step 4: Plot the residuals vector against the x-data vector
plt.figure()
plt.errorbar(xdata, residualsVec, dydata, fmt="bo", markersize = 3)

# Residuals step 5: Add a horizontal line at R=0 to the plot
plt.hlines(y=0, xmin=xmin, xmax=xmax, color='k') # draw a black line at y = 0.

# Residuals step 6: Add axis labels and title, and show the graph
plt.title(residuals_title)
plt.xlabel(x_label) # re-use the x_label from the scatter plot with model
plt.ylabel(residuals_y_label)
plt.show()

### *A nonlinear fit which adds `par3 * x**2` to the model and updates `slope` and `intercept` to create a good fit*

In [None]:
# Run me for an improved nonlinear fit to data set 3

# Scatter step 1: Define the variables we will be plotting, as well as labels and titles
# Plotting variables
xdata = xVec
ydata = y3Vec
dydata = dy3Vec

# Labels and titles
data_label = "Data set 3"
model_label = "y = par3*x**2 + mx + b"
graph_title = "Data set 3"
x_label = "x (s)"
y_label = "y (m)"
residuals_title = "Residuals for Data set 3 nonlinear fit"
residuals_y_label = "Residual = data - model (m)"

# Model parameters
################### UPDATE MODEL PARAMETERS TO INCLUDE par3 * x**2 ####################
slope = 1 # Estimate of the slope
intercept = 3 # Estimate of the y-intercept
par3 = 0.01
#######################################################################################

# Scatter step 2: find the limits of the data:
xmin = np.min(xdata) # use the np.min function to find the smallest x-value
xmax = np.max(xdata) # same for max
# print (xmin, xmax)  # uncomment to see what the limits are

# Scatter step 3: generate a bunch of x points between xmin and xmax to help us plot the model line
xpoints = np.linspace(xmin, xmax, 200) # gives 200 evenly spaced points between xmin and xmax
# print(xpoints) # uncomment to see the x values that were generated.

# Scatter step 4: calculate the y points to plot the model line
############### UPDATE THE MODEL LINE TO INCLUDE par3 * xpoints**2 ####################
ypoints = par3 * xpoints**2 + xpoints * slope + intercept 
#######################################################################################

# Scatter step 5: plot the model line. We plot this as a red line "r-" :
plt.figure()
plt.plot(xpoints, ypoints, "r-", label = model_label)

# Scatter step 6: Plot the data, with the previous details from before
plt.errorbar(xdata, ydata, dydata, fmt="bo", markersize = 3, label=data_label)
plt.title(graph_title)
plt.xlabel(x_label)
plt.ylabel(y_label)
plt.legend()
plt.show()

# Residuals step 2: Calculate the model prediction for each of our data points from xdata
####### UPDATE THE MODEL PREDICTION FOR RESIDUALS TO INCLUDE par3 * xdata**2 #########
ymodel = par3 * xdata**2 + slope * xdata + intercept # y = par3*x**2 + mx + b
######################################################################################

# Residuals step 3: Calculate the residuals vector
residualsVec = ydata - ymodel

# Residuals step 4: Plot the residuals vector against the x-data vector
plt.figure()
plt.errorbar(xdata, residualsVec, dydata, fmt="bo", markersize = 3)

# Residuals step 5: Add a horizontal line at R=0 to the plot
plt.hlines(y=0, xmin=xmin, xmax=xmax, color='k') # draw a black line at y = 0.

# Residuals step 6: Add axis labels and title, and show the graph
plt.title(residuals_title)
plt.xlabel(x_label) # re-use the x_label from the scatter plot with model
plt.ylabel(residuals_y_label)
plt.show()

### *Summary of the "Using models that are more complicated than `y = mx + b` linear models" section*

In the above section we introduced a way to modify our models in our code for scatter plots with models and for residuals plots to use models other than `y = mx + b`. You may find this technique helpful in Lab 06 or in future labs.

## Your turn #5: Preparing your Lab 06 notebook
In this final set of tasks you will prepare your Lab 06 notebook for data collection and analysis.

1. Open the Lab 06 Instructions on Canvas and take a couple of minutes to read through them so that you have a sense of how you will be spending your time during the lab.
2. In Part B, you will notice that we are going to re-analyze the Lab 05 data. Add some code to your Part B notebook to read your Lab 05 data into the Lab 06 notebook and to launch the `fit_plot.line()` interactive fitting widget. Update the fitting parameters to match your best fit from Lab 05. We are going to learn a new tool to help us improve our fits even further and we are going to want your previous best fit as a starting point.
3. Also in Part B (Step 3), you will be using your matplotlib graphing skills to make nice, well-labelled plots, like you did in Section 5.5 of Prelab 05 and Part G of Lab 05. Copy in the code that you need to make those plots.

You should now be ready for this lab.

# Submit

Steps for submission:

1. Click: Run => Run All Cells
2. Read through the notebook to ensure all the cells executed correctly and without error.
3. File => Save and Export Notebook As => HTML
4. Inspect your downloaded HTML document.
5. Upload the HTML document to the lab submission assignment on Canvas.

In [None]:
display_sheets()