# Lab 06 notebook tutorial

<span style="color: red;">**Please do not read through this notebook until after the Lab 06 invention activity in class**</span>

**Please note:** A summary of our new goodness-of-fit tool can be found in Appendix A, at the end of this notebook.

*Updated Oct 28, 2024: Fixed some typos*

## Load the libraries and the fitting data

In [None]:
%reset -f
import data_entry2
import numpy as np
import matplotlib.pyplot as plt
import fit_plot
%matplotlib widget

In [None]:
# Load the prelab data we used for Lab 05
de1 = data_entry2.sheet_copy("../Lab05/prelab05_hookes_law", "lab05prelab-copy")

## Use the `fit_plot.line` interactive fitting widget with $\chi^2$ output enabled to find the best-fit line

Use the following code, with `chi2 = True`, to launch the `fit_plot.line` widget with $\chi^2$ output enabled.

Try to minimize the value of $\chi^2$ using a combination of clicking within the scatter plot, clicking within the residuals plot, and updating the values manually using the text boxes. You should be able to find a combination of parameters that gets you to a $\chi^2$ value that is slightly below 0.7. You can also check the answer in the cell that follows the fitting widget.

In [None]:
xdata = DxVec
ydata = FVec
dydata = dFVec
unique_graph_title = "Chi-squared minimization for Prelab 05 data"

fit_plot.line(unique_graph_title, xdata, ydata, dydata, chi2 = True)

#### Answer: Best-fit model parameters

The best-fit parameters are `slope = 2.08` and `intercept = 0.0045`, which gives a $\chi^2$ of 0.66.

## Estimate the slope uncertainties using `slope_min` and `slope_max`

Our new goodness-of-fit statistic, $\chi^2$, allows us to use an improved method to determine the slope uncertainty, which we will demonstrate below. To find `slope_max` and `slope_min`, keep the `intercept` fixed at the value you found for the best-fit line and then adjust the slope (up and down) until the goodness-of-fit statistic is approximately double its best-fit value. The range from `slope_min` to `slope_max` is the 68% Confidence Interval for the slope, and you can divide its width by 2 to get the standard uncertainty for the slope.

The steps:
1. The best estimate of your slope will be the one from your best-fit model above with the lowest chi-squared. In this example the best-fit model is `slope = 2.08 N/m` and `intercept = 0.0045 N`, which corresponds to a chi-squared of 0.66.
2. While keeping `intercept` fixed at `0.0045 N`, we adjust the `slope` upward until our chi-squared is approximately 2*0.66, which is 1.32. Being within 5% of this value (so ~ 1.25 - 1.39) is precise enough, so we can use `slope_max = 2.148 N/m`, which gives a chi-squared of 1.33. Notice how the residuals for this fit look consistent with how we have described `slope_max` in the past.
3. Take the slope back to the best-fit value of `2.08 N/m` while continuing to keep `intercept` fixed. Now we adjust the `slope` downward until it gives a chi-squared of approximately 1.32. Here we find that `slope_min = 2.013 N/m` corresponds to chi-squared = 1.32.
4. The best estimate of the standard uncertainty of the slope is half of the difference between `slope_max` and `slope_min` (the 68% Confidence Interval for the slope), where each of these slopes was found by adjusting the `slope` until chi-squared approximately doubled. We then additionally apply a factor of $1/\sqrt{N}$ to reward you with the precision benefits of taking more measurements.

The code below details these calculations.

In [None]:
# Slope uncertainty calculation

# All slope values correspond to a fit using intercept = 0.0045 N

slope_best = 2.08 # chi2 = 0.66
slope_max = 2.148 # chi2 = 1.33 (approximately 2*0.66)
slope_min = 2.013 # chi2 = 1.32 (approximately 2*0.66)
N = len(xdata)
dslope = (slope_max-slope_min)/(2 * np.sqrt(N) )
print("Slope uncertainty:", dslope, "N/m")

**Reporting results:** Our best estimate of the slope is 2.080 $\pm$ 0.026 N/m.
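
The manual search described in the steps above can also be sketched numerically. This is a minimal sketch under stated assumptions, not the method used by the `fit_plot.line` widget: the `*_demo` arrays are made-up stand-in values (not the Lab 05 prelab data), `reduced_chi2_line` is a hypothetical helper, and `np.polyfit` with weights `1/dy` is assumed here as one way to obtain the best-fit line. The code scans the slope with the intercept held fixed and keeps the range of slopes whose chi-squared is at most double the best-fit value.

```python
import numpy as np

def reduced_chi2_line(slope, intercept, x, y, dy, P=2):
    """Weighted chi-squared for the model y = slope*x + intercept (hypothetical helper)."""
    residuals = y - (slope * x + intercept)
    return np.sum((residuals / dy) ** 2) / (len(x) - P)

# Made-up stand-in data, for illustration only (NOT the Lab 05 prelab data)
x_demo = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])        # displacement (m)
y_demo = np.array([0.22, 0.40, 0.65, 0.82, 1.06, 1.24])  # force (N)
dy_demo = np.full_like(y_demo, 0.03)                     # force uncertainty (N)

# Best-fit line from weighted least squares (assumption: stands in for the widget)
slope_best_demo, intercept_demo = np.polyfit(x_demo, y_demo, 1, w=1 / dy_demo)
chi2_best_demo = reduced_chi2_line(slope_best_demo, intercept_demo, x_demo, y_demo, dy_demo)
chi2_target = 2 * chi2_best_demo  # the "doubled" chi-squared we are looking for

# Scan slopes around the best fit while keeping the intercept fixed
slopes = np.linspace(slope_best_demo - 0.2, slope_best_demo + 0.2, 4001)
chi2_scan = np.array(
    [reduced_chi2_line(s, intercept_demo, x_demo, y_demo, dy_demo) for s in slopes]
)

# slope_min and slope_max are the edges of the region where chi2 <= 2 * chi2_best
inside = slopes[chi2_scan <= chi2_target]
slope_min_demo, slope_max_demo = inside.min(), inside.max()

# Same formula as in the cell above: half the interval width, with the 1/sqrt(N) factor
dslope_demo = (slope_max_demo - slope_min_demo) / (2 * np.sqrt(len(x_demo)))

print("Best-fit slope:", slope_best_demo, "N/m, chi2 =", chi2_best_demo)
print("slope_min, slope_max:", slope_min_demo, slope_max_demo, "N/m")
print("Slope uncertainty:", dslope_demo, "N/m")
```

With the intercept held fixed, chi-squared is a parabola in the slope, so the two edges found by the scan should sit at nearly equal distances on either side of the best-fit slope.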

## Make nice plots and calculate $\chi^2$ yourself

In the code block below we make some small updates to our usual code for making nice plots. 
1. We add `P = 2` in "Scatter step 1" as the number of fitting parameters, and
2. At the end of the code block we add a new section "Calculate chi-squared", which shows how easy it is to calculate $\chi^2$ once you have already calculated your residuals. Recall that the $y_i - f(x_i)$ term in the $\chi^2$ equation in Appendix A is how one calculates a residual.
   * This code prints out the `slope` and `intercept` used in the model, and the resulting $\chi^2$ for that choice of `slope` and `intercept`.

In [None]:
'''
Code to make the scatter and residuals plots, as well as calculate 
chi-squared for a linear model
'''

# Scatter step 1: Define the variables we will be plotting, as well as labels and titles
# Plotting variables
xdata = DxVec
ydata = FVec
dydata = dFVec

# Labels and titles
data_label = "Experimental data"
model_label = "F = kx + b"
graph_title = "Hooke's law investigation using spring compression"
x_label = "Displacement of spring from equilibrium (m)"
y_label = "Force (N)"
residuals_title = "Residuals for Hooke's law investigation using spring compression"
residuals_y_label = "Residual = data - model (N)"

# Model parameters
slope = 2.08 # N/m
intercept = 0.0045 # N
### Added the number of fitting parameters = 2 (slope and intercept)
P = 2 # Your number of fitting parameters; to be used in chi2 calculation

# Scatter step 2: find the limits of the data:
xmin = np.min(xdata) # use the np.min function to find the smallest x-value
xmax = np.max(xdata) # same for max
# print (xmin, xmax)  # uncomment to see what the limits are

# Scatter step 3: generate a bunch of x points between xmin and xmax to help us plot the model line
xpoints = np.linspace(xmin, xmax, 200) # gives 200 evenly spaced points between xmin and xmax
# print(xpoints) # uncomment to see the x values that were generated.

# Scatter step 4: calculate the y points to plot the model line
ypoints = xpoints * slope + intercept # this calculates the model y-values at all 200 points.

# Scatter step 5: plot the model line. We plot this as a red line "r-" :
plt.figure()
plt.plot(xpoints, ypoints, "r-", label = model_label)

# Scatter step 6: Plot the data with error bars, using the same formatting as before
plt.errorbar(xdata, ydata, dydata, fmt="bo", markersize = 3, label=data_label)
plt.title(graph_title)
plt.xlabel(x_label)
plt.ylabel(y_label)
plt.legend()
plt.show()

# Residuals step 2: Calculate the model prediction for each of our data points from DxVec
ymodel = slope * xdata + intercept # y = mx + b at each data point, x_i

# Residuals step 3: Calculate the residuals vector
residualsVec = ydata - ymodel

# Residuals step 4: Plot the residuals vector against the x-data vector
plt.figure()
plt.errorbar(xdata, residualsVec, dydata, fmt="bo", markersize = 3)

# Residuals step 5: Add a horizontal line at R=0 to the plot
plt.hlines(y=0, xmin=xmin, xmax=xmax, color='k') # draw a black line at y = 0.

# Residuals step 6: Add axis labels and title, and show the graph
plt.title(residuals_title)
plt.xlabel(x_label) # re-use the x_label from the scatter plot with model
plt.ylabel(residuals_y_label)
plt.show()

### Added the chi-squared calculation and provided output for the fitting parameters used

# Calculate chi-squared 
chi2 = np.sum((residualsVec/dydata)**2)/(len(residualsVec)-P)
print ("Slope: ", slope, "N/m")
print ("Intercept: ", intercept, "N")
print ("Weighted chi-squared: ", chi2)

# Appendix A: Fitting using reduced chi-squared minimization / weighted least squares fitting

$$\large \chi_w^2 = \frac{1}{N-P} \sum_{i=1}^N \left[ \frac{y_i - f(x_i) }{\delta y_i} \right]^2$$

We use chi-squared to help us find the **best possible fit** of the model to the data. To do so we adjust the fitting parameters to find the lowest possible value for chi-squared.

Interpreting $\large \chi_w^2$:

* $\large \chi_w^2 \approx 1$: The model fits the data well, assuming uncertainties have been characterized well
* $\large \chi_w^2 \gg 1$: Not a good fit or the uncertainties have been underestimated
* $\large \chi_w^2 \ll 1$: The uncertainties have been overestimated
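
To see why the size of the uncertainties drives this interpretation, here is a minimal sketch using made-up residuals (the `*_demo` values and the helper `reduced_chi2_from_residuals` are hypothetical, for illustration only). Because each residual is divided by $\delta y_i$ before squaring, making every uncertainty 5 times larger divides $\chi_w^2$ by 25, and making every uncertainty 5 times smaller multiplies it by 25, even though the model and data are unchanged.

```python
import numpy as np

def reduced_chi2_from_residuals(residuals, dy, P=2):
    """Weighted chi-squared from residuals and uncertainties (hypothetical helper)."""
    return np.sum((residuals / dy) ** 2) / (len(residuals) - P)

# Made-up residuals and uncertainties, for illustration only
residuals_demo = np.array([0.02, -0.03, 0.01, -0.02, 0.03, -0.01])  # N
dy_scale_demo = np.full(6, 0.025)                                   # N

chi2_as_given = reduced_chi2_from_residuals(residuals_demo, dy_scale_demo)
chi2_overestimated = reduced_chi2_from_residuals(residuals_demo, 5 * dy_scale_demo)
chi2_underestimated = reduced_chi2_from_residuals(residuals_demo, dy_scale_demo / 5)

print("Uncertainties as given:       chi2 =", chi2_as_given)
print("Uncertainties 5x too large:   chi2 =", chi2_overestimated)
print("Uncertainties 5x too small:   chi2 =", chi2_underestimated)
```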

Using chi-squared is a 2-step process:
1. First minimize chi-squared by adjusting parameters.
2. Then, once it is minimized, interpret the value. 

The goal is **not** to make chi-squared = 1. The goal is to minimize chi-squared to find the best possible fit, and then interpret the resulting chi-squared value.
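
The two-step process can be sketched in code as follows. This is a minimal sketch with made-up data (the `*_demo` arrays and the helper `chi2_w_demo` are hypothetical). It assumes `np.polyfit` with weights `w = 1/dy` as the minimizer, since for a straight-line model that weighted least squares fit minimizes the same sum that appears in $\chi_w^2$. Nudging either parameter away from the fitted values should therefore only increase chi-squared.

```python
import numpy as np

# Made-up data with unequal uncertainties, for illustration only
x_demo = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])        # m
y_demo = np.array([0.22, 0.40, 0.65, 0.82, 1.06, 1.24])  # N
dy_demo = np.array([0.02, 0.02, 0.03, 0.03, 0.04, 0.04]) # N
P_demo = 2  # number of fitting parameters (slope and intercept)

def chi2_w_demo(slope, intercept):
    """Weighted chi-squared of the demo data for y = slope*x + intercept."""
    residuals = y_demo - (slope * x_demo + intercept)
    return np.sum((residuals / dy_demo) ** 2) / (len(x_demo) - P_demo)

# Step 1: minimize chi-squared (here via weighted least squares)
slope_fit, intercept_fit = np.polyfit(x_demo, y_demo, 1, w=1 / dy_demo)
chi2_min = chi2_w_demo(slope_fit, intercept_fit)

# Check: small nudges to either parameter give a larger chi-squared
nudges = [(0.01, 0.0), (-0.01, 0.0), (0.0, 0.005), (0.0, -0.005)]
chi2_nudged = [chi2_w_demo(slope_fit + ds, intercept_fit + db) for ds, db in nudges]

# Step 2: only now interpret the minimized value using the guidelines above
print("Best-fit slope:", slope_fit, "N/m")
print("Best-fit intercept:", intercept_fit, "N")
print("Minimized chi-squared:", chi2_min)
print("Chi-squared after nudging the parameters:", chi2_nudged)
```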