# Lab 06 notebook tutorial

<span style="color: red;">**Please do not read through this notebook until after the invention activity in class**</span>

A summary of our new tool can be found at the end of this notebook.

In [None]:
import data_entry2
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Load the prelab data we used for Lab 05
de1 = data_entry2.sheet_copy("../Lab05/lab05_prelab_hookes_law", "lab05prelab-copy")

In [None]:
# Make the plots and calculate chi-squared

# Scatter step 1: Define the variables we will be plotting, as well as labels and titles
# Plotting variables
xdata = DxVec
ydata = FVec
dydata = dFVec

# Labels and titles
data_label = "Experimental data"
model_label = "F = kx"
graph_title = "Hooke's law investigation using spring compression"
x_label = "Displacement of spring from equilibrium (m)"
y_label = "Force (N)"
residuals_title = "Residuals for Hooke's law investigation using spring compression"
residuals_y_label = "Residual = data - model (N)"

# Model parameters
slope = 2.3 # The initial estimate of the slope
# slope = 2.09 # The slope that minimizes chi-squared to 0.57
# slope = 2.0 # Lower slope corresponding to chi2 = 0.57 + 1 = approximately 1.57
# slope = 2.185 # Higher slope corresponding to chi2 = 0.57 + 1 = approximately 1.57
P = 1 # Your number of fitting parameters, to be used in chi-squared calculation.


# Scatter step 2: find the limits of the data:
xmin = np.min(xdata) # use the np.min function to find the smallest x-value
xmax = np.max(xdata) # same for max
# print (xmin, xmax)  # uncomment to see what the limits are

# Scatter step 3: generate a bunch of x points between xmin and xmax to help us plot the model line
xpoints = np.linspace(xmin, xmax, 200) # gives 200 evenly spaced points between xmin and xmax
# print(xpoints) # uncomment to see the x values that were generated.

# Scatter step 4: calculate the model values:
ypoints = xpoints * slope # this calculates the yvalues at all 200 points.

# Scatter step 5: plot the model line. We plot this as a red line "r-" :
plt.plot(xpoints, ypoints, "r-", label = model_label)

# Scatter step 6: Plot the data, with the previous details from before
plt.errorbar(xdata, ydata, dydata, fmt="bo", markersize = 3, label=data_label)
plt.title(graph_title)
plt.xlabel(x_label)
plt.ylabel(y_label)
plt.legend()
plt.show()

# Residuals step 1: Calculate the model prediction for each of our data points from DxVec
ymodel = slope * xdata # y = mx

# Residuals step 2: Calculate the residuals vector
residualsVec = ydata - ymodel

# Residuals step 3: Plot the residuals vector against the x-data vector
plt.errorbar(xdata, residualsVec, dydata, fmt="bo", markersize = 3)

# Residuals step 4: Add a horizontal line at R=0 to the plot
plt.hlines(y=0, xmin=xmin, xmax=xmax, color='k') # draw a black line at y = 0.

# Residuals step 5: Add axis labels and title, and show the graph
plt.title(residuals_title)
plt.xlabel(x_label)
plt.ylabel(residuals_y_label)
plt.show()

# Calculate chi-squared 
chi2 = np.sum((residualsVec/dydata)**2)/(len(residualsVec)-P)
print ("Slope: ", slope, "N/m")
print ("Weighted chi-squared: ", chi2)


In [None]:
# Keep track of chi-squared based on changing the slope
# - This table is prefilled with example values
de_chi2 = data_entry2.sheet('Lab06-demo-table')

### Best estimate for slope:

* The best estimate of your slope is the one you find above that has the lowest chi-squared. In this example it is slope = 2.09 N/m, which corresponds to a chi-squared of 0.57.
* The best estimate of the uncertainty on your slope is half of the difference between the slightly smaller slope `slope_min` and the slightly larger slope `slope_max` that each increase chi-squared by approximately +1 above its minimum. Since our minimized chi-squared is 0.57, this means we are looking for the slopes that make chi-squared approximately equal to 1.6. This gives `slope_min = 2.0` and `slope_max = 2.185`.
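
The hand search described above can also be sketched as a scan over many trial slopes. This is only an illustration under stated assumptions: the data below are made up (they are not the Lab 05 data), and the scan range and number of trial slopes are arbitrary choices. It uses the same chi-squared formula as the cell above and then picks out the slopes whose chi-squared is within +1 of the minimum.

```python
import numpy as np

# Made-up illustrative data (assumption: NOT the Lab 05 measurements)
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])        # displacement (m)
y = np.array([0.22, 0.39, 0.63, 0.80, 1.02])   # force (N)
dy = np.full_like(y, 0.02)                     # force uncertainty (N)
P = 1                                          # one fitting parameter: the slope

def chi2_for_slope(slope):
    """Weighted chi-squared for the model y = slope * x."""
    residuals = y - slope * x
    return np.sum((residuals / dy) ** 2) / (len(x) - P)

# Try many slopes (range and spacing are arbitrary choices for this sketch)
trial_slopes = np.linspace(1.5, 2.5, 1001)
chi2_values = np.array([chi2_for_slope(m) for m in trial_slopes])

# Best slope: the trial slope with the smallest chi-squared
best_index = np.argmin(chi2_values)
slope_best = trial_slopes[best_index]
chi2_min = chi2_values[best_index]

# Slopes whose chi-squared is no more than 1 above the minimum
within = trial_slopes[chi2_values <= chi2_min + 1]
slope_min, slope_max = within.min(), within.max()
dslope = (slope_max - slope_min) / 2

print("Best slope:", slope_best, "N/m, chi-squared:", chi2_min)
print("slope_min:", slope_min, "slope_max:", slope_max)
print("Slope uncertainty:", dslope, "N/m")
```

A finer grid gives `slope_min` and `slope_max` to more digits, but trying values by hand as in the table above follows the same logic.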

In [None]:
# Best slope
slope_best = 2.09 # Gives chi-squared of 0.57
slope_max = 2.185 # Gives chi-squared of 1.6 (approximately 1 higher than 0.57)
slope_min = 2.0 # Gives chi-squared of 1.6 (approximately 1 higher than 0.57)
dslope = (slope_max - slope_min)/2.
print("Slope uncertainty:", dslope, "N/m")

Our best estimate of the slope is 2.090 $\pm$ 0.093 N/m.

# Appendix A: Fitting using reduced chi-squared minimization / weighted least squares fitting

$$\large \chi_w^2 = \frac{1}{N-P} \sum_{i=1}^N \left[ \frac{y_i - f(x_i) }{\delta y_i} \right]^2$$

When using chi-squared, the goal is to adjust your fitting parameters in order to minimize the value for chi-squared, which indicates the best possible fit of your model to the data.
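
As a minimal sketch, the formula above can be wrapped in a small function. The function name `reduced_chi_squared` and the data values are hypothetical and chosen only for illustration; the calculation itself is the same one used in the plotting cells of this notebook.

```python
import numpy as np

def reduced_chi_squared(y, y_model, dy, n_params):
    """Sum of squared residuals, each scaled by its uncertainty, divided by N - P."""
    y = np.asarray(y, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    dy = np.asarray(dy, dtype=float)
    residuals = y - y_model
    return np.sum((residuals / dy) ** 2) / (len(y) - n_params)

# Made-up illustrative data (assumption: not taken from any lab)
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
y = np.array([0.22, 0.39, 0.63, 0.80, 1.02])
dy = np.full_like(y, 0.02)

slope = 2.0  # trial value of the single fitting parameter
chi2 = reduced_chi_squared(y, slope * x, dy, n_params=1)
print("Weighted chi-squared:", chi2)
```

Changing `slope` and re-running shows how chi-squared responds to the fitting parameter, which is the adjustment step described above.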

Interpreting $\large \chi_w^2$:

* $\large \chi_w^2 \approx 1$: The model fits the data well, assuming uncertainties have been characterized well
* $\large \chi_w^2 \gg 1$: Not a good fit or the uncertainties have been underestimated
* $\large \chi_w^2 \ll 1$: The uncertainties have been overestimated

Using chi-squared is a 2-step process:
1. First minimize chi-squared by adjusting parameters.
2. Then, once it is minimized, interpret the value. 

The goal is **not** to make chi-squared = 1. The goal is to minimize chi-squared to find the best possible fit, and then to interpret the resulting value.

# Appendix B: Including a y-intercept in your model

Here we provide an example of how to update everything to include a y-intercept (`intercept`) in your model. This requires the following changes, which are all indicated by `###` in the code below:
1. In Scatter Step 1, add a y-intercept fitting parameter, `intercept`;
2. In Scatter Step 1, update the number of fitting parameters `P` to be 2 since the fitting parameters are `slope` and `intercept`;
3. In Scatter Step 4, update the model line to include the y-intercept: `ypoints = slope * xpoints + intercept`;
4. In Residuals Step 1, update the model predictions for each data point to include the y-intercept: `ymodel = slope * xdata + intercept`;
5. In the final Calculate Chi-squared step, the earlier update of `P` will make the `N-P` term be updated correctly.

**Note:** Adding the y-intercept does not actually reduce chi-squared further in this specific example, because the additional parameter raises our number of fitting parameters to two (`P=2`), which shrinks the `N-P` denominator in the chi-squared formula. Because the fit is already quite good without the y-intercept, this tells us that the simpler model is preferred: the `y=mx` model is preferred over the `y=mx+b` model.
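
To make the effect of the `N-P` term concrete, here is a small sketch with made-up numbers (an assumption for illustration, not the lab data). With `intercept = 0.` the residuals are identical for both models, so the only difference in chi-squared comes from dividing by `N-2` instead of `N-1`.

```python
import numpy as np

# Made-up illustrative data (assumption: NOT the Lab 05 measurements)
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
y = np.array([0.22, 0.39, 0.63, 0.80, 1.02])
dy = np.full_like(y, 0.02)
N = len(x)

slope = 2.0
intercept = 0.0  # with b = 0 the two models predict the same values

# Same sum of squared, uncertainty-scaled residuals for both models
weighted_sum = np.sum(((y - (slope * x + intercept)) / dy) ** 2)

chi2_one_param = weighted_sum / (N - 1)  # y = mx,     P = 1
chi2_two_param = weighted_sum / (N - 2)  # y = mx + b, P = 2

print("chi-squared with P = 1:", chi2_one_param)
print("chi-squared with P = 2:", chi2_two_param)
```

An extra parameter only earns its place if it lowers the sum of weighted squared residuals by enough to outweigh the smaller denominator.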

In [None]:
# Make the plots and calculate chi-squared

# Scatter step 1: Define the variables we will be plotting, as well as labels and titles
# Plotting variables
xdata = DxVec
ydata = FVec
dydata = dFVec

# Labels and titles
data_label = "Experimental data"
model_label = "F = kx + b" ### Included a y-intercept
graph_title = "Hooke's law investigation using spring compression"
x_label = "Displacement of spring from equilibrium (m)"
y_label = "Force (N)"
residuals_title = "Residuals for Hooke's law investigation using spring compression"
residuals_y_label = "Residual = data - model (N)"

# Model parameters
slope = 2.09 # The best slope from the y = mx model
intercept = 0. ### Added the y-intercept, b
P = 2 ### Updated number of fitting parameters to be 2 (m and b)


# Scatter step 2: find the limits of the data:
xmin = np.min(xdata) # use the np.min function to find the smallest x-value
xmax = np.max(xdata) # same for max
# print (xmin, xmax)  # uncomment to see what the limits are

# Scatter step 3: generate a bunch of x points between xmin and xmax to help us plot the model line
xpoints = np.linspace(xmin, xmax, 200) # gives 200 evenly spaced points between xmin and xmax
# print(xpoints) # uncomment to see the x values that were generated.

# Scatter step 4: calculate the model values:
ypoints = slope * xpoints + intercept ### Update the model line to include the y-intercept

# Scatter step 5: plot the model line. We plot this as a red line "r-" :
plt.plot(xpoints, ypoints, "r-", label = model_label)

# Scatter step 6: Plot the data, with the previous details from before
plt.errorbar(xdata, ydata, dydata, fmt="bo", markersize = 3, label=data_label)
plt.title(graph_title)
plt.xlabel(x_label)
plt.ylabel(y_label)
plt.legend()
plt.show()

# Residuals step 1: Calculate the model prediction for each of our data points from DxVec
ymodel = slope * xdata + intercept ### Updated model values at the data points to y = mx + b

# Residuals step 2: Calculate the residuals vector
residualsVec = ydata - ymodel

# Residuals step 3: Plot the residuals vector against the x-data vector
plt.errorbar(xdata, residualsVec, dydata, fmt="bo", markersize = 3)

# Residuals step 4: Add a horizontal line at R=0 to the plot
plt.hlines(y=0, xmin=xmin, xmax=xmax, color='k') # draw a black line at y = 0.

# Residuals step 5: Add axis labels and title, and show the graph
plt.title(residuals_title)
plt.xlabel(x_label)
plt.ylabel(residuals_y_label)
plt.show()

# Calculate chi-squared 
chi2 = np.sum((residualsVec/dydata)**2)/(len(residualsVec)-P)
print ("Slope: ", slope, "N/m")
print ("Weighted chi-squared: ", chi2)