# The error matrix

Python activities to complement [*Measurements and their Uncertainties*](http://www.oupcanada.com/catalog/9780199566334.html) (*MU*), Chapter 7, "Computer minimization and the error matrix"

* [Preliminaries](#Preliminaries)
* [The error matrix for a linear fit](#The-error-matrix-for-a-linear-fit)
* [Summary](#Summary)

## Preliminaries
Before proceeding with this notebook you should review the topics from the [Example Fit notebook](A.1-Example-Fit.ipynb) and read *MU* Ch. 7, "Computer minimization and the error matrix", with the following [goals](https://wiki.its.sfu.ca/departments/phys-students/index.php/Reading_goals_for_Hughes_and_Hase#Computer_minimization_and_the_error_matrix) in mind.

1. Be able to explain qualitatively how data analysis computer programs fit a model to data by minimizing the *&chi;*<sup>2</sup> goodness-of-fit parameter as a function of the model parameters. Specifically,
    * recognize that the terms *grid search*, *gradient-descent*, *Newton's method*, *Gauss-Newton*, and *Levenberg-Marquardt* refer to different algorithms for minimizing a function;
    * be able to use matrix notation to expand *&chi;*<sup>2</sup> to second order around a particular point in space, as shown in (7.6); and
    * be able to write the gradient vector, Hessian matrix, and Jacobian matrix for a function of multiple variables, and explain how they appear in the context of computer minimization routines.
2. Be able to describe how the curvature matrix relates to the error surface near $\chi^2_\text{min}$, and how it can be used to estimate the parameter values at which $\chi^2=\chi^2_\text{min}+1$.
3. Be able to describe the meaning and significance of the covariance matrix and the correlation matrix.
4. Recognize that the covariance matrix can be estimated from a fit by inverting the curvature matrix at the minimum of the error surface.
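
As a rough illustration of goal 4, the sketch below builds the curvature matrix for a straight-line model by hand and inverts it. It assumes the convention in which the curvature matrix is half the Hessian of $\chi^2$, so that its inverse is the covariance matrix. The synthetic data and the names `line`, `J`, `A`, and `C` are hypothetical and are not taken from *MU*.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical synthetic data: a straight line with known Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 10)
alpha = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(0.0, alpha)

def line(x, m, b):
    return m * x + b

# Jacobian of the model with respect to (m, b): one row per data point
J = np.column_stack((x, np.ones_like(x)))

# Curvature matrix, taken here as half the Hessian of chi-squared
W = np.diag(1.0 / alpha**2)
A = J.T @ W @ J

# Covariance (error) matrix obtained by inverting the curvature matrix
C = np.linalg.inv(A)

# Compare with the covariance matrix reported by curve_fit
p_opt, p_cov = curve_fit(line, x, y, p0=[1.0, 0.0], sigma=alpha, absolute_sigma=True)
print("Inverse curvature matrix:")
print(np.array_str(C, precision=5, suppress_small=True))
print("curve_fit covariance matrix:")
print(np.array_str(p_cov, precision=5, suppress_small=True))
```

For a model that is linear in its parameters the Hessian does not depend on the parameter values, so the two matrices should agree to within numerical precision.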

The following code cell includes the initialization commands needed for this notebook.

In [None]:
import numpy as np
from numpy import random
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import chi2

%matplotlib inline

## The error matrix for a linear fit

### Example fit
The code cell below reproduces the fit in the [Example Fit](A.1-Example-Fit.ipynb). The data are taken from *MU* Exercise (6.1), which is shown in *MU* Fig. 6.1(d) and discussed in *MU* Sec. 7.4.1, "Worked example 1—a straight-line fit." We use the NumPy function [`array_str`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_str.html) in the last line to format the covariance matrix to four digits of precision, with `suppress_small=True` to represent the entries in decimal notation instead of scientific notation. Compare with the error matrix $\mathbf{\mathsf{C}}$ at the top of p.&nbsp;96 in *MU*—note that our column order is reversed from theirs because our `model` function lists the parameters in order (`m`,`b`).

In [None]:
# Load file into array
frequency, voltage, err = np.genfromtxt('data/Example-Data.csv', delimiter=',', skip_header=1, unpack=True)

# Define model function
def model(x, m, b):
    return m*x + b

# Set initial parameters m0 and b0
mInit = 2
bInit = 0

# Fit the model to the data
pOpt, pCov = curve_fit(model, frequency, voltage, p0=[mInit, bInit], sigma=err, absolute_sigma=True)

# Assign results of curve_fit to new variables
mOpt = pOpt[0]
bOpt = pOpt[1]
mAlpha = np.sqrt(pCov[0, 0])
bAlpha = np.sqrt(pCov[1, 1])
rho_mb = pCov[0, 1]/(mAlpha*bAlpha)

# Display formatted results
print(f"Model slope (mV/Hz):     {mOpt:.2f} ± {mAlpha:.2f}")
print(f"Model intercept (mV):    {bOpt:.0f} ± {bAlpha:.0f}")
print(f"Correlation coefficient: {rho_mb:.1f}")
print("Covariance matrix:")
print(np.array_str(pCov, precision=4, suppress_small=True))

The matrix representation provides a convenient way to do error propagation. For example, we can rewrite *MU* Eq. (7.28) as

\begin{align}
\sigma_Z^2 &= \left(\frac{\partial Z}{\partial A}\right)^2\sigma_A^2 + 2\left(\frac{\partial Z}{\partial A}\right)\left(\frac{\partial Z}{\partial B}\right)\sigma_{AB} + \left(\frac{\partial Z}{\partial B}\right)^2\sigma_B^2\\
&= \begin{bmatrix}\frac{\partial Z}{\partial A} & \frac{\partial Z}{\partial B}\end{bmatrix}\begin{bmatrix}\sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2\end{bmatrix}\begin{bmatrix}\frac{\partial Z}{\partial A} \\ \frac{\partial Z}{\partial B}\end{bmatrix}.
\end{align}
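
A minimal numerical check of this identity is sketched below. The function $Z = AB$ and all of the numerical values are hypothetical, chosen only to show that the matrix form and the term-by-term sum give the same variance.

```python
import numpy as np

# Hypothetical values, for illustration only
A, B = 3.0, 4.0
sigma_A, sigma_B = 0.1, 0.2
sigma_AB = -0.01

# Covariance matrix of (A, B)
cov = np.array([[sigma_A**2, sigma_AB],
                [sigma_AB, sigma_B**2]])

# Example function Z = A*B, with gradient (dZ/dA, dZ/dB) = (B, A)
grad = np.array([B, A])

# Matrix form: row vector times covariance matrix times column vector
var_matrix = grad @ cov @ grad

# Term-by-term form of the same expression
var_terms = (grad[0]**2 * sigma_A**2
             + 2 * grad[0] * grad[1] * sigma_AB
             + grad[1]**2 * sigma_B**2)

print(f"sigma_Z (matrix form):      {np.sqrt(var_matrix):.4f}")
print(f"sigma_Z (term-by-term sum): {np.sqrt(var_terms):.4f}")
```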

For a linear fit $y = mx + b$, $\partial y/\partial m = x$ and $\partial y/\partial b = 1$, so we can use the covariance matrix $\mathbf{\mathsf{C}}$ given by the fit to estimate the uncertainty in the *functional estimate* $\hat{y}$ at a given $x$:

\begin{align}
\alpha_\hat{y}^2 &= \begin{bmatrix}x & 1\end{bmatrix}\begin{bmatrix}\mathsf{C}_{11} & \mathsf{C}_{12} \\ \mathsf{C}_{21} & \mathsf{C}_{22}\end{bmatrix}\begin{bmatrix}x \\ 1\end{bmatrix}\\
&= \mathsf{C}_{11} x^2 + 2\mathsf{C}_{12}x + \mathsf{C}_{22},
\end{align}

where in the last line we exploit the symmetry of the covariance matrix, $\mathbf{\mathsf{C}}^\intercal = \mathbf{\mathsf{C}}\Rightarrow \mathsf{C}_{21} = \mathsf{C}_{12}.$ Note that $\alpha_\hat{y}$ and $\mathbf{\mathsf{C}}$ are both determined from the fit, so they represent a *particular sample* from a statistical distribution that describes all similar experiments. In contrast, our earlier error propagation calculation involved known parameters $\sigma_Z, \sigma_A, \sigma_B$, and $\sigma_{AB}$ for a *parent distribution*. Our result for $\alpha_\hat{y}$ essentially treats the values estimated from the fit, $\hat{m}$, $\hat{b}$, $\hat{\sigma}_m = \sqrt{\mathsf{C}_{11}}$, $\hat{\sigma}_{mb} = \mathsf{C}_{12}$, and $\hat{\sigma}_b = \sqrt{\mathsf{C}_{22}}$, as parameters for a hypothetical multivariate Gaussian parent distribution with mean values $\hat{m}$, $\hat{b}$ and covariance matrix $\mathbf{\mathsf{C}}$.
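
One short consequence of the quadratic form, worked out here as a sketch rather than quoted from *MU*, is the location at which the model uncertainty is smallest. Setting the derivative of $\alpha_\hat{y}^2$ with respect to $x$ to zero gives

\begin{align}
\frac{d\alpha_\hat{y}^2}{dx} &= 2\mathsf{C}_{11}x + 2\mathsf{C}_{12} = 0 \quad\Rightarrow\quad x_\text{min} = -\frac{\mathsf{C}_{12}}{\mathsf{C}_{11}},\\
\alpha_\hat{y}^2(x_\text{min}) &= \mathsf{C}_{22} - \frac{\mathsf{C}_{12}^2}{\mathsf{C}_{11}} = \mathsf{C}_{22}\left(1 - \rho_{mb}^2\right),
\end{align}

where $\rho_{mb} = \mathsf{C}_{12}/\sqrt{\mathsf{C}_{11}\mathsf{C}_{22}}$ is the correlation coefficient between the slope and the intercept. For a weighted straight-line fit, $x_\text{min}$ coincides with the weighted mean of the measured $x$ values, so the uncertainty bounds are narrowest near the center of the data and widen on either side.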

The quantity $\alpha_\hat{y}$ represents the uncertainty in the *model*, in contrast with the $\alpha_y$ (without the hat on $y$) that we determine from a sample of $y$ *data*. In general these are not the same—for example, we can define $\alpha_\hat{y}$ (with the hat) at an arbitrary value of $x$, even where we have no measurements. Moreover, $\alpha_y$ (without the hat) indicates the uncertainty in *a particular measurement* of $y$, whereas $\alpha_\hat{y}$ (with the hat) exploits information *from all measurements* of $y$—assuming, of course, that the model correctly describes the data in the first place!
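
The distinction can be made concrete with a small sketch. The data set below is synthetic and hypothetical (it is not the *MU* data), and `alpha_yhat` is a helper name introduced here for illustration. It evaluates the model uncertainty at the center of the data, at the edge of the data, and at an extrapolated point where no measurement exists.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical synthetic data set (not the MU data)
rng = np.random.default_rng(1)
x = np.linspace(10.0, 100.0, 10)
alpha_y = np.full_like(x, 5.0)          # uncertainty of each individual measurement
y = 2.0 * x + 3.0 + rng.normal(0.0, alpha_y)

def line(x, m, b):
    return m * x + b

p_opt, p_cov = curve_fit(line, x, y, p0=[1.0, 0.0], sigma=alpha_y, absolute_sigma=True)

def alpha_yhat(x0, cov):
    """Model uncertainty at x0 from the quadratic form [x0, 1] C [x0, 1]^T."""
    g = np.array([x0, 1.0])
    return np.sqrt(g @ cov @ g)

# Center of the data, edge of the data, and an extrapolated point
for x0 in (55.0, 100.0, 200.0):
    print(f"x = {x0:5.1f}: alpha_yhat = {alpha_yhat(x0, p_cov):.2f}  (alpha_y = 5.00)")
```

Within the range of the data the model uncertainty is smaller than the uncertainty of any single measurement, because the fit pools information from all ten points. Far enough outside that range it grows beyond the single-measurement value.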

The code cell below plots the fit with the 1*&sigma;* functional uncertainty bounds obtained from the fit covariance matrix `pCov`. We use the NumPy [`stack`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.stack.html) function to create the `2 x N` array,

$$
\nabla_{m,b}\mathbf{y} = \begin{bmatrix} x_1 & x_2 & \ldots & x_N\\ 1 & 1 & \ldots & 1\end{bmatrix},
$$

then evaluate the uncertainty $\alpha_\hat{y}(x_i)$ for each column $\nabla_{m,b}y(x_i)$ of $\nabla_{m,b}\mathbf{y}$:

$$
\alpha_\hat{y}(x_i) = \sqrt{[\nabla_{m,b}y(x_i)]^\intercal\,\mathbf{\mathsf{C}}\,[\nabla_{m,b}y(x_i)]}.
$$

Note that the `@` operator performs matrix multiplication on NumPy arrays.
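
The same quadratic form can also be evaluated for every column at once. The sketch below compares an explicit loop with a vectorized call to `np.einsum`. The covariance matrix `C` and the short `x` array are hypothetical stand-ins, not values from the fit.

```python
import numpy as np

# Hypothetical covariance matrix in (m, b) order, for illustration only
C = np.array([[0.0025, -0.15],
              [-0.15, 12.0]])

x = np.linspace(0, 120, 5)
grad = np.stack((x, np.ones_like(x)))      # shape (2, N)

# Loop version: one quadratic form per column
alpha_loop = np.array([np.sqrt(grad[:, i] @ C @ grad[:, i]) for i in range(x.size)])

# Vectorized version: sum over both matrix indices j, k for every column i
alpha_vec = np.sqrt(np.einsum('ji,jk,ki->i', grad, C, grad))

print(alpha_loop)
print(alpha_vec)
```

The loop is easier to read alongside the equation above. The `einsum` form avoids the Python-level loop, which may matter for large `N`.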

In [None]:
# Define frequency array for displaying the model
N = 1000
fModel = np.linspace(0, 120, N)

# Gradient of the model with respect to (m, b) at each frequency: 2 x N array
gradf = np.stack((fModel, np.ones(np.shape(fModel))))

# Functional uncertainty at each frequency from the quadratic form with pCov
alphafhat = np.zeros(np.shape(fModel))
for i in np.arange(N):
    alphafhat[i] = np.sqrt(gradf[:, i] @ pCov @ gradf[:, i])

# Evaluate the model at the best-fit parameters, with 1-sigma bounds
fMean = model(fModel, mOpt, bOpt)
fUpper = fMean + alphafhat
fLower = fMean - alphafhat

# Make the plot
plt.plot(fModel, fMean, 'r-')
plt.plot(fModel, fUpper, 'c--')
plt.plot(fModel, fLower, 'c--')
plt.errorbar(frequency, voltage, yerr=err, fmt='o')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Voltage (mV)')
plt.title('Data with linear fit and model uncertainty bounds')
plt.xlim(0, 120)
plt.ylim(0, 250)
plt.show()

### Exercise 1
The following code cell uses [`genfromtxt`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html) to import a data file from the [NIST Standard Reference Database](https://www.itl.nist.gov/div898/strd/index.html). Fit these data to the model

$$
y(\mathbf{x};\mathbf{A}, \gamma, \mathbf{x_0}, \mathbf{\Delta x}) = A_0\exp(-\gamma x) + A_1\exp[-(x - x_{01})^2/(\Delta x_1)^2] + A_2\exp[-(x - x_{02})^2/(\Delta x_2)^2]
$$

and report the best-fit values for the fit parameters and their estimated uncertainties. Note that the data file does not contain values for $\mathbf{\alpha}_y$. Use $\chi^2_\text{min}$ to estimate $\mathbf{\alpha}_y$, assuming it is uniform for all $\mathbf{y}$. Finally, determine the uncertainty $\mathbf{\alpha}_\hat{y}$ in the model function, and plot the data, the fit, and the model uncertainty bounds.
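
One way to estimate a uniform $\mathbf{\alpha}_y$ from $\chi^2_\text{min}$ is sketched below on a deliberately simpler problem, a straight-line fit to hypothetical synthetic data, so that the exercise itself is left open. The approach assumes that the model is correct, fits with unit uncertainties, and then chooses $\alpha_y$ so that the rescaled $\chi^2_\text{min}$ equals the number of degrees of freedom. All names and values here are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical synthetic data with an "unknown" common uncertainty
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 50)
y = 1.5 * x + 4.0 + rng.normal(0.0, 0.8, x.size)

def line(x, m, b):
    return m * x + b

# Fit with unit uncertainties, i.e. chi-squared minimization with alpha_y = 1
p_opt, p_cov_unit = curve_fit(line, x, y, p0=[1.0, 0.0],
                              sigma=np.ones_like(x), absolute_sigma=True)

# With alpha_y = 1, chi2_min is just the sum of squared residuals
residuals = y - line(x, *p_opt)
chi2_unit = np.sum(residuals**2)
dof = x.size - len(p_opt)

# Choose alpha_y so that the rescaled chi2_min equals the degrees of freedom
alpha_y_est = np.sqrt(chi2_unit / dof)

# Rescale the covariance matrix to match the estimated uncertainty
p_cov = p_cov_unit * alpha_y_est**2

print(f"Estimated alpha_y: {alpha_y_est:.2f}")
print(f"Slope:     {p_opt[0]:.3f} ± {np.sqrt(p_cov[0, 0]):.3f}")
print(f"Intercept: {p_opt[1]:.3f} ± {np.sqrt(p_cov[1, 1]):.3f}")
```

Calling `curve_fit` without `sigma` and with the default `absolute_sigma=False` applies the same rescaling to the covariance matrix that it returns. Because $\alpha_y$ is derived from the fit itself, $\chi^2_\text{min}$ can no longer be used as an independent test of the model.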

In [None]:
# Code cell for Exercise 1
# Use this cell for your response, adding cells if necessary.
y, x = np.genfromtxt('https://www.itl.nist.gov/div898/strd/nls/data/LINKS/DATA/Gauss1.dat', 
                  skip_header=60, unpack=True)

## Summary
Here is a list of what you should be able to do after completing this notebook.
* Use the covariance matrix to compute the uncertainty in the model function.

##### About this notebook
Notebook by J. S. Dodge, 2019. Available from [SFU GitLab](https://gitlab.rcg.sfu.ca/jsdodge/data-analysis-python). The notebook text is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. See more at [Creative Commons](http://creativecommons.org/licenses/by-nc-nd/4.0/). The notebook code is open source under the [MIT License](https://opensource.org/licenses/MIT).