# Error propagation

Python activities to complement [*Measurements and their Uncertainties*](http://www.oupcanada.com/catalog/9780199566334.html) (*MU*), Chapter 4, "Error propagation."

* [Preliminaries](#Preliminaries)
* [Functions of an uncertain variable](#Functions-of-an-uncertain-variable)
    * [A function of a distribution is another distribution](#A-function-of-a-distribution-is-another-distribution)
        * [Programming notes 1](#Programming-notes-1)
    * [Exercise 1](#Exercise-1)
* [Functions of many uncertain variables](#Functions-of-many-uncertain-variables)
    * [Example: pressure of a van der Waals gas](#Example&#58;-pressure-of-a-van-der-Waals-gas)
    * [Exercise 2](#Exercise-2)
* [Experimental design](#Experimental-design)
    * [Example: density measurement](#Example&#58;-density-measurement)
        * [Programming notes 2](#Programming-notes-2)
* [Combining measurements](#Combining-measurements)
    * [The weighted mean](#The-weighted-mean)
        * [Programming notes 3](#Programming-notes-3)
    * [Exercise 3](#Exercise-3)
* [Summary](#Summary)

## Preliminaries
Before proceeding with this notebook you should review the topics from the [previous notebook](3.0-Basic-probability.ipynb) and read *MU* Ch. 4, "Error propagation," with the following [goals](https://wiki.its.sfu.ca/departments/phys-students/index.php/Reading_goals_for_Hughes_and_Hase#Error_propagation) in mind.

1. Be able to explain why Eq. (4.7) gives the approximate uncertainty in the single-variable function *Z*(*A*) when there is uncertainty in the argument *A*, and be able to discuss the limitations of this approximation.
2. Be able to derive all of the results in Table 4.1 and use Eq. (4.7) in concrete examples.
3. Be able to explain why the component uncertainties $\alpha_Z^A, \alpha_Z^B, \alpha_Z^C\ldots$, add in quadrature to give $\alpha_Z$ in Eq. (4.10), and recognize (for now) that this expression is restricted to independent, uncorrelated variables (Ch. 7 discusses the reason for this restriction in more depth).
4. Be able to explain why Eq. (4.16) gives the approximate uncertainty in the multivariable function $Z\left(A, B, C,\ldots\right)$ when there is uncertainty in the arguments *A*, *B*, *C*,&hellip;, and be able to discuss the limitations of this approximation—here, as in (3) above, it is enough for you to recognize that the variables must be independent and uncorrelated.
5. Be able to derive all of the results in Table 4.2 and use Eq. (4.16) in concrete examples.
6. Be able to use error propagation methods to identify the dominant uncertainty in an experiment.
7. Be able to find the weighted mean and its standard error $\alpha_\text{CE}$ for a set of numbers $\left\{x_i\right\}$ with uncertainties $\left\{\alpha_i\right\}$.

The following code cell includes previously used initialization commands that we will need here.

In [None]:
import numpy as np
from numpy import random
from scipy.stats import norm
import matplotlib.pyplot as plt

%matplotlib inline

## Functions of an uncertain variable

### A function of a distribution is another distribution
We saw in the [2.0 Basic statistics](2.0-Basic-statistics.ipynb#How-to-interpret-the-mean-±-the-standard-error) notebook that the expression $\bar{x}\pm\hat{\alpha}$ represents an estimate of the statistical distribution that we expect for repeated identical measurements, typically $P_\text{DF}(\bar{x}) = \mathcal{N}(\bar{x};\mu,\alpha^2)$. The function $\bar{y} = f(\bar{x})$ now represents a *different* distribution, $P'_\text{DF}(\bar{y})$, that we can relate to $P_\text{DF}(\bar{x})$. Usually we can use the linear approximation $f(\bar{x}+\Delta)\approx f(\bar{x}) + f'(\bar{x})\Delta$ discussed in *MU* Sec. 4.1, but it is important to be aware that this approximation is not always valid. The code cell below shows the distribution for $Z = 10^A$ with $A = 2.3\pm 0.1$, as discussed in *MU* Sec. 4.1.4. The first plot shows that the function $Z(A)$ has clear positive curvature; this causes the distribution for $Z(A)$ to become skewed toward higher values. Nonetheless, the procedure described in *MU* Sec. 4.1 yields an accurate estimate of its mean and standard deviation.

#### Programming notes 1
We use the NumPy [`around`](https://numpy.org/devdocs/reference/generated/numpy.around.html) function (short for "array round") with the option `decimals=-1` to round the mean and standard error of *Z* to the nearest 10, since the format string syntax only allows us to specify the number of digits to print to the *right* of the decimal place. The statement that prints *Z* is long, so we use the [`\`](https://docs.python.org/3/reference/lexical_analysis.html#index-6) character to indicate that the `print` statement continues on the next line.

We also use the pyplot function [`show`](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.show.html) to display each figure in sequence; otherwise each new plot command will be applied to the same figure.
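As a small illustration of the rounding and line-continuation notes above, the sketch below rounds a pair of made-up values to the nearest 10 and builds the output string with implicit line joining inside parentheses instead of `\`. The numbers assigned to `Z_mean` and `Z_alpha` are illustrative assumptions, not results from this notebook.

```python
import numpy as np

# Illustrative values only (assumed for this sketch)
Z_mean = 203.7
Z_alpha = 46.2

# decimals=-1 rounds to the nearest 10
Z_mean_rounded = np.around(Z_mean, decimals=-1)
Z_alpha_rounded = np.around(Z_alpha, decimals=-1)

# Adjacent string literals inside parentheses are joined implicitly,
# so no backslash is needed to continue the line
text = (f"Z = {Z_mean_rounded:.0f} ± "
        f"{Z_alpha_rounded:.0f}")
print(text)
```

One practical difference between the two forms: when a `\` continuation falls inside a string literal, any leading spaces on the next line become part of the string, whereas the parenthesized form lets you indent freely.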

In [None]:
# Seed RNG and set simulation parameters
random.seed(0)
N = 1000
A_mean = 2.3
A_alpha = 0.1

# Simulate distributions
A = A_mean + A_alpha*random.randn(N)
Z = 10**A

# Show the mean and standard deviation for A and Z
Z_mean = np.mean(Z)
Z_alpha1 = np.std(Z,ddof=1)

print(f"A = {np.mean(A):.1f} ± {np.std(A,ddof=1):.1f}")
print(f"Z = {np.around(Z_mean, decimals=-1):.0f} ± \
    {np.around(Z_alpha1, decimals=-1):.0f}")

# Plot function
A_fun = np.linspace(A_mean - 2*A_alpha, A_mean + 2*A_alpha)
plt.plot(A_fun,10**A_fun)
plt.xlabel("A")
plt.ylabel("Z(A)")
plt.show()

# Plot distribution for A
plt.hist(A, ec='k')
plt.xlabel("A")
plt.ylabel("Occurrences")
plt.show()

# Plot distribution for Z
plt.hist(Z, ec='k')
plt.xlabel("Z")
plt.ylabel("Occurrences")
plt.show()
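
For comparison with the simulated mean and standard deviation, the following sketch evaluates the calculus-based estimate $\alpha_Z \approx \left|dZ/dA\right|\alpha_A$ for $Z = 10^A$, using $dZ/dA = \ln(10)\,10^A$. It also evaluates $Z$ directly at $A \pm \alpha_A$ to show the asymmetry caused by the curvature. The variable names `Z_best`, `Z_alpha_calc`, `Z_alpha_plus`, and `Z_alpha_minus` are chosen here for illustration.

```python
import numpy as np

A_mean = 2.3
A_alpha = 0.1

# Best estimate of Z, evaluated at the mean of A
Z_best = 10**A_mean

# Calculus-based (linear) estimate: alpha_Z = |dZ/dA| * alpha_A
Z_alpha_calc = np.log(10) * Z_best * A_alpha

# Direct evaluation at A + alpha_A and A - alpha_A
Z_alpha_plus = 10**(A_mean + A_alpha) - Z_best
Z_alpha_minus = Z_best - 10**(A_mean - A_alpha)

print(f"Z (best estimate)     = {Z_best:.1f}")
print(f"alpha_Z (calculus)    = {Z_alpha_calc:.1f}")
print(f"Upward shift in Z     = {Z_alpha_plus:.1f}")
print(f"Downward shift in Z   = {Z_alpha_minus:.1f}")
```

The upward shift is larger than the downward shift, which is consistent with the skew toward higher values seen in the histogram of *Z*; the calculus-based value lies between the two.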

### Exercise 1
To see how the linear approximation can break down completely, copy the contents of the code cell above into the code cell below, then change `A_alpha = 0.1` to `A_alpha = 0.4`, as discussed at the end of *MU* Sec. 4.1.4.

In [None]:
# Code cell for Exercise 1
# Use this cell for your response, adding cells if necessary.

## Functions of many uncertain variables

### Example: pressure of a van der Waals gas
Below we simulate the distribution discussed in *MU* Sec. 4.2.2, for the pressure *P* of a van der Waals gas deduced from measurements of the molar volume *V*<sub>m</sub> and absolute temperature *T*.

In [None]:
# Seed RNG and set number of simulations
random.seed(0)
N = 1000

# Molar volume (m^3/mol)
Vm_mean = 2e-4
Vm_alpha = 0.003e-4
Vm = Vm_mean + Vm_alpha*random.randn(N)

# Absolute temperature (K)
T_mean = 298.0
T_alpha = 0.2
T = T_mean + T_alpha*random.randn(N)

# Assign constants
a = 1.408e-1    # m^6 mol^(-2) Pa
b = 3.913e-5    # m^3 mol^(-1)
R = 8.3145      # J K^(-1) mol^(-1)

# Compute P and show distribution (in MPa)
P = R*T/(Vm - b) - a/Vm**2

plt.hist(P/1e6, ec='k')
plt.xlabel("Pressure (MPa)")
plt.ylabel("Occurrences");

# Print distribution parameters
print(f"P = ({np.mean(P)/1e6:.2f} ± {np.std(P, ddof=1)/1e6:.2f}) MPa")
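
As a cross-check on the simulation, here is a minimal sketch of the calculus-based estimate for the same quantity, assuming *V*<sub>m</sub> and *T* are independent. The partial derivatives are worked out by hand from $P = RT/(V_\text{m} - b) - a/V_\text{m}^2$, and the names `dP_dT`, `dP_dVm`, `P_alpha_T`, and `P_alpha_Vm` are introduced here for illustration.

```python
import numpy as np

# Values copied from the simulation cell above
Vm_mean, Vm_alpha = 2e-4, 0.003e-4   # m^3/mol
T_mean, T_alpha = 298.0, 0.2         # K
a = 1.408e-1    # m^6 mol^(-2) Pa
b = 3.913e-5    # m^3 mol^(-1)
R = 8.3145      # J K^(-1) mol^(-1)

# Best estimate of P at the mean values
P_best = R*T_mean/(Vm_mean - b) - a/Vm_mean**2

# Partial derivatives evaluated at the mean values
dP_dT = R/(Vm_mean - b)
dP_dVm = -R*T_mean/(Vm_mean - b)**2 + 2*a/Vm_mean**3

# Component uncertainties, added in quadrature
P_alpha_T = abs(dP_dT)*T_alpha
P_alpha_Vm = abs(dP_dVm)*Vm_alpha
P_alpha = np.sqrt(P_alpha_T**2 + P_alpha_Vm**2)

print(f"P = ({P_best/1e6:.2f} ± {P_alpha/1e6:.2f}) MPa")
print(f"Contribution from T:  {P_alpha_T/1e6:.3f} MPa")
print(f"Contribution from Vm: {P_alpha_Vm/1e6:.3f} MPa")
```

Printing the two components separately also shows which measurement dominates the uncertainty in *P*.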

### Exercise 2
[Adapted from *MU* Prob. (4.4).] Use `randn` to simulate 1000 measurements of the incident angle $\theta_\text{i} = (45.0\pm 0.1)^\circ$ and transmitted angle $\theta_\text{t} = (34.5\pm 0.2)^\circ$ for a plane electromagnetic wave incident on a dielectric surface and polarized in the plane of incidence. Compute

$$ R = \frac{\tan^2(\theta_\text{i} - \theta_\text{t})}{\tan^2(\theta_\text{i} + \theta_\text{t})}$$

for each simulated pair $(\theta_\text{i}, \theta_\text{t})_k$, and compare the mean and standard deviation of the simulated *R* with the calculus-based error propagation estimate. Plot a histogram of the simulated distribution for *R*.

*Note:* Trigonometric functions in NumPy are defined in radians. The NumPy functions [`degrees`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.degrees.html), [`radians`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.radians.html), [`deg2rad`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.deg2rad.html), and [`rad2deg`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.rad2deg.html) convert between radians and degrees. See the NumPy documentation on [mathematical functions](https://docs.scipy.org/doc/numpy/reference/routines.math.html).
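The conversion mentioned in the note can be sketched as follows. The angle and uncertainty used here are illustrative values, deliberately different from those in the exercise.

```python
import numpy as np

# Illustrative angle and uncertainty in degrees (not the exercise values)
angle_deg = 30.0
alpha_deg = 0.5

# Convert to radians before calling trigonometric functions
angle_rad = np.radians(angle_deg)
alpha_rad = np.radians(alpha_deg)

print(f"sin(30°) = {np.sin(angle_rad):.3f}")
print(f"Back to degrees: {np.degrees(angle_rad):.1f}")
print(f"Uncertainty in radians: {alpha_rad:.5f}")
```

Note that the uncertainty must be converted along with the angle whenever it multiplies a derivative taken with respect to an angle in radians.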

In [None]:
# Code cell for Exercise 2
# Use this cell for your response, adding cells if necessary.

## Experimental design

### Example: density measurement
Below we simulate the distribution discussed in *MU* Sec. 4.4, for the relative error in the density of a sphere given measurements of its mass and radius.

#### Programming notes 2
To compare the three distributions we show the histograms as probability densities with common bin edges, and we set the [Patch](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.patches.Patch.html) property [`alpha`](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.patches.Patch.html#matplotlib.patches.Patch.set_alpha) to 0.6 to make each histogram semitransparent. We also assign each plot a label with the `label` keyword option, which is used by the [`legend`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html) function to construct the legend. Previously we set `ec='k'` to show the histogram bars with a black outline, but in this case we return to the default behavior so that the outlines do not interfere with the visibility of the superposed histograms.

The total relative error in this simulation (with `random.seed(0)`) is 3.1%, not 3.2% as we would expect from our knowledge of the parent distributions. This is a statistical fluctuation, which you can see by changing the RNG seed.

In [None]:
# Seed RNG and set simulation parameters
random.seed(0)
N = 1000

m_alpha_rel = 0.01
r_alpha_rel = 0.01

# Simulate mass, radius, density (relative to mean)
m_rel = 1 + m_alpha_rel*random.randn(N)
r_rel = 1 + r_alpha_rel*random.randn(N)

rho_rel = m_rel/r_rel**3

# Show distributions
edges = np.arange(-0.1, 0.11, 0.01)
plt.hist(m_rel - 1, bins=edges, density=True,
         alpha=0.6, label="Mass only")
plt.hist(r_rel**-3 - 1, bins=edges, density=True,
         alpha=0.6, label="Radius only")
plt.hist(rho_rel - 1, bins=edges, density=True,
         alpha=0.6, label="Total")
plt.xlabel("Relative deviation from mean")
plt.ylabel("Density")
plt.legend();

# Print total relative error
print(f"Relative error = {np.std(rho_rel, ddof=1):.3f}")
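
The 3.2% expectation quoted in the programming notes can be reproduced with a short calculation. Since $\rho \propto m/r^3$, the relative uncertainty in the radius enters with a factor of 3, and the two contributions add in quadrature. The names `rho_alpha_rel_m`, `rho_alpha_rel_r`, and `radius_fraction` are introduced here for illustration.

```python
import numpy as np

m_alpha_rel = 0.01
r_alpha_rel = 0.01

# Relative contributions to the density uncertainty
rho_alpha_rel_m = m_alpha_rel      # mass enters to the first power
rho_alpha_rel_r = 3*r_alpha_rel    # radius enters to the third power

# Add in quadrature, assuming independent measurements
rho_alpha_rel = np.sqrt(rho_alpha_rel_m**2 + rho_alpha_rel_r**2)

# Fraction of the total variance that comes from the radius
radius_fraction = rho_alpha_rel_r**2/rho_alpha_rel**2

print(f"Expected relative error = {rho_alpha_rel:.3f}")
print(f"Fraction of variance from radius = {radius_fraction:.2f}")
```

The radius accounts for most of the variance, so improving the radius measurement would reduce the density uncertainty far more than improving the mass measurement.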

## Combining measurements

### The weighted mean
In the following example we simulate a set of fifteen measurements based on those reported in *MU* Table 2.1, and separate them into one group of five (set *A*) and another group of ten (set *B*). We then find their respective means and standard errors and combine them in a weighted average.

Of course we could also just take the mean and standard error of *all* of the measurements in sets *A* and *B*, since we have the raw data—the weighted mean is really only necessary if we must rely on the reported means and standard errors. We compare the results of the two methods below. Typically they will be slightly different, but they should be close if the original measurements are limited only by statistical uncertainty.
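
For reference, the inverse-variance weighting used in the example below can be written as follows, with weights $w_i = 1/\alpha_i^2$. The symbol $\bar{x}_\text{CE}$ for the combined estimate is notation assumed here to match $\alpha_\text{CE}$.

$$ \bar{x}_\text{CE} = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad \frac{1}{\alpha_\text{CE}^2} = \sum_i \frac{1}{\alpha_i^2} $$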

#### Programming notes 3
After generating sets *A* and *B* and computing their respective means and standard errors, we use the NumPy [`array`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html) function to organize them into two [NumPy arrays](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html). We have not needed to use the `array` function yet, but you have already used NumPy arrays, since this is what the functions `zeros`, `rand`, `randn`, and `arange` return. NumPy arrays support a variety of mathematical array operations that are not defined for built-in Python sequences. For example, neither division nor exponentiation is defined on Python [lists](https://docs.python.org/3/tutorial/introduction.html#lists), so the expression `w = 1/period_AB_alpha**2` will raise a `TypeError` with the definition `period_AB_alpha = [period_A_alpha, period_B_alpha]`.
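The difference between a list and a NumPy array can be seen in a short sketch. The uncertainties in `alpha_list` are illustrative values, and the names `alpha_list`, `alpha_array`, and `list_failed` are introduced only for this example.

```python
import numpy as np

# Illustrative uncertainties stored as a Python list and as a NumPy array
alpha_list = [0.2, 0.1]
alpha_array = np.array(alpha_list)

# Exponentiation is not defined for lists, so this raises a TypeError
try:
    w_list = 1/alpha_list**2
    list_failed = False
except TypeError as err:
    list_failed = True
    print(f"List arithmetic failed: {err}")

# The same expression works element by element on a NumPy array
w = 1/alpha_array**2
print(w)
```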

After computing the weighted average directly, we compute it with the NumPy [`average`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.average.html) function and confirm that we get the same result.

Finally, we combine sets *A* and *B* with the NumPy [`concatenate`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html) function, then compute the mean and standard error of the combined set.

In [None]:
# Seed RNG and set simulation parameters
random.seed(0)
period_mean = 9.93
period_std = 0.4

N_A = 5
N_B = 10

# Simulate data
period_A = period_mean + period_std*random.randn(N_A)
period_B = period_mean + period_std*random.randn(N_B)

period_A_mean = np.mean(period_A)
period_A_alpha = np.std(period_A, ddof=1)/np.sqrt(N_A)

period_B_mean = np.mean(period_B)
period_B_alpha = np.std(period_B, ddof=1)/np.sqrt(N_B)

print(f"Period, set A: ({period_A_mean:.2f} ± {period_A_alpha:.2f}) s")
print(f"Period, set B: ({period_B_mean:.2f} ± {period_B_alpha:.2f}) s")

# Compute weighted mean directly
period_AB_mean = np.array([period_A_mean, period_B_mean])
period_AB_alpha = np.array([period_A_alpha, period_B_alpha])

w = 1/period_AB_alpha**2
period_mean_weighted = np.sum(w*period_AB_mean)/np.sum(w)
alpha_CE = np.sqrt(1/np.sum(w))

print(f"Period, weighted: ({period_mean_weighted:.2f} ± {alpha_CE:.2f}) s")

# Compute weighted mean with average function
period_mean_weighted_alt = np.average(period_AB_mean, weights=w)
print(f"Period, weighted (alt): ({period_mean_weighted_alt:.2f} ± {alpha_CE:.2f}) s")

# Compute mean and standard error for combined data
period_all = np.concatenate((period_A, period_B))
period_all_mean = np.mean(period_all)
period_all_alpha = np.std(period_all, ddof=1)/np.sqrt(len(period_all))
print(f"Period, sets A and B: ({period_all_mean:.2f} ± {period_all_alpha:.2f}) s")
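
A useful sanity check on the weighted mean is the special case of equal uncertainties, where it should reduce to the ordinary mean and $\alpha_\text{CE}$ should reduce to $\alpha/\sqrt{N}$. The sketch below verifies this with illustrative values; `x` and `alpha` are made-up numbers, not data from *MU*.

```python
import numpy as np

# Illustrative measurements with equal uncertainties
x = np.array([9.8, 10.0, 10.2])
alpha = np.array([0.2, 0.2, 0.2])

# Inverse-variance weights
w = 1/alpha**2
x_weighted = np.average(x, weights=w)
alpha_CE = np.sqrt(1/np.sum(w))

print(f"Weighted mean = {x_weighted:.2f}, ordinary mean = {np.mean(x):.2f}")
print(f"alpha_CE = {alpha_CE:.3f}, alpha/sqrt(N) = {alpha[0]/np.sqrt(len(x)):.3f}")
```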

### Exercise 3
[Adapted from *MU* Prob. (4.10).] Combine the following measurements of the speed of light and report the result.

| Measurement (10<sup>8</sup> m/s) | Standard error (10<sup>8</sup> m/s) |
|:--------------------------------:|:-----------------------------------:|
| 3.03 | 0.04 |
| 2.99 | 0.03 |
| 2.99 | 0.02 |
| 3.00 | 0.05 |
| 3.05 | 0.04 |
| 2.97 | 0.02 |

If another student then reports $c = (3.0 \pm 0.3)\times 10^8~\text{m}/\text{s}$, is there any change in the newly combined measurement? What would you do if a further student reported $c = (4.01 \pm 0.01)\times 10^8~\text{m}/\text{s}$?

In [None]:
# Code cell for Exercise 3
# Use this cell for your response, adding cells if necessary.

## Summary

Here is a list of what you should be able to do after completing this notebook.
* Use the [`around`](https://numpy.org/devdocs/reference/generated/numpy.around.html) NumPy function to round numbers to different levels of precision.
* Use the [`\`](https://docs.python.org/3/reference/lexical_analysis.html#index-6) character to explicitly break a statement across multiple lines. (Note that it is not always necessary to include the `\` character to do this; expressions with parentheses, square brackets, or curly braces may be continued [implicitly](https://docs.python.org/3/reference/lexical_analysis.html#implicit-line-joining) without it.)
* Use the [`show`](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.show.html) pyplot function to display a figure and begin a new one.
* Recognize that error propagation is a simplified method for estimating the functional image of a distribution.
* Overlay multiple histograms by setting the [`alpha`](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.patches.Patch.html#matplotlib.patches.Patch.set_alpha) transparency property as a keyword option in pyplot.
* Add a label to a plot element with the `label` keyword option, and use the [`legend`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html) function in pyplot to add a legend to a plot.
* Recognize the difference between a [NumPy array](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html) and a Python [list](https://docs.python.org/3/tutorial/introduction.html#lists).
* Use the NumPy [`array`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html) function to create a NumPy array.
* Use the NumPy [`average`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.average.html) function to compute a weighted average.
* Use the NumPy [`concatenate`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html) function to concatenate two or more NumPy arrays.

##### About this notebook
Notebook by J. S. Dodge, 2019. Available from [SFU GitLab](https://gitlab.rcg.sfu.ca/jsdodge/data-analysis-python). The notebook text is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. See more at [Creative Commons](http://creativecommons.org/licenses/by-nc-nd/4.0/). The notebook code is open source under the [MIT License](https://opensource.org/licenses/MIT).