# 1.6: Uncertainty estimation with the bootstrap

<!--<badge>--><a href="https://colab.research.google.com/github/msambridge/InversionPracticals/blob/main/S1.6 - Bootstrap error propagation_cannonball.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><!--</badge>-->

This practical explores the use of the bootstrap method for uncertainty estimation. Recall that both linear theory error estimation and Monte Carlo error propagation require knowledge of the size of the data errors, in the form of a data covariance matrix. The bootstrap can be used to estimate the error in a solution without knowing the size of the errors in the data. Instead, it assumes that the data errors, or more usually the data residuals, are independent and identically distributed (IID). This can be a reasonable assumption if the correlation between data errors is minimal.

<img src="../Figures/ballistics.png" alt="Cannonball figure" width="600"/>
Cannonball heights as a function of time.


A cannonball is fired directly upwards from an unknown starting height
above the surface, $m_1$, with unknown initial velocity, $m_2$, and
unknown gravitational acceleration, $m_3$. Newton’s laws of motion tell
us that the relationship between position and time follows

$$ y(t) = m_1 + m_2t -\frac{1}{2}m_3t^2.
\label{eq:cannon} $$

An experiment has been performed and heights, $y_i$, $(i=1,\dots,8)$, are
collected at fixed time intervals of one second. We obtain the data
$y = [26.94, 33.45, 40.72 , 42.32, 44.30 , 47.19 , 43.33 , 40.13 ]$,
$t = [1.0,2.0,\dots,8.0]$.

First load some libraries.

In [1]:
# -------------------------------------------------------- #
#                                                          #
#     Uncomment below to set up environment on "colab"     #
#                                                          #
# -------------------------------------------------------- #

# !git clone https://github.com/msambridge/InversionPracticals
# %cd InversionPracticals

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
import math
import pickle
import sys
sys.path.append("software")
import plotcovellipse as pc

To find the unknowns $(m_1, m_2, m_3)$ we must fit a quadratic curve
(as above) to the observed data (see figure 2). This can be achieved
by solving the linear system ${\bf d} = G{\bf m}$, where ${\bf d}$ is
the data, ${\bf m}$ is the vector of unknowns and $G$ is the matrix
connecting the two, determined by the expression above. The solution to this is
in your course exercise, but is equivalent to evaluating the expression

$${\bf m} = (G^TG)^{-1} G^T {\bf d}
\label{eq:LS}$$

All terms on the right hand side of this equation are known, so
it is a simple case of plugging in values to determine the best fit
estimates of $(m_1, m_2, m_3)$.
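
As a minimal sketch of how the least squares expression above could be evaluated with NumPy, the block below builds $G$ from the quadratic model and solves for ${\bf m}$ in two ways. The variable names (`t`, `y`, `G`, `m_normal`, `m_lstsq`) are illustrative choices, not part of the course software, and this is one possible approach rather than the reference solution.

```python
import numpy as np

# Observation times and heights from the text
t = np.arange(1.0, 9.0)  # 1.0, 2.0, ..., 8.0
y = np.array([26.94, 33.45, 40.72, 42.32, 44.30, 47.19, 43.33, 40.13])

# Each column of G multiplies one unknown in y(t) = m1 + m2*t - 0.5*m3*t^2
G = np.column_stack([np.ones_like(t), t, -0.5 * t**2])

# Normal equations (G^T G) m = G^T d, solved without forming an explicit inverse
m_normal = np.linalg.solve(G.T @ G, G.T @ y)

# The same least squares problem via NumPy's built-in solver
m_lstsq = np.linalg.lstsq(G, y, rcond=None)[0]

print("m (normal equations):", m_normal)
print("m (lstsq):           ", m_lstsq)
```

Both routes should agree to within rounding error. Solving the normal equations with `np.linalg.solve` avoids computing $(G^TG)^{-1}$ explicitly, which is generally the more numerically stable habit.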

**Task 1:** Calculate the best fit values of the three unknowns (height, velocity
and gravitational acceleration). We call these values
$(m_1^0, m_2^0, m_3^0)$ our <span>**solution**</span>. Can you guess
where this experiment took place?

In [3]:
# Try it here! You can insert more cells by selecting Cell > Insert Cell Above/Below from the menu
# bar, or by pressing Esc to enter command mode and then hitting A or B (for above/below). 

yobs = [26.94, 33.45, 40.72, 42.32, 44.30, 47.19, 43.33, 40.13]


The problem now is to use the <span>**bootstrap**</span> to determine
how errors in the data propagate into the estimated unknowns. We do not
know the size of the errors in the data, but we can still apply the bootstrap.
Since the data are associated with increasing time, it does not make
sense to resample the data directly (because we could end up with two
heights of the same value associated with different times). The data are
not IID, since they follow a trend. However, we can still proceed by
applying the bootstrap principle to the data residuals produced by the
best fit solution, i.e. we have 8 residuals, $r_i$, where

$$r_i = y_i - m^0_1 - m^0_2t_i +\frac{1}{2}m^0_3t_i^2.\quad (i=1,\dots, 8).$$

If we assume that the residuals are IID they can be re-sampled with
replacement in the usual way to form multiple sets of 8 residual values
$r^*_j, (j=1,\dots,8)$ and new bootstrap data are constructed using this
set of residuals by

$$y^*_j = r^*_j + m^0_1 + m^0_2t_j -\frac{1}{2}m^0_3t_j^2.\quad (j=1,\dots, 8).$$

Using this approach the residuals are mixed between different data, and
so each $y$ value does not simply get its own residual back.
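
To make the construction concrete, here is a hedged sketch of building a single bootstrap data set from the residuals. It assumes the best fit is obtained with `np.linalg.lstsq`; the random seed and all variable names (`m0`, `y_fit`, `r`, `r_star`, `y_star`, `m_star`) are arbitrary choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

t = np.arange(1.0, 9.0)
y = np.array([26.94, 33.45, 40.72, 42.32, 44.30, 47.19, 43.33, 40.13])
G = np.column_stack([np.ones_like(t), t, -0.5 * t**2])

# Best fit solution m^0 and the heights it predicts
m0 = np.linalg.lstsq(G, y, rcond=None)[0]
y_fit = G @ m0

# Residuals r_i = y_i - (m1 + m2*t_i - 0.5*m3*t_i^2)
r = y - y_fit

# Draw 8 residuals with replacement and add them to the predicted heights
r_star = rng.choice(r, size=r.size, replace=True)
y_star = y_fit + r_star

# Refit the bootstrap data to get one bootstrap estimate of the unknowns
m_star = np.linalg.lstsq(G, y_star, rcond=None)[0]
print("one bootstrap estimate:", m_star)
```

Repeating the last three steps $B$ times gives the collection of bootstrap estimates.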

**Task 2:** Write a Python script to build bootstrap data sets and,
for each of these, calculate
the bootstrap estimates of the unknowns. Let's call these
$(m_1^i, m_2^i, m_3^i), (i=1,\dots, B)$. The number of bootstrap samples
$B$ is your choice but it should be at least 100.

It can be instructive to <span>**plot the bootstrap samples**</span> as
a scatter plot for the three pairs of variables, i.e. $(m^i_1, m^i_2)$,
$(m^i_2, m^i_3)$ and $(m^i_1, m^i_3)$, $(i=1,\dots, B)$. They should
look something like the figure below.
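
One possible way to produce such scatter plots is sketched below, using plain matplotlib rather than the course's `plotcovellipse` helper. It generates all $B$ bootstrap data sets in a single vectorised step; the value `B = 500`, the seed, and the names `idx`, `m_boot`, `pairs` and `labels` are assumptions made for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # arbitrary seed

t = np.arange(1.0, 9.0)
y = np.array([26.94, 33.45, 40.72, 42.32, 44.30, 47.19, 43.33, 40.13])
G = np.column_stack([np.ones_like(t), t, -0.5 * t**2])
m0 = np.linalg.lstsq(G, y, rcond=None)[0]
y_fit = G @ m0
r = y - y_fit

B = 500
# Random residual indices for all B data sets at once, shape (B, 8)
idx = rng.integers(0, r.size, size=(B, r.size))
y_star = y_fit + r[idx]
# lstsq accepts one right-hand side per column, so fit all B sets in one call
m_boot = np.linalg.lstsq(G, y_star.T, rcond=None)[0].T  # shape (B, 3)

labels = ["$m_1$", "$m_2$", "$m_3$"]
pairs = [(0, 1), (1, 2), (0, 2)]
fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
for ax, (i, j) in zip(axes, pairs):
    ax.scatter(m_boot[:, i], m_boot[:, j], s=5, alpha=0.5)
    ax.plot(m0[i], m0[j], "r+", markersize=12)  # best fit solution
    ax.set_xlabel(labels[i])
    ax.set_ylabel(labels[j])
fig.tight_layout()
```

The red cross marks the best fit solution in each panel, which gives a quick visual check of how the bootstrap cloud sits around it.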

In [4]:
# Try it here! You can insert more cells by selecting Cell > Insert Cell Above/Below from the menu
# bar, or by pressing Esc to enter command mode and then hitting A or B (for above/below). 



**Task 3:** From the bootstrap output samples
$(m_1^i, m_2^i, m_3^i), (i=1,\dots, B)$ calculate i) <span>**the mean**</span>, ii) <span>**the
variance**</span>, iii) <span>**the bias corrected solution**</span>,
and iv) <span>**the 95% confidence intervals**</span> for each of the
three unknowns. The bias correction is the mean of the differences between each bootstrap solution and the estimator itself, which in this case is the best fit solution. This is subtracted from the best fit to produce <span>**the bias corrected solution**</span>.

The mean should look similar to the best fit values and
the bias should be small. The variance and confidence intervals
characterize the error in the estimated values of the unknowns.
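
The summary statistics described above could be computed along the following lines. This is a sketch under stated assumptions: the confidence intervals use the simple percentile method (the text does not specify which interval construction is intended), the variance uses the unbiased `ddof=1` estimate, and `B = 1000`, the seed and all variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

t = np.arange(1.0, 9.0)
y = np.array([26.94, 33.45, 40.72, 42.32, 44.30, 47.19, 43.33, 40.13])
G = np.column_stack([np.ones_like(t), t, -0.5 * t**2])
m0 = np.linalg.lstsq(G, y, rcond=None)[0]
y_fit = G @ m0
r = y - y_fit

# Residual bootstrap: B refits of resampled data
B = 1000
m_boot = np.empty((B, 3))
for b in range(B):
    y_star = y_fit + rng.choice(r, size=r.size, replace=True)
    m_boot[b] = np.linalg.lstsq(G, y_star, rcond=None)[0]

m_mean = m_boot.mean(axis=0)          # i) bootstrap mean
m_var = m_boot.var(axis=0, ddof=1)    # ii) bootstrap variance
bias = (m_boot - m0).mean(axis=0)     # mean difference from the best fit
m_corrected = m0 - bias               # iii) bias corrected solution
# iv) 95% intervals from the 2.5th and 97.5th percentiles (assumed method)
ci_low, ci_high = np.percentile(m_boot, [2.5, 97.5], axis=0)

for k in range(3):
    print(f"m{k+1}: best fit {m0[k]:.3f}, mean {m_mean[k]:.3f}, "
          f"var {m_var[k]:.4f}, corrected {m_corrected[k]:.3f}, "
          f"95% CI [{ci_low[k]:.3f}, {ci_high[k]:.3f}]")
```

Note that subtracting the bias from the best fit is the same as computing $2{\bf m}^0$ minus the bootstrap mean.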



In [5]:
# Try it here! You can insert more cells by selecting Cell > Insert Cell Above/Below from the menu
# bar, or by pressing Esc to enter command mode and then hitting A or B (for above/below). 



----