# Day 3-Part 2: Uncertainties

Uncertainties are everywhere in the geosciences. Every time we take a measurement, there are uncertainties associated with it, and every time we look at data (e.g. seismic data), there are uncertainties associated with how these data were acquired, processed, displayed and interpreted.

Uncertainties (call them errors if you want) propagate through any calculation that uses measurements (or observations) with errors. If these measurements are statistically independent (each one is uncorrelated with the magnitude and error of all the others), the general formula to propagate the errors is:

### <div align="center">$\sigma_z=\sqrt{\left(\frac{\partial z}{\partial a}\right)^2\left(\sigma_a\right)^2+\left(\frac{\partial z}{\partial b}\right)^2\left(\sigma_b\right)^2+\left(\frac{\partial z}{\partial c}\right)^2\left(\sigma_c\right)^2+\cdots}$</div>

where $z$ is a multi-variable function $z=f(a,b,c,...)$ that depends on the measurements $a$, $b$, $c$, etc. $\sigma_z$ is the uncertainty of $z$, and $\sigma_a$, $\sigma_b$, $\sigma_c$, etc., are the uncertainties of the measurements.

It is relatively easy to evaluate the formula above for simple cases (e.g. the sum or product of two variables), but it becomes quite a nightmare when the formulas get more complicated, and even worse when we need to chain several formulas to arrive at the final result.
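
To make the formula concrete, here is a minimal sketch that evaluates it numerically for the two simple cases just mentioned. The function `propagate` is a hypothetical helper of mine (it is not part of any library), the partial derivatives are approximated with central finite differences, and the input values are made up for illustration:

```python
import math

def propagate(f, values, sigmas, rel_step=1e-6):
    """Approximate sigma_z for z = f(*values), assuming independent inputs.

    Partial derivatives are estimated with central finite differences.
    """
    variance = 0.0
    for i, (v, s) in enumerate(zip(values, sigmas)):
        h = rel_step * max(abs(v), 1.0)        # step for the i-th variable
        up = list(values)
        up[i] = v + h
        down = list(values)
        down[i] = v - h
        dz_dv = (f(*up) - f(*down)) / (2.0 * h)  # partial derivative of z
        variance += (dz_dv * s) ** 2             # one term under the square root
    return math.sqrt(variance)

# Made-up measurements: a = 10 +/- 0.3 and b = 20 +/- 0.4
demo_values = [10.0, 20.0]
demo_sigmas = [0.3, 0.4]

# Sum: sigma_z = sqrt(sigma_a^2 + sigma_b^2)
sigma_sum = propagate(lambda a, b: a + b, demo_values, demo_sigmas)

# Product: (sigma_z / z)^2 = (sigma_a / a)^2 + (sigma_b / b)^2
sigma_prod = propagate(lambda a, b: a * b, demo_values, demo_sigmas)

print("sum:     {:.3f}".format(sigma_sum))
print("product: {:.3f}".format(sigma_prod))
```

For the sum, the result matches $\sqrt{0.3^2+0.4^2}=0.5$; for the product, the relative errors add in quadrature.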

Fortunately, in Python there is a library that handles uncertainties and propagates errors as indicated by the equation above. This library is called [uncertainties](https://pythonhosted.org/uncertainties/). If you have not installed `uncertainties`, please do so by running the cell below:

In [None]:
# Run this cell if uncertainties is not installed
import sys
!{sys.executable} -m pip install --upgrade uncertainties

In this notebook, I illustrate the use of the `uncertainties` library using three examples.

## Example 1: Bed thickness

The first example is problem 4 in Chapter 2 of [Ragan, Structural Geology textbook](https://www.cambridge.org/core/books/structural-geology/4D631885C9FBBCDEF90C555445ED1160#):

The orientation of a sandstone unit is 245/35 (right-hand rule convention). A horizontal traverse with a bearing of N10E, made from the bottom to the top of the unit, measured 125 m.

- Calculate the thickness of the unit
- If the uncertainty in dip is 2$^\circ$, the uncertainty of the traverse direction is 1$^\circ$, and the uncertainty in the measured length is 0.5%, what is the uncertainty in the calculated thickness?

The figure below shows in map view the variables (measurements) for this problem, and the equation we can use to determine the thickness of the sandstone:

<img src="../figures/ss_thickness.png" alt="varTypes" width="600"/><br><br>
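
The angle $\beta$ between the traverse and the strike line is not given directly in the problem; it follows from the two azimuths. A minimal sketch of that step, where `acute_angle_between` is a hypothetical helper (not part of any library) and I assume $\beta$ is the acute angle between the two lines:

```python
def acute_angle_between(azimuth_1, azimuth_2):
    """Acute angle in degrees between two horizontal lines given by azimuths."""
    diff = abs(azimuth_1 - azimuth_2) % 180.0  # lines have no direction
    return min(diff, 180.0 - diff)

strike_az = 245.0    # strike azimuth of the sandstone unit (245/35)
traverse_az = 10.0   # traverse bearing N10E as an azimuth
beta_deg = acute_angle_between(strike_az, traverse_az)
print("beta = {:.0f} degrees".format(beta_deg))
```

This gives the 55 degrees used for `beta` in the calculation that follows.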

Let's first solve this problem the hard way, by computing all the partial derivatives and evaluating the error-in-quadrature formula:

In [None]:
import math # Import math

pi = math.pi # pi

# Problem 4, chapter 2 of Ragan
l = 125 # traverse length in m
l_u = l * 0.005 # uncertainty in traverse length (0.5%) in m
dip = 35 * pi/180.0 # dip in radians
dip_u = 2 * pi/180.0 # uncertainty in dip in radians
beta = 55 * pi/180.0 # angle of traverse with strike line in radians
beta_u = 1 * pi/180.0 # uncertainty in beta in radians

# Compute thickness of bed, Eq. 2.2 of Ragan
t = l*math.sin(beta)*math.sin(dip)
print("Thickness = {:.1f} m".format(t))

In [None]:
# Compute error in thickness

# Partial derivatives, here we need to use calculus
ptl = math.sin(beta)*math.sin(dip) # partial derivative of t with respect to l
ptb = l*math.cos(beta)*math.sin(dip) # partial derivative of t with respect to beta
ptd = l*math.sin(beta)*math.cos(dip) # partial derivative of t with respect to dip

# Quadrature formula
t_u = math.sqrt((ptl*l_u)**2 + (ptb*beta_u)**2 + (ptd*dip_u)**2)

# Output result
print("Thickness = {:.1f} +/- {:.1f} m".format(t, t_u))
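
A side benefit of writing out the quadrature sum is that we can see which measurement controls the result. The sketch below repeats the three terms with the same inputs, under new variable names of my own so that nothing above is overwritten, and reports each term as a share of the total variance:

```python
import math

# Same inputs as above, under different names
length_m, length_u = 125.0, 125.0 * 0.005
dip_rad, dip_rad_u = math.radians(35.0), math.radians(2.0)
beta_rad, beta_rad_u = math.radians(55.0), math.radians(1.0)

# Each entry is (partial derivative * uncertainty)^2
variance_terms = {
    "length": (math.sin(beta_rad) * math.sin(dip_rad) * length_u) ** 2,
    "beta": (length_m * math.cos(beta_rad) * math.sin(dip_rad) * beta_rad_u) ** 2,
    "dip": (length_m * math.sin(beta_rad) * math.cos(dip_rad) * dip_rad_u) ** 2,
}
total_variance = sum(variance_terms.values())
variance_shares = {k: v / total_variance for k, v in variance_terms.items()}

for name, share in variance_shares.items():
    print("{:>6s}: {:5.1f} % of the variance".format(name, 100.0 * share))
```

The dip term accounts for more than 90% of the variance, so in this problem a better dip measurement would help far more than a better traverse length.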

So the error in thickness is about 5% of the computed thickness. Now let's solve this problem using the `uncertainties` library. For that, we will need to create `ufloat`s (floats with uncertainties), and use `umath` (math with uncertainties):

In [None]:
from uncertainties import ufloat # float with uncertainties
from uncertainties import umath # math with uncertainties

# Define parameters with uncertainties
l = ufloat(125, 125*0.005)
dip = ufloat(35, 2) * pi/180
beta = ufloat(55, 1) * pi/180

# Compute thickness
t = l*umath.sin(beta)*umath.sin(dip)

# Output result
print("Thickness = {:.1f} m".format(t))

We got the same result as above, but in a much more efficient way. We also demonstrated that the `uncertainties` package works. Now let's try a second example with more data.
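
Before that, here is one more independent cross-check of the thickness error, this time by brute force. This is a Monte Carlo sketch using only the standard library. It assumes the three measurement errors are Gaussian and independent, which the problem does not state, and the sample size and seed are arbitrary choices of mine:

```python
import math
import random
import statistics

random.seed(0)       # fixed seed so the run is repeatable
n_samples = 20000    # arbitrary sample size

thickness_samples = []
for _ in range(n_samples):
    # Draw each measurement from a normal distribution (assumption)
    l_s = random.gauss(125.0, 125.0 * 0.005)
    dip_s = math.radians(random.gauss(35.0, 2.0))
    beta_s = math.radians(random.gauss(55.0, 1.0))
    thickness_samples.append(l_s * math.sin(beta_s) * math.sin(dip_s))

mc_mean = statistics.mean(thickness_samples)
mc_std = statistics.stdev(thickness_samples)
print("Monte Carlo thickness = {:.1f} +/- {:.1f} m".format(mc_mean, mc_std))
```

The spread of the simulated thicknesses should be close to the uncertainty obtained by linear propagation, because the errors here are small relative to the measured values.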

## Example 2: Error bars

For this example, we are again going to use the data on trace element concentrations in tephra deposits of a volcano in Italy. Let's load the data:

In [None]:
import os
import pandas as pd
import matplotlib.pyplot as plt

path = os.path.join("..", "data", "Smith_glass_post_NYT_data.xlsx")
my_dataset = pd.read_excel(path, sheet_name="Supp_traces")

Now, suppose the concentration measurements have errors that are 10% of the measured value, and that we want to make a plot of La versus the ratio Rb/Th. Most importantly, we want to calculate the errors in that ratio, and display the plot with error bars. The following code will do it. Here we use the `uncertainties.unumpy` module to make arrays with uncertainties, and the `pyplot.errorbar` function to plot the data with error bars:

In [None]:
from uncertainties import unumpy # arrays with uncertainties

epochs = ["one","two","three","three-b"]
colors = ["#afbbb5", "#f10e4a", "#27449c", "#f9a20e"]

fig, ax = plt.subplots(figsize=(10,8))
for epoch, color in zip(epochs, colors):
    # Select data in epoch
    my_data = my_dataset[(my_dataset.Epoch == epoch)]
    
    # Concentrations as numpy arrays
    la = my_data.La.to_numpy() # La concentration
    rb = my_data.Rb.to_numpy() # Rb concentration 
    th = my_data.Th.to_numpy() # Th concentration
    
    # Concentrations with uncertainties = 10% measured value
    la = unumpy.uarray(la, la*0.1) # La concentration
    rb = unumpy.uarray(rb, rb*0.1) # Rb concentration
    th = unumpy.uarray(th, th*0.1) # Th concentration
    
    # Compute Rb/Th
    rb_th = rb / th # Rb/Th with uncertainties
    
    # Plot data with uncertainties
    x = unumpy.nominal_values(la) # nominal values of La
    y = unumpy.nominal_values(rb_th) # nominal values of Rb/Th
    xerr = unumpy.std_devs(la) # uncertainties in La
    yerr = unumpy.std_devs(rb_th) # uncertainties in Rb/Th
    
    ax.errorbar(x=x, y=y, xerr=xerr, yerr=yerr, linestyle="",
                markerfacecolor=color, markersize=6, marker="o",
                markeredgecolor="k", ecolor=color, elinewidth=0.5, capsize=0,
                label="Epoch " + epoch)
    
ax.legend(title="CFC Recent Activity")
ax.set_ylabel("Rb/Th")
ax.set_xlabel("La [ppm]");

Isn't that cool? That saved us a lot of time. Most importantly, we could produce a meaningful plot with the uncertainties (error bars) included. Let's try a final example.
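
As a quick sanity check on the vertical error bars before moving on: for a ratio of two independent quantities, the relative errors add in quadrature, so 10% on Rb and 10% on Th should give about 14% on Rb/Th. A minimal sketch, where the Rb and Th values are made-up numbers and not taken from the dataset:

```python
import math

rel_rb = 0.10  # relative uncertainty of Rb (10%)
rel_th = 0.10  # relative uncertainty of Th (10%)

# Relative uncertainty of the ratio, assuming Rb and Th are independent
rel_ratio = math.sqrt(rel_rb**2 + rel_th**2)

# Hypothetical concentrations in ppm, for illustration only
rb_ppm, th_ppm = 300.0, 30.0
ratio = rb_ppm / th_ppm
ratio_u = ratio * rel_ratio

print("Relative error of Rb/Th = {:.1f} %".format(100.0 * rel_ratio))
print("Rb/Th = {:.1f} +/- {:.1f}".format(ratio, ratio_u))
```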

## Example 3: Calculating reserve volumes

Suppose we have an isochore map (a contour map of vertical thickness) of net oil in a trap, and we want to estimate the total volume of oil. The Excel file `net_oil.xlsx` contains the x (east), y (north), and z (thickness value) of the contours. Let's read this file:

In [None]:
# read contours of net_oil map
path = os.path.join("..", "data", "net_oil.xlsx") 
net_oil = pd.read_excel(path)

net_oil.head()

Let's assume the uncertainties in the x and y coordinates of the contours are 5 m, and the uncertainty in thickness is 1 m. Also, the contour interval, or thickness difference between adjacent contours, is 5 m. We use that information to make an array of contour values with uncertainties:

In [None]:
import numpy as np

# uncertainties in x (east) and y (north) of contour points, and contour values
unc_x = 5 # uncertainty in x is 5 m
unc_y = 5 # uncertainty in y is 5 m
unc_z = 1 # uncertainty in contour value

# contour interval is 5 m
c_int = 5

# contour values
c_vals = np.arange(np.amin(net_oil.z), np.amax(net_oil.z)+c_int, c_int) 
c_vals = unumpy.uarray(c_vals, np.ones(c_vals.size)*unc_z)
print(c_vals)

Let's plot the contours:

In [None]:
# plot contours
fig, ax = plt.subplots(figsize=(10,6))
for c_val in c_vals: # for each contour
    my_data = net_oil[(net_oil.z == c_val.nominal_value)] # get data
    ax.plot(my_data.x, my_data.y, label=str(c_val.nominal_value)) # plot contour

# set title, legend, and axes
ax.set_title("Net oil map")
ax.legend(title="Isochores in m")
ax.set_xlabel("East [m]")
ax.set_ylabel("North [m]")
ax.axis("scaled");

Let's start by calculating the areas inside each contour. As we saw in lab 2, the area of a polygon made up of line segments between N vertices ($x_i, y_i$), $i=0$ to $N-1$, is:

### <div align="center">$A=\frac{1}{2} \sum_{i=0}^{N-1}\left(x_i y_{i+1}-x_{i+1} y_i\right)$</div>

where the last vertex ($x_N, y_N$) is assumed to be the same as the first, i.e. the polygon is closed. The equation above gives a positive area if the polygon vertices are ordered counterclockwise, and a negative area if they are ordered clockwise. Obviously we want positive areas, so we take the absolute value of this calculation.

The function below computes the area of a polygon from the x and y coordinates of its vertices:

In [None]:
# function to compute area
def polyg_area(x,y):
    """
    computes the area of a polygon from the x and y
    coordinates of its vertices
    """
    npoints = x.shape[0] # number of points in polygon
    area = 0.0 # initialize area
    
    for i in range(npoints):
        # 1st point
        x1 = x[i]
        y1 = y[i]
        # 2nd point
        next_i = i + 1
        if i == npoints-1:
            next_i = 0
        x2 = x[next_i]
        y2 = y[next_i]
        # add to area
        area += (x1*y2 - x2*y1)
    
    return np.absolute(area/2) # return area
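
To see the sign behaviour described above, here is a small pure-Python check on a 4 m by 3 m rectangle. The function `signed_shoelace_area` is a hypothetical stand-in for the function above that keeps the sign instead of taking the absolute value:

```python
def signed_shoelace_area(xs, ys):
    """Signed polygon area: positive for counterclockwise vertex order."""
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return twice_area / 2.0

# Rectangle 4 m x 3 m, vertices in counterclockwise order
rect_x = [0.0, 4.0, 4.0, 0.0]
rect_y = [0.0, 0.0, 3.0, 3.0]

area_ccw = signed_shoelace_area(rect_x, rect_y)
area_cw = signed_shoelace_area(rect_x[::-1], rect_y[::-1])  # reversed order

print(area_ccw, area_cw)  # 12.0 -12.0
```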

Now, let's compute the areas of the contours and their uncertainties:

In [None]:
# initialize areas to zero
areas = unumpy.uarray(np.zeros(c_vals.size), np.zeros(c_vals.size))

# compute areas
for i in range(c_vals.size): # for each contour
    # select contour
    my_data = net_oil[(net_oil.z == c_vals[i].nominal_value)]
    # extract x and y coordinates of contour as numpy arrays
    x = my_data.x.to_numpy() # this is a numpy array
    y = my_data.y.to_numpy() # this is a numpy array
    # add uncertainties to the x and y coordinates -> unumpy arrays
    x = unumpy.uarray(x, np.ones(x.size)*unc_x)
    y = unumpy.uarray(y, np.ones(y.size)*unc_y)
    # compute area using our function
    areas[i] = polyg_area(x,y)
    # print area
    print("Contour = {:.0f} m, area in square meters = {:0,.0f} ".format(c_vals[i].nominal_value, areas[i]))
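
If you are curious where these area uncertainties come from, the same first-order propagation can be written out by hand. Differentiating the area formula gives $\partial A/\partial x_i=\frac{1}{2}(y_{i+1}-y_{i-1})$ and $\partial A/\partial y_i=\frac{1}{2}(x_{i-1}-x_{i+1})$. The sketch below plugs these into the quadrature formula. It assumes the errors of all vertices are independent and that the closing vertex is not repeated, and `shoelace_area_sigma` is a hypothetical helper tested here on a made-up 100 m square:

```python
import math

def shoelace_area_sigma(xs, ys, sigma_x, sigma_y):
    """First-order uncertainty of a polygon area for independent vertex errors."""
    n = len(xs)
    variance = 0.0
    for i in range(n):
        prev_i, next_i = (i - 1) % n, (i + 1) % n
        dA_dx = 0.5 * (ys[next_i] - ys[prev_i])  # partial of A w.r.t. x_i
        dA_dy = 0.5 * (xs[prev_i] - xs[next_i])  # partial of A w.r.t. y_i
        variance += (dA_dx * sigma_x) ** 2 + (dA_dy * sigma_y) ** 2
    return math.sqrt(variance)

# Made-up square of 100 m side, with 5 m uncertainty in x and y
square_x = [0.0, 100.0, 100.0, 0.0]
square_y = [0.0, 0.0, 100.0, 100.0]
square_sigma = shoelace_area_sigma(square_x, square_y, 5.0, 5.0)

print("Area = 10,000 +/- {:,.0f} square meters".format(square_sigma))
```

Taking the absolute value of the area only flips the sign of the derivatives, so the uncertainty is unchanged.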

Now we are in a position to calculate the volume of oil in the trap. We can mathematically express the volume as:

### <div align="center">$V=\int_a^b A(z) d z$</div>

To visualize why this is the case, just imagine cutting the volume into many horizontal slices; these are essentially the contours. We can then estimate the volume between each pair of adjacent contours (a trapezoidal prism), and finally sum up all the volumes to get the total volume.
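
The slicing argument can be sketched in a few lines of plain Python. The function `volume_from_slices` is a hypothetical helper, and the thickness levels and areas are made-up numbers, not the values from the map:

```python
def volume_from_slices(levels, slice_areas):
    """Sum the volumes of the slabs between adjacent contours."""
    volume = 0.0
    for k in range(len(levels) - 1):
        dz = levels[k + 1] - levels[k]                     # slab height
        mean_area = (slice_areas[k] + slice_areas[k + 1]) / 2.0
        volume += dz * mean_area                           # slab volume
    return volume

# Made-up example: three contours 5 m apart, areas in square meters
demo_levels = [0.0, 5.0, 10.0]
demo_areas = [100.0, 60.0, 20.0]
demo_volume = volume_from_slices(demo_levels, demo_areas)

print(demo_volume)  # 600.0
```

Each slab contributes its height times the mean of its bounding areas, which is the trapezoidal rule applied to $A(z)$.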

So, we just need to do an integration. We can do this using the `scipy.integrate` module. In the code below, we integrate with the trapezoidal rule, `trapezoid` (called `trapz` in older SciPy releases; that name was removed in SciPy 1.14), but if you want, you can try another one (e.g. composite Simpson, `simpson`, formerly `simps`):

In [None]:
from scipy import integrate

# integrate the contour areas over thickness with the trapezoidal rule
vol_oil = integrate.trapezoid(areas, x=c_vals)
print("The volume of oil is {:0,.0f} cubic meters or {:0,.0f} barrels".format(vol_oil, vol_oil*6.2898))

So, for the measured oil trap and its uncertainties, the volume of oil is about 99 MMbbl $\pm$ 10 MMbbl. Here we assume a simplistic model for uncertainties, but you could of course define uncertainties locally (e.g. at every point of the isochores), and then propagate them as we did here. The possibilities are endless 🙂