# Exercise 7: Applying your thermochronology model (20 points)

## Overview

![Bhutan Himalaya](img/Bhutan_Himalaya.png)<br/>
*Figure 1. The Himalaya in Bhutan. [Image source](http://commons.wikimedia.org/wiki/File:View_of_Gasa_Dzong.JPG).*

The overall goal of this exercise is to use our heat transfer model to interpret [a thermochronometer dataset from the Himalaya of Bhutan](data/Bhutan_age_data.txt). The interpretation entails determining long-term average rock exhumation rates from rock samples analyzed using apatite and zircon (U-Th)/He, and muscovite <sup>40</sup>Ar/<sup>39</sup>Ar thermochronology. As you will recall, [our exercise last week](https://github.com/IntroQG-2021/Exercise-6) used a 1-D time-dependent solution to the advection-diffusion equation to calculate a temperature-depth profile in the Earth, which was then used to predict thermochronometer ages based on Dodson's method. In that model we specified a rock advection velocity, also known as the exhumation rate, and observed how the thermochronometer ages vary as a function of this velocity. This week we will compare those predicted thermochronometer ages to data from Bhutan. By varying the specified rock advection velocity (exhumation rate), we will minimize the misfit between the measured and predicted thermochronometer ages and thereby define a best-fit exhumation rate (or exhumation history) for the Himalaya of Bhutan. For this exercise, we will be using data from [Coutand et al., 2014](https://dx.doi.org/10.1002/2013JB010891) and [Stüwe and Foster, 2001](https://www.sciencedirect.com/science/article/pii/S1367912000000183) (PDFs available on the [main course page](https://introqg-site.readthedocs.io/en/latest/final-paper/articles.html)).

### Tips for completing this exercise

- Use **exactly** the same variable names as in the instructions. Your answers will be graded automatically, and the tests that grade them rely on the formatting and variable names given in the instructions.
- **Please do not**:

    - **Change the file names**. Do all of your editing in the provided `Exercise-7-problems-1-2.ipynb` file (this file).
    - **Copy/paste cells in this notebook**. We use an automated grading system that will fail if there are copies of code cells.
    - **Change the existing cell types**. You can add cells, but changing the cell types for existing cells (from code to markdown, for example) will also cause the automated grader to fail.

## AI tool usage agreement

**Enter your name in the cell below** to confirm that you have followed the [course guidelines on the use of AI tools](https://introqg-site.readthedocs.io/en/latest/general-info/ai-tools.html) and understand that misuse of AI tools is considered cheating.

YOUR ANSWER HERE

## Problem 1: A "functional" model (12 points)

In the first problem we will read in the data file containing the measured age data, calculate a goodness of fit to the data with our predicted thermochronometer ages, plot the results, and create a final function that does all of the steps in our thermochronometer age prediction numerical model described at the start of Exercise 6.

### Scores for this problem

**Your score on this problem will be based on the following criteria**:

- Reading in the age data file
- Calculating a goodness of fit for the measured and predicted ages
- Creating a plot of the thermal model geotherms and the age data
- Creating a new age prediction function that will allow us to do all of the steps in Exercises 6 and 7 in a single function call
- Including comments that explain what most lines in the code do
- Uploading your notebook and `introqg_functions.py` script file to your GitHub repository for this week's exercise

### Part 0: Copying and testing your script file from Exercise 6 (0 points)

The first task in this problem is to copy your `introqg_functions.py` script file from Exercise 6 to the directory containing this notebook and then run the cell below to ensure it has been copied and is functioning as expected. Note: We will only check some of the functions in this file using the tests below, not all of them.

- Copy your `introqg_functions.py` script file from Exercise 6 to the directory containing this notebook
- Run the tests below

In [None]:
# The test below should work

import numpy as np
from nose.tools import assert_equal
from introqg_functions import calculate_temp_history
from introqg_functions import calculate_closure_temps
from introqg_functions import calculate_age
from introqg_functions import chi_squared

# Create time, depth histories
time_history = np.linspace(0.0, 30.0, 51)
depth_history = np.linspace(45.0, 0.0, 51)

# Create temperature history
temp_history = calculate_temp_history(
    time_history=time_history,
    depth_history=depth_history,
    velocity=1.5,
    initial_gradient=10.0,
    diffusivity=32.0
)

# Calculate closure temperatures
ahe_tc, zhe_tc, mar_tc = calculate_closure_temps(
    time_history=time_history,
    temp_history=temp_history,
    calc_ahe=True,
    calc_zhe=True,
    calc_mar=True
)

# Calculate ages
ahe_age = calculate_age(
    time_history=time_history,
    temp_history=temp_history,
    closure_temp=ahe_tc
)
zhe_age = calculate_age(
    time_history=time_history,
    temp_history=temp_history,
    closure_temp=zhe_tc
)
mar_age = calculate_age(
    time_history=time_history,
    temp_history=temp_history,
    closure_temp=mar_tc
)

# Print calculated closure temperature, age
print(f"Calculated zircon (U-Th)/He closure temperature: {zhe_tc:.1f} °C")
print(f"Calculated zircon (U-Th)/He age: {zhe_age:.1f} Ma")

# Check that the closure temperature, age values are correct
assert_equal(round(zhe_tc, 3), 212.043)
assert_equal(round(zhe_age, 3), 4.542)

# Calculate age misfit
misfit = chi_squared([6.0], [zhe_age], [1.0])

# Print misfit
print(f"Calculated age misfit: {misfit:.3f}")

# Check that the misfit value is correct
assert_equal(round(misfit, 3), 2.126)

# Print message if it is safe to continue
print("\nAll tests pass! You are ready to proceed with this exercise.")

### Part 1: Reading in the data (2 points)

In order to be able to compare our predicted thermochronometer ages to some data, we'll need to read in [the data file](data/Bhutan_age_data.txt). Using the past exercises as examples (look at those notebooks :) ), you should read the data file and store the contents in a pandas DataFrame. The data file has a header that lists the data contained in each column, and there are NA values indicated by `'-9999'`.

- In the cell below, read [the data file](data/Bhutan_age_data.txt) into a variable called `data` using pandas.
    - When you read in the data, also convert the `'-9999'` values to NA values in the DataFrame.
        - Measured ages are listed for each sample location in the data file, but not every sample has been analyzed for each different thermochronometer system.

**What to do for this part:**

- Read in [the data file](data/Bhutan_age_data.txt) using pandas to the variable `data`
    - Be sure to convert the `-9999` values to NA values
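
As a minimal sketch of the NA conversion, the example below reads a small, made-up comma-separated table from a string rather than from the exercise data file. The column names, the values, and the comma delimiter are assumptions for illustration only; the real file may be laid out differently. The `na_values` parameter of `pd.read_csv()` marks a sentinel such as `-9999` as missing while the file is read.

```python
import io

import pandas as pd

# Hypothetical stand-in for a data file; names and values are made up
example_text = """Latitude,Age A,Age B
27.1,3.2,-9999
27.3,-9999,8.5
27.6,4.1,9.0
"""

# na_values tells pandas which entries to treat as missing (NaN)
example = pd.read_csv(io.StringIO(example_text), na_values=["-9999"])

# Count the missing values in each column
print(example.isna().sum())
```

Counting the NA values per column, as in the last line, is a quick way to confirm that the sentinel values were converted.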

In [None]:
# Import pandas
import pandas as pd

# Read in the data file
data = None

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# The tests below should work

# Import libraries we need
import pandas as pd
from nose.tools import assert_equal

# Print DataFrame info
print(f"The data DataFrame has the dimensions {data.shape}. Expected value: (39, 9).")

# Print the average AHe age
print(f"The average apatite (U-Th)/He age is {data['AHe age'].mean():.1f} Ma. Expected value: 4.9 Ma.")

# Check that the DataFrame dimensions and mean AHe age are correct
assert_equal(data.shape, (39, 9))
assert_equal(round(data['AHe age'].mean(), 3), 4.888)


In [None]:
# The tests below should work

# Print the number of NA values in the ZHe age column
print(f"The number of NA values in the ZHe age column is {data['ZHe age'].isna().sum()}. Expected value: 21.")

# Print the number of NA values in the ZHe standard deviation column
print(f"The number of NA values in the ZHe standard deviation column is {data['ZHe standard deviation'].isna().sum()}. Expected value: 21.")

# Check number of NA values are correct
assert_equal(data['ZHe age'].isna().sum(), 21)
assert_equal(data['ZHe standard deviation'].isna().sum(), 21)


### Part 2: Calculating a goodness of fit (2 points)

Your next task is to use the `chi_squared()` function you created back in Exercise 2 to calculate some goodness-of-fit values for each thermochronometer system. As a reminder, the equation is

\begin{equation}
  \Large
  \chi^{2} = \frac{1}{N} \sum \frac{(O_{i} - E_{i})^{2}}{\sigma_{i}^2}
\end{equation}

where $N$ is the number of ages, $O_{i}$ is the $i$th measured age, $E_{i}$ is the $i$th predicted age, and $\sigma_{i}$ is the $i$th standard deviation.
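
To make the terms in the equation concrete, here is a minimal sketch of the same calculation with NumPy. The function name `chi_squared_sketch()` is hypothetical and is not the `chi_squared()` function from your script file; it only illustrates how the sum and the division by $N$ fit together.

```python
import numpy as np


def chi_squared_sketch(observed, expected, errors):
    """Return the misfit from the equation above (illustrative only)."""
    # Convert the inputs to float arrays so the arithmetic is element-wise
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    errors = np.asarray(errors, dtype=float)

    # Sum the squared, error-weighted differences and divide by N
    return np.sum((observed - expected) ** 2 / errors**2) / len(observed)


# One measured age of 6.0 Ma, a predicted age of 4.5 Ma, and an uncertainty of 1.0 Myr
# (6.0 - 4.5)^2 / 1.0^2 / 1 = 2.25
print(chi_squared_sketch([6.0], [4.5], [1.0]))
```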

- Use your `chi_squared()` function in the cell below to calculate the misfit for:
    - The apatite (U-Th)/He ages and standard deviations in the data file and the `ahe_age` predicted age
    - The zircon (U-Th)/He ages and standard deviations in the data file and the `zhe_age` predicted age
    - The muscovite Ar/Ar ages and standard deviations in the data file and the `mar_age` predicted age
    - The sum of all three thermochronometers combined (the sum of the three values above)
    - **NOTE**: You only have a single predicted age, and the `chi_squared()` function expects the same number of measured and predicted ages. You will need to solve this issue somehow. If you are unsure of what you might do, check out [the hints for this week's exercise](https://introqg.github.io/site/lessons/L7/exercise-7.html).
    - **NOTE 2**: Remember to exclude the NA values when calculating the misfits!
- Store the calculated misfit values as:
    - `ahe_misfit` for the apatite (U-Th)/He ages
    - `zhe_misfit` for the zircon (U-Th)/He ages
    - `mar_misfit` for the muscovite Ar/Ar ages
    - `total_misfit` for the sum of the misfits above

**What to do for this part:**

- Calculate goodness-of-fit values for each of the three thermochronometer systems and their total sum
    - Store the calculated values as `ahe_misfit`, `zhe_misfit`, `mar_misfit`, and `total_misfit`
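
The two notes above (one predicted age versus many measured ages, and the NA values) can be handled in several ways. The sketch below shows one possible approach on a made-up DataFrame; the column names `age` and `error` and all values are assumptions and do not come from the exercise data file.

```python
import numpy as np
import pandas as pd

# Made-up measured ages and uncertainties; NaN marks samples without a measurement
example = pd.DataFrame(
    {
        "age": [3.0, np.nan, 5.0, 4.0],
        "error": [0.5, np.nan, 1.0, 0.5],
    }
)
predicted_age = 4.0  # single hypothetical predicted age (Ma)

# Keep only the rows that have a measured age
valid = example["age"].notna()
measured = example.loc[valid, "age"].values
errors = example.loc[valid, "error"].values

# Repeat the single predicted age so it matches the number of measured ages
predicted = np.full_like(measured, predicted_age)

print(measured, predicted, errors)
```

After these steps the three arrays have the same length and contain no NA values, so they can be passed to a misfit function that expects equally sized inputs.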

In [None]:
# Calculate the goodness of fit for each thermochronometer (and the total) below

ahe_misfit = None
zhe_misfit = None
mar_misfit = None
total_misfit = None

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# This test should work

# Import libraries we need
from nose.tools import assert_equal

# Print AHe goodness of fit
print(f"Apatite (U-Th)/He goodness of fit: {ahe_misfit:.2f}. Expected value: 158.35")

# Test AHe goodness-of-fit value is correct
assert_equal(round(ahe_misfit, 3), 158.346)


In [None]:
# This test should work

# Print total goodness of fit
print(f"Total goodness of fit: {total_misfit:.2f}. Expected value: 1499.58")

# Test total goodness-of-fit value is correct
assert_equal(round(total_misfit, 3), 1499.582)

### Part 3: Creating a pair of useful plots (6 points)

At this point, we basically have everything we need to start working on Problem 2, where you will try to find the exhumation rates that best fit each thermochronometer dataset. However, you may still be somewhat unclear on the temperatures that have been recorded in the cooling history and how well our measured ages are fit by the predictions. The goal in this part is to produce two plots that will help you more easily understand and visualize the data we are working with.

**Configuring the plot layout**

To start, we can create a figure and plot axes in the cell below using Matplotlib.

- Create a figure and plot using Matplotlib's `plt.subplots()` function with two subplots in one column and a figure size of 10 inches wide by 8 inches tall
    - Use the variable `fig` for the figure and `ax1` and `ax2` for the plot axes
        - **Hint**: You can assign the two plot axes the suggested names by listing them together as a tuple. Instead of the typical `ax`, you can list `(ax1, ax2)`.
    - You can also add a figure title at this point using `fig.suptitle()`. If you want more information about how to use this function, check out the [Matplotlib documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.suptitle.html).
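
A minimal sketch of this layout is shown below. The title text is a placeholder, and the snippet only sets up empty axes.

```python
import matplotlib.pyplot as plt

# Two subplots stacked in one column; figsize is (width, height) in inches
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))

# Placeholder title for the whole figure
fig.suptitle("Example figure title")
```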

**Subplot one (upper)**

In the first subplot you will plot the initial geotherm in the thermal model, the final geotherm, and the thermal history. For this plot you should:

- Create a depth array (variable `depths`) with 51 points from 0 to 45 km depth 
- Calculate the initial geotherm (variable `initial_geotherm`) with 51 points using an initial thermal gradient of 10 °C/km from 0 to 45 km depth (the same as in the `introqg_functions.py` test cell at the start of this exercise)
    - Note that all you need to do here is just use the `np.linspace()` function to make this as it is a straight line
- Calculate the final geotherm (variable `final_geotherm`) with 51 points using the `transient_temp()` function from your `introqg_functions.py` script file. You can use the following values for this calculation:
    - `initial_gradient` = 10 °C/km
    - `diffusivity` = 32 km$^{2}$/Myr
    - `velocity` = 1.5 km/Myr
    - `depths` = `depths` array
    - `time` = 30 Myr
- Plot the initial and final geotherms on this figure
    - You can use Matplotlib's `ax1.plot()` function for this
    - Be sure to add some labels for each so they can be identified in the legend
    - Also remember to plot the depths as negative values or invert the y-axis in order to have depths increase downwards
- Also plot the temperature history using the variables `temp_history` and `depth_history` as black filled circles
    - You can again use Matplotlib's `ax1.plot()` function for this
    - Again, be sure to include a label for this plot item so it appears in the legend, and plot the depths as negative values or invert the y-axis in order to have depths increase downwards
- You should add the advection velocity as text on the plot
- Be sure to display the legend, and add axis labels
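
Two of the steps above, inverting the depth axis and adding text to a plot, are illustrated in the sketch below using a straight-line geotherm. The variable names are hypothetical, and the surface temperature of 0 °C is an assumption made for this example. The `transform=ax.transAxes` argument positions the text in axes-fraction coordinates (0 to 1) rather than data coordinates.

```python
import matplotlib.pyplot as plt
import numpy as np

# Straight-line geotherm: 10 °C/km from 0 to 45 km depth
# (assumes a surface temperature of 0 °C for illustration)
example_depths = np.linspace(0.0, 45.0, 51)
example_temps = np.linspace(0.0, 450.0, 51)

fig, ax = plt.subplots(figsize=(6, 5))
ax.plot(example_temps, example_depths, label="Linear geotherm")

# Flip the y-axis so that depth increases downward
ax.invert_yaxis()

# Place text in the lower-left corner using axes-fraction coordinates
ax.text(0.05, 0.05, "Example annotation", transform=ax.transAxes)

# Label the axes and show the legend
ax.set_xlabel("Temperature (°C)")
ax.set_ylabel("Depth (km)")
ax.legend()
```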

**Subplot two (lower)**

The second subplot can be created in the lower set of axes. There you should plot the measured age data, the predicted ages, and the goodness-of-fit values as follows:

- Plot the measured apatite (U-Th)/He, zircon (U-Th)/He, and muscovite Ar/Ar ages as a scatter plot using pandas
    - The x-axis should be latitude and the y-axis should have the measured ages
    - You should set the parameter for the plot axes to use `ax2`. Check the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html) for details.
    - You should also plot the uncertainties in the measured ages using the `yerr` parameter. Again, check the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html) for details.
    - Finally, use the `color` parameter to assign unique colors for the different age systems, and add a label for the plotted points so they can be displayed in the legend
- Plot the predicted thermochronometer ages as horizontal lines spanning the latitude range
    - You can do this using the `ax2.plot()` function
    - Again, set the line color to be different for each thermochronometer system and use the same colors as you did for the measured ages
    - If you are unsure of how to proceed, have a look at [the hints for this week's exercise](https://introqg.github.io/site/lessons/L7/exercise-7.html).
- Also add the misfits for the different thermochronometer systems as well as the total misfit as text on the plot
    - If you want to display only the misfits for thermochronometers that have calculated ages you can use an `if` test that checks that the predicted thermochronometer closure temperature is not `None`
- Finally, add axis labels
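
The sketch below illustrates the two main ingredients of this subplot on made-up data: a pandas scatter plot with error bars drawn on existing axes, and a horizontal line spanning the latitude range. The column names `lat`, `age`, and `err`, the values, and the predicted age are all assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data; column names are hypothetical
example = pd.DataFrame(
    {
        "lat": [27.0, 27.2, 27.5, 27.8],
        "age": [3.1, 4.0, 5.2, 4.4],
        "err": [0.3, 0.5, 0.4, 0.6],
    }
)
predicted_age = 4.2  # hypothetical predicted age (Ma)

fig, ax = plt.subplots()

# Scatter plot with error bars, drawn on the existing axes
example.plot(
    x="lat",
    y="age",
    yerr="err",
    kind="scatter",
    color="tab:blue",
    label="Measured ages",
    ax=ax,
)

# Horizontal line: the same age at the minimum and maximum latitude
lat_range = [example["lat"].min(), example["lat"].max()]
ax.plot(
    lat_range,
    [predicted_age, predicted_age],
    color="tab:blue",
    label="Predicted age",
)

# Label the axes and show the legend
ax.set_xlabel("Latitude (°N)")
ax.set_ylabel("Age (Ma)")
ax.legend()
```

Using the same color for the points and the line makes it easy to see which prediction belongs to which set of measured ages.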

**What to do for this part:**

- Create the pair of subplots as instructed above
- Include a figure caption in the cell beneath the plot describing it as if it were in a scientific journal article

In [None]:
import matplotlib.pyplot as plt
from introqg_functions import transient_temp

# Create your subplots below

# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

### Part 4: A numerical model in one function (2 points)

The final part of this problem is to take everything we have done in Exercise 6 and this exercise and combine it into a single function that we can call to calculate geotherms, create a thermal history, calculate thermochronometer ages, read in a data file, and calculate the goodness of fit between the predicted ages and the measured ages from the data file. We can mostly copy and paste things from Exercise 6 and this exercise, but we will need to make a few minor adjustments.

**The function definition**

The function you will create in your `introqg_functions.py` script file should be called `age_calculator()`. The function should take the following parameters:

- `total_time` (total simulation time; units: Myr)
- `num_pts` (number of points where the temperatures and thermal history are calculated; units: none)
- `velocity` (advection velocity; units: km / Myr)
- `initial_gradient` (temperature gradient at start of calculation; units: deg. C / km)
- `diffusivity` (thermal diffusivity; units: km^2 / Myr)

**The function contents**

The function should do the following things, which you can copy and paste from cells above and those in Exercise 6.

1. Read in the age data file used earlier in this exercise to the variable `data` using pandas
2. Create a time and depth history as was done in Part 1 of Exercise 6 (using the parameter values)
3. Calculate a thermal history, thermochronometer closure temperatures, and predicted thermochronometer ages
    - Be sure to use the parameter values in the function definition here
    - Set the closure temperature calculation flags to all be `True`
4. Calculate the goodness of fit for each thermochronometer system and the total goodness of fit
    - Note that you may want to set the total misfit to `0.0` at first and then add the misfit values for each thermochronometer system to that value only if the thermochronometer closure temperature is not `None`
    - At this point it may also be helpful to print out the predicted thermochronometer ages and misfit values as text using the `print()` function, just for your reference
    - **Hint**: You should pass the values from the `data` DataFrame used in the goodness of fit calculation as arrays (not pandas Series) by adding `.values` to the passed values
5. Plot the results as done in the previous part of this problem
    - You can create an array for the geotherm depths using the maximum depth used for the thermal history
    - You can also use the `velocity` parameter value to display the advection velocity on the plot and the `total_time` parameter to label the line for the final geotherm
6. You can return `None` from this function
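
The suggestion in step 4 to start the total misfit at `0.0` and add to it conditionally can be sketched as follows. The dictionaries and all numbers are hypothetical placeholders rather than model output; in your function the closure temperatures and misfits would come from the earlier steps.

```python
# Hypothetical closure temperatures (°C); None marks a system that was not calculated
closure_temps = {"AHe": 70.0, "ZHe": None, "MAr": 400.0}

# Hypothetical misfit values for each system
misfits = {"AHe": 12.5, "ZHe": 0.0, "MAr": 30.0}

# Start from zero and add only the systems that have a closure temperature
total_misfit = 0.0
for system, closure_temp in closure_temps.items():
    if closure_temp is not None:
        total_misfit += misfits[system]
        print(f"{system} misfit: {misfits[system]:.2f}")

print(f"Total misfit: {total_misfit:.2f}")
```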

**What to do for this part:**

- Create the `age_calculator()` function in your `introqg_functions.py` script file as described above
- Create an example plot using your function in the cell below by calling the function with the following parameters:
    - `total_time`: 50.0
    - `num_pts`: 401
    - `velocity`: 0.5
    - `initial_gradient`: 10.0
    - `diffusivity`: 32.0

In [None]:
# Call your age_calculator() function below
from introqg_functions import age_calculator

# YOUR CODE HERE
raise NotImplementedError()

## Problem 2: "Fitting" thermochronometer data (8 points)

Using the function you have created in Problem 1, your goal is now to find the average long-term exhumation rates that provide good fits to the measured thermochronometer data.

### Scores for this problem

**Your score on this problem will be based on the following criteria**:

- Calculating the advection velocities that produce the best goodness of fit for the whole age dataset and each individual thermochronometer system
- Adding captions that describe the plots produced in each part
- Including comments that explain what most lines in the code do
- Uploading your notebook to your GitHub repository for this week's exercise

### Part 1: Fitting all the ages (2 points)

We'll start this problem by minimizing the total misfit for all of the thermochronometer data.

**What to do for this part:**

- Call the `age_calculator()` function in the cell below in a series of models where you change only the advection velocity in order to find a minimum goodness-of-fit value for the whole thermochronometer age dataset (i.e., the total misfit)
    - Define the advection velocity you will use as `velocity_all`, then call the function using that value
    - You do not need to find the absolute minimum goodness-of-fit value, but rather the minimum value you get for advection velocities to the nearest 0.1 km/Myr
- For the other parameters you can use:
    - `total_time`: 50.0
    - `num_pts`: 401
    - `initial_gradient`: 10.0
    - `diffusivity`: 32.0
- Add a figure caption for the plot in the cell beneath it describing it as if it were in a scientific journal article
    - Be sure the advection velocity (exhumation rate) and misfit value are clearly displayed on the plot
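
A search over velocities in steps of 0.1 km/Myr can be done by hand, one model run at a time, or organized as a small loop. Because `age_calculator()` as specified returns `None`, the sketch below uses a hypothetical `toy_misfit()` function in its place; both that function and the velocity range are assumptions that only demonstrate the search pattern.

```python
import numpy as np


def toy_misfit(velocity):
    """Hypothetical stand-in for a misfit value; not the thermal model."""
    return (velocity - 0.7) ** 2 + 1.0


# Candidate velocities from 0.1 to 1.5 km/Myr in steps of 0.1 km/Myr
velocities = np.round(np.linspace(0.1, 1.5, 15), 1)

# Evaluate the misfit for each candidate and keep the smallest
misfits = np.array([toy_misfit(velocity) for velocity in velocities])
best_velocity = velocities[np.argmin(misfits)]

print(f"Lowest toy misfit at {best_velocity:.1f} km/Myr")
```

Rounding the candidate velocities to one decimal place avoids floating-point clutter such as `0.30000000000000004` in printed values.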

In [None]:
from introqg_functions import age_calculator

# Define the advection velocity to use
velocity_all = None

# Call your age_calculator() function below


# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# This print statement should work
print(f"The best goodness of fit I found for all the data was using an advection velocity of {velocity_all:.1f} km/Myr.")


### Part 2: Fitting the apatite (U-Th)/He ages (2 points)

Now we can focus on fitting a single thermochronometer system.

**What to do for this part:**

- Call the `age_calculator()` function in the cell below in a series of models where you change only the advection velocity in order to find a minimum goodness-of-fit value for the apatite (U-Th)/He age data
    - Define the advection velocity you will use as `velocity_ahe`, then call the function using that value
    - You do not need to find the absolute minimum goodness-of-fit value, but rather the minimum value you get for advection velocities to the nearest 0.1 km/Myr
- For the other parameters you can use:
    - `total_time`: 50.0
    - `num_pts`: 401
    - `initial_gradient`: 10.0
    - `diffusivity`: 32.0
- Add a figure caption for the plot in the cell beneath it describing it as if it were in a scientific journal article
    - Be sure the advection velocity (exhumation rate) and apatite (U-Th)/He misfit value are clearly displayed on the plot

In [None]:
# Define the advection velocity to use
velocity_ahe = None

# Call your age_calculator() function below


# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# This print statement should work
print(f"The best goodness of fit I found for the AHe data was using an advection velocity of {velocity_ahe:.1f} km/Myr.")


### Part 3: Fitting the zircon (U-Th)/He ages (2 points)

Next, we will fit only the zircon (U-Th)/He ages.

**What to do for this part:**

- Call the `age_calculator()` function in the cell below in a series of models where you change only the advection velocity in order to find a minimum goodness-of-fit value for the zircon (U-Th)/He age data
    - Define the advection velocity you will use as `velocity_zhe`, then call the function using that value
    - You do not need to find the absolute minimum goodness-of-fit value, but rather the minimum value you get for advection velocities to the nearest 0.1 km/Myr
- For the other parameters you can use:
    - `total_time`: 50.0
    - `num_pts`: 401
    - `initial_gradient`: 10.0
    - `diffusivity`: 32.0
- Add a figure caption for the plot in the cell beneath it describing it as if it were in a scientific journal article
    - Be sure the advection velocity (exhumation rate) and zircon (U-Th)/He misfit value are clearly displayed on the plot

In [None]:
# Define the advection velocity to use
velocity_zhe = None

# Call your age_calculator() function below


# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# This print statement should work
print(f"The best goodness of fit I found for the ZHe data was using an advection velocity of {velocity_zhe:.1f} km/Myr.")


### Part 4: Fitting the muscovite Ar/Ar ages (2 points)

Finally, we will fit only the muscovite Ar/Ar ages.

**What to do for this part:**

- Call the `age_calculator()` function in the cell below in a series of models where you change only the advection velocity in order to find a minimum goodness-of-fit value for the muscovite Ar/Ar age data
    - Define the advection velocity you will use as `velocity_mar`, then call the function using that value
    - You do not need to find the absolute minimum goodness-of-fit value, but rather the minimum value you get for advection velocities to the nearest 0.1 km/Myr
- For the other parameters you can use:
    - `total_time`: 50.0
    - `num_pts`: 401
    - `initial_gradient`: 10.0
    - `diffusivity`: 32.0
- Add a figure caption for the plot in the cell beneath it describing it as if it were in a scientific journal article
    - Be sure the advection velocity (exhumation rate) and muscovite Ar/Ar misfit value are clearly displayed on the plot

In [None]:
# Define the advection velocity to use
velocity_mar = None

# Call your age_calculator() function below


# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

In [None]:
# This print statement should work
print(f"The best goodness of fit I found for the MAr data was using an advection velocity of {velocity_mar:.1f} km/Myr.")


## Optional reflection questions (0 points)

We invite you to consider the following questions. Please answer them briefly in the Markdown cell below.

1. What general trend do you observe for advection velocities that provide the best goodness of fit for the different thermochronometer systems?
2. Do all thermochronometer systems seem to have the same influence on the total misfit value, or are some more influential than others?
3. How helpful are the plots you have created in terms of understanding rock thermal histories and predicted thermochronometer ages?

YOUR ANSWER HERE

## References

Coutand, I., Whipp, D. M., Grujic, D., Bernet, M., Fellin, M. G., Bookhagen, B., et al. (2014). [Geometry and kinematics of the Main Himalayan Thrust and Neogene crustal exhumation in the Bhutanese Himalaya derived from inversion of multithermochronologic data](https://dx.doi.org/10.1002/2013JB010891). *Journal of Geophysical Research: Solid Earth*.

Stüwe, K., & Foster, D. (2001). [<sup>40</sup>Ar/<sup>39</sup>Ar, pressure, temperature and fission track constraints on the age and nature of metamorphism around the main central thrust in the eastern Bhutan Himalaya](https://www.sciencedirect.com/science/article/pii/S1367912000000183). *Journal of Asian Earth Sciences*, 19(1), 85–95.