# Exercise 5
## Model fitting
*** 

### Questions
How can I fit a model to my dataset?

### Objectives
<div class=obj>
<ol>
    <li>Investigate simple line fitting.</li>
    <li>Plot data with errors.</li>
    <li>Learn to use minimisation algorithms to fit more general (non-linear) models to data.</li>
</ol>
    
<ul>
Revise:
    <li>Data read-in and plotting;</li>
    <li>Defining functions;</li>
    <li>Map plotting.</li>
</ul>
</div>

### Independent coding
Use a minimisation algorithm to find the optimal pole of rotation between South America and Africa.

## 5.1 Fitting a line to data
***

### 5.1.1 Scientific background

Fitting a model to some data is a very common task in the sciences. We have a model (in practice a function) we believe fits some data, subject to us specifying the right set of parameters.  The task is to identify the model parameters that enable the model to best fit the data - often these parameters will be what we are trying to identify  in the first place, so the 'fitting' procedure is actually at the heart of the science.  

An example you will probably be familiar with is isochrons.  As a reminder, an isochron equation is written below for the Rb-Sr system, where $^\mathrm{87}\mathrm{Rb}$ decays to $^\mathrm{87}\mathrm{Sr}$ with a half life of 49.23 billion years,

\begin{align}
\frac{^\mathrm{87}\mathrm{Sr}}{^\mathrm{86}\mathrm{Sr}} &= \frac{^\mathrm{87}\mathrm{Sr}}{^\mathrm{86}\mathrm{Sr}}\bigg\rvert_0 + 
\frac{^\mathrm{87}\mathrm{Rb}}{^\mathrm{86}\mathrm{Sr}}\big(\exp{(\lambda{}t)-1}\big),\\\\
\lambda &= -\frac{\ln{1/2}}{\tau_{1/2}}.
\end{align}

In these equations, $\frac{^\mathrm{87}\mathrm{Sr}}{^\mathrm{86}\mathrm{Sr}}$ is the present Sr isotopic composition of the material, $\frac{^\mathrm{87}\mathrm{Sr}}{^\mathrm{86}\mathrm{Sr}}\big\rvert_0$ is the initial Sr isotopic composition of the material when the chronometer was last reset, $\frac{^\mathrm{87}\mathrm{Rb}}{^\mathrm{86}\mathrm{Sr}}$ is the present parent-to-daughter isotope ratio, and $\lambda$ is the decay constant, which is related to the half-life $\tau_{1/2}$.  The equation simply tells us how the Sr isotopic composition of a material evolves with time.

The number we are most interested in extracting from the isochron equation is $t$, the time since the isotope system was last reset.  For example, in volcanic ash layers, which are commonly used to provide absolute ages for sediment cores, this time will be the age at which the minerals in the ash layer cooled through their closure temperature, i.e., the eruption age.  

The utility of the isochron equation is that we can measure both the Sr isotopic composition and Rb abundance of a sample at the present, and when we plot those data on a graph we should obtain a familiar linear relationship, $y=mx+c$, where in this case

\begin{align}
y &= \frac{^\mathrm{87}\mathrm{Sr}}{^\mathrm{86}\mathrm{Sr}},\ &x &= \frac{^\mathrm{87}\mathrm{Rb}}{^\mathrm{86}\mathrm{Sr}}\\
m &= \exp{(\lambda{}t)-1},\ &c &= \frac{^\mathrm{87}\mathrm{Sr}}{^\mathrm{86}\mathrm{Sr}}\bigg\rvert_0.
\end{align}

So as an exercise in model fitting, all we need to do is find the 'best' line through the data and we can obtain both the initial Sr isotopic composition of the material and its age. 
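As a minimal sketch of that last step, the slope $m$ can be inverted for the age using $t = \ln(m+1)/\lambda$.  The function names and the slope value below are hypothetical, chosen only for illustration; the half-life is the one quoted above.

```python
import numpy as np

def decay_constant(half_life):
    """Decay constant (1/yr) from a half-life (yr): lambda = ln(2) / half-life."""
    return np.log(2) / half_life

def age_from_slope(slope, lam):
    """Invert m = exp(lambda * t) - 1 for t (in years)."""
    return np.log(slope + 1) / lam

lam = decay_constant(49.23e9)   # half-life quoted in the text
example_slope = 0.0012          # made-up slope, for illustration only
age = age_from_slope(example_slope, lam)

print('lambda = {:.3e} per yr'.format(lam))
print('age    = {:.1f} Myr'.format(age / 1e6))
```

Note that this half-life corresponds to $\lambda \approx 1.408\times10^{-11}$ yr$^{-1}$, whereas the code cells later in this exercise use $1.393\times10^{-11}$ yr$^{-1}$, which implies a slightly longer half-life.  It is worth checking which decay constant a paper adopted before comparing ages.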

### 5.1.2 Describing how good a line is at fitting isochron data

From a computational perspective this is a special case of a minimisation/optimisation problem.  It is a minimisation problem in that we want to minimise the misfit between a model and the data, i.e., in the case of an isochron, have the straight line go as close as possible to all the points.

Let's illustrate how we might do this in Python with an Rb-Sr isotopic dataset.  We are going to be looking at data from the Sierra Nevada Granodiorite, published in __[Kistler et al. (1986)](https://link.springer.com/article/10.1007/BF00592937)__.  First, we will read in the data and plot it.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

#Readin data
df = pd.read_csv('./data/Sierra_nevada_granodiorite.txt',sep=',',header=0)

#Plot data
fg, ax = plt.subplots(1)
ax.scatter(df.rbsr8786, df.srsr8786)

#we will need to manually change the y range, else it's a bit too zoomed out
ax.set_ylim([0.706,0.708])

#we can label axes, here is the notation for putting mathematical expressions in labels
# if you know LaTeX you will be familiar with this, you just need an 'r' before the text
ax.set_xlabel(r'$^{87}Rb/^{86}Sr$')
ax.set_ylabel(r'$^{87}Sr/^{86}Sr$');

ax.grid(ls=':');

Now, we have a huge range of options for how to fit a line to the above data.  After all, we are just asking the question 
>what choice of model parameters minimises the distance between our model and the data?

which is a very general problem.  The human brain is quite good at evaluating how effective a model is in these simple cases, which is what you have relied upon previously when hand drawing lines through graphed data.  

The formal method, however, when faced with x-y data and straight line fitting is __linear least squares regression__.  A visualisation of what least squares regression is doing is included below.

![lsr](img/least-squares-regression-line.jpg)

Least squares regression is trying to minimise the summed area of those blue squares, which just represent the squared vertical (i.e., y-axis-parallel) distance between each data point and the model line.  Representing this mathematically, least squares regression is minimising the equation

\begin{align}
S &= \sum_i^n{\big[y_i - y'_i\big]^2}\\\\
y'_i &= f(x_i) = mx_i + c,
\end{align}

where $y_i$ is the y-value of the $i$'th data point, and $f(x_i)$ is our model evaluated at $x_i$, the x-location of the $i$'th data point.

There are tools to minimise $S$ for us in Python, although if you want to see the maths behind solving the linear least squares regression problem start __[here](https://en.wikipedia.org/wiki/Least_squares)__.  
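To make the definition of $S$ concrete, here is a small sketch that evaluates it directly and compares the closed-form least-squares line with a slightly perturbed one.  The data are synthetic (not the isochron dataset) and the function name `sum_sq_misfit` is just an illustrative choice.

```python
import numpy as np

def sum_sq_misfit(x, y, m, c):
    """S = sum of squared vertical distances between data and the line y = m*x + c."""
    return np.sum((y - (m * x + c)) ** 2)

# synthetic data: a straight line with a little random scatter
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 0.5 * x + 2.0 + rng.normal(0.0, 0.01, x.size)

# closed-form least-squares solution for a straight line
xm, ym = x.mean(), y.mean()
m_best = np.sum((x - xm) * (y - ym)) / np.sum((x - xm) ** 2)
c_best = ym - m_best * xm

# S at the best-fit line, and at a line with a 10% steeper slope
S_best = sum_sq_misfit(x, y, m_best, c_best)
S_other = sum_sq_misfit(x, y, m_best * 1.1, c_best)
print(S_best, S_other)
```

Any line other than the least-squares one gives a larger $S$, which is what "best fit" means here.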

We will start by using `scipy`'s built-in function `scipy.stats.linregress`.  Linear regression problems are so common that these built-in functions don't even require us to pass our own function to the solver; they are built to expect a $y=mx+c$ type problem.  As with everything `scipy`/`numpy`, there is __[good documentation](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html)__ explaining what we need to give this function and what it is going to give back to us.  The function simply wants
- $\mathbf{x}$
- $\mathbf{y}$

two arrays of equal length defining the x and y locations of each data point.  It will return an object with attributes giving us the best-fit slope, intercept and other statistics; check __[the documentation](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html)__ for details.
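As a quick illustration of what comes back, the sketch below calls `linregress` on a handful of made-up points scattered about $y = 2x + 1$ and reads some of the attributes of the result.

```python
import numpy as np
import scipy.stats as sts

# made-up data: y = 2x + 1 with small offsets added by hand
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0 + np.array([0.10, -0.10, 0.05, -0.05, 0.0])

res = sts.linregress(x, y)

print(res.slope, res.intercept)   # best-fit m and c
print(res.rvalue)                 # correlation coefficient
print(res.stderr)                 # standard error of the slope
```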

Let's use this function, plotting the output and putting some annotations onto the graph to tell us the age.

In [None]:
import scipy.stats as sts
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

#readin data
df = pd.read_csv('./data/Sierra_nevada_granodiorite.txt',sep=',',header=0)

#lets call our linear regression function and store the result to an object called 'res'
res = sts.linregress(df.rbsr8786, df.srsr8786)

#now we are going to need our own function for a straight line so that we can plot
# the model result given an x value, m and c.
def lfunc(x, m, c):
    y = m*x + c
    return y

#let's plot this solution to see how well it has done
# first plotting the data...
#Plot data, changing the symbol style to look more sciency
fg, ax = plt.subplots(1)
ax.scatter(df.rbsr8786, df.srsr8786, facecolor='white', edgecolor='black')

#we will need to manually change the y range, else it's a bit too zoomed out
ax.set_ylim([0.706,0.708])

#plot on our model result
# first creating some x values, from the minimum x and maximum x values the plot has automatically assigned
xlim = ax.get_xlim()
x = np.linspace(xlim[0], xlim[1], 100)
#pass these x values to our function with the slope and intercept found from our fitting
y = lfunc(x, res.slope, res.intercept)

#now we are ready to plot our solution
ax.plot(x,y, linestyle='-', color='black')

#define lambda, decay constant, note we _cannot_ call this variable 'lambda' as Python reserves that word
lmbd = 1.393e-11
age = np.log(res.slope + 1)/lmbd
text = 'age: {:.1f} Myr'.format(age/1e6)
ax.text(0.9,0.1, text, transform=ax.transAxes, horizontalalignment='right',verticalalignment='bottom')

#we can label axes, here is the notation for putting mathematical expressions in labels
# if you know LaTeX you will be familiar with this, you just need an 'r' before the text
ax.set_xlabel(r'$^{87}Rb/^{86}Sr$')
ax.set_ylabel(r'$^{87}Sr/^{86}Sr$');

ax.grid(ls=':');

We were able to fit this model to our data without even providing an initial guess at the slope or intercept.  For more complex model-fitting problems, however, where the function has more parameters, we would need a different fitting algorithm, and that algorithm would typically require an initial guess near the right solution in order to reach it.

The age obtained from the __[original paper](https://link.springer.com/article/10.1007/BF00592937)__ is copied below for comparison. 

<img src="./img/kintler_excerpt.png" width="400">

Our result is a little different to what was obtained by __[Kistler et al. (1986)](https://link.springer.com/article/10.1007/BF00592937)__.  What did we not do in our fitting that we should have done?

## 5.2 Model fitting with errors
***

Now, a key thing we forgot in our solution above was that the data have errors on them.  It turns out that this makes the job of fitting a model quite a bit more tricky, especially when there are errors both on x and y, _and_ the errors are correlated between the x and y axes as is the case with isochron data (both axes have $^{86}\mathrm{Sr}$ as a common denominator).  Exploring how to properly fit such data is beyond the scope of this exercise, however, it is useful to see how to plot the data with its errors, which we will do below.

Note that the errors on the $^{87}$Rb/$^{86}$Sr ratios are in percent (i.e., are relative to the Rb/Sr value itself), so they need to be transformed to give absolute errors.   
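The conversion is a single line of arithmetic.  The sketch below uses made-up ratios and percentage errors, purely to show the operation.

```python
import numpy as np

# made-up ratios and 1-sigma relative errors (in percent), for illustration only
ratio = np.array([0.50, 1.20, 2.00])
err_pc = np.array([1.0, 0.5, 2.0])

# absolute error = (percent / 100) * value
err_abs = err_pc / 100.0 * ratio
print(err_abs)
```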

We will use `ax.errorbar` as our plotting tool.  The arguments you need to pass to control the appearance are a bit different to usual here, so __[check the documentation](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.errorbar.html)__.

In [None]:
#-------------same as above---------------------------#
import scipy.stats as sts
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = pd.read_csv('./data/Sierra_nevada_granodiorite.txt',sep=',',header=0)

res = sts.linregress(df.rbsr8786, df.srsr8786)

def lfunc(x, m, c):
    y = m*x + c
    return y

fg, ax = plt.subplots(1)
#-----------------------------------------------------#

#make plot with error bars
ax.errorbar(df.rbsr8786, df.srsr8786, yerr=df.srsr_1sig, xerr=(df.rbsr_1sigpc*df.rbsr8786/100), 
            color='white', marker='o', markeredgecolor='black', linestyle='None',
           ecolor='black')

#we will need to manually change the y range, else its a bit too zoomed out
# comment out the line below to see what matplotlib automatically sets as the y-range
ax.set_ylim([0.706,0.708])

#plot on our model result
# first we need to create some x values, we can pick these from the minimum x and maximum x values 
# the plot has automatically assigned
xlim = ax.get_xlim()
# we use np.linspace to generate 100 points between the minimum and maximum x values in order to
# plot our function.  Note: that because this is a straight line, we would actually have been ok with
# just 2 points, but this is a more general method that would plot more complex functions
x = np.linspace(xlim[0], xlim[1], 100)
# pass these x values to our function with the slope and intercept found from our fitting
y = lfunc(x, res.slope, res.intercept)

#now we are ready to plot our solution
ax.plot(x,y, linestyle='-', color='black')

#define lambda, the decay constant, note we _cannot_ call this variable 'lambda' as Python reserves that word
lmbd = 1.393e-11
age = np.log(res.slope + 1)/lmbd
text = 'age: {:.1f} Myr'.format(age/1e6)
ax.text(0.9,0.1, text, transform=ax.transAxes, horizontalalignment='right',verticalalignment='bottom')

#we can label axes, here is the syntax for putting mathematical expressions in labels
# if you know LaTeX you will be familiar with this, you just need an 'r' before the text
ax.set_xlabel(r'${^{87}Rb/^{86}Sr}$')
ax.set_ylabel(r'${^{87}Sr/^{86}Sr}$');

ax.grid(ls=':');

## 5.3 Minimisation algorithms
***

The linear regression problem described above is a special case of minimisation, which as a reminder is where we are looking to minimise the distance between data and a model.  Let's now look at a slightly more complicated example, where we are no longer expecting a simple straight line to fit our data.  This example will introduce us to a more general case of minimisation.



### 5.3.1 Scientific background

We are going to develop an Fe-Ti oxide thermometer.  This is possible because ilmenite's structure ($\mathsf{FeTiO_3}$) at thermodynamic equilibrium is described by $Q$, an ordering parameter, and $T$, the temperature.  The order parameter describes the ordering of Fe and Ti cations onto adjacent lattice planes in the ilmenite structure, measured with neutron diffraction.  $Q = 0$ corresponds to a disordered distribution of Fe and Ti; $Q = 1$ describes full segregation of Fe and Ti onto adjacent layers.

Temperature and the ordering parameter are related by a specific functional form given below
\begin{equation}
T = T_c - aQ^2 - (T_c - a)Q^{n-2},
\end{equation}
which includes a series of constants, $T_c$, $a$, and $n$, which need to be determined in order for us to use the model.  These constants need to be determined from experiments, where oxides of a particular composition (compositionally intermediate between pure ilmenite and hematite) are synthesised at particular temperatures, their ordering, $Q$, measured and then from a series of these data points the best fitting choice of constants identified.  Each ilmenite composition will have a different set of these parameters.
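To get a feel for this functional form, the sketch below evaluates it at a few values of $Q$.  The constants are illustrative only (they are not fitted values) and the function name is hypothetical.  For $n > 2$ the expression gives $T = T_c$ at $Q = 0$ and $T = 0$ at $Q = 1$.

```python
import numpy as np

def ordering_temperature(Q, Tc, a, n):
    """T = Tc - a*Q**2 - (Tc - a)*Q**(n - 2)."""
    return Tc - a * Q**2 - (Tc - a) * Q**(n - 2)

# illustrative constants only, not fitted values
Tc, a, n = 1200.0, 100.0, 6.0

# evaluate from fully disordered (Q=0) to fully ordered (Q=1)
Q = np.linspace(0.0, 1.0, 5)
T = ordering_temperature(Q, Tc, a, n)
print(T)
```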

### 5.3.2 Data reconnaissance

Now, let's read in the experimental data for ilmenite and plot it.  One trick with this data file (open `data/ilmenite_data_simple.txt` and have a look) is that there are multiple ilmenite compositions.  These each need to be fitted separately.  We will start by plotting each ilmenite dataset with a different coloured symbol so that we can see the different trends the data describe.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

#Readin the data
# note, that this file is separated by tabs rather than commas (which is the default assumption when
# read_csv is used), so we specify this explicitly using 'sep='
df = pd.read_csv('./data/ilmenite_data_simple.txt', sep='\t', header=0)

#We need to store the unique mineral names so we can plot and fit these datasets separately
minerals = df.composition.unique()

#start plotting by initialising a subplot and storing the figure and axis objects
f, a = plt.subplots()

#loop over the minerals, plotting each one separately
for m in minerals:
    #we use 'df.composition==m' to select specifically the data entries corresponding to the 
    # specific mineral
    x = df.Q[df.composition==m]
    y = df.temp[df.composition==m]

    a.scatter(x, y, label=m)  

a.set_xlabel('Q');
a.set_ylabel('T (K)');
a.legend();

Notice how matplotlib automatically adjusted the color of symbols/lines when we repeatedly plot onto the same axes.  So all we needed to do was to use separate plotting commands to plot the different ilmenite data.

Plotting the data like this illuminates an important feature: some of the low temperature data look a little odd, not falling along a smooth transition from the high temperature data.  This might be because the low temperature experiments failed to equilibrate (because the timescales for equilibration were too long).  We need to take this into account when performing our fit to the data, and exclude the low temperature data.
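One way to exclude data is with a boolean mask.  The sketch below uses a small made-up table that borrows the column names of the ilmenite file (`composition`, `Q`, `temp`); the composition labels and the cut-off temperature are hypothetical.

```python
import pandas as pd

# small made-up table with the same column names as the ilmenite file
demo = pd.DataFrame({
    'composition': ['ilm70', 'ilm70', 'ilm70', 'ilm80', 'ilm80'],
    'Q':           [0.90,    0.60,    0.20,    0.95,    0.50],
    'temp':        [700.0,   1050.0,  1250.0,  800.0,   1150.0],
})

T_cut = 1000  # hypothetical cut-off temperature (K)

# boolean mask: rows of one composition that are also above the cut-off
keep = (demo.composition == 'ilm70') & (demo.temp > T_cut)
print(demo[keep])
```

Combining conditions with `&` (and wrapping each in parentheses) keeps only the rows where both are true.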

> When fitting data to extract model parameters, always visualise the data and model fit to make sure things are working as expected.

### 5.3.3 Fitting a model

To make this a minimisation problem, we want to write a fitting function that will calculate a value for $T$ for a given input value of $Q$ and a particular choice of $T_c$, $a$ and $n$.  All these values will be passed to the function.  Let's start by writing a function for $T$.

In [None]:
#Define fitting function
def ffunc(x, w0, w1, w2):
    #w0 is Tc
    #w1 is a
    #w2 is n
    #x represents Q
    return w0 - w1*x**2 - (w0 - w1)*x**(w2-2)

Now we want to take the pieces of code we have from above and combine them to fit a model to the different ilmenite curves and from that identify the parameters that go into our thermometer equation.

For this we are going to use `scipy.optimize.curve_fit`; see the __[documentation here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html)__ for the full details of how to use it.  This is a general and powerful tool for fitting a 'curve' (or, in other words, a 'model') to data.  It takes the function we have defined above, `ffunc`, and the x and y coordinates of our data (i.e., Q and T values), and passes guesses for the constants we want to determine to the function.  It then calculates the misfit between the model and the data and adjusts its guess for the values of the constants to try to reduce the misfit.  It returns the optimal values of the parameters and their estimated covariance matrix (see the documentation for details).

A key difference from the user perspective between `curve_fit` and the `linregress` tool we used previously, is that `curve_fit` requires an initial guess of what the parameter values are.  We will use
\begin{align}
T_c &= 1200\\
a &= 1\\
n &= 6,
\end{align}
which work.  In practice, you may need to experiment at picking several different values before you have a set of initial guesses that allow the minimisation algorithm to find the best fit parameter values.  And to know whether a 'good fit' has been found, you will need to be visualising the results (data and model prediction), emphasising again that plotting the data and model is very important.
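Before applying `curve_fit` to real data, it can help to test it on synthetic data where the answer is known.  The sketch below generates noise-free points from made-up constants and checks that the fit recovers them from an offset initial guess; none of these numbers come from the ilmenite experiments.

```python
import numpy as np
import scipy.optimize as opt

def model(Q, Tc, a, n):
    """Same functional form as the thermometer equation."""
    return Tc - a * Q**2 - (Tc - a) * Q**(n - 2)

# synthetic, noise-free data generated from known (made-up) constants
true_params = [1400.0, 200.0, 6.0]
Q = np.linspace(0.2, 0.95, 20)
T = model(Q, *true_params)

# initial guess deliberately offset from the truth
p0 = [1300.0, 150.0, 5.5]
popt, pcov = opt.curve_fit(model, Q, T, p0=p0)

# 1-sigma parameter uncertainties from the diagonal of the covariance matrix
perr = np.sqrt(np.diag(pcov))
print(popt)
print(perr)
```

With real, noisy data the recovered parameters will not match exactly, and `perr` gives a first indication of how well each one is constrained.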

The code below loops through each of the ilmenite compositions and uses `scipy.optimize.curve_fit` to fit a model to each of them, plotting the model and the data on a graph.

In [None]:
#---assume we have run the cells above that read in the data etc.

#import scipy.optimize
import scipy.optimize as opt

#Define a temperature threshold below which the data suggest the experiments haven't
# reached equilibrium
T_thresh = 1000

#setup our plot
f, a = plt.subplots()

#initial guesses
w0 = 1200
w1 = 1
w2 = 6

#loop over the different ilmenite compositions
for m in minerals:
    #define a list of True False values, telling us which data we want to include
    # here, combining the constraint from temperature and mineral name
    inc = (df.composition==m) & (df.temp>T_thresh)

    #plot _all_ the data
    x = df.Q[df.composition==m]
    y = df.temp[df.composition==m]

    #plot all the data as grey points
    a.scatter(x, y, c = 'grey', label=m)  
    #plot the fitted data as coloured symbols
    c = a.scatter(x[inc], y[inc], label=m)  

    #now fit the data passing specifically the data we want to fit, i.e., using 'inc' to select
    popt, pcov = opt.curve_fit(ffunc, x[inc], y[inc], p0=[w0, w1, w2])

    #plot over full range as thin line, using our function to generate the y values for the data's x values
    a.plot(x, ffunc(x, popt[0], popt[1], popt[2]), linewidth=1)
    
a.set_xlabel('Q')
a.set_ylabel('T (K)');


# Independent coding
***



<div class=obj>
    <b>Aim:</b> To use a minimisation algorithm to identify the optimal pole of rotation between South America and Africa.
</div>
<p></p>



Since first year undergraduate you have been told that plate tectonic theory is the crowning triumph of 20th Century geology.  But have you ever tested it?  Recall that it makes a basic prediction about the geometry of plate interfaces: that transform faults should become parallel lines when projected onto an oblique Mercator projection, i.e., when the pole selected for a Mercator projection is the Euler pole of rotation describing the rigid plate movement (look __[here](https://www.open.edu/openlearn/science-maths-technology/science/geology/plate-tectonics/content-section-4.3)__ for a refresher).  

Now you are able to read data in, plot geographical data and use `scipy` minimisation tools, you can test plate tectonics for yourself!  

We can do this by taking a population of fracture zones in the South Atlantic and seeing what pole most effectively reduces them to parallel horizontal lines in an oblique Mercator projection.

The equations to project longitude and latitude coordinates into $x$ and $y$ of the oblique Mercator projection are:

\begin{align}
x &= \arctan{\Bigg(\frac{(\tan(\phi)\cos(\phi_\mathrm{p}) + \sin(\phi_\mathrm{p})\sin(\lambda-\lambda_\mathrm{c}))}{\cos(\lambda-\lambda_\mathrm{c})}\Bigg)}\\\\
y &= \frac{1}{2}\ln\Bigg(\frac{1+A}{1-A}\Bigg)
\end{align}
where
\begin{align}
A &= \sin(\phi_\mathrm{p})\sin(\phi) - \cos(\phi_\mathrm{p})\cos(\phi)\sin(\lambda-\lambda_\mathrm{c})\\\\
\lambda_\mathrm{c} &= \lambda_\mathrm{p}+90.
\end{align}

In these equations $\phi$ is latitude and $\lambda$ is longitude in degrees, subscript $p$ indicates the location of the pole of projection (pole of rotation) and symbols with no subscript represent the location of the point being projected.
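As a check on these equations, consider the special case of a pole at 90N, i.e., $\phi_\mathrm{p}=90^\circ$, so that $\cos(\phi_\mathrm{p})=0$ and $\sin(\phi_\mathrm{p})=1$.  Substituting these values, the expressions should collapse to the standard Mercator projection:

\begin{align}
x &= \arctan\Bigg(\frac{\sin(\lambda-\lambda_\mathrm{c})}{\cos(\lambda-\lambda_\mathrm{c})}\Bigg) = \lambda-\lambda_\mathrm{c} \quad \text{for } |\lambda-\lambda_\mathrm{c}|<90^\circ,\\\\
A &= \sin(\phi),\\\\
y &= \frac{1}{2}\ln\Bigg(\frac{1+\sin(\phi)}{1-\sin(\phi)}\Bigg).
\end{align}

In this limit $x$ is simply longitude measured from $\lambda_\mathrm{c}$ and $y$ depends on latitude alone, so lines of constant latitude plot as horizontal lines.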

To begin with, follow through the code below to plot the ridge segments and fracture zones in the region using a standard Mercator projection (i.e., the special case of the oblique Mercator where the pole is at 90N).  _Note: you will need to use elements of this code when writing your own solution._

In [None]:
#the usual imports
import cartopy as ctp
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import pandas as pd

#let's get our figure setup
fg = plt.figure(figsize=(10,6));

#add a subplot specifying the projection and the central longitude
ax = fg.add_subplot(1, 1, 1,
                         projection=ccrs.Mercator(central_longitude=-13))
#these coordinates roughly bound our region of interest
lonmin, lonmax, latmin, latmax = -60, -5, -6, 20
ax.set_extent([lonmin, lonmax, latmin, latmax])

#let's add the oceans this time for fun
feature = ctp.feature.NaturalEarthFeature(
        name='ocean', category='physical',
        scale='110m',
        edgecolor='#000000', facecolor='lightblue', zorder=0)
ax.add_feature(feature)

#add coastlines
feature = ctp.feature.NaturalEarthFeature(
        name='coastline', category='physical',
        scale='110m',
        edgecolor='#000000', facecolor='tan', zorder=2)
ax.add_feature(feature)

#add gridlines
gl = ax.gridlines(linestyle=':', color='black',
                 draw_labels=True, zorder=1)
gl.xlabels_top = False
gl.ylabels_right = False

#now for the ridges,
# each ridge in the file is represented by two pairs of lat-lon values
dr = pd.read_csv('data/gale_ridge_segs.csv',header=0, index_col=None)
#we should now restrict the ridges to those in the geographical region we are interested in
dr = dr[(dr.Lat1<latmax) & (dr.Lat1>latmin) & (dr.Lat2<latmax) & (dr.Lat2>latmin) &\
        (dr.Lon1<lonmax) & (dr.Lon1>lonmin) & (dr.Lon2<lonmax) & (dr.Lon2>lonmin)].reset_index()

#let's loop through the data plotting on the ridge segments
for d in dr.iterrows():
    #remember, we need to use 'transform=ccrs.Geodetic()' to let the plot know that we are giving
    # it lon-lat data, rather than plot (i.e., x-y) coordinates.
    ax.plot([d[1].Lon1, d[1].Lon2], [d[1].Lat1, d[1].Lat2], transform=ccrs.Geodetic(),
           color='orangered')

#now we can read in the fracture zones and plot those
df = pd.read_csv('data/fracture_zones.csv',header=0, index_col=0)

for d in df.iterrows():
    #remember, we need to use 'transform=ccrs.Geodetic()' to let the plot know that we are giving
    # it lon-lat data, rather than plot (i.e., x-y) coordinates.
    ax.plot([d[1].lon1, d[1].lon2], [d[1].lat1, d[1].lat2], transform=ccrs.Geodetic(),
            color='white')


Reassuringly, these fracture zones are somewhat misoriented with respect to the lines of latitude, but they are not wildly off: this is an important observation, just knowing this tells us that the 'answer' is probably a pole of rotation at reasonably high latitude.

Now, the task is to write a program to find the pole of rotation that fits these fracture zones.

Identifying that a pole at high latitude probably fits these fracture zones not only helps us evaluate whether we have a sensible solution once we have run our code, but also gives us a good idea of what to use for the initial guess, or for how to bound the parameter space our algorithm searches over.  All solvers will require some indication of what values the model parameters can possibly take.  In this case there are some important and obvious constraints: we know we are looking for a pole location lying on the surface of the Earth, so its latitude, $\phi_\mathrm{p}$ is restricted to $-90\le{}\phi_\mathrm{p}\le{}90$ and longitude, $\lambda_\mathrm{p}$, to $-180\le{}\lambda_\mathrm{p}\le{}180$.  What's more, we only need to find the 'north' end of the pole (there will of course be two solutions!), so we can restrict our search domain further.

Bounding the problem as tightly as possible with prior information is extremely important, as solvers can otherwise struggle to find the 'best' solution, amongst many good, but not quite as good, alternatives.

This brings us to a key step in posing our problem such that a computer can solve it: deciding how to express the 'fitness' of the solution, i.e., 'how good is the current pole at flattening out the transform faults?'  All our transform faults are expressed as pairs of longitude, latitude coordinates (the i'th transform fault is defined by [$\lambda^i_1$, $\phi^i_1$] and [$\lambda^i_2$, $\phi^i_2$]), and when we have the right solution the two y values in the transformed coordinates for each transform fault should be nearly equal to each other.  This means we are looking to minimise

\begin{equation}
    S = \Bigg(\dfrac{\sum^n_i{\big[y^i_1-y^i_2\big]^2}}{n}\Bigg)^{0.5}.
\end{equation}

That is, $S$ is the root-mean-square difference between the two projected y values of each fracture zone, which is zero in the ideal scenario.  
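As a minimal sketch of this formula on its own (separate from the projection step), the hypothetical function below takes two arrays of already-projected y values, one per end of each segment, and returns $S$.  The numbers are made up.

```python
import numpy as np

def rms_misfit(y1, y2):
    """Root-mean-square difference between paired end-point y values."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    return np.sqrt(np.mean((y1 - y2) ** 2))

# made-up projected y values for three segments
print(rms_misfit([0.10, 0.20, 0.30], [0.10, 0.20, 0.30]))  # both ends equal
print(rms_misfit([0.10, 0.20, 0.30], [0.12, 0.18, 0.30]))  # ends differ slightly
```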

Choosing the right 'solver' for your problem is something of an art.  For now, focus on using a __[dual annealing solver](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.dual_annealing.html#scipy.optimize.dual_annealing)__ from `scipy`.  Read the description of what input the solver wants very carefully: 
- It wants you to give it a function to which it can pass guesses about where the pole could be, and for that function to return to it a single number telling it how bad that guess is (the misfit), then it will update its guesses and try again, stopping when it thinks it can't do any better.   
- It also wants 'bounds', that constrain the parameter space.  These are the latitude and longitude values that the pole could take, we have looked at this above.
- The final essential thing we will need to pass the solver is the additional arguments our minimisation function takes to run.  In our case this is going to be the data describing the locations of the fracture zones, so these can be projected onto an oblique Mercator projection using the current best-guess pole, and their deviation from horizontal parallel lines calculated.

You will see from the documentation for `scipy`'s 'dual_annealing' solver that there are many other options you can pass.  For now, accept that sensible defaults of these options have probably been chosen.
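To see how these three inputs fit together, here is a sketch of a `dual_annealing` call on a toy problem that has nothing to do with poles of rotation: finding the point closest to a known target.  The function `toy_misfit` and the target values are made up for illustration.

```python
import numpy as np
import scipy.optimize as opt

def toy_misfit(params, target):
    """Return a single number: squared distance between params and target."""
    return np.sum((np.asarray(params) - target) ** 2)

target = np.array([1.5, -0.5])        # made-up 'true' answer
bounds = [(-5.0, 5.0), (-5.0, 5.0)]   # (min, max) for each parameter

# extra arguments to the misfit function go in the 'args' tuple
res = opt.dual_annealing(toy_misfit, bounds, args=(target,), maxiter=200)

print(res.x, res.fun)   # best parameters found and the misfit there
```

The solver calls `toy_misfit(params, target)` repeatedly with different `params` inside the bounds; the returned object holds the best parameters in `res.x`.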

To code this up you will need to:
1. Read in the fracture zone data __[fracture_zones.csv](data/fracture_zones.csv)__.
1. Write a function to perform a projection of the latitude longitude coordinates into the oblique Mercator reference frame using the maths above.
1. Now, to test your function, project the coordinates of the fracture zones using a pole at 90N and plot the results in a normal rectangular `matplotlib` x-y plot (i.e., don't use `Cartopy` to plot this!).  This should just return a plot that looks the same as the map we produced above.  If it doesn't something has gone wrong!
1. Write your minimisation function, to calculate the misfit between the projected fracture zone coordinates and the ideal scenario (where the two y values of each fracture zone are equal, i.e., they lie horizontally).  
1. Test your minimisation function out manually, passing different pole locations and making sure the misfit updates.
1. Call `scipy.optimize.dual_annealing(...)` passing it your minimisation function, the fracture zone locations and bounds on the problem.
1. Re-plot the data about this 'best fit' solution, the fracture zones should appear now as horizontal lines!
1. Compare your answer to a __[published estimate](https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/94GL02118%4010.1002/%28ISSN%291944-8007.GRL40)__ (their Table 2). 

_Hint 1: Remember numpy trig functions take radians not degrees._

_Hint 2: You will need to pass the pandas dataframe containing the fracture zones to the solver as `.dual_annealing(..., args=(df,))`._

_Hint 3: You will need to set the bounds for the dual annealing algorithm as `bounds = list(zip([0, -180],[90, 180]))`._
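The short sketch below unpacks two of these hints: what the `zip` expression evaluates to, and how to convert degrees to radians before calling a `numpy` trig function.  The latitude value is arbitrary.

```python
import numpy as np

# pairs up the lower and upper limits: one (min, max) tuple per parameter
bounds = list(zip([0, -180], [90, 180]))
print(bounds)   # [(0, 90), (-180, 180)]

# numpy trig functions expect radians, so convert from degrees first
lat_deg = 30.0
print(np.sin(np.radians(lat_deg)))
```

So the first tuple bounds the pole latitude to the northern hemisphere and the second bounds its longitude.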

### Going further
- Think about why our result is different to that of the published Africa-South America Pole.  How could what we did be improved to increase accuracy of plate rotation parameters?
- Adjust the other parameters setting __[how the dual annealing solver works](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.dual_annealing.html#scipy.optimize.dual_annealing)__.  In particular, `maxiter` and `initial_temp`.  Can you get the solver to _fail_ to find the best solution?
- Investigate whether __[other solvers](https://docs.scipy.org/doc/scipy/reference/optimize.html)__ work as well as the dual annealing solver we used.
- Try changing the function we are minimising, does this affect the result?