
In [None]:
# python packages
import pandas as pd # Dataframes and reading CSV files
import numpy as np # Numerical libraries
import matplotlib.pyplot as plt # Plotting library
# %matplotlib notebook
from lmfit import Model # Least squares fitting library

As always, let's start by importing the file into a DataFrame and renaming the columns.

In [None]:
data = pd.read_csv("arrhenius.csv")
data.columns = ("T","Kr")
print(data)
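If `arrhenius.csv` is not at hand, the rest of the notebook can still be tried on a synthetic stand-in. The sketch below builds a DataFrame with the same two columns from hypothetical parameters ($A = 10^{13}\ \mathrm{s^{-1}}$, $E_a = 50\ kJ/mol$) plus 2% random noise. These values are assumptions for illustration only, not the contents of the real file, and running the cell overwrites `data`.

```python
import numpy as np
import pandas as pd

R = 8.314  # ideal gas constant in J/mol/K

# Hypothetical parameters, chosen only to generate illustrative data
A_true = 1e13     # frequency factor, 1/s
Ea_true = 50e3    # activation energy, J/mol (50 kJ/mol)

rng = np.random.default_rng(0)
T = rng.permutation(np.linspace(300.0, 400.0, 11))  # temperatures in K, deliberately unsorted
k = A_true * np.exp(-Ea_true / (R * T))             # noise-free Arrhenius rate constants
k_noisy = k * (1 + 0.02 * rng.standard_normal(T.size))  # add 2% relative noise

data = pd.DataFrame({"T": T, "Kr": k_noisy})
print(data)
```

With these values the rate constants span about two orders of magnitude between 300 K and 400 K, which is enough to see the differences between the two fits discussed below.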

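The temperatures are not in order, so we sort them: `ax.plot` joins the points in the order they appear, and unsorted temperatures would make the fitted curves zig-zag in the plots later on.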

In [None]:
data = data.sort_values("T")
print(data)

Sorting keeps the original row labels, so we also reset the index to 0, 1, 2, ... to make label-based lookups follow the new order.

In [None]:
data.reset_index(drop=True, inplace=True)
print(data)
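A toy illustration of why the reset matters, using a hypothetical three-row frame: `sort_values` moves the rows but keeps their labels, so a label-based lookup such as `df["T"][0]` would still return the row that used to be first.

```python
import pandas as pd

# A tiny hypothetical frame with temperatures out of order
df = pd.DataFrame({"T": [350.0, 300.0, 400.0], "Kr": [2.0, 1.0, 3.0]})

df = df.sort_values("T")
print(df.index.tolist())  # the original labels travel with the rows: [1, 0, 2]

df = df.reset_index(drop=True)  # drop=True discards the old labels instead of adding them as a column
print(df.index.tolist())  # [0, 1, 2]
```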

The Arrhenius equation can be written in exponential form
\begin{equation}
k_r = A\exp\bigg[-\frac{E_a}{RT}\bigg] \tag{1}
\end{equation}
or in linear form ($y=mx+q$)
\begin{equation}
\ln\big[k_r\big] = \ln A -\frac{E_a}{R}\frac{1}{T} \tag{2}
\end{equation}
where $y=\ln[k_r]$ and $x=1/T$, so that the slope is $m=-E_a/R$ and the intercept is $q=\ln A$.
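The linear form follows from the exponential form, Eq. (1), by taking the natural logarithm of both sides and using $\ln(ab)=\ln a+\ln b$ and $\ln e^{z}=z$:
\begin{equation}
\ln\big[k_r\big] = \ln A + \ln\exp\bigg[-\frac{E_a}{RT}\bigg] = \ln A -\frac{E_a}{R}\frac{1}{T}
\end{equation}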

As an illustrative example of how fitting works, we'll fit both forms using the _lmfit_ library.
Although _lmfit_ provides pre-built models, here we assume that we are fitting a generic user-defined function.
Hence, we start by defining two functions, one for each form of the equation.

In [None]:
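R = 8.314 # ideal gas constant in J/mol/K

def arrExp(x, A, Ea):
    # Exponential form: x is the temperature in K, Ea is in J/mol
    return A * np.exp(-Ea/R/x)

def arrLin(x, lnA, Ea):
    # Linear form, returns ln(k_r); x is still the temperature T, not 1/T
    return lnA - Ea/R * (1/x)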
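As a quick sanity check, the logarithm of `arrExp` should equal `arrLin` at every temperature when `lnA = ln(A)`. The sketch repeats the two definitions so that it runs on its own; the values of $A$, $E_a$ and the temperature grid are hypothetical.

```python
import numpy as np

R = 8.314  # ideal gas constant in J/mol/K

def arrExp(x, A, Ea):
    return A * np.exp(-Ea / R / x)

def arrLin(x, lnA, Ea):
    return lnA - Ea / R * (1 / x)

# Hypothetical parameter values, used only for this consistency check
A, Ea = 1e13, 50e3                # 1/s and J/mol
T = np.linspace(300.0, 400.0, 5)  # temperatures in K

# ln of the exponential form should match the linear form point by point
lhs = np.log(arrExp(T, A, Ea))
rhs = arrLin(T, np.log(A), Ea)
print(np.allclose(lhs, rhs))  # True
```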

We then create two _Models_ with the *lmfit* library, one for each of the two independent fits.

In [None]:
modExp = Model(arrExp)
modLin = Model(arrLin)

Like all fitting procedures, *lmfit* requires a starting set of parameters, which can be set by the user. The closer these parameters are to the correct ones, the more reliably the fit converges. Typically the frequency factor is of the order of $10^{12}-10^{15}\ Hz$ and the activation energy for a slow-ish reaction is in the $1-100\ kJ/mol$ range. Since $R$ is given in J/mol/K, $E_a$ must be passed in J/mol.

In [None]:
# Ea in J/mol because R is in J/mol/K: 5e3 J/mol = 5 kJ/mol
paramsExp = modExp.make_params(A=1e13, Ea=5e3)
paramsLin = modLin.make_params(lnA=np.log(1e13), Ea=5e3)
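Instead of hand-picking the starting values, a rough estimate can be taken from the data itself. Writing the linear form at the coldest ($T_1$) and hottest ($T_2$) points and subtracting gives $E_a = R\ln(k_2/k_1)/(1/T_1 - 1/T_2)$, and then $\ln A = \ln k_1 + E_a/(RT_1)$. The helper `two_point_guess` below is a hypothetical sketch of this idea, checked on noise-free hypothetical data.

```python
import numpy as np

R = 8.314  # ideal gas constant in J/mol/K

def two_point_guess(T, k):
    """Rough starting values (A, Ea) from the coldest and hottest points.

    Hypothetical helper: subtracting the linear Arrhenius form at two
    temperatures gives Ea = R ln(k2/k1) / (1/T1 - 1/T2).
    """
    T = np.asarray(T, dtype=float)
    k = np.asarray(k, dtype=float)
    i1, i2 = np.argmin(T), np.argmax(T)  # coldest and hottest points
    Ea = R * np.log(k[i2] / k[i1]) / (1.0 / T[i1] - 1.0 / T[i2])
    lnA = np.log(k[i1]) + Ea / (R * T[i1])
    return np.exp(lnA), Ea

# Check on noise-free hypothetical data generated with A = 1e13 1/s, Ea = 50 kJ/mol
T = np.array([300.0, 325.0, 350.0, 375.0, 400.0])
k = 1e13 * np.exp(-50e3 / (R * T))
A0, Ea0 = two_point_guess(T, k)
print(f"A0 = {A0:.3g} 1/s, Ea0 = {Ea0 / 1e3:.1f} kJ/mol")
```

On real, noisy data the estimate is only approximate, but it could replace the hand-picked guesses, e.g. `modExp.make_params(A=A0, Ea=Ea0)` and `modLin.make_params(lnA=np.log(A0), Ea=Ea0)`.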

We can now fit the data. The _fit_ method takes three main arguments: the data $y$, the initial parameters, and the independent variable $x$ of the function to fit. Note that for the linear fit we pass the logarithm of the rate but NOT the inverse of the temperature. This is because the **arrLin** function computes $1/x$ internally, so it expects the temperature itself.

In [None]:
fitExp = modExp.fit(data["Kr"], paramsExp, x=data["T"])
fitLin = modLin.fit(np.log(data["Kr"]), paramsLin, x=data["T"])
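Because `arrLin` is linear in both `lnA` and `Ea`, the linear fit is ordinary linear least squares, so it can be cross-checked without _lmfit_. A minimal sketch, assuming noise-free hypothetical data: `np.polyfit` on $(1/T, \ln k_r)$ returns the coefficients highest power first, i.e. the slope $m=-E_a/R$ and then the intercept $q=\ln A$.

```python
import numpy as np

R = 8.314  # ideal gas constant in J/mol/K

# Hypothetical noise-free data generated with A = 1e13 1/s and Ea = 50 kJ/mol
T = np.array([300.0, 325.0, 350.0, 375.0, 400.0])
k = 1e13 * np.exp(-50e3 / (R * T))

# Straight line through (1/T, ln k): slope m = -Ea/R, intercept q = ln A
m, q = np.polyfit(1.0 / T, np.log(k), 1)
Ea_fit = -m * R
A_fit = np.exp(q)
print(f"Ea = {Ea_fit / 1e3:.2f} kJ/mol, A = {A_fit:.3e} 1/s")
```

On the real data, `np.polyfit(1/data["T"], np.log(data["Kr"]), 1)` should give values close to those reported for `fitLin`, since both minimise the same unweighted sum of squared residuals in log space.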

## Results for the exponential fit

In [None]:
print(fitExp.fit_report())

## Results for the linear fit

In [None]:
print(fitLin.fit_report())

Note how much smaller the error on the activation energy is when using a linear fit.

_lmfit_ does not compute the $R^2$, but we can easily do that using its definition
\begin{equation}
R^2 = 1 - \frac{\sum_i (y_i-f_i)^2}{\sum_i(y_i-\langle y \rangle)^2}
\end{equation}
where $f_i$ is the fitted value for the data point $y_i$. The numerator is the sum of the squared residuals and the denominator is the total sum of squares, _i.e._ $N$ times the variance of the data.
The residuals are already stored by _lmfit_ in the `residual` attribute, and the rest takes one line of NumPy.
Note that the linear fit was performed on $\ln k_r$, so its $R^2$ has to be computed against $\ln k_r$ rather than $k_r$.
The R squared confirms that the linear fit is better.

In [None]:
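# R^2 = 1 - SS_res / SS_tot; lmfit stores the residuals of each fit in .residual
expR2 = 1 - np.sum(fitExp.residual**2) / np.sum((data["Kr"] - data["Kr"].mean())**2)
# The linear fit was done on ln(Kr), so compare against the spread of ln(Kr)
lnKr = np.log(data["Kr"])
linR2 = 1 - np.sum(fitLin.residual**2) / np.sum((lnKr - lnKr.mean())**2)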

print("Rsquared for the Exponential fit :",expR2)
print("Rsquared for the Linear fit      :",linR2)
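The definition of $R^2$ can also be wrapped in a small reusable function that works directly on data and fitted values. `r_squared` is a hypothetical helper name, and the four-point example is made up for illustration.

```python
import numpy as np

def r_squared(y, f):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    ss_res = np.sum((y - f) ** 2)         # sum of squared residuals
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares = N * variance of y
    return 1.0 - ss_res / ss_tot

# Small hypothetical example
y = np.array([1.0, 2.0, 3.0, 4.0])
f = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(y, f))
```

Applied to this notebook it would read `r_squared(data["Kr"], fitExp.best_fit)` for the exponential fit and `r_squared(np.log(data["Kr"]), fitLin.best_fit)` for the linear one.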

Let's now see how the two fits compare with the input data graphically by plotting the data and the fitted functions.
_lmfit_ stores the fitted values in the `best_fit` attribute, so there is no need to evaluate the functions again.
Note that for the linear fit we take the exponential of the best-fit values, since that fit returns $\ln k_r$.

In [None]:
fig , ax = plt.subplots(figsize=(10,6))

ax.scatter(data["T"],data["Kr"],label="Data")
ax.plot(data["T"],fitExp.best_fit,label="ExpFit",color='red')
ax.plot(data["T"],np.exp(fitLin.best_fit),label="LinFit",color='green')


ax.set(xlabel="Temperature (K)")
ax.set(ylabel="Rate constant")

ax.legend()
plt.show()

Both fits seem pretty good, but let's now replot the data as the logarithm of the rate vs the inverse of the temperature.

In [None]:
fig , ax = plt.subplots(figsize=(10,6))

ax.scatter(1/data["T"],np.log(data["Kr"]),label="Data")
ax.plot(1/data["T"],np.log(fitExp.best_fit),label="ExpFit",color='red')
ax.plot(1/data["T"],fitLin.best_fit,label="LinFit",color='green')


ax.set(xlabel="1/Temperature (1/K)")
ax.set(ylabel="ln[Rate constant]")

ax.legend()
plt.show()

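Clearly the linear fit is much better at reproducing the data.
This is because least squares minimises the sum of the squared residuals: in the exponential form the points with large rate constants (high temperature) dominate that sum, so the low-temperature tail, where $k_r$ is small, is _less_ important to the fit. Taking the logarithm puts all points on a comparable scale.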
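To make this concrete, consider two hypothetical measurements that are both 10% above the model, one at low temperature (small $k_r$) and one at high temperature (large $k_r$). The sketch compares how much each point contributes to the sum of squared residuals in the two forms.

```python
import numpy as np

# Two hypothetical measurements, each 10% above the model value
k_model = np.array([2.0e4, 3.0e6])  # rate constants at low and high T (1/s)
k_data = 1.1 * k_model

# Squared residuals as seen by the exponential fit (linear space)
sq_lin = (k_data - k_model) ** 2
# Squared residuals as seen by the linearised fit (log space)
sq_log = (np.log(k_data) - np.log(k_model)) ** 2

# Share of the total sum of squares coming from each point
print("linear space:", sq_lin / sq_lin.sum())
print("log space:   ", sq_log / sq_log.sum())
```

In linear space the low-temperature point contributes a negligible fraction of the total, so the fit can largely ignore it; in log space the same relative error weighs equally at both temperatures.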