# Computing the formation energy using data from the JANAF tables

The Gibbs free energy is:

$$G = H-TS$$

The equilibrium constant is related to the Gibbs free energy by:
$$\ln{K_{eq}} = -\frac{G}{RT}$$

Logarithm base change:
$$\log_{10}{K_{eq}} = \ln{K_{eq}} / \ln{10}$$

By substitution:
$$\log_{10}{K_{eq}} =  -\frac{G}{RT\ln{10}}  = -\frac{H}{RT\ln{10}} + \frac{S}{R\ln{10}}$$

Treating $H$ and $S$ as approximately constant over the temperature range of interest, this has the form:
$$\log_{10}{K_{eq}} =  a + b/T$$
where:
$$ a = \frac{S}{R\ln{10}},\quad b =  -\frac{H}{R\ln{10}} $$
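
The definitions of $a$ and $b$ can be inverted to recover $H$ and $S$ from a pair of coefficients. The following is a minimal sketch: the function name `enthalpy_entropy_from_fit` and the example coefficients are hypothetical, and the gas constant is taken in J/(mol K), so $H$ comes out in J/mol and $S$ in J/(mol K).

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol K)


def enthalpy_entropy_from_fit(a: float, b: float) -> tuple[float, float]:
    """Invert a = S/(R ln 10) and b = -H/(R ln 10).

    Hypothetical helper for illustration; it is not part of the original notebook.
    """
    ln10 = math.log(10.0)
    enthalpy = -b * GAS_CONSTANT * ln10  # J/mol
    entropy = a * GAS_CONSTANT * ln10  # J/(mol K)
    return enthalpy, entropy


# Illustrative coefficients only, not taken from any table.
a_example, b_example = 4.0, 6000.0
enthalpy, entropy = enthalpy_entropy_from_fit(a_example, b_example)
print(f"H = {enthalpy:.1f} J/mol, S = {entropy:.3f} J/(mol K)")
```

A positive $b$ corresponds to a negative $H$, as expected from the sign in the definition of $b$.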

To compute a linear least-squares fit to the data, multiply both sides by $T$ so that the gradient equals $a$ and the intercept equals $b$:
$$T \log_{10}{K_{eq}} =  aT + b$$
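
As a quick check of this linearisation, the sketch below generates synthetic data from known coefficients and confirms that a straight-line fit of $T \log_{10}{K_{eq}}$ against $T$ recovers them. The values `a_true` and `b_true` are arbitrary assumptions chosen for illustration, and `np.polyfit` is used here for brevity rather than the explicit design matrix used later.

```python
import numpy as np

# Hypothetical coefficients, used only to generate synthetic data.
a_true, b_true = 4.0, 6000.0

temperature = np.linspace(1500.0, 3000.0, 16)  # K
log10_K = a_true + b_true / temperature

# Fit T*log10(K) = a*T + b. For deg=1, polyfit returns [gradient, intercept].
a_fit, b_fit = np.polyfit(temperature, temperature * log10_K, deg=1)
print(a_fit, b_fit)
```

Note that fitting $T \log_{10}{K_{eq}}$ rather than $\log_{10}{K_{eq}}$ scales each residual by $T$, so for real (noisy) data the higher-temperature points carry more weight in the fit.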

In [48]:
import pandas as pd
import numpy as np

Define the temperature range of the data to use:

In [49]:
TEMPERATURE_HIGH = 3000 # K
TEMPERATURE_LOW = 1500 # K

In [55]:
def least_squares_fit_to_janaf_data(molecule: str) -> np.ndarray:
    """Fit the JANAF Gibbs free energy of formation data.

    Fits T*log10(Kf) = a*T + b between TEMPERATURE_LOW and TEMPERATURE_HIGH.

    Args:
        molecule: The molecule, matching the file tables/<molecule>.dat.

    Returns:
        The least-squares fit coefficients as an array [a, b].
    """
    datafile: str = f"tables/{molecule}.dat"
    df: pd.DataFrame = pd.read_csv(datafile, sep='\t')

    # Keep only the rows within the relevant temperature range (inclusive).
    df = df.loc[(df['T(K)'] <= TEMPERATURE_HIGH) & (df['T(K)'] >= TEMPERATURE_LOW)]

    # Least-squares fit of T*log10(Kf) against the columns [T, 1].
    temperature: np.ndarray = df['T(K)'].to_numpy()
    log_Kf: np.ndarray = df['log Kf'].to_numpy()
    design_matrix: np.ndarray = temperature[:, np.newaxis]**[1, 0]
    solution, _, _, _ = np.linalg.lstsq(design_matrix, log_Kf*temperature, rcond=None)

    return solution
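
The expression `temperature[:, np.newaxis]**[1, 0]` relies on broadcasting to build the design matrix: the column of temperatures is raised to the powers 1 and 0, giving one column of $T$ and one column of ones. A small sketch with illustrative temperatures (not read from any table) shows this is equivalent to stacking the two columns explicitly:

```python
import numpy as np

# Illustrative temperatures in K, not read from any data file.
temperature = np.array([1500.0, 2000.0, 2500.0, 3000.0])

# Shape (4, 1) raised to the powers [1, 0] broadcasts to shape (4, 2).
design_matrix = temperature[:, np.newaxis] ** [1, 0]

# Equivalent explicit construction: first column T, second column ones.
explicit = np.column_stack([temperature, np.ones_like(temperature)])

print(design_matrix.shape)
print(np.allclose(design_matrix, explicit))
```

With the columns in this order, the first entry of the least-squares solution multiplies $T$ (the gradient $a$) and the second is the constant term (the intercept $b$).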

In [61]:
for molecule in ("CO", "CO2", "CH4", "H2O"):
    a, b = least_squares_fit_to_janaf_data(molecule)
    print(molecule, ': a = ', a, ', b = ', b)

CO : a =  4.319860294117643 , b =  6286.120588235306
CO2 : a =  -0.028289705882357442 , b =  20753.870588235302
CH4 : a =  -5.830066176470588 , b =  4829.067647058815
H2O : a =  -3.0385132352941198 , b =  13152.698529411768
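
The fitted coefficients can be substituted back into $\log_{10}{K_{eq}} = a + b/T$ to evaluate the equilibrium constant at any temperature inside the fitted range. The sketch below does this for CO at 2000 K using the coefficients printed above; the helper name `log10_K_from_fit` is hypothetical.

```python
def log10_K_from_fit(a: float, b: float, temperature: float) -> float:
    """Evaluate log10(K_eq) = a + b/T for a temperature in kelvin.

    Hypothetical helper for illustration; it is not part of the original notebook.
    """
    return a + b / temperature


# Coefficients for CO copied from the printed output above.
a_CO, b_CO = 4.319860294117643, 6286.120588235306

log10_K_CO = log10_K_from_fit(a_CO, b_CO, 2000.0)
print(f"log10 Kf(CO, 2000 K) = {log10_K_CO:.3f}")
```

This gives a value of about 7.46. Comparing such values against the tabulated `log Kf` column at the same temperature is one simple way to gauge the quality of the fit; outside the 1500 K to 3000 K range the coefficients are an extrapolation.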
