Running this notebook in Colab? To enable GPU acceleration, select a GPU runtime from the dropdown menu in the top-right corner 🚀

# Training a MACE potential for liquid water

> **FYI**, you can open this documentation as a [Google Colab notebook](https://colab.research.google.com/github/omidshy/ml-notebooks/blob/master/notebooks/graph-pes-water.ipynb) to follow along interactively.

For more information on the ``graph-pes-train`` command and the many options you can specify in your ``config.yaml``, see the [CLI reference](https://jla-gardner.github.io/graph-pes/cli/graph-pes-train/root.html).

Below, we train a [MACE](https://jla-gardner.github.io/graph-pes/models/many-body/mace.html) model on a dataset containing ... structures of liquid water.

In [None]:
!pip install graph-pes

We should have access to the ``graph-pes-train`` command. We can check this by running:

In [None]:
!graph-pes-train -h

## Reference Data 

We download the dataset, load our local copy with [load-atoms](https://jla-gardner.github.io/load-atoms/), and split it into training, validation and test sets:

In [None]:
%%bash

if [ ! -f water.xyz ]; then
    wget https://tinyurl.com/water-dataset -O water.xyz
fi

In [None]:
import ase.io
from load_atoms import load_dataset

structures = load_dataset("water.xyz")
train, valid, test = structures.random_split([0.8, 0.1, 0.1])

ase.io.write("train-water.xyz", train)
ase.io.write("valid-water.xyz", valid)
ase.io.write("test-water.xyz", test)
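For intuition, an 80/10/10 random split amounts to shuffling the structures and cutting the shuffled list into three consecutive chunks. The sketch below is a plain-Python illustration of that idea, not the load-atoms implementation; the `random_split` helper and its `seed` argument are hypothetical names used only here.

```python
import random


def random_split(items, fractions, seed=0):
    """Shuffle items and cut them into consecutive chunks sized by fractions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible

    splits, start = [], 0
    for i, fraction in enumerate(fractions):
        is_last = i == len(fractions) - 1
        # the last chunk takes whatever remains, so no item is dropped
        end = len(shuffled) if is_last else start + round(fraction * len(shuffled))
        splits.append(shuffled[start:end])
        start = end
    return splits


# stand-in for a dataset of 100 structures, identified by index
train_ids, valid_ids, test_ids = random_split(range(100), [0.8, 0.1, 0.1])
print(len(train_ids), len(valid_ids), len(test_ids))  # 80 10 10
```

Because the chunks are disjoint, no structure used for training can leak into the validation or test set.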

We can visualise the kinds of structures we're training on using [load_atoms.view](https://jla-gardner.github.io/load-atoms/api/viz.html):

In [None]:
from load_atoms import view

view(train[0], show_bonds=True)

As you can see, each structure has an energy label:

In [None]:
train[0].info["energy"]

... as well as a forces label (one for each atom in the structure):

In [None]:
train[0].arrays["forces"].shape

These properties are stored in the files we have just created:

In [None]:
!head train-water.xyz

## Configuration

Great - now let's train a model. To do this, we have specified the following in our ``water.yaml`` file:

* the model architecture to instantiate and train, here [MACE](https://jla-gardner.github.io/graph-pes/models/many-body/mace.html). Note that we also include a [FixedOffset](https://jla-gardner.github.io/graph-pes/models/offsets.html#graph_pes.models.FixedOffset) component to account for the fact that the energy labels have an arbitrary energy offset.
* the data to train on, here the liquid water dataset we just loaded
* the loss function to use, here a combination of a per-atom energy loss and a per-atom force loss
* and various other training hyperparameters (e.g. the learning rate, batch size, etc.)
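To make the loss bullet concrete, here is a minimal NumPy sketch of a combined objective: a per-atom energy term plus a force term, each weighted. The RMSE form, the function names and the weights are assumptions for illustration; the actual loss components and weights are whatever ``water.yaml`` specifies.

```python
import numpy as np


def per_atom_energy_rmse(pred_energies, ref_energies, n_atoms):
    """RMSE of total energies after dividing each by its structure's atom count."""
    diff = (np.asarray(pred_energies) - np.asarray(ref_energies)) / np.asarray(n_atoms)
    return float(np.sqrt(np.mean(diff**2)))


def force_rmse(pred_forces, ref_forces):
    """RMSE over every Cartesian force component of every atom."""
    diff = np.asarray(pred_forces) - np.asarray(ref_forces)
    return float(np.sqrt(np.mean(diff**2)))


def total_loss(energy_term, force_term, energy_weight=1.0, force_weight=1.0):
    # hypothetical weights: the real values are set in the config file
    return energy_weight * energy_term + force_weight * force_term


# toy batch: two structures with 3 and 6 atoms (made-up numbers)
pred_e, ref_e, n_atoms = [-30.3, -59.7], [-30.0, -60.0], [3, 6]
pred_f, ref_f = np.zeros((9, 3)), np.full((9, 3), 0.2)

energy_term = per_atom_energy_rmse(pred_e, ref_e, n_atoms)
force_term = force_rmse(pred_f, ref_f)
print(total_loss(energy_term, force_term))
```

Dividing the energy error by the number of atoms keeps large and small structures on a comparable scale, which is why per-atom energy terms are a common choice.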



We can download [this config file](https://raw.githubusercontent.com/omidshy/ml-notebooks/refs/heads/master/data/quickstart-cgap17.yaml) using wget:

In [None]:
%%bash

if [ ! -f water.yaml ]; then
    wget https://tinyurl.com/water-config -O water.yaml
fi

## Training

We use the downloaded config file to start the training.


In [None]:
!graph-pes-train water.yaml general/run_id=train-mace-water
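The trailing ``general/run_id=train-mace-water`` argument overrides a single nested entry of the config from the command line. The sketch below only illustrates that idea on a plain dictionary; it is not the graph-pes parser, and the `apply_override` helper and the example config values are hypothetical.

```python
def apply_override(config, override):
    """Apply a 'nested/key=value' override to a nested dict, in place."""
    path, value = override.split("=", 1)
    *parents, leaf = path.split("/")
    node = config
    for key in parents:
        # walk down the nesting, creating intermediate levels if needed
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


# made-up starting config, standing in for the contents of a YAML file
config = {"general": {"run_id": "some-default", "seed": 42}}
apply_override(config, "general/run_id=train-mace-water")
print(config["general"]["run_id"])  # train-mace-water
```

Setting the run ID explicitly gives the output directory a predictable name, which is convenient when loading the trained model afterwards.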

## Model Analysis

As part of the `graph-pes-train` run, the model was tested on the test set we specified in the config file (see the final section of the logs above).

To analyse the model in more detail, we first need to load it from disk. You can see from the command we used, and the training logs above, that the best model from the training run (i.e. the set of weights that gave the lowest validation loss) has been saved as `graph-pes-results/train-mace-water/model.pt`.

Let's load that best model, put it on the GPU for accelerated inference if available, and get it ready for evaluation:

In [None]:
import torch
from graph_pes.models import load_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
best_model = (
    load_model("graph-pes-results/train-mace-water/model.pt")  # load the model
    .to(device)  # move to GPU if available
    .eval()  # set to evaluation mode
)

The easiest way to use our model is through the [GraphPESCalculator](https://jla-gardner.github.io/graph-pes/tools/ase.html#graph_pes.utils.calculator.GraphPESCalculator), which acts directly on [ase.Atoms](https://wiki.fysik.dtu.dk/ase/ase/atoms.html#module-ase.atoms) objects:

In [None]:
calculator = best_model.ase_calculator()
calculator.calculate(test[0], properties=["energy", "forces"])
calculator.results

We can see from a single data point that our model has done a reasonable job of learning the PES:

In [None]:
calculator.get_potential_energy(test[0]), test[0].info["energy"]

... and predicting the atomic forces:

In [None]:
calculator.get_forces(test[0])[1], test[0].arrays["forces"][1]

``graph-pes`` provides a few utility functions for visualising model performance:

In [None]:
import matplotlib.pyplot as plt
from graph_pes.utils.analysis import parity_plot

%config InlineBackend.figure_format = 'retina'

parity_plot(
    best_model,
    test,
    property="energy_per_atom",
    units="eV / atom",
    lw=0,
    s=12,
    color="crimson",
)

In [None]:
parity_plot(
    best_model,
    test,
    property="forces",
    units="eV / Å",
    lw=0,
    s=2,
    alpha=0.5,
    color="crimson",
)
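If you want numbers alongside the parity plots, the predictions can be gathered by hand and reduced to an error metric. The following is a sketch under stated assumptions: `collect_parity_data` and `rmse` are hypothetical helpers (not part of graph-pes), and they rely only on the calculator methods and label keys already used in this notebook (`get_potential_energy`, `get_forces`, `info["energy"]`, `arrays["forces"]`).

```python
import numpy as np


def collect_parity_data(calculator, structures):
    """Gather predicted and reference labels for a collection of structures."""
    e_pred, e_ref, f_pred, f_ref = [], [], [], []
    for atoms in structures:
        n = len(atoms)
        # energies are compared per atom so that system size does not dominate
        e_pred.append(calculator.get_potential_energy(atoms) / n)
        e_ref.append(atoms.info["energy"] / n)
        # forces are flattened to one long vector of Cartesian components
        f_pred.append(np.asarray(calculator.get_forces(atoms)).ravel())
        f_ref.append(np.asarray(atoms.arrays["forces"]).ravel())
    return (
        np.array(e_pred),
        np.array(e_ref),
        np.concatenate(f_pred),
        np.concatenate(f_ref),
    )


def rmse(pred, ref):
    """Root-mean-square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(ref)) ** 2)))
```

With the objects defined above, `e_pred, e_ref, f_pred, f_ref = collect_parity_data(calculator, test)` followed by `rmse(e_pred, e_ref)` and `rmse(f_pred, f_ref)` would give the energy error in eV / atom and the force error in eV / Å.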

## Dynamics Simulation

Running molecular dynamics (MD) with our trained model is straightforward.

Below, we use ASE-driven MD for simplicity - please see the [LAMMPS MD guide](https://jla-gardner.github.io/graph-pes/tools/lammps.html) for instructions on how to run MD with our model in LAMMPS.

In [None]:
from ase import units
from ase.md.langevin import Langevin

# set up structure
structure = ase.io.read("test-water.xyz", index=0)

# set up MD
structure.calc = calculator
dynamics = Langevin(
    structure,
    timestep=1.0 * units.fs,
    temperature_K=300,
    friction=0.01 / units.fs,
)
dynamics.attach(
    lambda: structure.write("traj-water.xyz", append=True),
    interval=10,
)

# run MD
dynamics.run(5000) # 5000 steps = 5 ps

We now load the trajectory and visualise the last frame:

In [None]:
trajectory = ase.io.read("traj-water.xyz", index=":")
view(trajectory[-1], show_bonds=True)
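As a quick sanity check, the simulated time and the number of saved frames follow directly from the MD settings used above. The count below assumes that the attached writer fires once at step 0 and then every `interval` steps, and that ``traj-water.xyz`` did not exist before the run (re-running the cell appends further frames to the same file).

```python
n_steps = 5000        # argument passed to dynamics.run
timestep_fs = 1.0     # MD timestep in femtoseconds
write_interval = 10   # interval passed to dynamics.attach

simulated_time_ps = n_steps * timestep_fs / 1000
# assumption: one frame at step 0, then one every write_interval steps
n_frames = n_steps // write_interval + 1
last_index = n_frames - 1

print(simulated_time_ps, n_frames, last_index)  # 5.0 501 500
```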

## Simulation Analysis

We now compute the radial distribution functions (RDF) from the collected trajectory and compare the results to the RDFs from a reference DFT MD simulation.

In [None]:
from ase.geometry.analysis import Analysis

# The Analysis class accepts a single Atoms object or a list of them,
# so we can pass the whole trajectory.
analysis = Analysis(trajectory)

# Calculate the oxygen-oxygen RDF (one curve per trajectory frame)
rdf_OO = analysis.get_rdf(rmax=7.0, nbins=200, elements=('O', 'O'))
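For readers unfamiliar with how an RDF is built, the sketch below computes g(r) from scratch with NumPy: histogram all pair distances under the minimum-image convention, then divide by the counts expected for an ideal gas at the same density. It is not ASE's implementation and makes simplifying assumptions: a cubic box, a single particle type, and `rmax` no larger than half the box length. The `radial_distribution` name is hypothetical.

```python
import numpy as np


def radial_distribution(positions, box_length, rmax, nbins):
    """g(r) for identical particles in a cubic periodic box (minimum image)."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)

    # all pair separation vectors, wrapped to the nearest periodic image
    delta = positions[:, None, :] - positions[None, :, :]
    delta -= box_length * np.round(delta / box_length)
    # keep each unique pair once (upper triangle, no self-pairs)
    dists = np.linalg.norm(delta, axis=-1)[np.triu_indices(n, k=1)]

    counts, edges = np.histogram(dists, bins=nbins, range=(0.0, rmax))
    centers = 0.5 * (edges[:-1] + edges[1:])
    shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    # expected number of unique pairs per shell for an ideal gas
    ideal_counts = 0.5 * n * (n - 1) / box_length**3 * shell_volumes
    return centers, counts / ideal_counts


# uniformly random points have no structure, so g(r) should hover around 1
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(500, 3))
centers, g = radial_distribution(points, box_length=10.0, rmax=5.0, nbins=50)
print(float(np.mean(g[centers > 1.0])))
```

A liquid differs from this random reference by showing peaks at preferred neighbour distances and a depleted region at short range, which is what the O-O RDF of water reveals.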

In [None]:
%%bash

if [ ! -f rdf.csv ]; then
    wget https://tinyurl.com/ref-rdf -O rdf.csv
fi

In [None]:
import numpy as np

ref_rdf_OO = np.genfromtxt("rdf.csv", delimiter=",", dtype=float)

In [None]:

plt.figure(figsize=(8, 6))
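# bin centres matching get_rdf(rmax=7.0, nbins=200)
r = (np.arange(200) + 0.5) * 7.0 / 200
plt.plot(r, np.mean(rdf_OO, axis=0), label='O-O (MACE)', color='red', linewidth=2)
# the reference distances are divided by 100 to put them on the same Å axis
plt.plot(ref_rdf_OO.T[0] / 100, ref_rdf_OO.T[1], label='O-O (DFT)', color='green', linestyle='dashed', linewidth=2)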
plt.title('Radial Distribution Function (RDF)')
plt.xlabel('Distance, $r$ (Å)')
plt.ylabel('g($r$)')
plt.legend()
plt.axhline(1, color='gray', linestyle=':')
plt.show()
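Beyond the visual comparison, the agreement between two RDF curves can be summarised with a couple of numbers. The sketch below interpolates the reference curve onto the model's grid and reports the position of the highest peak of each curve together with the mean absolute difference. The `compare_rdfs` helper is hypothetical, and the synthetic curves exist only to make the example runnable.

```python
import numpy as np


def compare_rdfs(r_model, g_model, r_ref, g_ref):
    """Highest-peak positions and mean absolute difference on the model grid."""
    r_model, g_model = np.asarray(r_model), np.asarray(g_model)
    r_ref, g_ref = np.asarray(r_ref), np.asarray(g_ref)

    # only compare where both curves are defined
    lo, hi = max(r_model[0], r_ref[0]), min(r_model[-1], r_ref[-1])
    mask = (r_model >= lo) & (r_model <= hi)
    # np.interp expects r_ref to be sorted in increasing order
    g_ref_on_model = np.interp(r_model[mask], r_ref, g_ref)

    return {
        "peak_model": float(r_model[np.argmax(g_model)]),
        "peak_ref": float(r_ref[np.argmax(g_ref)]),
        "mean_abs_diff": float(np.mean(np.abs(g_model[mask] - g_ref_on_model))),
    }


# synthetic curves: a single Gaussian peak on top of a baseline of 1
r_a = np.linspace(0.5, 7.0, 200)
r_b = np.linspace(0.0, 8.0, 400)
g_a = 1.0 + 2.0 * np.exp(-(((r_a - 2.8) / 0.2) ** 2))
g_b = 1.0 + 2.0 * np.exp(-(((r_b - 2.9) / 0.2) ** 2))
print(compare_rdfs(r_a, g_a, r_b, g_b))
```

For the data in this notebook, one could call it as `compare_rdfs(r, np.mean(rdf_OO, axis=0), ref_rdf_OO.T[0] / 100, ref_rdf_OO.T[1])`, with `r = (np.arange(200) + 0.5) * 7.0 / 200` as the bin-centre grid of the model RDF.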