If you are running this notebook in Colab and want GPU acceleration, select a GPU runtime from the dropdown menu in the top right 🚀

# Training ML potentials using GNNs

> **FYI**, you can open this documentation as a [Google Colab notebook](https://colab.research.google.com/github/omidshy/ML/blob/master/src/graph-pes-quickstart.ipynb) to follow along interactively.

[graph-pes-train](https://jla-gardner.github.io/graph-pes/cli/graph-pes-train/root.html) provides a unified interface to train any [GraphPESModel](https://jla-gardner.github.io/graph-pes/models/root.html#graph_pes.GraphPESModel), including those packaged within [graph_pes.models](https://jla-gardner.github.io/graph-pes/models/root.html) and those defined by you, the user.

For more information on the ``graph-pes-train`` command and the many options you can specify in your ``config.yaml``, see the [CLI reference](https://jla-gardner.github.io/graph-pes/cli/graph-pes-train/root.html).

Below, we train a lightweight [MACE](https://jla-gardner.github.io/graph-pes/models/many-body/mace.html) model on [C-GAP-17](https://jla-gardner.github.io/load-atoms/datasets/C-GAP-17.html), a dataset containing 4530 structures of amorphous carbon.

## Installation


In [None]:
!pip install graph-pes

We should now have access to the ``graph-pes-train`` command. We can check this by running:

In [None]:
!graph-pes-train -h

## Reference Data 

We use [load-atoms](https://jla-gardner.github.io/load-atoms/) to download and split the C-GAP-17 dataset into training, validation and test datasets:

In [None]:
import ase.io
from load_atoms import load_dataset

structures = load_dataset("C-GAP-17")
train, val, test = structures.random_split([0.8, 0.1, 0.1])

ase.io.write("train-cgap17.xyz", train)
ase.io.write("val-cgap17.xyz", val)
ase.io.write("test-cgap17.xyz", test)
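
For intuition, an 80/10/10 random split like the one above amounts to shuffling the structure indices and cutting them into consecutive chunks. The sketch below illustrates the idea using only the standard library. `split_indices` is a hypothetical helper, not part of load-atoms, and the library's own implementation may differ (for example in how it rounds or seeds).

```python
import random


def split_indices(n_items, fractions, seed=42):
    """Shuffle range(n_items) and cut it into consecutive chunks.

    Hypothetical helper for illustration; not part of load-atoms.
    The last chunk absorbs any rounding remainder.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local RNG: reproducible, no global state

    chunks, start = [], 0
    for fraction in fractions[:-1]:
        stop = start + int(round(fraction * n_items))
        chunks.append(indices[start:stop])
        start = stop
    chunks.append(indices[start:])  # remainder goes to the final chunk
    return chunks


# 4530 is the number of structures in C-GAP-17 quoted earlier.
train_idx, val_idx, test_idx = split_indices(4530, [0.8, 0.1, 0.1])
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed makes the split reproducible, which matters if you want to compare models trained on exactly the same data.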

We can visualise the kinds of structures we're training on using [load_atoms.view](https://jla-gardner.github.io/load-atoms/api/viz.html):

In [None]:
from load_atoms import view

view(train[10], show_bonds=True)

## Configuration

Great - now let's train a model. To do this, we have specified the following in our ``quickstart-cgap17.yaml`` file:

* the model architecture to instantiate and train, here [MACE](https://jla-gardner.github.io/graph-pes/models/many-body/mace.html). Note that we also include a [FixedOffset](https://jla-gardner.github.io/graph-pes/models/offsets.html#graph_pes.models.FixedOffset) component to account for the fact that the C-GAP-17 labels have an arbitrary energy offset.
* the data to train on, here the [C-GAP-17](https://jla-gardner.github.io/load-atoms/datasets/C-GAP-17.html) dataset we just downloaded
* the loss function to use, here a combination of a per-atom energy loss and a per-atom force loss
* and various other training hyperparameters (e.g. the learning rate, batch size, etc.)
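
The combined loss mentioned in the list above can be pictured as a weighted sum of a per-atom energy term and a force term. The sketch below is illustrative only: the function names, the use of RMSE for both terms, and the weights are all assumptions, not the settings in ``quickstart-cgap17.yaml`` or the graph-pes implementation.

```python
import math


def per_atom_energy_rmse(pred_energies, ref_energies, n_atoms):
    """RMSE of total-energy errors after dividing each by its atom count."""
    errors = [
        (pred - ref) / n
        for pred, ref, n in zip(pred_energies, ref_energies, n_atoms)
    ]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def force_rmse(pred_forces, ref_forces):
    """RMSE over every Cartesian force component of every atom."""
    squared = [
        (p - r) ** 2
        for pred_atom, ref_atom in zip(pred_forces, ref_forces)
        for p, r in zip(pred_atom, ref_atom)
    ]
    return math.sqrt(sum(squared) / len(squared))


def combined_loss(energy_term, force_term, energy_weight=1.0, force_weight=1.0):
    """Weighted sum of the two terms; the weights here are placeholders."""
    return energy_weight * energy_term + force_weight * force_term


# Toy numbers: two structures with 2 and 4 atoms, and forces on 2 atoms.
e_term = per_atom_energy_rmse([-10.0, -20.0], [-10.2, -20.4], [2, 4])
f_term = force_rmse(
    [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]],
    [[0.0, 0.0, 0.5], [0.0, 0.0, -0.5]],
)
print(combined_loss(e_term, f_term))
```

Dividing energy errors by the number of atoms keeps large structures from dominating the energy term.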



We can download [this config file](https://raw.githubusercontent.com/jla-gardner/graph-pes/refs/heads/main/docs/source/quickstart/quickstart-cgap17.yaml) using wget:

In [None]:
%%bash

if [ ! -f quickstart-cgap17.yaml ]; then
    wget https://tinyurl.com/quickstart-conf -O quickstart-cgap17.yaml
fi

## Training

We use the downloaded config file to start the training.


In [None]:
!graph-pes-train quickstart-cgap17.yaml

## Model Analysis

Let's load the best model from the above training run and evaluate it on the test dataset:

In [None]:
from graph_pes.models import load_model

best_model = load_model("graph-pes-results/quickstart-cgap17/model.pt")

[GraphPESModel](https://jla-gardner.github.io/graph-pes/models/root.html#graph_pes.GraphPESModel)s act on [AtomicGraph](https://jla-gardner.github.io/graph-pes/atomic_graph.html#graph_pes.AtomicGraph) objects.

We can easily convert our [ase.Atoms](https://wiki.fysik.dtu.dk/ase/ase/atoms.html#module-ase.atoms) objects into [AtomicGraph](https://jla-gardner.github.io/graph-pes/data/atomic_graph.html#graph_pes.AtomicGraph) objects using [AtomicGraph.from_ase](https://jla-gardner.github.io/graph-pes/data/atomic_graph.html#graph_pes.AtomicGraph) (we could also use the [GraphPESCalculator](https://jla-gardner.github.io/graph-pes/utils.html#graph_pes.utils.calculator.GraphPESCalculator) to act directly on the [ase.Atoms](https://wiki.fysik.dtu.dk/ase/ase/atoms.html#module-ase.atoms) objects if we wanted to).

In [None]:
from graph_pes.atomic_graph import AtomicGraph

test_graphs = [
    AtomicGraph.from_ase(structure, cutoff=3.7) for structure in test
]
test_graphs[0]
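
The `cutoff` passed above sets the distance within which two atoms count as neighbours, and hence are connected in the graph. As a rough illustration (a sketch, not the graph-pes implementation, and one that ignores the periodic boundary conditions real structures may have), the hypothetical helper `neighbour_pairs` below lists all directed pairs of atoms closer than a cutoff:

```python
import math


def neighbour_pairs(positions, cutoff):
    """Return directed (i, j) pairs with |r_i - r_j| < cutoff, i != j.

    Hypothetical helper: brute-force O(N^2), no periodic images.
    """
    pairs = []
    for i, r_i in enumerate(positions):
        for j, r_j in enumerate(positions):
            if i != j and math.dist(r_i, r_j) < cutoff:
                pairs.append((i, j))
    return pairs


# Three atoms on a line (Å): 0-1 are 1.5 Å apart, 1-2 are 3.0 Å, 0-2 are 4.5 Å.
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (4.5, 0.0, 0.0)]
print(neighbour_pairs(positions, cutoff=3.7))
```

With a 3.7 Å cutoff, atoms 0 and 2 are not connected, so any interaction between them can only be captured indirectly through atom 1.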

Our predictions look like this:

In [None]:
{
    k: v.shape
    for k, v in best_model.get_all_PES_predictions(test_graphs[0]).items()
}

We can see from a single data point that our model has done a reasonable job of learning the potential:

In [None]:
best_model.predict_energy(test_graphs[0]), test_graphs[0].properties["energy"]
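
Total energies scale with system size, so a per-atom difference is usually easier to judge than the two raw numbers. A minimal sketch with made-up values (the energies and atom count below are placeholders, not outputs of this model, and `per_atom_error_meV` is a hypothetical helper):

```python
def per_atom_error_meV(predicted_eV, reference_eV, n_atoms):
    """Signed energy error per atom, converted from eV to meV."""
    return 1000.0 * (predicted_eV - reference_eV) / n_atoms


# Placeholder numbers for a hypothetical 64-atom structure.
error = per_atom_error_meV(predicted_eV=-10000.5, reference_eV=-10000.0, n_atoms=64)
print(f"{error:.2f} meV / atom")
```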

``graph-pes`` provides a few utility functions for visualising model performance:

In [None]:
import matplotlib.pyplot as plt

from graph_pes.atomic_graph import divide_per_atom
from graph_pes.utils.analysis import parity_plot

%config InlineBackend.figure_format = 'retina'

parity_plot(
    best_model,
    test_graphs,
    property="energy",
    transform=divide_per_atom,
    units="eV / atom",
    lw=0,
    s=12,
    color="crimson",
)
plt.xlim(-158.5, -155)
plt.ylim(-158.5, -155);

In [None]:
parity_plot(
    best_model,
    test_graphs,
    property="forces",
    units="eV / Å",
    lw=0,
    s=2,
    alpha=0.5,
    color="crimson",
)