# A Short MACE Tutorial

## Introduction

This is a short tutorial for MACE, a highly accurate and efficient ML interatomic potential.
Please read the associated [paper](https://arxiv.org/pdf/2206.07697.pdf).
The reference implementation is available [here](https://github.com/ACEsuit/mace).

## Installation

In [None]:
# Install dependencies
!pip install e3nn==0.4.4 opt_einsum ase torch_ema prettytable

# Clone MACE
!git clone --depth 1 https://github.com/ACEsuit/mace.git

Collecting e3nn==0.4.4
  Downloading e3nn-0.4.4-py3-none-any.whl.metadata (5.1 kB)
Collecting ase
  Downloading ase-3.23.0-py3-none-any.whl.metadata (3.8 kB)
Collecting torch_ema
  Downloading torch_ema-0.3-py3-none-any.whl.metadata (415 bytes)
Collecting opt-einsum-fx>=0.1.4 (from e3nn==0.4.4)
  Downloading opt_einsum_fx-0.1.4-py3-none-any.whl.metadata (3.3 kB)
Downloading e3nn-0.4.4-py3-none-any.whl (387 kB)
Downloading ase-3.23.0-py3-none-any.whl (2.9 MB)
Downloading torch_ema-0.3-py3-none-any.whl (5.5 kB)
Downloading opt_einsum_fx-0.1.4-py3-none-any.whl (13 kB)
Installing collected packages: torch_ema, opt-einsum-fx, ase, e3nn
Successfully installed ase-3.23.0 e3nn-0.4.4 opt-einsum-fx-0.1.4 torch_ema-0.3
Cloning into 'mace'...
remote: En

In [None]:
!pip install mace/

Processing ./mace
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting matscipy (from mace-torch==0.3.7)
  Downloading matscipy-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (37 kB)
Collecting torchmetrics (from mace-torch==0.3.7)
  Downloading torchmetrics-1.5.2-py3-none-any.whl.metadata (20 kB)
Collecting python-hostlist (from mace-torch==0.3.7)
  Downloading python-hostlist-2.0.0.tar.gz (37 kB)
  Preparing metadata (setup.py) ... done
Collecting configargparse (from mace-torch==0.3.7)
  Downloading ConfigArgParse-1.7-py3-none-any.whl.metadata (23 kB)
Collecting lightning-utilities>=0.8.0 (from torchmetrics->mace-torch==0.3.7)
  Downloading lightning_utilities-0.11.8-py3-none-any.whl.metadata (5.2 kB)
Downloading ConfigArgParse-1.7-py3-none-any.whl (25 kB)
Downloading matscipy-1.1.1-cp310-cp310-manylinux_2_17_x86_64.m

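**Note:** To use a GPU in Colab, select Runtime --> Change runtime type --> GPU. The commands in this tutorial run on the CPU (`--device=cpu`); pass `--device=cuda` to the training script to train on the GPU instead.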

## Loading Data

The data files used to train a MACE model must be in `extxyz` format.
In this tutorial, we use the 3BPA dataset, whose training set consists of 500 configurations sampled at 300 K with DFT reference energies and forces.
The energies are in eV and the forces in eV/Å.
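As a rough illustration of what such a file contains, the sketch below parses a single hand-written `extxyz` frame using only the Python standard library. The frame, its numbers, and the helper `parse_extxyz_frame` are made up for illustration; in practice `ase.io.read` (used later in this tutorial) does the parsing. The second line of each frame carries per-configuration values such as `energy`, and its `Properties` entry describes the per-atom columns, here species, positions, and `forces`.

```python
import shlex

# A hand-written, hypothetical extxyz frame (values are illustrative, not from 3BPA).
FRAME = """3
Properties=species:S:1:pos:R:3:forces:R:3 energy=-14.25 config_type=example pbc="F F F"
O 0.000 0.000 0.000 0.10 0.00 0.00
H 0.960 0.000 0.000 -0.06 0.01 0.00
H -0.240 0.930 0.000 -0.04 -0.01 0.00
"""


def parse_extxyz_frame(text):
    """Parse one extxyz frame into (info, arrays); a simplified sketch."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])

    # The comment line holds key=value pairs; shlex keeps quoted values together.
    info = dict(token.split("=", 1) for token in shlex.split(lines[1]))

    # "Properties" lists the per-atom columns as name:type:width triples.
    fields = info.pop("Properties").split(":")
    columns = [(fields[i], fields[i + 1], int(fields[i + 2]))
               for i in range(0, len(fields), 3)]

    arrays = {name: [] for name, _, _ in columns}
    for line in lines[2:2 + n_atoms]:
        values = line.split()
        start = 0
        for name, kind, width in columns:
            chunk = values[start:start + width]
            start += width
            if kind == "R":  # real-valued columns
                chunk = [float(v) for v in chunk]
            arrays[name].append(chunk[0] if width == 1 else chunk)
    return info, arrays


info, arrays = parse_extxyz_frame(FRAME)
energy_ev = float(info["energy"])   # total energy in eV
forces = arrays["forces"]           # one [fx, fy, fz] list per atom, in eV/Å
print(energy_ev, len(forces), arrays["species"])
```

The `config_type=example` entry ends up in `info`, which mirrors how per-configuration metadata is stored in the `info` dict of an ASE `Atoms` object.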

In [None]:
!git clone https://github.com/davkovacs/BOTNet-datasets.git

Cloning into 'BOTNet-datasets'...
remote: Enumerating objects: 57, done.
remote: Counting objects: 100% (57/57), done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 57 (delta 13), reused 37 (delta 7), pack-reused 0 (from 0)
Receiving objects: 100% (57/57), 28.73 MiB | 17.47 MiB/s, done.
Resolving deltas: 100% (13/13), done.


In [None]:
!ls BOTNet-datasets/dataset_3BPA

iso_atoms.xyz  test_1200K.xyz  test_600K.xyz  train_300K.xyz
README.md      test_300K.xyz   test_dih.xyz   train_mixedT.xyz


## Training

To train a MACE model you can specify the training file with the `--train_file` flag. The validation set can either be specified as a separate file using the `--valid_file` keyword, or it can be specified as a fraction of the training set using the `--valid_fraction` keyword. It is also possible to provide a test set that only gets evaluated at the end of the training using the `--test_file` keyword. If you want to compute the RMSE for different parts of the training set separately, specify the `config_type` keyword in the `info` dict of the configurations.
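To make the effect of `--valid_fraction` concrete, here is a minimal sketch of a seeded random split. The helper `random_split` is hypothetical and is not MACE's own implementation; it only illustrates that a fraction of 0.05 applied to 500 configurations leaves 475 for training and 25 for validation.

```python
import random


def random_split(items, valid_fraction, seed):
    """Split items into (train, valid) with a reproducible shuffle (illustrative)."""
    n_valid = int(len(items) * valid_fraction)
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    valid_idx = sorted(indices[:n_valid])
    train_idx = sorted(indices[n_valid:])
    return [items[i] for i in train_idx], [items[i] for i in valid_idx]


configs = [f"config_{i}" for i in range(500)]  # stand-ins for 500 structures
train, valid = random_split(configs, valid_fraction=0.05, seed=123)
print(len(train), len(valid))  # 475 25
```

Fixing the seed makes the split reproducible, so the same configurations are held out each time the sketch is run.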

When the data files are parsed, the energies are read using the key `energy` and the forces using the key `forces`. To use different keys, specify them with the `--energy_key` and `--forces_key` flags.
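The sketch below shows, in plain Python, how configurable label keys and per-`config_type` errors fit together: reference forces are looked up under a chosen key name, and the force RMSE is accumulated separately for each `config_type`. The data, the `dft_forces` key, and the helper `force_rmse_by_config_type` are invented for illustration; MACE builds its own error tables internally.

```python
import math
from collections import defaultdict


def force_rmse_by_config_type(configs, predictions, forces_key="forces"):
    """Return {config_type: force RMSE} for per-atom force lists (illustrative)."""
    sq_sum = defaultdict(float)
    count = defaultdict(int)
    for config, predicted in zip(configs, predictions):
        group = config["info"].get("config_type", "Default")
        reference = config["arrays"][forces_key]  # labels stored under the chosen key
        for f_ref, f_pred in zip(reference, predicted):
            for a, b in zip(f_ref, f_pred):
                sq_sum[group] += (a - b) ** 2
                count[group] += 1
    return {g: math.sqrt(sq_sum[g] / count[g]) for g in sq_sum}


# Two toy one-atom configurations whose labels use a non-default key name.
configs = [
    {"info": {"config_type": "md_300K"}, "arrays": {"dft_forces": [[0.0, 0.0, 0.3]]}},
    {"info": {"config_type": "dimer"}, "arrays": {"dft_forces": [[0.0, 0.0, -0.6]]}},
]
predictions = [[[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]

rmse = force_rmse_by_config_type(configs, predictions, forces_key="dft_forces")
print(rmse)
```

With the toy data, the `dimer` group has twice the force RMSE of the `md_300K` group, which is the kind of difference a single pooled RMSE would hide.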

For illustration, we create a very small model with 32 invariant messages, specified by `--hidden_irreps='32x0e'`.
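The `hidden_irreps` string follows the e3nn notation `multiplicity x degree parity`, so `Nx0e` means N channels of even scalars (l = 0). The sketch below parses such strings in plain Python and counts the resulting feature dimension; the helper names are made up and this is not e3nn's own parser.

```python
def parse_irreps(spec):
    """Parse 'MULxLp + ...' into a list of (multiplicity, l, parity) tuples."""
    irreps = []
    for term in spec.split("+"):
        mul, irrep = term.strip().split("x")
        irreps.append((int(mul), int(irrep[:-1]), irrep[-1]))
    return irreps


def feature_dim(spec):
    """Each irrep of degree l has 2l + 1 components per channel."""
    return sum(mul * (2 * l + 1) for mul, l, _ in parse_irreps(spec))


for spec in ["16x0e", "32x0e", "32x0e + 32x1o"]:
    print(spec, parse_irreps(spec), feature_dim(spec))
```

A string such as `32x0e + 32x1o` would add 32 odd vector (l = 1) channels to the scalars, giving 32 + 96 = 128 components per atom instead of 32.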

In [None]:
!python3 ./mace/scripts/run_train.py \
  --name="MACE_model" \
  --train_file="BOTNet-datasets/dataset_3BPA/train_300K.xyz" \
  --valid_fraction=0.05 \
  --forces_key="forces" \
  --energy_key="energy" \
  --test_file="BOTNet-datasets/dataset_3BPA/test_300K.xyz" \
  --E0s='{1:-13.663181292231226, 6:-1029.2809654211628, 7:-1484.1187695035828, 8:-2042.0330099956639}' \
  --model="ScaleShiftMACE" \
  --hidden_irreps='32x0e' \
  --r_max=4.0 \
  --batch_size=20 \
  --max_num_epochs=100 \
  --ema \
  --ema_decay=0.99 \
  --amsgrad \
  --default_dtype="float32" \
  --device=cpu \
  --seed=123 \
  --swa

  _Jd, _W3j_flat, _W3j_indices = torch.load(os.path.join(os.path.dirname(__file__), 'constants.pt'))
2024-11-12 14:36:07.976 INFO: MACE version: 0.3.7
2024-11-12 14:36:07.976 INFO: Using CPU
2024-11-12 14:36:08.071 INFO: Using heads: ['default']
2024-11-12 14:36:08.750 INFO: Training set [500 configs, 500 energy, 40500 forces] loaded from 'BOTNet-datasets/dataset_3BPA/train_300K.xyz'
2024-11-12 14:36:08.751 INFO: Using random 5% of training set for validation with indices saved in: ./valid_indices_123.txt
2024-11-12 14:36:08.751 INFO: Validaton set contains 25 configurations [25 energy, 2025 forces]
2024-11-12 14:36:10.340 INFO: Test set (1669 configs) loaded from 'BOTNet-datasets/dataset_3BPA/test_300K.xyz':
2024-11-12 14:36:10.341 INFO: Default_Default: 1669 configs, 1669 energy, 135189 forces
2024-11-12 14:36:10.341 INFO: Total number of configurations: train=475, valid=25, tests=[Default_Default: 1669],
2024-11-12 14:36:10.345 INFO: Atomic Numbers used: [1, 6, 7, 8]
2024-11-12 14:3

Alternatively, use `--model=MACE` to obtain the correct energy limit for isolated atoms. This is recommended for tasks that study bond-breaking events.
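One simplified way to see why the choice of model matters for isolated atoms: assume the total energy is the sum of the isolated-atom energies `E0` plus per-atom interaction energies, and that `ScaleShiftMACE` applies a scale and a shift to each interaction energy. If the interaction energy vanishes for an atom with no neighbours, a nonzero shift still leaves a residual. The sketch below is a toy model under these assumptions, with made-up scale and shift values; it is not MACE's implementation.

```python
# Isolated-atom energy of hydrogen (eV), taken from the --E0s argument above.
E0_H = -13.663181292231226


def total_energy(e0s, interaction_energies, scale=1.0, shift=0.0):
    """Toy energy model: sum of E0s plus scaled-and-shifted interaction terms."""
    return sum(e0s) + sum(scale * e + shift for e in interaction_energies)


# An isolated atom has no neighbours, so its interaction energy is assumed to be 0.
plain = total_energy([E0_H], [0.0])                           # no shift
shifted = total_energy([E0_H], [0.0], scale=0.8, shift=-0.2)  # hypothetical values

print(plain - E0_H)    # 0.0: the isolated-atom limit is recovered
print(shifted - E0_H)  # about -0.2: the shift survives at dissociation
```

In this toy picture, every atom that separates from the molecule carries the shift with it, which is why an unshifted model is the safer choice when bonds break.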

## Run

The trained model is readily usable, for example to run molecular dynamics (MD) with ASE. Colab hardware is not very fast, so we run only a small number of timesteps for illustration.

In [None]:
from ase import units
from ase.io import read
from ase.md.langevin import Langevin

from mace.calculators import MACECalculator

# Load the trained model as an ASE calculator.
calculator = MACECalculator(model_paths='/content/checkpoints/MACE_model_run-123.model', device='cpu')

# Use the first test configuration as the starting structure.
init_conf = read('BOTNet-datasets/dataset_3BPA/test_300K.xyz', '0')
init_conf.calc = calculator  # replaces the deprecated set_calculator()

# Langevin dynamics at 310 K with a 0.5 fs timestep.
dyn = Langevin(init_conf, 0.5*units.fs, temperature_K=310, friction=5e-3)

def write_frame():
    # Append the current frame to the trajectory file.
    dyn.atoms.write('md_3bpa.xyz', append=True)

dyn.attach(write_frame, interval=50)  # write a frame every 50 steps
dyn.run(100)
print("MD finished!")

  _Jd, _W3j_flat, _W3j_indices = torch.load(os.path.join(os.path.dirname(__file__), 'constants.pt'))
  torch.load(f=model_path, map_location=device)


No dtype selected, switching to float32 to match model dtype.


KeyboardInterrupt: 
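
As a quick sanity check on the run above, the arithmetic below works out the simulated time and the steps at which a frame would be written. It assumes that an attached observer fires at step 0 and then every `interval` steps, which is an assumption about ASE's behaviour rather than something shown in this tutorial; the helper `frame_steps` is hypothetical.

```python
TIMESTEP_FS = 0.5   # matches 0.5 * units.fs in the Langevin call
N_STEPS = 100       # matches dyn.run(100)
INTERVAL = 50       # matches dyn.attach(write_frame, interval=50)


def frame_steps(n_steps, interval):
    """Steps at which a frame is written, assuming a call at step 0 and every interval."""
    return [step for step in range(n_steps + 1) if step % interval == 0]


simulated_fs = N_STEPS * TIMESTEP_FS
print(simulated_fs)                      # 50.0 fs of dynamics
print(frame_steps(N_STEPS, INTERVAL))    # [0, 50, 100]
```

Because `write_frame` appends to `md_3bpa.xyz`, rerunning the cell adds frames to the existing file rather than replacing it.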