# Machine-learned interatomic potential

Running simulations with energies and forces from *ab initio* methods, such as density functional theory (DFT), gives higher accuracy than classical force fields and makes a wider range of systems accessible. For example, *ab initio* MD can be used to study chemical reactions on an atomistic scale. With recent advances in machine-learned interatomic potentials (MLIPs), it is possible to scale up these simulations to large systems ([Kozinsky et al.](https://doi.org/10.1145/3581784.3627041)) and long simulation times ([Zills et al.](https://doi.org/10.1039/d4fd00025k)) while retaining the accuracy of the underlying *ab initio* methods.

In this tutorial you will use pre-trained MLIPs to run simulations within ESPResSo. There are many different MLIP architectures with corresponding open-source Python packages:
- MACE https://github.com/ACEsuit/mace
- NequIP / Allegro https://github.com/mir-group/nequip
- Apax https://github.com/apax-hub/apax
- DeepMD https://github.com/deepmodeling/deepmd-kit
- TorchANI https://github.com/aiqm/torchani

In this tutorial, we will focus on the [MACE-MP-0](https://doi.org/10.48550/arXiv.2401.00096) model, which has been trained on the [Materials Project dataset](https://doi.org/10.1063/1.4812323) and is able to accurately predict energies and forces for many organic and inorganic systems.
The MACE-MP-0 model uses the state-of-the-art MACE equivariant message passing model architecture.

Most MLIPs provide a Python interface through the ASE package. ESPResSo also uses this interface, so we start with a quick overview of the relevant ASE features. The main object is `ase.Atoms`, which represents a single frame of an atomistic simulation.
We will attach the MACE-MP-0 `Calculator` to the `ase.Atoms` object for the computation of energies (`atoms.get_potential_energy() -> float`) and forces (`atoms.get_forces() -> np.ndarray`). More information can be found in the ASE documentation https://wiki.fysik.dtu.dk/ase/ase/atoms.html.

In [None]:
# ESPResSo imports
import espressomd
from espressomd.plugins.ase import ASEInterface
from espressomd.zn import Visualizer

# MACE-MP-0
from mace.calculators import mace_mp

# Simulation box
from rdkit2ase import smiles2atoms, pack

# Miscellaneous
import numpy as np
import pandas as pd
import pint
import plotly.express as px
import tqdm

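MLIPs typically work with energies in eV, distances in Angstrom and masses in atomic mass units (u), which is also the unit system of ASE. The time unit consistent with these choices is Angstrom · sqrt(u / eV) ≈ 10.18 fs, so time steps given in fs have to be converted before they are passed to ESPResSo. We will use pint to handle these conversions.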

In [None]:
ureg = pint.UnitRegistry()
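
The time unit that goes with eV, Angstrom and u follows from energy = mass · length² / time². As a cross-check that does not depend on pint, the sketch below computes it from CODATA 2018 constants typed in by hand. The constant and function names are illustrative and not part of ESPResSo, ASE or pint.

```python
import math

# CODATA 2018 values, entered by hand for this sketch
ATOMIC_MASS_UNIT_KG = 1.66053906660e-27
ELECTRON_VOLT_J = 1.602176634e-19
ANGSTROM_M = 1.0e-10

# energy = mass * length**2 / time**2  =>  time = length * sqrt(mass / energy)
time_unit_s = ANGSTROM_M * math.sqrt(ATOMIC_MASS_UNIT_KG / ELECTRON_VOLT_J)
time_unit_fs = time_unit_s / 1.0e-15


def fs_to_internal(t_fs):
    """Convert a time in fs to the internal unit Angstrom * sqrt(u / eV)."""
    return t_fs / time_unit_fs


print(f"1 internal time unit = {time_unit_fs:.4f} fs")
print(f"0.5 fs = {fs_to_internal(0.5):.5f} internal units")
```

The second number should agree with what pint returns for the same conversion, which makes this a quick way to catch a wrong unit expression.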

In this tutorial we will use [Simplified Molecular Input Line Entry System](https://doi.org/10.1021/ci00057a005) (SMILES) representations and [RDKit](https://github.com/rdkit/rdkit) to generate starting structures.
RDKit provides us with powerful utilities to create 3D structures from these string representations of molecules.

![ethanol structure](https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Ethanol_Lewis.svg/320px-Ethanol_Lewis.svg.png)

Looking at the structure of ethanol, the corresponding SMILES string is obtained by removing all the hydrogens and writing down the remaining atomic symbols in order: `CCO`.
Note that SMILES strings can also represent, for example, cyclic structures, double bonds and other chemically relevant features, none of which are needed for ethanol.
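
To make the "removed hydrogens" concrete, the following toy sketch restores the implicit hydrogen count for the simplest possible case. It is not how RDKit parses SMILES: it assumes an unbranched chain of single-bonded, one-letter atoms, and the `VALENCE` table and function name are made up for this illustration.

```python
# Typical valences for a few elements of the SMILES "organic subset"
VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(smiles):
    """Count implicit hydrogens per atom for an unbranched, single-bonded chain.

    Only handles strings such as "CCO": one-letter atoms from VALENCE, no
    branches, rings, charges or multiple bonds.
    """
    counts = []
    for i, symbol in enumerate(smiles):
        # neighbours in the chain: one on each side, except at the two ends
        neighbours = (i > 0) + (i < len(smiles) - 1)
        counts.append(VALENCE[symbol] - neighbours)
    return counts


print(implicit_hydrogens("CCO"))  # hydrogens on C, C and O of ethanol
print(implicit_hydrogens("O"))    # water
```

Summing the counts for `CCO` gives the six hydrogens of ethanol, C2H6O.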


In this first part, we will look at a gas-phase system of ethanol and run a geometry optimization followed by a short MD simulation.
Let us first create our ASE atoms object to define the system we want to investigate.

In [None]:
ethanol_atoms = smiles2atoms("CCO")
ethanol_atoms

We can now initialize the MACE-MP-0 calculator and attach it to the `ase.Atoms` object.

In [None]:
mace_mp_calc = mace_mp()
ethanol_atoms.calc = mace_mp_calc

We can now use the ASE interface to compute energies and forces.

In [None]:
print(ethanol_atoms.get_potential_energy() * ureg.eV)
print(ethanol_atoms.get_forces() * ureg.eV / ureg.angstrom)

Now that we know how to compute the properties of interest, it is time to plug the calculator into ESPResSo. We will create a box of arbitrary size, set a time step of 0.5 fs to resolve the motion of the hydrogen atoms, and run a geometry optimization. A good starting structure remains important for MLIPs: many models contain no Lennard-Jones or other explicit short-range repulsion and rely purely on the output of the neural network, so overlapping atoms can lead to entirely non-physical structures and make the simulation "explode".
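
One simple way to guard against overlapping atoms is to look at the smallest interatomic distance before starting. The sketch below does this with NumPy for a handful of made-up coordinates. The function name and the 0.5 Angstrom threshold are illustrative assumptions, and periodic boundaries are ignored.

```python
import numpy as np


def min_pair_distance(positions):
    """Smallest distance between any two atoms (open boundaries)."""
    positions = np.asarray(positions, dtype=float)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # ignore the zero distance of each atom to itself
    np.fill_diagonal(dist, np.inf)
    return dist.min()


# made-up coordinates in Angstrom: the last two atoms nearly overlap
positions = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.6, 0.0, 0.0]]
d_min = min_pair_distance(positions)
if d_min < 0.5:  # threshold chosen for illustration only
    print(f"warning: atoms as close as {d_min:.2f} Angstrom")
```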

In [None]:
system = espressomd.System(box_l=[16] * 3) # A cubic box of 16 Angstroms
system.time_step = (
    (0.5 * ureg.fs)
    .m_as(((1 * ureg.u * ureg.angstrom**2) / ureg.electron_volt) ** 0.5)
) # we convert the time units from fs to be in line with our other units for mass, distance and energy.
system.cell_system.skin = 0.4

We add the ASE structure to the ESPResSo system particle by particle. The `ASEInterface` then converts the ESPResSo system back into an `ase.Atoms` object, so that we can use the calculator to compute forces.
Because `ase.Atoms` is built around atomic numbers, we use them as ESPResSo particle types and map each type onto itself.

In [None]:
for atom in ethanol_atoms:
    system.part.add(pos=atom.position, type=atom.number, mass=atom.mass)
system.ase = ASEInterface(type_mapping={x: x for x in set(ethanol_atoms.numbers)})
system.ase.get()
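
The identity mapping used above can be inspected without ESPResSo. The sketch below builds it for a hand-written list of ethanol's atomic numbers. The atom order and the `symbols` lookup are illustrative, and reading the dictionary as "particle type to atomic species" is our understanding of the interface rather than something shown in this notebook.

```python
# atomic numbers of ethanol (C2H6O); the atom order is illustrative
numbers = [6, 6, 8, 1, 1, 1, 1, 1, 1]
symbols = {1: "H", 6: "C", 8: "O"}

# ESPResSo particle type -> atomic number; here both are the same integer
type_mapping = {z: z for z in set(numbers)}
for particle_type, atomic_number in sorted(type_mapping.items()):
    print(f"type {particle_type} -> Z = {atomic_number} ({symbols[atomic_number]})")
```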

We will use the [ZnDraw](https://github.com/zincware/ZnDraw) visualizer integrated with ESPResSo to follow the simulation. To see the effect of the geometry optimization, we display the atomic forces. You can either follow the simulation inside this notebook or open the app in a dedicated window to the side, following the printed URL.

In [None]:
vis = Visualizer(system)
vis.zndraw.config.scene.vectors = "forces"
vis.zndraw.config.scene.vector_scale = 5
vis.zndraw.config.scene.simulation_box = False
vis

In [None]:
system.integrator.set_steepest_descent(f_max=0.1, gamma=4, max_displacement=0.001)
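
To build intuition for the three parameters, here is a toy steepest-descent loop in NumPy. It follows our reading of the ESPResSo documentation: each coordinate moves by `gamma` times the force, capped at `max_displacement`, until the largest force drops below `f_max`. It is a sketch of the idea, not ESPResSo's implementation; the harmonic test force, the per-component convergence check and the parameter values are assumptions chosen for the example.

```python
import numpy as np


def steepest_descent(pos, force_fn, f_max, gamma, max_displacement, max_steps=1000):
    """Move along the force, capping each coordinate's step, until forces are small."""
    pos = np.array(pos, dtype=float)
    for step in range(max_steps):
        forces = force_fn(pos)
        # convergence check on the largest force component (a simplification)
        if np.abs(forces).max() < f_max:
            break
        displacement = np.clip(gamma * forces, -max_displacement, max_displacement)
        pos += displacement
    return pos, step


# toy force: harmonic springs pulling every coordinate towards zero (k = 1)
def harmonic(pos):
    return -pos


final_pos, n_steps = steepest_descent(
    [1.0, -0.3, 0.05], harmonic, f_max=0.01, gamma=0.5, max_displacement=0.1
)
print(final_pos, n_steps)
```

The cap keeps the first steps small while forces are large, which is the property that matters for poorly relaxed starting structures.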

In [None]:
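tbar = tqdm.trange(10, ncols=120)
for idx in tbar:
    # forces from MACE-MP-0 for the current configuration, shape (n_atoms, 3)
    forces = mace_mp_calc.get_forces(system.ase.get())
    system.part.all().ext_force = forces
    system.integrator.run(1)
    vis.zndraw.append(system.ase.get())
    # largest per-atom force norm (norm over the x, y, z components)
    tbar.set_description(f"fmax: {np.linalg.norm(forces, axis=1).max():.3f}")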
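
The maximum force reported during a geometry optimization is usually the largest per-atom force norm. For an array of shape `(n_atoms, 3)` this means taking the norm along `axis=1`; `axis=0` would instead combine all atoms into one value per Cartesian component. The sketch below shows the difference on made-up forces.

```python
import numpy as np

# made-up forces in eV/Angstrom for three atoms, shape (n_atoms, 3)
forces = np.array([
    [0.3, 0.4, 0.0],
    [0.0, 0.0, 0.1],
    [0.0, -0.2, 0.0],
])

per_atom = np.linalg.norm(forces, axis=1)       # one norm per atom
per_component = np.linalg.norm(forces, axis=0)  # one norm per x, y, z column

print("fmax:", per_atom.max())
print("per-component norms:", per_component)
```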

In this way, we can obtain energy-minimized structures and their energies using an MLIP within ESPResSo. You can investigate different molecules by altering the SMILES string.

# MD Simulations of Sulfuric Acid and Water

What makes MLIPs attractive, beyond their near-*ab initio* accuracy for energies and forces, is their ability to describe chemical reactions. Observing a chemical reaction in an MD simulation requires overcoming the reaction barrier. Methods that aid this process are plentiful but beyond the scope of this tutorial. Therefore, we will look at the proton transfer from sulfuric acid to water, which can be observed without biasing the simulation.

In [None]:
# clear the system for a fresh start
system.part.clear()
system.thermostat.turn_off()

In this part of the tutorial, we generate two molecular species, water and sulfuric acid, and use [Packmol](https://github.com/m3g/packmol) to create a suitable starting structure. We will use the Python interface to Packmol provided by rdkit2ase. This package was initially designed for use within [IPSuite](https://github.com/zincware/IPSuite), which provides many tools for MLIP training, evaluation and deployment.

In [None]:
water = smiles2atoms("O")
sulfuric_acid = smiles2atoms("OS(=O)(=O)O")
box_of_atoms = pack([[water], [sulfuric_acid]], [100, 1], density=1000)
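
The box size that results from packing follows from the total mass and the target density. The sketch below estimates it for 100 water molecules and one sulfuric acid molecule, assuming that `density=1000` is given in kg/m³ and that the box is cubic. Both are assumptions about `pack`, and the molar masses are rounded standard values typed in by hand.

```python
# Molar masses in g/mol (numerically equal to molecular masses in u), rounded
M_WATER = 18.015
M_SULFURIC_ACID = 98.079
ATOMIC_MASS_UNIT_KG = 1.66053906660e-27


def cubic_box_length(total_mass_u, density_kg_m3):
    """Edge length in Angstrom of a cube holding the given mass at the given density."""
    volume_m3 = total_mass_u * ATOMIC_MASS_UNIT_KG / density_kg_m3
    return volume_m3 ** (1.0 / 3.0) * 1.0e10


total_mass = 100 * M_WATER + 1 * M_SULFURIC_ACID
box_length = cubic_box_length(total_mass, density_kg_m3=1000.0)
print(f"{box_length:.2f} Angstrom")
```

Comparing this estimate with `box_of_atoms.get_cell()` is a quick sanity check on the packed structure.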

We will repeat the geometry optimization. Now that we have a bulk structure, we need to set the correct cell vectors, which we take from the ASE structure we just created.

In [None]:
system.box_l = box_of_atoms.get_cell().diagonal()
system.time_step = (
    (0.25 * ureg.fs)
    .to(((1 * ureg.u * ureg.angstrom**2) / ureg.electron_volt) ** 0.5)
    .magnitude
)
system.cell_system.skin = 0.4

In [None]:
system.ase = ASEInterface(type_mapping={x: x for x in set(box_of_atoms.numbers)})
for atom in box_of_atoms:
    system.part.add(pos=atom.position, type=atom.number, mass=atom.mass)
system.integrator.set_steepest_descent(f_max=0.1, gamma=4, max_displacement=0.001)

We will reset ZnDraw and enable showing the box this time.

In [None]:
del vis.zndraw[:]
vis.zndraw.config.scene.simulation_box = True
vis.zndraw.config.scene.vector_scale = 1
vis

In [None]:
vis.zndraw.config.scene.vector_scale = 0

In [None]:
tbar = tqdm.trange(50, ncols=120)
for _ in tbar:
    atoms = system.ase.get()
    atoms.calc = mace_mp_calc
    system.part.all().ext_force = atoms.get_forces()
    vis.zndraw.append(atoms)
    system.integrator.run(1)
    # largest per-atom force norm (norm over the x, y, z components)
    tbar.set_description(
        f"fmax: {np.linalg.norm(atoms.get_forces(), axis=1).max():.3f}"
    )

With this minimized structure, we can now run an MD simulation.
Let us highlight the two hydrogen atoms of the sulfuric acid. These are the particles we want to follow.

In [None]:
vis.zndraw.selection = [305, 306]
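
The indices 305 and 306 can be reconstructed from the way the box was built. The sketch below assumes that the 100 water molecules come first with three atoms each, that the heavy atoms of sulfuric acid follow in the order of the SMILES string `OS(=O)(=O)O`, and that its two hydrogens are appended after the heavy atoms. These ordering assumptions are consistent with the indices used in this tutorial but are not guaranteed for other inputs.

```python
n_water = 100
atoms_per_water = 3  # O, H, H

# heavy atoms of sulfuric acid in the order of the SMILES string "OS(=O)(=O)O"
acid_heavy_atoms = ["O", "S", "O", "O", "O"]
acid_offset = n_water * atoms_per_water  # index of the first acid atom

# assumption: hydrogens follow the heavy atoms, one on the first and one on the last oxygen
hydroxyl_oxygens = [acid_offset + 0, acid_offset + 4]
hydrogens = [acid_offset + len(acid_heavy_atoms) + i for i in range(2)]

print("hydrogens:", hydrogens)
print("O-H pairs:", list(zip(hydroxyl_oxygens, hydrogens)))
```

For a different composition, it is safer to look the indices up from the chemical symbols of the packed structure than to rely on this arithmetic.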

In [None]:
# Langevin dynamics at 400 K
system.integrator.set_vv()
system.thermostat.set_langevin(kT=(400 * ureg.K * ureg.boltzmann_constant).m_as("eV"), gamma=2, seed=42)
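
The thermostat expects the thermal energy kT in the energy unit of the simulation, here eV. As a pint-free cross-check, the sketch below multiplies the temperature by the Boltzmann constant in eV/K (CODATA 2018 value, typed in by hand); the function name is illustrative.

```python
BOLTZMANN_EV_PER_K = 8.617333262e-5  # CODATA 2018


def thermal_energy_ev(temperature_k):
    """Thermal energy kT in eV for a temperature in kelvin."""
    return BOLTZMANN_EV_PER_K * temperature_k


kT = thermal_energy_ev(400.0)
print(f"kT at 400 K = {kT:.5f} eV")
```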

In [None]:
tbar = tqdm.trange(500, ncols=120)
for idx in tbar:
    atoms = system.ase.get()
    atoms.calc = mace_mp_calc
    system.part.all().ext_force = atoms.get_forces()
    if idx % 5 == 0:
        vis.zndraw.append(atoms)
    system.integrator.run(1)
    tbar.set_description(f"e_pot: {atoms.get_potential_energy():.3f} eV")

We have visually observed the reaction of the hydrogen atoms. Let us now plot the distance between each of these hydrogens and the oxygen atom of the sulfuric acid it was originally bonded to.

In [None]:
OH_1 = []
OH_2 = []
energies = []
for atoms in vis.zndraw:
    OH_1.append(atoms.get_distance(300, 305, mic=True))
    OH_2.append(atoms.get_distance(304, 306, mic=True))
    energies.append(atoms.get_potential_energy())

df = pd.DataFrame({"OH_1": OH_1, "OH_2": OH_2})
fig = px.line(df, y=["OH_1", "OH_2"])
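
The `mic=True` argument asks ASE for the minimum-image distance, so that a bond crossing a periodic boundary is not reported as spanning the whole box. The sketch below shows the idea for an orthorhombic box with NumPy. It is an illustration with made-up coordinates, not ASE's implementation, and the function name is hypothetical.

```python
import numpy as np


def mic_distance(pos_a, pos_b, box_l):
    """Minimum-image distance in an orthorhombic periodic box."""
    box_l = np.asarray(box_l, dtype=float)
    delta = np.asarray(pos_b, dtype=float) - np.asarray(pos_a, dtype=float)
    # shift each component by whole box lengths into [-L/2, L/2]
    delta -= box_l * np.round(delta / box_l)
    return np.linalg.norm(delta)


box = [10.0, 10.0, 10.0]
a = [0.5, 5.0, 5.0]
b = [9.5, 5.0, 5.0]
print(mic_distance(a, b, box))                    # across the boundary
print(np.linalg.norm(np.array(b) - np.array(a)))  # ignoring periodicity
```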

We can attach the figure to the visualizer, such that we can see the trajectory and the distance side-by-side.

In [None]:
vis.zndraw.figures = {"distance": fig.to_json()}

In [None]:
fig

In [None]:
df = pd.DataFrame({"Energies": energies})
fig = px.line(df, y="Energies")
fig

# Conclusion

In this tutorial, you have set up simulations from simple SMILES representations and run geometry optimizations and MD simulations with (almost) DFT accuracy at a much lower computational cost, using machine-learned interatomic potentials (MLIPs). To use an MLIP, you did not have to define bonds or dihedrals. You have seen the consequence of this in the second part of the tutorial, where you observed bond breaking and bond formation in an MD simulation.

MLIPs can be used to study systems with near-*ab initio* accuracy or to investigate phenomena on an atomistic scale that are impossible to study with classical force fields, such as chemical reactions.

## References

- Batatia, I.; Benner, P.; Chiang, Y.; Elena, A. M.; Kovács, D. P.; Riebesell, J.; Advincula, X. R.; Asta, M.; Baldwin, W. J.; Bernstein, N.; Bhowmik, A.; Blau, S. M.; Cărare, V.; Darby, J. P.; De, S.; Della Pia, F.; Deringer, V. L.; Elijošius, R.; El-Machachi, Z.; Fako, E.; Ferrari, A. C.; Genreith-Schriever, A.; George, J.; Goodall, R. E. A.; Grey, C. P.; Han, S.; Handley, W.; Heenen, H. H.; Hermansson, K.; Holm, C.; Jaafar, J.; Hofmann, S.; Jakob, K. S.; Jung, H.; Kapil, V.; Kaplan, A. D.; Karimitari, N.; Kroupa, N.; Kullgren, J.; Kuner, M. C.; Kuryla, D.; Liepuoniute, G.; Margraf, J. T.; Magdău, I.-B.; Michaelides, A.; Moore, J. H.; Naik, A. A.; Niblett, S. P.; Norwood, S. W.; O’Neill, N.; Ortner, C.; Persson, K. A.; Reuter, K.; Rosen, A. S.; Schaaf, L. L.; Schran, C.; Sivonxay, E.; Stenczel, T. K.; Svahn, V.; Sutton, C.; van der Oord, C.; Varga-Umbrich, E.; Vegge, T.; Vondrák, M.; Wang, Y.; Witt, W. C.; Zills, F.; Csányi, G. A Foundation Model for Atomistic Materials Chemistry. arXiv December 29, 2023. https://doi.org/10.48550/arXiv.2401.00096.
- Batatia, I.; Kovacs, D. P.; Simm, G.; Ortner, C.; Csanyi, G. MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields. In Advances in neural information processing systems; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc., 2022; Vol. 35, pp 11423–11436. https://openreview.net/forum?id=YPpSngE-ZU.
- Landrum, G.; Tosco, P.; Kelley, B.; Ric; Cosgrove, D.; sriniker; gedeck; Vianello, R.; NadineSchneider; Kawashima, E.; N, D.; Jones, G.; Dalke, A.; Cole, B.; Swain, M.; Turk, S.; AlexanderSavelyev; Vaucher, A.; Wójcikowski, M.; Take, I.; Probst, D.; Ujihara, K.; Scalfani, V. F.; godin,  guillaume; Lehtivarjo, J.; Pahl, A.; Walker, R.; Berenger, F.; jasondbiggs; strets123. Rdkit/Rdkit: 2023_03_2 (Q1 2023) Release, 2023. https://doi.org/10.5281/zenodo.8053810.
- Martínez, L.; Andrade, R.; Birgin, E. G.; Martínez, J. M. PACKMOL: A Package for Building Initial Configurations for Molecular Dynamics Simulations. J Comput Chem 2009, 30 (13), 2157–2164. https://doi.org/10.1002/jcc.21224.
- Zills, F.; Schäfer, M. R.; Segreto, N.; Kästner, J.; Holm, C.; Tovey, S. Collaboration on Machine-Learned Potentials with IPSuite: A Modular Framework for Learning-on-the-Fly. J. Phys. Chem. B 2024. https://doi.org/10.1021/acs.jpcb.3c07187.