# Machine Learning Potentials

You can run this notebook in your browser: 

[![Open On Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openmm/openmm_workshops/blob/main/section_3/machine_learning_potentials.ipynb)

## Table of contents
- Introduction
- OpenMM ML software
- Basics of OpenMM compatible MLPs
- Installing
- Exporting a PyTorch model for use in OpenMM
- Simulation of alanine dipeptide with ANI-2x using OpenMM-Torch
    - Prepare test system
    - Create a NNP
    - Adding the NNP to an OpenMM simulation
- Mixed MM/ML system
    - Creating the system
    - Create the MLP
- Using OpenMM-ML package
- Using NNPOps
- Implementing other models - MACE
- Extra exercises
- References
- Solutions



## Introduction
<a id="intro"></a>

Machine Learning Potentials (MLPs) are a relatively new method in which the potential energy surface of an atomic system is described by a machine learning model. The model could be a feed-forward neural network (ANI)[3], Gaussian process regression (GAP)[4], a graph neural network (MACE)[5], an equivariant transformer (TorchMD-NET)[6], or something else. MLPs are typically trained on data from first-principles quantum mechanical (QM) methods such as DFT, and they can be used in atomistic MD simulations in place of a classical force field, approaching the accuracy of the QM reference method at a much lower computational cost.

## OpenMM Machine Learning Software
<a id="openmmmlsoftware"></a>

OpenMM is an MD engine, so we will focus on how to run simulations using MLPs, and not on how to train them.

OpenMM has several packages that support the use of MLPs.
- [openmm-torch](https://github.com/openmm/openmm-torch). This is the OpenMM PyTorch plugin that allows [PyTorch](https://pytorch.org/) static computation graphs to be used for defining an OpenMM TorchForce object, i.e., an [OpenMM Force class](http://docs.openmm.org/latest/api-python/library.html#forces) that computes a contribution to the total potential energy.
- [openmm-ml](https://github.com/openmm/openmm-ml). This is a high level API for using machine learning models in OpenMM simulations. With just a few lines of code, you can set up a simulation that uses a standard, pretrained model to represent some or all of the interactions in a system.
- [NNPOps](https://github.com/openmm/NNPOps). This is a library of optimized operations commonly used in popular neural network potentials that can be used to speed up your MLP implementation.

If you were to use your own model, you would first use `openmm-torch` to interface your MLP with OpenMM. You could then try and optimize it using operations from `NNPOps`. When you are ready to deploy it, you can use `openmm-ml` to create an easy-to-use wrapper around it.

## Basics of (OpenMM compatible) MLPs
<a id="basicsofmlps"></a>

An MLP reads a set of atomic coordinates as input and outputs the potential energy. The forces on each particle are computed by backpropagation, taking the negative gradient of the energy with respect to the coordinates ($F=-\nabla V$).
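
To make the relation $F=-\nabla V$ concrete, here is a minimal pure-Python sketch that estimates forces by central finite differences instead of backpropagation. The toy potential `harmonic_energy` and the helper `numerical_forces` are illustrative names, not part of OpenMM or PyTorch, and the units are arbitrary.

```python
from typing import Callable, List

Positions = List[List[float]]  # N rows of [x, y, z]


def harmonic_energy(positions: Positions) -> float:
    """Toy potential: V is the sum of all squared coordinates."""
    return sum(c * c for row in positions for c in row)


def numerical_forces(energy_fn: Callable[[Positions], float],
                     positions: Positions, h: float = 1e-5) -> Positions:
    """Estimate F = -dV/dr for every coordinate with central differences."""
    forces = []
    for i, row in enumerate(positions):
        force_row = []
        for k in range(len(row)):
            # Displace one coordinate at a time by +h and -h
            plus = [list(r) for r in positions]
            minus = [list(r) for r in positions]
            plus[i][k] += h
            minus[i][k] -= h
            dV = energy_fn(plus) - energy_fn(minus)
            force_row.append(-dV / (2.0 * h))
        forces.append(force_row)
    return forces


positions = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]]
forces = numerical_forces(harmonic_energy, positions)

# For V = sum(r^2) the analytic force on each coordinate is -2 r
for row, force_row in zip(positions, forces):
    print([round(f, 6) for f in force_row], "analytic:", [-2.0 * c for c in row])
```

Finite differences need two energy evaluations per coordinate, which is why backpropagation is preferred in practice, but they remain a handy independent check on a new model.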

To use an MLP in OpenMM, you need to be able to write it as a PyTorch model that can be exported to [TorchScript](https://pytorch.org/docs/stable/jit.html#torchscript). The model takes an $(N,3)$-shaped tensor of particle positions, where $N$ is the number of particles, and outputs the potential energy. By default, `openmm-torch` calculates the forces using [Autograd](https://pytorch.org/docs/stable/autograd.html). Optionally, the model can return the forces itself, in which case `openmm-torch` uses those instead.

## Installing Packages
<a id="installing"></a>

The packages can be installed from conda-forge. Note that there is no Windows package for `openmm-torch` or `NNPOps`.

<div class="alert alert-block alert-info">
  ⚠️ <b>Due to the added complexity of creating a conda environment with PyTorch, we recommend you run this notebook in Colab. It should also work on Linux and macOS, but it will not work on Windows!</b>
</div>

We will install `openmm-torch` and at the same time install `torch-ani`, which we will need later.

In [None]:
# Execute this cell to install mamba in the Colab environment
if 'google.colab' in str(get_ipython()):
    print('Running on colab')
    !pip install -q condacolab
    import condacolab
    condacolab.install_mambaforge()
else:
    print('Not running on colab.')
    print('Make sure you create and activate a new conda environment!')

<div class="alert alert-block alert-info">
  ⚠️ <b>Note: During this step in Colab, the kernel will restart, which may trigger the error message: "Your session crashed for an unknown reason." This is expected behavior and can be safely ignored.</b>
</div>


<div class="alert alert-block alert-info">
⚠️ <b>Note that the installation will take several minutes!</b>
</div>

In [None]:
if 'google.colab' in str(get_ipython()):
    %env CONDA_OVERRIDE_CUDA=11.8
    # You might also need to set this if you are on Linux without CUDA installed!
!mamba install -y -c conda-forge openmm-torch torchani

Download the files needed for this tutorial.

In [None]:
!wget https://raw.githubusercontent.com/openmm/openmm_workshops/main/section_3/alanine-dipeptide.pdb
!wget https://raw.githubusercontent.com/openmm/openmm_workshops/main/section_3/workshop_utils.py

## PyTorch

PyTorch is an open-source machine learning framework that is primarily used for developing and training deep learning models. It provides a dynamic computational graph that allows users to define and modify neural networks on the fly, making it flexible and efficient for building complex models.

Some key features of PyTorch include:

 - Dynamic computational graph: PyTorch uses a tape-based automatic differentiation system, which enables users to define and modify models on-the-fly. This dynamic nature makes it easy to debug and experiment with different model architectures.

 - GPU acceleration: PyTorch leverages the power of graphics processing units (GPUs) to accelerate the training and inference processes. It provides seamless integration with CUDA, a parallel computing platform, allowing for efficient computation on GPUs.

 - Neural network modules: PyTorch provides a rich library of pre-defined modules and functions for building neural networks. These modules include layers, activation functions, loss functions, and optimization algorithms, making it easy to construct and train deep learning models.

 - Community and ecosystem: PyTorch has a vibrant community of developers and researchers who contribute to its development and share their work. This community has created numerous resources, such as tutorials, libraries, and pre-trained models, which can be readily used for various machine learning tasks.

 - Deployment options: PyTorch offers various deployment options, including exporting models for inference in production environments. It provides tools like TorchScript, which allows models to be serialized and executed independently of the Python runtime, enabling deployment on platforms with limited resources.

PyTorch has gained popularity due to its ease of use, flexibility, and extensive support for research and development in the field of deep learning. It is widely used by researchers, academics, and industry professionals for a range of applications, including computer vision, natural language processing, reinforcement learning, and our use case of atomistic force fields.

We will be using pre-trained models, so we do not need to know much about PyTorch. We just need to know that you can create models that take in atomic coordinates and predict the energy. If you want to learn more about PyTorch, there are plenty of tutorials available: https://pytorch.org/tutorials/index.html

## Exporting a PyTorch model for use in OpenMM
<a id="pytorchmodelinopenmm"></a>

We can check that our installation is working by defining a very simple potential --- a harmonic force attracting every particle to the origin.

The first step is to create a PyTorch model defining the calculation. It should take the particle positions in nanometers as a `torch.Tensor` of shape `(N, 3)` as input and return the potential energy in kJ/mol.

In [None]:
import torch

class ForceModule(torch.nn.Module):
    """A simple harmonic potential as a static compute graph."""
    def forward(self, positions: torch.Tensor):
        """The forward method returns the energy computed from positions.

        Parameters
        ----------
        positions : torch.Tensor with shape (N,3)
           positions[i,k] is the position (in nanometers) of spatial dimension k of particle i

        Returns
        -------
        potential : torch.Tensor
           The potential energy (in kJ/mol)
        """
        return torch.sum(positions**2)

The `ForceModule` class inherits from [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html), which is the PyTorch base class for neural network modules. We write the computation in the `forward` method, which is executed every time the model is called.

Now that we have defined a model, we can create an instance of it.

In [None]:
force_module = ForceModule()

We can give it some example input:

In [None]:
# Create random tensor of 4 coordinates
# We specify we want to record the gradient so we can compute the forces
input = torch.rand((4,3), requires_grad=True)
print("Input:", input)
energy = force_module(input)
print("Energy:", energy)

We can calculate the forces resulting from this harmonic potential using Autograd.

In [None]:
# Backward pass computes the gradients
energy.backward()
# We can access the gradients with the grad attribute
# F = - grad (Potential)
forces = -input.grad 
print("Forces:", forces)

To export the model for use in OpenMM (or other software), it must be converted to a TorchScript module (and optionally saved to a file). Converting to TorchScript can usually be done with a single call to `torch.jit.script`. See the [PyTorch documentation](https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html) for details.  

In [None]:
# Convert to TorchScript
scripted_module = torch.jit.script(force_module)
print(scripted_module)

We can then save the module to a file. The saving process serializes the module, allowing us to load it at any time into the C++ API as required by `openmm-torch`'s `TorchForce`. For more information, see the [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.jit.save.html).

In [None]:
# Save the serialized compute graph to a file
scripted_module.save('model.pt')

To use the exported model in a simulation, we need to create a TorchForce object and add it to the OpenMM System.

In [None]:
# Create the TorchForce from the serialized compute graph
from openmmtorch import TorchForce
torch_force = TorchForce('model.pt')

# Create an empty OpenMM system
import openmm as mm
system = mm.System()
print("number of forces = ", system.getNumForces())

# Add the TorchForce to the system
system.addForce(torch_force)
print("number of forces = ", system.getNumForces())

Now that we know how to use a PyTorch model in an OpenMM simulation, we are ready to learn how to create a neural network potential (NNP) that uses ANI-2x.

## Simulation of Alanine Dipeptide with ANI-2x Using OpenMM-Torch
<a id="ani"></a>

ANI-2x is a general NNP that works with molecules containing H, C, N, O, F, Cl, and S. For more information, please refer to the publication [3]. The model is available in the TorchANI package, which we installed earlier.
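
Because ANI-2x only covers the elements listed above, it can be worth checking a system's atomic numbers before building the model. The following is a small sketch of such a check. The helper `unsupported_atomic_numbers` and the two example lists are hypothetical and are not part of TorchANI.

```python
# Atomic numbers of the elements listed above: H, C, N, O, F, S, Cl
ANI2X_ELEMENTS = {1: "H", 6: "C", 7: "N", 8: "O", 9: "F", 16: "S", 17: "Cl"}


def unsupported_atomic_numbers(atomic_numbers):
    """Return the sorted atomic numbers that are outside the ANI-2x element set."""
    return sorted(set(atomic_numbers) - set(ANI2X_ELEMENTS))


# Hypothetical inputs: a C/H/N/O molecule, and one that also contains phosphorus (Z=15)
peptide_like = [1, 6, 6, 8, 7, 1]
with_phosphorus = [1, 6, 8, 15]

print(unsupported_atomic_numbers(peptide_like))
print(unsupported_atomic_numbers(with_phosphorus))
```

An empty list means every atom belongs to a supported element. Any other result lists the atomic numbers for which the model has no parameters.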

### Prepare a Test System
<a id="prepare"></a>

For simplicity, we will use an alanine dipeptide test system. We prepare it as we have done before, but then we remove all the standard MM forces.

In [None]:
import openmm as mm
import openmm.app as app
import openmm.unit as unit

# Create an alanine-dipeptide test system
pdb = app.PDBFile('alanine-dipeptide.pdb')
forcefield = app.ForceField('amber14-all.xml')
system = forcefield.createSystem(pdb.topology, constraints=None)

# Remove MM forces
while system.getNumForces() > 0:
    system.removeForce(0)

# The system should not contain any forces or constraints
assert system.getNumConstraints() == 0
assert system.getNumForces() == 0

# Get the list of atomic numbers. 
# We will use this list when creating the model instance
atomic_numbers = [atom.element.atomic_number for atom in pdb.topology.atoms()]

### Define the NNP
<a id="nnp"></a>

We are now ready to create an NNP class that uses ANI-2x.

In [None]:
import torch
from torchani.models import ANI2x

class NNP(torch.nn.Module):
    """A simple wrapper around the ANI-2x model from torchani."""

    def __init__(self, atomic_numbers: torch.Tensor):
        """
        Initialize the NNP model.

        Parameters
        ----------
        atomic_numbers : torch.Tensor with shape (N,)
        """
        super().__init__()

        # Store the atomic numbers
        self.atomic_numbers = atomic_numbers.unsqueeze(0)

        # Create an ANI-2x model
        self.model = ANI2x(periodic_table_index=True)

        # Make sure it is on the same device as the atomic_numbers tensor
        self.model.to(self.atomic_numbers.device)

    def forward(self, positions: torch.Tensor):
        """The forward method returns the energy computed from positions.

        Parameters
        ----------
        positions : torch.Tensor with shape (N,3)
        positions[i,k] is the position (in nanometers) of spatial dimension k of particle i

        Returns
        -------
        potential : torch.Tensor
        The potential energy (in kJ/mol)
        """
        # Convert the atomic positions to Angstrom
        positions = positions.unsqueeze(0).float() * 10 # nm --> Å

        # Run ANI-2x
        result = self.model((self.atomic_numbers, positions))

        # Get the potential energy
        energy = result.energies[0] * 2625.5 # Hartree --> kJ/mol

        return energy

The `NNP` looks rather complex so we will break it down line by line.

The first line
```python
class NNP(torch.nn.Module):
```
defines the NNP as a Python [class](https://docs.python.org/3/tutorial/classes.html) called `NNP` that inherits from the [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class.

```python
def __init__(self, atomic_numbers: torch.Tensor):
```
is the constructor definition. It states that when we create a new instance of the class, we must pass a `torch.Tensor` of the atomic numbers of the system as an argument. All the code within the constructor is called when the NNP is first created.

```python
super().__init__()
```
Calls the constructor of the parent class, viz. `torch.nn.Module`.

```python
# Store the atomic numbers
self.atomic_numbers = atomic_numbers.unsqueeze(0)
```
This stores the provided `atomic_numbers` tensor as an attribute of the class. The `unsqueeze(0)` call converts the tensor from 1D with size $(N)$ to 2D with size $(1,N)$. This is known as adding a batch dimension. It is needed because the model we will use expects batched data (even if the batch size is 1).

```python
# Create an ANI-2x model
self.model = ANI2x(periodic_table_index=True)
```
This creates an instance of an ANI2x model from the [TorchANI](https://aiqm.github.io/torchani/api.html#module-torchani.models) package.

```python
# Make sure it is on the same device as the atomic_numbers tensor
self.model.to(self.atomic_numbers.device)
```

This makes sure the model is on the same device as the `atomic_numbers` tensor. The [device](https://pytorch.org/docs/stable/tensor_attributes.html#torch.device) is either 'cpu' or 'cuda' (GPU). 

```python
def forward(self, positions: torch.Tensor):
```
This defines the [forward method](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward). The forward method is the code that is called every time the model is evaluated. We define that a `torch.Tensor` of the atomic positions must be passed as an argument when we evaluate the model.

```python
# Prepare the positions
positions = positions.unsqueeze(0).float() * 10 # nm --> Å
```
This adds a batch dimension to the positions tensor, converts it to single precision (it might be in double precision if OpenMM is running in double precision), and converts the units from the OpenMM default of nanometers to Angstrom, as required by the ANI model.

```python
# Run ANI-2x
result = self.model((self.atomic_numbers, positions))

energy = result.energies[0] * 2625.5 # Hartree --> kJ/mol
```
This evaluates the ANI-2x model on the atomic configuration. The result will contain the total potential energy. We need to convert from the ANI units of Hartree to the OpenMM units of kJ/mol.

Implementing other NNPs will follow a similar format. The key point is that you must convert between the OpenMM format and the format expected by the model.
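
As an illustration of this conversion pattern for a model with different units, here is a minimal pure-Python sketch. It assumes a hypothetical model, `toy_model_ev`, that takes positions in Angstrom and returns an energy in eV. The function names and the toy energy expression are made up for this example, and the eV conversion factor is approximate.

```python
NM_TO_ANGSTROM = 10.0
EV_TO_KJ_PER_MOL = 96.485  # approximate conversion factor


def toy_model_ev(positions_angstrom):
    """Hypothetical stand-in for an ML model: Angstrom in, eV out."""
    return 0.5 * sum(c * c for row in positions_angstrom for c in row)


def openmm_energy(positions_nm):
    """Wrap the toy model so that it takes nm and returns kJ/mol."""
    # OpenMM units --> model units
    positions_angstrom = [[c * NM_TO_ANGSTROM for c in row] for row in positions_nm]
    # Model units --> OpenMM units
    return toy_model_ev(positions_angstrom) * EV_TO_KJ_PER_MOL


positions_nm = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0]]
energy = openmm_energy(positions_nm)
print(f"{energy:.4f} kJ/mol")
```

In a PyTorch model, both conversions live inside `forward`, so Autograd differentiates through them and the forces come out in kJ/mol/nm without a separate force conversion.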

### Create NNP
<a id="creatennp"></a>

We can now create an instance of the NNP, making sure to use the GPU ('cuda' device) if available. If we print the model, we can see the underlying neural network architecture.

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
nnp = NNP(torch.tensor(atomic_numbers,device=device))
print(nnp)

We can now compute the potential energy of the system using the PyTorch interface. We can also compute the forces using autograd.

In [None]:
# Need to make a torch.tensor of the positions. Require grad so we can compute the forces.
positions = torch.tensor(pdb.positions.value_in_unit(unit.nanometers), device=device, requires_grad=True)

# Put the positions into the NNP and it returns the energy
energy = nnp(positions)

print(energy)

# We can compute the forces using autograd
energy.backward()
force = -positions.grad

print(force)

### Add the NNP to the System
<a id="addnnp"></a>

We now compile the model as TorchScript code and load it with `TorchForce`.

In [None]:
from openmmtorch import TorchForce
import sys

# Compile the NNP to TorchScript and load it with OpenMM-Torch
torch_module = torch.jit.script(nnp)
torchforce = TorchForce(torch_module)

# Add the NNP to the system
system.addForce(torchforce)

print("number of forces = ", system.getNumForces())
assert(system.getNumForces() == 1)

<div class="alert alert-block alert-info">
⚠️ <b>There should be only 1 force in the system. If the assertion above fails, it might be because you have run the cell multiple times. You must go back and run the "Prepare a Test System" cell to create the system with no forces!</b>
</div>

### Create a Simulation

In [None]:
# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

# Add a PDB reporter to save the trajectory to a file
simulation.reporters.append(app.PDBReporter('traj.pdb', 100))

# Configure a reporter to print to the console every 100 steps
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)

Now, we can compute the energy and forces again using the OpenMM interface and compare them to the energy and forces computed with the PyTorch interface. This is a useful check, as bugs can sometimes arise during the export, serialization, and loading steps.

In [None]:
import numpy as np

state = simulation.context.getState(getEnergy=True, getForces=True)
openmm_energy = state.getPotentialEnergy().value_in_unit(unit.kilojoule_per_mole)
openmm_force = state.getForces(asNumpy=True).value_in_unit(unit.kilojoule_per_mole/unit.nanometer)

print(openmm_energy)
print(openmm_force)

assert(np.isclose(openmm_energy, energy.cpu().detach().numpy()))
assert(np.allclose(openmm_force, force.cpu().detach().numpy(),rtol=1e-3))
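
If one of these assertions fails, it helps to know how far apart the two sets of values are. The sketch below reports the largest absolute and relative per-component deviations between two force arrays given as nested lists. The helper `max_deviation` and the numbers are hypothetical and serve only as an illustration.

```python
def max_deviation(reference, candidate):
    """Return the largest absolute and relative per-component differences."""
    max_abs = 0.0
    max_rel = 0.0
    for ref_row, cand_row in zip(reference, candidate):
        for ref, cand in zip(ref_row, cand_row):
            diff = abs(ref - cand)
            max_abs = max(max_abs, diff)
            # Skip the relative error where the reference component is zero
            if ref != 0.0:
                max_rel = max(max_rel, diff / abs(ref))
    return max_abs, max_rel


# Hypothetical force values in kJ/mol/nm, for illustration only
torch_forces = [[100.0, -50.0, 25.0], [-10.0, 5.0, 0.0]]
openmm_forces = [[100.05, -50.0, 25.0], [-10.0, 5.0, 0.0]]

max_abs, max_rel = max_deviation(torch_forces, openmm_forces)
print(f"max abs = {max_abs:.3g}, max rel = {max_rel:.3g}")
```

Small relative deviations are expected when one side runs in single precision. Large ones usually point to a unit conversion or atom ordering problem.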

### Run the simulation

In [None]:
simulation.step(1000)
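
As a quick check on the run length, the arithmetic below relates the number of steps to the simulated time and to the number of frames written. It only restates values used in the cells above (0.5 fs time step, 1000 steps, reporting every 100 steps) and assumes the reporters fire on every multiple of the reporting interval.

```python
time_step_fs = 0.5      # integrator time step used above
n_steps = 1000          # argument passed to simulation.step
report_interval = 100   # interval used by both reporters

simulated_time_ps = n_steps * time_step_fs / 1000.0  # 1 ps = 1000 fs
n_frames = n_steps // report_interval

print(f"simulated time: {simulated_time_ps} ps, frames written: {n_frames}")
```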

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 1</b>

Download the `'traj.pdb'` file and visualize it. You should see alanine dipeptide moving around. 
</div>

## Mixed System
<a id="mixedsystem"></a>

In general, ML force fields are still too computationally expensive to model entire solvated biomolecules. One way to exploit their accuracy without the prohibitive cost is to model a small part of the system with the MLP and the rest of the system with a traditional MM force field [[1, 2]](#references).
For example, a ligand's intramolecular energy could be modeled with the MLP, while the rest of the system, including the intermolecular interactions between the ligand and the protein/solvent, would be modeled with an MM force field. This approach is similar to hybrid QM/MM methods.

We will now learn how to create hybrid ML/MM systems in OpenMM. Our example system will be alanine dipeptide in a water box.
The intramolecular forces of the alanine dipeptide will be modeled with the ANI-2x MLP, while the water molecules and the intermolecular interactions between the alanine dipeptide and the water will be modeled with an MM force field.

The total potential energy of the mixed system can be written as

$V_{MM/ML}(r) = V_{MM}(r_{MM}) + V_{MM-ML}(r) + V_{ML}(r_{ML})$ 

where $r$ are the coordinates of all atoms, $r_{MM}$ are the coordinates of the atoms of the MM region, and $r_{ML}$ are the coordinates of the atoms of the ML region. The three terms are:

  - $V_{MM}(r_{MM})$ - The potential energy of the MM region (water molecules) using the MM forcefield.
  - $V_{MM-ML}(r)$ - The coupling term between the MM and ML regions. We will define this to compute the non-bonded intermolecular interactions between the ML region and MM region atoms using the MM forcefield.
  - $V_{ML}(r_{ML})$ - The intramolecular potential energy of the ML region (alanine-dipeptide) using the ML forcefield.
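
The bookkeeping behind these three terms can be sketched by sorting atom pairs according to the region each atom belongs to. This is a simplified pairwise picture for illustration only: MM force fields also contain bonded terms, and the ML energy is a many-body function of the ML atoms, not a sum over pairs. The function `classify_pairs` and the 5-atom system are hypothetical.

```python
from itertools import combinations


def classify_pairs(n_atoms, ml_atoms):
    """Split all atom pairs into MM-MM, MM-ML, and ML-ML groups."""
    ml = set(ml_atoms)
    groups = {"MM-MM": [], "MM-ML": [], "ML-ML": []}
    for i, j in combinations(range(n_atoms), 2):
        # Count how many atoms of the pair are in the ML region (0, 1, or 2)
        n_ml = (i in ml) + (j in ml)
        key = ("MM-MM", "MM-ML", "ML-ML")[n_ml]
        groups[key].append((i, j))
    return groups


# Hypothetical 5-atom system where atoms 0 and 1 form the ML region
groups = classify_pairs(5, [0, 1])
for name, pairs in groups.items():
    print(name, pairs)
```

In the mixed system, the MM-MM and MM-ML groups keep their MM interactions, while everything in the ML-ML group is removed from the MM force field and replaced by the MLP energy.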

<div class="alert alert-block alert-info">
⚠️ <b>The mixed system strategy we have outlined and will implement is only appropriate when the ML region is a whole single molecule. ML/MM divisions splitting the system along a chemical bond are not yet supported.</b>
</div>

### Creating the System
<a id="createmixed"></a>

We will first create an MM system as normal. Then, we will define a function that modifies it to use the hybrid potential described above.

In [None]:
import openmm as mm
import openmm.app as app
import openmm.unit as unit

# Create an alanine-dipeptide test system
pdb = app.PDBFile('alanine-dipeptide.pdb')
forcefield = app.ForceField('amber14-all.xml', 'amber14/tip3p.xml')
modeller = app.Modeller(pdb.topology, pdb.positions)
modeller.addSolvent(forcefield, padding=1.0*unit.nanometers)
system = forcefield.createSystem(modeller.topology, nonbondedMethod=app.PME, constraints=None)

Now we create a function that will remove the MM interactions within the ML region.

In [None]:
from workshop_utils import removeBonds

def removeMMInteraction(system, ml_atoms):
    """
    Remove the MM interactions between the ML atoms.

    Parameters
    ----------
    system : openmm.System
        The system object
    ml_atoms : list of int
        The list of atom indices in the ML region

    Returns
    -------
    newSystem : openmm.System
        The new system with the MM interactions between the ML atoms removed
    """
    # Remove the bonded interactions within the ML subset
    newSystem = removeBonds(system, ml_atoms)

    # Add nonbonded exceptions and exclusions.
    # This removes the nonbonded interactions between the ML atoms
    atomList = list(ml_atoms)
    for force in newSystem.getForces():
        if isinstance(force, mm.NonbondedForce):
            for i in range(len(atomList)):
                for j in range(i):
                    force.addException(atomList[i], atomList[j], 0, 1, 0, True)
        elif isinstance(force, mm.CustomNonbondedForce):
            existing = set(tuple(force.getExclusionParticles(i)) for i in range(force.getNumExclusions()))
            for i in range(len(atomList)):
                a1 = atomList[i]
                for j in range(i):
                    a2 = atomList[j]
                    if (a1, a2) not in existing and (a2, a1) not in existing:
                        force.addExclusion(a1, a2, True)

    return newSystem
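
The nested loops in `removeMMInteraction` visit every unordered pair of ML atoms exactly once, so $N$ ML atoms lead to $N(N-1)/2$ exceptions or exclusions. The sketch below reproduces the loop pattern on a hypothetical list of atom indices and compares it with `itertools.combinations`.

```python
from itertools import combinations

ml_atoms = [3, 7, 8, 12, 20]  # hypothetical ML atom indices

# Same double loop as in removeMMInteraction
loop_pairs = [(ml_atoms[i], ml_atoms[j])
              for i in range(len(ml_atoms)) for j in range(i)]

# Equivalent enumeration of unordered pairs
combo_pairs = list(combinations(ml_atoms, 2))

n = len(ml_atoms)
print(len(loop_pairs), len(combo_pairs), n * (n - 1) // 2)
```

The quadratic growth is unproblematic for a small molecule, but it is one reason to keep the ML region compact.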

### Create the MLP for a Mixed System
<a id="createmlp"></a>

We will create an ANI-2x MLP as before but add in an extra argument that lists the atoms in the ML region.

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 2</b>

Add the code that stores the atomic numbers and the code that creates the ANI-2x model. 

Tip: this part is the same as the previous example.
</div>

In [None]:
import torch
from torchani.models import ANI2x

class hybridNNP(torch.nn.Module):
    def __init__(self, atomic_numbers: torch.Tensor, ml_atoms: torch.Tensor):
        super().__init__()

        # The atomic_numbers tensor contains the atomic number of the ML region atoms.
        # the ml_atoms tensor contains the index of each ML atom with respect to the full system.
        assert(atomic_numbers.shape == ml_atoms.shape)

        # Store the indices of the ml atoms
        self.indices = ml_atoms

        # Store the atomic numbers
        FIXME

        # Create an ANI-2x model
        FIXME

        # Make sure it is on the same device as the atomic_numbers tensor
        self.model.to(self.atomic_numbers.device)

    def forward(self, positions: torch.Tensor):
        # Extract the positions of the ML atoms
        positions = positions[self.indices]

        # Prepare the positions
        positions = positions.unsqueeze(0).float() * 10 # nm --> Å

        # Run ANI-2x
        result = self.model((self.atomic_numbers, positions))

        # Get the potential energy
        energy = result.energies[0] * 2625.5 # Hartree --> kJ/mol

        return energy

Now we can create an instance of the MLP and add it to the system.

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 3</b>

Add the `TorchForce` to the system.
</div>

In [None]:
# Get a list of the ML atoms. The alanine-dipeptide is chain 0.
chains = list(modeller.topology.chains())
ml_atoms = [atom.index for atom in chains[0].atoms()]
print(ml_atoms)

# Get the atomic numbers
atomic_numbers = [atom.element.atomic_number for atom in chains[0].atoms()]

# Convert to torch tensors
device = 'cuda' if torch.cuda.is_available() else 'cpu'
ml_atoms = torch.tensor(ml_atoms, device=device, dtype=torch.int64)
atomic_numbers = torch.tensor(atomic_numbers, device=device)

hybridnnp = hybridNNP(atomic_numbers, ml_atoms)

# Compile the NNP to TorchScript and load it with OpenMM-Torch
torch_module = torch.jit.script(hybridnnp)

from openmmtorch import TorchForce
torchforce = TorchForce(torch_module)

# Make the mixed system
mixed_system = removeMMInteraction(system, ml_atoms.tolist())

# Add the TorchForce
FIXME

# Print out the forces
for force in mixed_system.getForces():
    print(force)

assert(mixed_system.getNumForces()==6)

### Simulate the Hybrid ML/MM System

In [None]:
# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(modeller.topology, mixed_system, integrator)
simulation.context.setPositions(modeller.positions)

# Minimize the system
simulation.minimizeEnergy(maxIterations=100)

# Configure a reporter to print to the console every 100 steps and write to a PDB file
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)
simulation.reporters.append(app.PDBReporter('mixed_traj.pdb', 100, enforcePeriodicBox=False))

# Simulate 0.5 ps
simulation.step(1000)

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 4</b>

Visualize `'mixed_traj.pdb'`.
</div>

## Using the OpenMM-ML Package
<a id="openmmml"></a>

We have covered how to use `openmm-torch` to add an MLP to a system. As you’ve seen, it involves quite a lot of code. The [openmm-ml](https://github.com/openmm/openmm-ml) package was created as a high-level interface to simplify the use of pre-trained ML models in OpenMM simulations. We will now run the same simulations using `openmm-ml`.

### Install software
The `openmm-ml` package can be installed from [conda-forge](https://anaconda.org/conda-forge/openmm-ml).

In [None]:
!mamba install -y -c conda-forge openmm-ml

### Create a Pure ML system

We will load the alanine dipeptide molecule and simulate it in vacuum with ANI-2x, using the `MLPotential.createSystem` function from the `openmm-ml` package.

In [None]:
import openmm as mm
import openmm.app as app
import openmm.unit as unit
import sys

from openmmml import MLPotential

pdb = app.PDBFile('alanine-dipeptide.pdb')
print(pdb.topology)
# Create the MLP using ANI-2x
potential = MLPotential('ani2x')

# Create a system that uses the MLP
system = potential.createSystem(pdb.topology)

# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

# Configure a reporter to print to the console every 100 steps and write to a PDB file
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)
simulation.reporters.append(app.PDBReporter('traj.pdb', 100))

# Simulate 0.5 ps
simulation.step(1000)

<div class="alert alert-block alert-info">
⚠️ <b>You can safely ignore the error message that says `failed to equip 'nnpops' with error: No module named 'NNPOps'`.</b>
</div>

### Create a Mixed System
We can just as easily create a mixed ML/MM system.

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 5</b>

Write the code to create the potential using MLPotential. 

Tip: Look at the OpenMM-ML readme: https://github.com/openmm/openmm-ml
</div>

In [None]:
import openmm as mm
import openmm.app as app
import openmm.unit as unit
import sys

from openmmml import MLPotential

pdb = app.PDBFile('alanine-dipeptide.pdb')

forcefield = app.ForceField('amber14-all.xml', 'amber14/tip3p.xml')
modeller = app.Modeller(pdb.topology, pdb.positions)
modeller.addSolvent(forcefield, padding=1.0*unit.nanometers)

# Create the MM system
mm_system = forcefield.createSystem(modeller.topology, nonbondedMethod=app.PME, constraints=None)

# Create the MLP using ANI-2x
FIXME

# Create the mixed system
chains = list(modeller.topology.chains())
ml_atoms = [atom.index for atom in chains[0].atoms()]
mixed_system = potential.createMixedSystem(modeller.topology, mm_system, ml_atoms)

# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(modeller.topology, mixed_system, integrator)
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy(maxIterations=100)

# Configure a reporter to print to the console every 100 steps and write to a PDB file
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)
simulation.reporters.append(app.PDBReporter('traj.pdb', 100, enforcePeriodicBox=False))

# Simulate 0.5 ps
simulation.step(1000)

## Using NNPOps
<a id="nnpops"></a>

The NNPOps package provides highly optimized, open-source implementations of bottleneck operations that appear in popular potentials. NNPOps can be used to speed up ANI simulations. We can install it from conda-forge and use it through the `openmm-ml` interface.

<div class="alert alert-block alert-info">
⚠️ <b>NNPOps is not available on Windows.</b>
</div>

In [None]:
!mamba install -y -c conda-forge nnpops

If you are using a GPU, NNPOps should give you some speedup. We will reuse the script from before and tell `openmm-ml` to use NNPOps.

In [None]:
import openmm as mm
import openmm.app as app
import openmm.unit as unit
import sys

from openmmml import MLPotential

pdb = app.PDBFile('alanine-dipeptide.pdb')

# Create the MLP using ANI-2x and use the nnpops implementation
potential = MLPotential('ani2x', implementation='nnpops')

# Create a system that uses the MLP
system = potential.createSystem(pdb.topology)

# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

# Configure a reporter to print to the console every 100 steps 
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)

# Simulate 0.5 ps
simulation.step(1000)

The `implementation='nnpops'` argument tells `MLPotential` to replace certain TorchANI PyTorch functions with their optimized NNPOps counterparts. If you are interested in seeing how this is implemented, check out the [GitHub repository](https://github.com/openmm/NNPOps).

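Because NNPOps is an optional dependency (and is not available on Windows), a script can check for it before requesting the optimized implementation. The sketch below is one way to do that with the standard library. The helper name `choose_ani_implementation` is hypothetical, and the fallback value `'torchani'` is an assumption about what `openmm-ml` accepts, so check the documentation of your installed version.

```python
import importlib.util


def choose_ani_implementation() -> str:
    """Return 'nnpops' if the NNPOps module can be found, else 'torchani'."""
    # find_spec returns None when a top-level module is not installed;
    # it does not import the module itself.
    if importlib.util.find_spec("NNPOps") is not None:
        return "nnpops"
    # Assumption: 'torchani' selects the plain TorchANI code path.
    return "torchani"


implementation = choose_ani_implementation()
print(f"Using ANI implementation: {implementation}")

# The result would then be passed on as in the cell above, e.g.:
# potential = MLPotential('ani2x', implementation=implementation)
```
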
## Implementing Other Models - MACE
<a id="mace"></a>


Any model that can be written as a PyTorch model can be used with `openmm-torch`. In this tutorial, we will show how to implement a MACE model.

### Install software
MACE can be installed via pip.

In [None]:
!pip install mace-torch

### Get a Pretrained Model

We are going to use the small pre-trained model from the MACE-OFF23 family, a transferable force field for organic molecules [7]. You can find the remaining versions [here](https://github.com/ACEsuit/mace-off/tree/main/mace_off23).

In [None]:
# Download the model
!wget https://raw.githubusercontent.com/ACEsuit/mace-off/main/mace_off23/MACE-OFF23_small.model

### Define the MLP

We will create a MACE MLP class, similar to what we did previously for ANI-2x. The MACE model, however, requires additional code to convert atomic numbers and positions into the necessary format. Additionally, the `simple_nl` function for calculating the neighbor list is available in the `workshop_utils` module.

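To make the role of the neighbor list concrete, here is a minimal pure-Python sketch of what a neighbor list is for a non-periodic system: every ordered pair of atoms closer than a cutoff. It is an illustration only, not the `simple_nl` implementation from `workshop_utils`, which operates on PyTorch tensors and also returns the periodic shifts. The function name `naive_neighbor_list` and the toy coordinates are made up for this example.

```python
import math
from itertools import permutations


def naive_neighbor_list(positions, cutoff):
    """Return ordered (i, j) pairs with |r_i - r_j| < cutoff (no periodicity)."""
    pairs = []
    # permutations(..., 2) visits every ordered pair with i != j
    for i, j in permutations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < cutoff:
            pairs.append((i, j))
    return pairs


# Three atoms on a line, coordinates in angstrom
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
pairs = naive_neighbor_list(positions, cutoff=2.0)
print(pairs)  # [(0, 1), (1, 0)]
```
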
In [None]:
import torch
from workshop_utils import simple_nl
from e3nn.util import jit
from mace.tools import utils, to_one_hot, atomic_numbers_to_indices
from typing import Optional

class MACEForce(torch.nn.Module):
    def __init__(self, model_path, atomic_numbers, indices, periodic, device, dtype=torch.float64):
        super().__init__()

        if device is None: 
            # Use cuda if available
            self.device=torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
        else: 
            # Unless user has specified the device
            self.device=torch.device(device)

        self.default_dtype = dtype
        print("Running MACEForce on device: ", self.device, " with dtype: ", self.default_dtype)

        # Conversion constants
        self.nm_to_distance = 10.0 # nm -> angstrom
        self.energy_to_kJ = 96.49  # eV -> kJ/mol

        # Load the model
        self.model = torch.load(model_path, map_location=self.device)
        self.model.to(self.default_dtype)
        
        # Extract model parameters, define atomic number table, and compile the model
        self.r_max = self.model.r_max
        self.z_table = utils.AtomicNumberTable([int(z) for z in self.model.atomic_numbers])
        self.model = jit.compile(self.model)

        # Setup input
        N=len(atomic_numbers)
        self.ptr = torch.tensor([0,N],dtype=torch.long, device=self.device)
        self.batch = torch.zeros(N, dtype=torch.long, device=self.device)

        # One hot encoding of atomic number
        self.node_attrs = to_one_hot(
                torch.tensor(atomic_numbers_to_indices(atomic_numbers, z_table=self.z_table), dtype=torch.long, device=self.device).unsqueeze(-1),
                num_classes=len(self.z_table),
            )

        if periodic:
            self.pbc = torch.tensor([True, True, True], device=self.device)
        else:
            self.pbc = torch.tensor([False, False, False], device=self.device)

        if indices is None:
            self.indices = None
        else:
            self.indices = torch.tensor(indices, dtype=torch.int64)

    def forward(self, positions, boxvectors: Optional[torch.Tensor] = None):
        # Setup positions
        positions = positions.to(device=self.device, dtype=self.default_dtype)
        if self.indices is not None:
            positions = positions[self.indices]
        positions = positions*self.nm_to_distance

        # Setup cell and pbc
        if boxvectors is not None:
            cell = boxvectors.to(device=self.device,dtype=self.default_dtype) * self.nm_to_distance
            pbc = torch.tensor([True, True, True], device=self.device)
        else:
            cell = torch.eye(3, device=self.device)
            pbc = torch.tensor([False, False, False], device=self.device)

        # Calculate the shifts and edge_index
        mapping, shifts_idx = simple_nl(positions, cell, pbc, self.r_max)
        edge_index = torch.stack((mapping[0], mapping[1])).to(torch.int64)
        shifts = torch.mm(shifts_idx, cell).to(self.default_dtype)

        # Create input dict
        input_dict = {
            "ptr" : self.ptr,
            "node_attrs": self.node_attrs,
            "batch": self.batch,
            "pbc": self.pbc,
            "positions": positions,
            "edge_index": edge_index,
            "shifts": shifts,
            "cell": cell,
        }

        # Predict the energy
        energy = self.model(input_dict, compute_force=False)["interaction_energy"]

        assert energy is not None, "The model did not return any energy. Please check the input."

        # Return energy
        energy = energy*self.energy_to_kJ

        return energy

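The conversions that `MACEForce` performs can be checked by hand. The sketch below, in plain Python, derives the eV to kJ/mol factor from the exact SI values of the elementary charge and the Avogadro constant, and shows what a one-hot encoding over an element table looks like. The `one_hot` helper and the four-element table are illustrative assumptions; the class above uses `to_one_hot` and the element table stored in the model.

```python
# Exact values fixed by the 2019 SI redefinition
ELEMENTARY_CHARGE_C = 1.602176634e-19  # numerically, 1 eV expressed in joules
AVOGADRO_PER_MOL = 6.02214076e23

# 1 eV per particle -> J/mol -> kJ/mol
EV_TO_KJ_PER_MOL = ELEMENTARY_CHARGE_C * AVOGADRO_PER_MOL / 1000.0


def one_hot(atomic_numbers, z_table):
    """Encode each atomic number as a one-hot row over the element table."""
    index_of = {z: i for i, z in enumerate(z_table)}
    rows = []
    for z in atomic_numbers:
        row = [0.0] * len(z_table)
        row[index_of[z]] = 1.0
        rows.append(row)
    return rows


# Hypothetical element table (H, C, N, O) and a C, H, H, O fragment
z_table = [1, 6, 7, 8]
encoded = one_hot([6, 1, 1, 8], z_table)
print(encoded[0])                  # [0.0, 1.0, 0.0, 0.0]
print(round(EV_TO_KJ_PER_MOL, 3))  # 96.485
```

The derived factor, about 96.485 kJ/mol per eV, rounds to the 96.49 used in the class above.
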
### Use the MACE MLP

The rest of the code is then similar to using the ANI-2x MLP via the `openmm-torch` interface.

In [None]:
import sys

import torch
import openmm as mm
import openmm.app as app
import openmm.unit as unit
from openmmtorch import TorchForce

pdb = app.PDBFile('alanine-dipeptide.pdb')
forcefield = app.ForceField('amber14-all.xml')
system = forcefield.createSystem(pdb.topology, constraints=None)

# Remove MM forces
while system.getNumForces() > 0:
    system.removeForce(0)

# The system should not contain any remaining forces or constraints
assert system.getNumConstraints() == 0
assert system.getNumForces() == 0

# Get the list of atomic numbers
atomic_numbers = [atom.element.atomic_number for atom in pdb.topology.atoms()]

# Set the parameters for the MACE MLP
model_path = "MACE-OFF23_small.model"
pbc = False
indices = None # None means all atoms are used
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Create the MACE MLP
mace_mlp = MACEForce(model_path, atomic_numbers, indices, pbc, device)

# Compile the NNP to TorchScript and load it with OpenMM-Torch
torch_module = torch.jit.script(mace_mlp)
torchforce = TorchForce(torch_module)

# Add it to the system
system.addForce(torchforce)

# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

# Configure a reporter to print to the console every 100 steps and write to a PDB file
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True, speed=True)
simulation.reporters.append(reporter)
simulation.reporters.append(app.PDBReporter('mace_traj.pdb', 100))

# Simulate 0.5 ps
simulation.step(1000)

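The `speed=True` option above asks the reporter to print throughput in ns/day, which is also the quantity Exercise 7 asks you to measure. As a rough cross-check, the same numbers can be computed by hand from the step count, the time step, and the wall-clock time. The helper names below are hypothetical, and the 20 s timing is a made-up value.

```python
def simulated_time_ps(n_steps: int, timestep_fs: float) -> float:
    """Total simulated time in picoseconds."""
    return n_steps * timestep_fs / 1000.0


def ns_per_day(n_steps: int, timestep_fs: float, wall_seconds: float) -> float:
    """Throughput in nanoseconds of simulated time per day of wall-clock time."""
    simulated_ns = n_steps * timestep_fs * 1e-6  # fs -> ns
    return simulated_ns * 86400.0 / wall_seconds


# 1000 steps of 0.5 fs, as in the cells above
print(simulated_time_ps(1000, 0.5))  # 0.5

# Assumed timing: the 1000 steps took 20 s of wall-clock time
print(round(ns_per_day(1000, 0.5, wall_seconds=20.0), 2))  # 2.16

# In a notebook, the wall-clock time could be measured with, for example:
# import time
# start = time.perf_counter()
# simulation.step(1000)
# wall_seconds = time.perf_counter() - start
```
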
## Extra exercises
<a id="extra"></a>
<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 6</b>

Use one of the other pretrained MACE-OFF23 models.

Tip: For example, to download the medium-sized model, run `!wget https://raw.githubusercontent.com/ACEsuit/mace-off/main/mace_off23/MACE-OFF23_medium.model`.
</div>
<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 7</b>

Increase the size of the water box in the mixed system. Measure the performance (ns/day) to find out how many MM atoms there need to be before the speed of the MM part becomes significant compared to the speed of the ML part.
</div>
<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 8 (Hard)</b>

Make a mixed system using the MACE MLP.
</div>
<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 9 (Hard)</b>

The `openmm-ml` `createMixedSystem` function can create a mixed system in which the ML region is interpolated between its ML and MM representations by a lambda value. Look at the [API documentation](https://github.com/openmm/openmm-ml/blob/d5120bd1fe8cd7330bb3169f3549fd2d550d4c39/openmmml/mlpotential.py#L181) in the source code and try to use this functionality.
</div>

## References
<a id="references"></a>

[1] Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning / molecular mechanics potentials,
Dominic A. Rufa, Hannah E. Bruce Macdonald, Josh Fass, Marcus Wieder, Patrick B. Grinaway, Adrian E. Roitberg, Olexandr Isayev, John D. Chodera,
bioRxiv 2020.07.29.227959; doi: https://doi.org/10.1101/2020.07.29.227959

[2] NNP/MM: Fast molecular dynamics simulations with machine learning potentials and molecular mechanics,
Raimondas Galvelis, Alejandro Varela-Rial, Stefan Doerr, Roberto Fino, Peter Eastman, Thomas E. Markland, John D. Chodera, Gianni De Fabritiis,
arXiv:2201.08110; doi: https://doi.org/10.48550/arXiv.2201.08110

[3] Xiang Gao, Farhad Ramezanghorbani, Olexandr Isayev, Justin S. Smith, and Adrian E. Roitberg, J. Chem. Inf. Model. 60, 7, 3408–3415 (2020), https://doi.org/10.1021/acs.jcim.0c00451 | https://aiqm.github.io/torchani/

[4] AP Bartók, MC Payne, R Kondor, G Csányi, Physical review letters 104 (13), 136403 (2010), https://link.aps.org/doi/10.1103/PhysRevLett.104.136403

[5] I. Batatia, D. P. Kovacs, G. Simm, C. Ortner, and G. Csányi. Advances in Neural Information Processing Systems 35, 11423 (2022). https://github.com/ACEsuit/mace

[6] P Thölke, G De Fabritiis, International Conference on Learning Representations, 2021, https://doi.org/10.48550/arXiv.2202.02541

[7] Kovács, D. P.; Moore, J. H.; Browning, N. J.; Batatia, I.; Horton, J. T.; Kapil, V.; Witt, W. C.; Magdău, I.-B.; Cole, D. J.; Csányi, G. MACE-OFF23: Transferable Machine Learning Force Fields for Organic Molecules. arXiv December 29, 2023. https://doi.org/10.48550/arXiv.2312.15211.

## Solutions

*Exercise 2*
```python
# Store the atomic numbers
self.atomic_numbers = atomic_numbers.unsqueeze(0)

# Create an ANI-2x model
self.model = ANI2x(periodic_table_index=True)
```

*Exercise 3*
```python
mixed_system.addForce(torchforce)
```

*Exercise 5*
```python
# Create the MLP using ANI-2x
potential = MLPotential('ani2x')
```