# Machine Learning Potentials

You can run this notebook in your browser: 

[![Open On Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openmm/openmm_workshop_july2023/blob/main/section_3/machine_learning_potentials.ipynb)

## Table of contents
- Introduction
- OpenMM ML software
- Basics of OpenMM compatible MLPs
- Installing
- Exporting a PyTorch model for use in OpenMM
- Simulation of alanine dipeptide with ANI-2x using OpenMM-Torch
    - Prepare test system
    - Create a NNP
    - Adding the NNP to an OpenMM simulation
- Mixed MM/ML system
    - Creating the system
    - Create the MLP
- Using OpenMM-ML package
- Using NNPOps
- Implementing other models - MACE
- Extra exercises
- References
- Solutions



## Introduction
<a id="intro"></a>

Machine Learning Potentials (MLPs) are a relatively new method in which the potential energy surface of an atomic system is described by a machine learning model. The model could be a feed-forward neural network (ANI) [3], Gaussian process regression (GAP) [4], a graph neural network (MACE) [5], an equivariant transformer (TorchMD-NET) [6], or something else. The MLP is trained on data from a first-principles method such as DFT. It can then be used in an atomistic MD simulation in place of a classical forcefield, aiming for accuracy close to that of the first-principles method at a much lower computational cost.


## OpenMM ML software
<a id="openmmmlsoftware"></a>

OpenMM is an MD engine, so we will cover how to use MLPs in a simulation rather than how to train them.

OpenMM has several packages supporting the use of MLPs.
- [openmm-torch](https://github.com/openmm/openmm-torch). This is the OpenMM PyTorch plugin that allows [PyTorch](https://pytorch.org/) static computation graphs to be used for defining an OpenMM `TorchForce` object, an [OpenMM Force class](http://docs.openmm.org/latest/api-python/library.html#forces) that computes a contribution to the potential energy.
- [openmm-ml](https://github.com/openmm/openmm-ml). This is a high level API for using machine learning models in OpenMM simulations. With just a few lines of code, you can set up a simulation that uses a standard, pretrained model to represent some or all of the interactions in a system.
- [NNPOps](https://github.com/openmm/NNPOps). This is a library of optimized operations that appear in popular Neural Network Potentials and that can be used to speed up your MLP implementation.


You would use `openmm-torch` to interface your MLP with OpenMM. You could then try and optimize it using operations from `NNPOps`. When you want to deploy it you can use `openmm-ml` to create an easy to use wrapper around it.

## Basics of (OpenMM compatible) MLPs
<a id="basicsofmlps"></a>

An MLP (or Neural Network Potential, NNP; we will use these terms interchangeably) reads in a set of atomic coordinates and outputs the potential energy. The forces on each particle are the negative gradient of the energy with respect to the coordinates ($F=-\nabla V$), which can be computed by backpropagation.

To use an MLP in OpenMM you need to be able to write it as a PyTorch model that can be exported to [TorchScript](https://pytorch.org/docs/stable/jit.html#torchscript). The model takes a tensor of particle positions with shape `(nparticles, 3)` and returns the energy. openmm-torch will calculate the forces using [Autograd](https://pytorch.org/docs/stable/autograd.html). Optionally, the model can also return the forces and openmm-torch will use them directly.
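
The optional force output can be sketched as follows. This is a minimal example under stated assumptions: the class name `HarmonicWithForces` is hypothetical, and the potential is a simple harmonic well ($V=\sum_i |r_i|^2$) chosen because its forces ($F_i=-2r_i$) are known analytically. The model returns a tuple of energy and forces instead of the energy alone.

```python
from typing import Tuple

import torch


class HarmonicWithForces(torch.nn.Module):
    """Hypothetical example model that returns (energy, forces)."""

    def forward(self, positions: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        # Energy in kJ/mol for positions in nm: V = sum_i |r_i|^2
        energy = torch.sum(positions**2)
        # Analytic forces in kJ/mol/nm: F = -dV/dr = -2 r
        forces = -2.0 * positions
        return energy, forces


# The tuple-returning model can be converted to TorchScript like any other
module = torch.jit.script(HarmonicWithForces())
energy, forces = module(torch.ones((2, 3)))
print(energy.item(), forces.shape)

# With openmm-torch, the TorchForce then has to be told to use the second
# output, e.g. torch_force.setOutputsForces(True) (see the openmm-torch README).
```

Returning forces directly is mainly useful when the model has a cheaper or more accurate way of computing them than backpropagation; otherwise letting Autograd do the work is simpler.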

## Installing packages
<a id="installing"></a>

The packages can be installed from conda-forge. Note that there is no Windows package for `openmm-torch` and there are only Linux packages for `NNPOps`.

**Note:** Due to the added complexity of creating a conda environment with PyTorch we recommend you run this notebook in Colab. It should work on Linux. It will work on macOS with the exception of the NNPOps section, which you will need to skip. It will not work on Windows!

We will install `openmm-torch` and at the same time install `torch-ani` which we will need later.

In [None]:
# Execute this cell to install mamba in the Colab environment

if 'google.colab' in str(get_ipython()):
  print('Running on colab')
  !pip install -q condacolab
  import condacolab
  condacolab.install_mambaforge()
else:
  print('Not running on colab.')
  print('Make sure you create and activate a new conda environment!')

**Note:** During this step on Colab the kernel will be restarted. This will produce the error message:
"Your session crashed for an unknown reason. " This is normal and you can safely ignore it.

**Note:** Installing the packages will take several minutes!

In [None]:
if 'google.colab' in str(get_ipython()):
  #https://github.com/openmm/openmm-torch/issues/88
  %env CONDA_OVERRIDE_CUDA=12.0
  # you might also need to set this if you are on Linux without CUDA installed!
!mamba install -y -c conda-forge openmm-torch torchani=2.2.2

Download the files we will need.

In [None]:
!wget https://raw.githubusercontent.com/openmm/openmm_workshop_july2023/main/section_3/alanine-dipeptide.pdb
!wget https://raw.githubusercontent.com/openmm/openmm_workshop_july2023/main/section_3/section_3_utils.py

## PyTorch

PyTorch is an open-source machine learning framework that is primarily used for developing and training deep learning models. It provides a dynamic computational graph that allows users to define and modify neural networks on the fly, making it flexible and efficient for building complex models.

Some key features of PyTorch include:

 - Dynamic computational graph: PyTorch uses a tape-based automatic differentiation system, which enables users to define and modify models on-the-fly. This dynamic nature makes it easy to debug and experiment with different model architectures.

 - GPU acceleration: PyTorch leverages the power of graphics processing units (GPUs) to accelerate the training and inference processes. It provides seamless integration with CUDA, a parallel computing platform, allowing for efficient computation on GPUs.

 - Neural network modules: PyTorch provides a rich library of pre-defined modules and functions for building neural networks. These modules include layers, activation functions, loss functions, and optimization algorithms, making it easy to construct and train deep learning models.

 - Community and ecosystem: PyTorch has a vibrant community of developers and researchers who contribute to its development and share their work. This community has created numerous resources, such as tutorials, libraries, and pre-trained models, which can be readily used for various machine learning tasks.

 - Deployment options: PyTorch offers various deployment options, including exporting models for inference in production environments. It provides tools like TorchScript, which allows models to be serialized and executed independently of the Python runtime, enabling deployment on platforms with limited resources.

PyTorch has gained popularity due to its ease of use, flexibility, and extensive support for research and development in the field of deep learning. It is widely used by researchers, academics, and industry professionals for a range of applications, including computer vision, natural language processing, reinforcement learning, and our use case of atomistic force fields.

We will be using pre-trained models so we do not need to know much about PyTorch. We just need to know that you can create models that take in atomic coordinates and predict the energy. If you want to learn more about PyTorch there are plenty of tutorials available: https://pytorch.org/tutorials/index.html

## Exporting a PyTorch model for use in OpenMM
<a id="pytorchmodelinopenmm"></a>

We can check that our installation is working by defining a very simple potential --- a harmonic force attracting every particle to the origin.

The first step is to create a PyTorch model defining the calculation. It should take the particle positions in nanometers (in the form of a `torch.Tensor` of shape `(nparticles, 3)`) as input, and return the potential energy in kJ/mol.

In [None]:
import torch

class ForceModule(torch.nn.Module):
    """A central harmonic potential as a static compute graph"""
    def forward(self, positions: torch.Tensor):
        """The forward method returns the energy computed from positions.

        Parameters
        ----------
        positions : torch.Tensor with shape (nparticles,3)
           positions[i,k] is the position (in nanometers) of spatial dimension k of particle i

        Returns
        -------
        potential : torch.Tensor
           The potential energy (in kJ/mol)
        """
        return torch.sum(positions**2)

The `ForceModule` inherits from [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html), which is the PyTorch base class for neural network modules. We put the code in the `forward` method, which defines the computation performed at every call of the model.

Now that we have defined a model we can create an instance of it.

In [None]:
force_module = ForceModule()


We can give it some example input:

In [None]:
# create random tensor of 4 coordinates
# we specify we want to record the gradient so we can compute the forces
input = torch.rand((4,3), requires_grad=True)
print(input)
energy = force_module(input)
print(energy)


We can calculate the force due to the potential using Autograd.

In [None]:
# backward pass computes the gradients
energy.backward()
# We can access the gradients with the grad attribute
# F = - grad (Potential)
forces = -input.grad 
print(forces)
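
As a sanity check that does not depend on Autograd, the forces for this potential can also be derived by hand: since $V=\sum_i |r_i|^2$, the force on particle $i$ is $F_i=-2r_i$. The plain-Python sketch below compares that analytic result with a central finite-difference estimate of $-\nabla V$. The helper names `harmonic_energy` and `numerical_forces` are illustrative and not part of any library.

```python
def harmonic_energy(positions):
    """V = sum over particles and dimensions of x^2 (same form as ForceModule)."""
    return sum(x * x for particle in positions for x in particle)


def numerical_forces(energy_fn, positions, h=1e-5):
    """Estimate F = -dV/dx for every coordinate with central finite differences."""
    forces = []
    for i, particle in enumerate(positions):
        row = []
        for k in range(len(particle)):
            # Displace a single coordinate by +h and -h
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += h
            minus[i][k] -= h
            row.append(-(energy_fn(plus) - energy_fn(minus)) / (2 * h))
        forces.append(row)
    return forces


positions = [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.0]]
numeric = numerical_forces(harmonic_energy, positions)
analytic = [[-2 * x for x in particle] for particle in positions]

# Largest absolute deviation between the two force estimates
max_error = max(abs(n - a)
                for n_row, a_row in zip(numeric, analytic)
                for n, a in zip(n_row, a_row))
print(max_error)
```

The same finite-difference check can be applied to any energy function, which makes it a handy debugging tool when a model's forces look suspicious.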


To export the model for use in OpenMM (or other software) it must be converted to a TorchScript module and saved to a file. Converting to TorchScript can usually be done with a single call to `torch.jit.script`. See the [PyTorch documentation](https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html) for details.  

In [None]:
# convert to TorchScript
scripted_module = torch.jit.script(force_module)
print(scripted_module)

We then need to save the module. Saving serializes the module to a file, which allows it to be loaded through the PyTorch C++ API, as required by the openmm-torch `TorchForce`. See the [PyTorch docs](https://pytorch.org/docs/stable/generated/torch.jit.save.html) for more information.

In [None]:
# save the serialized compute graph to a file
scripted_module.save('model.pt')
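
Before handing a saved file to OpenMM, it can be worth reloading it with `torch.jit.load` and confirming it reproduces the original model's output. The sketch below is self-contained: it redefines the harmonic potential under the hypothetical name `Harmonic` and writes to a temporary directory rather than to `model.pt`.

```python
import os
import tempfile

import torch


class Harmonic(torch.nn.Module):
    """Same functional form as ForceModule, redefined to keep this block standalone."""

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        return torch.sum(positions**2)


original = Harmonic()
positions = torch.rand((4, 3), dtype=torch.float64)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'roundtrip.pt')
    # Serialize the TorchScript module, then load it back from disk
    torch.jit.script(original).save(path)
    reloaded = torch.jit.load(path)

# The reloaded module should give the same energy as the Python model
roundtrip_ok = torch.allclose(original(positions), reloaded(positions))
print(roundtrip_ok)
```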

To use the exported model in a simulation we need to create a TorchForce object and add it to the OpenMM System.

In [None]:
# Create the TorchForce from the serialized compute graph
from openmmtorch import TorchForce
torch_force = TorchForce('model.pt')

# Create an empty OpenMM system
import openmm as mm
system = mm.System()

print("number of forces = ", system.getNumForces())

# add the TorchForce to the system
system.addForce(torch_force)

print("number of forces = ", system.getNumForces())


Now that we know how to use a PyTorch model in an OpenMM simulation we will move on to an example of creating a Neural Network Potential that uses ANI-2x.


## Simulation of alanine dipeptide with ANI-2x using OpenMM-Torch
<a id="ani"></a>

ANI-2x is a general Neural Network Potential that works with molecules containing (H, C, N, O, F, Cl, S) atoms. For more information please read the publication [3].
The model is available from the TorchANI package which we installed earlier.

### Prepare a test system
<a id="prepare"></a>

For simplicity we will use an alanine-dipeptide test system. We prepare it as we have done before but then we remove all the standard MM forces.

In [None]:
import openmm as mm
import openmm.app as app
import openmm.unit as unit

# create an alanine-dipeptide test system
pdb = app.PDBFile('alanine-dipeptide.pdb')
forcefield = app.ForceField('amber14-all.xml')
system = forcefield.createSystem(pdb.topology, constraints=None)

# Remove MM forces
while system.getNumForces() > 0:
  system.removeForce(0)

# The system should not contain any additional force and constraints
assert system.getNumConstraints() == 0
assert system.getNumForces() == 0


# Get the list of atomic numbers. We will need this when creating the model instance
atomic_numbers = [atom.element.atomic_number for atom in pdb.topology.atoms()]

### Define the NNP
<a id="nnp"></a>

We can create a NNP class that uses ANI-2x.

In [None]:
import torch
from torchani.models import ANI2x

class NNP(torch.nn.Module):

  def __init__(self, atomic_numbers: torch.Tensor):

    super().__init__()

    # Store the atomic numbers
    self.atomic_numbers = atomic_numbers.unsqueeze(0)

    # Create an ANI-2x model
    self.model = ANI2x(periodic_table_index=True)

    # make sure it is on the same device as the atomic_numbers tensor
    self.model.to(self.atomic_numbers.device)

  def forward(self, positions: torch.Tensor):

    # Prepare the positions
    positions = positions.unsqueeze(0).float() * 10 # nm --> Å

    # Run ANI-2x
    result = self.model((self.atomic_numbers, positions))

    # Get the potential energy
    energy = result.energies[0] * 2625.5 # Hartree --> kJ/mol

    return energy


The `NNP` looks rather complex so we will break it down line by line.

The first line
```python
class NNP(torch.nn.Module):
```
defines a Python [class](https://docs.python.org/3/tutorial/classes.html) called `NNP` that inherits from the [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class.

```python
def __init__(self, atomic_numbers: torch.Tensor):
```
is the constructor definition. It states that when we create a new instance of the class we must pass a `torch.Tensor` of the atomic numbers of the system as an argument. All the code within the constructor is called when the NNP is first created.

```python
  super().__init__()
```
Calls the constructor of the parent class (`torch.nn.Module`).

```python
# Store the atomic numbers
  self.atomic_numbers = atomic_numbers.unsqueeze(0)
```
Stores the provided `atomic_numbers` tensor as an attribute of the class. The `unsqueeze(0)` converts the tensor from 1D with size (N) to 2D with size (1,N). This is known as adding a batch dimension. This needs to be done because the model we will use expects batched data (even if the batch size is 1).

```python
  # Create an ANI-2x model
  self.model = ANI2x(periodic_table_index=True)
```
This creates an instance of an ANI-2x model from the [TorchANI](https://aiqm.github.io/torchani/api.html#module-torchani.models) package.

```python
  # make sure it is on the same device as the atomic_numbers tensor
  self.model.to(self.atomic_numbers.device)
```

This makes sure the model is on the same device as the atomic_numbers tensor. The [device](https://pytorch.org/docs/stable/tensor_attributes.html#torch.device) is either 'cpu' or 'cuda' (GPU). 

```python
def forward(self, positions: torch.Tensor):
```
This defines the [forward method](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward). The forward method is the code that gets called every time the model is evaluated. We define that a `torch.Tensor` of the atomic positions must be passed as an argument when we evaluate the model.

```python
    # Prepare the positions
    positions = positions.unsqueeze(0).float() * 10 # nm --> Å
```
This adds a batch dimension to the positions tensor, converts it to single precision (32-bit floating point; it might be in double precision if OpenMM is running in double precision), and converts the units from the OpenMM default of nm to Angstrom as required by the ANI model.


```python
    # Run ANI-2x
    result = self.model((self.atomic_numbers, positions))

    energy = result.energies[0] * 2625.5 # Hartree --> kJ/mol
```
This evaluates the ANI-2x model on the atomic configuration. The result will contain the total potential energy. We need to convert from the ANI units of Hartree to the OpenMM units of kJ/mol.


Implementing other NNPs will follow a similar format to this. The key point is that you must convert between OpenMM format and the format the model expects.
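
To make the unit bookkeeping concrete, the plain-Python sketch below applies the two conversion factors used in `NNP.forward` (10 Å per nm and 2625.5 kJ/mol per Hartree) and derives the combined factor that applies to forces. The function and constant names are illustrative only.

```python
NM_TO_ANGSTROM = 10.0           # OpenMM lengths are in nm, ANI expects Angstrom
HARTREE_TO_KJ_PER_MOL = 2625.5  # ANI energies are in Hartree, OpenMM expects kJ/mol


def to_model_length(x_nm):
    """Convert a length from OpenMM units (nm) to model units (Angstrom)."""
    return x_nm * NM_TO_ANGSTROM


def to_openmm_energy(e_hartree):
    """Convert an energy from model units (Hartree) to OpenMM units (kJ/mol)."""
    return e_hartree * HARTREE_TO_KJ_PER_MOL


# A force in Hartree/Angstrom becomes kJ/mol/nm by multiplying both factors:
# (kJ/mol per Hartree) * (Angstrom per nm)
FORCE_FACTOR = HARTREE_TO_KJ_PER_MOL * NM_TO_ANGSTROM

print(to_model_length(0.15))   # a 0.15 nm bond length expressed in Angstrom
print(to_openmm_energy(1.0))   # one Hartree expressed in kJ/mol
print(FORCE_FACTOR)
```

When the forces come from Autograd through `forward`, this combined factor is applied automatically by the chain rule, because both conversions are part of the differentiated computation. A separate force conversion is only needed if a model returns its own forces in its native units.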

### Create NNP
<a id="creatennp"></a>

We can now create an instance of the NNP, making sure to use the GPU ('cuda' device) if available. If we print the model we can see the underlying neural network architecture.

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
nnp = NNP(torch.tensor(atomic_numbers,device=device))

print(nnp)

#https://github.com/aiqm/torchani/issues/628
torch._C._jit_set_nvfuser_enabled(False)

We can now compute the potential energy of the system using the PyTorch interface. We can also compute the forces using autograd.

In [None]:
# Need to make a torch.tensor of the positions. Require grad so we can compute the forces.
positions = torch.tensor(pdb.positions.value_in_unit(unit.nanometers), device=device, requires_grad=True)

# put the positions into the NNP and it returns the energy
energy = nnp(positions)

print(energy)

# we can compute the forces using autograd
energy.backward()
force = -positions.grad

print(force)

### Add the NNP to the system
<a id="addnnp"></a>

We now export the model and load it with `TorchForce`.

In [None]:
from openmmtorch import TorchForce
import sys

# Save the NNP to a file and load it with OpenMM-Torch
torch.jit.script(nnp).save('model.pt')
torchforce = TorchForce('model.pt')

# Add the NNP to the system
system.addForce(torchforce)

print("number of forces = ", system.getNumForces())
assert(system.getNumForces()==1)

**Note:** There should be 1 force in the system. If the assertion above fails it might be because you have run the cell multiple times. You must go back and run the ["prepare a test system"](#prepare) cell to create the system with no forces!

### Create a simulation

In [None]:
# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

simulation.reporters.append(app.PDBReporter('traj.pdb', 100))

# Configure a reporter to print to the console every 100 steps
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)

Now we can compute the energy and forces again but using the OpenMM interface. We compare them to the energy and forces computed from the PyTorch interface. This is a good check to do because sometimes bugs can arise during the export/serialization and load step.

In [None]:
state = simulation.context.getState(getEnergy=True, getForces=True)
openmm_energy = state.getPotentialEnergy().value_in_unit(unit.kilojoule_per_mole)
openmm_force  =  state.getForces(asNumpy=True).value_in_unit(unit.kilojoule_per_mole/unit.nanometer)

print(openmm_energy)
print(openmm_force)
import numpy as np

assert(np.isclose(openmm_energy, energy.cpu().detach().numpy()))
assert(np.allclose(openmm_force, force.cpu().detach().numpy(),rtol=1e-3))

### Run the simulation

In [None]:
simulation.step(1000)

**Exercise 1.** Download the "traj.pdb" file and visualize it. You should see the alanine-dipeptide molecule moving around. 
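
For orientation when viewing the trajectory, the amount of simulated time and the number of saved frames follow directly from the settings used above (0.5 fs time step, 1000 steps, a frame every 100 steps). A small arithmetic sketch, with variable names chosen here for illustration:

```python
time_step_fs = 0.5      # integrator time step in femtoseconds
n_steps = 1000          # number of steps passed to simulation.step
report_interval = 100   # PDBReporter interval in steps

# 1 ps = 1000 fs
simulated_time_ps = n_steps * time_step_fs / 1000.0
# One frame is written each time the step count reaches a multiple of the interval
n_frames = n_steps // report_interval

print(simulated_time_ps, n_frames)
```

This is a very short trajectory, so expect to see bond vibrations and small torsional motions rather than large conformational changes.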

## Mixed system
<a id="mixedsystem"></a>

In general ML forcefields are still too computationally slow to be used to model entire solvated biomolecules. However, a use-case to exploit their accuracy without the prohibitive cost is to model a small part of the system with the MLP and the rest of the system with a traditional MM forcefield [[1, 2]](#references).
For example a ligand's intramolecular energy could be modelled with the MLP and the rest of the system, including intermolecular interactions between ligand and the protein/solvent, would be modelled with a MM forcefield. This approach is similar to hybrid QM/MM methods.

Our example system will be the alanine-dipeptide in a water box.
We will model the alanine-dipeptide intramolecular forces with the ANI-2x MLP, and the water molecules will use an MM forcefield. The intermolecular interactions between the alanine-dipeptide and the water will also use the MM forcefield.

The total potential energy of the mixed system can be written as
$V_{MM/ML}(r) = V_{MM}(r_{MM}) + V_{MM-ML}(r) + V_{ML}(r_{ML})$ ,
where $r$ are the coordinates of all atoms, $r_{MM}$ are the coordinates of just the atoms in the MM region, and $r_{ML}$ are the coordinates of just the atoms in the ML region. The three terms are:

  - $V_{MM}(r_{MM})$ - The potential energy of the MM region (water molecules) using the MM forcefield.
  - $V_{MM-ML}(r)$ - The coupling term between the MM and ML regions. We will define this to compute the non-bonded intermolecular interactions between the ML region and MM region atoms using the MM forcefield.
  - $V_{ML}(r_{ML})$ - The intramolecular potential energy of the ML region (alanine-dipeptide) using the ML forcefield.

**Note:** The mixed system strategy we have outlined, and will implement, is only appropriate when the ML region is a whole single molecule.
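
The decomposition can be illustrated with a toy calculation that shows which atom pairs contribute to each term. Everything in this sketch is a stand-in: the 1D coordinates, the inverse-distance "MM" pair term, and the `ml_energy` function are hypothetical and chosen only to keep the example short.

```python
from itertools import combinations


def mm_pair_energy(i, j, coords):
    """Stand-in MM pair term (hypothetical): inverse distance in 1D."""
    return 1.0 / abs(coords[i] - coords[j])


def ml_energy(ml_coords):
    """Stand-in for an ML model evaluated on the ML region only (hypothetical)."""
    return -sum(abs(a - b) for a, b in combinations(ml_coords, 2))


coords = [0.0, 1.0, 2.5, 4.0, 6.0]  # 1D toy coordinates
ml_atoms = {0, 1}                   # atoms treated with the ML model
pairs = list(combinations(range(len(coords)), 2))

# V_MM: pairs with both atoms in the MM region
v_mm = sum(mm_pair_energy(i, j, coords) for i, j in pairs
           if i not in ml_atoms and j not in ml_atoms)

# V_MM-ML: pairs with exactly one atom in the ML region, still evaluated with MM
v_coupling = sum(mm_pair_energy(i, j, coords) for i, j in pairs
                 if (i in ml_atoms) != (j in ml_atoms))

# V_ML: the ML model on the ML-region coordinates; it replaces the MM terms
# between pairs of ML atoms, which are therefore never evaluated
v_ml = ml_energy([coords[i] for i in sorted(ml_atoms)])

v_total = v_mm + v_coupling + v_ml
print(v_mm, v_coupling, v_ml, v_total)
```

The important detail is that no MM term is computed for pairs inside the ML region, so those interactions are not double counted.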

### Creating the system
<a id="createmixed"></a>

We will first create a MM system as normal. Then we will define a function that modifies it to use the hybrid potential described above.

In [None]:
import openmm as mm
import openmm.app as app
import openmm.unit as unit

# create an alanine-dipeptide test system
pdb = app.PDBFile('alanine-dipeptide.pdb')
forcefield = app.ForceField('amber14-all.xml', 'amber14/tip3p.xml')
modeller = app.Modeller(pdb.topology, pdb.positions)
modeller.addSolvent(forcefield, padding=1.0*unit.nanometers)
system = forcefield.createSystem(modeller.topology, nonbondedMethod=app.PME, constraints=None)

Now we create a function that will remove the MM interactions within the ML region.

In [None]:
from section_3_utils import removeBonds

def removeMMInteraction(topology, system, ml_atoms):

  # remove the bonded interactions within the ML subset
  newSystem = removeBonds(system, ml_atoms)

  # Add nonbonded exceptions and exclusions.
  # This removes the nonbonded interactions between the ML atoms.
  # atomList holds system-wide atom indices, so we pass atomList[i] and
  # atomList[j] (not the loop counters) to OpenMM. This matters whenever
  # the ML atoms are not the first atoms in the system.
  atomList = list(ml_atoms)
  for force in newSystem.getForces():
      if isinstance(force, mm.NonbondedForce):
          for i in range(len(atomList)):
              for j in range(i):
                  force.addException(atomList[i], atomList[j], 0, 1, 0, True)
      elif isinstance(force, mm.CustomNonbondedForce):
          existing = set(tuple(force.getExclusionParticles(i)) for i in range(force.getNumExclusions()))
          for i in range(len(atomList)):
              a1 = atomList[i]
              for j in range(i):
                  a2 = atomList[j]
                  if (a1, a2) not in existing and (a2, a1) not in existing:
                      force.addExclusion(a1, a2, True)

  return newSystem
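
The double loop in `removeMMInteraction` visits each unordered pair of ML atoms exactly once, so an ML region of $N$ atoms produces $N(N-1)/2$ exceptions. A minimal sketch of that enumeration using `itertools.combinations`, assuming a 22-atom ML region (the size expected for alanine dipeptide):

```python
from itertools import combinations

# Assumption: a 22-atom ML region, e.g. alanine dipeptide
ml_atoms = list(range(22))

# Every unordered pair of ML atoms appears exactly once
pairs = list(combinations(ml_atoms, 2))

n = len(ml_atoms)
print(len(pairs), n * (n - 1) // 2)
```

The number of pairs grows quadratically with the size of the ML region, which is unproblematic for small molecules such as ligands.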


### Create the MLP for a mixed system
<a id="createmlp"></a>

We will create an ANI-2x MLP as before but add in an extra argument that lists the atoms in the ML region.

**Exercise 2**. You will need to add the code that stores the atomic numbers and the code that creates the ANI-2x model. Tip: this part is the same as the previous example.

In [None]:
import torch
from torchani.models import ANI2x

class hybridNNP(torch.nn.Module):

  def __init__(self, atomic_numbers: torch.Tensor, ml_atoms: torch.Tensor):

    super().__init__()

    # The atomic_numbers tensor contains the atomic number of the ML region atoms.
    # the ml_atoms tensor contains the index of each ML atom with respect to the full system.
    assert(atomic_numbers.shape == ml_atoms.shape)

    # Store the indices of the ml atoms
    self.indices = ml_atoms

    # Store the atomic numbers
    FIXME

    # Create an ANI-2x model
    FIXME

    # make sure it is on the same device as the atomic_numbers tensor
    self.model.to(self.atomic_numbers.device)

  def forward(self, positions: torch.Tensor):

    # extract the positions of the ML atoms
    positions = positions[self.indices]

    # Prepare the positions
    positions = positions.unsqueeze(0).float() * 10 # nm --> Å

    # Run ANI-2x
    result = self.model((self.atomic_numbers, positions))

    # Get the potential energy
    energy = result.energies[0] * 2625.5 # Hartree --> kJ/mol

    return energy


Now we can create an instance of the MLP and add it to the system.

**Exercise 3.** You will need to add the `TorchForce` to the system.

In [None]:
# get a list of the ML atoms. The alanine-dipeptide is chain 0.
chains = list(modeller.topology.chains())
ml_atoms = [atom.index for atom in chains[0].atoms()]
print(ml_atoms)

# get the atomic numbers
atomic_numbers = [atom.element.atomic_number for atom in chains[0].atoms()]

# convert to torch tensors
device = 'cuda' if torch.cuda.is_available() else 'cpu'
ml_atoms = torch.tensor(ml_atoms, device=device, dtype=torch.int64)
atomic_numbers = torch.tensor(atomic_numbers, device=device)

hybridnnp = hybridNNP(atomic_numbers, ml_atoms)

#https://github.com/aiqm/torchani/issues/628
torch._C._jit_set_nvfuser_enabled(False)

# Save the NNP to a file and load it with OpenMM-Torch
torch.jit.script(hybridnnp).save('mixed_model.pt')

from openmmtorch import TorchForce
torchforce = TorchForce('mixed_model.pt')

# make the mixed system
mixed_system = removeMMInteraction(modeller.topology, system, ml_atoms.tolist())

# add the TorchForce
FIXME

# print out the forces
for force in mixed_system.getForces():
    print(force)

assert(mixed_system.getNumForces()==6)

### Simulate

In [None]:
# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(modeller.topology, mixed_system, integrator)
simulation.context.setPositions(modeller.positions)

simulation.minimizeEnergy(maxIterations=100)

# Configure a reporter to print to the console every 100 steps
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)

simulation.reporters.append(app.PDBReporter('mixed_traj.pdb', 100, enforcePeriodicBox=False))

simulation.step(1000)

**Exercise 4.** Visualize "mixed_traj.pdb"

## Using the OpenMM-ML package
<a id="openmmml"></a>

We have covered how to use openmm-torch to add an MLP to a system. As you have seen, it involves quite a lot of code. The [OpenMM-ML](https://github.com/openmm/openmm-ml) package was created to be a high level interface for people to use pre-trained ML models in their OpenMM simulations. We will now run the same simulations using openmm-ml.

### Install software
The openmm-ml package can be installed from [conda-forge](https://anaconda.org/conda-forge/openmm-ml)

In [None]:
!mamba install -y -c conda-forge openmm-ml

### Create a pure ML system

We will load in the alanine-dipeptide molecule and simulate it in vacuum with ANI-2x using the OpenMM-ML `MLPotential.createSystem` function.

In [None]:
import openmm as mm
import openmm.app as app
import openmm.unit as unit

from openmmml import MLPotential

pdb = app.PDBFile('alanine-dipeptide.pdb')

# create the MLP using ANI-2x
potential = MLPotential('ani2x')

# create a system that uses the MLP
system = potential.createSystem(pdb.topology)


# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

# Configure a reporter to print to the console every 100 steps
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)

simulation.reporters.append(app.PDBReporter('traj.pdb', 100))

simulation.step(1000)


You can safely ignore the error message that says `failed to equip 'nnpops' with error: No module named 'NNPOps'`.


### Create a mixed system
We can just as easily create a mixed system.

**Exercise 5.** You will need to write the code to create the potential using `MLPotential`. Tip: look at the OpenMM-ML README: https://github.com/openmm/openmm-ml

In [None]:
import openmm as mm
import openmm.app as app
import openmm.unit as unit
import sys

from openmmml import MLPotential

pdb = app.PDBFile('alanine-dipeptide.pdb')

forcefield = app.ForceField('amber14-all.xml', 'amber14/tip3p.xml')
modeller = app.Modeller(pdb.topology, pdb.positions)
modeller.addSolvent(forcefield, padding=1.0*unit.nanometers)


# create the MM system
mm_system = forcefield.createSystem(modeller.topology, nonbondedMethod=app.PME, constraints=None)

# create the MLP using ANI-2x
FIXME


# create the mixed system
chains = list(modeller.topology.chains())
ml_atoms = [atom.index for atom in chains[0].atoms()]
mixed_system = potential.createMixedSystem(modeller.topology, mm_system, ml_atoms)


# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(modeller.topology, mixed_system, integrator)
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy(maxIterations=100)

# Configure a reporter to print to the console every 100 steps
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)

simulation.reporters.append(app.PDBReporter('traj.pdb', 100, enforcePeriodicBox=False))

simulation.step(1000)

## Using NNPOps
<a id="nnpops"></a>

The NNPOps package provides highly optimized, open source implementations of bottleneck operations that appear in popular potentials. It can be used to speed up ANI simulations. We can install it from conda-forge (only on Linux) and use it through the OpenMM-ML interface.

**Note:** NNPOps is only available on Linux.

In [None]:
!mamba install -y -c conda-forge nnpops

If you are using a GPU it should offer you some speed up. We will use the script from before and specify to openmm-ml that it should use NNPOps.

In [None]:
import openmm as mm
import openmm.app as app
import openmm.unit as unit

from openmmml import MLPotential

pdb = app.PDBFile('alanine-dipeptide.pdb')

# create the MLP using ANI-2x
# Use NNPOps
potential = MLPotential('ani2x', implementation='nnpops')

# create a system that uses the MLP
system = potential.createSystem(pdb.topology)


# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions
simulation = app.Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

# Configure a reporter to print to the console every 100 steps
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)

simulation.step(1000)

The `implementation='nnpops'` argument tells `MLPotential` that it should swap out some of the TorchANI PyTorch functions for the optimized NNPOps versions. If you want to see how this is done in the code, take a look at the [GitHub repo](https://github.com/openmm/NNPOps).

## Implementing other models - MACE
<a id="mace"></a>


Any model that can be written as a PyTorch model can be used with openmm-torch. We will show how to implement a MACE model.

Please also take a look at the [MACE documentation](https://mace-docs.readthedocs.io/en/latest/guide/openmm.html) about their `mace-md` OpenMM interface.

### Install software
MACE can be installed with pip.

In [None]:
!pip install git+https://github.com/ACEsuit/mace

### Get a pretrained model

In [None]:
!wget https://github.com/ACEsuit/mace/raw/docs/docs/examples/ANI_trained_MACE.zip
!unzip ANI_trained_MACE.zip

### Define the MLP
We will create a MACE MLP class as we did before for ANI-2x. The MACE model requires some extra code to convert the atomic numbers and positions into the required format. Some of this code has been put in the `section_3_utils.py` module.
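
As a rough illustration of what "the required format" involves, the pure-Python sketch below builds two of the graph inputs for a tiny non-periodic system: a one-hot encoding of each atom's element, and a list of directed edges between atoms closer than a cutoff. The function names `one_hot_elements` and `edges_within_cutoff`, the element table, and the coordinates are all hypothetical. This is not the code in `mace.tools` or `section_3_utils.py`, which work on tensors and also handle periodic boxes.

```python
import math


def one_hot_elements(atomic_numbers, z_table):
    """One-hot encode each atomic number against the model's element table."""
    rows = []
    for z in atomic_numbers:
        index = z_table.index(z)  # raises ValueError for unsupported elements
        rows.append([1.0 if i == index else 0.0 for i in range(len(z_table))])
    return rows


def edges_within_cutoff(positions, r_max):
    """Return two index lists describing directed pairs closer than r_max (no PBC)."""
    first, second = [], []
    for i, pos_i in enumerate(positions):
        for j, pos_j in enumerate(positions):
            if i != j and math.dist(pos_i, pos_j) < r_max:
                first.append(i)
                second.append(j)
    return first, second


# A water-like toy geometry in Angstrom (illustrative coordinates only).
atomic_numbers = [8, 1, 1]
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
z_table = [1, 6, 7, 8]  # elements an assumed model was trained on

node_attrs = one_hot_elements(atomic_numbers, z_table)
edge_index = edges_within_cutoff(positions, r_max=1.2)

print(node_attrs)
print(edge_index)
```

With this cutoff the two hydrogens are each connected to the oxygen but not to each other, so every O-H pair appears twice, once in each direction. The double loop scales quadratically with the number of atoms, which is fine for a sketch but not for large systems.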

In [None]:
import torch
from section_3_utils import simple_nl
from e3nn.util import jit
from mace.tools import utils, to_one_hot, atomic_numbers_to_indices
from typing import Optional

class MACEForce(torch.nn.Module):
  def __init__(self, model_path, atomic_numbers, indices, periodic, device, dtype=torch.float64):
      super().__init__()

      if device is None: # use cuda if available
          self.device=torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

      else: # unless user has specified the device
          self.device=torch.device(device)

      self.default_dtype = dtype
      torch.set_default_dtype(self.default_dtype)

      print("Running MACEForce on device: ", self.device, " with dtype: ", self.default_dtype)

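      # conversion constants (OpenMM uses nm and kJ/mol; this class assumes the model uses Angstrom and eV)
      self.nm_to_distance = 10.0 # nm -> Angstrom
      self.distance_to_nm = 0.1 # Angstrom -> nm
      self.energy_to_kJ = 96.49 # eV -> kJ/mol (rounded from 96.485)

      # load onto self.device so that the device=None fallback above is respected
      self.model = torch.load(model_path, map_location=self.device)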
      self.model.to(self.default_dtype)
      self.model.eval()


      self.r_max = self.model.r_max
      self.z_table = utils.AtomicNumberTable([int(z) for z in self.model.atomic_numbers])

      self.model = jit.compile(self.model)

      # setup input
      N=len(atomic_numbers)
      self.ptr = torch.tensor([0,N],dtype=torch.long, device=self.device)
      self.batch = torch.zeros(N, dtype=torch.long, device=self.device)

      # one hot encoding of atomic number
      self.node_attrs = to_one_hot(
              torch.tensor(atomic_numbers_to_indices(atomic_numbers, z_table=self.z_table), dtype=torch.long, device=self.device).unsqueeze(-1),
              num_classes=len(self.z_table),
          )

      if periodic:
          self.pbc=torch.tensor([True, True, True], device=self.device)
      else:
          self.pbc=torch.tensor([False, False, False], device=self.device)

      if indices is None:
          self.indices = None
      else:
          self.indices = torch.tensor(indices, dtype=torch.int64)



  def forward(self, positions, boxvectors: Optional[torch.Tensor] = None):
      # setup positions

      positions = positions.to(device=self.device,dtype=self.default_dtype)
      if self.indices is not None:
          positions = positions[self.indices]

      positions = positions*self.nm_to_distance

      if boxvectors is not None:
          cell = boxvectors.to(device=self.device,dtype=self.default_dtype) * self.nm_to_distance
          pbc = True
      else:
          cell = torch.eye(3, device=self.device)
          pbc = False

      mapping, shifts_idx = simple_nl(positions, cell, pbc, self.r_max)

      edge_index = torch.stack((mapping[0], mapping[1]))

      shifts = torch.mm(shifts_idx, cell)

      # create input dict
      input_dict = { "ptr" : self.ptr,
                    "node_attrs": self.node_attrs,
                    "batch": self.batch,
                    "pbc": self.pbc,
                    "cell": cell,
                    "positions": positions,
                    "edge_index": edge_index,
                    "unit_shifts": shifts_idx,
                    "shifts": shifts}

      # predict
      out = self.model(input_dict,compute_force=False)

      energy = out["interaction_energy"]
      if energy is None:
          energy = torch.tensor(0.0, device=self.device)

      # return energy
      energy = energy*self.energy_to_kJ

      return energy
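
The conversion constants in `MACEForce` exist because OpenMM passes positions in nanometres and expects energies in kJ/mol, while the class treats the model as working in Ångström and eV. As a sanity check, the eV to kJ/mol factor can be recomputed from the exact SI values of the elementary charge and the Avogadro constant. The variable names below are illustrative only.

```python
# Exact SI defining constants (2019 redefinition of the SI).
ELEMENTARY_CHARGE_C = 1.602176634e-19  # also the number of joules in 1 eV
AVOGADRO_PER_MOL = 6.02214076e23       # particles per mole

# 1 eV per particle -> J/mol -> kJ/mol
ev_to_kj_per_mol = ELEMENTARY_CHARGE_C * AVOGADRO_PER_MOL / 1000.0
print(f"1 eV = {ev_to_kj_per_mol:.4f} kJ/mol")

# Length conversions applied to positions and box vectors.
nm_to_angstrom = 10.0
angstrom_to_nm = 1.0 / nm_to_angstrom

# Round trip for an example bond length: nm -> Angstrom -> nm
length_nm = 0.15
length_angstrom = length_nm * nm_to_angstrom
print(length_angstrom, length_angstrom * angstrom_to_nm)
```

The computed factor is about 96.485 kJ/mol per eV, which the class rounds to 96.49.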


### Use the MACE MLP

The rest of the code is then similar to using the ANI-2x MLP via the openmm-torch interface.

In [None]:
import sys

import torch
import openmm as mm
import openmm.app as app
import openmm.unit as unit
from openmmtorch import TorchForce

pdb = app.PDBFile('alanine-dipeptide.pdb')
forcefield = app.ForceField('amber14-all.xml')
system = forcefield.createSystem(pdb.topology, constraints=None)
# Remove MM forces
while system.getNumForces() > 0:
  system.removeForce(0)
# The system should now contain no forces and no constraints
assert system.getNumConstraints() == 0
assert system.getNumForces() == 0

# Get the list of atomic numbers.
atomic_numbers = [atom.element.atomic_number for atom in pdb.topology.atoms()]

# Create the MACE MLP
model_path = "ANI_trained/ani500k_small_DFT.model"
pbc = False
indices = None
device = 'cuda' if torch.cuda.is_available() else 'cpu'

mace_mlp = MACEForce(model_path, atomic_numbers, indices, pbc, device)

# export it with torchscript
torch.jit.script(mace_mlp).save('macemodel.pt')

# load it in with torchforce
torchforce = TorchForce('macemodel.pt')

# add it to the system
system.addForce(torchforce)


# Create an integrator with a time step of 0.5 fs
temperature = 298.15 * unit.kelvin
frictionCoeff = 1 / unit.picosecond
timeStep = 0.5 * unit.femtosecond
integrator = mm.LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

# Create a simulation and set the initial positions and velocities
simulation = app.Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

# Configure a reporter to print to the console every 100 steps
reporter = app.StateDataReporter(file=sys.stdout, reportInterval=100, step=True, time=True, potentialEnergy=True, temperature=True, speed=True)
simulation.reporters.append(reporter)

simulation.reporters.append(app.PDBReporter('mace_traj.pdb', 100))

simulation.step(1000)

## Extra exercises
<a id="extra"></a>

**Exercise 6.** Use one of the other pretrained MACE models in the ANI_trained_MACE.zip file.

**Exercise 7.** Increase the size of the waterbox in the mixed system. Measure the performance (ns/day) to find out how many MM atoms there need to be before the speed of the MM part becomes significant compared to the speed of the ML part.

**Exercise 8. (Hard)** Make a mixed system using the MACE MLP.

**Exercise 9. (Hard)** The openmm-ml `createMixedSystem` function can create a mixed system in which the ML region is interpolated between its MM and ML representations by a lambda value. Look at the [API documentation](https://github.com/openmm/openmm-ml/blob/d5120bd1fe8cd7330bb3169f3549fd2d550d4c39/openmmml/mlpotential.py#L181) in the source code and try to use this functionality.

## References
<a id="references"></a>

[1] Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning / molecular mechanics potentials,
Dominic A. Rufa, Hannah E. Bruce Macdonald, Josh Fass, Marcus Wieder, Patrick B. Grinaway, Adrian E. Roitberg, Olexandr Isayev, John D. Chodera,
bioRxiv 2020.07.29.227959; doi: https://doi.org/10.1101/2020.07.29.227959

[2] NNP/MM: Fast molecular dynamics simulations with machine learning potentials and molecular mechanics,
Raimondas Galvelis, Alejandro Varela-Rial, Stefan Doerr, Roberto Fino, Peter Eastman, Thomas E. Markland, John D. Chodera, Gianni De Fabritiis,
arXiv:2201.08110; doi: https://doi.org/10.48550/arXiv.2201.08110

[3] Xiang Gao, Farhad Ramezanghorbani, Olexandr Isayev, Justin S. Smith, and Adrian E. Roitberg, J. Chem. Inf. Model. 60, 7, 3408–3415 (2020), https://doi.org/10.1021/acs.jcim.0c00451 | https://aiqm.github.io/torchani/

[4] A. P. Bartók, M. C. Payne, R. Kondor, G. Csányi, Physical Review Letters 104 (13), 136403 (2010), https://link.aps.org/doi/10.1103/PhysRevLett.104.136403

[5] I. Batatia, D. P. Kovacs, G. Simm, C. Ortner, and G. Csányi, Advances in Neural Information Processing Systems 35, 11423 (2022), https://github.com/ACEsuit/mace

[6] P. Thölke, G. De Fabritiis, International Conference on Learning Representations, 2021, https://doi.org/10.48550/arXiv.2202.02541





## Solutions

*exercise 2*
```python
    # Store the atomic numbers
    self.atomic_numbers = atomic_numbers.unsqueeze(0)

    # Create an ANI-2x model
    self.model = ANI2x(periodic_table_index=True)
```

*exercise 3*
```python
mixed_system.addForce(torchforce)
```

*exercise 5*
```python
# create the MLP using ANI-2x
potential = MLPotential('ani2x')
```