<a href="https://colab.research.google.com/github/gabriele16/nequip/blob/main/mod_nequip_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Molecular Dynamics with NequIP 

### Authors: Simon Batzner, Albert Musaelian, Lixin Sun, Anders Johansson, Boris Kozinsky

<img src="https://github.com/mir-group/nequip_mrs_tutorial/blob/master/nequip3.png?raw=true" width="60%">

## What is this? 

This is a Colab tutorial for NequIP, a software package for building highly accurate machine learning interatomic potentials. The underlying ideas are described in the paper linked below. We have released NequIP as open-source software, available at the GitHub link below, with the goal of building NequIP potentials with a few simple commands. This tutorial serves as a simple introduction to the NequIP code.

The goal of NequIP is to be as simple as possible. You do not have to write a single line of Python: you can train a network with a single command and then run MD with it in LAMMPS or ASE.

Paper: https://www.nature.com/articles/s41467-022-29939-5

Code: https://github.com/mir-group/nequip

## Contents

This tutorial will teach you how to:

* Train a model 
* Deploy the model into production
* Run MD with it in LAMMPS

We will do all this in this Colab, including LAMMPS. Training + inference will take only about 10 minutes. Before you get started, however, you will have to compile LAMMPS which takes approximately 5 minutes. Once we have installed NequIP + LAMMPS, we're ready to get started. 


## Before you begin 🛑

1. Save a copy of this colab in your own drive 
2. Run the first two code cells below to install NequIP and the Molecular Dynamics code LAMMPS 

## Now you're ready to get started :) ✅

In [None]:
!pip install torch==1.10

### Turn on GPU

Make sure Runtime --> Change runtime type is set to GPU
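
A quick way to confirm that the runtime setting took effect is to ask PyTorch whether it can see a CUDA device. The sketch below is a minimal check and not part of NequIP; the helper name `gpu_available` is made up for this illustration, and it assumes the `torch` install from the cell above has finished.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    # Guard the import so the check also works before torch is installed.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


print("GPU available:", gpu_available())
```

If this prints `False` on Colab, the runtime type is most likely still set to CPU.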

In [None]:
%%capture
# install wandb
!pip install wandb
# install nequip
!git clone --depth 1 "https://github.com/mir-group/nequip.git"
!pip install nequip/
# fix colab imports
import site
site.main()
# set to allow anonymous WandB
import os
os.environ["WANDB_ANONYMOUS"] = "must"
import numpy as np
import torch 
from ase.io import read, write
np.random.seed(0)
torch.manual_seed(0)

In [None]:
# compile lammps
!git clone -b "stable_29Sep2021_update2" --depth 1 "https://github.com/lammps/lammps.git"
!git clone https://github.com/mir-group/pair_nequip
!cd pair_nequip && ./patch_lammps.sh /content/lammps/
!cp /content/pair_nequip/*.cpp /content/lammps/src/
!cp /content/pair_nequip/*.h /content/lammps/src/
! sed -i 's/CMAKE_CXX_STANDARD 11/CMAKE_CXX_STANDARD 14/g'  /content/lammps/cmake/CMakeLists.txt
!pip install mkl mkl-include
!cd lammps && mkdir -p build && cd build && cmake ../cmake -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` && make -j4

Cloning into 'lammps'...
remote: Enumerating objects: 11732, done.
remote: Counting objects: 100% (11732/11732), done.
remote: Compressing objects: 100% (8603/8603), done.
remote: Total 11732 (delta 3943), reused 6318 (delta 2930), pack-reused 0
Receiving objects: 100% (11732/11732), 110.00 MiB | 13.25 MiB/s, done.
Resolving deltas: 100% (3943/3943), done.
Note: checking out '7586adbb6a61254125992709ef2fda9134cfca6c'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

Checking out files: 100% (11058/11058), done.
Cloning into 'pair_nequip'...
remote: Enumerating objects: 418, done.
remote: Counting objects: 100% (116/116), d
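
Since the build takes several minutes, it can be worth confirming that it actually produced an executable before moving on. The sketch below assumes the binary ends up at `lammps/build/lmp`, the path this tutorial uses later when launching LAMMPS; the helper name `lammps_binary_ready` is hypothetical.

```python
import os
from pathlib import Path


def lammps_binary_ready(path: str = "lammps/build/lmp") -> bool:
    """Return True if `path` is an existing file that the current user can execute."""
    binary = Path(path)
    return binary.is_file() and os.access(binary, os.X_OK)


if not lammps_binary_ready():
    print("LAMMPS binary not found: check the build output for errors.")
```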

## 3 Steps: 
* Train: using a data set, train the neural network 🧠 
* Deploy: convert the Python-based model into a stand-alone potential file for fast execution ⚡
* Run: run Molecular Dynamics, Monte Carlo, Structural Minimization, ...  with it in LAMMPS 🏃

<img src="https://github.com/mir-group/nequip_mrs_tutorial/blob/master/all.png?raw=true" width="60%">

### Train a model

<img src="https://github.com/mir-group/nequip_mrs_tutorial/blob/master/train.png?raw=true" width="60%">

This tutorial is set up to use `wandb` in anonymous mode; when you use NequIP yourself you will be presented with a login prompt.

Here, we will train a NequIP potential on the following system

* Toluene
* sampled at T=500K from AIMD
* at CCSD(T) accuracy (gold standard of quantum chemistry)
* Using 100 training configurations
* The reference data use kcal/mol for energies and A for lengths. If you are more familiar with eV, remember that 1 kcal/mol, commonly taken as the threshold for chemical accuracy, is approximately 43 meV
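
The conversion factor quoted in the last bullet can be derived from exact physical constants. The snippet below is a small sketch of that arithmetic; the constant and function names are chosen for this example only. The same factor converts forces from kcal/mol/A to meV/A, since the length unit is unchanged.

```python
# Exact SI definitions plus the thermochemical calorie (1 kcal = 4184 J).
AVOGADRO = 6.02214076e23             # particles per mol
ELEMENTARY_CHARGE = 1.602176634e-19  # joule per eV
JOULE_PER_KCAL = 4184.0              # joule per kcal

# Energy per particle corresponding to 1 kcal/mol, expressed in meV.
MEV_PER_KCAL_MOL = JOULE_PER_KCAL / AVOGADRO / ELEMENTARY_CHARGE * 1000.0


def kcal_mol_to_mev(value: float) -> float:
    """Convert an energy in kcal/mol to meV (per particle)."""
    return value * MEV_PER_KCAL_MOL


print(f"1 kcal/mol = {kcal_mol_to_mev(1.0):.2f} meV")  # 1 kcal/mol = 43.36 meV
```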

Start a training run: this will print output to our console, but it is usually more convenient to view the results in a web interface called Weights and Biases. Click the link next to the rocket emoji to watch the run in the WandB interface 🚀 

In WandB, watch the following keys:

* Plot 1: validation_all_f_mae, training_all_f_mae
* Plot 2: validation_e/N_mae, training_e/N_mae

These are the validation/training error in all force components and the validation/training error in the potential energy, normalized by the number of atoms, respectively. 
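
To make the two metrics concrete, here is a minimal sketch of how a force MAE over all components and a per-atom energy error could be computed for a single structure. The function names and toy numbers are invented for illustration, and this is not NequIP's internal implementation; the logged values would additionally be averaged over the structures in the data set.

```python
from typing import Sequence


def force_mae(pred: Sequence[Sequence[float]], ref: Sequence[Sequence[float]]) -> float:
    """Mean absolute error over every Cartesian force component of every atom."""
    errors = [abs(p - r) for fp, fr in zip(pred, ref) for p, r in zip(fp, fr)]
    return sum(errors) / len(errors)


def energy_per_atom_abs_error(e_pred: float, e_ref: float, n_atoms: int) -> float:
    """Absolute error in the total energy, normalized by the number of atoms."""
    return abs(e_pred - e_ref) / n_atoms


# Toy two-atom structure with made-up numbers (units as in the reference data).
pred_forces = [[1.0, 0.0, -2.0], [0.5, 0.5, 0.0]]
ref_forces = [[1.5, 0.0, -1.0], [0.5, 0.0, 0.0]]

print(force_mae(pred_forces, ref_forces))
print(energy_per_atom_abs_error(-10.3, -10.0, n_atoms=2))
```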

In [None]:
!rm -rf ./results
!nequip-train nequip/configs/example.yaml

Streaming output truncated to the last 5000 lines.
! Best model       61    0.004

training
# Epoch batch         loss       loss_f       loss_e        f_mae       f_rmse      H_f_mae      C_f_mae  psavg_f_mae     H_f_rmse     C_f_rmse psavg_f_rmse        e_mae      e/N_mae
     62    10      0.00292      0.00292     4.19e-06         1.23         1.65         0.98         1.53         1.25         1.34         1.96         1.65        0.856       0.0571
     62    20      0.00267      0.00266     7.51e-06         1.24         1.58         0.98         1.53         1.25         1.29         1.85         1.57         1.22       0.0812

validation
# Epoch batch         loss       loss_f       loss_e        f_mae       f_rmse      H_f_mae      C_f_mae  psavg_f_mae     H_f_rmse     C_f_rmse psavg_f_rmse        e_mae      e/N_mae
     62     5      0.00397      0.00397      3.3e-06         1.44         1.93         1.19         1.72         1.46         1.61         2.24       

We see that the model has converged to an energy accuracy of < 1 meV/atom and a force accuracy of approx. 40 meV/A within 5 minutes, trained on only 100 samples. That should give us a good first potential! Note that these errors will decrease significantly if you increase the training set size and the number of training epochs.

### Deploy the model

<img src="https://github.com/mir-group/nequip_mrs_tutorial/blob/master/deploy.png?raw=true" width="60%">

We now convert the model to a potential file. This makes it independent of NequIP, so we can use it in any downstream application, such as LAMMPS.

In [None]:
!nequip-deploy build --train-dir results/toluene/example-run-toluene toluene-deployed.pth

## Evaluate Test Error on all remaining frames

Before running inference, we'd like to know how well the model is doing on a hold-out test set. We run the nequip-evaluate command to compute the test error on all data that we didn't use for training or validation. 

In [None]:
!nequip-evaluate --train-dir results/toluene/example-run-toluene --batch-size 50

Again, energy errors of < 1meV/atom (converted from kcal/mol to eV), and force errors of ~45 meV/A 🎉

# LAMMPS

We are now in a position to run MD with our potential. Here, we will minimize the geometry of the toluene molecule we trained on from a perturbed initial state. 

<img src="https://github.com/mir-group/nequip_mrs_tutorial/blob/master/run.png?raw=true" width="60%">

Set up a simple LAMMPS input file

CAUTION: the reference data here are in kcal/mol for the energies and kcal/mol/A for the forces. The NequIP model will therefore also predict outputs in these units, which is why we use `units real` in LAMMPS (see [docs](https://docs.lammps.org/units.html)). If your reference data are in other units, you should use the corresponding units command in LAMMPS (e.g. if you use eV and A, then `units metal` would be appropriate, which would also change the time units from `fs` to `ps`).
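
The choice can be summarized as a small lookup. The sketch below covers only the two LAMMPS unit styles mentioned above, with the energy, length and time units as listed in the LAMMPS documentation; the dictionary and helper names are invented for this example.

```python
# Only the two unit styles discussed in this tutorial are listed here.
LAMMPS_UNIT_STYLES = {
    "real": {"energy": "kcal/mol", "length": "A", "time": "fs"},
    "metal": {"energy": "eV", "length": "A", "time": "ps"},
}


def units_command(energy_unit: str) -> str:
    """Return the LAMMPS `units` line matching the energy unit of the training data."""
    for style, units in LAMMPS_UNIT_STYLES.items():
        if units["energy"] == energy_unit:
            return f"units {style}"
    raise ValueError(f"No unit style listed here for energy unit {energy_unit!r}")


print(units_command("kcal/mol"))  # units real
print(units_command("eV"))        # units metal
```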

In [None]:
lammps_input_minimize = """
units	real
atom_style atomic
newton off
thermo 1
read_data structure.data

pair_style	nequip
pair_coeff	* * ../toluene-deployed.pth C H 
mass            1 12.011
mass            2 1.00794

neighbor 1.0 bin
neigh_modify delay 5 every 1

minimize 0.0 1.0e-8 10000 1000000
write_dump all custom output.dump id type x y z fx fy fz
"""
!mkdir -p lammps_run
with open("lammps_run/toluene_minimize.in", "w") as f:
    f.write(lammps_input_minimize)

Here is a starting configuration for toluene at CCSD(T) accuracy. We will strongly perturb the initial positions by adding to each coordinate a displacement (in A) sampled from a uniform distribution $\mathcal{U}([0, 0.5])$

In [None]:
toluene_example = """15
 Lattice="100.0 0.0 0.0 0.0 100.0 0.0 0.0 0.0 100.0" Properties=species:S:1:pos:R:3 -169777.5840406276=T pbc="F F F"
 C       52.48936904      49.86911725      50.09520748
 C       51.01088202      49.89609925      50.17978049
 C       50.36647401      50.04650925      48.96054247
 C       48.95673398      50.29576626      48.71580846
 C       48.04533296      50.26023426      49.82589448
 C       48.70932398      49.85770925      51.01923950
 C       50.06326400      49.77782925      51.25691751
 H       52.94467905      50.48672926      50.86545150
 H       52.89060405      48.87175023      50.14480949
 H       53.02173405      50.05890725      49.03968247
 H       51.01439802      50.38234726      48.05314045
 H       48.80598498      50.64314926      47.68195744
 H       46.96754695      50.20586626      49.53998848
 H       48.16716997      49.75850325      51.88622952
 H       50.45791001      49.55387424      52.15303052
 """

with open('toluene.xyz', 'w') as f: 
    f.write(toluene_example)

# read as ASE objects
atoms = read('toluene.xyz', format='extxyz')

# perturb positions
p = atoms.get_positions()
p += np.random.rand(15, 3) * 0.5
atoms.set_positions(p)
atoms.set_pbc(False)

# write to a LAMMPS file
write("lammps_run/structure.data", atoms, format="lammps-data")

### Run the LAMMPS command: 

In [None]:
!cd lammps_run/ && ../lammps/build/lmp -in toluene_minimize.in

We see LAMMPS converges quickly to a minimum. Let's check how well we did. 

In [None]:
# read the final structure back in 
minimized = read('./lammps_run/output.dump', format='lammps-dump-text')

### Compare the optimized bond length to the coupled cluster reference from CCCBDB

In [None]:
# get distances of optimized geometry (reference data: CCSD(T) [Psi4, cc-pVDZ])
d_12 = minimized.get_distances(1, 2)

# reference: https://cccbdb.nist.gov/geom3x.asp?method=6&basis=2, coupled cluster
d_12_ccd = 1.4086

print('Relative Error in bond length w.r.t. Coupled Cluster from CCCBDB: {:.3f}%'.format((100 * np.abs(d_12 - d_12_ccd) / d_12_ccd)[0]))

We find that the optimized bond length has a small relative error with respect to the coupled cluster reference 🎉
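
For reference, the distance used in this comparison is a plain Euclidean distance between two atomic positions. The sketch below reproduces that arithmetic in pure Python for atoms 1 and 2 (zero-based) of the unperturbed starting frame listed earlier. That frame is a thermally distorted AIMD snapshot rather than a minimum, so its value only illustrates the calculation and is not the result of the minimization; the function names are invented for this example.

```python
import math
from typing import Sequence


def bond_length(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two Cartesian positions."""
    return math.dist(a, b)


def relative_error_percent(value: float, reference: float) -> float:
    """Relative deviation of `value` from `reference`, in percent."""
    return 100.0 * abs(value - reference) / reference


# Positions (in A) of atoms 1 and 2 (zero-based) from the starting configuration.
c1 = (51.01088202, 49.89609925, 50.17978049)
c2 = (50.36647401, 50.04650925, 48.96054247)
d_12_ccd = 1.4086  # coupled cluster reference value used in this tutorial

d = bond_length(c1, c2)
print(f"distance: {d:.4f} A, relative error: {relative_error_percent(d, d_12_ccd):.2f}%")
```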

## Next Steps

This concludes our tutorial. A next step would be to head over to https://github.com/mir-group/nequip, install NequIP and get started with your own system. If you have questions, please don't hesitate to reach out to batzner@g.harvard.edu, we're happy to help! 

