# Introduction
<!-- Banner -->
<div style="background-color:#DCE9F8; border-radius:10px; padding:20px; display:flex; align-items:center; justify-content:space-between; margin-bottom:20px;">

  <!-- Centered Title -->
  <div style="flex:1; text-align:center;">
    <h1 style="margin:0; font-size:5rem; color:#1E4D9D;">NequIP/Allegro 0.6.2/0.3.0 Zadar Tutorial</h1>
  </div>

  <!-- Right-aligned Logo -->
  <div style="flex:1; text-align:center;">
    <img src="https://github.com/mir-group/nequip/blob/main/logo.png?raw=true" style="width:300px;">
  </div>
</div>

<!-- Tutorial Introduction -->
<div style="background-color:#ffffff; border-left:0px solid #3C82E3; border-radius:10px; padding:0px; font-size:1.1rem; color:#1E4D9D; margin-bottom:20px;">
  <h2 style="margin-top:0; font-size:2rem; color:#3C82E3;">Introduction</h2>
  <p>This is a tutorial for <b><code>NequIP</code></b>, an architecture for building highly accurate and scalable Machine Learning Interatomic Potentials (MLIPs) and deploying them in production simulations. The ideas are described in <a href="https://www.nature.com/articles/s41467-022-29939-5" target="_blank" style="color:#3C82E3; text-decoration:none;">E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials</a>. <b><code>NequIP</code></b> is available as an open-source package <a href="https://github.com/mir-group/allegro" target="_blank" style="color:#3C82E3; text-decoration:none;">here</a>. This tutorial serves as a simple introduction to the <b><code>NequIP</code></b> code.</p>
</div>

<!-- Contents Section -->
<div style="background-color:#ffffff; border-left:0px solid #3C82E3; border-radius:10px; padding:0px; margin-bottom:20px;">
  <h2 style="margin-top:0; font-size:2rem; color:#3C82E3;">Contents</h2>
  <p style="font-size:1.1rem; color:#1E4D9D;">
    This tutorial will walk you through:
  </p>
  <ul style="list-style:disc; padding-left:20px; font-size:1.1rem; color:#1E4D9D;">
    <li style="margin-bottom:10px;">
        <b>Train</b>: Train a neural network potential using a simple dataset.
    </li>
    <li style="margin-bottom:10px;">
        <b>Deploy</b>: Convert the Python-based model into a stand-alone potential file optimized for fast execution.
    </li>
    <li style="margin-bottom:10px;">
        <b>Run</b>: Use the trained model to perform tasks such as MD in <b><code>LAMMPS</code></b>.
    </li>
    <!-- <li style="margin-bottom:10px;">
        <b>(Optional) Extend the model with custom code</b>
    </li> -->
</ul>

  <p style="font-size:1.1rem; color:#1E4D9D;">
    Everything will happen in this Notebook. We're ready to get started!
  </p>
</div>


# Setup

<div style="background-color:#ffffff; border-left: 0px solid #3C82E3; border-radius: 5px; padding: 0px; margin-bottom: 20px; font-size: 1.1rem; color:#333;">

  <!-- Title -->
  <h2 style="margin-top: 0; font-size: 2rem; color: #3C82E3;">⚙️ Setup</h2>

  <!-- Introduction -->
  <p style="margin-bottom: 10px;">
    The following tools are used throughout the tutorial.
  </p>

  <!-- List of Tools -->
  <ul style="list-style: disc; padding-left: 10px;">
    <li style="margin-bottom: 5px;"><b>NequIP</b></li>
    <li style="margin-bottom: 5px;"><b>pair-nequip</b></li>
    <li style="margin-bottom: 5px;"><b>LAMMPS</b></li>
  </ul>

</div>

In [None]:
import wandb

In [None]:
%%capture

## set anonymous WandB
import os
os.environ["WANDB_ANONYMOUS"] = "must"

# Your first model - a test
<div style="background-color:#ffffff; border-left: 0px solid #3C82E3; border-radius: 5px; padding: 0px; margin-bottom: 20px; font-size: 1.1rem; color:#333;">

  <!-- Title -->
  <h2 style="margin-top: 0; font-size: 2rem; color: #3C82E3;">🦾 Let's check that the installation works by training our first (minimal) model!</h2>

</div>

In [None]:
%%bash

# activate conda environment to access the nequip-train command
source /opt/conda/bin/activate base && conda activate T1

## minimal - runtime ~1min

# if we are running this cell (first train) start from scratch and re-download everything
if [ -d "results" ]; then rm -rf results; fi
if [ -d "benchmark_data" ]; then rm -rf benchmark_data; fi
if [ -e "test_1_out.txt" ]; then rm -r test_1_out.txt; fi

# print help for the nequip-train command
#nequip-train --help

# run allegro
nequip-train allegro/configs/minimal.yaml &> test_1_out.txt

In [None]:
# Tail the final lines of the output
!tail -n 7 test_1_out.txt

# Let's upgrade the model
<div style="background-color:#ffffff; border-left: 0px solid #3C82E3; border-radius: 5px; padding: 0px; margin-bottom: 20px; font-size: 1.1rem; color:#333;">

  <!-- Title -->
  <h2 style="margin-top: 0; font-size: 2rem; color: #3C82E3;">Let's tweak the example file</h2>

  <!-- Introduction -->
  <p style="margin-bottom: 10px;">
    Let's take a look at the final metrics and exit condition. Can we improve?
  </p>

  <!-- List of Tools -->
  <ul style="list-style: disc; padding-left: 10px;">
    <li style="margin-bottom: 5px;">We can try increasing the maximum number of epochs.</li>
    <li style="margin-bottom: 5px;">What about the model's angular resolution (<code>l_max</code>)?</li>
    <li style="margin-bottom: 5px;">Any other suggestions?</li>
  </ul>

</div>
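
The tweaks listed above all come down to overriding a few `key: value` lines in the YAML config. As a minimal sketch (not part of NequIP or Allegro), the hypothetical helper `override_yaml_keys` below does this with Python's standard library only. It assumes each key sits on its own line and that its value fits on that line. The sample text is a made-up stand-in, not the real `minimal.yaml`.

```python
import re


def override_yaml_keys(yaml_text, overrides):
    """Replace the value of simple `key: value` lines; all other lines are kept as-is.

    Hypothetical helper for illustration only: it does not parse YAML, it just
    rewrites lines whose key (after optional indentation) matches exactly.
    """
    out_lines = []
    for line in yaml_text.splitlines():
        for key, value in overrides.items():
            # anchor at line start so `l_max` does not match e.g. `my_l_max`
            match = re.match(rf"^(\s*){re.escape(key)}:", line)
            if match:
                line = f"{match.group(1)}{key}: {value}"
                break
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


# made-up sample config (assumption: not the actual contents of minimal.yaml)
sample = "run_name: minimal\nl_max: 1\nmax_epochs: 10\nnum_layers: 2\n"
boosted = override_yaml_keys(
    sample,
    {"run_name": "aspirin_boosted", "l_max": 2, "max_epochs": 20, "num_layers": 3},
)
print(boosted)
```

Because the pattern is anchored at the start of the line, overriding `l_max` leaves a longer key that merely ends in `l_max` untouched, which a loose substring match would not guarantee.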

In [None]:
!grep -e "l_max" -e "max_epochs" -e "num_layers" -e "num_tensor_features" -e "num_bessels_per_basis" allegro/configs/minimal.yaml

In [None]:
%%bash

variable_1_Lmax="2"                             # originally 1
variable_2_epochsmax="20"                       # originally 10
variable_3_nlayers="3"                          # originally 2
variable_4_tensor_features="32"                 # originally 32
variable_5_r_basis="8"                          # originally 8
variable_6_batch="1"                            # originally 1

## arch update needed
variable_7_resnet_update="[1.0, 1.0, 1.0, 1.0]" # originally 3 entries; 4 are needed now that num_layers is 3

awk '{ \
  if ($0 ~ /root:/) { sub(/root:.*/, "root: results/aspirin_boosted") }; \
  if ($0 ~ /run_name:/) { sub(/run_name:.*/, "run_name: aspirin_boosted") }; \
  if ($0 ~ /l_max:/) { sub(/l_max:.*/, "l_max: '"$variable_1_Lmax"'") }; \
  if ($0 ~ /max_epochs:/) { sub(/max_epochs:.*/, "max_epochs: '"$variable_2_epochsmax"'") }; \
  if ($0 ~ /num_layers:/) { sub(/num_layers:.*/, "num_layers: '"$variable_3_nlayers"'") }; \
  if ($0 ~ /num_tensor_features:/) { sub(/num_tensor_features:.*/, "num_tensor_features: '"$variable_4_tensor_features"'") }; \
  if ($0 ~ /num_bessels_per_basis:/) { sub(/num_bessels_per_basis:.*/, "num_bessels_per_basis: '"$variable_5_r_basis"'") }; \
  if ($0 ~ /batch_size:/) { sub(/batch_size:.*/, "batch_size: '"$variable_6_batch"'") }; \
  if ($0 ~ /latent_resnet_coefficients:/) { sub(/latent_resnet_coefficients:.*/, "latent_resnet_coefficients: '"$variable_7_resnet_update"'") }; \
  print \
}' allegro/configs/minimal.yaml > allegro/configs/minimal_boosted.yaml

In [None]:
%%bash

# activate conda environment to access the nequip-train command
source /opt/conda/bin/activate base && conda activate T1

## minimal_boosted - runtime ~3min

aspirin_boosted_path="results/aspirin_boosted"
if [ -d "$aspirin_boosted_path" ]; then rm -rf "$aspirin_boosted_path"; fi
if [ -e "test_2_out.txt" ]; then rm -r test_2_out.txt; fi

# run allegro
nequip-train allegro/configs/minimal_boosted.yaml &> test_2_out.txt

In [None]:
# Tail the final lines of the output
!tail -n 7 test_2_out.txt

# Let's make ending training smarter
<div style="background-color:#ffffff; border-left: 0px solid #3C82E3; border-radius: 5px; padding: 0px; margin-bottom: 20px; font-size: 1.1rem; color:#333;">

  <!-- Title -->
  <h2 style="margin-top: 0; font-size: 2rem; color: #3C82E3;">Let's make training smarter</h2>

  <!-- Introduction -->
  <p style="margin-bottom: 10px;">
    Notice that, as it stands, the training is not very smart: it simply runs until the maximum number of epochs is reached. Can we improve?
  </p>

  <!-- List of Tools -->
  <ul style="list-style: disc; padding-left: 10px;">
    <li style="margin-bottom: 5px;"><b>Important notion</b>: lowering the learning rate when the validation loss plateaus.</li>
    <li style="margin-bottom: 5px;">The loss function currently focuses only on forces.</li>
  </ul>

</div>
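
To make "lowering the learning rate on plateau" concrete, here is a simplified sketch of the idea in plain Python. It is not PyTorch's or NequIP's actual implementation: the function name `simulate_lr_schedule` and the starting learning rate are hypothetical, while `patience=3`, `factor=0.5`, and the `1.0e-5` lower bound mirror the values written into the config in the next cell.

```python
def simulate_lr_schedule(val_losses, lr=0.01, patience=3, factor=0.5, lr_min=1.0e-5):
    """Return (learning rate used at each epoch, epoch at which training stops or None).

    Simplified reduce-on-plateau logic: if the validation loss has not improved
    for more than `patience` consecutive epochs, multiply the LR by `factor`.
    Training stops early once the LR drops below `lr_min`.
    """
    best = float("inf")
    bad_epochs = 0
    history = []
    for epoch, loss in enumerate(val_losses):
        history.append(lr)  # LR in effect during this epoch
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        if lr < lr_min:
            return history, epoch  # early stop: LR fell below the lower bound
    return history, None


# made-up validation losses: two improving epochs, then a plateau
losses = [1.0, 0.5] + [0.5] * 8
lr_history, stop_epoch = simulate_lr_schedule(losses)
print(lr_history, stop_epoch)
```

In this toy run the learning rate is halved only after four consecutive epochs without improvement, so a brief stall in the validation loss does not trigger a reduction.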

In [None]:
%%bash

## let's update the loss function to include forces AND energies, and weight them "appropriately"
loss_lines="\n  forces: 1.\n  total_energy:\n    - 1.\n    - PerAtomMSELoss"

## change LR on plateau
lr_scheduler_lines="\n#LR Scheduler\nlr_scheduler_name: ReduceLROnPlateau\nlr_scheduler_patience: 3\nlr_scheduler_factor: 0.5"

## since we are at it, let's add some early stopping flags
## walltime (10 mins)
walltime_lines="\n#Early stopping based on walltime\nearly_stopping_upper_bounds:\n  cumulative_wall: 600."
## LR dropping below 1e-5
LR_lower_bound_lines="\n#Early stopping based on LR reduction\nearly_stopping_lower_bounds:\n  LR: 1.0e-5"
## stop if the validation loss has not improved for 100 epochs (effectively disabled here, since max_epochs is 100)
val_loss_impatience_lines="\n#Early stopping based on val loss\nearly_stopping_patiences:\n  validation_loss: 100"

variable_2_epochsmax="100"                       # originally 10, then 20, now 100

awk '{ \
  if ($0 ~ /root:/) { sub(/root:.*/, "root: results/aspirin_boosted_smarter") }; \
  if ($0 ~ /run_name:/) { sub(/run_name:.*/, "run_name: aspirin_boosted_smarter") }; \
  if ($0 ~ /loss_coeffs:/) { sub(/loss_coeffs:.*/, "loss_coeffs: '"$loss_lines"'") }; \
  if ($0 ~ /max_epochs:/) { sub(/max_epochs:.*/, "max_epochs: '"$variable_2_epochsmax"'") }; \
  print \
}' allegro/configs/minimal_boosted.yaml > allegro/configs/minimal_boosted_smarter.yaml

# quote the variables so the shell keeps the YAML indentation intact
echo -e "$lr_scheduler_lines" >> allegro/configs/minimal_boosted_smarter.yaml
echo -e "$walltime_lines" >> allegro/configs/minimal_boosted_smarter.yaml
echo -e "$LR_lower_bound_lines" >> allegro/configs/minimal_boosted_smarter.yaml
echo -e "$val_loss_impatience_lines" >> allegro/configs/minimal_boosted_smarter.yaml

In [None]:
%%bash

# activate conda environment to access the nequip-train command
source /opt/conda/bin/activate base && conda activate T1

## minimal_boosted_smarter - runtime ~Xmin

aspirin_boosted_smarter_path="results/aspirin_boosted_smarter"
if [ -d "$aspirin_boosted_smarter_path" ]; then rm -r $aspirin_boosted_smarter_path; fi
if [ -e "test_3_out.txt" ]; then rm -r test_3_out.txt; fi

# run allegro
nequip-train allegro/configs/minimal_boosted_smarter.yaml &> test_3_out.txt

In [None]:
# Tail the final lines of the output
!tail -n 7 test_3_out.txt

# Let's plot
<div style="background-color:#ffffff; border-left: 0px solid #3C82E3; border-radius: 5px; padding: 0px; margin-bottom: 20px; font-size: 1.1rem; color:#333;">

  <!-- Title -->
  <h2 style="margin-top: 0; font-size: 2rem; color: #3C82E3;">Let's plot the validation of our trainings!</h2>

</div>

In [None]:
import numpy as np
import matplotlib.pyplot as plt

logfiles = ["test_1_out.txt", "test_2_out.txt", "test_3_out.txt"]
what_we_want = ["Epoch", "f_mae"]

def get_val_metrics(logfile_name, what_we_want_array):
  extracted_validation_data = {col: [] for col in what_we_want_array}

  with open(logfile_name) as f:
    data = f.readlines()
    data = np.array([x.split() for x in data if "Validation" in x.split() or "Train" in x.split()][1:])

    for wanted in what_we_want_array:
      index_wanted = np.where(data[0,:] == wanted)[0][0].astype(int)
      extracted_validation_data[wanted] = data[2::3][:,index_wanted]

  return extracted_validation_data

fig, axs = plt.subplots(1, 3, figsize=(15, 5), sharex=True, sharey=True)
for idx, logfile in enumerate(logfiles):

  data = get_val_metrics(logfile, what_we_want)
  epochs = data["Epoch"].astype(int)
  f_mae = data["f_mae"].astype(float)
  axs[idx].semilogy(epochs, f_mae)

  axs[idx].set_xlabel("Epoch")
  axs[idx].set_ylabel("Validation f_mae")
  axs[idx].set_title(f"Validation f_mae from {logfile}")

plt.tight_layout()
plt.show()

In [None]:
%%bash

# activate conda environment to access the nequip-train command
source /opt/conda/bin/activate base && conda activate T1

## Si tutorial - runtime ~3min

rm -rf ./results/silicon-tutorial
nequip-train ./Si_info/Si_tutorial.yaml &> Si_tutorial_out.txt

In [None]:
# tail the final lines of the output
!tail -n 100 Si_tutorial_out.txt

# Let's LAMMPS
<div style="background-color:#ffffff; border-left: 0px solid #3C82E3; border-radius: 5px; padding: 0px; margin-bottom: 20px; font-size: 1.1rem; color:#333;">

  <!-- Title -->
  <h2 style="margin-top: 0; font-size: 2rem; color: #3C82E3;">Let's LAMMPS!</h2>

  <!-- Introduction -->
  <p style="margin-bottom: 10px;">
    We will be doing the following three things:
  </p>

  <!-- List of Tools -->
  <ul style="list-style: disc; padding-left: 10px;">
    <li style="margin-bottom: 5px;">Deploy (compile) the learned model.</li>
    <li style="margin-bottom: 5px;">Generate the input files for LAMMPS.</li>
    <li style="margin-bottom: 5px;">Run!</li>
  </ul>

</div>

In [None]:
%%bash

# activate conda environment to access the nequip-deploy command
source /opt/conda/bin/activate base && conda activate T1

# deploy
nequip-deploy build --train-dir results/silicon-tutorial/Si-tutorial si-deployed.pth

In [None]:
# info for lammps

# 1: the structure
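from ase.io import read, write
import os

# take the first frame of the trajectory as the starting structure
Si_str_start = read('./Si_info/sitraj.extxyz', index=0)
# create the run directory; no error if it already exists
os.makedirs("./Si_run", exist_ok=True)
write('./Si_run/si.data', Si_str_start, format='lammps-data')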

# 2: the input file
lammps_input = """
units	metal
atom_style atomic
dimension 3

# set newton on for pair_allegro (off for pair_nequip)
newton on
boundary p p p
read_data ./si.data

# let's make it bigger
replicate 3 3 3

# allegro pair style
pair_style	allegro
pair_coeff	* * ../si-deployed.pth Si

mass 1 28.0855

velocity all create 300.0 1234567 loop geom

neighbor 1.0 bin
neigh_modify delay 5 every 1

timestep 0.001
thermo 10

# nose-hoover thermostat, 300K
fix  1 all nvt temp 300 300 $(100*dt)

# compute rdf and average after some equilibration
comm_modify cutoff 7.0
compute rdfall all rdf 1000 cutoff 5.0
fix 2 all ave/time 1 2500 5000 c_rdfall[*] file si.rdf mode vector

# run 5ps
run 5000
"""
with open("Si_run/si_rdf.in", "w") as f:
    f.write(lammps_input)

In [None]:
## runtime ~3min

# 3: run lammps!
# note: change LAMMPS path below if needed

!cd ./Si_run && /opt/lammps/build/lmp -in si_rdf.in

In [None]:
import numpy as np
import matplotlib.pyplot as plt

!gdown --no-cookies 1aa2Kga_w-Zcw6BsmzJqH67NPcVwgHPS- --output Si_exp_1.txt
!mv ./Si_exp_1.txt Si_run/

with open("./Si_run/si.rdf", "r") as f:
    data_allegro = f.readlines()
    data_allegro = np.array([x.split() for x in data_allegro[4:]]).astype(float)

with open("./Si_run/Si_exp_1.txt", "r") as f:
    data_exp = f.readlines()
    data_exp = np.array([x.split() for x in data_exp[3:]]).astype(float)

plt.figure(figsize=(15,8))
plt.plot(data_allegro[:,1], data_allegro[:,2], label="Si, Allegro, $T=300K$")
plt.plot(data_exp[:,0], data_exp[:,1], label="Si, some Exp")

plt.xlim(1.5, 4.0)
plt.xlabel(r'r [$\AA$]')
plt.ylabel('g(r)')
plt.legend(loc='upper right')
plt.show()