<a href="https://colab.research.google.com/github/giorginolab/2024-MolSim-UniPD/blob/main/OpenMM_2024.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Colab-specific instructions start here

Note to myself: a copy of this file should be in https://github.com/giorginolab/MD-Tutorial-Data/blob/main/notebooks/1_OpenMM_build.ipynb

In [1]:
# Here we use a Conda environment inside Google Colab. Blocks specific for Colab
# (like this one) mention "condacolab". On "normal" platforms the procedure
# for installation may be different - you need to check the system's documentation.

# Colab notebooks are "brittle": Colab is updated over time, and
# dependencies that worked before may stop working. Proper HPC platforms
# are more stable (and supported).

# After executing this cell, Colab restarts.

if 'google.colab' in str(get_ipython()):
  ! pip install -q condacolab
  import condacolab
  condacolab.install_miniforge()

✨🍰✨ Everything looks OK!


In [2]:
if 'google.colab' in str(get_ipython()):
  condacolab.check()

✨🍰✨ Everything looks OK!


In [3]:
# Colab-specific workaround for a weird error upon shell escape:
#   NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968
if 'google.colab' in str(get_ipython()):
  import locale
  def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
  locale.getpreferredencoding = getpreferredencoding

In [4]:
# Install OpenMM. Takes a long time (unless already installed).
if 'google.colab' in str(get_ipython()):
  !conda install -q -c conda-forge openmm openmmforcefields openmmtools pdbfixer

Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - openmm
    - openmmforcefields
    - openmmtools
    - pdbfixer


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ambertools-23.6            |nompi_py310hba4a467_102        90.3 MB  conda-forge
    annotated-types-0.6.0      |     pyhd8ed1ab_0          17 KB  conda-forge
    anyio-4.3.0                |     pyhd8ed1ab_0         100 KB  conda-forge
    argon2-cffi-23.1.0         |     pyhd8ed1ab_0          18 KB  conda-forge
    argon2-cffi-bindings-21.2.0|  py310h2372a71_4          34 KB  conda-forge
    arpack-3.8.0               |nompi_h0baa96a_101         214 KB  conda-forge
    arrow-1.3.0                |     pyhd8ed1ab_0          98 KB  conda-forge
 

# Generic installation instructions

In [5]:
# Verify Python version
import sys
sys.version

'3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]'

In [6]:
# Verify GPU availability and type. If you get an error, check that
# "Runtime / Runtime type / GPU" is selected.
!nvidia-smi

/bin/bash: line 1: nvidia-smi: command not found
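
The same check can be made from Python without a shell escape. This is a minimal sketch, assuming only that the NVIDIA driver, when present, puts `nvidia-smi` on the `PATH`; the helper name `has_nvidia_gpu` is made up for illustration.

```python
import shutil
import subprocess


def has_nvidia_gpu():
    """Return True if nvidia-smi is on the PATH and runs successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Same situation as the "command not found" message above
        return False
    try:
        # "nvidia-smi -L" lists the GPUs visible to the driver
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


gpu_present = has_nvidia_gpu()
print("NVIDIA GPU detected:", gpu_present)
```

If this prints `False` on Colab, the GPU runtime type is probably not selected.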


# Tests

In [7]:
# A quick test
import openmm.testInstallation
openmm.testInstallation.main()


OpenMM Version: 8.1.1
Git Revision: ec797acabe5de4ce9f56c92d349baa889f4b0821

There are 2 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces

Median difference in forces between platforms:

Reference vs. CPU: 6.30273e-06

All differences are within tolerance.
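
The platform list reported by the test can also be queried programmatically, which helps when choosing a `--platform` value for the benchmark script. This is a minimal sketch, assuming OpenMM is importable; the helper name `available_platforms` is illustrative, and the `ImportError` guard is only there so the snippet degrades gracefully where OpenMM is missing.

```python
try:
    import openmm
except ImportError:  # OpenMM is not installed in this environment
    openmm = None


def available_platforms():
    """Return the names of the OpenMM platforms usable on this machine."""
    if openmm is None:
        return []
    n = openmm.Platform.getNumPlatforms()
    return [openmm.Platform.getPlatform(i).getName() for i in range(n)]


platforms = available_platforms()
print(platforms)
```

On a runtime without a GPU, such as the one in the output above, only `Reference` and `CPU` are expected to appear.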


In [8]:
# condacolab does not set CONDA_PREFIX, so fall back to /usr/local
import os
pfx=os.environ.get("CONDA_PREFIX","/usr/local")
%env PFX=$pfx

env: PFX=/usr/local


In [9]:
!(cd $PFX/share/openmm/examples; python benchmark.py)

usage: benchmark.py [-h] [--platform {CPU,Reference}] [--test TEST] [--ensemble ENSEMBLE]
                    [--pme-cutoff PME_CUTOFF] [--seconds SECONDS]
                    [--polarization {direct,extrapolated,mutual}] [--mutual-epsilon EPSILON]
                    [--bond-constraints BOND_CONSTRAINTS] [--device DEVICE]
                    [--opencl-platform OPENCL_PLATFORM] [--precision PRECISION]
                    [--style {simple,table}] [--outfile OUTFILE] [--serialize SERIALIZE]
                    [--verbose]
benchmark.py: error: No platform specified


In [10]:
# A more realistic benchmark. Note the "ns_per_day" figure
!(cd $PFX/share/openmm/examples; python benchmark.py --platform CUDA --test pme --seconds 5  --precision mixed)

usage: benchmark.py [-h] [--platform {CPU,Reference}] [--test TEST] [--ensemble ENSEMBLE]
                    [--pme-cutoff PME_CUTOFF] [--seconds SECONDS]
                    [--polarization {direct,extrapolated,mutual}] [--mutual-epsilon EPSILON]
                    [--bond-constraints BOND_CONSTRAINTS] [--device DEVICE]
                    [--opencl-platform OPENCL_PLATFORM] [--precision PRECISION]
                    [--style {simple,table}] [--outfile OUTFILE] [--serialize SERIALIZE]
                    [--verbose]
benchmark.py: error: argument --platform: invalid choice: 'CUDA' (choose from 'CPU', 'Reference')


In [11]:
# CUDA (NVIDIA GPU) is usually the fastest platform. The other platforms can be tested as follows.
%env OPENMM_CPU_THREADS 2
!(cd $PFX/share/openmm/examples; python benchmark.py --platform CPU --test pme --seconds 2  --precision mixed)

/bin/sh: 1: nvidia-smi: not found
hostname: 99aa13529bf4
timestamp: 2024-05-06T11:54:56.004323
openmm_version: 8.1.1.dev-ec797ac
cpuinfo: Intel(R) Xeon(R) CPU @ 2.20GHz
cpuarch: x86_64
system: Linux
test: pme
constraints: HBonds
hydrogen_mass: 1.5
cutoff: 0.9
ensemble: NVT
precision: mixed
timestep_in_fs: 4.0
platform: CPU
platform_properties: {'Threads': '2', 'DeterministicForces': 'false'}
steps: 20
elapsed_time: 4.807677
ns_per_day: 1.437700577638639

usage: benchmark.py [-h] [--platform {CPU,Reference}] [--test TEST] [--ensemble ENSEMBLE]
                    [--pme-cutoff PME_CUTOFF] [--seconds SECONDS]
                    [--polarization {direct,extrapolated,mutual}] [--mutual-epsilon EPSILON]
                    [--bond-constraints BOND_CONSTRAINTS] [--device DEVICE]
                    [--opencl-platform OPENCL_PLATFORM] [--precision PRECISION]
                    [--style {simple,table}] [--outfile OUTFILE] [--serialize SERIALIZE]
                    [--verbose]
benchmark.py: e
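
The `ns_per_day` figure in the CPU benchmark output follows directly from the step count, the time step and the elapsed wall-clock time. A small sketch of the arithmetic, using the values printed above (the function name is illustrative, not part of `benchmark.py`):

```python
def ns_per_day(steps, timestep_fs, elapsed_s):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = steps * timestep_fs * 1e-6  # 1 fs = 1e-6 ns
    seconds_per_day = 86400
    return simulated_ns / elapsed_s * seconds_per_day


# Values copied from the CPU benchmark output above
speed = ns_per_day(steps=20, timestep_fs=4.0, elapsed_s=4.807677)
print(f"{speed:.4f} ns/day")
```

This reproduces the reported value of about 1.4377 ns/day.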

In [None]:
!(cd $PFX/share/openmm/examples; python benchmark.py --platform OpenCL --test pme --seconds 2  --precision mixed)

# The simulation tutorial proper begins here

In [12]:
from openmm.app import *
from openmm import *
from openmm.unit import *
from pdbfixer import *
from sys import stdout

## Download, fix missing atoms, solvate

These steps can also be done on the command line with the `pdbfixer` executable.

In [13]:
# Retrieve the structure from the RCSB
fixer = PDBFixer(pdbid="6H1F")

# Find missing (unresolved) residues, then clear the list: we don't want to model them.
fixer.findMissingResidues()
fixer.missingResidues = {}
# fixer.addMissingResidues()

# Add missing (unresolved) atoms
fixer.findMissingAtoms()
fixer.addMissingAtoms()

# Protonate (roughly) at chosen pH
fixer.addMissingHydrogens(pH=7.0)

# Explicit solvent: cubic box with 10 nm edges (1000 nm^3)
fixer.addSolvent(boxSize=10 * Vec3(1, 1, 1))

# Save the file so it can be inspected
with open("6H1F-fixed.pdb", "w") as f:
    PDBFile.writeFile(fixer.topology, fixer.positions, f)
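
The call `boxSize=10 * Vec3(1, 1, 1)` requests a cube with 10 nm edges, so it is worth estimating how large the solvated system will be before simulating it. A rough sketch, assuming bulk water at 1 g/cm^3 and ignoring the volume taken up by the protein, so the result is an upper bound:

```python
AVOGADRO = 6.02214076e23    # molecules per mole
WATER_MOLAR_MASS = 18.015   # g/mol
WATER_DENSITY = 1.0         # g/cm^3, assumed bulk value

side_nm = 10.0              # box edge passed to addSolvent
volume_nm3 = side_nm ** 3

# 1 cm^3 = 1e21 nm^3
waters_per_nm3 = WATER_DENSITY / WATER_MOLAR_MASS * AVOGADRO / 1e21
n_waters = volume_nm3 * waters_per_nm3

print(f"Box volume: {volume_nm3:.0f} nm^3")
print(f"Water molecules (upper bound): {n_waters:.0f}")
print(f"Water atoms (upper bound): {3 * n_waters:.0f}")
```

A box of this size therefore holds on the order of 100,000 atoms, which is why the CPU platform is slow for this system.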

## Modeling

This modeling step is not always necessary. Here it is
needed to remove an "SCN" (thiocyanate ion) residue.


In [14]:
# There is an "SCN" residue to remove
modeller = Modeller(fixer.topology, fixer.positions)

res_SCN = [r for r in modeller.topology.residues() if r.name == "SCN"]
modeller.delete(res_SCN)

with open("6H1F-modelled.pdb", "w") as f:
    PDBFile.writeFile(modeller.topology, modeller.positions, f, keepIds=True)


## Create integration-related objects

In [15]:
# The ForceField object holds the force-field parameters
forcefield = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")

In [16]:
# This specifies the system to be simulated.
system = forcefield.createSystem(
    modeller.topology,
    nonbondedMethod=PME,
    nonbondedCutoff=1 * nanometer,
    constraints=HBonds,
)

In [17]:
# Specify the integrator: temperature, friction coefficient (inverse
# relaxation time), and time step (important)
integrator = LangevinMiddleIntegrator(300 * kelvin, 1 / picosecond, 0.004 * picoseconds)

In [18]:
# The barostat is added to the system so that density is controlled
# in addition to temperature.

# Arguments: pressure, temperature (used in the Monte Carlo acceptance
# criterion; should match the integrator temperature), and
# frequency (number of steps between attempts to change the box size)
barostat = MonteCarloBarostat(1.0 * atmosphere, 300.0 * kelvin, 25)

system.addForce(barostat)


5

In [19]:
# Combines the molecular topology, system, and integrator
# to begin a new simulation.
simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)

## Minimize energy

In [20]:
# Perform local energy minimization
print("Minimizing energy...")
simulation.minimizeEnergy(maxIterations=500)


# Write the minimized coordinates (for checking)
with open("6H1F-minimized.pdb", "w") as f:
    PDBFile.writeFile(
        simulation.topology,
        simulation.context.getState(getPositions=True).getPositions(),
        f,
        keepIds=True,
    )

Minimizing energy...


## Integrate

In [21]:
Nsteps = 5000

In [22]:
# When the simulation runs, it will write the trajectory to a file called "output.dcd"
simulation.reporters.append(
    DCDReporter("output.dcd", reportInterval=1000, enforcePeriodicBox=True)
)


In [23]:
# Also report information to the screen as the simulation runs
simulation.reporters.append(
    StateDataReporter(
        stdout,
        100,
        step=True,
        time=True,
        potentialEnergy=True,
        kineticEnergy=True,
        totalEnergy=True,
        temperature=True,
        volume=True,
        density=True,
        progress=True,
        remainingTime=True,
        speed=True,
        elapsedTime=True,
        separator=" ",
        totalSteps=Nsteps,
    )
)
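
The step count and the two reporter intervals together determine how much simulated time is covered and how much output the run produces. A back-of-the-envelope sketch using the values set in the cells above (the variable names are illustrative):

```python
n_steps = 5000        # Nsteps
timestep_ps = 0.004   # integrator time step (4 fs)
dcd_interval = 1000   # DCDReporter reportInterval
log_interval = 100    # StateDataReporter interval

simulated_ps = n_steps * timestep_ps
n_frames = n_steps // dcd_interval
n_log_lines = n_steps // log_interval
ps_per_frame = dcd_interval * timestep_ps

print(f"Simulated time: {simulated_ps:.1f} ps")
print(f"Trajectory frames: {n_frames} (one every {ps_per_frame:.1f} ps)")
print(f"Log lines: {n_log_lines}")
```

With these settings the run covers 20 ps, stores 5 trajectory frames and prints 50 log lines; a production run would need a much larger step count.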



In [None]:
# Finally, run the simulation for the given number of time steps
print("Running simulation...")
simulation.step(Nsteps)

Running simulation...
#"Progress (%)" "Step" "Time (ps)" "Potential Energy (kJ/mole)" "Kinetic Energy (kJ/mole)" "Total Energy (kJ/mole)" "Temperature (K)" "Box Volume (nm^3)" "Density (g/mL)" "Speed (ns/day)" "Elapsed Time (s)" "Time Remaining"
2.0% 100 0.4000000000000003 -1738426.0665760953 98364.34190830702 -1640061.7246677883 120.54544768054288 993.18810550496 0.9837224499575474 0 0.002024412155151367 --
4.0% 200 0.8000000000000006 -1695078.1213151705 148990.14861543776 -1546087.9726997328 182.58734635342284 983.5386092541745 0.9933737498692783 0.831 41.60710525512695 33:17
6.0% 300 1.2000000000000008 -1664213.2641333784 180629.618515231 -1483583.6456181474 221.3614995623261 980.4733099136511 0.9964793804556283 0.833 82.96896243095398 32:29
8.0% 400 1.6000000000000012 -1643280.4106627689 201426.46660654966 -1441853.9440562192 246.8480256232556 980.4733099136511 0.9964793804556283 0.803 129.10689210891724 32:59
10.0% 500 2.0000000000000013 -1625251.9301168104 213414.1799416183 -1411
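
In the log, the box volume shrinks while the density rises, as expected when the barostat equilibrates the box at constant mass. A quick consistency check, using the volume and density columns of the first complete rows above (values copied by hand; the unit conversion is the only assumption):

```python
# (step, box volume in nm^3, density in g/mL), copied from the log above
rows = [
    (100, 993.18810550496, 0.9837224499575474),
    (200, 983.5386092541745, 0.9933737498692783),
    (300, 980.4733099136511, 0.9964793804556283),
]

AVOGADRO = 6.02214076e23
NM3_TO_ML = 1e-21  # 1 nm^3 = 1e-21 cm^3 = 1e-21 mL

# mass [g] = volume [mL] * density [g/mL]; 1 g corresponds to AVOGADRO daltons
masses_da = [vol * NM3_TO_ML * rho * AVOGADRO for _, vol, rho in rows]
for (step, _, _), mass in zip(rows, masses_da):
    print(f"step {step}: total mass = {mass:.0f} Da")

# Relative spread of the mass across rows; should be essentially zero
spread = (max(masses_da) - min(masses_da)) / masses_da[0]
print(f"relative spread: {spread:.2e}")
```

The product of volume and density stays constant, so the density changes reflect only the box rescaling.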

# Results

The simulation is complete. Now download the minimized PDB file (which gives the starting coordinates and the identity of the atoms) and the DCD file (a binary file containing the trajectory, i.e. a series of snapshots of the coordinates). They are best visualized locally with, e.g., PyMOL or VMD.
