## OpenMM simulation for small proteins

In this section, we will demonstrate how to set up and simulate a small molecular system using **OpenMM’s built-in force fields**.

This approach will:
- **Directly load a PDB file**
- **Assign force field parameters** using built-in XML force fields (e.g., `amber14-all.xml`, `amber14/tip3p.xml`)

This is ideal for:
- Small proteins or peptides
- Simple systems with standard residues
- Quick prototyping and testing

> We will follow these steps:
> 1. Load the PDB structure
> 2. Apply built-in force fields
> 3. Solvate the system
> 4. Minimize energy and equilibrate
> 5. Run a short MD simulation

## Chignolin (PDB ID: 1UAO)

We now shift to simulating a **small, de novo designed protein**.

![Chignolin Structure](image1.jpeg)


- **PDB ID**: [1UAO](https://www.rcsb.org/structure/1UAO)
- **Length**: 10 amino acids
- **Sequence**: GYDPETGTWG
- **Structure Type**: β-hairpin
- **Atom Count**: ~77 heavy atoms (138 including hydrogens)
- **Molecular Weight**: ~1.08 kDa

Chignolin is a **minimal β-hairpin peptide** designed from statistical analyses of protein segments. Despite its size, it exhibits well-defined secondary structure and cooperative folding, making it a popular benchmark for folding studies.

- Very small → ideal for testing new simulation workflows
- Fast to simulate → perfect for prototyping
- Compatible with built-in OpenMM force fields (e.g., `amber14-all.xml`, `amber14/tip3p.xml`)
- No need to prepare parameters with an external package (Amber, GROMACS, CHARMM)

We will now proceed to:

1. **Download the PDB file** from the RCSB
2. **Load it into OpenMM**
3. **Apply built-in force fields**
4. **Solvate the system**
5. **Minimize, equilibrate, and run MD**
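Step 1 can be done by hand from the RCSB page or scripted. The following is a minimal sketch, assuming the RCSB file server's `https://files.rcsb.org/download/<ID>.pdb` URL pattern. The helper names `rcsb_pdb_url` and `fetch_pdb` are hypothetical and are not part of OpenMM.

```python
from pathlib import Path
from urllib.request import urlretrieve


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the download URL for a PDB-format entry (assumed URL pattern)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_pdb(pdb_id: str, dest_dir: str = ".") -> Path:
    """Download the entry unless a local copy already exists; return its path."""
    dest = Path(dest_dir) / f"{pdb_id.lower()}.pdb"
    if not dest.exists():
        urlretrieve(rcsb_pdb_url(pdb_id), str(dest))
    return dest


# Usage (needs network access); writes 1uao.pdb to the working directory:
# fetch_pdb("1UAO")
```

The lower-case file name matches the `1uao.pdb` path used in the loading cell.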

## Download and load chignolin (1UAO)

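We now load the structure from a local copy of the entry (`1uao.pdb`, downloaded from the RCSB page linked above), apply the force field, and solvate the peptide in a TIP3P water box. 1UAO is an NMR ensemble; `PDBFile` takes its positions from the first model.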

In [None]:
from openmm.app import PDBFile, Modeller, ForceField
from openmm.unit import nanometer

# Load the PDB file
pdb = PDBFile("1uao.pdb")

# Load force fields
forcefield = ForceField("amber14-all.xml", "amber14/tip3p.xml")

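# Create modeller, add any missing hydrogens, and solvate
modeller = Modeller(pdb.topology, pdb.positions)
modeller.addHydrogens(forcefield)
# Fill a TIP3P water box with 1.0 nm padding; by default addSolvent also adds counter-ions to neutralize the system
modeller.addSolvent(forcefield, model='tip3p', padding=1.0 * nanometer)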

# Show system size
print(f"Number of atoms after solvation: {modeller.topology.getNumAtoms()}")

## Build the OpenMM system

We now generate the system object, which will be used for simulation.

In [None]:
from openmm.app import PME, HBonds

# Create the system from the solvated topology
system = forcefield.createSystem(modeller.topology,
                                 nonbondedMethod=PME,
                                 nonbondedCutoff=1.0 * nanometer,
                                 constraints=HBonds)

## Energy minimization

We minimize the system to remove any steric clashes introduced during hydrogen addition or solvation.

In [None]:
from openmm.unit import kelvin, picosecond, femtosecond, kilojoule_per_mole, nanometer
from openmm import LangevinIntegrator, LocalEnergyMinimizer, Platform
from openmm.app import Simulation, PDBFile

# Set up integrator
integrator = LangevinIntegrator(300 * kelvin, 1 / picosecond, 2 * femtosecond)

# Use the GPU through the CUDA platform (this call raises an exception if CUDA is unavailable)
platform = Platform.getPlatformByName("CUDA")

# Create simulation object
simulation = Simulation(modeller.topology, system, integrator, platform)
simulation.context.setPositions(modeller.positions)

# Get and print initial energy
initial_state = simulation.context.getState(getEnergy=True)
initial_energy = initial_state.getPotentialEnergy()
print(f"Initial potential energy: {initial_energy}")

# Run minimization
print("Running energy minimization...")
LocalEnergyMinimizer.minimize(simulation.context, tolerance=1.0 * kilojoule_per_mole / nanometer, maxIterations=1000)

# Get and print minimized energy
minimized_state = simulation.context.getState(getEnergy=True, getPositions=True)
minimized_energy = minimized_state.getPotentialEnergy()
print(f"Minimized potential energy: {minimized_energy}")

# Save minimized structure (the context manager closes the file handle)
with open("pre_equilibration.pdb", "w") as f:
    PDBFile.writeFile(simulation.topology, minimized_state.getPositions(), f)
print("Minimization complete.")
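The minimization cell above requests the `CUDA` platform by name, which fails on machines without it. One possible way to make the notebook portable is to choose from whatever platforms are present. This is a sketch only: `pick_platform` is a hypothetical helper, and the preference order is an assumption rather than an OpenMM default.

```python
def pick_platform(available, preferred=("CUDA", "OpenCL", "CPU", "Reference")):
    """Return the first preferred platform name that is available."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError(f"None of {preferred} found among {sorted(available)}")


# With OpenMM installed, the available names can be listed and the result
# used in place of the hard-coded "CUDA":
#
#   from openmm import Platform
#   names = [Platform.getPlatform(i).getName()
#            for i in range(Platform.getNumPlatforms())]
#   platform = Platform.getPlatformByName(pick_platform(names))

print(pick_platform(["Reference", "CPU"]))  # a machine without a GPU platform
```

A platform being listed does not guarantee that a context can be created on it (for example, the CUDA plugin may load on a machine without a usable GPU), so a stricter version would also catch the exception raised when the `Simulation` is constructed.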

## Equilibration

We now equilibrate the system at 300 K for 100 ps. No barostat has been added, so the box volume stays fixed at the size set during solvation (NVT).

In [None]:
from openmm.app import StateDataReporter, DCDReporter
from openmm import XmlSerializer
import time
import sys

# Initialize velocities at 300 K
simulation.context.setVelocitiesToTemperature(300 * kelvin)

# Write to terminal (stdout)
simulation.reporters.append(StateDataReporter(sys.stdout, 10000, step=True,
                                               potentialEnergy=True, kineticEnergy=True,
                                               totalEnergy=True, temperature=True, speed=True, separator="\t"))

# Write to file
simulation.reporters.append(StateDataReporter("equilibration.log", 10000, step=True,
                                               potentialEnergy=True, kineticEnergy=True,
                                               totalEnergy=True, temperature=True, speed=True, separator="\t"))

# DCD trajectory output
simulation.reporters.append(DCDReporter("equilibration.dcd", 10000))

# Run equilibration (50000 steps at 2 fs = 100 ps)
print("Running equilibration...")
start = time.time()
simulation.step(50000)
end = time.time()
print(f"Equilibration complete in {end - start:.2f} seconds.")

# Save the final state
state = simulation.context.getState(getPositions=True, getVelocities=True)
with open("equilibration.xml", "w") as f:
    f.write(XmlSerializer.serialize(state))

print("Equilibrated state saved to 'equilibration.xml'.")
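A quick way to judge the equilibration is to check that the temperature in `equilibration.log` fluctuates around the 300 K target. The sketch below parses the tab-separated reporter output with the standard library. It assumes the header line starts with `#` and quotes each column name, and the sample rows are made-up values for illustration, not results from this simulation.

```python
import csv
import io
import statistics


def column(text, name):
    """Return one named column of a tab-separated reporter log as floats."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # The first header field carries a leading '#' and literal quotes
    header = [field.strip('#"') for field in next(reader)]
    idx = header.index(name)
    return [float(row[idx]) for row in reader if row]


# Illustrative sample only (hypothetical numbers)
sample = (
    '#"Step"\t"Potential Energy (kJ/mole)"\t"Temperature (K)"\n'
    "10000\t-50000.0\t298.0\n"
    "20000\t-50100.0\t302.0\n"
)
temps = column(sample, "Temperature (K)")
print(f"Mean temperature: {statistics.mean(temps):.1f} K")

# On the real log:
#   with open("equilibration.log") as f:
#       temps = column(f.read(), "Temperature (K)")
```

Reading a single column by name avoids having to convert the other fields, such as the speed column, which may not always hold a plain number.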

## MD simulation

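We now rebuild the system and integrator, restore the equilibrated state from `equilibration.xml`, and attach reporters that write every 1,000 steps (2 ps). The production run itself covers 500,000 steps (1 ns).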

In [None]:
from openmm.app import Simulation, StateDataReporter, DCDReporter
from openmm import XmlSerializer
import sys

# Load saved state from equilibration
with open("equilibration.xml", "r") as f:
    state = XmlSerializer.deserialize(f.read())

system = forcefield.createSystem(modeller.topology,
                                 nonbondedMethod=PME,
                                 nonbondedCutoff=1.0 * nanometer,
                                 constraints=HBonds)

# Create integrator
integrator = LangevinIntegrator(300 * kelvin, 1 / picosecond, 2 * femtosecond)

# Set up simulation
simulation = Simulation(modeller.topology, system, integrator, platform)
simulation.context.setState(state)
simulation.currentStep = 0  # Reset step count

# Reporters
simulation.reporters.append(StateDataReporter(sys.stdout, 1000, step=True,
                                               potentialEnergy=True, kineticEnergy=True,
                                               totalEnergy=True, temperature=True, speed=True, separator="\t"))
simulation.reporters.append(StateDataReporter("simulation.log", 1000, step=True,
                                               potentialEnergy=True, kineticEnergy=True,
                                               totalEnergy=True, temperature=True, speed=True, separator="\t"))
simulation.reporters.append(DCDReporter("simulation.dcd", 1000))

### Production simulation

We now simulate for 500,000 steps (1 ns).

In [None]:
import time

print("Starting production simulation...")
start_time = time.time()
simulation.step(500000)
end_time = time.time()
print(f"Production simulation complete in {end_time - start_time:.2f} seconds.")
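The step counts used above follow directly from the 2 fs timestep, and the reporter intervals fix how many frames end up in each trajectory. A small sketch of that arithmetic, with hypothetical helper names:

```python
def steps_for(duration_ps: float, timestep_fs: float) -> int:
    """Number of integration steps needed to cover duration_ps."""
    return round(duration_ps * 1000.0 / timestep_fs)


def frames_written(n_steps: int, report_interval: int) -> int:
    """Frames written by a reporter that fires every report_interval steps."""
    return n_steps // report_interval


assert steps_for(100, 2) == 50_000     # equilibration: 100 ps
assert steps_for(1000, 2) == 500_000   # production: 1 ns

print(frames_written(50_000, 10_000))  # frames in equilibration.dcd
print(frames_written(500_000, 1000))   # frames in simulation.dcd
```

With these settings the production trajectory holds one frame every 2 ps, which is the spacing needed to build a time axis during analysis.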

## Trajectory analysis: RMSD

We now analyze the `simulation.dcd` trajectory using `MDTraj` to compute the RMSD over time.
Steps:
- Load trajectory
- Remove water atoms
- Align all frames to the first
- Compute and plot RMSD
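For reference, the quantity computed in the last step can be written as follows, where $N$ is the number of selected atoms, $\mathbf{r}_i(t)$ is the position of atom $i$ in the frame at time $t$, $\mathbf{r}_i(0)$ is its position in the reference frame, and the rotation $R$ and translation $\mathbf{d}$ are chosen to give the best fit. The notation is introduced here for illustration.

$$
\mathrm{RMSD}(t) = \min_{R,\,\mathbf{d}} \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left\lVert R\,\mathbf{r}_i(t) + \mathbf{d} - \mathbf{r}_i(0) \right\rVert^{2}}
$$

MDTraj reports this value in nanometers, which is why the plotting code multiplies by 10 to obtain Å.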

In [None]:
import matplotlib.pyplot as plt
import mdtraj as md

# Load trajectory and topology
traj = md.load("simulation.dcd", top="pre_equilibration.pdb")

# Keep only the peptide: "protein" drops water (residue name 'HOH') and any counter-ions added during solvation
traj = traj.atom_slice(traj.topology.select("protein"))

# Align to first frame (superpose on reference)
traj.superpose(traj, 0)

# Compute RMSD (to frame 0)
rmsd = md.rmsd(traj, traj, 0)

# Plot
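# Frames were written every 1000 steps of 2 fs, i.e. every 2 ps (0.002 ns)
time_ns = [frame * 0.002 for frame in range(traj.n_frames)]
plt.figure(figsize=(8, 5))
plt.plot(time_ns, rmsd * 10)  # convert nm → Å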
plt.xlabel("Time (ns)")
plt.ylabel("RMSD (Å)")
plt.title("RMSD of chignolin over time")
plt.tight_layout()
plt.show()

We now move the output files into a dedicated folder.

In [None]:
import shutil
import os

# Create folder if it does not exist
output_dir = "simulation_files_GPU"
os.makedirs(output_dir, exist_ok=True)

# List of files to move
files_to_move = ["equilibration.xml", "equilibration.dcd", "equilibration.log", 
                 "simulation.dcd", "simulation.log", "pre_equilibration.pdb"]

# Move files
for filename in files_to_move:
    if os.path.exists(filename):
        shutil.move(filename, os.path.join(output_dir, filename))
        print(f"Moved: {filename}")
    else:
        print(f"File not found: {filename}")