# Simulating non-proteins and assigning charges to large molecules

We don't want you to get the impression that OpenFF only handles protein-ligand simulations. In fact, our force fields are trained on small-molecule physical properties. For example, part of Sage's training was making sure it could reproduce experimental densities of organic mixtures. To show the versatility of our force fields, the following example simulates [a "host-guest" system from the SAMPL6 challenge](https://github.com/samplchallenges/SAMPL6/tree/master/host_guest).

In [1]:
from openff.toolkit import Molecule, Topology
from viz import visualize_topology

host = Molecule.from_file("../sdf/CB8.sdf")
guest = Molecule.from_file("../sdf/CB8-G0.sdf")
top = Topology.from_molecules([host, guest])
top.to_file("host-guest.pdb")

visualize_topology(top)




Here we'll use OpenMM's "PDBFixer" tool to add a box of water and ions. There are many other open-source tools for doing this, including AMBER's `tleap`, GROMACS' `gmx solvate`, and Packmol. We're using PDBFixer here because it's convenient and we don't have to leave the notebook.

In [2]:
import openmm.app
import openmm.unit
from pdbfixer import PDBFixer

fixer = PDBFixer("host-guest.pdb")
fixer.addSolvent(
    padding=1.0 * openmm.unit.nanometer,
    ionicStrength=0.5 * openmm.unit.molar,
)

with open("host-guest_solvated.pdb", "w") as f:
    openmm.app.PDBFile.writeFile(fixer.topology, fixer.positions, f)

top = Topology.from_pdb("host-guest_solvated.pdb", unique_molecules=[host, guest])
visualize_topology(top)

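To get a feel for what `ionicStrength=0.5 * openmm.unit.molar` means in a box this small, here is a back-of-the-envelope sketch. It assumes a monovalent salt (so ionic strength equals concentration) and a hypothetical cubic box with a 4 nm edge. The helper `estimate_ion_pairs` is made up for illustration, and the number of ions PDBFixer actually places may differ from this rough estimate.

```python
AVOGADRO = 6.02214076e23  # particles per mole
LITERS_PER_NM3 = 1e-24    # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def estimate_ion_pairs(box_edge_nm, ionic_strength_molar):
    """Rough count of cation/anion pairs in a cubic box (hypothetical helper).

    Assumes a monovalent salt and ignores the volume taken up by the solute.
    """
    volume_liters = box_edge_nm**3 * LITERS_PER_NM3
    return ionic_strength_molar * volume_liters * AVOGADRO


# The 4 nm edge is an illustrative assumption, not the box size of this system
pairs = estimate_ion_pairs(box_edge_nm=4.0, ionic_strength_molar=0.5)
print(f"Roughly {pairs:.0f} ion pairs")
```

Even at a fairly high salt concentration, a nanometer-scale box only holds a few dozen ions, so the realized concentration is necessarily coarse-grained.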

OpenFF's mainline force fields currently use a semi-empirical (read: "fast and dirty") QM method to assign partial charges, called "AM1BCC". While this method gives surprisingly good accuracy for its computational cost, said cost scales worse than quadratically with molecule size. So AM1BCC will fail (or run for days) on the "host" in this complex, due to the large number of atoms.

Thankfully, OpenFF is jumping on the ML bandwagon. We're very close to publishing a graph neural network called "NAGL", which we've trained to produce AM1BCC charges. NAGL is very fast, typically running in less than a second. 

Here, we'll use NAGL to assign partial charges to the host and guest molecules.

**Note**: Due to some technical constraints, this example is using a slightly older version of this plugin which prints a bunch of annoying warnings. This will be fixed soon!

In [3]:
from openff.toolkit.utils._nagl_wrapper import _NAGLToolkitWrapper

nagl_tkw = _NAGLToolkitWrapper()
nagl_tkw.assign_partial_charges(host, "_nagl_am1bccelf10")
nagl_tkw.assign_partial_charges(guest, "_nagl_am1bccelf10")
print(host.partial_charges)



[-0.5945126274600625 0.7963424464687705 -0.46853987965732813 0.1959001561626792 0.27432878222316504 -0.4685399392619729 0.1959001263603568 0.7963424464687705 -0.5945126274600625 -0.4685399988666177 0.19590009655803442 0.27432878222316504 -0.4685399392619729 0.1959001561626792 0.7963424464687705 -0.5945126274600625 -0.46853987965732813 0.1959001561626792 0.27432878222316504 -0.4685399988666177 0.19590009655803442 0.7963424464687705 -0.5945126274600625 -0.4685399988666177 0.1959001263603568 0.27432878222316504 -0.4685399392619729 0.1959001561626792 0.7963424464687705 -0.5945126274600625 -0.4685399392619729 0.1959001561626792 0.27432878222316504 -0.4685399988666177 0.1959001263603568 0.7963424464687705 -0.5945126274600625 -0.4685399988666177 0.1959001263603568 0.27432878222316504 -0.46853987965732813 0.1959001561626792 0.7963424464687705 -0.5945126274600625 -0.4685399392619729 0.1959001561626792 0.27432878222316504 -0.4685399392619729 0.19590009655803442 0.7963424464687705 -0.594512627460
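Whenever charges come from a fast model rather than a QM calculation, a cheap sanity check is to confirm that they sum to the molecule's net charge. The sketch below shows the idea on plain Python floats with toy values; `check_total_charge` and the tolerance are hypothetical, not part of the OpenFF Toolkit.

```python
import math


def check_total_charge(partial_charges, expected_total, tol=1e-3):
    """Sum the partial charges and raise if they miss the expected net charge.

    Hypothetical helper: charges are plain floats in units of elementary charge.
    """
    total = math.fsum(partial_charges)
    if abs(total - expected_total) > tol:
        raise ValueError(
            f"Partial charges sum to {total:.4f}, expected {expected_total}"
        )
    return total


# Toy charges for a neutral three-atom molecule (illustrative values only)
toy_charges = [-0.8, 0.4, 0.4]
total = check_total_charge(toy_charges, expected_total=0)
print(f"Net charge: {total:.4f}")

# In the notebook, assuming the charges are a unit-wrapped array, something like
# check_total_charge(
#     host.partial_charges.m_as("elementary_charge"),
#     host.total_charge.m_as("elementary_charge"),
# )
# would apply the same check to the host.
```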

Now we create the simulation system. This is the step where charges get assigned. To force the simulation to use the NAGL charges we just generated instead of recalculating everything with "expensive" AM1BCC, we use the `charge_from_molecules` keyword argument. 

In [4]:
from openff.toolkit import ForceField

sage = ForceField("openff-2.1.0.offxml")
interchange = sage.create_interchange(top, charge_from_molecules=[host, guest])
sys = interchange.to_openmm()

Finally, we run and visualize the simulation using some boilerplate code. This is the "quick and dirty" version of how to run a simulation. In a real workflow, you'd want to equilibrate the temperature and pressure to make sure things are at the right density and are stable. 

In [5]:
import openmm

# Construct and configure a Langevin integrator at 300 K with an appropriate friction constant and time-step
integrator = openmm.LangevinIntegrator(
    300 * openmm.unit.kelvin,
    1 / openmm.unit.picosecond,
    0.002 * openmm.unit.picoseconds,
)

# Combine the topology, system, integrator and initial positions into a simulation
simulation = openmm.app.Simulation(top.to_openmm(), sys, integrator)
simulation.context.setPositions(top.get_positions().to_openmm())
simulation.minimizeEnergy()

# Add a reporter to record the structure every 250 steps
dcd_reporter = openmm.app.DCDReporter("non_protein.dcd", 250)
simulation.reporters.append(dcd_reporter)
simulation.context.setVelocitiesToTemperature(300 * openmm.unit.kelvin)
simulation.runForClockTime(30 * openmm.unit.second)

In [6]:
import mdtraj
import nglview

trajectory: mdtraj.Trajectory = mdtraj.load(
    "non_protein.dcd", top=mdtraj.Topology.from_openmm(top.to_openmm())
)

view = nglview.show_mdtraj(
    trajectory.image_molecules(
        anchor_molecules=[
            [
                trajectory.topology.atom(index)
                for index in range(top.molecule(0).n_atoms + top.molecule(1).n_atoms)
            ]
        ]
    )
)

view.add_representation("line", selection="water")
view

NGLWidget(max_frame=99)
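Because we ran for a fixed wall-clock time instead of a fixed number of steps, the length of the trajectory depends on how fast your hardware is. We can work backwards from the number of saved frames to the simulated time. This sketch assumes that `max_frame` is the zero-based index of the last frame, and reuses the reporter interval (250 steps) and time step (0.002 ps) from the cell above; `simulated_time_ps` is a hypothetical helper.

```python
def simulated_time_ps(n_frames, report_interval=250, timestep_ps=0.002):
    """Lower bound on the simulated time implied by the number of saved frames.

    One frame is written every `report_interval` steps, so `n_frames` frames
    imply at least `n_frames * report_interval` completed steps.
    """
    n_steps = n_frames * report_interval
    return n_steps * timestep_ps


# max_frame=99 in the viewer output corresponds to 100 frames
n_frames = 99 + 1
time_ps = simulated_time_ps(n_frames)
print(f"At least {time_ps:.0f} ps simulated in 30 s of wall-clock time")
```

On a different machine the frame count, and therefore the simulated time, will be different.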