Frequently Asked Questions
Frequently Asked Questions
How do periodic boundary conditions work?
Periodic boundary conditions are a very common source of questions. Often they take the form of questions like, "Why are water molecules diffusing outside the periodic box?" or, "Why does my protein stick out past the edge of the water box?" Other times they relate to how specific forces apply (or don't apply) periodic boundary conditions. This is a much more complicated subject than it first appears, so let's start at the beginning.
Conceptually speaking, when you run a simulation with periodic boundary conditions, every "particle" really represents an infinite grid of particles repeating through space. If the periodic box is a 10 nm cube and a particle is located at (1, 2, 3), that means there also is a particle at (11, 2, 3), another at (21, 2, 3), another at (-9, 12, 103), and so on. It does not matter which of those positions is listed for the particle. Any one of them represents exactly the same infinite grid of particles.
Next there is a very important point to understand:
Periodic boundary conditions are applied to displacements, not positions.
Again using the example of a 10 nm cube, what is the displacement between two particles at (1, 0, 0) and (9, 0, 0)? Without periodic boundary conditions, the displacement would be (8, 0, 0), but with periodic boundary conditions it is (-2, 0, 0). If the second particle's position were instead (-1, 0, 0) or (99, 0, 0) it would not matter. The displacement would be the same. Periodic boundary conditions are applied while computing forces that depend on the displacements between particles.
Here is another very important point:
Periodic boundary conditions are applied only to nonbonded forces, not bonded ones.
Consider a bond between two particles. Remember that each of them really represents an infinite grid of particles, but that doesn't mean every copy of the first particle is bonded to every copy of the second one. Each copy of the first particle is bonded to one specific copy of the second particle and no other. If a different copy of the second particle somehow manages to come closer than the copy it is bonded to, that doesn't mean the bond should suddenly "break" and transfer to a different copy.
(Actually, it is possible to apply periodic boundary conditions to bonded forces if you really want it. Just call
setUsesPeriodicBoundaryConditions() on the force. But this is an unusual thing to do, and is only appropriate in special situations. For example, if you want to simulate an infinite molecule where one end is bonded to the next periodic copy of the other end, you would do it.)
This also applies to constraints and to nonbonded exceptions, which are really a type of bonded force. Nonbonded interactions are usually omitted or reduced between particles that are bonded to each other (1-2, 1-3, and 1-4 interactions). This applies only to particles that are directly bonded, not to other periodic copies of those particles.
A consequence is that you must never break a molecule between two different periodic copies. I said it didn't matter whether a particle was at (9, 0, 0) or (-1, 0, 0). But if it is bonded to a particle at (1, 0, 0) then it does matter. One indicates a bond length of 8, while the other indicates a bond length of 2. Most MD packages follow this same requirement, so usually it is not a problem. But occasionally you may come across a file in which every atom has independently been wrapped to a single periodic box, causing molecules to be broken in half. There are tools that can be used to fix these files, such as the make_molecules_whole() function in MDTraj.
Everything so far has related to running simulations, but now let's talk about saving output files. To the MD engine, a molecule centered at (0, 0, 0) may be identical to one centered at (100, -30, -80), but if you are trying to visualize a trajectory, it is impossible to tell what is going on if different molecules are scattered across many different periodic boxes. For this reason,
Context.getState() has an option to wrap all the returned particle positions into a single periodic box. Likewise, reporters do this automatically when writing trajectories for a periodic system. The wrapping is always done to whole molecules. The center of every molecule is in a single box, but some atoms in that molecule may extend outside it. This is why it often appears as if a protein is only partly surrounded by water.
Visualization is further complicated by the following important point:
There is no uniquely defined "periodic box".
For example, if the box width is 10, does that mean the particle positions should all be between 0 and 10? Or between -5 and 5? Both of those conventions are common in different programs. If you load an initial structure that has the protein centered at the origin, then write a trajectory that puts all the molecules between 0 and 10, it will appear that most of the protein is outside the water, and there is a large hole in the water box on the opposite side. Because this is such a common problem, many analysis programs have features specifically to deal with it, such as the image_molecules() function in MDTraj or the
autoimage command in Cpptraj. These tools try to intelligently put all particles into a single box with the solutes in the center and water around them.
Why am I getting an error that says, "Particle position is NaN" or "Energy is NaN"?
NaN stands for "not a number". It is how computers represent results of computations that don't have well defined real values, like dividing by zero or taking the square root of a negative number. When you get this error, it generally means your simulation has blown up. That can happen for many reasons, but generally starts from a force or velocity being too large. That causes a particle to jump abruptly, which leads to even larger forces in the next time step, and within a few steps the position of every particle has turned into NaN.
Many types of problems can cause a simulation to blow up. Here are some of the more common ones.
The time step is too large. If it is only slightly too large, the simulation may seem to run successfully for a while before blowing up. If it is much too large, it will become unstable immediately.
The initial conformation was not energy minimized. For example, if you are using a crystal structure as your starting conformation, the coordinates are probably not at a minimum of the force field and will produce very large forces on the first time step. This can usually be fixed by running a local energy minimization.
The periodic box vectors were set incorrectly. For example, this is a common mistake when starting from Amber prmtop and inpcrd files. Both files specify periodic box vectors, and they may be different from each other. If the particle coordinates assume one box size, but the simulation is using a different box size, that can cause severe clashes between particles.
Constraints are defined incorrectly. Applying constraints involves inverting a matrix (sometimes explicitly, sometimes only implicitly). If that matrix is poorly conditioned, it will cause the simulation to blow up. A simple cause of this is having two different constraints between the same pair of particles. It also can happen when there are inconsistent constraints that cannot all be satisfied at the same time. Other, more complicated sets of constraints can also lead to a poorly conditioned matrix.
Forces have been defined incorrectly. This is most likely to be a problem when you are building the System directly, or adding extra forces to a System. If the forces you apply are too large, that can cause the simulation to blow up. Also, when creating custom forces be careful not to write expressions that may directly produce a NaN (such as by taking the square root of a value that is sometimes negative).
Since so many things can cause these errors, they can be difficult to debug. The first thing to look for is whether the simulation blows up immediately (within the first few time steps), or whether it runs for a long time first. If the problem is in either the initial conformation (such as not being minimized) or the system definition (such as constraints or forces), it will usually blow up immediately. Try running a local energy minimization and see if that fixes it. On the other hand, if your step size is too large, it may run for a while first. In that case, the first thing to try is reducing the step size.
If the error is reproducible rather than intermittent, you can usually track down exactly what is happening. Start by looking for any particles with very large forces on them. Then identify where those forces are coming from. Are they a result of bonded or nonbonded interactions? Can you identify the specific interactions? For example, perhaps two particles are too close together?
Another approach to debugging these errors is to systematically remove things that can cause them. Try removing forces one at a time. Try removing all constraints. Try disabling periodic boundary conditions. With a bit of persistence and patience, you can usually identify the cause.
Why am I getting an error that says, "No template found for residue"?
This error can occur when you call
createSystem() on a ForceField. To understand what it means, you need to know a little about how force fields work in OpenMM.
The first step in using a force field to build a System is to identify the type of every atom. Atom types can be very specific. For example, not all carbon atoms are the same. A force field will probably use different parameters for an alpha carbon than for one in an aromatic side chain. Atom types are determined by matching residues to templates. A force field includes template definitions for all the types of residues it knows about. When you call
createSystem() it loops over every residue in the Topology you passed in, tries to match each one to a template, then assigns the atom types specified in that template.
Residues are matched to templates based on two pieces of information: the element of each atom, and the set of bonds connecting them (including bonds that connect an atom in the residue to one in a different residue). It does not care about other information, such as the name of the residue. (However, as discussed below, incorrect residue names can still cause problems.) Only the elements and bonds matter.
There are many problems that can prevent it from finding a matching template for a residue. Here are some of the more common ones.
You omitted a force field XML file. This is one of the simplest possibilities. When you created your ForceField object and told it which files to load, you may have simply left one out. For example, the Amber14 and CHARMM36 definitions put water into a separate file from the rest of the force field, so that you can easily select between different water models. If you did not specify that file, the force field will not have a matching template for water molecules in your Topology.
You have a nonstandard residue. Standard force fields like Amber and CHARMM cover a variety of standard molecules, like proteins and nucleic acids. But if your Topology also includes a ligand or other small molecule, the force field will probably not have parameters (and hence no template) for it.
The Topology is missing atoms. For example, crystal structures usually do not include hydrogens, and often also have missing heavy atoms in flexible regions. Without those atoms, the Topology will not match the force field's templates. OpenMM-Setup can automatically fix many problems of this sort.
The water model is inconsistent with the Topology. Four and five site water models expect each water molecule to include "extra particles" for the virtual sites. If they are not there (or conversely, if they are there but you try to use a three site water model), the template will not match.
Modeller.addExtraParticles()can be used to fix this problem.
A chain is terminated incorrectly. Proteins, and nucleic acids to an even greater extent, can be terminated in different ways. The residues at the ends of the chain are different from those in the middle and require different templates. Often a force field will only support one or two ways of terminating each chain. If your Topology has them terminated in a different way, they will not match.
You have a nonstandard PDB file. PDB is a standardized format, precisely defined by a specification. Unfortunately, many PDB files do not follow that specification, which causes all sorts of problems.
That last point requires discussion, because there are so many ways a PDB file could be nonstandard. Remember that templates are matched based on two things: the elements of the atoms and the bonds between them. Anything that causes one of those to be reconstructed incorrectly will cause errors in matching templates.
Each ATOM record in a PDB file has a field for specifying the element. Unfortunately, some files leave that field blank. In that case, the PDBFile class tries to guess the element based on the atom's name. Usually it can guess correctly, but not always.
Bonds are more complicated. The PDB specification distinguishes between standard residues and heterogens. Standard residues (amino acids, nucleotides, and water) do not have their bonds specified in the file. The reader is supposed to add them automatically. All other molecules are supposed to have their bonds explicitly given with CONECT records. If those records are missing, the PDB reader will not know which atoms are supposed to be bonded, and so the ForceField will be unable to match templates to them.
Standard residues present their own problems. They are recognized by name. The PDB Chemical Components Dictionary specifies standardized names for all residues and atoms. That is how the PDB reader knows, for example, that a residue called
ALA should have a bond between the atom named
CA and the atom named
CB. Unfortunately, some files use nonstandard names for residues and atoms. In that case, PDBFile does not know where to add bonds.
This is why atom and residue names can matter. The ForceField does not use names when matching templates. But if a file has nonstandard names, the PDB reader will be unable to add the correct bonds, and that will prevent the ForceField from matching templates.
PDBFile does try to accommodate nonstandard names. It has a large table of commonly encountered nonstandard names that it will recognize. For example, the standard name for a water molecule is
HOH, but it will also accept several other names like
TP3. Likewise, the standard name for the oxygen in a water molecule is
O, but it also accepts the names
OH2. These are names that various programs have in the past incorrectly used in PDB files that they wrote, so OpenMM knows to look for them. But if your file uses some other nonstandard name that it does not know about, then bonds will not be added correctly, and so the ForceField will not be able to match templates.