Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does CustomRMSDForce prevent atoms within the specified atom group from being periodically imaged? #2913

Open
jchodera opened this issue Nov 7, 2020 · 13 comments

Comments

@jchodera
Copy link
Member

jchodera commented Nov 7, 2020

When a CustomRMSDForce is defined for a set of atoms, is this atom set prevented from being placed in different periodic images by MonteCarloBarostat?

@jchodera
Copy link
Member Author

jchodera commented Nov 7, 2020

For example, I have a simple CustomCVForce to restrain the RMSD between a protein and a ligand that are not otherwise covalently connected:

        from simtk import unit, openmm
        kB = unit.AVOGADRO_CONSTANT_NA * unit.BOLTZMANN_CONSTANT_kB 
        temperature = 300 * unit.kelvin
        kT = kB * temperature     
        sigma = 1.0 * unit.angstrom
        buffer = 2.0 * unit.angstrom
        custom_cv_force = openmm.CustomCVForce('step(RMSD-buffer)*(K_RMSD/2)*(RMSD-buffer)^2')
        custom_cv_force.addGlobalParameter('K_RMSD', kT / sigma**2)
        custom_cv_force.addGlobalParameter('buffer', buffer)
        rmsd_force = openmm.RMSDForce(self.hybrid_positions, rmsd_atom_indices)
        custom_cv_force.addCollectiveVariable('RMSD', rmsd_force)
        system.addForce(custom_cv_force)

My guess is that the barostat is imaging the ligand into different boxes from the protein, since I see this if I report the CV value:

iteration    27 RMSD 0.854 A
iteration    28 RMSD 0.853 A
iteration    29 RMSD 0.775 A
iteration    30 RMSD 0.760 A
iteration    31 RMSD 0.768 A
iteration    32 RMSD 0.758 A
iteration    33 RMSD 46.048 A
iteration    34 RMSD 46.021 A
iteration    35 RMSD 0.833 A

Would adding a zero-energy CustomBondForce between two of these atoms prevent this?

@jchodera
Copy link
Member Author

jchodera commented Nov 7, 2020

It appears that this is the case: Adding a zero-energy CustomBondForce seems to solve this!

iteration    49 RMSD 0.715 A
iteration    50 RMSD 0.667 A
iteration    51 RMSD 0.860 A
iteration    52 RMSD 0.653 A
iteration    53 RMSD 0.642 A
iteration    54 RMSD 0.651 A
iteration    55 RMSD 0.638 A
iteration    56 RMSD 0.639 A
iteration    57 RMSD 0.812 A
iteration    58 RMSD 0.701 A
iteration    59 RMSD 0.708 A
iteration    60 RMSD 0.740 A
iteration    61 RMSD 0.762 A
iteration    62 RMSD 0.704 A
iteration    63 RMSD 0.615 A
iteration    64 RMSD 0.694 A
iteration    65 RMSD 0.608 A
iteration    66 RMSD 0.707 A
iteration    67 RMSD 0.628 A
iteration    68 RMSD 0.619 A
iteration    69 RMSD 0.718 A
iteration    70 RMSD 0.681 A

@jchodera
Copy link
Member Author

jchodera commented Nov 7, 2020

@peastman: Perhaps we should modify the default behavior to group any RMSD restrained atoms into the same "molecule", since the RMSD and alignment will be meaningless otherwise?

@peastman
Copy link
Member

peastman commented Nov 7, 2020

I think that would be a reasonable change.

@peastman
Copy link
Member

Since the barostat no longer translates molecules into different periodic boxes (see #3260), is this still an issue? getState() will still translate them if you specify enforcePeriodicBox=True. But that's only output, so it doesn't affect calculating CVs or anything else inside the simulation, and it only happens if you request it. What is the preferred behavior in that case?

@peastman
Copy link
Member

Actually, I don't understand this issue. The RMSD calculation begins by finding an optimal translation and rotation to align the two groups of atoms. Moving one of them into a different periodic box shouldn't make any difference. Are you sure the problem isn't something different?

@peastman peastman modified the milestones: 7.7, 7.8 Nov 5, 2021
@peastman
Copy link
Member

@jchodera was this ever resolved? See my comments above from October.

@jchodera
Copy link
Member Author

jchodera commented Feb 14, 2022

I have verified this problem still exists in OpenMM 7.7.

Here is a test that computes the RMSD for a ligand bound to a protein dimer, showing that the RMSD remains low only if I add a virtual bond (a CustomBondForce with zero energy) between the two protein monomers and one monomer and the ligand to make sure everything is imaged together. Otherwise, the molecules are all imaged into different boxes before the RMSD is computed, causing the RMSD to be large and nonsensical:
issue-2913.zip

Here is the output of this test:

$ python test-customrmsdforce-imaging.py
Running test with virtual_bond = True
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
Time :     1000.000 ps : RMSD        0.007 A
Time :     1001.000 ps : RMSD        0.581 A
Time :     1002.000 ps : RMSD        0.646 A
Time :     1003.000 ps : RMSD        0.691 A
Time :     1004.000 ps : RMSD        0.738 A
Time :     1005.000 ps : RMSD        0.771 A
Time :     1006.000 ps : RMSD        0.805 A
Time :     1007.000 ps : RMSD        0.795 A
Time :     1008.000 ps : RMSD        0.823 A
Time :     1009.000 ps : RMSD        0.851 A
Running test with virtual_bond = False
Time :     1000.000 ps : RMSD       52.452 A
Time :     1001.000 ps : RMSD       52.477 A
Time :     1002.000 ps : RMSD       52.412 A
Time :     1003.000 ps : RMSD       52.198 A
Time :     1004.000 ps : RMSD       52.249 A
Time :     1005.000 ps : RMSD       52.160 A
Time :     1006.000 ps : RMSD       52.044 A
Time :     1007.000 ps : RMSD       52.141 A
Time :     1008.000 ps : RMSD       52.053 A
Time :     1009.000 ps : RMSD       52.078 A

I don't understand why this is happening. As I understand it, OpenMM does not image or wrap the molecular coordinates unless requested when getState() is called and a State with imaged coordinates are generated. Because they are a tightly-bound complex, the protein dimer and bound ligand should diffuse together.

Instead, somehow, these molecules are being imaged differently before the RMSD is computed. This is not correct behavior.

If it is unavoidable, the RMSD computation should find the imaging scheme that leads to minimal RMSD and use/report that. Otherwise, if an arbitrary box imaging displacement causes the RMSD to increase, it is by definition not the RMSD corresponding to the minimum value over translations and rotations.

@jchodera jchodera added the bug label Feb 14, 2022
@peastman
Copy link
Member

I see what you're doing now. You have multiple separate molecules, but you want to compute a single RMSD for all of them against a single set of reference coordinates. If you have periodic boundary conditions, it isn't clear to me how to define the RMSD in that situation. Each atom represents an infinite grid of atoms repeating through space. The "position" stored for each atom is an arbitrarily chosen one of those infinite periodic copies. I guess for each molecule, we could start by finding a different lattice translation to minimize the distance to the reference conformation for that particular molecule?

Your workaround of adding a virtual bond guarantees you'll always get one particular periodic copy, but who is to say that's the "right" one? As the molecules move around, a different periodic copy might come to give a smaller RMSD.

Also, imagine if someone asked for the RMSD of their entire system, including water, to a reference. If that forced us to treat the whole system as a single molecule, the simulation would slow to a crawl since we couldn't reorder the waters.

@peastman
Copy link
Member

I've been trying to come up with a definition that makes sense for multiple molecules with periodic boundary conditions. Does the following make sense?

For a single molecule, it doesn't matter which periodic copy you choose. The translation built into the RMSD calculation will lead to the same result regardless. For the same reason, when you have multiple molecules, it doesn't matter which periodic copy you pick for the first one. That just sets an overall translation. But it does matter which copies you pick for all molecules after the first one. Let's define the correct result to be the minimum RMSD over all possible combinations of all possible periodic copies of the remaining molecules.

Let's put aside the question of how to calculate it for the moment. Does that seem like an appropriate definition of what we want to calculate?

@jchodera
Copy link
Member Author

I see what you're doing now. You have multiple separate molecules, but you want to compute a single RMSD for all of them against a single set of reference coordinates. If you have periodic boundary conditions, it isn't clear to me how to define the RMSD in that situation. Each atom represents an infinite grid of atoms repeating through space. The "position" stored for each atom is an arbitrarily chosen one of those infinite periodic copies. I guess for each molecule, we could start by finding a different lattice translation to minimize the distance to the reference conformation for that particular molecule?

I don't think there's anything ill-defined about this particular problem: By definition, the least-squares RMSD is intended to produce the minimum RMSD for any translation and rotation of the two systems. If translation of the collection of three associated objects by some amount will reduce the RMSD, the minimum over all those translations and rotations should be the appropriate least-squared RMSD.

Your workaround of adding a virtual bond guarantees you'll always get one particular periodic copy, but who is to say that's the "right" one? As the molecules move around, a different periodic copy might come to give a smaller RMSD.

This is only relevant if the collection of three objects dissociates, in which case the least-squares RMSD would be detectably large anyway. In this case, a restraint force is imposed to prevent this, but if any molecules in the collection are re-imaged, this fails. That's why I had to impose virtual bonds.

Also, imagine if someone asked for the RMSD of their entire system, including water, to a reference. If that forced us to treat the whole system as a single molecule, the simulation would slow to a crawl since we couldn't reorder the waters.

That's a different problem: That would be the least-squares RMSD over all possible molecular permutations, translations, and rotations. While there are some algorithms for dealing with this case efficiently, here, we're just considering all possible translations and rotations of the collection of atoms.

For a single molecule, it doesn't matter which periodic copy you choose. The translation built into the RMSD calculation will lead to the same result regardless. For the same reason, when you have multiple molecules, it doesn't matter which periodic copy you pick for the first one. That just sets an overall translation. But it does matter which copies you pick for all molecules after the first one. Let's define the correct result to be the minimum RMSD over all possible combinations of all possible periodic copies of the remaining molecules.

What I'm not quite understanding is why the molecules are being imaged into different boxes in the first place. How and when is this happening? I didn't think this should be happening at all unless we retrieve the state and ask for re-imaging. RMSD shouldn't be imaging either---it should just use the positions, or at least provide the user with the option to not attempt any (especially erroneous) imaging. That would be a simple solution: If no imaging is done during the simulation, simply not imaging when RMSD is computed should provide the correct result in this particular case. (Other cases where no restraint is applied and the ligand might unbind and rebind after passing through a unit cell length would not behave correctly, however.)

Let's put aside the question of how to calculate it for the moment. Does that seem like an appropriate definition of what we want to calculate?

The least-squares RMSD feature should provide the smallest RMSD over all possible translations and rotations of the collections of atoms. If some of those translations would cause re-imaging of the collection of molecules in a way to reduce the RMSD, the minimum over all those translations is the correct least-squares RMSD.

@peastman
Copy link
Member

What I'm not quite understanding is why the molecules are being imaged into different boxes in the first place. How and when is this happening?

The code that does it is in ComputeContext::reorderAtomsImpl(). Internally we always translate the molecules into a single periodic box. We also keep track of how many box widths each one has been translated by, so we can put them back on output. This is essential to maintain accuracy, especially in single precision. The larger an atom coordinate is, the less precisely we can specify it. So we need to keep them small.

@jchodera
Copy link
Member Author

jchodera commented Feb 17, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants