Does CustomRMSDForce prevent atoms within the specified atom group from being periodically imaged? #2913

jchodera · 2020-11-07T00:41:38Z

When a CustomRMSDForce is defined for a set of atoms, is this atom set prevented from being placed in different periodic images by MonteCarloBarostat?

The text was updated successfully, but these errors were encountered:

jchodera · 2020-11-07T01:27:02Z

For example, I have a simple CustomCVForce to restrain the RMSD between a protein and a ligand that are not otherwise covalently connected:

        from simtk import unit, openmm
        kB = unit.AVOGADRO_CONSTANT_NA * unit.BOLTZMANN_CONSTANT_kB 
        temperature = 300 * unit.kelvin
        kT = kB * temperature     
        sigma = 1.0 * unit.angstrom
        buffer = 2.0 * unit.angstrom
        custom_cv_force = openmm.CustomCVForce('step(RMSD-buffer)*(K_RMSD/2)*(RMSD-buffer)^2')
        custom_cv_force.addGlobalParameter('K_RMSD', kT / sigma**2)
        custom_cv_force.addGlobalParameter('buffer', buffer)
        rmsd_force = openmm.RMSDForce(self.hybrid_positions, rmsd_atom_indices)
        custom_cv_force.addCollectiveVariable('RMSD', rmsd_force)
        system.addForce(custom_cv_force)

My guess is that the barostat is imaging the ligand into different boxes from the protein, since I see this if I report the CV value:

iteration    27 RMSD 0.854 A
iteration    28 RMSD 0.853 A
iteration    29 RMSD 0.775 A
iteration    30 RMSD 0.760 A
iteration    31 RMSD 0.768 A
iteration    32 RMSD 0.758 A
iteration    33 RMSD 46.048 A
iteration    34 RMSD 46.021 A
iteration    35 RMSD 0.833 A

Would adding a zero-energy CustomBondForce between two of these atoms prevent this?

jchodera · 2020-11-07T01:33:13Z

It appears that this is the case: Adding a zero-energy CustomBondForce seems to solve this!

iteration    49 RMSD 0.715 A
iteration    50 RMSD 0.667 A
iteration    51 RMSD 0.860 A
iteration    52 RMSD 0.653 A
iteration    53 RMSD 0.642 A
iteration    54 RMSD 0.651 A
iteration    55 RMSD 0.638 A
iteration    56 RMSD 0.639 A
iteration    57 RMSD 0.812 A
iteration    58 RMSD 0.701 A
iteration    59 RMSD 0.708 A
iteration    60 RMSD 0.740 A
iteration    61 RMSD 0.762 A
iteration    62 RMSD 0.704 A
iteration    63 RMSD 0.615 A
iteration    64 RMSD 0.694 A
iteration    65 RMSD 0.608 A
iteration    66 RMSD 0.707 A
iteration    67 RMSD 0.628 A
iteration    68 RMSD 0.619 A
iteration    69 RMSD 0.718 A
iteration    70 RMSD 0.681 A

jchodera · 2020-11-07T01:34:04Z

@peastman: Perhaps we should modify the default behavior to group any RMSD restrained atoms into the same "molecule", since the RMSD and alignment will be meaningless otherwise?

peastman · 2020-11-07T01:38:59Z

I think that would be a reasonable change.

peastman · 2021-10-13T23:32:04Z

Since the barostat no longer translates molecules into different periodic boxes (see #3260), is this still an issue? getState() will still translate them if you specify enforcePeriodicBox=True. But that's only output, so it doesn't affect calculating CVs or anything else inside the simulation, and it only happens if you request it. What is the preferred behavior in that case?

peastman · 2021-10-14T01:26:55Z

Actually, I don't understand this issue. The RMSD calculation begins by finding an optimal translation and rotation to align the two groups of atoms. Moving one of them into a different periodic box shouldn't make any difference. Are you sure the problem isn't something different?

peastman · 2022-02-10T20:14:12Z

@jchodera was this ever resolved? See my comments above from October.

jchodera · 2022-02-14T01:32:23Z

I have verified this problem still exists in OpenMM 7.7.

Here is a test that computes the RMSD for a ligand bound to a protein dimer, showing that the RMSD remains low only if I add a virtual bond (a CustomBondForce with zero energy) between the two protein monomers and one monomer and the ligand to make sure everything is imaged together. Otherwise, the molecules are all imaged into different boxes before the RMSD is computed, causing the RMSD to be large and nonsensical:
issue-2913.zip

Here is the output of this test:

$ python test-customrmsdforce-imaging.py
Running test with virtual_bond = True
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
Time :     1000.000 ps : RMSD        0.007 A
Time :     1001.000 ps : RMSD        0.581 A
Time :     1002.000 ps : RMSD        0.646 A
Time :     1003.000 ps : RMSD        0.691 A
Time :     1004.000 ps : RMSD        0.738 A
Time :     1005.000 ps : RMSD        0.771 A
Time :     1006.000 ps : RMSD        0.805 A
Time :     1007.000 ps : RMSD        0.795 A
Time :     1008.000 ps : RMSD        0.823 A
Time :     1009.000 ps : RMSD        0.851 A
Running test with virtual_bond = False
Time :     1000.000 ps : RMSD       52.452 A
Time :     1001.000 ps : RMSD       52.477 A
Time :     1002.000 ps : RMSD       52.412 A
Time :     1003.000 ps : RMSD       52.198 A
Time :     1004.000 ps : RMSD       52.249 A
Time :     1005.000 ps : RMSD       52.160 A
Time :     1006.000 ps : RMSD       52.044 A
Time :     1007.000 ps : RMSD       52.141 A
Time :     1008.000 ps : RMSD       52.053 A
Time :     1009.000 ps : RMSD       52.078 A

I don't understand why this is happening. As I understand it, OpenMM does not image or wrap the molecular coordinates unless requested when getState() is called and a State with imaged coordinates are generated. Because they are a tightly-bound complex, the protein dimer and bound ligand should diffuse together.

Instead, somehow, these molecules are being imaged differently before the RMSD is computed. This is not correct behavior.

If it is unavoidable, the RMSD computation should find the imaging scheme that leads to minimal RMSD and use/report that. Otherwise, if an arbitrary box imaging displacement causes the RMSD to increase, it is by definition not the RMSD corresponding to the minimum value over translations and rotations.

peastman · 2022-02-15T18:22:17Z

I see what you're doing now. You have multiple separate molecules, but you want to compute a single RMSD for all of them against a single set of reference coordinates. If you have periodic boundary conditions, it isn't clear to me how to define the RMSD in that situation. Each atom represents an infinite grid of atoms repeating through space. The "position" stored for each atom is an arbitrarily chosen one of those infinite periodic copies. I guess for each molecule, we could start by finding a different lattice translation to minimize the distance to the reference conformation for that particular molecule?

Your workaround of adding a virtual bond guarantees you'll always get one particular periodic copy, but who is to say that's the "right" one? As the molecules move around, a different periodic copy might come to give a smaller RMSD.

Also, imagine if someone asked for the RMSD of their entire system, including water, to a reference. If that forced us to treat the whole system as a single molecule, the simulation would slow to a crawl since we couldn't reorder the waters.

peastman · 2022-02-15T22:02:53Z

I've been trying to come up with a definition that makes sense for multiple molecules with periodic boundary conditions. Does the following make sense?

For a single molecule, it doesn't matter which periodic copy you choose. The translation built into the RMSD calculation will lead to the same result regardless. For the same reason, when you have multiple molecules, it doesn't matter which periodic copy you pick for the first one. That just sets an overall translation. But it does matter which copies you pick for all molecules after the first one. Let's define the correct result to be the minimum RMSD over all possible combinations of all possible periodic copies of the remaining molecules.

Let's put aside the question of how to calculate it for the moment. Does that seem like an appropriate definition of what we want to calculate?

jchodera · 2022-02-16T01:17:43Z

I see what you're doing now. You have multiple separate molecules, but you want to compute a single RMSD for all of them against a single set of reference coordinates. If you have periodic boundary conditions, it isn't clear to me how to define the RMSD in that situation. Each atom represents an infinite grid of atoms repeating through space. The "position" stored for each atom is an arbitrarily chosen one of those infinite periodic copies. I guess for each molecule, we could start by finding a different lattice translation to minimize the distance to the reference conformation for that particular molecule?

I don't think there's anything ill-defined about this particular problem: By definition, the least-squares RMSD is intended to produce the minimum RMSD for any translation and rotation of the two systems. If translation of the collection of three associated objects by some amount will reduce the RMSD, the minimum over all those translations and rotations should be the appropriate least-squared RMSD.

Your workaround of adding a virtual bond guarantees you'll always get one particular periodic copy, but who is to say that's the "right" one? As the molecules move around, a different periodic copy might come to give a smaller RMSD.

This is only relevant if the collection of three objects dissociates, in which case the least-squares RMSD would be detectably large anyway. In this case, a restraint force is imposed to prevent this, but if any molecules in the collection are re-imaged, this fails. That's why I had to impose virtual bonds.

Also, imagine if someone asked for the RMSD of their entire system, including water, to a reference. If that forced us to treat the whole system as a single molecule, the simulation would slow to a crawl since we couldn't reorder the waters.

That's a different problem: That would be the least-squares RMSD over all possible molecular permutations, translations, and rotations. While there are some algorithms for dealing with this case efficiently, here, we're just considering all possible translations and rotations of the collection of atoms.

For a single molecule, it doesn't matter which periodic copy you choose. The translation built into the RMSD calculation will lead to the same result regardless. For the same reason, when you have multiple molecules, it doesn't matter which periodic copy you pick for the first one. That just sets an overall translation. But it does matter which copies you pick for all molecules after the first one. Let's define the correct result to be the minimum RMSD over all possible combinations of all possible periodic copies of the remaining molecules.

What I'm not quite understanding is why the molecules are being imaged into different boxes in the first place. How and when is this happening? I didn't think this should be happening at all unless we retrieve the state and ask for re-imaging. RMSD shouldn't be imaging either---it should just use the positions, or at least provide the user with the option to not attempt any (especially erroneous) imaging. That would be a simple solution: If no imaging is done during the simulation, simply not imaging when RMSD is computed should provide the correct result in this particular case. (Other cases where no restraint is applied and the ligand might unbind and rebind after passing through a unit cell length would not behave correctly, however.)

Let's put aside the question of how to calculate it for the moment. Does that seem like an appropriate definition of what we want to calculate?

The least-squares RMSD feature should provide the smallest RMSD over all possible translations and rotations of the collections of atoms. If some of those translations would cause re-imaging of the collection of molecules in a way to reduce the RMSD, the minimum over all those translations is the correct least-squares RMSD.

peastman · 2022-02-16T16:49:30Z

What I'm not quite understanding is why the molecules are being imaged into different boxes in the first place. How and when is this happening?

The code that does it is in ComputeContext::reorderAtomsImpl(). Internally we always translate the molecules into a single periodic box. We also keep track of how many box widths each one has been translated by, so we can put them back on output. This is essential to maintain accuracy, especially in single precision. The larger an atom coordinate is, the less precisely we can specify it. So we need to keep them small.

jchodera · 2022-02-17T01:01:11Z

The simple solution in this case (when the RMSD is intended to be applied to the non-translated system) is to have an option to not image the molecules in the RMSD computation. Practically, for numerical stability, this could * Compute the mean transformation vector stored for each atom * Apply the stored transformation vector for each atom minus this mean vector That would result in all atoms being restored to their original positions, but with the mean translation vector subtracted to avoid precision issues. For the more general case, something more sophisticated would be needed, but given that applying am RMSD restraint is a major use case, this is probably sufficient to recover the correct behavior.

peastman added the enhancement label Nov 7, 2020

jchodera mentioned this issue Nov 7, 2020

Add option to impose RMSD restraint on core heavy atoms and nearby protein CA atoms choderalab/perses#748

Merged

peastman mentioned this issue Jul 28, 2021

Planning for 7.7/8.0 #3191

Closed

peastman added this to the 7.7 milestone Aug 9, 2021

peastman modified the milestones: 7.7, 7.8 Nov 5, 2021

jchodera added the bug label Feb 14, 2022

jchodera mentioned this issue Mar 2, 2022

Barostat does not reset cell offsets #3490

Merged

jchodera mentioned this issue Mar 17, 2022

Add support for receptor dimers choderalab/openmmtools#544

Closed

5 tasks

peastman removed this from the 8.0 milestone Oct 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Does CustomRMSDForce prevent atoms within the specified atom group from being periodically imaged? #2913

Does CustomRMSDForce prevent atoms within the specified atom group from being periodically imaged? #2913

jchodera commented Nov 7, 2020

jchodera commented Nov 7, 2020

jchodera commented Nov 7, 2020

jchodera commented Nov 7, 2020

peastman commented Nov 7, 2020

peastman commented Oct 13, 2021

peastman commented Oct 14, 2021

peastman commented Feb 10, 2022

jchodera commented Feb 14, 2022 •

edited

peastman commented Feb 15, 2022

peastman commented Feb 15, 2022

jchodera commented Feb 16, 2022

peastman commented Feb 16, 2022

jchodera commented Feb 17, 2022 via email

Does CustomRMSDForce prevent atoms within the specified atom group from being periodically imaged? #2913

Does CustomRMSDForce prevent atoms within the specified atom group from being periodically imaged? #2913

Comments

jchodera commented Nov 7, 2020

jchodera commented Nov 7, 2020

jchodera commented Nov 7, 2020

jchodera commented Nov 7, 2020

peastman commented Nov 7, 2020

peastman commented Oct 13, 2021

peastman commented Oct 14, 2021

peastman commented Feb 10, 2022

jchodera commented Feb 14, 2022 • edited

peastman commented Feb 15, 2022

peastman commented Feb 15, 2022

jchodera commented Feb 16, 2022

peastman commented Feb 16, 2022

jchodera commented Feb 17, 2022 via email

jchodera commented Feb 14, 2022 •

edited