Potential improvements to modeller.addSolvent()? #3687

zhang-ivy · 2022-07-07T21:01:08Z

@jchodera and I were going through the new solvation code (introduced by: #3480 and #3537) and we have 2 questions:

Question 1: The center of the box is computed using:

positions = self.positions.value_in_unit(nanometer)
minRange = Vec3(*(min((pos[i] for pos in positions)) for i in range(3)))
maxRange = Vec3(*(max((pos[i] for pos in positions)) for i in range(3)))
center = 0.5*(minRange+maxRange)

However, if the starting solute is not perfectly spherical, the center may be different depending on how it's rotated. Is this the right way to compute the center that always yields the minimal solvent box? Or should the solute be aligned to some axis first to ensure we add the minimal amount of solvent?

Question 2: I am currently working with a protein:protein system that is very long in one dimension (i.e. not spherical).

I am running free energy calculations (5-50 ns per replica) on them, so I am not super worried about solute tumbling. Therefore, the current solvation code adds a lot of unecessary solvent. I would like to be able to use a minimal solvent box that ensures there is enough clearance on all sides in the solvent box. An easy way to do this would be to compute the principal axes of the solute and use that to determine the the right amount of padding to each side.

Could we add a different mode to addSolvent() to better handle the aforementioned non-spherical solute case?

Attached is a PDB of my protein:protein system before solvating and after solvating.
Here are my solvation parameters:

water_model = 'tip3p'
padding = 0.9 * unit.nanometers
box_shape = 'octahedron'
ionic_strength = 0.05 * unit.molar

files_for_issue.zip

cc: @peastman

The text was updated successfully, but these errors were encountered:

peastman · 2022-07-07T21:57:00Z

Good question! Finding minimal bounding spheres is a complicated subject. There are algorithms to solve it exactly, but they're both complicated and slow. A variety of approximate algorithms have been developed. The most popular one is the Ritter algorithm, which is fast and not too complicated. But I decided to use this method based on Table 2 of https://ep.liu.se/ecp/034/009/ecp083409.pdf. They tried a bunch of methods and found that the trivial method (referred to as "AABB center" in that table) on average is about as good as the Ritter algorithm, while being even simpler. (They develop a new algorithm which is more accurate while still being reasonably fast. However, it's quite complicated to implement.)
If your solute is very irregularly shaped, and the simulation is short enough that you're not worried about rotation, then padding probably isn't the right argument to use. In that case, you probably want to find the size of the axis aligned bounding box (maxRange-minRange in the code quoted above), add whatever padding you want to it, and pass it as boxSize. But monitor the rotation over the simulation and make sure it really isn't coming too close to itself. If your assumption turns out to be wrong, it could cause serious artifacts.

jchodera · 2022-07-12T13:39:45Z

The only two models that were not visually aligned with the axes were the tetrahedron and the teapot, and AABB appeared to perform among the worst methods in these cases. They did not appear to use randomization. Doesn't it seem at least reasonable to use the principal axes instead of the coordinate axes?
Here, it would be useful to align the solute principal axes and use a minimally solvating bounding rectangular solid. It would be extremely useful if we could provide an optional mode of operation to do this for users.

zhang-ivy · 2022-08-02T15:11:07Z

In that case, you probably want to find the size of the axis aligned bounding box (maxRange-minRange in the code quoted above)

@peastman : It's not obvious to me how to find the axis aligned bounding box if the protein isn't already aligned to the cartesian axes.

If the protein was already aligned like this:

then I could just use maxRange-minRange as you suggested.

However, if my protein is oriented like this:

I would need to first align the protein to the cartesian axes (to look like the first image) before computing maxRange-minRange -- is there an easy way to do this?

peastman · 2022-08-02T16:36:30Z

The axis aligned bounding box is always aligned with the axes. However your protein is oriented, addSolvent() is going to build an axis aligned solvent box around it. Normally you would want that box to be big enough to make sure the protein will never contact itself no matter how it rotates. But if you're running a very short simulation and you're sure it won't rotate much, you might be satisfied with knowing it can't contact itself in its initial orientation. Whatever way your protein is aligned, you want to calculate its extent in that orientation. There might be a different orientation that would have a smaller extent, but that's not how the protein is aligned. You need a solvent box that's big enough in its current alignment.

Of course, you could try to first rotate the protein before building the solvent box. That would allow for a smaller solvent box (assuming you are really really certain it isn't going to rotate). John suggested calculating the principle axes, which is a quick way of doing it that would probably work reasonably well. A more robust approach would be to run an optimization (with scipy.optimize for example), where you optimize a rotation matrix to minimize the volume of the axis aligned bounding box.

zhang-ivy · 2022-08-02T17:57:49Z

Ah sorry, I think I was misusing some of the terminology since I'm new to thinking about solvent boxes.

Of course, you could try to first rotate the protein before building the solvent box.

This is what I'm trying to do -- I want to rotate the protein before building the solvent box so that the solvent box will be smaller than if I hadn't rotated it.

John suggested calculating the principle axes, which is a quick way of doing it that would probably work reasonably well.

It's not obvious to me how to implement this -- is this something you could help implement as a feature for openmm?

peastman · 2022-08-02T18:30:32Z

I'm not sure it would make sense as a feature in OpenMM. Because most of the time, it's something people really shouldn't do. If the solute rotates during the simulation, it will lead to getting wrong results. (And I'm still not convinced you should be doing it. Are you really certain your protein won't rotate? 50 ns is a long time.)

You should be able to find instructions online for computing principle axes. It's pretty easy: just form a matrix, diagonalize it, and the eigenvectors become the columns of your rotation matrix.

zhang-ivy · 2022-08-02T19:32:53Z

I just realized the PDB file I had been using was the output after alignment to a variant of the same protein. When I look at the PDB file without any alignment, the protein doesn't need any rotation -- its principle axes are already oriented in the same directions as the cartesian axes.

I am going to do what @peastman suggested above:

you probably want to find the size of the axis aligned bounding box 
(maxRange-minRange in the code quoted above), add
whatever padding you want to it, and pass it as boxSize.
But monitor the rotation over the simulation and make sure it really isn't 
coming too close to itself. If your assumption turns out to be wrong, it could cause serious artifacts.

with the aformentioned PDB file -- if it doesn't seem to be rotating during the simulation, maybe this will help motivate the need for a rotation feature in openmm.

jchodera · 2022-08-02T20:59:38Z

Regarding the point about orientational correlation times: this paper has a great plot:

From experimental measurements, this should give an idea of how much reorientation to expect for very large systems.
We can also apply a weak restraint to prevent rotation in cases where we simulate with the protein principle axes aligned to the coordinate axes.

peastman · 2022-10-25T21:29:12Z

Closing this since I believe the question has been answered. Feel free to reopen if I missed something.

peastman closed this as completed Oct 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential improvements to modeller.addSolvent()? #3687

Potential improvements to modeller.addSolvent()? #3687

zhang-ivy commented Jul 7, 2022

peastman commented Jul 7, 2022

jchodera commented Jul 12, 2022

zhang-ivy commented Aug 2, 2022 •

edited

peastman commented Aug 2, 2022

zhang-ivy commented Aug 2, 2022

peastman commented Aug 2, 2022

zhang-ivy commented Aug 2, 2022

jchodera commented Aug 2, 2022

peastman commented Oct 25, 2022

Potential improvements to modeller.addSolvent()? #3687

Potential improvements to modeller.addSolvent()? #3687

Comments

zhang-ivy commented Jul 7, 2022

peastman commented Jul 7, 2022

jchodera commented Jul 12, 2022

zhang-ivy commented Aug 2, 2022 • edited

peastman commented Aug 2, 2022

zhang-ivy commented Aug 2, 2022

peastman commented Aug 2, 2022

zhang-ivy commented Aug 2, 2022

jchodera commented Aug 2, 2022

peastman commented Oct 25, 2022

zhang-ivy commented Aug 2, 2022 •

edited