Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checks for correct topology parameterization #155

Merged
merged 33 commits into from
May 16, 2018

Conversation

summeraz
Copy link
Contributor

@summeraz summeraz commented Jan 9, 2018

Currently if parameters are not found for certain topology objects (angles, proper dihedrals, and improper dihedrals) Foyer will continue through without notifying the user. This can lead to serious problems as the incorrect parameterization may not be caught until after production runs have been performed on systems parameterized by Foyer. In many cases, parameters are not found due to typos in the user's force field XML file, although there are cases (e.g. coarse-grained systems) where dihedral forces may not be desired for atoms separated by three bonds. I've implemented a fix here that provides either a warning or error to the user if the systems is found to be incorrectly parameterized. By default, an error is raised if angles and proper dihedrals are not parameterized correctly and a warning is raised if impropers are not parameterized correctly. This behavior can be overridden by the user.

`Forcefield.apply` will now check to make sure parameters are found
for all angles, proper dihedrals, and improper dihedrals. By
default, if parameters are not found for all angles and proper
dihedrals, the script will exit with an error. A warning will be
raised if parameters are not found for all impropers. However, the
behavior (error/warning) for each of these topology objects can be
toggled by the user.

Unit tests have been added to test this behavior.
Copy link
Member

@ctk3b ctk3b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks! Just a few code simplification suggestions.

Also just to double check, missing a bond will already yield an error?

"system impropers: {}, Parameterized impropers: {}"
"".format(len(data.impropers), len(structure.impropers)))
if assert_improper_params:
raise Exception(msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you factor these 3 checks out into a consistent "error or warn" function? Lots of repeated code here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, should help to condense this

@@ -589,7 +633,7 @@ def createSystem(self, topology, atomtype=True, use_residue_map=True,
# Execute scripts found in the XML files.
for script in self._scripts:
exec(script, locals())
return sys
return sys, data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is there any other way to reach this data? Perhaps setting it as an attribute? This function is called createSystem so it's somewhat unintuitive that it would return other things as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something did feel a little off when I took this approach... data is only available from within createSystem, but yeah I could just throw a line like self._system_data = data or something like that before the return.

with pytest.raises(Exception):
ethane = oplsaa_dihedral_typo.apply(ethane)
with pytest.warns(UserWarning):
ethane = oplsaa_dihedral_typo.apply(ethane, assert_dihedral_params=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Could you factor out this test as well and just @parameterize the function with the forcefield file name and the assert flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that should make the tests a bit cleaner

@summeraz
Copy link
Contributor Author

Will work on getting those changes made. And yes, unparameterized/missing bonds already yield an error, although the message itself is not particularly helpful (something about an empty array or similar, so not mentioning the missing bonds explicitly), but occurs in OpenMM-land and would be difficult for us to catch since that's where the parameterization happens.

@summeraz
Copy link
Contributor Author

Also, it looks like there are several molecules in the TraPPE and OPLS tests where parameters must not be defined for all relevant angles or dihedrals as some of these tests are failing after hitting the checks I've put in. So this is something we'll need to look into...

@mattwthompson
Copy link
Member

mattwthompson commented Jan 10, 2018

This is exactly the problem that I encountered when using foyer with an xml I built from the ground up: some dihedrals, including at least one crucial dihedral, weren't getting assigned due to nothing more than my typos. Nothing in my workflow was flagging it and it went unnoticed for several months and many, many GFlops of simulations.

@@ -337,32 +353,24 @@ def apply(self, topology, references_file=None, assert_angle_params=True,
have parameters assigned. OpenMM will generate an error if bond parameters
are not assigned.
'''
data = self._SystemData
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

data.atoms = list(topology.atoms())
for atom in data.atoms:
data.excludeAtomWith.append([])
self._SystemData.atoms = list(topology.atoms())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you limit this to just adding a single line here which sets data = self._SystemData() in place of the existing data = app.ForceField._SystemData()?

@summeraz
Copy link
Contributor Author

summeraz commented Mar 5, 2018

I need to look into the ParmEd source a bit deeper, but it looks like even when improper dihedrals are present Structure.impropers is not always populated. For now I am counting the total number of impropers by taking the sum of the lengths of Structure.impropers and [dihedral for dihedral in structure.dihedrals if dihedral.improper]. However, I'm not sure if a scenario could exist where this would result in double counting.

@summeraz
Copy link
Contributor Author

summeraz commented Mar 5, 2018

Discovered a problem with saving topological data from createSystem to the newly added _SystemData object. If a forcefield is used to atomtype multiple systems the number of topology object (e.g. bonds, angles, etc.) is additive rather than unique to each system. My latest commit solves this, but seems pretty hacky, so I'm guessing we can find a better solution.

@summeraz
Copy link
Contributor Author

summeraz commented Mar 6, 2018

It turns out that several of atomtypes we have SMARTS strings defined for in the OPLS force field XML did not have the class attribute assigned properly (specifically atomtypes for alkynes, pyridine, pyrimidine, and acetals). I've added the correct classes and this has reduced the number of molecule tests where the number of parametrized angles/dihedrals does not match expectations. However, there are still 12 molecules that have atomtypes assigned properly, but do not have parameters assigned for all angles/dihedrals. Specifically:

  • 1-cyclopropylpropane (missing an angle parameter)
  • 1-nitropropane (missing a dihedral parameter)
  • 2-propylpyrrole (missing a dihedral parameter)
  • 24-pentanedione (missing an angle parameter)
  • E-hex-2-ene (missing a dihedral parameter)
  • dimethyl-sulfoxide (missing all dihedral parameters)
  • ethylene-carbonate (missing an angle parameter)
  • formaldehyde (missing an angle parameter)
  • paraldehyde (missing angle parameters)
  • pentanenitrile (missing a dihedral parameter)
  • propylene-carbonate (missing an angle parameter)
  • styrene (missing dihedral parameters)

Many of these molecules are functionalized alkanes, where parameters are missing for an angle or dihedral that involves the α, β, and γ carbons. In these cases the parameters simply do not exist in the OPLS forcefield shipped with GROMACS (which we converted to this forcefield XML). Many of these missing dihedrals are C-CT-CT-CT dihedrals where the CT-CT-CT-CT dihedral parameters would probably be sufficient (as OPLS uses in many other cases), but we would leave it up to the user to make this decision and add these parameters themselves, keeping the XML shipped with Foyer the same as what is contained in GROMACS. I'm not sure there's really much else we can do. @justinGilmer and myself are working on a website for Foyer that will contain more comprehensive documentation than is available on Github, and we plan to fully document these limitations.

@summeraz
Copy link
Contributor Author

summeraz commented Mar 6, 2018

I've removed secondary alcohols from our list of molecules successfully parameterized using our TraPPE forcefield XML, as these feature a dihedral functional form that we do not currently support.

@ctk3b
Copy link
Member

ctk3b commented Mar 7, 2018

I need to look into the ParmEd source a bit deeper, but it looks like even when improper dihedrals are present Structure.impropers is not always populated. For now I am counting the total number of impropers by taking the sum of the lengths of Structure.impropers and [dihedral for dihedral in structure.dihedrals if dihedral.improper]. However, I'm not sure if a scenario could exist where this would result in double counting.

This sounds like it's worth understanding properly. Is it a specific type of improper that causes the problem? I can help debug if needed

with pytest.raises(Exception):
ethane = oplsaa_with_typo.apply(ethane)
with pytest.warns(UserWarning):
ethane = oplsaa_with_typo.apply(ethane, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@summeraz
Copy link
Contributor Author

summeraz commented Mar 7, 2018

Here's a ZIP file with a minimal example that reproduces the strange storage behavior for impropers which includes a dummy forcefield file, a PDB file for an ethene molecule (which should have two impropers), and a script that takes the PDB and atomtypes it. Here's the atomtyping script:

from foyer import Forcefield
import parmed as pmd 

ff = Forcefield(forcefield_files='dummy-ff.xml')
structure = pmd.load_file('ethene.mol2', structure=True)
atomtyped_structure = ff.apply(structure)

And here's output from inspection of atomtyped_structure:

(Pdb) len(atomtyped_structure.impropers)
0
(Pdb) len([dihedral for dihedral in atomtyped_structure.dihedrals if dihedral.improper])
2

So the two impropers are correctly assigned, but they are stored within Structure.dihedrals (where the improper attribute is set to True) rather than being stored within Structure.impropers. I'll look through the ParmEd source to try to figure out why this happens. Perhaps Structure.impropers is some sort of legacy attribute that is no longer used.

@summeraz
Copy link
Contributor Author

summeraz commented Mar 8, 2018

Okay so here is what I've found from my digging so far:

The relevant function in the ParmEd source is load_topology, where an OpenMM System and Topology are used to create a ParmEd Structure. The relevant OpenMM Forces appear to be PeriodicTorsionForce, RBTorsionForce, and CustomTorsionForce. Per the OpenMM documentation, child definitions for each of these forces can be of either "proper" or "improper" type. However, in ParmEd's load_topology each of these three torsion forces is handled differently:

What I gather from this is that there is no way from just the data in a ParmEd Structure to know the number of proper and improper torsions in a system because if using a Ryckaert-Bellemans functional form these are all lumped together into a single attribute (Structure.rb_torsions). However, I still think it is good for us to have some sort of check on our end, so I think the best solution is to keep the approach that I've currently taken with the caveat that counts will be incorrect if improper torsions are defined within the RBTorsionForce section of your XML file. As long as this is well-documented, I think this is a better solution than not having any sanity checks at all as the user still has the ability to override these assertions.

@ctk3b
Copy link
Member

ctk3b commented Mar 8, 2018

Does parmed hash dihedral types based on their contents? That would be an easy way to avoid double counting.

More generally, I can't think of a scenario where a forcefield would define the exact same dihedral in two different ways. I would expect that to be an incorrectly constructed FF file. Ideally we would of course not allow this to silently slip by so a double counting check would be helpful but perhaps not absolutely necessary.

@summeraz
Copy link
Contributor Author

summeraz commented Mar 8, 2018

Right, I no longer think double counting will be a problem. Before looking into the source I was thinking perhaps it was possible for an improper dihedral to exist within both Structure.impropers and Structure.dihedrals, but this does not appear to be possible. So the only remaining issue is that it is ambiguous concerning whether each of the items in Structure.rb_torsions is a proper or improper dihedral. To which I think the best solution is to assume all of these are proper dihedrals (making sure we have this well-documented). If a user does have improper dihedrals within the RBTorsionForce portion of their XML, the sanity check for the number of parameterized proper dihedrals will cause apply to exit with an error. The user can then manually override this behavior by passing assert_dihedral_params=False to apply.

@ctk3b
Copy link
Member

ctk3b commented Mar 8, 2018

I think the best solution is to assume all of these are proper dihedrals

👍

While it's perfectly possible, I don't think I've seen a forcefield implement impropers using this functional form.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants