ForceField allows multiple parameters to have the same SMIRKS patterns #1363
Comments
This seems at least undesired to me, and is probably a bug. I started to fix this in the ParameterList class, but realized that VirtualSites are one case where identical SMIRKS should be allowed, and so fixing this may be just a hair more complex than I thought. I'll hold off on this until some time in the future when I have more bandwidth. |
I think overlapping SMIRKS is a dangerous footgun and one that's not going to go away as long as we explicitly support loading multiple force fields in and TIP3P parameters are in Sage. I'd be happy with at least either:
|
The whole concept of SMIRNOFF as a hierarchical format is that the "last match wins" rule means it is totally fine if we have a tree branch that is entirely overridden by a leaf with the same SMIRKS pattern. This is even desirable if we want to manually test an override or shoehorn in other parameter sets at the end, as you note. It may not be clean, but it feels like checking for duplicates is more of a "linting" process than something that should cause an error. |
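To make the "last match wins" rule concrete, here is a minimal toy sketch. It is not the toolkit's implementation: `ToyParameter`, `assign`, and the predicate functions standing in for SMIRKS matching are all hypothetical names invented for illustration. It only shows that a later entry with the same SMIRKS overrides an earlier one when assignment walks the list in order.

```python
from typing import Callable, NamedTuple, Optional


class ToyParameter(NamedTuple):
    smirks: str                      # label only; the string is not parsed here
    matches: Callable[[dict], bool]  # hypothetical stand-in for a substructure search
    epsilon: float


def assign(atom: dict, parameters: list) -> Optional[ToyParameter]:
    """Return the last parameter in the list whose pattern matches the atom."""
    assigned = None
    for parameter in parameters:  # later entries override earlier ones
        if parameter.matches(atom):
            assigned = parameter
    return assigned


parameters = [
    ToyParameter("[*:1]", lambda atom: True, 0.1),                   # generic root
    ToyParameter("[#8:1]", lambda atom: atom["element"] == 8, 0.2),  # any oxygen
    ToyParameter("[#8:1]", lambda atom: atom["element"] == 8, 0.3),  # override, identical SMIRKS
]

print(assign({"element": 8}, parameters).epsilon)  # 0.3
print(assign({"element": 6}, parameters).epsilon)  # 0.1
```

Under this reading, a duplicate SMIRKS is just a leaf that fully shadows an earlier branch, which is why flagging duplicates looks more like linting than an error.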
It would be useful if we could reproduce |
We ran into this again, a case in which a user sensibly tried to create different parameters with identical SMIRKS, trying to apply "up" and "down" virtual sites from the same triplet of orientation atoms. The toolkit, as we might expect, only applied one. Here's another reproduction I wrote up before discovering that I had already been burned by, and reported, this behavior:
In [1]: from openff.toolkit import *
In [2]: !cat duplicates.offxml
<?xml version="1.0" encoding="utf-8"?>
<SMIRNOFF version="0.3" aromaticity_model="OEAroModel_MDL">
<vdW version="0.3" potential="Lennard-Jones-12-6" combining_rules="Lorentz-Berthelot" scale12="0.0" scale13="0.0" scale14="0.5" scale15="1.0" cutoff="9.0 * angstrom ** 1" switch_width="1.0 * angstrom ** 1" method="cutoff">
<Atom smirks="[#11:1]" epsilon="0.111 * kilocalorie_per_mole ** 1" sigma="3.0 * angstrom ** 1"></Atom>
<Atom smirks="[#11:1]" epsilon="0.222 * kilocalorie_per_mole ** 1" sigma="4.0 * angstrom ** 1"></Atom>
<Atom smirks="[#17:1]" epsilon="0.333* kilocalorie_per_mole ** 1" sigma="3.1 * angstrom ** 1"></Atom>
<Atom smirks="[#17:1]" epsilon="0.444* kilocalorie_per_mole ** 1" sigma="1.1 * angstrom ** 1"></Atom>
</vdW>
<Electrostatics version="0.4" scale12="0.0" scale13="0.0" scale14="0.8333333333" scale15="1.0" cutoff="9.0 * angstrom ** 1" switch_width="0.0 * angstrom ** 1" periodic_potential="Ewald3D-ConductingBoundary" nonperiodic_potential="Coulomb" exception_potential="Coulomb"></Electrostatics>
<LibraryCharges version="0.3">
<LibraryCharge smirks="[#11:1]" charge1="1.0 * elementary_charge ** 1"></LibraryCharge>
<LibraryCharge smirks="[#17:1]" charge1="-1.0 * elementary_charge ** 1"></LibraryCharge>
</LibraryCharges>
</SMIRNOFF>
In [3]: ForceField("duplicates.offxml").create_interchange(
...: Molecule.from_smiles("[#11+1].[#17-1]").to_topology()
...: )["vdW"].potentials
Out[3]:
{PotentialKey associated with handler 'vdW' with id '[#11:1]': Potential(parameters={'sigma': <Quantity(3.0, 'angstrom')>, 'epsilon': <Quantity(0.111, 'kilocalorie_per_mole')>}, map_key=None),
PotentialKey associated with handler 'vdW' with id '[#17:1]': Potential(parameters={'sigma': <Quantity(3.1, 'angstrom')>, 'epsilon': <Quantity(0.333, 'kilocalorie_per_mole')>}, map_key=None)} |
When constructing a
? Looking at this commit, in particular, the line parameter = parameter_handler.get_parameter({"smirks": smirks})[0] it looks as if Interchange follows "first match wins" rather than "last match wins". |
That comment refers to how parameters are loaded up into the
$ diff tip3p.offxml tip3p_modified.offxml
4,5c4,5
< <Atom smirks="[#1]-[#8X2H2+0:1]-[#1]" epsilon="0.1521 * kilocalorie_per_mole ** 1" id="n-tip3p-O" sigma="3.1507 * angstrom ** 1"></Atom>
< <Atom smirks="[#1:1]-[#8X2H2+0]-[#1]" epsilon="0.0 * kilocalorie_per_mole ** 1" id="n-tip3p-H" sigma="1.0 * nanometer ** 1"></Atom>
---
> <Atom smirks="[#1]-[#8X2H2+0:1]-[#1]" epsilon="100.1521 * kilocalorie_per_mole ** 1" id="n-tip3p-O" sigma="3.1507 * angstrom ** 1"></Atom>
> <Atom smirks="[#1:1]-[#8X2H2+0]-[#1]" epsilon="100.0 * kilocalorie_per_mole ** 1" id="n-tip3p-H" sigma="1.0 * nanometer ** 1"></Atom>
The toolkit will gladly load them up side-by-side, breaking some of the assumptions in lookup logic:
In [1]: from openff.toolkit import Molecule, ForceField
In [2]: force_field = ForceField("tip3p.offxml", "tip3p_modified.offxml")
In [3]: force_field['vdW'].parameters
Out[3]:
[<vdWType with smirks: [#1]-[#8X2H2+0:1]-[#1] epsilon: 0.1521 kilocalorie_per_mole id: n-tip3p-O sigma: 3.1507 angstrom >,
<vdWType with smirks: [#1:1]-[#8X2H2+0]-[#1] epsilon: 0.0 kilocalorie_per_mole id: n-tip3p-H sigma: 1.0 nanometer >,
<vdWType with smirks: [#1]-[#8X2H2+0:1]-[#1] epsilon: 100.1521 kilocalorie_per_mole id: n-tip3p-O sigma: 3.1507 angstrom >,
<vdWType with smirks: [#1:1]-[#8X2H2+0]-[#1] epsilon: 100.0 kilocalorie_per_mole id: n-tip3p-H sigma: 1.0 nanometer >]
The way Interchange looks up parameters now is wired through
But all of that is only when multiple SMIRKS patterns are identical. In the common case of different files having different SMIRKS patterns, they're squished together and behave in the way the spec describes. For example, if I modify the SMIRKS in a way that will still match the oxygen in water but give it bogus values
$ diff tip3p.offxml tip3p_modified_smirks.offxml
4c4
< <Atom smirks="[#1]-[#8X2H2+0:1]-[#1]" epsilon="0.1521 * kilocalorie_per_mole ** 1" id="n-tip3p-O" sigma="3.1507 * angstrom ** 1"></Atom>
---
> <Atom smirks="[#8:1]" epsilon="-5* kilocalorie_per_mole ** 1" id="foo" sigma="-5 * angstrom ** 1"></Atom>
and feed it in as the last force field, those bogus values are what get carried through
In [1]: from openff.toolkit import Molecule, ForceField
In [2]: force_field = ForceField("tip3p.offxml", "tip3p_modified_smirks.offxml")
In [3]: force_field['vdW'].parameters
Out[3]:
[<vdWType with smirks: [#1]-[#8X2H2+0:1]-[#1] epsilon: 0.1521 kilocalorie_per_mole id: n-tip3p-O sigma: 3.1507 angstrom >,
<vdWType with smirks: [#1:1]-[#8X2H2+0]-[#1] epsilon: 0.0 kilocalorie_per_mole id: n-tip3p-H sigma: 1.0 nanometer >,
<vdWType with smirks: [#8:1] epsilon: -5 kilocalorie_per_mole id: foo sigma: -5 angstrom >,
<vdWType with smirks: [#1:1]-[#8X2H2+0]-[#1] epsilon: 0.0 kilocalorie_per_mole id: n-tip3p-H sigma: 1.0 nanometer >]
In [4]: force_field.create_interchange(Molecule.from_smiles("O").to_topology())['vdW'].get_system_parameters(
...: )
Out[4]:
array([[-5., -5.],
[ 1., 0.],
[ 1., 0.]])
To hammer the point home (that this issue only arises when SMIRKS patterns are identical), I can load the bogus file first and its values aren't propagated through:
In [5]: ForceField("tip3p_modified_smirks.offxml", "tip3p.offxml").create_interchange(Molecule.from_smiles("O
...: ").to_topology())['vdW'].get_system_parameters()
Out[5]:
array([[3.1507, 0.1521],
[1. , 0. ],
[1. , 0. ]])
I still think this behavior could be improved, but I don't think it's a systematic issue with our shipped force fields. If folks are squishing together force fields that must use identical SMIRKS patterns, then maybe this is something we need to prioritize more highly. |
Thanks for the detailed response, @mattwthompson. If I understand correctly, the comment about precedence only applies to parameters with the same tag name and unique SMIRKS patterns. When parameters within the same handler have the same SMIRKS pattern, it appears that pre- and post-Interchange versions of the toolkit break ties differently: Interchange breaks ties by choosing the first. As you pointed out, this happens in the To me, the old tie breaking behavior feels like a more natural extension of the comment about precedence. The current behavior seems to require a caveat, something like
|
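The two tie-breaking rules discussed above can be sketched with plain Python data. This is only an illustration: `pick_first` and `pick_last` are hypothetical helpers, not toolkit or Interchange functions, and the dictionaries merely mirror the duplicated vdW entries from the `duplicates.offxml` reproduction earlier in the thread.

```python
# Toy data mirroring the duplicated vdW entries in duplicates.offxml.
parameters = [
    {"smirks": "[#11:1]", "epsilon": 0.111},
    {"smirks": "[#11:1]", "epsilon": 0.222},
    {"smirks": "[#17:1]", "epsilon": 0.333},
    {"smirks": "[#17:1]", "epsilon": 0.444},
]


def pick_first(parameters, smirks):
    """Earliest entry with this SMIRKS wins (the behavior reported for toolkit >=0.11)."""
    return next(p for p in parameters if p["smirks"] == smirks)


def pick_last(parameters, smirks):
    """Latest entry with this SMIRKS wins (the behavior reported for toolkit <0.11)."""
    return next(p for p in reversed(parameters) if p["smirks"] == smirks)


print(pick_first(parameters, "[#11:1]")["epsilon"])  # 0.111
print(pick_last(parameters, "[#11:1]")["epsilon"])   # 0.222
```

The "pick first" values (0.111 and 0.333) are the ones that appear in the `create_interchange` output shown in that reproduction.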
I agree with your assessment of the problem - I can't say I've thought deeply about assignment with duplicate SMIRKS patterns prior to your comment, so thanks for catching this! Between "pick first" (current) and "pick last" (old) I also agree that the old behavior is conceptually more consistent with the SMIRNOFF ethos. I think I'd prefer a different solution, though, in which the toolkit errors out when it detects duplicate SMIRKS (in the same tag). I'm skeptical it's a well-designed state1 so, even though the patch to Interchange might be small, I'm not sure this is a fruitful path to go down. This is a weakly-held opinion, though, and I don't think it would take much (from you or others) for me to budge on this. Footnotes
|
I'm currently in a scenario where there is a possibility of having multiple force field files, with individuals having optimized or fine-tuned certain parameters for specific purposes. These files may contain identical SMIRKS patterns, and we'd like to adhere to the "last match wins" principle while also keeping the behavior consistent with the previous version (toolkit <0.11). One potential solution is to : |
I'm afraid I don't understand this proposal (I don't know what I think this is a case in which the SMIRNOFF specification could provide more guidance1. The behavior of the "old" toolkit APIs is an accident, and while it may ultimately be the preferred behavior, the correct behavior is whatever this page says. The process for updating it is described here, and we have mechanisms in place that enable us to turn around changes fairly quickly (weeks, not months) Footnotes
|
Thank you for your quick reply, @mattwthompson! To clarify, the |
The toolkit can process multiple force field files at once, and the order they're passed matters. For tags that are common between files, the contents of the last file passed should take precedence ("last match wins"). You can read more here:
|
Thank you for your reply. I understand that the contents of the last file passed should take precedence ("last match wins"). However, when the SMIRKS patterns are the same, the toolkit 0.14.4 will take the first parameter. I've read through the thread and understand that it may seem odd to have the same pattern. In our case, we have force field files: I would like to ask for your expert opinion on this to see if there are any potential downsides or concerns if we load force field files in this way. |
Following up on @bxie4's comment, the basic question is whether, as a workaround, one could get the toolkit >=0.11
sources = list(reversed(sources)) + sources
forcefield = ForceField(*sources)
The idea of listing the files backward and then forward is that it would make parameters with the same tag and SMIRKS pattern behave as in toolkit <0.11 ("pick last"), while preserving the correct precedence for parameters with different SMIRKS patterns ("last match wins"). This workaround might come with a small performance penalty, making this loop a bit slower. Alternatively, if the plan is for future releases of the toolkit to raise an error when parameters have the same tag and SMIRKS pattern, we could anticipate this on our end by doing something like
for tagname in forcefield.registered_parameter_handlers:
    smirks_patterns = set()
    for parameter in forcefield[tagname]:
        if parameter.smirks in smirks_patterns:
            raise ValueError(f"detected multiple {tagname!r} parameters with SMIRKS pattern {parameter.smirks!r}")
        smirks_patterns.add(parameter.smirks)
The important thing is that we have well-defined behavior when a user provides parameters with the same tag and SMIRKS pattern, whether that behavior is "pick last" or "fail". We'd like to avoid "pick first" because we find it unintuitive. @mattwthompson Which (if either) of the above approaches would you recommend? |
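The reasoning behind the backward-then-forward workaround can be checked on a toy model. The sketch below assumes the two-step behavior described in this thread: the matched SMIRKS is chosen by "last match wins", and the parameter is then looked up by SMIRKS with "pick first". The names `MATCHES` and `assign`, and the integer "atoms", are hypothetical stand-ins rather than toolkit APIs, so this says nothing about how a real `ForceField` behaves.

```python
# Hypothetical matcher table: SMIRKS label -> predicate on an atomic number.
MATCHES = {
    "[*:1]": lambda atomic_number: True,
    "[#8:1]": lambda atomic_number: atomic_number == 8,
}


def assign(sources, atomic_number):
    """Toy two-step assignment: last matching SMIRKS, then first entry with that SMIRKS."""
    combined = [entry for source in sources for entry in source]
    matched = None
    for smirks, _ in combined:  # step 1: last match wins
        if MATCHES[smirks](atomic_number):
            matched = smirks
    for smirks, value in combined:  # step 2: pick first among identical SMIRKS
        if smirks == matched:
            return value
    return None


file_a = [("[*:1]", 0.1), ("[#8:1]", 0.2)]
file_b = [("[#8:1]", 0.9)]  # same SMIRKS as in file_a, tuned value

plain = [file_a, file_b]
workaround = list(reversed(plain)) + plain  # [file_b, file_a, file_a, file_b]

print(assign(plain, 8))       # 0.2
print(assign(workaround, 8))  # 0.9
print(assign(plain, 6), assign(workaround, 6))  # 0.1 0.1
```

In this model the tail of the combined list is unchanged, so step 1 is unaffected, while the reversed prefix puts the last file's duplicates first for step 2. Duplicates within a single file would still resolve to the first entry.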
Agree completely; unfortunately this is a situation in which the behavior is poorly-defined, so as much as we try to avoid behavior changes in the toolkit, it's exceedingly difficult to ensure continuity when there is no clear correct behavior. Of those two options I'd certainly favor the latter if I was in the shoes of a downstream developer. I (as a matter of opinion) question whether or not multiple identical SMIRKS patterns is a valid state to handle in general, and without knowing much about your application I'd imagine it's unintentional or at least non-essential. The toolkit MIGHT1 have a change in the future that throws an error when it detects this, so I like the forwards-compatibility of that solution. If the first solution works as you describe, that's not necessarily bad, but you're right it might run into an error with a future version of the toolkit. It's worth mentioning that @j-wags is the Captain Picard of the toolkit, but he is out of the office for approximately the rest of the month. I'm handling his duties all the way up to, but not including, behavior changes. Sometime after he's back at work, he'll be able to give an authoritative path forward here. If I come across as constrained it's because of that context - I want to communicate the ways that things might change but they won't happen for at least a few weeks, likely more like early 2024. Footnotes
|
Thanks, @mattwthompson . I understand the constraints with Picard on shore leave. If we were to go with the latter approach, could we rely on the |
Yes - that warning is somewhat a blanket descriptor of the toolkit. Removing In addition to that, we'll ensure that any changes to this behavior at least don't land before 0.15.0 |
Sorry for the big delay on this. I think the right solution to the parameter lookup question is to flip the behavior here, so that parameter lookups prioritize the last match for a given SMIRKS instead of the first. So this would change the behavior of I think we should continue allowing multiple identical SMIRKS in the same parameter section. Loading an alternate water model on top of a flagship FF will be a pain otherwise, as will some use cases of virtual sites. Virtual sites may have yet more issues (maybe they should be looked up by a tuple of SMIRKS and name?) but those can be handled separately. I'll work on implementing this next week. |
Describe the bug
The toolkit's force field class allows multiple parameters within the same handler to have the same SMIRKS patterns. While it may seem like a strange use case, the inclusion of TIP3P in Sage makes this an issue anytime somebody wishes to use a different water model alongside it. I'd argue this state should not be reached and somewhere an exception should be raised when attempting to add a parameter with a smirks already found in the handler's list of parameters.
This also causes issues downstream when we assume this sort of 1:1 mapping, which I think we should be able to get away with.
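As a rough illustration of the exception being proposed, the sketch below uses a hypothetical `DuplicateRejectingList` that refuses to append a parameter whose SMIRKS is already present. It is not the toolkit's `ParameterList`, the dictionaries are stand-ins for parameter objects, and only `append` is guarded here.

```python
class DuplicateRejectingList(list):
    """Hypothetical list that rejects entries whose SMIRKS is already present.

    Only append() is guarded; extend() and insert() would bypass the check.
    """

    def append(self, parameter):
        if any(existing["smirks"] == parameter["smirks"] for existing in self):
            raise ValueError(
                f"a parameter with SMIRKS {parameter['smirks']!r} is already present"
            )
        super().append(parameter)


parameters = DuplicateRejectingList()
parameters.append({"smirks": "[#1]-[#8X2H2+0:1]-[#1]", "id": "n-tip3p-O"})
parameters.append({"smirks": "[#1:1]-[#8X2H2+0]-[#1]", "id": "n-tip3p-H"})

try:
    # Same SMIRKS as the first entry, e.g. from a second water model.
    parameters.append({"smirks": "[#1]-[#8X2H2+0:1]-[#1]", "id": "other-water-O"})
except ValueError as error:
    print(error)
```

A check like this enforces the 1:1 mapping between SMIRKS and parameters within a handler, at the cost of forbidding deliberate overrides.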
To Reproduce
Using the file at openff/toolkit/data/test_forcefields/tip3p.offxml:
Output
We ran into a case in which ForceField("openff-2.0.0.offxml", "tip4p.offxml") ultimately assigned TIP3P vdW parameters (from Sage) where I expected TIP4P vdW parameters to be assigned. This caused issues since other parameters (i.e. virtual sites) came from TIP4P and the result was an invalid mismatch of water models that unfortunately ran well enough to hide this error for weeks.
Computing environment (please complete the following information):
Python 3.9, macOS, most recent development head (0f1611f)