Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BondCharge virtual sites are not overriding correctly #1206

Closed
j-wags opened this issue Mar 4, 2022 · 11 comments · Fixed by #1252
Closed

BondCharge virtual sites are not overriding correctly #1206

j-wags opened this issue Mar 4, 2022 · 11 comments · Fixed by #1252

Comments

@j-wags
Copy link
Member

j-wags commented Mar 4, 2022

Describe the bug
SMIRNOFF hierarchy rules are somewhat complicated for virtualsites. A virtualsite should only override another virtualsite if they match the same atoms[1], and have the same name and type.

To Reproduce

This code makes two vsites with the same name and type, which should apply to the same atoms, but in a different order:

from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.topology import Molecule
from openff.units import unit
from openmm import unit as omm_unit

mol = Molecule.from_mapped_smiles('[Cl:1][Na:2]')
ff = ForceField()
vsh = ff.get_parameter_handler('VirtualSites', {"version":0.3})

# Define two vsites that apply in opposite directions to the same atoms of the test molecule. 
# The vsites may collide (since they have the same name), but they should be 
# distinguishable in the output based on their `distance` values
vsh.add_parameter({'type':'BondCharge', 
                      'smirks':'[Cl:1][Na:2]',
                      'match':'once',
                      'charge_increment':[0.100, -0.100] * unit.elementary_charge,
                      'distance': [0.1010] * unit.angstrom,
                      'name':'aaa'})
vsh.add_parameter({'type':'BondCharge', 
                      'smirks':'[Na:1][Cl:2]',
                      'match':'once',
                      'charge_increment':[0.100, -0.100] * unit.elementary_charge,
                      'distance': [0.220] * unit.angstrom,
                      'name':'aaa'})

sys = ff.create_openmm_system(mol.to_topology())

# For each vsite, print out which particles are involved, and at which distance.
for vsite_idx in range(sys.getNumParticles()):
    if not(sys.isVirtualSite(vsite_idx)):
        continue
    vsite = sys.getVirtualSite(vsite_idx)
    vsite_parents = []
    for vsite_parent_idx in range(vsite.getNumParticles()):
        vsite_parent = vsite.getParticle(vsite_parent_idx)
        vsite_parents.append(vsite_parent)
    distance = vsite.getLocalPosition()[0].value_in_unit(omm_unit.angstrom)
    print(vsite_parents, distance)

[0, 1] -0.101

The specific problems shown in this code are:

  1. The first vsite applies to atoms (0,1), and the second applies to atoms (1,0). Since the atoms are in a different order, I don't think that one should overwrite the other, and so I'd expect both vsites should be in the output[1]. Instead only a single vsite is in the output.
  2. If the second virtualsite (with distance 0.220) MIGHT overwrite the first (with distance 0.101), then I expect that the output should EITHER have one vsite with each distance, OR one vsite with 0.220. Instead we get just one vsite with 0.101.

[1] It's not specifically mentioned in the SMIRNOFF spec, but I assume this means "one vsite will overwrite another if they match the same atoms in the same order"

Computing environment (please complete the following information):

  • Mac OS
  • OFFTK topology biopolymer refactor branch
@trevorgokey
Copy link
Collaborator

trevorgokey commented Mar 4, 2022

This works as intended other than the issue you raised in point 2 (see paragraph 2). I didn't have the same assumption in [1] during the implementation. I didn't think at the time of implementation that tags infer uniqueness, just like in other things like bonds, angles, and the like. Overcoming this was solved with the use of names, so if you want two different parameters on the same set of atoms, you use different names. The result was one parameter because a bond charge with the same name was specified twice in the parameter list. It works just as a standard bond parameter in this way. I think you'll get in trouble if you start using tags to infer uniqueness for e.g. Trivalent, because [*:1][#7:2]([*:3])[*:4] would get unnecessarily complex. If we used the same name with the same distance, could we get the parameter applied 6 times, with all particles overlapping? The same thing would occur if we put another parameter somewhere else with [*:3][#7:2]([*:1])[*:4] (with the same name, same/different distance).

With all this said, assuming add_parameter appends and the hierarchy rules are last one wins, then yes you should have seen the second one applied (d=.220). I am not sure if anyone has worked on this, and I mostly focused on application from an XML, so this could definitely be my mistake. One thing that is coming to mind is that, since you gave it the same name, replace=False might be active somewhere, and you never actually replaced the first (d=.101) with the second (d=.220) in the molecule. I remember setting it to true in the XML side. I'd go take a look, but there was too much in the refactor branch for me to easily figure out where/if vsites have been changed. Was use_named_slots changed (https://github.com/openforcefield/openff-toolkit/blob/master/openff/toolkit/typing/engines/smirnoff/parameters.py#L5494) or replace not set somewhere (I defaulted it to False) (https://github.com/openforcefield/openff-toolkit/blob/master/openff/toolkit/typing/engines/smirnoff/parameters.py#L5328)? FWIW, in most of the logic after matching, it can't really "see" the tags after matching, so I considered uniqueness as set(indices), type, and name, and the permutations of indices only used to distinguish multi particle sites.

The vsites may collide (since they have the same name), but they should be
distinguishable in the output based on their distance values

Remember we took the stance that floats were never to be used to establish uniqueness :)

Hope this helps.

@mattwthompson
Copy link
Member

So we have agreement on what should happen. It might be harder to modify things in a way that realizes that.

@j-wags and I hacked on this for a while today. We successfully identified several places in the code that do not fix it and at least one place that might be the best place to attack. Below is an ugly diff tracking some places in the code we poked around in - right now we think the failure is a result of if same_atoms and same_vsite and diff_keys: evaluating to True inside of _reduce_virtual_particles_to_sites, and at the end of today we're at a loss as to how it could be patched to get the behavior we want. It might take a more extensive rewrite to handle these cross-interactions, I don't currently have the energy or familiarity to help past that.

diff --git a/openff/toolkit/typing/engines/smirnoff/parameters.py b/openff/toolkit/typing/engines/smirnoff/parameters.py
index e170bb89..98b33eda 100644
--- a/openff/toolkit/typing/engines/smirnoff/parameters.py
+++ b/openff/toolkit/typing/engines/smirnoff/parameters.py
@@ -5273,7 +5273,9 @@ class VirtualSiteHandler(_NonbondedHandler):
             # has the match setting, which ultimately decides which orientations
             # to include.
             if self.match == "once":
+                # print(f"{orientations=}")
                 key = self.transformed_dict_cls.key_transform(orientations[0])
+                # print(f"{key=}")
                 orientations = [key]
                 # else all matches wanted, so keep whatever was matched.
 
@@ -5349,6 +5351,7 @@ class VirtualSiteHandler(_NonbondedHandler):
             ref_key = self.transformed_dict_cls.key_transform(orientations[0])
             atoms = list([molecule.atoms[i] for i in ref_key])
             args = (atoms, orientations)
+            # _add_virtual_site will ONLY PROCESS the FIRST value in the list `orientations`.
             off_idx = super()._add_virtual_site(fn, *args, replace=replace)
             return off_idx
 
@@ -5799,11 +5802,25 @@ class VirtualSiteHandler(_NonbondedHandler):
                     )
                     diff_keys = key not in vsite_struct[KEY_LIST]
                     same_vsite = self._same_virtual_site_type(vs_i, vs_j)
+                    # print(f"before, {combined_orientations=}")
+                    # import ipdb; ipdb.set_trace()
 
+                    # This conditional should probably not evaluate to True while reproducing issue #1206. Each
+                    # individual thing seems reasonable enough but something about this doesn't capture the breaking
+                    # case, and it's not clear how all of the necessary information can be gathered and processed
+                    # right here in a way that fixes #1206 _and_ doesn't break other stuff.
                     if same_atoms and same_vsite and diff_keys:
-                        combined_orientations[i][KEY_LIST].append(key)
+                        # print(f"{combined_orientations=}")
+                        # combined_orientations[i][VSITE_TYPE].append(vs_i)
+                        # The last match in the atom_matches iterator takes precendence according to SMIRNOFF rules;
+                        # since only the first orientation is processed by `_add_virtual_site` later, one idea is to
+                        # simply override the other orientations with the one that' scurrently found. This might need
+                        # a match="once" conditional.
+                        combined_orientations[i][KEY_LIST] = [key]
+                        combined_orientations[i][VSITE_TYPE] = vs_i
                         found = True
 
+                    # print(f"after, {combined_orientations=}")
                     # Skip out early since there is no reason to keep
                     # searching since we will never add the same
                     # particle twice
@@ -5850,13 +5867,19 @@ class VirtualSiteHandler(_NonbondedHandler):
 
             top_mol = Topology.from_molecules([molecule])
             matches = self.find_matches(top_mol, expand_permutations=True)
-
+            # print("[k for k in matches.keys()]")
+            # print([k for k in matches.keys()])
+            # In reproducing issue #1206, this method call is where information about the " 'distance': [0.220]" virtual site
+            # type is lost.
             virtual_sites = self._reduce_virtual_particles_to_sites(matches)
+            # print("[v[1] for v in virtual_sites]")
+            # print([v[1] for v in virtual_sites])
 
             # Now handle the vsites for this molecule
             # This call batches the key tuples into a single list, in order
             # for the virtual site to represent multiple particles
             for vsite_type, orientations in virtual_sites:
+                # orientations = [orientations[0]]  # import ipdb; ipdb.set_trace()
                 vsite_type.add_virtual_site(molecule, orientations, replace=True)
 
     def _create_openmm_virtual_sites(self, system, force, topology, molecule):

@trevorgokey
Copy link
Collaborator

I'll head to starbucks and take a look tonight :) My initial reaction is that the issue is before this because this code shouldn't see the first site as only the second should have been applied. If anything, it seems the intention from your eyes would be same_vsite should evaluate to False, but evaluates to True because self._same_virtual_site_type(vs_i, vs_j) only checks name and type. This code highlights my previous statement where we can't see tags at this point, so from the function's perspective, type, name, and indices are all the same and it is essentially trying to combine the particles of two different parameters, which is undefined behavior.

@j-wags
Copy link
Member Author

j-wags commented Mar 5, 2022

Thanks for the feedback. I'd love your eyes on it, but you're not on the hook to fix this, @trevorgokey (and even if you were, you shouldn't work on a Friday night!) :-)

I'm just in the middle of rolling out a comprehensive vsite test suite, so I'll be opening issues to keep tabs on things like these. Hopefully they'll culminate in a big PR that closes several issues at once.

@trevorgokey
Copy link
Collaborator

I understand that the implementation is complex and contains lots of invisible comments that only I can see, so I am happy to to help.

@trevorgokey
Copy link
Collaborator

Just started this, and I'm glad I went through the trouble to get the biopolymer branch up. My first note is that:

  1. mol = Molecule.from_mapped_smiles('[Cl:1][Na:2]') -> [0, 1] -0.101
  2. mol = Molecule.from_mapped_smiles('[Cl:2][Na:1]') -> [0, 1] -0.22
  3. mol = Molecule.from_mapped_smiles('[Na:1][Cl:2]') -> [0, 1] -0.22
  4. mol = Molecule.from_mapped_smiles('[Na:2][Cl:1]') -> [0, 1] -0.101
  5. mol = Molecule.from_smiles('[Cl][Na]') -> [0, 1] -0.101
  6. mol = Molecule.from_smiles('[Na][Cl]') -> [0, 1] -0.22

So it appears to work whenever Na is the "first" atom. The assignments correspond to the order of the atoms in the parameter smirks, which seems to align with assumption [1]. The first parameter has Cl first, and so the rows above which match this results in d=-.101, and when Na is first we have d=-.22.

@trevorgokey
Copy link
Collaborator

trevorgokey commented Mar 5, 2022

Ok, this was my mistake for letting this razor sharp case fall through the cracks. I think I can at least offer a potential solution that solves the above case, and should work for others. The issue is that there was implicit assumption that a permutation of two keys e.g. 0,1 and 1,0 with the same name and the same type were the same virtual site. This is clearly a bad assumption since this issue exists. To get around this, I modified the equality operator to also check for smirks equivalence as the final tie breaker here which will separate distinguish the two parameters. With the code now able to distinguish 0,1 and 1,0, the only remaining task was to clear out any previous matches if there is a new match as a new parameter. This approach should also work for match="all_permutations". The only downside is that it now needs to revisit all current matches that have the same index permutation, as they could be a different parameter from new incoming one, possibly slowing down vsite assignment for large systems.

Here is my proposed fix:

diff --git a/openff/toolkit/typing/engines/smirnoff/parameters.py b/openff/toolkit/typing/engines/smirnoff/parameters.py
index 1dad135f..ddcc3d38 100644
--- a/openff/toolkit/typing/engines/smirnoff/parameters.py
+++ b/openff/toolkit/typing/engines/smirnoff/parameters.py
@@ -5228,7 +5228,7 @@ class VirtualSiteHandler(_NonbondedHandler):
         def __eq__(self, obj):
             if type(self) != type(obj):
                 return False
-            A = ["name"]
+            A = ["name", "smirks"]
             are_equal = [getattr(self, a) == getattr(obj, a) for a in A]
             return all(are_equal)
 
@@ -5626,23 +5626,41 @@ class VirtualSiteHandler(_NonbondedHandler):
                         for new_match in v:
                             unique = True
                             new_item = new_match._parameter_type
-                            for idx, (name, existing_match) in enumerate(
-                                matches[k].items()
-                            ):
-                                existing_item = existing_match._parameter_type
-                                same_parameter = False
-
-                                same_type = type(existing_item) == type(new_item)
-                                if same_type:
-                                    same_parameter = existing_item == new_item
-
-                                # same, so replace it to have a FIFO priority
-                                # and the last parameter matching wins
-                                if same_parameter:
-                                    matches[k][new_item.name] = new_match
-                                    unique = False
+                            for existing_indices in matches:
+                                # only go over existing indices which are a
+                                # permutation of our current match
+                                if set(k).difference(existing_indices):
+                                    continue
+                                # the goal here is to go over all current match
+                                # indices, and potentially clear them out if
+                                # our new parameter is a match
+                                for idx, (name, existing_match) in enumerate(
+                                    list(matches[existing_indices].items())
+                                ):
+                                    existing_item = existing_match._parameter_type
+                                    same_parameter = False
+
+                                    same_type = type(existing_item) == type(new_item)
+                                    if same_type:
+                                        same_parameter = existing_item == new_item
+
+                                    # same, so replace it to have a FIFO priority
+                                    # and the last parameter matching wins
+                                    if same_parameter:
+                                        matches[k][new_item.name] = new_match
+                                        unique = False
+                                    else:
+                                        # If we are here, we have the same
+                                        # set of indices as the new match,
+                                        # but the parameter is not the same.
+                                        # In this case, we need to clear out
+                                        # any existing vsites, since this
+                                        # new incoming vsite needs to replace 
+                                        # them
+                                        matches[existing_indices].pop(existing_item.name)
                             if unique:
                                 marginal_matches.append(new_match)
+
                         matches[k].update(
                             {p._parameter_type.name: p for p in marginal_matches}
                         )

and here is a modification of the above test to make sure match="all_permutations" works:

from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.topology import Molecule
from openff.units import unit
from openmm import unit as omm_unit
import pdb

#smis = ['[Na:2][Cl:1]', '[Na:1][Cl:2]', '[Cl:2][Na:1]', '[Cl:1][Na:2]', '[Na][Cl]', '[Cl][Na]']
smis = ['[N:3]=[C:2]=[O:1]', '[N:1]=[C:2]=[O:3]', '[N]=[C]=[O]', '[O:3]=[C:2]=[N:1]', '[O:1]=[C:2]=[N:3]', '[O]=[C]=[N]']

for smi in smis:

    if ":1" in smi:
        mol = Molecule.from_mapped_smiles(smi)
    else:
        mol = Molecule.from_smiles(smi)
    ff = ForceField()
    vsh = ff.get_parameter_handler('VirtualSites', {"version":0.3})

    # Define two vsites that apply in opposite directions to the same atoms of the test molecule. 
    # The vsites may collide (since they have the same name), but they should be 
    # distinguishable in the output based on their `distance` values
    vsh.add_parameter({'type':'BondCharge', 
                          'smirks':'[Cl:1][Na:2]',
                          'match':'once',
                          'charge_increment':[0.100, -0.100] * unit.elementary_charge,
                          'distance': [0.1010] * unit.angstrom,
                          'name':'aaa'})
    vsh.add_parameter({'type':'BondCharge', 
                          'smirks':'[Na:1][Cl:2]',
                          'match':'once',
                          'charge_increment':[0.100, -0.100] * unit.elementary_charge,
                          'distance': [0.220] * unit.angstrom,
                          'name':'aaa'})
    vsh.add_parameter({'type':'DivalentLonePair', 
            'smirks':'[*:1]~[#6:2]~[*:3]',
                          'match':'all_permutations',
                          'charge_increment':[0.100, -0.100, 0.100] * unit.elementary_charge,
                          'distance': [0.30] * unit.angstrom,
                          'outOfPlaneAngle': [45.0] * unit.degrees,
                          'name':'aaa'})
    vsh.add_parameter({'type':'DivalentLonePair', 
            'smirks':'[*:3]~[#6:2]~[*:1]',
                          'match':'all_permutations',
                          'charge_increment':[0.100, -0.100, 0.100] * unit.elementary_charge,
                          'distance': [0.40] * unit.angstrom,
                          'outOfPlaneAngle': [45.0] * unit.degrees,
                          'name':'aaa'})

    sys = ff.create_openmm_system(mol.to_topology())

    # For each vsite, print out which particles are involved, and at which distance.
    #import pdb; pdb.set_trace()
    for vsite_idx in range(sys.getNumParticles()):
        if not(sys.isVirtualSite(vsite_idx)):
            continue
        vsite = sys.getVirtualSite(vsite_idx)
        vsite_parents = []
        for vsite_parent_idx in range(vsite.getNumParticles()):
            vsite_parent = vsite.getParticle(vsite_parent_idx)
            vsite_parents.append(vsite_parent)
        distance = vsite.getLocalPosition()[0].value_in_unit(omm_unit.angstrom)
        print(smi, vsite_parents, distance)

I can certainly make a PR, but I figured you would want to just apply the patch by hand and play around.
Time taken: 1.1 cups of coffee :)

@mattwthompson
Copy link
Member

Please do make a PR - it'll help me understand the changes if I can quickly pull them down locally

trevorgokey added a commit that referenced this issue Mar 7, 2022
Additionally, vsite parameter type __eq__ now checks smirks
@trevorgokey trevorgokey mentioned this issue Mar 7, 2022
1 task
@trevorgokey
Copy link
Collaborator

Sorry for the delay; PR is up.

@j-wags
Copy link
Member Author

j-wags commented Mar 8, 2022

Time taken: 1.1 cups of coffee :)

Ha! 1.1 cups of coffee well spent :-)

The PR is showing some new issues, I think this may turn into whack-a-mole but I'll comment more there. I'm working on a more comprehensive vsite test suite that could help us ensure we fix this and don't lose ground somewhere that we're not aware of.

@trevorgokey
Copy link
Collaborator

Yeah, when I was looking at that failing test I was scratching my head at first. There are likely a few more tests than can hammer things out. The tests use a very symmetric problem, and I wouldn't be surprised if your clever NaCl representation tricks show up in other places.

In any case, I am glad you are sitting down and really trying to hammer out a thorough test suite. As you can see, I only wrote the "does it work, in theory" tests, and you are making sure it works in practice. Much appreciated!

mattwthompson added a commit that referenced this issue Mar 14, 2022
mattwthompson added a commit that referenced this issue Mar 14, 2022
mattwthompson added a commit that referenced this issue Mar 14, 2022
Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants