Skip to content

Unexpected / unintuitive behavior of template generator caches #429

@epretti

Description

@epretti

Discovered some very strange behavior with the molecule cache that might be undesirable; documenting here for discussion.

Consider the following setup:

from openff.toolkit import Molecule
from openmm.app import ForceField
from openmmforcefields.generators import SMIRNOFFTemplateGenerator

def try_make_system(cache, molecules_in, molecule_wanted):
    ff = ForceField()
    generator = SMIRNOFFTemplateGenerator(cache=cache, molecules=molecules_in)
    ff.registerTemplateGenerator(generator.generator)
    ff.createSystem(molecule_wanted.to_topology().to_openmm())

molecule_1 = Molecule.from_smiles("C")
molecule_2 = Molecule.from_smiles("CC")

Then the following behaves as expected:

try_make_system(None, [molecule_1], molecule_1) # Succeeds
try_make_system(None, [], molecule_1)           # Fails because molecule_1 is not in the list

However, the following succeeds:

try_make_system("test_a.cache", [molecule_1], molecule_1)
try_make_system("test_a.cache", [], molecule_1)

It turns out that the "cache" behaves as more than just a parameter cache that can speed up template generation. Loading a cache actually causes the template generator to behave as if those molecules exist in the list.

This is questionable behavior, but the following could be even more confusing:

try_make_system("test_b.cache", [molecule_1, molecule_2], molecule_1) # Succeeds
try_make_system("test_b.cache", [], molecule_1)                       # Succeeds, molecule_1 in the cache
try_make_system("test_b.cache", [], molecule_2)                       # Fails

It turns out that because molecule_2 wasn't in the topology given when the template generator had both molecules in the molecules list, it isn't cached. If between the second and third lines of the above example, you run

try_make_system("test_b.cache", [molecule_2], molecule_2)

then all lines will execute successfully because molecule_2 will have made it into the cache.

This behavior means that when a "cache" is used, instead of having molecules determine what molecules are allowed to get templates generated for them, it could depend on the past history of calls to other template generators, that may not be obvious without inspection of the topologies that were provided to them. I would assume that all of this is an outright bug except that the current implementation explicitly supports it and there is even a test1 for it. If we don't want this behavior, then it should probably be removed; if we still do, it should be documented very explicitly that the cache is more than a cache and can affect the behavior of parameterization.

Footnotes

  1. This test is in test_system_generator.py but this behavior is not related to SystemGenerator, which just passes any cache argument on to the template generator. This behavior is fully implemented within SmallMoleculeTemplateGenerator.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions