Avoid duplicate residue atomtyping #138

mattwthompson · 2017-08-14T01:26:41Z

Big thanks to @summeraz and @justinGilmer for codemonkey-ing this out.

Currently, every residue in a system is atomtyped, even if an identical residue has already been atomtyped. For example, the current atomtyper would atomtype each water in a million-water system, which wastes time for large systems. The approach here is to build a map of unique, already-atomtyped residues, so that the 2nd through Nth copies of a residue skip the subgraph isomorphism routines and instead look up the prototype that was added to the map when that residue was first encountered.
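
For readers following along, here is a minimal sketch of the prototype-map idea. It assumes residues can be keyed by name (so same-named residues are taken to be identical) and that `expensive_atomtype` stands in for the subgraph isomorphism step. Both the helper name and the data layout are hypothetical and are not Foyer's API.

```python
from typing import Callable, Dict, List, Tuple

# A residue is modelled here as (name, [element symbols]); this layout is
# an assumption made purely for illustration.
Residue = Tuple[str, List[str]]


def atomtype_with_map(
    residues: List[Residue],
    expensive_atomtype: Callable[[Residue], List[str]],
) -> List[List[str]]:
    """Atomtype each residue, reusing the result for repeated residue names."""
    prototypes: Dict[str, List[str]] = {}
    typed = []
    for residue in residues:
        name = residue[0]
        if name not in prototypes:
            # First time this residue name is seen: do the costly matching once.
            prototypes[name] = expensive_atomtype(residue)
        # Later copies reuse the prototype's types instead of re-matching.
        typed.append(list(prototypes[name]))
    return typed
```

With this layout the costly routine runs once per unique residue name rather than once per residue, which is where the savings for systems of many identical molecules would come from.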

It does not appear to be possible to slice openmm.app.Topology, so I wrote a helper function that takes an openmm.app.Topology.Residue and returns an otherwise-identical openmm.app.Topology. This could be done more cleanly if slicing is implemented in the future. This can easily be done in ParmEd, but we found it expensive and unnecessary to move between the two types. We tested a number of different combinations of going between the two, but found it was best to stay in openmm the entire time.
The case of atoms in a residue being bonded to atoms in other residues is not trivial to handle, so I added a check for this and fall back to the current atomtyping method when it occurs. We might want to revisit this for large polymer or protein systems, but this works fine when so-called "residues" are actually separate molecules.
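
The independence check described above can be sketched as follows. This is an illustration rather than the code in this branch: it assumes bonds arrive as pairs of atoms that each expose a `.residue` attribute (which is my understanding of how OpenMM topology atoms behave), and the function name is hypothetical.

```python
from types import SimpleNamespace


def residues_are_independent(bonds):
    """Return True if no bond joins atoms that belong to different residues.

    `bonds` is any iterable of (atom1, atom2) pairs whose atoms expose a
    `.residue` attribute.
    """
    return all(atom1.residue is atom2.residue for atom1, atom2 in bonds)


# Stand-in atoms for a quick check; real code would pass the topology's bonds.
res_a, res_b = SimpleNamespace(name="BUT"), SimpleNamespace(name="BUT")
c1 = SimpleNamespace(residue=res_a)
c2 = SimpleNamespace(residue=res_a)
c3 = SimpleNamespace(residue=res_b)

print(residues_are_independent([(c1, c2)]))            # True
print(residues_are_independent([(c1, c2), (c2, c3)]))  # False
```

A single cross-residue bond is enough to return False, which matches the conservative fallback to the existing atomtyping path.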

There is still some overhead associated with this method, but the result is much cheaper for these types of systems. I have done some benchmarking on a system of butane molecules to show this:

The "brute-force" curve is from the current master branch and "with-residues" is from this branch. I was unable to run the 100,000 residue systems, even on the Rahman head node, due to memory issues. The speedup converges to a factor of about 20, which I suspect could be improved upon in the future. The linear scaling is a bit of a concern, but I haven't gotten to the bottom of it yet.

justinGilmer · 2017-08-14T15:30:14Z

foyer/forcefield.py

-            topology = new_system
+            independent_residues = _check_independent_residues(topology)
+
+            if independent_residues == True:


In general, when checking if a boolean value is True or False, it is better to just check like so:

if independent_residues: ...

justinGilmer · 2017-08-14T15:33:38Z

foyer/forcefield.py

+
+    for res_atom in res.atoms():
+        topology_atom = topology.addAtom(name=res_atom.name,
+                         #element=elem.Element.getBySymbol(res_atom.name),


stray comment?

justinGilmer · 2017-08-14T15:40:30Z

foyer/forcefield.py

+    chain = topology.addChain()
+    new_res = topology.addResidue(res.name, chain)
+
+    atoms = dict()  # omm.Atom in res : omm.Atom in topology


# {omm.Atom from res: omm.Atom in *new* topology}

Just to show the dictionary relation a bit more clearly.
And this is the newly generated Topology, correct?

Maybe add a bit more detail to comment, or a more detailed docstring. Not necessary in terms of PEP8 for private methods, but could be useful for debugging later.
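
To make the role of that dictionary concrete, here is a hedged sketch of a residue-to-topology helper with the kind of docstring and comment suggested above. It is not the helper in this PR: it takes the empty topology as an argument so no import is needed, and it assumes OpenMM-style objects, i.e. a residue with `atoms()` and `bonds()` and a topology with `addChain`, `addResidue`, `addAtom`, and `addBond`.

```python
def topology_from_residue(res, new_topology):
    """Copy one residue's atoms and bonds into an (empty) topology.

    Assumes OpenMM-style objects: `res.atoms()` and `res.bonds()` on the
    source residue, and `addChain`, `addResidue`, `addAtom`, and `addBond`
    on the target topology. All bonds are assumed to be internal to `res`.
    """
    chain = new_topology.addChain()
    new_res = new_topology.addResidue(res.name, chain)

    # {atom from the source residue: its copy in the *new* topology}
    atom_map = {}
    for res_atom in res.atoms():
        atom_map[res_atom] = new_topology.addAtom(
            name=res_atom.name, element=res_atom.element, residue=new_res
        )

    # Re-create each bond between the copied atoms via the lookup table.
    for atom1, atom2 in res.bonds():
        new_topology.addBond(atom_map[atom1], atom_map[atom2])

    return new_topology
```

The map is needed because bonds reference the original atom objects, so each endpoint has to be translated to its copy before the bond can be added to the new topology.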

justinGilmer · 2017-08-14T15:45:52Z

foyer/forcefield.py

+    return True
+
+
+def _update_atomtypes(unatomtyped_topology, atomtyped_prototype_topology):


1 line comment to describe method?

summeraz · 2017-08-15T16:19:14Z

foyer/forcefield.py

@@ -298,7 +345,26 @@ def createSystem(self, topology, atomtype=True, nonbondedMethod=NoCutoff,
            the newly created System
        """
        if atomtype:
-            find_atomtypes(topology, forcefield=self)
+            independent_residues = _check_independent_residues(topology)


I like having atomtyping by residue act as the default behavior, but perhaps (as we spoke about regarding unit tests) having a by_residue argument to createSystem(), or something of that sort would be useful just so that we have the option to revert to the old behavior if desired. We are already passing **kwargs to createSystem(), so we should be able to pass by_residue straight from pmd.Structure.apply() or mb.Compound.apply().
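
A rough sketch of how such an option could flow through keyword arguments. Both functions here are hypothetical stand-ins, not the real createSystem() or apply() signatures, and `by_residue` is simply the name proposed in this comment.

```python
def create_system(topology, atomtype=True, by_residue=True, **kwargs):
    """Hypothetical stand-in for createSystem(); reports which path would run."""
    if not atomtype:
        return "skipped"
    return "by-residue" if by_residue else "brute-force"


def apply(topology, **kwargs):
    """Hypothetical stand-in for an apply() wrapper that forwards its options."""
    return create_system(topology, **kwargs)


print(apply("topology"))                    # by-residue
print(apply("topology", by_residue=False))  # brute-force
```

Because the wrapper forwards `**kwargs` untouched, the default stays residue-based while callers can still opt back into the old behavior.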

summeraz · 2017-08-15T16:30:15Z

foyer/forcefield.py

+    for res_atom in res.atoms():
+        topology_atom = topology.addAtom(name=res_atom.name,
+                         #element=elem.Element.getBySymbol(res_atom.name),
+                         element=res_atom.element,


This line may cause some problems when trying to atomtype coarse-grained systems (where the convention for Foyer is to prepend "element" names with an underscore). This could be tested with a box of united-atom alkanes since the TraPPE forcefield is already contained in Foyer. If it turns out there's not a good way around this, we could have createSystem check to see if the system contains any coarse-grained particles, in which case the old approach to atomtyping could be used.

Inside of apply, _topology_from_residue acts on a topology that was created with generate_topology so this case should already be handled.

summeraz · 2017-08-15T16:39:51Z

foyer/forcefield.py

+                new_topology = topology
+
+                for key, val in residue_map.items():
+                    new_topology = _update_atomtypes(topology, val)


I think we could have this just as _update_atomtypes(topology, val) and have the _update_atomtypes function act directly on the topology object, rather than creating additional topology instances

summeraz · 2017-08-15T16:41:48Z

foyer/forcefield.py

+                new_topology = topology
+
+                for key, val in residue_map.items():
+                    new_topology = _update_atomtypes(topology, val)


You could also pass the residue name here as well. So something like:

for res_name, res_template in residue_map.items(): _update_atomtypes(topology, res_name, res_template)

This would eliminate the need to search for the residue name within _update_atomtypes.
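
Something along these lines, as a sketch only. It assumes the atomtype is carried on each atom's `id` attribute and that every residue lists its atoms in the same order as the template; neither assumption is confirmed here, and the function name is illustrative.

```python
def update_atomtypes(topology, res_name, res_template):
    """Copy atomtypes from `res_template` onto every residue named `res_name`.

    Mutates `topology` in place and returns None, so the caller does not
    need to rebind or copy the topology. Assumes the atomtype lives on
    `atom.id` and that atom ordering matches between residue and template.
    """
    for res in topology.residues():
        if res.name != res_name:
            continue
        for atom, template_atom in zip(res.atoms(), res_template.atoms()):
            atom.id = template_atom.id
```

The calling loop then reduces to a plain call per map entry with no intermediate topology objects.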

summeraz · 2017-08-15T16:42:35Z

foyer/forcefield.py

+                for key, val in residue_map.items():
+                    new_topology = _update_atomtypes(topology, val)
+
+                topology = new_topology


Again, this should be unnecessary if we operate directly on topology inside of _update_atomtypes

ctk3b

This looks great! Thanks everyone. Just one minor request for an additional test.

ctk3b · 2017-08-15T17:31:13Z

foyer/tests/test_forcefield.py

+        oplsaa.createSystem(topo, use_residue_map = True))
+    without_map = pmd.openmm.load_topology(topo,
+        oplsaa.createSystem(topo, use_residue_map = False))
+    [a.type for a in with_map.atoms] == [a.type for a in without_map.atoms]


Could you also add a test where there are bonds between residues?

mattwthompson · 2017-08-18T19:21:21Z

Anything I missed or is this good to go?

chrisiacovella · 2017-08-18T20:39:35Z

Can you do some tests as a function of residue size? E.g., plot this as a function of total atoms in the system, with different series for say, butane, octane, dodecane, etc. I'm assuming performance will increase as you reduce the overhead related to this operation by considering fewer residues

mattwthompson added 2 commits August 13, 2017 10:13

Use map to avoid duplicate residue atomtyping

e06f749

Check to see if each residue is independent

6baf78d

justinGilmer reviewed Aug 14, 2017

justinGilmer added atomtyping feature labels Aug 14, 2017

justinGilmer assigned mattwthompson Aug 14, 2017

mattwthompson added 2 commits August 14, 2017 20:31

Requested fixes

a4c7a82

Test residue mapping as an optional argument

e616800

summeraz requested changes Aug 15, 2017

ctk3b requested changes Aug 15, 2017

Add tests and requested fixes

eb0f7d2

ctk3b approved these changes Aug 18, 2017

summeraz approved these changes Sep 11, 2017

summeraz merged commit 64f7a8c into mosdef-hub:master Sep 11, 2017
