Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid duplicate residue atomptying #138

Merged
merged 5 commits into from
Sep 11, 2017

Conversation

mattwthompson
Copy link
Member

Big thanks to @summeraz and @justinGilmer for codemonkey-ing this out.

Currently, each residue in a system is atomtyped even if an identical residue was previously atomtyped. For example, the current atomtyper would atomtype each water in a million-water system, which is a waste of time for larger systems. The approach here is to create a map of unique, already-atomtyped residues, so the 2nd to Nth residue don't call the subgraph isomorphism routines and instead look back at a prototype that was added to the map when it was first found.

  • It appears to not be possible to slice openmm.app.Topology so I wrote a helper function that takes a openmm.app.Topology.Residue and returns an otherwise-identical openmm.app.Topology. This could be done cleaner if slicing is implemented in the future. This can casually be done in ParmEd but we found it to be expensive and unnecessary to move between the two types. We tested a number of different combinations of going between the two, but found it was best to stay in openmm the entire time.

  • The case of atoms in a residue being bonded to other residues is not trivial to handle, so I added a check for this, and then use the current atomtyping method if this is the case. We might want to revisit this for large polymer or protein systems, but this works fine when so-called "residues" are actually separate molecules.

There is still some overhead associated with this method, but the result is much cheaper for these types of systems. I have done some benchmarking on a system of butane molecules to show this:

image

The "brute-force" curve is from the current master branch and "with-residues" is from this branch. I was unable to run the 100,000 residue systems, even on the Rahman head node, due to memory issues. The speedup converges to a factor of about 20, which I suspect could be improved upon in the future. The linear scaling is a bit of a concern, but I haven't gotten to the bottom of it yet.

topology = new_system
independent_residues = _check_independent_residues(topology)

if independent_residues == True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, when checking if a boolean value is True or False, it is better to just check like so:

if independent_residues:
   ...


for res_atom in res.atoms():
topology_atom = topology.addAtom(name=res_atom.name,
#element=elem.Element.getBySymbol(res_atom.name),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray comment?

chain = topology.addChain()
new_res = topology.addResidue(res.name, chain)

atoms = dict() # omm.Atom in res : omm.Atom in topology
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# {omm.Atom from res: omm.Atom in *new* topology}

Just to show dictionary relation a bit cleaner.
And this is the newly generated Topology correct?

Maybe add a bit more detail to comment, or a more detailed docstring. Not necessary in terms of PEP8 for private methods, but could be useful for debugging later.

return True


def _update_atomtypes(unatomtyped_topology, atomtyped_prototype_topology):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 line comment to describe method?

@@ -298,7 +345,26 @@ def createSystem(self, topology, atomtype=True, nonbondedMethod=NoCutoff,
the newly created System
"""
if atomtype:
find_atomtypes(topology, forcefield=self)
independent_residues = _check_independent_residues(topology)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like having atomtyping by residue act as the default behavior, but perhaps (as we spoke about regarding unit tests) having a by_residue argument to createSystem(), or something of that sort would be useful just so that we have the option to revert to the old behavior if desired. We are already passing **kwargs to createSystem(), so we should be able to pass by_residue straight from pmd.Structure.apply() or mb.Compound.apply().

for res_atom in res.atoms():
topology_atom = topology.addAtom(name=res_atom.name,
#element=elem.Element.getBySymbol(res_atom.name),
element=res_atom.element,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line may cause some problems when trying to atomtype coarse-grained systems (where the convention for Foyer is to prepend "element" names with an underscore). This could be tested with a box of united-atom alkanes since the TraPPE forcefield is already contained in Foyer. If it turns out there's not a good way around this, we could have createSystem check to see if the system contains any coarse-grained particles, in which case the old approach to atomtyping could be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside of apply, _topology_from_residue acts on a topology that was created with generate_topology so this case should already be handled.

new_topology = topology

for key, val in residue_map.items():
new_topology = _update_atomtypes(topology, val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could have this just as _update_atomtypes(topology, val) and have the _update_atomtypes function act directly on the topology object, rather than creating additional topology instances

new_topology = topology

for key, val in residue_map.items():
new_topology = _update_atomtypes(topology, val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also pass the residue name here as well. So something like:

for res_name, res_template in residue_map.items():
   _update_atomtypes(topology, res_name, res_template)

This would eliminate the need to search for the residue name within _update_atomtypes.

for key, val in residue_map.items():
new_topology = _update_atomtypes(topology, val)

topology = new_topology
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this should be unnecessary if we operate directly on topology inside of _update_atomtypes

Copy link
Member

@ctk3b ctk3b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks everyone. Just one minor request for an additional test.

oplsaa.createSystem(topo, use_residue_map = True))
without_map = pmd.openmm.load_topology(topo,
oplsaa.createSystem(topo, use_residue_map = False))
[a.type for a in with_map.atoms] == [a.type for a in without_map.atoms]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a test where there are bonds between residues?

@mattwthompson
Copy link
Member Author

Anything I missed or is this good to go?

@chrisiacovella
Copy link
Contributor

Can you do some tests as a function of residue size? E.g., plot this as a function of total atoms in the system, with different series for say, butane, octane, dodecane, etc. I'm assuming performance will increase as you reduce the overhead related to this operation by considering fewer residues

@summeraz summeraz merged commit 64f7a8c into mosdef-hub:master Sep 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants