-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid duplicate residue atomptying #138
Conversation
foyer/forcefield.py
Outdated
topology = new_system | ||
independent_residues = _check_independent_residues(topology) | ||
|
||
if independent_residues == True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, when checking if a boolean value is True or False, it is better to just check like so:
if independent_residues:
...
foyer/forcefield.py
Outdated
|
||
for res_atom in res.atoms(): | ||
topology_atom = topology.addAtom(name=res_atom.name, | ||
#element=elem.Element.getBySymbol(res_atom.name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray comment?
foyer/forcefield.py
Outdated
chain = topology.addChain() | ||
new_res = topology.addResidue(res.name, chain) | ||
|
||
atoms = dict() # omm.Atom in res : omm.Atom in topology |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# {omm.Atom from res: omm.Atom in *new* topology}
Just to show dictionary relation a bit cleaner.
And this is the newly generated Topology correct?
Maybe add a bit more detail to comment, or a more detailed docstring. Not necessary in terms of PEP8 for private methods, but could be useful for debugging later.
foyer/forcefield.py
Outdated
return True | ||
|
||
|
||
def _update_atomtypes(unatomtyped_topology, atomtyped_prototype_topology): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 line comment to describe method?
foyer/forcefield.py
Outdated
@@ -298,7 +345,26 @@ def createSystem(self, topology, atomtype=True, nonbondedMethod=NoCutoff, | |||
the newly created System | |||
""" | |||
if atomtype: | |||
find_atomtypes(topology, forcefield=self) | |||
independent_residues = _check_independent_residues(topology) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like having atomtyping by residue act as the default behavior, but perhaps (as we spoke about regarding unit tests) having a by_residue
argument to createSystem()
, or something of that sort would be useful just so that we have the option to revert to the old behavior if desired. We are already passing **kwargs
to createSystem()
, so we should be able to pass by_residue
straight from pmd.Structure.apply()
or mb.Compound.apply()
.
foyer/forcefield.py
Outdated
for res_atom in res.atoms(): | ||
topology_atom = topology.addAtom(name=res_atom.name, | ||
#element=elem.Element.getBySymbol(res_atom.name), | ||
element=res_atom.element, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line may cause some problems when trying to atomtype coarse-grained systems (where the convention for Foyer is to prepend "element" names with an underscore). This could be tested with a box of united-atom alkanes since the TraPPE forcefield is already contained in Foyer. If it turns out there's not a good way around this, we could have createSystem
check to see if the system contains any coarse-grained particles, in which case the old approach to atomtyping could be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside of apply
, _topology_from_residue
acts on a topology that was created with generate_topology
so this case should already be handled.
foyer/forcefield.py
Outdated
new_topology = topology | ||
|
||
for key, val in residue_map.items(): | ||
new_topology = _update_atomtypes(topology, val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could have this just as _update_atomtypes(topology, val)
and have the _update_atomtypes
function act directly on the topology
object, rather than creating additional topology
instances
foyer/forcefield.py
Outdated
new_topology = topology | ||
|
||
for key, val in residue_map.items(): | ||
new_topology = _update_atomtypes(topology, val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also pass the residue name here as well. So something like:
for res_name, res_template in residue_map.items():
_update_atomtypes(topology, res_name, res_template)
This would eliminate the need to search for the residue name within _update_atomtypes
.
foyer/forcefield.py
Outdated
for key, val in residue_map.items(): | ||
new_topology = _update_atomtypes(topology, val) | ||
|
||
topology = new_topology |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this should be unnecessary if we operate directly on topology
inside of _update_atomtypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks everyone. Just one minor request for an additional test.
foyer/tests/test_forcefield.py
Outdated
oplsaa.createSystem(topo, use_residue_map = True)) | ||
without_map = pmd.openmm.load_topology(topo, | ||
oplsaa.createSystem(topo, use_residue_map = False)) | ||
[a.type for a in with_map.atoms] == [a.type for a in without_map.atoms] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add a test where there are bonds between residues?
Anything I missed or is this good to go? |
Can you do some tests as a function of residue size? E.g., plot this as a function of total atoms in the system, with different series for say, butane, octane, dodecane, etc. I'm assuming performance will increase as you reduce the overhead related to this operation by considering fewer residues |
Big thanks to @summeraz and @justinGilmer for codemonkey-ing this out.
Currently, each residue in a system is atomtyped even if an identical residue was previously atomtyped. For example, the current atomtyper would atomtype each water in a million-water system, which is a waste of time for larger systems. The approach here is to create a map of unique, already-atomtyped residues, so the 2nd to Nth residue don't call the subgraph isomorphism routines and instead look back at a prototype that was added to the map when it was first found.
It appears to not be possible to slice
openmm.app.Topology
so I wrote a helper function that takes aopenmm.app.Topology.Residue
and returns an otherwise-identicalopenmm.app.Topology
. This could be done cleaner if slicing is implemented in the future. This can casually be done inParmEd
but we found it to be expensive and unnecessary to move between the two types. We tested a number of different combinations of going between the two, but found it was best to stay in openmm the entire time.The case of atoms in a residue being bonded to other residues is not trivial to handle, so I added a check for this, and then use the current atomtyping method if this is the case. We might want to revisit this for large polymer or protein systems, but this works fine when so-called "residues" are actually separate molecules.
There is still some overhead associated with this method, but the result is much cheaper for these types of systems. I have done some benchmarking on a system of butane molecules to show this:
The "brute-force" curve is from the current master branch and "with-residues" is from this branch. I was unable to run the 100,000 residue systems, even on the Rahman head node, due to memory issues. The speedup converges to a factor of about 20, which I suspect could be improved upon in the future. The linear scaling is a bit of a concern, but I haven't gotten to the bottom of it yet.