Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move atomtyping step into a specific function #159

Merged
merged 11 commits into from
Mar 19, 2018

Conversation

mattwthompson
Copy link
Member

Dicussed in #157, but to repeat the important bit:

Currently there's a bit of code at the beginning of Forcefield.createSystem that manages the atomtyping step, whereas this should more cleanly done by being sectioned off to a different function.

  • I'm open to naming the function something more elegant, I'm not happy with Forcefield.run_atomtyping but couldn't come up with something better.

  • I'm not sure if it should be a helper function, or whatever Forcefield._run_atomtyping would be called. It seems possible for a user to want to call it directly without creating the entire openmm.System or parmed.Structure.

  • I don't think a test is necessary for this, as I'm just moving code around. But I also couldn't think of how a useful test would be structured. I could verify that the returned openmm.Topology has atomtype information, but this should cause later tests to crash anyway.

  • I added use_residue_map as an argument to Forcefield.apply. I don't know why it wasn't there before, and don't see any reason for it to be buried inside these other functions.

Copy link
Member

@ctk3b ctk3b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Could you also add the check to make sure that every atom has atomtypes into this PR?

@@ -308,6 +308,8 @@ def apply(self, topology, references_file=None, *args, **kwargs):
references_file : str, optional, default=None
Name of file where force field references will be written (in Bibtex
format)
use_residue_map : boolean
Bypass atomtyping duplicate residues using a residue map
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you maybe clarify a bit more what this does via the variable name and/or the description here? Maybe something along the lines of copy_atomtypes_to_duplicate_residues

@mattwthompson
Copy link
Member Author

Do you want this in a test or also in the function itself?

@ctk3b
Copy link
Member

ctk3b commented Mar 8, 2018

The function itself. We cannot build a system if there are missing atomtypes.

@mattwthompson
Copy link
Member Author

I don't think the docstring I added is great (please suggest fixes/improvements!) but it's more descriptive. Personally I like the name of the argument - it's not great that it describes a variable that's totally internal -
but it's a difficult feature to describe in an argument alone.

@mattwthompson
Copy link
Member Author

And coveralls is whining that I added one un-tested line but only three tested lines 🙄

@ctk3b
Copy link
Member

ctk3b commented Mar 9, 2018

Doc string LGTM! Would be easy to add a test case that triggers the error to get that coverage up 😁

Copy link
Contributor

@summeraz summeraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few comments.

@@ -308,6 +308,15 @@ def apply(self, topology, references_file=None, *args, **kwargs):
references_file : str, optional, default=None
Name of file where force field references will be written (in Bibtex
format)
use_residue_map : boolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple nitpicks in this docstring. Could you make this use_residue_map : boolean, optional, default=True?

Store atomtyped topologies of residues to a dictionary that maps
them to residue names. Each topology, including atomtypes, will be
copied to other residues with the same name. This avoids repeatedly
calling the subgraph isomorphism on idential residues and should
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo (should be identical)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably have a vim plugin that prevents me from making these sorts of mistakes....

them to residue names. Each topology, including atomtypes, will be
copied to other residues with the same name. This avoids repeatedly
calling the subgraph isomorphism on idential residues and should
result in better performance for systems with many identital
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo (should be identical)

@@ -317,6 +326,7 @@ def apply(self, topology, references_file=None, *args, **kwargs):
positions = np.empty(shape=(topology.getNumAtoms(), 3))
positions[:] = np.nan
box_vectors = topology.getPeriodicBoxVectors()
topology = self.run_atomtyping(topology, use_residue_map=use_residue_map)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

----------
topology : openmm.app.Topology
Molecular structure to find atom types of
use_residue_map : boolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto the comments on the above docstring

raise ValueError('Not all atoms in topology have atom types')

return topology

def createSystem(self, topology, atomtype=True, use_residue_map=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to remove the atomtype and use_residue_map arguments now. In fact, is createSystem now exactly the same as OpenMM's? If so, we could actually delete this entire function... (although this may cause problems with #155). Thoughts @ctk3b?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check me but I think the base method includes the logic for template based atomtyping

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll clean up the arguments but this and a few other ideas with Forcefield.createSystem vs OpenMM's original function should be dealt with at some point. There's an open issue about templating or something that I can't find at the moment but should be referenced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We think you're right in saying that they do their reside teamplate-based atomtyping in their Forcefield.createSystem. It doesn't appear we can use their base method out of the box.

Worth noting that their while templates are much more involved, we are already doing some work in creating a map between residue/molecule names to atomtyped topologies but trashing it once the system is atomtyped. I don't know if it's feasible or worth considering trying to save residue_map to disk? Or maybe converting it into a single large residueTemplate that could then be passed to OpenMM's base method?

Here's their method at a point in history that should be close to when we grabbed it:

https://github.com/pandegroup/openmm/blob/c9fcabb5da87ef61ac0c777423b1f3e1d96e654e/wrappers/python/simtk/openmm/app/forcefield.py#L1041

Also notable that there have been some updates to that block of in the past year. Probably not much of consequence now but something to worry about later, especially if we want great integration with OpenMM ...

@mattwthompson
Copy link
Member Author

I'm having trouble seeing a good way to test for missing atomtypes. If an atom in the topology truly lacks atomtypes defined in the forcefield, I think errors in find_atomtypes would be raised. I could try to manually pop a.id from a random atom to force it, I guess?

@ctk3b
Copy link
Member

ctk3b commented Mar 9, 2018

@mattwthompson
Copy link
Member Author

In this case I just don't know how to raise that error. I get how to use pytest to verify that a desired error has been raised.

@ctk3b
Copy link
Member

ctk3b commented Mar 9, 2018

don't know how to raise that error

I would just call createSystem without the atomtyping flag and it should yell at you

@mattwthompson
Copy link
Member Author

But that argument is gone now (the atomtyping happens before createSystem is called).

We can try applying an inappropriate force field to a molecule,
but that triggers errors elsewhere that are already test for:

    def test_missing_atomtypes():
        ethane = pmd.load_file(get_fn('pf6.mol2'), structure=True)
        oplsaa = Forcefield(name='oplsaa')
        topo, NULL = generate_topology(ethane)
        with pytest.raises(ValueError):
>           oplsaa.run_atomtyping(topo)

foyer/tests/test_forcefield.py:137: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
foyer/forcefield.py:369: in run_atomtyping
    find_atomtypes(residue, forcefield=self)
foyer/atomtyper.py:43: in find_atomtypes
    _resolve_atomtypes(topology)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

topology = <Topology; 1 chains, 1 residues, 7 atoms, 6 bonds>

    def _resolve_atomtypes(topology):
        """Determine the final atomtypes from the white- and blacklists. """
        for atom in topology.atoms():
            atomtype = [rule_name for rule_name in atom.whitelist - atom.blacklist]
            if len(atomtype) == 1:
                atom.id = atomtype[0]
            elif len(atomtype) > 1:
                raise FoyerError("Found multiple types for atom {} ({}): {}.".format(
                    atom.index, atom.element.name, atomtype))
            else:
                raise FoyerError("Found no types for atom {} ({}).".format(
>                   atom.index, atom.element.name))
E               foyer.exceptions.FoyerError: Found no types for atom 0 (phosphorus).

foyer/atomtyper.py:104: FoyerError

@ctk3b
Copy link
Member

ctk3b commented Mar 9, 2018

Ok good point - sorry I misread how you had implemented this. I think we're good then since we already guarantee that all atoms got a type before calling createSystem which is the whole point

@mattwthompson
Copy link
Member Author

One of the travis builds on mac got hung up on something, but the second most recent test did not, and the most recent commit is minor. Should be fine.

Somebody else should give it a once-over before merging, but from my perspective this is good to go.

@ctk3b
Copy link
Member

ctk3b commented Mar 12, 2018

Do you have admin access on travis? To restart builds etc?

@mattwthompson
Copy link
Member Author

No on upstream I don't think

@mattwthompson
Copy link
Member Author

Did I miss anything or is this good to go?

@summeraz
Copy link
Contributor

LGTM. @ctk3b anything else, or are we good to merge?

@ctk3b
Copy link
Member

ctk3b commented Mar 19, 2018

Sorry I missed this - LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants