Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make offmol.smirnoff_impropers and offmol.single_impropers #746

Closed
maxentile opened this issue Oct 16, 2020 · 8 comments · Fixed by #799
Closed

Make offmol.smirnoff_impropers and offmol.single_impropers #746

maxentile opened this issue Oct 16, 2020 · 8 comments · Fixed by #799

Comments

@maxentile
Copy link
Member

Describe the bug
Not really a bug, possibly naive question about intended behavior in enumerating indices for improper torsions.

I was surprised at the large number of torsions included in offmol.impropers for some molecules (e.g. len(methane.impropers) == 24).

The way offmol.impropers are enumerated appears maximally permissive, in that it doesn't place a constraint on the connectivity or identity of the central atom (except that it must have connectivity of >= 3). I think it's equivalent to returning all chemical environment matches for the SMARTS pattern "[*:1]~[*:2](~[*:3])~[*:4]" ("a central atom and 3 tagged neighbors").

Are there other constraints we wish to impose on the indices returned by offmol.impropers? For example, would it be desirable to impose a constraint that the connectivity of the central atom is exactly 3 ("[*:1]~[X3:2](~[*:3])~[*:4]" )?

To Reproduce

from openforcefield.topology import Molecule

improper_smarts = "[*:1]~[X3:2](~[*:3])~[*:4]"
#permissive_improper_smarts = "[*:1]~[*:2](~[*:3])~[*:4]"

smiles_list = ["C", "N", "CCCCC", 'C'*1000]

def compare_improper_enumeration_schemes(smiles):
    offmol = Molecule.from_smiles(smiles)

    smarts_inds = offmol.chemical_environment_matches(improper_smarts)
    offmol_inds = [tuple([atom.molecule_atom_index for atom in improper]) for improper in offmol.impropers]

    print(f'smiles:\n\t{smiles}')
    print(f'matches for substructure search "{improper_smarts}":\n\t{len(smarts_inds)}')
    print(f'offmol.impropers:\n\t{len(offmol_inds)}')
    print(f'set(smarts_matches) is subset of set(offmol.impropers):\n\t{set(smarts_inds).issubset(offmol_inds)}\n\n')

for smiles in smiles_list:
    compare_improper_enumeration_schemes(smiles)

Output

smiles:
	C
matches for substructure search "[*:1]~[X3:2](~[*:3])~[*:4]":
	0
offmol.impropers:
	24
set(smarts_matches) is subset of set(offmol.impropers):
	True


smiles:
	N
matches for substructure search "[*:1]~[X3:2](~[*:3])~[*:4]":
	6
offmol.impropers:
	6
set(smarts_matches) is subset of set(offmol.impropers):
	True


smiles:
	CCCCC
matches for substructure search "[*:1]~[X3:2](~[*:3])~[*:4]":
	0
offmol.impropers:
	120
set(smarts_matches) is subset of set(offmol.impropers):
	True


smiles:
	CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
matches for substructure search "[*:1]~[X3:2](~[*:3])~[*:4]":
	0
offmol.impropers:
	24000
set(smarts_matches) is subset of set(offmol.impropers):
	True

Computing environment:
openforcefield==0.7.1

@j-wags
Copy link
Member

j-wags commented Oct 16, 2020

The intended behavior here is hard to define. The word impropers will be interpreted differently in SMIRNOFF-land ("give me all the trefoil impropers") versus AMBER-land ("each center has one unique particle tuple which is correct, and that is the only way").

I see where you're going with the *X3:2 pattern, but I don't think it's safe to say that either:

  • only trivalent atoms can have impropers assigned
  • all trivalent atoms will have impropers assigned

Impropers are complicated because the only way to know how many of them will be defined is by consulting a ForceField. This means that no output from a Molecule.impropers iterator can be guaranteed to match the number of impropers that will be assigned during parameterization.

Further, during parameterization, the way that we determine the three canonical orderings of a trefoil improper (as opposed to the six possible orderings) is by canonically ordering them according to their Topology particle indices, which may be different from their Molecule particle indices. So, even if we tried to report the trefoil impropers on a molecule, the final parameterized tuples may be different.

I could see two paths forward here:

  • Leave it as "the maximal number of possible impropers", which is what we have now
  • Remove it altogether, since it's fairly confusing

@jchodera
Copy link
Member

I think the issue here is that .impropers returns a huge number of things to sort through, only a small number of which would ever be used by an Amber or SMIRNOFF force field. It seems like there's an easy way to prune down the list of impropers returned to a much smaller list that would still be more than would be assigned, but with a much less crazy overestimate, especially if we split .smirnoff_impropers (central atom is second atom) from .amber_impropers (central atom is first atom).

@j-wags
Copy link
Member

j-wags commented Oct 16, 2020

Ok, that makes sense. Let me take a swing at a spec to add these:

  • Molecule.smirnoff_impropers
    • Should return impropers only for trivalent centers
    • Central atom should be in second position in tuple
    • Return all orderings of non-central atoms (?)
  • Molecule.amber_impropers
    • Same as above, but central atom should be in first position in tuple
  • Add Topology and TopologyMolecule.smirnoff_impropers and amber_impropers similar to above, but with topology instead of molecule indices.

I'm not sure which way to go on "all orderings of non-central atoms" but it seems better to return all orderings, since it'll be easier for users to prune these down than to permute them up.

@maxentile
Copy link
Member Author

Thanks Jeff! That spec makes sense to me.

I'm not sure which way to go on "all orderings of non-central atoms" but it seems better to return all orderings, since it'll be easier for users to prune these down than to permute them up.

I'd also vote for returning all orderings, for the additional reasons that (1) this behavior is simpler to describe and verify than one that tries to canonicalize the ordering or return an arbitrary ordering, (2) this behavior is convenient to implement by chemical_environment_matches.

@mattwthompson
Copy link
Member

mattwthompson commented Oct 16, 2020

Does this spec imply the removal of .impropers in its current state, or leaving it as is (or something in between, like dropping the public .impropers getting and only using ._impropers as something to be fed to .smirnoff_impropers/.amber_impropers)? I'd go for leaving it as is but updating the documentation to make this behavior more explicit (and without this error):
https://github.com/openforcefield/openforcefield/blob/6336eca1240b0f73d4da8010ecc680244b2e6c3a/openforcefield/topology/molecule.py#L3534-L3545

@j-wags
Copy link
Member

j-wags commented Oct 16, 2020

I'd vote for leaving as-is. If we're exposing other methods that provide the needed functionality then there's no pressure for an behavior change/API break. It's a silly method in its current form, but if someone out there is actually using it then there's no need to disrupt them.

@j-wags j-wags changed the title Clarification: intended behavior of offmol.impropers? Make offmol.trefoil_impropers and offmol.single_impropers Oct 26, 2020
@j-wags j-wags changed the title Make offmol.trefoil_impropers and offmol.single_impropers Make offmol.smirnoff_impropers and offmol.single_impropers Oct 26, 2020
@mattwthompson
Copy link
Member

.amber_impropers (central atom is first atom).

I don't think this is true. Everywhere else I can find, and everything I have implemented, indicates that Amber expects the central atom to be listed third.

ParmEd/ParmEd#881 (comment)

openmm/openmm#220 (comment)

@lilyminium
Copy link
Collaborator

lilyminium commented Oct 28, 2021

Agreed. This confuses everyone about annually.

Parmed impropers have the first atom as the central one only for harmonic improper dihedrals; it uses the third for periodic impropers. But because Amber can reverse them, I think it can also be second.

MDAnalysis/mdanalysis#2386 (comment)

michellab/Sire#193

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants