-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix trans COOh in Elf10 conformer selection #1171
Conversation
…th openeye toolkit
…ders with openeye toolkit and an ELF10 method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work - I'm super impressed at the linearization of correcting arbitrary numbers of conformers without any for loops. I'm approving this, pending the changes requested below and @SimonBoothroyd's approval.
@SimonBoothroyd This may have an effect on outputted partial charges, by making conformers that would otherwise be discarded from ELF (due to trans-COOH conformations) get corrected before being ELF conformer selection runs. We think this change should be minor, but could you let us know if this is acceptable?
Two asks:
- This should add the
make_carboxylic_acids_cis
kwarg toMolecule.generate_conformers
andRDKitToolkitWrapper.generate_conformers
, to keep the method signatures consistent. - If possible,
_make_carboxylic_acids_cis
should take a ToolkitWrapper/Registry to dictate which backend is used for SMARTS matching.
- _make_carboxylic_acids_cis() defines convenience methods for computing dot products, norms, and dihedral angles in Numpy arrays. Should these methods be broken out, perhaps into the utils module for everyone to use?
Good question. The math is highly optimized and pretty complex, so I'm fine with it living inside of the method, since that where a developer would be looking if there were a problem.
- Should we make make_carboxylic_acids_cis True by default in the generate_conformers() method?
I think "no", because this puts us in the position of being a cheminformatics toolkit, and that opens a dangerous door.
- Should we add an argument to assign_partial_charges and assign_fractional_bond_orders to not make carboxylic acids cis?
I don't think this is necessary. If users want to see conformers get deleted they can provide them themselves, and you point out that this only is set to True for ELF-based charge methods.
@@ -3615,6 +3619,80 @@ def test_apply_elf_conformer_selection(self, toolkit): | |||
initial_conformers[1].value_in_unit(unit.angstrom), | |||
) | |||
|
|||
def test_make_carboxylic_acids_cis(self): | |||
offmol = Molecule.from_smiles("CC(O)=O") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be switched to a mapped SMILES (or become a new entry in create_molecules.py
) to ensure that the coordinates below are always interpreted to refer to the same atoms.
openff/toolkit/topology/molecule.py
Outdated
|
||
# Pull out the coordinates of all carboxylic acid groups into cooh_xyz | ||
# cooh_xyz is an array with shape (n_cooh_groups, n_conformers, 4, 3) | ||
cooh_indices = self.chemical_environment_matches("[C:2]([O:3][H:4])=[O:1]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not blocking) In other circumstances we'd want to pass in a toolkit registry here, and have the method take a toolkit registry as input. But since this is a private method I don't think it's necessary
Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @Yoshanuikabundi ! In general this looks good, I'd just want to see the tests tidied up and made a bit more obvious with less maintenance burden.
@davidlmobley may want to have a quick scan of this, in particular these lines to make sure this aligns with his understanding of the problem as well as he's had more conversations with Chris about this.
# OK, we've checked a single conformer in great detail, now lets check | ||
# a thousand quickly but less robustly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure of the value add of the next set of tests - the code itself is pretty involved and almost needs a test of its own 😉
If you want to make sure that a molecule with two or more COOH groups or correctly handled, I'd recommend just using oxalic and malonic acid and then programatically generating cis and trans conformers (you can set a dihedral angle explicitly in both OE and RD)
In general though I'd recommend keeping unit tests as simple and without scope for error as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I basically just wanted to throw a lot of different stuff at it and check that a few properties held:
- It was cis now
- Only the H in the COOH groups had moved
- OH bond length hadn't changed
Poor man's fuzzing I guess. Having a weird molecule that would generate conformers that were all over the place, and that had plenty of atoms for buggy code to accidentally mutate, seemed like an easy way to generate a lot data for it. The idea was to try and catch edge cases that I wouldn't have thought of testing. I guess that's not really a unit test though (is there somewhere else I should put something similar?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree that the test is a mess, should I just get rid of it and write seperate tests for all the edge cases I can think of? Or am I just being paranoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a mixed bag in terms of code complexity and test thoroughness. I appreciate the points on both sides, but let's go ahead and delete the lines after this comment.
Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>
Clarifies test by using simpler molecule and using dummy values to lay out the molecule obviously Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>
Replying to @SimonBoothroyd :
This is consistent with my recollection. However, I have this quirky problem where I frequently get this backwards, so I am doublechecking with Christopher. However, for now let's assume that you have it right AND my memory is right. :) Will skim the rest of the issue and see if there's anything else I should chime in on, so if I don't respond again it means I agree with everything here. :) cc @j-wags |
Seems like potentially yes.
Yes
I don't think so. Past experience says this is always a bad idea for MM methods, so we should not make it easy for users to do this. They could probably use user-supplied charges if they really needed this. (Though I'm willing to be persuaded otherwise...) Actually, maybe I'll reconsider -- "hard but possible" may be the right place to be, as I can see wanting to check how much assigned force field parameters would vary depending on COOH orientation when using WBO interpolation, and it would be substantially harder if the toolkit didn't allow it. So maybe there should be a toolkit-level override that comes with disclaimers/warnings. I don't think any of that is blocking, just wanted to give my two cents on the questions which were raised. |
I checked the text Simon referenced with Chris and he confirms this is correct. |
Thanks for the feedback @davidlmobley, you've resolved the blocking question that we had! Interestingly, it looks like we largely disagree on the specific questions originally raised by @Yoshanuikabundi. This can easily get bogged down into bikeshedding, so @Yoshanuikabundi please resolve in the following order:
I'll see if I can check in with @SimonBoothroyd before he heads out about whether the his merge block should persist after we delete the extended test that he requested removing, and will report back if so. |
OK. I have a strong preference for making
The current code in from openff.toolkit.topology import Molecule
from openff.toolkit.utils import OpenEyeToolkitWrapper
mol = Molecule.from_smiles("C(=O)OH", toolkit_registry=OpenEyeToolkitWrapper())
mol.generate_conformers(make_carboxylic_acids_cis=False, toolkit_registry=OpenEyeToolkitWrapper())
mol.assign_partial_charges("am1bccelf10", use_conformers=mol.conformers, toolkit_registry=OpenEyeToolkitWrapper())
mol.visualize("nglview") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Thanks for the back-and-forth, everyone. I've received permission from Simon to remove the block from merging this once the requested code changes were made, so this should be good to go now!
b1d8e7e Fix trans COOh in Elf10 conformer selection (#1171) * Add clear error message to openeye apply_elf_conformer_selection in silent failures * Fix apply_elf_conformer_selection when specifying rdkit wrapper * Add test for Molecule.apply_elf_conformer_selection with toolkit wrapper * Add Molecule._make_carboxylic_acids_cis method + tests * Make COOH cis when generating conformers in assign_partial_charges with openeye toolkit * Make COOH cis when generating conformers in assign_fractional_bond_orders with openeye toolkit and an ELF10 method * Update changelog * Typo in release notes * Update docs/releasehistory.md Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> * Improve robustness of make_carboxylic_acids_cis tests * Remove two safety checks from molecule.py now that they're in test_molecule.py * Add make_carboxylic_acids_cis switch to rdkit and molecule generate_conformers() * Add toolkit_registry argument to Molecule._make_carboxylic_acids_cis() * Remove erroneous comment from _make_carboxylic_acids_cis * _make_carboxylic_acids_cis: Exit early if molecule has no COOH groups * Make OE assign methods consistent * Update openff/toolkit/topology/molecule.py Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com> * Switch make_carboxylic_acids_cis test to formic acid Clarifies test by using simpler molecule and using dummy values to lay out the molecule obviously Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com> * Lint and clean up * Simplify test_make_carboxylic_acids_cis * generate_conformers makes COOH cis by default Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>
* Port #1171 b1d8e7e Fix trans COOh in Elf10 conformer selection (#1171) * Add clear error message to openeye apply_elf_conformer_selection in silent failures * Fix apply_elf_conformer_selection when specifying rdkit wrapper * Add test for Molecule.apply_elf_conformer_selection with toolkit wrapper * Add Molecule._make_carboxylic_acids_cis method + tests * Make COOH cis when generating conformers in assign_partial_charges with openeye toolkit * Make COOH cis when generating conformers in assign_fractional_bond_orders with openeye toolkit and an ELF10 method * Update changelog * Typo in release notes * Update docs/releasehistory.md Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> * Improve robustness of make_carboxylic_acids_cis tests * Remove two safety checks from molecule.py now that they're in test_molecule.py * Add make_carboxylic_acids_cis switch to rdkit and molecule generate_conformers() * Add toolkit_registry argument to Molecule._make_carboxylic_acids_cis() * Remove erroneous comment from _make_carboxylic_acids_cis * _make_carboxylic_acids_cis: Exit early if molecule has no COOH groups * Make OE assign methods consistent * Update openff/toolkit/topology/molecule.py Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com> * Switch make_carboxylic_acids_cis test to formic acid Clarifies test by using simpler molecule and using dummy values to lay out the molecule obviously Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com> * Lint and clean up * Simplify test_make_carboxylic_acids_cis * generate_conformers makes COOH cis by default Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com> * Manually update actions Bump upload-artifact * Fixes for units changes Co-authored-by: Josh A. Mitchell <yoshanuikabundi@gmail.com> Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>
OpenEye Omega has a bug where when it is asked to sample different hydrogen positions, it produces trans carboxylic acids, quite often taking precedence over the more physical cis carboxylic acids. This is especially problematic when the conformations are passed on to the
apply_elf_conformer_selection()
method, which rejects trans-COOH and produces no conformers. Previously, this resulted in an exception without any text. As @j-wags and I discussed in our last checkin, this PR:apply_elf_conformer_selection
describing this problem_make_carboxylic_acids_cis()
method toFrozenMolecule
. This method is implemented in Numpy with no Python loops and no reference to external chemical toolkits, so it should be fast even with lots of conformationsapply_elf_conformer_selection
method using the toolkit wrapper, which previously did not workmake_carboxylic_acids_cis
, to the OpenEyegenerate_conformers()
method. False by default, this argument runs_make_carboxylic_acids_cis()
on the output molecule if set to Truemake_carboxylic_acids_cis
argument to True whengenerate_conformers
is run viaassign_partial_charges
orassign_fractional_bond_orders()
with an ELF method, which are the main external API points where users will touch this workflow.Open questions:
_make_carboxylic_acids_cis()
defines convenience methods for computing dot products, norms, and dihedral angles in Numpy arrays. Should these methods be broken out, perhaps into theutils
module for everyone to use?make_carboxylic_acids_cis
True by default in thegenerate_conformers()
method?assign_partial_charges
andassign_fractional_bond_orders
to not make carboxylic acids cis?Fixes #428 (I think)