Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix trans COOh in Elf10 conformer selection #1171

Merged
merged 23 commits into from
Feb 24, 2022
Merged

Fix trans COOh in Elf10 conformer selection #1171

merged 23 commits into from
Feb 24, 2022

Conversation

Yoshanuikabundi
Copy link
Collaborator

@Yoshanuikabundi Yoshanuikabundi commented Jan 14, 2022

OpenEye Omega has a bug where when it is asked to sample different hydrogen positions, it produces trans carboxylic acids, quite often taking precedence over the more physical cis carboxylic acids. This is especially problematic when the conformations are passed on to the apply_elf_conformer_selection() method, which rejects trans-COOH and produces no conformers. Previously, this resulted in an exception without any text. As @j-wags and I discussed in our last checkin, this PR:

  • Adds documentation to the apply_elf_conformer_selection describing this problem
  • Adds a more descriptive error when ELF conformer selection fails without a message
  • Adds a private _make_carboxylic_acids_cis() method to FrozenMolecule. This method is implemented in Numpy with no Python loops and no reference to external chemical toolkits, so it should be fast even with lots of conformations
  • Adds tests for the new method
  • Fixes and adds a test for calling the RDKit apply_elf_conformer_selection method using the toolkit wrapper, which previously did not work
  • Adds an argument, make_carboxylic_acids_cis, to the OpenEye generate_conformers() method. False by default, this argument runs _make_carboxylic_acids_cis() on the output molecule if set to True
  • Sets the make_carboxylic_acids_cis argument to True when generate_conformers is run via assign_partial_charges or assign_fractional_bond_orders() with an ELF method, which are the main external API points where users will touch this workflow.

Open questions:

  • _make_carboxylic_acids_cis() defines convenience methods for computing dot products, norms, and dihedral angles in Numpy arrays. Should these methods be broken out, perhaps into the utils module for everyone to use?
  • Should we make make_carboxylic_acids_cis True by default in the generate_conformers() method?
  • Should we add an argument to assign_partial_charges and assign_fractional_bond_orders to not make carboxylic acids cis?

Fixes #428 (I think)

@codecov
Copy link

codecov bot commented Jan 14, 2022

Codecov Report

Merging #1171 (c5b649d) into master (9362b71) will decrease coverage by 48.94%.
The diff coverage is 15.00%.

Copy link
Member

@j-wags j-wags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work - I'm super impressed at the linearization of correcting arbitrary numbers of conformers without any for loops. I'm approving this, pending the changes requested below and @SimonBoothroyd's approval.

@SimonBoothroyd This may have an effect on outputted partial charges, by making conformers that would otherwise be discarded from ELF (due to trans-COOH conformations) get corrected before being ELF conformer selection runs. We think this change should be minor, but could you let us know if this is acceptable?

Two asks:

  • This should add the make_carboxylic_acids_cis kwarg to Molecule.generate_conformers and RDKitToolkitWrapper.generate_conformers, to keep the method signatures consistent.
  • If possible, _make_carboxylic_acids_cis should take a ToolkitWrapper/Registry to dictate which backend is used for SMARTS matching.
  • _make_carboxylic_acids_cis() defines convenience methods for computing dot products, norms, and dihedral angles in Numpy arrays. Should these methods be broken out, perhaps into the utils module for everyone to use?

Good question. The math is highly optimized and pretty complex, so I'm fine with it living inside of the method, since that where a developer would be looking if there were a problem.

  • Should we make make_carboxylic_acids_cis True by default in the generate_conformers() method?

I think "no", because this puts us in the position of being a cheminformatics toolkit, and that opens a dangerous door.

  • Should we add an argument to assign_partial_charges and assign_fractional_bond_orders to not make carboxylic acids cis?

I don't think this is necessary. If users want to see conformers get deleted they can provide them themselves, and you point out that this only is set to True for ELF-based charge methods.

docs/releasehistory.md Outdated Show resolved Hide resolved
@@ -3615,6 +3619,80 @@ def test_apply_elf_conformer_selection(self, toolkit):
initial_conformers[1].value_in_unit(unit.angstrom),
)

def test_make_carboxylic_acids_cis(self):
offmol = Molecule.from_smiles("CC(O)=O")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be switched to a mapped SMILES (or become a new entry in create_molecules.py) to ensure that the coordinates below are always interpreted to refer to the same atoms.

openff/toolkit/topology/molecule.py Outdated Show resolved Hide resolved

# Pull out the coordinates of all carboxylic acid groups into cooh_xyz
# cooh_xyz is an array with shape (n_cooh_groups, n_conformers, 4, 3)
cooh_indices = self.chemical_environment_matches("[C:2]([O:3][H:4])=[O:1]")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not blocking) In other circumstances we'd want to pass in a toolkit registry here, and have the method take a toolkit registry as input. But since this is a private method I don't think it's necessary

Copy link
Contributor

@SimonBoothroyd SimonBoothroyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @Yoshanuikabundi ! In general this looks good, I'd just want to see the tests tidied up and made a bit more obvious with less maintenance burden.

@davidlmobley may want to have a quick scan of this, in particular these lines to make sure this aligns with his understanding of the problem as well as he's had more conversations with Chris about this.

openff/toolkit/tests/test_molecule.py Outdated Show resolved Hide resolved
Comment on lines 3671 to 3672
# OK, we've checked a single conformer in great detail, now lets check
# a thousand quickly but less robustly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure of the value add of the next set of tests - the code itself is pretty involved and almost needs a test of its own 😉

If you want to make sure that a molecule with two or more COOH groups or correctly handled, I'd recommend just using oxalic and malonic acid and then programatically generating cis and trans conformers (you can set a dihedral angle explicitly in both OE and RD)

In general though I'd recommend keeping unit tests as simple and without scope for error as much as possible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically just wanted to throw a lot of different stuff at it and check that a few properties held:

  1. It was cis now
  2. Only the H in the COOH groups had moved
  3. OH bond length hadn't changed

Poor man's fuzzing I guess. Having a weird molecule that would generate conformers that were all over the place, and that had plenty of atoms for buggy code to accidentally mutate, seemed like an easy way to generate a lot data for it. The idea was to try and catch edge cases that I wouldn't have thought of testing. I guess that's not really a unit test though (is there somewhere else I should put something similar?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely agree that the test is a mess, should I just get rid of it and write seperate tests for all the edge cases I can think of? Or am I just being paranoid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a mixed bag in terms of code complexity and test thoroughness. I appreciate the points on both sides, but let's go ahead and delete the lines after this comment.

openff/toolkit/topology/molecule.py Outdated Show resolved Hide resolved
openff/toolkit/tests/test_molecule.py Outdated Show resolved Hide resolved
openff/toolkit/tests/test_molecule.py Outdated Show resolved Hide resolved
Yoshanuikabundi and others added 4 commits February 15, 2022 14:54
Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>
Clarifies test by using simpler molecule and using dummy values to lay out the molecule obviously

Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>
@davidlmobley
Copy link
Contributor

Replying to @SimonBoothroyd :

@davidlmobley may want to have a quick scan of this, in particular these lines to make sure this aligns with his understanding of the problem as well as he's had more conversations with Chris about this.

This is consistent with my recollection. However, I have this quirky problem where I frequently get this backwards, so I am doublechecking with Christopher. However, for now let's assume that you have it right AND my memory is right. :)

Will skim the rest of the issue and see if there's anything else I should chime in on, so if I don't respond again it means I agree with everything here. :) cc @j-wags

@davidlmobley
Copy link
Contributor

_make_carboxylic_acids_cis() defines convenience methods for computing dot products, norms, and dihedral angles in Numpy arrays. Should these methods be broken out, perhaps into the utils module for everyone to use?

Seems like potentially yes.

Should we make make_carboxylic_acids_cis True by default in the generate_conformers() method?

Yes

Should we add an argument to assign_partial_charges and assign_fractional_bond_orders to not make carboxylic acids cis?

I don't think so. Past experience says this is always a bad idea for MM methods, so we should not make it easy for users to do this. They could probably use user-supplied charges if they really needed this. (Though I'm willing to be persuaded otherwise...)

Actually, maybe I'll reconsider -- "hard but possible" may be the right place to be, as I can see wanting to check how much assigned force field parameters would vary depending on COOH orientation when using WBO interpolation, and it would be substantially harder if the toolkit didn't allow it. So maybe there should be a toolkit-level override that comes with disclaimers/warnings.

I don't think any of that is blocking, just wanted to give my two cents on the questions which were raised.

@davidlmobley
Copy link
Contributor

I checked the text Simon referenced with Chris and he confirms this is correct.

@j-wags
Copy link
Member

j-wags commented Feb 18, 2022

Thanks for the feedback @davidlmobley, you've resolved the blocking question that we had!

Interestingly, it looks like we largely disagree on the specific questions originally raised by @Yoshanuikabundi. This can easily get bogged down into bikeshedding, so @Yoshanuikabundi please resolve in the following order:

  1. If you have a strong preference to take @davidlmobley's answer instead of mine, do so.
  2. If you don't have a strong preference, take my answer.

I'll see if I can check in with @SimonBoothroyd before he heads out about whether the his merge block should persist after we delete the extended test that he requested removing, and will report back if so.

@Yoshanuikabundi
Copy link
Collaborator Author

OK. I have a strong preference for making make_carboxylic_acids_cis True by default in generate_conformers(), but otherwise don't have strong feelings, so I've made that change.

Actually, maybe I'll reconsider -- "hard but possible" may be the right place to be, as I can see wanting to check how much assigned force field parameters would vary depending on COOH orientation when using WBO interpolation, and it would be substantially harder if the toolkit didn't allow it. So maybe there should be a toolkit-level override that comes with disclaimers/warnings.

The current code in assign_partial_charges() only makes everything cis if it generates conformers. If conformers are provided it'll use them directly. For instance, this code produces trans COOH:

from openff.toolkit.topology import Molecule
from openff.toolkit.utils import OpenEyeToolkitWrapper

mol = Molecule.from_smiles("C(=O)OH", toolkit_registry=OpenEyeToolkitWrapper())
mol.generate_conformers(make_carboxylic_acids_cis=False, toolkit_registry=OpenEyeToolkitWrapper())
mol.assign_partial_charges("am1bccelf10", use_conformers=mol.conformers, toolkit_registry=OpenEyeToolkitWrapper())

mol.visualize("nglview")

Copy link
Member

@j-wags j-wags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Thanks for the back-and-forth, everyone. I've received permission from Simon to remove the block from merging this once the requested code changes were made, so this should be good to go now!

@j-wags j-wags merged commit b1d8e7e into master Feb 24, 2022
@j-wags j-wags deleted the elf10_conformers branch February 24, 2022 00:22
@j-wags j-wags mentioned this pull request Mar 15, 2022
4 tasks
mattwthompson pushed a commit that referenced this pull request Apr 13, 2022
b1d8e7e

Fix trans COOh in Elf10 conformer selection (#1171)

* Add clear error message to openeye apply_elf_conformer_selection in silent failures

* Fix apply_elf_conformer_selection when specifying rdkit wrapper

* Add test for Molecule.apply_elf_conformer_selection with toolkit wrapper

* Add Molecule._make_carboxylic_acids_cis method + tests

* Make COOH cis when generating conformers in assign_partial_charges with openeye toolkit

* Make COOH cis when generating conformers in assign_fractional_bond_orders with openeye toolkit and an ELF10 method

* Update changelog

* Typo in release notes

* Update docs/releasehistory.md

Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>

* Improve robustness of make_carboxylic_acids_cis tests

* Remove two safety checks from molecule.py now that they're in test_molecule.py

* Add make_carboxylic_acids_cis switch to rdkit and molecule generate_conformers()

* Add toolkit_registry argument to Molecule._make_carboxylic_acids_cis()

* Remove erroneous comment from _make_carboxylic_acids_cis

* _make_carboxylic_acids_cis: Exit early if molecule has no COOH groups

* Make OE assign methods consistent

* Update openff/toolkit/topology/molecule.py

Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>

* Switch make_carboxylic_acids_cis test to formic acid

Clarifies test by using simpler molecule and using dummy values to lay out the molecule obviously

Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>

* Lint and clean up

* Simplify test_make_carboxylic_acids_cis

* generate_conformers makes COOH cis by default

Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>
Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>
mattwthompson added a commit that referenced this pull request Apr 13, 2022
* Port #1171

b1d8e7e

Fix trans COOh in Elf10 conformer selection (#1171)

* Add clear error message to openeye apply_elf_conformer_selection in silent failures

* Fix apply_elf_conformer_selection when specifying rdkit wrapper

* Add test for Molecule.apply_elf_conformer_selection with toolkit wrapper

* Add Molecule._make_carboxylic_acids_cis method + tests

* Make COOH cis when generating conformers in assign_partial_charges with openeye toolkit

* Make COOH cis when generating conformers in assign_fractional_bond_orders with openeye toolkit and an ELF10 method

* Update changelog

* Typo in release notes

* Update docs/releasehistory.md

Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>

* Improve robustness of make_carboxylic_acids_cis tests

* Remove two safety checks from molecule.py now that they're in test_molecule.py

* Add make_carboxylic_acids_cis switch to rdkit and molecule generate_conformers()

* Add toolkit_registry argument to Molecule._make_carboxylic_acids_cis()

* Remove erroneous comment from _make_carboxylic_acids_cis

* _make_carboxylic_acids_cis: Exit early if molecule has no COOH groups

* Make OE assign methods consistent

* Update openff/toolkit/topology/molecule.py

Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>

* Switch make_carboxylic_acids_cis test to formic acid

Clarifies test by using simpler molecule and using dummy values to lay out the molecule obviously

Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>

* Lint and clean up

* Simplify test_make_carboxylic_acids_cis

* generate_conformers makes COOH cis by default

Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>
Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>

* Manually update actions

Bump upload-artifact

* Fixes for units changes

Co-authored-by: Josh A. Mitchell <yoshanuikabundi@gmail.com>
Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>
Co-authored-by: SimonBoothroyd <simon.boothroyd@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenEye charging for trans-carboxylic acids will fail; incorporate workaround
4 participants