Refactor the codebase to use the OpenFF toolkit #68

SimonBoothroyd · 2021-03-29T16:46:13Z

Description

This is the main PR which tracks the conversion of the codebase to use the OpenFF toolkit, rather than OE directly. The conversion will happen over several PRs so that the changes being made can be tracked more cleanly.

The discussed way forward for this is to identify which parts of the codebase are necessary to downstream dependencies and only convert those portions over. All other parts of the codebase will then be removed. Currently it seems like the WBOFragmenter and PfizerFragmenter classes, and all of the methods they call, are the ones to retain.

Possible Behaviour Changes

The torsion returned by the updated find_torsion_around_bond (and hence by to_torsiondrive_json, the only function which uses it) is not guaranteed to be the same after the refactor. See Refactor torsion module to use the OFF toolkit #77 for details.
While the toolkit should give very similar WBOs to the pure OE approach, this may not be guaranteed in all cases: it seems that older versions of OE produce slightly different WBOs than newer versions for the same input conformer. See Refactor the chemi module to use the OFF toolkit #80 for details.
The number of conformers generated by the chemi.generate_conformers function may be smaller after the refactor. It seems that setting omega.SetCanonOrder(True) (old behaviour) leads to more conformers than setting it to False (OpenFF toolkit behaviour). See Refactor the chemi module to use the OFF toolkit #80 for this change.
Ortho groups are now detected using a SMARTS matching approach. While care was taken to ensure that the old and new approaches are equivalent, the implementations are different enough that subtle differences may have crept in. See Refactor the ring sytem and ortho group detection #83.
When expanding fragments to include the functional groups an atom belongs to, all functional groups the atom appears in are now considered. Previously only the last group the atom was assigned to would be considered, given that only a single 'fgroup' could be stored in that atom's data. See Refactor the functional group detection to use the OFF toolkit #84, Refactor WBO fragment building to use the OFF toolkit #88 and Handle atoms in multiple groups when finding ring systems #91.
The WBOFragmenter._build_fragment and child methods were substantially rewritten, which may cause operations to be performed, and new atoms to be considered, in slightly different orders, leading to different fragments. See Refactor WBO fragment building to use the OFF toolkit #88.
The OE code which extracts the fragment molecule from the parent, based on a set of atom and bond indices, was modified to also include in the substructure search any hydrogens bound to the selected atoms. This means the hydrogens in the final fragment retain their map indices, which should be more helpful, and the output more closely matches that of the RDKit implementation. See Move fragment from indices to chemi and add RDKit variant #89.
The indexing of some rings may change after Replace uses of OEMol with OFF Molecule object #90 when using OpenEye due to differences in how the OpenFF toolkit canonically orders OE molecules.
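Why a rewritten traversal (find_torsion_around_bond, WBOFragmenter._build_fragment) can change results without either version being wrong can be seen in a toy sketch. This is not the fragmenter code: `pick_torsion` and the plain adjacency-dict representation are hypothetical, and only illustrate that when several neighbours are equally valid, the torsion picked depends on iteration order.

```python
from typing import Dict, List, Tuple


def pick_torsion(
    adjacency: Dict[int, List[int]], bond: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Return one torsion (a, j, k, d) around the central bond (j, k).

    Hypothetical helper: it takes the *first* neighbour of each bond end
    (ignoring the other end), so its answer depends on neighbour order.
    """
    j, k = bond
    a = next(n for n in adjacency[j] if n != k)
    d = next(n for n in adjacency[k] if n != j)
    return (a, j, k, d)


# Toy heavy-atom graph: chain 0-1-2-3 with extra substituents 4 (on 1) and 5 (on 2).
adjacency = {0: [1], 1: [0, 2, 4], 2: [1, 3, 5], 3: [2], 4: [1], 5: [2]}
print(pick_torsion(adjacency, (1, 2)))  # (0, 1, 2, 3)

# Same molecule, but with neighbours stored in a different order.
reordered = {**adjacency, 1: [4, 2, 0], 2: [5, 1, 3]}
print(pick_torsion(reordered, (1, 2)))  # (4, 1, 2, 5)
```

Both tuples are valid torsions around the 1-2 bond; a change in traversal order alone is enough to switch between them.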
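The functional-group change (#84, #88, #91) amounts to storing a set of groups per atom instead of a single 'fgroup' slot. The sketch below is a toy illustration under assumed data: the group names, atom indices and the `expand_old` / `expand_new` helpers are hypothetical and are not the fragmenter API.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical functional-group matches: (group name, matched atom indices).
matches: List[Tuple[str, Tuple[int, ...]]] = [
    ("amide", (3, 4, 5)),
    ("carbonyl", (3, 4)),
]
group_atoms: Dict[str, Tuple[int, ...]] = dict(matches)

# Old-style storage: one 'fgroup' slot per atom, so later matches overwrite earlier ones.
single_slot: Dict[int, str] = {}
for name, atoms in matches:
    for atom in atoms:
        single_slot[atom] = name

# New-style storage: every group an atom appears in is kept.
all_groups: Dict[int, Set[str]] = defaultdict(set)
for name, atoms in matches:
    for atom in atoms:
        all_groups[atom].add(name)


def expand_old(atom: int) -> Set[int]:
    """Atoms pulled in via the single stored group (if any)."""
    expanded = {atom}
    if atom in single_slot:
        expanded.update(group_atoms[single_slot[atom]])
    return expanded


def expand_new(atom: int) -> Set[int]:
    """Atoms pulled in via every group the atom belongs to."""
    expanded = {atom}
    for name in all_groups.get(atom, set()):
        expanded.update(group_atoms[name])
    return expanded


print(sorted(expand_old(3)))  # [3, 4]
print(sorted(expand_new(3)))  # [3, 4, 5]
```

With a single slot, atom 5 of the amide is silently dropped because the later carbonyl match overwrote the amide assignment.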
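The hydrogen handling described for #89 can be sketched in a toolkit-agnostic way: before extracting the substructure, extend the selected atom and bond indices with every hydrogen bonded to a selected atom. `add_bound_hydrogens` and the element-list / adjacency-dict inputs are hypothetical stand-ins; the actual change lives in the OE-specific extraction code.

```python
from typing import Dict, List, Set, Tuple


def add_bound_hydrogens(
    atoms: Set[int],
    bonds: Set[Tuple[int, int]],
    elements: List[str],
    adjacency: Dict[int, List[int]],
) -> Tuple[Set[int], Set[Tuple[int, int]]]:
    """Extend an atom/bond subset with every hydrogen bonded to a selected atom.

    Hypothetical helper: bonds are stored as sorted (low, high) index pairs.
    """
    atoms_out = set(atoms)
    bonds_out = set(bonds)
    for atom in atoms:
        for neighbour in adjacency[atom]:
            if elements[neighbour] == "H":
                atoms_out.add(neighbour)
                bonds_out.add((min(atom, neighbour), max(atom, neighbour)))
    return atoms_out, bonds_out


# Methanol: C0-O1, with H2, H3, H4 on the carbon and H5 on the oxygen.
elements = ["C", "O", "H", "H", "H", "H"]
adjacency = {0: [1, 2, 3, 4], 1: [0, 5], 2: [0], 3: [0], 4: [0], 5: [1]}

atoms, bonds = add_bound_hydrogens({0, 1}, {(0, 1)}, elements, adjacency)
print(sorted(atoms))  # [0, 1, 2, 3, 4, 5]
print(sorted(bonds))  # [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
```

Because the hydrogens are now part of the extracted subset, any map indices they carry in the parent survive into the fragment.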

Questions

RDKit does not currently treat pyramidal nitrogens as stereogenic while OE does, which will lead to the two toolkits expecting / generating different stereoisomers. Will this cause issues when fragmenting? (see also Refactor stereo enumeration to use the OFF toolkit #76)

Notes

While care has been taken to ensure that the code which builds a fragment molecule from a parent molecule and a set of atom and bond subset indices behaves the same with both toolkits, it is possible that the RDKit and OE approaches (Move fragment from indices to chemi and add RDKit variant #89) may yield different fragment molecules depending on how they perceive atom valences. This has not yet been observed in the small set of unit tests, but may manifest itself when moving to the regression molecule set.

Status

Ready to go

jthorton

Thanks for tackling this @SimonBoothroyd; the changes seem easy to follow and look good. I am available to regression test these changes against the JACS set of ligands, which I have previously fragmented using both options. Just let me know when you are ready for me to test it.

I also just wanted to check how I would extract the results of the fragmentation now. Before, in QCSubmit and Bespoke-fit, I used the to_torsiondrive_json method of the fragmentation class to get the dihedral that should be scanned in the fragment, but now it looks like you are using atom maps. Does this mean that the bond tuple in Fragmenter.fragments also holds the indices of the target bond in the fragment?

jthorton · 2021-04-13T11:05:19Z

fragmenter/chemi.py


+    fragment_smiles = Chem.MolFragmentToSmiles(rd_molecule, atoms_to_use, bonds_to_use)


Thanks for introducing me to this method; it seems really useful. WRT atom maps, I see in the docs that by default this method will attempt to put the molecule in canonical order. I was thinking this could affect the atom map; should this be set to false? The OpenEye equivalent does not mention changing the ordering, which made me think this might not be what we want.

I was thinking this could affect the atom map; should this be set to false? The OpenEye equivalent does not mention changing the ordering, which made me think this might not be what we want.

Hmm, thanks for pointing that out! I think that because the map is explicitly included in the produced SMILES pattern, it shouldn't be affected by canonical re-ordering. That said, there doesn't seem to be a downside to disabling canonical ordering, so I will try this out.
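The argument that atom maps written into the SMILES survive canonical re-ordering can be illustrated in plain Python. The sketch assumes, hypothetically, that a target bond is stored as a pair of atom-map numbers (not necessarily what Fragmenter.fragments stores): after re-ordering, the positional indices change but they still point at the same mapped atoms. `bond_by_map_index` is an illustrative helper, not part of fragmenter.

```python
from typing import Dict, List, Tuple


def bond_by_map_index(
    map_indices: List[int], mapped_bond: Tuple[int, int]
) -> Tuple[int, int]:
    """Translate a bond given in atom-map numbers into positional atom indices."""
    position_of: Dict[int, int] = {
        map_index: position for position, map_index in enumerate(map_indices)
    }
    return position_of[mapped_bond[0]], position_of[mapped_bond[1]]


# Atom-map numbers of a 4-atom fragment in its original atom order...
original = [1, 2, 3, 4]
# ...and the same atoms after a hypothetical canonical re-ordering.
reordered = [3, 1, 4, 2]

target = (2, 3)  # central bond expressed in map numbers
print(bond_by_map_index(original, target))   # (1, 2)
print(bond_by_map_index(reordered, target))  # (3, 0)
```

With RDKit, keeping the input atom order instead amounts to passing `canonical=False` to `Chem.MolFragmentToSmiles`; either way, lookups keyed on map numbers give the same atoms.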

SimonBoothroyd · 2021-04-14T15:05:59Z

OK, I have finished making the changes that I would like to make, and this PR is now ready for regression testing (cc @jthorton).

@j-wags, @mattwthompson, and @lilyminium, if you would like to review this, could you please provide any further feedback before the end of this week?

lilyminium

Thank you @SimonBoothroyd, going through this was very helpful in understanding how fragmenter works :-) I just had some small questions about Python decisions you made.

mattwthompson · 2021-04-16T15:27:11Z

Sorry, it looks like I won't have time to give this a useful review by the deadline.

j-wags · 2021-04-16T15:42:58Z

Same. Thanks @SimonBoothroyd for looping me in on the timeline, and to @lilyminium for combing through and doing a great review! Looking forward to trying this out after the release.

SimonBoothroyd · 2021-04-20T14:59:07Z

Thank you @lilyminium for the thorough and thoughtful review - I think I've made most of the changes you recommended!

SimonBoothroyd · 2021-04-20T15:00:16Z

Unless either @j-wags or @ChayaSt have any objections I will merge this into master first thing tomorrow morning (GMT) to save this PR from growing even larger.

j-wags · 2021-04-20T16:53:26Z

No objections from me. Thanks for spearheading this!

SimonBoothroyd added 4 commits March 29, 2021 17:02

Update the gitignore file

d9f4b14

Update the repo to a more recent CMS layout and use GHA (#69)

af3be40

Remove code not used externally (#70)

82b9a57

Lint the codebase with isort, black, flake8 (#71)

f648c98

SimonBoothroyd added 2 commits March 30, 2021 09:14

Replace the internal logger method with standard logger (#72)

a40005f

Remove the unused unit conversion constants (#73)

5a0e04a

Add utilities to access the fgroup smarts (#74)

aa018b5

Remove duplicate code from the Pfizer fragmenter (#75)

005c85b

SimonBoothroyd mentioned this pull request Mar 30, 2021

Refactor stereo enumeration to use the OFF toolkit #76

Merged

Refactor stereo enumeration to use the OFF toolkit (#76)

01a17d7

SimonBoothroyd added 2 commits April 5, 2021 11:29

Install openff-toolkit from GitHub until release made

842859c

Enable RDKit stereo enumeration test after TK fix

e70f600

Refactor torsion module to use the OFF toolkit (#77)

3d146d3

SimonBoothroyd added the refactor label Apr 5, 2021

SimonBoothroyd added 9 commits April 5, 2021 14:42

Add find ring systems utility (#78)

c903884

Refactor non-wbo code into fragmenter base class (#79)

62f4746

Refactor the chemi module to use the OFF toolkit (#80)

dfeb7b1

Isolate internal use of atom and bond sets (#81)

e2e4dab

Refactor stereo checks / fixes to use the OFF toolkit (#82)

6ba9be9

Refactor the ring sytem and ortho group detection (#83)

b7994db

Refactor the FG detection to use the OFF toolkit (#84)

fa8cc00

Refactor torsion quartet code to use the OFF toolkit (#85)

55d786f

Refactor valence capping to use the OFF toolkit (#86)

385f50b

SimonBoothroyd linked an issue Apr 13, 2021 that may be closed by this pull request

Ideas for future refactor #60

Closed

SimonBoothroyd mentioned this pull request Apr 13, 2021

Return parent fragment mapping #59

Closed

SimonBoothroyd linked an issue Apr 13, 2021 that may be closed by this pull request

Return parent fragment mapping #59

Closed

SimonBoothroyd mentioned this pull request Apr 13, 2021

Charging molecules while only returning the original coordinates returns a wonky molecule #55

Closed

SimonBoothroyd linked an issue Apr 13, 2021 that may be closed by this pull request

Charging molecules while only returning the original coordinates returns a wonky molecule #55

Closed

jthorton reviewed Apr 13, 2021

SimonBoothroyd mentioned this pull request Apr 13, 2021

Support openforcefield.topology.Molecule? #50

Closed

SimonBoothroyd linked an issue Apr 13, 2021 that may be closed by this pull request

Support openforcefield.topology.Molecule? #50

Closed

jthorton reviewed Apr 13, 2021

Fix the CI name to reflect if OE is tested

ba12c07

SimonBoothroyd mentioned this pull request Apr 13, 2021

Remove the torsiondrive input generation code #94

Merged

SimonBoothroyd added 3 commits April 14, 2021 09:17

Remove the torsiondrive input generation code (#94)

88cdd60

Add simple HTML and SVG based fragment depiction (#95)

a21d9f2

Update the documentation after the refactor (#96)

48eb478

SimonBoothroyd marked this pull request as ready for review April 14, 2021 15:03

lilyminium reviewed Apr 15, 2021

lilyminium reviewed Apr 16, 2021

fragmenter/fragment.py (outdated)

lilyminium reviewed Apr 19, 2021

fragmenter/fragment.py (outdated)

lilyminium reviewed Apr 19, 2021

fragmenter/fragment.py (outdated)

SimonBoothroyd and others added 2 commits April 20, 2021 15:51

Apply suggestions from code review

a45d252

Co-authored-by: Lily Wang <31115101+lilyminium@users.noreply.github.com>

Apply additional @lilyminium review suggestions

014af99

SimonBoothroyd merged commit add6eba into master Apr 21, 2021

SimonBoothroyd deleted the refactor-main branch April 21, 2021 09:14

SimonBoothroyd mentioned this pull request Apr 21, 2021

Re-enable GHA on master #100

Merged
