Refactor the codebase to use the OpenFF toolkit #68
Thanks for tackling this @SimonBoothroyd, the changes seem easy to follow and look good. I am available to regression test these changes against the JACS set of ligands, which I have previously fragmented using both options; just let me know when you are ready for me to test it.
I also just wanted to check how I would extract the results of the fragmentation now. Before, in QCSubmit and Bespoke-fit, I used the to_torsiondrive_json of the fragmentation class to get the dihedral that should be scanned in the fragment, but now it looks like you are using atom maps. Does this mean that the bond tuple in Fragmenter.fragments also gives the indices of the target bond in the fragment?
fragment_smiles = Chem.MolFragmentToSmiles(rd_molecule, atoms_to_use, bonds_to_use)
Thanks for introducing me to this method, this seems really useful. WRT atom maps, I see in the docs that by default this method will attempt to put the molecule in canonical order. I was thinking this could affect the atom map; should this be set to false? The docs for the OpenEye equivalent do not mention changing the ordering, which made me think this might not be what we want.
> I was thinking this could affect the atom map; should this be set to false? The docs for the OpenEye equivalent do not mention changing the ordering, which made me think this might not be what we want.
Hmm, thanks for pointing that out! I think that because the map is explicitly included in the produced SMILES pattern it shouldn't be affected by a canonical re-ordering. There doesn't seem to be a downside to disabling the canonical ordering though, so I will try this out.
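To make the point about explicit maps concrete, here is a minimal sketch using only the Python standard library. It is not code from this PR: the helper names, the regular expression and the example SMILES are all hypothetical, it assumes every atom in the pattern is written in brackets with a map index, and it assumes a target bond is expressed as a pair of map indices.

```python
import re

# Matches a bracketed SMILES atom that carries an atom map, e.g. "[CH2:2]".
# Assumption: every atom in the pattern is written this way.
MAPPED_ATOM = re.compile(r"\[([A-Za-z][a-z]?)[^\]:]*:(\d+)\]")


def map_index_to_element(smiles: str) -> dict:
    """Return {atom map index: element symbol} for a fully mapped SMILES."""
    return {int(m.group(2)): m.group(1) for m in MAPPED_ATOM.finditer(smiles)}


def map_index_to_position(smiles: str) -> dict:
    """Return {atom map index: position of the atom in the written order}."""
    return {
        int(m.group(2)): position
        for position, m in enumerate(MAPPED_ATOM.finditer(smiles))
    }


def resolve_bond(smiles: str, bond: tuple) -> tuple:
    """Translate a bond given as two map indices into two atom positions."""
    positions = map_index_to_position(smiles)
    return positions[bond[0]], positions[bond[1]]


# The same hypothetical fragment written in two different atom orders.
original = "[CH3:1][CH2:2][OH:3]"
reordered = "[OH:3][CH2:2][CH3:1]"

# Hypothetical target bond, expressed as a pair of map indices.
target_bond = (2, 3)

# The map index still identifies the same atom in both orderings ...
same_labels = map_index_to_element(original) == map_index_to_element(reordered)

# ... even though the positions of those atoms differ.
bond_in_original = resolve_bond(original, target_bond)
bond_in_reordered = resolve_bond(reordered, target_bond)
```

Re-ordering changes where each atom sits in the string, but a lookup keyed on the map index still finds the same pair of atoms.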
Ok, I have finished making the changes that I would like to make and this PR is now ready for regression testing (cc @jthorton). @j-wags, @mattwthompson, and @lilyminium, if you would like to review this then could you please provide any further feedback before the end of this week?
Thank you @SimonBoothroyd, going through this was very helpful in understanding how fragmenter works :-) I just had some small questions about Python decisions you made.
Sorry, it looks like I won't have time to give this a useful review by the deadline.
Same. Thanks @SimonBoothroyd for looping me in on the timeline, and to @lilyminium for combing through and doing a great review! Looking forward to trying this out after the release.
Co-authored-by: Lily Wang <31115101+lilyminium@users.noreply.github.com>
Thank you @lilyminium for the thorough and thoughtful review - I think I've made most of the changes you recommended!
No objections from me. Thanks for spearheading this!
Description
This is the main PR which tracks the conversion of the codebase to use the OpenFF toolkit, rather than OE directly. The conversion will happen over several PRs so that the changes being made can be tracked more cleanly.
The discussed way forward for this is to identify which parts of the codebase are necessary to downstream dependencies and only convert those portions over. All other parts of the codebase will then be removed. Currently it seems like the WBOFragmenter and PfizerFragmenter classes and all of the methods they call are the ones to retain.
Possible Behaviour Changes
The updated find_torsion_around_bond (and hence to_torsiondrive_json, which is the only function which uses it) is not guaranteed to return the same result after the refactor. See Refactor torsion module to use the OFF toolkit #77 for details.
While the toolkit should give very similar WBOs to the pure OE approach, this may not be guaranteed in all cases. It seems that older versions of OE produce slightly different WBOs than newer versions for the same input conformer. See Refactor the chemi module to use the OFF toolkit #80 for details.
The number of conformers generated by the chemi.generate_conformers function may be lower after the refactor. It seems that setting omega.SetCanonOrder(True) (old behaviour) will lead to more conformers than setting it to false (OpenFF toolkit behaviour). See Refactor the chemi module to use the OFF toolkit #80 for this change.
Ortho groups are now detected using a SMARTS matching approach. While care was taken to try and ensure that the two approaches are equivalent, the implementations are different enough that subtle differences may have crept in. See Refactor the ring sytem and ortho group detection #83.
When expanding fragments to include the functional groups an atom is in, all functional groups the atom appears in are now considered. Previously only the last group an atom was assigned to would be considered, given that only a single 'fgroup' could be stored in that atom's data. See Refactor the functional group detection to use the OFF toolkit #84, Refactor WBO fragment building to use the OFF toolkit #88 and Handle atoms in multiple groups when finding ring systems #91.
The WBOFragmenter._build_fragment and child methods were substantially rewritten, which may lead to operations being performed and new atoms being considered in slightly different orders, leading to different fragments. See Refactor WBO fragment building to use the OFF toolkit #88.
The OE code which extracts the fragment molecule from the parent based on a set of atom and bond indices was modified to include hydrogens which are bound to the atoms included in the substructure search. This is so that the hydrogens in the final fragment will retain their map indices, which should be more helpful and will also more closely match the output of the RDKit implementation. See Move fragment from indices to chemi and add RDKit variant #89.
The indexing of some rings may change after Replace uses of OEMol with OFF Molecule object #90 when using OpenEye due to differences in how the OpenFF toolkit canonically orders OE molecules.
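The functional group change listed above (an atom that sits in several groups) can be illustrated with a small standalone sketch. This is not fragmenter's implementation: the group names, atom indices and helper functions are hypothetical, and plain dictionaries stand in for whatever per-atom storage the real code uses.

```python
from collections import defaultdict

# Hypothetical functional-group matches: group name -> atom indices it covers.
# Atom 2 appears in both groups.
functional_groups = {
    "group_a": (0, 1, 2),
    "group_b": (2, 3, 4),
}


def last_group_per_atom(groups):
    """Old-style bookkeeping: one slot per atom, so later groups overwrite."""
    assigned = {}
    for name, atoms in groups.items():
        for atom in atoms:
            assigned[atom] = name
    return assigned


def all_groups_per_atom(groups):
    """New-style bookkeeping: every group an atom appears in is recorded."""
    assigned = defaultdict(set)
    for name, atoms in groups.items():
        for atom in atoms:
            assigned[atom].add(name)
    return dict(assigned)


def expand_with_last_group(atom, groups):
    """Atoms pulled in when only the last assigned group is considered."""
    name = last_group_per_atom(groups).get(atom)
    return set(groups[name]) if name is not None else {atom}


def expand_with_all_groups(atom, groups):
    """Atoms pulled in when all groups containing the atom are considered."""
    expanded = {atom}
    for name in all_groups_per_atom(groups).get(atom, set()):
        expanded.update(groups[name])
    return expanded


old_expansion = expand_with_last_group(2, functional_groups)
new_expansion = expand_with_all_groups(2, functional_groups)
```

In this toy case the single-slot bookkeeping only pulls in the atoms of the group assigned last, while the set-based bookkeeping pulls in both groups, which is the kind of difference that could change the resulting fragment.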
Questions
Notes
Status