-
Notifications
You must be signed in to change notification settings - Fork 845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MolVS tautomer canonicalization #2886
Add MolVS tautomer canonicalization #2886
Conversation
This isn't optimal in terms of performance, but all the MolVS tests pass.
std::string d_smarts; | ||
std::shared_ptr<ROMol> dp_mol; | ||
smarts_mol_holder(const std::string &smarts) : d_smarts(smarts) { | ||
dp_mol.reset(SmartsToMol(smarts)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check that this isn't set a null smarts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's all internal, so no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also tested at match time (line 130)
// a note on efficiency here: we'll construct the SubstructTerm objects here | ||
// repeatedly, but the SMARTS parsing for each entry will only be done once | ||
// since we're using the boost::flyweights above to cache them | ||
const std::vector<SubstructTerm> substructureTerms{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of me thinks that this structs + score should be passed in to be easier to modify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you've captured this in the score func though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about adding that option, but then figured it's more straightforward from the API perspective to just leave it out since the user can always provide their own scoring function.
ctaut = enumerator.Canonicalize(m, scorefunc1) | ||
self.assertEqual(Chem.MolToSmiles(ctaut), "OC1=CCCCC1") | ||
ctaut = enumerator.Canonicalize(m, scorefunc2) | ||
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth writing a function with the wrong API to see what happens :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. I also added one to make sure/demonstrate that you can use lambdas from Python
(boost.python is absolutely magic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this aspect of boost is pretty awesome.
During the 2018 GSoC project to do a C++ implementation of MolVS, doing the tautomer enumeration and canonicalization were stretch goals. @susanhleung actually managed to complete the tautomer enumeration, but since canonicalization wasn't complete, we didn't publicize this particularly widely.
This PR does the last bit of work and adds tautomer canonicalization.
Notes to reviewers: