Add MolVS tautomer canonicalization #2886

Merged


greglandrum
Member

During the 2018 GSoC project to do a C++ implementation of MolVS, doing the tautomer enumeration and canonicalization were stretch goals. @susanhleung actually managed to complete the tautomer enumeration, but since canonicalization wasn't complete, we didn't publicize this particularly widely.

This PR does the last bit of work and adds tautomer canonicalization.

Notes to reviewers:

  • the goal of this first merge is to implement the tautomer scoring and canonicalization schemes that are used in MolVS. Once we have that in place the arguments can start (if really necessary) about whether or not we want a different/modified default scoring scheme.
  • I will probably be adding some additional tests over the next couple of days; those won't change the design or (hopefully) the scoring/canonicalization code itself, so it should be fine to start looking at this.

greglandrum added this to the 2020_03_1 milestone on Jan 16, 2020

std::string d_smarts;
std::shared_ptr<ROMol> dp_mol;
smarts_mol_holder(const std::string &smarts) : d_smarts(smarts) {
dp_mol.reset(SmartsToMol(smarts));

Contributor

Should we check that this isn't set to a null SMARTS?

Contributor

Looks like it's all internal, so no.

Member Author

It's also tested at match time (line 130)
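
The pattern discussed in this thread (parse each SMARTS once, keep the possibly-null result, and check for null when matching) can be sketched roughly as follows. This is only an analogy in Python: `re.compile` stands in for `SmartsToMol`, `functools.lru_cache` stands in for the `boost::flyweights` caching used in the C++ code, and the names `parse_pattern` and `count_matches` are hypothetical.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_pattern(pattern):
    """Parse each distinct pattern string once; return None if it is invalid.

    Stand-in for the holder in the PR: the cached value may be "null".
    """
    try:
        return re.compile(pattern)
    except re.error:
        return None


def count_matches(pattern, text):
    """Count matches, treating an unparseable pattern as 'no matches'."""
    parsed = parse_pattern(pattern)
    if parsed is None:
        # The null check happens here, at match time, not at construction.
        return 0
    return len(parsed.findall(text))


print(count_matches("a", "banana"))
print(count_matches("[", "banana"))  # invalid pattern, handled at match time
```

Because the cache is keyed on the pattern string, building many lookups from the same string repeats only the cheap lookup, not the parse.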

// a note on efficiency here: we'll construct the SubstructTerm objects here
// repeatedly, but the SMARTS parsing for each entry will only be done once
// since we're using the boost::flyweights above to cache them
const std::vector<SubstructTerm> substructureTerms{

Contributor

Part of me thinks that these structs + scores should be passed in so they're easier to modify.

Contributor

It looks like you've captured this in the score func though.

Member Author

I thought about adding that option, but then figured it's more straightforward from the API perspective to just leave it out since the user can always provide their own scoring function.
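
As a rough illustration of that design choice, here is a minimal sketch in which the caller supplies the whole scoring function instead of a table of substructure terms. It works on plain SMILES strings, not RDKit molecules; `canonicalize`, `prefer_enol`, and `prefer_keto` are hypothetical names, and the tie-break (alphabetically first SMILES among equal scores) is an assumption for the sketch, not something stated in this thread.

```python
from typing import Callable, Iterable


def canonicalize(tautomers: Iterable[str], score: Callable[[str], int]) -> str:
    """Return the highest-scoring tautomer from a collection of SMILES strings."""
    candidates = sorted(set(tautomers))
    if not candidates:
        raise ValueError("no tautomers to choose from")
    # max() keeps the first maximal element, so sorting first makes
    # ties resolve to the alphabetically first SMILES.
    return max(candidates, key=score)


def prefer_enol(smiles: str) -> int:
    # Toy scorer: reward SMILES that start with a hydroxyl oxygen.
    return 1 if smiles.startswith("OC") else 0


def prefer_keto(smiles: str) -> int:
    # Toy scorer: reward SMILES that start with a carbonyl.
    return 1 if smiles.startswith("O=C") else 0


tautomers = ["O=C1CCCCC1", "OC1=CCCCC1"]
print(canonicalize(tautomers, prefer_enol))
print(canonicalize(tautomers, prefer_keto))
```

Swapping the scorer changes which tautomer wins without touching the selection logic, which is the same shape as the `scorefunc1`/`scorefunc2` test in the diff.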

ctaut = enumerator.Canonicalize(m, scorefunc1)
self.assertEqual(Chem.MolToSmiles(ctaut), "OC1=CCCCC1")
ctaut = enumerator.Canonicalize(m, scorefunc2)
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")

Contributor

Might be worth writing a function with the wrong API to see what happens :)

Member Author

Good point. I also added one to make sure (and demonstrate) that you can use lambdas from Python (boost.python is absolutely magic).

Contributor

Yeah, this aspect of boost is pretty awesome.
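
The two cases mentioned in this thread, a lambda as the scorer and a scorer with the wrong signature, can be mimicked in plain Python. This sketch does not call RDKit: `pick_best` is a hypothetical stand-in for a canonicalizer that calls `score(candidate)`, and how the real binding reports a bad callback is not stated here.

```python
def pick_best(candidates, score):
    """Hypothetical stand-in: call score(candidate) and keep the best one."""
    return max(sorted(candidates), key=score)


candidates = ["O=C1CCCCC1", "OC1=CCCCC1"]

# A lambda works anywhere a named scoring function would.
best = pick_best(candidates, lambda smi: 1 if smi.startswith("O=C") else 0)
print(best)


def bad_score(smi, weight):
    # Wrong API: the caller only ever passes one argument.
    return weight * len(smi)


try:
    pick_best(candidates, bad_score)
    failed = False
except TypeError as exc:
    # In pure Python the mismatch surfaces when the scorer is first called.
    failed = True
    print("rejected:", exc)
```

A test of this kind is cheap and pins down the failure mode, so a later change to the callback signature cannot silently alter behavior.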

bp-kelley merged commit f8a4020 into rdkit:master on Jan 17, 2020