Enhanced Stereochemistry canonicalization errors #7041
get the tests passing on linux and the psql results updated
…eudoChiralCanonError
@tadhurst-cdd this looks like a very nice change. It looks like it may go towards fixing an issue I recently reported, #7266.
Addressed changes to fix errors in the tests provided by Greg Landrum. There were a couple of fixes, and the code now does NOT throw an error if the enhanced procedure does not work, but simply calls the old canonicalization method.
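The fallback behaviour described in this comment can be sketched roughly as follows. This is only an illustration of the pattern, not the PR's code: the names `canonicalize_with_fallback`, `rigorous`, and `legacy` are hypothetical, and it assumes the enhanced procedure signals failure by raising an exception, which the comment does not state.

```python
def canonicalize_with_fallback(mol, rigorous, legacy):
    """Try the enhanced-stereo canonicalization; fall back to the old one.

    `rigorous` and `legacy` are hypothetical callables standing in for the
    new and old canonicalization routines.
    """
    try:
        return rigorous(mol)
    except Exception:
        # Assumption: failure of the enhanced procedure surfaces as an
        # exception. Instead of propagating it, use the old method.
        return legacy(mol)
```

The trade-off is that a failure in the enhanced procedure is silent: the caller gets a result from the old method rather than an error.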
…eudoChiralCanonError
I know that one concern is the performance of the new rigorousEnhancedStereo functionality in RDKit. I do not think that this is a major problem.

First, comparing the time required to do the canonicalization WITHOUT the new functionality and WITH it is a comparison between producing incorrect results and producing correct results. I think we really want the correct results.

Second, the new code does not affect the time required to canonicalize structures that do NOT have enhanced stereochemistry. Less than 4% of the structures our customers have registered in CDD Vault contain enhanced stereochemistry, so the impact is very small.

The currently suggested method enumerates the possible structures that the enhanced notation represents. It relies heavily on the current functionality for producing canonical SMILES for stereo-labeled compounds, and that is the source of the computational complexity. It would be possible to have the new method NOT actually generate and subsequently parse the canonical SMILES, but the work of canonicalization would still need to be done, so I doubt that any substantial improvement in performance could be made.

One possible change to the method might be to produce, more directly, a list of mols by reordering the atoms and bonds according to the canonical atom rankings. It would then be necessary to be able to compare and sort these mols to produce a unique list.

I am interested in other thoughts and suggestions.

tad
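To make the enumerate-then-canonicalize idea in the comment above concrete, here is a toy sketch in plain Python. It is not the PR's implementation and does not use RDKit: stereocenters are reduced to `R`/`S` labels, each enhanced stereo group is assumed to be either kept as drawn or inverted as a whole, and `toy_canonical` is a hypothetical stand-in for RDKit's canonical SMILES generation (it only picks the smallest labelling under a supplied list of symmetry permutations). All function names are invented for illustration.

```python
from itertools import product


def enumerate_structures(labels, groups):
    """Yield every concrete labelling the enhanced-stereo spec represents.

    Each group (a list of center indices) is either kept as drawn or
    inverted as a whole, giving 2**len(groups) candidates.
    """
    for flips in product([False, True], repeat=len(groups)):
        current = list(labels)
        for flip, group in zip(flips, groups):
            if flip:
                for idx in group:
                    current[idx] = "S" if current[idx] == "R" else "R"
        yield tuple(current)


def toy_canonical(labels, symmetries=()):
    """Hypothetical stand-in for canonical SMILES of one concrete structure."""
    candidates = [labels] + [tuple(labels[i] for i in perm) for perm in symmetries]
    return "".join(min(candidates))


def canonical_enhanced(labels, groups, symmetries=()):
    """Sorted, de-duplicated canonical forms of all enumerated structures."""
    return sorted({toy_canonical(s, symmetries)
                   for s in enumerate_structures(labels, groups)})


# Two different drawings of the same two-center group give the same result.
print(canonical_enhanced(("R", "R"), [[0, 1]]))  # ['RR', 'SS']
print(canonical_enhanced(("S", "S"), [[0, 1]]))  # ['RR', 'SS']
```

In this toy the cost is one canonicalization per enumerated structure, which mirrors the point made above: the enumeration itself is cheap, and the per-structure canonicalization is where the time goes.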
Reference Issue
Enhanced Stereochemistry canonicalization errors
What does this implement/fix? Explain your changes.
Many compounds can be written as SMILES with different enhanced stereochemistry specifications that actually describe the same compound. For example:
These are all the same but, without the new code, generate different canonical SMILES.
These are also the same:
Any other comments?