Possible bug with EnumerateStereoisomers #4144

Closed
stephanielabouille opened this issue May 17, 2021 · 1 comment

@stephanielabouille

Describe the bug
Calling EnumerateStereoisomers on some imines seems to alter the rings of the molecule.

To Reproduce

```python
from rdkit.Chem import MolFromSmiles, MolToSmiles, rdmolops
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers


mol = MolFromSmiles('CSCc1cnc(C=Nn2c(C)nc3sc4c(c3c2=O)CCCCC4)s1')
ssr = rdmolops.GetSymmSSSR(mol)
for ring in ssr:
    print(len(ring))

stereo_mols = EnumerateStereoisomers(mol)
for m in stereo_mols:
    print(MolToSmiles(m))
    ssr = rdmolops.GetSymmSSSR(m)
    for ring in ssr:
        print(len(ring))
```

The initial mol contains 4 rings, of sizes 5, 5, 6 and 7. After EnumerateStereoisomers, the stereo mols contain 4 rings, of sizes 5, 5, 7 and 9. The depiction of the mol looks strange as well (see screenshot below). MolToSmiles on the stereo mols returns the expected stereo SMILES, CSCc1cnc(/C=N/n2c(C)nc3sc4c(c3c2=O)CCCCC4)s1 and CSCc1cnc(/C=N\n2c(C)nc3sc4c(c3c2=O)CCCCC4)s1.

Screenshots
Execution of the snippet: [screenshot not available]

One of the stereo mols: [screenshot not available]

I also encountered another problem, which is unrelated to the ring issue. When I load the second stereo SMILES (CSCc1cnc(/C=N\n2c(C)nc3sc4c(c3c2=O)CCCCC4)s1) as a new mol, MolFromSmiles returns None. Adding a second \ to the SMILES (CSCc1cnc(/C=N\\n2c(C)nc3sc4c(c3c2=O)CCCCC4)s1) makes it valid. What would you recommend for handling such cases?

Configuration (please complete the following information):

  • RDKit version: 2021.03.1
  • OS: macOS Mojave
  • Python version (if relevant): 3.7.9
  • Are you using conda? yes
  • If you are using conda, which channel did you install the rdkit from? conda install -y -c conda-forge rdkit
@greglandrum
Member

greglandrum commented May 18, 2021

For the first part: the problem here is that EnumerateStereoisomers() clears the ring information before calling AssignStereochemistry(). For efficiency, AssignStereochemistry() calls FastFindRings(), which doesn't find the symmetrized SSSR set. The next call to GetSymmSSSR() then reuses the ring information from FastFindRings() instead of recomputing it.
A quick workaround, until this is fixed and a new RDKit version is available, is to call m.ClearComputedProps() before the call to rdmolops.GetSymmSSSR(m).
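A minimal sketch of that workaround applied to the reproducer from the report, assuming the RDKit Python API used above. The `ring_sizes` helper is hypothetical, added only for illustration. `ClearComputedProps()` also clears the ring information by default (its `includeRings` argument defaults to `True`), so the following `GetSymmSSSR()` call has to perceive the rings again rather than reuse the set left by `FastFindRings()`.

```python
from rdkit import Chem
from rdkit.Chem import rdmolops
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers

SMILES = 'CSCc1cnc(C=Nn2c(C)nc3sc4c(c3c2=O)CCCCC4)s1'


def ring_sizes(m):
    """Hypothetical helper: sorted sizes of the symmetrized SSSR rings of m."""
    return sorted(len(ring) for ring in rdmolops.GetSymmSSSR(m))


mol = Chem.MolFromSmiles(SMILES)
reference = ring_sizes(mol)  # ring sizes of the freshly parsed molecule

results = []
for isomer in EnumerateStereoisomers(mol):
    # Workaround: drop cached computed properties, including ring info,
    # so that GetSymmSSSR() recomputes the rings from scratch.
    isomer.ClearComputedProps()
    results.append((Chem.MolToSmiles(isomer), ring_sizes(isomer)))

for smi, sizes in results:
    print(smi, sizes, sizes == reference)
```

On versions that already contain the fix, the extra `ClearComputedProps()` call should simply be redundant.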

For the second part:
the backslashes in the string literals are being interpreted by Python as escape characters (for example, \n becomes a newline). Here's a Stack Overflow thread about that: https://stackoverflow.com/questions/301068/quoting-backslashes-in-python-string-literals
The short answer is that you can either double the backslashes (as you already discovered) or use a raw string:
```python
Chem.MolFromSmiles(r'CSCc1cnc(/C=N\n2c(C)nc3sc4c(c3c2=O)CCCCC4)s1')
```
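To make the difference concrete, here is a small sketch comparing the three ways of writing the literal. Only standard Python string semantics and `Chem.MolFromSmiles` are used. In the ordinary literal, `\n` turns into a newline character, so the parser never sees the `\` bond and the SMILES fails to parse. The doubled and raw forms produce identical strings.

```python
from rdkit import Chem

# Ordinary literal: "\n" is an escape sequence, i.e. a newline character.
plain = 'CSCc1cnc(/C=N\n2c(C)nc3sc4c(c3c2=O)CCCCC4)s1'
# Doubled backslash: "\\" is a literal backslash, followed by "n".
doubled = 'CSCc1cnc(/C=N\\n2c(C)nc3sc4c(c3c2=O)CCCCC4)s1'
# Raw string: backslashes are kept as written.
raw = r'CSCc1cnc(/C=N\n2c(C)nc3sc4c(c3c2=O)CCCCC4)s1'

print('\n' in plain)    # the ordinary literal contains a newline
print(doubled == raw)   # both spellings give the same SMILES text

print(Chem.MolFromSmiles(plain))  # parsing fails, so this is None
mol = Chem.MolFromSmiles(raw)
print(Chem.MolToSmiles(mol))
```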

greglandrum added this to the 2021_03_3 milestone May 18, 2021
greglandrum added a commit to greglandrum/rdkit that referenced this issue May 25, 2021
greglandrum added a commit that referenced this issue Jun 9, 2021
* Fixes #4144

* update the call in EnumerateStereoisomers