Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EnumerateStereoisomers fail with STEREOANY bonds from molblock #3759

Closed
TermeHansen opened this issue Jan 22, 2021 · 6 comments · Fixed by #4272
Closed

EnumerateStereoisomers fail with STEREOANY bonds from molblock #3759

TermeHansen opened this issue Jan 22, 2021 · 6 comments · Fixed by #4272
Labels
Milestone

Comments

@TermeHansen
Copy link

This issue was raied by #2890, and the example there now works, but I have run into another case where it happens when I read in the molecule from a molblock. I have tried to look at atom and bond properties, but I can not see why it is??

from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers

mol = Chem.MolFromSmiles('C=CC(C)CCC')
mol = Chem.AddHs(mol)
mol.GetBondWithIdx(0).SetStereo(Chem.rdchem.BondStereo.STEREOANY)
for bond in mol.GetBonds():
    print(bond.GetStereo())
for atom in mol.GetAtoms():
    print(atom.GetChiralTag())
list(EnumerateStereoisomers(mol))
STEREOANY
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
[<rdkit.Chem.rdchem.Mol at 0x7f2c61d6bdb0>,
 <rdkit.Chem.rdchem.Mol at 0x7f2c61d6bf30>]
mb = '''
     RDKit          3D

 21 20  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  3
  2  3  1  0
  3  4  1  0
  3  5  1  0
  5  6  1  0
  6  7  1  0
  1  8  1  0
  1  9  1  0
  2 10  1  0
  3 11  1  0
  4 12  1  0
  4 13  1  0
  4 14  1  0
  5 15  1  0
  5 16  1  0
  6 17  1  0
  6 18  1  0
  7 19  1  0
  7 20  1  0
  7 21  1  0
M  END'''

mol = Chem.MolFromMolBlock(Utils.SDFstrip3d(mb))
mol = Chem.AddHs(mol)
for bond in mol.GetBonds():
    print(bond.GetStereo())
for atom in mol.GetAtoms():
    print(atom.GetChiralTag())
list(EnumerateStereoisomers(mol))
STEREOANY
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
STEREONONE
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED
CHI_UNSPECIFIED

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-40-1522346d6a2a> in <module>()
      8 #    print(atom.GetPropsAsDict())
      9     print(atom.GetChiralTag())
---> 10 list(EnumerateStereoisomers(mol))

/home/hafnium/miniconda3/lib/python3.7/site-packages/rdkit/Chem/EnumerateStereoisomers.py in EnumerateStereoisomers(m, options, verbose)
    322     for i in range(nCenters):
    323       flag = bool(bitflag & (1 << i))
--> 324       flippers[i].flip(flag)
    325     isomer = Chem.Mol(tm)
    326     Chem.SetDoubleBondNeighborDirections(isomer)

/home/hafnium/miniconda3/lib/python3.7/site-packages/rdkit/Chem/EnumerateStereoisomers.py in flip(self, flag)
     46       self.bond.SetStereo(Chem.BondStereo.STEREOCIS)
     47     else:
---> 48       self.bond.SetStereo(Chem.BondStereo.STEREOTRANS)
     49 
     50 

RuntimeError: Pre-condition Violation
	Stereo atoms should be specified before specifying CIS/TRANS bond stereochemistry
	Violation occurred on line 286 in file Code/GraphMol/Bond.h
	Failed Expression: what <= STEREOE || getStereoAtoms().size() == 2
	RDKIT: 2020.09.4
	BOOST: 1_74

rdkit from conda-forge on ubuntu 20.04

@ricrogz
Copy link
Contributor

ricrogz commented Jan 22, 2021

@TermeHansen, the issue there is that EnumerateStereoisomers() expects STEREOANY bonds to have reference Stereo Atoms set.

Also, please note that in your example the bond you are marking cannot be a stereobond, since the atom on the left has two identical neighbors (H atoms).

This works as expected (please note the extra Cl atom and the updated indexes):

mol = Chem.MolFromSmiles('C(Cl)=CC(C)CCC')
mol = Chem.AddHs(mol)
mol.GetBondWithIdx(1).SetStereo(Chem.rdchem.BondStereo.STEREOANY)
mol.GetBondWithIdx(1).SetStereoAtoms(1,3)
list(EnumerateStereoisomers(mol))

@TermeHansen
Copy link
Author

@ricrogz you are very right about my weird choise of case, but it was just the molecule I was working on when I saw this.

the SetStereoAtoms was definitely the part I was missing, now using your case, I am still puzzle of why it will not flip the double bound when coming from a molblock.

from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers

mol = Chem.MolFromSmiles('C(Cl)=CC(C)CCC')
display(mol)
mol.GetBondWithIdx(1).SetStereo(Chem.rdchem.BondStereo.STEREOANY)
mol.GetBondWithIdx(1).SetStereoAtoms(1,3)
sdf = Chem.MolToMolBlock(mol)
for m in EnumerateStereoisomers(mol):
    print(Chem.MolToSmiles(m))
CCC[C@@H](C)/C=C/Cl
CCC[C@H](C)/C=C/Cl
CCC[C@@H](C)/C=C\Cl
CCC[C@H](C)/C=C\Cl
molb = Chem.MolFromMolBlock(sdf)
display(molb)
molb.GetBondWithIdx(1).SetStereo(Chem.rdchem.BondStereo.STEREOANY)
molb.GetBondWithIdx(1).SetStereoAtoms(1,3)
print(sdf)
for m in EnumerateStereoisomers(molb):
    print(Chem.MolToSmiles(m))

     RDKit          2D

  8  7  0  0  0  0  0  0  0  0999 V2000
    2.5981    3.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981    4.5000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    2.2500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8971    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.1962   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  2  3
  3  4  1  0
  4  5  1  0
  4  6  1  0
  6  7  1  0
  7  8  1  0
M  END

CCC[C@@H](C)C=CCl
CCC[C@H](C)C=CCl

@ricrogz
Copy link
Contributor

ricrogz commented Jan 22, 2021

Interesting. It seems to happen because of the Mol parser setting the EITHERDOUBLE direction on the bond:

case 3: // "either" double bond
res->setBondDir(Bond::EITHERDOUBLE);
res->setStereo(Bond::STEREOANY);
break;

Apparently, EnumerateStereoisomers() doesn't know how to handle that. Once you remove it, it works fine. Based on the second part of your example:

molb = Chem.MolFromMolBlock(sdf)
molb.GetBondWithIdx(1).SetBondDir(Chem.BondDir.NONE)
print(sdf)
for m in EnumerateStereoisomers(molb):
    print(Chem.MolToSmiles(m))
    
 =============================
 
 
     RDKit          2D

  8  7  0  0  0  0  0  0  0  0999 V2000
    2.5981    3.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981    4.5000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    2.2500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8971    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.1962   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  2  3
  3  4  1  0
  4  5  1  0
  4  6  1  0
  6  7  1  0
  7  8  1  0
M  END

CCC[C@@H](C)/C=C/Cl
CCC[C@H](C)/C=C/Cl
CCC[C@@H](C)/C=C\Cl
CCC[C@H](C)/C=C\Cl

@ricrogz
Copy link
Contributor

ricrogz commented Jan 22, 2021

Actually, the result that you are getting is correct:

molb = Chem.MolFromMolBlock(sdf)
bond = molb.GetBondWithIdx(1)
bond.SetStereo(Chem.rdchem.BondStereo.STEREOANY)
bond.SetStereoAtoms(1,3)
print(sdf)
for m in EnumerateStereoisomers(molb):
    print(Chem.MolToSmiles(m), bond.GetBondDir(), bond.GetStereo())
 =============================

     RDKit          2D

  8  7  0  0  0  0  0  0  0  0999 V2000
    2.5981    3.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981    4.5000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    2.2500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8971    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.1962   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  2  3
  3  4  1  0
  4  5  1  0
  4  6  1  0
  6  7  1  0
  7  8  1  0
M  END

CCC[C@@H](C)C=CCl EITHERDOUBLE STEREOANY
CCC[C@H](C)C=CCl EITHERDOUBLE STEREOANY

The double bond is still tagged as EITHERDOUBLE/STEREOANY, so technically, the enumeration is correct, only that the bond is not enumerated into each of the CIS/TRANS possibilites.

@TermeHansen
Copy link
Author

TermeHansen commented Jan 23, 2021

@ricrogz thank you very much for these pointers on how to handle this, and yes you are actually right that to some extend it is nice to be able to decide if STEREOANY also should be enumerated by the function - though I don't think it was the intention of the current implementation, and probably better with a flag in the function.

for reference how I handle this now:

def get_isomers(mol, flip_double_bonds=True):
    for bond in mol.GetBonds():
        if bond.GetStereo() == Chem.rdchem.BondStereo.STEREOANY:
            if flip_double_bonds:
                bond.SetBondDir(Chem.BondDir.NONE)
            else:
                bond.SetStereoAtoms(bond.GetBeginAtomIdx()+1, bond.GetEndAtomIdx()+1)
    return EnumerateStereoisomers(mol)

print('flip')
for isomer in get_isomers(Chem.MolFromMolBlock(sdf), flip_double_bonds=True):
    print(Chem.MolToSmiles(isomer))
print('no flip')
for isomer in get_isomers(Chem.MolFromMolBlock(sdf), flip_double_bonds=False):
    print(Chem.MolToSmiles(isomer))

flip
CCC[C@@H](C)/C=C/Cl
CCC[C@H](C)/C=C/Cl
CCC[C@@H](C)/C=C\Cl
CCC[C@H](C)/C=C\Cl
no flip
CCC[C@@H](C)C=CCl
CCC[C@H](C)C=CCl

@ricrogz
Copy link
Contributor

ricrogz commented Jan 25, 2021

@TermeHansen, I think I made a mistake in my first message: EnumerateStereoisomers() expects to be able to find Stereo Atoms for any valid `STEREOANY bonds to be enumerated.

This means that in the example I posted we do not need to set stereo atoms explicitly, as the SMILES I proposed contains a potentially valid stereo bond:

from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers

mol = Chem.MolFromSmiles('C(Cl)=CC(C)CCC')
mol = Chem.AddHs(mol)
mol.GetBondWithIdx(1).SetStereo(Chem.rdchem.BondStereo.STEREOANY)

for m in EnumerateStereoisomers(mol):
    print(Chem.MolToSmiles(m))
===============================
[H]/C(Cl)=C(/[H])[C@]([H])(C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])[H]
[H]/C(Cl)=C(/[H])[C@@]([H])(C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])[H]
[H]/C(Cl)=C(\[H])[C@]([H])(C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])[H]
[H]/C(Cl)=C(\[H])[C@@]([H])(C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])[H]

In fact, in the example we don't even need to mark the bond as STEREOANY: the double bond is potentially stereogenic, and no specific stereo information is given, so it will be marked as STEREONONE by the Smiles Parser, which is also enumerated:

from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers

mol = Chem.MolFromSmiles('C(Cl)=CC(C)CCC')
mol = Chem.AddHs(mol)

for m in EnumerateStereoisomers(mol):
    print(Chem.MolToSmiles(m))
===============================
[H]/C(Cl)=C(/[H])[C@]([H])(C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])[H]
[H]/C(Cl)=C(/[H])[C@@]([H])(C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])[H]
[H]/C(Cl)=C(\[H])[C@]([H])(C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])[H]
[H]/C(Cl)=C(\[H])[C@@]([H])(C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])[H]

These things were only required to have your example molecule enumerated (in a hacky way), since the bond couldn't be a stereo bond.

Regarding your last code snippet, one additional warning besides SetStereoAtoms() not being required for legal stereo bonds: the line bond.SetStereoAtoms(bond.GetBeginAtomIdx()+1, bond.GetEndAtomIdx()+1) will bite you, since there is no guarantee that neighboring atoms have correlative indexes. This will fail even for an example as simple as CC=CC (the first C atom has an index lower than the C atom in the double bond, which is not contemplated by your code).

Finally, I think the intention of EnumerateStereoisomers() is to always have STEREOANY bonds enumerated, but it does not properly consider the EITHERDOUBLE bond direction (as we have seen, it crashes the enumeration). I think this is a bug, and that bonds with this direction set should be handled the same way as STEREOANY bonds (i.e., they should be enumerated).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants