Different behavior with mol and smiles when tautomerizing #5937

Open
bjonnh-work opened this issue Jan 5, 2023 · 2 comments
@bjonnh-work
Contributor

Describe the bug

Canonicalizing the tautomer of a molecule parsed directly from a specific molfile gives a different result than canonicalizing the same molecule after converting it to SMILES and back. Whether the discrepancy appears is extremely dependent on the structure.

To Reproduce

ketched="""
          04090817072D 1   1.00000     0.00000     0

 32 36  0     0  0            999 V2000
    2.9414   -0.4414    0.0000 N   0  0  0  0  0  0           0  0  0
    3.9655   -0.4414    0.0000 C   0  0  0  0  0  0           0  0  0
    2.4310   -1.3207    0.0000 C   0  0  0  0  0  0           0  0  0
    2.2586    0.3172    0.0000 C   0  0  0  0  0  0           0  0  0
    4.4759   -1.3207    0.0000 C   0  0  0  0  0  0           0  0  0
    4.4759    0.4448    0.0000 C   0  0  0  0  0  0           0  0  0
    1.4310   -1.1069    0.0000 N   0  0  0  0  0  0           0  0  0
    2.9414   -2.2103    0.0000 C   0  0  0  0  0  0           0  0  0
    1.3207   -0.0931    0.0000 N   0  0  0  0  0  0           0  0  0
    2.4690    1.3172    0.0000 O   0  0  0  0  0  0           0  0  0
    3.9655   -2.2103    0.0000 C   0  0  0  0  0  0           0  0  0
    5.5000   -1.3207    0.0000 C   0  0  0  0  0  0           0  0  0
    5.5000    0.4448    0.0000 C   0  0  0  0  0  0           0  0  0
    2.4310   -3.0966    0.0000 C   0  0  0  0  0  0           0  0  0
    0.4345    0.4172    0.0000 C   0  0  0  0  0  0           0  0  0
    6.0138   -0.4414    0.0000 C   0  0  0  0  0  0           0  0  0
    6.0138    1.3310    0.0000 C   0  0  0  0  0  0           0  0  0
   -0.4517   -0.0931    0.0000 C   0  0  0  0  0  0           0  0  0
   -1.3379    0.4172    0.0000 N   0  0  0  0  0  0           0  0  0
   -2.2276   -0.0931    0.0000 C   0  0  0  0  0  0           0  0  0
   -3.1138    0.4172    0.0000 C   0  0  0  0  0  0           0  0  0
   -2.2276   -1.1138    0.0000 O   0  0  0  0  0  0           0  0  0
   -4.0448   -0.0034    0.0000 C   0  0  0  0  0  0           0  0  0
   -3.2138    1.4345    0.0000 N   0  0  0  0  0  0           0  0  0
   -4.7276    0.7621    0.0000 C   0  0  0  0  0  0           0  0  0
   -4.2552   -0.9966    0.0000 C   0  0  0  0  0  0           0  0  0
   -4.2172    1.6483    0.0000 C   0  0  0  0  0  0           0  0  0
   -5.7517    0.7621    0.0000 C   0  0  0  0  0  0           0  0  0
   -4.7276    2.5345    0.0000 C   0  0  0  0  0  0           0  0  0
   -6.2621    1.6483    0.0000 C   0  0  0  0  0  0           0  0  0
   -6.2621   -0.1276    0.0000 O   0  0  0  0  0  0           0  0  0
   -5.7517    2.5345    0.0000 C   0  0  0  0  0  0           0  0  0
  1  2  1  0     0  0
  1  3  1  0     0  0
  1  4  1  0     0  0
  2  5  1  0     0  0
  2  6  2  0     0  0
  3  7  2  0     0  0
  3  8  1  0     0  0
  4  9  1  0     0  0
  4 10  2  0     0  0
  5 11  1  0     0  0
  5 12  2  0     0  0
  6 13  1  0     0  0
  8 14  1  0     0  0
  9 15  1  0     0  0
 12 16  1  0     0  0
 13 17  1  0     0  0
 15 18  1  0     0  0
 18 19  1  0     0  0
 19 20  1  0     0  0
 20 21  1  0     0  0
 20 22  2  0     0  0
 21 23  2  0     0  0
 21 24  1  0     0  0
 23 25  1  0     0  0
 23 26  1  0     0  0
 24 27  1  0     0  0
 25 28  1  0     0  0
 27 29  1  0     0  0
 28 30  1  0     0  0
 28 31  2  0     0  0
 29 32  1  0     0  0
  7  9  1  0     0  0
  8 11  2  0     0  0
 13 16  2  0     0  0
 25 27  2  0     0  0
 30 32  1  0     0  0
M  END
"""

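from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

def my_preprocess_normalizations(mol: Chem.Mol) -> Chem.Mol:
    # Return the canonical tautomer of the input molecule.
    return rdMolStandardize.TautomerEnumerator().Canonicalize(mol)

# m1: canonical tautomer of the molecule parsed directly from the molblock.
m1 = my_preprocess_normalizations(Chem.MolFromMolBlock(ketched))
# m2: same molecule, but round-tripped through SMILES before canonicalizing.
m2 = my_preprocess_normalizations(Chem.MolFromSmiles(Chem.MolToSmiles(Chem.MolFromMolBlock(ketched))))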

Interestingly, deleting bonds or changing an N or O far from the tautomerization site makes the issue disappear.

@gedeck managed to find a smaller version that still reproduces the issue:

ketched="""
  MJ221900                      

 15 16  0  0  0  0  0  0  0  0999 V2000
   -0.2638    0.1514    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9777    0.5625    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -1.6945    0.1514    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4084    0.5625    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.6945   -0.6708    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.1584    0.2236    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4889    1.3820    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -3.7084    0.8403    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.3279   -0.5764    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2973    1.5542    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5334    0.8403    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.7084    2.2681    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.9446    1.5542    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.9446    0.1236    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5334    2.2681    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  3  5  2  0  0  0  0
  4  6  2  0  0  0  0
  4  7  1  0  0  0  0
  6  8  1  0  0  0  0
  6  9  1  0  0  0  0
  7 10  1  0  0  0  0
  8 11  1  0  0  0  0
 10 12  1  0  0  0  0
 11 13  1  0  0  0  0
 11 14  2  0  0  0  0
 12 15  1  0  0  0  0
  8 10  2  0  0  0  0
 13 15  1  0  0  0  0
M  END
"""

Apart from the atom ordering, we found no obvious differences between the two molecules when using Debug and inspecting atom and bond properties.
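Since atom ordering is the only difference spotted so far, one way to probe it is to shuffle the atom order of a single molecule and see whether the canonical tautomer changes. The following is a minimal sketch, not a diagnosis: the helper name `canonical_tautomers_under_shuffling` and its parameters are made up for illustration, and it only assumes the standard `Chem.RenumberAtoms` and `TautomerEnumerator.Canonicalize` calls.

```python
import random

from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def canonical_tautomers_under_shuffling(mol, n_trials=20, seed=0):
    """Collect the canonical-tautomer SMILES seen across random atom orderings.

    Hypothetical helper: if the result depends only on the structure,
    the returned set should contain exactly one SMILES string.
    """
    rng = random.Random(seed)
    enumerator = rdMolStandardize.TautomerEnumerator()
    seen = set()
    for _ in range(n_trials):
        # Build a random permutation of the atom indices and renumber the molecule.
        order = list(range(mol.GetNumAtoms()))
        rng.shuffle(order)
        shuffled = Chem.RenumberAtoms(mol, order)
        seen.add(Chem.MolToSmiles(enumerator.Canonicalize(shuffled)))
    return seen
```

Calling `canonical_tautomers_under_shuffling(Chem.MolFromMolBlock(ketched))` and getting more than one SMILES back would support the idea that the input atom order influences the result.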

Expected behavior

I would expect m1 and m2 to be the same.
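The expectation can be written as a small check that compares canonical SMILES rather than molecule objects. This is a sketch under the assumption that canonical SMILES equality is an acceptable definition of "the same" here; the function names are illustrative and not part of RDKit.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def canonical_tautomer_smiles(mol: Chem.Mol) -> str:
    """Canonical SMILES of the canonical tautomer of `mol`."""
    enumerator = rdMolStandardize.TautomerEnumerator()
    return Chem.MolToSmiles(enumerator.Canonicalize(mol))


def survives_smiles_roundtrip(mol: Chem.Mol) -> bool:
    """True if canonicalizing before and after a SMILES round trip agrees."""
    roundtripped = Chem.MolFromSmiles(Chem.MolToSmiles(mol))
    return canonical_tautomer_smiles(mol) == canonical_tautomer_smiles(roundtripped)
```

For the molblocks in this report, `survives_smiles_roundtrip(Chem.MolFromMolBlock(ketched))` is the condition that is expected to hold.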

Screenshots

image

Configuration (please complete the following information):

  • RDKit version: master '2023.03.1pre'
  • OS: Ubuntu 22.04
  • Python version (if relevant): 3.10.6 (likely not relevant)
  • RDKit compiled from git with
cmake -DBUILD_SHARED_LIBS=ON -DRDK_BUILD_CPP_TESTS=OFF -DRDK_BUILD_DESCRIPTORS3D=OFF -DRDK_BUILD_MAEPARSER_SUPPORT=OFF \
   -DRDK_BUILD_SLN_SUPPORT=OFF -DRDK_BUILD_CAIRO_SUPPORT=ON -DRDK_BUILD_PYTHON_WRAPPERS=ON -DRDK_BUILD_INCHI_SUPPORT=ON \
   -DRDK_BUILD_COORDGEN_SUPPORT=ON -DRDK_TEST_MULTITHREADED=OFF -DCMAKE_BUILD_TYPE=Release -DBoost_NO_BOOST_CMAKE=ON \
   -DPYTHON_EXECUTABLE=/usr/bin/python3 -D RDK_INSTALL_STATIC_LIBS=OFF
    ..
bjonnh-work added the bug label Jan 5, 2023
@greglandrum
Member

Confirmed. The tautomer enumeration code is actually producing different results here:

from rdkit import Chem

m = Chem.MolFromMolBlock('''foo
  MJ221900

 15 16  0  0  0  0  0  0  0  0999 V2000
   -0.2638    0.1514    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9777    0.5625    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -1.6945    0.1514    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4084    0.5625    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.6945   -0.6708    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.1584    0.2236    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4889    1.3820    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -3.7084    0.8403    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.3279   -0.5764    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2973    1.5542    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5334    0.8403    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.7084    2.2681    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.9446    1.5542    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.9446    0.1236    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5334    2.2681    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  3  5  2  0  0  0  0
  4  6  2  0  0  0  0
  4  7  1  0  0  0  0
  6  8  1  0  0  0  0
  6  9  1  0  0  0  0
  7 10  1  0  0  0  0
  8 11  1  0  0  0  0
 10 12  1  0  0  0  0
 11 13  1  0  0  0  0
 11 14  2  0  0  0  0
 12 15  1  0  0  0  0
  8 10  2  0  0  0  0
 13 15  1  0  0  0  0
M  END
''')
from rdkit.Chem.MolStandardize import rdMolStandardize

# Round-trip the molecule through SMILES, then enumerate tautomers for both forms.
m2 = Chem.MolFromSmiles(Chem.MolToSmiles(m))
tenum = rdMolStandardize.TautomerEnumerator()
tauts = tenum.Enumerate(m)
tauts2 = tenum.Enumerate(m2)
len(tauts) == len(tauts2)  # <- should be True, is False

The tautomer enumeration code is a gift that keeps on giving. :-S
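To see which tautomers account for the difference in counts, the two enumerations can be compared as sets of canonical SMILES. This is only a sketch for narrowing things down: `tautomer_smiles` and `diff_tautomers` are hypothetical helper names, and the code assumes nothing beyond iterating over the result of `Enumerate`.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def tautomer_smiles(mol):
    """Set of canonical SMILES for every tautomer enumerated from `mol`."""
    enumerator = rdMolStandardize.TautomerEnumerator()
    return {Chem.MolToSmiles(taut) for taut in enumerator.Enumerate(mol)}


def diff_tautomers(mol_a, mol_b):
    """Return (SMILES found only for mol_a, SMILES found only for mol_b)."""
    smiles_a = tautomer_smiles(mol_a)
    smiles_b = tautomer_smiles(mol_b)
    return smiles_a - smiles_b, smiles_b - smiles_a
```

With the molecules above, `diff_tautomers(m, m2)` would list the tautomers that are reached from one input form but not from the other.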

@kienerj

kienerj commented May 5, 2023

I have a similar example, but here the source is two different molfiles of the same molecule. The orientation and the Kekulé form differ, which probably leads to different tautomers:

image

(see attached notebook, rename to ipynb)

Canonical Tautomer Issue.zip
