RDKit hangs indefinitely when parsing not so big molblock #6434

Closed
eloyfelix opened this issue Jun 2, 2023 · 3 comments · Fixed by #6531
eloyfelix (Contributor) commented Jun 2, 2023

Describe the bug

The molecule US06235754-20010522-C00041.MOL (from the USPTO bulk downloads) had been parsing for more than 10 minutes when I cancelled it.
If sanitization is disabled (sanitize=False) and Chem.SanitizeMol is applied afterwards, everything seems to work.

To Reproduce

molfile = """
  ChemDraw12010010062D

 30 32  0  0  0  0  0  0  0  0999 V2000
   -0.6432    2.4695    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6432    3.9718    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9484    4.7230    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2535    3.9718    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2535    2.4695    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9484    1.7183    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5540    1.7183    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    0.6573    4.7230    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9577    3.9718    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6573    6.2254    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4131    7.5258    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0939    7.5258    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.2582    4.7230    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.9577    2.4695    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.5587    3.9718    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6432   -6.7793    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6432   -5.2770    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9484   -4.5258    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2535   -5.2770    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2535   -6.7793    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9484   -7.5305    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5540   -7.5305    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    0.6573   -4.5258    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9577   -5.2770    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6573   -3.0235    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4131   -1.7230    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0939   -1.7230    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.2582   -4.5258    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9577   -6.7793    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    4.5587   -5.2770    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  1  6  1  0  0  0  0
  5  7  1  0  0  0  0
  2  8  1  0  0  0  0
  8  9  2  0  0  0  0
  8 10  1  0  0  0  0
 10 11  1  0  0  0  0
 11 12  1  0  0  0  0
 10 12  1  0  0  0  0
  9 13  1  0  0  0  0
  9 14  1  4  0  0  0
 13 15  1  0  0  0  0
 16 17  2  0  0  0  0
 17 18  1  0  0  0  0
 18 19  2  0  0  0  0
 19 20  1  0  0  0  0
 20 21  2  0  0  0  0
 16 21  1  0  0  0  0
 20 22  1  0  0  0  0
 17 23  1  0  0  0  0
 23 24  2  0  0  0  0
 23 25  1  0  0  0  0
 25 26  1  0  0  0  0
 26 27  1  0  0  0  0
 25 27  1  0  0  0  0
 24 28  1  0  0  0  0
 24 29  1  4  0  0  0
 28 30  2  0  0  0  0
M  END"""

from rdkit import Chem
mol = Chem.MolFromMolBlock(molfile)

What surprised me a bit is that disabling sanitization and sanitizing afterwards seems to fix things. I thought both ways of sanitizing were equivalent.

molfile = """
  ChemDraw12010010062D

 30 32  0  0  0  0  0  0  0  0999 V2000
   -0.6432    2.4695    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6432    3.9718    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9484    4.7230    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2535    3.9718    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2535    2.4695    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9484    1.7183    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5540    1.7183    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    0.6573    4.7230    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9577    3.9718    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6573    6.2254    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4131    7.5258    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0939    7.5258    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.2582    4.7230    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.9577    2.4695    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.5587    3.9718    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6432   -6.7793    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6432   -5.2770    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9484   -4.5258    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2535   -5.2770    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2535   -6.7793    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9484   -7.5305    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5540   -7.5305    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    0.6573   -4.5258    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9577   -5.2770    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6573   -3.0235    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4131   -1.7230    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0939   -1.7230    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.2582   -4.5258    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9577   -6.7793    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    4.5587   -5.2770    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  1  6  1  0  0  0  0
  5  7  1  0  0  0  0
  2  8  1  0  0  0  0
  8  9  2  0  0  0  0
  8 10  1  0  0  0  0
 10 11  1  0  0  0  0
 11 12  1  0  0  0  0
 10 12  1  0  0  0  0
  9 13  1  0  0  0  0
  9 14  1  4  0  0  0
 13 15  1  0  0  0  0
 16 17  2  0  0  0  0
 17 18  1  0  0  0  0
 18 19  2  0  0  0  0
 19 20  1  0  0  0  0
 20 21  2  0  0  0  0
 16 21  1  0  0  0  0
 20 22  1  0  0  0  0
 17 23  1  0  0  0  0
 23 24  2  0  0  0  0
 23 25  1  0  0  0  0
 25 26  1  0  0  0  0
 26 27  1  0  0  0  0
 25 27  1  0  0  0  0
 24 28  1  0  0  0  0
 24 29  1  4  0  0  0
 28 30  2  0  0  0  0
M  END"""

from rdkit import Chem
mol = Chem.MolFromMolBlock(molfile, sanitize=False)
Chem.SanitizeMol(mol)

Expected behavior
RDKit either parses the molecule or fails with an error; it should not hang.
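
For batch jobs that cannot wait for a fix, one way to get "parses or fails" behaviour is to run each parse in a child process and kill it after a deadline. The following is a minimal sketch, not part of RDKit: `run_with_timeout` and `slow_parse` are hypothetical names, the parser is passed in as a plain callable, and the `fork` start method is assumed (POSIX only, as on the Ubuntu setup reported here).

```python
import multiprocessing as mp
import time


def _worker(func, arg, conn):
    """Child side: run func(arg) and send back (ok, payload)."""
    try:
        conn.send((True, func(arg)))
    except Exception as exc:  # report the failure instead of dying silently
        conn.send((False, repr(exc)))
    finally:
        conn.close()


def run_with_timeout(func, arg, timeout_s):
    """Run func(arg) in a forked child; return (status, payload).

    status is "ok", "error", "timeout" or "crashed".
    The result must be picklable because it travels through a pipe.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX platform
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(func, arg, send_conn))
    proc.start()
    send_conn.close()  # the parent only reads
    try:
        if not recv_conn.poll(timeout_s):
            return "timeout", None
        try:
            ok, payload = recv_conn.recv()
        except EOFError:  # child exited without sending anything
            return "crashed", None
        return ("ok" if ok else "error"), payload
    finally:
        if proc.is_alive():
            proc.terminate()  # kill a child that is still stuck
        proc.join()
        recv_conn.close()


def slow_parse(text):
    """Hypothetical stand-in for a parser call that does not return in time."""
    time.sleep(60)
    return text


if __name__ == "__main__":
    print(run_with_timeout(len, "abc", 5.0))
    print(run_with_timeout(slow_parse, "molblock text", 0.5))
```

With RDKit installed, the callable could be a small wrapper around `Chem.MolFromMolBlock` that returns something picklable such as a SMILES string; the timeout value is arbitrary and would need tuning for real data.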

Configuration (please complete the following information):

  • RDKit version: 2023.03.1
  • OS: Ubuntu 20.04
  • Python version (if relevant): 3.11
  • If you are not using conda: how did you install the RDKit? pip install rdkit
@eloyfelix eloyfelix added the bug label Jun 2, 2023
@eloyfelix eloyfelix changed the title RDKit indefinetely hangs when parsing not so big molblock RDKit hangs indefinitely when parsing not so big molblock Jun 2, 2023
klaasMensaert commented Jul 10, 2023

Same here for the following molecule:

    molfile = """    
    Marvin  07050610402D          

 16 15  0  0  0  0            999 V2000
    7.6621   -4.0706    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.9496   -4.4831    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.2183   -4.0706    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.5058   -4.4831    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.6621   -3.2456    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    8.3746   -4.4831    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    5.5058   -5.3269    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.7933   -4.0706    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    6.9496   -5.3269    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.7933   -1.5768    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.2183   -3.2456    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.9328   -2.8331    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.5058   -2.8143    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.7913   -3.2268    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.5058   -1.9893    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.2203   -1.5768    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  6  1  1  0  0  0  0
  5  1  2  0  0  0  0
  3  2  2  0  0  0  0
  2  9  1  4  0  0  0
  4  3  1  0  0  0  0
 11  3  1  0  0  0  0
  8  4  1  0  0  0  0
  7  4  2  0  0  0  0
 13 11  2  0  0  0  0
 15 13  1  0  0  0  0
 10 15  2  0  0  0  0
 11 12  1  4  0  0  0
 13 14  1  4  0  0  0
 15 16  1  0  0  0  0
M  CHG  2   6  -1   8  -1
M  END
"""

klaasMensaert commented Jul 10, 2023

It hangs for me with versions 2023.03.2 and 2023.03.1, but not with 2022.9.5.
OS: Ubuntu 20.04.6 LTS and CentOS Linux release 7.9.2009 (tested on both)
Python version (if relevant): 3.8
If you are not using conda: how did you install the RDKit? pycharm
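
Both molblocks posted so far contain single bonds whose V2000 stereo field is 4 ("either", usually drawn as a wavy bond): `9 14` and `24 29` in the first, and `2 9`, `11 12` and `13 14` in the second. Whether these bonds are what triggers the hang is not established in this thread, but they are easy to list when triaging other files. A minimal stdlib-only sketch, assuming well-formed fixed-width V2000 input; `either_bonds` and `EXAMPLE` are hypothetical names:

```python
def either_bonds(molblock):
    """Return 1-based (begin, end) atom pairs of bonds with V2000 stereo flag 4."""
    lines = molblock.splitlines()
    # Locate the counts line by its version tag rather than by position,
    # so a leading blank line in the string does not matter.
    counts_idx = next(
        i for i, ln in enumerate(lines) if ln.rstrip().endswith("V2000")
    )
    counts = lines[counts_idx]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    first_bond = counts_idx + 1 + n_atoms

    hits = []
    for ln in lines[first_bond:first_bond + n_bonds]:
        # Bond line layout: three-character fields for atom 1, atom 2,
        # bond order and bond stereo.
        begin, end, _order, stereo = (int(ln[i:i + 3]) for i in (0, 3, 6, 9))
        if stereo == 4:
            hits.append((begin, end))
    return hits


# Tiny made-up molblock used only to exercise the function.
EXAMPLE = """
  sketch

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    1.1000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  4  0  0  0
M  END"""

if __name__ == "__main__":
    print(either_bonds(EXAMPLE))
```

The function only reads the counts line and the bond block, so it does not need RDKit and cannot hang on stereo perception.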

bp-kelley (Contributor) commented

I've confirmed this; it looks like the bug was introduced in the legacy chirality detection. The fix, for now, is to switch to the new chirality perception by turning the legacy code path off:

Chem.SetUseLegacyStereoPerception(False)
