Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMILES and SMARTS parse bonds in a different order #5683

Closed
ricrogz opened this issue Oct 24, 2022 · 4 comments
Closed

SMILES and SMARTS parse bonds in a different order #5683

ricrogz opened this issue Oct 24, 2022 · 4 comments
Labels
Milestone

Comments

@ricrogz
Copy link
Contributor

ricrogz commented Oct 24, 2022

This is a relevant issue when parsing strings with CX extensions, since these may include specifications that depend on bond indexes.

I noticed the issue because of a "incorrect" SMILES string:

from rdkit import Chem

# Note atom 3 is not connected to bond 1.
cxsmiles = 'O=C(C=CCC1CCCCC1)N1N=Cc2ccccc2C1c1ccccc1 |w:3.1|'

# This is the right behavior: the CX extension is incorrect, and since strictCXSMILES==true by default it should fail parsing
m1 = Chem.MolFromSmiles(cxsmiles)
# [10:01:43] atom 3 is not associated with bond 1 in w block

m1 is None
# True


# This is not right: this should fail too, but it doesn't!
m2 = Chem.MolFromSmarts(cxsmiles)
m2
# <rdkit.Chem.rdchem.Mol at 0x7f4f08145c40>

The smiles string should also fail to parse as SMARTS, but it doesn't, we get a mol back!

Looking into it, I stripped the CX extension from the string, and tried to parse it again, both as SMILES and SMARTS, and noticed the order of the bonds in the mol is different. Also, notice that bond 1 in the SMARTS is a double bond, so it doesn't make sense that it is wiggly (probably the parsed should check that too):

mol_smiles.Debug()
# [...]
# Bonds:
# 	0 0->1 order: 2 conj?: 1 aromatic?: 0
# 	1 1->2 order: 1 conj?: 1 aromatic?: 0
# 	2 2->3 order: 2 conj?: 1 aromatic?: 0
# 	3 3->4 order: 1 conj?: 0 aromatic?: 0
# 	4 4->5 order: 1 conj?: 0 aromatic?: 0
# 	5 5->6 order: 1 conj?: 0 aromatic?: 0
# 	6 6->7 order: 1 conj?: 0 aromatic?: 0
# 	7 7->8 order: 1 conj?: 0 aromatic?: 0
# 	8 8->9 order: 1 conj?: 0 aromatic?: 0
# 	9 9->10 order: 1 conj?: 0 aromatic?: 0
# 	10 1->11 order: 1 conj?: 1 aromatic?: 0

mol_smarts.Debug()
# [...]
# Bonds:
# 	0 0->1 order: 2 conj?: 0 aromatic?: 0
# 	1 2->3 order: 2 conj?: 0 aromatic?: 0
# 	2 3->4 order: 1 conj?: 0 aromatic?: 0
# 	3 4->5 order: 1 conj?: 0 aromatic?: 0
# 	4 5->6 order: 1 conj?: 0 aromatic?: 0
# 	5 6->7 order: 1 conj?: 0 aromatic?: 0
# 	6 7->8 order: 1 conj?: 0 aromatic?: 0
# 	7 8->9 order: 1 conj?: 0 aromatic?: 0
# 	8 9->10 order: 1 conj?: 0 aromatic?: 0
# 	9 2->1 order: 1 conj?: 0 aromatic?: 0
# 	10 1->11 order: 1 conj?: 0 aromatic?: 0

Note how the bond between atoms 1 and 2 has moved from index 1 in the SMILES to index 9 in the SMARTS. I think this doesn't have any impact on the Mol itself, but it breaks parsing the CX extensions (either for SMILES or for SMARTS, depending how the mol was generated), since the indexes of the bonds might no longer match the order in the literal string.

@ricrogz ricrogz added the bug label Oct 24, 2022
@mwojcikowski
Copy link
Contributor

@ricrogz Have you tried throwing chiral atoms on this bug? Looks like the chirality could flip, which might be related to #3096

@ricrogz
Copy link
Contributor Author

ricrogz commented Oct 24, 2022

@ricrogz Have you tried throwing chiral atoms on this bug? Looks like the chirality could flip, which might be related to #3096

I have now, but it doesn't seem the same issue. In what I wrote above, it can be seen that it is during the parsing of the SMARTS string that the bonds are not created in the same order as the positions they have in the string (the wiggly bond is moved from position 1 to 9), while in your issue, parsing the SMARTS still yields a mol with the atoms and bonds in the right orderings and with the correct chirality:

# first case in issue #3096
m = Chem.MolFromSmarts('[S@@](=O)(-C)-CC')
m.UpdatePropertyCache(False)
m.Debug()

# Atoms:
# 	0 16 S chg: 0  deg: 3 exp: 4 imp: 0 hyb: 0 arom?: 0 chi: 1
# 	1 8 O chg: 0  deg: 1 exp: 2 imp: 0 hyb: 0 arom?: 0 chi: 0
# 	2 6 C chg: 0  deg: 1 exp: 1 imp: 3 hyb: 0 arom?: 0 chi: 0
# 	3 6 C chg: 0  deg: 2 exp: 1 imp: 0 hyb: 0 arom?: 0 chi: 0
# 	4 6 C chg: 0  deg: 1 exp: 0 imp: 0 hyb: 0 arom?: 0 chi: 0
# Bonds:
# 	0 0->1 order: 2 conj?: 0 aromatic?: 0
# 	1 0->2 order: 1 conj?: 0 aromatic?: 0
# 	2 0->3 order: 1 conj?: 0 aromatic?: 0
# 	3 3->4 order: 1 conj?: 0 aromatic?: 0

@ricrogz
Copy link
Contributor Author

ricrogz commented Nov 2, 2022

Another example, this time, with a CXSMILES string created by RDKit itself:

mb = """
     RDKit          2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 9 10 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 2.366025 0.000000 0.000000 0
M  V30 2 C 0.866025 0.000000 0.000000 0
M  V30 3 C -0.433013 0.750000 0.000000 0
M  V30 4 N -0.433013 -0.750000 0.000000 0
M  V30 5 C -1.183013 -2.049038 0.000000 0
M  V30 6 C -2.683013 -2.049038 0.000000 0
M  V30 7 C -3.433013 -3.348076 0.000000 0
M  V30 8 C -3.433013 -4.848076 0.000000 0
M  V30 9 C -4.732051 -4.098076 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 3
M  V30 3 1 3 4
M  V30 4 1 4 5
M  V30 5 2 5 6 CFG=2
M  V30 6 1 6 7
M  V30 7 1 7 8
M  V30 8 1 8 9
M  V30 9 1 4 2
M  V30 10 1 9 7
M  V30 END BOND
M  V30 END CTAB
M  END
"""

m = Chem.MolFromMolBlock(mb)

# The crossed double bond will be encoded as "w:" CX extension
# which requires an atom and a bond index.
b = m.GetBondWithIdx(4)
assert b.GetBondType() == Chem.BondType.DOUBLE
assert b.HasProp('_MolFileBondCfg')
assert b.GetIntProp('_MolFileBondCfg') == 2

m.RemoveAllConformers()  # Don't write coords to CXSMILES
cxsmi = Chem.MolToCXSmiles(m)
print(cxsmi)
# CC1CN1C=CC1CC1 |w:4.5|

m2 = Chem.MolFromSmiles(cxsmi)
# [11:32:52] atom 4 is not associated with bond 5 in w block

@ricrogz
Copy link
Contributor Author

ricrogz commented Nov 4, 2022

Another example, this time, with a CXSMILES string created by RDKit itself:

mb = """
RDKit 2D
[...]

Turns out this is not the same issue as described above. We posted a fix for this in #5722

greglandrum added a commit to greglandrum/rdkit that referenced this issue Nov 18, 2022
more testing required
greglandrum added a commit that referenced this issue Nov 23, 2022
* Fixes #5683
more testing required

* add explicit test for issue

* minor refactoring

* add forgotten file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants