Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running kekulization on mols with query bonds will either fail or return incorrect results. #5505

Closed
ricrogz opened this issue Aug 17, 2022 · 4 comments
Labels
Milestone

Comments

@ricrogz
Copy link
Contributor

ricrogz commented Aug 17, 2022

There's cases in which doing this completely fails:

sma = '[#6]-c1cccc(-[#6])c1'
mol = Chem.MolFromSmarts(sma)
Chem.Kekulize(mol)
============================================
[16:13:14] Can't kekulize mol.  Unkekulized atoms: 1 5
---------------------------------------------------------------------------
KekulizeException                         Traceback (most recent call last)
Input In [2], in <cell line: 7>()
      5 sma = '[#6]-c1cccc(-[#6])c1'
      6 mol = Chem.MolFromSmarts(sma)
----> 7 Chem.Kekulize(mol)

KekulizeException: Can't kekulize mol.  Unkekulized atoms: 1 5

And other that succeed with only slight differences in the structure, although the result is nonsense (note the bond orders):

sma = '[#6]-c1ccccc(-[#6])1'
mol = Chem.MolFromSmarts(sma)
Chem.Kekulize(mol)
mol.Debug()
============================================
Atoms:
	0 6 C chg: 0  deg: 1 exp: 1 imp: 3 hyb: 0 arom?: 0 chi: 0
	1 6 C chg: 0  deg: 3 exp: 1 imp: 0 hyb: 0 arom?: 1 chi: 0
	2 6 C chg: 0  deg: 2 exp: 0 imp: 0 hyb: 0 arom?: 1 chi: 0
	3 6 C chg: 0  deg: 2 exp: 0 imp: 0 hyb: 0 arom?: 1 chi: 0
	4 6 C chg: 0  deg: 2 exp: 0 imp: 0 hyb: 0 arom?: 1 chi: 0
	5 6 C chg: 0  deg: 2 exp: 0 imp: 0 hyb: 0 arom?: 1 chi: 0
	6 6 C chg: 0  deg: 3 exp: 1 imp: 0 hyb: 0 arom?: 1 chi: 0
	7 6 C chg: 0  deg: 1 exp: 1 imp: 3 hyb: 0 arom?: 0 chi: 0
Bonds:
	0 0->1 order: 1 conj?: 0 aromatic?: 0
	1 1->2 order: 1 conj?: 0 aromatic?: 1
	2 2->3 order: 1 conj?: 0 aromatic?: 1
	3 3->4 order: 1 conj?: 0 aromatic?: 1
	4 4->5 order: 1 conj?: 0 aromatic?: 1
	5 5->6 order: 1 conj?: 0 aromatic?: 1
	6 6->7 order: 1 conj?: 0 aromatic?: 0
	7 6->1 order: 2 conj?: 0 aromatic?: 1

I think this is due to the code at

if (QueryOps::hasComplexBondTypeQuery(*bnd)) {
, which picks up the "SingleOrAromaticBond" queries between the aromatic atoms, so that the implicit valences for these atoms are set to 0, which causes problems in the detection of potentially aromatizable bonds.

I think we should either:

  • Add a check to KekulizeFragment() to fail / exit early if bond order queries are detected.
  • Make the code in markDbondCands() more flexible, at
    if (dv == (sbo + 1 + nRadicals)) {
    dBndCands[allAtm] = 1;
    } else if (!nRadicals && at->getNoImplicit() && dv == (sbo + 2)) {
    // special case: there is currently no radical on the atom, but if
    , so that we allow bonds which are below their valence to support double bonds too. Although I have some concerns about performance if we do this.
@ricrogz ricrogz added the bug label Aug 17, 2022
@bp-kelley
Copy link
Contributor

Here is a simpler case

>>> sma = 'c1ccccc1C'
>>> mol = Chem.MolFromSmarts(sma)
>>> Chem.Kekulize(mol)
>>> sma = 'c1ccccc1-[#6]'
>>> mol = Chem.MolFromSmarts(sma)
>>> Chem.Kekulize(mol)
[16:42:15] Can't kekulize mol.  Unkekulized atoms: 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
rdkit.Chem.rdchem.KekulizeException: Can't kekulize mol.  Unkekulized atoms: 5

@bp-kelley
Copy link
Contributor

In the case of 'c1cccc1C' the kekulization is just wrong, IMO:

>>> Chem.MolToSmiles(mol, allBondsExplicit=True)
'C-c1-c-c-c-c-c-1'

@bp-kelley
Copy link
Contributor

Here is a recipe that "works"

>>> sma = 'c1ccccc1[#6]'
>>> mol = Chem.MolFromSmarts(sma)
>>> Chem.MolToSmiles(mol, allBondsExplicit=True)
'C-c1:c:c:c:c:c:1'
>>> mh = Chem.MolFromSmiles(Chem.MolToSmiles(mol, allBondsExplicit=True))
>>> Chem.MolToSmiles(mh)
'Cc1ccccc1'
>>> Chem.Kekulize(mh)
>>> Chem.MolToSmiles(mh, allBondsExplicit=True)
'C-c1=c-c=c-c=c-1'

@greglandrum
Copy link
Member

My two cents on this, after a conversation with @ricrogz: We shouldn't be trying to kekulize query bonds.
The question is whether we should throw an execption (perhaps a ValueError since the input is not valid) or just ignore those bonds during kekulization
I was leaning towards the exception approach, but now favor ignoring them since I'm concerned that throwing an exception may be very disruptive.

greglandrum added a commit to greglandrum/rdkit that referenced this issue Sep 9, 2022
@greglandrum greglandrum added this to the 2022_09_1 milestone Sep 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants