Fixes #4721 #4722

ptosco · 2021-11-21T20:40:38Z

This PR fixes #4721. The fix is straightforward.

greglandrum · 2021-11-23T03:57:34Z

Hi @ptosco : the logic here fails for molecules where the dummy atom is in an aromatic ring but is connected to an atom which is not in one:

In [6]: Chem.CanonSmiles('C1CC*2ccccc21')
[04:55:39] Can't kekulize mol.  Unkekulized atoms: 4 5 6 7 8

ptosco · 2021-11-23T07:45:16Z

@greglandrum Right, that was way too straightforward to be true :-) I'll rework the PR it as soon as I have a bit of time.

greglandrum · 2021-11-23T08:01:08Z

@greglandrum Right, that was way too straightforward to be true :-) I'll rework the PR it as soon as I have a bit of time.

yeah, messing with the Kekulization code can't possibly be that easy. :-)

…gs to rather tham just their size - expand the RingInfo API with a few useful methods - identify rings that are certainly aliphatic upfront - avoid unnecessary copying atomRings when RingInfo is already initialized

ptosco · 2021-11-30T21:24:08Z

@greglandrum Don't be scared off by the number of changes.
Actually the fix is pretty simple and robust, and also computationally efficient (I replaced the std::find with a bitset).
All the RingInfo changes are about something I have had in mind for a while and now had a good opportunity to implement. I think that having atom/bond ring memberships available can be handy under many circumstances and will avoid users having to loop over atomRings and bondRings themselves to get that information.

ptosco · 2021-12-01T15:45:46Z

@greglandrum I'd like to improve further on this.
I don't think it is correct, as master currently does (and this PR does not modify) that

Chem.MolFromSmiles("N1****1")

resolves to

I would like not to see any aromatic bonds.
The same applies to this:

Chem.MolFromSmiles("c1ccccc1N1****1")

I believe that in the absence of at least one non-ambiguously aromatic atom in the ring with dummies, no aromatic bonds should be set, i.e. in both cases I'd like to see a dummy-pyrrolidine, not a dummy-pyrrole.

This is still broken, too:

Chem.MolFromSmiles("c1ccccc1C1****1")

@greglandrum Do you agree with me? If so, I'll go ahead.

greglandrum · 2021-12-01T16:05:48Z

@ptosco : yes, I completely agree. There should be one atom which is clearly aromatic

- better handling of dummies in aromatic rings - exposed atomMembers() and bondMembers() - added several tests

ptosco · 2022-01-09T18:14:22Z

@greglandrum this is now ready for review. In addition to the changes already discussed above I have done a pass of code modernization and cleanup. Apart from stylistic changes (replacement of boost::tie with boost::make_iterator_range, replacement of a few loops with stdlib calls) there are also a couple of optimizations (removed unnecessary std::map initializations and two unnecessary copies).
None of the changes breaks any existing test, and the tests that we discussed above and a few more that I thought of while working on this all pass.

- added test for the above

greglandrum

Some changes suggested.

Code/GraphMol/Aromaticity.cpp

Code/GraphMol/RingInfo.cpp

greglandrum · 2022-01-27T03:45:28Z

Code/GraphMol/Kekulize.cpp

+  unsigned int ri = 0;
+  for (const auto &aring : mol.getRingInfo()->atomRings()) {
+    isRingNotCand.set(ri);
+    for (auto ai : aring) {
+      const auto at = mol.getAtomWithIdx(ai);
+      if (isAromaticAtom(*at) && mol.getRingInfo()->numAtomRings(ai) == 1) {
+        isRingNotCand.reset(ri);
+        break;
+      }
+    }
+    ++ri;
+  }


Just a comment (not a request for a code change):
I'm starting to think that loops where you have to track an index (like this one) are better written as a loop over the index instead of using a range-based for. It's just too easy to mess up the indexing in the range-based for version.

Aside: it would be nice if C++ had the equivalent of Python's enumerate() so that we could loop over index and element. I think this would be writeable with C++17.

Code/GraphMol/Kekulize.cpp

Code/GraphMol/RingInfo.cpp

greglandrum · 2022-01-27T05:04:05Z

@ptosco : sorry it took me so long to get to this one.

ptosco · 2022-01-27T21:44:58Z

@greglandrum I should have addressed all your remarks; please shou if that’s not the case. Thanks for reviewing!

greglandrum · 2022-01-28T14:06:57Z

@ptosco since making behavioral changes to core pieces like kekulization makes me nervous, I tried a few things.

This behavior doesn't feel right to me and I'm wondering if you can explain why it is right:

ptosco · 2022-01-28T14:52:44Z

@greglandrum It makes me nervous too, be assured, and I am totally for being conservative for low-level algorithms like kekulization/aromaticity as it is easy to break things. However, we should consider that the way the current aromaticity code deals with dummy atoms is not satisfactory (see examples above where structures are arbitrarily aromatized) and we should improve it if possible, rather than simply sticking to an unsatisfactory status quo.
That said, I can try and explain my reasoning.

In the first example:

m = Chem.MolFromSmiles('C1=C*2=CC=CC=*2C=C1')
m

for b in m.GetBonds():
    print (b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondType())
0 1 AROMATIC
1 2 AROMATIC
2 3 AROMATIC
3 4 AROMATIC
4 5 AROMATIC
5 6 AROMATIC
6 7 AROMATIC
7 8 AROMATIC
8 9 AROMATIC
9 0 AROMATIC
7 2 AROMATIC

All 5 double bonds were already positioned in the SMILES, the system is fully conjugated and satisfies Hückel, so it is labelled as aromatic.

In the second example

m = Chem.MolFromSmiles('C1=C*2C=CC=C*2C=C1')
m

for b in m.GetBonds():
    print (b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondType())
0 1 DOUBLE
1 2 SINGLE
2 3 SINGLE
3 4 DOUBLE
4 5 SINGLE
5 6 DOUBLE
6 7 SINGLE
7 8 SINGLE
8 9 DOUBLE
9 0 SINGLE
7 2 SINGLE

only 4 double bonds were positioned. With the available information, it is not possible to guess if the user wished the dummy atoms to be sp3 or sp2-hybridized. As there is no certainty on the intention of the user to represent an aromatic system or rather a bicyclotetraene, the system is left as is, as we discussed above.
If the user explicitly marks the system as aromatic, fully kekulizing it or explicitly marking atoms as aromatic, things will work as expected:

m = Chem.MolFromSmiles('c1c*2cccc*2cc1')
m

for b in m.GetBonds():
    print (b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondType())
0 1 AROMATIC
1 2 AROMATIC
2 3 AROMATIC
3 4 AROMATIC
4 5 AROMATIC
5 6 AROMATIC
6 7 AROMATIC
7 8 AROMATIC
8 9 AROMATIC
9 0 AROMATIC
7 2 AROMATIC

The idea is that kekulization is inhibited unless there is certainty on the aromatic nature of the dummies. That seems a reasonable assumption to me, but I may certainly be wrong, and I am looking forward to hearing your opinion.

greglandrum · 2022-01-28T15:07:30Z

@ptosco : that explanation makes perfect sense to me and I'm 👍 on the logic

greglandrum

LGTM

bp-kelley · 2022-01-28T16:08:35Z

Sorry for the lateness, but we did have some code to do this in the reaction sanitizer, in aromatize if possible for smarts patterns.

>>> rxn = AllChem.ReactionFromSmarts("C1=C*2C=CC=C*2C=C1>>C")
>>> Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'C1=C*2C=CC=C*2C=C1'
>>> AllChem.SanitizeRxn(rxn)
[11:05:53] Mismatched potential rlabels: 2 unmapped reactant dummy atom rlabels,0 unmappped product dummy atom rlabels
rdkit.Chem.rdChemReactions.SanitizeFlags.SANITIZE_NONE
>>> Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'c1cc*2cccc*:2c1'

I'm just curious about the differences here.

ptosco · 2022-01-28T16:39:33Z

@bp-kelley Hi Brian, if params.aromatizeIfPossible is true (as it is, by default), adjustQueryProperties() calls sanitizeMol to set aromatic flags:

  if (params.aromatizeIfPossible) {
    unsigned int failed;
    sanitizeMol(mol, failed, SANITIZE_SYMMRINGS | SANITIZE_SETAROMATICITY);
  } else {
[...]

After this PR has been merged, sanitizing with SANITIZE_SETAROMATICITY does not aromatize the structure anymore, as it is not obvious if the user meant to treat the dummies as sp2 (hence aromatic) or sp3 (non-aromatic):

rxn = rdChemReactions.ReactionFromSmarts("C1=C*2C=CC=C*2C=C1>>C")
rxn

Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'C1=C*2C=CC=C*2C=C1'
rdChemReactions.SanitizeRxn(rxn)
[17:25:08] Mismatched potential rlabels: 2 unmapped reactant dummy atom rlabels,0 unmappped product dummy atom rlabels
rdkit.Chem.rdChemReactions.SanitizeFlags.SANITIZE_NONE
Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'C1=C*2C=CC=C*2C=C1'

However, if you add an explicit double bond between the dummies (or use aromatic atoms/bonds in the SMARTS), the ambiguity is solved and therefore the structure will be aromatized upon sanitization:

rxn = rdChemReactions.ReactionFromSmarts("C1=C*2C=CC=C*=2C=C1>>C")
rxn

Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'C1=C*2=*(C=C1)C=CC=C2'
rdChemReactions.SanitizeRxn(rxn)
[17:34:19] Mismatched potential rlabels: 2 unmapped reactant dummy atom rlabels,0 unmappped product dummy atom rlabels
rdkit.Chem.rdChemReactions.SanitizeFlags.SANITIZE_NONE
Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'c1cc*2cccc*:2c1'

So the behavior is consistent with what described in my comment above.

greglandrum · 2022-01-28T17:03:22Z

@bp-kelley I think it’s important to remember (and I had to keep reminding myself of this) that the PR only changes the behavior for rings which contain dummy atoms. And then only for some of those, as @ptosco describes above.

fixes rdkit#4721

78de8bc

Tosco, Paolo added 2 commits November 30, 2021 21:56

Merge remote-tracking branch 'upstream/master' into github4716

3f9b101

ptosco marked this pull request as draft December 1, 2021 15:39

Tosco, Paolo added 2 commits January 8, 2022 23:58

Merge remote-tracking branch 'upstream/master' into github4716

5da2a06

- code modernization and cleanup

f44aee0

- better handling of dummies in aromatic rings - exposed atomMembers() and bondMembers() - added several tests

ptosco marked this pull request as ready for review January 9, 2022 18:08

Tosco, Paolo added 2 commits January 24, 2022 18:43

Merge remote-tracking branch 'upstream/master' into github4716

39c82c8

- avoid order dependency on rings

8c9009a

- added test for the above

greglandrum requested changes Jan 27, 2022

View reviewed changes

Tosco, Paolo added 3 commits January 27, 2022 14:46

Merge remote-tracking branch 'upstream/master' into github4716

3b9a3d2

changes in response to review

90fff3d

Merge remote-tracking branch 'upstream/master' into github4716

76825b2

greglandrum approved these changes Jan 28, 2022

View reviewed changes

greglandrum merged commit dc48931 into rdkit:master Jan 28, 2022

rachelnwalker mentioned this pull request Jul 7, 2022

Fix performance issue in RingUtils::checkFused #5410

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes #4721 #4722

Fixes #4721 #4722

ptosco commented Nov 21, 2021

greglandrum commented Nov 23, 2021

ptosco commented Nov 23, 2021

greglandrum commented Nov 23, 2021

ptosco commented Nov 30, 2021 •

edited

ptosco commented Dec 1, 2021 •

edited

greglandrum commented Dec 1, 2021

ptosco commented Jan 9, 2022

greglandrum left a comment

greglandrum Jan 27, 2022

greglandrum commented Jan 27, 2022

ptosco commented Jan 27, 2022

greglandrum commented Jan 28, 2022

ptosco commented Jan 28, 2022

greglandrum commented Jan 28, 2022

greglandrum left a comment

bp-kelley commented Jan 28, 2022

ptosco commented Jan 28, 2022

greglandrum commented Jan 28, 2022

Fixes #4721 #4722

Fixes #4721 #4722

Conversation

ptosco commented Nov 21, 2021

greglandrum commented Nov 23, 2021

ptosco commented Nov 23, 2021

greglandrum commented Nov 23, 2021

ptosco commented Nov 30, 2021 • edited

ptosco commented Dec 1, 2021 • edited

greglandrum commented Dec 1, 2021

ptosco commented Jan 9, 2022

greglandrum left a comment

Choose a reason for hiding this comment

greglandrum Jan 27, 2022

Choose a reason for hiding this comment

greglandrum commented Jan 27, 2022

ptosco commented Jan 27, 2022

greglandrum commented Jan 28, 2022

ptosco commented Jan 28, 2022

greglandrum commented Jan 28, 2022

greglandrum left a comment

Choose a reason for hiding this comment

bp-kelley commented Jan 28, 2022

ptosco commented Jan 28, 2022

greglandrum commented Jan 28, 2022

ptosco commented Nov 30, 2021 •

edited

ptosco commented Dec 1, 2021 •

edited