Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #4721 #4722

Merged
merged 10 commits into from
Jan 28, 2022
Merged

Fixes #4721 #4722

merged 10 commits into from
Jan 28, 2022

Conversation

ptosco
Copy link
Contributor

@ptosco ptosco commented Nov 21, 2021

This PR fixes #4721. The fix is straightforward.

@greglandrum
Copy link
Member

Hi @ptosco : the logic here fails for molecules where the dummy atom is in an aromatic ring but is connected to an atom which is not in one:

In [6]: Chem.CanonSmiles('C1CC*2ccccc21')
[04:55:39] Can't kekulize mol.  Unkekulized atoms: 4 5 6 7 8

@ptosco
Copy link
Contributor Author

ptosco commented Nov 23, 2021

@greglandrum Right, that was way too straightforward to be true :-) I'll rework the PR it as soon as I have a bit of time.

@greglandrum
Copy link
Member

@greglandrum Right, that was way too straightforward to be true :-) I'll rework the PR it as soon as I have a bit of time.

yeah, messing with the Kekulization code can't possibly be that easy. :-)

Tosco, Paolo added 2 commits November 30, 2021 21:56
…gs to rather tham just their size

- expand the RingInfo API with a few useful methods
- identify rings that are certainly aliphatic upfront
- avoid unnecessary copying atomRings when RingInfo is already initialized
@ptosco
Copy link
Contributor Author

ptosco commented Nov 30, 2021

@greglandrum Don't be scared off by the number of changes.
Actually the fix is pretty simple and robust, and also computationally efficient (I replaced the std::find with a bitset).
All the RingInfo changes are about something I have had in mind for a while and now had a good opportunity to implement. I think that having atom/bond ring memberships available can be handy under many circumstances and will avoid users having to loop over atomRings and bondRings themselves to get that information.

@ptosco ptosco marked this pull request as draft December 1, 2021 15:39
@ptosco
Copy link
Contributor Author

ptosco commented Dec 1, 2021

@greglandrum I'd like to improve further on this.
I don't think it is correct, as master currently does (and this PR does not modify) that

Chem.MolFromSmiles("N1****1")

resolves to
image
I would like not to see any aromatic bonds.
The same applies to this:

Chem.MolFromSmiles("c1ccccc1N1****1")

image
I believe that in the absence of at least one non-ambiguously aromatic atom in the ring with dummies, no aromatic bonds should be set, i.e. in both cases I'd like to see a dummy-pyrrolidine, not a dummy-pyrrole.

This is still broken, too:

Chem.MolFromSmiles("c1ccccc1C1****1")

image

@greglandrum Do you agree with me? If so, I'll go ahead.

@greglandrum
Copy link
Member

@ptosco : yes, I completely agree. There should be one atom which is clearly aromatic

Tosco, Paolo added 2 commits January 8, 2022 23:58
- better handling of dummies in aromatic rings
- exposed atomMembers() and bondMembers()
- added several tests
@ptosco ptosco marked this pull request as ready for review January 9, 2022 18:08
@ptosco
Copy link
Contributor Author

ptosco commented Jan 9, 2022

@greglandrum this is now ready for review. In addition to the changes already discussed above I have done a pass of code modernization and cleanup. Apart from stylistic changes (replacement of boost::tie with boost::make_iterator_range, replacement of a few loops with stdlib calls) there are also a couple of optimizations (removed unnecessary std::map initializations and two unnecessary copies).
None of the changes breaks any existing test, and the tests that we discussed above and a few more that I thought of while working on this all pass.

Copy link
Member

@greglandrum greglandrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes suggested.

Code/GraphMol/Aromaticity.cpp Outdated Show resolved Hide resolved
Code/GraphMol/Aromaticity.cpp Outdated Show resolved Hide resolved
Code/GraphMol/Aromaticity.cpp Outdated Show resolved Hide resolved
Code/GraphMol/Aromaticity.cpp Outdated Show resolved Hide resolved
Code/GraphMol/RingInfo.cpp Show resolved Hide resolved
Code/GraphMol/RingInfo.cpp Outdated Show resolved Hide resolved
Comment on lines +87 to +98
unsigned int ri = 0;
for (const auto &aring : mol.getRingInfo()->atomRings()) {
isRingNotCand.set(ri);
for (auto ai : aring) {
const auto at = mol.getAtomWithIdx(ai);
if (isAromaticAtom(*at) && mol.getRingInfo()->numAtomRings(ai) == 1) {
isRingNotCand.reset(ri);
break;
}
}
++ri;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment (not a request for a code change):
I'm starting to think that loops where you have to track an index (like this one) are better written as a loop over the index instead of using a range-based for. It's just too easy to mess up the indexing in the range-based for version.

Aside: it would be nice if C++ had the equivalent of Python's enumerate() so that we could loop over index and element. I think this would be writeable with C++17.

Code/GraphMol/Kekulize.cpp Outdated Show resolved Hide resolved
Code/GraphMol/Kekulize.cpp Show resolved Hide resolved
Code/GraphMol/RingInfo.cpp Outdated Show resolved Hide resolved
@greglandrum
Copy link
Member

@ptosco : sorry it took me so long to get to this one.

@ptosco
Copy link
Contributor Author

ptosco commented Jan 27, 2022

@greglandrum I should have addressed all your remarks; please shou if that’s not the case. Thanks for reviewing!

@greglandrum
Copy link
Member

@ptosco since making behavioral changes to core pieces like kekulization makes me nervous, I tried a few things.

This behavior doesn't feel right to me and I'm wondering if you can explain why it is right:
image

@ptosco
Copy link
Contributor Author

ptosco commented Jan 28, 2022

@greglandrum It makes me nervous too, be assured, and I am totally for being conservative for low-level algorithms like kekulization/aromaticity as it is easy to break things. However, we should consider that the way the current aromaticity code deals with dummy atoms is not satisfactory (see examples above where structures are arbitrarily aromatized) and we should improve it if possible, rather than simply sticking to an unsatisfactory status quo.
That said, I can try and explain my reasoning.

In the first example:

m = Chem.MolFromSmiles('C1=C*2=CC=CC=*2C=C1')
m

image

for b in m.GetBonds():
    print (b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondType())
0 1 AROMATIC
1 2 AROMATIC
2 3 AROMATIC
3 4 AROMATIC
4 5 AROMATIC
5 6 AROMATIC
6 7 AROMATIC
7 8 AROMATIC
8 9 AROMATIC
9 0 AROMATIC
7 2 AROMATIC

All 5 double bonds were already positioned in the SMILES, the system is fully conjugated and satisfies Hückel, so it is labelled as aromatic.

In the second example

m = Chem.MolFromSmiles('C1=C*2C=CC=C*2C=C1')
m

image

for b in m.GetBonds():
    print (b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondType())
0 1 DOUBLE
1 2 SINGLE
2 3 SINGLE
3 4 DOUBLE
4 5 SINGLE
5 6 DOUBLE
6 7 SINGLE
7 8 SINGLE
8 9 DOUBLE
9 0 SINGLE
7 2 SINGLE

only 4 double bonds were positioned. With the available information, it is not possible to guess if the user wished the dummy atoms to be sp3 or sp2-hybridized. As there is no certainty on the intention of the user to represent an aromatic system or rather a bicyclotetraene, the system is left as is, as we discussed above.
If the user explicitly marks the system as aromatic, fully kekulizing it or explicitly marking atoms as aromatic, things will work as expected:

m = Chem.MolFromSmiles('c1c*2cccc*2cc1')
m

image

for b in m.GetBonds():
    print (b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondType())
0 1 AROMATIC
1 2 AROMATIC
2 3 AROMATIC
3 4 AROMATIC
4 5 AROMATIC
5 6 AROMATIC
6 7 AROMATIC
7 8 AROMATIC
8 9 AROMATIC
9 0 AROMATIC
7 2 AROMATIC

The idea is that kekulization is inhibited unless there is certainty on the aromatic nature of the dummies. That seems a reasonable assumption to me, but I may certainly be wrong, and I am looking forward to hearing your opinion.

@greglandrum
Copy link
Member

@ptosco : that explanation makes perfect sense to me and I'm 👍 on the logic

Copy link
Member

@greglandrum greglandrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@greglandrum greglandrum merged commit dc48931 into rdkit:master Jan 28, 2022
@bp-kelley
Copy link
Contributor

Sorry for the lateness, but we did have some code to do this in the reaction sanitizer, in aromatize if possible for smarts patterns.

>>> rxn = AllChem.ReactionFromSmarts("C1=C*2C=CC=C*2C=C1>>C")
>>> Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'C1=C*2C=CC=C*2C=C1'
>>> AllChem.SanitizeRxn(rxn)
[11:05:53] Mismatched potential rlabels: 2 unmapped reactant dummy atom rlabels,0 unmappped product dummy atom rlabels
rdkit.Chem.rdChemReactions.SanitizeFlags.SANITIZE_NONE
>>> Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'c1cc*2cccc*:2c1'

I'm just curious about the differences here.

@ptosco
Copy link
Contributor Author

ptosco commented Jan 28, 2022

@bp-kelley Hi Brian, if params.aromatizeIfPossible is true (as it is, by default), adjustQueryProperties() calls sanitizeMol to set aromatic flags:

  if (params.aromatizeIfPossible) {
    unsigned int failed;
    sanitizeMol(mol, failed, SANITIZE_SYMMRINGS | SANITIZE_SETAROMATICITY);
  } else {
[...]

After this PR has been merged, sanitizing with SANITIZE_SETAROMATICITY does not aromatize the structure anymore, as it is not obvious if the user meant to treat the dummies as sp2 (hence aromatic) or sp3 (non-aromatic):

rxn = rdChemReactions.ReactionFromSmarts("C1=C*2C=CC=C*2C=C1>>C")
rxn

image

Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'C1=C*2C=CC=C*2C=C1'
rdChemReactions.SanitizeRxn(rxn)
[17:25:08] Mismatched potential rlabels: 2 unmapped reactant dummy atom rlabels,0 unmappped product dummy atom rlabels
rdkit.Chem.rdChemReactions.SanitizeFlags.SANITIZE_NONE
Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'C1=C*2C=CC=C*2C=C1'

However, if you add an explicit double bond between the dummies (or use aromatic atoms/bonds in the SMARTS), the ambiguity is solved and therefore the structure will be aromatized upon sanitization:

rxn = rdChemReactions.ReactionFromSmarts("C1=C*2C=CC=C*=2C=C1>>C")
rxn

image

Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'C1=C*2=*(C=C1)C=CC=C2'
rdChemReactions.SanitizeRxn(rxn)
[17:34:19] Mismatched potential rlabels: 2 unmapped reactant dummy atom rlabels,0 unmappped product dummy atom rlabels
rdkit.Chem.rdChemReactions.SanitizeFlags.SANITIZE_NONE
Chem.MolToSmiles(rxn.GetReactantTemplate(0))
'c1cc*2cccc*:2c1'

So the behavior is consistent with what described in my comment above.

@greglandrum
Copy link
Member

@bp-kelley I think it’s important to remember (and I had to keep reminding myself of this) that the PR only changes the behavior for rings which contain dummy atoms. And then only for some of those, as @ptosco describes above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dummy atoms next to aromatic are always kekulized even when they should not
3 participants