Aromaticity edge case: when the fusing bond is double, it should be marked as aromatic. #523

Open
greglandrum opened this issue Jun 17, 2015 · 6 comments

@greglandrum
Member

Reported by Peter Shenkin
Here's a code snippet illustrating the problem:

In [29]: Chem.CanonSmiles('CC23C4=C5C6=C4C(O)2C6=C35')
Out[29]: 'CC12c3c4c1c1c-4c3C12O'

In [30]: Chem.CanonSmiles('CC23C4=C5C6=C4C2=C6C35O')
Out[30]: 'CC12c3c4c5c(c1c3=5)C42O'

A sketch of the first input:
image

A sketch of the second input:
image

And the mailing list thread:
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg04827.html

@greglandrum greglandrum added this to the 2015_09_1 milestone Jun 17, 2015
@greglandrum
Member Author

An even simpler test case:

In [31]: Chem.CanonSmiles('C1=CC2=CC=C12')
Out[31]: 'c1cc2ccc1-2'

In [32]: Chem.CanonSmiles('C1=CC2=C1C=C2')
Out[32]: 'c1cc2ccc1=2'

And the molecules:
image
and:
image

@greglandrum greglandrum modified the milestones: 2015_09_1, 2016_03_1 Nov 11, 2015
@greglandrum
Member Author

After putting some more thought into this, I've changed my mind and, given the current RDKit definition of aromaticity, I'm thinking about this a bit differently. Note: this assumes understanding of the RDKit's aromaticity definition: http://rdkit.org/docs/RDKit_Book.html#aromaticity .

In each of the examples above the two structures are resonance forms of each other. The RDKit does not attempt to standardize resonance forms of non-aromatic systems (avoiding questions about resonance forms is, IMO, one of the big arguments for using aromaticity in SMILES). This means that these two structures:
CC1=C(C)C=C1
image
CC1=CC=C1C
image
are different from the RDKit's perspective. They therefore generate different canonical SMILES:

In [5]: Chem.CanonSmiles('CC1=C(C)C=C1')
Out[5]: 'CC1=C(C)C=C1'

In [6]: Chem.CanonSmiles('CC1=CC=C1C')
Out[6]: 'CC1=CC=C1C'

The examples above are the same thing, just with the added complication of aromaticity thrown in. Since the individual 4-rings are not aromatic (they have 4 pi electrons each), but the full fused system is (it has 6 pi electrons), the envelope bonds are aromatic but the fusing bond (the one between atoms 3 and 4 in the simple example in the previous comment) is not. Since the bond isn't aromatic but is between two aromatic atoms, it is inserted in the SMILES explicitly: as a - in the first case and = in the second case.
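The electron count in this argument can be sketched in plain Python. This is only an illustration, not RDKit's implementation: it assumes every sp2 carbon donates exactly one pi electron, and it uses 0-based atom indices in the order the atoms appear in 'C1=CC2=CC=C12', which need not match the numbering in the images. The names `is_huckel`, `pi_per_atom`, and `rings` are hypothetical.

```python
def is_huckel(pi_electrons: int) -> bool:
    """Return True if the electron count satisfies the 4n+2 rule."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0

# Assumption: each of the six sp2 carbons contributes one pi electron.
pi_per_atom = {atom: 1 for atom in range(6)}

# Rings of 'C1=CC2=CC=C12' by 0-based SMILES atom order (assumed indexing).
# Atoms 2 and 5 carry the fusing bond shared by the two 4-rings.
rings = {
    "four_ring_a": (0, 1, 2, 5),
    "four_ring_b": (2, 3, 4, 5),
    "envelope": (0, 1, 2, 3, 4, 5),
}

huckel_by_ring = {
    name: is_huckel(sum(pi_per_atom[a] for a in atoms))
    for name, atoms in rings.items()
}

for name, ok in huckel_by_ring.items():
    print(name, "4n+2" if ok else "not 4n+2")
```

Under these assumptions only the envelope passes the 4n+2 test, which is consistent with the envelope bonds being flagged aromatic while the fusing bond is not.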

There's not really a fix for this that is consistent with the current handling of aromaticity and that doesn't destroy information present in the original structure.

Marking this as "won't fix".

@greglandrum greglandrum added wontfix and removed bug labels Mar 30, 2016
@greglandrum greglandrum removed this from the 2016_03_1 milestone Mar 30, 2016
@shenkin

shenkin commented Mar 30, 2016

Greg, it seems to me that the structures depicted in your note of Jun 17 are aromatic by the RDKit definition. "The use of fused rings for aromaticity can lead to situations where individual rings are not aromatic, but the fused system is." Azulene is the example shown in the RDKit book. I agree with RDKit's treatment of cyclobutadiene and biphenylene. I seem to recall that ChemAxon sees the central ring in biphenylene as aromatic, and I thought this was Not Good.

@greglandrum
Member Author

@shenkin : Yes, definitely.
The envelopes of those ring systems - the ring formed by atoms (1,3,5,6,4,2) - are aromatic, but the bond between atoms 3 and 4 is not aromatic. This is analogous to the fusing bonds in biphenylene.
The lack of aromaticity in the 3-4 bond is what gives rise to the difference in the SMILES.

@shenkin

shenkin commented Mar 30, 2016

@greglandrum

As a matter of nomenclature, in the RDKit Book description of biphenylene, you say "the fusing bonds here are not considered to be aromatic by the RDKit". I would call the bonds connecting the phenyl rings to each other "connecting" bonds. I would call the "fusing" bonds the ones within the individual phenyls that are shared with the 4-membered ring. (Organic chemists consider naphthalene and decalin to be "fused" ring systems, fused across one bond.)

The Book example shows that RDKit considers the "connecting" bonds (as I would call them) to be non-aromatic, and this makes sense to me. But I believe RDKit considers the "fusing" bonds to be aromatic. I also agree with this.

I don't think these "fusing" bonds are at all like (what I would call) the "connecting" bonds in biphenylene. The connecting bonds reside within a single ring; the fusion bonds reside within multiple rings.

So, using my definition of "fusing" for the moment, meaning a bond simultaneously included in two (or more) rings, would RDKit consider the fusing bond in azulene to be aromatic? If so, would it not be analogous to the fusing bonds in the structures shown on June 17?

@rapodaca

rapodaca commented Feb 26, 2023

I see that this issue has been labeled wontfix, but it hasn't yet been closed. In the event that this comes up again, I'd like to bring something else into the mix.

@greglandrum noted:

> Since the individual 4-rings are not aromatic (they have 4 pi electrons each), but the full fused system is (it has 6 pi electrons), the envelope bonds are aromatic but the fusing bond (the one between atoms 3 and 4 in the simple example in the previous comment) is not. Since the bond isn't aromatic but is between two aromatic atoms, it is inserted in the SMILES explicitly: as a - in the first case and = in the second case.

This situation has been explored in the literature and falls under the heading "conjugated circuit" or "conjugated cycle." See my article Conjugated Cycle Selection for a summary.

As noted, the purpose of aromatic atoms and bonds is to eliminate artificial asymmetry. That only crops up in conjugated cycles. Therefore, to make good on the promise of aromatic atoms and bonds in SMILES, it's sufficient to just select those atoms and bonds that lie along at least one conjugated cycle.

In other words, atoms and bonds in conjugated cycles are marked as aromatic. The rest are not. This can be simplified even further by only selecting those atoms that lie along conjugated cycles, provided that perfect matching is used to "de-aromatize."
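To make the "perfect matching" step concrete, here is a small sketch that enumerates every way of placing double bonds on the six-carbon skeleton from the simple test case, so that each atom gets exactly one double bond. It is an illustration only, not how RDKit or any other toolkit kekulizes; the function name `perfect_matchings` and the 0-based atom indexing (taken from the SMILES 'C1=CC2=CC=C12') are assumptions.

```python
def perfect_matchings(atoms, edges):
    """Enumerate all perfect matchings of the graph by simple backtracking.

    atoms: iterable of atom indices still unmatched
    edges: list of (a, b) pairs; each matching is a frozenset of frozenset pairs
    """
    atoms = sorted(atoms)
    if not atoms:
        return [frozenset()]
    first = atoms[0]
    found = []
    for a, b in edges:
        if first not in (a, b):
            continue
        partner = b if a == first else a
        if partner not in atoms:
            continue
        remaining = [x for x in atoms if x not in (first, partner)]
        for sub in perfect_matchings(remaining, edges):
            found.append(sub | {frozenset((a, b))})
    return found

# Sigma skeleton of 'C1=CC2=CC=C12' (assumed 0-based SMILES atom order);
# (2, 5) is the fusing bond shared by the two 4-rings.
skeleton = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (2, 5)]
kekule_structures = perfect_matchings(range(6), skeleton)

for structure in kekule_structures:
    print(sorted(tuple(sorted(bond)) for bond in structure))
```

With this indexing, the matching that contains the (2, 5) bond corresponds to the input in which the fusing bond is double, and the other two place all double bonds on the envelope.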

This simple rule can be narrowed if desired. For example, the cyclobutadiene situation is complex and in many cases dependent on "observability" of VB isomers. To avoid doubt, additional restrictions can be added to match chemical intuition/custom. For example, the atoms and bonds along conjugated cycles are aromatic, provided that the conjugated cycle has (4n+2)/2 (or 2n + 1) double bonds. And so on.
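A rough sketch of the restricted rule, for one cycle in one Kekulé structure, might look like the following. It assumes "conjugated cycle" means a cycle whose bonds strictly alternate between single and double in the given structure; that reading, the function names, and the 0-based atom indices are assumptions for illustration, and the caller is expected to pass a cycle that actually exists in the molecule.

```python
def is_conjugated_cycle(cycle, double_bonds):
    """True if bonds around `cycle` alternate double/single.

    cycle: atom indices in ring order
    double_bonds: set of frozenset({a, b}) pairs that are double bonds
    """
    n = len(cycle)
    if n < 4 or n % 2:
        return False  # strict alternation needs an even ring size
    is_double = [
        frozenset((cycle[i], cycle[(i + 1) % n])) in double_bonds
        for i in range(n)
    ]
    return all(is_double[i] != is_double[(i + 1) % n] for i in range(n))

def passes_odd_double_bond_rule(cycle, double_bonds):
    """Conjugated cycle with 2n + 1 double bonds (the 4n+2 restriction)."""
    if not is_conjugated_cycle(cycle, double_bonds):
        return False
    return (len(cycle) // 2) % 2 == 1

# Kekule structure of 'C1=CC2=CC=C12' (assumed 0-based SMILES atom order).
fused = {frozenset(p) for p in [(0, 1), (2, 3), (4, 5)]}
envelope_ok = passes_odd_double_bond_rule([0, 1, 2, 3, 4, 5], fused)
four_ring_ok = passes_odd_double_bond_rule([0, 1, 2, 5], fused)

# Plain cyclobutadiene: conjugated, but with an even number of double bonds.
cbd = {frozenset(p) for p in [(0, 1), (2, 3)]}
cbd_conjugated = is_conjugated_cycle([0, 1, 2, 3], cbd)
cbd_ok = passes_odd_double_bond_rule([0, 1, 2, 3], cbd)
```

In this sketch the six-membered envelope passes, the 4-ring containing the single fusing bond is not a conjugated cycle at all, and cyclobutadiene is conjugated but rejected by the odd-double-bond restriction.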

I think this internally consistent model is simpler than what RDKit currently uses while still addressing the core problem: eliminating artificial asymmetry introduced by the limitations of the VB model. But this simpler model is not compatible with the RDKit model, which selects atoms beyond what artificial asymmetry avoidance requires.
