Support / require reading unspecified SMARTS bonds as single-or-aromatic on query atoms #3

Open
tylerperyea opened this issue May 25, 2021 · 0 comments

From Daylight's SMARTS page:
https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

4.2 Bond Primitives

Various bond symbols are available to match connections between atoms. A missing bond symbol is interpreted as "single or aromatic".

In practice, most tools don't strictly honor this Daylight convention, and that's mostly okay. By a strict reading of the Daylight documentation, a SMARTS query of c1ccccc1 (benzene) would be interpreted as having single-or-aromatic bonds between each pair of adjacent atoms, with each atom itself having at least one aromatic bond somewhere. This is impractical to specify in a molfile.

Typically, when a tool produces a SMARTS/SMILES pattern with aromatic atoms (e.g. c) but no explicit bond between those aromatic atoms (e.g. cc), the common interpretation is that the unspecified bond is aromatic (e.g. c:c). Similarly, when a tool produces a pattern with aliphatic atoms (e.g. C) but no explicit bond (e.g. CC), the common interpretation is an implied single bond (e.g. C-C). These conventions are widely used even though they cause some problems.
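To make the contrast concrete, here is a minimal sketch of the two readings for an unspecified bond between two plain (unbracketed) organic-subset atoms. The function names are hypothetical, and treating a mixed aromatic/aliphatic pair such as cC as single is an assumption, since the paragraph above only covers cc and CC.

```python
# Plain organic-subset atom symbols (no brackets); lowercase marks aromatic atoms.
AROMATIC = {"b", "c", "n", "o", "p", "s"}
ALIPHATIC = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}


def daylight_bond(left: str, right: str) -> str:
    # Strict Daylight reading: every unspecified bond is single-or-aromatic,
    # regardless of the two atoms. The atom-level aromaticity implied by "c"
    # is a separate condition and is not modelled here.
    return "-,:"


def conventional_bond(left: str, right: str) -> str:
    # Common tool convention (hypothetical helper) for an unspecified bond.
    for atom in (left, right):
        if atom not in AROMATIC and atom not in ALIPHATIC:
            raise ValueError(f"not a plain organic-subset atom: {atom!r}")
    if left in AROMATIC and right in AROMATIC:
        return ":"  # cc is read as c:c
    return "-"      # CC is read as C-C; mixed pairs such as cC assumed single


if __name__ == "__main__":
    for a, b in [("c", "c"), ("C", "C")]:
        print(f"{a}{b}: Daylight {a}{daylight_bond(a, b)}{b}, "
              f"conventional {a}{conventional_bond(a, b)}{b}")
```

For cc the two readings differ (c-,:c versus c:c), which is exactly the gap between the Daylight text and common practice.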

The proposed compromise modifies Daylight's statement as follows:

A missing bond symbol BETWEEN ATOMS WHERE AT LEAST ONE ATOM HAS A QUERY FEATURE is interpreted as "single or aromatic".

That is, explicit non-query atoms may continue to imply the bond between them. But if at least one of the two atoms is a query atom AND the SMARTS pattern does not specify a bond type, the bond should be interpreted as single-or-aromatic. For example:

| Ambiguous SMARTS | Equivalent to |
|---|---|
| `cc` | `c:c` |
| `CC` | `C-C` |
| `C[#6]` | `C-,:[#6]` |
| `C[*]` | `C-,:[*]` |
| `[#6,#7][#6]` | `[#6,#7]-,:[#6]` |

Here a "query atom" is any atom specified as an atom list (including a one-element list such as [#6]) or as an atom wildcard.
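The proposed rule can be sketched as a small rewriter that makes implied bonds explicit. This is a minimal illustration under stated assumptions, not an implementation from any existing tool: the tokenizer is hypothetical and handles only linear patterns (no branches, ring-closure digits, or explicit bond symbols); every bracket expression or bare * is treated as a query atom; and a mixed aromatic/aliphatic pair of non-query atoms is assumed to get a single bond. It reproduces each row of the table above.

```python
import re

# Hypothetical tokenizer for linear SMARTS only: no branches "()", no
# ring-closure digits, and no explicit bond symbols. A bracket expression
# such as [#6,#7] is kept as a single atom token.
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|\*")
AROMATIC = {"b", "c", "n", "o", "p", "s"}


def tokenize(pattern: str) -> list[str]:
    tokens = ATOM_TOKEN.findall(pattern)
    # Any unmatched character (bond symbol, digit, parenthesis, ...) means
    # the pattern is outside what this sketch supports.
    if "".join(tokens) != pattern:
        raise ValueError(f"unsupported pattern for this sketch: {pattern!r}")
    return tokens


def is_query_atom(token: str) -> bool:
    # Per the proposal: an atom list (any bracket expression, including a
    # one-element list like [#6]) or a wildcard counts as a query atom.
    return token.startswith("[") or token == "*"


def implied_bond(left: str, right: str) -> str:
    if is_query_atom(left) or is_query_atom(right):
        return "-,:"  # at least one query atom: single-or-aromatic
    if left in AROMATIC and right in AROMATIC:
        return ":"    # two aromatic non-query atoms: aromatic
    return "-"        # otherwise single (assumed also for mixed pairs like cC)


def make_bonds_explicit(pattern: str) -> str:
    tokens = tokenize(pattern)
    if not tokens:
        return ""
    parts = [tokens[0]]
    for left, right in zip(tokens, tokens[1:]):
        parts += [implied_bond(left, right), right]
    return "".join(parts)


if __name__ == "__main__":
    for smarts in ["cc", "CC", "C[#6]", "C[*]", "[#6,#7][#6]"]:
        print(smarts, "->", make_bonds_explicit(smarts))
```

Rejecting anything the tokenizer cannot consume, rather than silently skipping it, keeps the sketch from mis-reading patterns that already carry explicit bonds or ring closures.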
