New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for %(NNN) notation for SMILES ring closures #1677

Merged
merged 1 commit into from Oct 30, 2017

Conversation

Projects
None yet
4 participants
@baoilleach
Member

baoilleach commented Oct 26, 2017

The % notation only supports up to 99 ring closures. I have an example of a SMILES string from RDKit that goes beyond this (included in the tests).

An extension to this is the %(...) notation which could in theory support any arbitrary number. I've implemented this with support for up to 5 digits. I'm also in the process of proposing this change to RDKit.

I can't see any downside to supporting this notation on reading. On writing, the only alternative is to refuse to write the SMILES.

I've also tightened up the parsing of the normal % notation. The existing code accepted non-digits.

(Apologies for the small number of unrelated white-space changes - my editor seems to be doing this without asking.)

@merkys

This comment has been minimized.

Show comment
Hide comment
@merkys

merkys Oct 26, 2017

Contributor

Is %(NNN) notation standard or is proposed for addition? As per OpenSMILES, it is possible to reuse already used bond IDs, what avoids the problem, unless one needs to have >99 open rings at time.

Contributor

merkys commented Oct 26, 2017

Is %(NNN) notation standard or is proposed for addition? As per OpenSMILES, it is possible to reuse already used bond IDs, what avoids the problem, unless one needs to have >99 open rings at time.

@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach

baoilleach Oct 26, 2017

Member

It's not standard, it's an extension. The example testcase I provide is a real-life SMILES string that contains more than 99 open rings. We don't have control over what SMILES strings other tools generate. (Rereading this, I sound grumpy - I'm not :-) - by the way, just checked your papers - would be good to get some of your xtallography stuff into Open Babel)

Member

baoilleach commented Oct 26, 2017

It's not standard, it's an extension. The example testcase I provide is a real-life SMILES string that contains more than 99 open rings. We don't have control over what SMILES strings other tools generate. (Rereading this, I sound grumpy - I'm not :-) - by the way, just checked your papers - would be good to get some of your xtallography stuff into Open Babel)

@baoilleach baoilleach added the feature label Oct 26, 2017

@merkys

This comment has been minimized.

Show comment
Hide comment
@merkys

merkys Oct 26, 2017

Contributor

Regarding SMILES – I agree. Many software packages have their own micro-forks of SMILES. Hopefully they will not contradict.

For sure I would be more than glad to get our code together :)

Contributor

merkys commented Oct 26, 2017

Regarding SMILES – I agree. Many software packages have their own micro-forks of SMILES. Hopefully they will not contradict.

For sure I would be more than glad to get our code together :)

@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach

baoilleach Oct 26, 2017

Member

And just to be clear, RDKit doesn't (yet) have this extension. However, it does generate SMILES that require it. Both OB and RDKit will write %100 at the moment in this situation (if it ever occurs), which is incorrect.

Member

baoilleach commented Oct 26, 2017

And just to be clear, RDKit doesn't (yet) have this extension. However, it does generate SMILES that require it. Both OB and RDKit will write %100 at the moment in this situation (if it ever occurs), which is incorrect.

@merkys

This comment has been minimized.

Show comment
Hide comment
@merkys

merkys Oct 26, 2017

Contributor

Indeed, %(XYZ) is better than just %XYZ. The former would be recognized as syntax error by strict OpenSMILES parser whilst the latter would be broken up to closures of two rings, XY and Z.

Contributor

merkys commented Oct 26, 2017

Indeed, %(XYZ) is better than just %XYZ. The former would be recognized as syntax error by strict OpenSMILES parser whilst the latter would be broken up to closures of two rings, XY and Z.

@bbucior

This comment has been minimized.

Show comment
Hide comment
@bbucior

bbucior Oct 27, 2017

Contributor

Tagging #1529

Contributor

bbucior commented Oct 27, 2017

Tagging #1529

@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach
Member

baoilleach commented Oct 27, 2017

See also rdkit/rdkit#1624

@ghutchis

This comment has been minimized.

Show comment
Hide comment
@ghutchis

ghutchis Oct 30, 2017

Member

I'm in favor. It seems like the OpenSMILES list has adopted the enhancement, and RDKit has also merged.

Member

ghutchis commented Oct 30, 2017

I'm in favor. It seems like the OpenSMILES list has adopted the enhancement, and RDKit has also merged.

@ghutchis ghutchis merged commit 7981240 into openbabel:master Oct 30, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@baoilleach baoilleach deleted the baoilleach:smiringclosure branch Nov 4, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment