New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for %(NNN) notation for ring closures #1624

Merged
merged 5 commits into from Oct 27, 2017

Conversation

Projects
None yet
3 participants
@baoilleach
Contributor

baoilleach commented Oct 26, 2017

Reference Issue

Fixes #1623

What does this implement/fix? Explain your changes.

This implements an extension to the OpenSMILES standard to handle ring closure numbers greater than 99, and up to five digits. The extension supports %(N) to %(NNNNN), where N is a digit.

Any other comments?

Test cases are included.

I didn't force the requirement of a non-zero leading digit, but that may be something you wish to consider.

@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach

baoilleach Oct 26, 2017

Contributor

I see that the Travis build failed due to my new test. I think this is because the changes I've made to the SMILES grammar are not being propagated. It's not obvious to me what I need to do to get this to work.

Contributor

baoilleach commented Oct 26, 2017

I see that the Travis build failed due to my new test. I think this is because the changes I've made to the SMILES grammar are not being propagated. It's not obvious to me what I need to do to get this to work.

@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach

baoilleach Oct 26, 2017

Contributor

Ok - now I see. There's a smiles.tab.hpp and smiles.tab.cpp that need to be copied to *.cmake. Let me know if I should go ahead and do this....

Contributor

baoilleach commented Oct 26, 2017

Ok - now I see. There's a smiles.tab.hpp and smiles.tab.cpp that need to be copied to *.cmake. Let me know if I should go ahead and do this....

@bp-kelley

This comment has been minimized.

Show comment
Hide comment
@bp-kelley

bp-kelley Oct 27, 2017

Contributor

We don't expect that most users have the bison parsers available when building, so we check them into the source tree. The easiest way to do this is to build RDKit In Tree and then check in the sources (the smiles and smarts tab.cpp)

It would be something like:

cmake \
    -D RDK_INSTALL_INTREE=ON \
    -D RDK_USE_FLEXBISON=ON ...

btw - your patch appears to be timely as we've had a request for better nucleic acid registration.

Contributor

bp-kelley commented Oct 27, 2017

We don't expect that most users have the bison parsers available when building, so we check them into the source tree. The easiest way to do this is to build RDKit In Tree and then check in the sources (the smiles and smarts tab.cpp)

It would be something like:

cmake \
    -D RDK_INSTALL_INTREE=ON \
    -D RDK_USE_FLEXBISON=ON ...

btw - your patch appears to be timely as we've had a request for better nucleic acid registration.

@greglandrum

This comment has been minimized.

Show comment
Hide comment
@greglandrum

greglandrum Oct 27, 2017

Member

@baoilleach : I will do a PR against your branch with the required files RSN

Member

greglandrum commented Oct 27, 2017

@baoilleach : I will do a PR against your branch with the required files RSN

greglandrum and others added some commits Oct 27, 2017

Merge pull request #1 from greglandrum/ExtensionToRingClosure
some changes from the pre-review

@greglandrum greglandrum added this to the 2018_03_1 milestone Oct 27, 2017

@greglandrum greglandrum merged commit cf537a1 into rdkit:master Oct 27, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@greglandrum

This comment has been minimized.

Show comment
Hide comment
@greglandrum

greglandrum Oct 27, 2017

Member

I tagged this as an enhancement, but one could argue that it's a bug fix.
@bp-kelley, any thoughts on that?

Member

greglandrum commented Oct 27, 2017

I tagged this as an enhancement, but one could argue that it's a bug fix.
@bp-kelley, any thoughts on that?

@bp-kelley

This comment has been minimized.

Show comment
Hide comment
@bp-kelley

bp-kelley Oct 28, 2017

Contributor
Contributor

bp-kelley commented Oct 28, 2017

@greglandrum greglandrum added the bug label Oct 28, 2017

@greglandrum greglandrum modified the milestones: 2018_03_1, 2017_09_2 Oct 28, 2017

@greglandrum

This comment has been minimized.

Show comment
Hide comment
@greglandrum

greglandrum Oct 28, 2017

Member

Since you seem to agree, I'm going to add "bug fix" and move it to the 2017_09_2 milestone.

Member

greglandrum commented Oct 28, 2017

Since you seem to agree, I'm going to add "bug fix" and move it to the 2017_09_2 milestone.

@greglandrum

This comment has been minimized.

Show comment
Hide comment
@greglandrum

greglandrum Oct 28, 2017

Member

@baoilleach : thanks for the contribution!

Member

greglandrum commented Oct 28, 2017

@baoilleach : thanks for the contribution!

@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach

baoilleach Nov 2, 2017

Contributor

@johnmay points out that there is a second line in SmilesParseOps that probably needs the 100 changed to 100000, something to do with ring closures joining different dot disconnected components:
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/SmilesParse/SmilesParseOps.cpp#L187

However, I'm finding it hard to trigger the failure that this code should be handling, and I don't like to make a change that is untested. For example, "c1cc%(10000).c1cc%(10000)" is correctly converted to benzene right now. Is this code redundant, due to changes in the SMILES handling, or is there something else going on here?

Just by-the-by, when stepping through the debugger, I noticed an extra atom bookmark with a large negative value is created during parsing - is this a bug or is this expected? It wasn't obvious where it was coming from but I can track it down perhaps if you are interested.

Contributor

baoilleach commented Nov 2, 2017

@johnmay points out that there is a second line in SmilesParseOps that probably needs the 100 changed to 100000, something to do with ring closures joining different dot disconnected components:
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/SmilesParse/SmilesParseOps.cpp#L187

However, I'm finding it hard to trigger the failure that this code should be handling, and I don't like to make a change that is untested. For example, "c1cc%(10000).c1cc%(10000)" is correctly converted to benzene right now. Is this code redundant, due to changes in the SMILES handling, or is there something else going on here?

Just by-the-by, when stepping through the debugger, I noticed an extra atom bookmark with a large negative value is created during parsing - is this a bug or is this expected? It wasn't obvious where it was coming from but I can track it down perhaps if you are interested.

@bp-kelley

This comment has been minimized.

Show comment
Hide comment
@bp-kelley

bp-kelley Nov 2, 2017

Contributor
Contributor

bp-kelley commented Nov 2, 2017

greglandrum added a commit that referenced this pull request Nov 15, 2017

Add support for %(NNN) notation for ring closures (#1624)
* Add support for %(....) notation for ring closures

* add the .cmake files

* add the extended numbering to SMARTS as well

* and the .cmake file

greglandrum added a commit that referenced this pull request Dec 2, 2017

Add support for %(NNN) notation for ring closures (#1624)
* Add support for %(....) notation for ring closures

* add the .cmake files

* add the extended numbering to SMARTS as well

* and the .cmake file

@baoilleach baoilleach deleted the nextmovesoftware:ExtensionToRingClosure branch Jan 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment