Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hydrogen erroneously added when converting to InChI #1078

Closed
openbabel-bot opened this issue Jan 13, 2011 · 10 comments
Closed

Hydrogen erroneously added when converting to InChI #1078

openbabel-bot opened this issue Jan 13, 2011 · 10 comments

Comments

@openbabel-bot
Copy link
Collaborator

When doing some round tripping testing in OpenBabel 2.3.0 I came across the following cases where hydrogen was erroneously added to atoms with explicitly no hydrogen in the conversion from SMILES to InChI

Group 1/2 metals and In/Sn/Sb/Tl/Pb/Bi/Po/At.

e.g. [Li] -->InChI=1S/Li.H instead of InChI=1S/Li
[Bi] --> InChI=1S/Bi.3H instead of InChI=1S/Bi

Possibly relatedly [BiH4] gives InChI=1S/Bi.5H instead of InChI=1S/Bi.4H

Reported by: dan2097

@openbabel-bot
Copy link
Collaborator Author

I(=O)=O also ends up with an extra hydrogen on conversion to InChI

Original comment by: dan2097

@openbabel-bot
Copy link
Collaborator Author

S(C)(C)C exhibits a fault in SMILES reading, the S is interpreted as having 0 rather than 1 hydrogen.

Original comment by: dan2097

@openbabel-bot
Copy link
Collaborator Author

[SeH-] seems to erroneously pick up H+ when converted to InChI

Original comment by: dan2097

@openbabel-bot
Copy link
Collaborator Author

[Se]=P([O-])([O-])[O-] produces InChI=1S/O3PSe/c1-4(2,3)5/q-3 but should produce InChI=1S/H3O3PSe/c1-4(2,3)5/h(H3,1,2,3,5)/p-3
Not too sure what's gone wrong in that case. There were a few other similar cases.

Original comment by: dan2097

@openbabel-bot
Copy link
Collaborator Author

At least some of these problems don't exist with the current dev version. Can you try these with the dev version?

Original comment by: @baoilleach

@openbabel-bot
Copy link
Collaborator Author

BTW, how exactly are you finding these problem cases? If we could monitor our performance on the dataset with time we could ensure we don't cause regressions....

Original comment by: @baoilleach

@openbabel-bot
Copy link
Collaborator Author

I will give it a try on the latest version and report back. I suppose really I should have a bash at fixing the problems rather than giving you work :-P

The set I was using was based on the ChEBI database as of about 9 months ago. What I was actually hoping to round trip was not OpenBabel, but a SMILES writer I have just written. The hope being to convert from SMILES to InChI and check that its output is equivalent to the InChI output from my program. I did indeed find a few corner cases my SMILES writer failed which I've now fixed but I haven't gone through all the discrepancies....mainly as most of them were coming from OpenBabel :-P

Such a set could be created directly from ChEBI although I have a feeling that for some records the InChI and SMILES don't quite correspond to each other e.g. differences like connected vs disconnected [H+] on acids, which may give false failures.

Original comment by: dan2097

@openbabel-bot
Copy link
Collaborator Author

The problems with atoms having explicitly no hydrogen has been fixed.
The following issues remain:
[BiH4] is interpreted as [BiH5]. The reason is on line 344 of inchiformat.cpp InChI is being allowed to implicitly add hydrogens.

The [Se]=P([O-])([O-])[O-] , [SeH-] and S(C)(C)C problems remain.

Original comment by: dan2097

@openbabel-bot
Copy link
Collaborator Author

[BiH4] nows gives InChI=1S/Bi.4H
[SeH-] can be roundtripped through InChI

This leaves only [Se]=P([O-])([O-])[O-] (see below).
The S(C)(C)C I will file as a separate bug as it has nothing to do with all of this.

Original comment by: @baoilleach

@baoilleach
Copy link
Member

All now fixed by #1576 @dan2097

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants