New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hydrogen erroneously added when converting to InChI #1078
Comments
I(=O)=O also ends up with an extra hydrogen on conversion to InChI Original comment by: dan2097 |
S(C)(C)C exhibits a fault in SMILES reading, the S is interpreted as having 0 rather than 1 hydrogen. Original comment by: dan2097 |
[SeH-] seems to erroneously pick up H+ when converted to InChI Original comment by: dan2097 |
[Se]=P([O-])([O-])[O-] produces InChI=1S/O3PSe/c1-4(2,3)5/q-3 but should produce InChI=1S/H3O3PSe/c1-4(2,3)5/h(H3,1,2,3,5)/p-3 Original comment by: dan2097 |
At least some of these problems don't exist with the current dev version. Can you try these with the dev version? Original comment by: @baoilleach |
BTW, how exactly are you finding these problem cases? If we could monitor our performance on the dataset with time we could ensure we don't cause regressions.... Original comment by: @baoilleach |
I will give it a try on the latest version and report back. I suppose really I should have a bash at fixing the problems rather than giving you work :-P The set I was using was based on the ChEBI database as of about 9 months ago. What I was actually hoping to round trip was not OpenBabel, but a SMILES writer I have just written. The hope being to convert from SMILES to InChI and check that its output is equivalent to the InChI output from my program. I did indeed find a few corner cases my SMILES writer failed which I've now fixed but I haven't gone through all the discrepancies....mainly as most of them were coming from OpenBabel :-P Such a set could be created directly from ChEBI although I have a feeling that for some records the InChI and SMILES don't quite correspond to each other e.g. differences like connected vs disconnected [H+] on acids, which may give false failures. Original comment by: dan2097 |
The problems with atoms having explicitly no hydrogen has been fixed. The [Se]=P([O-])([O-])[O-] , [SeH-] and S(C)(C)C problems remain. Original comment by: dan2097 |
[BiH4] nows gives InChI=1S/Bi.4H This leaves only [Se]=P([O-])([O-])[O-] (see below). Original comment by: @baoilleach |
When doing some round tripping testing in OpenBabel 2.3.0 I came across the following cases where hydrogen was erroneously added to atoms with explicitly no hydrogen in the conversion from SMILES to InChI
Group 1/2 metals and In/Sn/Sb/Tl/Pb/Bi/Po/At.
e.g. [Li] -->InChI=1S/Li.H instead of InChI=1S/Li
[Bi] --> InChI=1S/Bi.3H instead of InChI=1S/Bi
Possibly relatedly [BiH4] gives InChI=1S/Bi.5H instead of InChI=1S/Bi.4H
Reported by: dan2097
The text was updated successfully, but these errors were encountered: