New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep count of implicit hydrogens instead of inferring them #1576

Merged
merged 63 commits into from Jun 12, 2017

Conversation

Projects
None yet
3 participants
@baoilleach
Member

baoilleach commented May 23, 2017

This pull request makes two changes to the Open Babel API:

  1. The count of implicit hydrogens on an atom is stored explicitly.
  2. Kekulization has been rewritten as a perfect matching algorithm rather than dearomatization. Hat-tip to @johnmay for help with this.

Both of these changes address long-standing major deficiences of the toolkit.

Regarding (1), there have been many bug reports related to implicit hydrogens appearing/disappearing. Furthermore, it is really much simpler to do things this way, and the code will be faster as there will be no continual reperception of hydrogen counts. Previously there were several functions and flags to work-around the existing system such as ForceNoImplH; these all go away now.

Regarding (2), the previous approach was (IMO) solving the wrong problem. It is now much simpler and faster. Previously parts of the kekulization infrastructure were spread around several different files (some in SMILES format, some in OBMol, and some in kekulization.cpp) and had several associated trigger functions (e.g. IsKSingle).

So what's the downsides? Well, these are major changes and there are still some wrinkles:

  1. The existing radical code was built as an adjustment on top of the old implicit valence code. There are no tests, and so some of this may not work now. I'll be meeting Chris Morley at an RSC meeting in June and will bend his ear.
  2. Smiley Format doesn't read in SMILES correctly. I think this should be removed and the error messages and nice functions merged with the existing SMILES reader.
  3. Since kekulization has been tightened up, we should implement the Daylight aromaticity model as a starting point for SMILES writing. Otherwise we will write out SMILES we cannot kekulize.
  4. I implemented an implicit hydrogen adder based on typical valences (e.g. for PDB) with code from @dan2097. This is likely different from the old one (using IMPVAL values), and may need further tweaking.
  5. I didn't know where to put global functions, e.g. OBKekulize and GetTypicalValences. Consider their current locations placeholders.
  6. Now that I think of it, I scattered around a few TODOs. I should check those...

These are the things that I know about. Others may become apparent over time.

baoilleach added some commits Aug 6, 2016

Make sure that implicit valence is handled correctly when calling Add…
…Hydrogens() or using the 3D builder

1. Set the implicit valence on newly added hydrogens
2. Make sure that implicit valence is copied when copying a molecule and that the implicit valence perceived flag is also copied
3. Comment out some existing special cases (that currently are incorrect for the general case)
1. Write square brackets for hypervalent organic subset atoms in SMILES
2. Combine three different "if" statements for using brackets
Change handling of implicit H in CML - always write out the hydrogenC…
…ount attribute and only fall back to guessing the number of hydrogens if it is not specified.
1. Avoid function calls when copying OBAtom
2. Rewrite KBOSum to use GetImplicitHydrogen()
More on MDL Implicit Valence:
1. Move the expansion of aliases to *after* the application of implicit valence (as those atoms already have their hydrogens added)
2. Correctly handle radicals (e.g. M  RAD values) when setting the implicit hydrogens
1. Move implicit hydrogen adder to obfunctions.h
2. Use implicit hydrogen adder in PDB format
3. Note error in test case and update to match current behaviour (filed bug #1548)
Correct the test case: The P is hypervalent (according to SMILES rule…
…s) and we write [P] for such cases (as discussed on the opensmiles list).
@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach

baoilleach Jun 2, 2017

Member

The MSVC tests work fine for me, so I'm going to mute them on appveyor and then trigger a rebuild.

Member

baoilleach commented Jun 2, 2017

The MSVC tests work fine for me, so I'm going to mute them on appveyor and then trigger a rebuild.

@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach
Member

baoilleach commented Jun 2, 2017

@ghutchis

This comment has been minimized.

Show comment
Hide comment
@ghutchis

ghutchis Jun 2, 2017

Member

Obviously, this is a great improvement overall and I'm willing to merge soon so we can find the corner cases, etc. (I also greatly appreciate that you've been cleaning out some of the old bugs.)

  1. Radicals - I think for now, I'm willing to merge even if it breaks radicals, but I'd like at least some confirmation from @chrismorl that we can get radicals back before release.
  2. I agree that we can merge Smiley with the existing reader. I believe @timvdm wanted to eventually replace the existing parser with Smiley, but I think a good first step is to integrate the useful bits.
  3. Yes, we should implement the Daylight aromaticity model for SMILES.
  4. Implicit hydrogen addition is always a pain, so the earlier we get this out, the fewer corner cases we'll have before release. (I fully expect after release, we'll get more complaints and bugs.)

In short, let me know if you're ready to merge and I'll be happy to finish the review.

Member

ghutchis commented Jun 2, 2017

Obviously, this is a great improvement overall and I'm willing to merge soon so we can find the corner cases, etc. (I also greatly appreciate that you've been cleaning out some of the old bugs.)

  1. Radicals - I think for now, I'm willing to merge even if it breaks radicals, but I'd like at least some confirmation from @chrismorl that we can get radicals back before release.
  2. I agree that we can merge Smiley with the existing reader. I believe @timvdm wanted to eventually replace the existing parser with Smiley, but I think a good first step is to integrate the useful bits.
  3. Yes, we should implement the Daylight aromaticity model for SMILES.
  4. Implicit hydrogen addition is always a pain, so the earlier we get this out, the fewer corner cases we'll have before release. (I fully expect after release, we'll get more complaints and bugs.)

In short, let me know if you're ready to merge and I'll be happy to finish the review.

@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach

baoilleach Jun 3, 2017

Member

I agree with all of this, and I'd like to merge as soon as possible to winkle out any remaining issues asap and to draw a line under this part. I'll see @chrismorl on 22 June at a meeting, and I'll look into the issues surrounding radicals in the run up to that (I don't think it's really going to be a problem).

Appveyor is being annoying though. I'll have to skip another test - there's something not right about their system.

Member

baoilleach commented Jun 3, 2017

I agree with all of this, and I'd like to merge as soon as possible to winkle out any remaining issues asap and to draw a line under this part. I'll see @chrismorl on 22 June at a meeting, and I'll look into the issues surrounding radicals in the run up to that (I don't think it's really going to be a problem).

Appveyor is being annoying though. I'll have to skip another test - there's something not right about their system.

@chrismorl

This comment has been minimized.

Show comment
Hide comment
@chrismorl

chrismorl Jun 3, 2017

Contributor
Contributor

chrismorl commented Jun 3, 2017

@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach

baoilleach Jun 12, 2017

Member

Bump. Ok to merge?

Member

baoilleach commented Jun 12, 2017

Bump. Ok to merge?

@ghutchis ghutchis merged commit 172fc44 into openbabel:master Jun 12, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@ghutchis

This comment has been minimized.

Show comment
Hide comment
@ghutchis

ghutchis Jun 12, 2017

Member

You should make an announcement on the mailing list.

Member

ghutchis commented Jun 12, 2017

You should make an announcement on the mailing list.

@ghutchis

This comment has been minimized.

Show comment
Hide comment
@ghutchis

ghutchis Jun 12, 2017

Member

(I'm currently on a family holiday - back on Tuesday night or Wednesday morning.)

Member

ghutchis commented Jun 12, 2017

(I'm currently on a family holiday - back on Tuesday night or Wednesday morning.)

@baoilleach baoilleach deleted the baoilleach:workingimph branch Jun 14, 2017

@baoilleach

This comment has been minimized.

Show comment
Hide comment
@baoilleach

baoilleach Jun 17, 2017

Member

For the record, also fixes #1343, #1320, #1207, #1166.

Member

baoilleach commented Jun 17, 2017

For the record, also fixes #1343, #1320, #1207, #1166.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment