New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(PUP-1635) Make daemon.rb signals Ruby 2.x safe #3586
Conversation
Okay okay I'll fix the tests ;) |
In fact - trying to fix the tests shows that they: rarely seem to run properly on my Debian Wheezy machine and when they do I can't seem to fix the issue - I'm not sure how do that in rspec. |
That solution passes but I don't know how to test that I pop the method and call it in another loop. Looking at the tests that doesn't usually get checked for other bits either. Open to suggestions - but this passes now. |
CLA signed by all contributors. |
Thanks @leepa for the contribution! At this time though we're getting ready to release Puppet 4.0.0, and until we do there's a hold on doing merges. Please be patient and we'll keep an eye on this PR for review. |
I don't mean to sound rude @ScottGarman but is that truly a sane idea to do a major release of Puppet that does not work correctly on the most common version of Ruby? That sounds like a bit of a blocker to me. |
@leepa no worries, that's a legit question; more so now because we'll be bundling ruby 2.1.5 in our new packaging that rolls out with puppet 4, so this would be a regression on some platforms (previously users of the system ruby on, say, rhel 5/6 would never hit this). That said, we are in a merge freeze window for puppet 4.0.0-rc1 (down to mostly some final packaging related changes), which is what @ScottGarman is referring to. Of course, none of that stops us from reviewing this, so that it is ready to go once we open back up for merges. Also just a heads up that we are fully expecting we'll be doing Z releases for 4.0.0 in short order, so it's not like 4.0.0 would be the last shot :) |
@kylog sure - thanks for the detailed answer. I totally get freezing in terms of features but bugs that, as you say, regress the experience for RHEL 5/6 users (of which there are still a lot) seems wrong. Especially when the 'packaging' will include a version of Ruby where simply doing Incidentally, this PR works on Ruby 1.9.x as well from my testing. I would still argue this bug is a blocker to a release with a 2.x version of Ruby bundled. I'm more surprised no one picked this up already in testing 4.x. |
It would have turned up in puppet 3.x as well given a 2.x ruby. My best guess is that use of SIGUSR1/etc is not very widespread. E.g. anecdotally I'm aware of some large ruby-2.x installs who haven't mentioned this. Also we have no tests for this today. On the ticket, I suggested we add an acceptance test (pretty hard to test for this at spec level) so we don't regress in the future. |
Ruby 2.x changes the way signals work to mean you can't do certain things from the 'trap context'. This means that SIGUSR1 and the like no longer work. This change makes it so that we, instead of processing methods in the 'trap context', just store them on a list and pop them out the other side in another job called signal_loop. This is modelled on the way other Ruby Daemon-like projects have fixed this issue.
I have updated this as per @kylog comments to add a constant instead of using a magic number. |
(PUP-1635) Make daemon.rb signals Ruby 2.x safe
Merged to master in 4ccb00c. Thanks for the contribution! |
Well done guys - I take it this fix will be part of Puppet 3.7.5? |
@sammcj this was merged to master, so will be released in 4.0.0. |
So, no one can use puppet 3 non-interactively on any Linux OS that has Ruby 2.x because if so that's going to put a hard dependency on everyone upgrading to Puppet 4 to use say Debian 8, RHEL 7 etc... Wouldn't it be easier for everyone if since it's just an 11 line fix it was applied as a bugfix to 3.7.6? Also, isn't the Puppet 4.x client compiled rather than Ruby based? |
Well we found out because we both use mcollective and an ENC/dashboard. By default that's how mcollective sends a trigger for the puppet agent to restart, but it does not return an error code. The ENC allowed us to catch that all non-interactive triggers were failing on ruby 2.0 So I would assume it is a widespread problem, just not one that is visible unless you're actually looking for it. |
@lak sorry to CC you in here but I want to make sure this isn't missed as it might have a larger fallout for you guys than imagined (or perhaps we're wrong?). |
Puppet.notice "Caught #{signal}; calling #{method}" | ||
send(method) | ||
Puppet.notice "Caught #{signal}; storing #{method}" | ||
@signals << method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't thread safe and could lead to a signal being lost. The signal handler and the consumer threads need to synchronize access to the shared Array. In practice that won't happen with MRI ruby because of its GIL. But it could happen on other ruby implementations, e.g. JRuby.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep we came to the same conclusion in conversation lost in the default github view because of a subsequent push :)
@sammcj while i hope you don't take this as encouragement to ping people's boss to get them to do what you want, we have pulled this back to 3.7.x and it will be in the next point release. |
@ahpook absolutely not, and thank you so much - I could see this blowing up and having to be retroactively backported if it wasn't done now anyway. Keep up the great work! |
Ruby 2.x changes the way signals work to mean you can't do certain things from the 'trap context'. This means that SIGUSR1 and the like no longer work.
This change makes it so that we, instead of processing methods in the 'trap context', just store them on a list and pop them out the other side in another job called signal_loop.
This is modelled on the way other Ruby Daemon-like projects have fixed this issue.