Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(PUP-1635) Make daemon.rb signals Ruby 2.x safe #3586

Merged
merged 1 commit into from Feb 25, 2015

Conversation

leepa
Copy link
Contributor

@leepa leepa commented Feb 7, 2015

Ruby 2.x changes the way signals work to mean you can't do certain things from the 'trap context'. This means that SIGUSR1 and the like no longer work.

This change makes it so that we, instead of processing methods in the 'trap context', just store them on a list and pop them out the other side in another job called signal_loop.

This is modelled on the way other Ruby Daemon-like projects have fixed this issue.

@leepa
Copy link
Contributor Author

leepa commented Feb 7, 2015

Okay okay I'll fix the tests ;)

@leepa
Copy link
Contributor Author

leepa commented Feb 7, 2015

In fact - trying to fix the tests shows that they: rarely seem to run properly on my Debian Wheezy machine and when they do I can't seem to fix the issue - I'm not sure how do that in rspec.

@leepa leepa changed the title Make daemon.rb signals Ruby 2.x safe (PUP-1635) Make daemon.rb signals Ruby 2.x safe Feb 7, 2015
@leepa
Copy link
Contributor Author

leepa commented Feb 7, 2015

That solution passes but I don't know how to test that I pop the method and call it in another loop. Looking at the tests that doesn't usually get checked for other bits either. Open to suggestions - but this passes now.

@puppetcla
Copy link

CLA signed by all contributors.

@ScottGarman
Copy link
Contributor

Thanks @leepa for the contribution! At this time though we're getting ready to release Puppet 4.0.0, and until we do there's a hold on doing merges. Please be patient and we'll keep an eye on this PR for review.

@leepa
Copy link
Contributor Author

leepa commented Feb 10, 2015

I don't mean to sound rude @ScottGarman but is that truly a sane idea to do a major release of Puppet that does not work correctly on the most common version of Ruby? That sounds like a bit of a blocker to me.

@kylog
Copy link

kylog commented Feb 10, 2015

@leepa no worries, that's a legit question; more so now because we'll be bundling ruby 2.1.5 in our new packaging that rolls out with puppet 4, so this would be a regression on some platforms (previously users of the system ruby on, say, rhel 5/6 would never hit this).

That said, we are in a merge freeze window for puppet 4.0.0-rc1 (down to mostly some final packaging related changes), which is what @ScottGarman is referring to. Of course, none of that stops us from reviewing this, so that it is ready to go once we open back up for merges. Also just a heads up that we are fully expecting we'll be doing Z releases for 4.0.0 in short order, so it's not like 4.0.0 would be the last shot :)

@leepa
Copy link
Contributor Author

leepa commented Feb 10, 2015

@kylog sure - thanks for the detailed answer. I totally get freezing in terms of features but bugs that, as you say, regress the experience for RHEL 5/6 users (of which there are still a lot) seems wrong. Especially when the 'packaging' will include a version of Ruby where simply doing mco puppet runall 10 will bin all your puppet agents.

Incidentally, this PR works on Ruby 1.9.x as well from my testing.

I would still argue this bug is a blocker to a release with a 2.x version of Ruby bundled. I'm more surprised no one picked this up already in testing 4.x.

@kylog
Copy link

kylog commented Feb 10, 2015

I'm more surprised no one picked this up already in testing 4.x.

It would have turned up in puppet 3.x as well given a 2.x ruby. My best guess is that use of SIGUSR1/etc is not very widespread. E.g. anecdotally I'm aware of some large ruby-2.x installs who haven't mentioned this.

Also we have no tests for this today. On the ticket, I suggested we add an acceptance test (pretty hard to test for this at spec level) so we don't regress in the future.

Ruby 2.x changes the way signals work to mean you can't do certain things from the 'trap context'. This means that SIGUSR1 and the like no longer work.

This change makes it so that we, instead of processing methods in the 'trap context', just store them on a list and pop them out the other side in another job called signal_loop.

This is modelled on the way other Ruby Daemon-like projects have fixed this issue.
@leepa
Copy link
Contributor Author

leepa commented Feb 23, 2015

I have updated this as per @kylog comments to add a constant instead of using a magic number.

kylog pushed a commit that referenced this pull request Feb 25, 2015
(PUP-1635) Make daemon.rb signals Ruby 2.x safe
@kylog kylog merged commit 4ccb00c into puppetlabs:master Feb 25, 2015
@kylog
Copy link

kylog commented Feb 25, 2015

Merged to master in 4ccb00c. Thanks for the contribution!

@leepa leepa deleted the ruby2fix branch February 25, 2015 10:01
@sammcj
Copy link

sammcj commented Mar 2, 2015

Well done guys - I take it this fix will be part of Puppet 3.7.5?

@kylog
Copy link

kylog commented Mar 2, 2015

@sammcj this was merged to master, so will be released in 4.0.0.

@sammcj
Copy link

sammcj commented Mar 2, 2015

So, no one can use puppet 3 non-interactively on any Linux OS that has Ruby 2.x because if so that's going to put a hard dependency on everyone upgrading to Puppet 4 to use say Debian 8, RHEL 7 etc...

Wouldn't it be easier for everyone if since it's just an 11 line fix it was applied as a bugfix to 3.7.6?

Also, isn't the Puppet 4.x client compiled rather than Ruby based?

@frconil
Copy link

frconil commented Mar 2, 2015

It would have turned up in puppet 3.x as well given a 2.x ruby. My best guess is that use of SIGUSR1/etc is not very widespread. E.g. anecdotally I'm aware of some large ruby-2.x installs who haven't mentioned this.

Well we found out because we both use mcollective and an ENC/dashboard.

By default that's how mcollective sends a trigger for the puppet agent to restart, but it does not return an error code. The ENC allowed us to catch that all non-interactive triggers were failing on ruby 2.0

So I would assume it is a widespread problem, just not one that is visible unless you're actually looking for it.

@sammcj
Copy link

sammcj commented Mar 3, 2015

@lak sorry to CC you in here but I want to make sure this isn't missed as it might have a larger fallout for you guys than imagined (or perhaps we're wrong?).

Puppet.notice "Caught #{signal}; calling #{method}"
send(method)
Puppet.notice "Caught #{signal}; storing #{method}"
@signals << method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't thread safe and could lead to a signal being lost. The signal handler and the consumer threads need to synchronize access to the shared Array. In practice that won't happen with MRI ruby because of its GIL. But it could happen on other ruby implementations, e.g. JRuby.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep we came to the same conclusion in conversation lost in the default github view because of a subsequent push :)

@ahpook
Copy link
Contributor

ahpook commented Mar 4, 2015

@sammcj while i hope you don't take this as encouragement to ping people's boss to get them to do what you want, we have pulled this back to 3.7.x and it will be in the next point release.

@sammcj
Copy link

sammcj commented Mar 4, 2015

@ahpook absolutely not, and thank you so much - I could see this blowing up and having to be retroactively backported if it wasn't done now anyway.

Keep up the great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants