Graceful shutdown on TERM signal #84

This PR adds a graceful shutdown to mcollectived.

When receiving the TERM signal the server stops processing new messages and waits a configurable amount of time for any running agent threads to finish processing.
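The "wait a configurable amount of time for running agent threads" step could be sketched roughly as below. This is not the code from this PR; `drain_threads` and its `timeout` argument are hypothetical names used only for illustration, and the two sleeping threads stand in for agent actions.

```ruby
# Hypothetical helper, not MCollective's actual code: wait up to `timeout`
# seconds in total for the given threads, then report the ones still running.
def drain_threads(threads, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  threads.each do |t|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # Thread#join(limit) returns nil if the thread is still running
    # when the limit expires, so one slow thread cannot block forever.
    t.join([remaining, 0].max)
  end
  threads.select(&:alive?)
end

fast = Thread.new { sleep 0.05 } # stand-in for a short agent action
slow = Thread.new { sleep 5 }    # stand-in for a long-running agent action

leftover = drain_threads([fast, slow], 0.5)
puts "still running after timeout: #{leftover.size}"

# What to do with the leftovers (kill, log, ignore) is a policy decision.
leftover.each(&:kill)
```

Sharing a single deadline across all threads, rather than giving each thread its own timeout, keeps the total shutdown time bounded by the configured value.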

My use case for a graceful shutdown is updating the mcollective installation from within an agent action.

@ripienaar The code can definitely be improved. I didn't really understand how you manage the log_code PLMC symbols. But what do you think of this feature in general? Any chance of considering it?


Yeah, this is interesting. We'd need something similar, but we need to be careful with this kind of thing now: due to Ruby 2, signal handlers cannot block in any way, so we can't log or do a number of other things. I'll need to figure out exactly what the restrictions are and see how we do this.

We need something like this for Windows too, so it's something we'd do, see - I commented on that ticket, but we'd need to do some work before we can consider merging this.

The PL messages are managed via - we still need to properly figure out the process for contributors; it's something we're working on.


I didn't know about the Ruby 2.0 changes regarding what you can do inside a trap context. I couldn't find any definitive documentation for this, only this blog post and this bug report.

At least for handling the TERM signal on UNIX, raising an Exception from the trap without any logging, then rescuing it in the loop of MCollective::Runner#run and logging a line there, also works on Ruby 2.0.
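A minimal sketch of that approach, assuming a UNIX platform. `ShutdownRequested` and `blocking_receive` are hypothetical stand-ins, not names from MCollective; the real loop lives in MCollective::Runner#run and blocks on the connector instead of on `sleep`.

```ruby
# Hypothetical exception used only to unwind the main loop. It inherits from
# Exception so that a generic `rescue => e` inside the loop would not swallow it.
class ShutdownRequested < Exception; end

# Keep the handler minimal: no logging, no locks, just raise.
Signal.trap("TERM") { raise ShutdownRequested }

# Stand-in for a blocking receive call on the connector.
def blocking_receive
  sleep 10
  :message
end

# Simulate an external `kill -TERM <pid>` shortly after startup.
Thread.new do
  sleep 0.2
  Process.kill("TERM", Process.pid)
end

events = []
begin
  loop do
    events << blocking_receive
  end
rescue ShutdownRequested
  # Back in normal (non-trap) context, so logging is allowed again.
  events << :shutdown
  puts "TERM received, shutting down"
end
```

The exception is raised in the main thread, which is where the blocking call sits, so the loop is interrupted without the handler itself doing any work that is restricted in trap context.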

Is this something worth pursuing or do you prefer rewriting the run method to not block until a message is received?
From looking at Stomp::Connection, there is a poll method which could be used instead of receive in the loop.
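The non-blocking alternative might look something like the sketch below. `FakeConnection` is a hypothetical stand-in so the example runs without the stomp gem; the only assumption about the real connection is that `poll` returns a pending message or nil when nothing is waiting.

```ruby
# Hypothetical stand-in for a connection offering a non-blocking poll.
class FakeConnection
  def initialize(messages)
    @messages = messages.dup
  end

  # Returns the next pending message, or nil when nothing is waiting.
  def poll
    @messages.shift
  end
end

stop_requested = false
# Setting a flag is about the only thing the handler needs to do.
Signal.trap("TERM") { stop_requested = true }

connection = FakeConnection.new(%w[ping pong])
handled = []

# Simulate an external `kill -TERM <pid>` shortly after startup.
Thread.new do
  sleep 0.2
  Process.kill("TERM", Process.pid)
end

until stop_requested
  message = connection.poll
  if message
    handled << message
  else
    # Nothing pending: back off briefly instead of spinning.
    sleep 0.05
  end
end

puts "handled #{handled.inspect}, exiting cleanly"
```

The trade-off is the polling interval: a short sleep wastes CPU, a long one delays both message handling and shutdown.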

Any plans to work on this issue in the near future or is this on the back burner for now?


Without large-scale reworking I don't think we can make the main loop be anything but blocking, so that'll be a last resort.

It's on my horizon cos we need it for other things, but right now we have a fair bit of higher-priority work, so I won't have time to look at this PR and the related changes it brings in for a while - but I added a link to it on and will come back to this soonish.

Sorry I don't have much better to offer, we're a bit pressed for manpower to handle this kind of change.

Waiting for CLA signature by @databus23

@databus23 - We require a Contributor License Agreement (CLA) for people who contribute to Puppet, but we have an easy click-through license with instructions, which is available at

Note: if your contribution is trivial and you think it may be exempt from the CLA, please post a short reply to this comment with details.

CLA signed by all contributors.


Sorry we haven't got back to you about this. I'm going to close this PR in the meantime, but we are working on a solution internally.

@ploubser ploubser closed this Sep 25, 2013

np. Can you give a rough ETA (for master) maybe?


Sadly no ETA at the moment. :(

@ploubser ploubser reopened this Oct 29, 2013

Reopening since the Windows fix was trivial and I'd like to get this into master.


This has been resolved in MCO-221 and will ship with the next MCollective release.

@ploubser ploubser closed this Apr 10, 2014

Very cool! One question: I can't directly see why this shouldn't work on Windows as well. Why was it made a Unix-only feature?


In the case of an agent that takes a long time to complete or time out, the service can go into a broken state on Windows during shutdown. In the long term I'm not sure if the correct action is to allow it on Windows and let users deal with it going into a broken state, or to just disallow it on Windows. For now I'm going to be overly defensive and make it Unix only, but we can re-evaluate in the near future.


Ok, that's why I had a timeout for the graceful shutdown to complete in this initial PR. I believe it is a good idea in general to have the shutdown complete in a timely fashion. Otherwise a hanging agent could block the shutdown on any platform.
Would you maybe consider this as an (optional) setting?
I would really like to have the graceful shutdown capability available on Windows as well.


The hanging agent action should be killed by its timeout, but I hear what you're saying. I'm completely open to it being an optional config option. I've opened where we can discuss it further and track the work.
