Enables disabling of old workers while allowing them to finish currently running tasks #1137

Merged
Tarrasch merged 2 commits into spotify:master from the take_lock branch
Aug 17, 2015


daveFNbuck
Contributor

This PR is split into two commits. The first adds a SIGUSR1 signal handler to workers; after receiving the signal, a worker stops calling get_work but keeps running its current tasks. This allows old workers to be replaced without killing currently running tasks, so old code stops being picked up for new work without losing progress on long-running tasks.
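To make the first commit's behaviour concrete, here is a minimal sketch of the pattern, not the code from this PR. `SketchWorker`, `pending`, `completed`, `stop_requesting_work`, and `handle_interrupt` are hypothetical names chosen for illustration; only the use of SIGUSR1 and the idea of no longer calling `get_work` come from the description above. It assumes a POSIX system and that `run` is called from the main thread.

```python
import os
import signal


class SketchWorker:
    """Hypothetical stand-in for a worker loop (illustration only)."""

    def __init__(self, pending):
        self.pending = list(pending)        # callables a scheduler could hand out
        self.completed = []                 # names of tasks that ran to completion
        self.stop_requesting_work = False   # flipped by the SIGUSR1 handler

    def handle_interrupt(self, signum, frame):
        # Only record the request; the task currently running is not touched.
        self.stop_requesting_work = True

    def get_work(self):
        # After SIGUSR1, behave as if there were nothing left to do.
        if self.stop_requesting_work or not self.pending:
            return None
        return self.pending.pop(0)

    def run(self):
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGUSR1, self.handle_interrupt)
        while True:
            task = self.get_work()
            if task is None:
                break
            task()  # the current task always runs to completion
            self.completed.append(task.__name__)
```

With a worker like this, running `kill -USR1 <pid>` from a shell lets the task in progress finish, after which the loop exits instead of picking up the next task.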

The second commit adds a --take-lock argument that causes a job to send SIGUSR1 to jobs that would otherwise occupy its lock. I've been using this successfully with an assistant that is automatically run with code deploys to ensure that the latest code is always being run without continuous deployment constantly killing all of my jobs.
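The idea behind --take-lock can be sketched as follows. This is an illustration under stated assumptions, not the implementation merged here: `recorded` is a hypothetical mapping from PID to command line (however the lock mechanism happens to track that), and `pids_to_signal`, `take_lock`, and the `send` parameter are invented names. The only parts taken from the description are that SIGUSR1 is sent and that the targets are other processes run with the same command.

```python
import os
import signal


def pids_to_signal(recorded, my_pid, my_cmd):
    """Return the PIDs of other processes recorded with the same command."""
    return sorted(
        pid for pid, cmd in recorded.items()
        if pid != my_pid and cmd == my_cmd
    )


def take_lock(recorded, my_pid, my_cmd, send=os.kill):
    """Ask matching processes to stop taking new work; return the PIDs signalled.

    `send` defaults to os.kill and is injectable so the logic can be tested
    without signalling real processes.
    """
    signalled = []
    for pid in pids_to_signal(recorded, my_pid, my_cmd):
        try:
            send(pid, signal.SIGUSR1)
        except ProcessLookupError:
            # The process exited between being recorded and being signalled.
            continue
        signalled.append(pid)
    return signalled
```

Because the signal only stops the old processes from requesting new work, the new process and the old ones can overlap for as long as the old tasks take to finish.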

@Tarrasch
Contributor

LGTM, can you just fix the small comment on variable name?

When updating worker code, I often want to kill the currently running workers so
that they won't run the old code anymore. However, I'm usually OK with them
finishing what they're currently working on, as that will often include
long-running jobs that aren't related to the changes I'm deploying. Killing these
jobs throws away a lot of work. Instead, we can send SIGUSR1 to the worker and
it will stop requesting new work and die when it's done running its current
jobs.

I only want one assistant running at a time, and I want it running the latest
code. In order to simplify deployment of new assistants, this adds a --take-lock
command-line argument which will have the worker send SIGUSR1 to all other
workers being run with the same command. This prevents other assistants run on
the same machine with the same command from getting new jobs after this one
starts. I've been using this to launch a new assistant alongside code deploys in
order to ensure that old code stops getting run.

@daveFNbuck
Contributor Author

Fixed the variable name.

Tarrasch added a commit that referenced this pull request Aug 17, 2015
Enables disabling of old workers while allowing them to finish currently running tasks
Tarrasch merged commit ba7cf68 into spotify:master Aug 17, 2015
daveFNbuck deleted the take_lock branch August 17, 2015 21:14