Enables disabling of old workers while allowing them to finish currently running tasks #1137

daveFNbuck · 2015-08-14T23:17:31Z

This PR is split into two commits. The first one adds a signal handler for SIGUSR1 to workers, which prevents them from calling get_work. This allows old workers to be replaced without killing currently running tasks. By doing this, we prevent old code from being run going forward without losing progress on long-running tasks.
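For readers who want to picture the first commit, here is a minimal sketch of the idea. It is not the code in this PR: `SketchWorker`, `install_handler`, `handle_interrupt`, and the `stop_requesting_work` flag are all hypothetical names, and tasks are plain callables. The only real APIs used are `signal.signal` and `signal.SIGUSR1` from the standard library.

```python
import signal


class SketchWorker:
    """Hypothetical stand-in for a worker; names here are assumptions."""

    def __init__(self, pending):
        self.pending = list(pending)       # tasks not yet handed out
        self.completed = []                # names of tasks that ran to completion
        self.stop_requesting_work = False  # set once SIGUSR1 arrives

    def install_handler(self):
        # SIGUSR1 does not exist on every platform (e.g. Windows), so guard it.
        # signal.signal must be called from the main thread.
        if hasattr(signal, "SIGUSR1"):
            signal.signal(signal.SIGUSR1, self.handle_interrupt)

    def handle_interrupt(self, signum, frame):
        # Do not kill anything; just stop asking for more work.
        self.stop_requesting_work = True

    def get_work(self):
        # After the signal, behave as if no work is left.
        if self.stop_requesting_work or not self.pending:
            return None
        return self.pending.pop(0)

    def run(self):
        while True:
            task = self.get_work()
            if task is None:
                break  # nothing (more) to request: exit cleanly
            task(self)  # the task in progress always runs to completion
            self.completed.append(task.__name__)
```

Under these assumptions, a process that has called `install_handler()` and receives `kill -USR1 <pid>` while a task is running finishes that task and then exits the loop instead of picking up the next one.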

The second commit adds a --take-lock argument that causes a job to send SIGUSR1 to jobs that would otherwise occupy its lock. I've been using this successfully with an assistant that is automatically run with code deploys to ensure that the latest code is always being run without continuous deployment constantly killing all of my jobs.
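The `--take-lock` behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `pid_file_for`, `take_lock`, the one-PID-file-per-command layout, and the injectable `send_signal` parameter are all hypothetical. Only `os.kill`, `signal.SIGUSR1`, and `hashlib.sha256` are real standard-library calls, and `signal.SIGUSR1` is POSIX-only.

```python
import hashlib
import os
import signal


def pid_file_for(command, pid_dir):
    # Assumption: one PID file per distinct command line, keyed by a hash.
    digest = hashlib.sha256(command.encode("utf-8")).hexdigest()
    return os.path.join(pid_dir, digest + ".pid")


def take_lock(command, pid_dir, my_pid=None, send_signal=os.kill):
    """Signal earlier holders of the lock for `command`, then record our PID.

    Returns the list of PIDs that were sent SIGUSR1.
    """
    my_pid = os.getpid() if my_pid is None else my_pid
    path = pid_file_for(command, pid_dir)
    signalled = []

    if os.path.exists(path):
        with open(path) as f:
            old_pids = [int(line) for line in f if line.strip()]
        for pid in old_pids:
            if pid == my_pid:
                continue
            try:
                # Ask the old process to stop taking new work (POSIX only).
                send_signal(pid, signal.SIGUSR1)
                signalled.append(pid)
            except ProcessLookupError:
                pass  # stale entry: that process has already exited

    # The new process now holds the lock for this command.
    with open(path, "w") as f:
        f.write("%d\n" % my_pid)
    return signalled
```

A real implementation would also need to confirm that each recorded PID still belongs to a process running the same command before signalling it, since PIDs can be reused; this sketch skips that check.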

Tarrasch · 2015-08-17T12:12:08Z

LGTM. Can you just fix the small comment about the variable name?

When updating worker code, I often want to kill the currently running workers so that they won't run the old code anymore. However, I'm usually OK with them finishing what they're currently working on, as that will often include long-running jobs that aren't related to the changes I'm deploying. Killing these jobs throws away a lot of work. Instead, we can send SIGUSR1 to the worker, and it will stop requesting new work and exit once it has finished its current jobs.

I only want one assistant running at a time, and I want it running the latest code. In order to simplify deployment of new assistants, this adds a --take-lock command-line argument which will have the worker send SIGUSR1 to all other workers being run with the same command. This prevents other assistants run on the same machine with the same command from getting new jobs after this one starts. I've been using this to launch a new assistant alongside code deploys in order to ensure that old code stops getting run.

daveFNbuck · 2015-08-17T17:59:26Z

Fixed the variable name.

daveFNbuck added 2 commits August 17, 2015 10:56

daveFNbuck force-pushed the take_lock branch from d8e8c65 to 4d3d0e9 Compare August 17, 2015 17:59

Tarrasch added a commit that referenced this pull request Aug 17, 2015

Merge pull request #1137 from Houzz/take_lock

ba7cf68

Enables disabling of old workers while allowing them to finish currently running tasks

Tarrasch merged commit ba7cf68 into spotify:master Aug 17, 2015

daveFNbuck deleted the take_lock branch August 17, 2015 21:14

This was referenced Oct 29, 2015

PR #1137 yields SIGUSR1 AttributeError on Windows #1360

Closed

Replace all instances of SIGUSR1 with SIGTERM; addresses #1360 #1361

Closed

Tarrasch mentioned this pull request Oct 29, 2015

Handles missing signal.SIGUSR1 when registering a handler in worker #1363

Merged
