
SIGTERM causes jobs to get stuck in WIP queue #507

Closed
cosminstefanxp opened this issue Mar 17, 2015 · 3 comments


@cosminstefanxp
Contributor

First of all, I want to say thanks for all of the great work on this awesome tool.

I'm working on a Heroku-hosted app and I've run into an issue that causes jobs to get stuck in the WIP queue. It happens when long-running tasks (e.g. emulated by a sleep(20)) are being processed at the moment the workers receive SIGTERM (e.g. when changing the number of dynos or re-deploying).
After that point, the tasks seem to be left in the WIP queue (StartedJobRegistry) forever, whereas I would have expected them to eventually be moved to the Failed queue or somewhere else. I'm not sure whether that's the expected behaviour.
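
For reference, a rough sketch of how I reproduce it (module and function names here are just placeholders):

    # tasks.py -- stands in for any long-running job (name is just a placeholder)
    import time

    def slow_task():
        time.sleep(20)  # long enough to still be running when SIGTERM arrives

    # elsewhere, enqueue it and then redeploy / rescale the dynos while it runs:
    from redis import Redis
    from rq import Queue

    Queue(connection=Redis()).enqueue(slow_task)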

In the above scenario an exception also seems to be occurring, which might be causing things to shut down a bit too abruptly (copying it here in case it's relevant):

Stopping all processes with SIGTERM
Mar 17 13:55:49 xxx: 12:55:48 system | SIGTERM received
Mar 17 13:55:49 xxx: 12:55:48 system | sending SIGTERM to rq_worker.2 (pid 7)
Mar 17 13:55:49 xxx: 12:55:48 system | sending SIGTERM to rq_worker.1 (pid 8)
Mar 17 13:55:49 xxx: Traceback (most recent call last):
Mar 17 13:55:49 xxx:   File "/app/.heroku/python/bin/honcho", line 11, in <module>
Mar 17 13:55:49 xxx:     sys.exit(main())
Mar 17 13:55:49 xxx:   File "/app/.heroku/python/lib/python2.7/site-packages/honcho/command.py", line 266, in main
Mar 17 13:55:49 xxx:     COMMANDS[args.command](args)
Mar 17 13:55:49 xxx:   File "/app/.heroku/python/lib/python2.7/site-packages/honcho/command.py", line 213, in command_start
Mar 17 13:55:49 xxx:     manager.loop()
Mar 17 13:55:49 xxx:   File "/app/.heroku/python/lib/python2.7/site-packages/honcho/manager.py", line 100, in loop
Mar 17 13:55:49 xxx:     msg = self.events.get(timeout=0.1)
Mar 17 13:55:49 xxx:   File "/app/.heroku/python/lib/python2.7/multiprocessing/queues.py", line 131, in get
Mar 17 13:55:49 xxx:     if timeout < 0 or not self._poll(timeout):
Mar 17 13:55:49 xxx: IOError: [Errno 4] Interrupted system call

Thanks

@selwin
Collaborator

selwin commented Mar 19, 2015

@jtushman made a pull request implementing the rq suspend and rq resume commands, which let you pause execution and wait for all running tasks to finish so that workers can quit gracefully, and then resume execution again. I think those commands aren't documented yet :(
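
Off the top of my head, usage is roughly the following (pointing the commands at your Redis instance with the usual connection options):

    # pause execution: workers finish the jobs they are on and stop picking up new ones
    rq suspend

    # ...redeploy / rescale dynos...

    # let workers start picking up jobs again
    rq resume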

Jobs in StartedJobRegistry will only be moved to FailedQueue when StartedJobRegistry.cleanup() is called. The documentation in this area is also sorely lacking. I also think we should provide an easier way to trigger these periodic maintenance tasks.
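
For reference, triggering that manually looks something like this (the queue name and connection here are just examples):

    from redis import Redis
    from rq.registry import StartedJobRegistry

    # there is one StartedJobRegistry per queue; 'default' is only an example
    registry = StartedJobRegistry('default', connection=Redis())

    # moves jobs whose ttl has expired out of the registry and onto the FailedQueue
    registry.cleanup()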

@cosminstefanxp
Contributor Author

For now, I've handled this by running a periodic job (via rq-scheduler) that runs the cleanup, but it'd be great to have this handled automagically somehow.
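
Roughly what I set up, in case it helps anyone else (the interval and queue name are just what I happened to pick):

    from datetime import datetime

    from redis import Redis
    from rq.registry import StartedJobRegistry
    from rq_scheduler import Scheduler

    def cleanup_started_registry():
        # move expired jobs out of the 'default' queue's StartedJobRegistry
        StartedJobRegistry('default', connection=Redis()).cleanup()

    scheduler = Scheduler(connection=Redis())
    scheduler.schedule(
        scheduled_time=datetime.utcnow(),  # first run as soon as possible
        func=cleanup_started_registry,     # must be importable by the worker
        interval=600,                      # then every 10 minutes
        repeat=None,                       # repeat indefinitely
    )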

@selwin
Collaborator

selwin commented Jun 18, 2015

Queues are now periodically cleaned by Worker. See #534

selwin closed this as completed Jun 18, 2015