Allow dispatcher to shutdown after a number of jobs (executions) performed#15
Allow dispatcher to shutdown after a number of jobs (executions) performed#15rosa wants to merge 3 commits into
Conversation
Besides alerting based on finished_at, queue and scheduled_at, we want to see all jobs that are finished across all queues and jobs in a queue in all possible statuses.
…ormed This was suggested by Donal (thanks!) as a form of memory management, but also because with Sidekiq workers checked for orphan claimed jobs when they started, and if the workers that held them had died not that long ago so that their heartbeat hand't expired, the jobs wouldn't be released. In Solid Queue this doesn't happen because the supervisor checks for claimed jobs to release periodically, but we might need this for memory management purposes or for other reasons. It's off by default.
|
|
||
| def wait | ||
| @thread&.join | ||
| Thread.new { @thread&.join }.tap(&:join) |
There was a problem hiding this comment.
We need this because we might end up calling wait from within @thread, namely when calling stop because we're done with the executions set by the limit. We can't join the current thread from within itself.
This makes me think that this current code organisation, where code in Dispatcher runs in a thread, but you don't know this unless you look at Runner might be a bit risky... 🤔
There was a problem hiding this comment.
Could we send ourselves a signal to initiate shutdown? That would be handled in the main thread.
There was a problem hiding this comment.
Also, while running (🏃🏻♀️, not running the process 😆) I was thinking about how this is clearly wrong. How aren't these not deadlocking? 🤔 Or at least, I think the newer thread here is not getting a chance to be joined because the main thread is getting joined before 🤔
There was a problem hiding this comment.
Could we send ourselves a signal to initiate shutdown? That would be handled in the main thread.
I had thought briefly about that but thought it was not possible 🤔 The problem is that the main thread here is the supervisor, which handles all the different dispatchers, not just the one that needs to stop 🤔 I might be missing something, though.
|
I'd not thought about the complexity of this feature with multiple threads! It's pretty easy to do when each worker is a process. You exit the process after job X and whatever is supervising the processes starts up a replacement. But with multiple threads it is much trickier! This does make we wonder if this is the right time to introduce this. It's a nice-to-have feature but maybe it should wait until we have the thread/process setup locked down? There are other ways to do this as well if it is just too complex (e.g. monitor memory usage and send a shutdown signal if it gets too high, shutdown each worker after X minutes). What do you think? |
| end | ||
|
|
||
| def executions_per_run_limited? | ||
| SolidQueue.execution_limit_per_dispatch_run >= 0 |
There was a problem hiding this comment.
We should probably treat 0 as no limit
There was a problem hiding this comment.
You're right; that makes more sense 😅 I was following GoodJob's lead here, which uses -1 for all sorts of configurable limits (it doesn't have this particular limit, though), but I think 0 is better.
|
|
||
| def wait | ||
| @thread&.join | ||
| Thread.new { @thread&.join }.tap(&:join) |
There was a problem hiding this comment.
Could we send ourselves a signal to initiate shutdown? That would be handled in the main thread.
Yes, you're right! In fact, this made me go back to the process setup I had been playing with a couple of days ago, I think it makes more sense, it's easier to reason about and to manage! I'm going to park this one and get back to moving from threads (keeping the thread pool for running jobs, that's it) to processes, and then will take this one back. Thank you! |
|
Going to close this one as the code has changed dramatically since I started this. It's something I'll try in the future, but from scratch. |
Add console helpers to connect to different job instances from Dash console
@djmb suggested this in #9 (comment) (thanks!) as a form of memory management, but also because with Sidekiq workers checked for orphan claimed jobs when they started, and if the workers that held them had died not that long ago so that their heartbeat hadn't expired, the jobs wouldn't be released. In Solid Queue, this wouldn't necessarily happen because the supervisor checks for claimed jobs to release periodically, but we might need this for memory management purposes or for other reasons. It's off by default (limit set to -1).