Allow dispatcher to shutdown after a number of jobs (executions) performed #15

Closed
rosa wants to merge 3 commits into main from stop-after-number-of-jobs

@rosa
Member

@rosa rosa commented Mar 14, 2023

@djmb suggested this in #9 (comment) (thanks!) as a form of memory management, but also because, with Sidekiq, workers checked for orphaned claimed jobs when they started; if the workers that held those jobs had died so recently that their heartbeat hadn't yet expired, the jobs wouldn't be released. In Solid Queue, this wouldn't necessarily happen because the supervisor periodically checks for claimed jobs to release, but we might need this for memory management purposes or for other reasons. It's off by default (limit set to -1).

rosa added 2 commits March 14, 2023 19:15
Besides alerting based on finished_at, queue and scheduled_at, we want
to see all jobs that are finished across all queues and jobs in a queue
in all possible statuses.
…ormed

This was suggested by Donal (thanks!) as a form of memory management, but also
because, with Sidekiq, workers checked for orphaned claimed jobs when they
started; if the workers that held those jobs had died so recently that their
heartbeat hadn't yet expired, the jobs wouldn't be released. In Solid Queue this
doesn't happen because the supervisor periodically checks for claimed jobs to
release, but we might need this for memory management purposes or for other
reasons. It's off by default.
Comment thread lib/solid_queue/runner.rb

  def wait
-   @thread&.join
+   Thread.new { @thread&.join }.tap(&:join)

Member Author

We need this because we might end up calling wait from within @thread, namely when calling stop because we're done with the executions set by the limit. We can't join the current thread from within itself.

This makes me think that the current code organisation, where code in Dispatcher runs in a thread but you don't know this unless you look at Runner, might be a bit risky... 🤔
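
To illustrate the constraint described above: in Ruby, a thread cannot join itself, and `Thread#join` raises `ThreadError` when the target is the current thread. A minimal sketch, independent of Solid Queue's actual `Runner` (the `self_join_error` helper name is made up for this example):

```ruby
# Hypothetical helper: returns the error raised when a thread tries to
# join itself, or nil if no error was raised.
def self_join_error
  Thread.new do
    begin
      Thread.current.join # joining the current thread is not allowed
      nil
    rescue ThreadError => e
      e
    end
  end.value
end

error = self_join_error
puts error.class
```

This is the failure that a direct `@thread&.join` would hit if `wait` were ever called from inside `@thread`.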

Contributor

Could we send ourselves a signal to initiate shutdown? That would be handled in the main thread.
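
For reference, a minimal sketch of a process signalling itself, assuming a plain `TERM` handler (the handler body and variable names are illustrative, not Solid Queue's). In CRuby, `trap` handlers run on the main thread, so the shutdown decision moves out of the worker thread:

```ruby
shutdown_requested = false
handled_on_main = nil

# Install an illustrative TERM handler and remember the previous one.
previous_handler = trap("TERM") do
  handled_on_main = (Thread.current == Thread.main)
  shutdown_requested = true
end

# A worker thread asks its own process to shut down.
worker = Thread.new { Process.kill("TERM", Process.pid) }
worker.join

# Wait (with an upper bound) for the handler to run on the main thread.
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 2
until shutdown_requested || Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
  sleep 0.01
end

# Restore whatever handler was installed before this example ran.
trap("TERM", previous_handler || "DEFAULT")

puts "handled on main thread: #{handled_on_main.inspect}"
```

Note that a signal is addressed to the whole process, so by itself it cannot say which thread asked for the shutdown.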

Member Author

Also, while running (🏃🏻‍♀️, not running the process 😆) I was thinking about how this is clearly wrong. How are these not deadlocking? 🤔 Or at least, I think the newer thread here is not getting a chance to be joined because the main thread is getting joined first 🤔
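
The concern above can be reproduced in isolation. In this sketch (not the PR's code), a thread spawns a helper that joins it and then waits on that helper, which mirrors calling `wait` from inside `@thread`. A timeout is used so the example terminates; with a plain `join`, the two threads would wait on each other indefinitely. The `join_via_helper` name is made up for illustration.

```ruby
# Returns nil if the helper thread did not finish within the timeout.
def join_via_helper(timeout)
  Thread.new do
    current = Thread.current
    helper = Thread.new { current.join } # the helper waits for us...
    finished = helper.join(timeout)      # ...while we wait for the helper
    helper.kill                          # clean up so nothing stays blocked
    finished
  end.value
end

result = join_via_helper(0.2)
puts result.nil? ? "helper never finished: circular wait" : "helper finished"
```

`Thread#join` with a limit returns `nil` when the limit expires, which is how the circular wait shows up here.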

Member Author

> Could we send ourselves a signal to initiate shutdown? That would be handled in the main thread.

I had thought briefly about that but thought it was not possible 🤔 The problem is that the main thread here is the supervisor, which handles all the different dispatchers, not just the one that needs to stop 🤔 I might be missing something, though.

@djmb
Contributor

djmb commented Mar 15, 2023

I'd not thought about the complexity of this feature with multiple threads!

It's pretty easy to do when each worker is a process. You exit the process after job X and whatever is supervising the processes starts up a replacement. But with multiple threads it is much trickier!
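
As a rough sketch of the process-per-worker approach described here (assuming a platform where `fork` is available; `MAX_JOBS`, `spawn_worker`, and the fake job loop are made up for illustration and are not Solid Queue code):

```ruby
MAX_JOBS = 3

# Forks a worker that "performs" MAX_JOBS jobs and then exits on its own.
def spawn_worker
  fork do
    MAX_JOBS.times { sleep 0.01 } # stand-in for performing a job
    exit!(0)                      # exit without running at_exit hooks
  end
end

# Supervisor loop: wait for the worker to exit, then start a replacement.
# Bounded to three generations so the example terminates.
statuses = []
3.times do
  pid = spawn_worker
  Process.wait(pid)
  statuses << $?.exitstatus
end

puts "workers started and reaped: #{statuses.size}"
```

Because each worker is its own process, exiting after the limit needs no coordination with other threads; the supervisor only has to notice the exit.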

This does make me wonder if this is the right time to introduce this. It's a nice-to-have feature, but maybe it should wait until we have the thread/process setup locked down? There are other ways to do this as well if it is just too complex (e.g. monitor memory usage and send a shutdown signal if it gets too high, or shut down each worker after X minutes).

What do you think?

Comment thread lib/solid_queue/dispatcher.rb Outdated
end

def executions_per_run_limited?
  SolidQueue.execution_limit_per_dispatch_run >= 0

Contributor

We should probably treat 0 as no limit

Member Author

You're right; that makes more sense 😅 I was following GoodJob's lead here, which uses -1 for all sorts of configurable limits (it doesn't have this particular limit, though), but I think 0 is better.
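
A minimal sketch of what treating 0 as "no limit" could look like. The names below are hypothetical stand-ins that take the limit as an argument rather than reading Solid Queue's configuration, and treating negative values as "no limit" too is an assumption of this example:

```ruby
# Hypothetical stand-in for the predicate under review: only a positive
# limit counts as a limit; 0, negative values, and nil mean "no limit".
def executions_limited?(limit)
  limit.to_i.positive?
end

# Hypothetical run loop: performs jobs until the limit is reached
# or the jobs run out, and returns how many were performed.
def run(jobs, limit)
  performed = 0
  jobs.each do |job|
    break if executions_limited?(limit) && performed >= limit
    job.call
    performed += 1
  end
  performed
end

jobs = Array.new(5) { -> {} }
puts run(jobs, 0)  # prints 5 (no limit)
puts run(jobs, 2)  # prints 2
puts run(jobs, -1) # prints 5 (no limit)
```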

@rosa
Member Author

rosa commented Mar 15, 2023

> It's pretty easy to do when each worker is a process. You exit the process after job X and whatever is supervising the processes starts up a replacement. But with multiple threads it is much trickier!
>
> This does make me wonder if this is the right time to introduce this. It's a nice-to-have feature, but maybe it should wait until we have the thread/process setup locked down? There are other ways to do this as well if it is just too complex (e.g. monitor memory usage and send a shutdown signal if it gets too high, or shut down each worker after X minutes).
>
> What do you think?

Yes, you're right! In fact, this made me go back to the process setup I had been playing with a couple of days ago. I think it makes more sense: it's easier to reason about and to manage! I'm going to park this one and get back to moving from threads to processes (keeping the thread pool for running jobs, that's it), and then I'll pick this one back up.

Thank you!

@rosa
Member Author

rosa commented Dec 7, 2023

Going to close this one as the code has changed dramatically since I started this. It's something I'll try in the future, but from scratch.

@rosa rosa closed this Dec 7, 2023
@rosa rosa deleted the stop-after-number-of-jobs branch December 19, 2023 08:10
hms pushed a commit to ikyn-inc/solid_queue that referenced this pull request Aug 5, 2024
Add console helpers to connect to different job instances from Dash console