Improve Ruby Reaper performance under heavy load #663
Referenced commits:
- Improve reaper performance under heavy load (Closes #663) Signed-off-by: mhenrixon <mikael@mhenrixon.com>
- Mandatory linter commit
❤️ Thank you again for the bug report and the solution! I noticed the issue yesterday when the last entry in the sorted set was deleted instead of the first one. Today I also fixed that bug, all thanks to you! It has been released.
Just a brief report back on this one: for our use case this cut reaper run time by 60%, thank you so much for moving so quickly! 👏🏻 👏🏻 👏🏻 Separately, just so you're aware: while the reaper now runs very effectively for us even when there is a large number of digests to clean (e.g., 150,000), it struggles a lot if our Sidekiq queues are very full (e.g., 5,000 jobs in queues). For our use case this is a surge that will pass, so it's fine for the reaper to give up or time out and be resurrected later, after the surge has passed, but it might be a problem for other use cases.
Hi @francesmcmullin, can you open a separate issue for this one? I believe we could skip reaping based on how large the queues are. The most frequent use case would and should be `until_executed`, and with that lock there shouldn't be a need for reaping while so many jobs are being queued successfully. Rather, cleanup should happen after the surge.
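The queue-size check suggested above could be sketched like this. Note that `reaping_allowed?`, the threshold, and sourcing sizes from `Sidekiq::Queue` are all illustrative assumptions, not existing sidekiq-unique-jobs API:

```ruby
# Hypothetical sketch: skip a reaper run while queues are deep, deferring
# cleanup until after the surge has passed. `reaping_allowed?` and the
# threshold are illustrative names, not part of sidekiq-unique-jobs.
def reaping_allowed?(queue_sizes, threshold: 1_000)
  queue_sizes.sum < threshold
end

# In a real setup the sizes could come from Sidekiq's API, e.g.
# Sidekiq::Queue.all.map(&:size) (requires "sidekiq/api").
```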
Is your feature request related to a problem? Please describe.
I find the ruby reaper runs slowly, and takes up a lot of memory when I'm processing a large volume of jobs (hundreds of thousands in a few hours). I've updated our configuration so that if the reaper fails entirely, the reaper resurrector brings it back, and it mostly doesn't hit the timeout, but ideally it would run a little bit faster with a smaller footprint.
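For reference, enabling the resurrector mentioned above looks roughly like this; this is a sketch assuming the v7-era `SidekiqUniqueJobs.configure` keys, so check the gem's README for exact names and defaults:

```ruby
# Sketch of a reaper/resurrector configuration (assumed v7-era keys).
SidekiqUniqueJobs.configure do |config|
  config.reaper                      = :ruby  # or :lua
  config.reaper_timeout              = 30     # seconds before a run gives up
  config.reaper_resurrector_enabled  = true   # restart a reaper that died
  config.reaper_resurrector_interval = 3600   # check every hour
end
```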
Describe the solution you'd like
I suggest that the reaper should work through the oldest digests first, and that it should avoid loading all digests in ruby at once. Here's the code I'm interested in:
sidekiq-unique-jobs/lib/sidekiq_unique_jobs/orphans/ruby_reaper.rb
Lines 58 to 63 in 323d4ef
Currently, using `zrevrange` means we go from the highest score to the lowest. As the current timestamp is generally used as a digest's score, this means going from newest to oldest. It's certainly not perfect, but I suggest a better general guess when seeking stale digests would be to go from oldest to newest; we can do this by using `zrange` instead of `zrevrange`.

Second, perhaps more laboriously, I suggest paging through digests rather than loading the whole set. It might look similar to this:
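The original snippet did not survive extraction; the following is a minimal sketch of the paging idea, assuming a redis-rb style connection (`conn`) and a hypothetical `PAGE_SIZE` tunable, not the actual `ruby_reaper.rb` code:

```ruby
# Hypothetical paging sketch: walk the digests sorted set oldest-first
# in fixed-size pages instead of loading every digest into Ruby at once.
# `conn` is assumed to respond to zrange like a redis-rb connection.
PAGE_SIZE = 500

def each_digest_page(conn, key)
  offset = 0
  loop do
    # zrange returns members lowest score first, i.e. oldest digests first
    page = conn.zrange(key, offset, offset + PAGE_SIZE - 1)
    break if page.empty?
    yield page
    break if page.size < PAGE_SIZE # last, partial page
    offset += PAGE_SIZE
  end
end
```

Each page can then be checked for orphans and deleted before the next page is fetched, keeping memory bounded regardless of how many digests exist.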
Describe alternatives you've considered
I've considered switching to the Lua reaper, but I was concerned about blocking redis. I'm also thinking about changing some of our application logic so we don't lean quite so heavily on unique jobs, but that will take a bit longer to develop.
Additional context
I'm happy to provide more detail on how we're using sidekiq-unique-jobs in case that's helpful. We tend to process large volumes of jobs (e.g., 300,000) in a short amount of time (e.g., 2 hours) and then have long periods with much less activity.