
Logical scheduling so massive queues are possible. #8

Open
mcfadden opened this issue Aug 21, 2013 · 11 comments

@mcfadden

As I understand it, the way this works right now is: when the threshold is reached, it schedules the jobs for one period from now. When the period cycles, the scheduled jobs drop back in, and it all repeats. If I'm wrong about this, please correct me.

The issue I'm having with this is when I want to queue up a massive number of jobs (say 50,000) with a threshold of 50 and a period of 1 minute.

So what happens is that every minute, just under 50,000 jobs have to be picked up and re-scheduled. This doesn't scale well.

How I would love to see it work is to smartly delay the items: for example, the first 50 process now, the next 50 one minute from now, the next 50 in two minutes, and so on.

This will require some tracking of the current queue so that after all ~50,000 items are scheduled (for as far out as ~1000 minutes from now) it can logically add future items. So, a few hours from now, if I add more items, it should automatically figure out where the end of the queue is time-wise and schedule the new items to be completed at the end of the queue.
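
A rough sketch of the kind of logical scheduling I mean, assuming plain Sidekiq workers; the ThrottledEnqueuer class and the Redis counter key below are made up purely for illustration:

require 'sidekiq'

class ThrottledEnqueuer
  THRESHOLD = 50   # jobs allowed per period
  PERIOD    = 60   # seconds

  # Each enqueued job takes the next position in a persistent Redis counter,
  # so new jobs are always scheduled at the time-wise end of the queue.
  def self.enqueue(worker_class, *args)
    position = Sidekiq.redis { |conn| conn.incr("throttle:#{worker_class}:position") }
    slot     = (position - 1) / THRESHOLD   # which period this job falls into
    worker_class.perform_in(slot * PERIOD, *args)
  end
end

(One open question with this sketch: the counter only ever grows, so it would need to be trimmed as jobs complete, otherwise the delay keeps growing even after the queue has drained.)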

Are there any plans to change the functionality to work this way? If not, I might spend some time and try to hack something out.

@gevans
Owner

gevans commented Oct 3, 2013

I've wanted this functionality for a while now but haven't had much time recently to focus on this gem. You're more than welcome to hack on something. :)

@bwthomas

I think that the solution is actually to rate-limit the fetch rather than scheduling things out. I've got the same problem, plus an interaction issue with sidekiq-priority that makes this solution a necessity.

What I'm thinking is that, rather than scheduling jobs, the fetch method should be rate-limited across all workers. This would actually solve a separate issue, which is the misleading ballooning of the 'processed' count as things go back & forth between scheduled & the queue.

Looking around I see a fair number of rate limiting gems. My eye was drawn to glutton_ratelimit because of its mention in a number of search results, but also because of its ability to limit based on either a burst strategy (send them all & then wait) or an average strategy (dole it out over a period of time). This seems ideal for dealing with different kinds of APIs.

I don't know if you plan to address this any time soon, but unless it's already in the works, I think I'm going to have to do something sooner rather than later. I welcome (hope for, actually) any thoughts or feedback on the topic.

@gevans
Owner

gevans commented Nov 11, 2013

I've been holding off on implementing this sort of functionality until I have a better idea of how to implement it. I agree; fetching may be more efficient than scheduling.

You might want to look at Sidekiq::Fetcher#fetch, which uses Sidekiq::Fetcher.strategy to pick a fetching strategy. If you were to use glutton_ratelimit, you could extend the basic fetcher and rate-limit the #retrieve_work method:

require 'sidekiq/fetch'        # defines Sidekiq::BasicFetch
require 'glutton_ratelimit'    # provides the rate_limit class macro

class RateLimitedFetcher < Sidekiq::BasicFetch
  extend GluttonRateLimit

  # Allow at most 5 calls to #retrieve_work per 60-second period.
  rate_limit :retrieve_work, 5, 60
end
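
A rough sketch of how such a fetcher could then be registered from an initializer; in the Sidekiq versions of this era the fetch strategy is read from Sidekiq.options[:fetch], though the exact option handling may vary by version:

require 'sidekiq'

Sidekiq.configure_server do |config|
  # Use the rate-limited fetcher instead of the default BasicFetch.
  Sidekiq.options[:fetch] = RateLimitedFetcher
end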

...I'm going to reflect on this a bit more. Thoughts?

@bwthomas

So, I actually think that's about it. The rest would be wrapping that up in Sidekiq middleware, packaged as a gem, & then adding some config options to choose the strategy (exhaust vs. average).

The only question in my mind is how to most effectively wrap it up in middleware. I guess a second, ancillary question is: is this the direction you want to take sidekiq-throttler, or do you see this as being a separate implementation of a similar concept?

Either way I'm interested in working on it, & it would be great to work it out with you & not saddle myself with yet another (possibly) redundant gem.

@bwthomas

So, glutton_ratelimit won't work; it's not thread-safe. Of course, I'm sure there's an alternative that is... I just have to find it :/

@bwthomas

redis_rate_limiter looks pretty good. In the meantime I'm going to try it out with the same subclassing strategy you outlined above. However, since you can really only have one custom fetcher, I don't think it's a sustainable strategy in my codebase, or even in a gem.

I believe the solution is to add rate-limiting directly to sidekiq, or at least an improved interface for custom fetchers.
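
In the meantime, a thread-safe check doesn't strictly need a gem at all; a fixed-window counter in Redis gets the idea across. This is only a rough sketch (the key name, limit, and back-off are arbitrary), not redis_rate_limiter's actual API:

require 'sidekiq'
require 'sidekiq/fetch'

class RedisRateLimitedFetcher < Sidekiq::BasicFetch
  LIMIT  = 50  # fetches allowed per window
  WINDOW = 60  # seconds

  def retrieve_work
    if under_limit?
      super
    else
      sleep 1   # back off briefly instead of busy-looping
      nil
    end
  end

  private

  # Fixed-window counter: one Redis key per window, incremented atomically,
  # so it is safe across threads and processes.
  def under_limit?
    Sidekiq.redis do |conn|
      key   = "fetch_rate_limit:#{Time.now.to_i / WINDOW}"
      count = conn.incr(key)
      conn.expire(key, WINDOW * 2)
      count <= LIMIT
    end
  end
end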

@bwthomas

bwthomas commented Dec 3, 2013

I went ahead & cut a gem, sidekiq-rate-limiter. It doesn't support procs in the options hash yet, but is otherwise similar. I'll be working to improve it as time allows, but for now it's a decent solution for our purposes.

@MartinNowak

As I understand it, the way this works right now is: when the threshold is reached, it schedules the jobs for one period from now. When the period cycles, the scheduled jobs drop back in, and it all repeats.

Indeed, that's a serious and surprising design mistake. I guess it was built for a different purpose.

@florrain

Any news about this thread?

@MartinNowak

I ended up writing a pull request for sidekiq-limit_fetch that allows suspending processing of a queue for some time.
deanpcmad/sidekiq-limit_fetch#53

@gevans
Owner

gevans commented Apr 6, 2015

Sorry everyone, I'm not using Sidekiq or sidekiq-throttler these days. 😰

If anyone wants to take over from here, let me know. I'm also more than happy to link to alternatives.
