
Logical scheduling so massive queues are possible. #8

Open
mcfadden opened this issue Aug 21, 2013 · 11 comments

@mcfadden

As I understand it, the way this works right now is: when the threshold is reached, it schedules the jobs for one period from now. When the period cycles, the scheduled jobs drop back in, and it all repeats. If I'm wrong about this, please correct me.

The issue I'm having with this is when I want to queue up a massive number of jobs (say 50,000) with a threshold of 50 and a period of 1 minute.

So what happens is that every minute, just under 50,000 jobs have to be picked up and re-scheduled. This doesn't scale well.

How I would love to see it work is to smartly delay the items: for example, the first 50 process now, the next 50 one minute from now, the next 50 in two minutes, and so on.

This will require some tracking of the current queue so that after all ~50,000 items are scheduled (for as far out as ~1000 minutes from now) it can logically add future items. So, a few hours from now, if I add more items, it should automatically figure out where the end of the queue is time-wise and schedule the new items to be completed at the end of the queue.
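
A rough sketch of the kind of logical scheduling I mean, assuming plain Sidekiq workers; the ThrottledEnqueuer class and the Redis counter key below are made up purely for illustration:

require 'sidekiq'

class ThrottledEnqueuer
  THRESHOLD = 50   # jobs allowed per period
  PERIOD    = 60   # seconds

  # Each enqueued job takes the next position in a persistent Redis counter,
  # so new jobs are always scheduled at the time-wise end of the queue.
  def self.enqueue(worker_class, *args)
    position = Sidekiq.redis { |conn| conn.incr("throttle:#{worker_class}:position") }
    slot     = (position - 1) / THRESHOLD   # which period this job falls into
    worker_class.perform_in(slot * PERIOD, *args)
  end
end

(One open question with this sketch: the counter only ever grows, so it would need to be trimmed as jobs complete, otherwise the delay keeps growing even after the queue has drained.)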

Are there any plans to change the functionality to work this way? If not, I might spend some time and try to hack something out.

@gevans
Owner

gevans commented Oct 3, 2013

I've wanted this functionality for a while now but haven't had much time recently to focus on this gem. You're more than welcome to hack on something. :)

@bwthomas

I think that the solution is actually to rate-limit the fetch rather than scheduling things out. I've got the same problem, plus an interaction issue with sidekiq-priority that makes this solution a necessity.

What I'm thinking is that, rather than scheduling jobs, the fetch method should be rate-limited across all workers. This would actually solve a separate issue, which is the misleading ballooning of the 'processed' count as things go back & forth between scheduled & the queue.

Looking around I see a fair number of rate limiting gems. My eye was drawn to glutton_ratelimit because of its mention in a number of search results, but also because of its ability to limit based on either a burst strategy (send them all & then wait) or an average strategy (dole it out over a period of time). This seems ideal for dealing with different kinds of APIs.

I don't know if you plan to address this any time soon, but unless it's already in the works, I think I'm going to have to do something sooner rather than later. I welcome (hope for, actually) any thoughts or feedback on the topic.

@gevans
Owner

gevans commented Nov 11, 2013

I've been holding off on implementing this sort of functionality until I have a better idea of how to implement it. I agree; fetching may be more efficient than scheduling.

You might want to look at Sidekiq::Fetcher#fetch, which uses Sidekiq::Fetcher.strategy to pick a fetching strategy. If you were to use glutton_ratelimit, you could extend the basic fetcher and rate-limit the #retrieve_work method:

require 'sidekiq/fetch'        # defines Sidekiq::BasicFetch
require 'glutton_ratelimit'    # provides the rate_limit class macro

class RateLimitedFetcher < Sidekiq::BasicFetch
  extend GluttonRateLimit

  # Allow at most 5 calls to #retrieve_work per 60-second period.
  rate_limit :retrieve_work, 5, 60
end
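
A rough sketch of how such a fetcher could then be registered from an initializer; in the Sidekiq versions of this era the fetch strategy is read from Sidekiq.options[:fetch], though the exact option handling may vary by version:

require 'sidekiq'

Sidekiq.configure_server do |config|
  # Use the rate-limited fetcher instead of the default BasicFetch.
  Sidekiq.options[:fetch] = RateLimitedFetcher
end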

...I'm going to reflect on this a bit more. Thoughts?

@bwthomas

So, I actually think that's about it. The rest would be wrapping that up in Sidekiq middleware, packaged as a gem, & then adding some config options to choose the strategy (exhaust vs. average).

The only question in my mind is how to most effectively wrap it up in middleware. I guess a second, ancillary question is: is this the direction you want to take sidekiq-throttler, or do you see this as being a separate implementation of a similar concept?

Either way I'm interested in working on it, & it would be great to work it out with you & not saddle myself with yet another (possibly) redundant gem.

@bwthomas

So, glutton_ratelimit won't work; it's not thread-safe. Of course, I'm sure there's an alternative that is... I just have to find it :/

@bwthomas

redis_rate_limiter looks pretty good. In the meantime I'm going to try it out with the same subclassing strategy you outlined above. However, since you can really only have one custom fetcher, I don't think it's a sustainable strategy in my codebase, or even in a gem.

I believe the solution is to add rate-limiting directly to sidekiq, or at least an improved interface for custom fetchers.
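
In the meantime, a thread-safe check doesn't strictly need a gem at all; a fixed-window counter in Redis gets the idea across. This is only a rough sketch (the key name, limit, and back-off are arbitrary), not redis_rate_limiter's actual API:

require 'sidekiq'
require 'sidekiq/fetch'

class RedisRateLimitedFetcher < Sidekiq::BasicFetch
  LIMIT  = 50  # fetches allowed per window
  WINDOW = 60  # seconds

  def retrieve_work
    if under_limit?
      super
    else
      sleep 1   # back off briefly instead of busy-looping
      nil
    end
  end

  private

  # Fixed-window counter: one Redis key per window, incremented atomically,
  # so it is safe across threads and processes.
  def under_limit?
    Sidekiq.redis do |conn|
      key   = "fetch_rate_limit:#{Time.now.to_i / WINDOW}"
      count = conn.incr(key)
      conn.expire(key, WINDOW * 2)
      count <= LIMIT
    end
  end
end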

@bwthomas

bwthomas commented Dec 3, 2013

I went ahead & cut a gem, sidekiq-rate-limiter. It doesn't support procs in the options hash yet, but is otherwise similar. I'll be working to improve it as time allows, but for now it's a decent solution for our purposes.

@MartinNowak

As I understand it, the way this works right now is: when the threshold is reached, it schedules the jobs for one period from now. When the period cycles, the scheduled jobs drop back in, and it all repeats.

Indeed, that's a serious and surprising design mistake. I guess it was built for a different purpose.

@florrain

Any news about this thread?

@MartinNowak

I ended up writing a pull request for sidekiq-limit_fetch that allows suspending processing of a queue for some time.
deanpcmad/sidekiq-limit_fetch#53

@gevans
Owner

gevans commented Apr 6, 2015

Sorry everyone, I'm not using Sidekiq or sidekiq-throttler these days. 😰

If anyone wants to take over from here, let me know. I'm also more than happy to link to alternatives.
