Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minutely(5) causes high CPU idle in sidetiq with multiple jobs #208

Closed
riking opened this issue Jan 10, 2014 · 12 comments
Closed

minutely(5) causes high CPU idle in sidetiq with multiple jobs #208

riking opened this issue Jan 10, 2014 · 12 comments

Comments

@riking
Copy link

riking commented Jan 10, 2014

Discussion thread (Posts 16-end)

https://github.com/tobiassvn/sidetiq/issues/31

Workaround commit in discourse

Would definitely be nice if this was fixed.

@avit
Copy link
Collaborator

avit commented Jan 10, 2014

Definitely, I overheard vague details about this from @SamSaffron's talk on profiling that I watched a while back. It's been on my peripheral radar to look into it.

I'll see if we can add some optimizations for the smaller interval loops.

@seejohnrun
Copy link
Collaborator

Hey guys - so I've been doing some research on this. The issue stems from full_required? being true on rules that have intervals >1 (as per #92). If you take this out, you will get the performance you'd expect back.

Now as to why that's an issue in general - when the interval is >1 we walk from start_time instead of opening_time, thus going over a large time range just to determine if opening_time is in the proper interval from start_time.

Sidetiq will keep around the current start_time, which is set as a hard:

# from https://github.com/tobiassvn/sidetiq/blob/master/lib/sidetiq/schedule.rb
START_TIME = Sidetiq.config.utc ? Time.utc(2010, 1, 1) : Time.local(2010, 1, 1)

So I'm not sure which side of the usage is the problem - that we should be doing something to provide a next_occurrence version which can force opening_time instead of start_time, or that sidetiq should be setting start_time as the occurrence time of the last tick instead of January 1st, 2010.

@SamSaffron
Copy link

Why can't we just cache the last time in memory, then the first call to get
the next time will be slow, but thereafter it can be fast?

On Tue, Jan 14, 2014 at 1:27 AM, John Crepezzi notifications@github.comwrote:

Hey guys - so I've been doing some research on this. The issue stems from
full_required? being true on rules that have intervals >1 (as per #92#92).
If you take this out, you will get the performance you'd expect back.

Now as to why that's an issue in general - when the interval is >1 we
walk from start_time instead of opening_time, thus going over a large
time range just to determine if opening_time is in the proper interval
from start_time.

Sidetiq will keep around the current start_time, which is set as a hard:

from https://github.com/tobiassvn/sidetiq/blob/master/lib/sidetiq/schedule.rbSTART_TIME = Sidetiq.config.utc ? Time.utc(2010, 1, 1) : Time.local(2010, 1, 1)

So I'm not sure which side of the usage is the problem - that we should be
doing something to provide a next_occurrence version which can force
opening_time instead of start_time, or that sidetiq should be setting
start_time as the occurrence time of the last tick instead of January
1st, 2010.


Reply to this email directly or view it on GitHubhttps://github.com//issues/208#issuecomment-32173185
.

@seejohnrun
Copy link
Collaborator

@SamSaffron This is something that sidetiq has to evaluate pretty often from a serialized copy so they can figure out when the next occurrence is

@SamSaffron
Copy link

@john I follow, but for example.

Say you have minutely(5)

  1. On first call in sidekiq, do the full crawl. Then cache that number.
  2. On second call create a brand new icecube schedule that starts at the
    time in (1) and get the next time from there
  3. repeat
  4. profit

On Tue, Jan 14, 2014 at 8:11 AM, John Crepezzi notifications@github.comwrote:

@SamSaffron https://github.com/SamSaffron This is something that
sidetiq has to evaluate pretty often from a serialized copy so they can
figure out when the next occurrence is


Reply to this email directly or view it on GitHubhttps://github.com//issues/208#issuecomment-32211831
.

@seejohnrun
Copy link
Collaborator

@SamSaffron gotcha now - I think that's similar to the second option in my original comment. It seems a bit odd that sidetiq bases everything off of a fixed start time. They could just as easily update that start time and store it for the next calculation - that would get rid of people's ability to use things like count, but - it's pretty meaningless to use count with a fixed start time too, right? :)

@SamSaffron
Copy link

@tobiassvn thoughts?

@seejohnrun
Copy link
Collaborator

In 89dee70 I revert the behavior from #92 and add a few more tests - schedule_lock handles the issue mentioned in the ticket, so we can relax full_required? to fix the issue.

I'll leave this sitting for one day to collect any feedback, and then I'll be dropping a release tomorrow with this change

thanks!

@ches
Copy link

ches commented Jan 29, 2014

I haven't dug deeply into understanding the issue here at an implementation level and how it affects tobiassvn/sidetiq#31, so please forgive my ignorance, but I'm wondering if #203 helps to alleviate any problems here, or if it could. That PR was open for awhile but was just merged a few days ago, so not sure if its changes were considered in context here.

@seejohnrun
Copy link
Collaborator

@ches I think that #203 provides more an improvement in readability - and that the real central issue here was the walking from start_time due to sidetiq's fixed start_time

@ches
Copy link

ches commented Jan 30, 2014

@seejohnrun That makes sense about Sidetiq's design decision of the fixed start_time, perhaps fundamentally that is what really need reconsideration -- but could it now be reworked to take advantage of lazy evaluation instead of eagerly projecting from start_time?

Sorry to sort of hijack a closed issue thread for the inquiry here.

@seejohnrun
Copy link
Collaborator

I think that the change made in 89dee70 fixes that issue without need for change on sidetiq's side. ice_cube has a feature where instead of walking from the start_time, you can walk from any time for certain types of schedules. That change uses that feature (as well as the test case mentioned in the original sidetiq ticket) for the fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants