bug: rrule is very slow #42

Closed
mrx23dot opened this issue Aug 15, 2020 · 13 comments
Labels
bug Something isn't working

Comments

@mrx23dot

mrx23dot commented Aug 15, 2020

rrule is very slow: for a 1.7 MB iCal file, evaluating 7 days of events with ~80 recurring events in the calendar:

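for dayidx in range(7):
    # start_day (assumed name): datetime.date of the first day being queried
    day = start_day + datetime.timedelta(days=dayidx)
    events = recurring_ical_events.of(calendar).at((day.year, day.month, day.day))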

I get the following profile on an i5 CPU:

           ncalls  tottime  percall  cumtime  percall filename:lineno(function)
              635    5.837    0.009    5.837    0.009 {method 'read' of '_ssl._SSLSocket' objects}  <-- Google Calendar download, 5.8 sec
    966170/963055    5.584    0.000   10.756    0.000 rrule.py:774(_iter)  <-- RRULE eval, 5.6 sec
           925050    2.324    0.000    2.453    0.000 rrule.py:1276(ddayset)  <-- RRULE eval, 2.3 sec
            35101    1.395    0.000   14.259    0.000 rrule.py:1381(_iter)  <-- RRULE eval, 1.4 sec
  1965581/1033825    0.844    0.000   12.126    0.000 {built-in method builtins.next}
           933067    0.798    0.000    0.798    0.000 {built-in method combine}
            71890    0.771    0.000    2.094    0.000 parser.py:321(parts)
           933058    0.619    0.000    0.619    0.000 {built-in method fromordinal}
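A table like the one above can be produced with the standard-library `cProfile` and `pstats` modules. The following is only a sketch: it profiles a single made-up biweekly rule rather than the real calendar, and the function name `expand_old_rule` and the dates are assumptions chosen for illustration.

```python
import cProfile
import io
import pstats
from datetime import datetime

from dateutil.rrule import rrulestr


def expand_old_rule():
    # Hypothetical rule: every second Thursday since January 2010.
    rule = rrulestr("FREQ=WEEKLY;INTERVAL=2;BYDAY=TH", dtstart=datetime(2010, 1, 7))
    # Ask only for a two-week window in August 2020.
    return rule.between(datetime(2020, 8, 10), datetime(2020, 8, 24), inc=True)


profiler = cProfile.Profile()
profiler.enable()
occurrences = expand_old_rule()
profiler.disable()

# Collect the ten most expensive entries, sorted by time spent in the function itself.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by `tottime` gives the same column order as the listing above, which makes it easier to compare runs before and after a change.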

@mrx23dot mrx23dot added the bug Something isn't working label Aug 15, 2020
@mrx23dot
Author

I have many old/ended recurring items.
Is ";UNTIL=" checked before iterating RRULE period?

@niccokunzmann
Owner

niccokunzmann commented Aug 15, 2020

Hi, could you provide the ICS file and the script, so that people trying this are talking about the same results as you are?
This module uses dateutil, so I wonder whether this is also relevant for them.

@mrx23dot
Author

Hi, sorry, I only have my private calendar.
Others also report that dateutil's rrule can be very slow:
https://stackoverflow.com/questions/1336824/python-dateutil-rrule-is-incredibly-slow

Can we prefilter on the ;UNTIL= parameter before we pass anything over to rrule?
Otherwise, if I have a biweekly task from 2010, it will have to crawl over 10 years of occurrences if it's not smart enough.

@niccokunzmann
Owner

Reading the question, it seems that using the between function might be fast.

Also, using rrule.between() to get dates within a given interval is very fast.

Currently, we use the iteration, see

Maybe using between() would speed it up?
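To make the two approaches concrete, here is a small sketch comparing plain iteration with `rrule.between()` on a made-up weekly rule. The rule and the window dates are assumptions for illustration, not taken from this module's code. Note that `between()` still generates occurrences starting from `dtstart` internally, so whether it is actually faster needs to be measured.

```python
from datetime import datetime

from dateutil.rrule import rrulestr

# Hypothetical rule: every Thursday since January 2010.
rule = rrulestr("FREQ=WEEKLY;BYDAY=TH", dtstart=datetime(2010, 1, 7))
window_start = datetime(2020, 8, 10)
window_end = datetime(2020, 8, 17)

# Plain iteration: walk every occurrence and stop once the window is passed.
by_iteration = []
for occurrence in rule:
    if occurrence > window_end:
        break
    if occurrence >= window_start:
        by_iteration.append(occurrence)

# rrule.between(): let dateutil do the skipping; inc=True includes the bounds.
by_between = rule.between(window_start, window_end, inc=True)

print(by_iteration == by_between)
```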

@mrx23dot
Author

rrule only:

  • .at: 9.128 sec
  • .between: 9.094 sec

With prefiltering I would expect 0.1 sec.

@niccokunzmann
Owner

How would prefiltering work?

@niccokunzmann
Owner

Also, by between() I meant rrule.between, not this module's between function.

@mrx23dot
Author

E.g. for
FREQ=WEEKLY;UNTIL=20191023;BYDAY=TH;WKST=SU

the UNTIL part is already parsed in the code:
rule_list = rule_string.split(";UNTIL=")
rule_list[1]

if UNTIL >= datetime.now():
    pass event/line over to rrule
else:
    ignore event
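The pseudocode above could be turned into a runnable check along these lines. This is a sketch, not this module's implementation: the helper names `until_upper_bound` and `may_occur_on_or_after` are made up, time zones are ignored, and the comparison is against the start of the queried range rather than `datetime.now()`, so that queries about past dates still work. A real filter should err on the side of keeping an event.

```python
from datetime import datetime, timedelta
from typing import Optional


def until_upper_bound(rule_string: str) -> Optional[datetime]:
    """Return a naive upper bound for the rule's UNTIL value, or None if absent."""
    for part in rule_string.split(";"):
        name, _, value = part.partition("=")
        if name.strip().upper() != "UNTIL":
            continue
        value = value.strip().rstrip("Z")  # time zone handling is omitted here
        if "T" in value:
            return datetime.strptime(value, "%Y%m%dT%H%M%S")
        # A date-only UNTIL covers the whole day, so use the next midnight.
        return datetime.strptime(value, "%Y%m%d") + timedelta(days=1)
    return None


def may_occur_on_or_after(rule_string: str, query_start: datetime) -> bool:
    """False only when the rule certainly ended before query_start."""
    bound = until_upper_bound(rule_string)
    return bound is None or bound >= query_start


ended_rule = "FREQ=WEEKLY;UNTIL=20191023;BYDAY=TH;WKST=SU"
print(may_occur_on_or_after(ended_rule, datetime(2020, 8, 15)))
```

Rules for which the check returns `False` can be skipped without ever constructing an rrule object; all others are passed on unchanged.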

@niccokunzmann
Owner

niccokunzmann commented Aug 15, 2020

I think there are some optimizations that could be made:

  • change the UNTIL parameter in the string
  • use the rrule.between() function instead of plain iteration (inc=True should be tested)
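One possible reading of these two points, sketched with dateutil directly: cap the rule's UNTIL at the end of the queried window, then compare `between()` with and without `inc=True`. The rule and dates are assumptions for illustration. `rrule.replace()` is available in newer dateutil releases (2.7 and later, as far as I know); editing the UNTIL part of the rule string would be an alternative. Capping UNTIL guarantees that plain iteration terminates at the window end, but it does not skip the years before the window.

```python
from datetime import datetime

from dateutil.rrule import rrulestr

# Hypothetical open-ended rule: every Thursday since January 2010.
rule = rrulestr("FREQ=WEEKLY;BYDAY=TH", dtstart=datetime(2010, 1, 7))
window_start = datetime(2020, 8, 13)  # a Thursday, so it coincides with an occurrence
window_end = datetime(2020, 8, 20)    # the following Thursday

# Copy of the rule that stops at the end of the window (UNTIL is inclusive).
bounded = rule.replace(until=window_end)

# Without inc, occurrences exactly on the bounds are left out.
exclusive = bounded.between(window_start, window_end)
# With inc=True, occurrences exactly on the bounds are returned as well.
inclusive = bounded.between(window_start, window_end, inc=True)

print(len(exclusive), len(inclusive))
```

The difference between the two results is why `inc=True` deserves its own test: events that start exactly at the beginning of the queried range are easy to lose.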

Also, having a test event would be great. Could you identify the event that takes so long and post it here, together with the code that is slow? That way we can really optimize. At the moment, I am still not sure how to address this properly.
If you would like to contribute code, you can also start by adding a (failing) test and creating a pull request; see the issue template.

@mrx23dot
Author

mrx23dot commented Aug 16, 2020

Please find the test case attached; querying 28 days takes 31 sec.

test.zip

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   206871    2.193    0.000    4.556    0.000 rrule.py:774(_iter)  <-- slowest 2sec
   133784    2.119    0.000   12.167    0.000 recurring_ical_events.py:131(__init__) <-- slow 2sec
   581423    2.023    0.000    2.023    0.000 {method 'replace' of 'datetime.datetime' objects} <-- slow 2sec
   238440    1.611    0.000    7.950    0.000 rrule.py:1381(_iter)
   612392    1.128    0.000    1.965    0.000 caselessdict.py:56(get)
   133784    1.071    0.000    2.799    0.000 recurring_ical_events.py:197(make_all_dates_comparable)
   238440    1.015    0.000   10.964    0.000 recurring_ical_events.py:228(__iter__)
   878548    0.833    0.000    2.481    0.000 recurring_ical_events.py:45(convert_to_datetime)
    34264    0.826    0.000    1.111    0.000 rrule.py:426(__init__)

niccokunzmann added a commit that referenced this issue Aug 16, 2020
@niccokunzmann
Owner

@mrx23dot, I added your script as a benchmark in #43. If you like, you can help me find the bottlenecks again and we can optimize some more. I also adjusted your script so that caching can take place, which made it about a quarter faster.
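The comment above does not say how caching was enabled in the benchmark. As one illustration of the general idea, dateutil itself accepts `cache=True`, which lets a rule object that is queried repeatedly reuse the occurrences it has already computed. The sketch below assumes a made-up weekly rule and builds the rule once outside the per-day loop.

```python
from datetime import datetime, timedelta

from dateutil.rrule import rrulestr

# Hypothetical rule: every Thursday at 10:00 since January 2010, with caching on.
rule = rrulestr("FREQ=WEEKLY;BYDAY=TH", dtstart=datetime(2010, 1, 7, 10, 0), cache=True)

week_start = datetime(2020, 8, 10)  # a Monday
per_day = []
for day_index in range(7):
    day_start = week_start + timedelta(days=day_index)
    day_end = day_start + timedelta(days=1)
    # Include the start of the day but not the following midnight,
    # so that no occurrence is counted for two days.
    hits = [o for o in rule.between(day_start, day_end, inc=True) if o < day_end]
    per_day.append(hits)

print([len(hits) for hits in per_day])
```

The first query still has to walk the rule from `dtstart`; the later ones read from the cache. Creating a fresh rule (or a fresh query object) inside the loop would discard that benefit.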

niccokunzmann added a commit that referenced this issue Aug 17, 2020
@NicoHood

Isn't this issue solved now?

@mrx23dot
Copy link
Author

old (v0.1.18b0): [image: old]

head: [image: new]

I will close it until someone has an idea how to speed it up further. Thanks!


3 participants