Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify (or de-duplicate) to_offset vs Period._maybe_convert_freq #23475

Closed
jbrockmendel opened this issue Nov 3, 2018 · 3 comments · Fixed by #34619
Closed

Clarify (or de-duplicate) to_offset vs Period._maybe_convert_freq #23475

jbrockmendel opened this issue Nov 3, 2018 · 3 comments · Fixed by #34619
Labels
Enhancement Period Period data type

Comments

@jbrockmendel
Copy link
Member

Period._maybe_convert_freq calls to_offset, so I guess is a strict superset of its functionality? The difference should be documented along with what use case calls for which.

Also it isn't clear that _maybe_convert_freq needs to be a class method, could just be a function.

@jbrockmendel jbrockmendel added the Period Period data type label Nov 4, 2018
@qwhelan
Copy link
Contributor

qwhelan commented Nov 4, 2018

Ditto on the need for some clarity here - I've found that the current setup imposes a pretty substantial overhead on windows. Simply moving an isinstance() check into _maybe_convert_freq() yields a 35x speedup on windows due to eliminating import overhead, so we probably will want to move to_offset into cython even if we do not merge the two functions here.

@jbrockmendel
Copy link
Member Author

Can you clarify what is getting sped up 35x? And do you mean to_offset instead of to_import?

@qwhelan
Copy link
Contributor

qwhelan commented Nov 4, 2018

Sure, here's the asv benchmarks that improved in #23500 :

Benchmarks that have improved:

       before           after         ratio
     [24ab22f7]       [d7cef344]
     <new_asv~1>       <_maybe_convert_freq>
           failed       2.13±0.03s      n/a  timeseries.Iteration.time_iter(<function period_range at 0x7f12102248c8>)
-      6.49±0.04s       2.21±0.03s     0.34  groupby.Datelike.time_sum('period_range')
-      1.02±0.03s         31.2±8ms     0.03  period.Algorithms.time_drop_duplicates('series')
-         172±8ms         5.21±2ms     0.03  period.PeriodIndexConstructor.time_from_pydatetime('D')
-        852±30ms         23.4±8ms     0.03  timeseries.Iteration.time_iter_preexit(<function period_range at 0x7f12102248c8>)

Yep, meant to_offset()

jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Jun 6, 2020
WillAyd pushed a commit that referenced this issue Jun 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Period Period data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants