New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Improve panel mtype check performance #4196
Conversation
looks great to me, I think the only "collision" is with my approach to change get_cutoff, but I removed that since? Sorry currently a bit caught up, I will give this a closer look in the next days. |
Hi I am really confused now, to me performance seems to go down with the latest PR? |
What are you referring to with "with the latest PR"? 0.09s vs 12.4s using this test case:
|
Hi sorry didnt have the time test thoroughly, I use the following code
Which resulted for me in a slower performance when investigating the cProfile file. Hope I can dig in a bit more tomorrow. |
Ok, so you are testing more than just |
I broke it down to your example
I think the key point is that you consider PeriodIndex, while I conser DateTimeIndex. With Datetimeindex and many groups, the PR code presented here takes twice as long on my PC. Maybe we need to implement a logic to check PeriodIndex. The great boost to your PeriodeIndex comes from this line, I guess?
Great find, we definitely need that . |
Interesting. I will investigate further with your code. |
Did a quick comparison of three scenarios: This PR@0e13388: main@8a14adb: My code performed consistently better. For your example the difference is small but it's sill better. For the other two it's significant. Not only for the period index. Very strange, that we observe so different outcomes.
|
Found some further improvements. panel_period: 0.145111 vs panel_period: 12.283228 The slowest part is now |
FYI, I'm currently working on using the config mechanisms to turn off checks, metadata etc - I suppose that's orthogonal to this PR. |
Nice. Thanks. I didn't start yet with an implementation, but my idea was to split check and metadata into two methods and, as a first step, call the existing functions as fallback somehow. But I guess this is indeed orthogonal. |
Hm, good idea. How would it look like, in an example? |
Here are two relevant issues:
|
@fkiraly @hoesler Thanks a lot! |
Performance statistics
after merge of
i.e., I also notice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! Great improvement!
I would also approve it.
I'll take this as approval, @danbartl - did you want to press the "approve" button? |
Reference Issues/PRs
Split out of #4140
Contributes to #4139
Overlaps with #3827, #3991
What does this implement/fix? Explain your changes.
This PR improves the performance of
check_pdmultiindex_panel
.It was developed in parallel to the work of @danbartl, so some improvements might be outdated or just implemented differently. I would be grateful, if he could support here.
Does your contribution introduce a new dependency? If yes, which one?
What should a reviewer concentrate their feedback on?
Did you add any tests for the change?
Any other comments?
PR checklist
For all contributions
For new estimators