PERF: remove_unused_levels is very slow #16556
Look at my comment here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/multi.py#L1314. You are welcome to provide another implementation if you'd like.
jreback added this to the Next Major Release milestone May 31, 2017
jreback changed the title from "remove_unused_levels is very slow" to "PERF: remove_unused_levels is very slow" May 31, 2017
Cool, I may take a swing at that. Would a replacement implementation have to produce level lists in the same order as the current implementation, or does that not matter as long as
I think you DO need to produce the same orderings; otherwise things won't be sorted. But I'm not 100% sure this is a strict guarantee. We have some tests that exercise this (we probably need a few more). The before and after have to be equal, so technically your example doesn't meet this test. However, it may be possible to recompute the missing levels, then simply reorder them (again, the internals), not the
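The order-preserving requirement discussed above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not pandas' internal implementation: it assumes each level is a 1-D NumPy array, each level's codes are an integer array that uses -1 for missing values, and the helper name `drop_unused_level_values` is hypothetical. Marking the used positions and taking a cumulative sum renumbers the surviving values without changing their relative order, so no re-sorting is needed.

```python
import numpy as np


def drop_unused_level_values(levels, codes):
    """Hypothetical helper: remove level values no code refers to,
    keeping the surviving values in their original order."""
    new_levels, new_codes = [], []
    for lev, lab in zip(levels, codes):
        valid = lab >= 0  # -1 marks a missing entry
        # Flag every level position referenced at least once.
        used = np.zeros(len(lev), dtype=bool)
        used[lab[valid]] = True
        # Old position -> new position; cumsum keeps relative order.
        mapping = np.cumsum(used) - 1
        # Remap valid codes, leave missing entries at -1.
        new_lab = np.full(len(lab), -1, dtype=np.int64)
        new_lab[valid] = mapping[lab[valid]]
        new_levels.append(lev[used])
        new_codes.append(new_lab)
    return new_levels, new_codes


# Usage: "b" and 20 are never referenced, so they are dropped.
levels = [np.array(["a", "b", "c"]), np.array([10, 20, 30])]
codes = [np.array([2, 0, 2]), np.array([0, -1, 2])]
new_levels, new_codes = drop_unused_level_values(levels, codes)
print(new_levels[0].tolist(), new_codes[0].tolist())  # ['a', 'c'] [1, 0, 1]
print(new_levels[1].tolist(), new_codes[1].tolist())  # [10, 30] [0, -1, 1]
```

Because surviving values keep their relative positions, a level that was sorted before the cleanup is still sorted afterwards.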
Hm. Then I think there's not just a performance issue, but an actual bug. Should I open a second issue or will this one serve for both? |
No, these won't compare equal.
I think you're misreading me? I'm comparing the input and output of
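The input/output comparison under discussion can be illustrated with a small, hypothetical index (not the one from the thread). Boolean selection on a `MultiIndex` keeps the original levels, so a dropped outer key lingers in `levels[0]` until `remove_unused_levels()` is called; the result should still hold exactly the same values, in the same order, as its input. This sketch assumes a pandas version that has `remove_unused_levels` (added in 0.20).

```python
import pandas as pd

# Hypothetical example index: three outer keys, three inner values.
mi = pd.MultiIndex.from_product([["x", "y", "z"], [1, 2, 3]], names=["k", "n"])

# Boolean selection keeps the original levels, so "y" lingers unused.
sliced = mi[mi.get_level_values("k") != "y"]
cleaned = sliced.remove_unused_levels()

print(sliced.levels[0].tolist())   # ['x', 'y', 'z']
print(cleaned.levels[0].tolist())  # ['x', 'z']

# Input and output should hold the same values, in the same order.
assert cleaned.equals(sliced)
assert cleaned.tolist() == sliced.tolist()
```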
rhendric added a commit to rhendric/pandas that referenced this issue Jun 1, 2017: 4231e0c
rhendric referenced this issue Jun 1, 2017 in PERF/BUG: reimplement MultiIndex.remove_unused_levels #16565 (Merged)
rhendric added a commit to rhendric/pandas that referenced this issue Jun 1, 2017: dbd9ddc
jreback modified the milestone: 0.20.2, Next Major Release Jun 1, 2017
rhendric added a commit to rhendric/pandas that referenced this issue Jun 1, 2017: 8a9fe43
rhendric added a commit to rhendric/pandas that referenced this issue Jun 1, 2017: 62c5907
rhendric added a commit to rhendric/pandas that referenced this issue Jun 1, 2017: 9852c6f
rhendric added a commit to rhendric/pandas that referenced this issue Jun 2, 2017: 83bdc59
rhendric commented May 31, 2017
Code Sample
Problem description
On my laptop, `x` takes 20 to 40 times as long as `y`, despite `y` doing the extra work of sorting the second level and reindexing the series in the process. The outputs, except for the sorting of the second level, are identical. Why is `remove_unused_levels` so slow?
Expected Output
`remove_unused_levels` should be at least as fast on large indexes as the `reset_index`/`set_index` hack.
Output of `pd.show_versions()`
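The reporter's code sample is not preserved here, so the following is only a hypothetical sketch of the kind of comparison described under Problem description, not the reporter's `x` and `y`. It assumes a two-level index built with `pd.MultiIndex.from_product`, sliced so that most values of the first level become unused, and times `remove_unused_levels()` against rebuilding the index via `reset_index().set_index(...)`. The data sizes and the function names `via_method` and `via_hack` are made up for illustration; absolute timings depend on the pandas version and machine, so none are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: a 500 x 500 product index, sliced so that level "a"
# still stores 500 values but only 10 of them are referenced.
full = pd.MultiIndex.from_product([np.arange(500), np.arange(500)], names=["a", "b"])
s = pd.Series(np.arange(len(full)), index=full)
sub = s[s.index.get_level_values("a") < 10]


def via_method():
    # Built-in cleanup: drop level values that no code refers to.
    return sub.index.remove_unused_levels()


def via_hack():
    # Workaround: move the index into columns and rebuild it, which
    # re-factorizes each level from the values actually present.
    return sub.reset_index().set_index(["a", "b"]).index


for fn in (via_method, via_hack):
    secs = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {secs:.4f}s for 20 runs")
```

Both approaches should yield an index with the same values as `sub.index`; only the level storage (and, for the hack, possibly the level ordering) differs.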