Unstack performance regression #19289

Closed
TomAugspurger opened this Issue Jan 17, 2018 · 3 comments

@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Jan 17, 2018

Contributor

TomAugspurger commented Mar 29, 2018

@toobaz do you have time to take a look at this for 0.23?

Member

toobaz commented Mar 29, 2018

I'll try to look at this, and hopefully fix it, next week.

@jreback jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018

Member

toobaz commented Apr 15, 2018

@TomAugspurger .unstack() needs to find unused levels. The problem is that since #18460 this is done twice for each level: once when building the index, and once when building the values. (Previously, it was done twice only for the level being unstacked.) There is room for improving the code, but it's not simple, and I won't have time soon.
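For readers unfamiliar with "unused levels", here is a small illustration of what .unstack() has to deal with. It is only a sketch: the index contents and the names `full`, `sliced`, `trimmed` and `wide` are made up for the example, and it assumes a pandas version that includes the #18460 fix.

```python
import pandas as pd

# Full index: every combination of the two levels.
full = pd.MultiIndex.from_product([["a", "b", "c"], [1, 2]], names=["key", "num"])

# Slicing keeps the original levels, so "c" is still stored in the first
# level even though no remaining row refers to it: an unused level item.
sliced = full[:4]
print(list(sliced.levels[0]))

# remove_unused_levels() returns an equal index whose levels hold only
# the items that are actually used.
trimmed = sliced.remove_unused_levels()
print(list(trimmed.levels[0]))

# unstack() has to account for unused items, otherwise the reshaped
# result would get a spurious column for "c".
s = pd.Series(range(4), index=sliced)
wide = s.unstack("key")
print(wide)
```

The cost discussed in this issue comes from doing that detection more often than necessary, not from the detection itself.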

#20703 recovers the performance drop just by making MultiIndex.remove_unused_levels faster when there are no unused levels. It neither tackles the problem of checking twice nor brings any improvement when all levels have unused items. We can leave this bug open if you want to keep a reminder for a more general refactoring.
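The idea of a fast path for "no unused levels" can be sketched in plain Python. This is a simplified illustration, not the code from #20703: `remove_unused` is a hypothetical helper, and levels and codes are plain lists rather than pandas objects.

```python
def remove_unused(levels, codes):
    """Drop level items that no code refers to (hypothetical helper).

    levels: list of lists of labels, one list per level
    codes:  list of lists of integer positions into the matching level,
            with -1 marking a missing value
    """
    new_levels, new_codes = [], []
    changed = False
    for lev, cod in zip(levels, codes):
        used = sorted({c for c in cod if c >= 0})
        if len(used) == len(lev):
            # Fast path for this level: every item is used, reuse as-is.
            new_levels.append(lev)
            new_codes.append(cod)
            continue
        changed = True
        # Slow path: keep only the used items and renumber the codes.
        remap = {old: new for new, old in enumerate(used)}
        new_levels.append([lev[i] for i in used])
        new_codes.append([remap.get(c, -1) for c in cod])
    if not changed:
        # Nothing was unused anywhere: hand back the original objects.
        return levels, codes
    return new_levels, new_codes


levels = [["a", "b", "c"], [1, 2]]
codes = [[0, 0, 1, 1], [0, 1, 0, 1]]
trimmed_levels, trimmed_codes = remove_unused(levels, codes)
print(trimmed_levels)  # [['a', 'b'], [1, 2]]
```

As in the comment above, a shortcut of this kind only helps levels with nothing to remove. Levels that do have unused items still go through the full remapping, and a caller that invokes the check twice still pays for it twice.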

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Apr 15, 2018

jreback added a commit that referenced this issue Apr 15, 2018
