PERF: Slowness in multi-level indexes with datetime levels #8543

Closed
miketkelly opened this issue Oct 12, 2014 · 1 comment
Labels
MultiIndex Performance Memory or execution speed performance

Comments

@miketkelly

A MultiIndex with a DatetimeIndex level is slower than a similar index with numeric levels:

import pandas as pd

lev1 = range(10000)
lev2 = range(100)
mi = pd.MultiIndex.from_product([lev1, lev2])
%time mi.values

CPU times: user 571 ms, sys: 41 ms, total: 612 ms
Wall time: 612 ms

lev1 = range(10000)
lev2 = pd.date_range('1/1/2014', periods=100)
mi = pd.MultiIndex.from_product([lev1, lev2])
%time mi.values

CPU times: user 2.51 s, sys: 68 ms, total: 2.58 s
Wall time: 2.58 s

The overhead comes from boxing the datetime level values when generating the tuples for the values property: each value is boxed again every time it appears in a tuple. It can be minimized by boxing once per distinct level value rather than once per occurrence of that value in the tuples.
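To illustrate the idea, here is a minimal sketch of boxing once per distinct value. It is not the change made in the PR; the helper name `boxed_tuples` is hypothetical, and the sketch assumes a pandas version that exposes `MultiIndex.codes` and an index with no missing values (a code of -1 would be mishandled by the plain take).

```python
import numpy as np
import pandas as pd


def boxed_tuples(levels, codes):
    """Build index tuples, boxing each distinct level value only once.

    Hypothetical helper for illustration; assumes no missing values.
    """
    columns = []
    for level, code in zip(levels, codes):
        # Box the distinct values of this level once. For a DatetimeIndex,
        # astype(object) yields an object array of Timestamp instances.
        boxed = np.asarray(level.astype(object))
        # Fan the already-boxed objects out to every row via integer codes,
        # so no further boxing happens per occurrence.
        columns.append(boxed.take(code))
    return list(zip(*columns))


lev1 = pd.Index(range(3))
lev2 = pd.date_range('1/1/2014', periods=2)
mi = pd.MultiIndex.from_product([lev1, lev2])

tuples = boxed_tuples(mi.levels, mi.codes)
```

With 100 distinct dates repeated across 10000 outer values, this approach would box 100 values instead of 1,000,000; the rest of the work is an integer-indexed take on an object array.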

I can send in a PR shortly.

@jreback
Contributor

jreback commented Oct 13, 2014

closed by #8544

@jreback jreback closed this as completed Oct 13, 2014