PERF: Slowness in multi-level indexes with datetime levels #8543

Closed
miketkelly opened this Issue Oct 12, 2014 · 1 comment

miketkelly commented Oct 12, 2014

Materializing the values of a MultiIndex with a DatetimeIndex level is much slower than for a comparable index with only numeric levels (about 4x in the timings below):

import pandas as pd

# Baseline: both levels are plain integers
lev1 = range(10000)
lev2 = range(100)
mi = pd.MultiIndex.from_product([lev1, lev2])
%time mi.values

CPU times: user 571 ms, sys: 41 ms, total: 612 ms
Wall time: 612 ms

# Same shape, but the second level is a DatetimeIndex
lev1 = range(10000)
lev2 = pd.date_range('1/1/2014', periods=100)
mi = pd.MultiIndex.from_product([lev1, lev2])
%time mi.values

CPU times: user 2.51 s, sys: 68 ms, total: 2.58 s
Wall time: 2.58 s

The overhead is in boxing the level values (wrapping each datetime64 value in a Timestamp object) when generating the tuples for the values property. It can be minimized by boxing once per distinct level value rather than once per occurrence of that value in the tuples.
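To illustrate the idea, here is a minimal sketch of boxing once per distinct value. It is not the code from the eventual PR: the helper name `tuples_boxing_once` is hypothetical, it assumes a recent pandas where the integer positions are exposed as `MultiIndex.codes` (older versions called this `labels`), and it assumes the index has no missing entries (no `-1` codes).

```python
import numpy as np
import pandas as pd


def tuples_boxing_once(index):
    """Build the tuples of a MultiIndex, boxing each distinct level value once.

    Hypothetical helper for illustration only. Assumes no missing values,
    i.e. every code is a valid position into its level.
    """
    columns = []
    for level, codes in zip(index.levels, index.codes):
        # Box each distinct level value exactly once; for a DatetimeIndex
        # level this creates one Timestamp object per distinct date.
        boxed = np.array(list(level), dtype=object)
        # Expand to full length by position. This only copies object
        # references, so no further boxing happens here.
        columns.append(boxed.take(codes))
    # Combine the per-level columns into one tuple per row.
    return list(zip(*columns))


lev1 = range(1000)
lev2 = pd.date_range("2014-01-01", periods=100)
mi = pd.MultiIndex.from_product([lev1, lev2])

tuples = tuples_boxing_once(mi)
```

With 100 distinct dates and 100,000 rows, this creates 100 Timestamp objects instead of 100,000; the remaining work is integer indexing and tuple construction.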

I can send in a PR shortly.

jreback added this to the 0.15.0 milestone Oct 13, 2014

jreback commented Oct 13, 2014

closed by #8544

jreback closed this Oct 13, 2014
