
jschendel
Member

Small simplification: modify the breaks metadata before creating an IntervalIndex, then create an IntervalIndex from the modified breaks. The existing approach creates an IntervalIndex, modifies the first Interval, then creates a new IntervalIndex with the updated first Interval.

This yields a slight performance improvement but doesn't seem dramatic enough to warrant a whatsnew entry, though I can add one if desired.
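A minimal sketch of the two approaches described above, for illustration only. The function names, the sample `breaks`, and the `adjustment` value are hypothetical and are not taken from the pandas source; the sketch only assumes the public `pd.IntervalIndex.from_breaks`, `pd.Interval`, and `IntervalIndex.append` APIs.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: right-closed bins whose lowest edge is nudged down
# slightly so that the minimum value falls inside the first interval.
breaks = np.array([0.0, 1.0, 2.0, 3.0])
adjustment = 0.001  # assumed value, for illustration only


def labels_via_rebuild(breaks, adjustment):
    """Existing approach (sketch): build, patch the first Interval, rebuild."""
    labels = pd.IntervalIndex.from_breaks(breaks, closed="right")
    first = pd.Interval(
        labels[0].left - adjustment, labels[0].right, closed="right"
    )
    return pd.IntervalIndex([first]).append(labels[1:])


def labels_via_adjusted_breaks(breaks, adjustment):
    """Simplified approach (sketch): adjust the breaks, then build once."""
    adjusted = breaks.copy()
    adjusted[0] -= adjustment
    return pd.IntervalIndex.from_breaks(adjusted, closed="right")


old_style = labels_via_rebuild(breaks, adjustment)
new_style = labels_via_adjusted_breaks(breaks, adjustment)
print(old_style.equals(new_style))
```

Both functions should produce the same labels; the second avoids constructing an intermediate IntervalIndex and appending to it.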

On this branch:

In [1]: import numpy as np; import pandas as pd; pd.__version__
Out[1]: '0.26.0.dev0+1668.ga6c08fc02'

In [2]: a = np.arange(10**5)

In [3]: %timeit pd.qcut(a, 10**4)
273 ms ± 914 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

On master:

In [1]: import numpy as np; import pandas as pd; pd.__version__
Out[1]: '0.26.0.dev0+1667.g40bff2fed'

In [2]: a = np.arange(10**5)

In [3]: %timeit pd.qcut(a, 10**4)
317 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@jschendel added the Performance, Reshaping, and Clean labels Jan 7, 2020
@jschendel added this to the 1.0 milestone Jan 7, 2020
@TomAugspurger merged commit c5948d1 into pandas-dev:master Jan 7, 2020
@TomAugspurger
Contributor

Thanks!

@jschendel deleted the cln-format-labels branch January 7, 2020 16:11