mzeitlin11
Member

Some low-hanging fruit

In [1]: import numpy as np
   ...: import pandas._libs.algos as libalgos
   ...:
   ...: np.random.seed(0)
   ...: inds = np.random.randint(0, 100, 500000)
   ...: ngroups = inds.max() + 1

In [2]: %timeit libalgos.groupsort_indexer(inds, ngroups)

1.06 ms ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # this pr
1.2 ms ± 39.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # master
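
For anyone reading the benchmark without context, here is a rough pure-NumPy sketch of the kind of counting sort `groupsort_indexer` is understood to perform. It is an illustration under assumptions, not the Cython code touched by this PR: the name `groupsort_indexer_sketch` is hypothetical, and the convention that slot 0 of the counts array holds the `-1` (missing) label is assumed here rather than taken from the diff.

```python
import numpy as np


def groupsort_indexer_sketch(labels, ngroups):
    """Hypothetical counting-sort sketch; not the pandas implementation.

    Assumes labels are integers in the range [-1, ngroups), where -1
    marks a missing group.
    """
    labels = np.asarray(labels, dtype=np.intp)

    # Shift labels by one so that -1 lands in slot 0, then count
    # how many rows fall into each slot.
    counts = np.bincount(labels + 1, minlength=ngroups + 1)

    # Starting write position of each slot in the output:
    # an exclusive cumulative sum of the counts.
    starts = np.zeros(ngroups + 1, dtype=np.intp)
    starts[1:] = np.cumsum(counts)[:-1]

    # Place each row index into the next free position of its slot.
    # Rows are visited in order, so ties keep their original order.
    indexer = np.empty(len(labels), dtype=np.intp)
    for i, lab in enumerate(labels):
        slot = lab + 1
        indexer[starts[slot]] = i
        starts[slot] += 1

    return indexer, counts


labels = np.array([2, 0, 1, 0, -1, 2])
indexer, counts = groupsort_indexer_sketch(labels, ngroups=3)

# The indexer should agree with a stable argsort of the labels.
assert np.array_equal(indexer, np.argsort(labels, kind="stable"))
```

The point of the counting sort is that it runs in time linear in `len(labels) + ngroups`, which is why it pays off in the benchmark above, where there are 500,000 labels but only 100 groups.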

@mzeitlin11 mzeitlin11 added the Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) and Performance (memory or execution speed performance) labels Jun 15, 2021
@jreback jreback added this to the 1.4 milestone Jun 15, 2021
Member

@jbrockmendel jbrockmendel left a comment

LGTM (assuming the CI failures are unrelated)

Member

@jbrockmendel jbrockmendel left a comment

LGTM

@jreback jreback merged commit a9cb219 into pandas-dev:master Jul 28, 2021
Contributor

jreback commented Jul 28, 2021

thanks @mzeitlin11

@mzeitlin11 mzeitlin11 deleted the groupsort_indexer branch July 28, 2021 01:54
CGe0516 pushed a commit to CGe0516/pandas that referenced this pull request Jul 29, 2021
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021