PERF: cythonize groupby-rank #15779

Closed
jreback opened this Issue Mar 22, 2017 · 2 comments

@jreback
Contributor

jreback commented Mar 22, 2017

groupby rank currently dispatches to each group individually. It would be better to have a combined group_rank routine that handles all groups in one call. It is a fair bit of code and ideally would share some with the actual rank algos.

In [7]: ngroups = 1000

In [8]: N = 100000

In [9]: np.random.seed(1234)

In [10]: df = DataFrame({'key': np.random.randint(0, ngroups, size=N), 'value': np.arange(N)})

In [11]: %timeit df.groupby('key').rank()
1 loop, best of 3: 392 ms per loop

# comparison with group_shift_indexer, a transforming operator
In [13]: %timeit df.groupby('key').shift()
100 loops, best of 3: 3.15 ms per loop
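
To illustrate what a combined group_rank could look like, here is a minimal NumPy sketch that ranks every group in a single pass instead of dispatching per group. It is not the pandas implementation: the function name `group_rank_sketch` is hypothetical, it only covers ascending average ranks (the default `method` of `rank`), and it assumes there are no NaN values.

```python
import numpy as np


def group_rank_sketch(values, labels):
    """Ascending average rank of each value within its group.

    Hypothetical helper for illustration only; assumes no NaNs.
    """
    values = np.asarray(values)
    labels = np.asarray(labels)
    n = len(values)
    if n == 0:
        return np.empty(0, dtype=np.float64)

    # Sort by group label first, then by value within each group
    # (np.lexsort treats the LAST key as the primary key).
    order = np.lexsort((values, labels))
    sorted_labels = labels[order]
    sorted_values = values[order]

    # True at the first sorted position of every group.
    new_group = np.r_[True, sorted_labels[1:] != sorted_labels[:-1]]
    # Index at which the current group starts, for every sorted position.
    group_start = np.maximum.accumulate(np.where(new_group, np.arange(n), 0))
    # 1-based position inside the group, ignoring ties.
    ordinal = np.arange(n) - group_start + 1

    # A "run" is a block of equal values inside one group; tied values
    # share the mean of their ordinal positions.
    new_run = new_group | np.r_[True, sorted_values[1:] != sorted_values[:-1]]
    run_id = np.cumsum(new_run) - 1
    run_mean = np.bincount(run_id, weights=ordinal) / np.bincount(run_id)

    # Scatter the ranks back to the original row order.
    out = np.empty(n, dtype=np.float64)
    out[order] = run_mean[run_id]
    return out
```

On the frame from the timing above, `group_rank_sketch(df['value'].values, df['key'].values)` should give the same numbers as `df.groupby('key')['value'].rank()` under those assumptions, since all the work happens in a handful of whole-array operations.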

@jreback jreback modified the milestones: Next Major Release, Next Minor Release Mar 22, 2017

@jreback jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017

@jreback jreback added this to Perf in Interesting Things Nov 26, 2017

@mroeschke mroeschke referenced this issue Jan 10, 2018

Open

PERF: Discrepancy in groupby methods #19165

4 of 7 tasks complete
@WillAyd

Member

WillAyd commented Jan 25, 2018

I can take a look at this. Any tips on what methods to explore? I was thinking of adding a method to the GroupBy class similar to the others for rank and was looking at the rank method in algos.

It wasn't immediately clear to me how best to knit that all together, so I figured I'd get your thoughts if you have any.

@jreback

Contributor

jreback commented Jan 26, 2018

Yeah, you can make a separate rank routine which takes a group indexer; e.g. copy something like group_last and integrate it with rank.
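
As a rough picture of the shape such a routine might take, the sketch below walks the rows once in value order and keeps one counter per group, in the style of a label-driven kernel. It is written in plain Python rather than Cython, the name `group_rank_first` is hypothetical, and it only implements `method='first'` style ranks. Treating a negative label as "row belongs to no group" is an assumption made for this example.

```python
import numpy as np


def group_rank_first(values, labels, ngroups):
    """Rank rows within their group in order of appearance among ties.

    Hypothetical illustration: `labels[i]` is the integer group id of
    row i, and a negative label is assumed to mean "no group".
    """
    values = np.asarray(values)
    labels = np.asarray(labels)
    out = np.empty(len(values), dtype=np.float64)
    # One running counter per group, shared across the single loop.
    seen = np.zeros(ngroups, dtype=np.int64)

    # A stable sort keeps tied values in their original row order.
    order = np.argsort(values, kind="stable")
    for i in order:
        lab = labels[i]
        if lab < 0:
            # Row is not assigned to any group.
            out[i] = np.nan
            continue
        seen[lab] += 1
        out[i] = seen[lab]
    return out
```

The loop body does constant work per row, so moving it into a compiled routine would avoid the per-group Python dispatch that the timings in the issue point at.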
