PERF: speed up multi-key groupby #8128
jreback added Groupby Performance labels Aug 28, 2014
jreback added this to the 0.15.0 milestone Aug 28, 2014
behzadnouri changed the title from ENH: speed up multi-key groupby to PERF: speed up multi-key groupby Aug 28, 2014
|
I had some dependency issues, so I ran the benchmarks manually.
On branch:
|
why is the memory usage so high? the Cartesian product of the groups is not represented here (it's only the compressed space)
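To illustrate what "compressed space" means in this question, here is a minimal sketch in plain Python. It is not the pandas implementation, and `compress_keys` is a hypothetical helper introduced only for this example. Two key columns can span a large Cartesian product, but only the key combinations that actually occur receive a dense group id.

```python
def compress_keys(*columns):
    """Map each observed key tuple to a dense id, in order of first appearance.

    Hypothetical helper for illustration only; not a pandas function.
    """
    ids = {}
    labels = []
    for key in zip(*columns):
        # setdefault assigns the next free id the first time a tuple is seen
        labels.append(ids.setdefault(key, len(ids)))
    return labels, len(ids)


a = [0, 999, 0, 500, 999]
b = [7, 3, 7, 250, 3]

labels, ngroups = compress_keys(a, b)

# Size of the full (uncompressed) key space, assuming integer codes from 0
cartesian = (max(a) + 1) * (max(b) + 1)

print(labels, ngroups, cartesian)
```

Any buffer sized by `ngroups` stays small here, whereas one sized by `cartesian` would grow with the product of the key ranges.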
|
The master branch calls into groupsort_indexer with
Elsewhere also, the code falls back on argsort to avoid a memory error.
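The trade-off described in this comment can be sketched roughly as follows. A counting sort needs a buffer with one slot per group, so when the number of groups is very large a comparison sort avoids that allocation. This is plain Python under stated assumptions: `group_indexer` and its `max_groups` threshold are hypothetical names, and the real `groupsort_indexer` in pandas is compiled code that may differ in detail.

```python
def group_indexer(labels, ngroups, max_groups=1_000_000):
    """Return row positions ordered by group label.

    Hypothetical sketch: `max_groups` is an assumed threshold, not a pandas
    parameter.
    """
    n = len(labels)
    if ngroups > max_groups:
        # Fallback: comparison sort, O(n log n), no per-group buffer needed.
        # Python's sorted() is stable, so ties keep their original order.
        return sorted(range(n), key=labels.__getitem__)

    # Counting sort: O(n + ngroups) time, O(ngroups) extra memory.
    starts = [0] * (ngroups + 1)
    for lab in labels:
        starts[lab + 1] += 1
    for g in range(ngroups):
        starts[g + 1] += starts[g]  # prefix sums give each group's start slot

    indexer = [0] * n
    for i, lab in enumerate(labels):
        indexer[starts[lab]] = i
        starts[lab] += 1
    return indexer


labels = [2, 0, 1, 0, 2]
counting = group_indexer(labels, ngroups=3)
fallback = group_indexer(labels, ngroups=3, max_groups=1)
print(counting, fallback)
```

Both paths return the same ordering; they differ only in how much scratch memory they need as `ngroups` grows.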
|
jreback merged commit c5a3514 into pandas-dev:master Aug 29, 2014
1 check passed
|
thanks! this was great!
|
@jreback On further tests, it seems to me that we need a stable sorter for
I need to change the code to
I did some tests with merge sort and the benchmarks still look good. It is also in line with the fact that Wes uses merge sort here.
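As a small illustration of why stability matters when sorting group labels, consider the sketch below. It is plain Python with made-up data, not code from this PR. A stable sort keeps rows with equal labels in their original order, so order-dependent results such as "first value per group" stay correct; an ordering that is still sorted by label but breaks ties differently does not.

```python
labels = [1, 0, 1, 0, 1]
values = ["a", "b", "c", "d", "e"]

# Python's sorted() is guaranteed stable: ties keep their original row order.
stable = sorted(range(len(labels)), key=labels.__getitem__)

# Also sorted by label, but ties are broken by descending row position,
# which mimics what an unstable sorter is allowed to do.
unstable = sorted(range(len(labels)), key=lambda i: (labels[i], -i))


def first_per_group(indexer):
    """Pick the first value seen for each label while walking the indexer."""
    first = {}
    for i in indexer:
        first.setdefault(labels[i], values[i])
    return first


print(first_per_group(stable))
print(first_per_group(unstable))
```

In NumPy, `argsort` with `kind="mergesort"` gives the stable behavior, while the default sort kind makes no such guarantee.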
|
ok, make a new PR. hmm, no tests break
behzadnouri commented Aug 28, 2014
Improves multi-key groupby speed.
On master:
Note that it is not responsive to the reduction in the number of groups.
With this patch:
benching: