groupby().sum() very slow when applied to boolean columns #2692
Comments
Strange. Thanks for letting me know. I will have a look.
wesm was assigned Jan 19, 2013
wesm closed this in b5b04e0 Jan 19, 2013
lselector commented Jan 14, 2013
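While upgrading pandas from 0.7.2 to 0.9.1, we ran into a slowdown in certain groupby().sum() operations: grouping by a column with many distinct values is very slow when the summed column is boolean. Here is a simple example (IPython session):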
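from pandas import DataFrame

N = 10000
# 'ii' holds N distinct integers; 'bb' is a boolean column that is True everywhere
aa = DataFrame({'ii': range(N), 'bb': [True] * N})
timeit aa.sum()                # fast
timeit aa.groupby('bb').sum()  # fast (a single group)
timeit aa.groupby('ii').sum()  # very slow (~1000 times slower; N groups, summing the boolean column)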
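For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch that uses the standard-library `timeit` module. The helper names (`build_frame`, `time_call`, `run_benchmark`) are made up for illustration and are not part of pandas. The timings depend on the installed pandas version, so the script prints whatever it measures and assumes no particular result.

```python
import timeit

import pandas as pd


def build_frame(n=10_000):
    """Build the frame from the report: n distinct ints in 'ii', all-True 'bb'."""
    return pd.DataFrame({"ii": range(n), "bb": [True] * n})


def time_call(func, repeat=3, number=1):
    """Return the best wall-clock time in seconds for one call, over `repeat` runs."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def run_benchmark(n=10_000):
    """Time the three operations from the report on a frame with n rows."""
    aa = build_frame(n)
    return {
        "sum": time_call(lambda: aa.sum()),
        "groupby_bb": time_call(lambda: aa.groupby("bb").sum()),
        "groupby_ii": time_call(lambda: aa.groupby("ii").sum()),
    }


if __name__ == "__main__":
    print("pandas", pd.__version__)
    for name, seconds in run_benchmark().items():
        print(f"{name:12s} {seconds * 1e3:10.3f} ms")
```

Running the script on both the old and the new pandas version and comparing the `groupby_ii` line should show whether the regression described above is present in a given installation.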