Groupby "negative dimensions are not allowed" error and bad key behaviour when there are NaNs values. #9096

Closed
jordeu opened this issue Dec 17, 2014 · 10 comments · Fixed by #9380

@jordeu

jordeu commented Dec 17, 2014

When grouping by a composite key, if the product of the number of possible values of each key column is bigger than 2^63, calling len(grouped_data) raises a ValueError: "negative dimensions are not allowed".

A simple version to reproduce it:

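import pandas as pd

values = list(range(55109))  # 55109**4 is just above 2**63
data = pd.DataFrame.from_dict({'a': values, 'b': values, 'c': values, 'd': values})
grouped = data.groupby(['a', 'b', 'c', 'd'])
len(grouped)  # ValueError: negative dimensions are not allowed (on the affected version)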
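
For readers wondering why 55109 is the magic number, here is a quick arithmetic check in plain Python. It is only a sketch: it assumes the combined group key ends up in a signed 64-bit integer (which the 2^63 threshold suggests) and does not reproduce the actual pandas internals. The names `INT64_MAX`, `key_space` and `wrapped` are made up for the illustration.

```python
# Assumption: the combined group key is stored as a signed 64-bit integer.
INT64_MAX = 2**63 - 1

n_values = 55109  # distinct values per key column in the example
n_keys = 4        # key columns: a, b, c, d

# Python ints have arbitrary precision, so this product is exact.
key_space = n_values ** n_keys
print(key_space > INT64_MAX)                 # True
print((n_values - 1) ** n_keys > INT64_MAX)  # False

# Emulate two's-complement wraparound into the int64 range.
wrapped = (key_space + 2**63) % 2**64 - 2**63
print(wrapped < 0)                           # True
```

So 55109 is the smallest per-column cardinality at which four key columns overflow, and a wrapped (negative) size would be consistent with the "negative dimensions" message.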

A side effect of this error is that NaN values in the key columns are not ignored: instead of dropping those rows, groupby replaces the NaN keys with other values present in the index.

Here is a complete IPython notebook example that reproduces it:
http://nbviewer.ipython.org/gist/jordeu/cd86fc99f5f89451cf93
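
For comparison, this is the behaviour one would normally expect on a small frame that is nowhere near the overflow: rows whose key contains NaN are dropped from the result. The frame and column names below are invented for the illustration and are not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the third row has NaN in key column 'a'.
df = pd.DataFrame({
    'a': [1.0, 1.0, np.nan, 2.0],
    'b': ['x', 'x', 'y', 'y'],
    'v': [10, 20, 30, 40],
})

# By default groupby drops rows whose key contains NaN.
sizes = df.groupby(['a', 'b']).size()

print(len(sizes))   # 2 groups: (1.0, 'x') and (2.0, 'y')
print(sizes.sum())  # 3 rows accounted for; the NaN row is left out
```

In the overflow case described above, the NaN row would instead show up under one of the other keys.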

@jreback
Contributor

jreback commented Dec 17, 2014

hmm, this overflow space is explicitly handled (though maybe it is hiding an error somewhere). If you'd like to take a look, that would be appreciated (this is a deep part of groupby).

jreback added this to the 0.16.0 milestone Dec 17, 2014
@jreback
Contributor

jreback commented Dec 17, 2014

cc @behzadnouri
we were just talking about this in #9077 (slightly different issue / manifestation)

@mpschr

mpschr commented Jan 28, 2015

any progress on this issue? It's creating lots of false data

@jreback
Contributor

jreback commented Jan 30, 2015

well, pull-requests are welcome.

This of course should be fixed, but you can avoid it by not grouping by all available columns (which is what causes the error).
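
A minimal sketch of that workaround, reusing the frame from the original report: group by fewer key columns so that the product of their cardinalities stays below 2^63. The `key_cols` and `key_space` names are just for the illustration, and the size check is a rough guard rather than what pandas does internally.

```python
import pandas as pd

values = list(range(55109))
data = pd.DataFrame({'a': values, 'b': values, 'c': values, 'd': values})

# Group by a subset of the columns instead of all four.
key_cols = ['a', 'b']

# Rough guard: product of the per-column cardinalities.
key_space = 1
for col in key_cols:
    key_space *= data[col].nunique()
print(key_space < 2**63)  # True: 55109**2 is far below 2**63

n_groups = len(data.groupby(key_cols))
print(n_groups)           # 55109
```

Whether a subset of the columns is still a meaningful key depends on the data, so this only helps when the extra columns are not needed to tell groups apart.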

@jreback
Contributor

jreback commented Jan 31, 2015

give it a try on master; I just merged #9380, which fixes this.

@jordeu
Author

jordeu commented Feb 2, 2015

Perfect, I've tested on master and now it's working. Thanks!

@rth
Contributor

rth commented May 14, 2020

Just got this issue when doing a groupby on a large number of columns with high cardinality with pandas 1.0.3. I'm not sure it was fully fixed. Also there is #31355.

@wangxiaoying

> Just got this issue when doing a groupby on a large number of columns with high cardinality with pandas 1.0.3. I'm not sure it was fully fixed. Also there is #31355.

got the same problem in pandas 1.0.3

@lisovskey

lisovskey commented Jul 6, 2020

got this error too with groupby (pandas 1.0.5)

@jreback
Contributor

jreback commented Jul 6, 2020

you will have to open a new issue with a reproducible example on the latest released version, and even better, showing that the problem still exists on master
