group_by produces 'minlength must be positive error' when applied to empty DataFrame #11699
jreback added the Bug, Groupby, Difficulty Intermediate, and Effort Low labels Nov 25, 2015
jreback added this to the 0.18.0 milestone Nov 25, 2015
cc @behzadnouri @Sereger13 I don't think there is an easy way around this without resorting to patching.
Sereger13 commented Nov 25, 2015
I see... We found that this code:
If you do decide to fix size(), is there any idea when the next version or patch will be available? Thanks.
It will be fixed in 0.18.0, probably in late January.
Sereger13 commented Nov 25, 2015
Thanks.
@Sereger13 my point about patching is that it lets you avoid changing any of your own code. Note again that this is a 'hack', but it will work, e.g.
Sereger13 commented Nov 25, 2015
Great, thanks for your help.
This is more a bug in

```diff
diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index e9aa906..d722ef8 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -1439,7 +1439,8 @@ class BaseGrouper(object):
         """
         ids, _, ngroup = self.group_info
         ids = com._ensure_platform_int(ids)
-        out = np.bincount(ids[ids != -1], minlength=ngroup)
+        mask = ids != -1
+        out = np.bincount(ids[mask], minlength=ngroup) if ngroup != 0 else []
         return Series(out, index=self.result_index, dtype='int64')
 
     @cache_readonly
```
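The guarded logic in the diff can be tried outside pandas with a small standalone function. This is a sketch, assuming integer group codes in `ids` with `-1` marking rows whose key is missing (as in the diff); the name `group_sizes` is hypothetical. Unlike the patch, which returns a plain `[]` and relies on `Series(..., dtype='int64')` to convert it, the sketch returns an empty `int64` array directly.

```python
import numpy as np


def group_sizes(ids, ngroup):
    """Count rows per group code, skipping -1 (rows with a missing key).

    Mirrors the guard from the patch: np.bincount is only called when
    there is at least one group, so minlength is never 0.
    """
    ids = np.asarray(ids, dtype=np.intp)  # bincount requires integer input
    mask = ids != -1                      # drop rows that belong to no group
    if ngroup == 0:
        # Empty frame (or every key missing): no groups, hence no counts.
        return np.zeros(0, dtype=np.int64)
    return np.bincount(ids[mask], minlength=ngroup).astype(np.int64)
```

For example, `group_sizes([0, 2, -1, 2], 3)` yields counts `[1, 0, 2]`, and `group_sizes([], 0)` yields an empty array rather than an error.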
Sereger13 commented Nov 26, 2015
Interesting... thanks for the update. Yes, they could have made
So it looks like simply setting ngroup to None should also do the trick:
I'm not sure this is more readable than @behzadnouri's solution, though. Looking forward to a new pandas with the workaround!
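For comparison, here is a variant not proposed in the thread that avoids both the explicit branch and passing `None`: it always asks `np.bincount` for at least one bin, so `minlength` is never 0, and then trims the result to exactly `ngroup` entries. This means it does not depend on how a particular NumPy version treats `minlength=0` or `minlength=None`. It is a sketch, assuming every non-missing code is smaller than `ngroup`; the name `group_sizes_no_branch` is hypothetical.

```python
import numpy as np


def group_sizes_no_branch(ids, ngroup):
    """Per-group row counts without special-casing ngroup == 0.

    Requests at least one bin so minlength is always positive, then
    slices the counts back down to exactly ngroup entries.
    """
    ids = np.asarray(ids, dtype=np.intp)  # integer codes; -1 = missing key
    counts = np.bincount(ids[ids != -1], minlength=max(ngroup, 1))
    return counts[:ngroup]                # length 0 when there are no groups
```

Whether this reads better than an explicit `if ngroup != 0` guard is a matter of taste; the guard states the intent more directly, while the slice keeps a single code path.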
Sereger13 commented Nov 25, 2015
This used to work fine in previous versions but appears to be broken in 0.17.1.
The following code:
Produces this error:
In v0.16.2 the same code produced an empty DataFrame. We'd really like to upgrade to 0.17.1, but we rely heavily on this functionality, so we have to hold off on the upgrade. Checking for an empty DataFrame won't work for us either, as there are too many places where it can actually be empty.
If you can suggest a workaround in the meantime so that we can upgrade, that would be appreciated.
INSTALLED VERSIONS
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.18-238.9.1.el5
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US
pandas: 0.16.2
...