Can't iterate a DataFrameGroupBy object if the group by key contains NaN #14170

Closed
chengguangnan opened this issue Sep 7, 2016 · 1 comment
Labels
API Design, Duplicate Report, Groupby

Comments

@chengguangnan

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4], 'b':[None, None, None, None]})

chunks = df.groupby('a')
assert len([g for g, chunk in chunks]) == 4

chunks = df.groupby(['a', 'b'])

# fails here: iterating over the groups yields nothing, so the list is empty
assert len([g for g, chunk in chunks]) == 4
# however, chunks.groups shows that there are 4 groups
#{(1, nan): [0], (2, nan): [1], (3, nan): [2], (4, nan): [3]}


# but if we call fillna('') first, the assertion passes
chunks = df.fillna('').groupby(['a', 'b'])
assert len([g for g, chunk in chunks]) == 4



Expected Output

I don't see why having NaN in the group-by key should make iterating over the groupby fail, especially since chunks.groups does show the rows grouped like this: {(1, nan): [0], (2, nan): [1], (3, nan): [2], (4, nan): [3]}

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: 
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.1.2
setuptools: 24.0.0
Cython: 0.23.4
numpy: 1.11.1
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: 1.0b8
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
Jinja2: None
@jreback
Contributor

jreback commented Sep 7, 2016

This is by design at the moment, as NA groups are excluded by default; see the docs here. #12607 looks promising to fix the underlying issue, #443.
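For readers landing here later: the NA exclusion described above can be switched off in newer pandas releases. The sketch below assumes pandas 1.1 or later, where `groupby` accepts a `dropna` argument; that argument does not exist in the 0.17.1 version reported in this issue, so on older versions the `fillna` workaround from the original report is still the way to go. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [None, None, None, None]})

# Default behaviour: any row whose group key contains NaN is excluded,
# so iterating over this groupby yields no groups at all.
default_keys = [key for key, _ in df.groupby(["a", "b"])]

# Assumes pandas >= 1.1: dropna=False keeps NaN as a valid group key,
# so each of the four rows ends up in its own group.
kept_keys = [key for key, _ in df.groupby(["a", "b"], dropna=False)]

print(len(default_keys), len(kept_keys))
```

With `dropna=False` the group keys are tuples such as `(1, nan)`, matching what `chunks.groups` reported in the original example.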
