Pandas 0.18.1 df.groupby().count() throws "ValueError: Buffer has wrong number of dimensions" when one of the counted columns has dtype datetime64 #13393

Closed
jfries opened this Issue Jun 7, 2016 · 10 comments

jfries commented Jun 7, 2016

I've confirmed the error does not occur in Pandas 0.16 on a similar machine.

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame({'x': ['a', 'a', 'b'],
                   'y': [pd.Timestamp('2016-05-07 20:09:25+00:00'), pd.Timestamp('2016-05-07 20:09:29+00:00'), pd.Timestamp('2016-05-07 20:09:29+00:00')]})
df.groupby('x').count()

Observed output


ValueError                                Traceback (most recent call last)
<ipython-input-5-3119045de5b1> in <module>()
      2 df = pd.DataFrame({'x': ['a', 'a', 'b'],
      3                    'y': [pd.Timestamp('2016-05-07 20:09:25+00:00'), pd.Timestamp('2016-05-07 20:09:29+00:00'), pd.Timestamp('2016-05-07 20:09:29+00:00')]})
----> 4 print df.groupby('x').count()

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in count(self)
   3754         blk = map(make_block, map(counter, val), loc)
   3755 
-> 3756         return self._wrap_agged_blocks(data.items, list(blk))
   3757 
   3758 

pandas/lib.pyx in pandas.lib.count_level_2d (pandas/lib.c:23068)()

ValueError: Buffer has wrong number of dimensions (expected 2, got 1)

Expected Output

x  y
a  2
b  1
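
The expected result above can be written as a small check. This is only a sketch: it assumes a pandas version in which the bug is not present, and the variable names (`is_tz_aware`, `counts`, `expected`) are illustrative. Note that the timestamps in the sample carry a `+00:00` offset, so the `y` column is timezone-aware; the report does not say whether that detail is what triggers the error.

```python
import pandas as pd

# Same frame as in the report: a string key and a timestamp column.
df = pd.DataFrame({
    'x': ['a', 'a', 'b'],
    'y': [pd.Timestamp('2016-05-07 20:09:25+00:00'),
          pd.Timestamp('2016-05-07 20:09:29+00:00'),
          pd.Timestamp('2016-05-07 20:09:29+00:00')],
})

# The '+00:00' offset makes 'y' a timezone-aware datetime column.
is_tz_aware = isinstance(df['y'].dtype, pd.DatetimeTZDtype)

# count() should report the number of non-null 'y' values per group of 'x'.
counts = df.groupby('x').count()

# Expected result from the report, built explicitly for comparison.
expected = pd.DataFrame({'y': [2, 1]}, index=pd.Index(['a', 'b'], name='x'))
```

On an affected version (0.18.1 in this report) the `count()` call is the line that raises the `ValueError`.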

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.1
pip: 8.1.2
setuptools: 21.2.1
Cython: None
numpy: 1.11.0
scipy: 0.16.0
statsmodels: 0.5.0
xarray: None
IPython: 4.2.0
sphinx: None
patsy: 0.2.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.5.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.8
lxml: None
bs4: 4.2.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: 2.5.3 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.36.0
pandas_datareader: None

jreback commented Jun 7, 2016

Yeah, this is a bug. I thought it was a duplicate, but I can't seem to find an open one.

It's a bit of work to fix, but pull requests are welcome.

As a workaround, you can count the column directly:

In [4]: df.groupby('x').y.count()
Out[4]: 
x
a    2
b    1
Name: y, dtype: int64
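
If several columns need counting, the per-column workaround might be extended along these lines. This is a sketch and has not been checked against 0.18.1 itself; the extra column `z` and the names `grouped`, `value_columns` and `counts` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['a', 'a', 'b'],
    'y': [pd.Timestamp('2016-05-07 20:09:25+00:00'),
          pd.Timestamp('2016-05-07 20:09:29+00:00'),
          pd.Timestamp('2016-05-07 20:09:29+00:00')],
    'z': [1.0, None, 3.0],  # hypothetical extra column with one missing value
})

grouped = df.groupby('x')

# Count each value column on its own (the Series path shown above),
# then stitch the resulting Series back together into one frame.
value_columns = [col for col in df.columns if col != 'x']
counts = pd.concat([grouped[col].count() for col in value_columns], axis=1)
```

Each per-column count is a Series named after its column, so the concatenated frame has the same shape that `df.groupby('x').count()` is expected to return.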

@jreback jreback added this to the Next Major Release milestone Jun 7, 2016

jreback commented Jun 7, 2016

As an FYI, please paste code as markdown. I edited the above.

jfries commented Jun 7, 2016

Sorry about the lack of markdown; I will use it going forward.
Agreed that your workaround is correct, but I'm pretty sure that a lot of existing pandas code uses the idiom in my example, as it's common in a lot of tutorials & example code.

jreback commented Jun 7, 2016

@jfries Sure, and that's why it's marked as a bug.

alan-wong commented Jun 29, 2016

This seems related to why the following no longer works in 0.18.1:

df = pd.DataFrame({'a': list('abssbab')})
df.groupby('a').count()

This also used to work.

jorisvandenbossche commented Jun 29, 2016

@alan-wong I don't think that is related. In your example, there are no columns left to count values in, so I think it is correct that it does not work. But you are right that the behaviour did change: previously it returned an empty dataframe, and now it raises an error.
The empty dataframe seems more correct. @alan-wong, do you want to open a new issue about that?
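
For the single-column case discussed here, where the grouping key is the only column, one way to get per-value frequencies without relying on `count()` is `size()` or `value_counts()`. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'a': list('abssbab')})

# size() counts rows per group, so it does not need any value columns.
sizes = df.groupby('a').size()

# value_counts() gives the same frequencies directly from the column;
# sort_index() puts them in key order rather than frequency order.
freq = df['a'].value_counts().sort_index()
```

Unlike `count()`, `size()` counts every row in the group, including rows with missing values in other columns.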

alan-wong commented Jun 29, 2016

@jorisvandenbossche I get an empty dataframe on version 0.18.1. I'm not sure what version I was running when I posted this answer (http://stackoverflow.com/questions/22391433/count-the-frequency-that-a-value-occurs-in-a-dataframe-column/22391554#22391554), but it used to work. It does raise an error if you pass the as_index=False argument to the groupby call.

jorisvandenbossche commented Jun 29, 2016

@alan-wong Ah, indeed, on 0.18.1 it does work correctly, but on master it no longer does. So that does look like a bug. Do you want to open a new issue?

alan-wong commented Jun 29, 2016

@jorisvandenbossche But is it correct behaviour that it's empty? Are you saying that in older versions it shouldn't have worked? I'll post an issue.

alan-wong commented Jun 29, 2016

@jorisvandenbossche posted issue: #13530

@jreback jreback modified the milestones: Next Minor Release, Next Major Release Apr 3, 2017

@jreback jreback modified the milestones: Interesting Issues, 0.21.1 Nov 8, 2017
