A DataFrame including Timestamp with time zone fails to agg(), and makes errors. #23683

Closed
propella opened this issue Nov 14, 2018 · 2 comments

Comments

@propella

commented Nov 14, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({
    'tag': [1,1],
    'date': [
        pd.Timestamp('2018-01-01', tz='UTC'),
        pd.Timestamp('2018-01-02', tz='UTC')]
})
df.groupby('tag').agg({'date': lambda e: e.head(1)})

Problem description

The code above raises the following chain of exceptions.

Traceback (most recent call last):
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2670, in agg_series
    return self._aggregate_series_fast(obj, func)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2689, in _aggregate_series_fast
    dummy)
  File "pandas/_libs/reduction.pyx", line 334, in pandas._libs.reduction.SeriesGrouper.__init__
  File "pandas/_libs/reduction.pyx", line 347, in pandas._libs.reduction.SeriesGrouper._check_dummy
ValueError: Dummy array must be same dtype

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3495, in aggregate
    return self._python_agg_general(func_or_funcs, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1068, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2672, in agg_series
    return self._aggregate_series_pure_python(obj, func)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2706, in _aggregate_series_pure_python
    raise ValueError('Function does not reduce')
ValueError: Function does not reduce

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 4656, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 4087, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 490, in _aggregate
    result = _agg(arg, _agg_1dim)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 441, in _agg
    result[fname] = func(fname, agg_how)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 424, in _agg_1dim
    return colg.aggregate(how, _level=(_level or 0) + 1)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3497, in aggregate
    result = self._aggregate_named(func_or_funcs, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3627, in _aggregate_named
    raise Exception('Must produce aggregated value')
Exception: Must produce aggregated value

However, it works if I remove tz from the timestamps, as shown below, so I guess this is a bug.

df = pd.DataFrame({
    'tag': [1,1],
    'date': [
        pd.Timestamp('2018-01-01'),
        pd.Timestamp('2018-01-02')]
})
df.groupby('tag').agg({'date': lambda e: e.head(1)})
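One visible difference between the two frames is the dtype of the date column. The sketch below only inspects that dtype; whether it is what trips the "Dummy array must be same dtype" check is an assumption, not something the traceback confirms. The names aware and naive are illustrative, and pd.DatetimeTZDtype is assumed to be available at the top level, as it is in recent pandas releases.

```python
import pandas as pd

# Same values as in the report, once with tz and once without.
aware = pd.Series([pd.Timestamp('2018-01-01', tz='UTC'),
                   pd.Timestamp('2018-01-02', tz='UTC')])
naive = pd.Series([pd.Timestamp('2018-01-01'),
                   pd.Timestamp('2018-01-02')])

# Print both dtypes to compare them side by side.
print(aware.dtype)
print(naive.dtype)

# The tz-aware column uses a pandas extension dtype; the naive one
# uses a plain NumPy datetime64 dtype.
aware_is_tz = isinstance(aware.dtype, pd.DatetimeTZDtype)
naive_is_tz = isinstance(naive.dtype, pd.DatetimeTZDtype)
print(aware_is_tz, naive_is_tz)
```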

Expected Output

                         date
tag
1   2018-01-01 00:00:00+00:00
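The "Function does not reduce" message in the second traceback suggests that head(1), which returns a one-row Series rather than a scalar, is part of the problem. A possible workaround, sketched here under the assumption of a recent pandas release (not verified against 0.23.4), is to use a reducing aggregation such as the built-in 'first':

```python
import pandas as pd

df = pd.DataFrame({
    'tag': [1, 1],
    'date': [
        pd.Timestamp('2018-01-01', tz='UTC'),
        pd.Timestamp('2018-01-02', tz='UTC')],
})

# 'first' reduces each group to a single scalar value, unlike head(1),
# which hands back a one-row Series for each group.
result = df.groupby('tag').agg({'date': 'first'})
print(result)
```

This sidesteps the non-reducing lambda; it does not explain why the tz-naive version of the original call succeeds.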

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-34-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.8.0
pip: 18.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@genchik1

commented Nov 28, 2018

@mroeschke Hi, please allow me to take a crack at this, thank you!

@mroeschke

Member

commented Nov 28, 2018

Sure, feel free @genchik1!

@jreback jreback added this to the 0.25.0 milestone Feb 16, 2019
