A DataFrame containing tz-aware Timestamps fails in agg() and raises errors #23683

Closed
propella opened this issue Nov 14, 2018 · 2 comments · Fixed by #25308
Labels: Bug, Groupby, Timezones (Timezone data dtype)

Comments

@propella

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({
    'tag': [1,1],
    'date': [
        pd.Timestamp('2018-01-01', tz='UTC'),
        pd.Timestamp('2018-01-02', tz='UTC')]
})
df.groupby('tag').agg({'date': lambda e: e.head(1)})

Problem description

The code above raises the following chain of exceptions:

Traceback (most recent call last):
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2670, in agg_series
    return self._aggregate_series_fast(obj, func)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2689, in _aggregate_series_fast
    dummy)
  File "pandas/_libs/reduction.pyx", line 334, in pandas._libs.reduction.SeriesGrouper.__init__
  File "pandas/_libs/reduction.pyx", line 347, in pandas._libs.reduction.SeriesGrouper._check_dummy
ValueError: Dummy array must be same dtype

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3495, in aggregate
    return self._python_agg_general(func_or_funcs, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1068, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2672, in agg_series
    return self._aggregate_series_pure_python(obj, func)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2706, in _aggregate_series_pure_python
    raise ValueError('Function does not reduce')
ValueError: Function does not reduce

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 4656, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 4087, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 490, in _aggregate
    result = _agg(arg, _agg_1dim)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 441, in _agg
    result[fname] = func(fname, agg_how)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 424, in _agg_1dim
    return colg.aggregate(how, _level=(_level or 0) + 1)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3497, in aggregate
    result = self._aggregate_named(func_or_funcs, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3627, in _aggregate_named
    raise Exception('Must produce aggregated value')
Exception: Must produce aggregated value
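Reading the last two tracebacks, the pure-Python fallback appears to require the function to return a scalar, and `e.head(1)` does not. The snippet below is an illustration added for this write-up, not part of the original report; it applies both candidate functions to the single group by hand (`group` is just a local name chosen here):

```python
import pandas as pd

df = pd.DataFrame({
    'tag': [1, 1],
    'date': [
        pd.Timestamp('2018-01-01', tz='UTC'),
        pd.Timestamp('2018-01-02', tz='UTC')],
})

# Select the one group by hand instead of going through groupby.
group = df.loc[df['tag'] == 1, 'date']

head_result = group.head(1)  # a Series of length 1, not a scalar
iloc_result = group.iloc[0]  # a single tz-aware Timestamp

print(type(head_result).__name__, len(head_result))  # Series 1
print(type(iloc_result).__name__)                    # Timestamp
```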

It works if I remove the tz from the timestamps, as shown below, so I believe this is a bug.

df = pd.DataFrame({
    'tag': [1,1],
    'date': [
        pd.Timestamp('2018-01-01'),
        pd.Timestamp('2018-01-02')]
})
df.groupby('tag').agg({'date': lambda e: e.head(1)})
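The only thing that differs between the two snippets is the dtype of the `date` column. This sketch, assuming a recent pandas where `pd.DatetimeTZDtype` is exposed at the top level, prints both dtypes. My guess (unconfirmed) is that the `Dummy array must be same dtype` check in the Cython fast path trips over the tz-aware extension dtype, so only the tz-aware case falls through to the failing pure-Python path:

```python
import pandas as pd

aware = pd.Series([pd.Timestamp('2018-01-01', tz='UTC'),
                   pd.Timestamp('2018-01-02', tz='UTC')])
naive = pd.Series([pd.Timestamp('2018-01-01'),
                   pd.Timestamp('2018-01-02')])

# The tz-aware column uses pandas' extension dtype DatetimeTZDtype;
# the naive column uses a plain NumPy datetime64 dtype.
print(aware.dtype)
print(naive.dtype)
print(isinstance(aware.dtype, pd.DatetimeTZDtype))  # True
print(isinstance(naive.dtype, pd.DatetimeTZDtype))  # False
```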

Expected Output

                         date
tag
1   2018-01-01 00:00:00+00:00
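Until a fix lands, a reducing function should sidestep the problem. This is a sketch that I have not verified against pandas 0.23.4: either return a scalar from the lambda with `iloc[0]`, or use the built-in `'first'` aggregation. Note that `'first'` returns the first non-null value per group, so it can differ from `iloc[0]` when a group starts with `NaT`.

```python
import pandas as pd

df = pd.DataFrame({
    'tag': [1, 1],
    'date': [
        pd.Timestamp('2018-01-01', tz='UTC'),
        pd.Timestamp('2018-01-02', tz='UTC')],
})

# Option 1: a lambda that returns a scalar instead of a one-row Series.
by_iloc = df.groupby('tag').agg({'date': lambda e: e.iloc[0]})

# Option 2: the built-in 'first' aggregation (first non-null value per group).
by_first = df.groupby('tag').agg({'date': 'first'})

print(by_iloc)
print(by_first)
```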

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-34-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.8.0
pip: 18.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
@mroeschke added the Bug, Groupby, and Timezones labels on Nov 14, 2018
@genchik1

@mroeschke Hi, please allow me to take a crack at this, thank you!

@mroeschke (Member)

Sure, feel free @genchik1!
