groupby AssertionError with datetime column name #35876

hliatrussellinvestments · 2020-08-24T18:29:56Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
import datetime

column_names = ['dimension_1', 'dt', 'value']
data = [
    ('D1', datetime.date(2020, 1, 1), 2.1),
    ('D1', datetime.date(2020, 1, 2), 4.1),
    ('D1', datetime.date(2020, 1, 3), 1.7),
]
df_stack = pd.DataFrame(data=data, columns=column_names)
df_stack['dt'] = pd.to_datetime(df_stack['dt'])
df_stack.set_index(['dimension_1', 'dt'], inplace=True)

df_pivot = df_stack.unstack(level=['dt']).transpose().droplevel(None).transpose()

agg_map = {c: 'sum' for c in df_pivot.columns}
df_pivot.groupby(level=['dimension_1']).agg(agg_map)

Problem description

When calling groupby (on the index) on a dataframe whose column names are datetimes ('datetime64[ns]'), aggregating with a dict of column names fails with an AssertionError (assert result.name == res_name).

AssertionError Traceback (most recent call last)
1 # do groupby on index
2 agg_map = {c: 'sum' for c in df_pivot.columns}
----> 3 df_pivot.groupby(level=['dimension_1']).agg(agg_map)

c:\users\hli\appdata\local\programs\python\python38-32\lib\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
947 )
948
--> 949 result, how = self._aggregate(func, *args, **kwargs)
950 if how is None:
951 return result

c:\users\hli\appdata\local\programs\python\python38-32\lib\site-packages\pandas\core\base.py in _aggregate(self, arg, *args, **kwargs)
349 keys = list(arg.keys())
350 if isinstance(obj, ABCDataFrame) and len(
--> 351 obj.columns.intersection(keys)
352 ) != len(keys):
353 cols = sorted(set(keys) - set(obj.columns.intersection(keys)))

c:\users\hli\appdata\local\programs\python\python38-32\lib\site-packages\pandas\core\indexes\datetimelike.py in intersection(self, other, sort)
701 # TODO: no tests rely on this; needed?
702 result = result._with_freq("infer")
--> 703 assert result.name == res_name
704 return result
705
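
The frames above suggest that the dict-based `agg` path checks its keys against the columns by calling `obj.columns.intersection(keys)` with a plain list. The following is a minimal sketch of that check outside of groupby, paraphrased from the traceback rather than copied from the pandas source; the variable names `matched` and `missing` are illustrative only. On an affected version (such as 1.1.1) the `intersection` call is where the AssertionError would be raised; on a version that contains the fix it should run cleanly.

```python
import pandas as pd

# Frame with datetime column labels, similar in shape to df_pivot.
columns = pd.DatetimeIndex(["2020-01-01", "2020-01-02", "2020-01-03"], name="dt")
df = pd.DataFrame([[2.1, 4.1, 1.7]], columns=columns)

agg_map = {c: "sum" for c in df.columns}
keys = list(agg_map.keys())  # plain list of Timestamps, carries no index name

# Same shape as the check in the traceback: every dict key must be a column.
matched = df.columns.intersection(keys)
missing = sorted(set(keys) - set(matched))
print(len(matched), missing)
```

Because the keys come straight from `df.columns`, no key should be reported as missing once the intersection succeeds.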

Expected Output

The expected result in this example is the dataframe itself, since it has only one row. This can be seen by converting the column labels to strings:

df_pivot.columns = df_pivot.columns.astype(str)
agg_map = {c: 'sum' for c in df_pivot.columns}
df_pivot.groupby(level=['dimension_1']).agg(agg_map)
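
If the datetime labels are needed afterwards, the string conversion can be treated as a temporary step. The sketch below is one possible workaround, not an official recommendation: it builds a small stand-in for `df_pivot` directly (an assumption made to keep the example short), aggregates with string labels, and then restores the original `DatetimeIndex` on the result. It relies on `agg` returning columns in the same order as the dict keys.

```python
import pandas as pd

# Stand-in for df_pivot: one row, datetime column labels named "dt".
dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])
df_pivot = pd.DataFrame(
    [[2.1, 4.1, 1.7]],
    index=pd.Index(["D1"], name="dimension_1"),
    columns=pd.DatetimeIndex(dates, name="dt"),
)

# Aggregate on a copy with string labels, then put the datetime labels back.
original_columns = df_pivot.columns
as_str = df_pivot.copy()
as_str.columns = original_columns.astype(str)

agg_map = {c: "sum" for c in as_str.columns}
result = as_str.groupby(level="dimension_1").agg(agg_map)
result.columns = original_columns
print(result)
```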

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : f2ca0a2
python : 3.8.5.final.0
python-bits : 32
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252

pandas : 1.1.1
numpy : 1.19.0
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.1
setuptools : 47.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.17.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

dsaxton · 2020-08-24T20:50:19Z

Thanks @hliatrussellinvestments, can you try to make this example more minimal (there's a lot of code here that looks unrelated to the bug)?

This worked in 1.0.5 and the regression seems to be due to 29c820f.

cc @jbrockmendel

In [1]: import pandas as pd
   ...: import datetime
   ...:
   ...: print(pd.__version__)
   ...:
   ...: column_names = ['dimension_1', 'dt', 'value']
   ...: data = [
   ...:     ('D1', datetime.date(2020, 1, 1), 2.1),
   ...:     ('D1', datetime.date(2020, 1, 2), 4.1),
   ...:     ('D1', datetime.date(2020, 1, 3), 1.7),
   ...: ]
   ...: df_stack = pd.DataFrame(data=data, columns=column_names)
   ...: df_stack['dt'] = pd.to_datetime(df_stack['dt'])
   ...: df_stack = df_stack.set_index(['dimension_1', 'dt'])
   ...:
   ...: df_pivot = df_stack.unstack(level=['dt']).transpose().droplevel(None).transpose()
   ...:
   ...: agg_map = {c: 'sum' for c in df_pivot.columns}
   ...: df_pivot.groupby(level=['dimension_1']).agg(agg_map)
   ...:
1.0.5
Out[1]:
             2020-01-01  2020-01-02  2020-01-03
dimension_1
D1                  2.1         4.1         1.7

dsaxton · 2020-08-24T21:15:47Z

I think this is essentially the problem (it's an index issue and not one with groupby per se):

import pandas as pd

print(pd.__version__)

values = [pd.Timestamp("2020-01-01"), pd.Timestamp("2020-02-01")]
idx = pd.DatetimeIndex(values, name="a")
idx.intersection(values)

1.2.0.dev0+147.g07983803b
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-1-d5a68db44b8f> in <module>
      5 values = [pd.Timestamp("2020-01-01"), pd.Timestamp("2020-02-01")]
      6 idx = pd.DatetimeIndex(values, name="a")
----> 7 idx.intersection(values)

~/opt/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/pandas/core/indexes/datetimelike.py in intersection(self, other, sort)
    701                     # TODO: no tests rely on this; needed?
    702                     result = result._with_freq("infer")
--> 703             assert result.name == res_name
    704             return result
    705

AssertionError:

Related issue #35847
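
The reduced example above can also be written as a small check of the expected behavior. This is only a sketch and assumes a pandas version that includes the fix (the thread points to #35877, milestone 1.1.2); on the affected versions the `intersection` call itself raises. It asserts only on the values, since both timestamps appear in both inputs.

```python
import pandas as pd

values = [pd.Timestamp("2020-01-01"), pd.Timestamp("2020-02-01")]
idx = pd.DatetimeIndex(values, name="a")

# Intersecting a named DatetimeIndex with a plain list should not raise.
result = idx.intersection(values)

# Every timestamp is present in both inputs, so nothing should be dropped.
assert list(result) == values
```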

hliatrussellinvestments added the Bug and Needs Triage labels Aug 24, 2020

dsaxton added the Apply, Groupby and Regression labels and removed the Bug and Needs Triage labels Aug 24, 2020

dsaxton added the Index and Timeseries labels and removed the Apply and Groupby labels Aug 24, 2020

jbrockmendel mentioned this issue Aug 24, 2020

REGR: DatetimeIndex.intersection incorrectly raising AssertionError #35877

Merged

jreback added this to the 1.1.2 milestone Aug 24, 2020

jreback closed this as completed in #35877 Aug 24, 2020
