Resample category data with timedelta index #12169

mapa17 · 2016-01-28T17:58:27Z

Hi,

I get very strange behavior when I try to resample categorical data with a timedelta index, as compared to a datetime index.

>> d1 = pd.DataFrame({'Group_obj': 'A'}, index=pd.date_range('2000-1-1', periods=20, freq='s'))
>> d1['Group'] = d1['Group_obj'].astype('category')
>> d1
                    Group_obj Group
2000-01-01 00:00:00         A     A
2000-01-01 00:00:01         A     A
2000-01-01 00:00:02         A     A
2000-01-01 00:00:03         A     A
2000-01-01 00:00:04         A     A
2000-01-01 00:00:05         A     A
2000-01-01 00:00:06         A     A
2000-01-01 00:00:07         A     A
2000-01-01 00:00:08         A     A
2000-01-01 00:00:09         A     A
2000-01-01 00:00:10         A     A
2000-01-01 00:00:11         A     A
2000-01-01 00:00:12         A     A
2000-01-01 00:00:13         A     A
2000-01-01 00:00:14         A     A
2000-01-01 00:00:15         A     A
2000-01-01 00:00:16         A     A
2000-01-01 00:00:17         A     A
2000-01-01 00:00:18         A     A
2000-01-01 00:00:19         A     A

>> corr = d1.resample('10s', how=lambda x: (x.value_counts().index[0]))
>> corr
                    Group_obj Group
2000-01-01 00:00:00         A     A
2000-01-01 00:00:10         A     A

>> corr.dtypes
Group_obj    object
Group        object
dtype: object

>> d2 = d1.set_index(pd.to_timedelta(list(range(20)), unit='s'))
>> fxx = d2.resample('10s', how=lambda x: (x.value_counts().index[0]))
>> fxx
         Group_obj  Group
00:00:00         A    NaN
00:00:10         A    NaN

>> fxx.dtypes
Group_obj     object
Group        float64
dtype: object

It seems that when a timedelta index is used, the aggregated result for the category column is always NaN.
Is this the expected behavior?

Thx

PS: is there a way to specify the dtype for the aggregated columns?


jreback · 2016-01-28T18:18:44Z

hmm, does appear a little buggy.

you shouldn't need to specify the dtype on aggregations; they are inferred. Here I think there is an embedded exception which is caught instead of actually computing correctly.
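For readers on newer pandas versions, where the `how=` keyword is no longer available, the reproduction can be sketched with the method-chained resample API. This is only an illustrative sketch, not code from the thread: the helper name `most_common` is hypothetical, and the explicit `astype('category')` at the end is one possible way to get a categorical result back if the inferred dtype is not the one wanted.

```python
import pandas as pd

# Rebuild the frame from the report, then switch to a timedelta index.
d1 = pd.DataFrame({'Group_obj': 'A'},
                  index=pd.date_range('2000-1-1', periods=20, freq='s'))
d1['Group'] = d1['Group_obj'].astype('category')
d2 = d1.set_index(pd.to_timedelta(list(range(20)), unit='s'))


def most_common(x):
    # Hypothetical helper: the most frequent value within one resample bin.
    return x.value_counts().index[0]


# Aggregate the categorical column in 10-second bins.
res = d2['Group'].resample('10s').agg(most_common)

# The dtype is inferred by pandas; cast explicitly if category is required.
res_cat = res.astype('category')
print(res_cat)
```

Casting after the aggregation keeps the aggregation itself dtype-agnostic, which matches the point above that dtypes are normally inferred rather than specified.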

jreback · 2016-01-28T18:19:41Z

I'll look at this after #11841, as the timedelta resampling is tested a bit more there (but not enough!)

BranYang · 2016-02-04T05:24:42Z

The root cause of this issue is that, when constructing a Series from a dict with a TimedeltaIndex as the key, the values are treated as float64. See pandas/core/series.py, lines 172 to 185:

try:
    if isinstance(index, DatetimeIndex):
        if len(data):
            # coerce back to datetime objects for lookup
            data = _dict_compat(data)
            data = lib.fast_multiget(data, index.astype('O'),
                                     default=np.nan)
        else:
            data = np.nan
    elif isinstance(index, PeriodIndex):
        data = ([data.get(i, nan) for i in index]
                if data else np.nan)
    else:
        data = lib.fast_multiget(data, index.values,
                                 default=np.nan)

I believe just changing isinstance(index, PeriodIndex): to isinstance(index, (PeriodIndex, TimedeltaIndex)): would solve this issue.

Before

In [5]: fxx = d2.resample('10s', how=lambda x: (x.value_counts().index[0]))

In [6]: fxx
Out[6]:
         Group_obj  Group
00:00:00         A    NaN
00:00:10         A    NaN

After

In [5]: fxx = d2.resample('10s', how=lambda x: (x.value_counts().index[0]))

In [6]: fxx
Out[6]:
         Group_obj Group
00:00:00         A     A
00:00:10         A     A
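The root cause described above can be checked in isolation, without resample. The snippet below is a minimal sketch under the assumption that the dict is keyed by the Timedelta elements of the index; the variable names are illustrative only. On a pandas version that includes the fix, the values are expected to be looked up from the dict rather than replaced by NaN.

```python
import pandas as pd

# A small TimedeltaIndex like the one resample produces for the bins.
idx = pd.to_timedelta([0, 10], unit='s')

# A dict keyed by the Timedelta elements of that index.
data = {idx[0]: 'A', idx[1]: 'A'}

# Construct the Series from the dict, passing the TimedeltaIndex explicitly.
s = pd.Series(data, index=idx)
print(s)
print(s.dtype)
```

If the printed values are NaN with dtype float64, the installed version still has the lookup problem discussed in this issue.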

closes pandas-dev#12169
Author: Bran Yang <snowolfy@163.com>
Closes pandas-dev#12271 from BranYang/issue12169 and squashes the following commits:
4a5605f [Bran Yang] add tests to Series/test_constructors; and update whatsnew
7cf1be9 [Bran Yang] Fix pandas-dev#12169 - Resample category data with timedelta index

jreback added the Bug, Resample, Categorical, and Difficulty Intermediate labels Jan 28, 2016

jreback added this to the 0.18.0 milestone Jan 28, 2016

BranYang mentioned this issue Feb 9, 2016

Fix #12169 - Resample category data with timedelta index #12271

Closed

jreback closed this as completed in e9558d3 Feb 10, 2016
