Resampling with how=list_of_funcs not returning dataframe #1596

Closed
lenolib opened this Issue Jul 10, 2012 · 2 comments

@lenolib
Contributor

lenolib commented Jul 10, 2012

I would suggest that the behaviour shown below is not what one would expect when passing a sequence of functions as how= to resample.
I would expect to get a DataFrame back in which each column holds the result of applying one of the functions to each time interval.
If you insert an extra value into the series, for example at 2012-06-12 01:01:00, you do get that behaviour.

In [54]: stamps = [p for p in pd.DatetimeIndex(freq='60Min',start=pd.datetime(2012,6,12),periods=4)]

In [55]: stamps
Out[55]:
[Timestamp: 2012-06-12 00:00:00,
Timestamp: 2012-06-12 01:00:00,
Timestamp: 2012-06-12 02:00:00,
Timestamp: 2012-06-12 03:00:00]

In [56]: series = pd.TimeSeries([1,2,3,4], index=stamps)

In [57]: series
Out[57]:
2012-06-12 00:00:00 1
2012-06-12 01:00:00 2
2012-06-12 02:00:00 3
2012-06-12 03:00:00 4

In [58]: series.resample( '20Min', how=(np.mean,np.sum) )
Out[58]:
2012-06-12 00:00:00 1
2012-06-12 00:20:00 NaN
2012-06-12 00:40:00 NaN
2012-06-12 01:00:00 2
2012-06-12 01:20:00 NaN
2012-06-12 01:40:00 NaN
2012-06-12 02:00:00 3
2012-06-12 02:20:00 NaN
2012-06-12 02:40:00 NaN
2012-06-12 03:00:00 4
Freq: 20T

In [59]:

@lenolib

Contributor

lenolib commented Jul 11, 2012

For my purposes, a temporary workaround is to shift the last entry to one microsecond before the hour:

series[:-1].append(pd.Series(series[-1],index=[series.index[-1]-relativedelta(microseconds=1)])).resample( '20Min', how=(np.mean,np.size) )
                     mean  size
2012-06-12 00:00:00     1     1
2012-06-12 00:20:00   NaN     0
2012-06-12 00:40:00   NaN     0
2012-06-12 01:00:00     2     1
2012-06-12 01:20:00   NaN     0
2012-06-12 01:40:00   NaN     0
2012-06-12 02:00:00     3     1
2012-06-12 02:20:00   NaN     0
2012-06-12 02:40:00   NaN     0
2012-06-12 03:00:00     4     1
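
The one-liner above is dense, so here is a step-by-step sketch of the same idea written against a recent pandas release. Several things here are assumptions rather than part of the original comment: `pd.Timedelta` stands in for `relativedelta`, `resample(...).agg([...])` stands in for the `how=` argument, and `"count"` stands in for `np.size`. Bin edges and labels may also differ from the output quoted above, since resampling defaults have changed since 2012.

```python
import pandas as pd

# Hourly series matching the example in this thread.
index = pd.date_range("2012-06-12", periods=4, freq="60min")
series = pd.Series([1, 2, 3, 4], index=index)

# Move only the last timestamp back by one microsecond (assumed equivalent
# of the relativedelta(microseconds=1) shift in the comment above).
last_shifted = series.index[-1] - pd.Timedelta(microseconds=1)
shifted_index = series.index[:-1].append(pd.DatetimeIndex([last_shifted]))
shifted = pd.Series(series.to_numpy(), index=shifted_index)

# Apply several aggregations at once; the result has one column per function.
result = shifted.resample("20min").agg(["mean", "count"])
print(result)
```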

@wesm

Member

wesm commented Jul 11, 2012

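The issue is that you are not aggregating: going from hourly data to 20Min is upsampling, so there is nothing for the functions to aggregate. When you downsample, you do get a DataFrame: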

In [12]: series
Out[12]: 
2012-06-12 00:00:00    1
2012-06-12 01:00:00    2
2012-06-12 02:00:00    3
2012-06-12 03:00:00    4

In [13]: series.resample('2h', how=(np.mean, np.sum))
Out[13]: 
                     mean  sum
2012-06-12 00:00:00   1.0    1
2012-06-12 02:00:00   2.5    5
2012-06-12 04:00:00   4.0    4

I'll see if there's a way for me to change the behavior in the upsampling case.
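
For anyone reading this on a recent pandas release, where `resample` no longer takes a `how=` argument, the comparison between the two cases can be sketched roughly as follows. This assumes `resample(...).agg([...])` with a list of function names as the modern counterpart of `how=(np.mean, np.sum)`; it is not the code from this issue, and the bin edges (and therefore the numbers) may differ from the 2012 output quoted above.

```python
import pandas as pd

# Hourly series matching the example in this thread.
index = pd.date_range("2012-06-12", periods=4, freq="60min")
series = pd.Series([1, 2, 3, 4], index=index)

# Downsampling: each 2-hour bin holds several points, so the functions
# have something to aggregate.
down = series.resample("2h").agg(["mean", "sum"])

# Upsampling: most 20-minute bins are empty, but the result still has
# one column per function.
up = series.resample("20min").agg(["mean", "sum"])

print(down)
print(up)
```

Both calls return a DataFrame here; the difference between the two cases shows up only in how many bins actually contain data.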

@wesm wesm closed this in 77b49b9 Jul 11, 2012
