API: change .resample to be a groupby-like API #11732

Closed
jreback opened this Issue Dec 1, 2015 · 7 comments

Contributor

jreback commented Dec 1, 2015

similar to #11603

this would transform:

s.resample('D',how='max')

to

s.resample('D').max()

This would be a breaking API change: the default is how='mean', so s.resample('D') currently returns the mean of the resampled data. However, the break would at least be visible, rather than silently changing the behavior of working code.

This would bring .resample (which is just a groupby-type operation under the hood anyhow) in line with the API syntax of .groupby, .rolling, et al.

Furthermore, this would allow getitem / aggregate type operations with minimal effort,
e.g.

s.resample('D').agg(['min','max'])
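
A minimal sketch of how the proposed syntax reads in practice, assuming a pandas release in which .resample already returns a groupby-like object. The sample series and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: two observations per day over two days.
idx = pd.to_datetime([
    "2000-01-01 00:00", "2000-01-01 12:00",
    "2000-01-02 00:00", "2000-01-02 12:00",
])
s = pd.Series([1, 4, 2, 3], index=idx)

# Proposed spelling: the reduction is a method call, not a how= keyword.
daily_max = s.resample("D").max()

# Several reductions at once, as with .groupby(...).agg([...]).
daily_stats = s.resample("D").agg(["min", "max"])

print(daily_max)
print(daily_stats)
```

The result of .agg with a list is a DataFrame with one column per reduction, which is the same shape of result that .groupby gives.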

jreback added this to the 0.18.0 milestone Dec 1, 2015

jreback changed the title from API: change .resample to be a groupby-like operation to API: change .resample to be a groupby-like API Dec 1, 2015

Member

shoyer commented Dec 1, 2015

This change would also eliminate the need for many of the current use cases of pd.TimeGrouper, which is a nice thing because that API is pretty well hidden right now.
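
To illustrate the overlap, here is a rough sketch comparing a time-based groupby with the proposed resample spelling. It assumes a later pandas release, where pd.Grouper(freq=...) plays the role that pd.TimeGrouper had at the time of this issue; the sample data is made up.

```python
import pandas as pd

# Hypothetical sample data: two observations per day over two days.
idx = pd.to_datetime([
    "2000-01-01 00:00", "2000-01-01 12:00",
    "2000-01-02 00:00", "2000-01-02 12:00",
])
s = pd.Series([1, 4, 2, 3], index=idx)

# Time-based grouping through groupby (assumption: pd.Grouper stands in
# for the pd.TimeGrouper mentioned in the thread).
via_groupby = s.groupby(pd.Grouper(freq="D")).max()

# The same daily maximum through the groupby-like resample API.
via_resample = s.resample("D").max()

print(via_groupby)
print(via_resample)
```

Both calls bin the data into the same daily buckets, so for simple reductions the resample form can replace the explicit grouper.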

This API will work well for downsampling (to a coarser time resolution), but it's not clear to me how it would work for upsampling or combined down/upsampling. For example, how would you upsample from daily to hourly data using forward filling with the new API? s.resample('H').mean(fill_method='pad')? Using a method like mean is a bit confusing in this context.

Contributor

jreback commented Dec 1, 2015

s.resample('H').pad()

Contributor

jreback commented Dec 1, 2015

I am not sure that combined up/downsampling is even possible now?

Contributor

jreback commented Dec 1, 2015

or maybe to be more in-line

s.resample('H').ffill()
s.resample('H').fillna(method='pad')

(or all the above)

I guess

s.upsample('H').ffill() is also possible :)
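
A small sketch of the upsampling case discussed above, assuming the .ffill() spelling on the object returned by .resample. The frequency is passed as pd.offsets.Hour() instead of the 'H' string because newer pandas releases prefer a lowercase alias; the two-row daily series is made up.

```python
import pandas as pd

# Hypothetical daily series with two observations.
s = pd.Series([10, 20], index=pd.to_datetime(["2000-01-01", "2000-01-02"]))

# Upsample daily -> hourly; each new hourly slot takes the last known value.
hourly = s.resample(pd.offsets.Hour()).ffill()

print(hourly.head(3))
print(hourly.tail(3))
print(len(hourly))
```

The result covers every hour from the first timestamp to the last, inclusive, so two daily points one day apart become 25 hourly rows.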

Member

shoyer commented Dec 1, 2015

Here's a simple example of combined up/downsampling:

In [25]: idx = pd.to_datetime(['2000-01-01T06', '2000-01-01T12', '2000-01-03T00'])

In [26]: s = pd.Series(range(3), idx)

In [27]: s
Out[27]:
2000-01-01 06:00:00    0
2000-01-01 12:00:00    1
2000-01-03 00:00:00    2
dtype: int64

In [28]: s.resample('1D')
Out[28]:
2000-01-01    0.5
2000-01-02    NaN
2000-01-03    2.0
Freq: D, dtype: float64

In [29]: s.resample('1D', fill_method='pad')
Out[29]:
2000-01-01    0.5
2000-01-02    0.5
2000-01-03    2.0
Freq: D, dtype: float64

Contributor

jreback commented Dec 1, 2015

I suppose we could have an optional fill_method kw in the Resample object,
e.g. s.resample('D',fill_method='pad'), if necessary (similar to how .reindex has this, though normally you would do .reindex().ffill()).

e.g.

In [23]: s.resample('1D',how='mean').ffill()
Out[23]: 
2000-01-01    0.5
2000-01-02    0.5
2000-01-03    2.0
Freq: D, dtype: float64

which I would do like:
s.resample('1D').mean().ffill()

I guess fill_method would then apply while computing the intra-day mean (though I don't think I can see a case for this).
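
For reference, a runnable sketch of the chained form on the three-point series from the earlier comment, assuming a pandas release where .resample returns a groupby-like object. The intermediate result is kept so the empty day is visible before filling.

```python
import pandas as pd

# Series from the combined up/downsampling example earlier in the thread.
idx = pd.to_datetime(["2000-01-01T06", "2000-01-01T12", "2000-01-03T00"])
s = pd.Series(range(3), idx)

# Step 1: downsample to daily means; 2000-01-02 has no data, so it is NaN.
daily = s.resample("1D").mean()

# Step 2: forward-fill the empty day on the already-aggregated result.
filled = daily.ffill()

print(daily)
print(filled)
```

Here the fill happens after the aggregation, which reproduces the fill_method='pad' output shown above without needing a keyword on resample itself.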

Contributor

jreback commented Dec 2, 2015

POC

In [3]: s = Series(np.random.rand(1000), pd.date_range('20130101 09:00:00',freq='Min',periods=1000))

In [4]: r = s.resample2('H')

In [5]: r
Out[5]: DatetimeIndexResampler [freq-><Hour>,axis->0,closed->left,label->left,convention->start,base->0]

In [6]: r.
r.agg        r.aggregate  r.ax         r.mean       r.name       

In [6]: r.mean()
Out[6]: 
2013-01-01 09:00:00    0.463474
2013-01-01 10:00:00    0.496552
2013-01-01 11:00:00    0.467690
2013-01-01 12:00:00    0.542037
2013-01-01 13:00:00    0.500808
2013-01-01 14:00:00    0.541115
2013-01-01 15:00:00    0.549489
2013-01-01 16:00:00    0.567870
2013-01-01 17:00:00    0.466067
2013-01-01 18:00:00    0.468675
2013-01-01 19:00:00    0.520051
2013-01-01 20:00:00    0.495800
2013-01-01 21:00:00    0.496541
2013-01-01 22:00:00    0.437051
2013-01-01 23:00:00    0.514727
2013-01-02 00:00:00    0.517313
2013-01-02 01:00:00    0.501945
Freq: H, dtype: float64
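
A sketch of the same idea against the released API, assuming the POC's resample2 ended up as plain .resample. The frequencies are given as pd.offsets objects rather than the 'Min' / 'H' strings, since alias spellings differ across pandas versions, and the random data is seeded only so the run is repeatable.

```python
import numpy as np
import pandas as pd

# 1000 minutes of random data starting at 09:00, as in the POC above.
rng = np.random.default_rng(0)
index = pd.date_range("2013-01-01 09:00:00", freq=pd.offsets.Minute(), periods=1000)
s = pd.Series(rng.random(1000), index=index)

# The resampler is a deferred object; nothing is computed until a method is called.
r = s.resample(pd.offsets.Hour())

# The same object can be reused for several reductions.
hourly_mean = r.mean()
hourly_range = r.agg(["min", "max"])

print(type(r).__name__)
print(hourly_mean.head())
print(hourly_range.head())
```

1000 minutes starting at 09:00 span 17 hourly bins (09:00 through 01:00 the next day), which matches the number of rows in the POC output.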

jreback referenced this issue Dec 14, 2015

Closed

Refactored Resample API breaking change #11841

2 of 2 tasks complete

jreback added a commit to jreback/pandas that referenced this issue Dec 23, 2015

jreback ENH: .resample API to groupby-like class, #11732 5b59fc0

jreback added a commit to jreback/pandas that referenced this issue Feb 2, 2016

jreback ENH: .resample API to groupby-like class, #11732
original API detection & warning

support for isinstance / numeric ops

support for comparison ops

DOC: documentation updates w.r.t. aggregation
e570570

jreback closed this in 1dc49f5 Feb 2, 2016
