ENH: .pipe on Resampler #17905

topper-123 opened this Issue Oct 17, 2017 · 0 comments



topper-123 commented Oct 17, 2017

I've made a PR (#17871) to add pipe functionality to GroupBy objects.

Resampler should IMO have the same .pipe behaviour as GroupBy objects. Then you could do the following in a single pass:

df.resample('3M').pipe(lambda resampled: resampled.Open.first() - resampled.Close.last())

That is, inside a pipe you could use the same Resampler object several times, with each aggregation still computed per resampled period. The alternatives are to split the work over several lines, which is less readable, or to use apply, which is slower.
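A minimal sketch of that use case, assuming a pandas version in which Resampler.pipe simply calls the function on the Resampler (I believe this is 0.23.0 and later). The six-row `Open`/`Close` frame and the helper `open_minus_close` are made up for illustration, and the columns are selected with `[...]` rather than attribute access. The piped expression should match the multi-line version that names the Resampler and uses it twice.

```python
import pandas as pd

# Hypothetical OHLC-style data: six daily rows with "Open" and "Close" columns.
idx = pd.date_range("2017-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
        "Close": [10.5, 11.5, 12.5, 13.5, 14.5, 15.5],
    },
    index=idx,
)


def open_minus_close(resampled):
    """First Open minus last Close within each resampled period (hypothetical helper)."""
    return resampled["Open"].first() - resampled["Close"].last()


# Multi-line alternative: bind the Resampler to a name, then aggregate it twice.
r = df.resample("3D")
multi_line = r["Open"].first() - r["Close"].last()

# Single expression: pipe hands the Resampler itself to the function.
piped = df.resample("3D").pipe(open_minus_close)

print(piped)
```

Each 3-day period yields one value, so no intermediate variable is needed to reuse the Resampler.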

Currently, however, Resampler.pipe/DatetimeResampler.pipe implicitly converts the Resampler to a DataFrame of means before piping, which seems wrong/unintuitive to me:

>>> df.resample('3M').pipe(lambda x: x.max() - x.min())
C:\Users\TP\Anaconda3\envs\pandasdev\Scripts\ FutureWarning:
.resample() is now a deferred operation
You called pipe(...) on this deferred object which materialized it into a dataframe
by implicitly taking the mean.  Use .resample(...).mean() instead

To demonstrate:

First, the set-up:

>>> import pandas as pd
>>> d = pd.date_range('2017-01-01', periods=4)
>>> df = pd.DataFrame(dict(B=[1, 2, 3, 4]), index=d)
>>> r = df.resample('2D')

If we call pipe, we get an unexpected result:

>>> r.pipe(lambda x: x.max() - x.min())
B    2.0
dtype: float64

The reason for the unexpected result is that, under the hood, the above is equivalent to:

>>> r.mean().pipe(lambda x: x.max() - x.min())
B    2.0
dtype: float64

The expected behaviour would be:

>>> def diff(r):
...     return r.max() - r.mean()
>>> diff(r)   # r.pipe(diff) should give the same result as this
              B
2017-01-01  0.5
2017-01-03  0.5
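A sketch that checks the expected behaviour against the set-up from this issue, assuming a pandas version where Resampler.pipe no longer takes the mean first (I believe 0.23.0 and later). `direct`, `piped` and `mean_first` are illustrative names; the last line reproduces the old implicit behaviour by calling mean explicitly.

```python
import pandas as pd

d = pd.date_range("2017-01-01", periods=4)
df = pd.DataFrame({"B": [1, 2, 3, 4]}, index=d)
r = df.resample("2D")


def diff(resampled):
    # Per-period maximum minus per-period mean.
    return resampled.max() - resampled.mean()


# With a proper pipe, r.pipe(diff) is just diff(r): one row per 2-day period.
direct = diff(r)
piped = r.pipe(diff)

# Taking the mean explicitly first collapses the periods into one row per
# period, so max - min is then taken across periods: one value per column.
mean_first = r.mean().pipe(lambda x: x.max() - x.min())

print(piped)
print(mean_first)
```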

IMO, if users want the mean before piping, they should call .mean() explicitly before .pipe().

Adding pipe to GroupBy was straightforward and has good use cases. I propose adding a proper pipe to Resampler as well.
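To make the proposed semantics concrete, here is a minimal pure-Python sketch: `obj.pipe(func, *args, **kwargs)` returns `func(obj, *args, **kwargs)` with no implicit aggregation beforehand. `PipeMixin`, `ToyResampler` and `max_minus_mean` are hypothetical names, and this is not pandas' own implementation. It does mirror the calling convention pandas documents for pipe, including passing a `(callable, data_keyword)` tuple when the object should go in as a keyword argument.

```python
from typing import Any, Callable, Dict, List, Tuple, Union


class PipeMixin:
    """Hypothetical mixin: pipe hands self to func unchanged (no implicit mean)."""

    def pipe(
        self,
        func: Union[Callable[..., Any], Tuple[Callable[..., Any], str]],
        *args: Any,
        **kwargs: Any,
    ) -> Any:
        if isinstance(func, tuple):
            # (callable, "keyword"): pass self under that keyword argument.
            func, target = func
            if target in kwargs:
                raise ValueError(f"{target} is both the pipe target and a keyword argument")
            kwargs[target] = self
            return func(*args, **kwargs)
        # Default: pass self as the first positional argument.
        return func(self, *args, **kwargs)


class ToyResampler(PipeMixin):
    """Hypothetical stand-in for a Resampler: lists of values keyed by period label."""

    def __init__(self, groups: Dict[str, List[float]]) -> None:
        self.groups = groups

    def max(self) -> Dict[str, float]:
        return {k: max(v) for k, v in self.groups.items()}

    def mean(self) -> Dict[str, float]:
        return {k: sum(v) / len(v) for k, v in self.groups.items()}


def max_minus_mean(r: ToyResampler) -> Dict[str, float]:
    # Per-period max minus per-period mean, like diff() in the example above.
    mx, mn = r.max(), r.mean()
    return {k: mx[k] - mn[k] for k in mx}


r = ToyResampler({"2017-01-01": [1, 2], "2017-01-03": [3, 4]})
print(r.pipe(max_minus_mean))  # same as max_minus_mean(r)
```

The point of the design is that `r.pipe(f)` and `f(r)` are interchangeable, so a user who wants the mean first writes it explicitly.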

topper-123 changed the title from .pipe on Resampler/other pandas objects to ENH: .pipe on Resampler Oct 17, 2017

jreback added this to the Next Major Release milestone Oct 18, 2017

jreback modified the milestones: Next Major Release, 0.23.0 Dec 26, 2017
