ENH: .pipe on Resampler #17905
I've made a PR (#17871) to get pipe functionality on GroupBy objects.
df.resample('3M').pipe(lambda resampled: resampled.Open.first() - resampled.close.last())
That is, for each resampled period, you could reuse a
>>> df.resample('3M').pipe(lambda x: x.max() - x.min())
C:\Users\TP\Anaconda3\envs\pandasdev\Scripts\ipython-script.py:1: FutureWarning:
.resample() is now a deferred operation
You called pipe(...) on this deferred object which materialized it into a dataframe
by implicitly taking the mean. Use .resample(...).mean() instead
>>> import pandas as pd
>>> d = pd.date_range('2017-01-01', periods=4)
>>> df = pd.DataFrame(dict(B=[1, 2, 3, 4]), index=d)
>>> r = df.resample('2D')
If we call pipe, we get an unexpected result:
>>> r.pipe(lambda x: x.max() - x.min())
B    2.0
dtype: float64
The reason for the unexpected result is that under the hood the above is:
>>> r.mean().pipe(lambda x: x.max() - x.min())
B    2.0
dtype: float64
Expected would be:
>>> def diff(r):
...     return r.max() - r.mean()
...
>>> diff(r)  # r.pipe(diff) should give the same result as this
              B
2017-01-01  0.5
2017-01-03  0.5
IMO, if users want the mean before piping, they should call mean themselves before calling pipe.
Adding a pipe was very easy for GroupBy and has very good use cases. I propose adding a (proper) pipe to Resampler also.
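To make the proposed semantics concrete, here is a minimal sketch of what a proper pipe would do: pass the Resampler object itself to the function, with no implicit mean. The helper name `pipe_resampler` is hypothetical and only for illustration; it is not an existing pandas function, and an actual implementation would presumably be a method on Resampler.

```python
import pandas as pd


def pipe_resampler(resampler, func, *args, **kwargs):
    """Hypothetical stand-in for a proper Resampler.pipe.

    Calls func on the Resampler itself, so func decides which
    aggregations to apply. Nothing is aggregated implicitly.
    """
    return func(resampler, *args, **kwargs)


# Same data as in the example above.
d = pd.date_range('2017-01-01', periods=4)
df = pd.DataFrame(dict(B=[1, 2, 3, 4]), index=d)
r = df.resample('2D')


def diff(r):
    # Per-period max minus per-period mean.
    return r.max() - r.mean()


result = pipe_resampler(r, diff)
print(result)
```

With this behaviour, piping `diff` gives one row per resampled period, the same as calling `diff(r)` directly, rather than a single value computed from the implicit mean.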