ENH: Add pipe method #10253

TomAugspurger · 2015-06-03T02:26:03Z

In the dev meeting, we settled on the following:

.pipe will not include a check for __pipe_func__ on the function passed in.
To avoid messiness with lambdas when a function takes the DataFrame other than in the first position, users can pass in a (callable, data_keyword) argument to .pipe (thanks @mwaskom)

   import numpy as np
   import pandas as pd
   import statsmodels.formula.api as sm

   bb = pd.read_csv('data/baseball.csv', index_col='id')

   (bb.query('h > 0')
      .assign(ln_h = lambda df: np.log(df.h))
      # sm.poisson expects `formula, data`
      .pipe((sm.poisson, 'data'), 'hr ~ ln_h + year + g + C(lg)')
      .fit()
      .summary()
   )
## -- End pasted text --
Optimization terminated successfully.
         Current function value: 2.116284
         Iterations 24
Out[1]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                          Poisson Regression Results
==============================================================================
Dep. Variable:                     hr   No. Observations:                   68
Model:                        Poisson   Df Residuals:                       63
Method:                           MLE   Df Model:                            4
Date:                Tue, 02 Jun 2015   Pseudo R-squ.:                  0.6878
Time:                        20:57:27   Log-Likelihood:                -143.91
converged:                       True   LL-Null:                       -460.91
                                        LLR p-value:                6.774e-136
===============================================================================
                  coef    std err          z      P>|z|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
Intercept   -1267.3636    457.867     -2.768      0.006     -2164.767  -369.960
C(lg)[T.NL]    -0.2057      0.101     -2.044      0.041        -0.403    -0.008
ln_h            0.9280      0.191      4.866      0.000         0.554     1.302
year            0.6301      0.228      2.762      0.006         0.183     1.077
g               0.0099      0.004      2.754      0.006         0.003     0.017
===============================================================================
"""

Thanks everyone for the input in the issue.

This is mostly ready. What's a good name for the argument to pipe? I have func right now; @shoyer, you had target in the issue thread. I've been using target as the keyword expecting the data, i.e. where the DataFrame should be piped to.

The tests are extremely minimal... but so is the implementation. Am I missing any obvious edge-cases?

We'll see how this goes. I don't think I push .pipe as a protocol at all in the documentation, though we can change that in the future. We should be forwards-compatible if we do ever go down the __pipe_func__ route.

jreback · 2015-06-03T02:34:37Z

doc/source/basics.rst

+       )
+
+Pandas encourages the second style. It flows with the rest of pandas
+methods which return DataFrames or Series and are non-mutating by


maybe use pure instead?

I'm being pedantic, but "pure" implies no side effects. df.plot() is non-mutating, but not pure, since it has the side effect of drawing a plot. I didn't want some functional programming guru to call us out :)
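
To make the distinction concrete, here is a small sketch in plain Python (no pandas). The names `scaled` and `calls` are made up for illustration: the function leaves its input untouched, so it is non-mutating, but it records each call in a module-level list, so it is not pure.

```python
calls = []  # module-level log; writing to it is the side effect


def scaled(values, factor):
    # Non-mutating: builds and returns a new list, leaving `values` alone.
    calls.append(factor)  # side effect, so the function is not pure
    return [v * factor for v in values]


original = [1, 2, 3]
doubled = scaled(original, 2)
```

After the call, `original` is unchanged even though `calls` has grown, which is the same situation as df.plot().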

shoyer · 2015-06-03T06:18:15Z

A couple of other things that would be nice to highlight in the docs:

Let's mention how this was inspired by popular pipe operator %>% from R's magrittr package, but the implementation here is explicit and Pythonic. I would encourage reading the source code -- we might even include it in the docs.
We should encourage just using pipe instead of monkey patching. I would consider removing the mention of monkey patching at all -- this is a far better way to go.

jorisvandenbossche · 2015-06-03T12:53:15Z

doc/source/basics.rst

+
+   >>> (df.pipe(h),
+          .pipe(g, arg1=1),
+          .pipe(f, arg2=2, arg3=3)


I think the commas at the end of the lines are not correct here?
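
For comparison, a sketch of the same chain without the trailing commas. `Chain` is a hypothetical stand-in that only implements `pipe`, so the snippet runs without pandas, and `f`, `g`, `h` are toy functions; with a real DataFrame the chain would be written the same way.

```python
class Chain:
    """Hypothetical stand-in for a DataFrame; only implements pipe."""

    def __init__(self, value):
        self.value = value

    def pipe(self, func, *args, **kwargs):
        return func(self, *args, **kwargs)


def h(obj):
    return Chain(obj.value + 1)


def g(obj, arg1=0):
    return Chain(obj.value * arg1)


def f(obj, arg2=0, arg3=0):
    return Chain(obj.value + arg2 + arg3)


# No commas between the steps: the parentheses alone let the
# expression continue across lines.
result = (Chain(1).pipe(h)
          .pipe(g, arg1=1)
          .pipe(f, arg2=2, arg3=3)
          )
```

With a comma after each step, the next line would start with `.pipe` directly after the comma, which is not valid Python.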

TomAugspurger · 2015-06-03T13:02:53Z

I would consider removing the mention of monkey patching at all

Any objections to removing this section from the docs? I forgot it even existed.

TomAugspurger · 2015-06-04T00:56:10Z

Let's mention how this was inspired by popular pipe operator %>% from R's magrittr package, but the implementation here is explicit and Pythonic. I would encourage reading the source code -- we might even include it in the docs.

The pipe method is inspired by unix pipes and more recently dplyr_ and magrittr_, which
have introduced the popular (%>%) (read pipe) operator for R_.
The implementation of pipe here is quite clean and feels right at home in python.
We encourage you to view the source code (pd.DataFrame.pipe?? in IPyhton).

Ok, addressed all the comments I think (thanks). I removed the section on monkey patching.

The outstanding issue I see is @shoyer's comment about maybe checking whether we're about to clobber a kwarg when the tuple-style is used.

if target in kwargs:
    raise ValueError('%s is both the pipe target and a keyword argument' % target)

to catch a case of df.add(1).pipe((f, 'data'), x=x, y=y, data=df). Right now we (silently) replace kwargs['data']. The closest parallel I see is writing f(a=1, a=2), which Python catches.
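
A minimal sketch of how the tuple form and the proposed check could fit together. It is written as a free function rather than as the actual pandas method, and `pipe_sketch` and `describe` are hypothetical names used only for illustration.

```python
def pipe_sketch(obj, func, *args, **kwargs):
    """Apply func to obj; func is a callable or a (callable, keyword) tuple."""
    if isinstance(func, tuple):
        func, target = func
        if target in kwargs:
            # Proposed check: refuse to silently overwrite the caller's keyword.
            raise ValueError('%s is both the pipe target and a keyword argument'
                             % target)
        kwargs[target] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


def describe(prefix, data=None):
    # Hypothetical function that takes its data as a keyword, not first.
    return '%s: %r' % (prefix, data)


result = pipe_sketch([1, 2], (describe, 'data'), 'values')

try:
    pipe_sketch([1, 2], (describe, 'data'), 'values', data=[3])
except ValueError as exc:
    error_message = str(exc)
```

Without the check, the second call would quietly pass `[1, 2]` as `data` and drop the caller's `[3]`.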

shoyer · 2015-06-04T01:08:25Z

doc/source/basics.rst

+The pipe method is inspired by unix pipes and more recently dplyr_ and magrittr_, which
+have introduced the popular ``(%>%)`` (read pipe) operator for R_.
+The implementation of ``pipe`` here is quite clean and feels right at home in python.
+We encourage you to view the source code (``pd.DataFrame.pipe??`` in IPyhton).


IPyhton -> IPython

Thanks, I should read what I write :/

TomAugspurger · 2015-06-04T01:53:01Z

I put the kwarg clobbering (df.pipe((f, 'data'), x=1, data=2)) in a second commit: 968d0ae

EDIT: I also removed the monkey patching docs FYI, can reinstate them if anyone is attached to them.

jreback · 2015-06-04T12:12:50Z

pls rebase on master and repush; had an issue with some builds

TomAugspurger · 2015-06-04T20:07:57Z

Travis is green. Are we good with checking whether the target is clobbered? And are we OK with raising a ValueError instead of SyntaxError like f(a=2, a=3) does?
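
For reference, a small sketch of how Python itself treats the two situations; `f` here is a throwaway function. A keyword repeated literally in the source is rejected when the code is compiled, while the same collision assembled at call time (which is closer to what pipe sees) surfaces as a runtime exception.

```python
def f(a=None):
    return a


# A literal repeated keyword is rejected when the source is compiled,
# before any code runs.
try:
    compile("f(a=2, a=3)", "<example>", "eval")
    literal_error = None
except SyntaxError as exc:
    literal_error = type(exc).__name__

# The same collision built at runtime via ** unpacking raises TypeError.
try:
    f(a=2, **{"a": 3})
    runtime_error = None
except TypeError as exc:
    runtime_error = type(exc).__name__
```

Since pipe only learns about the clash when it is called, a compile-time SyntaxError is not available to it; the open question is just which runtime exception fits best.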

jreback · 2015-06-04T20:18:10Z

doc/source/basics.rst

+
+1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
+2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
+3. Elementwise_ function application: :meth:`~DataFrame.applymap`


this needs backticks to pick up the references

backticks on the "Elementwise_"? It works w/o the backticks since it's a single word. Or do you mean the method references?

oh, ok, then didn't know that.

TomAugspurger · 2015-06-05T12:36:44Z

Ok, fixed that newline in the docstring, and I now just link people to the pipe section from internals.rst instead of the monkeypatching.

jreback · 2015-06-05T21:47:39Z

doc/source/whatsnew/v0.16.2.txt

@@ -10,6 +10,7 @@ We recommend that all users upgrade to this version.
 Highlights include:

 - Documentation on how to use ``numba`` with *pandas*, see :ref:`here <enhancingperf.numba>`
+- A new ``pipe`` method, :ref:`here <basics.pipe>`


minor point, but I usually have these refer to the whatsnew itself, e.g. below (with the actual doc link from the whatsnew to the docs)

I think the numba one is linking to the docs (could be wrong). Want me to change that to the section in whatsnew?

nvm, didn't see it doesn't have a section

the numba one doesn't have another whatsnew entry; that's why it's that way. Whereas you have a section in the whatsnew (which is good), so the link will skip to there. Then you already have the link back to the docs (at the end).

jreback · 2015-06-05T21:48:05Z

@TomAugspurger lgtm. merge when ready (see that minor point above though)

shoyer · 2015-06-05T21:51:20Z

doc/source/whatsnew/v0.16.2.txt

+
+In the example above, the functions ``f``, ``g``, and ``h`` each expected the DataFrame as the first positional argument.
+When the funciton you wish to apply takes its data anywhere other than the first argument, pass a tuple
+of ``(funciton, keyword)`` indicating where the DataFrame should flow. For example:


spelling: function

shoyer · 2015-06-05T21:54:25Z

looks pretty much good to go for me, too, after a few quick fixes

TomAugspurger · 2015-06-05T22:13:57Z

Ok, I added a section header for pipe in whatsnew, linked to that from above. Fixed the (two) typos on function, and added the ... continuations in the docstring.

TomAugspurger · 2015-06-05T22:14:29Z

We waiting on Travis? It's felt slow today :/

jreback · 2015-06-05T22:15:00Z

doc/source/api.rst

@@ -1,1597 +0,0 @@
-.. currentmodule:: pandas


? I think you checked in after you built the docs....!

Bah, one sec.

jreback · 2015-06-05T22:15:54Z

travis IS slow today (on master anyhow). not real sure why.

TomAugspurger · 2015-06-05T22:22:25Z

OK, fixed the accidental deletion of api.rst

jreback · 2015-06-05T22:29:17Z

pandas/core/generic.py

+        ...    .pipe(g, arg1=a)
+        ...    .pipe(f, arg2=b, arg3=c)
+        ... )
+


maybe show an example of using the callable & data_keyword in the Notes? (can do later)

TomAugspurger · 2015-06-05T23:36:54Z

@shoyer @jreback merge at your leisure. Or I'll be back online later tonight to push the button :)

shoyer · 2015-06-06T00:24:47Z

pandas/core/generic.py

+        ... )
+
+        If you have a function that takes the data as (say) the second
+        argumnet, pass a tuple indicating which keyword expects the


spelling: argument

I'm normally not the bad at spelling.

"the bad"? :)

Ugh. Long day.

jreback · 2015-08-19T17:19:06Z

http://blog.ibis-project.org/design-composability/

from @wesm

TomAugspurger force-pushed the pipe branch from cf810b3 to 78d6826 Compare June 3, 2015 02:31

TomAugspurger added the API Design label Jun 3, 2015

TomAugspurger added this to the 0.16.2 milestone Jun 3, 2015

jreback reviewed Jun 3, 2015

jorisvandenbossche reviewed Jun 3, 2015

TomAugspurger changed the title ~~ENH: this is a pipe~~ ENH: Add pipe method Jun 3, 2015

jorisvandenbossche added the Enhancement label Jun 3, 2015

shoyer mentioned this pull request Jun 3, 2015

API: Implement pipe protocol, a method for extensible method chaining #10129

Closed

TomAugspurger force-pushed the pipe branch from 78d6826 to d851c36 Compare June 4, 2015 00:51

shoyer reviewed Jun 4, 2015

TomAugspurger force-pushed the pipe branch from d851c36 to 968d0ae Compare June 4, 2015 01:51

TomAugspurger force-pushed the pipe branch from 968d0ae to e1451da Compare June 4, 2015 12:17

jreback reviewed Jun 4, 2015

TomAugspurger force-pushed the pipe branch 2 times, most recently from 9e92e8d to 16394d7 Compare June 5, 2015 12:35

jreback reviewed Jun 5, 2015

shoyer reviewed Jun 5, 2015

TomAugspurger force-pushed the pipe branch from 16394d7 to 92358d1 Compare June 5, 2015 22:13

jreback reviewed Jun 5, 2015

TomAugspurger force-pushed the pipe branch from 92358d1 to 1ade043 Compare June 5, 2015 22:19

jreback reviewed Jun 5, 2015

TomAugspurger force-pushed the pipe branch from 1ade043 to 2a64769 Compare June 5, 2015 23:36

shoyer reviewed Jun 6, 2015

TomAugspurger added 2 commits June 5, 2015 22:09

ENH: this is a pipe

356fa4a

API: catch target kwarg clobbering

0c3bf51

TomAugspurger force-pushed the pipe branch from 2a64769 to 0c3bf51 Compare June 6, 2015 03:09

TomAugspurger pushed a commit that referenced this pull request Jun 6, 2015

Merge pull request #10253 from TomAugspurger/pipe

031e3bc

ENH: Add pipe method

TomAugspurger merged commit 031e3bc into pandas-dev:master Jun 6, 2015

shoyer added a commit to shoyer/xarray that referenced this pull request Jun 10, 2015

Add pipe method copied from pandas

11685a4

The implementation here is directly copied from pandas: pandas-dev/pandas#10253

shoyer mentioned this pull request Jun 10, 2015

Add pipe method copied from pandas pydata/xarray#429

Merged

TomAugspurger deleted the pipe branch August 18, 2015 12:44

jreback mentioned this pull request Nov 1, 2015

support for .pipe, how to make this render in the notebook w/o using show(p) bokeh/bokeh#3046

Closed
