-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add pipe method #10253
ENH: Add pipe method #10253
Conversation
) | ||
|
||
Pandas encourages the second style. It flows with the rest of pandas | ||
methods which return DataFrames or Series and are non-mutating by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use pure instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm being pedantic, but "pure" implies no-side effects. df.plot()
is non-mutating, but not pure since it has the side-effect of drawing a plot. I didn't want some functional programming guru to call us out :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure...
A couple of other things that would be nice to highlight in the docs:
|
|
||
>>> (df.pipe(h), | ||
.pipe(g, arg1=1), | ||
.pipe(f, arg2=2, arg3=3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comma's at the end of the lines are not correct here?
Any objections to removing this section from the docs? I forgot it even existed. |
The pipe method is inspired by unix pipes and more recently dplyr_ and magrittr_, which Ok, addressed all the comments I think (thanks). I removed the section on monkey patching. The outstanding issue I see is @shoyer's comment about maybe checking whether we're about to clobber a kwarg when the tuple-style is used. if target in kwargs:
raise ValueError('%s is both the pipe target and a keyword argument' % target) to catch a case of |
The pipe method is inspired by unix pipes and more recently dplyr_ and magrittr_, which | ||
have introduced the popular ``(%>%)`` (read pipe) operator for R_. | ||
The implementation of ``pipe`` here is quite clean and feels right at home in python. | ||
We encourage you to view the source code (``pd.DataFrame.pipe??`` in IPyhton). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPyhton -> IPython
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I should read what I write :/
I put the kwarg clobbering EDIT: I also removed the monkey patching docs FYI, can reinstate them if anyone is attached to them. |
pls rebase on master and repush had an issue with some builds |
Travis is green. Are we good with checking whether the |
|
||
1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe` | ||
2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply` | ||
3. Elementwise_ function application: :meth:`~DataFrame.applymap` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs backticks to pick up the references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backticks on the "Elementwise_"? It works w/o the backticks since it's a single word. Are do you mean the method references?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, ok, then didn't know that.
9e92e8d
to
16394d7
Compare
Ok, fixed that newline in the docstring and I just link people to the |
@@ -10,6 +10,7 @@ We recommend that all users upgrade to this version. | |||
Highlights include: | |||
|
|||
- Documentation on how to use ``numba`` with *pandas*, see :ref:`here <enhancingperf.numba>` | |||
- A new ``pipe`` method, :ref:`here <basics.pipe>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor point, but I usually have these refer to the whatsnew itself, e.g. below (with the actual doc link from the whatsnew to the docs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the numba one is linking to the docs (could be wrong). Want me to change that to the section in whatsnew?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, didn't see it doesn't have a section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the numba one doesn't have another what's entry that's why its that way. Whereas you have a section in the whatsnew (which is good). So the link will skip to there. Then you already have the link back to the docs (at the end).
@TomAugspurger lgtm. merge when ready (see that minor point above though) |
|
||
In the example above, the functions ``f``, ``g``, and ``h`` each expected the DataFrame as the first positional argument. | ||
When the funciton you wish to apply takes its data anywhere other than the first argument, pass a tuple | ||
of ``(funciton, keyword)`` indicating where the DataFrame should flow. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling: function
looks pretty much good to go for me, too, after a few quick fixes |
Ok, I added a section header for |
We waiting on Travis? It's felt slow today :/ |
@@ -1,1597 +0,0 @@ | |||
.. currentmodule:: pandas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? I think you checked in after you buildt the docs....!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bah, one sec.
travis IS slow today (on master anyhow). not real sure why. |
OK, fixed the accidental deletion of api.rst |
... .pipe(g, arg1=a) | ||
... .pipe(f, arg2=b, arg3=c) | ||
... ) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe show an example of using the callable & data_keyword in the Notes? (can do later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
... ) | ||
|
||
If you have a function that takes the data as (say) the second | ||
argumnet, pass a tuple indicating which keyword expects the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling: argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm normally not the bad at spelling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the bad"? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. Long day.
The implementation here is directly copied from pandas: pandas-dev/pandas#10253
Closes #10129
In the dev meeting, we settled on the following:
.pipe
will not include a check for__pipe_func__
on the function passed in.(callable, data_keyword)
argument to.pipe
(thanks @mwaskom)Thanks everyone for the input in the issue.
This is mostly ready. What's a good name for the argument to
pipe
? I havefunc
right now, @shoyer you hadtarget
in the issue thread. I've been usingtarget
as the keyword expecting the data, i.e. where the DataFrame should be pipe to.The tests are extremely minimal... but so is the implementation. Am I missing any obvious edge-cases?
We'll see how this goes. I don't think I push
.pipe
as a protocol at all in the documentation, though we can change that in the future. We should be forwards-compatible if we do ever go down the__pipe_func__
route.