ENH: Have pivot and pivot_table take similar arguments #5505

jsexauer · 2013-11-14T23:27:59Z

Right now, there is

df.pivot(index=foo, columns=bar, values=baz)

and

df.pivot_table(rows=foo, cols=bar, values=baz)

Because of dirtyness in my data I occasionally have to go back and forth between the two forms (and use df.pivot_table(..., aggfunc='count') to see who had the bad data). It would be nice if I didn't have to change the keyword arguments each time and one had an alias of the other. It would also be easier to remember, especially in the trivial case of cols vs columns. Finally, the two are related operations and having an option for consistent keyword parameters makes sense.

I could make a PR to solve this easy enough, if this is an enhancement you are interested in. I'm not really sure what your stance is on redundant keyword arguments.

jtratner · 2013-11-13T18:33:23Z

If you want to propose something we'd likely be open to it (depending on
how much the API is changed), but you'd also need to maintain order of
arguments. Probably easiest to just add **kwargs to it and then validate
that only cols or columns were passed and then just convert to what
function expects

hayd · 2013-11-20T00:09:00Z

pandas/tools/pivot.py

+            if rows is None:
+                rows = kwarg['index']
+            else:
+                raise TypeError("Can only specify either 'rows' or 'index'")


Think better way to do this is (without all this for loop stuff):

index = kwargs.get('index', None) if index is not None: if rows is None: rows = index else: raise

could index be an ndarray?

ha, yes, should be "is not None"

@jtratner I'm sorry, but I don't follow how index being an ndarray breaks things.

@jsexauer if something does bool(something) under the hood and ndarrays longer than length one can't be converted to bool:

>>> bool(np.array([1, 2, 3,4]) ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

@hayd I think the flaw with your approach is that if the user passes an incorrect keyword argument, no error is raised. It seems to me we would want to raise a type error in such an event. This is especially import I feel for pivot_table() where for example someone might pass agg_func instead of aggfunc and get frustrated wondering why the function is returning apparently incorrect results.

@jtratner The following experiment works for me:

>>> if np.ndarray([1,2]) is None: ... print 'None' ... else: ... print 'Not None' ... Not None

Make it a pop rather than a get then check the length of kwargs is 0. (if kwargs: raise...)

hayd · 2013-11-20T00:11:17Z

I guess the problem is that the API (order of args) won't be changed, so it's not really going to ever fully fix... :(

jsexauer · 2013-11-24T12:34:20Z

@hayd hopefully my last commit addresses your concerns.

jorisvandenbossche · 2013-11-24T12:43:02Z

A remark: would it make sense to choose which of the keywords (col/columns and rows/index) is preferential and only provide those to the other function (and not have both options in both functions)?

jreback · 2013-12-05T23:42:24Z

@jsexauer does this have an issue? (aside from this PR)

jsexauer · 2013-12-06T01:04:39Z

Not sure what you're asking @jreback. The included PR, which has no issues to my knowledge, fixes the issue regarding a discrepancy between the parameters for pivot and pivot_table.

jreback · 2013-12-06T02:27:21Z

am asking if their is a reference issue for this (eg a prior issue); guess not

jsexauer · 2013-12-06T13:11:11Z

I submitted this originally as an issue ticket but changed into a PR ticket when I coded the enhancement.

hayd · 2013-12-06T20:45:28Z

pandas/core/frame.py

+                msg = "Can only specify either 'cols' or 'columns'"
+                raise TypeError(msg)
+
+        if len(kwarg) > 0:


pythonic tip - you can just use "if kwarg" (my bad for saying otherwise)

hayd · 2013-12-06T20:49:55Z

This has tripped me in the past, so I think I'm +1 on this.

jsexauer · 2013-12-07T12:24:38Z

I've implemented @hayd 's suggestions.

hayd · 2013-12-07T20:23:10Z

(waiting til after 0.13 is final, but then it would need something in release notes and some tests, sorry I hadn't realised there were no tests 'til just now!)

jorisvandenbossche · 2013-12-08T16:31:55Z

I want to repeat my remark (as there was not really a response). Isn't it possible to really make it more consistent instead of just providing both options to both functions as additional keyword arguments?

for cols/columns we could choose one of the two as the preferential one. E.g. choosing for columns, and only add it as a keyword argument to pivot_table (and not also add cols to pivot).
I think it would maybe even be possible to change cols to columns in the default function call of pivot_table, and provide cols as additional one (so both argument by order as by name still works, no backwards incompatible change, but the function call itself are also more consistent).
for index/rows it is maybe less clear wich is the preferential?

jreback · 2013-12-08T16:36:30Z

+1 on joris suggestion

maybe attach a depr warning to use of cols?

columns and index would be the most consistent with the rest of pandas, though rows is ok 2 (definitely not cols though)

jreback · 2014-02-16T13:13:49Z

I think lets break pivot_table and just go with pivot args; you can for 0.14 provide a nice warning message, but I would just change the signature to using index/columns. This is used everywhere else; rows/cols not used anywhere.

hayd · 2014-02-18T07:10:33Z

should probably deprecate old args too.

jreback · 2014-03-09T15:01:31Z

@jsexauer can you respond to @hayd comments?

jsexauer · 2014-03-09T23:00:52Z

Assuming this passes CI, I believe this addresses everyone's concerns. Sorry for the delay.

jreback · 2014-03-09T23:33:03Z

pandas/tools/tests/test_pivot.py

+        self.assertIsInstance(w[0], FutureWarning)
+        self.assertIsInstance(w[1], FutureWarning)
+
+        with warnings.catch_warnings(record=True) as w2:


use tm.assert_produces_warning here (same thing just more consistent)

jsexauer · 2014-03-13T23:02:14Z

Ok good to know. Sorry you're going to get my full commit history then. I will do this in the future.

jorisvandenbossche · 2014-03-13T23:02:48Z

But you can do it now, that is no problem.
If you just execute the lines above, you should be able to meld all the commits into one or a few.

jsexauer · 2014-03-13T23:19:27Z

God I hope I did this right...

jorisvandenbossche · 2014-03-13T23:44:21Z

Hmm, you have one commit (the third last one) that seems a squashed one, but the original commits are not gone, so clearly something went wrong, but I don't know what.

If git rebase -i does not work, another option is to create a new branch and cherry-pick that one commit that is the good one, something like:

git checkout master
git rebase upstream/master
git checkout -b new_branch
git cherry-pick 1e1caf6e367d1f04cfce43cdff062e80bb99ebaf

and then push this branch (to a new PR)

jsexauer · 2014-03-14T00:59:22Z

So I deleted branch fix5505 on my local repo, and created a new version which was a squashed rebase of commit e3dad66 onto upstream/master (which is commit 0d74287). I then tried to push this new version of branch fix5505 onto jsexauer/pandas/fix5505 but it said I had to pull down the old version first, so I made a new empty commit [2409c0e] to make it happy. If there's a better way to do this last step, let me know, but this was the best I could figure out short of opening a new PR. If you'd prefer I did that I will, but it seems like it would be best to keep this mess in one ticket instead of spreading the wealth :)

hayd · 2014-03-14T01:05:25Z

You have to force push at the end (as you've changed the history), you've now done it as a merge. Best advice is to create new branch off master and cherry-pick as Joris describes. then force push.

git push origin new_branch:fix5505 --force

jsexauer · 2014-03-14T01:07:32Z

Bam, looks good I think?

jsexauer · 2014-03-14T01:58:29Z

I'm really at a loss, as the test it fails on works for me:

import pandas
import numpy as np
import pandas.util.testing as tm

index = ['A', 'B']
columns = 'C'
df = pandas.DataFrame({'A': ['foo', 'foo', 'foo', 'foo',
                                     'bar', 'bar', 'bar', 'bar',
                                     'foo', 'foo', 'foo'],
                               'B': ['one', 'one', 'one', 'two',
                                     'one', 'one', 'one', 'two',
                                     'two', 'two', 'one'],
                               'C': ['dull', 'dull', 'shiny', 'dull',
                                     'dull', 'shiny', 'shiny', 'dull',
                                     'shiny', 'shiny', 'shiny'],
                               'D': np.random.randn(11),
                               'E': np.random.randn(11),
                               'F': np.random.randn(11)})

with tm.assert_produces_warning(FutureWarning):
    pandas.pivot_table(df, values='D', rows=index,
                                    cols=columns)

with tm.assert_produces_warning(False):
    pandas.pivot_table(df, values='D', index=index,
                                    columns=columns)

pandas.pivot_table(df, values='D', rows=index,
                                cols=columns)

jorisvandenbossche · 2014-03-14T10:50:56Z

@jsexauer In any case, the rebase looks good now! One clean commit.

Regarding the test warning, this is because the warning is already triggered somewhere else, namely in the TestCrosstab tests in the same file (and only triggered once). You should also replace there all rows/cols to index/columns, as well as in eg the function crosstab (which is based on pivot_table).

jsexauer · 2014-03-14T11:10:13Z

Ok I'll do that when I get home from work today. Would you like that squashed into the previous commit or as a new one?

jorisvandenbossche · 2014-03-14T11:12:41Z

You can try to squash it at once, see if it works now .. :-) If you have further git questions, just ask!

jsexauer · 2014-03-14T21:46:03Z

Thank you all for your patience in helping me figure out how to do this within your process. Unless someone indicates otherwise, I believe this PR is ready for merging.

jorisvandenbossche · 2014-03-14T21:50:16Z

And you thanks fo the contribution! And travis is happy now

A last thing, this should also be added to the release notes, under api changes I think. You can add in both https://github.com/pydata/pandas/blob/master/doc/source/v0.14.0.txt and https://github.com/pydata/pandas/blob/master/doc/source/release.rst a short line mentioning this.

jsexauer · 2014-03-14T22:23:12Z

Updated

jreback · 2014-03-14T22:30:22Z

looks good..

@jorisvandenbossche merge when you are ready

jorisvandenbossche · 2014-03-14T22:41:26Z

Looks good! Thanks a lot @jsexauer

ENH: Have pivot and pivot_table take similar arguments

…s-dev#5505]

hayd reviewed Nov 20, 2013
View reviewed changes

hayd reviewed Dec 6, 2013
View reviewed changes

jreback added API Design labels Feb 16, 2014

jreback reviewed Mar 9, 2014
View reviewed changes

ENH: Have pivot and pivot_table take similar arguments

6e7054a

jreback mentioned this pull request Mar 14, 2014

DEPR: deprecate rows/cols to pivot #6640

Closed

jorisvandenbossche added a commit that referenced this pull request Mar 14, 2014

Merge pull request #5505 from jsexauer/fix5505

4216178

ENH: Have pivot and pivot_table take similar arguments

jorisvandenbossche merged commit 4216178 into pandas-dev:master Mar 14, 2014

jsexauer added a commit to jsexauer/pandas that referenced this pull request Mar 22, 2014

Fix docs to use correct pivot_table arguments [pandas-dev#6645, panda…

3329d37

…s-dev#5505]

jsexauer mentioned this pull request Mar 23, 2014

DEPR: Deprecate cols in several functions #6693

Closed

jreback mentioned this pull request Apr 24, 2014

Unify argument conventions #2062

Closed

jorisvandenbossche mentioned this pull request Apr 14, 2016

API proposal: shoudn't the "index" property (and parameter) of a df renamed "rows" ? #12891

Closed

chris-b1 mentioned this pull request Jul 14, 2016

API: Deprecate skip_footer in read_csv #13386

Closed

jreback mentioned this pull request Jul 24, 2016

DEPR: deprecations log for removed issues #13777

Closed

ENH: Have pivot and pivot_table take similar arguments #5505

ENH: Have pivot and pivot_table take similar arguments #5505

Conversation

jsexauer commented Nov 14, 2013

jtratner commented Nov 13, 2013

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hayd commented Nov 20, 2013

jsexauer commented Nov 24, 2013

jorisvandenbossche commented Nov 24, 2013

jreback commented Dec 5, 2013

jsexauer commented Dec 6, 2013

jreback commented Dec 6, 2013

jsexauer commented Dec 6, 2013

Choose a reason for hiding this comment

hayd commented Dec 6, 2013

jsexauer commented Dec 7, 2013

hayd commented Dec 7, 2013

jorisvandenbossche commented Dec 8, 2013

jreback commented Dec 8, 2013

jreback commented Feb 16, 2014

hayd commented Feb 18, 2014

jreback commented Mar 9, 2014

jsexauer commented Mar 9, 2014

Choose a reason for hiding this comment

jsexauer commented Mar 13, 2014

jorisvandenbossche commented Mar 13, 2014

jsexauer commented Mar 13, 2014

jorisvandenbossche commented Mar 13, 2014

jsexauer commented Mar 14, 2014

hayd commented Mar 14, 2014

jsexauer commented Mar 14, 2014

jsexauer commented Mar 14, 2014

jorisvandenbossche commented Mar 14, 2014

jsexauer commented Mar 14, 2014

jorisvandenbossche commented Mar 14, 2014

jsexauer commented Mar 14, 2014

jorisvandenbossche commented Mar 14, 2014

jsexauer commented Mar 14, 2014

jreback commented Mar 14, 2014

jorisvandenbossche commented Mar 14, 2014