ENH: Keep positional arguments for pivot #51359

Erotemic · 2023-02-13T13:50:41Z

Feature Type

Keeping existing functionality in pandas

Problem Description

I'd like to request the pivot method keep its positional arguments. Currently when you use pivot it warns:

FutureWarning: In a future version of pandas all arguments of DataFrame.pivot will be keyword-only.

I posted a question to the mailing list asking about the rational of deprecating the positional arguments in most methods. I received a reply pointing me to this issue: #41485 that seems to suggest the motivation for this was due to static analysis.

I think this is a good change for most methods, but in the case of pivot, you generally specify all 3, and forcing the user to specify them as keyword arguments is a bit cumbersome. At least I find it to be cumbersome.

Because there are only 3 arguments to begin with in this case, I don't think removing flexibility provides enough additional simplification to justify forcing keyword only arguments here.

Feature Description

Simply remove the FutureWarning from pivot and keep the code as-is.

Alternative Solutions

The alternative solution is to ignore me and remove them anyway. I'd be a little sad, but I'd eventually get over it.

Additional Context

No response

The text was updated successfully, but these errors were encountered:

phofl · 2023-02-13T21:04:26Z

Hi, thanks for your report. Static typing wasn't the intention here (not completely anyway). index was the second argument and had a default, columns was the third and also had a default, but could never be not specified because the value is non-optional. Making them keyword only allows us to swap them and make columns non-optional

Erotemic · 2023-02-14T15:47:30Z

I think it makes sense that the order is index, columns, value, especially if you are going to specify all of them. And that is exactly the time when you don't want to have to specify them as kwargs.

Swapping columns to be first just because it is required seems like it's missing the point of pivot. How often do people use pivot without specifying index? AFAIK, everywhere else in pandas index is specified before columns, or at least that's the most common pattern, so I don't see why their order should be swapped here (or that their order should be taken away).

Yes, it is annoying that when you only need to specify columns that isn't the first argument, but changing to kwarg-only doesn't help here, it just gives the user an error, which is effectively the same thing that happens right now if you tried to run pivot with only one argument right now: the user gets an error.

I do see how there is value in updating the signature such that it is clear what arguments are optional and which are not, but I don't think it more valuable than having a semantically clear order.

Truth be told, I didn't even know it was possible to specify columns without an index. Whenever I've needed to use the existing index in a pivot, I've always dropped the index and then pivoted on index/columns (I'm not actually sure if I've ever needed to do that, but that's how I would have done it). Really, in terms of pivot logic, using the existing index is just a convenience. You always need to specify what the new index/columns will be --- it just happens there is a natural default for index whereas there isn't for columns.

Instead of just discarding the ability to specify a pivot positionally, perhaps it just makes sense to update the error message when the user only specifies one positional argument? Currently df.pivot('foo') produces:

TypeError: pivot() missing 1 required argument: 'columns'

Which personally I think is good enough, but maybe something like this would alleviate any confusion this change is trying to resolve without removing useful (and used) functionality?

def pivot(
    data: DataFrame,
    index: IndexLabel | None = None,
    columns: IndexLabel | None = None,
    values: IndexLabel | None = None,
) -> DataFrame:
    if columns is None:
        if index is None and values is None:
            raise TypeError("pivot() given no arguments; missing 1 required argument: 'columns'")
        elif index is None:
            raise TypeError("pivot() given 'index'; missing 1 required argument: 'columns'")
        elif values is None:
            raise TypeError("pivot() given 'values'; missing 1 required argument: 'columns'")

phofl · 2023-02-14T17:26:35Z

Keeping all this aside:

df.pivot('foo', 'bar', 'baz')

Using pivot like this requires everyone that tries to understand your code to know exactly in which order the arguments are specified. There is no way to know that the first argument is the index without looking up the function signature. Not specifying the argument-names here can be very confusing real quick.

Erotemic · 2023-02-15T22:52:31Z

There is no way to know that the first argument is the index without looking up the function signature

Sort of... That's true for all positional arguments. That justifies a style guide, not deprecation. AFAIK in pandas, the order is always index, columns in signatures that contain both. It's axis 0 vs axis 1. I'd wager a fair number of people could intuit which one is which (assuming they know what a pivot table is).

As a real world example of something that is emitting the warning, I have:

piv = counts.pivot(['pkg_version'], ['abi_tag', 'os', 'arch'], 'count')

I'd prefer in this case to keep the line length shorter and not have to be forced into making a choice between spreading this across vertical space

piv = counts.pivot(
    index=['pkg_version'], 
    columns=['abi_tag', 'os', 'arch'], 
    values='count')

or having an extra long horizontal space.

piv = counts.pivot(index=['pkg_version'], columns=['abi_tag', 'os', 'arch'], values='count')

I haven't seen any arguments claiming that positional arguments are blocking important changes or increasing maintenance burden. From what I understand the motivation is stylistic, and I don't think that justifies deprecating positional arguments of pivot.

MarcoGorelli · 2023-04-24T11:43:58Z

There is no way to know that the first argument is the index without looking up the function signature

agree, it'd be just as unreadable as df.sort(False) (what does the False mean, that it's not going to be sorted?)

let's close then, it sounds like the motivation for keeping it is also stylistic, so let's favour readibility (future readers of any code you write using pivot will thank you :) )

Erotemic · 2023-04-24T13:27:50Z

agree, it'd be just as unreadable as df.sort(False)

Strong disagree. Index, columns, values have a natural order in the context of a row-major arrays. The choice of ascending versus decending is different (and tangentally if the call looked like df.sort('ascending') then it would not be an issue).

I'll accept that nobody else seems to be raising the issue, and if I'm the only one who would see value in this, then it's not worth it. However, I will end the thread by 1. noting that the modivation for removing is also stylistic and 2. voicing my opinion that I find this moderately annoying and I think this is a mistake.

sumitovant-ttran · 2023-05-09T18:37:24Z

I want to voice that my intuition when using this method has always been to provide index, followed by column(s), followed by value(s) (the singular vs plural alone drives me crazy when trying to specify the keywords). Only recently have I started to omit values for convenience.

As for the example of df.sort(False), I don't think it's fair to provide an example that doesn't exist. The method sort_values has a signature that lends itself to always specifying the keyword when one wants to adjust that parameter, e.g., df.sort_values("column_name", ascending=False). To say that the method should also require the first parameter be specified by keyword is subjectively over-the-top: df.sort_values(by="column_name") vs df.sort_values("column_name")

Finally, the documentation provides the ideal order as index, columns, values. I believe this would be faster to learn by users. Example: df.pivot(index='foo', columns='bar', values='baz')

…labeling Without this fix the run_30_Placement.sh will fail. Related to pandas-dev/pandas#51359

oulenz · 2024-01-25T13:53:17Z

For me this change away from positional arguments makes pandas a lot more cumbersome to use. I have scripts with ~40 calls to pivot, and having to fix these is a major pain.

And having to add these keywords into every new call of pivot just adds a lot of unwanted friction.

Erotemic added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 13, 2023

phofl added the Closing Candidate May be closeable, needs more eyeballs label Feb 13, 2023

MarcoGorelli closed this as completed Apr 24, 2023

aiunderstand added a commit to aiunderstand/PCBFlow that referenced this issue May 11, 2023

new pandas.Dataframe.pivot() API requires explicit row, column,value …

e4812e2

…labeling Without this fix the run_30_Placement.sh will fail. Related to pandas-dev/pandas#51359

aiunderstand mentioned this issue May 11, 2023

new pandas.Dataframe.pivot() API requires explicit row, column,value … cpldcpu/PCBFlow#5

Merged

MarcoGorelli mentioned this issue Jan 25, 2024

feat(python): auto-determine index/columns/values columns in pivot if one is left out, deprecate passing arguments positionally pola-rs/polars#12125

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Keep positional arguments for pivot #51359

ENH: Keep positional arguments for pivot #51359

Erotemic commented Feb 13, 2023 •

edited

phofl commented Feb 13, 2023

Erotemic commented Feb 14, 2023 •

edited

phofl commented Feb 14, 2023

Erotemic commented Feb 15, 2023

MarcoGorelli commented Apr 24, 2023

Erotemic commented Apr 24, 2023

sumitovant-ttran commented May 9, 2023 •

edited

oulenz commented Jan 25, 2024

ENH: Keep positional arguments for pivot #51359

ENH: Keep positional arguments for pivot #51359

Comments

Erotemic commented Feb 13, 2023 • edited

Feature Type

Problem Description

Feature Description

Alternative Solutions

Additional Context

phofl commented Feb 13, 2023

Erotemic commented Feb 14, 2023 • edited

phofl commented Feb 14, 2023

Erotemic commented Feb 15, 2023

MarcoGorelli commented Apr 24, 2023

Erotemic commented Apr 24, 2023

sumitovant-ttran commented May 9, 2023 • edited

oulenz commented Jan 25, 2024

Erotemic commented Feb 13, 2023 •

edited

Erotemic commented Feb 14, 2023 •

edited

sumitovant-ttran commented May 9, 2023 •

edited