Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Keep positional arguments for pivot #51359

Closed
1 task done
Erotemic opened this issue Feb 13, 2023 · 8 comments
Closed
1 task done

ENH: Keep positional arguments for pivot #51359

Erotemic opened this issue Feb 13, 2023 · 8 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@Erotemic
Copy link
Contributor

Erotemic commented Feb 13, 2023

Feature Type

  • Keeping existing functionality in pandas

Problem Description

I'd like to request the pivot method keep its positional arguments. Currently when you use pivot it warns:

FutureWarning: In a future version of pandas all arguments of DataFrame.pivot will be keyword-only.

I posted a question to the mailing list asking about the rational of deprecating the positional arguments in most methods. I received a reply pointing me to this issue: #41485 that seems to suggest the motivation for this was due to static analysis.

I think this is a good change for most methods, but in the case of pivot, you generally specify all 3, and forcing the user to specify them as keyword arguments is a bit cumbersome. At least I find it to be cumbersome.

Because there are only 3 arguments to begin with in this case, I don't think removing flexibility provides enough additional simplification to justify forcing keyword only arguments here.

Feature Description

Simply remove the FutureWarning from pivot and keep the code as-is.

Alternative Solutions

The alternative solution is to ignore me and remove them anyway. I'd be a little sad, but I'd eventually get over it.

Additional Context

No response

@Erotemic Erotemic added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 13, 2023
@phofl
Copy link
Member

phofl commented Feb 13, 2023

Hi, thanks for your report. Static typing wasn't the intention here (not completely anyway). index was the second argument and had a default, columns was the third and also had a default, but could never be not specified because the value is non-optional. Making them keyword only allows us to swap them and make columns non-optional

@phofl phofl added the Closing Candidate May be closeable, needs more eyeballs label Feb 13, 2023
@Erotemic
Copy link
Contributor Author

Erotemic commented Feb 14, 2023

I think it makes sense that the order is index, columns, value, especially if you are going to specify all of them. And that is exactly the time when you don't want to have to specify them as kwargs.

Swapping columns to be first just because it is required seems like it's missing the point of pivot. How often do people use pivot without specifying index? AFAIK, everywhere else in pandas index is specified before columns, or at least that's the most common pattern, so I don't see why their order should be swapped here (or that their order should be taken away).

Yes, it is annoying that when you only need to specify columns that isn't the first argument, but changing to kwarg-only doesn't help here, it just gives the user an error, which is effectively the same thing that happens right now if you tried to run pivot with only one argument right now: the user gets an error.

I do see how there is value in updating the signature such that it is clear what arguments are optional and which are not, but I don't think it more valuable than having a semantically clear order.

Truth be told, I didn't even know it was possible to specify columns without an index. Whenever I've needed to use the existing index in a pivot, I've always dropped the index and then pivoted on index/columns (I'm not actually sure if I've ever needed to do that, but that's how I would have done it). Really, in terms of pivot logic, using the existing index is just a convenience. You always need to specify what the new index/columns will be --- it just happens there is a natural default for index whereas there isn't for columns.

Instead of just discarding the ability to specify a pivot positionally, perhaps it just makes sense to update the error message when the user only specifies one positional argument? Currently df.pivot('foo') produces:

TypeError: pivot() missing 1 required argument: 'columns'

Which personally I think is good enough, but maybe something like this would alleviate any confusion this change is trying to resolve without removing useful (and used) functionality?

def pivot(
    data: DataFrame,
    index: IndexLabel | None = None,
    columns: IndexLabel | None = None,
    values: IndexLabel | None = None,
) -> DataFrame:
    if columns is None:
        if index is None and values is None:
            raise TypeError("pivot() given no arguments; missing 1 required argument: 'columns'")
        elif index is None:
            raise TypeError("pivot() given 'index'; missing 1 required argument: 'columns'")
        elif values is None:
            raise TypeError("pivot() given 'values'; missing 1 required argument: 'columns'")

@phofl
Copy link
Member

phofl commented Feb 14, 2023

Keeping all this aside:

df.pivot('foo', 'bar', 'baz')

Using pivot like this requires everyone that tries to understand your code to know exactly in which order the arguments are specified. There is no way to know that the first argument is the index without looking up the function signature. Not specifying the argument-names here can be very confusing real quick.

@Erotemic
Copy link
Contributor Author

There is no way to know that the first argument is the index without looking up the function signature

Sort of... That's true for all positional arguments. That justifies a style guide, not deprecation. AFAIK in pandas, the order is always index, columns in signatures that contain both. It's axis 0 vs axis 1. I'd wager a fair number of people could intuit which one is which (assuming they know what a pivot table is).

As a real world example of something that is emitting the warning, I have:

piv = counts.pivot(['pkg_version'], ['abi_tag', 'os', 'arch'], 'count')

I'd prefer in this case to keep the line length shorter and not have to be forced into making a choice between spreading this across vertical space

piv = counts.pivot(
    index=['pkg_version'], 
    columns=['abi_tag', 'os', 'arch'], 
    values='count')

or having an extra long horizontal space.

piv = counts.pivot(index=['pkg_version'], columns=['abi_tag', 'os', 'arch'], values='count')

I haven't seen any arguments claiming that positional arguments are blocking important changes or increasing maintenance burden. From what I understand the motivation is stylistic, and I don't think that justifies deprecating positional arguments of pivot.

@MarcoGorelli
Copy link
Member

There is no way to know that the first argument is the index without looking up the function signature

agree, it'd be just as unreadable as df.sort(False) (what does the False mean, that it's not going to be sorted?)

let's close then, it sounds like the motivation for keeping it is also stylistic, so let's favour readibility (future readers of any code you write using pivot will thank you :) )

@Erotemic
Copy link
Contributor Author

agree, it'd be just as unreadable as df.sort(False)

Strong disagree. Index, columns, values have a natural order in the context of a row-major arrays. The choice of ascending versus decending is different (and tangentally if the call looked like df.sort('ascending') then it would not be an issue).

I'll accept that nobody else seems to be raising the issue, and if I'm the only one who would see value in this, then it's not worth it. However, I will end the thread by 1. noting that the modivation for removing is also stylistic and 2. voicing my opinion that I find this moderately annoying and I think this is a mistake.

@sumitovant-ttran
Copy link

sumitovant-ttran commented May 9, 2023

I want to voice that my intuition when using this method has always been to provide index, followed by column(s), followed by value(s) (the singular vs plural alone drives me crazy when trying to specify the keywords). Only recently have I started to omit values for convenience.

As for the example of df.sort(False), I don't think it's fair to provide an example that doesn't exist. The method sort_values has a signature that lends itself to always specifying the keyword when one wants to adjust that parameter, e.g., df.sort_values("column_name", ascending=False). To say that the method should also require the first parameter be specified by keyword is subjectively over-the-top: df.sort_values(by="column_name") vs df.sort_values("column_name")

Finally, the documentation provides the ideal order as index, columns, values. I believe this would be faster to learn by users. Example: df.pivot(index='foo', columns='bar', values='baz')

aiunderstand added a commit to aiunderstand/PCBFlow that referenced this issue May 11, 2023
…labeling

Without this fix the run_30_Placement.sh will fail.

Related to pandas-dev/pandas#51359
@oulenz
Copy link

oulenz commented Jan 25, 2024

For me this change away from positional arguments makes pandas a lot more cumbersome to use. I have scripts with ~40 calls to pivot, and having to fix these is a major pain.

And having to add these keywords into every new call of pivot just adds a lot of unwanted friction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closing Candidate May be closeable, needs more eyeballs Enhancement Needs Triage Issue that has not been reviewed by a pandas team member
Projects
None yet
Development

No branches or pull requests

5 participants