ENH: Harmonize drop and rename API #12392

Closed
nickeubank opened this Issue Feb 19, 2016 · 44 comments

7 participants
@nickeubank
Contributor

nickeubank commented Feb 19, 2016

rename accepts a columns argument or an index argument, while drop looks for a labels and axis pair. I don't know about anyone else, but I have to check the help file every time I come back to pandas to remember which takes which.

How would people feel about adding columns and index arguments to drop? They could just be added in addition to labels/axis if we want to provide backwards compatibility and just raise an exception if the user tries to mix them.
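
As a minimal sketch of that proposal (the helper name `resolve_drop_targets` is hypothetical and nothing here is pandas code), drop could accept both spellings and refuse a mix of the two:

```python
def resolve_drop_targets(labels=None, axis=0, index=None, columns=None):
    """Return a dict mapping axis name -> labels to drop.

    Hypothetical helper, not part of pandas: it only shows how the
    labels/axis style and the index/columns style could share one entry point.
    """
    axis_names = {0: "index", "index": "index", 1: "columns", "columns": "columns"}

    if labels is not None:
        # Old style: labels + axis. Mixing it with the new keywords is an error.
        if index is not None or columns is not None:
            raise ValueError("Cannot specify both 'labels' and 'index'/'columns'")
        if axis not in axis_names:
            raise ValueError(f"Invalid axis: {axis!r}")
        return {axis_names[axis]: labels}

    if index is None and columns is None:
        raise ValueError("Need to specify one of 'labels', 'index' or 'columns'")

    # New style: either or both of index/columns.
    targets = {}
    if index is not None:
        targets["index"] = index
    if columns is not None:
        targets["columns"] = columns
    return targets
```

With this, `resolve_drop_targets("foo", axis=1)` and `resolve_drop_targets(columns="foo")` describe the same operation, while `resolve_drop_targets("foo", columns="bar")` raises.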

jreback Feb 19, 2016

Contributor

actually this is a bigger API issue that @TomAugspurger and I briefly touched on here

.rename and .rename_axis
.reindex and .reindex_axis
are consistent with each other

.drop and .fillna are also consistent (just not with the others)

So, thoughts on how to proceed here? I'd rather not make ad hoc changes, but rather try to construct an overall consistent way of doing things; we can certainly provide back-compat, but unifying things is probably a good thing.

@jreback jreback changed the title from ENH: Harmonize `drop` and `rename` API to ENH: Harmonize drop and rename API Feb 19, 2016

TomAugspurger Feb 19, 2016

Contributor

I don't have a strong preference for one style over the other. The only upshot of the .rename(index=, columns=) approach is that you can do both at once instead of .rename_axis(index).rename_axis(columns, axis=1), very minor.

I would slightly favor just recommending and documenting the _axis methods (with labels, axis) rather than changing any method signatures.

jreback Feb 19, 2016

Contributor

do you think we should add corresponding .drop_axis and .fillna_axis? or too much clutter

nickeubank Feb 19, 2016

Contributor

Personally, I have a preference for columns and index as arguments -- they've always felt more intuitive and pythonic to me. But that's second to the value of harmonization.

Just documenting the _axis methods still leaves an uncomfortable inconsistency though, no? We'd be offering a workaround; I'd be in favor of fixing .drop and .fillna.

I'm agnostic on adding .drop_axis/.fillna_axis methods.

If we change the .drop and .fillna methods to take columns, index, do we still want to support the labels, axis arguments for backwards compatibility or break the api?

jreback Feb 19, 2016

Contributor

why don't you list all of the relevant methods (might be some more that I am forgetting), and make a proposal.

nickeubank Feb 19, 2016

Contributor

OK

nickeubank Feb 19, 2016

Contributor

drop and fillna:

  • change primary arguments from labels, axis to columns, index
  • Accept labels, axis arguments for backward compatibility, but move to back of argument list
    (note this will break code by people who passed labels as the first positional argument, but that's ok since it will throw an exception
    if no positional arguments are allowed)

drop_axis and fillna_axis:

  • New method that accepts labels, axis

Others:

  • Could implement for apply if we really wanted? I'm disinclined, but possible.
  • Could implement for add(), sub(), mul(), div(), radd(), rsub(), etc...

Open question:

  • How should these work for panels? (I never use panels, so not sure of best practices)

jreback Feb 19, 2016

Contributor

see that's the problem. In reality we should leave everything alone and maybe just change reindex/rename. The labels/axis idiom is much more common (and to be honest quite a bit more useful). Rarely do you actually change 2 things at once (which violates many pythonic principles). I would rather chain things like:

.reindex(...., axis='index').reindex(...., axis='columns')

though we are actually flexible enough to accept both paradigms.

nickeubank Feb 19, 2016

Contributor

Oh, I don't really care about the "two things at once" -- I just liked the "columns" argument for being more meaningful.

So your preference is:

reindex/rename:
- change primary arguments to labels / axis
- keep taking columns / index for backwards compatibility?

That's fine by me -- like I said, I'm mostly interested in harmonization!

max-sixty Feb 19, 2016

Contributor

The labels/axis idiom is much more common (and to be honest quite a bit more useful). Rarely do you actually change 2 things at once (which violates many pythonic principles).

+1

And, I know people have gone back & forth on this a bit - but I would also 'vote' for:

  • .rename being like xarray: renaming axes names only or, where the object has a name (currently Series), renaming the object
  • .relabel used for reindexing-like operations with a mapping from old to new labels

shoyer Feb 19, 2016

Member

The labels/axis idiom is much more common (and to be honest quite a bit more useful). Rarely do you actually change 2 things at once (which violates many pythonic principles).

I agree that changing 2 things at once is not a great API, but I agree with @nickeubank that explicit columns and index arguments make for more readable code: compare df.drop(columns='foo') vs df.drop('foo', axis='columns') (or worse, df.drop('foo', axis=1), which is assuredly more common because it's less typing).

jorisvandenbossche Feb 19, 2016

Member

I would like to avoid adding new methods such as drop_axis (which is actually not a good name IMO, as it sounds like you want to drop a full axis, while you want to drop certain items from an axis)

Further, I think we should make a clear distinction between methods that modify the axis (rename, drop, reindex), and methods that perform an operation over a certain axis (apply, add, ..). Those last ones use the axis= idiom to specify the direction of operation, and that is indeed a common idiom. I think the discussion should only be about rename, reindex and drop.

I personally also like the explicit columns and index arguments in e.g. df.rename(columns=..) (this reads very naturally). So I would not like to see these go (or be deprecated).

It is not really good API design, but I think it is perfectly possible to combine both idioms in one method for all of the discussed functions as kind of a compromise?
For example, changes of the current signature could be:

  • df.reindex(index=None, columns=None, ...) -> df.reindex(labels=None, index=None, columns=None, axis=0, ...)
  • df.drop(labels, axis=0, ...) -> df.drop(labels=None, axis=0, index=None, columns=None, ...)

Which would be I think backwards compatible?
That would kind of harmonize the api for the different methods, but have the bad design of providing two ways to do something in one function.

jorisvandenbossche Feb 19, 2016

Member

And, I know people have gone back & forth on this a bit - but I would also 'vote' for:

  • .rename being like xarray: renaming axes names only or, where the object has a name (currently Series), renaming the object
  • .relabel used for reindexing-like operations with a mapping from old to new labels

@MaximilianR Maybe open a separate issue to discuss that? What kind of idiom to use in the signature maybe depends on this, but the question of adding such a method is a separate discussion I think.

nickeubank Feb 20, 2016

Contributor

I think that @jorisvandenbossche's suggestion works perfectly. The real brilliance is that it even works for someone who used positional arguments for rename (i.e. typed df.rename({0:-99}) instead of df.rename(index={0:-99}))!

nickeubank Feb 21, 2016

Contributor

I take that back – if somebody uses more than one positional argument (index and columns) the results will differ.

On further reflection, I think we only have two choices: break the API, or tack the new arguments on to the end of the argument list so anyone who uses positional arguments is OK.

jorisvandenbossche Feb 21, 2016

Member

I take that back – if somebody uses more than one positional argument (index and columns) the results will differ.

I think even that should be possible to detect and warn. If the user did originally df.reindex(index, columns), with the new signature df.reindex(labels=None, index=None, columns=None, axis=0, ...) those would map to labels and index, but as you shouldn't use both at the same time, we can detect this case and give an informative message.
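
A rough sketch of that detection, assuming the proposed parameter order (labels, index, columns, axis). The helper name `normalize_reindex_args` and the choice of `FutureWarning` are assumptions for illustration, not existing pandas behavior:

```python
import warnings


def normalize_reindex_args(labels=None, index=None, columns=None, axis=0):
    """Return an (index, columns) pair from either calling style.

    Hypothetical helper: the parameter order follows the signature
    proposed in this thread, not any released pandas signature.
    """
    if labels is not None and index is not None:
        if columns is not None:
            raise TypeError("Cannot combine 'labels' with both 'index' and 'columns'")
        # Legacy positional call reindex(index, columns): the old `index`
        # landed in `labels` and the old `columns` landed in `index`.
        warnings.warn(
            "Interpreting reindex(a, b) as reindex(index=a, columns=b); "
            "pass 'index' and 'columns' by keyword instead.",
            FutureWarning,
            stacklevel=2,
        )
        return labels, index

    if labels is not None:
        if columns is not None:
            raise TypeError("Cannot specify both 'labels' and 'columns'")
        if axis in (0, "index"):
            return labels, None
        if axis in (1, "columns"):
            return None, labels
        raise ValueError(f"Invalid axis: {axis!r}")

    # Pure keyword style: index= and/or columns=.
    return index, columns
```

Note that this check cannot tell a legacy positional call apart from someone deliberately passing labels= and index= by keyword; both are treated the same way here.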

nickeubank Feb 21, 2016

Contributor

@jorisvandenbossche My impression was that "backwards compatibility" / "not breaking the API" means that old code still runs fine -- an informative error beats a silent failure, but seems like that's still API-breaking.

An overview of where I think we stand:

1. Do nothing

2. Backwards Compatible

rename(index=None, columns=None, **kwargs) ->
rename(index=None, columns=None, labels=None, axis=0, **kwargs)
(where **kwargs now takes labels,axis)

drop(labels, axis=0, level=None, inplace=False, errors='raise')->
drop(labels, axis=0, level=None, inplace=False, errors='raise', index=None, columns=None)

Pros:
* Backwards compatible
* Can use both with same named arguments

Cons:
* Cannot use both with same positional argument patterns

3. Break-API - All options available

rename(index=None, columns=None, **kwargs) ->
rename(labels=None, axis=None, index=None, columns=None, **kwargs)

drop(labels, axis=0, level=None, inplace=False, errors='raise')->
drop(labels, axis=None, index=None, columns=None, level=None, inplace=False, errors='raise')

Pros:
* Backwards compatible for people who use named arguments
* Allows all forms of interaction

Cons:
* API Breaking

4. Break-API -- adopt labels,axis

rename(index=None, columns=None, **kwargs) ->
rename(labels=None, axis=0, **kwargs)

Pros:
* Conforms with syntax of other functions like apply
* Minimal number of functions broken

Cons:
* labels/axis less readable than index/columns

5. Break-API -- adopt columns/index

drop(labels, axis=0, level=None, inplace=False, errors='raise')->
drop(index=None, columns=None, level=None, inplace=False, errors='raise')

Pros:
* More readable new API
* Only breaks a few functions

Cons:
* Not consistent with use of [transformation]/axis argument structure in other places

My take:

I think we should shoot for either 2 (to ensure backwards compatibility) or 4. 2 because I think API breaking for these kinds of core functions is bad, and 4 because I'm increasingly won over by @jreback's argument -- while I prefer index/columns in general, I think that labels/axis is more consistent with the general pandas library, and I think minimal API breaking is desirable.

jorisvandenbossche Feb 22, 2016

Member

Nice overview!

@jorisvandenbossche My impression was that "backwards compatibility" / "not breaking the API" means that old code still runs fine -- an informative error beats a silent failure, but seems like that's still API-breaking.

@nickeubank An informative message does not necessarily need to be an error! It can also be a warning (or we can even decide to just pass it through correctly without warning, although I wouldn't do that). So I am still convinced this can be done in a backwards compatible way (and your options 2 and 3 can be combined).

2. Backwards Compatible
...
Cons:

  • Cannot use both with same positional argument patterns

I don't think this is really a con, as using it with only positional arguments is never a sane thing to do regarding clarity of your code :-)

Further, I think there is a 6th option: use separate methods for the two idioms (like reindex / reindex_axis)

So I think we have to choose between:

a) combine both idioms within the same methods and live with the bad API design (in a back compat or incompat way -> your options 2 and 3)
b) choose one of the idioms and deprecate the other (your options 4 and 5)
c) have separate methods for each idiom

I would personally be in favor of a)

nickeubank Feb 22, 2016

Contributor

@jorisvandenbossche good call about positional argument differences not being a big deal.

I think that makes my 2 (backwards compatible with both sets of arguments) my preference.

jreback Feb 22, 2016

Contributor

@nickeubank can you survey all the methods and see which use each idiom? kind of like a value_counts, most important is prob number per class of idiom. (e.g. make several categories and measure how many methods of each type of idiom we have for both). Just to get an overview of the entire API.
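
One way such a survey could be automated is with `inspect.signature`. The sketch below is an assumption about how to bucket methods (by whether they expose `axis`, `index`/`columns`, both, or neither) and runs on a stand-in `Toy` class so that it does not require pandas:

```python
import inspect
from collections import Counter


def classify_methods(cls):
    """Map each public callable attribute of `cls` to the idiom its signature uses."""
    result = {}
    for name, member in inspect.getmembers(cls, callable):
        if name.startswith("_"):
            continue
        try:
            params = inspect.signature(member).parameters
        except (TypeError, ValueError):
            continue  # some callables expose no introspectable signature
        has_axis = "axis" in params
        has_named = "index" in params or "columns" in params
        if has_axis and has_named:
            result[name] = "both"
        elif has_axis:
            result[name] = "axis"
        elif has_named:
            result[name] = "index/columns"
        else:
            result[name] = "neither"
    return result


class Toy:
    """Stand-in with made-up signatures so the sketch runs without pandas."""

    def rename(self, index=None, columns=None): ...
    def drop(self, labels, axis=0): ...
    def fillna(self, value=None, axis=None): ...
    def head(self, n=5): ...


idiom_by_method = classify_methods(Toy)
counts = Counter(idiom_by_method.values())  # value_counts-style tally per idiom
```

Pointing `classify_methods` at `pandas.DataFrame` should give the same kind of tally for the real API, though a signature check alone cannot separate axis used as a modifier from axis used as a direction of operation.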

nickeubank Feb 22, 2016

Contributor

@jreback Sure, but will need some time -- busy week!

jreback Feb 22, 2016

Contributor

@nickeubank np. this issue would be for 0.19.0 in any event.

nickeubank Mar 6, 2016

Contributor

A DataFrame has ~200 methods. Those that take columns as a modifier argument:

  • pivot
  • pivot_table
  • reindex
  • rename
  • sort (but now deprecated -- sort_values uses axis)

Also note that columns is a keyword for the following, but in a somewhat different context:

  • All to_[format] calls
  • from_items
  • from_records

axis is in too many to count, but the ones that seem to use it as a modifier (as reindex uses columns), in alphabetical order:

  • add
  • align
  • all
  • any
  • apply
  • compound
  • corrwith
  • count
  • cummax, cummin, etc.
  • div, divide
  • diff
  • dropna
  • eq
  • fillna
  • floordiv
    ... (ok, gonna stop there. You get the idea. It's everywhere)

In light of that, I would vote for leaving drop and company as they are, and adding labels/axis named arguments to rename/reindex (and pivot?). My vote is to put at the end of the argument list for full backwards compatibility, but am open to suggestions.

nickeubank May 8, 2016

Contributor

Revisiting this, seems like we came to a consensus on two things then got stuck.

Consensus:

  • Current state is problematic and harmonization is desirable
  • The norm in pandas is clearly labels/axis, not columns/index. So we should probably move
    rename/reindex to labels/axis.

No Consensus:

Seems we have three options:

Option 1: Add labels/axis to end of the argument list, leave columns/index in place

Pros:

  • Fully backward compatible

Cons:

  • Doesn't quite achieve harmonization

Option 2: Put labels/axis at the front of the argument list, push back columns/index but still accept

Pros:

  • Backward compatible for named arguments
  • If users pass only one positional argument, also backwards compatible. In the old framework, that would correspond to the index argument; in the new framework, it would correspond to labels with a default axis of 0.
  • If users pass multiple positional arguments (index and columns in the old framework), an exception would be raised, since no value that columns accepts would be a valid axis argument, so the failure would not be silent.

Cons:

  • Will break old code that used both columns and index
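
To illustrate the last Pro above: under Option 2 a legacy call such as rename(index_map, columns_map) would bind columns_map to axis, and an axis check along these lines would reject it. The helper `validate_axis` is hypothetical, not pandas' actual validation:

```python
_AXIS_ALIASES = {0: "index", "index": "index", 1: "columns", "columns": "columns"}


def validate_axis(axis):
    """Return 'index' or 'columns', or raise ValueError for anything else.

    Hypothetical helper for illustration only.
    """
    # Mappings, lists and functions (the kinds of values `columns` used to
    # take) fail the isinstance check, which also avoids a TypeError from
    # looking up an unhashable value in the alias dict.
    if isinstance(axis, (int, str)) and axis in _AXIS_ALIASES:
        return _AXIS_ALIASES[axis]
    raise ValueError(f"No axis named {axis!r}")
```

So `validate_axis("columns")` passes, while a dict, a list, or a function in the axis slot fails loudly instead of being silently misread.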

Option 3: Replace columns/index with labels/axis

Pros:

  • Cleaner

Cons:

  • Not backward compatible

Personally, I like 1 or 2 (though my indifference between the two is partially motivated by the fact I always name my arguments so they're equivalent for me ;))

toobaz Jul 17, 2017

Member

we actually already have rename_axis and reindex_axis for exactly this (for the axis-keyword idiom). So we could add a new drop-like method with the named axes idiom
But, what name to use for this? As the current drop should actually be "drop_axis", and the existing drop should be changed.
Is it needed to have two functions for each operation?

I think having two methods doing the same thing is confusing (less so if the documentation of each just clarified the difference from the other, but still I don't think both are worth keeping).

@MaximilianR Maybe open a separate issue to discuss that?

Done: #16990. Clearly this discussion on the signature also applies to that issue, assuming my proposal (of adding .relabel) is accepted. I'm personally slightly in favor of index=, just because it is more common in pandas methods (although I do realize the difference between working on values and on indices, it's still good if the two have a similar interface).

jreback Oct 2, 2017

Contributor

@jorisvandenbossche any possibility of getting this in? obviously aside from #17644 which is merged

TomAugspurger Oct 5, 2017

Contributor

What's left to do here? The same changes to reindex and rename as Joris made to drop?

If so, I can put together a PR this afternoon.

jreback Oct 5, 2017

Contributor

yep i think so; that’s a bit more involved though

TomAugspurger Oct 5, 2017

Contributor

Yes, I was just going to post that :) I may have found a (somewhat) hacky solution. Will have the start of a PR in a bit.

The difficulty is disambiguating

>>> df.rename(fn, axis=1)  # OK
>>> df.rename(index=fn, axis=1)  # TypeError

But I may have a way.
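
The ambiguity described above can be sketched in plain Python. This is only an illustration under stated assumptions: `resolve_rename_targets` is a hypothetical helper, not the code in the PR and not pandas' implementation. It accepts either calling style and refuses a mix of the two.

```python
def resolve_rename_targets(mapper=None, index=None, columns=None, axis=None):
    """Return (index_mapper, columns_mapper) for either calling style.

    Hypothetical helper for illustration only; not pandas' implementation.
    """
    named = index is not None or columns is not None
    # Mixing the axis-keyword style with the named-axes style is ambiguous.
    if axis is not None and named:
        raise TypeError("cannot specify both 'axis' and 'index'/'columns'")
    if mapper is not None and named:
        raise TypeError("cannot specify both 'mapper' and 'index'/'columns'")
    if mapper is None:
        # Named-axes style: rename(index=..., columns=...)
        return index, columns
    # Axis-keyword style: rename(mapper, axis=...), defaulting to the index.
    if axis in (None, 0, "index"):
        return mapper, None
    if axis in (1, "columns"):
        return None, mapper
    raise ValueError(f"no axis named {axis!r}")


fn = str.upper
assert resolve_rename_targets(fn, axis=1) == (None, fn)   # the "OK" case
assert resolve_rename_targets(index=fn) == (fn, None)     # old style still works
try:
    resolve_rename_targets(index=fn, axis=1)              # the mixed case
except TypeError as exc:
    print(exc)
```

Raising on the mixed call keeps the failure loud instead of silently picking one interpretation.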

TomAugspurger Oct 5, 2017

Contributor

How much do we want to do the other side of this, though? I'm writing the release notes for adding axis to rename, and it reads strangely coming right after the drop section adding index / columns.

I'm comfortable with recommending index=, columns= as the preferred way going forward. Adding axis to reindex and rename is (implicitly) recommending the other style.

toobaz Oct 5, 2017

Member

> I'm comfortable with recommending index=, columns= as the preferred way going forward

I think that @nickeubank's comment provides strong evidence in favor of axis=, as do coherence with numpy (which won't hurt) and the use of dim= in xarray. And while axis=1 is apparently not considered very pythonic (not so obvious to me), and coherence with numpy is not a top priority, being able to write axis="columns" seems to me sufficient to restore readability.

Keeping both approaches where index= and columns= are already present is the best solution, but I think the standard/recommended way should be axis=, which incidentally is also often simpler to implement.

TomAugspurger Oct 5, 2017

Contributor

Yes, re-reading that comment does make a good case for it.

OK then, I'll put up my WIP for rename, and finish it up later tonight.

toobaz Oct 6, 2017

Member

(By the way: something else good, and very pythonic, about axis= is that readers know by definition that a method they once saw used on e.g. the index works in exactly the same way on the columns, or vice versa.)

jorisvandenbossche Oct 6, 2017

Member

I disagree with that comment (#12392 (comment)): it is correct that the axis idiom is used a lot more in pandas, but we are speaking here about very specific functions where this comparison does not hold.
E.g., in df.mean(axis=) you are applying the function over either axis (this would be difficult to express with index= or columns= arguments). But in the rename/drop methods, you are altering one of the axes, not applying a function along one of the axes. In that case, the index/columns args make sense in a way that is not comparable to all those other methods that take the axis arg (and in that sense: yes, I personally will recommend that people write drop(columns=[..]) instead of drop([..], axis=1)).

But anyhow, that's not really that relevant anymore :-) As it is good to make them consistent anyway, which means adding axis to rename, and then people can do what they like most.

@TomAugspurger Thanks for picking this up! Will look at the PR now.
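
The distinction drawn in this comment, between altering an axis and applying a function along one, can be illustrated with a small sketch. It assumes pandas 0.21 or later (where drop accepts columns=); the frame and its column names are made up for the example.

```python
import pandas as pd

# Made-up frame for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Altering an axis: three spellings of "remove column b".
by_keyword = df.drop(columns=["b"])
by_axis_number = df.drop(["b"], axis=1)
by_axis_name = df.drop(["b"], axis="columns")
assert by_keyword.equals(by_axis_number)
assert by_keyword.equals(by_axis_name)

# Applying a function along an axis: only the axis= spelling exists here.
row_means = df.mean(axis="columns")
print(row_means.tolist())  # [3.0, 4.0]
```

Which of the drop spellings to prefer is exactly the point the thread leaves to taste; the sketch only shows that they give the same result.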

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Oct 6, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Oct 10, 2017

TomAugspurger added a commit that referenced this issue Oct 10, 2017

API: Added axis argument to rename, reindex (#17800)
* API: Added axis argument to rename

xref: #12392

* API: Accept 'axis' keyword argument for reindex

@TomAugspurger TomAugspurger modified the milestones: 0.21.0, Next Major Release Oct 12, 2017

TomAugspurger Oct 12, 2017

Contributor

Were reindex and rename the last ones needed here? Can this be closed?

jorisvandenbossche Oct 13, 2017

Member

Yes, I think drop, rename and reindex were the only ones.

Closed by #17644, #17800 and #17842
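
As a rough illustration of where the thread ended up, the following sketch shows rename and reindex accepting both the named-axes style and the axis style. It assumes pandas 0.21 or later, the milestone the issue was assigned to; the frame and labels are invented for the example.

```python
import pandas as pd

# Invented frame for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# rename: named-axes style and axis style give the same result.
renamed_kw = df.rename(columns=str.upper)
renamed_axis = df.rename(str.upper, axis="columns")
assert renamed_kw.equals(renamed_axis)

# reindex: likewise.
reindexed_kw = df.reindex(columns=["b", "a"])
reindexed_axis = df.reindex(["b", "a"], axis="columns")
assert reindexed_kw.equals(reindexed_axis)
```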

@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.21.0 Oct 13, 2017
