ENH: Harmonize drop and rename API #12392

nickeubank · 2016-02-19T04:12:45Z

rename accepts a columns argument or an index argument, while drop looks for a labels and axis pair. I don't know about anyone else, but I have to check the help file every time I come back to pandas to remember which takes which.

How would people feel about adding columns and index arguments to drop? They could just be added in addition to labels/axis if we want to provide backwards compatibility and just raise an exception if the user tries to mix them.
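To make the proposal concrete, here is a minimal pure-Python sketch of how a drop-like function might accept either pair of arguments and refuse a mix of the two. `resolve_drop_targets` is a hypothetical helper written for illustration only; it is not pandas code, and the error messages are assumptions.

```python
def resolve_drop_targets(labels=None, axis=0, index=None, columns=None):
    """Return {'index': [...], 'columns': [...]} naming what to drop.

    Hypothetical helper: accepts either (labels, axis) or (index, columns)
    and raises if the caller mixes the two styles.
    """
    def as_list(value):
        # Treat a single label (including a string) as a one-item list.
        if value is None:
            return []
        if isinstance(value, (list, tuple, set)):
            return list(value)
        return [value]

    if labels is not None:
        if index is not None or columns is not None:
            raise ValueError("Cannot specify both 'labels' and 'index'/'columns'")
        if axis in (0, "index"):
            return {"index": as_list(labels), "columns": []}
        if axis in (1, "columns"):
            return {"index": [], "columns": as_list(labels)}
        raise ValueError(f"No axis named {axis!r}")
    if index is None and columns is None:
        raise ValueError("Need to specify 'labels', 'index' or 'columns'")
    return {"index": as_list(index), "columns": as_list(columns)}


# Both spellings describe the same operation.
assert resolve_drop_targets("foo", axis=1) == resolve_drop_targets(columns="foo")
```

Raising on a mix keeps the two styles from silently disagreeing about which axis is meant.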

jreback · 2016-02-19T13:15:41Z

actually this is a bigger API issue that @TomAugspurger and I briefly touched on here

.rename and .rename_axis
.reindex and .reindex_axis
are consistent with each other

.drop and .fillna are also consistent (just not with the others)

So thoughts on how to proceed here. I'd rather not make ad hoc changes, but rather try to construct an overall consistent way of doing things; we can certainly provide back-compat, but unifying things is probably a good thing.

TomAugspurger · 2016-02-19T13:30:58Z

I don't have a strong preference for one style over the other. The only upshot of the .rename(index=, columns=) approach is that you can do both at once instead of .rename_axis(index).rename_axis(columns, axis=1), very minor.
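The difference between the two styles can be sketched with a toy class. `Frame` below is a hypothetical stand-in that only tracks labels; its `rename` and `rename_axis` mimic the two calling conventions discussed here and are not the pandas implementations.

```python
class Frame:
    """Toy stand-in for a DataFrame that only tracks its two sets of labels."""

    def __init__(self, index, columns):
        self.index = list(index)
        self.columns = list(columns)

    def rename(self, index=None, columns=None):
        # Keyword-per-axis style: both axes can be relabelled in one call.
        new_index = [(index or {}).get(label, label) for label in self.index]
        new_columns = [(columns or {}).get(label, label) for label in self.columns]
        return Frame(new_index, new_columns)

    def rename_axis(self, mapper, axis=0):
        # labels/axis style: one axis per call, so two axes need chaining.
        if axis in (0, "index"):
            return self.rename(index=mapper)
        if axis in (1, "columns"):
            return self.rename(columns=mapper)
        raise ValueError(f"No axis named {axis!r}")


frame = Frame(index=[0, 1], columns=["a", "b"])
row_map, col_map = {0: "first"}, {"a": "alpha"}

both_at_once = frame.rename(index=row_map, columns=col_map)
chained = frame.rename_axis(row_map).rename_axis(col_map, axis=1)
assert (both_at_once.index, both_at_once.columns) == (chained.index, chained.columns)
```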

I would slightly favor just recommending and documenting the _axis methods (with labels, axis) rather than changing any method signatures.

jreback · 2016-02-19T13:35:25Z

do you think we should add corresponding .drop_axis and .fillna_axis? or too much clutter

nickeubank · 2016-02-19T15:59:42Z

Personally, I have a preference for columns and index as arguments -- they've always felt more intuitive and pythonic to me. But that's second to the value of harmonization.

Just documenting the _axis methods still leaves an uncomfortable inconsistency though, no? That only offers a workaround; I'd be in favor of fixing .drop and .fillna.

I'm agnostic on adding .drop_axis/.fillna_axis methods.

If we change the .drop and .fillna methods to take columns, index, do we still want to support the labels, axis arguments for backwards compatibility or break the api?

jreback · 2016-02-19T16:02:54Z

why don't you list all of the relevant methods (might be some more that I am forgetting), and make a proposal.

nickeubank · 2016-02-19T16:05:43Z

OK

nickeubank · 2016-02-19T16:18:18Z

drop and fillna:

change primary arguments from labels, axis to columns, index
Accept labels, axis arguments for backward compatibility, but move to back of argument list
(note this will break code by people who passed labels as first positional argument, but that is OK since it will throw an exception
if no positional arguments are allowed)

drop_axis and fillna_axis:

New method that accepts labels, axis

Others:

Could implement for apply if we really wanted? I'm dis-inclined, but possible.
Could implement for add(), sub(), mul(), div(), radd(), rsub(), etc...

Open question:

How should these work for panels? (I never use panels, so not sure of best practices)

jreback · 2016-02-19T16:21:19Z

see that's the problem. In reality we should leave everything alone and maybe just change reindex/rename. The labels/axis idiom is much more common (and to be honest quite a bit more useful). Rarely do you actually change 2 things at once (which violates many pythonic principles). I would rather chain things like:

.reindex(...., axis='index').reindex(...., axis='columns')

though we are actually flexible enough to accept both paradigms.

nickeubank · 2016-02-19T16:24:55Z

Oh, I don't really care about the "two things at once" -- I just liked the "columns" argument for being more meaningful.

So your preference is:

reindex/rename:
- change primary arguments to label / axis
- keep taking columns / index for backwards compatibility?

That's fine by me -- like I said, I'm mostly interested in harmonization!

max-sixty · 2016-02-19T17:10:34Z

The labels/axis idiom is much more common (and to be honest quite a bit more useful). Rarely do you actually change 2 things at once (which violates many pythonic principles).

+1

And, I know people have gone back & forth on this a bit - but I would also 'vote' for:

.rename being like xarray: renaming axes names only or, where the object has a name (currently Series), renaming the object
.relabel used for reindexing-like operations with a mapping from old to new labels

shoyer · 2016-02-19T20:16:45Z

The labels/axis idiom is much more common (and to be honest quite a bit more useful). Rarely do you actually change 2 things at once (which violates many pythonic principles).

I agree that changing 2 things at once is not a great API, but I agree with @nickeubank that explicit columns and index arguments make for more readable code: compare df.drop(columns='foo') vs df.drop('foo', axis='columns') (or worse, df.drop('foo', axis=1), which is assuredly more common because it's less typing).

jorisvandenbossche · 2016-02-19T23:31:54Z

I would like to avoid adding new methods such as drop_axis (which is actually not a good name IMO, as it sounds like you want to drop a full axis, while you want to drop certain items from an axis)

Further, I think we should make a clear distinction between methods that modify the axis (rename, drop, reindex), and methods that perform operation over a certain axis (apply, add, ..). Those last ones use the axis= idiom to specify the direction of operation, and that is indeed a common idiom. I think the discussion should only be about rename, reindex and drop

I personally also like the explicit column and index arguments in eg df.rename(columns=..) (this reads very natural). So I would not like to see these go (or deprecated).

It is not really good API design, but I think it is perfectly possible to combine both idioms in one method for all of the discussed functions as kind of a compromise?
For example, changes of the current signature could be:

df.reindex(index=None, columns=None, ...) -> df.reindex(labels=None, index=None, columns=None, axis=0, ...)
df.drop(labels, axis=0, ...) -> df.drop(labels=None, axis=0, index=None, columns=None, ...)

Which would be I think backwards compatible?
That would kind of harmonize the api for the different methods, but have the bad design of providing two ways to do something in one function.

jorisvandenbossche · 2016-02-19T23:33:57Z

And, I know people have gone back & forth on this a bit - but I would also 'vote' for:

.rename being like xarray: renaming axes names only or, where the object has a name (currently Series), renaming the object

.relabel used for reindexing-like operations with a mapping from old to new labels

@MaximilianR Maybe open a separate issue to discuss that? What kind of idiom to use in the signature maybe depends on this, but the question of adding such a method is a separate discussion I think.

nickeubank · 2016-02-20T19:30:17Z

I think that @jorisvandenbossche's suggestion works perfectly. The real brilliance is that it even works for someone who used positional arguments for rename (i.e. typed df.rename({0:-99}) instead of df.rename(index={0:-99}))!

nickeubank · 2016-02-21T01:16:16Z

I take that back – if somebody uses more than one positional argument (index and columns) the results will differ.

On further reflection, I think we only have two choices: break the API, or tack the new arguments on to the end of the argument list so anyone who uses positional arguments is OK.

jorisvandenbossche · 2016-02-21T10:02:18Z

I take that back – if somebody uses more than one positional argument (index and columns) the results will differ.

I think even that should be possible to detect and warn. If the user did originally df.reindex(index, columns), with the new signature df.reindex(labels=None, index=None, columns=None, axis=0, ...) those would map to labels and index, but as you shouldn't use both at the same time, we can detect this case and give an informative message.
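A rough sketch of that detection, in plain Python: `reindex_args` is a hypothetical resolver for the proposed signature, and the choice of `FutureWarning` and the message wording are assumptions made for illustration.

```python
import warnings


def reindex_args(labels=None, index=None, columns=None, axis=0):
    """Hypothetical argument resolver for the proposed reindex signature.

    Returns (index, columns). An old-style call reindex(index, columns)
    now binds to (labels, index); since 'labels' and 'index' should never
    be given together, that combination is treated as the legacy form.
    """
    if labels is not None and index is not None:
        if columns is not None:
            raise TypeError("Cannot specify 'labels' with both 'index' and 'columns'")
        warnings.warn(
            "Interpreting reindex(index, columns) positionally; "
            "pass 'index=' and 'columns=' as keywords instead.",
            FutureWarning,
            stacklevel=2,
        )
        return labels, index  # legacy meaning: (index, columns)
    if labels is not None:
        if columns is not None:
            raise TypeError("Cannot specify both 'labels' and 'columns'")
        if axis in (0, "index"):
            return labels, None
        if axis in (1, "columns"):
            return None, labels
        raise ValueError(f"No axis named {axis!r}")
    return index, columns
```

One limitation of this sketch: it cannot tell a positional `reindex(a, b)` from a keyword `reindex(labels=a, index=b)`, so both are treated as the legacy call.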

nickeubank · 2016-02-21T16:49:29Z

@jorisvandenbossche My impression was that "backwards compatibility" / "not breaking the API" means that old code still runs fine -- an informative error beats a silent failure, but seems like that's still API-breaking.

An overview of where I think we stand:

1. Do nothing

2. Backwards Compatible

rename(index=None, columns=None, **kwargs) ->
rename(index=None, columns=None, labels=None, axis=0, **kwargs)
(where **kwargs now takes labels,axis)

drop(labels, axis=0, level=None, inplace=False, errors='raise')->
drop(labels, axis=0, level=None, inplace=False, errors='raise', index=None, columns=None)

Pros:
* Backwards compatible
* Can use both with same named arguments

Cons:
* Cannot use both with same positional argument patterns

3. Break-API - All options available

rename(index=None, columns=None, **kwargs) ->
rename(labels=None, axis=None, index=None, columns=None, **kwargs)

drop(labels, axis=0, level=None, inplace=False, errors='raise')->
drop(labels, axis=None, index=None, columns=None, level=None, inplace=False, errors='raise')

Pros:
* Backwards compatible for people who use named arguments
* Allows all forms of interaction

Cons:
* API Breaking

4. Break-API -- adopt labels,axis

rename(index=None, columns=None, **kwargs) ->
rename(labels=None, axis=0, **kwargs)

Pros:
* Conforms with syntax of other functions like apply
* Minimal number of functions broken

Cons:
* labels/axis less readable than index/columns

5. Break-API -- adopt columns/index

drop(labels, axis=0, level=None, inplace=False, errors='raise')->
drop(index=None, columns=None, level=None, inplace=False, errors='raise')
Pros:
* More readable new API
* Only breaks a few functions

Cons:
* Not consistent with use of [transformation]/axis argument structure in other places

My take:

I think we should shoot for either 2 (to ensure backwards compatibility) or 4. 2 because I think API breaking for these kinds of core functions is bad, and 4 because I'm increasingly won over by @jreback's argument -- while I prefer index/columns in general, I think that labels/axis is more consistent with the general pandas library, and I think minimal API breaking is desirable.

jorisvandenbossche · 2016-02-22T10:50:22Z

Nice overview!

@jorisvandenbossche My impression was that "backwards compatibility" / "not breaking the API" means that old code still runs fine -- an informative error beats a silent failure, but seems like that's still API-breaking.

@nickeubank An informative message does not necessarily need to be an error! It can also be a warning (or we can even decide to just pass it through correctly without warning, although I wouldn't do that). So I am still convinced this can be done in a backwards compatible way (and your options 2 and 3 can be combined).

2. Backwards Compatible
...
Cons:

Cannot use both with same positional argument patterns

I don't think this is really a con, as using it with only positional arguments is never a sane thing to do regarding clarity of your code :-)

Further, I think there is 6th option: use separate methods for the two idioms (like reindex / reindex_axis)

So I think we have to choose between:

a) combine both idioms within the same methods and live with the bad API design (in a back compat or incompat way -> your options 2 and 3)
b) choose one of the idioms and deprecate the other (your options 4 and 5)
c) have separate methods for each idiom

I would personally be in favor of a)

nickeubank · 2016-02-22T16:04:46Z

@jorisvandenbossche good call about positional argument differences not being a big deal.

I think that makes my option 2 (backwards compatible with both sets of arguments) my preference.

jreback · 2016-02-22T16:38:42Z

@nickeubank can you survey all the methods and see which use each idiom? kind of like a value_counts, most important is prob number per class of idiom. (e.g. make several categories and measure how many methods of each type of idiom we have for both). Just to get an overview of the entire API.
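One way such a survey could be automated is sketched below with the standard `inspect` module. `survey_idioms` and `ToyFrame` are hypothetical names; the toy class stands in for DataFrame so that the example runs without pandas installed.

```python
import inspect
from collections import Counter


def survey_idioms(cls):
    """Count public methods of `cls` by which axis-selection idiom they expose."""
    counts = Counter()
    for name, member in inspect.getmembers(cls, callable):
        if name.startswith("_"):
            continue
        try:
            params = inspect.signature(member).parameters
        except (TypeError, ValueError):
            continue  # some callables do not expose a signature
        has_axis = "axis" in params
        has_named = "index" in params or "columns" in params
        if has_axis and has_named:
            counts["both"] += 1
        elif has_axis:
            counts["labels/axis"] += 1
        elif has_named:
            counts["index/columns"] += 1
        else:
            counts["neither"] += 1
    return counts


class ToyFrame:
    """Hypothetical class standing in for DataFrame so the sketch runs alone."""

    def drop(self, labels, axis=0): ...
    def fillna(self, value, axis=0): ...
    def rename(self, index=None, columns=None): ...
    def head(self, n=5): ...


print(survey_idioms(ToyFrame))
```

With pandas installed, calling `survey_idioms(pandas.DataFrame)` should give a tally for the real API. Matching on parameter names alone cannot tell whether `columns` selects an axis or means something else, so the result would still need a manual pass.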

nickeubank · 2016-02-22T19:08:52Z

@jreback Sure, but will need some time -- busy week!

jreback · 2016-02-22T19:33:27Z

@nickeubank np. this issue would be for 0.19.0 in any event.

nickeubank · 2016-03-06T22:33:22Z

A DataFrame has ~200 methods. Those that take columns as a modifier argument:

pivot
pivot_table
reindex
rename
sort (but now deprecated -- sort_values uses axis)

Also note that columns is a keyword for the following, but in a somewhat different context:

All to_[format] calls
from_items
from_records

axis is in too many to count, but the ones that seem to use as a modifier (as reindex uses columns) in alphabetical order:

add
align
all
any
apply
compound
corrwith
count
cummax, cummin, etc.
div, divide
diff
dropna
eq
fillna
floordiv
... (ok, gonna stop there. You get the idea. It's everywhere)

In light of that, I would vote for leaving drop and company as they are, and adding labels/axis named arguments to rename/reindex (and pivot?). My vote is to put at the end of the argument list for full backwards compatibility, but am open to suggestions.

nickeubank · 2016-05-08T16:20:07Z

Revisiting this, seems like we came to a consensus on two things then got stuck.

Consensus:

Current state is problematic and harmonization is desirable
The norm in pandas is clearly labels/axis, not columns/index. So we should probably move
rename/reindex to labels/axis.

No Consensus:

Seems we have three options:

Option 1: Add labels/axis to end of the argument list, leave columns/index in place
Pros:

Fully backward compatible

Cons:

Doesn't quite achieve harmonization

Option 2: Put labels/axis at the front of the argument list, push back columns/index but still accept

Pros:

Backward compatible for named arguments
If users pass only one positional argument, also backwards compatible. In old framework, that would correspond to index argument; in new framework, would correspond to labels with a default axis of 0.
If users pass multiple positional arguments (index and columns in old framework), an exception would be raised, since nothing that columns would accept is a valid axis argument, so the failure would not be silent.

Cons:

Will break old code that used both columns and index
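The positional behaviour described for Option 2 can be sketched as follows. `axis_number` and `rename_option2` are hypothetical names, and the set of accepted axis spellings is an assumption.

```python
def axis_number(axis):
    """Map an axis spelling to 0 or 1, raising for anything else (illustrative)."""
    aliases = {0: 0, "index": 0, 1: 1, "columns": 1}
    try:
        return aliases[axis]
    except (KeyError, TypeError):
        # TypeError covers unhashable values such as a dict or list of labels.
        raise ValueError(f"No axis named {axis!r}") from None


def rename_option2(labels=None, axis=0, index=None, columns=None):
    """Hypothetical Option 2 signature: labels/axis first, index/columns kept."""
    if labels is not None:
        if index is not None or columns is not None:
            raise TypeError("Cannot mix 'labels' with 'index'/'columns'")
        key = "index" if axis_number(axis) == 0 else "columns"
        return {key: labels}
    return {"index": index, "columns": columns}


# One positional argument: same meaning as the old rename(index).
print(rename_option2({0: "first"}))

# Two positional arguments (old index, columns): fails loudly, not silently.
try:
    rename_option2({0: "first"}, {"a": "alpha"})
except ValueError as err:
    print(err)
```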

Option 3: Replace columns/index with labels/axis
Pros:

Cleaner

Cons:

Not backward compatible

Personally, I like 1 or 2 (though my indifference between the two is partially motivated by the fact I always name my arguments so they're equivalent for me ;))

toobaz · 2017-07-17T07:12:29Z

we actually already have rename_axis and reindex_axis for exactly this (for the axis-keyword idiom). So we could add a new drop-like method with the named axes idiom
But, what name to use for this? As the current drop should actually be "drop_axis", and the existing drop should be changed.
Is it needed to have two functions for each operation?

I think having two methods doing the same thing is confusing (less so if the documentation of each just clarified the difference from the other, but still I don't think both are worth keeping).

@MaximilianR Maybe open a separate issue to discuss that?

Done: #16990 . Clearly this discussion on the signature also applies to that bug, assuming my proposal (of adding .relabel) is accepted. I'm personally slightly in favor of index=, just because it is more common in pandas methods (although I do realize the difference between working on values and on indices, it's still good if the two have a similar interface).

jreback · 2017-10-02T12:52:11Z

@jorisvandenbossche any possibility of getting this in? obviously aside from #17644 which is merged

TomAugspurger · 2017-10-05T20:04:47Z

What's left to do here? The same changes to reindex and rename as Joris made to drop?

If so, I can put together a PR this afternoon.

jreback · 2017-10-05T21:10:37Z

yep i think so; that’s a bit more involved though

TomAugspurger · 2017-10-05T21:14:20Z

Yes, I was just going to post that :) I may have found a (somewhat) hacky solution. Will have the start of a PR in a bit.

The difficulty is disambiguating

>>> df.rename(fn, axis=1)  # OK
>>> df.rename(index=fn, axis=1)  # TypeError

But I may have a way.
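A minimal sketch of one way to disambiguate the two calls, using `None` as a sentinel for "axis not given". `resolve_rename` and its parameter name `mapper` are assumptions made for illustration, not the code from the eventual PR.

```python
def resolve_rename(mapper=None, index=None, columns=None, axis=None):
    """Hypothetical resolver returning (index_mapper, columns_mapper)."""
    if mapper is None:
        # Keyword style: 'axis' has nothing to apply to, so reject it.
        if axis is not None:
            raise TypeError("Cannot specify 'axis' together with 'index' or 'columns'")
        return index, columns
    if index is not None or columns is not None:
        raise TypeError("Cannot specify both 'mapper' and 'index'/'columns'")
    if axis in (None, 0, "index"):
        return mapper, None
    if axis in (1, "columns"):
        return None, mapper
    raise ValueError(f"No axis named {axis!r}")


fn = str.upper
assert resolve_rename(fn, axis=1) == (None, fn)   # OK
try:
    resolve_rename(index=fn, axis=1)              # TypeError
except TypeError as err:
    print(err)
```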

TomAugspurger · 2017-10-05T21:50:41Z

How much do we want to do the other side of this though? I'm writing the release notes for adding axis to rename, and it reads strangely coming right after the drop section adding index / columns.

I'm comfortable with recommending index=, columns= as the preferred way going forward. Adding axis to reindex and rename is (implicitly) recommending the other style.

toobaz · 2017-10-05T22:08:01Z

I'm comfortable with recommending index=, columns= as the preferred way going forward

I think that @nickeubank 's comment provides strong evidence in favor of axis=. Together with coherence with numpy, which won't harm, and with the use of dim= in xarray. And while apparently axis=1 is not considered very pythonic (not so obvious to me), and coherence with numpy is not top priority, being able to do axis="columns" looks to me sufficient to restore readability.

Keeping both approaches where index= and columns= are already present is the best solution, but I think the standard/recommended way should be axis=, which incidentally is also often simpler to implement.

TomAugspurger · 2017-10-05T22:11:26Z

Yes, re-reading that comment does make a good case for it.

OK then, I'll put up my WIP for rename, and finish it up later tonight.

toobaz · 2017-10-06T08:58:45Z

(By the way: something else good, and very pythonic, about axis= is that the reader knows by definition that a method he once saw used on e.g. index works exactly in the same way on columns, or vice-versa)

jorisvandenbossche · 2017-10-06T12:11:42Z

I disagree with that comment (#12392 (comment)): it is correct that the axis idiom is used a lot more in pandas, but we are speaking here about very specific functions where this comparison does not hold.
Eg in df.mean(axis=) you are applying the function over either axis (this would be difficult to express with index= or columns= arguments). But in the rename/drop methods, you are altering one of the axes, not applying a function along one of the axes. In that case, the index/columns args do make sense in a way that is not comparable to all those other methods that take the axis arg (and in that sense: yes, I personally will recommend people to write drop(columns=[..]) instead of drop([..], axis=1)).
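The distinction can be illustrated on a plain list of rows. `mean_along` and `drop_columns` are hypothetical helpers for this sketch; they only mirror the two roles an argument can play.

```python
from statistics import mean

rows = [[1, 2, 3], [4, 5, 6]]  # 2 rows x 3 columns
columns = ["a", "b", "c"]


def mean_along(data, axis=0):
    # 'axis' names a direction of reduction: 0 collapses rows, 1 collapses columns.
    if axis == 0:
        return [mean(col) for col in zip(*data)]
    if axis == 1:
        return [mean(row) for row in data]
    raise ValueError(f"No axis named {axis!r}")


def drop_columns(data, names, drop):
    # Here the argument says which labels to change, not a direction.
    keep = [i for i, name in enumerate(names) if name not in drop]
    return [[row[i] for i in keep] for row in data], [names[i] for i in keep]


per_column = mean_along(rows, axis=0)
per_row = mean_along(rows, axis=1)
trimmed, remaining = drop_columns(rows, columns, {"b"})
```

In the first function there is no label to name, so a keyword like `columns=` would have nothing to carry; in the second, the labels themselves are the argument.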

But anyhow, that's not really that relevant anymore :-) As it is good to make them consistent anyway, which means adding axis to rename, and then people can do what they like most.

@TomAugspurger Thanks for picking this up! Will look at the PR now.

TomAugspurger · 2017-10-12T19:25:12Z

Were reindex and rename the last ones needed here? Can this be closed?

jorisvandenbossche · 2017-10-13T08:46:20Z

Yes, I think drop, rename and reindex were the only ones.

Closed by #17644, #17800 and #17842

jreback added API Design, Needs Discussion labels Feb 19, 2016

jreback changed the title ~~ENH: Harmonize drop and rename API~~ ENH: Harmonize drop and rename API Feb 19, 2016

jreback added this to the 0.19.0 milestone Feb 22, 2016

jreback added Difficulty Intermediate labels Feb 22, 2016

toobaz mentioned this issue Jul 17, 2017

Add a .relabel method; deprecate .rename and .rename_axis for relabeling #16990

Closed

TomAugspurger mentioned this issue Aug 18, 2017

RLS: 1.0 #17287

Closed

6 tasks

jorisvandenbossche mentioned this issue Sep 23, 2017

API: harmonize drop/reindex/rename args (GH12392) - drop #17644

Merged

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Oct 5, 2017

API: Added axis argument to rename

fa4358d

xref: pandas-dev#12392

TomAugspurger mentioned this issue Oct 5, 2017

API: Added axis argument to rename, reindex #17800

Merged

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Oct 6, 2017

API: Added axis argument to rename

ac7b59e

xref: pandas-dev#12392

jorisvandenbossche mentioned this issue Oct 10, 2017

API: deprecate rename_axis / reindex_axis ? #17833

Closed

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Oct 10, 2017

API: Added axis argument to rename

b06e726

xref: pandas-dev#12392

TomAugspurger added a commit that referenced this issue Oct 10, 2017

API: Added axis argument to rename, reindex (#17800)

727ea20

* API: Added axis argument to rename xref: #12392 * API: Accept 'axis' keyword argument for reindex

TomAugspurger modified the milestones: 0.21.0, Next Major Release Oct 12, 2017

jorisvandenbossche closed this as completed Oct 13, 2017

jorisvandenbossche modified the milestones: Next Major Release, 0.21.0 Oct 13, 2017

ghost pushed a commit to reef-technologies/pandas that referenced this issue Oct 16, 2017

API: Added axis argument to rename, reindex (pandas-dev#17800)

daaaae3

* API: Added axis argument to rename xref: pandas-dev#12392 * API: Accept 'axis' keyword argument for reindex

alanbato pushed a commit to alanbato/pandas that referenced this issue Nov 10, 2017

API: Added axis argument to rename, reindex (pandas-dev#17800)

ae3a18a

* API: Added axis argument to rename xref: pandas-dev#12392 * API: Accept 'axis' keyword argument for reindex

No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017

API: Added axis argument to rename, reindex (pandas-dev#17800)

a7b634c

* API: Added axis argument to rename xref: pandas-dev#12392 * API: Accept 'axis' keyword argument for reindex

ghost mentioned this issue Jul 22, 2019

ENH: Add Series.set_index #27504

Closed

4 tasks
