Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: MultiIndex.labels -> codes #13443

Closed
chris-b1 opened this issue Jun 14, 2016 · 11 comments · Fixed by #23752
Closed

API: MultiIndex.labels -> codes #13443

chris-b1 opened this issue Jun 14, 2016 · 11 comments · Fixed by #23752
Labels
API Design Deprecate Functionality to remove in pandas MultiIndex Needs Discussion Requires discussion from core team before further action
Milestone

Comments

@chris-b1
Copy link
Contributor

The boat may have long sailed on this, but just for consideration.

I semi-frequently get the .levels and .labels of a MultiIndex backwards. Maybe it's just me, but I think labels is the culprit, because in other pandas contexts, "labels" refer to the actual value of the thing. E.g.

  • .loc indexes by "labels" (values)
  • the values inside of a single row index or columns are the "labels" for those items

So, consistent with Categorical, would it make sense for the integer mapping inside a MultiIndex to also be called .codes?

@jreback
Copy link
Contributor

jreback commented Jun 14, 2016

IIRC this originally came from R land. I actually have no problem with this, esp for consistency.

its a tiny bit tricky to do this because you have to accept both args for a while.

@jreback jreback added API Design MultiIndex Deprecate Functionality to remove in pandas labels Jun 14, 2016
@jreback jreback added this to the 0.19.0 milestone Jun 14, 2016
@jorisvandenbossche jorisvandenbossche added the Needs Discussion Requires discussion from core team before further action label Jun 14, 2016
@jorisvandenbossche
Copy link
Member

I personally also find the current naming confusing (but indeed separate question is if there is something better that is workable to change to)

Besides labels, I also find levels confusing. The levels are actually what we now call the categories in the Categorical API (I even find labels a better name for levels). I find this confusing because of the usage of the level= keyword argument in many functions (so the idea that the MultiIndex has several 'levels').

@shoyer
Copy link
Member

shoyer commented Jun 14, 2016

+1 for the attribute names .categories and .codes. This is mostly internal/advanced API, so I don't think this will be too painful. The level= keyword argument can stay -- it means something entirely different.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jun 14, 2016

The level= keyword argument can stay -- it means something entirely different.

Yes certainly (I didn't want to imply that). That is the logical use for this name, and that is what makes the 'other' levels a confusing name.

BTW, I also think that the current repr is not the best one. As @shoyer put, the levels/labels are 'mostly internal', so then we shouldn't show them by default in repr? But will open a separate issue for that. EDIT -> #13480

@jorisvandenbossche
Copy link
Member

+1 for the attribute names .categories and .codes

codes is certainly OK, but I am personally a bit less enthusiastic about categories. I know implementation wise (codes/labels and categories/levels) they are very similar, but for a lot of users, I don't think they see it that way. So calling it categories may be confusing as well.
(although, since we are just arguing it is more internal-like, it maybe does not matter that much)

@jreback
Copy link
Contributor

jreback commented Mar 23, 2017

anyone have appetite for changing .labels -> .codes? This is purely internal.

I think this would be a positive change here. (let's leave levels as is though).

@jreback jreback modified the milestones: 0.20.0, 0.21.0 Apr 12, 2017
@jreback
Copy link
Contributor

jreback commented Sep 23, 2017

@chris-b1 do you want to do this one?

@jreback jreback modified the milestones: 0.21.0, 1.0 Oct 2, 2017
@topper-123
Copy link
Contributor

I would like to take this on, after #22511 is merged.

So I'll do: labels -> codes, but leave levels alone. Also I'll make related changes, e.g. set_labels -> set_codes etc.

@jorisvandenbossche
Copy link
Member

@topper-123 Very welcome to take this up! I think your summary is correct.

Only, I don't think it needs to depend on #22511, as it is (AFAIU) independent of the repr.
(of course, once one or the other is merged, you will need to deal with merge conflicts that might be annoying)

@topper-123
Copy link
Contributor

topper-123 commented Nov 7, 2018

Hey @jorisvandenbossche, I'm thinking about the repr output and doc strings here. If it weren't for #22511, the repr output would change from MultiIndex(..., labels=...) to MultiIndex(..., codes=...), while #22511 changes the repr in a different way. So these two issues are connected through the repr output, through test touching the repr output and examples of MultiIndex usage in the doc string in various locations...

I had a working implementation of this a week ago (locally, was not pushed), so should be easy to update, but would prefer to get the repr issue settled before pushing.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Nov 8, 2018

@topper-123 I think you could ignore the repr here (only change the internals and the method/attribute names, and leave the repr alone). Of course we need to change the repr eventually, but since you have the other PR, I would not worry about it here.
So I would recommend to already push your working implementation into a PR, so we can already start reviewing it (we can still later decide which PR to merge first).

andrewsanchez added a commit to andrewsanchez/q2-diversity that referenced this issue Feb 12, 2020
nbokulich pushed a commit to qiime2/q2-diversity that referenced this issue Feb 14, 2020
* Use MultiIndex.codes instead of labels (no longer support)

Please see these links for the rational if interested:

pandas-dev/pandas#13443

pandas-dev/pandas#23752

* Unpack and name values returned by _reindex_with_metadata

This makes it easier to see what's going on below with the values
returned by this function.

* Initial patch to handle new pandas error

This prevents attempting to drop columns that don't exist in merged.columns
after setting the index, while still dropping columns that are present in
merged.columns.  Attempting to do so raises an exception in pandas >= 1.

Please see pandas-dev/pandas#8594
for details.

* Avoid mutating `merged` in place by assigning to new variable

This avoids attempting to drop columns that had already been dropped in previous
calls to _reindex_with_metadata in the for loop in `alpha_rarefaction`.

Co-authored-by: Matthew Dillon <matthewrdillon@gmail.com>

Co-authored-by: Matthew Dillon <matthewrdillon@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Deprecate Functionality to remove in pandas MultiIndex Needs Discussion Requires discussion from core team before further action
Projects
None yet
5 participants