API: MultiIndex.labels -> codes #13443

Closed
chris-b1 opened this issue Jun 14, 2016 · 11 comments

@chris-b1
Contributor

commented Jun 14, 2016

The boat may have long sailed on this, but just for consideration.

I semi-frequently get the .levels and .labels of a MultiIndex backwards. Maybe it's just me, but I think labels is the culprit, because in other pandas contexts, "labels" refers to the actual values of the thing. E.g.

  • .loc indexes by "labels" (values)
  • the values inside of a single row index or columns are the "labels" for those items

So, consistent with Categorical, would it make sense for the integer mapping inside a MultiIndex to also be called .codes?
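
To make the distinction concrete, here is a minimal pure-Python sketch of the split being discussed: the unique values (what MultiIndex calls levels and Categorical calls categories) and the integer positions into them (the proposed codes). The `encode` and `decode` helpers are hypothetical names used only for illustration; this is a toy model, not how pandas implements it.

```python
def encode(values):
    """Split values into (levels, codes), loosely mirroring one MultiIndex level."""
    # The distinct values, sorted: the "levels" (Categorical: "categories").
    levels = sorted(set(values))
    position = {value: i for i, value in enumerate(levels)}
    # One integer per original item, pointing into `levels`: the "codes".
    codes = [position[value] for value in values]
    return levels, codes


def decode(levels, codes):
    """Rebuild the original values from the integer codes."""
    return [levels[code] for code in codes]


values = ["b", "a", "b", "c"]
levels, codes = encode(values)
print(levels)  # ['a', 'b', 'c']
print(codes)   # [1, 0, 1, 2]
```

The integers are positions, not values, which is why calling them "labels" invites the mix-up described above.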

@jreback

Contributor

commented Jun 14, 2016

IIRC this originally came from R land. I actually have no problem with this, esp for consistency.

It's a tiny bit tricky to do this because you have to accept both args for a while.
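
One way "accepting both args for a while" might look is sketched below. `ToyMultiIndex` is a hypothetical stand-in class, not pandas' MultiIndex, and the choice of `FutureWarning` and the exact messages are assumptions made for illustration.

```python
import warnings


class ToyMultiIndex:
    """Hypothetical stand-in used only to illustrate the deprecation pattern."""

    def __init__(self, levels, codes=None, labels=None):
        if labels is not None:
            if codes is not None:
                raise TypeError("pass either 'codes' or the deprecated 'labels', not both")
            warnings.warn(
                "'labels' is deprecated, use 'codes' instead",
                FutureWarning,
                stacklevel=2,
            )
            codes = labels  # the old keyword still works during the transition
        if codes is None:
            raise TypeError("'codes' is required")
        self.levels = levels
        self.codes = codes

    @property
    def labels(self):
        # The old attribute name keeps working, with a warning.
        warnings.warn(
            "'.labels' is deprecated, use '.codes' instead",
            FutureWarning,
            stacklevel=2,
        )
        return self.codes


new_style = ToyMultiIndex([["a", "b"]], codes=[[0, 1, 0]])
```

The old spelling keeps working for existing callers, and the warning tells them what to switch to before the keyword is removed.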

@jorisvandenbossche

Member

commented Jun 14, 2016

I personally also find the current naming confusing (though a separate question is whether there is a better name that it would be workable to change to).

Besides labels, I also find levels confusing. The levels are actually what we now call the categories in the Categorical API (I even find labels a better name for levels). I find this confusing because of the usage of the level= keyword argument in many functions (so the idea that the MultiIndex has several 'levels').

@shoyer

Member

commented Jun 14, 2016

+1 for the attribute names .categories and .codes. This is mostly internal/advanced API, so I don't think this will be too painful. The level= keyword argument can stay -- it means something entirely different.

@jorisvandenbossche

Member

commented Jun 14, 2016

The level= keyword argument can stay -- it means something entirely different.

Yes certainly (I didn't want to imply that). That is the logical use for this name, and that is what makes the 'other' levels a confusing name.

BTW, I also think that the current repr is not the best one. As @shoyer put it, the levels/labels are 'mostly internal', so perhaps we shouldn't show them by default in the repr? But I will open a separate issue for that. EDIT -> #13480

@jorisvandenbossche

Member

commented Jun 18, 2016

+1 for the attribute names .categories and .codes

codes is certainly OK, but I am personally a bit less enthusiastic about categories. I know that implementation-wise (codes/labels and categories/levels) they are very similar, but I don't think a lot of users see it that way. So calling it categories may be confusing as well.
(although, since we are arguing that it is more internal-like, it may not matter that much)

@jreback

Contributor

commented Mar 23, 2017

anyone have appetite for changing .labels -> .codes? This is purely internal.

I think this would be a positive change here. (let's leave levels as is though).

@jreback jreback modified the milestones: 0.20.0, 0.21.0 Apr 12, 2017

@jreback

Contributor

commented Sep 23, 2017

@chris-b1 do you want to do this one?

@jreback jreback modified the milestones: 0.21.0, 1.0 Oct 2, 2017

@topper-123 topper-123 referenced this issue Sep 20, 2018

Merged

ENH: better MultiIndex.__repr__ #22511

3 of 3 tasks complete
@topper-123

Contributor

commented Oct 27, 2018

I would like to take this on, after #22511 is merged.

So I'll do: labels -> codes, but leave levels alone. Also I'll make related changes, e.g. set_labels -> set_codes etc.
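
A rename like set_labels -> set_codes can keep the old method name alive as a thin, warning alias. The sketch below is one possible shape for that: `deprecated_alias` and `ToyIndex` are hypothetical names invented for illustration, not pandas helpers, and the warning category is an assumption.

```python
import functools
import warnings


def deprecated_alias(new_method, old_name):
    """Return a method that warns under `old_name` and forwards to `new_method`."""

    @functools.wraps(new_method)
    def wrapper(self, *args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated, use {new_method.__name__} instead",
            FutureWarning,
            stacklevel=2,
        )
        return new_method(self, *args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


class ToyIndex:
    """Hypothetical stand-in for an index that stores integer codes."""

    def __init__(self, codes):
        self.codes = codes

    def set_codes(self, codes):
        # Return a new object rather than mutating in place.
        return ToyIndex(codes)

    # Old name kept as an alias during the deprecation window.
    set_labels = deprecated_alias(set_codes, "set_labels")
```

Calling `ToyIndex([0, 1]).set_labels([1, 0])` would then behave like `set_codes` while emitting a `FutureWarning` that names the replacement.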

@jorisvandenbossche

Member

commented Nov 6, 2018

@topper-123 Very welcome to take this up! I think your summary is correct.

Only, I don't think it needs to depend on #22511, as it is (AFAIU) independent of the repr.
(of course, once one or the other is merged, you will need to deal with merge conflicts that might be annoying)

@topper-123

Contributor

commented Nov 7, 2018

Hey @jorisvandenbossche, I'm thinking about the repr output and docstrings here. If it weren't for #22511, the repr output would change from MultiIndex(..., labels=...) to MultiIndex(..., codes=...), while #22511 changes the repr in a different way. So these two issues are connected through the repr output: through tests that touch the repr output, and through examples of MultiIndex usage in docstrings in various locations...

I had a working implementation of this a week ago (locally, was not pushed), so should be easy to update, but would prefer to get the repr issue settled before pushing.

@jorisvandenbossche

Member

commented Nov 8, 2018

@topper-123 I think you could ignore the repr here (only change the internals and the method/attribute names, and leave the repr alone). Of course we need to change the repr eventually, but since you have the other PR, I would not worry about it here.
So I would recommend pushing your working implementation as a PR now, so we can start reviewing it (we can still decide later which PR to merge first).
