API: MultiIndex.get_level_values should have label only / level only mode? #10461

Open
sinhrks opened this issue Jun 28, 2015 · 9 comments

@sinhrks
Member

sinhrks commented Jun 28, 2015

Currently, MultiIndex.get_level_values accepts both level names and level numbers, so the same argument can return different levels depending on the names (like .ix). How about adding an option (or a separate method), like .loc and .iloc, to avoid unexpected results?

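import pandas as pd

# With unnamed levels, 0 is interpreted as a position.
idx = pd.MultiIndex.from_tuples([(1, 'A'), (2, 'B')])
idx.get_level_values(0)
# Int64Index([1, 2], dtype='int64')

# Once a level is named 0, the same call returns that level (position 1) instead.
idx = idx.set_names([1, 0])
idx.get_level_values(0)
# Index([u'A', u'B'], dtype='object', name=0)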

sinhrks added the Indexing and MultiIndex labels Jun 28, 2015
sinhrks added this to the 0.17.0 milestone Jun 28, 2015
@shoyer
Member

shoyer commented Jul 2, 2015

Maybe idx.get_level_values and idx.get_ilevel_values? Or have we already locked in the mixed functionality of get_level_values?

@sinhrks
Member Author

sinhrks commented Jul 4, 2015

I think we can't change get_level_values. Options are listed below.

I think users prefer levels to names to avoid ambiguity, because names can be duplicated or None. Thus option 1 can cover most cases.

  1. Add get_ilevel_values, which accepts a level number only. No additional method for names.
  2. Add get_ilevel_values, which accepts a level number only, and get_llevel_values (?), which accepts names only.
  3. Add allow_level (?) and allow_names (?) options to get_level_values, both defaulting to True.
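
Option 1 could be prototyped outside pandas roughly as follows. This is only a sketch: `get_ilevel_values` is written here as a hypothetical free function, not an existing pandas method, and it assumes that clearing the level names is enough to make `get_level_values` treat an integer as a position.

```python
import numbers

import pandas as pd


def get_ilevel_values(index, level):
    """Hypothetical positional-only variant of MultiIndex.get_level_values."""
    if isinstance(level, bool) or not isinstance(level, numbers.Integral):
        raise TypeError("level must be an integer position")
    # Drop all names so that an integer cannot match a level name.
    unnamed = index.set_names([None] * index.nlevels)
    # Put the original name of the selected level back on the result.
    return unnamed.get_level_values(level).rename(index.names[level])


idx = pd.MultiIndex.from_tuples([(1, 'A'), (2, 'B')], names=[1, 0])
by_name_first = idx.get_level_values(0)  # matches the level *named* 0
by_position = get_ilevel_values(idx, 0)  # always the first level
```

With the integer-named index from the original report, the two calls return different levels, which is the ambiguity a positional-only method would remove.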

@jreback
Contributor

jreback commented Jul 7, 2015

There is already inference to figure out whether the caller means positional (if no integers are in the names) or label-based, e.g. like ix.

But I have never seen this come up as a practical issue.

I guess get_ilevel_values would be fine.

Ironically, in .query we allow ilevel as a search term, so this would at least be consistent.
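
For reference, the `.query` behaviour mentioned here looks roughly like this: unnamed MultiIndex levels can be referred to as `ilevel_0`, `ilevel_1`, and so on. The frame and the column name `x` are made up for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 'A'), (2, 'B')])
df = pd.DataFrame({'x': [10, 20]}, index=idx)

# ilevel_0 refers to the first (unnamed) index level by position.
selected = df.query('ilevel_0 == 1')
```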

@jreback
Contributor

jreback commented Jul 7, 2015

Just throwing it out there, we could allow this:

df.index.ix(0) # current behaviour
df.index.loc('foo') # label only
df.index.iloc(0) # positional only

or maybe

df.index.ix[0]

Further, this might be useful:

df.index.ix[0,1] -> grab first 2 levels
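
To make the positional half of this idea concrete, here is a minimal sketch of what such an accessor might do. `PositionalLevels` is a hypothetical standalone class, not a pandas API, and returning a MultiIndex for a tuple of positions is an assumption about what "grab first 2 levels" would mean.

```python
import pandas as pd


class PositionalLevels:
    """Hypothetical positional-only level accessor; not part of pandas."""

    def __init__(self, index):
        self._index = index
        # Unnamed copy, so integers can only be read as positions.
        self._unnamed = index.set_names([None] * index.nlevels)

    def _one(self, pos):
        values = self._unnamed.get_level_values(pos)
        return values.rename(self._index.names[pos])

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # Several positions: build a new MultiIndex from those levels.
            arrays = [self._one(pos) for pos in key]
            names = [self._index.names[pos] for pos in key]
            return pd.MultiIndex.from_arrays(arrays, names=names)
        return self._one(key)


idx = pd.MultiIndex.from_tuples(
    [(1, 'A', 'x'), (2, 'B', 'y')], names=[1, 0, 'c']
)
first = PositionalLevels(idx)[0]
first_two = PositionalLevels(idx)[0, 1]
```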

@sinhrks
Member Author

sinhrks commented Jul 7, 2015

Right. Adding ix stacks sounds consistent and reasonable. Let me try.

@jorisvandenbossche
Member

I am -1 on adding ix as proposed above. If we add ix (or loc/iloc) to the index, I think its usage should be equivalent to what you would get if you had a column named 'index' and were indexing frame.index. If it behaves differently from that, I think we should just give it a different name.

jreback modified the milestones: Next Major Release, 0.17.0 Aug 20, 2015
@Dr-Irv
Contributor

Dr-Irv commented Feb 18, 2017

This relates to #12223 and #15262 which are bugs due to allowing index names to be integers. For those, the issue is the internal use of get_level_values() and the ambiguity of the argument if one passes an integer, and some of the names of the index are integers. I have two proposals that could fix this:

  1. Modify the signature of get_level_values() to include an optional argument try_names_first=True that preserves current behavior. Modify internal calls in pandas that use get_level_values() to pass try_names_first=False when the caller knows it is passing the level number instead of a name.
  2. Create a new internal method called _get_level_values_by_level_number() that is not in the published API, but is used internally in pandas code. Modify internal calls in pandas that use get_level_values() to call _get_level_values_by_level_number() when the caller knows it is passing the level number instead of a name.

I think I am in favor of (2) as I actually think it gives a very small performance boost, but would like to hear the opinions of @jreback, @jorisvandenbossche and @sinhrks before I embark on making those changes.
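
The semantics of proposal 1 can be sketched with a wrapper function. This is illustrative only: `get_level_values_sketch` is a hypothetical name, the real change would live inside `MultiIndex.get_level_values`, and the name-clearing trick is just one assumed way to force positional lookup from outside pandas.

```python
import pandas as pd


def get_level_values_sketch(index, level, try_names_first=True):
    """Hypothetical illustration of the try_names_first flag."""
    if try_names_first:
        # Current behaviour: names are consulted before positions.
        return index.get_level_values(level)
    # The caller promises `level` is a level number, so hide the names
    # to make sure it cannot accidentally match one of them.
    unnamed = index.set_names([None] * index.nlevels)
    return unnamed.get_level_values(level).rename(index.names[level])


idx = pd.MultiIndex.from_tuples([(1, 'A'), (2, 'B')], names=[1, 0])
default = get_level_values_sketch(idx, 0)
positional = get_level_values_sketch(idx, 0, try_names_first=False)
```

Internal callers that already hold a level number would pass `try_names_first=False`, while existing user code keeps the default.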

@jreback
Contributor

jreback commented Feb 18, 2017

  1. is fine

toobaz added the Index label and removed the Indexing label Jun 28, 2019
mroeschke removed the Index label Apr 18, 2021
mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022