BUG: Passing multiple levels to stack when having mixed integer/string level names #8584

jorisvandenbossche · 2014-10-19T20:57:14Z

Related #7770

Using the example of the docs (http://pandas.pydata.org/pandas-docs/stable/reshaping.html#multiple-levels):

from numpy.random import randn
from pandas import DataFrame, MultiIndex

columns = MultiIndex.from_tuples([('A', 'cat', 'long'), ('B', 'cat', 'long'), ('A', 'dog', 'short'), ('B', 'dog', 'short')],
                                 names=['exp', 'animal', 'hair_length'])
df = DataFrame(randn(4, 4), columns=columns)

CONTEXT: df.stack(level=['animal', 'hair_length']) and df.stack(level=[1, 2]) are equivalent (feature introduced in #7770). Mixing integer locations and string names (e.g. df.stack(level=['animal', 2])) raises a ValueError.
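To make the two equivalent spellings concrete, here is a minimal pure-Python sketch that resolves a `level` list against the column level names. The helper `levels_to_numbers` is hypothetical (it is not a pandas API) and only models the behaviour described above: all names or all positions, but never a mixture.

```python
def levels_to_numbers(names, levels):
    """Hypothetical helper: map a list of levels (all names or all
    positions) to level positions, mirroring the behaviour described above."""
    if all(isinstance(lev, str) for lev in levels):
        # every entry is a name: look each one up in the level names
        return [names.index(lev) for lev in levels]
    if all(isinstance(lev, int) for lev in levels):
        # every entry is a position: use it as-is
        return list(levels)
    # a mixture of names and positions is rejected
    raise ValueError("level should contain all level names or all level "
                     "numbers, not a mixture of the two")


names = ['exp', 'animal', 'hair_length']
print(levels_to_numbers(names, ['animal', 'hair_length']))  # [1, 2]
print(levels_to_numbers(names, [1, 2]))                     # [1, 2]

try:
    levels_to_numbers(names, ['animal', 2])
except ValueError as exc:
    print("ValueError:", exc)
```

Both spellings end up as the same positions, which is what makes them interchangeable as long as the level names are all strings.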

But if the level names themselves are of mixed types, several different (and wrong) things happen:

With an integer name that is not also a valid level position (here 10), it still works as it should:

df.columns.names = ['exp', 'animal', 10]
df.stack(level=['animal', 10])

With the number 1, it treats the 1 as a level number instead of a level name, leading to a wrong result (the same level stacked twice):

In [42]: df.columns.names = ['exp', 'animal', 1]

In [43]: df.stack(level=['animal', 1])
Out[43]: 
exp                     A         B
  animal animal                    
0 cat    cat    -1.006065  0.401136
  dog    dog     0.526734 -1.753478
1 cat    cat    -0.718401 -0.400386
  dog    dog    -0.951336 -1.074323
2 cat    cat     1.119843 -0.606982
  dog    dog     0.371467 -1.837341
3 cat    cat    -1.467968  1.114524
  dog    dog    -0.040112  0.240026
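A plausible mechanism, consistent with the later discussion of `_get_level_number()` and `swaplevel()` in this thread, is that a level *position* computed internally is looked up a second time as if it were a *name*. The sketch below is a simplified, hypothetical model of such a name-first lookup, not the actual pandas code: a value found among the names is treated as a name, and only otherwise is an int treated as a position.

```python
def get_level_number(names, level):
    """Hypothetical model of a name-first lookup: a value that appears in
    ``names`` is a name; otherwise an in-range int is a position."""
    if level in names:
        return names.index(level)
    if isinstance(level, int) and -len(names) <= level < len(names):
        return level % len(names)  # normalise negative positions
    raise KeyError(f"Level {level!r} not found")


for last in (10, 1):
    names = ['exp', 'animal', last]
    pos = get_level_number(names, 'animal')  # 'animal' -> position 1
    # feeding that *position* back through the same lookup
    again = get_level_number(names, pos)
    print(last, pos, again)
# prints:
# 10 1 1
# 1 1 2
```

With the name 10 the position 1 survives the second lookup, but with the name 1 it is silently redirected to the level at position 2, which would be one way to get the wrong level stacked.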

With the number 0, it gives a strange error:

In [46]: df.columns.names = ['exp', 'animal', 0]

In [47]: df.stack(level=['animal', 0])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-47-4e9507e0708f> in <module>()
----> 1 df.stack(level=['animal', 0])

/home/joris/scipy/pandas/pandas/core/frame.pyc in stack(self, level, dropna)
3390 
3391         if isinstance(level, (tuple, list)):
-> 3392             return stack_multiple(self, level, dropna=dropna)
3393         else:
3394             return stack(self, level, dropna=dropna)

....

/home/joris/scipy/pandas/pandas/core/index.pyc in _partial_tup_index(self, tup, side)
3820             raise KeyError('Key length (%d) was greater than MultiIndex'
3821                            ' lexsort depth (%d)' %
-> 3822                            (len(tup), self.lexsort_depth))
3823 
3824         n = len(tup)

KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)'


jreback · 2014-10-19T21:22:25Z

Hmm, so this is an API issue then? I think we should be very strict here, as we cannot disambiguate easily (e.g. the .ix/.loc issues).

- Integers must always be treated as positional, and mixing integers and names is not allowed.
- If level names are integer-like, they can then be passed as strings (not as actual integers; not sure whether this will break anything).
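The strict rule proposed here can be sketched as a resolver in which the type alone decides: ints are positions, strings are names, and anything mixed is rejected. The helper name and the choice to match string arguments against `str(name)` (so that a level named `1` is addressed as `'1'`) are assumptions made for illustration, not existing pandas behaviour.

```python
def strict_levels_to_numbers(names, levels):
    """Hypothetical strict resolver: ints are always positions, strings are
    always names, and a mixture of the two is rejected."""
    n = len(names)
    if all(isinstance(lev, int) for lev in levels):
        for lev in levels:
            if not -n <= lev < n:
                raise IndexError(f"level {lev} out of range for {n} levels")
        return [lev % n for lev in levels]  # normalise negative positions
    if all(isinstance(lev, str) for lev in levels):
        # assumption: integer-like names are addressed by their string form
        str_names = [str(name) for name in names]
        return [str_names.index(lev) for lev in levels]
    raise ValueError("mixing level names and level positions is not allowed")


names = ['exp', 'animal', 1]
print(strict_levels_to_numbers(names, ['animal', '1']))  # [1, 2]
print(strict_levels_to_numbers(names, [1, 2]))           # [1, 2]
```

Under this rule `['animal', 1]` raises a ValueError even though both entries happen to be level names; that is the price of never having to guess.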

jorisvandenbossche · 2014-10-19T21:37:55Z

I think these are examples that we can disambiguate.

I understood from that PR that the new logic was:

- if all entries (strings or ints) are found in the level names -> use them as level names
- if not all are found:
  - if all are integers -> use them as level locations
  - if not all are integers -> raise ValueError
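The rule in this list can be sketched in a few lines of pure Python. `resolve_levels` is a hypothetical helper that works on a plain list of level names; it is meant to illustrate the decision order (names first, then positions, then error), not to reproduce pandas internals.

```python
def resolve_levels(names, levels):
    """Sketch of the rule above (hypothetical helper):
    1. if every entry is a level name, treat them all as names;
    2. otherwise, if every entry is an int, treat them as positions;
    3. otherwise raise ValueError."""
    if all(lev in names for lev in levels):
        return [names.index(lev) for lev in levels]
    if all(isinstance(lev, int) for lev in levels):
        n = len(names)
        for lev in levels:
            if not -n <= lev < n:
                raise IndexError(f"level {lev} out of range for {n} levels")
        return [lev % n for lev in levels]  # normalise negative positions
    raise ValueError("level should contain all level names or all level "
                     "numbers, not a mixture of the two")


print(resolve_levels(['exp', 'animal', 1], ['animal', 1]))           # [1, 2]
print(resolve_levels(['exp', 'animal', 0], ['animal', 0]))           # [1, 2]
print(resolve_levels(['exp', 'animal', 'hair_length'], [1, 2]))      # [1, 2]
```

Both problem cases from the report resolve to positions 1 and 2 under this rule. Note that with names `['exp', 'animal', 1]` a lone `[1]` means the level *named* 1 (position 2), while `[1, 2]` falls through to positions, so the rule is well defined but can still surprise.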

So following that logic, I think these cases can and should work (and they already do in some cases, so at the very least the current behaviour is inconsistent).

And if that logic is correct (i.e. it is the logic we want to follow), it should probably also be mentioned in the docstring.

onesandzeroes · 2014-10-20T10:19:00Z

I agree that these two should work, since all the levels are in the level names:

In [42]: df.columns.names = ['exp', 'animal', 1]
In [43]: df.stack(level=['animal', 1])

And

In [46]: df.columns.names = ['exp', 'animal', 0]
In [47]: df.stack(level=['animal', 0])

I had a look tonight and I think I have a fix for both cases, we just need to be a bit more careful about when we're dealing with level names and when we're dealing with level numbers. If I can get these cases working, then I think the logic you've outlined (which was the original intent of the PR) still holds. Probably a good idea to add it to the docstring though.
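One way to keep names and numbers apart, sketched here under assumptions: resolve every requested level to a position once, up front (following the name-first logic outlined above), and from then on work only with positions, adjusting them as each stacked level leaves the columns. This is a pure-Python simulation on a list of column level names; `stack_order` is hypothetical, it is not the patch from #8809, and it ignores the original row index.

```python
def stack_order(column_names, level_numbers):
    """Hypothetical simulation: move the given column levels (by position)
    to the index one at a time, keeping the remaining positions valid."""
    columns = list(column_names)
    index = []
    pending = list(level_numbers)
    while pending:
        lev = pending.pop(0)
        # stacking removes the level from the columns and appends it to the index
        index.append(columns.pop(lev))
        # positions above the removed level shift down by one
        pending = [p - 1 if p > lev else p for p in pending]
    return columns, index


print(stack_order(['exp', 'animal', 1], [1, 2]))  # (['exp'], ['animal', 1])
```

Because only positions are passed around after the first step, a level that happens to be named 1 can no longer be confused with position 1.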

onesandzeroes · 2014-10-20T22:51:53Z

The simplest solution I came up with for this involved adding an as_level_numbers=False flag to MultiIndex.swaplevel(), so I could use as_level_numbers=True to signal that the levels being passed were already level numbers, skipping the _get_level_number() step.

Would this be OK to add to the API, or should I add this behaviour in a new method like MultiIndex._swaplevel_using_level_numbers()? Seems like it could be somewhat useful if you ever need to force swaplevel to deal with the passed levels as numbers, but it might break consistency.

jreback · 2014-10-20T23:02:01Z

@onesandzeroes you can make an internal function (leading '_') if you need, but this shouldn't be exposed

BUG: Passing multiple levels to stack when having mixed integer/string level names (#8584)

jreback added the API Design and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Oct 19, 2014

jreback added this to the 0.15.1 milestone Oct 19, 2014

onesandzeroes mentioned this issue Nov 14, 2014

BUG: Passing multiple levels to stack when having mixed integer/string level names (#8584) #8809

Merged

jorisvandenbossche closed this as completed in #8809 Nov 17, 2014

jorisvandenbossche added a commit that referenced this issue Nov 17, 2014

Merge pull request #8809 from onesandzeroes/stackfix

0f899f4

BUG: Passing multiple levels to stack when having mixed integer/string level names (#8584)

jorisvandenbossche mentioned this issue Jun 29, 2018

API: unclear what integer level name references: name or position? #21677

Open
