df.index.map with different size fails for Pandas > 0.22 #24800

RutgerK · 2019-01-16T14:06:08Z

Code Sample

import pandas as pd

df = pd.DataFrame({'a': [0,1,2,3],
                   'b': ['a_1_bar', 'a_2_bar', 'b_1_bar', 'b_2_bar'],
                   'c': list('defg')})

df = df.set_index(['b','c'])
df.index.map(lambda x: tuple(x[0].split('_')))

Problem description

The code above works in Pandas 0.22 and earlier, but fails since 0.23. This appears to be because Pandas tries to preserve the level names of the original index.

When the new index has a different number of levels than the old one, the old names no longer fit, and the call fails with a ValueError:

ValueError: Length of names must match number of levels in MultiIndex.

Expected Output

Older versions of Pandas returned a new MultiIndex without level names:

MultiIndex(levels=[['a', 'b'], ['1', '2'], ['bar']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]])

I'm not sure whether this change was deliberate. If not, a workaround might be to preserve the names only when the new number of levels matches the old one, and otherwise discard them, which would reproduce the behavior of older Pandas versions.
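A minimal sketch of the suggested behavior, written as a user-side helper rather than as a change to pandas itself. The helper name `map_keep_names_if_same_nlevels` is hypothetical. It maps each key, builds the result with `pd.MultiIndex.from_tuples`, and reuses the original level names only when the level count is unchanged.

```python
import pandas as pd


def map_keep_names_if_same_nlevels(index, func):
    """Hypothetical helper: map a MultiIndex and keep its level names
    only when the mapped result has the same number of levels."""
    # Iterating over a MultiIndex yields one tuple per row.
    tuples = [func(key) for key in index]
    # No names passed, so every level starts out unnamed.
    result = pd.MultiIndex.from_tuples(tuples)
    if result.nlevels == index.nlevels:
        # Same number of levels: reuse the old names.
        result = result.set_names(index.names)
    # Otherwise the levels stay unnamed, as in Pandas <= 0.22.
    return result


df = pd.DataFrame({'a': [0, 1, 2, 3],
                   'b': ['a_1_bar', 'a_2_bar', 'b_1_bar', 'b_2_bar'],
                   'c': list('defg')})
df = df.set_index(['b', 'c'])

mapped = map_keep_names_if_same_nlevels(df.index, lambda x: tuple(x[0].split('_')))
print(mapped.nlevels)      # 3
print(list(mapped.names))  # [None, None, None]
```

Whether the names should ever be reused is exactly what the discussion below questions; this sketch only encodes the "same number of levels" rule proposed here.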

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 4.0.2
pip: 18.1
setuptools: 40.5.0
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: 0.11.2
IPython: 7.1.1
sphinx: 1.8.2
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


WillAyd · 2019-01-16T16:23:37Z

Hmm, I think this should work, but cc @toobaz for thoughts

toobaz · 2019-01-17T07:26:27Z

I'm pretty sure not only that the OP's code should work, but also that even when the number of levels coincides, as in the following (notice the different column "b"):

In [2]: df = pd.DataFrame({'a': [0,1,2,3],
   ...:                    'b': ['a_1', 'a_2', 'b_1', 'b_2'],
   ...:                    'c': list('defg')})
   ...:                    

In [3]: df = df.set_index(['b', 'c'])

In [4]: df.index.map(lambda x : tuple(x[0].split('_')))
Out[4]: 
MultiIndex(levels=[['a', 'b'], ['1', '2']],
           codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['b', 'c'])

... it is a mistake to reuse the names, because the output of the lambda does not (in general) have anything to do with the input.

So the question becomes "is there any case in which it makes sense to reuse the names of a MultiIndex in a call to map?" I think the answer is "no", and if I am right, we just need to suppress this behavior here:

https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/base.py#L4441
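Until map stops reusing the level names, a caller can drop them explicitly. A sketch using the example above, assuming a pandas version where this same-level-count map succeeds (as in the session shown):

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3],
                   'b': ['a_1', 'a_2', 'b_1', 'b_2'],
                   'c': list('defg')})
df = df.set_index(['b', 'c'])

# Same number of levels (2 -> 2), so map succeeds, but it may carry over
# the names 'b' and 'c', which no longer describe the new levels.
mapped = df.index.map(lambda x: tuple(x[0].split('_')))

# Discard the carried-over names: one None per level.
unnamed = mapped.set_names([None] * mapped.nlevels)
print(list(unnamed.names))  # [None, None]
```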

jorisvandenbossche · 2019-01-18T09:39:38Z

> So the question becomes "is there any case in which it makes sense to reuse the names of a MultiIndex in a call to map?" I think the answer is "no",

Note that this is more general than MultiIndex.map: we also preserve the name for Index.map, Series.map, Series.apply, etc.
So at least from a consistency point of view, trying to preserve the names for MultiIndex.map as well might make sense.
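The single-name preservation mentioned here can be checked directly. A small sketch; the names `x` and `s` are illustrative and not from the thread:

```python
import pandas as pd

# A flat Index keeps its name through map.
idx = pd.Index([1, 2, 3], name='x')
print(idx.map(lambda v: v * 10).name)  # x

# A Series keeps its name through map as well.
s = pd.Series([1, 2, 3], name='s')
print(s.map(lambda v: v * 10).name)    # s
```

Note that these keep a single `name` for a one-dimensional result, which is a different situation from reusing per-level names of a MultiIndex whose shape may change.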

toobaz · 2019-01-18T11:12:39Z

> Note that this is more general to MultiIndex.map. We also preserve the name for Index.map, Series.map, Series.apply, ..

Sorry, my comment was indeed a bit vague: I was thinking of the level names (of the original MultiIndex), not just the name attribute. The best analogy I can come up with is

In [2]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

In [3]: df.apply(lambda x : pd.Series([x[0], -x[0]]), axis=1)
Out[3]: 
   0  1
0  1 -1
1  3 -3

which does not preserve column names.
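The `x[0]` lookup in this session relies on `Series.__getitem__` falling back to position when the row labels are 'a' and 'b'; recent pandas versions deprecate that fallback. A label-based sketch of the same analogy (same data, `row['a']` in place of `x[0]`):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# Each row becomes a new two-element Series; the result's columns come
# from that Series' default index (0, 1), not from the original 'a'/'b'.
result = df.apply(lambda row: pd.Series([row['a'], -row['a']]), axis=1)
print(result)
```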

gsaurabhr · 2020-10-28T22:38:54Z

Still getting this error when the number of levels in the original and the returned MultiIndex differs. Any solutions?
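One way around the error, sketched here as a workaround rather than a fix, is to skip `index.map` and build the new MultiIndex from plain tuples, giving it explicit names. The level names `letter`, `number` and `suffix` are illustrative choices for this example:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3],
                   'b': ['a_1_bar', 'a_2_bar', 'b_1_bar', 'b_2_bar'],
                   'c': list('defg')})
df = df.set_index(['b', 'c'])

# Build the new keys directly instead of calling index.map,
# so no old level names are carried over.
new_tuples = [tuple(b.split('_')) for b, c in df.index]

# The level names below are illustrative, chosen for this example.
df.index = pd.MultiIndex.from_tuples(new_tuples, names=['letter', 'number', 'suffix'])
print(df)
```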

jorisvandenbossche · 2020-11-03T14:27:37Z

Contributions to fix this are certainly welcome!

WillAyd added the MultiIndex label Jan 16, 2019

RutgerK changed the title from "df.index.map with difference size fails for Pandas > 0.22" to "df.index.map with different size fails for Pandas > 0.22" Jan 17, 2019

jorisvandenbossche added the Bug label Nov 3, 2020

jorisvandenbossche added this to the Contributions Welcome milestone Nov 3, 2020

simonjayhawkins added the Regression label ("Functionality that used to work in a prior pandas version") Jun 8, 2022

simonjayhawkins mentioned this issue in "BUG: Cannot map MultiIndex to more levels #47173" (closed) Jun 8, 2022

simonjayhawkins modified the milestones: Contributions Welcome, 1.5 Jun 8, 2022

simonjayhawkins mentioned this issue in "BUG: Cannot map MultiIndex to more levels #47212" (closed) Jun 8, 2022

mroeschke removed this from the 1.5 milestone Aug 15, 2022
