df.index.map with different size fails for Pandas > 0.22 #24800
Comments
Hmm, I think this should work, but cc @toobaz for thoughts.
I'm pretty sure not only that the OP code should work, but also that even when the number of levels coincides, as in (notice the different column
In [2]: df = pd.DataFrame({'a': [0,1,2,3],
...: 'b': ['a_1', 'a_2', 'b_1', 'b_2'],
...: 'c': list('defg')})
...:
In [3]: df = df.set_index(['b', 'c'])
In [4]: df.index.map(lambda x : tuple(x[0].split('_')))
Out[4]:
MultiIndex(levels=[['a', 'b'], ['1', '2']],
codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['b', 'c'])
... it is a mistake to reuse the names, because the output of the lambda does not (in general) have anything to do with the input. So the question becomes "is there any case in which it makes sense to reuse the names of a
https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/base.py#L4441
Note that this is more general than MultiIndex.map: we also preserve the name for Index.map, Series.map, Series.apply, ...
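To illustrate the name preservation mentioned above, here is a minimal sketch. The variable names and the doubling lambda are only illustrative and do not come from the original report.

```python
import pandas as pd

# A named Index and a named Series (names chosen only for illustration)
idx = pd.Index([1, 2, 3], name="x")
ser = pd.Series([1, 2, 3], name="y")

# Each of these keeps the name of the object it was called on
mapped_idx = idx.map(lambda v: v * 2)
mapped_ser = ser.map(lambda v: v * 2)
applied_ser = ser.apply(lambda v: v * 2)

print(mapped_idx.name, mapped_ser.name, applied_ser.name)
```

In these cases there is a single name and a single output, so carrying the name over cannot fail on a length mismatch; the MultiIndex case differs because the mapped tuples can change the number of levels.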
Sorry, my comment was indeed a bit vague, but I was thinking of level names (of the original
In [2]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
In [3]: df.apply(lambda x : pd.Series([x[0], -x[0]]), axis=1)
Out[3]:
   0  1
0  1 -1
1  3 -3
which does not preserve column names.
Still getting this error when the number of levels in the original and the returned MultiIndex is different. Any solutions?
Contributions to fix this are certainly welcome!
Code Sample
Problem description
The code above works in Pandas 0.22 and lower, but fails since 0.23. This seems to be due to the fact that Pandas wants to preserve the names of the levels in the old index.
When the number of levels in the new index differs from the old one, this mismatch causes a ValueError:
ValueError: Length of names must match number of levels in MultiIndex.
Expected Output
Older version of pandas returned a new MultiIndex, without names for the levels.
I'm not sure whether this change was deliberate. If not, a workaround might be to preserve the names only if the number of levels in the new index matches the old one, and otherwise to discard the names, resulting in behavior similar to the older Pandas versions.
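Until this is fixed in pandas itself, the suggested workaround can be approximated in user code. The following is a minimal sketch, not the pandas implementation: `map_index` is a hypothetical helper, and the sample frame and lambdas are assumptions modeled on the example in the comments. It builds the new index from the mapped values directly, so it does not rely on how `MultiIndex.map` handles names in any particular version.

```python
import pandas as pd

def map_index(index, func):
    """Hypothetical helper: map func over index values and keep the
    level names only when the number of levels is unchanged."""
    mapped = [func(value) for value in index]
    if mapped and all(isinstance(m, tuple) for m in mapped):
        new_index = pd.MultiIndex.from_tuples(mapped)
        if new_index.nlevels == index.nlevels:
            # Same number of levels: reuse the old names
            new_index = new_index.set_names(index.names)
        return new_index
    # Non-tuple results: return a plain, unnamed Index
    return pd.Index(mapped)

# Sample data (an assumption, modeled on the example in the comments)
df = pd.DataFrame({'a': [0, 1, 2, 3],
                   'b': ['a_1', 'a_2', 'b_1', 'b_2'],
                   'c': list('defg')}).set_index(['b', 'c'])

# Two levels in, three levels out: names are dropped
three_levels = map_index(df.index, lambda x: tuple(x[0].split('_')) + (x[1],))

# Two levels in, two levels out: names are kept
two_levels = map_index(df.index, lambda x: tuple(x[0].split('_')))
```

Whether names should be reused even when the level counts match is exactly what the comments debate; dropping the `set_names` branch gives the unnamed behavior of the older versions in every case.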
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 4.0.2
pip: 18.1
setuptools: 40.5.0
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: 0.11.2
IPython: 7.1.1
sphinx: 1.8.2
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None