Clarify that MultiIndex.set_levels() interprets passed values as new components of the .levels attribute #28294

Closed
pepicello opened this issue Sep 5, 2019 · 6 comments · Fixed by #29143

Comments

@pepicello
Contributor

@pepicello pepicello commented Sep 5, 2019

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(3, 3), columns=pd.MultiIndex.from_tuples([(1, 2), (1, 4), (5, 6)]))
df.columns = df.columns.set_levels([1, 1, 3], level=0)

Which raises a ValueError:

ValueError: Level values must be unique: [1, 1, 3] on level 0

Problem description

Although a DataFrame whose MultiIndex has non-unique values in a level can be created, such values cannot be set using set_levels(). Is this behaviour expected? I believe non-unique level values were not allowed for a period of time (#18882), but then they were allowed again (#21423), so I am not sure which is the current convention.

Expected Output

          1                   3
          2         4         6
0  0.317669  0.329142  0.056725
1  0.969472  0.340309  0.135204
2  0.242408  0.934748  0.683186

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : 5.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.1.8

@WillAyd

Member

@WillAyd WillAyd commented Sep 5, 2019

@WillAyd WillAyd added the MultiIndex label Sep 5, 2019
@toobaz

Member

@toobaz toobaz commented Sep 5, 2019

I believe non-unique level values were not allowed for a period of time (#18882), but then they were allowed again (#21423), so I am not sure which is the current convention.

Those issues were about non-unique level names; yours is about non-unique level values. And in this second case the story is simple: yes, they are and always were allowed.

But I think you've misunderstood how set_levels works (possibly because the docs could and should be improved): you're not expected to provide a new value for each row, but rather one for each code describing values in the level. In other words, you are changing the levels attribute, and having duplicates there does not make a lot of sense. That said, you can do it if you really want, by passing verify_integrity=False.
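To illustrate the distinction, here is a minimal sketch using the index from the original report. The replacement labels 10 and 50 are arbitrary values chosen only for this illustration.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([(1, 2), (1, 4), (5, 6)])

# Level 0 stores each distinct label once; the codes map rows to those labels.
level0_labels = mi.levels[0].tolist()  # [1, 5]
level0_codes = mi.codes[0].tolist()    # [0, 0, 1]

# set_levels replaces the distinct labels, so level 0 takes two values here,
# not one per row.
renamed = mi.set_levels([10, 50], level=0)
row_values = renamed.get_level_values(0).tolist()  # [10, 10, 50]

# Passing one value per row is read as three labels, and the duplicate
# label is rejected while verify_integrity is left at its default of True.
try:
    mi.set_levels([1, 1, 3], level=0)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

The codes are left untouched by set_levels, which is why the first two rows keep sharing one label after the rename.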

In short, this function is probably not what you're looking for. And unfortunately, there is no obvious alternative for what you want to do.
The simplest way is to recreate the index.
The most efficient way might be to use set_levels in combination with set_codes (you can obtain levels and codes from a simple MultiIndex with 1 level).
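Both suggestions can be sketched as follows. This is one possible reading of them, not necessarily what was meant: the variable names are hypothetical, and using Index.factorize() to derive the codes and labels is an assumption standing in for the one-level index mentioned above.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([(1, 2), (1, 4), (5, 6)])
new_level0 = [1, 1, 3]  # one desired value per row of level 0

# Option 1: rebuild the index from per-row arrays.
rebuilt = pd.MultiIndex.from_arrays([new_level0, mi.get_level_values(1)])

# Option 2: factorize the per-row values into codes plus distinct labels,
# then swap both into level 0. The integrity check is skipped on the first
# step because the old codes may not fit the new labels yet; set_codes
# runs the check again on the final result.
codes, labels = pd.Index(new_level0).factorize()
via_codes = mi.set_levels(labels, level=0, verify_integrity=False).set_codes(
    codes, level=0
)
```

Either result can then be assigned back with df.columns = rebuilt. For a small index the rebuild is easier to read; the second form avoids touching the other levels.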

@toobaz toobaz closed this Sep 5, 2019
@WillAyd

Member

@WillAyd WillAyd commented Sep 5, 2019

Should we at least keep this open to improve the docs then?

@toobaz

Member

@toobaz toobaz commented Sep 5, 2019

Should we at least keep this open to improve the docs then?

Makes sense!

@toobaz toobaz reopened this Sep 5, 2019
@toobaz toobaz added the Docs label Sep 5, 2019
@toobaz toobaz changed the title MultiIndex.set_levels() requires unique level values Clarify that MultiIndex.set_levels() interpret passed values as new components of the .levels attribute Sep 5, 2019
@toobaz toobaz changed the title Clarify that MultiIndex.set_levels() interpret passed values as new components of the .levels attribute Clarify that MultiIndex.set_levels() interprets passed values as new components of the .levels attribute Sep 5, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone Sep 5, 2019
@hweecat

Contributor

@hweecat hweecat commented Sep 20, 2019

I could have a go at improving the docs for MultiIndex.set_levels(), if that's okay.

@WillAyd

Member

@WillAyd WillAyd commented Sep 20, 2019

@hweecat sure!

hweecat added a commit to hweecat/pandas that referenced this issue Oct 5, 2019
hweecat added a commit to hweecat/pandas that referenced this issue Oct 5, 2019
rectify for failing tests

DOC: added docs for MultiIndex.set_levels (pandas-dev#28294)
hweecat added a commit to hweecat/pandas that referenced this issue Oct 5, 2019
hweecat added a commit to hweecat/pandas that referenced this issue Oct 6, 2019
hweecat added a commit to hweecat/pandas that referenced this issue Oct 21, 2019
datapythonista added a commit that referenced this issue Jan 3, 2020
…s() interprets passed values as new components of the .levels attribute (#28294) (#29143)