BUG: Unexpected behavior when a DataFrame has MultiIndex columns and non-unique index #55126

@jonmooser

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd


df = pd.DataFrame(
    np.random.random((5, 3)),
    index=[1, 2, 2, 3, 4],
    columns=pd.MultiIndex.from_product([np.arange(3), ["x"]]),
)

print(df)


"""
This works, but is arguably a bug.
It returns a Series of two values, but the contract of
DataFrame.at, as I understand it, is to return a single value.
"""
print(df.at[2, 1])
print()

"""
This works and returns a scalar value as expected
"""
print(df.loc[1, (1, "x")])
print()

"""
Fails with 
ValueError: Invalid call for scalar access (getting)!
"""
print(df.at[1, (1, "x")])

Issue Description

When a DataFrame has one axis with non-unique values and the other axis with a MultiIndex, the DataFrame.at accessor raises a confusing error.
Even if this behavior is strictly expected, the error should be clearer (and perhaps a KeyError instead of a ValueError).

It feels like a bug because

  1. Calling loc with the same arguments works, and
  2. There is no ambiguity in the values passed
    (there is only one row with index=1).

Traceback (most recent call last):
  File "/Users/jonathan/Dropbox/py/debug_tests/pandas_mi_bug.py", line 33, in <module>
    print(df.at[1, (1, "x")])
          ~~~~~^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/indexing.py", line 2485, in __getitem__
    raise ValueError("Invalid call for scalar access (getting)!")
ValueError: Invalid call for scalar access (getting)!
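
To make the failing combination easier to inspect, here is a small diagnostic sketch that checks the properties involved. It only uses public attributes (`Index.is_unique`, `pandas.api.types.is_scalar`). Whether these are the exact checks that `.at` trips over internally is an assumption on my part and not confirmed against the pandas source.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_scalar

df = pd.DataFrame(
    np.random.random((5, 3)),
    index=[1, 2, 2, 3, 4],
    columns=pd.MultiIndex.from_product([np.arange(3), ["x"]]),
)

row_label, col_label = 1, (1, "x")

# The index as a whole has duplicates (label 2 appears twice) ...
print("index unique:  ", df.index.is_unique)
# ... but the columns do not.
print("columns unique:", df.columns.is_unique)

# The row label is a scalar; the MultiIndex column label is a tuple.
print("row label scalar:", is_scalar(row_label))
print("col label scalar:", is_scalar(col_label))

# The requested row label itself matches exactly one row.
matching_rows = int((df.index == row_label).sum())
print("rows matching label 1:", matching_rows)
```

So the requested cell is unambiguous even though the index as a whole is not unique.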

Expected Behavior

In the example above,

print(df.at[1, (1, "x")])

should return the single value at that location.

If it should raise an error instead, the message should make it clear that the duplicates in the index are the problem (concat() does this).
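
As a rough illustration of the behavior I would expect, here is a user-level sketch. `scalar_at` is a hypothetical helper, not a pandas API, and this is not a proposal for how pandas should implement it internally. It returns a single value when the labels are unambiguous and raises a KeyError that names the duplicated label otherwise.

```python
import numpy as np
import pandas as pd


def scalar_at(df, row, col):
    """Hypothetical helper: return one cell or raise a descriptive KeyError."""
    # get_loc returns an integer for a single match, and a slice or
    # boolean mask when the label matches more than one position.
    row_pos = df.index.get_loc(row)
    col_pos = df.columns.get_loc(col)
    if not isinstance(row_pos, (int, np.integer)):
        raise KeyError(f"row label {row!r} matches more than one row")
    if not isinstance(col_pos, (int, np.integer)):
        raise KeyError(f"column label {col!r} matches more than one column")
    return df.iat[row_pos, col_pos]


df = pd.DataFrame(
    np.random.random((5, 3)),
    index=[1, 2, 2, 3, 4],
    columns=pd.MultiIndex.from_product([np.arange(3), ["x"]]),
)

# Unambiguous label: a single value comes back.
print(scalar_at(df, 1, (1, "x")))

# Duplicated label: the error says which label is the problem.
try:
    scalar_at(df, 2, (1, "x"))
except KeyError as exc:
    print("KeyError:", exc)
```
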
Also note:

  • If the index contains unique values, it will work as expected
  • If the columns are not a MultiIndex, it will work as expected
    In other words, both axes need to have these properties for the bug to appear.
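
The two working cases can be checked next to the failing one with a sketch like this. The frame names (`unique_idx`, `flat_cols`, `both`) are just for illustration. The third case is wrapped in try/except because its outcome is the behavior under discussion and may differ between pandas versions.

```python
import numpy as np
import pandas as pd

values = np.random.random((5, 3))
mi_columns = pd.MultiIndex.from_product([np.arange(3), ["x"]])

# Case 1: unique index + MultiIndex columns -> scalar
unique_idx = pd.DataFrame(values, index=[1, 2, 3, 4, 5], columns=mi_columns)
a = unique_idx.at[1, (1, "x")]
print("unique index, MultiIndex columns:", a)

# Case 2: non-unique index + flat columns -> scalar
flat_cols = pd.DataFrame(values, index=[1, 2, 2, 3, 4], columns=np.arange(3))
b = flat_cols.at[1, 1]
print("non-unique index, flat columns:  ", b)

# Case 3: non-unique index + MultiIndex columns -> the reported failure
both = pd.DataFrame(values, index=[1, 2, 2, 3, 4], columns=mi_columns)
try:
    print("both properties:", both.at[1, (1, "x")])
except ValueError as exc:
    print("both properties: ValueError:", exc)
```
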

Installed Versions

/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit : ba1cccd
python : 3.11.0.final.0
python-bits : 64
OS : Darwin
OS-release : 21.4.0
Version : Darwin Kernel Version 21.4.0: Fri Mar 18 00:46:32 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.0
numpy : 1.23.4
pytz : 2022.5
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.2.1
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Metadata

Assignees

No one assigned

    Labels

    Bug
    Indexing: Related to indexing on series/frames, not to indexes themselves
    Needs Discussion: Requires discussion from core team before further action
