BUG: Indexing by frozendict not allowed #55308

@caneff

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from frozendict import frozendict
from immutabledict import immutabledict

set_col = frozenset({1})
df_set = pd.DataFrame({set_col: [1, 2, 3]}) # OK
df_set[set_col] # OK

dict_col = frozendict({'x': 3})

df_dict = pd.DataFrame({dict_col: [1, 2, 3]}) # OK
df_dict[[dict_col]] # OK, but returns a DataFrame (as expected for a list indexer)
df_dict[dict_col] # TypeError
df_dict[df_dict.columns[0]] # TypeError
df_dict[[dict_col]].iloc[:, 0] # Ugly workaround, but it does return the column as a Series.

dict_col_2 = immutabledict({'x':3})
df_dict_2 = pd.DataFrame({dict_col_2: [1, 2, 3]}) # OK
df_dict_2[dict_col_2] # OK

Issue Description

This is something I encountered in our code base.

In pandas 1 the df_dict example was OK: a frozendict is hashable, so the column lookup succeeded.

In pandas 2 we get TypeError: Passing a dict as an indexer is not supported. Use a list instead.

But we don't get the same issue with a frozenset or an immutabledict, because a frozenset is not an instance of set and an immutabledict is not an instance of dict. I think this is inconsistent. If we are allowed to create columns with a given key, we should be allowed to index with it.
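The type relationships behind this inconsistency can be checked with the standard library alone. This is only a sketch: `HashableDict` and `ReadOnlyMapping` are hypothetical stand-ins for a hashable dict subclass and for a mapping that does not subclass dict; they are not the real frozendict or immutabledict classes.

```python
from collections.abc import Mapping


class HashableDict(dict):
    """Hypothetical stand-in for a hashable dict subclass."""

    def __hash__(self):
        return hash(frozenset(self.items()))


class ReadOnlyMapping(Mapping):
    """Hypothetical stand-in for a mapping that does not subclass dict."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        return hash(frozenset(self._data.items()))


# All three keys are hashable, but only one is an instance of dict or set,
# so only that one would be caught by an isinstance-based check.
keys = [frozenset({1}), HashableDict({"x": 3}), ReadOnlyMapping({"x": 3})]
for key in keys:
    hash(key)  # does not raise for any of the three
    print(type(key).__name__, isinstance(key, (dict, set)))
```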

As a general rule, df[df.columns[0]] should always work and should never raise.

I don't think any of them should work, but if we want it to work for a frozen subclass of dict, I think the check here is wrong and needs to do more than just isinstance(key, dict). It needs to check whether the key is hashable and, if it is, whether it is an element of the column index.
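A minimal sketch of the kind of check described above, using only the standard library. This is not pandas' actual code: `should_reject_dict_key`, `is_hashable`, and `HashableDict` are hypothetical names, and a plain list stands in for `df.columns`.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def should_reject_dict_key(key, columns):
    """Hypothetical replacement for a bare isinstance(key, dict) check."""
    if not isinstance(key, dict):
        return False  # not a dict at all; nothing to reject here
    if is_hashable(key) and key in columns:
        return False  # hashable dict subclass that is a real column label
    return True  # plain dict, or a dict that is not a column label


class HashableDict(dict):
    """Stand-in for frozendict; mutation is not blocked, for brevity."""

    def __hash__(self):
        return hash(frozenset(self.items()))


col = HashableDict({"x": 3})
columns = [col]  # a plain list standing in for df.columns

print(should_reject_dict_key({"x": 3}, columns))  # True: plain dict is unhashable
print(should_reject_dict_key(col, columns))       # False: hashable and a column label
```

With this ordering, a plain dict is still rejected with the existing error, while a hashable key that is already a column label falls through to the normal lookup.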

Expected Behavior

Either frozendicts should not be allowed as column labels, and neither should frozensets, immutabledicts, or any similar type.

Or df_dict[dict_col] should return the column as a Series, just as df_set[set_col] does.

Installed Versions

INSTALLED VERSIONS

commit : e86ed37
python : 3.11.5.final.0
python-bits : 64
OS : Linux
OS-release : 6.3.11-1rodete2-amd64
Version : #1 SMP PREEMPT_DYNAMIC Debian 6.3.11-1rodete2 (2023-08-24)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.1
numpy : 1.24.4
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.2.1
Cython : None
pytest : 7.4.2
hypothesis : 6.84.3
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 1.4.6
psycopg2 : 2.9.7
jinja2 : None
IPython : 8.15.0
pandas_datareader : None
bs4 : None
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.2
sqlalchemy : 1.4.49
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None


Labels: Bug, Indexing (Related to indexing on series/frames, not to indexes themselves)
