REGR: Assigning multiple new columns with loc fails when index is a MultiIndex #39147

Closed
2 of 3 tasks
gdiepen opened this issue Jan 13, 2021 · 1 comment · Fixed by #39161
Labels: Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex, Regression (functionality that used to work in a prior pandas version)

gdiepen commented Jan 13, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

rows = [(1,2), (3,4)]
initial_cols = ["a", "b"]

df = pd.DataFrame(42, index=pd.MultiIndex.from_tuples(rows), columns=initial_cols)

new_cols = ["c", "d"]
df.loc[:, new_cols] = None

Problem description

When running the above code in pandas 1.1.5, you will get the following error:

  File "foo.py", line 11, in <module>
    df.loc[:, new_cols] = None
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 666, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 609, in _get_setitem_indexer
    return self._convert_tuple(key, is_setter=True)
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 734, in _convert_tuple
    idx = self._convert_to_indexer(k, axis=i, is_setter=is_setter)
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 1198, in _convert_to_indexer
    return self._get_listlike_indexer(key, axis, raise_missing=True)[1]
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 1254, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 1298, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['c', 'd'], dtype='object')] are in the [columns]"

Up to and including version 1.1.4, the code above extended the DataFrame with the two new columns c and d. Since the release of pandas 1.1.5, it raises the error shown above. It might be related to the fix for #37711.

Single-column assignments still work: replacing the single multi-column assignment with the following two separate assignments succeeds:

df.loc[:, "c"] = None
df.loc[:, "d"] = None
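The single-column workaround above generalizes to any list of new column names with a loop. This is a minimal sketch that reuses the frame from the code sample; it relies only on the behavior described in this report (that assigning one new column at a time through loc is unaffected).

```python
import pandas as pd

rows = [(1, 2), (3, 4)]
df = pd.DataFrame(42, index=pd.MultiIndex.from_tuples(rows), columns=["a", "b"])

new_cols = ["c", "d"]

# Assign each new column separately instead of passing the whole list to .loc.
# Per the report, single-column assignment is not hit by the regression.
for col in new_cols:
    df.loc[:, col] = None

print(df.columns.tolist())
```

The loop costs one assignment per column, which is negligible for a handful of columns and avoids the list-like code path that raises the KeyError.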

The problem occurs only when the index is a MultiIndex. With a flat (non-MultiIndex) index, the error does not occur.
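To check whether a given pandas installation is affected, the two index types can be compared side by side. The helper below is a hypothetical name introduced for illustration, not part of pandas; it attempts the multi-column assignment and reports whether it raised the KeyError from the traceback above.

```python
import pandas as pd


def multi_column_loc_assignment_works(index) -> bool:
    """Return True if df.loc[:, ["c", "d"]] = None succeeds for this index."""
    df = pd.DataFrame(42, index=index, columns=["a", "b"])
    try:
        df.loc[:, ["c", "d"]] = None
    except KeyError:
        # Failure mode reported in this issue.
        return False
    return list(df.columns) == ["a", "b", "c", "d"]


flat_ok = multi_column_loc_assignment_works(pd.Index([0, 1]))
multi_ok = multi_column_loc_assignment_works(
    pd.MultiIndex.from_tuples([(1, 2), (3, 4)])
)
print(f"pandas {pd.__version__}: flat index ok={flat_ok}, MultiIndex ok={multi_ok}")
```

According to the report, an affected version such as 1.1.5 should show the flat index succeeding and the MultiIndex failing.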

Expected Output

The expected output is that no error occurs and that the two columns c and d are added with the value None in every row.
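The expected result can be expressed as a small predicate. This is a sketch only: `matches_expected` is a hypothetical helper, and it treats None and NaN alike through `isna()`, which is an assumption about how strictly "values None" should be read. The `expected` frame is built directly with the constructor so the check does not depend on the loc behavior under discussion.

```python
import pandas as pd


def matches_expected(df: pd.DataFrame, new_cols) -> bool:
    """True if every new column exists and holds only missing values."""
    has_cols = all(col in df.columns for col in new_cols)
    return has_cols and bool(df[list(new_cols)].isna().all().all())


# Build the expected frame directly, without going through df.loc.
index = pd.MultiIndex.from_tuples([(1, 2), (3, 4)])
expected = pd.DataFrame({"a": 42, "b": 42, "c": None, "d": None}, index=index)

print(matches_expected(expected, ["c", "d"]))
```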

Output of pd.show_versions()

INSTALLED VERSIONS

commit : b5958ee
python : 3.7.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-33-generic
Version : #36-Ubuntu SMP Wed Dec 9 09:14:40 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.5
numpy : 1.19.2
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.1.2.post20210112
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@gdiepen added the Bug and Needs Triage labels on Jan 13, 2021
@phofl added the Indexing, MultiIndex, and Regression labels and removed the Bug and Needs Triage labels on Jan 14, 2021
@phofl added this to the 1.2.1 milestone on Jan 14, 2021
simonjayhawkins (Member) commented

It might be related to the fix for #37711

#37787

(Just adding the reference to the PR, since it is sometimes easier to detect duplicate issues from the links on the PR.)

@simonjayhawkins changed the title from "BUG: Assigning multiple new columns with loc fails when index is a MultiIndex" to "REGR: Assigning multiple new columns with loc fails when index is a MultiIndex" on Jan 14, 2021