I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]], columns=["D", "B", "C", "A"]
)
item = pd.DataFrame(
    [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]],
    columns=["A", "B", "C", "X"],
    index=[
        3,  # 3 does not exist in the row key, so it will be skipped
        2,
        1,
    ],
)
df.loc[[True, False, True], ["B", "E", "B"]] = item
Issue Description
With pandas 2.2.0 and later, loc __setitem__ behaves incorrectly when a DataFrame is assigned to another DataFrame and the column key both adds new columns and contains duplicated columns. The bug only appears when the column key follows the pattern [existing column(s), non-existent column(s), duplicated existing column(s)]. In the example provided, "B" exists but "E" does not. This can be reproduced with the following loc __setitem__ operations as well.
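For context, the values this assignment is expected to write can be sketched by aligning item to the target labels by hand. This is only an illustration using the public reindex method, not a description of what loc does internally; the names row_labels, col_labels, and aligned are made up for the sketch, and the row labels [0, 2] assume the default RangeIndex of df.

```python
import pandas as pd

item = pd.DataFrame(
    [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]],
    columns=["A", "B", "C", "X"],
    index=[3, 2, 1],
)

# Rows picked by the boolean mask [True, False, True] on a default RangeIndex.
row_labels = [0, 2]
# Column key from the issue: existing, non-existent, duplicated existing.
col_labels = ["B", "E", "B"]

# Align item to the target labels; labels that item lacks become NaN.
aligned = item.reindex(index=row_labels, columns=col_labels)
print(aligned)
```

Row 0 has no match in item's index, so it comes out as all NaN. Row 2 picks up item's "B" value for both "B" positions, and "E" is NaN because item has no such column.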
Also note that in some cases the resulting DataFrame cannot be printed; attempting to print it raises the error below:
>>> df = pd.DataFrame(
...     [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]], columns=["D", "B", "C", "A"]
... )
>>> df
   D  B  C   A
0  1  2  3   4
1  4  5  6   7
2  7  8  9  10
>>> item = pd.DataFrame(
...     [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]],
...     columns=["A", "B", "C", "X"],
...     index=[
...         3,  # 3 does not exist in the row key, so it will be skipped
...         2,
...         1,
...     ],
... )
>>> item
   A  B  C   X
3  1  2  3   4
2  4  5  6   7
1  7  8  9  10
>>> df.loc[[True, False, True], ["B", "E", "B"]] = item
>>> df
# ERROR!
"""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/core/frame.py", line 1203, in __repr__
    return self.to_string(**repr_params)
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/core/frame.py", line 1383, in to_string
    return fmt.DataFrameRenderer(formatter).to_string(
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/io/formats/format.py", line 962, in to_string
    string = string_formatter.to_string()
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/io/formats/string.py", line 29, in to_string
    text = self._get_string_representation()
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/io/formats/string.py", line 44, in _get_string_representation
    strcols = self._get_strcols()
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/io/formats/string.py", line 35, in _get_strcols
    strcols = self.fmt.get_strcols()
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/io/formats/format.py", line 476, in get_strcols
    strcols = self._get_strcols_without_index()
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/io/formats/format.py", line 740, in _get_strcols_without_index
    fmt_values = self.format_col(i)
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/io/formats/format.py", line 754, in format_col
    return format_array(
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1161, in format_array
    return fmt_obj.get_result()
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1194, in get_result
    fmt_values = self._format_strings()
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1250, in _format_strings
    & np.all(notna(vals), axis=tuple(range(1, len(vals.shape))))
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/core/dtypes/missing.py", line 457, in notna
    res = isna(obj)
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/core/dtypes/missing.py", line 178, in isna
    return _isna(obj)
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/core/dtypes/missing.py", line 207, in _isna
    return _isna_array(obj, inf_as_na=inf_as_na)
  File "/Users/vbudati/anaconda3/envs/pandas-dev-39/lib/python3.9/site-packages/pandas/core/dtypes/missing.py", line 300, in _isna_array
    result = np.isnan(values)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
"""
Since I cannot print the new df directly, I tried inspecting it row by row via iloc __getitem__:
Notice column A: originally it had values [4, 7, 10] (as seen in the expected behavior). In the faulty result, all values in A are replaced by [].
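A row-by-row inspection of this kind might look like the sketch below. The helper name dump_rows is hypothetical, and the sketch runs on the untouched df so that it works on any pandas version; whether every row of a corrupted frame can actually be read this way is an assumption, which is why each access is wrapped in a try/except.

```python
import pandas as pd

def dump_rows(frame):
    """Collect each row's values by position, without calling repr(frame)."""
    rows = []
    for pos in range(len(frame)):
        try:
            values = frame.iloc[pos].tolist()
        except Exception as exc:  # a corrupted frame may fail here as well
            values = f"<unreadable: {type(exc).__name__}>"
        rows.append(values)
    return rows

df = pd.DataFrame(
    [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]], columns=["D", "B", "C", "A"]
)

# Column labels are read separately so duplicated labels stay visible.
print(list(df.columns))
for pos, values in enumerate(dump_rows(df)):
    print(pos, values)
```

To inspect the faulty result, run the loc assignment from the reproducible example first and then call dump_rows(df) again.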
Expected Behavior
# Expected behavior, from pandas versions 2.1.x and below, the result would be:
   D    B    B  C   A   E
0  1  NaN  NaN  3   4 NaN
1  4  5.0  5.0  6   7 NaN
2  7  5.0  5.0  9  10 NaN
# however, pandas versions 2.2.0+ error out.
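The expected table above could be turned into a small check, sketched below. The helper matches_expected and the temporary labels "B1" and "B2" are made up for illustration. Dtypes are deliberately not compared, since the issue does not state them, and any exception raised while reading the frame is treated as a mismatch.

```python
import numpy as np
import pandas as pd

# Build the expected frame with unique temporary labels, then rename,
# so each column keeps its own dtype despite the duplicated "B".
expected = pd.DataFrame(
    {
        "D": [1, 4, 7],
        "B1": [np.nan, 5.0, 5.0],
        "B2": [np.nan, 5.0, 5.0],
        "C": [3, 6, 9],
        "A": [4, 7, 10],
        "E": [np.nan, np.nan, np.nan],
    }
)
expected.columns = ["D", "B", "B", "C", "A", "E"]

def matches_expected(frame):
    """Return True if frame has the expected labels and values (NaN == NaN)."""
    try:
        if list(frame.columns) != list(expected.columns):
            return False
        return bool(
            np.array_equal(
                frame.to_numpy(dtype=float),
                expected.to_numpy(dtype=float),
                equal_nan=True,
            )
        )
    except Exception:  # e.g. a frame whose values cannot be converted
        return False
```

After running the reproducible example, matches_expected(df) should return True on a version with the 2.1.x behavior described here, and False where the bug is present.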
Installed Versions
INSTALLED VERSIONS
------------------
commit : bdc79c1
python : 3.9.18.final.0
python-bits : 64
OS : Darwin
OS-release : 23.4.0
Version : Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:49 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
sfc-gh-vbudati changed the title from "BUG: loc __setitem__ has incorrect behavior when assigned a DataFrame and new columns are added." to "BUG: loc __setitem__ has incorrect behavior when assigned a DataFrame and new columns and duplicated columns are added." on Apr 18, 2024
pandas : 2.2.1
numpy : 1.26.0
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.3.1
Cython : None
pytest : 7.4.2
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.4
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None