Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
import pandas as pd
# Create mock dataframe
df = pd.DataFrame({
'a': [1, 2, 3, 4]
})
m = df['a'].eq(2)
new_df1 = df[m]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df1._is_copy)
# SettingWithCopyWarning (expected)
new_df1['b'] = 1
new_df2 = df.loc[m]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df2._is_copy)
# SettingWithCopyWarning (unexpected)
new_df2['c'] = 1
new_df3 = df.loc[m, :]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df3._is_copy)
# SettingWithCopyWarning (unexpected)
new_df3['d'] = 1
new_df4 = df.loc[m, df.columns]
# None
print(new_df4._is_copy)
# No SettingWithCopyWarning (expected?)
new_df4['e'] = 1
Problem description
I expect df[m] to produce a frame whose _is_copy is a weakref, but I don't understand why I also get a weakref for the loc variants where the columns are not explicitly specified.
The issue is that both
new_df2 = df.loc[m]
new_df2['b'] = 1
and
new_df3 = df.loc[m, :]
new_df3['c'] = 1
emit this warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._set_item(key, value)
I couldn't find any documentation saying that the columns must be explicitly specified when using loc to avoid the copy flag, although all of the examples in the docs section "Why does assignment fail when using chained indexing?" do explicitly specify the column or columns.
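The comparison above can also be checked programmatically rather than by reading the console. The following is a minimal sketch, not part of the original report: the helper name warns_on_new_column and the column name "new" are made up for illustration, and the warning is matched by class name so the script does not depend on where a given pandas version exposes SettingWithCopyWarning. The results depend on the installed pandas version, so no expected output is shown.

```python
import warnings

import pandas as pd


def warns_on_new_column(make_subset):
    """Return True if adding a column to the subset emits SettingWithCopyWarning.

    `make_subset` is a hypothetical zero-argument callable that returns the
    filtered DataFrame to test.
    """
    subset = make_subset()
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once in this session.
        warnings.simplefilter("always")
        subset["new"] = 1
    # Match on the class name to avoid importing the warning class directly.
    return any(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


df = pd.DataFrame({"a": [1, 2, 3, 4]})
m = df["a"].eq(2)

# The four indexing forms from the code sample.
cases = {
    "df[m]": lambda: df[m],
    "df.loc[m]": lambda: df.loc[m],
    "df.loc[m, :]": lambda: df.loc[m, :],
    "df.loc[m, df.columns]": lambda: df.loc[m, df.columns],
}

results = {label: warns_on_new_column(make) for label, make in cases.items()}
for label, warned in results.items():
    print(f"{label:<24} warned={warned}")
```

On the pandas 1.3.0 setup described in this report, the first three forms would be expected to report warned=True and the last one warned=False.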
Expected Output
I would expect not to get a SettingWithCopyWarning, especially since : raises no warning when it is used to slice the index (rows):
filtered_df = df.loc[:, ['a']]
print(filtered_df._is_copy) # None
filtered_df['b'] = 1 # No Warning
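Independently of whether the warning is intended here, a possible way to get the expected behavior is to take an explicit copy of the filtered frame before adding columns. This is a sketch of a common workaround, not a fix for the underlying behavior; the variable names subset and copy_warnings are illustrative.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})
m = df["a"].eq(2)

# An explicit copy is an independent frame that is not tracked
# as a possible view of df.
subset = df.loc[m].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    subset["b"] = 1  # adds the column to the copy only

# Collect any SettingWithCopyWarning, matched by class name.
copy_warnings = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]
print(len(copy_warnings))
print(subset)
```

The original df is left unchanged; only subset gains the new column.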
Output of pd.show_versions()
Confirmed on Windows
INSTALLED VERSIONS
commit : f00ed8f
python : 3.9.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 1.3.0
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : None
setuptools : None
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.22.0
pandas_datareader: 0.9.0
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.05.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : None
Confirmed on Mac
INSTALLED VERSIONS
commit : f00ed8f
python : 3.9.2.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:31 PDT 2021; root:xnu-7195.121.3~9/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.3.0
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 54.2.0
Cython : 0.29.22
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None