BUG: Label and integer based indexing return different values for same column #45684

epmojo · 2022-01-28T21:06:50Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd


def main():
    # create a data frame
    # dtypes must be the same for the issue to occur
    data = {'x': np.arange(8, dtype=np.int64), 'y': np.int64(0)}
    df = pd.DataFrame(data)

    # These next three lines are necessary to produce the issue.
    df = df.copy()
    data = df['y']
    df.iat[7, df.columns.get_loc('x')] = 9999

    # set the last row of column y to a desired value
    df.iat[7, df.columns.get_loc('y')] = 1234

    # compare get values for at and iat
    atValue = df.at[7, 'y']
    iatValue = df.iat[7, df.columns.get_loc('y')]
    print(iatValue, atValue, iatValue == atValue)

    # inspect data
    print("\nThe DataFrame:")
    print(df)

    print("\nLabel Based:")
    print(df['y'])

    print("\nInteger Based:")
    print(df.iloc[:, df.columns.get_loc('y')])

if __name__ == '__main__':
    main()

Issue Description

In the example above, the value 1234 was written to the last row of column 'y' using integer-based indexing (iat). However, reading the values back with label-based indexing (at, or df['y']) returns the original value 0 instead of 1234, while integer-based reads (iat, iloc) return 1234. This issue does not occur on pandas 1.3.5.

1234 0 False

The DataFrame:
      x     y
0     0     0
1     1     0
2     2     0
3     3     0
4     4     0
5     5     0
6     6     0
7  9999  1234

Label Based:
0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
Name: y, dtype: int64

Integer Based:
0       0
1       0
2       0
3       0
4       0
5       0
6       0
7    1234
Name: y, dtype: int64

Expected Behavior

Label-based access (df.at[7, 'y'] and df['y']) should return 1234, the same value returned by integer-based access (df.iat and df.iloc).
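
A minimal regression check for the expected behavior, using only the public API. It repeats the steps of the reproducer and asserts that the three reads of row 7 in column 'y' agree. It assumes a pandas release that includes the fix from #45706 (the issue is milestoned for 1.4.1); on 1.4.0, per this report, the assertion is expected to fail. The helper name read_back_after_iat is hypothetical.

```python
import numpy as np
import pandas as pd


def read_back_after_iat() -> tuple:
    """Repeat the report's steps; return (at, iat, column) reads of row 7 of 'y'."""
    # Both columns are int64, as the report notes is required.
    df = pd.DataFrame({"x": np.arange(8, dtype=np.int64), "y": np.int64(0)})
    df = df.copy()
    _ = df["y"]  # label-based access before the writes, as in the reproducer
    df.iat[7, df.columns.get_loc("x")] = 9999
    df.iat[7, df.columns.get_loc("y")] = 1234

    y_loc = df.columns.get_loc("y")
    return (
        df.at[7, "y"],        # label-based scalar access
        df.iat[7, y_loc],     # integer-based scalar access
        df["y"].iloc[7],      # label-based column access
    )


at_value, iat_value, column_value = read_back_after_iat()
# Expected to fail on affected versions (1.4.0 here), where at and df['y'] return 0.
assert at_value == iat_value == column_value == 1234, (at_value, iat_value, column_value)
```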

Installed Versions

INSTALLED VERSIONS

commit : bb1f651
python : 3.9.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17763
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.4.0
numpy : 1.22.1
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 60.5.0
Cython : 0.29.26
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.7.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : 0.53.1
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None


phofl · 2022-01-28T22:07:02Z

cc @jbrockmendel The cache of the DataFrame looks off, could have been caused by #43406

jbrockmendel · 2022-01-28T22:23:52Z

Looks like df.iat[7, df.columns.get_loc('x')] = 9999 causes us to split blocks without clearing df._item_cache. Somehow this happens in series._set_value(index, value, takeable=True) inside DataFrame._set_value. Will need to look into this more closely.
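
To make the diagnosis concrete, here is a toy model in plain Python. The ToyFrame class and read_y helper are hypothetical and are not pandas' real block manager or _item_cache code. Same-dtype columns share one block, label lookups are cached, and the first integer-based write splits the block. If the split does not clear the cache, a later label read returns the stale column, which mirrors the 0 vs 1234 output in the report.

```python
class ToyFrame:
    """Toy model, NOT pandas internals: same-dtype columns share one
    consolidated block, and label lookups are cached (like _item_cache)."""

    def __init__(self, columns: dict, clear_cache_on_split: bool):
        self.names = list(columns)
        block = [list(values) for values in columns.values()]
        # Each column maps to (block, row index within that block).
        self.locs = {name: (block, i) for i, name in enumerate(self.names)}
        self.consolidated = True
        self.item_cache = {}
        self.clear_cache_on_split = clear_cache_on_split

    def get_column(self, name):
        """Label-based access: served from the cache when possible."""
        if name not in self.item_cache:
            block, i = self.locs[name]
            self.item_cache[name] = block[i]  # shares storage with the block
        return self.item_cache[name]

    def iat_get(self, row, col):
        """Integer-based access: always reads the current block."""
        block, i = self.locs[self.names[col]]
        return block[i][row]

    def iat_set(self, row, col, value):
        if self.consolidated:
            self._split()  # the first write splits the shared block
        block, i = self.locs[self.names[col]]
        block[i][row] = value

    def _split(self):
        # Give every column its own freshly copied block.
        for name in self.names:
            block, i = self.locs[name]
            self.locs[name] = ([list(block[i])], 0)
        self.consolidated = False
        if self.clear_cache_on_split:
            self.item_cache.clear()  # the step missing in the buggy path


def read_y(clear_cache_on_split: bool) -> tuple:
    frame = ToyFrame({"x": range(8), "y": [0] * 8}, clear_cache_on_split)
    frame.get_column("y")      # populate the cache, like data = df['y']
    frame.iat_set(7, 0, 9999)  # first write splits the block
    frame.iat_set(7, 1, 1234)
    return frame.get_column("y")[7], frame.iat_get(7, 1)


print(read_y(clear_cache_on_split=False))  # (0, 1234): stale label read
print(read_y(clear_cache_on_split=True))   # (1234, 1234)
```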

jbrockmendel · 2022-01-28T22:45:26Z

An edit that fixes this case (~~haven't run the test suite with it though~~) is to go into DataFrame._set_value and change the series._set_value calls to series._mgr.setitem_inplace(loc, value), which avoids splitting the block.
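
The suggested fix writes in place so that no block split happens. An analogy with NumPy views shows why that keeps a cached column valid; it is not the actual pandas code path, and the function name cached_y_after_write is hypothetical. A view into the original block sees an in-place write, but not a write into a freshly copied block.

```python
import numpy as np


def cached_y_after_write(split_before_write: bool) -> int:
    # Consolidated int64 block: row 0 holds column x, row 1 holds column y.
    block = np.zeros((2, 8), dtype=np.int64)
    block[0] = np.arange(8)

    # A cached "column y" that is a view into the block (shares memory).
    cached_y = block[1]

    if split_before_write:
        # Block split: y moves into its own freshly copied array.
        y_storage = block[1].copy()
    else:
        # In-place path: keep writing into the original block.
        y_storage = block[1]

    y_storage[7] = 1234
    return int(cached_y[7])


print(cached_y_after_write(split_before_write=False))  # 1234
print(cached_y_after_write(split_before_write=True))   # 0
```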

simonjayhawkins · 2022-02-08T12:09:30Z

> cc @jbrockmendel The cache of the DataFrame looks off, could have been caused by #43406

Can confirm first bad commit: [03dd698] BUG: DataFrame.setitem sometimes operating inplace (#43406)

simonjayhawkins · 2022-02-08T12:11:44Z

> an edit that fixes this case (haven't run the test suite with it though) is to go into DataFrame._set_value and change the series._set_value calls to series._mgr.setitem_inplace(loc, value), which avoids a block-splitting

That applies on master; on 1.4.x, series._mgr.setitem_inplace(loc, value) is the existing, original code.

epmojo added the Bug and Needs Triage labels Jan 28, 2022

phofl added the Indexing label Jan 28, 2022

jbrockmendel mentioned this issue Jan 30, 2022 in BUG: Frame.iat item_cache invalidation bug #45706 (merged)

phofl added the Regression label and removed the Needs Triage label Jan 30, 2022

phofl added this to the 1.4.1 milestone Jan 30, 2022

jreback closed this as completed in #45706 Jan 30, 2022

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Feb 8, 2022: code sample for pandas-dev#45684 (f12f6e8)

simonjayhawkins mentioned this issue Jun 8, 2022 in BUG: Memory leak when setting Series value via __setitem__ #47172 (closed)
