
DataFrame.iat will create new column if .iat is used to set None on int Series #23236

Closed
jimmywan opened this issue Oct 19, 2018 · 5 comments

4 participants
@jimmywan

commented Oct 19, 2018

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.DataFrame({'a':[0,1],'b':[4,5]})
>>> df
   a  b
0  0  4
1  1  5
>>> df.iat[0, 0] = None
>>> df
   a  b   0
0  0  4 NaN
1  1  5 NaN

Problem description

This is problematic for several reasons.

  • Inconsistency between iloc and iat.
  • Creating a brand-new column is almost certainly not the intended or expected behavior.
  • At the very least, I would expect it to abort the operation with a warning about incompatible types.

This is likely related to the non-intuitive behavior of Series, which has already been documented here:
#20643 (comment)
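The first point (iloc and iat disagreeing) can be checked mechanically. A minimal sketch, assuming pandas is importable; `iat_matches_iloc` is a hypothetical helper, not a pandas API. It applies the same positional assignment through `.iloc` and through `.iat` on separate copies and compares the results with `DataFrame.equals`, which requires the same shape, elements and column dtypes and treats NaNs in the same position as equal. Column `a` is created as float64 so the check does not depend on how a particular pandas version upcasts int64; per this report, running it on the original int64 frame with pandas 0.22 or 0.23 would be expected to return `False`.

```python
import pandas as pd


def iat_matches_iloc(df, row, col, value):
    """Hypothetical helper: True if setting the cell at (row, col) by position
    gives the same frame whether it is done via .iloc or via .iat."""
    via_iloc = df.copy()
    via_iloc.iloc[row, col] = value
    via_iat = df.copy()
    via_iat.iat[row, col] = value
    # equals() requires matching shape, elements and column dtypes, so an
    # extra column created by .iat would make this return False.
    return via_iat.equals(via_iloc)


# Float column, so no int64 -> float64 upcast is involved.
df = pd.DataFrame({'a': [0.0, 1.0], 'b': [4, 5]})
print(iat_matches_iloc(df, 0, 0, None))  # True
```

Working on copies keeps the input frame untouched, so the same helper can be reused across several positions and values.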

Expected Output

I would expect it to do what it does when using .iloc:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':[0,1],'b':[4,5]})
>>> df
   a  b
0  0  4
1  1  5
>>> df.iloc[0, 0] = None
>>> df
     a  b
0  NaN  4
1  1.0  5
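The expected output above relies on column `a` being upcast from int64 to float64 so that it can hold NaN. One way to blank out a value without leaving the integer dtype, an alternative not raised in this thread, is pandas' nullable `Int64` extension dtype, which tracks missing values in a mask next to the integers. A minimal sketch, assuming a pandas release that ships that dtype (it was added in 0.24):

```python
import pandas as pd

# Convert only column 'a' to the nullable integer dtype before assigning.
df = pd.DataFrame({'a': [0, 1], 'b': [4, 5]}).astype({'a': 'Int64'})

# None is stored as a missing value; the column stays integer-typed
# and no new column is created.
df.iat[0, 0] = None
print(df.dtypes)
print(df)
```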

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jimmywan

Author

commented Oct 19, 2018

I tried it with 0.23.4 and had the same output.

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@mroeschke

Member

commented Oct 19, 2018

Thanks for the report. Investigations and PRs welcome!

@aditya0811


commented Oct 22, 2018

I think we can conclude that, since df.iat is capable of creating and setting values, it will create a column index entry for 0, which is not present in the column index (a, b).
We cannot access those values using iat if the column index is of string (char) type.

@mroeschke

Member

commented Oct 22, 2018

iat indexes by integer position, not by label, so it shouldn't matter whether 0 is in the columns; it should modify the value in row 0, column a.

IIRC, iat and at don't perform as many data validation checks, so this may be a "fallback" assignment and broadcasting.
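To make the position/label distinction concrete, here is a small sketch (assuming pandas is installed; the frame is made up for illustration) whose integer column labels do not match their positions, so `iat` (by position) and `at` (by label) read different cells:

```python
import pandas as pd

# Column label 0 is deliberately placed at position 1, not position 0.
df = pd.DataFrame({10: [1, 2], 0: [3, 4]})

print(df.iat[0, 0])  # 1: row position 0, column position 0 (label 10)
print(df.at[0, 0])   # 3: row label 0, column label 0 (position 1)
```

Under this reading, `df.iat[0, 0] = None` on the frame from the report should only ever touch the first column, whatever its label.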

@jimmywan

Author

commented Oct 23, 2018

I do not agree with @aditya0811:

I think we can conclude that, since df.iat is capable of creating and setting values, it will create a column index entry for 0, which is not present in the column index (a, b).

As explained by @mroeschke here:

iat indexes by integer position, not by label, so it shouldn't matter whether 0 is in the columns; it should modify the value in row 0, column a.

RoeiRaz added a commit to RoeiRaz/pandas that referenced this issue Dec 30, 2018

BUG: fix .iat assignment creates a new column
- in response to pandas-devgh-23236
- changes the fallback of .iat to .iloc on type error
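The commit message describes the fix as changing the fallback that .iat takes on a type error so that it goes through .iloc. The toy model below sketches that idea under stated assumptions and is not pandas' actual code: `ToyFrame`, `_set_by_label` and `iat_set` are hypothetical names, the fast path is a direct in-place numpy write, and the slow path stands in for label-based `.loc` assignment. With `fixed=False` the raw position is reused as a label, which reproduces the extra column from the report; with `fixed=True` the fallback keeps targeting column `a`, as `.iloc` would.

```python
import numpy as np


class ToyFrame:
    """Hypothetical, heavily simplified column store; NOT pandas' internals.

    Columns live in an insertion-ordered dict of label -> numpy array, so a
    column's position is simply its index in that dict.
    """

    def __init__(self, data):
        self.data = {label: np.asarray(values) for label, values in data.items()}
        self.nrows = len(next(iter(self.data.values())))

    def _set_by_label(self, row, label, value):
        # General (slow) path standing in for `.loc`: a missing label creates
        # a new all-NaN column, and None is stored as NaN after upcasting.
        arr = self.data.get(label, np.full(self.nrows, np.nan))
        if value is None:
            arr, value = arr.astype(np.float64), np.nan
        else:
            arr = arr.copy()
        arr[row] = value
        self.data[label] = arr

    def iat_set(self, row, col, value, fixed=True):
        label = list(self.data)[col]  # translate the position into a label
        try:
            # Fast in-place path: numpy raises TypeError when None is
            # written into an integer array.
            self.data[label][row] = value
        except TypeError:
            if fixed:
                # Post-fix idea: the fallback still addresses the column the
                # position referred to (what `.iloc` would do).
                self._set_by_label(row, label, value)
            else:
                # Pre-fix idea: the raw position is reused as a column label,
                # so position 0 becomes a brand-new column labelled 0.
                self._set_by_label(row, col, value)


buggy = ToyFrame({'a': [0, 1], 'b': [4, 5]})
buggy.iat_set(0, 0, None, fixed=False)
print(list(buggy.data))  # ['a', 'b', 0]

fixed = ToyFrame({'a': [0, 1], 'b': [4, 5]})
fixed.iat_set(0, 0, None)
print(list(fixed.data), fixed.data['a'])
```

In the buggy branch column `a` is left as [0, 1] and the new column 0 is all NaN, matching the first output in the report; the fixed branch matches the .iloc output instead.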

RoeiRaz referenced this issue Dec 30, 2018

Merged

BUG: fix .iat assignment creates a new column #24495

4 of 4 tasks complete

jreback added this to the 0.24.0 milestone Dec 30, 2018
