BUG: assignment with .at or .loc modifies dataframe when it fails #15490

Open
allComputableThings opened this issue Feb 23, 2017 · 7 comments
Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)

@allComputableThings

Code Sample, a copy-pastable example if possible

import pandas as pd

d = pd.DataFrame()
try:
    # Storing a list in a single cell fails, as it should
    d.set_value(0, "c", [1, 2, 4])
except Exception:
    print("Caught")

print(d)  # c is now in d

Problem description

The above call to set_value (correctly) fails and throws an exception. However, the dataframe has been modified: it now has a new 'c' column of type float. Because the operation failed, the dataframe should probably have remained unchanged.

Expected Output

Printing d should show an empty dataframe.
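
Until the assignment itself is all-or-nothing, one possible workaround is to run it on a copy and keep the result only if it succeeds. The sketch below is not part of pandas: apply_or_keep and set_list_in_cell are hypothetical names, and the helper assumes only that the frame has a .copy() method (as DataFrame does).

```python
def apply_or_keep(frame, mutate):
    """Run mutate(candidate) on a copy of frame (hypothetical helper).

    Returns the mutated copy on success. If mutate raises, the exception
    propagates and the caller's original frame is untouched, because only
    the copy was ever modified.
    """
    candidate = frame.copy()  # DataFrame.copy() makes a deep copy by default
    mutate(candidate)
    return candidate


def set_list_in_cell(frame):
    # The assignment from the report, written with .at instead of set_value.
    frame.at[0, "c"] = [1, 2, 4]
```

With d = pd.DataFrame(), calling d = apply_or_keep(d, set_list_in_cell) inside a try block rebinds d only if the assignment succeeds; if it raises, d still refers to the original frame. The price is a full copy on every call, so this suits small frames or rare writes.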

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-41-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.1
pip: 1.5.4
setuptools: 3.3
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.1
sphinx: None
patsy: 0.2.1
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: 4.2.1
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: 1.0.15
pymysql: None
psycopg2: 2.4.5 (dt dec mx pq3 ext)
jinja2: 2.8
boto: 2.41.0
pandas_datareader: None

@jreback
Contributor

jreback commented Feb 23, 2017

The well-defined and documented indexers (e.g. .loc) have these types of guarantees, but these functions are being deprecated: #15269

So unless someone puts forth a fix, this is a won't fix.

jreback closed this as completed Feb 23, 2017
jreback added the Error Reporting and Indexing labels Feb 23, 2017
jreback added this to the won't fix milestone Feb 23, 2017
@jreback
Contributor

jreback commented Feb 23, 2017

Actually, I spoke too soon about the other indexers. It depends on whether there are multiple operations or not. Honestly, though, you are pushing the boundaries here; these are not atomic operations. That said, if you'd like to push a fix, we would take it.

In [19]: d = pd.DataFrame()

In [20]: d.at[4,1] = [1,2,3]
ValueError: setting an array element with a sequence.

In [21]: d
Out[21]: 
    1
4 NaN

Whereas this is fine:

In [22]: d = pd.DataFrame()
    ...: 

In [23]: d.loc[4] = [1,2,3]
ValueError: cannot set a frame with no defined columns

In [24]: d
Out[24]: 
Empty DataFrame
Columns: []
Index: []

@jorisvandenbossche
Member

@jreback the same happens with loc if you specify both dims:

In [29]: d = pd.DataFrame()

In [30]: d.loc[4, 3] = [1,2,3]
...
ValueError: setting an array element with a sequence.

In [31]: d
Out[31]: 
    3
4 NaN

@jreback
Contributor

jreback commented Feb 25, 2017

Of course, this is a two-stage operation in the implementation.
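
To make the two stages concrete, here is a toy model in plain Python. It is not pandas's implementation; ToyFrame and both method names are made up for illustration. The first method enlarges and then stores the value, so a failure in the second stage leaves the NaN placeholder behind. The second converts the value before touching any state.

```python
import math


class ToyFrame:
    """Toy stand-in for a frame: maps (row, column) pairs to float cells."""

    def __init__(self):
        self.cells = {}

    def set_two_stage(self, row, col, value):
        # Stage 1: enlarge, creating the missing cell filled with NaN.
        self.cells.setdefault((row, col), math.nan)
        # Stage 2: store the value; float() rejects a list with TypeError.
        self.cells[(row, col)] = float(value)

    def set_validate_first(self, row, col, value):
        # Convert before touching any state, so a failure changes nothing.
        converted = float(value)
        self.cells[(row, col)] = converted


frame = ToyFrame()
try:
    frame.set_two_stage(4, 1, [1, 2, 3])
except TypeError:
    pass
print(frame.cells)  # the (4, 1) cell was created and still holds NaN

frame = ToyFrame()
try:
    frame.set_validate_first(4, 1, [1, 2, 3])
except TypeError:
    pass
print(frame.cells)  # prints {}
```

Validating first is only possible when the check can be done without enlarging; when it cannot, the alternative is to enlarge a copy and swap it in on success.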

@allComputableThings
Author

... hmmm, so the caller should guess at the implementation to infer the behavior?
Generally, any code that throws an exception is better off if it is idempotent. Could we leave this as an open bug? Clearly it shouldn't modify the dataframe, and it is therefore a bug, even if the fix is not obvious.

@jorisvandenbossche
Member

Given that the behaviour is also present in non-deprecated functionality (.at, .loc), I agree we can reopen this.
It will still need someone diving into it to fix it.

jorisvandenbossche changed the milestone from won't fix to Someday Feb 26, 2018
jorisvandenbossche changed the title from "set_value modifies dataframe when fails" to "BUG: assignment with .at or .loc modifies dataframe when it fails" Feb 26, 2018
mroeschke removed the Error Reporting label May 8, 2021
mroeschke removed this from the Someday milestone Oct 13, 2022
@bt-

bt- commented Nov 16, 2022

Is the result from this example related? I was surprised that this did not raise an error, given that evaluating the left side of the last assignment on its own does.

import pandas as pd

d = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Gives KeyError: "['C', 'D'] not in index"
# d.loc[:, ['A', 'B', 'C', 'D']]
# Gives no error and adds columns 'C' and 'D' with NaNs
d.loc[:, ['A', 'B', 'C', 'D']] = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})
d
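
If the silent enlargement is not wanted, one option is to check the labels explicitly before assigning. A minimal sketch, assuming a hypothetical helper require_columns (not a pandas function) that relies only on membership tests against frame.columns:

```python
def require_columns(frame, labels):
    """Raise KeyError if any label is missing from frame.columns.

    Hypothetical guard, meant to be called before an assignment that
    would otherwise add the missing columns.
    """
    missing = [label for label in labels if label not in frame.columns]
    if missing:
        raise KeyError(f"{missing} not in columns")
```

Calling require_columns(d, ['A', 'B', 'C', 'D']) before the .loc assignment above would raise a KeyError naming 'C' and 'D', and d would be left as it was because the assignment is never reached.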
