BUG: df[col] = arr should not overwrite data in df[col] #35417

jbrockmendel · 2020-07-26T22:39:51Z

closes REGR: setting column with setitem should not modify existing array inplace #33457
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

cc @jorisvandenbossche this still fails 7 tests locally and there's one more (commented in-line) test that looks fishy. Extra eyeballs would be welcome.

xref #35271, #35266

pep8speaks · 2020-07-26T22:40:00Z

Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-12-29 22:43:33 UTC

TomAugspurger · 2020-07-27T11:08:00Z

What's the summary of the behavior change from 1.0.5? DataFrame.__setitem__[array] will not mutate the existing array inplace? What dtypes does this affect?

TomAugspurger · 2020-07-27T13:02:24Z

@jbrockmendel I attempted a whatsnew in e600237, if you could take a look.

TomAugspurger · 2020-07-27T13:42:35Z

Going through the failing tests

test_fancy_getitem_slice_mixed: OK
TestDataFrameIndexing.test_iloc_row: OK
TestDataFrameIndexing.test_iloc_col: OK
TestiLoc2.test_identity_slice_returns_new_object: ... Probably OK
TestiLoc2.test_iloc_setitem_categorical_updates_inplace: Probably OK
TestMerge.test_merge_nocopy: Probably OK

Looking into test_apply_function_with_indexing some more now.

TomAugspurger · 2020-07-27T14:04:35Z

One question on desired behavior: df.loc[:, "A"] = value should mutate the array inplace, right? On this branch, that is lost:

In [2]: df = pd.DataFrame({"A": [1, 2, 3]})

In [3]: df2 = df.iloc[:]

In [4]: df._mgr.blocks[0].values is df2._mgr.blocks[0].values
Out[4]: True

In [5]: df.loc[:, "A"] = 0

In [6]: df2
Out[6]:
   A
0  1
1  2
2  3
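For readers following along, the aliasing question can be modeled without pandas. The sketch below is a toy, not pandas internals: `ToyFrame`, `view`, and `set_inplace` are hypothetical names invented for illustration. Columns are plain Python lists. `__setitem__` stands in for `df[col] = arr` as this PR intends it (bind a new array, leave the old one alone), and `set_inplace` stands in for `df.loc[:, col] = value` (write into the existing array).

```python
class ToyFrame:
    """Toy stand-in for a DataFrame: maps column names to Python lists.

    Hypothetical illustration only; this is not how pandas stores data.
    """

    def __init__(self, columns):
        # A new dict, but the list objects themselves are shared, not copied.
        self._columns = dict(columns)

    def view(self):
        # Analogue of df.iloc[:]: a new frame object sharing the same lists.
        return ToyFrame(self._columns)

    def __getitem__(self, name):
        return self._columns[name]

    def __setitem__(self, name, values):
        # Analogue of df[col] = arr as intended by this PR: bind a NEW list
        # to the name and leave the previously stored list untouched.
        self._columns[name] = list(values)

    def set_inplace(self, name, value):
        # Analogue of df.loc[:, col] = value: write into the existing list.
        column = self._columns[name]
        column[:] = [value] * len(column)


df = ToyFrame({"A": [1, 2, 3]})
df2 = df.view()
assert df["A"] is df2["A"]   # both frames share one list

df["A"] = [7, 8, 9]          # rebinding: df2 still holds the old list
print(df2["A"])              # [1, 2, 3]

df3 = df.view()
df.set_inplace("A", 0)       # in-place write: visible through df3
print(df3["A"])              # [0, 0, 0]
```

In this toy, the session above corresponds to the `set_inplace` path: a view taken before the write is expected to see the new values.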

jbrockmendel · 2020-07-27T16:50:26Z

> One question on desired behavior: df.loc[:, "A"] = value should mutate the array inplace, right? On this branch, that is lost:

Agreed. I think that's the behavior with the FIXME comment in test_block_internals.

doc/source/whatsnew/v1.1.0.rst

jbrockmendel · 2020-07-27T18:19:19Z

> test_fancy_getitem_slice_mixed: OK

The relevant part of this test reads:

        sliced = float_frame.iloc[:, -3:]

        msg = r"\nA value is trying to be set on a copy of a slice from a DataFrame"
        with pytest.raises(com.SettingWithCopyError, match=msg):
            sliced["C"] = 4.0

        assert (float_frame["C"] == 4).all()

I think with the new behavior, the last assertion is now incorrect, as we expect the setitem to create a new array. My question is: do we still need the SettingWithCopyError?

TomAugspurger · 2020-07-27T18:56:33Z

Sorry, by "OK" I meant OK with updating the tests for the new behavior.

> My question is: do we still need the SettingWithCopyError?

I was wondering this as well. If the warning is about whether or not float_frame is updated then it seems like it can be removed. But perhaps the test should be updated to use sliced.loc[:, "C"] = 4.0, in which case the warning is still valid?

jbrockmendel · 2020-07-28T16:56:04Z

Down to two failing tests:

FAILED pandas/tests/groupby/test_apply_mutate.py::test_apply_function_with_indexing - AssertionError: Series are different
FAILED pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_nocopy - AssertionError: assert False

I've tracked test_apply_function_with_indexing down into libreduction.apply_frame_axis0.

simonjayhawkins · 2020-07-29T09:56:42Z

@jbrockmendel The 1.2 whatsnew is now merged. Also, can you add back the release note that was deleted in #35271?

jorisvandenbossche · 2021-06-08T14:28:47Z

I am not comfortable with merging this right before the RC.

(and sorry for my slow response here (I have been quite absent the last weeks for several reasons), which is part of why it's delayed until right before the RC ...)

That said, in addition I am also not yet convinced that this PR is worth the 1) subtle breaking changes and 2) performance implications.

> Does that summarize the perf concern accurately?

I think so, yes. The main point is that a setitem operation, which before this PR could be fully inplace, can now trigger a copy of (almost) the full DataFrame, either directly or delayed until the next consolidation. This additional data copy has an inherent cost. And I don't think it can be avoided while preserving the original intent of this PR (i.e. not overwriting existing data).

But it's not only the performance implications, it is also the subtle API changes. And of course it's always a trade-off. In many cases a slight performance degradation and some behaviour changes can be worth it for the benefit of the change. I am just not convinced that the benefits of this PR are big enough.
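The cost being described can be sketched with a toy model of a consolidated block. This is a simplified illustration under stated assumptions, not pandas' BlockManager: `ToyBlock`, `setitem_inplace`, and `setitem_new_array` are hypothetical names, and the "copy count" only tallies elements moved in this toy. It assumes several columns live in one shared table, so replacing one column without touching the old table means copying the remaining columns out.

```python
class ToyBlock:
    """Toy 'consolidated block': several columns stored in one row-major table.

    Hypothetical model for illustration; pandas' real block layout differs.
    """

    def __init__(self, names, rows):
        self.names = list(names)
        self.rows = rows  # rows[i][j] is column j at row i


def setitem_inplace(block, name, values):
    """Overwrite one column inside the shared table; return elements written."""
    j = block.names.index(name)
    for row, value in zip(block.rows, values):
        row[j] = value
    return len(block.rows)


def setitem_new_array(block, name, values):
    """Replace one column without touching the old table.

    Returns the resulting blocks and the number of old elements copied.
    """
    j = block.names.index(name)
    kept_names = block.names[:j] + block.names[j + 1:]
    # Every remaining column is copied out of the old table: this is the
    # "(almost) full" copy in this toy model.
    kept_rows = [row[:j] + row[j + 1:] for row in block.rows]
    new_column = ToyBlock([name], [[value] for value in values])
    copied = len(kept_rows) * len(kept_names)
    return [ToyBlock(kept_names, kept_rows), new_column], copied


block = ToyBlock(["A", "B", "C"], [[1, 2, 3], [4, 5, 6]])

blocks, copied = setitem_new_array(block, "A", [10, 40])
print(copied)       # 4: columns B and C, two rows each
print(block.rows)   # [[1, 2, 3], [4, 5, 6]], the old table is untouched

written = setitem_inplace(block, "A", [10, 40])
print(written)      # 2: only column A is written
print(block.rows)   # [[10, 2, 3], [40, 5, 6]]
```

In the toy, the in-place path touches one column while the non-overwriting path copies every other column, which is the trade-off being weighed here.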

jbrockmendel · 2021-06-08T14:41:03Z

So we're roughly where we've always been: I think fixing this inconsistency is a bugfix, Joris thinks it's an API change.

simonjayhawkins · 2021-06-08T14:44:08Z

> So we're roughly where we've always been: I think fixing this inconsistency is a bugfix, Joris thinks it's an API change.

There's a dev meeting tomorrow where this can be discussed.

jreback · 2021-12-20T01:41:33Z

pandas/core/indexing.py

@@ -693,8 +694,23 @@ def _ensure_listlike_indexer(self, key, axis=None, value=None):
            # GH#38148
            keys = self.obj.columns.union(key, sort=False)

+            # Try to get the right dtype when we do this reindex.


doesn't infer_dtype_from do this?

jbrockmendel · Dec 20, 2021

That infers a dtype, but we need to pass a scalar fill_value to reindex_axis.

jbrockmendel · Dec 31, 2021

Removed this edit as no longer necessary. I think at this point all the controversial bits are gone and we're down to just the bugfix.

jbrockmendel · 2022-01-26T17:47:02Z

closing in favor of #45352

jbrockmendel added 2 commits July 26, 2020 10:50

Make df[col] insert a new array, never overwrite

a1ce4fc

Merge branch 'master' of https://github.com/pandas-dev/pandas into 33457

b1913b7

whatsnew

e600237

TomAugspurger added this to the 1.1 milestone Jul 27, 2020

TomAugspurger added the Indexing Related to indexing on series/frames, not to indexes themselves label Jul 27, 2020

TomAugspurger mentioned this pull request Jul 27, 2020

REGR: setting column with setitem should not modify existing array inplace #33457

Open

jbrockmendel mentioned this pull request Jul 27, 2020

REGR: revert ExtensionBlock.set to be in-place #35271

Merged

TomAugspurger mentioned this pull request Jul 28, 2020

REGR: setting column with setitem should not modify existing array inplace #35266

Closed

TomAugspurger modified the milestones: 1.1, 1.2 Jul 28, 2020

update tests

140f5f2

Merge branch 'master' of https://github.com/pandas-dev/pandas into 33457

c096c5d

jbrockmendel mentioned this pull request Jul 28, 2020

REF: dont set ndarray.data in libreduction #34997

Closed

jbrockmendel added 6 commits July 29, 2020 20:08

Merge branch 'master' of https://github.com/pandas-dev/pandas into 33457

cbd45e8

update whatsnew

989ba97

Merge branch 'master' of https://github.com/pandas-dev/pandas into 33457

bf6e5f5

Merge branch 'master' of https://github.com/pandas-dev/pandas into 33457

a5ffd10

Merge branch 'master' of https://github.com/pandas-dev/pandas into 33457

e126ab5

Merge branch 'master' of https://github.com/pandas-dev/pandas into 33457

8c4f9f3

jreback modified the milestones: 1.3, 1.4, 2.0 Jun 9, 2021

jbrockmendel added 12 commits November 8, 2021 10:02

Merge branch 'master' into 33457

0e8f671

Merge branch 'master' into 33457

3cdeeb4

Merge branch 'master' into 33457

acd3514

Merge branch 'master' into 33457

d97a1ac

ArrayManager fixup

4342f5d

Merge branch 'master' into 33457

f4dafc6

avoid warning

bebb12f

Merge branch 'master' into 33457

04475e3

Merge branch 'master' into 33457

ed6f3ec

fixed on AM

fe9fe66

Merge branch 'master' into 33457

1ae50bf

Merge branch 'master' into 33457

9d32c62

jreback requested changes Dec 20, 2021

jbrockmendel added 4 commits December 29, 2021 10:45

Merge branch 'master' into 33457

8972875

revert whitespace change

6524331

revert no-longer-necessary

123568d

Merge branch 'master' into 33457

8bef37a

jbrockmendel modified the milestones: 2.0, 1.5 Jan 6, 2022

jbrockmendel mentioned this pull request Jan 13, 2022

BUG: df.iloc[:, 0] = df.iloc[::-1, 0] not setting inplace for EAs #45352

Merged

jbrockmendel closed this Jan 26, 2022

jbrockmendel deleted the 33457 branch January 26, 2022 17:47
