Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: revert ExtensionBlock.set to be in-place #35271

Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Jul 14, 2020

Alternative for #35266 and closes #35369

@jorisvandenbossche
Copy link
Member Author

This is a more targeted revert (so a smaller change compared to #35266), but thus also not keeping the fix from #35266.

For 1.1, I however think it might be better to fix the regression in __setitem__ than to keep the fix in iloc. My reasoning: it seems worse to have a regression where an existing array is updated where it should not have been modiifed, as to have a bug where an existing array doesn't get updated where it should have been updated in place (in addition to that the bug already existed before).

@TomAugspurger
Copy link
Contributor

cc @jreback and @jbrockmendel. Do you have a feeling for which of #35266 or this you want to proceed with for fixing #33457?

@jbrockmendel
Copy link
Member

@TomAugspurger I'm planning to take a swing at this today.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version labels Jul 16, 2020
@jreback
Copy link
Contributor

jreback commented Jul 16, 2020

this lgtm

@jbrockmendel
Copy link
Member

@jreback pls put a hold on this pending resolution of #33457 (comment)

@simonjayhawkins
Copy link
Member

This PR would also fix #35369

@jbrockmendel
Copy link
Member

@simonjayhawkins would #35369 be addressed by removing the blk.should_store(value) case in BlockManager.iset? The discussion in #33457 is pointing towards that being the "correct" approach for this.

@simonjayhawkins
Copy link
Member

@simonjayhawkins would #35369 be addressed by removing the blk.should_store(value) case in BlockManager.iset? The discussion in #33457 is pointing towards that being the "correct" approach for this.

yes, test added here passes with that patch.

@TomAugspurger
Copy link
Contributor

@jbrockmendel I think this is the one to go with for 1.1.0. What are your thoughts?

@jbrockmendel
Copy link
Member

Since this is un-fixing a tested bug in order to fix an un-tested bug, I'd prefer to keep this as-is until we get the longer-term solution in place (hopefully 1.1.1), but not a particularly strong preference.

@simonjayhawkins
Copy link
Member

Since this is un-fixing a tested bug in order to fix an un-tested bug

FWIW, I think fixing regressions should take precedence. We don't know how many more users will be affected once 1.1 is out.

If we can do both and keep the bug fix great otherwise we should revert IMO.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 27, 2020

Since this is un-fixing a tested bug in order to fix an un-tested bug, I'd prefer to keep this as-is

Which bugs are those? The "un-fixing" is the DataFrame.__setitem__ with Categorical dtype I think, but I'm not sure about the un-tested bug. And what does "as-is" mean here, 1.0.x or master?

@simonjayhawkins I'm interpreting your comment as a preference for this PR for 1.1.0, is that fair?

@simonjayhawkins
Copy link
Member

@simonjayhawkins I'm interpreting your comment as a preference for this PR for 1.1.0, is that fair?

yes

@jbrockmendel
Copy link
Member

Which bugs are those? The "un-fixing" is the DataFrame.setitem with Categorical dtype I think, but I'm not sure about the un-tested bug. And what does "as-is" mean here, 1.0.x or master?

The "un-fixing" is df.iloc[:, n] = foo, see the existing test that this PR xfails and the whatsnew note it removes.

The "un-tested bug" I'm referring to is the behavior that this PR fixes+tests, df[col] = foo.

ATM in master df[col] = foo consistently overwrites for all dtypes (conditional on should_store), while this PR (and 1.0.x) is inconsistent about it. I don't think that inconsistency is intentional, so it should be considered a bug.

By the same token, if we're not going to call df[col] = foo over-writing a bug for non-EA dtypes, then I don't think this should be considered a regression.

I think we're agreed that #35417 is the correct long-term fix. If, as @TomAugspurger suggested here, that needs to wait for 2.x, then I'd rather have internally consistent behavior in the interim (but I'd rather #35417 not wait for 2.x, in which case I'm OK with behavior being inconsistent for a shorter period)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 27, 2020

meta-comment: I don't want to get too hung up on labeling these as "bug or not" or whether something is a regression; I'm more concerned with breaking people's workflows.

@jbrockmendel I'm still unsure what your preference is for specifically 1.1.0 (I longer-term is clearer). Is this OK to merge?

@jbrockmendel
Copy link
Member

I'm still unsure what your preference is for specifically 1.1.0 (I longer-term is clearer). Is this OK to merge?

I'm OK with this conditional on #35417 not being held up until 2.0

@TomAugspurger
Copy link
Contributor

OK, thanks.

I think I can be on board with "make DataFrame.__setitem__ consistently assigning a new array". I'll clarify / change my comment over in #33457 about it waiting for 2.x.

@TomAugspurger TomAugspurger merged commit 6302f7b into pandas-dev:master Jul 27, 2020
@jorisvandenbossche jorisvandenbossche deleted the EA-setitem-regression2 branch August 18, 2020 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: df reassignment following reorder_categories changed behavior in 1.1.0rc0
5 participants