Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: Fix fillna making a copy when dict was given as fill value and inplace is set #47327

Merged
merged 10 commits into from Jun 21, 2022

Conversation

phofl
Copy link
Member

@phofl phofl commented Jun 13, 2022

I think updating inplace is sensible here? This is a subtle change so I think we should backport, because the fact that we are using setitem under th hood should not leak into the interface. If we want to change this, we should deprecate, but only makes sens if we don't get Copy on Write in for 2.0

@phofl phofl added Regression Functionality that used to work in a prior pandas version Copy / view semantics labels Jun 13, 2022
@phofl phofl added this to the 1.4.3 milestone Jun 13, 2022
@simonjayhawkins
Copy link
Member

I think updating inplace is sensible here? This is a subtle change so I think we should backport, because the fact that we are using setitem under th hood should not leak into the interface. If we want to change this, we should deprecate, but only makes sens if we don't get Copy on Write in for 2.0

@phofl can you also add your comments to the issue, #47188 to avoid having the discussion in more than one place as I think the expected behavior has not yet been confirmed.

@phofl
Copy link
Member Author

phofl commented Jun 13, 2022

Done

@simonjayhawkins
Copy link
Member

Done

Thanks. The fix here is different to suggested by @jbrockmendel in #47188 (comment) so requesting a review from Brock.

@phofl
Copy link
Member Author

phofl commented Jun 13, 2022

Yep, sure.

Somehow this fails on the ci while passing locally, will have to investigate

@jbrockmendel
Copy link
Member

Makes sense to me. The solution I suggested in #47188 might avoid creating an intermediate array, but I think without the solution here it would be a wash. So it might be worth looking into combining the two at some point.

Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending green

@jreback
Copy link
Contributor

jreback commented Jun 17, 2022

@@ -1183,6 +1183,9 @@ def fillna(
# test_fillna_dtype_conversion_equiv_replace
nbs = self.where(value, ~mask.T, _downcast=False)

if inplace:
return nbs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can just return early in the if inplace: block above?

@simonjayhawkins
Copy link
Member

Somehow this fails on the ci while passing locally, will have to investigate

yep. I see it passing locally. you don't need the change to pandas/core/internals/blocks.py in this PR as the code in pandas/core/generic.py doesn't pass the inplace argument.

perhaps remove that and see what fails?

@phofl
Copy link
Member Author

phofl commented Jun 20, 2022

Failure is the same, was just interested if it would make any difference.

Tried setting up a new env and tried editing 1.4.2 release, but can not reproduce the failure...

@simonjayhawkins
Copy link
Member

but can not reproduce the failure...

some of the failures are for duplicate labels. maybe loc and setitem handle them differently, although that doesn't explain why it works on one platform and not another.

@phofl
Copy link
Member Author

phofl commented Jun 20, 2022

I think the duplicated failures are just deprecation warnings, which is expected I think.

What really bugs me is the new test that fails

@simonjayhawkins
Copy link
Member

What really bugs me is the new test that fails

yeah, difficult without being able to reproduce. maybe just use the ci for troubleshooting and mark this as draft while debugging.

@phofl phofl marked this pull request as draft June 20, 2022 20:02
@phofl
Copy link
Member Author

phofl commented Jun 20, 2022

This is really tricky, don't really have an idea where to start, because this should work

@simonjayhawkins
Copy link
Member

This is really tricky, don't really have an idea where to start, because this should work

it's failing on the array manager tests? can reproduce with

PANDAS_DATA_MANAGER=array pytest pandas/tests/frame/methods/test_fillna.py -k test_inplace_dict_update_view

can probably just skip them.

@phofl
Copy link
Member Author

phofl commented Jun 20, 2022

Ah we are running the tests twice, I did not know that

@simonjayhawkins
Copy link
Member

Thanks @phofl lgtm except can remove the change to pandas/core/internals/blocks.py?

_maybe_downcast short-circuits if downcast is False, so I assume that the inplace parameter of blocks.fillna maybe an "if can do" and that returning early might change some untested downcasting behavior. Not checked in detail, but if not needed for this PR, since backporting maybe safer not to change?

@phofl
Copy link
Member Author

phofl commented Jun 21, 2022

Yep definitely should remove. Not quite finished yet. As @Yikun mentioned, should we also change for eval and update? Could do this in one pr.

@simonjayhawkins
Copy link
Member

Could do this in one pr.

sure.

@phofl
Copy link
Member Author

phofl commented Jun 21, 2022

Removed eval, this is tricky. Update was straightforward

@phofl phofl marked this pull request as ready for review June 21, 2022 13:14
@jreback
Copy link
Contributor

jreback commented Jun 21, 2022

doesn't this overlap with the PR for the performance change in update?

@phofl
Copy link
Member Author

phofl commented Jun 21, 2022

Good point, forgot about that.

This restores behavior from 1.3.5, so the other pr was not only about performance, it would have also tackled a regression. Did not think about that before

@simonjayhawkins
Copy link
Member

merge this and close #47407?

@phofl
Copy link
Member Author

phofl commented Jun 21, 2022

Yes, if @jreback is ok?

I would open a dedicated issue for eval, the other is already pretty long

@jreback
Copy link
Contributor

jreback commented Jun 21, 2022

yes ok here

@simonjayhawkins simonjayhawkins merged commit 3364f9a into pandas-dev:main Jun 21, 2022
@lumberbot-app

This comment was marked as resolved.

@simonjayhawkins
Copy link
Member

Thanks @phofl

There is also #46983. regression from __setitem__ change. I've not looked but might be a similar fix.

@simonjayhawkins
Copy link
Member

I would open a dedicated issue for eval, the other is already pretty long

#47449

@phofl
Copy link
Member Author

phofl commented Jun 21, 2022

Thx

@phofl phofl deleted the 47188 branch June 21, 2022 19:42
simonjayhawkins added a commit that referenced this pull request Jun 22, 2022
…en dict was given as fill value and inplace is set) (#47448)

* Backport PR #47327: REGR: Fix fillna making a copy when dict was given as fill value and inplace is set

Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Copy / view semantics Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: inplace behavior is inconsist for fillna
4 participants