Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix df.where(cond) when cond is empty #21947

Merged

Conversation

Projects
None yet
4 participants
@pajachiet
Copy link
Contributor

commented Jul 17, 2018

  • when cond is empty, cond.dtypes are objects, which raised
    ValueError: Boolean array expected for the condition, not object
  • closes #xxxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@gfyoung

This comment has been minimized.

Copy link
Member

commented Jul 17, 2018

@pajachiet : Thanks for the PR! Can you add a test and a whatsnew entry in 0.24.0.txt as well?

From a higher level, this seems like a corner case to me here: why would you want to pass in an empty object for cond for .where ?

@gfyoung gfyoung added the Algos label Jul 17, 2018

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 17, 2018

how is this not an error?

if you don’t have a conf then where doesn’t make sure

@gfyoung

This comment has been minimized.

Copy link
Member

commented Jul 17, 2018

how is this not an error?

@jreback : That's what I was wondering too myself. Perhaps a better error message is in order?

@codecov

This comment has been minimized.

Copy link

commented Jul 17, 2018

Codecov Report

Merging #21947 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21947      +/-   ##
==========================================
+ Coverage   92.25%   92.25%   +<.01%     
==========================================
  Files         161      161              
  Lines       51169    51170       +1     
==========================================
+ Hits        47207    47208       +1     
  Misses       3962     3962
Flag Coverage Δ
#multiple 90.64% <100%> (ø) ⬆️
#single 42.28% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 96.81% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c6366f5...f930315. Read the comment docs.

@pajachiet

This comment has been minimized.

Copy link
Contributor Author

commented Jul 17, 2018

Hello

I wanted to write a test, but I did not find where were the unit tests for the 'where' function.

This is indeed a corner case, when both df and cond are empty. Example:

import pandas as pd
data = [1]
df = pd.DataFrame(columns=['a'], data=data)
cond = df.applymap(lambda x: x > 0)
df.where(cond)  # Ok

data = []
df = pd.DataFrame(columns=['a'], data=data)
cond = df.applymap(lambda x: x > 0)
df.where(cond)  # Error

This kind of corner case do happen :

  • when you do unit testing
  • when your dataframe comes from real 'empty' data.

I had the same kind of corner case with df.duplicated in previous version of pandas.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 17, 2018

yeah certainly could have a better error message (e.g. you can detect the .empty), but this should just be an error.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 17, 2018

the df empty is ok, but the cond cannot be empty (unless the df itself is empty).

@pajachiet

This comment has been minimized.

Copy link
Contributor Author

commented Jul 17, 2018

Yes, cond can be empty if and only if df is empty.

But it is a bug to raise an error when they are both empty, no ?

As the bug is on cond dtype checking, I thought the easiest way to solve it was to avoid dtypes checking on empty dataframe (as it does not makes sense anyway).

One could also explicitly check for df.empty, but this would be weird as it already works when cond is empty with correct dtypes :

data = []
df = pd.DataFrame(columns=['a'], data=data)
cond = df > 0 
df.where(cond)  # Ok
cond.dtypes  # bool
@jreback
Copy link
Contributor

left a comment

needs a test

add in pandas/tests/frame/test_indexing (you may need one for series, though I don't know if this path is hit, in pandas/tests/series/tests_indexing)

@pep8speaks

This comment has been minimized.

Copy link

commented Jul 25, 2018

Hello @pajachiet! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 09, 2018 at 11:45 Hours UTC

@pajachiet pajachiet force-pushed the pajachiet:fix-bug-where-with-empty-cond-df branch from 8f799db to 6772d0e Jul 25, 2018

@pajachiet pajachiet force-pushed the pajachiet:fix-bug-where-with-empty-cond-df branch 2 times, most recently from d04e677 to 319f1e2 Aug 5, 2018

@pajachiet

This comment has been minimized.

Copy link
Contributor Author

commented Aug 6, 2018

Hello,
I added a simple test as requested.
I changed the commit two times because of pep8 mistake, then trying to fix the build, but it is still failing for a reason I do not understand (it does not seem related to the change).
Could you help fixing it ?

Show resolved Hide resolved pandas/tests/frame/test_indexing.py Outdated
Show resolved Hide resolved pandas/tests/frame/test_indexing.py Outdated
Show resolved Hide resolved pandas/tests/frame/test_indexing.py Outdated
@jreback
Copy link
Contributor

left a comment

pls add a whatsnew entry in v0.24.0 / bug fixes / reshaping

@jreback
Copy link
Contributor

left a comment

use this PR number as the issue number (as I don't think there is an issue about this)

@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 4, 2018

closing as stale. if you'd like to continue, pls ping.

@jreback jreback closed this Nov 4, 2018

@pajachiet

This comment has been minimized.

Copy link
Contributor Author

commented Nov 4, 2018

I believe the bug in the CI was not caused by changes in this PR.

This PR fixed a simple but real bug. I managed to fix it in another way for my client, so I only did this PR if it can improve pandas, especially in those edges cases that can be really annoying.

I added a test, a whatsnew entry and answered your comments.
Could you please manage to merge it ?

Otherwise keep it close, it's too much an investment for such a minor improvment.

@gfyoung

This comment has been minimized.

Copy link
Member

commented Nov 4, 2018

@pajachiet : Do you want to finish this up? I think it requires a rebase, that's all.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 4, 2018

Otherwise keep it close, it's too much an investment for such a minor improvment.

well if it’s a bug then it will probably come up again

it’s always helpful to improve things

@gfyoung

This comment has been minimized.

Copy link
Member

commented Nov 5, 2018

Let's take this for another spin.

@gfyoung gfyoung reopened this Nov 5, 2018

BUG: Fix df.where(cond) when cond is empty
When `cond` is empty, `cond.dtypes` are objects,
which raises a `ValueError`. Now, it does not.

Original author: @pajachiet

@gfyoung gfyoung force-pushed the pajachiet:fix-bug-where-with-empty-cond-df branch from 92ccd06 to 3342c09 Nov 5, 2018

@gfyoung gfyoung added this to the 0.24.0 milestone Nov 5, 2018

@gfyoung gfyoung added the Bug label Nov 5, 2018

@jreback jreback removed this from the 0.24.0 milestone Nov 6, 2018

@gfyoung

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

@jreback : Is there a reason why we can't attempt to merge this for 0.24.0 ? I addressed the last comment (regarding the issue reference). It's also rebased and green. Did you have any other comments to add?

@gfyoung

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

Just saw your email regarding 0.24.0. That answers my question above.

@jreback

jreback approved these changes Nov 6, 2018

Copy link
Contributor

left a comment

minor comment.

Show resolved Hide resolved doc/source/whatsnew/v0.24.0.txt Outdated

@jreback jreback added this to the 0.24.0 milestone Nov 6, 2018

@jreback jreback merged commit eb4e11e into pandas-dev:master Nov 6, 2018

0 of 3 checks passed

pandas-dev.pandas
Details
ci/circleci: py36_locale CircleCI is running your tests
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2018

thanks!

@pajachiet

This comment has been minimized.

Copy link
Contributor Author

commented Nov 6, 2018

Many thanks ! :)

JustinZhengBC added a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

brute4s99 added a commit to brute4s99/pandas that referenced this pull request Nov 19, 2018

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.