Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: BooleanArray.any with all False values and skipna=False is buggy #33253

Closed
jorisvandenbossche opened this issue Apr 3, 2020 · 9 comments · Fixed by #33284
Closed

BUG: BooleanArray.any with all False values and skipna=False is buggy #33253

jorisvandenbossche opened this issue Apr 3, 2020 · 9 comments · Fixed by #33284
Labels
Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays
Milestone

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Apr 3, 2020

From a report on my blog, this seems buggy:

In [10]: a = pd.array([False, False], dtype="boolean")  

In [11]: a  
Out[11]: 
<BooleanArray>
[False, False]
Length: 2, dtype: boolean

In [12]: a.any() 
Out[12]: False

In [13]: a.any(skipna=False)   
Out[13]: <NA>

I don't think there is a reason to return NA if there are no NAs in the array? (even with skipna=False)

We probably need to do a check for the presence of any NA here in the else clause:

if skipna:
return result
else:
if result or len(self) == 0:
return result
else:
return self.dtype.na_value

@jorisvandenbossche jorisvandenbossche added Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Apr 3, 2020
@jorisvandenbossche
Copy link
Member Author

And the same for all() with an all True array:

In [14]: a = pd.array([True, True], dtype="boolean") 

In [15]: a.all() 
Out[15]: True

In [16]: a.all(skipna=False)    
Out[16]: <NA>

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Apr 3, 2020

@TomAugspurger I think it is quite obvious that the result should be same here for skipna=True/False if there are no NAs, but can you do a quick sanity check? (we didn't include thoses cases in the overview table at #29686)

@TomAugspurger
Copy link
Contributor

Agreed with your proposed changes to the return values.

@jorisvandenbossche jorisvandenbossche added this to the 1.0.4 milestone Apr 3, 2020
@linxiaow
Copy link
Contributor

linxiaow commented Apr 4, 2020

It seems that checking if there is any NA in both .any() method and .all() method right after the second else clause may solve this issue based on the table #29686. I am new to open source and just looking around. I was wondering whether I can have a try if you believe this issue is suitable for a beginner?

@jorisvandenbossche
Copy link
Member Author

@linxiaow Yes, this issue should be suitable. A pull request is very welcome!

@linxiaow
Copy link
Contributor

linxiaow commented Apr 4, 2020

I took a look at the code

values = self._data.copy()
np.putmask(values, self._mask, False)
result = values.any()
if skipna:
return result
else:
if result or len(self) == 0:
return result
else:
return self.dtype.na_value

I quickly went through previous code and find a docstring for mask variable
mask : numpy.ndarray
A 1-d boolean-dtype array indicating missing values (True
indicates missing).

I would assume self._mask in line 518 is also 1d ndarray, and I believe a .any() method for self._mask in line 523 would be enough to know whether there is a missing value. Same for .all(). If I am right, I can make a PR ASAP

@jorisvandenbossche
Copy link
Member Author

@linxiaow that sounds about right! So PR welcome, that will also be easier to give feedback (and I would also advice to first write tests, so you can actually check if your fix is working)

@linxiaow
Copy link
Contributor

linxiaow commented Apr 4, 2020

@jorisvandenbossche I made a PR but it seems to fail the CI PR tests. A lot of PR fails, too. I am not sure what I missed. Any suggestions?

@linxiaow
Copy link
Contributor

linxiaow commented Apr 6, 2020

@jorisvandenbossche Right now I am dealing with whatsnew entry requested by PR here. I am confused with API reference to the issue we are closing. Here is what I have written under 1.1.0, bug fixed, Missing:

- Bug in :meth:`array.any` incorrectly returns ``<NA>`` for pandas.array of all ``False`` value, e.g. ``pd.array([False, False], dtype="boolean")``. Now it returns ``False`` (:issue:`33253`)
1a08e61#diff-e763086372257ed7ca944d8bc5042a04L371-R372

I did not pass the web and doc test, and I think that I do not reference the API in the correct way for :meth: array.any. I took a look at what other people wrote and the render result in whatsnew here, and their :meth: will link to an existing method in reference API section. However, I did not find .any method for pandas.array. Is this doc still missing as a new feature? Could you please give me some suggestions so that I can make my .rst compile, like where to find what I should write after :meth: in the quote I posted above? THX very much!

linxiaow added a commit to linxiaow/pandas that referenced this issue Apr 6, 2020
…n/test_reduction.py based on change suggestions related to issue pandas-dev#33253
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants