Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Add SparseArray.all #17570

Merged
merged 6 commits into from Sep 28, 2017
Merged

Conversation

Licht-T
Copy link
Contributor

@Licht-T Licht-T commented Sep 18, 2017

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This is the part of #17386.
Block.where uses ndarray.all, but there is no such method in SparseArray.
This makes that all evaluates SparseArray([False, False, True]) as True.
https://github.com/pandas-dev/pandas/blob/master/pandas/core/internals.py#L1400

values = self.sp_values

if len(values) != len(self):
values = values.tolist()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you are converting to a list? and not a dense array?

Copy link
Contributor Author

@Licht-T Licht-T Sep 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback For the performance reason. This is sparse and self.to_dense() is not efficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am asking why you are converting .tolist(), then appending, that is non-performant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply I wanted to use np.all. I got an idea not to use tolist(). I'll change the solution.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note. do we have a specific issue for this? (other that the PR)?

@@ -614,6 +614,24 @@ def fillna(self, value, downcast=None):
return self._simple_new(new_values, self.sp_index,
fill_value=fill_value)

def all(self, axis=0, *args, **kwargs):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well add any as well.

@jreback jreback added Bug Sparse Sparse Data Type labels Sep 18, 2017
@Licht-T
Copy link
Contributor Author

Licht-T commented Sep 18, 2017

@jreback Thanks for your review.

  • Changed the fill_value check method and now we are not using tolist().
  • Removed dtype parameter validation check.
    There is no dtype parameter on ndarray.all.
  • Also added SparseArray.any.
  • There seems to be no specific issue for this.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment. ping on green

@@ -544,6 +544,7 @@ Sparse

- Bug in ``SparseSeries`` raises ``AttributeError`` when a dictionary is passed in as data (:issue:`16905`)
- Bug in :func:`SparseDataFrame.fillna` not filling all NaNs when frame was instantiated from SciPy sparse matrix (:issue:`16112`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say that these are now implemented to handle SparseArray


Returns
-------
all : bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a See Also to point to np.all

@codecov
Copy link

codecov bot commented Sep 21, 2017

Codecov Report

Merging #17570 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17570      +/-   ##
==========================================
- Coverage   91.22%    91.2%   -0.02%     
==========================================
  Files         163      163              
  Lines       49625    49642      +17     
==========================================
+ Hits        45270    45277       +7     
- Misses       4355     4365      +10
Flag Coverage Δ
#multiple 88.99% <100%> (ø) ⬆️
#single 40.19% <41.17%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/compat/numpy/function.py 93.33% <100%> (+0.2%) ⬆️
pandas/core/sparse/array.py 91.31% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 37e23d0...5d04485. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 21, 2017

Codecov Report

Merging #17570 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17570      +/-   ##
==========================================
- Coverage   91.24%    91.2%   -0.05%     
==========================================
  Files         163      163              
  Lines       49766    49642     -124     
==========================================
- Hits        45411    45276     -135     
- Misses       4355     4366      +11
Flag Coverage Δ
#multiple 88.99% <100%> (-0.04%) ⬇️
#single 40.19% <41.17%> (-0.21%) ⬇️
Impacted Files Coverage Δ
pandas/compat/numpy/function.py 93.33% <100%> (+0.2%) ⬆️
pandas/core/sparse/array.py 91.31% <100%> (-0.28%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/util/_decorators.py 66% <0%> (-12%) ⬇️
pandas/core/dtypes/missing.py 87.19% <0%> (-3.26%) ⬇️
pandas/core/base.py 96.01% <0%> (-0.56%) ⬇️
pandas/core/indexes/range.py 92.59% <0%> (-0.25%) ⬇️
pandas/tseries/offsets.py 97% <0%> (-0.15%) ⬇️
pandas/core/categorical.py 95.57% <0%> (-0.14%) ⬇️
... and 32 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 074b485...090f398. Read the comment docs.

@Licht-T
Copy link
Contributor Author

Licht-T commented Sep 21, 2017

@jreback Thank you. These are now fixed.

@Licht-T
Copy link
Contributor Author

Licht-T commented Sep 22, 2017

@jreback I found the bug in this solution. When fill_value = np.array([1,1,1]), np.all(fill_value) returns True. In numpy, np.array([np.array([1,1,1]), 0, np.array([1,1,1])], dtype=np.object).all() raise ValueError, but this is not. I'll fix.

UPDATE: I found SparseArray does not allow ndarray as fill_value, so this will not happen. I am sorry for the mistake.

@jreback jreback added this to the 0.21.0 milestone Sep 28, 2017
@jreback
Copy link
Contributor

jreback commented Sep 28, 2017

lgtm. waiting for CI to go green again before merging (unrelated conda-ish issues)

@jreback
Copy link
Contributor

jreback commented Sep 28, 2017

can you rebase

@Licht-T
Copy link
Contributor Author

Licht-T commented Sep 28, 2017

@jreback Rebased.

@jreback jreback merged commit bbf0dda into pandas-dev:master Sep 28, 2017
@jreback
Copy link
Contributor

jreback commented Sep 28, 2017

thanks @Licht-T

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Sparse Sparse Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants