Initial implementation for arbitrary fill values. #165

Merged
merged 12 commits into from Jul 18, 2018

Conversation

hameerabbasi
Collaborator

Closes #143

Initial implementation for arbitrary fill values.

@codecov

codecov bot commented Jul 15, 2018

Codecov Report

Merging #165 into master will increase coverage by 0.15%.
The diff coverage is 98.97%.

@@            Coverage Diff             @@
##           master     #165      +/-   ##
==========================================
+ Coverage   96.96%   97.12%   +0.15%     
==========================================
  Files          11       11              
  Lines        1219     1252      +33     
==========================================
+ Hits         1182     1216      +34     
+ Misses         37       36       -1
| Impacted Files | Coverage Δ |
|---|---|
| sparse/io.py | 100% <ø> (ø) ⬆️ |
| sparse/coo/core.py | 96.63% <100%> (+0.31%) ⬆️ |
| sparse/coo/umath.py | 96.82% <100%> (-0.02%) ⬇️ |
| sparse/utils.py | 98.4% <100%> (+1.28%) ⬆️ |
| sparse/dok.py | 95.57% <100%> (ø) ⬆️ |
| sparse/sparse_array.py | 93.1% <100%> (+1.1%) ⬆️ |
| sparse/coo/common.py | 97.29% <100%> (+0.03%) ⬆️ |
| sparse/coo/indexing.py | 98.9% <87.5%> (-1.1%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e2906c0...3faf622.

Member

@shoyer shoyer left a comment

I have a few minor comments, but I'm excited about this! The implementation looks quite clean.

This will let us define all the nan-aggregation functions (e.g., nanmedian()) for use in xarray. These functions can simply require a fill value of NaN.

@@ -56,6 +56,7 @@ def linear_loc(coords, shape):
    return out


@check_fill_value(2)
Member

I find it more readable to use keyword arguments with literal values, e.g., @check_zero_fill_value(nargs=2)

sparse/utils.py Outdated


def equivalent(x, y):
    return (x == y) | ((x != x) & (y != y))
Member

note: if you're calling this on arrays, you could skip the non-NA checks for dtypes that can't hold NaN/NaT.
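
One way the shortcut described above might look, as a sketch rather than the code in this PR: the NaN-aware comparison is applied only when both operands have a dtype that can hold NaN or NaT. The dtype-kind test and the `_NAN_CAPABLE_KINDS` name are assumptions made for illustration.

```python
import numpy as np

# Dtype kinds that can hold NaN or NaT: float, complex, timedelta64,
# datetime64, and object (an assumption for this sketch).
_NAN_CAPABLE_KINDS = "fcmMO"


def equivalent(x, y):
    """Elementwise equality that also treats NaN == NaN (or NaT == NaT) as equal."""
    x = np.asarray(x)
    y = np.asarray(y)
    eq = x == y
    # A missing value can only match another missing value, so the extra
    # comparison is needed only when both dtypes can hold one.
    if x.dtype.kind in _NAN_CAPABLE_KINDS and y.dtype.kind in _NAN_CAPABLE_KINDS:
        eq = eq | ((x != x) & (y != y))
    return eq
```

For integer or boolean inputs this reduces to a plain `x == y`, which avoids two extra elementwise comparisons.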

sparse/utils.py Outdated
def generator(func):
    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        for arg in args[:nargs]:
Member

Watch out: This has one of the same issues we encountered in NEP 18: it changes the public API of these functions so that they only accept positional arguments.

I would suggest writing helper functions to call inside func instead, e.g., check_zero_fill_value(a, b).
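
A rough illustration of the concern, under stated assumptions: `FakeSparse`, `dot_decorated`, and `dot_helper` are hypothetical names, and the check is simplified to `fill_value != 0`. In this sketch the decorator inspects only `args[:nargs]`, so arguments passed by keyword bypass the check unless callers are restricted to positional arguments, while a helper called inside the function sees them either way.

```python
import functools


class FakeSparse:
    """Hypothetical stand-in for a sparse array; only carries a fill value."""

    def __init__(self, fill_value):
        self.fill_value = fill_value


def check_zero_fill_value(*args):
    """Helper form: raise if any argument has a nonzero fill value."""
    for arg in args:
        if hasattr(arg, "fill_value") and arg.fill_value != 0:
            raise ValueError("This operation requires zero fill values.")


def check_fill_value(nargs):
    """Decorator form: inspects only the first ``nargs`` positional arguments."""
    def generator(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            check_zero_fill_value(*args[:nargs])
            return func(*args, **kwargs)
        return wrapped
    return generator


@check_fill_value(nargs=2)
def dot_decorated(a, b):
    return "computed"


def dot_helper(a, b):
    check_zero_fill_value(a, b)
    return "computed"


bad = FakeSparse(fill_value=1)

# Passed by keyword, the arguments never reach ``args``, so the
# decorator's check is silently skipped.
unchecked = dot_decorated(a=bad, b=bad)

# The helper sees the arguments however they were passed.
try:
    dot_helper(a=bad, b=bad)
    helper_raised = False
except ValueError:
    helper_raised = True
```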

sparse/utils.py Outdated
@functools.wraps(func)
def wrapped(*args, **kwargs):
    for arg in args[:nargs]:
        if hasattr(arg, 'fill_value') and \
Member

Nit: per PEP 8, prefer using extra parentheses () to explicit continuation with \.

sparse/utils.py Outdated
from .sparse_array import SparseArray

@functools.wraps(func)
def wrapped(arrays, *args, **kwargs):
Member

As above, consider using a function instead of a decorator here.

In this case, the decorator would work OK (as long as the first argument is always called arrays) but decorators are more magical than simple function calls.

-However, operations which convert the sparse array into a dense one will raise exceptions
-For example, the following raises a :obj:`ValueError`.
+However, operations which convert the sparse array into a dense one will usually change the fill
+value instead of raising an error.
Member

The new sentence here is a little confusing to me. These operations now change the fill value instead of converting a sparse array to a dense array, so they don't "convert the sparse array into a dense one" at all now.
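
To make the distinction concrete, here is a toy sketch (not the library's implementation): `TinySparse` and its `apply` method are hypothetical names. An elementwise function is applied to the stored entries and to the fill value, so the result stays sparse even when the new fill value is nonzero.

```python
import numpy as np


class TinySparse:
    """Hypothetical 1-D sparse vector: stored entries plus a fill value."""

    def __init__(self, coords, data, size, fill_value=0.0):
        self.coords = np.asarray(coords)
        self.data = np.asarray(data, dtype=float)
        self.size = size
        self.fill_value = float(fill_value)

    def apply(self, func):
        # Apply func to the stored entries and to the fill value; no dense
        # array is created, only the fill value changes.
        return TinySparse(self.coords, func(self.data), self.size,
                          func(self.fill_value))

    def todense(self):
        out = np.full(self.size, self.fill_value)
        out[self.coords] = self.data
        return out


x = TinySparse(coords=[1, 3], data=[5.0, 7.0], size=5)
y = x.apply(lambda v: v + 1)  # still two stored entries; fill value is now 1.0
```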

Collaborator Author

Changed the sentence structure a bit.

@@ -265,7 +216,6 @@ All of the following will raise an :obj:`IndexError`, like in Numpy 1.13 and lat
z[3, 6]
z[1, 4, 8]
z[-6]
z[[True, True, False, True], 3, 4]

.. note:: Numpy advanced indexing is currently not supported.

Member

below here: maybe note that stack and concatenate require matching fill values, and that some operations (e.g., tensordot) require a fill value of zero?

Collaborator Author

Done!

@hameerabbasi
Collaborator Author

@shoyer I think this is now ready to merge.
@mrocklin If you have time, if it's going to take too long, let me know.
@ahwillia If you're willing to review. Reviewing is great for learning the codebase, if you're interested.

Member

@shoyer shoyer left a comment

Just another minor comment. I agree that this looks good to merge!

sparse/utils.py Outdated
    Traceback (most recent call last):
        ...
    ValueError: This operation requires zero fill values.
    """
    for arg in args:
        if (hasattr(arg, 'fill_value') and
                not equivalent(arg.fill_value, _zero_of_dtype(arg.dtype))):
            raise ValueError('This operation requires zero fill values.')
Member

It's a minor point, but it's nice to include offending values in all error messages, e.g., arg.fill_value in this case

Collaborator Author

I'm not sure what best practice would be here. We wouldn't know exactly which argument produced this fill value, so showing it may not be useful. What do you think?

Member

Maybe:

'This operation requires zero fill values, but argument {} has fill value {}'.format(i, arg.fill_value)

where i comes from iterating over args with enumerate().
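
Put together, the suggestion might look roughly like the sketch below. It is not the merged code: `FakeSparse` is a hypothetical stand-in for a sparse array, `arg.dtype.type(0)` stands in for the PR's `_zero_of_dtype`, and a plain `!=` replaces `equivalent` to keep the example short.

```python
import numpy as np


class FakeSparse:
    """Hypothetical stand-in carrying only a fill value and a dtype."""

    def __init__(self, fill_value, dtype=float):
        self.fill_value = fill_value
        self.dtype = np.dtype(dtype)


def check_zero_fill_value(*args):
    """Raise if any sparse argument has a nonzero fill value."""
    for i, arg in enumerate(args):
        if not hasattr(arg, "fill_value"):
            continue  # non-sparse arguments are ignored
        zero = arg.dtype.type(0)  # simplified stand-in for _zero_of_dtype
        if arg.fill_value != zero:
            raise ValueError(
                "This operation requires zero fill values, "
                "but argument {} has fill value {}".format(i, arg.fill_value))
```

With this form, the message names both the position of the offending argument and its fill value, which addresses the question of which argument caused the failure.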

Collaborator Author

Done.

sparse/utils.py Outdated
@@ -276,4 +345,4 @@ def check_consistent_fill_value(arrays):
     fv = arrays[0].fill_value
 
     if not all(equivalent(fv, s.fill_value) for s in arrays):
-        raise ValueError('Consistent fill-values required.')
+        raise ValueError('This operation requires consistent fill-values.')
Member

same concern as above about including fill values in the error message
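
A possible shape for that, sketched under assumptions: `FakeSparse` is a hypothetical stand-in, the message wording is illustrative, and `arrays` is assumed to be non-empty. The NaN-aware `equivalent` from the diff is reused so that two NaN fill values count as consistent.

```python
import numpy as np


class FakeSparse:
    """Hypothetical stand-in carrying only a fill value."""

    def __init__(self, fill_value):
        self.fill_value = fill_value


def equivalent(x, y):
    """Equality that treats NaN as equal to NaN."""
    x, y = np.asarray(x), np.asarray(y)
    return (x == y) | ((x != x) & (y != y))


def check_consistent_fill_value(arrays):
    """Raise if the arrays do not all share the first array's fill value."""
    arrays = list(arrays)  # assumed non-empty in this sketch
    fv = arrays[0].fill_value
    for i, arr in enumerate(arrays):
        if not equivalent(fv, arr.fill_value):
            raise ValueError(
                "This operation requires consistent fill-values, "
                "but argument {} has fill value {} while the first "
                "argument has fill value {}".format(i, arr.fill_value, fv))
```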

@mrocklin
Collaborator

> @mrocklin If you have time, if it's going to take too long, let me know.

I appreciate getting pinged, but I probably won't be reviewing this one. That's also fine. I probably shouldn't be active on every pull request here. It looks like @shoyer seems pretty happy, which is a good sign. I trust his attention to detail :)

@hameerabbasi
Collaborator Author

Well I believe there should be at least one reviewer. I'm thinking of ways to rope new contributors in... Reviewers or code-wise. 😄

@hameerabbasi hameerabbasi merged commit f767837 into pydata:master Jul 18, 2018
@hameerabbasi hameerabbasi deleted the fill-values branch July 18, 2018 16:19