Add COO.mean and sparse.nanmean #190
Conversation
Add mean reductions, matching the numpy interface.
- `dtype=float16` tests fail for larger arrays. What numpy is doing to get their result isn't clear. For now we xfail.
- Use larger arrays for the tests, as some numeric issues only show up on larger arrays (before, there was a decent chance that only one element in an axis was nonzero).
- Fix a few tests that weren't testing anything.
- Clean up superfluous warnings.
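For reference, a sketch of the NumPy calls the new reductions mirror (dense arrays here for illustration; the sparse versions take the same `axis`, `keepdims`, and `dtype` arguments):

```python
import numpy as np

x = np.array([[0.0, 1.0],
              [2.0, np.nan]])

# mean over the first axis, keeping the reduced dimension;
# the NaN propagates into the second column
print(np.mean(x, axis=0, keepdims=True))

# nanmean ignores the NaN entry instead of propagating it
print(np.nanmean(x, axis=0, keepdims=True))
```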
cc @hameerabbasi for review.
Codecov Report
@@ Coverage Diff @@
## master #190 +/- ##
==========================================
- Coverage 97.52% 97.52% -0.01%
==========================================
Files 11 11
Lines 1376 1416 +40
==========================================
+ Hits 1342 1381 +39
- Misses 34 35 +1
Thanks for the clean-ups! I appreciate it.
Here is a little feedback.
sparse/coo/common.py
Outdated
axis = (axis,)
den = 1
for ax in axis:
    den *= x.shape[ax]
This can probably be written more compactly with `functools.reduce` and a generator.
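A minimal sketch of that suggestion; the `shape` and `axis` values here are placeholders standing in for `x.shape` and the already-normalized axis tuple:

```python
from functools import reduce
from operator import mul

shape = (2, 3, 4)   # stands in for x.shape
axis = (0, 2)       # stands in for the normalized axis tuple

# product of the sizes of the reduced axes, replacing the explicit loop
den = reduce(mul, (shape[ax] for ax in axis), 1)
print(den)  # 2 * 4 = 8
```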
sparse/coo/core.py
Outdated
elif not isinstance(axis, tuple):
    axis = (axis,)
den = 1
for ax in axis:
This can probably be written more compactly with `functools.reduce` and a generator.
@pytest.mark.filterwarnings('ignore:overflow')
@pytest.mark.parametrize('reduction, kwargs', [
    ('sum', {'dtype': np.float16}),
    ('mean', {'dtype': np.float16}),
Optional: The reason we had `eqkwargs` for `float16` was large floating-point inaccuracies with `float16`. We're basically using `np.allclose` to verify equivalence, and the `kwargs` in `assert_eq` are passed along to that.
It might be nice to rewrite this at some point, but that's how it is now. There is a plan to re-write this, basically something that takes two callables and some shapes and compares the NumPy and sparse functions (see #163).
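A rough sketch of what the helper planned in #163 could look like. The name `compare` and its signature are hypothetical, not from the codebase:

```python
import numpy as np

def compare(ref_func, other_func, shape, rtol=1e-5, atol=1e-8, **kwargs):
    """Hypothetical helper: run two callables on the same random dense
    input and check their results with np.allclose."""
    x = np.random.RandomState(0).uniform(size=shape)
    left = ref_func(x, **kwargs)
    right = other_func(x, **kwargs)
    assert np.allclose(left, right, rtol=rtol, atol=atol)

# e.g. comparing np.sum against the method form of the same reduction
compare(np.sum, lambda a, **kw: a.sum(**kw), (3, 4), axis=0)
```

In the real version, `other_func` would convert the input to a sparse array before reducing.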
Anyway, back to the point: If the `rtol` and `atol` parameters are increased, I'm pretty sure this will pass as well.
> Anyway, back to the point: If the rtol and atol parameters are increased, I'm pretty sure this will pass as well.
This may be true, but the errors have to be pretty large. With `atol=10, rtol=10`, you still get errors for `sum`. This is an algorithmic difference, not just floating point error.
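To illustrate how large accumulation-order effects can get, here is a sketch comparing naive left-to-right `float16` accumulation against the exact (`float64`) sum. This is illustrative of the general phenomenon, not a reproduction of what numpy does internally:

```python
import numpy as np

x = np.random.RandomState(0).uniform(size=10000).astype(np.float16)

# naive left-to-right accumulation, rounding to float16 at every step
acc = np.float16(0.0)
for v in x:
    acc = np.float16(acc + v)

# the exact sum, computed in float64, is roughly 5000 for this input
exact = x.astype(np.float64).sum()

# once acc reaches ~2048, the float16 spacing is 2, so adding values
# below 1 rounds away entirely and the naive sum stalls far short
print(acc, exact)
```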
> With `atol=10, rtol=10`, you still get errors for `sum`.
In that case, feel free to leave this as-is for now (and if you want, raise an issue documenting this).
numpy.nanmean : Equivalent Numpy function.
"""
assert out is None
x = asCOO(x, name='nanmean')
I'm surprised this works -- `asCOO` has no `name` parameter. I must have forgotten to check for superfluous `kwargs` somewhere.
Yes it does; `asCOO` is different from `as_coo` in core.py. See the top of this file.
x = asCOO(x, name='nanmean')

if not np.issubdtype(x.dtype, np.floating):
    return x.mean(axis=axis, keepdims=keepdims, dtype=dtype)
We need a test for this branch, apparently.
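For context, the branch exists because integer arrays cannot hold NaN, so `nanmean` can safely delegate to `mean`. The same is true of the dense NumPy functions:

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.int64)

# an integer array cannot contain NaN, so nanmean and mean agree
print(np.nanmean(x), x.mean())  # both 2.5
```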
@@ -85,6 +125,7 @@ def test_ufunc_reductions_kwargs(reduction, kwargs):
@pytest.mark.parametrize('keepdims', [False])
@pytest.mark.parametrize('fraction', [0.25, 0.5, 0.75, 1.0])
@pytest.mark.filterwarnings('ignore:All-NaN')
@pytest.mark.filterwarnings('ignore:Mean of empty slice')
def test_nan_reductions(reduction, axis, keepdims, fraction):
    s = sparse.random((2, 3, 4), data_rvs=random_value_array(np.nan, fraction),
It seems you forgot to change to the fixture here.
I don't think so? `nanmean` throws that warning instead of `All-NaN`, so we need both filters to cover all the warnings from this test. Unless I'm misunderstanding your comment here.
Ah, my bad. The fixture doesn't do NaNs. Ignore. It would be nice to add a non-floating `dtype` here to hit the line we missed -- but not doing so is okay too.
I meant that the fixture you created wasn't used in the NaN-reductions, but like I said, you can safely ignore me.
Thanks @hameerabbasi!
Adds `COO.mean` and `sparse.nanmean`, matching the numpy API. Also applies several cleanups to the reductions test suite:

- `dtype=float16` tests fail for some, but not all, configurations. It's not clear how numpy interprets this for various reductions -- neither casting prior to summing nor casting after the sum gives the same result as numpy, depending on configuration. Since `COO.sum` was already failing on larger arrays, I xfailed these tests for now.