
Add COO.mean and sparse.nanmean #190

Merged
merged 3 commits into from Sep 19, 2018

Conversation

@jcrist (Collaborator) commented Sep 19, 2018

Adds COO.mean and sparse.nanmean, matching the NumPy API.

Also applies several cleanups to the reductions test suite:

  • Test with larger arrays on numeric reductions. Some issues with current functions only showed up on larger arrays (before there was a decent chance only one element along an axis was nonzero).
  • Test with more input dtypes
  • XFail tests setting dtype=float16. These fail for some, but not all, configurations. It's not clear how NumPy interprets this for various reductions: depending on configuration, neither casting before summing nor casting after the sum gives the same result as NumPy. Since COO.sum was already failing on larger arrays, I xfailed these tests for now.
  • Cleanup superfluous warnings, reducing pytest output
  • A few assorted cleanups.
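For reference, a minimal sketch of the NumPy interface being matched (plain NumPy shown here; `COO.mean` and `sparse.nanmean` are intended to accept the same arguments):

```python
import numpy as np

x = np.array([[1.0, np.nan],
              [3.0, 4.0]])

m = x.mean(axis=0)          # NaN propagates: [2.0, nan]
nm = np.nanmean(x, axis=0)  # NaN ignored:    [2.0, 4.0]
```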

Add mean reductions, matching the numpy interface.
- dtype=float16 tests fail for larger arrays. What numpy is doing to get
their result isn't clear. For now we xfail.
- Use larger arrays for the tests, as some numeric issues only show up
on larger arrays (before there was a decent chance that only one element
in an axis was nonzero).
- Fix a few tests that weren't testing anything
- Cleanup superfluous warnings
@jcrist (Collaborator, Author) commented Sep 19, 2018

cc @hameerabbasi for review.

@codecov bot commented Sep 19, 2018

Codecov Report

Merging #190 into master will decrease coverage by <.01%.
The diff coverage is 97.5%.


@@            Coverage Diff             @@
##           master     #190      +/-   ##
==========================================
- Coverage   97.52%   97.52%   -0.01%     
==========================================
  Files          11       11              
  Lines        1376     1416      +40     
==========================================
+ Hits         1342     1381      +39     
- Misses         34       35       +1
Impacted Files Coverage Δ
sparse/coo/__init__.py 100% <ø> (ø) ⬆️
sparse/coo/core.py 97.08% <100%> (+0.15%) ⬆️
sparse/coo/common.py 97.51% <95.23%> (-0.19%) ⬇️

Last update 7abe667...18c372d.

@hameerabbasi (Collaborator) left a comment

Thanks for the clean-ups! I appreciate it.

Here is a little feedback.

axis = (axis,)
den = 1
for ax in axis:
den *= x.shape[ax]
@hameerabbasi (Collaborator):

This can probably be written more compactly with functools.reduce and a generator.
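Something along the lines suggested, assuming `x.shape` and a normalized tuple `axis` as in the hunk above (the `shape` and `axis` values here are illustrative stand-ins):

```python
from functools import reduce
import operator

shape = (3, 4, 5)   # stand-in for x.shape
axis = (0, 2)       # axes being reduced over

# Product of the reduced dimensions' sizes, replacing the explicit loop.
den = reduce(operator.mul, (shape[ax] for ax in axis), 1)
```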

elif not isinstance(axis, tuple):
axis = (axis,)
den = 1
for ax in axis:
@hameerabbasi (Collaborator):

This can probably be written more compactly with functools.reduce and a generator.

@pytest.mark.filterwarnings('ignore:overflow')
@pytest.mark.parametrize('reduction, kwargs', [
('sum', {'dtype': np.float16}),
('mean', {'dtype': np.float16}),
@hameerabbasi (Collaborator):

Optional: The reason we had eqkwargs for float16 was large floating-point inaccuracies with float16. We're basically using np.allclose to verify equivalence, and the kwargs in assert_eq are passed along to that.

It might be nice to rewrite this at some point, but that's how it is now. There is a plan to redo it: basically something that takes two callables and some shapes and compares the NumPy and sparse functions (see #163).

Anyway, back to the point: If the rtol and atol parameters are increased, I'm pretty sure this will pass as well.

@jcrist (Collaborator, Author):

Anyway, back to the point: If the rtol and atol parameters are increased, I'm pretty sure this will pass as well.

This may be true, but the errors have to be pretty large. With atol=10, rtol=10, you still get errors for sum. This is an algorithmic difference, not just floating point error.
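The two candidate interpretations of dtype=float16 can be sketched like this (illustrative only; neither is claimed to match NumPy's internal strategy, which is exactly the open question here):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(10_000)  # float64 input, as in the larger test arrays

# Interpretation 1: accumulate the reduction in float16 from the start.
acc16 = x.astype(np.float16).sum(dtype=np.float16)

# Interpretation 2: reduce at full precision, cast to float16 at the end.
cast16 = np.float16(x.sum())
```

For a sum near 5000, float16's spacing between representable values is 4, so the two routes can diverge by far more than typical rounding error.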

@hameerabbasi (Collaborator):

With atol=10, rtol=10, you still get errors for sum.

In that case, feel free to leave this as-is for now (and if you want, raise an issue documenting this).

numpy.nanmean : Equivalent Numpy function.
"""
assert out is None
x = asCOO(x, name='nanmean')
@hameerabbasi (Collaborator):

I'm surprised this works -- asCOO has no name parameter. I must have forgotten to check for superfluous kwargs somewhere.

@jcrist (Collaborator, Author):

Yes it does; asCOO is different from as_coo in core.py. See the top of this file.

x = asCOO(x, name='nanmean')

if not np.issubdtype(x.dtype, np.floating):
return x.mean(axis=axis, keepdims=keepdims, dtype=dtype)
@hameerabbasi (Collaborator):

We need a test for this branch, apparently.
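A test for this branch could lean on the fact that a non-floating array cannot contain NaN, so nanmean must agree with a plain mean (sketched with NumPy here; an actual test would exercise sparse.nanmean on an integer COO input):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # integer dtype: cannot hold NaN

# For non-floating dtypes the nan-skipping path is a no-op,
# so nanmean and mean must agree.
assert np.array_equal(np.nanmean(a, axis=0), a.mean(axis=0))
assert np.nanmean(a) == a.mean() == 5.5
```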

@@ -85,6 +125,7 @@ def test_ufunc_reductions_kwargs(reduction, kwargs):
@pytest.mark.parametrize('keepdims', [False])
@pytest.mark.parametrize('fraction', [0.25, 0.5, 0.75, 1.0])
@pytest.mark.filterwarnings('ignore:All-NaN')
@pytest.mark.filterwarnings('ignore:Mean of empty slice')
def test_nan_reductions(reduction, axis, keepdims, fraction):
s = sparse.random((2, 3, 4), data_rvs=random_value_array(np.nan, fraction),
@hameerabbasi (Collaborator):

It seems you forgot to change to the fixture here.

@jcrist (Collaborator, Author):

I don't think so? nanmean throws that warning instead of All-NaN, we need both filters to cover all the warnings from this test. Unless I'm misunderstanding your comment here.

@hameerabbasi (Collaborator):

Ah, my bad. The fixture doesn't do NaNs. Ignore. It would be nice to add a non-floating dtype here to hit the line we missed -- but it's fine not to as well.

@hameerabbasi (Collaborator):

I meant that the fixture you created wasn't used in the NaN-reductions, but like I said, you can safely ignore me.

@jcrist (Collaborator, Author) commented Sep 19, 2018

Thanks @hameerabbasi!
