Add COO.mean and sparse.nanmean #190

Merged: 3 commits merged into pydata:master from jcrist:mean on Sep 19, 2018

Conversation

@jcrist (Collaborator) commented Sep 19, 2018

Adds COO.mean and sparse.nanmean, matching the NumPy API (see the usage sketch after the list below).

Also applies several cleanups to the reductions test suite:

  • Test with larger arrays on numeric reductions. Some issues with the current functions only showed up on larger arrays (before, there was a decent chance that only one element along an axis was nonzero).
  • Test with more input dtypes.
  • XFail tests setting dtype=float16. These fail for some, but not all, configurations. It's not clear how numpy interprets this for various reductions - depending on the configuration, neither casting before the sum nor casting after it matches numpy's result. Since COO.sum was already failing on larger arrays, I xfailed these tests for now.
  • Clean up superfluous warnings, reducing pytest output.
  • A few assorted cleanups.
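
For reference, a quick usage sketch of the new reductions against their numpy counterparts (the shape, density, and values below are arbitrary):

import numpy as np
import sparse

# A small random COO array.
x = sparse.random((20, 30), density=0.25)

# COO.mean follows the numpy signature (axis, keepdims, dtype).
assert np.allclose(x.mean(axis=0).todense(), x.todense().mean(axis=0))

# sparse.nanmean follows numpy.nanmean, ignoring NaN entries.
d = x.todense()
d[d != 0] = np.nan  # turn the stored values into NaNs
s = sparse.COO.from_numpy(d)
assert np.allclose(sparse.nanmean(s, axis=1).todense(),
                   np.nanmean(d, axis=1), equal_nan=True)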

jcrist added some commits Sep 19, 2018

Add COO.mean and sparse.nanmean
    Add mean reductions, matching the numpy interface.

Cleanup tests
    - dtype=float16 tests fail for larger arrays. What numpy is doing to get
      its result isn't clear. For now we xfail.
    - Use larger arrays for the tests, as some numeric issues only show up
      on larger arrays (before, there was a decent chance that only one
      element along an axis was nonzero).
    - Fix a few tests that weren't testing anything.
    - Clean up superfluous warnings.
@jcrist (Collaborator) commented Sep 19, 2018

cc @hameerabbasi for review.

@codecov (bot) commented Sep 19, 2018

Codecov Report

Merging #190 into master will decrease coverage by <.01%.
The diff coverage is 97.5%.


@@            Coverage Diff             @@
##           master     #190      +/-   ##
==========================================
- Coverage   97.52%   97.52%   -0.01%     
==========================================
  Files          11       11              
  Lines        1376     1416      +40     
==========================================
+ Hits         1342     1381      +39     
- Misses         34       35       +1
Impacted Files           Coverage Δ
sparse/coo/__init__.py   100% <ø> (ø) ⬆️
sparse/coo/core.py       97.08% <100%> (+0.15%) ⬆️
sparse/coo/common.py     97.51% <95.23%> (-0.19%) ⬇️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7abe667...18c372d.

@hameerabbasi (Collaborator) commented Sep 19, 2018

Thanks for the clean-ups! I appreciate it.

Here is a little feedback.

    axis = (axis,)
den = 1
for ax in axis:
    den *= x.shape[ax]

@hameerabbasi (Collaborator) commented Sep 19, 2018

This can probably be written more compactly with functools.reduce and a generator.
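
A sketch of the suggested compaction (shape and axis below are stand-ins for the x.shape and axis variables in the snippet above):

from functools import reduce
from operator import mul

shape = (2, 3, 4)  # stand-in for x.shape
axis = (0, 2)      # stand-in for the axes being reduced over

# One-liner equivalent of the den loop: the product of the reduced axis sizes.
den = reduce(mul, (shape[ax] for ax in axis), 1)
assert den == 8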

elif not isinstance(axis, tuple):
    axis = (axis,)
den = 1
for ax in axis:

@hameerabbasi (Collaborator) commented Sep 19, 2018

This can probably be written more compactly with functools.reduce and a generator.

@pytest.mark.filterwarnings('ignore:overflow')
@pytest.mark.parametrize('reduction, kwargs', [
('sum', {'dtype': np.float16}),
('mean', {'dtype': np.float16}),

@hameerabbasi (Collaborator) commented Sep 19, 2018

Optional: The reason we had eqkwargs for float16 was large floating-point inaccuracies with float16. We're basically using np.allclose to verify equivalence, and the kwargs in assert_eq are passed along to it.

It might be nice to rewrite this at some point, but that's how it is now; there is a plan to replace it with something that takes two callables and some shapes and compares the NumPy and sparse functions (see #163).

Anyway, back to the point: if the rtol and atol parameters are increased, I'm pretty sure this will pass as well.
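
For illustration, here is roughly what widening the tolerances does at the np.allclose level (the arrays are hypothetical stand-ins for a sparse result and its dense reference):

import numpy as np

expected = np.array([1.0, 2.0, 3.0])
got = expected + 5e-3  # an error on the scale of float16 round-off

assert not np.allclose(expected, got)                    # default rtol=1e-05, atol=1e-08
assert np.allclose(expected, got, rtol=1e-2, atol=1e-2)  # widened tolerances pass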

@jcrist (Collaborator) commented Sep 19, 2018

> Anyway, back to the point: if the rtol and atol parameters are increased, I'm pretty sure this will pass as well.

This may be true, but the errors would have to be pretty large. With atol=10, rtol=10, you still get errors for sum. This is an algorithmic difference, not just floating-point error.
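
A small illustration of that point, assuming the divergence comes from accumulation order: a naive sequential float16 sum stalls where numpy's pairwise summation does not, so no casting strategy alone reproduces numpy's result.

import numpy as np

x = np.ones(4100, dtype=np.float16)

# Naive left-to-right accumulation in float16 stalls at 2048, because the
# float16 spacing there is 2 and 2048 + 1 rounds back down to 2048.
seq = np.float16(0)
for v in x:
    seq = np.float16(seq + v)
print(seq)                      # 2048.0

# numpy's pairwise summation is far more accurate at the same dtype, so the
# accumulation strategy, not just the dtype, shapes the result.
print(x.sum(dtype=np.float16))  # close to 4100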

@hameerabbasi (Collaborator) commented Sep 19, 2018

> With atol=10, rtol=10, you still get errors for sum.

In that case, feel free to leave this as-is for now (and, if you want, raise an issue documenting this).

numpy.nanmean : Equivalent Numpy function.
"""
assert out is None
x = asCOO(x, name='nanmean')

@hameerabbasi (Collaborator) commented Sep 19, 2018

I'm surprised this works -- asCOO has no name parameter. I must have forgotten to check for superfluous kwargs somewhere.

@jcrist (Collaborator) commented Sep 19, 2018

Yes it does; asCOO is different from as_coo in core.py. See the top of this file.
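
For context, a minimal sketch of what such a helper can look like (an illustration of the name parameter's role, not the actual implementation in common.py):

import numpy as np
from sparse import COO

def asCOO_sketch(x, name='operation'):
    # Hypothetical stand-in for the asCOO helper at the top of
    # sparse/coo/common.py: the name parameter lets the error message
    # cite the calling function (e.g. 'nanmean').
    if isinstance(x, np.ndarray):
        raise ValueError('Performing this operation would produce a dense '
                         'result: %s' % name)
    if not isinstance(x, COO):
        x = COO(x)
    return x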

x = asCOO(x, name='nanmean')
if not np.issubdtype(x.dtype, np.floating):
    return x.mean(axis=axis, keepdims=keepdims, dtype=dtype)

@hameerabbasi (Collaborator) commented Sep 19, 2018

We need a test for this branch, apparently.
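
A sketch of such a test (the name and values are illustrative): for an integer dtype, nanmean should fall through to plain mean, since integer arrays cannot hold NaN.

import numpy as np
import sparse

def test_nanmean_integer_dtype():
    # Integer arrays can't contain NaN, so nanmean should take the
    # plain-mean branch above.
    s = (sparse.random((2, 3, 4), density=0.5) * 10).astype(np.int64)
    x = s.todense()
    assert np.allclose(sparse.nanmean(s, axis=1).todense(),
                       np.nanmean(x, axis=1))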

@@ -85,6 +125,7 @@ def test_ufunc_reductions_kwargs(reduction, kwargs):
@pytest.mark.parametrize('keepdims', [False])
@pytest.mark.parametrize('fraction', [0.25, 0.5, 0.75, 1.0])
@pytest.mark.filterwarnings('ignore:All-NaN')
@pytest.mark.filterwarnings('ignore:Mean of empty slice')
def test_nan_reductions(reduction, axis, keepdims, fraction):
s = sparse.random((2, 3, 4), data_rvs=random_value_array(np.nan, fraction),

@hameerabbasi (Collaborator) commented Sep 19, 2018

It seems you forgot to change to the fixture here.

@jcrist (Collaborator) commented Sep 19, 2018

I don't think so? nanmean throws that warning instead of All-NaN, so we need both filters to cover all the warnings from this test. Unless I'm misunderstanding your comment here.

@hameerabbasi (Collaborator) commented Sep 19, 2018

Ah, my bad. The fixture doesn't do NaNs. Ignore. It would be nice to add a non-floating dtype here to hit the line we missed, but not doing so is okay too.

@hameerabbasi (Collaborator) commented Sep 19, 2018

I meant that the fixture you created wasn't used in the NaN-reductions, but like I said, you can safely ignore me.

@hameerabbasi hameerabbasi merged commit f21b027 into pydata:master Sep 19, 2018

3 of 5 checks passed:

  • codecov/patch - 97.5% of diff hit (target 97.52%)
  • codecov/project - 97.52% (-0.01%) compared to 7abe667
  • LGTM analysis: Python - No alert changes
  • ci/circleci - Your tests passed on CircleCI!
  • continuous-integration/travis-ci/pr - The Travis CI build passed

@jcrist jcrist deleted the jcrist:mean branch Sep 19, 2018

@jcrist (Collaborator) commented Sep 19, 2018

Thanks @hameerabbasi!
