Implement roll (circular shift of elements) #160

Merged
hameerabbasi merged 1 commit into pydata:master from ahwillia:master on Jun 25, 2018

3 participants
@ahwillia
Contributor

ahwillia commented Jun 24, 2018

This is my attempt to implement the equivalent of numpy.roll, which I think would be really useful to have in this package.

I think this is nearly done, but it could be improved. In particular, the function is supposed to produce a copy of the array, but I wasn't sure of the preferred way to do that, so I called the COO constructor directly. Also, my function assumes a COO array as input, which may not be flexible enough as other sparse array types are developed.
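
For reference, here is a minimal sketch of the core idea, written against plain NumPy arrays rather than the `sparse` API. A roll only changes coordinates, so each rolled axis is updated as `(coords[ax] + shift) % shape[ax]`, and fresh arrays are built instead of modifying the input. The `(coords, data, shape)` triple and the helper names `roll_coords` and `to_dense` are hypothetical stand-ins for a COO array, not functions from this package.

```python
import numpy as np


def roll_coords(coords, data, shape, shift, axis):
    """Hypothetical sketch: roll a COO-style (coords, data, shape) triple.

    `coords` has shape (ndim, nnz). `shift` and `axis` are ints or tuples of
    ints; a scalar shift is reused for every axis, and a repeated axis
    accumulates its shifts, as in np.roll. The inputs are never modified.
    """
    shifts = np.atleast_1d(shift)
    axes = np.atleast_1d(axis)
    if shifts.size == 1:
        shifts = np.repeat(shifts, axes.size)
    new_coords = np.array(coords)  # np.array copies by default
    for sh, ax in zip(shifts, axes):
        # NumPy's % follows Python semantics, so negative and oversized
        # shifts wrap around correctly.
        new_coords[ax] = (new_coords[ax] + sh) % shape[ax]
    # Shifting breaks lexicographic order, so re-sort. np.lexsort treats its
    # *last* key as primary, hence the reversed rows.
    order = np.lexsort(new_coords[::-1])
    return new_coords[:, order], np.array(data)[order]


def to_dense(coords, data, shape):
    """Scatter (coords, data) into a dense array for checking."""
    out = np.zeros(shape, dtype=np.asarray(data).dtype)
    out[tuple(coords)] = data
    return out
```

Re-sorting at the end matters because COO code generally expects coordinates in lexicographic order; in `sparse` itself that step can be delegated to the constructor.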

Thanks again for providing this awesome package. Looking forward to your comments!

@codecov

codecov bot commented Jun 24, 2018

Codecov Report

Merging #160 into master will increase coverage by 0.05%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #160      +/-   ##
==========================================
+ Coverage    96.9%   96.96%   +0.05%     
==========================================
  Files          11       11              
  Lines        1197     1218      +21     
==========================================
+ Hits         1160     1181      +21     
  Misses         37       37
Impacted Files            Coverage Δ
sparse/coo/__init__.py    100% <ø> (ø)           ⬆️
sparse/coo/common.py      97.26% <100%> (+0.3%)  ⬆️
sparse/coo/umath.py       96.82% <0%> (-0.02%)   ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update eeae480...afd3e80.

@hameerabbasi

Things are excellent here, mostly! Just a few comments.

# roll across specified axis
else:
    axis = normalize_axis_tuple(axis, a.ndim, allow_duplicate=True)
    broadcasted = np.core.multiarray.broadcast(shift, axis)

@hameerabbasi

hameerabbasi Jun 24, 2018

Collaborator

Is this just broadcasting the shape? If so, we should use our own utility function. See

sparse/sparse/coo/umath.py

Lines 358 to 388 in b51d749

def _get_broadcast_shape(shape1, shape2, is_result=False):
    """
    Get the overall broadcasted shape.

    Parameters
    ----------
    shape1, shape2 : tuple[int]
        The input shapes to broadcast together.
    is_result : bool
        Whether or not shape2 is also the result shape.

    Returns
    -------
    result_shape : tuple[int]
        The overall shape of the result.

    Raises
    ------
    ValueError
        If the two shapes cannot be broadcast together.
    """
    # https://stackoverflow.com/a/47244284/774273
    if not all((l1 == l2) or (l1 == 1) or ((l2 == 1) and not is_result) for l1, l2 in
               zip(shape1[::-1], shape2[::-1])):
        raise ValueError('operands could not be broadcast together with shapes %s, %s' %
                         (shape1, shape2))

    result_shape = tuple(max(l1, l2) for l1, l2 in
                         zip_longest(shape1[::-1], shape2[::-1], fillvalue=1))[::-1]

    return result_shape

In general, I don't think it's good to rely on an undocumented NumPy function, which could change and break things at any time.
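
For the roll case specifically, what gets "broadcast" is the list of (shift, axis) pairs rather than array shapes. A dependency-free way to build those pairs is sketched below, assuming NumPy-style rules are wanted: a scalar or length-1 sequence on either side is repeated, two longer sequences must match in length, and nested input is rejected. `pair_shift_axis` is a hypothetical name, not an existing helper in `sparse`.

```python
import numbers


def pair_shift_axis(shift, axis):
    """Hypothetical helper: pair each shift with an axis, NumPy-style.

    A scalar (or length-1 sequence) on either side is repeated to match the
    other; two longer sequences must have equal length; nested input raises.
    """
    def as_tuple(value, name):
        if isinstance(value, numbers.Integral):
            return (int(value),)
        items = tuple(value)
        if not all(isinstance(i, numbers.Integral) for i in items):
            raise ValueError(f"'{name}' should be an int or a 1-D sequence of ints")
        return tuple(int(i) for i in items)

    shifts, axes = as_tuple(shift, "shift"), as_tuple(axis, "axis")
    if len(shifts) == 1:
        shifts = shifts * len(axes)
    elif len(axes) == 1:
        axes = axes * len(shifts)
    elif len(shifts) != len(axes):
        raise ValueError("'shift' and 'axis' could not be broadcast together")
    return list(zip(shifts, axes))
```

For example, `pair_shift_axis(2, (0, 1))` gives `[(2, 0), (2, 1)]`, and `pair_shift_axis((1, 2), 0)` gives `[(1, 0), (2, 0)]`, so both shifts land on axis 0 and accumulate.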

@ahwillia

ahwillia Jun 24, 2018

Contributor

Hmm, what about normalize_axis_tuple? Is there a utility function for that as well?

@ahwillia

ahwillia Jun 25, 2018

Contributor

Never mind, I found normalize_axis in utils.

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

You'll need to add an allow_repeat option to that one, I believe.
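
A sketch of what an `allow_repeat` option could look like. The signature `normalize_axis(axis, ndim, allow_repeat=False)` is assumed for illustration and may not match the helper in `sparse`'s utils: it maps negative axes into range, checks bounds, and only rejects duplicates when `allow_repeat` is false, since `roll` legitimately accepts repeated axes such as `(0, 1, 1)`.

```python
def normalize_axis(axis, ndim, allow_repeat=False):
    """Hypothetical sketch: map (possibly negative) axes into range(ndim)."""
    if axis is None:
        return None
    axes = axis if isinstance(axis, tuple) else (axis,)
    out = []
    for ax in axes:
        if not -ndim <= ax < ndim:
            raise ValueError(f"axis {ax} is out of bounds for array of dimension {ndim}")
        out.append(ax % ndim)  # e.g. -1 -> ndim - 1
    if not allow_repeat and len(set(out)) != len(out):
        raise ValueError("repeated axis")
    # Return the same kind of value that came in: a tuple or a single int.
    return tuple(out) if isinstance(axis, tuple) else out[0]
```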

x = np.array([0, 1, 0, 2, 0, 3, 0, 0, 4, 0, 0, 0, 5, 0, 6])
xs = sparse.as_coo(x)
for sh in (0, 2, -2, 20, -20):
    assert_eq(np.roll(x, sh), sparse.roll(xs, sh).todense())

@hameerabbasi

hameerabbasi Jun 24, 2018

Collaborator

The .todense() can be removed.

@ahwillia

ahwillia Jun 24, 2018

Contributor

Interesting... This causes the test to fail at assert is_canonical(...). What is this checking for? Do I need to re-sort the coordinates or something inside sparse.roll(...) so that I maintain lexicographic order of the indices?

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

COO arrays are meant to be immutable. This means you shouldn't construct and then change their coords/data.

Instead, build the coords/data first and pass them to the constructor.

is_canonical checks that the coords have no duplicates and are sorted in lexicographic order. Many operations rely on this internally. In your case they're not sorted, because you're changing them directly in the object instead of passing them to the constructor. I suggest you use COO(..., has_duplicates=False) (this will sort for you).
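
To make the canonical-form requirement concrete, here is a sketch on plain NumPy arrays (not the `COO` class). Sorting by the C-order linear index from `np.ravel_multi_index` is the same as lexicographic order on the coordinates, and strictly increasing linear indices imply there are no duplicates. `is_canonical` and `canonicalize` here are hypothetical helpers, and summing duplicate entries is an assumption about what canonicalization should do.

```python
import numpy as np


def is_canonical(coords, shape):
    """Hypothetical check: sorted lexicographically, with no duplicates."""
    lin = np.ravel_multi_index(coords, shape)  # C-order linear index
    return bool(np.all(np.diff(lin) > 0))


def canonicalize(coords, data, shape):
    """Hypothetical helper: sort coordinates and sum duplicate entries."""
    lin = np.ravel_multi_index(coords, shape)
    uniq, inverse = np.unique(lin, return_inverse=True)  # uniq is sorted
    summed = np.zeros(len(uniq), dtype=np.asarray(data).dtype)
    np.add.at(summed, inverse, data)  # accumulate values sharing an index
    return np.array(np.unravel_index(uniq, shape)), summed
```

With `coords = [[1, 0, 1], [2, 3, 2]]`, `data = [1.0, 2.0, 3.0]` and `shape = (2, 4)`, the input is not canonical (unsorted, and `(1, 2)` appears twice); `canonicalize` returns coords `[[0, 1], [3, 2]]` and data `[2.0, 4.0]`.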

for ax in (None, 0, 1):
    for sh in (0, 2, -2, 10, -10):
        assert_eq(np.roll(x2, sh, axis=ax),
                  sparse.roll(x2s, sh, axis=ax).todense())

@hameerabbasi

hameerabbasi Jun 24, 2018

Collaborator

The .todense() can be removed.

shifts = [(0, 1), (1, 0), (-1, 1), (1, -1)]
for sh in shifts:
    assert_eq(np.roll(x2, sh, axis=(0, 1)),
              sparse.roll(x2s, sh, axis=(0, 1)).todense())

@hameerabbasi

hameerabbasi Jun 24, 2018

Collaborator

The .todense() can be removed.

for ax in ((0, 0, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1)):
    for sh in [(0, 1, 0), (0, 1, 1), (-1, 1, -1), (1, -1, 1)]:
        assert_eq(np.roll(x2, sh, axis=(0, 1, 1)),
                  sparse.roll(x2s, sh, axis=(0, 1, 1)).todense())

@hameerabbasi

hameerabbasi Jun 24, 2018

Collaborator

The .todense() can be removed.

# empty
x = np.array([])
assert_eq(np.roll(x, 1), sparse.roll(sparse.as_coo(x), 1).todense())

@hameerabbasi

hameerabbasi Jun 24, 2018

Collaborator

The .todense() can be removed.

@@ -1527,3 +1527,35 @@ def test_invalid_iterable_error():
    with pytest.raises(ValueError):
        x = [((2.3, 4.5), 3.2)]
        COO.from_iter(x)
def test_roll_coo():

@hameerabbasi

hameerabbasi Jun 24, 2018

Collaborator

Optional: Maybe follow the pattern of using sparse.random to generate something of a given shape and then roll it using both NumPy and sparse implementations.

@hameerabbasi

hameerabbasi Jun 24, 2018

Collaborator

Mandatory: Parameterize your tests. There are a lot of examples in the tests already, and docs here.
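
A sketch of a parametrized test with `pytest.mark.parametrize`, following the random-array suggestion above but using NumPy's `default_rng` in place of `sparse.random` so it runs without this package. `roll_coo` and `densify` are hypothetical single-axis stand-ins (they skip re-sorting), not the PR's implementation.

```python
import numpy as np
import pytest


def roll_coo(coords, data, shape, shift, axis):
    """Hypothetical single-axis roll on (coords, data); returns new arrays."""
    coords = np.array(coords)  # copy so the caller's coords are untouched
    coords[axis] = (coords[axis] + shift) % shape[axis]
    return coords, np.array(data)  # no re-sort: densify() doesn't need it


def densify(coords, data, shape):
    out = np.zeros(shape, dtype=np.asarray(data).dtype)
    out[tuple(coords)] = data
    return out


@pytest.mark.parametrize("shape", [(15,), (3, 4), (2, 3, 4)])
@pytest.mark.parametrize("shift", [0, 2, -2, 20, -20])
def test_roll_matches_numpy(shape, shift):
    # Random dense array with roughly 40% nonzeros.
    rng = np.random.default_rng(0)
    dense = rng.integers(1, 10, size=shape) * (rng.random(shape) < 0.4)
    coords = np.array(np.nonzero(dense))
    data = dense[tuple(coords)]
    for axis in range(len(shape)):
        c, d = roll_coo(coords, data, shape, shift, axis)
        assert np.array_equal(densify(c, d, shape), np.roll(dense, shift, axis=axis))
```

Stacked `parametrize` decorators produce the cross product, so pytest reports 15 separate cases here instead of one loop that stops at the first failure.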

broadcasted = np.core.multiarray.broadcast(shift, axis)
if broadcasted.ndim > 1:
    raise ValueError(

@hameerabbasi

hameerabbasi Jun 24, 2018

Collaborator

Needs a test for this branch. See this SO question.

for sh, ax in broadcasted:
    result.coords[ax] += sh
    result.coords[ax] %= result.shape[ax]

@mrocklin

mrocklin Jun 24, 2018

Collaborator

Can I ask for a test that verifies that the original remains unchanged after the roll operation?

I suspect that result.coords is a.coords here.

@ahwillia

ahwillia Jun 24, 2018

Contributor

Yes this is something I wasn't 100% sure on. What is the best/preferred way to deepcopy a COO array at the moment?

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

There isn't a built-in method, because we kind of assume immutability, and if things are immutable, you don't need to worry about deep copies.

However, you can just do coords = np.array(a.coords) and similarly for data and then proceed. You're also modifying the object after construction (which is a big no-no), so this method is best anyway.
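
A small illustration of the aliasing concern raised above and the suggested fix, using bare NumPy coordinate arrays. Assigning `result = coords` shares one buffer, so in-place `+=` and `%=` leak back into the input, while `np.array(coords)` copies by default. Both function names are hypothetical.

```python
import numpy as np


def roll_in_place_bug(coords, shift, size):
    """Hypothetical: the pattern reviewers suspected. No copy is made."""
    result = coords           # `result is coords`: same buffer
    result[0] += shift        # in-place, so the caller's array changes too
    result[0] %= size
    return result


def roll_with_copy(coords, shift, size):
    """Hypothetical: copy first, as suggested, then shift the copy."""
    result = np.array(coords)  # np.array copies; np.asarray would not
    result[0] += shift
    result[0] %= size
    return result


coords = np.array([[0, 2, 5]])       # length-6 1-D array, nonzeros at 0, 2, 5
print(roll_with_copy(coords, 2, 6))  # [[2 4 1]]
print(coords)                        # [[0 2 5]] (unchanged)
```

Note that the shifted coordinates `[2, 4, 1]` are no longer sorted, so they still need to go through a sorting step before the result is built.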

@hameerabbasi

Collaborator

hameerabbasi commented Jun 25, 2018

Also needs an entry in docs/generated/sparse.rst; otherwise it won't show up in the docs. Note that the file is in alphabetical order.

@ahwillia

Contributor

ahwillia commented Jun 25, 2018

I think all comments are addressed. Thanks so much for the feedback.

A few remaining questions:

  • How should this function handle a non-COO input? Should I add a call to as_coo(...) at the top?
  • Should I do more work to check inputs (e.g. if user provides a 2D sequence for axis or shift, should I throw a more descriptive error message?)
@hameerabbasi

How should this function handle a non-COO input? Should I add a call to as_coo(...) at the top?

Yes, call as_coo at the top.

Should I do more work to check inputs (e.g. if user provides a 2D sequence for axis or shift, should I throw a more descriptive error message?)

Ideally, NumPy error messages should be mirrored. But as long as this fails, it's alright.

return roll(a.reshape((-1,)), shift, 0).reshape(a.shape)
# rolling does nothing for empty array
elif a.nnz == 0:

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

Optional: Is this branch really required? I'd do the same for all special cases that aren't required. Will the above code already handle it?
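
One way to check that the `nnz == 0` special case is redundant: in a sketch of the `axis=None` path (flatten, shift the linear indices modulo the total size, unflatten), an array with no stored values simply produces empty coordinate arrays. This works on raw `(coords, data, shape)` arrays; `roll_flat` is a hypothetical helper, and it assumes a non-empty shape, since a zero-size shape would make the modulo divide by zero.

```python
import numpy as np


def roll_flat(coords, data, shape):
    """Placeholder signature; see roll_flat_shift below."""
    raise NotImplementedError


def roll_flat_shift(coords, data, shape, shift):
    """Hypothetical sketch of axis=None: roll the flattened array."""
    size = int(np.prod(shape))
    lin = np.ravel_multi_index(coords, shape)  # C-order linear index
    lin = (lin + shift) % size                 # wraps around like np.roll
    order = np.argsort(lin)                    # restore sorted order
    new_coords = np.array(np.unravel_index(lin[order], shape))
    return new_coords, np.asarray(data)[order]
```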

@ahwillia

ahwillia Jun 25, 2018

Contributor

Ah, yep, let me get rid of that branch quickly.

@ahwillia ahwillia force-pushed the ahwillia:master branch 2 times, most recently from ff25aaf to c1e8a3e Jun 25, 2018

@ahwillia

Contributor

ahwillia commented Jun 25, 2018

Squashed everything to a single commit. Ready for final review.

@hameerabbasi

Collaborator

hameerabbasi commented Jun 25, 2018

I need to request changes, but the review mechanism on GitHub seems broken right now. I'll come back to it soon.

But in essence: Replace np.asarray(...).ndim with np.ndim(...), np.iterable(...) with isinstance(..., collections.Iterable), and add a test for the newest ValueError branch (perhaps by parametrizing the test you already have).
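
For reference, `collections.Iterable` was only a deprecated alias: it was removed in Python 3.10, and the abstract base class lives in `collections.abc`. A small sketch of both suggested spellings; `as_shift_tuple` is a hypothetical helper.

```python
from collections.abc import Iterable

import numpy as np


def as_shift_tuple(shift):
    """Hypothetical helper: wrap a scalar shift in a 1-tuple."""
    if isinstance(shift, Iterable):
        return tuple(shift)
    return (shift,)


# np.ndim(x) is a shorter spelling of np.asarray(x).ndim: it reads x.ndim
# directly when x is already an array and converts other inputs first.
print(np.ndim(3), np.ndim((1, 2)), np.ndim([[1, 2]]))  # 0 1 2
print(as_shift_tuple(3), as_shift_tuple([1, -1]))       # (3,) (1, -1)
```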

@hameerabbasi

Final comments, hopefully. Thanks for your patience with me. Only one is mandatory.

"If 'shift' is a 1D sequence, "
"'axis' must have equal length.")
if np.asarray(axis).ndim > 1 or np.asarray(shift).ndim > 1:

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

Use np.ndim(...) instead of np.asarray(...).ndim

@ahwillia

ahwillia Jun 25, 2018

Contributor

Nice, thanks 👍

"'axis' must have equal length.")
if np.asarray(axis).ndim > 1 or np.asarray(shift).ndim > 1:
raise ValueError(

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

Mandatory: Add test for this ValueError, perhaps as a parametrized test in test_valerr.

@ahwillia

ahwillia Jun 25, 2018

Contributor

I think that should be in there already at line 1585.

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

I see. There are a few problems with that. It's not parameterized, and when the first line in the pytest.raises block raises an error, that error is caught and the last line never runs.
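
A sketch of the requested pattern: one parametrized case per invalid input, with a single call inside each `pytest.raises` block so an earlier statement cannot raise first and hide the line under test. `check_shift_axis` is a hypothetical validator modeled on the ValueError branches quoted in this review, not the merged code.

```python
import numpy as np
import pytest


def check_shift_axis(shift, axis):
    """Hypothetical validator modeled on the ValueError branches above."""
    if np.ndim(axis) > 1 or np.ndim(shift) > 1:
        raise ValueError("'shift' and 'axis' should be scalars or 1-D sequences.")
    if np.ndim(shift) == 1 and np.ndim(axis) == 1 and len(shift) != len(axis):
        raise ValueError("If 'shift' is a 1D sequence, 'axis' must have equal length.")


@pytest.mark.parametrize(
    "shift, axis",
    [
        ([[1, 2]], 0),        # 2-D shift
        (1, [[0, 1]]),        # 2-D axis
        ((1, 2, 3), (0, 1)),  # lengths differ
    ],
)
def test_roll_valueerror(shift, axis):
    # Only the call under test sits inside the block, so no earlier
    # statement can raise and make the case pass for the wrong reason.
    with pytest.raises(ValueError):
        check_shift_axis(shift, axis)
```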

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

You can see here that it isn't covered.

@ahwillia

ahwillia Jun 25, 2018

Contributor

Got it. Thanks for the patient explanation. Should be fixed now.

shift = np.full(len(axis), shift)
# ensure axis is iterable
elif not np.iterable(axis):

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

Replace np.iterable(axis) with isinstance(axis, tuple) (normalize_axis does that for you).

@ahwillia

ahwillia Jun 25, 2018

Contributor

done.

shift = (shift,)
# handle broadcasting
if np.iterable(axis) and len(shift) == 1:

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

Replace np.iterable(axis) with isinstance(axis, tuple) (normalize_axis does that for you).

@ahwillia

ahwillia Jun 25, 2018

Contributor

done.

axis = normalize_axis(axis, a.ndim)
# make shift iterable
if not np.iterable(shift):

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

Replace np.iterable(shift) with isinstance(shift, collections.Iterable)

@ahwillia

ahwillia Jun 25, 2018

Contributor

done.

@@ -637,6 +637,7 @@ def roll(a, shift, axis=None):
Output array, with the same shape as a.
"""
from .core import COO, as_coo
import collections

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

Can you do this at the top of the file?

@ahwillia

ahwillia Jun 25, 2018

Contributor

yep. done in latest commit.

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

And remove it from here?

@ahwillia

ahwillia Jun 25, 2018

Contributor

I think there is a commit that's not syncing...

@ahwillia

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

Ah, my mistake. Did you push it to the repo? Are you sure it isn't local?

@ahwillia

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

I get the same. I'm assuming GitHub is migrating stuff, and since it uses microservices, one thing is breaking at a time. Guess we'll just have to wait until things catch up and the tests run.

@hameerabbasi

hameerabbasi Jun 25, 2018

Collaborator

Could you possibly make a minor change and re-push to the repo? Possibly squash all the commits?

@ahwillia ahwillia force-pushed the ahwillia:master branch from a6339db to afd3e80 Jun 25, 2018

@ahwillia

Contributor

ahwillia commented Jun 25, 2018

Squashed everything and forced update. Looks like the tests ran now.

@hameerabbasi hameerabbasi merged commit c23441b into pydata:master Jun 25, 2018

4 checks passed

ci/circleci: build_27 Your tests passed on CircleCI!
ci/circleci: build_36 Your tests passed on CircleCI!
codecov/patch 100% of diff hit (target 96.9%)
codecov/project 96.96% (+0.05%) compared to eeae480
@hameerabbasi

Collaborator

hameerabbasi commented Jun 25, 2018

Thanks for the patience and the addition, @ahwillia! This is in!
