Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement roll (circular shift of elements) #160

Merged
merged 1 commit into from Jun 25, 2018

Conversation

ahwillia
Copy link
Contributor

This is my attempt to implement the equivalent of numpy.roll which I think would be really useful to have in this package.

I think this is nearly done, but it could be improved. In particular, the function is supposed to produce a copy of the array but I wasn't sure the preferred way of doing that so I called the COO constructor directly. Also, my function assumes a COO array as input, which is maybe not so flexible as other sparse array types are developed?

Thanks again for providing this awesome package. Looking forward to your comments!

@codecov
Copy link

codecov bot commented Jun 24, 2018

Codecov Report

Merging #160 into master will increase coverage by 0.05%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #160      +/-   ##
==========================================
+ Coverage    96.9%   96.96%   +0.05%     
==========================================
  Files          11       11              
  Lines        1197     1218      +21     
==========================================
+ Hits         1160     1181      +21     
  Misses         37       37
Impacted Files Coverage Δ
sparse/coo/__init__.py 100% <ø> (ø) ⬆️
sparse/coo/common.py 97.26% <100%> (+0.3%) ⬆️
sparse/coo/umath.py 96.82% <0%> (-0.02%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eeae480...afd3e80. Read the comment docs.

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things are excellent here, mostly! Just a few comments.

# roll across specified axis
else:
axis = normalize_axis_tuple(axis, a.ndim, allow_duplicate=True)
broadcasted = np.core.multiarray.broadcast(shift, axis)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just broadcasting the shape? If so, we should use our own utility function. See

sparse/sparse/coo/umath.py

Lines 358 to 388 in b51d749

def _get_broadcast_shape(shape1, shape2, is_result=False):
"""
Get the overall broadcasted shape.
Parameters
----------
shape1, shape2 : tuple[int]
The input shapes to broadcast together.
is_result : bool
Whether or not shape2 is also the result shape.
Returns
-------
result_shape : tuple[int]
The overall shape of the result.
Raises
------
ValueError
If the two shapes cannot be broadcast together.
"""
# https://stackoverflow.com/a/47244284/774273
if not all((l1 == l2) or (l1 == 1) or ((l2 == 1) and not is_result) for l1, l2 in
zip(shape1[::-1], shape2[::-1])):
raise ValueError('operands could not be broadcast together with shapes %s, %s' %
(shape1, shape2))
result_shape = tuple(max(l1, l2) for l1, l2 in
zip_longest(shape1[::-1], shape2[::-1], fillvalue=1))[::-1]
return result_shape

In general, I don't think it's good to rely on an undocumented NumPy function, which could change and break things at any time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm what about normalize_axis_tuple is there a utility function for that as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind I found normalize_axis in utils.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to add allow_repeat into that one, I believe.

x = np.array([0, 1, 0, 2, 0, 3, 0, 0, 4, 0, 0, 0, 5, 0, 6])
xs = sparse.as_coo(x)
for sh in (0, 2, -2, 20, -20):
assert_eq(np.roll(x, sh), sparse.roll(xs, sh).todense())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .todense() can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting.... This causes the test to fail at assert is_canonical(...) what is this checking for? Do I need to resort the coordinates or something inside sparse.roll(...) so that I maintain lexographic order of the indices?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COO arrays are meant to be immutable. This means you shouldn't construct and then change their coords/data.

What you should do instead is, make coords/data and pass that to the constructor.

is_canonical checks if the coords have any duplicates, and are sorted in lexographic order. This is needed by many operations internally. In your case, they're not sorted, because you're changing them directly in the object, instead of passing them to the constructor. I suggest you use COO(..., has_duplicates=False) (this will sort for you).

for ax in (None, 0, 1):
for sh in (0, 2, -2, 10, -10):
assert_eq(np.roll(x2, sh, axis=ax),
sparse.roll(x2s, sh, axis=ax).todense())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .todense() can be removed.

shifts = [(0, 1), (1, 0), (-1, 1), (1, -1)]
for sh in shifts:
assert_eq(np.roll(x2, sh, axis=(0, 1)),
sparse.roll(x2s, sh, axis=(0, 1)).todense())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .todense() can be removed.

for ax in ((0, 0, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1)):
for sh in [(0, 1, 0), (0, 1, 1), (-1, 1, -1), (1, -1, 1)]:
assert_eq(np.roll(x2, sh, axis=(0, 1, 1)),
sparse.roll(x2s, sh, axis=(0, 1, 1)).todense())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .todense() can be removed.


# empty
x = np.array([])
assert_eq(np.roll(x, 1), sparse.roll(sparse.as_coo(x), 1).todense())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .todense() can be removed.

@@ -1527,3 +1527,35 @@ def test_invalid_iterable_error():
with pytest.raises(ValueError):
x = [((2.3, 4.5), 3.2)]
COO.from_iter(x)


def test_roll_coo():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Maybe follow the pattern of using sparse.random to generate something of a given shape and then roll it using both NumPy and sparse implementations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mandatory: Parameterize your tests. There are a lot of examples in the tests already, and docs here.

broadcasted = np.core.multiarray.broadcast(shift, axis)

if broadcasted.ndim > 1:
raise ValueError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a test for this branch. See this SO question.


for sh, ax in broadcasted:
result.coords[ax] += sh
result.coords[ax] %= result.shape[ax]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask for a test that verifies that the original remains unchanged after the roll operation?

I suspect that result.coords is a.coords here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is something I wasn't 100% sure on. What is the best/preferred way to deepcopy a COO array at the moment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a built-in method, because we kind of assume immutability, and if things are immutable, you don't need to worry about deep copies.

However, you can just do coords = np.array(a.coords) and similarly for data and then proceed. You're also modifying the object after construction (which is a big no-no), so this method is best anyway.

@hameerabbasi
Copy link
Collaborator

hameerabbasi commented Jun 25, 2018

Also needs an addition in docs/generated/sparse.rst otherwise it won't show up in docs. Please note that the file is alphabetic.

@ahwillia
Copy link
Contributor Author

I think all comments are addressed. Thanks so much for the feedback.

A few remaining questions:

  • How should this function handle a non-COO input? Should I add a call to as_coo(...) at the top?
  • Should I do more work to check inputs (e.g. if user provides a 2D sequence for axis or shift, should I throw a more descriptive error message?)

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should this function handle a non-COO input? Should I add a call to as_coo(...) at the top?

Yes, call as_coo at the top.

Should I do more work to check inputs (e.g. if user provides a 2D sequence for axis or shift, should I throw a more descriptive error message?)

Ideally, NumPy error messages should be mirrored. But as long as this fails, it's alright.

return roll(a.reshape((-1,)), shift, 0).reshape(a.shape)

# rolling does nothing for empty array
elif a.nnz == 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Is this branch really required? I'd do the same for all special cases that aren't required. Will the above code already handle it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep let me get rid of that branch quickly

@ahwillia ahwillia force-pushed the master branch 2 times, most recently from ff25aaf to c1e8a3e Compare June 25, 2018 17:09
@ahwillia
Copy link
Contributor Author

Squashed everything to a single commit. Ready for final review.

@hameerabbasi
Copy link
Collaborator

I need to request changes, but the review mechanism on GitHub seems broken right now. I'll come back to it soon.

But in essence: Replace np.asarray(...).ndim with np.ndim(...), np.iterable(...) with isinstance(..., collections.Iterable), and add a test for the newest ValueError branch (perhaps by parametrizing the test you already have).

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final comments, hopefully. Thanks for your patience with me. Only one is mandatory.

"If 'shift' is a 1D sequence, "
"'axis' must have equal length.")

if np.asarray(axis).ndim > 1 or np.asarray(shift).ndim > 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use np.ndim(...) instead of np.asarray(...).ndim

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. thanks 👍

"'axis' must have equal length.")

if np.asarray(axis).ndim > 1 or np.asarray(shift).ndim > 1:
raise ValueError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mandatory: Add test for this ValueError, perhaps as a parametrized test in test_valerr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should be in there already at line 1585.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. There are a few things wrong with that. It's not parameterized. And when the top line raises an error, that one is detected and the last line is ignored.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see here that it isn't covered here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks for the patient explanation. Should be fixed now.

shift = np.full(len(axis), shift)

# ensure axis is iterable
elif not np.iterable(axis):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace np.iterable(axis) with isinstance(axis, tuple) (normalize_axis does that for you).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

shift = (shift,)

# handle broadcasting
if np.iterable(axis) and len(shift) == 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace np.iterable(axis) with isinstance(axis, tuple) (normalize_axis does that for you).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

axis = normalize_axis(axis, a.ndim)

# make shift iterable
if not np.iterable(shift):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace np.iterable(shift) with isinstance(shift, collections.Iterable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -637,6 +637,7 @@ def roll(a, shift, axis=None):
Output array, with the same shape as a.
"""
from .core import COO, as_coo
import collections
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do this at the top of the file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. done in latest commit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And remove it from here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a commit that's not syncing...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my mistake. Did you push it to the repo, are you sure it isn't local?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the same. I'm assuming GitHub is migrating stuff, and since it uses microservices, one thing is breaking at a time. Guess we'll just have to wait until things catch up and the tests run.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you possibly make a minor change and re-push to the repo? Possibly squash all the commits?

@ahwillia
Copy link
Contributor Author

Squashed everything and forced update. Looks like the tests ran now.

@hameerabbasi hameerabbasi merged commit c23441b into pydata:master Jun 25, 2018
@hameerabbasi
Copy link
Collaborator

Thanks for the patience and the addition, @ahwillia! This is in!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants