Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix unexpected return of np.pad with mode=wrap #22575

Merged
merged 14 commits into from Dec 7, 2022
Merged

Conversation

LuYunChi
Copy link
Contributor

np.pad with mode="wrap" returns unexpected result that original data is not strictly looped in padding. This may happen in some occassions when padding widths in the same dimension are unbalanced (see added testcase in test_arraypad.py and the related issue). The reason is the function pad makes iterative calls of _set_wrap_both() in the above situation, yet period for padding is not correctly computed in each iteration.

The bug is fixed by guaranteeing that period is always a multiple of original data size, and also be the possible maximum for computation efficiency.

Closes #22464

@seberg seberg added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Nov 12, 2022
@seberg
Copy link
Member

seberg commented Nov 12, 2022

I guess we can probably go ahead with this, no failing tests seems like a good indicator. We should add a brief release note, though.
@WarrenWeckesser do you want to have a look? Also ping @lagru since you are my designated pad specialist :).

I do think we should have a high dimensional test though, @LuYunChi.

@LuYunChi
Copy link
Contributor Author

I can add a two dimensional test, or a even higher is needed?

Copy link
Contributor

@lagru lagru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've suggested a few improvements, however it looks correct from my side. Thank you @LuYunChi!

I'd also agree that this is probably the behavior most users would expect and probably what should be used in numpy.pad. However, taking the docstring into account

The first values are used to pad the end and the end values are used to pad the beginning.

I'm hesitant to call this a bug fix and just push it onto users without a (deprecation) warning. I would back @WarrenWeckesser's suggestion in #22464 (comment)

[...] erhaps we could consider adding a new mode, say modwrap, that implements the expected behavior. If there is concern about the proliferation of modes, the current 'wrap' mode could also be deprecated and removed in a future release.

numpy/lib/arraypad.py Show resolved Hide resolved
numpy/lib/arraypad.py Outdated Show resolved Hide resolved
numpy/lib/arraypad.py Outdated Show resolved Hide resolved
numpy/lib/arraypad.py Show resolved Hide resolved
numpy/lib/arraypad.py Outdated Show resolved Hide resolved
Comment on lines 1126 to 1138
a = np.arange(9, dtype=float).reshape(3, 3)
a = np.pad(a, [(2, 4), (1, 1)], mode='wrap')
b = np.array([
[5, 3, 4, 5, 3],
[8, 6, 7, 8, 6],
[2, 0, 1, 2, 0],
[5, 3, 4, 5, 3],
[8, 6, 7, 8, 6],
[2, 0, 1, 2, 0],
[5, 3, 4, 5, 3],
[8, 6, 7, 8, 6],
[2, 0, 1, 2, 0]])
assert_array_equal(a, b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use a non-minimal two dimensional test case, we can test for the fault behavior in both dimension. Also might be good to add a comment explaining what this tests does.

Suggested change
a = np.arange(9, dtype=float).reshape(3, 3)
a = np.pad(a, [(2, 4), (1, 1)], mode='wrap')
b = np.array([
[5, 3, 4, 5, 3],
[8, 6, 7, 8, 6],
[2, 0, 1, 2, 0],
[5, 3, 4, 5, 3],
[8, 6, 7, 8, 6],
[2, 0, 1, 2, 0],
[5, 3, 4, 5, 3],
[8, 6, 7, 8, 6],
[2, 0, 1, 2, 0]])
assert_array_equal(a, b)
# Assert that 'wrap' pads only with multiples of the original area if
# the pad width is larger than the original array
a = np.arange(4).reshape(2, 2)
a = np.pad(a, [(1, 3), (3, 1)], mode='wrap')
b = np.array(
[[3, 2, 3, 2, 3, 2],
[1, 0, 1, 0, 1, 0],
[3, 2, 3, 2, 3, 2],
[1, 0, 1, 0, 1, 0],
[3, 2, 3, 2, 3, 2],
[1, 0, 1, 0, 1, 0]]
)
assert_array_equal(a, b)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing you could do, is use np.tile(arr, (3, 3)) or even (5, 5). Then do some slicing.
Then you could use @pytst.mark.parametrize to test a could of different pad widths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented a version using np.tile and @pytest.mark.parametrize, but it seems the test case becomes less straightforward, since it needs some computation for slice index. So, I would prefer still using Lagru's case.

----------------------------------------------------------

``np.pad`` with ``mode=wrap`` now always fills padding with
strict loops of original data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we just fix the behavior, I think it is important to call this "compatibility" (or at least "change"), and also mention when it breaks: We can only do this, if we expect that practically nobody uses this padding method when the pad size is larger than the initial array.

I haven't made up my mind yet. If we introduce a wrapmod, we could also make an alias to wrap, but have "wrap" transition to an error when the wrap-width is too big?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What specific action should I do to "make modwap an alias to wrap"?

@LuYunChi
Copy link
Contributor Author

I am going to introduce wrapmod, since that seems to hold backward compatibility.

@WarrenWeckesser
Copy link
Member

Sorry I'm not keeping up with this in real-time. Personally, I think we should consider the old behavior a bug and just fix it, instead of introducing the wrapmod mode that I considered in #22464.

@LuYunChi
Copy link
Contributor Author

Thank you @WarrenWeckesser for your suggestion. So what do you think? @seberg @lagru

@LuYunChi
Copy link
Contributor Author

LuYunChi commented Nov 13, 2022

Sorry I'm not keeping up with this in real-time. Personally, I think we should consider the old behavior a bug and just fix it, instead of introducing the wrapmod mode that I considered in #22464.

Also, when I am modifying the test case, I also find that adding a new mode wrapmod requires a copy of all tests for wrap, which seems quite inelegant. Any idea?

Oh, question solved! I can use @pytest.mark.parametrize!

@@ -410,14 +415,12 @@ def _set_wrap_both(padded, axis, width_pair):

if left_pad > 0:
# Pad with wrapped values on left side
# First slice chunk from right side of the non-pad area.
# First slice chunk from left side of the non-pad area.
Copy link
Contributor Author

@LuYunChi LuYunChi Nov 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually neither left nor right is correct. It is more of "from middle", but you can think of it as tracing from the left side. (The later comment for right pad is the same.)

@LuYunChi
Copy link
Contributor Author

I am going to introduce wrapmod, since that seems to hold backward compatibility.

So far, code changes are still on wrap, instead of introducing wrapmod.

@LuYunChi LuYunChi requested a review from seberg November 14, 2022 17:35
@seberg
Copy link
Member

seberg commented Nov 16, 2022

I am happy to pull the plug in this, because it seems to me that wrapping beyond the normal size is rather odd. The thing I can think of would be things like partial differential equations with periodic boundary conditions. There I believe the current wrapping wouldn't even be correct.

I see three main paths:

  1. wrapmod doesn't even make sense: Just remove it completel.
  2. We add a FutureWarning now, because we feel it there is too much potential for breaking. But this is annoying, since there wouldn't be a good way to avoid it.
  3. We introduce wrapmod. That mostly makes sense to me if we actually keep "wrap", but with the meaning of raising an error when the padding is too large.

I really suspect that we are more likely to fix a bug than create one in user code by doing this change. I also suspect that almost nobody will use "wrap" with padding larger than the domain. Due to the combination of both (bug fix and exceptionally few users who see this), I am OK with just doing the change. Warren seems to tend towards that as well.

(On the code, I would love the tests slightly expanded as noted, but if it seems to cover the paths well to @lagru, this seems OK.)

@LuYunChi LuYunChi requested review from lagru and removed request for seberg November 20, 2022 03:27
@LuYunChi LuYunChi requested review from seberg and removed request for lagru and seberg December 6, 2022 00:40
@LuYunChi LuYunChi requested a review from lagru December 6, 2022 00:41
@LuYunChi
Copy link
Contributor Author

LuYunChi commented Dec 6, 2022

May we make a final decision on this pull request? @lagru

Copy link
Contributor

@lagru lagru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nearly finished on my side! I've suggested a few final tweaks to how 22575.compatibility.rst is worded. Thanks @LuYunChi.

numpy/lib/arraypad.py Outdated Show resolved Hide resolved
doc/release/upcoming_changes/22575.compatibility.rst Outdated Show resolved Hide resolved
doc/release/upcoming_changes/22575.compatibility.rst Outdated Show resolved Hide resolved
doc/release/upcoming_changes/22575.compatibility.rst Outdated Show resolved Hide resolved
LuYunChi and others added 4 commits December 6, 2022 17:41
Co-authored-by: Lars Grüter <lagru+github@mailbox.org>
Co-authored-by: Lars Grüter <lagru+github@mailbox.org>
Co-authored-by: Lars Grüter <lagru+github@mailbox.org>
Co-authored-by: Lars Grüter <lagru+github@mailbox.org>
@LuYunChi LuYunChi requested a review from seberg December 6, 2022 22:44
Copy link
Contributor

@lagru lagru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ready to me. :)

@seberg
Copy link
Member

seberg commented Dec 7, 2022

LGTM and if @lagru is happy, then I am :), so lets give this a shot. Thanks @LuYunChi! Sorry that it was a tad difficult to settle on the proper approach, I hope we got it right :).

@seberg seberg merged commit 2aad439 into numpy:main Dec 7, 2022
markopacak pushed a commit to markopacak/numpy that referenced this pull request Dec 12, 2022
np.pad with mode="wrap" returns unexpected result that original data is not strictly looped in padding. This may happen in some occassions when padding widths in the same dimension are unbalanced (see added testcase in test_arraypad.py and the related issue). The reason is the function pad makes iterative calls of _set_wrap_both() in the above situation, yet period for padding is not correctly computed in each iteration.

The bug is fixed by guaranteeing that period is always a multiple of original data size, and also be the possible maximum for computation efficiency.

Closes numpy#22464

Co-authored-by: Lars Grüter <lagru+github@mailbox.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes
Projects
Development

Successfully merging this pull request may close these issues.

BUG: numpy.pad with mode='wrap' give unexpected return
4 participants