BUG: fix unexpected return of np.pad with mode=wrap #22575
Conversation
I guess we can probably go ahead with this; no failing tests seems like a good indicator. We should add a brief release note, though. I also think we should have a higher-dimensional test, @LuYunChi.

I can add a two-dimensional test, or is an even higher dimension needed?
I've suggested a few improvements; however, it looks correct from my side. Thank you @LuYunChi!

I'd also agree that this is probably the behavior most users would expect and probably what should be used in numpy.pad. However, taking the docstring into account:

The first values are used to pad the end and the end values are used to pad the beginning.

I'm hesitant to call this a bug fix and just push it onto users without a (deprecation) warning. I would back @WarrenWeckesser's suggestion in #22464 (comment):

[...] Perhaps we could consider adding a new mode, say modwrap, that implements the expected behavior. If there is concern about the proliferation of modes, the current 'wrap' mode could also be deprecated and removed in a future release.
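For concreteness, the "expected behavior" under discussion can be written down independently of numpy's internals. `modular_wrap_1d` below is a hypothetical reference helper for illustration only, not part of the PR:

```python
import numpy as np

def modular_wrap_1d(a, left, right):
    """Reference semantics for modular wrap on a 1-D array: the padded
    result is a strict periodic extension of `a`, i.e.
    out[i] == a[(i - left) % len(a)] for every output cell."""
    n = a.shape[0]
    idx = np.arange(-left, n + right) % n  # map each output cell onto the period
    return a[idx]

a = np.arange(3)
print(modular_wrap_1d(a, 2, 4))  # [1 2 0 1 2 0 1 2 0]
```

The old behavior could break this invariant when an unbalanced pad width exceeded the array length; the fix makes `np.pad(a, (left, right), mode='wrap')` agree with this reference.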
numpy/lib/tests/test_arraypad.py
Outdated
a = np.arange(9, dtype=float).reshape(3, 3)
a = np.pad(a, [(2, 4), (1, 1)], mode='wrap')
b = np.array([
    [5, 3, 4, 5, 3],
    [8, 6, 7, 8, 6],
    [2, 0, 1, 2, 0],
    [5, 3, 4, 5, 3],
    [8, 6, 7, 8, 6],
    [2, 0, 1, 2, 0],
    [5, 3, 4, 5, 3],
    [8, 6, 7, 8, 6],
    [2, 0, 1, 2, 0]])
assert_array_equal(a, b)
If we use a non-minimal two-dimensional test case, we can test for the faulty behavior in both dimensions. It might also be good to add a comment explaining what this test does.
a = np.arange(9, dtype=float).reshape(3, 3)
a = np.pad(a, [(2, 4), (1, 1)], mode='wrap')
b = np.array([
    [5, 3, 4, 5, 3],
    [8, 6, 7, 8, 6],
    [2, 0, 1, 2, 0],
    [5, 3, 4, 5, 3],
    [8, 6, 7, 8, 6],
    [2, 0, 1, 2, 0],
    [5, 3, 4, 5, 3],
    [8, 6, 7, 8, 6],
    [2, 0, 1, 2, 0]])
assert_array_equal(a, b)

# Assert that 'wrap' pads only with multiples of the original area if
# the pad width is larger than the original array
a = np.arange(4).reshape(2, 2)
a = np.pad(a, [(1, 3), (3, 1)], mode='wrap')
b = np.array(
    [[3, 2, 3, 2, 3, 2],
     [1, 0, 1, 0, 1, 0],
     [3, 2, 3, 2, 3, 2],
     [1, 0, 1, 0, 1, 0],
     [3, 2, 3, 2, 3, 2],
     [1, 0, 1, 0, 1, 0]]
)
assert_array_equal(a, b)
One thing you could do is use np.tile(arr, (3, 3)), or even (5, 5), and then do some slicing. Then you could use @pytest.mark.parametrize to test a couple of different pad widths.
I implemented a version using np.tile and @pytest.mark.parametrize, but it seems the test case becomes less straightforward, since it needs some computation for the slice indices. So I would prefer to keep using Lagru's case.
``np.pad`` with ``mode=wrap`` now always fills the padding with strict loops of the original data.
If we just fix the behavior, I think it is important to call this "compatibility" (or at least "change"), and also mention when it breaks: we can only do this if we expect that practically nobody uses this padding mode when the pad size is larger than the initial array.

I haven't made up my mind yet. If we introduce a wrapmod, we could also make it an alias to wrap, but have "wrap" transition to an error when the wrap width is too big?
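That transition could be prototyped outside numpy along these lines. `wrap_pad_with_warning` is a hypothetical wrapper I am inventing for illustration, not an existing or proposed numpy API, and the warning text is made up:

```python
import warnings
import numpy as np

def wrap_pad_with_warning(a, left, right):
    """Hypothetical transition shim for 1-D input: keep 'wrap' working,
    but warn when a pad width exceeds the array length, which is the
    only region where the old and new behavior can disagree."""
    if max(left, right) > a.shape[0]:
        warnings.warn(
            "mode='wrap' with a pad width larger than the array now "
            "repeats the data strictly periodically",
            DeprecationWarning, stacklevel=2)
    return np.pad(a, (left, right), mode="wrap")
```

A real deprecation would instead live inside numpy's pad machinery and go through the usual release-note and deprecation process.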
What specific action should I do to "make modwrap an alias to wrap"?

I am going to introduce […]
Sorry I'm not keeping up with this in real time. Personally, I think we should consider the old behavior a bug and just fix it, instead of introducing the […]
Thank you @WarrenWeckesser for your suggestion. So what do you think? @seberg @lagru

Also, when I am modifying the test case, I also find that adding a new mode […] Oh, question solved! I can use […]
@@ -410,14 +415,12 @@ def _set_wrap_both(padded, axis, width_pair):
     if left_pad > 0:
         # Pad with wrapped values on left side
-        # First slice chunk from right side of the non-pad area.
+        # First slice chunk from left side of the non-pad area.
Actually, neither left nor right is correct. It is more of "from the middle", but you can think of it as tracing from the left side. (The later comment for the right pad is the same.)

So far, code changes are still on […]
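The period idea behind the fix can be illustrated in isolation. `wrap_one_side` below is a simplified 1-D sketch of the invariant (every copied chunk must line up with a period boundary so the result stays a strict loop of the data); it is not numpy's actual `_set_wrap_both` code, which additionally grows the copied period to the largest multiple of the original size for efficiency:

```python
import numpy as np

def wrap_one_side(a, width):
    """Fill `width` cells of left padding using only chunks that end on
    a period boundary, so the result is a strict loop of `a`."""
    n = a.shape[0]
    chunks = []
    remaining = width
    while remaining > 0:
        # Copy at most one period per step, always taking the tail of
        # the period so the chunk lines up with the data to its right.
        size = min(remaining, n)
        chunks.insert(0, a[n - size:])
        remaining -= size
    return np.concatenate(chunks + [a])

print(wrap_one_side(np.arange(3), 4))  # [2 0 1 2 0 1 2]
```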
I am happy to pull the plug on this, because it seems to me that wrapping beyond the normal size is rather odd. The use case I can think of would be things like partial differential equations with periodic boundary conditions, and there I believe the current wrapping wouldn't even be correct. I see three main paths: […]

I really suspect that we are more likely to fix a bug than create one in user code by doing this change. I also suspect that almost nobody will use […]

(On the code, I would love the tests slightly expanded as noted, but if it seems to cover the paths well to @lagru, this seems OK.)

May we make a final decision on this pull request? @lagru
Looks nearly finished on my side! I've suggested a few final tweaks to how 22575.compatibility.rst is worded. Thanks @LuYunChi.
Co-authored-by: Lars Grüter <lagru+github@mailbox.org>
Looks ready to me. :)
np.pad with mode="wrap" returns an unexpected result in which the original data is not strictly looped in the padding. This may happen on some occasions when padding widths in the same dimension are unbalanced (see the added test case in test_arraypad.py and the related issue). The reason is that the function pad makes iterative calls to _set_wrap_both() in the above situation, yet the period for padding is not correctly computed in each iteration. The bug is fixed by guaranteeing that the period is always a multiple of the original data size, and also the maximum possible for computational efficiency. Closes numpy#22464

Co-authored-by: Lars Grüter <lagru+github@mailbox.org>