
Conversation

@timcera (Contributor) commented Feb 6, 2012

Currently have 'with_minimum', 'with_maximum', 'with_mean', 'with_median', 'with_linear_ramp', 'with_reflect', 'with_constant', and 'with_wrap'. Includes unit tests, docstrings, and doctests. Based on a feature request at http://projects.scipy.org/numpy/ticket/655.

I use it with fft to minimize edge effects in my tidal analysis program (http://tappy.sf.net). Thought (hoped?) maybe someone else would also find it useful.

@mwiebe (Member) commented Feb 16, 2012

FYI, I mentioned this pull request in an email discussion as an example of a pure Python contribution:

http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060551.html

@WarrenWeckesser (Member):

Another FYI: I added the file "_arraytools.py" in the scipy.signal package last June (https://github.com/scipy/scipy/blob/master/scipy/signal/_arraytools.py). It has a smaller set of capabilities than this PR. It provides functions for creating odd, even or constant extensions along a single axis. These were sufficient for enhancing the filtfilt function.

@teoliphant (Member):

This is very interesting. Can you explain why you made the padding routines return a sub-class instead of just another ndarray? Are the padded values generated on the fly, or are they pre-computed?

@timcera (Contributor, author) commented Feb 17, 2012

Padded values are pre-computed.

The sub-class is there because of something I took out of the pull request that needed the 'original_shape'. All the sub-class does is set an 'original_shape' attribute recording where the pre-padded shape of the input array sits within the padded array. If there is a better way to add an attribute to an ndarray, I am open to suggestions.

I now wish that my reason was more profound, but that's it.

What did I take out? I wanted an easy-to-use addition to many functions, where you could say something like np.convolve(a, kern, 'same', pad='with_mean', stat_len=((5,5),)) and np.convolve would use the pad module in the background, padding before taking the convolution. I finished np.convolve, but then realized this might be a really hard sell, so I just put up the pad module.

The 'original_shape' attribute still might be useful:

a = np.arange(100).astype('f')
b = pad.with_mean(a, ((25, 20), ), stat_len=((2, 3), ))
b.original_shape
[slice(25, -20, None)]

Say you now did a convolution...

kern = np.ones(5)/5.0
c = np.convolve(b, kern, 'same')

You can then recover the original shape, which represents the convolution of the original data, hopefully with edge effects minimized:

c = c[b.original_shape]

Not a big deal, and there may be easier ways to track the pre-pad shape, so I don't mind taking that out if a sub-classed array is a concern.

Kindest regards,
Tim

@teoliphant (Member):

I think keeping track of the original shape is best done in another way when it is needed. My sense is that it would be best to have the padding functions just return a base-class array and remove the PadArray subclass.

@teoliphant (Member):

Thanks Tim. This is looking good. I'm sorry to be a pain, but one more thing: this functionality should all go under the numpy.lib namespace and be accessible from there. I would just use names like pad_with_*, or if you like you can drop the "with" and just use pad_*. But if you really want the pad.with_mean spelling, just put the pad.py file in the lib directory and use

from numpy.lib import pad
pad.with_mean()

Thanks again. This is a nice addition.

@timcera (Contributor, author) commented Feb 23, 2012

Travis,

Haven't decided yet which naming convention to go with. Does anyone following this have a preference? I will decide and post a change soon.

@timcera (Contributor, author) commented Feb 23, 2012

Warren,

Looking at your additions to _arraytools.py, I am wondering whether I should change the name pad.with_reflect.

Your even_ext is equivalent to my pad.with_reflect:

import numpy as np
a = np.array([[1.0,2.0,3.0,4.0,5.0], [0.0, 1.0, 4.0, 9.0, 16.0]])
even_ext(a, 2)
array([[  3.,   2.,   1.,   2.,   3.,   4.,   5.,   4.,   3.],
       [  4.,   1.,   0.,   1.,   4.,   9.,  16.,   9.,   4.]])

pad.with_reflect(a, ((0,0),(2,2)))
array([[  3.,   2.,   1.,   2.,   3.,   4.,   5.,   4.,   3.],
       [  4.,   1.,   0.,   1.,   4.,   9.,  16.,   9.,   4.]])

Would this process be known to more people as an even extension?

Your const_ext is very different from my pad.with_constant, since with pad.with_constant you have to specify what the constant is, whereas your const_ext takes the edge value as the constant.

Your const_ext can be emulated with pad.with_mean, pad.with_minimum, pad.with_maximum, or pad.with_median with a stat_len = (1,1) since all of those statistics collapse to the edge value if only given the edge value.

pad.with_minimum(a, ((0,0),(2,2)), stat_len=(1,1))
array([[  1.,   1.,   1.,   2.,   3.,   4.,   5.,   5.,   5.],
       [  0.,   0.,   0.,   1.,   4.,   9.,  16.,  16.,  16.]])

Should I add a pad.with_edge?

I don't have anything comparable to odd_ext.

Would it be helpful to add a pad.with_odd (though right now I am not exactly sure that would be the best name)?

Kindest regards,
Tim

@teoliphant (Member):

Hey Tim,

The pad.with_reflect function should probably take a key-word argument saying whether the reflection should be even or odd. Alternatively, you could add two functions:

pad.with_even_reflect
pad.with_odd_reflect

Also,

I think the use-case of padding with a constant is common enough that you should add an "entry-point" function:

pad.with_const

Even if it just calls the pad.with_minimum function in the end, you might also just optimize that case.

Thanks,

-Travis

@teoliphant (Member):

This looks good to me. I'll merge it over the weekend unless someone has an issue.

@WarrenWeckesser (Member):

If you can hold off merging until Sunday, I'll have some time to provide more feedback tomorrow.

@WarrenWeckesser (Member):

I didn't say so in my short "FYI" above, but I think this is a nice addition to numpy.

Now, some comments on the implementation...

pad_reflect:

This should probably not allow a pad length greater than one less than the length of the corresponding dimension in the array. Otherwise, what is the definition of the extension?
Currently it does this:

In [302]: pad_reflect([1,2,3], 2)
Out[302]: array([3, 2, 1, 2, 3, 2, 1])

In [303]: pad_reflect([1,2,3], 3)    # Why 0 and 2 at the ends?
Out[303]: array([0, 3, 2, 1, 2, 3, 2, 1, 2])

In [304]: pad_reflect([1,2,3], 4)    # What's going on?
Out[304]: array([0, 0, 3, 2, 1, 2, 3, 2, 1, 2, 3])

pad_wrap:

When the pad length exceeds the data length, it pads with zeros. This seems wrong. I expect it to continue wrapping.

In [283]: pad_wrap([1,2,3], 3)  # OK
Out[283]: array([1, 2, 3, 1, 2, 3, 1, 2, 3])

In [284]: pad_wrap([1,2,3], 4)
Out[284]: array([0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 0])

I expected array([3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1])

pad_linear_ramp:

This is confusing, and in some cases ill-defined beyond rank 1 arrays.

Consider this example:

In [290]: pad_linear_ramp([[2.0]], 2, [(1,3), (0, 2)])
Out[290]: 
array([[ 0.  ,  0.5 ,  1.  ,  1.5 ,  2.  ],
       [ 0.  ,  0.75,  1.5 ,  1.75,  2.  ],
       [ 0.  ,  1.  ,  2.  ,  2.  ,  2.  ],
       [ 0.  ,  1.25,  2.5 ,  2.25,  2.  ],
       [ 0.  ,  1.5 ,  3.  ,  2.5 ,  2.  ]])

Why should the left and right columns be constant, while the first and last rows vary linearly?

Another confusing point is the definition of the linear ramp in the corner regions. Consider this:

In [293]: pad_linear_ramp([[2.0]], [(4,0),(4,0)])
Out[293]: 
array([[ 0.   ,  0.   ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.125,  0.25 ,  0.375,  0.5  ],
       [ 0.   ,  0.25 ,  0.5  ,  0.75 ,  1.   ],
       [ 0.   ,  0.375,  0.75 ,  1.125,  1.5  ],
       [ 0.   ,  0.5  ,  1.   ,  1.5  ,  2.   ]])

A different method for extending the linear ramp to the corner regions results in:

array([[ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0.5,  0.5,  0.5,  0.5],
       [ 0. ,  0.5,  1. ,  1. ,  1. ],
       [ 0. ,  0.5,  1. ,  1.5,  1.5],
       [ 0. ,  0.5,  1. ,  1.5,  2. ]])

That's actually what I expected. However, this has the same problem as the current implementation: it is ill-defined when different values are given for different axes.

The simplest solution is to just remove the pad_linear_ramp() function. I don't see any important use-cases for it, so I don't think it will be missed.

pad_median:

Consider the following example:

In [470]: c
Out[470]: 
array([[[ 4.,  8.,  3.],
        [ 3.,  1.,  9.],
        [ 7.,  8.,  8.]],

       [[ 7.,  9.,  6.],
        [ 3.,  8.,  9.],
        [ 3.,  8.,  4.]],

       [[ 9.,  7.,  1.],
        [ 5.,  5.,  0.],
        [ 0.,  2.,  3.]]])

In [471]: m = pad_median(c, stat_len=2)

In [472]: m[0,0,0]
Out[472]: 5.375

In [473]: median(c[:2,:2,:2])
Out[473]: 5.5

In [474]: median(median(median(c[:2,:2,:2], axis=-1), axis=-1))
Out[474]: 5.375

The confusing part for me is that the extension code is taking a median of medians, so the answer depends on how the data is distributed among the intermediate medians.

Another example:

In [487]: w = array([[3,1,4],[4,5,9],[9,8,2]])

In [488]: pad_median(w)
Out[488]: 
array([[4, 4, 5, 4, 4],
       [3, 3, 1, 4, 3],
       [5, 4, 5, 9, 5],
       [8, 9, 8, 2, 8],
       [4, 4, 5, 4, 4]])

In [489]: pad_median(w.T).T
Out[489]: 
array([[5, 4, 5, 4, 5],
       [3, 3, 1, 4, 3],
       [5, 4, 5, 9, 5],
       [8, 9, 8, 2, 8],
       [5, 4, 5, 4, 5]])

Notice that the corners are different. It would be nice if they weren't, but that would mean the code would have to figure out the complete set of numbers that contribute to the corner values and compute the median of that set in one calculation, instead of using the current median of medians. At a minimum, the procedure the code uses to compute corner regions should be added to the docstrings.

By the way, the numbers in the last example come from John Cook's blog post:
http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/

pad_mean, pad_maximum, pad_minimum, pad_median:

These all have the same API, and only differ in the statistic that they compute. How about combining them into a single function, say 'pad_stat' (or...?), that takes a 'method' keyword with the possible values 'max', 'min', 'mean', 'median'?

import numpy as np

#===globals======================
modname = "pad"

Inline comment (Member): Why is this here?

Reply (Contributor, author):
Not sure exactly what you are pointing to. Maybe the '_docformat...' line? That was, if memory serves, an early requirement to get code accepted into numpy. The pad module is actually quite old and languished in the numpy bug/feature tracker for a couple years.

Reply (Member):
I was referring to 'modname'. It is not used anywhere.

I also wondered about 'docformat', but a few other modules have that, so I assumed it was some sort of numpy convention. Travis might know more about that. Does numpy still maintain compatibility with epydoc?

Reply (Member):
I don't know of any efforts to maintain compatibility with epydoc directly.

@timcera (Contributor, author) commented Feb 27, 2012

Warren,

Thank you for taking the time. This is very helpful.

The 'corner' regions are ill-defined, so the last axis wins out by using the padded values of the previous axes. Is there a better convention? It would have to extend to n dimensions.

Kindest regards,
Tim

@WarrenWeckesser (Member):

I'll stick with my suggestion to simply remove 'pad_linear_ramp'.

@timcera (Contributor, author) commented Feb 27, 2012

The 'corner' regions are calculated from previously padded values in all of the functions for rank greater than 1. You made a similar observation about the pad_median function taking a median of medians.

I guess a case could be made to go diagonally out from each corner, using values for the statistic from the diagonal of the input array, which would be what you expected from pad_linear_ramp. That is quite a bit of added complexity, and I don't see an easy way to do it at the moment (for rank n).

I use pad_linear_ramp for rank-1 arrays, and thought that it would be useful for rank-2 arrays (for example, to minimize edge effects when taking an FFT). Beyond that? What is done now for the corner regions in image work?

@teoliphant (Member):

The linear_ramp extrapolation technique is useful in 1-d. In fact, I could see adding a family of extrapolators based on polynomial extension of the edge points.

In N-d, the corner cases for all the calculations are a good question. There are a lot of approaches that can be justified. I think the important thing is to document clearly what is done.

I suppose my preference would be to produce the "average" of the different (separable) methods that make sense. Non-separable methods are more difficult to extend into N-d and also have a higher computational cost for less benefit.

Yet another "curse of dimensionality"

@timcera (Contributor, author) commented Feb 28, 2012

Found some resources:

http://octave.sourceforge.net/image/function/padarray.html

Re-enacting the Octave examples:

a = [[1,2,3],[4,5,6]]
np.lib.pad_constant(a, ((2,2),(1,1)))       # Octave:  padarray([1,2,3;4,5,6],[2,1])
np.lib.pad_constant(a, ((2,2),(1,1)), 5)   # Octave:  padarray([1,2,3;4,5,6],[2,1], 5)
np.lib.pad_constant(a, ((2,0),(1,0)))     # Octave:  padarray([1,2,3;4,5,6],[2,1],0,'pre')
np.lib.pad_wrap(a, ((2,2),(1,1)))       # Octave:  padarray([1,2,3;4,5,6],[2,1],'circular')
np.lib.pad_edge(a, ((2,2),(1,1)))      # Octave: padarray([1,2,3;4,5,6],[2,1],'replicate')
# NA       Octave: padarray([1,2,3;4,5,6],[2,1],'symmetric')

Octave has 'symmetric', which I don't. There is no example, but it looks like Octave's 'reflect' is the same as mine. These are not really complicated padding functions, but Octave handles the corner regions the way I did: edges of later axes use the padded values of previous axes.

http://www.mathworks.com/help/toolbox/images/ref/padarray.html

Since Octave is a Matlab look-alike, the behavior is similar to the above:

#   NA           Matlab: padarray(a,[0 3],'symmetric','pre')
a = [[1,2],[3,4]]
np.lib.pad_edge(a, ((0,2),(0,3)))       # Matlab: padarray(A,[3 2],'replicate','post')
c = np.arange(8) + 1
c = c.reshape((2,2,2))
np.lib.pad_constant(c, ((0,0),(3,3),(3,3)))   # Matlab: padarray(C,[3 3])

http://www.mathworks.com/matlabcentral/fileexchange/7720-pad-array

A user contribution that adds padding with tapering window functions (the closest I come to this is linear_ramp). It has some examples, but no answers! It also returns the placement and shape of the original array within the padded array! :-)

http://reference.wolfram.com/mathematica/ref/ArrayPad.html

These guys have too much time on their hands!

Under the 'Scope' section there are reflection examples where the padding is longer than the length of the axis.

In[1]:= ArrayPad[{a,b,c}, 4, "Reflected"]
Out[1]= {a,b,c,b, a,b,c, b,a,b,c}

Basically the creation of the pad values just 'reflects' back and forth along the vector.

Here are all the Mathematica types and a guess at how np.lib.pad would emulate them...

  • c a constant c
    • np.lib.pad_constant
  • {c1,c2,...} cyclic repetition of constants
    • You would have to call np.lib.pad_constant multiple times
  • "Extrapolated" polynomial extrapolation of elements
    • NA
  • "Fixed" repetitions of the elements on each boundary
    • np.lib.pad_edge
  • "Periodic" cyclic repetitions of the complete array
    • NA
  • "Reflected" reflections of the array in the boundary
    • np.lib.pad_reflect
  • "ReflectedDifferences" reflections of the differences between elements
    • NA
  • "Reversed" reversals of the complete array
    • NA - same as Matlab/Octave 'symmetric'
  • "ReversedDifferences" reversals of the differences between elements
    • NA
  • "ReversedNegation" negated reversals of the array
    • NA

@timcera (Contributor, author) commented Mar 2, 2012

Hold off on looking at this for now. Of all things, I messed up some of the docstrings. Will fix soon.

@timcera (Contributor, author) commented Mar 22, 2012

Final review?

@charris (Member) commented Mar 22, 2012

Hi Tim, I'm waiting for the folks involved in this to come to a conclusion. Pinging the list might get more attention.

@teoliphant (Member):

This looks good from my perspective.

def __str__(self):
    return """
For pad_width should get a list/tuple with length
of an n-dimensional array.
"""

Inline comment (Contributor): The grammar of this error message is somewhat bust.

Reply (Member): Yep. Try to keep the error messages short and punchy.

#===imports======================

Inline comment (Member):
You can remove the imports/globals comments.

@charris (Member) commented Mar 24, 2012

This looks pretty clean and complete. Just a few nits to pick here and there.

@timcera (Contributor, author) commented Mar 26, 2012

I have no idea why all of these commits came in the way they did.

Any git guru who can help me out?

Kindest regards,
Tim

@stefanv (Contributor) commented Mar 26, 2012

It looks like you did a merge with upstream; that should be avoided. You'll have to rebase your changes and then do a forced push, e.g.:

git rebase upstream/master
git push -f timcera pad_new

@timcera (Contributor, author) commented Mar 27, 2012

Thank you Stefan. That made it mostly right.

I still had to revert a commit that reappeared (3e9c0b7) and there are two strange commits that appeared out of nowhere but fortunately cancel each other out (6ea2605 and ce0f2e9).

I looked at the overall diff and it is what I intended. I imagine that all of these commits are going to be squashed into one?

I apologize for my clumsiness with git. I need to finish this pull request and start fresh with a new fork.

Is there a best-practices guide for the numpy/scipy git workflow?

Stefan and Charles: the last round of review comments is addressed in 1de9de9.

@charris (Member) commented Mar 27, 2012

It's still messed up :-/ If you merged master or upstream, you should be able to get back to the state before the merge using git reflog. Here is what the beginning of the reflog looks like after I've been playing with your PR a bit:

$charris@f16 numpy.git (pull-198)$ git reflog
4877a8a HEAD@{0}: rebase -i (finish): returning to refs/heads/pull-198
4877a8a HEAD@{1}: checkout: moving from pull-198 to 4877a8a
4877a8a HEAD@{2}: rebase: aborting
3e52d3d HEAD@{3}: checkout: moving from pull-198 to 3e52d3d
4877a8a HEAD@{4}: rebase: aborting
3624c14 HEAD@{5}: checkout: moving from pull-198 to 3624c14
4877a8a HEAD@{6}: am: Revert "Module to pad n-dimensional arrays by using differ
b622424 HEAD@{7}: am: STYLE: Various style and grammar changes.
...

If you find a good place to return to, then do

git reset --hard HEAD@{nnn}

where nnn is the appropriate number. Make a backup tag first

git tag backup

Then when it all goes crash, you can recover

git checkout -b branchname backup

Mind, I haven't tried the last two bits myself ;)

@timcera (Contributor, author) commented Mar 27, 2012

Thanks. I now have a 'padn' branch which...

git diff master..padn

shows the right stuff.

How do I know that this is 'right'?

I was then thinking of doing something like:

git branch -m pad_new pad_backup
git branch -m padn pad_new
git push -f origin pad_new

where 'origin' is my fork.

@charris (Member) commented Mar 27, 2012

You can try that. If it doesn't work, I can pull your branch directly if you push it to GitHub:

git fetch https://github.com/timcera/numpy padn
git co -b padn FETCH_HEAD

or you can just make another pull request. In any case, we can make things work.

@charris (Member) commented Mar 27, 2012

It's the history that we want right, not the diff. When all else fails, I make a diff against master, check out a new branch, and edit the patch into a series of smaller patches that can be applied one after the other. But I don't think we are at that stage of extremity...

@timcera timcera merged commit ec56ee1 into numpy:master Mar 27, 2012