
Conversation

@timcera (Contributor) commented Feb 6, 2012

Currently have 'with_minimum', 'with_maximum', 'with_mean', 'with_median', 'with_linear_ramp', 'with_reflect', 'with_constant', and 'with_wrap'. Includes unit tests, docstrings, and doctests. Based on a feature request at http://projects.scipy.org/numpy/ticket/655.

I use it with fft to minimize edge effects in my tidal analysis program (http://tappy.sf.net). Thought (hoped?) maybe someone else would also find it useful.

@mwiebe (Member) commented Feb 16, 2012

FYI, I mentioned this pull request in an email discussion as an example of a pure Python contribution:

http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060551.html

@WarrenWeckesser (Member):

Another FYI: I added the file "_arraytools.py" in the scipy.signal package last June (https://github.com/scipy/scipy/blob/master/scipy/signal/_arraytools.py). It has a smaller set of capabilities than this PR. It provides functions for creating odd, even or constant extensions along a single axis. These were sufficient for enhancing the filtfilt function.

@teoliphant (Member):

This is very interesting. Can you explain why you made the padding routines return a sub-class instead of just another ndarray? Are the padded values generated on the fly, or are they pre-computed?

@timcera (Contributor, author) commented Feb 17, 2012

Padded values are pre-computed.

The sub-class is there because of something I took out of the pull request that needed the 'original_shape'. All the sub-class does is set an 'original_shape' attribute recording where the pre-padded shape of the input array sits within the padded array. If there is a better way to add an attribute to an ndarray, I am open to suggestions.

I now wish that my reason was more profound, but that's it.

What did I take out? I wanted an easy-to-use addition to many functions, where you could say something like np.convolve(a, kern, 'same', pad='with_mean', stat_len=((5,5),)) and np.convolve would use the pad module in the background, padding before taking the convolution. I finished np.convolve, but then realized this might be a really hard sell, so I just put up the pad module.

The 'original_shape' attribute still might be useful:

a = np.arange(100).astype('f')
b = pad.with_mean(a, ((25, 20), ), stat_len=((2, 3), ))
b.original_shape
[slice(25, -20, None)]

Say you now did a convolution...

kern = np.ones(5)/5.0
c = np.convolve(b, kern, 'same')

You can then recover the original shape, which represents the convolution of the original data, hopefully with edge effects minimized:

c = c[b.original_shape]

Not a big deal, and there may be easier ways to track the pre-pad shape, so I don't mind taking that out if a sub-classed array is a concern.

Kindest regards,
Tim

@teoliphant (Member):

I think keeping track of the original shape is best done in another way when it is needed. My sense is that it would be best to have the padding functions just return a base-class array and remove the PadArray subclass.

@teoliphant (Member):

Thanks Tim. This is looking good. I'm sorry to be a pain, but one more thing: this functionality should all go under the numpy.lib namespace and be accessible from there. I would just use names like pad_with_*, or if you like you can drop the "with" and just use pad_*. But if you really want the pad.with_mean spelling, just put the pad.py file in the lib directory and use

from numpy.lib import pad
pad.with_mean()

Thanks again. This is a nice addition.

@timcera (Contributor, author) commented Feb 23, 2012

Travis,

Haven't decided yet which naming convention to go with. Does anyone following this have a preference? I will decide and post a change soon.

@timcera (Contributor, author) commented Feb 23, 2012

Warren,

Looking at your additions to _arraytools.py, I am wondering whether I should change the name pad.with_reflect.

Your even_ext is equivalent to my pad.with_reflect:

import numpy as np
a = np.array([[1.0,2.0,3.0,4.0,5.0], [0.0, 1.0, 4.0, 9.0, 16.0]])
even_ext(a, 2)
array([[  3.,   2.,   1.,   2.,   3.,   4.,   5.,   4.,   3.],
       [  4.,   1.,   0.,   1.,   4.,   9.,  16.,   9.,   4.]])

pad.with_reflect(a, ((0,0),(2,2)))
array([[  3.,   2.,   1.,   2.,   3.,   4.,   5.,   4.,   3.],
       [  4.,   1.,   0.,   1.,   4.,   9.,  16.,   9.,   4.]])

Would this process be known to more people as an even extension?

Your const_ext is very different from my pad.with_constant, since with pad.with_constant you have to specify what the constant is, whereas your const_ext takes the edge value as the constant.

Your const_ext can be emulated with pad.with_mean, pad.with_minimum, pad.with_maximum, or pad.with_median with a stat_len = (1,1) since all of those statistics collapse to the edge value if only given the edge value.

pad.with_minimum(a, ((0,0),(2,2)), stat_len=(1,1))
array([[  1.,   1.,   1.,   2.,   3.,   4.,   5.,   5.,   5.],
       [  0.,   0.,   0.,   1.,   4.,   9.,  16.,  16.,  16.]])

Should I add a pad.with_edge?

I don't have anything comparable to odd_ext.

Would it be helpful to add a pad.with_odd (though right now I am not exactly sure that would be the best name)?

Kindest regards,
Tim

@teoliphant (Member):

Hey Tim,

The pad.with_reflect function should probably take a key-word argument saying whether the reflection should be even or odd. Alternatively, you could add two functions:

pad.with_even_reflect
pad.with_odd_reflect

Also,

I think the use-case of padding with a constant is common enough that you should add an "entry-point" function:

pad.with_const

Even if it just calls the pad.with_minimum function in the end, you might also just optimize that case.

Thanks,

-Travis

@teoliphant (Member):

This looks good to me. I'll merge it over the weekend unless someone has an issue.

@WarrenWeckesser (Member):

If you can hold off merging until Sunday, I'll have some time to provide more feedback tomorrow.

@WarrenWeckesser (Member):

I didn't say so in my short "FYI" above, but I think this is a nice addition to numpy.

Now, some comments on the implementation...

pad_reflect:

This should probably not allow a pad length greater than one less than the length of the corresponding dimension in the array. Otherwise, what is the definition of the extension?
Currently it does this:

In [302]: pad_reflect([1,2,3], 2)
Out[302]: array([3, 2, 1, 2, 3, 2, 1])

In [303]: pad_reflect([1,2,3], 3)    # Why 0 and 2 at the ends?
Out[303]: array([0, 3, 2, 1, 2, 3, 2, 1, 2])

In [304]: pad_reflect([1,2,3], 4)    # What's going on?
Out[304]: array([0, 0, 3, 2, 1, 2, 3, 2, 1, 2, 3])

pad_wrap:

When the pad length exceeds the data length, it pads with zeros. This seems wrong. I expect it to continue wrapping.

In [283]: pad_wrap([1,2,3], 3)  # OK
Out[283]: array([1, 2, 3, 1, 2, 3, 1, 2, 3])

In [284]: pad_wrap([1,2,3], 4)
Out[284]: array([0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 0])

I expected array([3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1])

pad_linear_ramp:

This is confusing, and in some cases ill-defined beyond rank 1 arrays.

Consider this example:

In [290]: pad_linear_ramp([[2.0]], 2, [(1,3), (0, 2)])
Out[290]: 
array([[ 0.  ,  0.5 ,  1.  ,  1.5 ,  2.  ],
       [ 0.  ,  0.75,  1.5 ,  1.75,  2.  ],
       [ 0.  ,  1.  ,  2.  ,  2.  ,  2.  ],
       [ 0.  ,  1.25,  2.5 ,  2.25,  2.  ],
       [ 0.  ,  1.5 ,  3.  ,  2.5 ,  2.  ]])

Why should the left and right columns be constant, while the first and last rows vary linearly?

Another confusing point is the definition of the linear ramp in the corner regions. Consider this:

In [293]: pad_linear_ramp([[2.0]], [(4,0),(4,0)])
Out[293]: 
array([[ 0.   ,  0.   ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.125,  0.25 ,  0.375,  0.5  ],
       [ 0.   ,  0.25 ,  0.5  ,  0.75 ,  1.   ],
       [ 0.   ,  0.375,  0.75 ,  1.125,  1.5  ],
       [ 0.   ,  0.5  ,  1.   ,  1.5  ,  2.   ]])

A different method for extending the linear ramp to the corner regions results in:

array([[ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0.5,  0.5,  0.5,  0.5],
       [ 0. ,  0.5,  1. ,  1. ,  1. ],
       [ 0. ,  0.5,  1. ,  1.5,  1.5],
       [ 0. ,  0.5,  1. ,  1.5,  2. ]])

That's actually what I expected. However, this has the same problem as the current implementation: it is ill-defined when different values are given for different axes.

The simplest solution is to just remove the pad_linear_ramp() function. I don't see any important use-cases for it, so I don't think it will be missed.

pad_median:

Consider the following example:

In [470]: c
Out[470]: 
array([[[ 4.,  8.,  3.],
        [ 3.,  1.,  9.],
        [ 7.,  8.,  8.]],

       [[ 7.,  9.,  6.],
        [ 3.,  8.,  9.],
        [ 3.,  8.,  4.]],

       [[ 9.,  7.,  1.],
        [ 5.,  5.,  0.],
        [ 0.,  2.,  3.]]])

In [471]: m = pad_median(c, stat_len=2)

In [472]: m[0,0,0]
Out[472]: 5.375

In [473]: median(c[:2,:2,:2])
Out[473]: 5.5

In [474]: median(median(median(c[:2,:2,:2], axis=-1), axis=-1))
Out[474]: 5.375

The confusing part for me is that the extension code is taking a median of medians, so the answer depends on how the data is distributed among the intermediate medians.

Another example:

In [487]: w = array([[3,1,4],[4,5,9],[9,8,2]])

In [488]: pad_median(w)
Out[488]: 
array([[4, 4, 5, 4, 4],
       [3, 3, 1, 4, 3],
       [5, 4, 5, 9, 5],
       [8, 9, 8, 2, 8],
       [4, 4, 5, 4, 4]])

In [489]: pad_median(w.T).T
Out[489]: 
array([[5, 4, 5, 4, 5],
       [3, 3, 1, 4, 3],
       [5, 4, 5, 9, 5],
       [8, 9, 8, 2, 8],
       [5, 4, 5, 4, 5]])

Notice that the corners are different. It would be nice if they weren't, but that would mean the code would have to figure out the complete set of numbers that contribute to the corner values and compute the median of that set in one calculation, instead of using the current median of medians. At a minimum, the procedure the code uses to compute corner regions should be added to the docstrings.

By the way, the numbers in the last example come from John Cook's blog post:
http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/

pad_mean, pad_maximum, pad_minimum, pad_median:

These all have the same API, and only differ in the statistic that they compute. How about combining them into a single function, say 'pad_stat' (or...?), that takes a 'method' keyword with the possible values 'max', 'min', 'mean', 'median'?

import numpy as np

#===globals======================
modname = "pad"

Inline comment (Member): Why is this here?

Reply (Contributor, author):
Not sure exactly what you are pointing to. Maybe the '_docformat...' line? That was, if memory serves, an early requirement to get code accepted into numpy. The pad module is actually quite old and languished in the numpy bug/feature tracker for a couple years.

Reply (Member):
I was referring to 'modname'. It is not used anywhere.

I also wondered about 'docformat', but a few other modules have that, so I assumed it was some sort of numpy convention. Travis might know more about that. Does numpy still maintain compatibility with epydoc?

Reply (Member):
I don't know of any efforts to maintain compatibility with epydoc directly.

@timcera (Contributor, author) commented Feb 27, 2012

Warren,

Thank you for taking the time. This is very helpful.

The 'corner' regions are ill-defined, so the last axis wins out by using the padded values of the previous axes. Is there a better convention? It would have to extend to n dimensions.

Kindest regards,
Tim

@WarrenWeckesser (Member):

I'll stick with my suggestion to simply remove 'pad_linear_ramp'.

@timcera (Contributor, author) commented Feb 27, 2012

The 'corner' regions are calculated from previously padded values in all of the functions for rank greater than 1. You made a similar observation about the pad_median function taking a median of medians.

I guess a case could be made to go diagonally out from each corner, using values for the statistic from the diagonal of the input array, which would be what you expected from pad_linear_ramp. That is quite a bit of added complexity, and I don't see an easy way to do it at the moment (for rank n).

I use pad_linear_ramp for rank-1 arrays, and thought that it would be useful for rank-2 arrays (for example, to minimize edge effects when taking an FFT). Beyond that? What is done now for the corner regions in image work?

@teoliphant (Member):

The linear_ramp extrapolation technique is useful in 1-d. In fact, I could see adding a family of extrapolators based on polynomial extension of the edge points.

In N-d, the corner cases for all the calculations are a good question. There are a lot of approaches that can be justified. I think the important thing is to document clearly what is done.

I suppose my preference would be to produce the "average" of the different (separable) methods that make sense. Non-separable methods are more difficult to extend into N-d and also have a higher computational cost for less benefit.

Yet another "curse of dimensionality"

@timcera (Contributor, author) commented Feb 28, 2012

Found some resources:

http://octave.sourceforge.net/image/function/padarray.html

Re-enacting the Octave examples:

a = [[1,2,3],[4,5,6]]
np.lib.pad_constant(a, ((2,2),(1,1)))       # Octave:  padarray([1,2,3;4,5,6],[2,1])
np.lib.pad_constant(a, ((2,2),(1,1)), 5)   # Octave:  padarray([1,2,3;4,5,6],[2,1], 5)
np.lib.pad_constant(a, ((2,0),(1,0)))     # Octave:  padarray([1,2,3;4,5,6],[2,1],0,'pre')
np.lib.pad_wrap(a, ((2,2),(1,1)))       # Octave:  padarray([1,2,3;4,5,6],[2,1],'circular')
np.lib.pad_edge(a, ((2,2),(1,1)))      # Octave: padarray([1,2,3;4,5,6],[2,1],'replicate')
# NA       Octave: padarray([1,2,3;4,5,6],[2,1],'symmetric')

Octave has 'symmetric', which I don't. There is no example, but it looks like Octave's 'reflect' is the same as mine. These are not really complicated padding functions, but Octave handles the corner regions the way I did: edges of later axes use the padded values of previous axes.

http://www.mathworks.com/help/toolbox/images/ref/padarray.html

Since Octave is a Matlab look-alike, the behavior is similar to the above:

#   NA           Matlab: padarray(a,[0 3],'symmetric','pre')
a = [[1,2],[3,4]]
np.lib.pad_edge(a, ((0,2),(0,3)))       # Matlab: padarray(A,[3 2],'replicate','post')
c = np.arange(8) + 1
c = c.reshape((2,2,2))
np.lib.pad_constant(c, ((0,0),(3,3),(3,3)))   # Matlab: padarray(C,[3 3])

http://www.mathworks.com/matlabcentral/fileexchange/7720-pad-array

A user contribution that adds padding with tapering window functions (the closest I come to this is linear_ramp). It has some examples, but no answers! It also returns the placement and shape of the original array within the padded array! :-)

http://reference.wolfram.com/mathematica/ref/ArrayPad.html

These guys have too much time on their hands!

Under the 'Scope' section there are reflection examples where the padding is longer than the length of the axis.

In[1]:= ArrayPad[{a,b,c}, 4, "Reflected"]
Out[1]= {a,b,c,b, a,b,c, b,a,b,c}

Basically the creation of the pad values just 'reflects' back and forth along the vector.

Here are all the Mathematica types and a guess at how np.lib.pad would emulate them...

  • c a constant c
    • np.lib.pad_constant
  • {c1,c2,...} cyclic repetition of constants
    • You would have to call np.lib.pad_constant multiple times
  • "Extrapolated" polynomial extrapolation of elements
    • NA
  • "Fixed" repetitions of the elements on each boundary
    • np.lib.pad_edge
  • "Periodic" cyclic repetitions of the complete array
    • NA
  • "Reflected" reflections of the array in the boundary
    • np.lib.pad_reflect
  • "ReflectedDifferences" reflections of the differences between elements
    • NA
  • "Reversed" reversals of the complete array
    • NA - same as Matlab/Octave 'symmetric'
  • "ReversedDifferences" reversals of the differences between elements
    • NA
  • "ReversedNegation" negated reversals of the array
    • NA

@timcera (Contributor, author) commented Mar 2, 2012

Hold off on looking at this for now. Of all things, I messed up some of the docstrings. Will fix soon.

@timcera (Contributor, author) commented Mar 22, 2012

Final review?

@charris (Member) commented Mar 22, 2012

Hi Tim, I'm waiting for the folks involved in this to come to a conclusion. Pinging the list might get more attention.

@teoliphant (Member):

This looks good from my perspective.

def __str__(self):
    return """
For pad_width should get a list/tuple with length
of an n-dimensional array.
"""

Inline comment (Contributor): The grammar of this error message is somewhat bust.

Reply (Member): Yep. Try to keep the error messages short and punchy.

#===imports======================

Inline comment (Member):
You can remove the imports/globals comments.

@charris (Member) commented Mar 24, 2012

This looks pretty clean and complete. Just a few nits to pick here and there.

@timcera (Contributor, author) commented Mar 26, 2012

I have no idea why all of these commits came in the way they did.

Any git guru who can help me out?

Kindest regards,
Tim

@stefanv (Contributor) commented Mar 26, 2012

It looks like you did a merge with upstream; that should be avoided. You'll have to rebase your changes and then do a forced push, e.g.:

git rebase upstream/master
git push -f timcera pad_new

@timcera (Contributor, author) commented Mar 27, 2012

Thank you Stefan. That made it mostly right.

I still had to revert a commit that reappeared (3e9c0b7) and there are two strange commits that appeared out of nowhere but fortunately cancel each other out (6ea2605 and ce0f2e9).

I looked at the overall diff and it is what I intended. I imagine that all of these commits are going to be squashed into one?

I apologize for my clumsiness with git. I need to finish this pull request and start fresh with a new fork.

Is there a best-practices guide for the numpy/scipy git workflow?

Stefan and Charles: the last round of review comments is addressed in 1de9de9.

@charris (Member) commented Mar 27, 2012

It's still messed up :-/ If you merged master or upstream, you should be able to get back to the state before the merge using git reflog. Here is what the beginning of the reflog looks like after I've been playing with your PR a bit:

$charris@f16 numpy.git (pull-198)$ git reflog
4877a8a HEAD@{0}: rebase -i (finish): returning to refs/heads/pull-198
4877a8a HEAD@{1}: checkout: moving from pull-198 to 4877a8a
4877a8a HEAD@{2}: rebase: aborting
3e52d3d HEAD@{3}: checkout: moving from pull-198 to 3e52d3d
4877a8a HEAD@{4}: rebase: aborting
3624c14 HEAD@{5}: checkout: moving from pull-198 to 3624c14
4877a8a HEAD@{6}: am: Revert "Module to pad n-dimensional arrays by using differ
b622424 HEAD@{7}: am: STYLE: Various style and grammar changes.
...

If you find a good place to return to, then do

git reset --hard HEAD@{nnn}

where nnn is the appropriate number. Make a backup tag first

git tag backup

Then when it all goes crash, you can recover

git checkout -b branchname backup

Mind, I haven't tried the last two bits myself ;)

@timcera (Contributor, author) commented Mar 27, 2012

Thanks. I now have a 'padn' branch which...

git diff master..padn

shows the right stuff.

How do I know that this is 'right'?

I was then thinking of doing something like:

git branch -m pad_new pad_backup
git branch -m padn pad_new
git push -f origin pad_new

where 'origin' is my fork.

@charris (Member) commented Mar 27, 2012

You can try that. If it doesn't work, I can pull your branch directly if you push it to GitHub:

git fetch https://github.com/timcera/numpy padn
git co -b padn FETCH_HEAD

or you can just make another pull request. In any case, we can make things work.

@charris (Member) commented Mar 27, 2012

It's the history that we want right, not the diff. When all else fails, I make a diff against master, check out a new branch, and edit the patch into a series of smaller patches that can be applied one after the other. But I don't think we are at that stage of extremity...

@timcera timcera merged commit ec56ee1 into numpy:master Mar 27, 2012