Resampling of nD arrays #511

Merged Jul 5, 2013 (14 commits)

Conversation

@ankit-maverick (Contributor)

Current API:

In [1]: import numpy as np
In [2]: from skimage.transform._warps import downscale_local_means
In [3]: a = np.arange(16).reshape(4, 4)
In [4]: a
Out[4]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
In [5]: downscale_local_means(a, (2, 2))
Out[5]: 
array([[  2.5,   4.5],
       [ 10.5,  12.5]])
In [6]: downscale_local_means(a, (2, 3))
Out[6]: 
array([[  3.        ,   1.66666667],
       [ 11.        ,   4.33333333]])
In [7]: from skimage.measure._sum_blocks import sum_blocks

In [8]: sum_blocks(a, (2, 2))
Out[8]: 
array([[10, 18],
       [42, 50]])


import numpy as np

def downsample(image, factors, how = 'sum'):
Contributor commented:

PEP8: "Don't use spaces around the = sign when used to indicate a keyword argument or a default parameter value."

Also, I'm unsure if we have an official standard API for this, but I believe a better kwarg would be mode or method.
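
For reference, a signature following both suggestions (and matching the form this PR later adopts):

def downsample(image, factors, method='sum'):
    ...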

@JDWarner (Contributor)

Good beginnings! I left some comments in the code.

That Travis failure has nothing to do with this PR.

Thought to consider: could this easily be generalized to 3-D or n-D? I believe so...

@ankit-maverick (Contributor, author)

@JDWarner : Sure, that should be possible. Though I would have to think a bit about how to generalize it to n dimensions.

@cdeil (Contributor) commented Apr 21, 2013

@ankit-maverick Thanks for implementing this!

Here are my comments:

  • +1 to @JDWarner's suggestions to make it n-D. The first argument should then be renamed array instead of image.
  • At the moment the looping is done in Python ... can you make this fast by using clever numpy trickery or Cython?
  • Intuitively it's not clear to me what "divide" and "uniform" mean for up-sampling. I don't have a better suggestion though ... maybe someone else has an idea?
  • The two cases "sum" and "mean" differ only by a constant factor and shouldn't be coded twice. @JDWarner suggested applying the factor after the loops for the "mean" case. Another option could be to always multiply by a factor, but have that factor be one for the "sum" case (see the sketch after this list). I think doing it this way might be faster because there is only one pass over the data instead of two. But I'm not sure ... probably better to first think about how to do the n-D case and avoid looping in Python as much as possible.
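
A minimal sketch of that single-factor idea (hypothetical helper name; the per-block sums could come from any strategy, including the vectorized one discussed below):

import numpy as np

def _scale_block_sums(block_sums, factors, method='sum'):
    # One multiplicative factor covers both cases in a single pass:
    # 1.0 for 'sum', 1 / (elements per block) for 'mean'.
    factor = 1.0 if method == 'sum' else 1.0 / np.prod(factors)
    return block_sums * factor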

@ankit-maverick (Contributor, author)

@cdeil : Thanks a lot for the feedback. This seems like a good amount of refactoring to do. I am currently busy with my end-term exams, but will push as soon as possible.

@JDWarner (Contributor)

@cdeil 👍

  • Agreed. I think 3D would be a good base goal, with hopes for n-D if possible.
  • Now that I think about it, skimage.util.view_as_blocks could provide a basis for fast vectorized numpy execution for the integer case. Might need to pad with zeros first. For the non-integer case, Cython may be needed to achieve similar speed.
  • Question: Is up-sampling meaningful in this context? I understand what's going on when downsampling (adding up local samples in a deterministic way), but upsampling (smearing out local values over a larger region) seems like very poor extrapolation. It's possible I'm just missing the point, being in a different field.
  • Agreed!

Two additional thoughts:

  • Is this truly a transform, or is it more of a measurement? Thinking about whether it belongs in skimage.transform vs skimage.measure.
  • dtype concerns: the sum method can overflow the original dtype (related to the previous point). If it's a measurement, we want to be able to quantify the combination, potentially even if it would have overflowed the original dtype. If it's a transform, I would generally expect the result to maintain dtype. In the second case, using "mean" exclusively would solve the problem - but I'm not sure which behavior is desired. Just bringing it up (see the snippet below).
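
A quick illustration of the overflow concern (hypothetical values, not from this PR):

import numpy as np

a = np.full((2, 2), 200, dtype=np.uint8)
print(a.sum(dtype=np.uint8))  # 32 -- the uint8 accumulator wraps around
print(a.sum())                # 800 -- numpy upcasts the accumulator by default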

@stefanv (Member) commented Apr 22, 2013

@JDWarner Perhaps we can start with the case where only shapes that are exact multiples of the factors are supported. I agree that upsampling this way is a bit odd.

@ankit-maverick (Contributor, author)

@cdeil @JDWarner : I still haven't found any alternative to Python looping in numpy. If you know of a better approach, please point me to that resource.

Also, because of the above, the only way I can currently think of implementing these functions for 3D is to first test the dimension of the input ndarray and add three nested for loops, which seems inefficient. Any ideas?

@ahojnnes (Member)

I have something in mind that could work for arbitrary dimensions... Not sure if it works, but I can show you when I get home.


@JDWarner (Contributor)

@ankit-maverick For vectorized looping over non-overlapping patches, check out skimage.util.view_as_blocks. This really neat function provides a view into the array using striding tricks. Here's a quick example in 2D (view_as_blocks supports arbitrary dimensionality, as long as arr.ndim matches the length of the patch shape):

>>> import numpy as np
>>> from skimage.util import view_as_blocks
>>> arr = np.arange(64).reshape((8, 8))
>>> view_as_blocks(arr, (2, 2)).shape
(4, 4, 2, 2)
>>> view_as_blocks(arr, (4, 4)).shape
(2, 2, 4, 4)

Each (point, point, :, :) in this example represents a patch. Then, to handle arbitrary dimensionality, just flatten the patches and sum along the final axis. There's a trick in np.reshape that helps here: if you pass a partial desired shape and end with -1, all remaining axes are flattened into that axis.
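
Continuing the example above, the flatten-and-sum step looks like this (a sketch; blocks is just a local name):

>>> blocks = view_as_blocks(arr, (2, 2))
>>> blocks.reshape(blocks.shape[:2] + (-1,)).sum(axis=-1)
array([[ 18,  26,  34,  42],
       [ 82,  90,  98, 106],
       [146, 154, 162, 170],
       [210, 218, 226, 234]])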

The only gotcha here is that view_as_blocks only supports patch shapes which divide cleanly into the original array shape. If an array with a prime-length dimension gets passed, this would be trouble. So, to be truly general, first check that the shapes are compatible, and if not, pad asymmetrically (with zeros, I would think).

I'm sure there are other ways to do this, too.

@ankit-maverick (Contributor, author)

@JDWarner @ahojnnes : Thanks. Anyway, I found something much cleaner and better. A good commit is coming!

@ankit-maverick (Contributor, author)

Credit to @seberg for suggesting this function to me on IRC.

@ankit-maverick (Contributor, author)

TO DECIDE:

  • Best location for these functions in the codebase: the transform module
  • Correct way of implementing upsampling


def downsample(image, factors, method='sum'):

# works only if image.shape is perfectly divisible by factors
Contributor commented:

I'd like to relax this limitation and allow sums over arbitrary integer patch sizes (obviously <= the original array size). The method would be to zero-pad each dimension that fails a residual check, by the minimum amount necessary for the patch to fit.

Contributor commented:

To support arbitrary patch sizes with zero padding but retain n-D generality, this utility function may be of use:

import numpy as np

def _pad_asymmetric_zeros(arr, pad_amt, axis=-1):
    """Pads `arr` by `pad_amt` zeros along specified `axis`."""
    if axis == -1:
        axis = arr.ndim - 1

    # Shape of the zero block: same as `arr`, except `pad_amt` along `axis`.
    zeroshape = tuple([x if i != axis else pad_amt
                       for (i, x) in enumerate(arr.shape)])

    return np.concatenate((arr, np.zeros(zeroshape, dtype=arr.dtype)),
                          axis=axis)

Check each axis, and if the patch size does not divide evenly, apply this with the appropriate pad_amt.
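
A hypothetical usage sketch, assuming the helper above: pad each axis whose length is not a multiple of its factor by the minimum amount needed.

import numpy as np

arr = np.arange(15).reshape(3, 5)
factors = (2, 2)
for axis, (dim, f) in enumerate(zip(arr.shape, factors)):
    remainder = dim % f
    if remainder:
        arr = _pad_asymmetric_zeros(arr, f - remainder, axis=axis)
print(arr.shape)  # (4, 6) -- every axis now divides evenly by its factor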

Contributor (author) commented:

Yes. Zero padding will be better than cropping because of the loss of boundary information in the latter, but then the boundary values may seem inconsistent with the neighbouring blocks. I am wondering if something like reflective padding would be better here? @JDWarner : Your thoughts?

Contributor commented:

Entirely depends on the goal. From what I understand (per @cdeil), the sum method is aimed at being quantitative. In that case you definitely want zero padding, or you change the total sum of the image! That would be really bad for scientific applications. If this ends up in skimage.measure, or gets a wrapper for that method, I would definitely say zeros is the way to go.

On the other hand, if you're going for looks rather than quantitative science, "downsampling" an image with local means, then reflective padding would be appropriate; otherwise you'd get dark edges.
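
A quick numeric illustration of that difference (a sketch using np.pad; values are arbitrary):

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

zero = np.pad(a, ((0, 0), (0, 1)), mode='constant')  # one column of zeros
refl = np.pad(a, ((0, 0), (0, 1)), mode='reflect')   # one reflected column

print(a.sum(), zero.sum(), refl.sum())  # 21 21 28 -- reflection inflates the sum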

@JDWarner (Contributor) commented May 1, 2013

Thanks for the update, achieving n-D!

In my opinion, the method='sum' variant belongs more in skimage.measure with a name like sum_local_patches or local_patch_sums (which would pair more naturally with a parallel local_patch_means). The goal here seems to be quantitative. I'd expect any true resampling method to preserve the input dtype, so I'm hesitant to let this hit a release in skimage.transform and surprise users.

The 'mean' variant makes more sense in a resampling context (and could/should preserve dtype if used this way). As far as naming goes, my objections to using derivatives of the term "sampling" in this context are on record, but perhaps I'm being overly pedantic. resample_local_means, perhaps?

@stefanv @cdeil thoughts?

@stefanv (Member) commented May 1, 2013

Let's keep up/downsampling in the transform module for now; I think that's where most people will look for it by default.

@ankit-maverick (Contributor, author)

@JDWarner @stefanv @cdeil @ahojnnes : On further thought, I feel the current implementation of upsample is redundant and not meaningful, since it does not increase the resolution of the image in the true sense. For example, upsampling by a factor of (2, 2) with the current implementation returns an image 4x the size, but since each 2x2 pixel block holds the same value, it adds no information and only wastes memory.

Instead, we could interpolate while upsampling, as in the image resize example at http://www.cambridgeincolour.com/tutorials/image-interpolation.htm.

Also, see some interpolation results in the scipy.interpolate docs: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html

@all: your thoughts?

@ahojnnes (Member) commented May 2, 2013

@ankit-maverick I have implemented all of these interpolation methods. See skimage.transform.warp.

I think the best place to put your downsample function is skimage.transform._warps.py, because we already have rescale, resize, etc. there, and in my opinion they are all related to this problem.

@ankit-maverick (Contributor, author)

> @ankit-maverick I have implemented all of these interpolation methods. See skimage.transform.warp.

@ahojnnes : Thanks for informing me.

> I think the best place to put your downsample function is skimage.transform._warps.py, because there we already have rescale, resize etc. and they are all related to this problem in my opinion.

I will update this in the next commit with tests and documentation probably tomorrow once I am done with my work on the SoC proposal.


def downsample(array, factors, mode='sum'):
    """Performs downsampling with integer factors.

Member commented:

We need an in-depth description of the algorithm here. Especially highlight the difference from resize / rescale; maybe also cross-reference downsample in those functions.

@ahojnnes (Member) commented May 7, 2013

@ankit-maverick Your implementation is very elegant now, I really like it! I made some minor comments… let me know what you think.

@ahojnnes closed this Jul 4, 2013
@ahojnnes reopened this Jul 4, 2013
@ankit-maverick (Contributor, author)

@ahojnnes Can you check the Travis log and see why I am getting the ImportError? It works fine locally.

@sciunto (Member) commented Jul 4, 2013

@ankit-maverick circular import?

@ankit-maverick (Contributor, author)

@sciunto Thanks for pointing that out. But I still don't understand why the import chain doesn't stop at this line: https://travis-ci.org/scikit-image/scikit-image/jobs/8729027#L2554

@ankit-maverick (Contributor, author)

@stefanv @ahojnnes Anyone there to help with the above issue? I would like to close this up before I go to sleep.

@JDWarner (Contributor) commented Jul 4, 2013

Even though you specifically told it to import from ..transform._warps, the __init__.py file in skimage.transform still gets parsed. Then hough_transform gets imported, which itself attempts to import measure. Round and round we go...

Sometimes this can be fixed by moving an import inside a function. Only the function hough_line_peaks uses measure, so you could try this... though I'm not sure it will work in this case.
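
A sketch of that deferred-import pattern (signature abbreviated; only the import placement matters here):

def hough_line_peaks(hspace, angles, dists, **kwargs):
    # Importing at call time instead of at module level breaks the cycle:
    # skimage.measure is only touched when the function actually runs.
    from .. import measure
    ...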

@ankit-maverick (Contributor, author)

Thanks a lot @JDWarner, your suggestion worked!!

@cdeil @ahojnnes This can be merged now.

@ahojnnes (Member) commented Jul 5, 2013

I have some more minor concerns, but I'll address them shortly in another PR.

ahojnnes added a commit that referenced this pull request Jul 5, 2013
@ahojnnes merged commit 24c0c40 into scikit-image:master Jul 5, 2013


def resize(image, output_shape, order=1, mode='constant', cval=0.):
    """Resize image to match a certain size.

    Resize performs interpolation to upsample or downsample 2D arrays. For
    downsampling any n-dimensional array by performing arithmetic sum or
    arithmetic mean, see measure._sum_blocks.sum_blocks and
Member commented:

Why are we making public references to private functions here?

@ahojnnes (Member) commented Jul 5, 2013

@stefanv There is already a new PR to fix the issues.

@coveralls

Coverage Status

Changes Unknown when pulling 971e7be on ankit-maverick:resample into * on scikit-image:master*.

@forman commented Jul 26, 2016

Just stumbled over #492, where it seems I had very similar requirements to @cdeil, and developed https://github.com/CAB-LAB/gridtools, which is now in operational use for the production of a climate data cube. I use Numba for performance reasons. I'm happy to contribute one or another of its methods.

@jni (Member) commented Jul 26, 2016

@forman Sorry, what do you mean exactly? From a quick reading, it seems @cdeil's requirements were met by this PR?

@forman commented Jul 28, 2016

@jni oh, sorry, I wasn't clear at all. I just wanted to inform anyone interested about some additional downsampling methods I've implemented in gridtools, which may extend scikit-image's capabilities.

@cdeil (Contributor) commented Jul 28, 2016

@forman - Thanks for pointing out https://github.com/CAB-LAB/gridtools

I see you've released it under the GPL license.

In case you're not aware: this means that many numerical Python people will not read, use, or contribute to it. Numpy, Scipy, Numba, scikit-image ... most numerical Python packages are under a liberal license like BSD or MIT.
I don't want to argue about licensing, but just in case you aren't opposed: please re-license to improve interop and re-use with similar numerical Python packages.

@forman commented Jul 28, 2016

@cdeil you are right, I'll put it under a more liberal license for the next release. Thanks!

@cdeil (Contributor) commented Jul 28, 2016

@forman - Thank you!

In this case ... I'll install it and check it out now.

😄

@forman commented Jul 28, 2016

Done.
