Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor/move neighborhood utility functions in morphology #4209

Merged
merged 16 commits into from Apr 9, 2020

Conversation

lagru
Copy link
Member

@lagru lagru commented Oct 5, 2019

Description

Changes are extracted from #4165 hopefully making reviewing these easier.

  • This cleans up several functions _offsets_to_raveled_neighbors, _fast_pad, _set_edge_values_inplace and _resolve_neighborhood and consolidates them in the _util module (some where already there). These functions are used together at several places in the morphology module.
  • Fixes a small bug that blocks Add function to remove near objects in nd-image #4165: _offsets_to_raveled_neighbors would fail in case a dimension in the structuring element was larger than in the given image_shape. While this use case isn't necessary for algorithms that work by padding images before constructing the raveled offsets (e.g. flood_fill, local_maxima) this precludes other functions to work with images where a dimension is < 3 (typical length of the structuring element). With the new implementation this works even if the structuring element is larger than the image itself in any dimension.

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • Consider backporting the PR with @meeseeksdev backport to v0.14.x

@lagru lagru added type: bug 🔧 type: Maintenance Refactoring and maintenance of internals labels Oct 5, 2019
skimage/morphology/_util.py Outdated Show resolved Hide resolved
skimage/morphology/_util.py Show resolved Hide resolved
@sciunto
Copy link
Member

sciunto commented Oct 10, 2019

Can you add a test that failed for #4165 please?

@pep8speaks
Copy link

pep8speaks commented Oct 21, 2019

Hello @lagru! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 100:11: E126 continuation line over-indented for hanging indent
Line 100:19: E241 multiple spaces after ','
Line 100:25: E241 multiple spaces after ','
Line 100:31: E241 multiple spaces after ','
Line 100:37: E241 multiple spaces after ','
Line 100:43: E241 multiple spaces after ','
Line 100:49: E241 multiple spaces after ','
Line 100:55: E241 multiple spaces after ','
Line 100:61: E241 multiple spaces after ','
Line 100:67: E241 multiple spaces after ','
Line 101:9: E131 continuation line unaligned for hanging indent
Line 101:14: E241 multiple spaces after ','
Line 101:38: E241 multiple spaces after ','
Line 101:44: E241 multiple spaces after ','
Line 101:50: E241 multiple spaces after ','
Line 101:62: E241 multiple spaces after ','
Line 102:20: E241 multiple spaces after ','
Line 102:26: E241 multiple spaces after ','
Line 117:11: E126 continuation line over-indented for hanging indent
Line 117:14: E241 multiple spaces after ','
Line 117:20: E241 multiple spaces after ','
Line 117:26: E241 multiple spaces after ','
Line 117:32: E241 multiple spaces after ','
Line 117:38: E241 multiple spaces after ','
Line 117:50: E241 multiple spaces after ','
Line 117:62: E241 multiple spaces after ','
Line 117:68: E241 multiple spaces after ','
Line 118:10: E131 continuation line unaligned for hanging indent
Line 118:14: E241 multiple spaces after ','
Line 118:20: E241 multiple spaces after ','
Line 118:26: E241 multiple spaces after ','
Line 118:32: E241 multiple spaces after ','
Line 118:38: E241 multiple spaces after ','
Line 118:44: E241 multiple spaces after ','
Line 118:50: E241 multiple spaces after ','
Line 118:56: E241 multiple spaces after ','
Line 118:62: E241 multiple spaces after ','
Line 118:68: E241 multiple spaces after ','
Line 119:20: E241 multiple spaces after ','
Line 119:26: E241 multiple spaces after ','
Line 119:32: E241 multiple spaces after ','
Line 119:50: E241 multiple spaces after ','
Line 120:10: E131 continuation line unaligned for hanging indent
Line 120:20: E241 multiple spaces after ','
Line 120:26: E241 multiple spaces after ','
Line 120:32: E241 multiple spaces after ','
Line 120:38: E241 multiple spaces after ','
Line 120:44: E241 multiple spaces after ','
Line 120:50: E241 multiple spaces after ','
Line 120:56: E241 multiple spaces after ','
Line 120:62: E241 multiple spaces after ','
Line 120:68: E241 multiple spaces after ','
Line 121:10: E131 continuation line unaligned for hanging indent
Line 121:20: E241 multiple spaces after ','
Line 121:26: E241 multiple spaces after ','
Line 121:44: E241 multiple spaces after ','
Line 121:62: E241 multiple spaces after ','
Line 121:68: E241 multiple spaces after ','
Line 122:9: E131 continuation line unaligned for hanging indent
Line 122:20: E241 multiple spaces after ','
Line 122:32: E241 multiple spaces after ','
Line 122:38: E241 multiple spaces after ','
Line 122:44: E241 multiple spaces after ','
Line 122:62: E241 multiple spaces after ','
Line 122:68: E241 multiple spaces after ','
Line 123:10: E131 continuation line unaligned for hanging indent
Line 123:14: E241 multiple spaces after ','
Line 123:20: E241 multiple spaces after ','
Line 123:26: E241 multiple spaces after ','
Line 123:32: E241 multiple spaces after ','
Line 124:20: E241 multiple spaces after ','
Line 124:26: E241 multiple spaces after ','
Line 124:32: E241 multiple spaces after ','
Line 124:38: E241 multiple spaces after ','
Line 124:50: E241 multiple spaces after ','
Line 124:62: E241 multiple spaces after ','
Line 125:10: E131 continuation line unaligned for hanging indent

Comment last updated at 2020-04-09 09:57:20 UTC


# Sort by distance
distances = np.abs(offsets).sum(axis=1)
raveled_offsets = raveled_offsets[np.argsort(distances)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been able to figure out a way to keep the same order for neighbors of the same connectivity. Maybe that's due to argsort defaulting to the unstable quicksort? This does affect in which order algorithms iterate over neighbors with the same distance. I don't think this relevant for the algorithms relying on this function though and the test suite seems to back this up...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the problem is that the order of indices is not always the same? (lack of reproducibility)

If yes, why not add another information to disambiguate the distance, such as a raveled index within the structuring element? You could then use the order parameter of argsort, see for example

>>> a = np.array([(2, 10), (2, 5), (1, 3), (1, 2), (3, 1)], dtype=[('distance', 'f8'), ('index', 'f8')])                                                
>>> np.sort(a, order=['distance', 'index'])                                                                                                             
array([(1.,  2.), (1.,  3.), (2.,  5.), (2., 10.), (3.,  1.)],
      dtype=[('distance', '<f8'), ('index', '<f8')])
>>> np.argsort(a, order=['distance', 'index'])                                                                                                          
array([3, 2, 1, 0, 4])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the problem is that the order of indices is not always the same? (lack of reproducibility)

No it's always the same and reproducable but changes between the old and new implementation. E.g. old:

>>> _offsets_to_raveled_neighbors((3, 3, 3), np.ones((3, 3, 3)), (1, 1, 1))
array([ -1,   3,  -9,   1,   9,  -3,   4,   2,  12,  -2,  -4,  -6,  10,
        -8, -10, -12,   8,   6,  11, -13,   5,  -5,  -7, -11,   7,  13])

and new:

>>> _offsets_to_raveled_neighbors((3, 3, 3), np.ones((3, 3, 3)), (1, 1, 1))
array([  3,  -9,   1,  -1,   9,  -3,   4,   2,  12,  -2,  -4,  -6,  10,
        -8, -10, -12,   8,   6,  11, -13,   5,  -5,  -7, -11,   7,  13])

I vaguely remember fiddling around with the kind= argument to argsort. I think we want a stable sort that preserves the order of input arguments (which is memory-order) as long as they have the same distance. So kind="mergesort" and switch to kind="stable" once we hit NumPy 1.15?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. LOL Sorry for the long delay in responding!

@lagru lagru added this to the 0.17 milestone Oct 26, 2019
Simplify construction of indices, use consistent name for structuring
element (selem), some reformatting and improve documentation.
The old implementation would raise a ValueError if a dimension length
given in `image_shape` was smaller than in `selem`. This precluded
operations on images e.g. with the shape (2, 100).
which may act as lightweight tests.
The explicit tests are meant to catch subtle changes in the order of
neigbors with the same connectivity. They were reviewed by constructing
an image of the given shape and filling in the raveled image around the
image center and manual inspection of the result. E.g. with

    image = np.zeros(image_shape)
    image_raveled = image.ravel()
    image_center = tuple(s // 2 for s in image_shape)
    image_center_raveled = np.ravel_multi_index(
        image_center, image_shape
    )
    for i, offset in enumerate(offsets):
        index = image_center_raveled + offset
        assert image_raveled[index] == 0
        image_raveled[index] += i + 1
@lagru
Copy link
Member Author

lagru commented Dec 6, 2019

@emmanuelle, I rebased this as discussed in the meeting. :)

_fast_pad)
from ._util import _offsets_to_raveled_neighbors
from ._util import (_resolve_neighborhood, _set_edge_values_inplace,
_fast_pad, _offsets_to_raveled_neighbors)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will only be used in your other PR, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No? Not sure what you are referring to.

_resolve_neighborhood, _set_edge_values_inplace and _fast_pad are used in _flood_fill.py and extrema.py. While _offsets_to_raveled_neighbors are additionally used in max_tree.py and watershed.py.
#4165 will rely on _resolve_neighborhood and _offsets_to_raveled_neighbors.

@@ -51,35 +54,206 @@ def _validate_connectivity(image_dim, connectivity, offset):
return c_connectivity, offset


def _offsets_to_raveled_neighbors(image_shape, structure, center, order='C'):
def _offsets_to_raveled_neighbors(image_shape, selem, center, order='C'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change for a better consistency. We should definitely make this list of variable names in the contributing notes :-).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, can I be the annoying objector here?

I think originally we stopped using selem because it is a very obscure term for people who are not familiar with image processing, and morphology in particular. Structure is a bit better, but imho "footprint" or "neighborhood" are the clearest. What do we think? Consistency is great, but let's think about what the best syntax is going forward, and be consistent with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no clear preference concerning the best argument name for this, I just want to draw attention to the existence of the skimage.morphology.selem module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jni I like neighborhood. But think this should be a separate PR. In any case I think selem is better than the old structure because of the afore mentioned consistency. So until we change this everywhere I'd leave it as selem.

Not a strong opinion though. I'll change it if you find a silent consensus.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is private. The choice is not critical at the end. :)


# If any dimension in image_shape is smaller than selem.shape
# duplicates might occur, remove them
if any(x < y for x, y in zip(image_shape, selem.shape)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't you just add this part to the existing function? I'm curious to know why you modified the rest of the function, since I find it harder to read than the original one. It might be only a matter of taste of course...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short answer: No. 😉 I probably should have done a better job highlighting the problem and what I did about it. Consider the following scenario:

I have an image of shape (10, 2) and want to compute the neighborhood offsets for a structuring element with the shape (3, 3) with center at (1, 1). This is a valid use case we need to anticipate because we might want the algorithm be able to step from image[i, 0] to image[i, 1 ] and vice versa. For that we need neighbors on both sides of the center and thus a dimension length of 3.

Now the old implementation would fail because np.ravel_multi_index doesn't support this use case:

import numpy as np
selem = np.ones((3, 3), dtype=bool)
np.ravel_multi_index(np.nonzero(selem), dims=(10, 2))
# Raises Exception: ...
#     np.ravel_multi_index(np.nonzero(selem), image.shape)
#  File "<__array_function__ internals>", line 6, in ravel_multi_index
# ValueError: invalid entry in coordinates array
np.ravel_multi_index(np.nonzero(selem), dims=(10, 3))
# Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8])

So the previous implementation wasn't able to deal with this case because it relied on that function to compute the offsets. Unfortunately the mode argument is not helpful here because wrapping or clipping is not what we want. I tried adjusting the indices by the center index beforehand but again ravel_multi_index doesn't handle negative indices the way we want to.

But the new algorithm will lead to duplicate neighbors. I find it really hard to think about this but I'll try to explain with an example. We have the structuring element with the shape (3, 3) all ones and the center (1, 1) and a C-ordered image with the shape (6, 2). After adjusting for the center we get the raveled offsets (representation with index pairs ordered by dimension):

[[(-1, -1), (-1,  0), (-1,  1)],
 [( 0, -1), ( 0,  0), ( 0,  1)],
 [( 1, -1), ( 1,  0), ( 1,  1)]]

The second value in a pair denotes the step in the last dimension which is contiguous (-> ravel factor 1). But because samples in the first dimension don't border each other we need to address this by multiplying with the length of the previous dimension (-> ravel factor 2). For (0, 1) and (1, -1) this results (0, 1) and [2, -1] which both resolve (by summing) to the raveled index 1. Considering the array

[[ 0,  1],
 [ 2,  3],
 [ 4,  5],
 [ 6,  7],
 [ 8,  9],
 [10, 11]]

this means that the pixel to the right of 6 is 7 and to the lower-right is 7 as well! This "memory-wise wrapping" when crossing image boundaries might be counter-intuitive to simply wrapping around in one dimension. Though, for the functions using this result it's not relevant because they always check that the image edge is not crossed while iterating the neighborhood.

It's also interesting to switch the image shape (2, 7) around (or use F-order) in which case we get no duplicates value wise [-7, -6, -5, -1, 0, 1, 5, 6, 7]. However we get the same wrapping behavior.

Okay, I hope that helped. 😅

Though, I wholeheartedly agree that the old implementation is simpler, easier to read and probably to maintain but I saw no way around it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting! Thanks for the exposé, @lagru! Stupid edge cases ruining our otherwise elegant lives... =P I'll try to do a proper review of this PR in the next couple of days... Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also numpy/numpy#15475.

@emmanuelle
Copy link
Member

Thanks @lagru , I left a couple of questions for you.

skimage/morphology/_util.py Outdated Show resolved Hide resolved
Copy link
Member

@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lagru I've left some very minor comments! Sorry for the long lag time!

"edge" may be associated with "edge filters" while "border" is used
throughout the library already.
@rfezzani rfezzani added 🩹 type: Bug fix Fixes unexpected or incorrect behavior and removed type: bug labels Feb 22, 2020
@sciunto
Copy link
Member

sciunto commented Apr 8, 2020

@lagru would you like to resolve the conflicts?

@sciunto
Copy link
Member

sciunto commented Apr 8, 2020

@scikit-image/core This PR is MRG+1 for a while, we need a second review here please :)

@lagru
Copy link
Member Author

lagru commented Apr 8, 2020

@sciunto done. Thanks for bringing this to attention.

Copy link
Member

@rfezzani rfezzani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some cosmetic suggestions, otherwise, very clean ;-). Thank you @lagru.

@lagru
Copy link
Member Author

lagru commented Apr 9, 2020

@rfezzani thanks for the feedback. I think the discrepancy between the "connectivity" parameter in the watershed function and its usage everywhere else is confusing and we should address this. I'll probably touch the related error messages in a future PR anyway because that would allow more descriptive ones.

@rfezzani rfezzani merged commit 29106cc into scikit-image:master Apr 9, 2020
@rfezzani
Copy link
Member

rfezzani commented Apr 9, 2020

Thank you @lagru 🎉

@lagru lagru deleted the refactor-neighbor-offsets branch April 9, 2020 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🩹 type: Bug fix Fixes unexpected or incorrect behavior 🔧 type: Maintenance Refactoring and maintenance of internals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants