Refactor/move neighborhood utility functions in morphology #4209
Conversation
Can you add a test that failed for #4165 please?
Hello @lagru! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-04-09 09:57:20 UTC
# Sort by distance
distances = np.abs(offsets).sum(axis=1)
raveled_offsets = raveled_offsets[np.argsort(distances)]
I haven't been able to figure out a way to keep the same order for neighbors of the same connectivity. Maybe that's due to argsort defaulting to the unstable quicksort? This does affect the order in which algorithms iterate over neighbors with the same distance. I don't think this is relevant for the algorithms relying on this function, though, and the test suite seems to back this up...
Do I understand correctly that the problem is that the order of indices is not always the same? (lack of reproducibility)
If yes, why not add another piece of information to disambiguate the distance, such as a raveled index within the structuring element? You could then use the order parameter of argsort, see for example:
>>> a = np.array([(2, 10), (2, 5), (1, 3), (1, 2), (3, 1)], dtype=[('distance', 'f8'), ('index', 'f8')])
>>> np.sort(a, order=['distance', 'index'])
array([(1., 2.), (1., 3.), (2., 5.), (2., 10.), (3., 1.)],
dtype=[('distance', '<f8'), ('index', '<f8')])
>>> np.argsort(a, order=['distance', 'index'])
array([3, 2, 1, 0, 4])
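An equivalent tie-break without building a structured array is np.lexsort, using each offset's raveled position within the structuring element as the secondary key. A sketch; the offsets array here is illustrative, not taken from the PR:

```python
import numpy as np

# Offsets of a full 3x3 neighborhood around its center (illustrative)
offsets = np.array([[-1, -1], [-1, 0], [-1, 1],
                    [0, -1],           [0, 1],
                    [1, -1], [1, 0], [1, 1]])
distances = np.abs(offsets).sum(axis=1)
index = np.arange(len(offsets))  # memory order acts as the tiebreaker

# lexsort treats the LAST key as primary: sort by distance,
# break ties by original position
order = np.lexsort((index, distances))
sorted_offsets = offsets[order]
```

Since `index` is already ascending, this is equivalent to a stable argsort of `distances`, but it makes the tie-breaking rule explicit.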
Do I understand correctly that the problem is that the order of indices is not always the same? (lack of reproducibility)
No, it's always the same and reproducible, but it changes between the old and new implementation. E.g. old:
>>> _offsets_to_raveled_neighbors((3, 3, 3), np.ones((3, 3, 3)), (1, 1, 1))
array([ -1, 3, -9, 1, 9, -3, 4, 2, 12, -2, -4, -6, 10,
-8, -10, -12, 8, 6, 11, -13, 5, -5, -7, -11, 7, 13])
and new:
>>> _offsets_to_raveled_neighbors((3, 3, 3), np.ones((3, 3, 3)), (1, 1, 1))
array([ 3, -9, 1, -1, 9, -3, 4, 2, 12, -2, -4, -6, 10,
-8, -10, -12, 8, 6, 11, -13, 5, -5, -7, -11, 7, 13])
I vaguely remember fiddling around with the kind= argument to argsort. I think we want a stable sort that preserves the order of the input arguments (which is memory order) as long as they have the same distance. So kind="mergesort" for now, and switch to kind="stable" once we hit NumPy 1.15?
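A minimal sketch of the difference, using a made-up distances array: with a stable sort, equal distances keep their original (memory) order, which the default quicksort does not guarantee.

```python
import numpy as np

distances = np.array([1, 1, 2, 1, 2])

# Stable sort: entries with equal distance stay in input order,
# so all the 1s come first in their original positions
order = np.argsort(distances, kind="stable")
# order is [0, 1, 3, 2, 4]
```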
Yes. LOL Sorry for the long delay in responding!
Simplify construction of indices, use consistent name for structuring element (selem), some reformatting and improve documentation.
The old implementation would raise a ValueError if a dimension length given in `image_shape` was smaller than in `selem`. This precluded operations on images e.g. with the shape (2, 100).
which may act as lightweight tests.
The explicit tests are meant to catch subtle changes in the order of neighbors with the same connectivity. They were reviewed by constructing an image of the given shape, filling in the raveled image around the image center and manually inspecting the result. E.g. with

image = np.zeros(image_shape)
image_raveled = image.ravel()
image_center = tuple(s // 2 for s in image_shape)
image_center_raveled = np.ravel_multi_index(image_center, image_shape)
for i, offset in enumerate(offsets):
    index = image_center_raveled + offset
    assert image_raveled[index] == 0
    image_raveled[index] += i + 1
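This review trick can be run as a small self-contained script; the offsets below are a hypothetical 4-connected neighborhood for a (3, 3) image in C order, chosen only for illustration:

```python
import numpy as np

image_shape = (3, 3)
# Hypothetical raveled offsets of a 4-connected neighborhood (C order):
# up, left, right, down relative to the center pixel
offsets = np.array([-3, -1, 1, 3])

image = np.zeros(image_shape)
image_raveled = image.ravel()  # a view: writes show up in `image`
image_center = tuple(s // 2 for s in image_shape)
center_raveled = np.ravel_multi_index(image_center, image_shape)
for i, offset in enumerate(offsets):
    index = center_raveled + offset
    assert image_raveled[index] == 0  # each neighbor is visited once
    image_raveled[index] += i + 1     # label neighbor with its rank

# image is now
# [[0. 1. 0.]
#  [2. 0. 3.]
#  [0. 4. 0.]]
```

Each neighbor ends up labeled with its position in the iteration order, which makes a reordering of neighbors immediately visible.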
@emmanuelle, I rebased this as discussed in the meeting. :)
                  _fast_pad)
from ._util import _offsets_to_raveled_neighbors
from ._util import (_resolve_neighborhood, _set_edge_values_inplace,
                    _fast_pad, _offsets_to_raveled_neighbors)
this will only be used in your other PR, right?
No? Not sure what you are referring to. _resolve_neighborhood, _set_edge_values_inplace and _fast_pad are used in _flood_fill.py and extrema.py, while _offsets_to_raveled_neighbors is additionally used in max_tree.py and watershed.py.
#4165 will rely on _resolve_neighborhood and _offsets_to_raveled_neighbors.
@@ -51,35 +54,206 @@ def _validate_connectivity(image_dim, connectivity, offset):
     return c_connectivity, offset


-def _offsets_to_raveled_neighbors(image_shape, structure, center, order='C'):
+def _offsets_to_raveled_neighbors(image_shape, selem, center, order='C'):
Thanks for the change, for better consistency. We should definitely add this list of variable names to the contributing notes :-).
Sorry, can I be the annoying objector here?
I think originally we stopped using selem because it is a very obscure term for people who are not familiar with image processing, and morphology in particular. Structure is a bit better, but imho "footprint" or "neighborhood" are the clearest. What do we think? Consistency is great, but let's think about what the best syntax is going forward, and be consistent with that.
I have no clear preference concerning the best argument name for this, I just want to draw attention to the existence of the skimage.morphology.selem module.
@jni I like neighborhood, but I think this should be a separate PR. In any case I think selem is better than the old structure because of the aforementioned consistency. So until we change this everywhere I'd leave it as selem.
Not a strong opinion though. I'll change it if you find a silent consensus.
This function is private, so the choice is not critical in the end. :)
# If any dimension in image_shape is smaller than selem.shape
# duplicates might occur, remove them
if any(x < y for x, y in zip(image_shape, selem.shape)):
couldn't you just add this part to the existing function? I'm curious to know why you modified the rest of the function, since I find it harder to read than the original one. It might be only a matter of taste of course...
Short answer: No. 😉 I probably should have done a better job highlighting the problem and what I did about it. Consider the following scenario:
I have an image of shape (10, 2) and want to compute the neighborhood offsets for a structuring element with the shape (3, 3) and center at (1, 1). This is a valid use case we need to anticipate because we might want the algorithm to be able to step from image[i, 0] to image[i, 1] and vice versa. For that we need neighbors on both sides of the center and thus a dimension length of 3.
Now the old implementation would fail because np.ravel_multi_index doesn't support this use case:

import numpy as np
selem = np.ones((3, 3), dtype=bool)
np.ravel_multi_index(np.nonzero(selem), dims=(10, 2))
# Raises:
#   File "<__array_function__ internals>", line 6, in ravel_multi_index
# ValueError: invalid entry in coordinates array
np.ravel_multi_index(np.nonzero(selem), dims=(10, 3))
# Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8])
So the previous implementation wasn't able to deal with this case because it relied on that function to compute the offsets. Unfortunately the mode argument is not helpful here because wrapping or clipping is not what we want. I tried adjusting the indices by the center index beforehand, but again ravel_multi_index doesn't handle negative indices the way we want it to.
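The workaround can be sketched as computing the ravel factors directly and skipping np.ravel_multi_index entirely, so no bounds checking gets in the way. This is an assumed approach for illustration, not the exact code in this PR:

```python
import numpy as np

def raveled_offsets(image_shape, selem, center):
    # Ravel factor of each dimension in C order: the product of the
    # lengths of all trailing dimensions
    factors = np.cumprod((image_shape[1:] + (1,))[::-1])[::-1]
    # Coordinates of active selem entries relative to its center
    coords = np.stack(np.nonzero(selem), axis=-1) - np.asarray(center)
    # Summing coordinate * factor gives the raveled offset
    return coords @ factors

selem = np.ones((3, 3), dtype=bool)
# Works even though the last image dimension (2) is smaller than the
# structuring element (3) -- but note the duplicated offsets -1 and 1
print(raveled_offsets((6, 2), selem, (1, 1)))
# prints: [-3 -2 -1 -1  0  1  1  2  3]
```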
But the new algorithm will lead to duplicate neighbors. I find it really hard to think about this but I'll try to explain with an example. We have the structuring element with the shape (3, 3), all ones, with the center (1, 1), and a C-ordered image with the shape (6, 2). After adjusting for the center we get the raveled offsets (representation with index pairs ordered by dimension):
[[(-1, -1), (-1, 0), (-1, 1)],
 [( 0, -1), ( 0, 0), ( 0, 1)],
 [( 1, -1), ( 1, 0), ( 1, 1)]]
The second value in a pair denotes the step in the last dimension, which is contiguous (-> ravel factor 1). But because samples in the first dimension don't border each other in memory we need to address this by multiplying with the length of the last dimension (-> ravel factor 2). For (0, 1) and (1, -1) this results in (0, 1) and (2, -1), which both resolve (by summing) to the raveled offset 1. Considering the array
[[ 0, 1],
 [ 2, 3],
 [ 4, 5],
 [ 6, 7],
 [ 8, 9],
 [10, 11]]
this means that the pixel to the right of 6 is 7 and the one to the lower-left is 7 as well! This "memory-wise wrapping" when crossing image boundaries might be counter-intuitive compared to simply wrapping around in one dimension. Though, for the functions using this result it's not relevant because they always check that the image edge is not crossed while iterating the neighborhood.
It's also interesting to switch the image shape around to (2, 6) (or use F-order), in which case we get no duplicates value-wise: [-7, -6, -5, -1, 0, 1, 5, 6, 7]. However, we get the same wrapping behavior.
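When such duplicates occur, they can be dropped while preserving the position of each value's first occurrence, e.g. (a sketch on a hypothetical offsets array with duplicates):

```python
import numpy as np

# Hypothetical raveled offsets containing duplicates (-1 and 1 twice)
offsets = np.array([-3, -2, -1, -1, 0, 1, 1, 2, 3])

# np.unique returns the index of each value's first occurrence;
# re-sorting those indices restores the original ordering
_, first = np.unique(offsets, return_index=True)
deduped = offsets[np.sort(first)]
# deduped is [-3 -2 -1  0  1  2  3]
```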
Okay, I hope that helped. 😅
Though, I wholeheartedly agree that the old implementation is simpler, easier to read and probably to maintain but I saw no way around it.
Very interesting! Thanks for the exposé, @lagru! Stupid edge cases ruining our otherwise elegant lives... =P I'll try to do a proper review of this PR in the next couple of days... Thanks!
See also numpy/numpy#15475.
Thanks @lagru, I left a couple of questions for you.
@lagru I've left some very minor comments! Sorry for the long lag time!
"edge" may be associated with "edge filters" while "border" is used throughout the library already.
@lagru would you like to resolve the conflicts?
@scikit-image/core This PR has been MRG+1 for a while, we need a second review here please :)
_fast_pad was changed on master in e58cead while it was moved on this branch.
@sciunto done. Thanks for bringing this to attention.
Some cosmetic suggestions, otherwise, very clean ;-). Thank you @lagru.
@rfezzani thanks for the feedback. I think the discrepancy between the "connectivity" parameter in the watershed function and its usage everywhere else is confusing and we should address this. I'll probably touch the related error messages in a future PR anyway because that would allow more descriptive ones.
Thank you @lagru 🎉
Description
Changes are extracted from #4165, hopefully making reviewing these easier.

- Moves _offsets_to_raveled_neighbors, _fast_pad, _set_edge_values_inplace and _resolve_neighborhood and consolidates them in the _util module (some were already there). These functions are used together at several places in the morphology module.
- _offsets_to_raveled_neighbors would fail in case a dimension in the structuring element was larger than in the given image_shape. While this use case isn't necessary for algorithms that work by padding images before constructing the raveled offsets (e.g. flood_fill, local_maxima), it precludes other functions from working with images where a dimension is < 3 (the typical length of the structuring element). With the new implementation this works even if the structuring element is larger than the image itself in any dimension.

For reviewers