
Add function to remove near objects in nd-image #4165

Open
wants to merge 69 commits into base: main

Conversation

lagru
Member

@lagru lagru commented Sep 16, 2019

Description

Replaces #4024 with an implementation based on SciPy's cKDTree.

I propose to add a function that iterates over objects (connected pixels that are not zero) inside a binary image and removes neighboring objects until all remaining ones are more than a minimal Euclidean distance from each other.

Furthermore, it moves the functions _resolve_neighborhood, _set_edge_values_inplace and _fast_pad into morphology._util. They are used by several nd-algorithms throughout the submodule, so that seems like a sensible place to keep them. Done in #4209

This function might be used to implement a distance "filtering" for functions like local_maxima as well (closes #3816).
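
A rough usage sketch of the proposed function (mirroring the profiling snippet further down this thread; the exact API may still change during review):

import skimage as ski

# Threshold an example image and label its connected components.
image = ski.color.rgb2gray(ski.data.hubble_deep_field())
foreground = image > ski.filters.threshold_li(image)
objects = ski.measure.label(foreground)

# Remove labeled objects until all remaining ones are more than
# `min_distance` apart (Euclidean distance).
spread_out = ski.morphology.remove_near_objects(objects, min_distance=5)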

Checklist

Release note

For maintainers and optionally contributors, please refer to the instructions on how to document this PR for the release notes.

{label="New feature"} Add the new function `skimage.morphology.remove_near_objects`
which can remove labeled objects until a minimal distance is ensured. 
{label="Documentation"} Add a new gallery example on "Removing objects"
based on their size or distance.

@lagru lagru added the ⏩ type: Enhancement Improve existing features label Sep 16, 2019
@sciunto sciunto added this to the 0.17 milestone Sep 17, 2019
@pep8speaks

pep8speaks commented Sep 19, 2019

Hello @lagru! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 189:80: E501 line too long (88 > 79 characters)

Line 327:80: E501 line too long (88 > 79 characters)

Line 290:80: E501 line too long (82 > 79 characters)
Line 328:80: E501 line too long (82 > 79 characters)
Line 334:80: E501 line too long (82 > 79 characters)
Line 366:80: E501 line too long (82 > 79 characters)
Line 376:80: E501 line too long (84 > 79 characters)
Line 402:80: E501 line too long (81 > 79 characters)
Line 420:80: E501 line too long (84 > 79 characters)

Comment last updated at 2022-04-10 15:22:21 UTC

@lagru lagru added type: new feature ⏩ type: Enhancement Improve existing features and removed ⏩ type: Enhancement Improve existing features labels Sep 21, 2019
@jni
Member

jni commented Sep 26, 2019

@lagru thanks for this! Sorry that I don't have time for a full review just yet, but this is something I've wanted for a long time! =)

My three comments from a brief overview:

  • couldn't you use flood fill directly rather than reimplement it?
  • Should we maybe set the default priority to object size rather than C-order priority? Whatever we end up choosing, I would prefer that it be independent of axis direction! If size is too brittle (how to break ties?), then label ID might be a better choice. They might coincide sometimes, but at other times (e.g. with watershed output) it might work well. =)
  • Does SciPy's cKDtree have a C API we can use directly? And is it worth it anyway? ;)

Will do a proper review soon!

@lagru
Member Author

lagru commented Sep 26, 2019

couldn't you use flood fill directly rather than reimplement it?

I think it might be possible to use _flood_fill_equal. However there are two disadvantages compared to the current solution:

  • The current approach reuses a single queue. This queue reallocates its internal buffer when it grows too small. Reusing that queue allows us to get away with a minimal number of reallocations.
  • _flood_fill_equal releases the GIL. Generally that's a good thing. But I think I read somewhere that this can actually be slower if the time spent in the released state is very short. That's relevant for the current implementation because there might be many maxima which are only 1 sample large, which would lead to releasing and acquiring the GIL very often in short order.

But I'll try it out and see what the benchmarks say so that we can make a more informed decision.

Should we maybe set the default priority to object size rather than C-order priority? Whatever we end up choosing, I would prefer that it be independent of axis direction! If size is too brittle (how to break ties?), then label ID might be a better choice. They might coincide sometimes, but in other times (e.g. watershed), it might work well. =)

Good idea! Using the object size sounds more intuitive. Currently the priority actually corresponds to the label ID which is assigned by C-order. I'll see what I can do.
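
Just to illustrate the size-based priority idea (a sketch of one possible ordering, not necessarily what the implementation will end up doing), pixel counts per label can be obtained with numpy.bincount and then sorted:

import numpy as np
import skimage as ski

objects = ski.measure.label(ski.data.binary_blobs())

# Pixel count per label ID; index 0 corresponds to the background.
sizes = np.bincount(objects.ravel())

# Label IDs ordered from largest to smallest object, ignoring background.
priority = np.argsort(sizes[1:])[::-1] + 1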

Does SciPy's cKDtree have a C API we can use directly? And is it worth it anyway. ;)

That's something I'd like to know as well and a weak point of the current implementation. query_ball_point is called for every object sample that remains in the final image. Calling Python from Cython for something that's executed that often just feels wrong. 😅 If we could somehow use a C/Cython interface I'd expect significant performance gains. Furthermore we could consider releasing the GIL for the whole function. I already looked into doing this earlier and had the following thoughts:

  • query_ball_point is actually implemented in Cython but only has a Python interface (def and not cpdef). Perhaps we could ask SciPy to change that but then we'd have to wait until we can rely on that feature (minimal required SciPy version).
  • Internally query_ball_point calls a function of the same name that is declared in C++ here. I think we could potentially use that function directly (I doubt it's considered part of SciPy's public API) but I couldn't figure out how to include that header file / use that function. The header file is not available unless SciPy is installed from source. But I'm not sure if we actually need that header; we might be able to use the shipped binary of ckdtree directly (ckdtree.pyx also contains a declaration of that function).

However because a KDTree is such a nice data structure for this problem 😍 the performance is already on par with the "brute force" solution in #4024 and the code is way shorter!
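
For context, a toy sketch of the per-sample query pattern described above (pure Python here; the PR does this inside a Cython loop):

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
coords = np.argwhere(rng.random((64, 64)) > 0.95)  # foreground coordinates
tree = cKDTree(coords)

for i in range(len(coords)):
    # One Python-level call per remaining sample; this is the overhead a
    # C/Cython interface to query_ball_point could avoid.
    neighbors = tree.query_ball_point(coords[i], r=5, p=2)
    # ... inspect the object IDs of `neighbors` here ...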

@lagru lagru changed the title Add function to remove close objects in binary nd-image Add function to remove close objects in nd-image Sep 27, 2019
@lagru
Member Author

lagru commented Sep 27, 2019

Note to self: Find out if there is an efficient way to find the surface of objects. That should make significant performance gains possible because only the surface between two objects needs to be evaluated to find their distance to each other (the inside of objects can be ignored)!
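
One common way to get those surface pixels (an assumption on my side, not necessarily the approach this PR ends up using) is to subtract a binary erosion from the foreground:

import numpy as np
from scipy import ndimage as ndi

# Two small square objects in a toy binary image.
binary = np.zeros((9, 9), dtype=bool)
binary[1:5, 1:5] = True
binary[6:8, 6:8] = True

# Surface pixels are foreground pixels that do not survive an erosion,
# i.e. pixels with at least one background neighbor.
surface = binary & ~ndi.binary_erosion(binary)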

This function iterates over all objects (connected pixels that are True)
inside an image and removes neighboring objects until all remaining ones
are at least a minimal euclidean distance from each other.
_extrema_cy.pyx was cythonized twice for no reason.
Simplify construction of indices, use consistent name for structuring
element (selem), some reformatting and improve documentation.
This trick significantly improves the performance by reducing the number
of points inside the KDTree whose neighborhood needs to be evaluated
for objects that are too close. The greater a checked object's
size-to-surface ratio, the greater the improvement.
numpy.bincount seems to cast its input to dtype('intp') according to
the rule safe. Therefore giving a dtype such as uint32 or int64 that
can hold a larger number fails on 32-bit platforms.
With recent changes the evaluation order of objects changed.
The latter was added in NumPy 1.15 which is not yet a minimal
requirement for scikit-image.
@stefanv stefanv dismissed stale reviews from jni and themself April 5, 2023 20:44

Lars is proposing to do some more work.

@lagru lagru modified the milestones: 0.21, 0.22 Apr 11, 2023
@jarrodmillman jarrodmillman removed this from the 0.22 milestone Oct 3, 2023
@lagru lagru added 👶 type: New feature and removed ⏩ type: Enhancement Improve existing features labels Apr 20, 2024
Also add a random test that uses cKDTree.sparse_distance_matrix to
manually check the distance between remaining objects.
@lagru
Member Author

lagru commented Apr 20, 2024

Ready to merge IMO! 🎉 I'm somewhat proud of this one now. 😊 #4165 (comment) is addressed and I figured out how to include only objects' surfaces in the KD-tree, while still including the inside of objects when they are removed. That shortens the critical loop of this algorithm significantly. It even works with anisotropic data now; #7318 gave me the idea.

I included a basic outline of the algorithm in the docstring and documented complicated parts extensively with comments and tests. Hope that makes reviewing easier! 🙏

for unity spacing and if p-norm is 1 or 2. Also refactor code structure a little bit.

continue

neighborhood = kdtree.query_ball_point(
    kdtree.data[i_indices, ...],
Contributor

Just cross-reviewing from SciPy, since there were some concerns expressed about the list return type here for performance reasons (we need a list because of the ragged data structures that are possible).

A few questions:

  1. Do you have any sample timings/examples demonstrating the bottleneck/% time spent on this line for a typical use case?
  2. Any chance you could use the workers argument to speed up via concurrency? This might be particularly viable if you could accumulate the query points externally first.
  3. How many points are you querying against in a typical use case? The reason I ask is that you seem to be querying the tree against itself, which may benefit from using query_ball_tree instead.
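
For reference, a minimal sketch of the two suggestions above (the parameter values are purely illustrative; this is not the PR's code):

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 2))
tree = cKDTree(points)

# Suggestion 2: batch all query points up front and let SciPy spread the
# lookups over threads (`workers=-1` uses all available cores).
neighborhoods = tree.query_ball_point(points, r=0.01, workers=-1)

# Suggestion 3: query the tree against itself in a single call.
neighborhoods = tree.query_ball_tree(tree, r=0.01)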

Member Author

@lagru lagru Apr 23, 2024

Thanks for reviewing!

Do you have any sample timings/examples demonstrating the bottleneck/% time spent on this line for a typical use case?

Honestly, I haven't. Since the transition to Meson, I'm not sure how to profile, debug, or linetrace Cython files. I tried to linetrace via a notebook, without success. The approach suggested by Cython's docs

Details
import pstats, cProfile
import pyximport
pyximport.install()

import matplotlib.pyplot as plt
import skimage as ski

# Extract foreground by thresholding an image taken by the Hubble Telescope
image = ski.color.rgb2gray(ski.data.hubble_deep_field())
foreground = image > ski.filters.threshold_li(image)
objects = ski.measure.label(foreground)

cProfile.runctx(
    "ski.morphology.remove_near_objects(objects, min_distance=5)",
    globals(),
    locals(), 
    "Profile.prof",
)
s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()

also doesn't work; it doesn't detect the _remove_near_objects stuff. Do you have a solution to profile / debug meson + Cython on SciPy's side? 🙏 I took a quick look at your repo but didn't find anything.

Any chance you could use the workers argument to speed up via concurrency? This might be particularly viable if you could accumulate the query points externally first.

Hmm, I thought about it, but with the current logic this is tricky to parallelize. It may work if only the calls for a single object are combined. I'll look into it!

How many points are you querying against in a typical use case? The reason I ask is that you seem to be querying the tree against itself, which may benefit from using query_ball_tree instead.

The current implementation has to query once for every pixel that is on an object's boundary / surface. Objects that were already removed in earlier iterations are skipped. For the case demonstrated in the new gallery example of this PR, I call query_ball_point 2,636 times (out of 20,505 samples, ~13%). This heavily depends on the min_distance that is used and on the objects' surface-to-total-size ratio.

I tried query_ball_tree and sparse_distance_matrix, but this approach quickly runs into memory issues.

Member Author

For the following case

import numpy as np
import skimage as ski

image = ski.color.rgb2gray(ski.data.hubble_deep_field())
foreground1 = image > ski.filters.threshold_li(image)  # many small objects
foreground2 = np.ones_like(foreground1)
foreground2[400, :] = 0  # two very large objects
foreground = np.concatenate([foreground1, foreground2])
objects = ski.measure.label(foreground)

%timeit ski.morphology.remove_near_objects(objects, min_distance=5)
%timeit ski.morphology.remove_near_objects(objects, min_distance=50)
%timeit ski.morphology.remove_near_objects(objects, min_distance=100)
%timeit ski.morphology.remove_near_objects(objects, min_distance=300)

Querying an entire object's indices at once with kdtree.query_ball_point(kdtree.data[start:stop], ...)

Patch
Subject: [PATCH] Query indices with the same object ID at once
---
Index: skimage/morphology/_misc_cy.pyx
===================================================================
diff --git a/skimage/morphology/_misc_cy.pyx b/skimage/morphology/_misc_cy.pyx
--- a/skimage/morphology/_misc_cy.pyx	(revision c8756cae67078c1cfb05763a503792e05e842aef)
+++ b/skimage/morphology/_misc_cy.pyx	(date 1713872591799)
@@ -46,34 +46,45 @@
         The shape of the unraveled `image`.
     """
     cdef:
-        Py_ssize_t i_indices, j_indices  # Loop variables to index `indices`
+        Py_ssize_t i_indices, j_indices, start, stop  # Loop variables to index `indices`
         Py_ssize_t i_out  # Loop variable to index `out`
-        np_anyint object_id, other_id
-        list neighborhood
+        np_anyint start_id, stop_id
         set remembered_ids
+        list n
 
     remembered_ids = set()
-    for i_indices in range(border_indices.shape[0]):
-        i_out = border_indices[i_indices]
-        object_id = out[i_out]
-        # Skip if sample is part of a removed object
-        if object_id == 0:
+    start = 0
+    start_id = out[border_indices[start]]
+    for stop in range(border_indices.shape[0]):
+        i_out = border_indices[stop]
+        stop_id = out[i_out]
+
+        if start_id == stop_id:
+            continue
+        elif start_id == 0:
+            start = stop
+            start_id = stop_id
             continue
 
         neighborhood = kdtree.query_ball_point(
-            kdtree.data[i_indices, ...],
+            kdtree.data[start:stop, ...],
             r=min_distance,
             p=p_norm,
         )
-        for j_indices in neighborhood:
-            # Check object IDs in neighborhood
-            other_id = out[border_indices[j_indices]]
-            if other_id != 0 and other_id != object_id:
-                # If neighbor ID wasn't already removed or is the current one
-                # remove the boundary and remember the ID
-                _remove_object(out, border_indices, j_indices)
-                remembered_ids.add(other_id)
+
+        for n in neighborhood:
+            for j_indices in n:
+                # Check object IDs in neighborhood
+                other_id = out[border_indices[j_indices]]
+                if other_id != 0 and other_id != start_id:
+                    # If neighbor ID wasn't already removed or is the current one
+                    # remove the boundary and remember the ID
+                    _remove_object(out, border_indices, j_indices)
+                    remembered_ids.add(other_id)
 
+        start = stop
+        start_id = out[border_indices[start]]
+
     # Delete inner parts of remembered objects
     for j_indices in range(inner_indices.shape[0]):
         object_id = out[inner_indices[j_indices]]

is faster for min_distance=5 and 50, but a lot slower once min_distance gets larger. Without profiling I'm not really sure about the reason. The implementation that queries single points only still seems to be the fastest overall.

Using workers seems to introduce an overhead that makes it slower in most cases, regardless of the specific implementation.


Successfully merging this pull request may close these issues.

Add minimal distance argument to local_maxima
9 participants