Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add minimal distance argument to local_maxima #4024

Closed
wants to merge 3 commits into from

Conversation

lagru
Copy link
Member

@lagru lagru commented Jul 14, 2019

Description

Closes #3816.

Adds the optional keyword distance to local_maxima (and local_minima) to specify the minimal euclidean distance allowed between maxima. In case of a conflict, the maximum with the smaller value is dismissed.

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • Consider backporting the PR with @meeseeksdev backport to v0.14.x

Adds the optional keyword `distance` to local_maxima (and local_minima)
to specify the minimal euclidean distance allowed between maxima. In
case of a conflict, the maximum with the smaller value is dismissed.
@pep8speaks
Copy link

pep8speaks commented Jul 14, 2019

Hello @lagru! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-07-14 21:28:31 UTC

@lagru
Copy link
Member Author

lagru commented Jul 14, 2019

The relevant implementation is in Cython and uses a rather "naive brute force" algorithm:

  • Iterate maxima in order of priority (here maxima with higher image values are prioritized)
    • continue if current maximum was removed in an earlier step
    • Find all points of current maxima
    • Find all points within minimal distance of any point in the current maxima
    • Remove all other maxima that reach into this volume

Early benchmarks indicate a slow-down by 400% and more depending on how the distance-keyword is used. I still want to benchmark this more thoroughly and see if I can find any optimizations and how it compares to similar functions: e.g. peak_Local_max and find_peaks come to mind.

The current API is not necessarily the best one. It may be useful to make this public or add this keyword to h_maxima as well.

@lagru
Copy link
Member Author

lagru commented Jul 14, 2019

Can't believe I forgot my earlier attempt #3016 (comment). 😄 I'll need to investigate how these two approaches compare...

@lagru
Copy link
Member Author

lagru commented Jul 14, 2019

Okay, a quick evaluation hints that the implementation in #3016 (comment) is significantly slower and scales very badly with an increasing number of peaks. So the new approach seems like the way to go.

@jni
Copy link
Member

jni commented Jul 23, 2019

@lagru very cool! Sorry for the silence, it's been busy. =)

I haven't had a chance to fully understand the code yet. With my review, I'll probably aim to break the function into some smaller functions so that the flow can be clearer. In the meantime though, it would be great to add some tests to this, and in particular to verify that functions where peak_local_max is currently doing a poor job actually work with this approach, e.g. #2728

Also, imho the keyword argument should be called minimum_distance, rather than just distance.

Thanks!

@jni
Copy link
Member

jni commented Jul 23, 2019

@lagru another point, about your initial implementation: you were using brute-force to compare the distance to every other peak. Instead, a scipy.spatial.cKDTree could be used to rapidly search for nearby maxima, once you have the coordinates.

@lagru
Copy link
Member Author

lagru commented Jul 23, 2019

I'll probably aim to break the function into some smaller functions so that the flow can be clearer. In the meantime though, it would be great to add some tests to this, and in particular to verify that functions where peak_local_max is currently doing a poor job actually work with this approach

Great suggestions. Definitely something to keep in mind/do next. I'm still tweaking the algorithm so I refrained from adding tests and compartmentalizing the code for now.

cKDTree looks like it could be really helpful. I initially shied away because it works with "unraveled" indices (that can be solved though). I'm guessing that because it's written in C we could use it directly in Cython. Not sure whether the C-API is part of SciPy's official API though. I'll investigate and see if there's a performance improvement to be had.


# Evaluate the space within the minimal distance of the current
# maximum
while queue_pop(&to_search, &current_index):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @jni suggested, a cKDTree might be a good tool to speed this while-block up.

@sciunto
Copy link
Member

sciunto commented Jul 27, 2019

I don't resist to xref to this issue: #4048 Perhaps we can solve and gain also on peak_local_max.

@lagru
Copy link
Member Author

lagru commented Aug 8, 2019

@jni in the mailing list you mentioned this:

The padding of a volume to avoid neighbor issues was done for convenience. It can and should be removed, and checking for out-of-bounds access is a mod (%) calculation away.

That'd be great to do however I don't see how this is only a modulo calculation away. As I understand it, one has to do a modulo call for all dimension in the correct order an evaluate the reminder each time (a % b = r, r=0 or r=b-1 -> border, else not border). But that wouldn't solve the problem because we actually care about not crossing the border in the wrong direction (evaluating an index at the border is fine in itself). So we'd have to consider the offsets as well.

That is quite a bit of overhead compared to padding with a border flag. Seems to me that we would be trading memory cost for runtime and complexity. Do you have examples or experience regarding this or (if you find the time) can you elaborate where this statement is coming from?

@JDWarner
Copy link
Contributor

JDWarner commented Aug 8, 2019

Lars is correct, it's not easy or trivial; if one goes back through my flood fill PR my immediate approach was to remove the padding. That initially seemed to work, but actually let the fill alias off one side and into the opposite side of the image. Fixing this became such a headache that I reverted to padding.

@lagru
Copy link
Member Author

lagru commented Sep 16, 2019

Closed in favor of #4165 which proposes an equally fast (if not faster) solution based on SciPy's cKDTree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add minimal distance argument to local_maxima
6 participants