[feature request] Majority voting filter in ndimage #9873
Comments
Sounds reasonable, and there's at least some demand for it: https://stackoverflow.com/questions/39829716/majority-filter-python. It may be a bit specialized, though, and fit better in scikit-image. @stefanv @jni, any opinion on this functionality?
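For reference, the requested behavior can already be had today, slowly, in pure Python with `scipy.ndimage.generic_filter` plus `np.bincount`. A minimal sketch, assuming small non-negative integer values (the `majority` helper is my own illustration, not an existing API):

```python
import numpy as np
from scipy.ndimage import generic_filter

def majority(values):
    # values: flat window contents; assumed to be small non-negative ints
    counts = np.bincount(values.astype(np.intp))
    return counts.argmax()  # on ties, the smallest value wins

arr = np.array([[0, 0, 1],
                [0, 1, 1],
                [1, 1, 1]])
# padded pixels count as value 0 here (mode='constant', cval=0)
result = generic_filter(arr, majority, size=3, mode='constant', cval=0)
```

This pins down the semantics (majority vote per window, ties broken toward the smaller value) but is far too slow for real images, which is where the discussion below comes in.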
Seems like a job for `LowLevelCallable`. @skjerns, see this post for references on how to use these: https://ilovesymposia.com/2017/03/12/scipys-new-lowlevelcallable-is-a-game-changer/ Those examples use Numba, which neither SciPy nor scikit-image has as a dependency yet. This notebook from @kne42 shows how to do the same with Cython: https://gist.github.com/kne42/b487af595332db468654a5c5b1ce416c And of course see also the SciPy documentation.
This sounds like it may be implemented effectively with a rank filter.
@stefanv what do you mean exactly? I'm probably missing a step in my brain, but I can't map a rank filter to `bincount`, which is what needs to happen here?
@jni Rank filters calculate the histogram of values under the window, and that histogram is updated each time the window moves. From that, you can find the n-th most occurring value, I suspect?
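The sliding-histogram technique described above can be sketched in plain Python for the 1D case. This is illustrative only: a real rank filter does the incremental update in C, and the `argmax` over the histogram would also be maintained incrementally rather than recomputed per pixel.

```python
import numpy as np

def majority_filter_1d(x, width):
    # x: 1-D array of small non-negative ints; width: odd window size
    # Maintain a histogram of the current window and update it
    # incrementally as the window slides (the rank-filter trick).
    hist = np.zeros(x.max() + 1, dtype=np.intp)
    half = width // 2
    out = np.empty_like(x)
    # histogram of the first window (left edge clipped)
    for v in x[:half + 1]:
        hist[v] += 1
    for i in range(len(x)):
        out[i] = hist.argmax()  # ties: smallest value wins
        # slide the window: add the entering sample, drop the leaving one
        if i + half + 1 < len(x):
            hist[x[i + half + 1]] += 1
        if i - half >= 0:
            hist[x[i - half]] -= 1
    return out

out = majority_filter_1d(np.array([1, 2, 2, 2, 1, 1, 3, 1]), 3)
```

Each step touches only the samples that enter and leave the window, which is what makes the approach attractive for large footprints.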
🤯 Having said that, the real solution is to fix SciPy's filters to use this technique, right? =) See #4878
The rank filters only operate on integers, though, and SciPy's API is quite complex (as discussed in #4878). That said, we need to find a champion for that effort.
and are 2D only, if I remember correctly.
Great that there seems to be some interest in this. @jni I can have a look at the low-level API and see if I can come up with a solution. However, this would be new terrain for me, and I have always had a healthy respect for contributing to packages as important as SciPy.
Indeed, however the 1D operation can just be reframed as a 2D problem via reshaping?
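If it helps, that reshaping trick looks like this. `filter_2d_only` below is a hypothetical stand-in for a 2D-only rank filter (here it just wraps SciPy's n-dimensional `median_filter`, restricted to 2D input):

```python
import numpy as np
from scipy.ndimage import median_filter

def filter_2d_only(image, size):
    # pretend this API, like skimage's rank filters, only accepts 2-D images
    if image.ndim != 2:
        raise ValueError("2-D input required")
    return median_filter(image, size=size)

x = np.array([5., 1., 2., 9., 3.])
# reframe the 1-D signal as a 1xN image, filter, then flatten back
y = filter_2d_only(x[np.newaxis, :], size=(1, 3)).ravel()
```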
Welcome, and never fear! Packages like SciPy have coalesced into something of extremely high quality, but they got there through the gradual contributions and improvements of scientists who were not trained as software engineers. Having said this, the ndimage code base is quite challenging, as hinted at by @stefanv. Where are you based, @skjerns? Any chance you'll be coming to the SciPy conference in July or EuroSciPy in August?
It is not 1D arrays that I am worried about, but 3D and higher. =)
The principles of rank filtering hold in higher dimensions, but we'll
have to rewrite the window sliding code to be more general. That said,
I suspect there is code like that inside of NumPy already, which one may
be able to co-opt.
So the preliminary solution right now is to write some code based on `LowLevelCallable`.
I'm Europe-based, but I don't think I will come to the conference, sorry :-/
I'll let @stefanv and @jni comment on that.
@skjerns some of the links I shared show how to use Cython instead of Numba. You could implement it with Cython and then add it as a PR either here or in scikit-image. Both places make sense to me.
I gave it a try and created a pure-Cython version using a C++ `map`. If we were constrained to ints 0-255 the problem would be much easier and faster to solve using a plain array; the second implementation does this and is orders of magnitude faster. But because we have `double` buffers, the general version needs a map. Maybe you can have a quick look at it and see if I'm going in the right direction?

**majority_filter.pyx**

```cython
cimport cython
import numpy as np
cimport numpy as np
from collections import defaultdict
from numpy cimport npy_intp
from libcpp.map cimport map as cpp_map  # an ordered map seems faster here than unordered_map

cdef extern from "math.h":
    float INFINITY

# reference version using standard Python objects, for comparison
@cython.boundscheck(False)
@cython.wraparound(False)
def vanilla(np.ndarray x not None):
    cdef Py_ssize_t i, n
    n = len(x)
    dx = defaultdict(int)
    for i in range(n):
        dx[x[i]] += 1
    max_v = max(sorted(dx), key=lambda k: dx[k])
    return max_v

# pure-Cython variant
cdef api int llc_cython(double *buffer, npy_intp filter_size,
                        double *return_value, void *user_data):
    cdef cpp_map[double, int] counter
    cdef npy_intp i
    cdef int c
    cdef int c_largest = -1
    cdef double k
    cdef double k_smallest = INFINITY  # in case of a tie, we take the smaller key value
    for i in range(filter_size):
        k = buffer[i]
        c = counter[k] + 1
        counter[k] = c
        if c < c_largest:    # most of the time it will be smaller
            continue
        elif c > c_largest:  # second most often it will be larger
            c_largest = c
            k_smallest = k
        else:                # rarely it will have the same count
            if k < k_smallest:
                c_largest = c
                k_smallest = k
    return_value[0] = k_smallest
    return 1

# pure-Cython int variant (see comment below)
cdef api int llc_cython_int(double *buffer, npy_intp filter_size,
                            double *return_value, void *user_data):
    cdef int[256] counter
    cdef npy_intp i
    cdef int c
    cdef int c_largest = -1
    cdef int k
    cdef int k_smallest = 0  # overwritten on the first iteration; ties keep the smaller key
    for i in range(256):
        counter[i] = 0
    for i in range(filter_size):
        k = <int>buffer[i]
        c = counter[k] + 1
        counter[k] = c
        if c < c_largest:    # most of the time it will be smaller
            continue
        elif c > c_largest:  # second most often it will be larger
            c_largest = c
            k_smallest = k
        else:                # rarely it will have the same count
            if k < k_smallest:
                c_largest = c
                k_smallest = k
    return_value[0] = <double>k_smallest
    return 1
```

**setup.py**

```python
from distutils.core import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("majority_filter.pyx", language="c++"),
    include_dirs=[np.get_include()],
)
```

**test.py**

```python
from scipy.ndimage.filters import generic_filter
from scipy import LowLevelCallable
import time
import numpy as np
import pyximport

pyximport.install(inplace=True, reload_support=True)
import majority_filter

np.random.seed(0)
# ints are realistic for voting; you would not call this on random floats
arr = np.random.randint(0, 10, 1024 * 1024).reshape([1024, 1024])
footprint = np.ones(25).reshape([5, 5])

llc = LowLevelCallable.from_cython(majority_filter, "llc_cython")

start = time.time()
res1 = generic_filter(arr, majority_filter.vanilla, footprint=footprint, mode='constant')
print(time.time() - start)  # -> 10.4 s

start = time.time()
res2 = generic_filter(arr, llc, footprint=footprint, mode='constant')
print(time.time() - start)  # -> 1.6 s

print(np.mean(np.isclose(res1, res2)))  # -> 1.0, all close
```

1.6 seconds still seems rather slow for a 1024x1024 image, yet I don't know how else to implement a counter in Cython. Is there any way to significantly speed this up?
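For what it's worth, when the values are small non-negative integers (as in the voting use case), one way to get a large speedup without any Cython is to vectorize over the whole image instead of looping per window: build one boolean plane per value, box-sum each plane, and take a per-pixel argmax. A sketch with a hypothetical helper name; note the boundary handling differs from `generic_filter` with `cval=0`, because padded pixels here vote for no value at all:

```python
import numpy as np
from scipy.ndimage import correlate

def majority_small_ints(arr, size, num_values):
    # one boolean plane per possible value, box-summed with an
    # all-ones kernel; integer arithmetic keeps ties exact
    kernel = np.ones((size, size), dtype=np.intp)
    counts = np.stack([
        correlate((arr == v).astype(np.intp), kernel, mode='constant', cval=0)
        for v in range(num_values)
    ])
    # argmax returns the first (smallest) value on ties, matching the
    # tie-breaking rule in the Cython version above
    return counts.argmax(axis=0)

arr = np.array([[0, 0, 1],
                [0, 1, 1],
                [1, 1, 1]])
out = majority_small_ints(arr, 3, 2)
```

Memory grows with the number of distinct values (one plane each), so this trades memory for speed and only pays off for small value ranges.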
Hey @skjerns! Thanks for the code samples.
@jni Thanks!

So to wrap up: we leave this solution for now?

Variant to work with `int` from 0-255:

```cython
# pure-Cython int variant
cdef api int llc_cython_int(double *buffer, npy_intp filter_size,
                            double *return_value, void *user_data):
    cdef int[256] counter
    cdef npy_intp i
    cdef int c
    cdef int c_largest = -1
    cdef int k
    cdef int k_smallest = 0  # overwritten on the first iteration; ties keep the smaller key
    for i in range(256):
        counter[i] = 0
    for i in range(filter_size):
        k = <int>buffer[i]
        c = counter[k] + 1
        counter[k] = c
        if c < c_largest:    # most of the time it will be smaller
            continue
        elif c > c_largest:  # second most often it will be larger
            c_largest = c
            k_smallest = k
        else:                # rarely it will have the same count
            if k < k_smallest:
                c_largest = c
                k_smallest = k
    return_value[0] = <double>k_smallest
    return 1
```
Personally I think it would make a very cool documentation example. @rgommers what do you think? The thing is it's best put in a gallery kind of setting, which I don't think (?) SciPy has?
Yeah, I think the function is too slow to add it to SciPy itself.
We've got a pretty big one: https://scipy-cookbook.readthedocs.io/. A lot of content is quite old, but that is our user-contributed gallery. We accept PRs with notebooks for it (repo at https://github.com/scipy/scipy-cookbook). We also have a doc section where this would fit, http://scipy.github.io/devdocs/tutorial/ndimage.html#extending-scipy-ndimage-in-c, however I'm not sure how much it adds over the example that's already there. So I think adding it to the Cookbook would be the best fit.
Hi all,

`ndimage.filters` has quite useful filters, but in my opinion one filter is missing: an occurrence-based filter, such as a majority filter.

The filter would go over an array, count the occurrences of values within the given window, and take the n-th most occurring value as the output, e.g. the most frequent value under the footprint.

Would it make sense to implement such a filter? Or is there any other way I can express such an operation that I haven't seen yet?
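To make the requested semantics concrete: for a single window position, the "majority" is just the argmax of a bincount over the values under the footprint.

```python
import numpy as np

# values under one 3x3 footprint position
window = np.array([1, 2, 2, 3, 2, 1, 2, 3, 2])
counts = np.bincount(window)   # counts[v] = number of occurrences of v
majority = counts.argmax()     # the most frequent value in the window
```

The filter would repeat this for every window position across the array.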