faster conversion between RGB and HSV colorspaces #5362
benchmark both float32 and float64
Hello @grlee77! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-08-16 15:34:48 UTC |
I wrote simple CUDA kernel code for all color conversions in cuCIM. This is done by JIT compiling elementwise kernels for each color conversion (example for separate_stains), giving a large acceleration over the CPU code. The cases involving HSV were the ones where the relative performance difference was largest, so I have backported a CPU equivalent here for that one.
Actually, these functions are also a reasonable use case for Pythran. I tried this just out of interest, and the following is what the Pythran code for rgb2hsv looks like:

    #pythran export rgb2hsv_inner(float64[:, 3] order (C))
    #pythran export rgb2hsv_inner(float32[:, 3] order (C))
    # Imports assumed; they were not shown in the original comment.
    import numpy as np
    from math import floor

    def rgb2hsv_inner(rgb):
        hsv = np.empty_like(rgb)
        n = rgb.shape[0]
        for i in range(n):
            minv = rgb[i, :].min()
            maxv = rgb[i, :].max()
            delta = maxv - minv
            if delta == 0.0:
                # Gray pixel: hue and saturation are both zero.
                hsv[i, :2] = 0.0
            else:
                hsv[i, 1] = delta / maxv
                if rgb[i, 0] == maxv:
                    hsv[i, 0] = (rgb[i, 1] - rgb[i, 2]) / delta
                elif rgb[i, 1] == maxv:
                    hsv[i, 0] = 2.0 + (rgb[i, 2] - rgb[i, 0]) / delta
                elif rgb[i, 2] == maxv:
                    hsv[i, 0] = 4.0 + (rgb[i, 0] - rgb[i, 1]) / delta
                # Scale to [0, 1) and wrap negative hues.
                hsv[i, 0] /= 6.0
                hsv[i, 0] -= floor(hsv[i, 0])
            hsv[i, 2] = maxv
        return hsv

It is pretty similar to the Cython case, but a bit simpler. This Pythran version was actually slightly faster than the Cython code here on small images, but ~25% slower on large ones. I am not sure what the underlying cause of the difference is.
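As a sanity check on the per-pixel branches above, the same logic can be written as a plain-Python scalar function and compared against the standard library's `colorsys.rgb_to_hsv`. This is only a sketch: `rgb2hsv_pixel` is a hypothetical helper name, not part of scikit-image or the Pythran code, and it assumes inputs in the range [0, 1].

```python
import colorsys
from math import floor, isclose


def rgb2hsv_pixel(r, g, b):
    """Scalar version of the per-pixel branches (hypothetical helper)."""
    minv = min(r, g, b)
    maxv = max(r, g, b)
    delta = maxv - minv
    if delta == 0.0:
        # Gray pixel: hue and saturation are both zero.
        h = s = 0.0
    else:
        s = delta / maxv
        if r == maxv:
            h = (g - b) / delta
        elif g == maxv:
            h = 2.0 + (b - r) / delta
        else:
            h = 4.0 + (r - g) / delta
        h /= 6.0
        h -= floor(h)  # wrap negative hues into [0, 1)
    return h, s, maxv


# Compare a few sample pixels against the standard-library reference.
for rgb in [(1.0, 0.0, 0.0), (0.2, 0.4, 0.6), (0.9, 0.1, 0.5), (0.3, 0.3, 0.3)]:
    got = rgb2hsv_pixel(*rgb)
    want = colorsys.rgb_to_hsv(*rgb)
    assert all(isclose(a, b, abs_tol=1e-12) for a, b in zip(got, want))
```

The comparison uses a small absolute tolerance because `colorsys` arranges the arithmetic slightly differently, so the two results can differ in the last few bits.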
Thank you @grlee77 😉
skimage/color/_colorconv.pyx
Outdated
        Py_ssize_t i, n, ch

    n = rgb.shape[0]
    for i in range(n):
What about `prange` here?
I can try it. There may not be enough computation for it to provide a benefit, though.
I tried `prange` and did see some improvements by a factor of 2-5 for 10 cores (20 threads) on large images. For some reason, though, the `rgb2hsv` function became much slower than before when setting `OMP_NUM_THREADS=1` before running the benchmarks! That seems odd to me, but I wasn't able to quickly diagnose the source of the problem, so I have left it without `prange` for now.
The Pythran version loops over the channels twice, looking for the min and max values, while in your implementation there is a single traversal. This may explain the performance gain 😉
I thought that too, but tried changing it and didn't see much difference.
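For reference, a single-traversal min/max search over the three channels might look like the sketch below. The actual Cython loop is not shown in this thread, so the function name `minmax_single_pass` and its exact shape are assumptions made for illustration.

```python
def minmax_single_pass(pixel):
    """Find the min and max of a 3-channel pixel in one pass (hypothetical helper)."""
    minv = maxv = pixel[0]
    for ch in range(1, 3):
        v = pixel[ch]
        if v < minv:
            minv = v
        elif v > maxv:
            # `elif` is safe here: a value below the minimum cannot exceed the maximum.
            maxv = v
    return minv, maxv


pixel = (0.2, 0.9, 0.1)
# The single pass agrees with two separate reductions over the channels.
assert minmax_single_pass(pixel) == (min(pixel), max(pixel))
```

With only three channels per pixel, the saving over two separate reductions is small, which is consistent with the observation that changing it made little difference.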
skimage/color/_colorconv.pyx
Outdated
            elif rgb[i, 2] == maxv:
                hsv[i, 0] = 4.0 + (rgb[i, 0] - rgb[i, 1]) / delta
            hsv[i, 0] /= 6.0
            hsv[i, 0] -= floor(<double>hsv[i, 0])
I am not sure that this will work, but what about defining a `_floor` function specific to `np_floats`?

    if np_floats is cnp.float32_t:
        _floor = floorf
    else:
        _floor = floor

It may avoid this cast to double precision when the input is single precision...
I think we would need to add a C99 flag (or `language='c++'`) to ensure `floorf` is available. I think that used to be a problem for older MSVC versions, but is probably fine now.
Cython 0.29.x doesn't define `floorf` in its `libc.math`, but the current 3.0 alpha does. That doesn't really matter, though, since we can always just add it ourselves with:

    cdef extern from "<math.h>" nogil:
        float floorf(float)
Only a specific branch potentially needs to wrap negative values.
DOC: document the source of the algorithm and the publication where HSV was proposed
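The commit note about wrapping negative values can be checked numerically. The sketch below scans a coarse grid of RGB values and records which branch ever produces a negative hue before the wrap. `raw_hue_sextant` is a hypothetical helper written for this check, and the branch order is assumed to match the code discussed above.

```python
import itertools


def raw_hue_sextant(r, g, b):
    """Return (branch, hue before the /6 and wrap), or None for gray pixels."""
    maxv = max(r, g, b)
    delta = maxv - min(r, g, b)
    if delta == 0.0:
        return None
    if r == maxv:
        return "r", (g - b) / delta
    elif g == maxv:
        return "g", 2.0 + (b - r) / delta
    return "b", 4.0 + (r - g) / delta


# Scan a 9 x 9 x 9 grid of RGB values in [0, 1].
levels = [i / 8 for i in range(9)]
negative_branches = set()
for r, g, b in itertools.product(levels, repeat=3):
    out = raw_hue_sextant(r, g, b)
    if out is not None and out[1] < 0.0:
        negative_branches.add(out[0])

# Only the red-maximum branch yields negative values.
assert negative_branches == {"r"}
```

Each ratio lies in [-1, 1] because the absolute difference of any two channels is at most `delta`, so the green and blue branches give values of at least 1 and 3 respectively. Under that assumption, the wrap is only needed when red is the maximum and blue exceeds green.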
The skimage version is still pretty slow -- I tried just grabbing OP's suggestion but it doesn't seem any faster. Any suggested fixes? Or is the current skimage version actually a sped-up implementation, and it was even slower before? Thanks!
Description
The current code for `rgb2hsv` and `hsv2rgb` is pretty inefficient. This PR uses Cython to accelerate these two functions, while also reducing the peak memory required. Related to #1133, but this PR only improves two of the functions in the module.

For the included benchmarks, I get ~20 times faster computation, and the memory used by `hsv2rgb` is several times smaller. The cause of the large memory footprint in the current `hsv2rgb` implementation is the following lines, where several temporary arrays matching the size of the input image are created:
scikit-image/skimage/color/colorconv.py
Lines 315 to 322 in 821c7f2
Note
The calls to `reshape(-1, 3)` are used to collapse multiple spatial dimensions into a single spatial axis to loop over in the Cython code. The output of the Cython function is then restored to the original shape.

Checklist
`./doc/examples` (new features only)
`./benchmarks`, if your changes aren't covered by an existing benchmark

For reviewers
later.
`__init__.py`.
`doc/release/release_dev.rst`.
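The `reshape(-1, 3)` pattern described in the note above can be sketched in NumPy as follows. The wrapper name `apply_per_pixel` and the stand-in `inner` callable are hypothetical; in the PR, the inner function would be the compiled Cython routine.

```python
import numpy as np


def apply_per_pixel(image, inner):
    """Flatten spatial dims, run `inner` on an (n, 3) array, restore the shape."""
    image = np.ascontiguousarray(image)  # C order keeps each pixel's 3 channels adjacent
    flat = image.reshape(-1, 3)          # collapse all spatial axes into one
    out = inner(flat)                    # inner sees a simple 2-D (n, 3) array
    return out.reshape(image.shape)      # restore the original N-D shape


# Works for any number of spatial dimensions, e.g. a 3-D volume with channels last.
volume = np.random.default_rng(0).random((2, 3, 4, 3)).astype(np.float32)
swapped = apply_per_pixel(volume, lambda px: px[:, ::-1].copy())  # stand-in: reverse channels
assert swapped.shape == volume.shape
```

Because the inner function only ever receives a 2-D array, a single loop over pixels can serve 2-D images and N-D volumes alike.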