Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster conversion between RGB and HSV colorspaces #5362

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

grlee77
Copy link
Contributor

@grlee77 grlee77 commented Apr 28, 2021

Description

The current code for rgb2hsv and hsv2rgb is pretty inefficient. This PR uses Cython to accelerate these two functions, while also reducing the peak memory required. Related, to #1133, but this PR only improves two of the functions in the module.

For the included benchmarks, I get ~20 times faster computation and the memory used by for hsv2rgb is several times smaller. The cause of the large memory footprint in the current hsv2rgb implementation is the following lines where several temporary arrays matching the size of the input image are created:

hi = np.stack([hi, hi, hi], axis=-1).astype(np.uint8) % 6
out = np.choose(
hi, np.stack([np.stack((v, t, p), axis=-1),
np.stack((q, v, p), axis=-1),
np.stack((p, v, t), axis=-1),
np.stack((p, q, v), axis=-1),
np.stack((t, p, v), axis=-1),
np.stack((v, p, q), axis=-1)]))

Note

The calls to reshape(-1, 3) are used to collapse multiple spatial dimensions into a single spatial axis to loop over in the Cython code. The output of the Cython function is then restored to the original shape.

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

@pep8speaks
Copy link

pep8speaks commented Apr 28, 2021

Hello @grlee77! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 45:80: E501 line too long (88 > 79 characters)

Comment last updated at 2021-08-16 15:34:48 UTC

@grlee77
Copy link
Contributor Author

grlee77 commented Apr 28, 2021

I wrote simple CUDA kernel code for all color conversions in cuCIM. This is done by JIT compiling elementwise kernels for each color conversion (example for separate_stains), giving a large acceleration over the CPU code.

The cases involving HSV were the ones where the relative performance difference was largest, so I have backported a CPU equivalent here for that one.

@grlee77
Copy link
Contributor Author

grlee77 commented Apr 28, 2021

Actually, these functions are also a reasonable use case for Pythran. I tried this just out of interest and the following is what the Pythran code for rgb2hsv_inner that I came up with looks like:

#pythran export rgb2hsv_inner(float64[:, 3] order (C))
#pythran export rgb2hsv_inner(float32[:, 3] order (C))
def rgb2hsv_inner(rgb):
    hsv = np.empty_like(rgb)
    n = rgb.shape[0]
    for i in range(n):
        minv = rgb[i, :].min()
        maxv = rgb[i, :].max()
        delta = maxv - minv
        if delta == 0.0:
            hsv[i, :2] = 0.0
        else:
            hsv[i, 1] = delta / maxv
            if rgb[i, 0] == maxv:
                hsv[i, 0] = (rgb[i, 1] - rgb[i, 2]) / delta
            elif rgb[i, 1] == maxv:
                hsv[i, 0] = 2.0 + (rgb[i, 2] - rgb[i, 0]) / delta
            elif rgb[i, 2] == maxv:
                hsv[i, 0] = 4.0 + (rgb[i, 0] - rgb[i, 1]) / delta
            hsv[i, 0] /= 6.0
            hsv[i, 0] -= floor(hsv[i, 0])
        hsv[i, 2] = maxv
    return hsv

It is pretty similar to the Cython case, but a bit simpler. This Pythran version was actually slightly faster than the Cython code here on small images, but ~25% slower on large ones. I am not sure what is the underlying cause for the difference.

Copy link
Member

@rfezzani rfezzani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @grlee77 😉

skimage/color/colorconv.py Show resolved Hide resolved
skimage/color/colorconv.py Show resolved Hide resolved
Py_ssize_t i, n, ch

n = rgb.shape[0]
for i in range(n):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about prange here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try it. There may not be enough computation for it to provide a benefit, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried prange and did see some improvements by a factor of 2-5 for 10 cores (20 threads) on large images. For some reason the rgb2hsv function became much slower than before if setting OMP_NUM_THREADS=1 before running the benchmarks though! That seems odd to me, but I wasn't able to quickly diagnose the source of the problem so I have left it without prange for now.

@rfezzani
Copy link
Member

Pythran version loops twice over the channels looking for min and max value while in your implementation there is a unique traversal. This may explain the performance gain 😉

@grlee77
Copy link
Contributor Author

grlee77 commented Apr 28, 2021

Pythran version loops twice over the channels looking for min and max value while in your implementation there is a unique traversal.

I thought that too, but tried changing it and didn't see much difference

elif rgb[i, 2] == maxv:
hsv[i, 0] = 4.0 + (rgb[i, 0] - rgb[i, 1]) / delta
hsv[i, 0] /= 6.0
hsv[i, 0] -= floor(<double>hsv[i, 0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that this will work but what about defining a _floor function specific to np_floats?

if np_floats is cnp.float32_t:
    _floor = floorf
else:
    _floor = floor

it may avoid this cast to double precision when the input is single precision...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would need to a C99 flag (or language='c++') to ensure floorf is available. I think that used to be a problem for older MSVC versions, but is probably fine now.

Cython 0.29.x doesn't define floorf in it's libc.math, but the current 3.0 alpha does. That doesn't really matter, though, since we can always just add it ourselves with:

cdef extern from "<math.h>" nogil:
    float floorf(float)

Only a specific branch potentially needs to wrap negative values.

DOC: document the source of the algorithm and the publication where HSV was proposed
@jonahpearl
Copy link

The skimage version is still pretty slow -- I tried just grabbing OP's suggestion but it doesn't seem any faster. Any suggested fixes? Or is the current skimage version actually a sped-up implementation, and it was even slower before? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants