
hcl2rgb bottleneck in colorize methods #79

Closed
thorsteinssonh opened this issue Mar 17, 2021 · 10 comments · Fixed by #122

Comments

@thorsteinssonh

thorsteinssonh commented Mar 17, 2021

Hi there old pytroller friends, hope you are all well !

I've been playing with the excellent trollimage library again,
and I found that there is quite a significant bottleneck in the hcl2rgb routine on larger images.
On 1000x1000 images I'm measuring that the np.interp step takes about 20ms, while
the conversion of the result back to RGB space with hcl2rgb adds 350ms to the whole procedure.

def _interpolate_rgb_colors(arr, colors, values):
    hcl_colors = _convert_rgb_list_to_hcl(colors) # this is peanuts...
    channels = [np.interp(arr,
                          np.array(values),
                          np.array(hcl_colors)[:, i])
                for i in range(3)]                   # this is ca 17ms
    channels = list(hcl2rgb(*channels))  # this is 350ms
    return channels

At first I suspected the np.interp call was the issue,
but after profiling I saw that hcl2rgb was the bottleneck, at least on my machine.
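For reference, here is a minimal way to reproduce this kind of per-step timing (stand-in colormap data and a hypothetical `timed` helper; absolute numbers will of course differ per machine):

```python
import time

import numpy as np


def timed(label, func, *args):
    """Run func once and print its elapsed wall-clock time in ms."""
    start = time.perf_counter()
    result = func(*args)
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")
    return result


arr = np.random.random((1000, 1000))
values = np.array([0.0, 0.5, 1.0])          # colormap control points
hcl_points = np.random.random((3, 3))       # stand-in for the converted colors

# Time the interpolation of each channel in isolation.
channels = [timed(f"interp channel {i}", np.interp, arr, values, hcl_points[:, i])
            for i in range(3)]
```

The same `timed` wrapper can then be put around the hcl2rgb call to compare the two steps directly.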

Maybe give the user an option: an hcl=True/False flag could solve the problem for those who want more speed.
I'm plotting quite a few weather maps on the fly using the library.
It works fantastically, but I noticed there may be a chance to speed this up.
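As a sketch of what that opt-out could look like: a hypothetical helper that interpolates directly in RGB space when the user prefers speed over perceptual smoothness (names are illustrative, not the library's actual API):

```python
import numpy as np


def interpolate_rgb_direct(arr, colors, values):
    """Interpolate each RGB channel directly, skipping the HCL round trip.

    Faster, but gradients may look less perceptually smooth than the
    HCL-based interpolation the library uses by default.
    """
    colors = np.asarray(colors, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    return [np.interp(arr, values, colors[:, i]) for i in range(3)]


# Example: a red-to-blue ramp sampled at five points.
data = np.linspace(0.0, 1.0, 5)
colors = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
r, g, b = interpolate_rgb_direct(data, colors, [0.0, 1.0])
```

An `hcl=True` branch would instead convert `colors` to HCL first and run hcl2rgb on the interpolated channels, as the existing `_interpolate_rgb_colors` does.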

I'll take a look into hcl2rgb and see if anything I could help with.

@mraspaud
Member

Hi @thorsteinssonh , really nice to hear from you again!
Yes, the conversion to hcl was needed in my opinion to get a nice smooth result. It would be great if you find ways to optimize it :)

@mraspaud
Member

You can of course check what the result of skipping this step is, but iirc I didn't like it in some cases.

@thorsteinssonh
Author

yes, the HCL results are beautiful... would be best to find a solution... I'll look into it.

@djhoese
Member

djhoese commented Jan 22, 2023

So I started working on #121 to work around some edge cases in the existing HCL conversion, saw this issue, and assumed the existing implementation wasn't vectorized with numpy, so my new algorithm would automatically be faster. It turns out the existing code is using numpy arrays. Even though there are still some optimizations I can make in my code, the new algorithm I'm writing is actually slower than the existing one for test images of random data at both 1000 x 1000 and 3000 x 3000. It would definitely be good if I could do this at the Cython/C level, but I'm not sure I'm ready to jump to that.

# new version:

In [1]: from trollimage.colormap import Colormap, rgb2hcl_numpy, hcl2rgb_numpy; import numpy as np

In [2]: cmap = Colormap(values=list(range(4)), colors=[(1.0, 1.0, 0.0), (0.0, 1.0, 1.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)])

In [3]: data = np.random.random((3000, 3000))

In [4]: %%timeit
   ...: colorized_image = cmap.colorize(data)
   ...:
   ...:
1.97 s ± 50 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# old version:

In [1]: from trollimage.colormap import Colormap, rgb2hcl_numpy, hcl2rgb_numpy; import numpy as np

In [2]: cmap = Colormap(values=list(range(4)), colors=[(1.0, 1.0, 0.0), (0.0, 1.0, 1.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)])

In [3]: data = np.random.random((3000, 3000))

In [4]: %%timeit
   ...: colorized_image = cmap.colorize(data)
   ...:
   ...:
/home/davidh/repos/git/trollimage/trollimage/colorspaces.py:128: RuntimeWarning: invalid value encountered in power
  1.055 * (arr ** (1.0 / 2.4)) - 0.055,
1.68 s ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
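As an aside, the RuntimeWarning in the old version's run comes from numpy returning NaN when a slightly negative intermediate value is raised to a fractional power in the sRGB companding step. A minimal reproduction, with clipping shown as one possible guard (not necessarily what the library should do):

```python
import numpy as np

# A slightly negative value can appear as a rounding artifact of the
# colorspace math; a fractional power of a negative float is NaN.
arr = np.array([-0.001, 0.5])
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning for the demo
    companded = 1.055 * (arr ** (1.0 / 2.4)) - 0.055

# Clipping to the valid domain first is one way to avoid the NaN.
safe = 1.055 * (np.clip(arr, 0.0, None) ** (1.0 / 2.4)) - 0.055
```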

@djhoese
Member

djhoese commented Jan 31, 2023

My new PR #122 makes these operations much faster. For the same test as in my comment above, it now runs in <500ms. The PR is mostly ready for a first review. I could add more tests, but there are also some failing tests I need to verify are failing only because of small differences between the implementations and not because something is broken.

@djhoese
Member

djhoese commented Aug 19, 2023

FYI I rewrote all of the Cython code in #122 to use different colorspace code, and sadly it isn't nearly as fast as #122 originally was with the old Cython code. Main still takes ~1.68s on the above test, and #122 currently takes ~1.5s. So faster, but not by much. I'm also not sure what other kinds of optimizations can be done.

Edit: Thought about it. I could completely reorganize the conversion methods so they operate on a single pixel at a time instead of looping per conversion. I assumed/hoped that doing a for loop for each conversion would be faster than doing all the conversions on one pixel... but memory locality is probably a problem with the way I'm doing it now.
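The two loop organizations can be sketched in plain Python with placeholder stages (the real conversions are in Cython; in pure Python the per-pixel loop is of course slower, since the locality benefit only shows up at the C level):

```python
import numpy as np


def stage_a(x):
    """Placeholder for one colorspace conversion step."""
    return x * 0.5


def stage_b(x):
    """Placeholder for the next conversion step."""
    return x + 0.1


def convert_per_stage(pixels):
    """One loop per conversion stage: each stage sweeps the whole image,
    so intermediate arrays fall out of cache between stages."""
    return stage_b(stage_a(pixels))


def convert_per_pixel(pixels):
    """One loop over pixels: each value runs the whole pipeline while it
    is still hot in cache. Only pays off when compiled; in pure Python
    the per-element loop overhead dominates."""
    out = np.empty_like(pixels)
    flat_in, flat_out = pixels.ravel(), out.ravel()
    for i in range(flat_in.size):
        flat_out[i] = stage_b(stage_a(flat_in[i]))
    return out
```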

@djhoese
Member

djhoese commented Aug 24, 2023

FYI I just tried to optimize the code added in #122 by switching from doing the for loops inside each conversion to doing the for loop at the top level, and by changing how values are passed to the conversion functions. All of the variants were slower than what I'm doing now:

  1. Outer for loop, pass pointers around, inline everything: ~2.5s
  2. Same as 1, but pass inputs by value instead of by pointer: ~2.5s
  3. For loop inside the main conversion function, pass inputs by value: ~1.75s

I think I could still try adding structs for the output and see if that helps the compiler inline stuff better.

@djhoese
Member

djhoese commented Aug 24, 2023

Hmm, I can't figure out how to get fused types and structs to work together properly. I can make a fused type of the two structs (32-bit float and 64-bit float), but I can't find a way for Cython to let me use it.

@djhoese
Member

djhoese commented Sep 15, 2023

I was talking with someone at a programmer meetup about this Cython-ization of the colorspace code, and they made me realize that I don't think I ever tested it with dask, and that I never tried OpenMP (via Cython's prange). I just did some tests: prange brought it from ~1.75s to ~1.54s. Using a dask array with a chunk size of 1000 for the same overall size, the newest version (no prange) ran in 468ms, while the same dask array with the old python-only version ran in 578ms. So about the same speedup overall, I feel like. Not sure how or why that is... I would have expected the amount of nogil code in the new Cython version to outperform the old version by a lot more when dask is used.
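A generic illustration of why chunked, GIL-releasing work parallelizes under a thread pool (this is not the library's dask code path, just a sketch with a stand-in per-chunk function):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def colorize_chunk(chunk):
    """Stand-in for the per-chunk colorize work; numpy releases the
    GIL inside vectorized calls like this one, just as nogil Cython
    code would, so threads can overlap."""
    return np.sqrt(chunk)


data = np.random.random((3000, 3000))

# Split into dask-like 1000-row chunks and process them in parallel.
chunks = np.array_split(data, 3, axis=0)
with ThreadPoolExecutor() as pool:
    result = np.concatenate(list(pool.map(colorize_chunk, chunks)), axis=0)
```

If the per-chunk function spends most of its time holding the GIL (pure-Python loops), the threads serialize and chunking buys almost nothing, which may be part of why the old and new versions land so close under dask.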

@mraspaud
Member

Interesting, thanks for the feedback!
