New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hcl2rgb bottleneck in colorize methods #79
Comments
Hi @thorsteinssonh , really nice to hear from you again! |
You can of course check what the result of skipping this step is, but iirc I didn't like it in some cases. |
yes, the HCL results are beautiful... would be best to find a solution... I'll look into it. |
So I started working on #121 to work around some edge cases for the existing HCL conversion and saw this issue and thought the existing wasn't in numpy so I should automatically have a faster algorithm. Turns out it isn't using numpy arrays. Even though I have some optimizations I can do in my code, the new algorithm I'm doing is actually slower than the existing one for a test image of random data (1000 x 1000) and test data of (3000 x 3000). It would definitely be good if I could do stuff in Cython/C-level, but I'm not sure I'm ready to jump to that.
|
My new PR #122 makes these operations much faster. For the same test in my above comment I can now do it in <500ms. That PR is mostly ready for a first review. I could add more tests, but there are also some failing tests that I need to verify are only failing because of small differences between the implementations and not something broken. |
FYI I rewrote all of the cython code in #122 to use different colorspace code and sadly it isn't nearly as fast as #122 originally was with the old cython code. Main still takes ~1.68s with the above test, and #122 currently takes 1.5s. So faster, but not much. I'm also not sure what other types of optimizations can be done. Edit: Thought about it. I could completely reorganize the conversion methods so they operate on a single pixel at a time instead of looping. I assumed/hoped that doing a for loop for each conversion would be faster than doing all the conversions on one pixel...but memory locality is probably a problem for the way I'm doing it now. |
FYI I just tried to optimize the code added in #122 by switching from doing the for loops in each conversion to doing the for loop at the top level and changing how things were passed to the conversion functions. All of them were slower than one I'm doing now.
I think I could still try adding structs for the output and see if that helps the compiler inline stuff better. |
Hhhmm I can't figure out how to get fused types and structs to work properly. I can make a fused type of the two structs (32-bit float and 64-bit float) but I can't figure out a way for cython to let me use it. |
I was talking with someone at a programmer meetup about this Cython-ization of colorspace code and they made me realize that I don't think I ever tested it with dask and that I never tried using OpenMP (via cython's prange). I just did some tests and the prange did make it go from ~1.75s to ~1.54s. Using a dask array with chunk size of 1000 for the same overall size made the newest version (no prange) in 468ms, while the same dask array in the old python-only version ran in 578ms. So about the same speed up overall I feel like. Not sure how or why that is that way...I would have expected the amount of nogil code in the new Cython version to out perform the old version when dask was used by a lot more. |
Interesting, thanks for the feedback! |
Hi there old pytroller friends, hope you are all well !
I've been playing with the excellent trollimage library again,
and I found that there is quite a significant bottleneck in the
hcl2rgb
routine on larger images.On 1000x1000 size images I'm measuring the the
np.interp
step is about 20ms, whilethe conversion of the result back to RGB space with
hcl2rgb
adds 350ms to the whole procedure.At first I suspected the np.interp was the issue,
but after profiling I saw that
hcl2rgb
was the bottleneck, at least on my machine.Maybe give the user an option... hcl=True/False could also solve the problem for those that want more speed.
I'm working on plotting quite a few weather maps on the fly using the library,
work fantastic, but noticed maybe some chance to speed this up.
I'll take a look into hcl2rgb and see if anything I could help with.
The text was updated successfully, but these errors were encountered: