restoration.rolling_ball taking 2+mins #5193

Closed
ajoshi80 opened this issue Jan 22, 2021 · 8 comments

@ajoshi80

The restoration.rolling_ball function takes over 2 minutes for a single 1800x1800 image with a radius-15 rolling ball.

Way to reproduce

    import cv2 as cv  # imports implied by the snippet
    from skimage import restoration, util

    image = cv.imread(path)
    image_inverted = util.invert(image)
    background_inverted = restoration.rolling_ball(image_inverted, radius=15)
    filtered_image_inverted = image_inverted - background_inverted
    filtered_image = util.invert(filtered_image_inverted)

Version information

# Paste the output of the following python commands
from __future__ import print_function
import sys; print(sys.version)
import platform; print(platform.platform())
import skimage; print("scikit-image version: {}".format(skimage.__version__))
import numpy; print("numpy version: {}".format(numpy.__version__))
Python 3.7.7 (default, May  7 2020, 21:25:33) 
[GCC 7.3.0] :: Anaconda, Inc. on linux

Linux-4.4.0-122-generic-x86_64-with-debian-stretch-sid
scikit-image version: 0.18.1
numpy version: 1.19.0
@jni
Member

jni commented Jan 23, 2021

It might just be that rolling ball is a slow algorithm. @FirefoxMetzger did a pretty heroic job optimizing the implementation. Could you check how long Fiji takes to do the same thing?

https://imagej.net/Rolling_Ball_Background_Subtraction

Having said that, a lot of the optimization required might have had to do with us trying to cater to nD + floating point and we just made things harder on ourselves...

@FirefoxMetzger
Contributor

Yes, the rolling ball does some heavy computation in the main loop. Does your image have a color channel? In that case, you likely want to convert it to grayscale beforehand or specify the kernel manually to avoid unnecessary computation along the channel dimension.

Is it possible for you to share the image with us, so we can benchmark it on our machines? This usually makes it easier to figure out what is going on.

Sidenote: The kernel is twice the size of the radius in each dimension (to fit the ball's diameter), which - assuming you use a grayscale image - means something like 3 billion (3.11e9) multiplications and additions for an 1800x1800 image, irrespective of the number format. The algorithm isn't too cache-friendly, so add to that the time it takes to constantly load data from RAM, and that could explain the two minutes.
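
For reference, that figure can be reproduced with a quick back-of-the-envelope calculation (a sketch assuming the 2*radius + 1 kernel side length used in the examples further down):

radius = 15
kernel_side = 2 * radius + 1             # the kernel spans the ball's diameter
pixels = 1800 * 1800
print(f"{kernel_side**2 * pixels:.2e}")  # one multiply-add per kernel element per pixel -> 3.11e+09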

A few things to check that might impact performance:

  • Convert color images to grayscale if you can; otherwise pass kernel= and explicitly declare an ellipsoid kernel that is small (e.g. of size 1) along the channel dimension. The algorithm zero-pads the image along each dimension to fit the kernel; padding along the channel dimension likely makes no sense for your data and costs performance.
  • Check if OpenMP acceleration is enabled. The algorithm takes a num_threads keyword argument that does pretty much what it says. You can run it with different values and see if the execution time changes (see the timing sketch after this list); if it doesn't, consider building scikit-image with OpenMP enabled.
  • Check if your CPU supports at least SSE2 instructions. This is almost certainly the case for Intel and AMD chips; for ARM it is worth checking. The algorithm works without SSE2, but since the numbers can't be churned as quickly, you will wait longer for a result.
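
For the OpenMP bullet, a minimal timing sketch (using a synthetic image in place of your data; the exact numbers will of course differ):

import time

import numpy as np
from skimage import restoration

# Synthetic 1800x1800 grayscale image standing in for the real data.
rng = np.random.default_rng(0)
gray = rng.random((1800, 1800))

# If scikit-image was built with OpenMP, larger num_threads values should shorten
# the runtime; if the timings barely change, OpenMP is likely disabled in the build.
for n in (1, 2, 4):
    start = time.perf_counter()
    restoration.rolling_ball(gray, radius=15, num_threads=n)
    print(f"num_threads={n}: {time.perf_counter() - start:.2f} s")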

@ajoshi80
Author

ajoshi80 commented Jan 25, 2021

The image has color channels. I have tried Fiji/ImageJ's version and it can do it in less than 20s with a rolling ball radius of 30. If we convert to grayscale and compute the background, do we then subtract it from each channel?

@FirefoxMetzger
Contributor

That explains it; where possible we try to be unopinionated. The original rolling ball algorithm - by that I mean the paper from the 90's - only specified the algorithm for grayscale images; how to extend it to color images, volumes, time series, etc. depends on your use case.

For you, it likely doesn't make sense to have the ball extend along the color dimension. You have two options:

  1. Do what you suggested: convert to grayscale, compute the background there, and subtract it from each channel.
  2. Filter each channel individually.

The code for option 1 looks something like this

from skimage import io, restoration, color, util

image = io.imread(path)  # Note: if you use OpenCV's cv.imread the channel order will be BGR not RGB
image = util.img_as_float64(image)
gray_image = color.rgb2gray(image)
image_inverted = util.invert(image)
gray_inverted = util.invert(gray_image)
background = restoration.rolling_ball(gray_inverted, radius=15)
filtered_image_inverted = image_inverted - background[..., None]
filtered_image = util.invert(filtered_image_inverted)

The entire script takes about 900ms on my machine, including the loading of the image from disk.

The code for option 2 looks something like this

from skimage import io, restoration, util

image = io.imread("test.png")  # Note: if you use OpenCV's cv.imread the channel order will be BGR not RGB
image = util.img_as_float64(image)
image_inverted = util.invert(image)
kernel = restoration.ellipsoid_kernel((31, 31, 1), 31)  # filter each color channel individually
background = restoration.rolling_ball(image_inverted, kernel=kernel)
filtered_image_inverted = image_inverted - background
filtered_image = util.invert(filtered_image_inverted)

which takes about 15.5s on my machine, again including the initial loading from disk.

You can reduce the time of the second option by filtering each channel separately, which makes the image a little more cache and SIMD friendly.

from skimage import io, restoration, util
import numpy as np

image = io.imread("test.png")  # Note: if you use OpenCV's cv.imread the channel order will be BGR not RGB
image = util.img_as_float64(image)
image_inverted = util.invert(image)
background_R = restoration.rolling_ball(image_inverted[..., 0], radius=15)
background_G = restoration.rolling_ball(image_inverted[..., 1], radius=15)
background_B = restoration.rolling_ball(image_inverted[..., 2], radius=15)
background = np.stack([background_R, background_G, background_B], 2)
filtered_image_inverted = image_inverted - background
filtered_image = util.invert(filtered_image_inverted)

This takes about 1.4 s on my machine. Or, more elegantly (but slower), you can temporarily store the image channel-first:

from skimage import io, restoration, util
import numpy as np

image = io.imread("test.png")  # Note: if you use OpenCV's cv.imread the channel order will be BGR not RGB
image = np.moveaxis(image, 2, 0) # make the image channel-first
image = util.img_as_float64(image)
image_inverted = util.invert(image)
kernel = restoration.ellipsoid_kernel((1, 31, 31), 31)  # filter each color channel individually
background = restoration.rolling_ball(image_inverted, kernel=kernel)
filtered_image_inverted = image_inverted - background
filtered_image = util.invert(filtered_image_inverted)
filtered_image = np.moveaxis(filtered_image, 0, 2)  # make the image channel-last

This takes about 2.5 seconds on my machine.


I don't know how ImageJ does this under the hood, but maybe someone else knows? Either way, I hope this helps you reduce the runtime a bit.

@grlee77
Contributor

grlee77 commented Jan 25, 2021

Thanks for the detailed feedback here @FirefoxMetzger.

A couple of additional possibilities would be:

  • convert to HSV and filter the V (value) channel then convert back to RGB
  • convert to YUV and filter the Y (luminance) channel then convert back to RGB

I guess in practice one would have to compare the quality of these various approaches in an application to determine which is most appropriate to use. I also don't know which approach ImageJ takes.

@grlee77
Contributor

grlee77 commented Jan 25, 2021

Actually, there is a link to the Java code in one of the links above. It looks like it uses a color-conversion approach as in my prior comment (HSB, with filtering on the "brightness" channel).

I think HSB is just a synonym for what scikit-image calls HSV, so the scikit-image conversion functions are skimage.color.rgb2hsv and skimage.color.hsv2rgb.

This is the Java code from the link above:

    public void subtractRGBBackround(ColorProcessor ip, int ballRadius) {
        int width = ip.getWidth();
        int height = ip.getHeight();
        byte[] H = new byte[width*height];
        byte[] S = new byte[width*height];
        byte[] B = new byte[width*height];
        ip.getHSB(H, S, B);
        ByteProcessor brightness = new ByteProcessor(width, height, B, null);
        subtractBackround(brightness, radius);
        ip.setHSB(H, S, (byte[])brightness.getPixels());
    }

This should also be fast, as in the cases @FirefoxMetzger benchmarked above.
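
For completeness, a rough scikit-image sketch of the same HSB/HSV idea (the file name and radius are placeholders; like the examples above it inverts first, assuming dark features on a bright background):

from skimage import color, io, restoration, util

image = io.imread("test.png")      # placeholder path, assumed to be RGB
hsv = color.rgb2hsv(image)         # hue, saturation, value as floats in [0, 1]

# Filter only the value (brightness) channel, mirroring ImageJ's HSB approach.
value_inverted = util.invert(hsv[..., 2])
background = restoration.rolling_ball(value_inverted, radius=15)
hsv[..., 2] = util.invert(value_inverted - background)

filtered_image = color.hsv2rgb(hsv)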

@FirefoxMetzger
Contributor

Nice find, @grlee77! This is indeed very similar to the grayscale-conversion idea. It only filters a single channel, so one would expect the timings to be close to the grayscale approach, i.e., around 1 second of runtime.

@grlee77
Contributor

grlee77 commented Jan 25, 2021

Feel free to continue the discussion here, but it looks like this issue can be closed, as performance seems satisfactory as long as a 2D rolling ball is applied to the color image.

If desired, please open a separate issue regarding a potential enhancement for color image processing. Specifically, we could add an rgb kwarg to the method and have it implement the HSV approach mentioned above if set to True.
