# Benchmarking image processing filters using scikit-image, scipy, SimpleITK and clEsperanto
Here we compare the performance of a Gaussian filter as implemented in [scikit-image](https://scikit-image.org), [scipy](https://scipy.org), [SimpleITK](https://simpleitk.org/) and [clEsperanto](https://github.com/clEsperanto/pyclesperanto).

**Please note:** This notebook will stress-test your graphics card. Save all files before running it, because the graphics card's driver may crash during execution. If you are unsure, consider running this notebook on the cluster only.

In [1]:
import pyclesperanto as cle
# alternatively:
# import pyclesperanto_prototype as cle
from skimage.io import imread
import numpy as np
from scipy import ndimage as sndi
import stackview
from timeit import timeit
import skimage.filters
import napari_simpleitk_image_processing as nsitk

In [2]:
cle.available_device_names()

['gfx1103', 'NVIDIA GeForce RTX 4070 Laptop GPU']

In [3]:
# To measure kernel execution time properly, we need to set this flag. It slows down the execution of workflows a bit, though.
cle.wait_for_kernel_to_finish(True)

# Select a GPU whose name contains the following string. This falls back to any other GPU if none with this name is found.
cle.select_device('TX')

(OpenCL) NVIDIA GeForce RTX 4070 Laptop GPU (OpenCL 3.0 CUDA)
	Vendor:                      NVIDIA Corporation
	Driver Version:              560.94
	Device Type:                 GPU
	Compute Units:               36
	Global Memory Size:          8187 MB
	Local Memory Size:           0 MB
	Maximum Buffer Size:         2046 MB
	Max Clock Frequency:         1230 MHz
	Image Support:               Yes

In [4]:
image = imread('../02a_image_processing/data/Haase_MRT_tfl3d1.tif')
image.shape

(192, 256, 256)

In [5]:
stackview.insight(image[96])

| Property | Value |
|---|---|
| shape | (256, 256) |
| dtype | uint8 |
| size | 64.0 kB |
| min | 0 |
| max | 255 |


## scikit-image

In [6]:
blurred_image = np.zeros(image.shape)

In [7]:
%%timeit
skimage.filters.gaussian(image, sigma=5, out=blurred_image, preserve_range=True)

2.53 s ± 21.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [8]:
stackview.insight(blurred_image[96])

| Property | Value |
|---|---|
| shape | (256, 256) |
| dtype | float64 |
| size | 512.0 kB |
| min | 0.21742568961345074 |
| max | 175.52599618139908 |


## scipy

In [9]:
blurred_image2 = np.zeros(image.shape)

In [10]:
%%timeit
sndi.gaussian_filter(image, sigma=(5, 5, 5), output=blurred_image2);

1.78 s ± 13.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [11]:
stackview.insight(blurred_image2[96])

| Property | Value |
|---|---|
| shape | (256, 256) |
| dtype | float64 |
| size | 512.0 kB |
| min | 0.6476920943581549 |
| max | 175.52599618139908 |


## SimpleITK

In [12]:
%%timeit
nsitk.gaussian_blur(image, variance_x=25, variance_y=25, variance_z=25)
# Note: ITK expects the variance, which is sigma squared. A sigma of 5 therefore corresponds to a variance of 25.

443 ms ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
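
The conversion from sigma to variance mentioned in the comment above can be sketched in a few lines of plain Python. The helper name `sigma_to_variance` and the dictionary `variance_kwargs` are hypothetical and only serve as illustration; the keyword names are taken from the call in the cell above.

```python
def sigma_to_variance(sigma):
    """Return the variance (sigma squared) of a Gaussian with standard deviation sigma."""
    return sigma ** 2


# Same sigma as used for the other libraries in this notebook
sigma = 5
variance = sigma_to_variance(sigma)

# Hypothetical helper dictionary: one variance entry per axis, using the
# keyword names variance_x, variance_y and variance_z from the cell above
variance_kwargs = {f"variance_{axis}": variance for axis in "xyz"}
print(variance_kwargs)
```

Assuming the function accepts these keywords as shown in the cell above, the dictionary could be passed along as `nsitk.gaussian_blur(image, **variance_kwargs)`.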


In [13]:
blurred_image3 = nsitk.gaussian_blur(image, variance_x=25, variance_y=25, variance_z=25)

In [14]:
stackview.insight(blurred_image3[96])

| Property | Value |
|---|---|
| shape | (256, 256) |
| dtype | uint8 |
| size | 64.0 kB |
| min | 0 |
| max | 176 |


## clEsperanto

In [15]:
ocl_image = cle.push(image)
ocl_blurred = cle.create(ocl_image.shape)

In [16]:
%%timeit
cle.gaussian_blur(ocl_image, output_image=ocl_blurred, sigma_x=5, sigma_y=5, sigma_z=5)

19.2 ms ± 465 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [17]:
ocl_blurred[96]

| Property | Value |
|---|---|
| shape | (256, 256) |
| dtype | float32 |
| size | 256.0 kB |
| min | 0.2174256443977356 |
| max | 175.52597045898438 |


In [18]:
del ocl_image
del ocl_blurred
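
To put the timings reported above in relation to each other, the following minimal sketch divides each mean runtime by the clEsperanto runtime. The numbers are copied from the `%%timeit` outputs in this notebook; the variable names `timings_s` and `speedups` are made up for this illustration.

```python
# Mean runtimes in seconds, copied from the %%timeit outputs above
timings_s = {
    "scikit-image": 2.53,
    "scipy": 1.78,
    "SimpleITK": 0.443,
    "clEsperanto": 0.0192,
}

# Express every runtime as a multiple of the clEsperanto runtime
reference = timings_s["clEsperanto"]
speedups = {name: duration / reference for name, duration in timings_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x the clEsperanto runtime")
```

These factors should be read with some care: the clEsperanto measurement does not include `cle.push`, so the transfer of the image to GPU memory is not part of the timed cell, and the SimpleITK result is `uint8` while the other results are floating point. The numbers also depend on the hardware the notebook was run on.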

## Exercise
Go back two weeks to the [exercise where we compared Voronoi-Otsu-Labeling in two libraries](https://github.com/ScaDS/BIDS-lecture-2025/blob/main/03a_image_segmentation/11_voronoi_otsu_labeling.ipynb). Benchmark these two functions. How much faster is one than the other on your laptop? How much faster is it on the cluster?

In [19]:
from napari_segment_blobs_and_things_with_membranes import voronoi_otsu_labeling
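
Outside of a `%%timeit` cell, two functions can also be compared with Python's built-in `timeit` module. The following is a minimal sketch, not a reference solution: `benchmark`, `slow_sum` and `fast_sum` are hypothetical stand-ins, and the two labeling functions would take the place of the stand-ins.

```python
from statistics import mean, stdev
from timeit import repeat


def benchmark(func, *args, repeats=7, number=1, **kwargs):
    """Time func(*args, **kwargs) and return (mean, std. dev.) in seconds per call."""
    totals = repeat(lambda: func(*args, **kwargs), repeat=repeats, number=number)
    per_call = [total / number for total in totals]
    return mean(per_call), stdev(per_call)


# Hypothetical stand-ins for the two functions to compare
def slow_sum(n):
    """Sum the integers 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n):
    """Sum the integers 0..n-1 with the closed-form expression."""
    return n * (n - 1) // 2


slow_mean, slow_std = benchmark(slow_sum, 100_000)
fast_mean, fast_std = benchmark(fast_sum, 100_000)

print(f"slow_sum: {slow_mean * 1e3:.3f} ms ± {slow_std * 1e3:.3f} ms")
print(f"fast_sum: {fast_mean * 1e3:.3f} ms ± {fast_std * 1e3:.3f} ms")
```

When timing GPU-accelerated functions this way, keep in mind the `cle.wait_for_kernel_to_finish(True)` flag set at the top of this notebook, which the notebook uses to measure kernel execution properly.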