New shiny Imgproc module for OpenCV 5.0 #25012

vpisarev · 2024-02-12T22:38:40Z

Introduction

Please look at #25011 first: the ideas for Imgproc improvements have the same rationale as those for the Core module and share many parts with them.

Also, look at #25001, where it's suggested to remove some obsolete functionality.

The goals for the ultimate Imgproc module in OpenCV 5.0 release are:

provide an essential set of image processing functions (of course, there will never be enough of such functions)
further improve efficiency (taking advantage of modern vector CPUs and GPUs)
preserve the relatively compact size and small footprint, or even reduce the size

In the era of deep learning, classical image processing is still useful, but for somewhat different purposes: it is no longer the tool for solving big complex problems (which are usually addressed by deep models nowadays). So, what are the possible uses of imgproc today?

image augmentation to train robust models that can tolerate variations in lighting conditions, color response, noise, distorted geometry etc. See https://pytorch.org/vision/stable/transforms.html for an example of such a framework. In opencv_contrib we also have a draft version of a similar framework: [GSoC22] Data Augmentation Module in OpenCV (imgaug) opencv_contrib#3335.
pre-/post-processing before/after running a deep model. In the DNN module we already have a family of functions blobFromImage(s) that perform typical preprocessing for some typical computer vision models, but the pipeline can also potentially include other operations, such as demosaicing (converting a Bayer pattern to RGB), denoising, extracting a crop (not necessarily rectangular) from an image after running a detection model and before running a specialized model on a ROI, finding the contour of an object after running a semantic segmentation model (e.g. Segment Anything) etc.
processing of an image when the task is relatively simple (e.g. just adjusting the brightness, color or sharpness) or when the processed image is huge, so that running a deep net would take a lot of time.
visualization of the results, e.g. when overlay masks, contours, bounding boxes, body skeleton segments, facial features etc. are drawn on top of the input image to illustrate the results of an algorithm.

The current imgproc already covers most of the use cases, but certain things still can be improved.

Improvements suggested for Imgproc in OpenCV 5.0

Common improvements

Enable batch processing in most of the image processing functions. Batch processing lets users get the most out of modern many-core CPUs and even more powerful GPUs. When the processed images are rather small, e.g. 'just' 640x480 or less, it's often ineffective to use more than 2-4 CPU cores or more than a tiny fraction of a GPU. The rest of the hardware usually sits idle unless users organize a quite sophisticated multi-threaded asynchronous image processing pipeline on top of it. With natively supported batch processing, a trivial usage scheme can still be effective: users just collect several frames from a video or a batch of images for training and then apply a set of transforms using normal sequential code. How to do it from the API standpoint? Support 3D/4D input arrays where the outermost dimension is the batch size, just like in the ONNX specification. It can be done with little or no changes in the existing API.
Enable parallel processing where possible. Of course, if we have a big batch, the threading scheme is trivial. But it's usually not difficult to make job partitioning more adaptive. In the case of a big batch each thread processes one or several images, while in the case of a single big image each thread processes one or several tiles of that image. Usually, it's enough to have one utility function or class to organize such a smart job partitioning scheme for a group of similar functions (e.g. for all filters in OpenCV).
Make use of FP16/BF16 types and AVX2/AVX512 instructions on modern CPUs (use dispatching more actively). FP16 is useful as an input/output type on platforms that support hardware FP16<=>FP32 conversion (such as Intel/AMD with AVX2/AVX512 or ARMv8). It's also useful as an intermediate type for processing UINT8 images, which is the most common case in optimized pipelines. In the latter case ARMv8.2+FP16 is needed; other platforms have limited support for FP16 arithmetic, unfortunately.
The CPU HAL, in the form of primitive calls to IPP, Carotene or a custom HAL and in the form of OpenCV's Scalable Universal Intrinsics, is already actively used. For OpenCV 5 it's time to update the current CPU HAL (see New CPU HAL for OpenCV 5.0 #25019) and introduce a special non-CPU HAL (which should be an extension of OpenCL's T-API to other frameworks, such as CUDA), see Introducing non-CPU HAL for OpenCV 5+ #25025, so that all image processing functions can run faster using various accelerators.
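
To make the batch and job-partitioning ideas above a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration, not OpenCV code: `partition_jobs`, `invert_batch` and the `min_rows` parameter are hypothetical names, nested lists indexed as `batch[n][y][x]` stand in for a 3D array whose outermost dimension is the batch size, and horizontal stripes stand in for tiles.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_jobs(batch_size, height, num_threads, min_rows=16):
    """Split a batch of equally sized images into (image, row_start, row_end) jobs.

    Hypothetical helper: with a big batch each job is one whole image;
    with few images each image is cut into horizontal stripes so that
    roughly `num_threads` jobs exist in total.
    """
    if batch_size <= 0 or height <= 0 or num_threads <= 0:
        raise ValueError("all sizes must be positive")
    if batch_size >= num_threads:
        return [(i, 0, height) for i in range(batch_size)]
    # ceil(num_threads / batch_size) stripes per image ...
    stripes = -(-num_threads // batch_size)
    # ... but never make a stripe thinner than min_rows rows.
    max_stripes = (height // min_rows) if height >= min_rows else 1
    stripes = max(1, min(stripes, max_stripes))
    jobs = []
    for i in range(batch_size):
        for s in range(stripes):
            jobs.append((i, s * height // stripes, (s + 1) * height // stripes))
    return jobs


def invert_batch(batch, num_threads=4):
    """Invert 8-bit grayscale images stored as batch[n][y][x]."""
    height = len(batch[0])
    out = [[None] * height for _ in batch]

    def run(job):
        n, r0, r1 = job
        for y in range(r0, r1):
            out[n][y] = [255 - v for v in batch[n][y]]

    jobs = partition_jobs(len(batch), height, num_threads, min_rows=1)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(run, jobs))  # consume the iterator so errors surface
    return out
```

The caller writes ordinary sequential code (`invert_batch(frames)`) and the same helper covers both the "many small images" and the "one big image" cases. Python threads are used here only to show the scheme; they do not give the speedup a native implementation would.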

Now let's discuss various specific improvements in Imgproc

Filters

Get rid of FilterEngine. For some filters, like the bilateral filter, we already have efficient parallel implementations. But for many other filters we still use the complex and slow (sequential) FilterEngine. It's time to get rid of it. All filters should be parallel.
Move the 'guided filter', a faster and more flexible alternative to the bilateral filter, from opencv_contrib to imgproc.
A sharpening filter (unsharp mask-based) would also be nice to add.
Add support for FP16/BF16 to filters.
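
For reference, the unsharp-mask idea mentioned in the list above can be sketched in a few lines of plain Python. This is an assumption-laden illustration, not a proposed implementation: `box_blur3` and `unsharp_mask` are hypothetical names, a 3x3 box blur stands in for the Gaussian blur that is normally used, and a grayscale image is just a list of rows.

```python
def box_blur3(img):
    """3x3 box blur of a grayscale image with replicated borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                yy = min(max(y + dy, 0), h - 1)  # clamp row index (replicate border)
                for dx in (-1, 0, 1):
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def unsharp_mask(img, amount=1.0):
    """Sharpen: result = src + amount * (src - blurred), clamped to 0..255."""
    blurred = box_blur3(img)
    return [
        [min(255, max(0, int(round(s + amount * (s - b))))) for s, b in zip(srow, brow)]
        for srow, brow in zip(img, blurred)
    ]
```

On a step edge such as `[[50, 50, 150, 150]]` the filter pushes the two pixels next to the edge apart (undershoot on the dark side, overshoot on the bright side), while flat regions are left unchanged.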

Image Warping

Currently, for each processed tile a fixed-point mapping array is computed and stored in a temporary buffer, and then the 'remap' kernel is called. For WarpAffine, WarpPerspective and FP remap we can avoid the temporary buffers at the cost of increased code size. It's worth it, since the functionality is essential, especially for image augmentation.
Add support for FP16/BF16.
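
As a rough illustration of warping without a temporary mapping buffer, the sketch below computes the source coordinates for each destination pixel on the fly and samples them bilinearly right away. It is plain Python for readability; `warp_affine` and its `inv` parameter layout are hypothetical and do not mirror the signature of cv::warpAffine.

```python
import math


def warp_affine(src, inv, out_w, out_h, border=0):
    """Warp a grayscale image (list of rows) with bilinear sampling.

    inv = (a, b, c, d, e, f) maps a destination pixel (x, y) to the
    source position (a*x + b*y + c, d*x + e*y + f). No intermediate
    coordinate map is stored: each position is used as soon as it is computed.
    """
    h, w = len(src), len(src[0])
    a, b, c, d, e, f = inv
    dst = [[border] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            sx = a * x + b * y + c
            sy = d * x + e * y + f
            x0, y0 = math.floor(sx), math.floor(sy)
            if x0 < 0 or y0 < 0 or x0 > w - 1 or y0 > h - 1:
                continue  # outside the source image: keep the border value
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = sx - x0, sy - y0
            top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
            bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
            dst[y][x] = top * (1 - fy) + bottom * fy
    return dst
```

For example, `inv = (1, 0, -1, 0, 1, 0)` shifts the image one pixel to the right, and `inv = (1, 0, 0.5, 0, 1, 0)` samples halfway between horizontal neighbours.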

Resize

Resize is one of the most important functions in all of OpenCV. We should offer the best and the fastest implementation of resize.

It makes sense to implement a generalized Resize, covering 100% of the ONNX Resize specification and maybe other popular variations as well. Different algorithms usually use slight variations in how coordinates for subpixel interpolation are computed and rounded, which interpolation coefficients are used to compute the final pixel values etc. Most of those cases can probably be handled efficiently by a compact set of configurable kernels.
We probably need to add an interpolation mode that is a mix of INTER_AREA and INTER_LINEAR. That is, we could use an intermediate integer scale factor to bring the image size down and then scale the result to the final size using bilinear interpolation. That should give quality comparable to INTER_AREA (e.g. moire-free), but at higher speed, because general-case INTER_AREA is quite slow. Note that a smart implementation can compute the 1st stage and then the 2nd one on each tile, without explicitly forming the complete intermediate image.
Multi-step image upscaling algorithms are also worth exploring. We should also accelerate bicubic and Lanczos interpolation for image upscaling.
Add dynamic dispatchers with optimized branches for popular platforms, at least AVX2, AVX512 and NEON+FP16. Resize is an example of a critical function where we can potentially use native intrinsics to obtain better performance.
Develop new smart tests that can tolerate deviations between different implementations of resize. They could use PSNR, SSIM and other metrics. We really need to enable faster resize in IPP and other potential HAL accelerators. Currently, 3rd-party resize implementations are often disabled because they don't pass the tests. There should be special flags in resize (and maybe a few other functions, like RGB=>Grayscale conversion) to disable 3rd-party implementations when accuracy is critical.
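
The mixed INTER_AREA/INTER_LINEAR mode and the tolerance-based tests described in the list above can be sketched together. Everything here is an assumption made for illustration: the function names are hypothetical, the half-pixel-center convention is just one of the coordinate variations mentioned above, rows and columns that do not fill a whole block are simply dropped in the first stage, and the 30 dB threshold is an arbitrary placeholder rather than a recommended value.

```python
import math


def area_downscale(img, k):
    """Stage 1: average non-overlapping k x k blocks (integer-factor area resize).

    Simplification: rows/columns that do not fill a whole block are dropped.
    """
    h, w = len(img) // k, len(img[0]) // k
    return [
        [sum(img[y * k + dy][x * k + dx] for dy in range(k) for dx in range(k)) / (k * k)
         for x in range(w)]
        for y in range(h)
    ]


def bilinear_resize(img, out_w, out_h):
    """Stage 2: bilinear resize using half-pixel centers, clamped at the borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        sy = min(max((y + 0.5) * h / out_h - 0.5, 0.0), h - 1.0)
        y0 = int(sy)
        y1, fy = min(y0 + 1, h - 1), sy - y0
        row = []
        for x in range(out_w):
            sx = min(max((x + 0.5) * w / out_w - 0.5, 0.0), w - 1.0)
            x0 = int(sx)
            x1, fx = min(x0 + 1, w - 1), sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def two_stage_resize(img, out_w, out_h):
    """Integer-factor area downscale followed by bilinear interpolation."""
    h, w = len(img), len(img[0])
    k = max(1, min(w // out_w, h // out_h))  # largest integer factor that fits
    return bilinear_resize(area_downscale(img, k), out_w, out_h)


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    n = len(a) * len(a[0])
    mse = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def passes_tolerance(result, reference, min_psnr_db=30.0):
    """Tolerance-based check: accept any result close enough to the reference."""
    return psnr(result, reference) >= min_psnr_db
```

A 4x4 checkerboard of 0 and 255 shrunk to 2x2 comes out as uniform 127.5, i.e. the fine pattern is averaged away instead of aliasing. A test built on `passes_tolerance` would accept an accelerated implementation whose output differs slightly from the reference, which a bit-exact comparison would reject.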

Color conversion

Refactor cvtColor, another super-popular function in OpenCV. The list of colorspace conversion codes is now huge, and yet the function does not cover some useful cases. cvtColor should be converted into a configurable pipeline of element-wise color-processing operations (well, almost element-wise in the case of 4:2:0 or 4:2:2 chroma subsampling). Users should be able to specify the input color space (sometimes it's not just 'some' RGB, but 'linear RGB', sRGB, AdobeRGB or whatever), the order of input channels (in the case of RGB or YUV), the subsampling of input channels in the case of YUV (4:2:0, 4:2:2 or 4:4:4) etc. The same goes for the output color space, including optional postprocessing steps like scaling and type conversion. Maybe some extra transformations could be applied during colorspace conversion, e.g. a color temperature shift, brightness/contrast adjustments or a custom LUT. It all sounds complex, but in reality it will be just a customizable sequence of pre-programmed kernels that are called one by one on each tile inside a parallel loop over tiles. With element-wise operations it's not difficult to implement.
It would be very nice to integrate support for color management into imgproc: https://github.com/opencv/opencv_contrib/blob/4.x/modules/mcc/include/opencv2/mcc/ccm.hpp. This will make this part of OpenCV truly professional.
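
A toy version of such a configurable pipeline might look like the sketch below. All names (`reorder`, `scale`, `srgb_decode`, `luma`, `to_u8`, `make_pipeline`, `run_pipeline`) are hypothetical and chosen only for this illustration; pixels are tuples in a list-of-rows image, the luma weights are the familiar 0.299/0.587/0.114 ones, and the tile loop is sequential where a real implementation would run it in parallel.

```python
def reorder(order):
    """Stage factory: permute channels, e.g. (2, 1, 0) turns BGR into RGB."""
    return lambda px: tuple(px[i] for i in order)


def scale(factor):
    """Stage factory: multiply every channel by a constant."""
    return lambda px: tuple(c * factor for c in px)


def srgb_decode(px):
    """Stage: sRGB transfer function, nonlinear [0, 1] values to linear light."""
    return tuple(c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4 for c in px)


def luma(weights=(0.299, 0.587, 0.114)):
    """Stage factory: weighted sum of R, G, B giving a single gray channel."""
    return lambda px: (sum(w * c for w, c in zip(weights, px)),)


def to_u8(px):
    """Stage: round and clamp every channel to 0..255."""
    return tuple(min(255, max(0, int(round(c)))) for c in px)


def make_pipeline(*stages):
    """Chain element-wise stages into one per-pixel function."""
    def run(px):
        for stage in stages:
            px = stage(px)
        return px
    return run


def run_pipeline(image, pipeline, tile_rows=2):
    """Apply the pipeline tile by tile (each tile could go to its own thread)."""
    out = []
    for r0 in range(0, len(image), tile_rows):
        for row in image[r0:r0 + tile_rows]:
            out.append([pipeline(px) for px in row])
    return out
```

With these pieces a "BGR to gray" conversion is `make_pipeline(reorder((2, 1, 0)), luma(), to_u8)`, and "sRGB to linear RGB" is `make_pipeline(scale(1 / 255), srgb_decode, scale(255), to_u8)`. New conversions are added by composing stages rather than by adding another conversion code.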

Bits of computational photography

There is a dedicated Photo module for that, but for preprocessing we will probably need some basic image enhancement algorithms in the imgproc module (like the already existing bilateralFilter):

More general functions to adjust brightness/contrast/color etc., like a generalized LUT (even a 3D LUT) with various kinds of interpolation, would be useful to have.
Some simple image denoising methods, including non-local means or DCT-based denoising.
A de-fogging filter (one that increases crispness and contrast) is another useful preprocessing kernel. cv::equalizeHist is a very simple way to do it; we may need something a little more advanced.
1-2 automatic white balance algorithms would also be useful to have, e.g. gray world or white point or whatever (maybe also based on a face detector and skin tone analysis). Maybe it would be useful to be able to specify an explicit color temperature as a parameter of the image demosaicing algorithm.
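
Of the items above, gray world is the easiest to show. The sketch below is a minimal plain-Python version under the usual gray-world assumption (the average color of the scene is neutral gray); `gray_world` is a hypothetical name and the image is a list of rows of (r, g, b) tuples.

```python
def gray_world(image):
    """Gray-world white balance: scale each channel so all channel means match."""
    n = sum(len(row) for row in image)
    if n == 0:
        return []
    # Per-channel means over the whole image.
    means = [sum(px[c] for row in image for px in row) / n for c in range(3)]
    gray = sum(means) / 3.0
    # Gain that moves each channel mean to the common gray level.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [
        [tuple(min(255, int(round(px[c] * gains[c]))) for c in range(3)) for px in row]
        for row in image
    ]
```

An image with a uniform color cast such as `(100, 50, 150)` everywhere is mapped to neutral `(100, 100, 100)`, while an image whose channel means are already equal is returned unchanged.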

Drawing

OpenCV drawing functions are amongst the most popular ones. And yet they are very old and quite slow. There are much faster and more flexible methods to render 2D antialiased figures; a patch will be provided soon.
As requested in the comment at Clean Imgproc module #25001, support for floating-point coordinates would be nice to add. Maybe coordinates in the API should be converted to floats or even doubles.
Text rendering has already been substantially improved in OpenCV 5.0-pre with added support for TrueType fonts (https://github.com/opencv/opencv_extra/blob/5.x/testdata/highgui/drawing/text_test.png), however some further polishing is needed:
- synchronize stb-based implementation with the latest upstream version (or use freetype renderer, as in the original PR)
- use a more robust method to handle right-to-left scripts and/or scripts with ligatures, like Arabic, Devanagari etc. The most natural method is to use Harfbuzz (https://github.com/harfbuzz/harfbuzz). Support for Harfbuzz was included in the original patch, but was then excluded to reduce the footprint.
- fonts (currently, free Rubik font is used) likely need to be updated to support more characters out of the box, e.g. DroidSans-Fallback.
- (small item, just noticed: check and fix typos, e.g. "vechicle" => "vehicle")

Computational Geometry

This functionality is not as widely used as cv::resize(), but it's still fundamental and important. Let's keep it and improve it.

We probably need more algorithms to extract vector contours, e.g. from the output of semantic segmentation or instance segmentation deep models or the Canny edge detector. Some specialized packages have a notion of 'vector tracing' that we probably need to implement.
Once a contour or a 2D point cloud is obtained, the current functionality, like convexHull or fitEllipse, is fine, but we need more basic operations to extract a part of contour or a subset of point cloud.
Maybe Delaunay triangulation should be moved to opencv_contrib.
As described in Clean Imgproc module #25001, it's suggested to move all Hough transform-related functionality into opencv_contrib. Instead, we need efficient compact deep models to detect geometric primitives (lines, circles, ellipses). Of course, this means a dependency on the DNN module, so this new functionality should be put into another module, e.g. objdetect, to avoid a potential cyclic dependency between the imgproc and dnn modules.


vpisarev added feature evolution labels Feb 12, 2024

vpisarev added this to the 5.0 milestone Feb 12, 2024

vpisarev changed the title ~~New shiny Imgproc module in OpenCV 5~~ New shiny Imgproc module for OpenCV 5 Feb 12, 2024

vpisarev changed the title ~~New shiny Imgproc module for OpenCV 5~~ New shiny Imgproc module for OpenCV 5.0 Feb 12, 2024

philnelson added the OpenCV5 label Feb 13, 2024

vpisarev mentioned this issue Feb 14, 2024

New CPU HAL for OpenCV 5.0 #25019

Open

This was referenced Feb 21, 2024

use standard C++ types in API #24994

Open

Clean Imgproc module #25001

Open
