Add StackBlur for imgproc #20379

Merged
merged 1 commit into from Sep 29, 2022

zihaomu (Member) commented Jul 9, 2021

Add New Blur Filter: StackBlur

Hi, the basic idea is to blur the input image with a one-dimensional blur applied in the horizontal and the vertical direction in turn. StackBlur runs in linear time. In theory, it should run at about the same speed as box blur, while avoiding the boxy look that box blur produces at large kernel sizes and giving a result close to that of Gaussian blur.

Stack Blur was invented by Mario Klingemann <mario@quasimondo.com>.
A detailed description can be found at: http://underdestruction.com/2004/02/25/stackblur-2004
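To make the idea concrete, here is a minimal single-channel, one-dimensional sketch. It is not the code from this patch: the function names are hypothetical, it uses plain integer division where the patch appears to use a multiply-and-shift table (`stackblur_mul`), and it has no SIMD. It assumes triangular weights `radius + 1 - |k|` and a replicated border, and it includes a slow reference version so that the linear-time version can be checked against it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Reference: direct triangular-weighted average, O(n * radius).
std::vector<int> stackBlur1DNaive(const std::vector<int>& src, int radius)
{
    assert(radius >= 0);
    const int n = static_cast<int>(src.size());
    const int div = (radius + 1) * (radius + 1); // sum of the triangular weights
    std::vector<int> dst(src.size());
    for (int x = 0; x < n; ++x)
    {
        long long sum = 0;
        for (int k = -radius; k <= radius; ++k)
        {
            const int idx = std::min(std::max(x + k, 0), n - 1); // replicate border
            sum += static_cast<long long>(src[idx]) * (radius + 1 - std::abs(k));
        }
        dst[x] = static_cast<int>(sum / div);
    }
    return dst;
}

// Linear time: the cost per output sample does not depend on the radius.
std::vector<int> stackBlur1D(const std::vector<int>& src, int radius)
{
    assert(radius >= 0);
    const int n = static_cast<int>(src.size());
    std::vector<int> dst(src.size());
    if (n == 0)
        return dst;

    const int kernelSize = 2 * radius + 1;
    const int div = (radius + 1) * (radius + 1);
    std::vector<int> stack(kernelSize);   // circular buffer of the current window
    long long sum = 0, sumIn = 0, sumOut = 0;

    // Left half of the window (including the centre) is the replicated first sample.
    for (int k = 0; k <= radius; ++k)
    {
        stack[k] = src[0];
        sum += static_cast<long long>(src[0]) * (k + 1);
        sumOut += src[0];
    }
    // Right half of the window comes from the samples after the first one.
    for (int k = 1; k <= radius; ++k)
    {
        const int px = src[std::min(k, n - 1)];
        stack[k + radius] = px;
        sum += static_cast<long long>(px) * (radius + 1 - k);
        sumIn += px;
    }

    int sp = radius;                      // position of the window centre in the stack
    int xp = std::min(radius, n - 1);     // index of the newest source sample
    for (int x = 0; x < n; ++x)
    {
        dst[x] = static_cast<int>(sum / div);

        sum -= sumOut;                    // every outgoing weight drops by one
        int stackStart = sp + kernelSize - radius;
        if (stackStart >= kernelSize)
            stackStart -= kernelSize;
        sumOut -= stack[stackStart];      // oldest sample leaves the window

        if (xp < n - 1)
            ++xp;                         // otherwise replicate the last sample
        stack[stackStart] = src[xp];
        sumIn += src[xp];
        sum += sumIn;                     // every incoming weight rises by one

        ++sp;
        if (sp >= kernelSize)
            sp = 0;
        sumOut += stack[sp];              // new centre moves to the outgoing side
        sumIn -= stack[sp];
    }
    return dst;
}
```

Each output sample depends on the running sums left by the previous one, which is the sequential dependency that makes this pass hard to vectorize along the row.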

HW/SW:
Apple M1 chip / MacOS

Speed Test

The testing image size: [1920 x 2048], Input format: CV_8UC3.
Test rule: run each case 1000 times and take the fastest (minimum) time.
The raw log, the test image, and some test scripts can be found on this Google Drive.
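The measurement rule can be sketched as a small helper. This is an illustration only, not the script used for the numbers in this PR; `minRunTimeMs` is a hypothetical name, and it assumes that the reported value is the smallest elapsed time over all runs.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Runs fn() `iterations` times and returns the smallest elapsed time in milliseconds.
template <typename Fn>
double minRunTimeMs(Fn&& fn, int iterations)
{
    assert(iterations > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < iterations; ++i)
    {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best)
            best = ms;
    }
    return best;
}
```

Taking the minimum rather than the mean filters out runs disturbed by scheduling or cache effects, at the cost of reporting a best case.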

[Figure: stack_vs_gaussian]

More performance results:

Here are some experimental results for CV_8U input:

                                        Gaussian Blur   Box Blur   Stack Blur
image size 1920 x 2048, Kernel 3x3
  1 channel                             0.36ms          0.93ms     0.81ms
  3 channel                             0.76ms          3.05ms     2.44ms
  4 channel                             0.93ms          4.18ms     2.16ms
image size 1920 x 2048, Kernel 5x5
  1 channel                             0.43ms          1.14ms     1.02ms
  3 channel                             1.17ms          3.08ms     2.49ms
  4 channel                             1.46ms          4.35ms     3.27ms
image size 1920 x 2048, Kernel 21x21
  1 channel                             2.83ms          2.44ms     1.86ms
  3 channel                             8.88ms          5.46ms     4.49ms
  4 channel                             12.2ms          5.58ms     4.95ms
image size 1920 x 2048, Kernel 101x101
  1 channel                             27.73ms         2.68ms     1.49ms
  3 channel                             73.53ms         5.88ms     4.25ms
  4 channel                             99.10ms         6.32ms     5.30ms

Here are some experimental results for CV_32F input:

                                        Gaussian Blur   Box Blur   Stack Blur
image size 1920 x 2048, Kernel 3x3
  1 channel                             2.08ms          3.24ms     2.13ms
  3 channel                             5.86ms          10.99ms    4.51ms
  4 channel                             7.14ms          13.84ms    5.55ms
image size 1920 x 2048, Kernel 5x5
  1 channel                             2.71ms          4.22ms     1.68ms
  3 channel                             7.33ms          12.0ms     4.40ms
  4 channel                             10.4ms          16.1ms     5.71ms
image size 1920 x 2048, Kernel 21x21
  1 channel                             9.76ms          5.55ms     3.65ms
  3 channel                             29.02ms         10.33ms    6.13ms
  4 channel                             44.1ms          12.1ms     7.57ms
image size 1920 x 2048, Kernel 101x101
  1 channel                             55.06ms         5.93ms     3.36ms
  3 channel                             163.40ms        11.01ms    6.42ms
  4 channel                             245.3ms         13.01ms    8.02ms

Update 2022.09.21

Stackblur is slower than GaussianBlur when the kernel is small and the format of the input is uchar. In other cases, it is faster than Gaussian blur, and the larger the kernel size, the greater the speedup ratio.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

#include "precomp.hpp"
#include "opencv2/core/hal/intrin.hpp"

static unsigned short const stackblur_mul[255] =

Contributor

I guess, you based your implementation on some other code. What's the license of the original implementation? At least, you should put the link/reference to the original code, copyright and information about the license (or the license itself, if it's short)

Member Author

Hi, this implementation is based on two projects (project1 and project2). Should I change the copyright notice to the following?
Copyright (c) 2010 Mario Klingemann

vpisarev (Contributor) commented Jul 9, 2021

@zihaomu, thank you for the contribution!
In the performance table, "21x21" is used every time. Is that a typo?

@zihaomu zihaomu marked this pull request as ready for review July 18, 2021 14:09
asmorkalov (Contributor) commented

@zihaomu Friendly reminder about tests.

zihaomu (Member Author) commented Jul 21, 2021

> @zihaomu Friendly reminder about tests.

Thanks. I will keep working on it.

@asmorkalov asmorkalov removed the "pr: needs test" label Aug 3, 2021
asmorkalov (Contributor) commented

@vpisarev your turn.

zihaomu (Member Author) commented Aug 30, 2021

Hi @vpisarev, CI test still fails. I don't know if there is any error in the use of the macro. Should I change the macro flag from #if CV_SIMD to #if CV_SIMD128?

alalek (Member) left a comment

Thank you for contribution!

modules/imgproc/src/stackblur.cpp: 9 review threads (outdated, resolved)
modules/imgproc/test/test_stackblur.cpp: 1 review thread (outdated, resolved)

zihaomu (Member Author) commented Oct 31, 2021

@alalek Thank you for your code review! And sorry for the late reply. I will update the code and upload it next week.

@zihaomu zihaomu force-pushed the stackblur branch 2 times, most recently from cbd343c to 6ff95be Compare September 17, 2022 04:13
@zihaomu zihaomu changed the title from "StackBlur for imgpro -- in progress" to "Add StackBlur for imgproc" Sep 17, 2022
zihaomu (Member Author) commented Sep 19, 2022

Hi @vpisarev, @alalek and @asmorkalov, sorry for the late update. Please take a look at this patch, it has passed the CI test.

asmorkalov (Contributor) left a comment

Many functions of the implementation are empty when CV_SIMD is not defined. That is the case for ARMv6, ARMv7 without NEON, regular MIPS, and most current RISC-V. Please implement that branch with generic C++ code.
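One common way to avoid empty functions is to let the vector path handle a prefix of the data and always compile a scalar loop for the rest. The sketch below is an assumption about the general pattern, not the code from this patch: `DEMO_SIMD` is a hypothetical stand-in for a macro such as CV_SIMD, and the "vector" branch is simulated with a plain 4-wide loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// DEMO_SIMD is hypothetical and left undefined, so the generic path does all the work.
// #define DEMO_SIMD 1

// Returns how many leading elements were handled by the vector path.
static std::size_t addRowsVectorPart(const int* a, const int* b, int* out, std::size_t n)
{
#if defined(DEMO_SIMD)
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)          // stand-in for one 4-lane vector add
        for (std::size_t lane = 0; lane < 4; ++lane)
            out[i + lane] = a[i + lane] + b[i + lane];
    return i;
#else
    (void)a; (void)b; (void)out; (void)n;
    return 0;                           // nothing vectorized on this target
#endif
}

std::vector<int> addRows(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(a.size() == b.size());
    std::vector<int> out(a.size());
    std::size_t i = addRowsVectorPart(a.data(), b.data(), out.data(), out.size());
    for (; i < out.size(); ++i)         // generic C++ tail, compiled on every target
        out[i] = a[i] + b[i];
    return out;
}
```

With this layout the function produces correct output whether or not the macro is defined; only the share of work done by the scalar loop changes.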

zihaomu (Member Author) commented Sep 19, 2022

> Many functions of the implementation are empty when CV_SIMD is not defined. That is the case for ARMv6, ARMv7 without NEON, regular MIPS, and most current RISC-V. Please implement that branch with generic C++ code.

Hi @asmorkalov, thanks for the code review. I have updated the code so that it also works without SIMD.

asmorkalov (Contributor) commented

It also makes sense to add a performance test for the new function, since you are bringing a vectorized version. A performance report (against CV_DISABLE_OPTIMIZATION) will be very useful.

asmorkalov (Contributor) commented

Performance on desktop with simd vs without simd code:

./opencv-master/modules/ts/misc/summary.py ./stackblur_no_opt.xml ./stackblur_default.xml 

Geometric mean (ms)

                   Name of Test                    stackblur stackblur stackblur 
                                                      no      default   default  
                                                      opt                  vs    
                                                                       stackblur 
                                                                           no    
                                                                          opt    
                                                                       (x-factor)
stackblur3x3::Size_MatType::(640x480, 8UC1)          1.155     0.176      6.56   
stackblur3x3::Size_MatType::(640x480, 16UC1)         0.645     0.185      3.48   
stackblur3x3::Size_MatType::(640x480, 16SC1)         0.623     0.156      4.00   
stackblur3x3::Size_MatType::(640x480, 32FC1)         0.461     0.133      3.46   
stackblur3x3::Size_MatType::(640x480, 8UC4)          3.120     0.654      4.77   
stackblur3x3::Size_MatType::(1280x720, 8UC1)         2.388     0.477      5.01   
stackblur3x3::Size_MatType::(1280x720, 16UC1)        1.880     0.562      3.34   
stackblur3x3::Size_MatType::(1280x720, 16SC1)        1.818     0.475      3.82   
stackblur3x3::Size_MatType::(1280x720, 32FC1)        1.388     0.550      2.52   
stackblur3x3::Size_MatType::(1280x720, 8UC4)         9.424     1.878      5.02   
stackblur3x3::Size_MatType::(1920x1080, 8UC1)        5.337     1.081      4.94   
stackblur3x3::Size_MatType::(1920x1080, 16UC1)       4.244     1.251      3.39   
stackblur3x3::Size_MatType::(1920x1080, 16SC1)       4.083     1.056      3.87   
stackblur3x3::Size_MatType::(1920x1080, 32FC1)       3.381     2.146      1.58   
stackblur3x3::Size_MatType::(1920x1080, 8UC4)       21.049     4.313      4.88   
stackblur101x101::Size_MatType::(640x480, 8UC1)      0.790     0.461      1.71   
stackblur101x101::Size_MatType::(640x480, 16UC1)     0.646     0.485      1.33   
stackblur101x101::Size_MatType::(640x480, 16SC1)     0.651     0.439      1.48   
stackblur101x101::Size_MatType::(640x480, 32FC1)     0.659     0.449      1.47   
stackblur101x101::Size_MatType::(640x480, 8UC4)      2.376     1.560      1.52   
stackblur101x101::Size_MatType::(1280x720, 8UC1)     1.795     1.290      1.39   
stackblur101x101::Size_MatType::(1280x720, 16UC1)    1.814     1.275      1.42   
stackblur101x101::Size_MatType::(1280x720, 16SC1)    1.838     1.233      1.49   
stackblur101x101::Size_MatType::(1280x720, 32FC1)    1.918     1.298      1.48   
stackblur101x101::Size_MatType::(1280x720, 8UC4)     6.954     4.579      1.52   
stackblur101x101::Size_MatType::(1920x1080, 8UC1)    3.901     2.844      1.37   
stackblur101x101::Size_MatType::(1920x1080, 16UC1)   4.055     2.902      1.40   
stackblur101x101::Size_MatType::(1920x1080, 16SC1)   4.070     2.833      1.44   
stackblur101x101::Size_MatType::(1920x1080, 32FC1)   4.219     3.070      1.37   
stackblur101x101::Size_MatType::(1920x1080, 8UC4)   15.496    10.210      1.52
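Judging from the column header and the numbers, the x-factor appears to be the no-optimization time divided by the default time. A tiny sketch of that arithmetic, with `xFactor` as a hypothetical helper name:

```cpp
#include <cassert>

// Speed-up of the default build over the no-optimization build.
inline double xFactor(double noOptMs, double defaultMs)
{
    assert(defaultMs > 0.0);
    return noOptMs / defaultMs;
}
```

For the first row, `xFactor(1.155, 0.176)` is about 6.56. Because the times in the table are themselves rounded, recomputing other rows this way can differ from the printed value in the last digit.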

zihaomu (Member Author) commented Sep 20, 2022

Hi @asmorkalov. Thanks for the performance test. This result is reasonable.
StackBlur can be split into two passes, one along the row direction and one along the column direction. In my tests, the row pass is the most time-consuming part: it takes about 3 to 4 times as long as the column pass.
The column pass always operates on contiguous memory.
In the row pass, small kernel sizes (<= 9) run in special SIMD branches, which is why 3x3 gets a better x-factor.
For large kernel sizes, each output value depends on the previous one, so the row pass can only run without SIMD. That is why the x-factor is small.

asmorkalov (Contributor) commented

It looks like the tests for small resolutions are very fast and their performance statistics are unstable. That's why some perf tests fail on CI. I propose excluding VGA resolution and some other overly fast cases from performance testing.

zihaomu (Member Author) commented Sep 21, 2022

Update, fixed the performance test.

Hi @asmorkalov, can you help me to set the right test case? I have no idea why performance tests fail.

modules/imgproc/test/test_stackblur.cpp: 4 review threads (outdated, resolved)

@zihaomu zihaomu force-pushed the stackblur branch 2 times, most recently from 9b500f0 to 4bb39ca Compare September 28, 2022 02:29
asmorkalov (Contributor) left a comment

👍 Looks good to me! Please squash commits and I'll merge the patch.

@asmorkalov asmorkalov merged commit b403d37 into opencv:4.x Sep 29, 2022
@asmorkalov asmorkalov added this to the 4.7.0 milestone Sep 29, 2022
peters (Contributor) commented Oct 26, 2022

@asmorkalov It does not appear that stackBlur has been merged into the 5.x branch? Cherry-picking commit 2918071a3e17ac06fe13d0c7223676587f9d6877 worked.

asmorkalov (Contributor) commented

Yes. We have a lot of merge conflicts between 4.x and 5.x; a regular merge procedure has not been established yet.

Comment on lines +1623 to +1624
/** @brief Blurs an image using the StackBlur.
The function applies and StackBlur to an image.

Member

Documentation formatting is broken.

New empty line is required after @brief.
Or @details.

zihaomu (Member Author) commented Oct 28, 2022

This will be fixed in a new patch.

Stack Blur Algorithm by Mario Klingemann <mario@quasimondo.com>
@param src input image. The number of channels can be arbitrary, but the depth should be one of
CV_8U, CV_16U, CV_16S or CV_32F.
@param dst output image of the same size and type as src.

Member

> the same size

Which border handling method is used?

zihaomu (Member Author) commented Oct 28, 2022

Thanks for the code review, @alalek. For now, we only use BORDER_REPLICATE as the border type, since that is what the original paper uses. More details of the paper can be found at http://underdestruction.com/2004/02/25/stackblur-2004/.
Should we support other border types, or just document this detail in a new patch?
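For readers unfamiliar with the border mode: replicate handling repeats the edge sample outward, so a row `abcd` behaves as `aaa|abcd|ddd`. A minimal sketch of that padding, with `padReplicate` as a hypothetical helper that is not part of the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns `row` extended by `radius` replicated samples on each side.
std::vector<int> padReplicate(const std::vector<int>& row, int radius)
{
    assert(radius >= 0 && !row.empty());
    const int n = static_cast<int>(row.size());
    std::vector<int> padded;
    padded.reserve(row.size() + 2 * static_cast<std::size_t>(radius));
    for (int i = -radius; i < n + radius; ++i)
        padded.push_back(row[std::min(std::max(i, 0), n - 1)]); // clamp the index to the valid range
    return padded;
}
```

An actual implementation would more likely clamp indices on the fly than build a padded copy; the copy is used here only because it makes the behavior easy to see.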

Comment on lines +464 to +465
ParallelStackBlurRow& operator=(const ParallelStackBlurRow &) { return *this; }

Member

Not needed

Comment on lines +1258 to +1259
CV_Error_( CV_StsNotImplemented,
("Unsupported input format in StackBlur, the supported formats are: CV_8U, CV_16U, CV_16S and CV_32F."));

Member

CV_Error_ uses cv::format internally.
Formatting is not needed here, so the plain CV_Error should be used instead.

}

int sp1 = sp + 1;
sp1 &= -(sp1 < stackLenW);

Member

This is test code, so we don't need any kind of "optimization" obfuscation here.

if (stackStart >= kernelSize) stackStart -= kernelSize;

int sp1 = sp + 1;
sp1 &= -(sp1 < kernelSize);

Member

Are you sure that compilers optimize this properly?

Member Author

The non-optimized code is the following:

int sp1 = sp + 1;
if (sp1 >= kernelSize)
    sp1 = 0;

Member

Yes, and we should not obfuscate code for compilers.
More operations may block optimizations (because we can't pass assumptions into optimization transformations).

See here: https://godbolt.org/z/G3jdx5v8a
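The two forms discussed in this thread compute the same wrapped index. The sketch below places them side by side so the equivalence can be checked directly; the function names are hypothetical, and it says nothing about which form a given compiler optimizes better.

```cpp
#include <cassert>

// Masked form, as written in the patch under review.
inline int nextIndexMasked(int sp, int kernelSize)
{
    int sp1 = sp + 1;
    sp1 &= -(sp1 < kernelSize); // -(true) is -1 (all bits set), -(false) is 0
    return sp1;
}

// Plain form, as suggested in the review.
inline int nextIndexPlain(int sp, int kernelSize)
{
    int sp1 = sp + 1;
    if (sp1 >= kernelSize)
        sp1 = 0;
    return sp1;
}
```

Both return `sp + 1` while that stays below `kernelSize` and wrap to 0 otherwise, so switching to the plain form is purely a readability change.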

parallel_for_(Range(0, widthElem), ParallelStackBlurColumn<float, float>(dst, dst, radiusH), numOfThreads);
}
else
CV_Error_( CV_StsNotImplemented,

Member

> CV_StsNotImplemented

We should not use enums from the C API anymore.


dstPtr += widthElem;
++sp;
if (sp >= kernelSize) sp = 0;

Member

if and its body should be placed on separate lines. This is necessary for correct "code coverage" reports.

@alalek alalek mentioned this pull request Jan 8, 2023
5 participants