@pratham-mcw
Contributor

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.

  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV

  • The PR is proposed to the proper branch

  • This PR improves the performance of the weighted median filter from the ximgproc module (measured with WeightedMedianFilterTest) on Windows on ARM64.

  • The optimization unrolls two performance-critical loops in the generateCentersPP function in modules/core/src/kmeans.cpp, which the filter uses internally.

  • The unrolling is enabled only for ARM64 builds using #if defined(_M_ARM64) guards to preserve compatibility and maintain performance on other architectures.
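
As a rough illustration of the kind of change described above, the sketch below unrolls a cumulative-subtraction search by a factor of four. It is not the patch itself: the function names `pickIndexScalar`, `pickIndexUnrolled`, and `pickIndex`, and the unroll factor of four, are assumptions made for the example. The unrolled form performs the subtractions in the same order as the plain loop, so both return the same index.

```cpp
#include <cassert>

// Reference form: subtract each weight from p and stop at the first
// index where p drops to zero or below (or at N - 1 if it never does).
inline int pickIndexScalar(const float* dist, int N, double p)
{
    assert(N > 0);
    int ci = 0;
    for (; ci < N - 1; ci++)
    {
        p -= dist[ci];
        if (p <= 0)
            break;
    }
    return ci;
}

// Hypothetical 4x unrolled form. The order of floating-point operations
// matches the scalar loop, so the result is identical; only the loop
// overhead per element changes.
inline int pickIndexUnrolled(const float* dist, int N, double p)
{
    assert(N > 0);
    int ci = 0;
    for (; ci + 4 <= N - 1; ci += 4)
    {
        p -= dist[ci];     if (p <= 0) return ci;
        p -= dist[ci + 1]; if (p <= 0) return ci + 1;
        p -= dist[ci + 2]; if (p <= 0) return ci + 2;
        p -= dist[ci + 3]; if (p <= 0) return ci + 3;
    }
    for (; ci < N - 1; ci++)   // tail: fewer than four elements left
    {
        p -= dist[ci];
        if (p <= 0)
            break;
    }
    return ci;
}

// Only MSVC ARM64 builds (_M_ARM64) take the unrolled path; every other
// target keeps the original loop.
inline int pickIndex(const float* dist, int N, double p)
{
#if defined(_M_ARM64)
    return pickIndexUnrolled(dist, N, p);
#else
    return pickIndexScalar(dist, N, p);
#endif
}
```

Keeping the scalar loop as the fallback means non-ARM64 builds compile exactly the code they had before.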

Performance Improvements:

  • Improves execution time for Weighted Filter performance tests on ARM64 without affecting other platforms.

{
double p = (double)rng*sum0;
int ci = 0;
#if defined(_M_ARM64)
Contributor

#if CV_ENABLE_UNROLLED should be checked too.
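
One way to read this suggestion is to combine both conditions in a single guard, so the unrolled path can still be switched off through the existing macro. The sketch below shows that shape on a hypothetical accumulation loop; `sumValues` is a name invented for the example, and the fallback definition of `CV_ENABLE_UNROLLED` is a stand-in so the snippet compiles on its own, assuming the macro is a 0/1 value supplied by the OpenCV build.

```cpp
#include <cassert>

// Stand-in for this sketch only: in an OpenCV build the macro is
// expected to come from the library's own headers or build flags.
#ifndef CV_ENABLE_UNROLLED
#define CV_ENABLE_UNROLLED 1
#endif

// Hypothetical accumulation loop using the combined guard.
inline double sumValues(const float* v, int n)
{
    assert(n >= 0);
    double s = 0;
    int i = 0;
#if defined(_M_ARM64) && CV_ENABLE_UNROLLED
    // Unrolled body: same accumulation order as the plain loop.
    for (; i + 4 <= n; i += 4)
    {
        s += v[i];
        s += v[i + 1];
        s += v[i + 2];
        s += v[i + 3];
    }
#endif
    for (; i < n; i++)   // whole loop when not unrolled, tail otherwise
        s += v[i];
    return s;
}
```

With this layout the trailing loop doubles as the fallback, so disabling either condition leaves only the original loop.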

@pratham-mcw
Contributor Author

Hi @opencv-alalek, just following up on this PR. The loop unrolling gives a significant performance improvement in the WeightedMedianFilterTest perf tests from the ximgproc module.
Please let me know if any additional changes are needed.

@asmorkalov
Contributor

The patch shows a speedup on other platforms too.
Jetson Orin (1 thread):

perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 1, 3, 1)                      5.613           5.762             0.97      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 1, 3, 8)                      4.640           4.680             0.99      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 1, 5, 1)                      7.481           7.539             0.99      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 1, 5, 8)                      6.280           6.360             0.99      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 3, 3, 1)                      16.595          16.894            0.98      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 3, 3, 8)                      13.800          13.934            0.99      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 3, 5, 1)                      22.091          22.562            0.98      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 3, 5, 8)                      18.712          18.959            0.99      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 1, 3, 1)                      61.607          56.737            1.09      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 1, 3, 8)                      57.787          53.061            1.09      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 1, 5, 1)                      63.723          58.955            1.08      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 1, 5, 8)                      59.476          54.763            1.09      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 3, 3, 1)                      76.840          72.219            1.06      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 3, 3, 8)                      67.145          62.248            1.08      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 3, 5, 1)                      83.989          79.565            1.06      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 3, 5, 8)                      72.077          67.245            1.07      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 1, 3, 1)                     6.336           6.351             1.00      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 1, 3, 8)                     5.274           5.324             0.99      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 1, 5, 1)                     8.024           8.117             0.99      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 1, 5, 8)                     6.957           7.014             0.99      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 3, 3, 1)                     18.490          18.875            0.98      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 3, 3, 8)                     15.817          16.042            0.99      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 3, 5, 1)                     23.945          24.190            0.99      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 3, 5, 8)                     21.193          21.377            0.99      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 1, 3, 1)                     61.637          57.000            1.08      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 1, 3, 8)                     58.339          53.675            1.09      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 1, 5, 1)                     64.093          59.623            1.07      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 1, 5, 8)                     60.049          55.426            1.08      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 3, 3, 1)                     78.500          74.123            1.06      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 3, 3, 8)                     69.478          64.675            1.07      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 3, 5, 1)                     86.127          81.893            1.05      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 3, 5, 8)                     74.529          69.926            1.07      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 1, 3, 1)                     42.900          44.201            0.97      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 1, 3, 8)                     35.133          35.400            0.99      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 1, 5, 1)                     63.325          63.416            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 1, 5, 8)                     53.555          53.661            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 3, 3, 1)                    128.794         127.906            1.01      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 3, 3, 8)                    104.578         103.882            1.01      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 3, 5, 1)                    190.296         189.524            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 3, 5, 8)                    158.391         158.067            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 1, 3, 1)                    518.141         477.469            1.09      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 1, 3, 8)                    490.418         445.079            1.10      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 1, 5, 1)                    547.558         503.454            1.09      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 1, 5, 8)                    508.018         463.536            1.10      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 3, 3, 1)                    655.150         611.742            1.07      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 3, 3, 8)                    562.439         519.937            1.08      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 3, 5, 1)                    742.098         698.097            1.06      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 3, 5, 8)                    617.746         572.690            1.08      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 1, 3, 1)                    52.103          53.030            0.98      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 1, 3, 8)                    43.120          43.108            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 1, 5, 1)                    69.760          69.611            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 1, 5, 8)                    59.296          59.311            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 3, 3, 1)                   154.399         155.773            0.99      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 3, 3, 8)                   127.825         127.079            1.01      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 3, 5, 1)                   207.757         207.554            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 3, 5, 8)                   177.547         176.632            1.01      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 1, 3, 1)                   527.855         484.709            1.09      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 1, 3, 8)                   496.185         452.700            1.10      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 1, 5, 1)                   552.866         510.109            1.08      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 1, 5, 8)                   514.495         471.103            1.09      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 3, 3, 1)                   678.073         635.868            1.07      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 3, 3, 8)                   584.062         540.850            1.08      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 3, 5, 1)                   755.872         711.554            1.06      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 3, 5, 8)                   631.435         588.922            1.07  
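
The last column in the listing above is consistent with the first timing divided by the second, so values above 1.00 mean the second run was faster. A minimal sketch of that arithmetic, with `timeRatio` as a hypothetical helper name and the column meaning taken as an assumption (the listing has no header row):

```cpp
#include <cassert>

// Ratio of the first timing to the second, as the last column appears
// to report it. Values above 1.0 mean the second timing is smaller.
inline double timeRatio(double first, double second)
{
    assert(second > 0);
    return first / second;
}
```

For example, `timeRatio(61.607, 56.737)` is about 1.086, which rounds to the 1.09 shown for the (127x61, 8UC1, 3, 1, 3, 1) row.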

@asmorkalov
Contributor

Mac M1 (single thread):

perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 1, 3, 1)                      3.431           3.430             1.00      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 1, 3, 8)                      2.500           2.508             1.00      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 1, 5, 1)                      5.404           5.441             0.99      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 1, 5, 8)                      4.355           4.356             1.00      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 3, 3, 1)                      9.293           9.325             1.00      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 3, 3, 8)                      7.616           7.628             1.00      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 3, 5, 1)                      15.398          15.402            1.00      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 1, 3, 5, 8)                      13.146          13.183            1.00      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 1, 3, 1)                      40.274          35.593            1.13      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 1, 3, 8)                      38.297          33.541            1.14      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 1, 5, 1)                      43.501          38.418            1.13      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 1, 5, 8)                      40.084          35.389            1.13      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 3, 3, 1)                      49.111          44.379            1.11      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 3, 3, 8)                      43.205          38.502            1.12      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 3, 5, 1)                      57.835          53.440            1.08      
perf::WeightedMedianFilterTest::(127x61, 8UC1, 3, 3, 5, 8)                      48.658          43.977            1.11      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 1, 3, 1)                     4.200           4.231             0.99      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 1, 3, 8)                     3.563           3.556             1.00      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 1, 5, 1)                     5.822           5.851             1.00      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 1, 5, 8)                     5.375           5.402             1.00      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 3, 3, 1)                     11.595          11.631            1.00      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 3, 3, 8)                     9.885           9.922             1.00      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 3, 5, 1)                     17.638          17.691            1.00      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 1, 3, 5, 8)                     15.443          15.450            1.00      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 1, 3, 1)                     40.916          36.235            1.13      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 1, 3, 8)                     39.054          34.324            1.14      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 1, 5, 1)                     43.756          39.053            1.12      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 1, 5, 8)                     40.885          36.155            1.13      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 3, 3, 1)                     50.974          46.305            1.10      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 3, 3, 8)                     45.747          41.075            1.11      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 3, 5, 1)                     59.946          55.305            1.08      
perf::WeightedMedianFilterTest::(127x61, 32FC1, 3, 3, 5, 8)                     51.256          46.585            1.10      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 1, 3, 1)                     28.288          28.307            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 1, 3, 8)                     22.765          22.760            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 1, 5, 1)                     49.264          49.230            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 1, 5, 8)                     41.924          41.884            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 3, 3, 1)                     84.328          84.530            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 3, 3, 8)                     67.971          67.791            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 3, 5, 1)                    147.393         147.502            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 1, 3, 5, 8)                    125.255         125.621            1.00      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 1, 3, 1)                    349.427         308.143            1.13      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 1, 3, 8)                    330.925         289.114            1.14      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 1, 5, 1)                    379.929         337.975            1.12      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 1, 5, 8)                    349.876         308.497            1.13      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 3, 3, 1)                    432.664         390.851            1.11      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 3, 3, 8)                    375.992         334.852            1.12      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 3, 5, 1)                    522.695         480.914            1.09      
perf::WeightedMedianFilterTest::(320x240, 8UC1, 3, 3, 5, 8)                    433.439         391.867            1.11      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 1, 3, 1)                    36.239          36.261            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 1, 3, 8)                    30.962          30.979            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 1, 5, 1)                    57.712          58.029            0.99      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 1, 5, 8)                    50.134          50.163            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 3, 3, 1)                   108.282         108.477            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 3, 3, 8)                    91.631          91.816            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 3, 5, 1)                   171.448         171.550            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 1, 3, 5, 8)                   149.451         149.238            1.00      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 1, 3, 1)                   357.812         316.149            1.13      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 1, 3, 8)                   339.117         297.757            1.14      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 1, 5, 1)                   388.244         346.507            1.12      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 1, 5, 8)                   358.094         316.445            1.13      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 3, 3, 1)                   456.489         415.215            1.10      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 3, 3, 8)                   400.089         358.602            1.12      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 3, 5, 1)                   546.792         505.669            1.08      
perf::WeightedMedianFilterTest::(320x240, 32FC1, 3, 3, 5, 8)                   457.746         416.122            1.10   

@asmorkalov asmorkalov self-assigned this Sep 23, 2025
@asmorkalov asmorkalov added this to the 4.13.0 milestone Sep 23, 2025
@asmorkalov asmorkalov merged commit 95354f0 into opencv:4.x Sep 23, 2025
27 of 28 checks passed
@asmorkalov asmorkalov mentioned this pull request Sep 26, 2025