Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] rounding rule for cv::divide() #24213

Closed
4 tasks done
Kumataro opened this issue Sep 1, 2023 · 2 comments · Fixed by #24215
Closed
4 tasks done

[RFC] rounding rule for cv::divide() #24213

Kumataro opened this issue Sep 1, 2023 · 2 comments · Fixed by #24215
Labels
bug category: core optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc
Milestone

Comments

@Kumataro
Copy link
Contributor

Kumataro commented Sep 1, 2023

System Information

OpenCV version: 4.x branch
Operating System / Platform: Ubuntu 20.04 (Raspi4, arm64)
Compiler & compiler version: GCC 9.3.0

Detailed description

This issue is about cv::divide(). related with #24074
Rounding rule for it is not describled at cv::divide()

In arm64, it works with Round to nearest, ties away from zero
In x86-64, it works with Round to nearest, ties to even.
(Possibly, the behavior may change according to the rounding mode specification of the floating point unit.)

Q. Their results are different. Which are these behaviours correct/better ?

If arm64 behaviours should be fixed...

For arm64, I think it can fix following patch.
However v_round() function seems to used many times.
I feel the risk of breaking backwards compatibility.

I would appreciate it if you could comment on this issue.

before : https://developer.arm.com/architectures/instruction-sets/intrinsics/vcvtaq_s64_f64
after : https://developer.arm.com/architectures/instruction-sets/intrinsics/vcvtnq_s64_f64

diff --git a/modules/core/include/opencv2/core/hal/intrin_neon.hpp b/modules/core/include/opencv2/core/hal/intrin_neon.hpp
index 6f8973231b..14eb180819 100644
--- a/modules/core/include/opencv2/core/hal/intrin_neon.hpp
+++ b/modules/core/include/opencv2/core/hal/intrin_neon.hpp
@@ -1997,12 +1997,12 @@ inline v_int32x4 v_trunc(const v_float32x4& a)
 inline v_int32x4 v_round(const v_float64x2& a)
 {
     static const int32x2_t zero = vdup_n_s32(0);
-    return v_int32x4(vcombine_s32(vmovn_s64(vcvtaq_s64_f64(a.val)), zero));
+    return v_int32x4(vcombine_s32(vmovn_s64(vcvtnq_s64_f64(a.val)), zero));
 }

 inline v_int32x4 v_round(const v_float64x2& a, const v_float64x2& b)
 {
-    return v_int32x4(vcombine_s32(vmovn_s64(vcvtaq_s64_f64(a.val)), vmovn_s64(vcvtaq_s64_f64(b.val))));
+    return v_int32x4(vcombine_s32(vmovn_s64(vcvtnq_s64_f64(a.val)), vmovn_s64(vcvtnq_s64_f64(b.val))));
 }

Steps to reproduce

#include <opencv2/core.hpp>
#include <iostream>

int main(void)
{
  cv::Mat src1 = (cv::Mat_<uchar>(3,3) << 25,23,0, 0,0,0, 0,0,0 );
  std::cout << src1 << std::endl;
  cv::Mat dst;
  cv::divide(src1, 2, dst );
  std::cout << dst << std::endl;
  return 0;
}
[x86-64]
kmtr@kmtr-VMware-Virtual-Platform:~/work/studyT2$ ./a.out
[ 25,  23,   0;
   0,   0,   0;
   0,   0,   0]
[ 12,  12,   0;
   0,   0,   0;
   0,   0,   0]

[arm64]
kmtr@ubuntu:~/work/build4-main/study$ ./a.out
[ 25,  23,   0;
   0,   0,   0;
   0,   0,   0]
[ 13,  12,   0;
   0,   0,   0;
   0,   0,   0]

Issue submission checklist

  • I report the issue, it's not a question
  • I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
  • I updated to the latest OpenCV version and the issue is still there
  • There is reproducer code and related data files (videos, images, onnx, etc)
@Kumataro Kumataro added the bug label Sep 1, 2023
@opencv-alalek opencv-alalek added optimization category: core RFC platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc labels Sep 2, 2023
@opencv-alalek
Copy link
Contributor

Perhaps it make sense to update intrinsic tests with v_round(): https://github.com/opencv/opencv/blob/4.8.0/modules/core/test/test_intrin_utils.hpp#L1483

@Kumataro
Copy link
Contributor Author

Kumataro commented Sep 2, 2023

Thank you for your comment !

I felt that the existing tests were sufficient to confirm the impact and the side effects.
I tried the existing tests, including core module, on 64bit arm Ubuntu running on Raspi4 and found no new issues.
I created Pull Request with new test code.

This fix should also affect MacOS ARM64, but I don't have the equipment and couldn't try it. I'm sorry

Supplement.
The same Neon instruction is used in v_truncate().
When using v_truncate, I think it is better to move it to the zero side when the decimal part becomes 0.5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug category: core optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants