Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGBUS crash in pointSetBoundingRect() on ARMv7-A (armeabi-v7a) due to unaligned NEON load #25265

Closed
4 tasks done
db81 opened this issue Mar 25, 2024 · 3 comments · Fixed by #25484
Closed
4 tasks done
Labels
bug optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc
Milestone

Comments

@db81
Copy link

db81 commented Mar 25, 2024

System Information

OpenCV version: 3.4+
Platform: Android armeabi-v7a, presumably all 32-bit armv7-a boards with NEON enabled
Compiler: clang ver 14.0.6 in Android NDK 25b, generation of crashing instruction changes randomly with clang version

Detailed description

We have hit this crash in production here at ZenID.

Happens on 32-bit ARM (armeabi-v7a ABI) phones, including natively 64-bit phones forced to run 32-bit executable.

It may happen when feeding data to cv::boundingRect() that isn't aligned to 64 bit, depending on your luck with compiler optimization.

In our case we had

struct Foo {
...
std::array<cv::Point, 4> Outline; // alignof() == 4
...
}
...
Foo bar;
cv::boundingRect(bar.Outline);

It crashes on https://github.com/opencv/opencv/blob/843c09bc/modules/imgproc/src/shapedescr.cpp#L899

vx_load_low() uses vld1_u64(uint64_t*) intrinsic under the hood, which may translate into either vldr {d0}, [r0] or vld1.32 {d0}, [r0:64], depending on your luck with optimizer.

vldr is a generic 64 bit load wich requires alignment to 32 bit, vld1.32 would normally only require 32 bit alignment too (and supports unaligned load?), but clang adds an alignment hint (reasonably since it's being passed int64_t*) and that results in a crash.

For us it compiled to vldr with NDK 21d (clang ver 9.0.8) but then it became vld1.32 with NDK 25b (clang ver 14.0.6) and started crashing.

Godbolt repro of this behavior (play with the alignment attribute on ua64 and see what happens):


There are three ways I see of fixing it:

  1. Use int32_t* in pointSetBoundingRect().

I don't really see why it needs to operate on int64_t* when input depth is asserted to be 32 bits and it casts all the 64 bit vector loads to 32 bit anyway. Maybe there's some x86_64 SSE performance reason to do it?

This fixes the crash:

v_int32 ptXY = v_reinterpret_as_s32(v_expand_low(v_reinterpret_as_u32(vx_load_low((int32_t*)(pts + i)))));

However, somewhere in the codebase more unaligned NEON loads are sure to be lurking, hidden by vldr instructions and it may randomly explode on us in the future when the optimizer changes slightly.

  1. Make v_load/v_store intrinsics accept unaligned data.

Draft PR: #25267

This results in vld1.8 {d0}, [r0] instruction and solves all cases of load/store on unaligned address.

#if defined(__clang__) && defined(__arm__)
#define OPENCV_HAL_IMPL_NEON_LOAD_LOW_OP(_Tpvec, _Tp, suffix) \
inline _Tpvec v_load_low(const _Tp* ptr) \
{ \
typedef _Tp CV_DECL_ALIGNED(1) unaligned_ptr; \
unaligned_ptr* uptr = (unaligned_ptr*)ptr; \
return _Tpvec(vcombine_##suffix(vld1_##suffix(uptr), vdup_n_##suffix((_Tp)0))); \
}
// and same for other load/store

(in intrin_neon.hpp)

This is the solution we picked since it means we don't have to sift through the entire opencv codebase looking for potential unaligned access.

  1. Allow alignment to lane type but no higher than 32 bit in load/store

Draft PR: #25266

This is less fool-proof but also less invasive. In effect it ensures that vx_load_low() remains as vldr, which happens half the time anyway. I expect that opencv has a few more places where 64 bit NEON load aligned to 32 bit can happen, and very few (or none) places where you end up with ≤32 NEON load that is misaligned.

On the assembly level that means:

  • vld1.32 {d0}, [r0:64] => vldr {d0}, [r0]` (clang already does this half the time, non-determinisically!)
  • vld1.64 {d0}, [r0] => vld1.32 {d0}, [r0]`
  • vld1.64 {d0, d1}, [r0] => vld1.32 {d0, d1}, [r0]
  • vst1.64 {d0, d1}, [r0] => vst1.32 {d0, d1}, [r0]

So basically do this, only for 64bit load/store:

inline _Tpvec v_load_low(const _Tp* ptr) \
{ \
typedef _Tp CV_DECL_ALIGNED(4) aligned32_ptr; \
aligned32_ptr* uptr = (aligned32_ptr*)ptr; \
return _Tpvec(vcombine_##suffix(vld1_##suffix(uptr), vdup_n_##suffix((_Tp)0))); \
}

and do the same for v_load(), v_store(), etc.


I've looked at prior art: PR #16152 attempted something like my option 3 but making the 64-bit load/store completely unaligned. I think it doesn't make sense to go this far and only for 64-bit. That PR was closed with #16139 + #16463 used instead.

On the face of it, it makes sense to fix unaligned access on the algo side. In practice, that work is driven by bug reports and vldr vs vld1.32 thing is particularly nasty, because you don't know if it will compile into the crashing instruction, so it's hard to test for it even with full coverage, and crashes may stay lurking for years before suddenly appearing with a new compiler version.


Alignment requirements for the relevant instructions for your reference:
alignment-armv7-a

Steps to reproduce

Feed 64-bit misaligned (and 32-bit aligned) array of ints or floats to cv::boundingRect(), using clang ver 14.0.6 or any other version that happens to produce vld1.32 {d0}, [r0:64] instruction.

Issue submission checklist

  • I report the issue, it's not a question
  • I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
  • I updated to the latest OpenCV version and the issue is still there
  • There is reproducer code and related data files (videos, images, onnx, etc)
@db81
Copy link
Author

db81 commented Mar 25, 2024

I can run perf tests tomorrow if needed.

@opencv-alalek opencv-alalek added the platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc label Mar 26, 2024
@asmorkalov asmorkalov added this to the 4.10.0 milestone Mar 26, 2024
@opencv-alalek
Copy link
Contributor

OpenCV SIMD implementations assume pointers alignment to the base type of used vectors.

There is the caller problem - it converts pointer to int64 without alignment check (which is invalid in C++ too):

https://github.com/opencv/opencv/blame/dad8af6b17f8e60d7b95a1203a1b4d22f56574cf/modules/imgproc/src/shapedescr.cpp#L883

Need to add code path for unaligned case for the pointSetBoundingRect function (which uses int32 instead of int64).
Take a look on example code near CV_STRONG_ALIGNMENT.


relates #13650

@mshabunin
Copy link
Contributor

@db81 , we decided to try to avoid extensive changes in the intrinsics code and fix this specific algorithm instead. Please check whether #25484 resolves your issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants