`SIGBUS` crash in `pointSetBoundingRect()` on ARMv7-A (armeabi-v7a) due to unaligned NEON load #25265

db81 · 2024-03-25T08:53:49Z

System Information

OpenCV version: 3.4+
Platform: Android armeabi-v7a, presumably all 32-bit armv7-a boards with NEON enabled
Compiler: clang ver 14.0.6 in Android NDK 25b, generation of crashing instruction changes randomly with clang version

Detailed description

We have hit this crash in production here at ZenID.

Happens on 32-bit ARM (armeabi-v7a ABI) phones, including natively 64-bit phones forced to run 32-bit executable.

It may happen when feeding data to cv::boundingRect() that isn't aligned to 64 bit, depending on your luck with compiler optimization.

In our case we had

struct Foo {
...
std::array<cv::Point, 4> Outline; // alignof() == 4
...
}
...
Foo bar;
cv::boundingRect(bar.Outline);

It crashes on https://github.com/opencv/opencv/blob/843c09bc/modules/imgproc/src/shapedescr.cpp#L899

vx_load_low() uses vld1_u64(uint64_t*) intrinsic under the hood, which may translate into either vldr {d0}, [r0] or vld1.32 {d0}, [r0:64], depending on your luck with optimizer.

vldr is a generic 64 bit load wich requires alignment to 32 bit, vld1.32 would normally only require 32 bit alignment too (and supports unaligned load?), but clang adds an alignment hint (reasonably since it's being passed int64_t*) and that results in a crash.

For us it compiled to vldr with NDK 21d (clang ver 9.0.8) but then it became vld1.32 with NDK 25b (clang ver 14.0.6) and started crashing.

Godbolt repro of this behavior (play with the alignment attribute on ua64 and see what happens):

clang 14.0.0 vld1.32 {d16}, [r0:64] - crash
clang 9.0.1 vldr {d16}, [r0] - no crash
also see alignment requirements from ARM reference manual at the end

There are three ways I see of fixing it:

Use int32_t* in pointSetBoundingRect().

I don't really see why it needs to operate on int64_t* when input depth is asserted to be 32 bits and it casts all the 64 bit vector loads to 32 bit anyway. Maybe there's some x86_64 SSE performance reason to do it?

This fixes the crash:

v_int32 ptXY = v_reinterpret_as_s32(v_expand_low(v_reinterpret_as_u32(vx_load_low((int32_t*)(pts + i)))));

However, somewhere in the codebase more unaligned NEON loads are sure to be lurking, hidden by vldr instructions and it may randomly explode on us in the future when the optimizer changes slightly.

Make v_load/v_store intrinsics accept unaligned data.

Draft PR: #25267

This results in vld1.8 {d0}, [r0] instruction and solves all cases of load/store on unaligned address.

#if defined(__clang__) && defined(__arm__)
#define OPENCV_HAL_IMPL_NEON_LOAD_LOW_OP(_Tpvec, _Tp, suffix) \
inline _Tpvec v_load_low(const _Tp* ptr) \
{ \
typedef _Tp CV_DECL_ALIGNED(1) unaligned_ptr; \
unaligned_ptr* uptr = (unaligned_ptr*)ptr; \
return _Tpvec(vcombine_##suffix(vld1_##suffix(uptr), vdup_n_##suffix((_Tp)0))); \
}
// and same for other load/store

(in intrin_neon.hpp)

This is the solution we picked since it means we don't have to sift through the entire opencv codebase looking for potential unaligned access.

Allow alignment to lane type but no higher than 32 bit in load/store

Draft PR: #25266

This is less fool-proof but also less invasive. In effect it ensures that vx_load_low() remains as vldr, which happens half the time anyway. I expect that opencv has a few more places where 64 bit NEON load aligned to 32 bit can happen, and very few (or none) places where you end up with ≤32 NEON load that is misaligned.

On the assembly level that means:

vld1.32 {d0}, [r0:64] => vldr {d0}, [r0]` (clang already does this half the time, non-determinisically!)
vld1.64 {d0}, [r0] => vld1.32 {d0}, [r0]`
vld1.64 {d0, d1}, [r0] => vld1.32 {d0, d1}, [r0]
vst1.64 {d0, d1}, [r0] => vst1.32 {d0, d1}, [r0]

So basically do this, only for 64bit load/store:

inline _Tpvec v_load_low(const _Tp* ptr) \
{ \
typedef _Tp CV_DECL_ALIGNED(4) aligned32_ptr; \
aligned32_ptr* uptr = (aligned32_ptr*)ptr; \
return _Tpvec(vcombine_##suffix(vld1_##suffix(uptr), vdup_n_##suffix((_Tp)0))); \
}

and do the same for v_load(), v_store(), etc.

I've looked at prior art: PR #16152 attempted something like my option 3 but making the 64-bit load/store completely unaligned. I think it doesn't make sense to go this far and only for 64-bit. That PR was closed with #16139 + #16463 used instead.

On the face of it, it makes sense to fix unaligned access on the algo side. In practice, that work is driven by bug reports and vldr vs vld1.32 thing is particularly nasty, because you don't know if it will compile into the crashing instruction, so it's hard to test for it even with full coverage, and crashes may stay lurking for years before suddenly appearing with a new compiler version.

Alignment requirements for the relevant instructions for your reference:

Steps to reproduce

Feed 64-bit misaligned (and 32-bit aligned) array of ints or floats to cv::boundingRect(), using clang ver 14.0.6 or any other version that happens to produce vld1.32 {d0}, [r0:64] instruction.

Issue submission checklist

I report the issue, it's not a question
I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
I updated to the latest OpenCV version and the issue is still there
There is reproducer code and related data files (videos, images, onnx, etc)

The text was updated successfully, but these errors were encountered:

db81 · 2024-03-25T09:19:19Z

I can run perf tests tomorrow if needed.

opencv-alalek · 2024-03-29T07:37:48Z

OpenCV SIMD implementations assume pointers alignment to the base type of used vectors.

There is the caller problem - it converts pointer to int64 without alignment check (which is invalid in C++ too):

https://github.com/opencv/opencv/blame/dad8af6b17f8e60d7b95a1203a1b4d22f56574cf/modules/imgproc/src/shapedescr.cpp#L883

Need to add code path for unaligned case for the pointSetBoundingRect function (which uses int32 instead of int64).
Take a look on example code near CV_STRONG_ALIGNMENT.

relates #13650

mshabunin · 2024-05-03T17:21:11Z

@db81 , we decided to try to avoid extensive changes in the intrinsics code and fix this specific algorithm instead. Please check whether #25484 resolves your issue.

db81 added the bug label Mar 25, 2024

This was referenced Mar 25, 2024

Permit alignment of 64-bit NEON loadstore to 32-bit on armv7-a #25266

Closed

Force unaligned load/store for NEON on armv7-a #25267

Closed

opencv-alalek added the platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc label Mar 26, 2024

asmorkalov added this to the 4.10.0 milestone Mar 26, 2024

opencv-alalek added the optimization label Mar 29, 2024

mshabunin mentioned this issue Apr 24, 2024

imgproc: improved test for boundingRect #25484

Merged

asmorkalov closed this as completed in #25484 May 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`SIGBUS` crash in `pointSetBoundingRect()` on ARMv7-A (armeabi-v7a) due to unaligned NEON load #25265

`SIGBUS` crash in `pointSetBoundingRect()` on ARMv7-A (armeabi-v7a) due to unaligned NEON load #25265

db81 commented Mar 25, 2024 •

edited

Loading

Use `int32_t*` in `pointSetBoundingRect()`.

Make v_load/v_store intrinsics accept unaligned data.

Allow alignment to lane type but no higher than 32 bit in load/store

db81 commented Mar 25, 2024

opencv-alalek commented Mar 29, 2024

mshabunin commented May 3, 2024

SIGBUS crash in pointSetBoundingRect() on ARMv7-A (armeabi-v7a) due to unaligned NEON load #25265

SIGBUS crash in pointSetBoundingRect() on ARMv7-A (armeabi-v7a) due to unaligned NEON load #25265

Comments

db81 commented Mar 25, 2024 • edited Loading

System Information

Detailed description

Use int32_t* in pointSetBoundingRect().

Make v_load/v_store intrinsics accept unaligned data.

Allow alignment to lane type but no higher than 32 bit in load/store

Steps to reproduce

Issue submission checklist

db81 commented Mar 25, 2024

opencv-alalek commented Mar 29, 2024

mshabunin commented May 3, 2024

`SIGBUS` crash in `pointSetBoundingRect()` on ARMv7-A (armeabi-v7a) due to unaligned NEON load #25265

`SIGBUS` crash in `pointSetBoundingRect()` on ARMv7-A (armeabi-v7a) due to unaligned NEON load #25265

db81 commented Mar 25, 2024 •

edited

Loading

Use `int32_t*` in `pointSetBoundingRect()`.