SIGBUS
crash in pointSetBoundingRect()
on ARMv7-A (armeabi-v7a) due to unaligned NEON load
#25265
Closed
4 tasks done
Labels
Milestone
System Information
OpenCV version: 3.4+
Platform: Android armeabi-v7a, presumably all 32-bit armv7-a boards with NEON enabled
Compiler: clang ver 14.0.6 in Android NDK 25b, generation of crashing instruction changes randomly with clang version
Detailed description
We have hit this crash in production here at ZenID.
Happens on 32-bit ARM (armeabi-v7a ABI) phones, including natively 64-bit phones forced to run 32-bit executable.
It may happen when feeding data to
cv::boundingRect()
that isn't aligned to 64 bit, depending on your luck with compiler optimization.In our case we had
It crashes on https://github.com/opencv/opencv/blob/843c09bc/modules/imgproc/src/shapedescr.cpp#L899
vx_load_low()
usesvld1_u64(uint64_t*)
intrinsic under the hood, which may translate into eithervldr {d0}, [r0]
orvld1.32 {d0}, [r0:64]
, depending on your luck with optimizer.vldr
is a generic 64 bit load wich requires alignment to 32 bit,vld1.32
would normally only require 32 bit alignment too (and supports unaligned load?), but clang adds an alignment hint (reasonably since it's being passed int64_t*) and that results in a crash.For us it compiled to
vldr
with NDK 21d (clang ver 9.0.8) but then it becamevld1.32
with NDK 25b (clang ver 14.0.6) and started crashing.Godbolt repro of this behavior (play with the alignment attribute on
ua64
and see what happens):vld1.32 {d16}, [r0:64]
- crashvldr {d16}, [r0]
- no crashalso see alignment requirements from ARM reference manual at the end
There are three ways I see of fixing it:
Use
int32_t*
inpointSetBoundingRect()
.I don't really see why it needs to operate on
int64_t*
when input depth is asserted to be 32 bits and it casts all the 64 bit vector loads to 32 bit anyway. Maybe there's some x86_64 SSE performance reason to do it?This fixes the crash:
However, somewhere in the codebase more unaligned NEON loads are sure to be lurking, hidden by
vldr
instructions and it may randomly explode on us in the future when the optimizer changes slightly.Make v_load/v_store intrinsics accept unaligned data.
Draft PR: #25267
This results in
vld1.8 {d0}, [r0]
instruction and solves all cases of load/store on unaligned address.(in
intrin_neon.hpp
)This is the solution we picked since it means we don't have to sift through the entire opencv codebase looking for potential unaligned access.
Allow alignment to lane type but no higher than 32 bit in load/store
Draft PR: #25266
This is less fool-proof but also less invasive. In effect it ensures that
vx_load_low()
remains asvldr
, which happens half the time anyway. I expect that opencv has a few more places where 64 bit NEON load aligned to 32 bit can happen, and very few (or none) places where you end up with ≤32 NEON load that is misaligned.On the assembly level that means:
vld1.32 {d0}, [r0:64] =>
vldr {d0}, [r0]` (clang already does this half the time, non-determinisically!)vld1.64 {d0}, [r0] =>
vld1.32 {d0}, [r0]`vld1.64 {d0, d1}, [r0]
=>vld1.32 {d0, d1}, [r0]
vst1.64 {d0, d1}, [r0]
=>vst1.32 {d0, d1}, [r0]
So basically do this, only for 64bit load/store:
and do the same for
v_load()
,v_store()
, etc.I've looked at prior art: PR #16152 attempted something like my option 3 but making the 64-bit load/store completely unaligned. I think it doesn't make sense to go this far and only for 64-bit. That PR was closed with #16139 + #16463 used instead.
On the face of it, it makes sense to fix unaligned access on the algo side. In practice, that work is driven by bug reports and
vldr
vsvld1.32
thing is particularly nasty, because you don't know if it will compile into the crashing instruction, so it's hard to test for it even with full coverage, and crashes may stay lurking for years before suddenly appearing with a new compiler version.Alignment requirements for the relevant instructions for your reference:
Steps to reproduce
Feed 64-bit misaligned (and 32-bit aligned) array of ints or floats to
cv::boundingRect()
, using clang ver 14.0.6 or any other version that happens to producevld1.32 {d0}, [r0:64]
instruction.Issue submission checklist
The text was updated successfully, but these errors were encountered: