New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: Workaround flip horiz #16152
core: Workaround flip horiz #16152
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Looks like there is similar issue with float32
(on TK1).
-OPENCV_HAL_IMPL_NEON_WORKAROUND_64
+OPENCV_HAL_IMPL_NEON_LOADSTORE_OP_BYTEIO
Long term note:
Unfortunately alignment requirements for v_load()
is not specified. We definitely should define something (and make some micro-benchmarks for byte-based I/O workaround).
On ARM you just can't access int*
array through C++ if pointer is not aligned on 4 bytes.
inline _Tpvec v_load(const _Tp* ptr) \ | ||
{ return _Tpvec(vreinterpretq_##suffix##_u8(vld1q_u8((const unsigned char*)ptr))); } \ | ||
inline _Tpvec v_load_aligned(const _Tp* ptr) \ | ||
{ return _Tpvec(vreinterpretq_##suffix##_u8(vld1q_u8((const unsigned char*)ptr))); } \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_aligned
This requires that pointer is aligned on sizeof(_Tpv)
(vector size, 16 bytes). So "bytes"-way is not needed here.
There is same note for v_store_aligned()
calls.
In addition, there is "C++ emulator" SIMD backend (intrin_cpp). It can't work properly with unaligned pointers on ARM platforms (even "int*" (not 64-bit) would lead to crashed - there are several issues about that). So my suggestion is to declare requirement of "base type" alignment for passed pointers (like C++ compiler does). If algorithm can't guarantee alignment then it should use "reinterpret" trick over uint8 vectors. |
Could we use something like |
Has much stronger requirements. It requires alignment on whole vector size (16 bytes in case of SIMD128), instead of base type. Requirement for base type alignment is not heavy, otherwise common C++ tail processing after SIMD loops just can't work too. Problem comes with type punning (pair of "int" => "int64", or "char"s => "int") without alignment checks.
Unaligned pointers can be handled through Moreover it is not about There is "fixed" tests for experiments with unaligned pointers: https://github.com/alalek/opencv/commits/simd_unaligned_load |
Type-punned pointers are already "non grata" in C++ so that will be just another reason to cast pointers safely and responsibly. Universal intrinsics aren't the only place to care about proper data alignment. There is tail processing that won't be affected by a stricter v_load behavior and there could be a lot more places to add unaligned type casts. So stricter v_load won't fix the problem root cause while causing additional typing to handle it. I don't think there will be a lot of places(if any) that will handle this change in a way different from v_reinterpret so why should we burden user with this additional typing if we could conceal it inside the intrinsic? I suppose CV_DECL_ALIGNED(1) could be useful with any type, however there is another solution to choose from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This workaround would be good for ARMv8(aarch32) also I think reinterpreting and loading over u32 may be better but in the case of ARMv7-A, we still have to deal with other types since
- LDRB/STRB - address must be byte aligned
- LDRH/STRH - address must be 2-byte aligned
- LDR/STR - address must be 4-byte aligned
can we just count on GCC |
There is -fsanitize=alignment in Clang, but it doesn't work with intrinsic functions (only direct pointers access). |
The solution replaced by #16463. |
This pullrequest changes
SIGBUS
when loadingv_uint64x2
/v_int64x2
on 32bit Arm platform