Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Workaround flip horiz #16152

Closed
wants to merge 5 commits into from

Conversation

tomoaki0705
Copy link
Contributor

This pullrequest changes

Copy link
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Looks like there is similar issue with float32 (on TK1).

-OPENCV_HAL_IMPL_NEON_WORKAROUND_64
+OPENCV_HAL_IMPL_NEON_LOADSTORE_OP_BYTEIO


Long term note:
Unfortunately alignment requirements for v_load() is not specified. We definitely should define something (and make some micro-benchmarks for byte-based I/O workaround).
On ARM you just can't access int* array through C++ if pointer is not aligned on 4 bytes.

inline _Tpvec v_load(const _Tp* ptr) \
{ return _Tpvec(vreinterpretq_##suffix##_u8(vld1q_u8((const unsigned char*)ptr))); } \
inline _Tpvec v_load_aligned(const _Tp* ptr) \
{ return _Tpvec(vreinterpretq_##suffix##_u8(vld1q_u8((const unsigned char*)ptr))); } \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_aligned

This requires that pointer is aligned on sizeof(_Tpv) (vector size, 16 bytes). So "bytes"-way is not needed here.

There is same note for v_store_aligned() calls.

@alalek
Copy link
Member

alalek commented Dec 16, 2019

Unfortunately alignment requirements for v_load() is not specified. We definitely should define something (and make some micro-benchmarks for byte-based I/O workaround).
On ARM you just can't access int* array through C++ if pointer is not aligned on 4 bytes.

In addition, there is "C++ emulator" SIMD backend (intrin_cpp). It can't work properly with unaligned pointers on ARM platforms (even "int*" (not 64-bit) would lead to crashed - there are several issues about that).

So my suggestion is to declare requirement of "base type" alignment for passed pointers (like C++ compiler does). If algorithm can't guarantee alignment then it should use "reinterpret" trick over uint8 vectors.

/cc @terfendail @vpisarev @seiko2plus

@asmorkalov asmorkalov added platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc category: core labels Dec 17, 2019
@terfendail
Copy link
Contributor

Could we use something like typedef uint64 CV_DECL_ALIGNED(1) unaligned_uint64; ?
Looks like it solve similar issue for v_load_low on ARM
IMO it is over-complication to add alignment requirements to unaligned intrinsics. In this case developer will have to add alignment checks to algorithm or handle it somehow, but in this case unaligned intrinsics became useless because it will be better to ensure vector-size alignment and use v_load_aligned everywhere.
IMO it will be better to update v_load intrinsic to use internally
v_reinterpret_as_<type>(v_load((uchar*)src)) to ensure alignment safety without bothering end-user. Anyway if one cares about performance there are v_load_aligned alternative that is as fast as possible for every platform.

@alalek
Copy link
Member

alalek commented Dec 17, 2019

v_load_aligned

Has much stronger requirements. It requires alignment on whole vector size (16 bytes in case of SIMD128), instead of base type.

Requirement for base type alignment is not heavy, otherwise common C++ tail processing after SIMD loops just can't work too.

Problem comes with type punning (pair of "int" => "int64", or "char"s => "int") without alignment checks.

handle it somehow

Unaligned pointers can be handled through v_reinterpret.


Moreover it is not about uint64 only.

There is "fixed" tests for experiments with unaligned pointers: https://github.com/alalek/opencv/commits/simd_unaligned_load

@terfendail
Copy link
Contributor

Type-punned pointers are already "non grata" in C++ so that will be just another reason to cast pointers safely and responsibly. Universal intrinsics aren't the only place to care about proper data alignment. There is tail processing that won't be affected by a stricter v_load behavior and there could be a lot more places to add unaligned type casts. So stricter v_load won't fix the problem root cause while causing additional typing to handle it. I don't think there will be a lot of places(if any) that will handle this change in a way different from v_reinterpret so why should we burden user with this additional typing if we could conceal it inside the intrinsic?

I suppose CV_DECL_ALIGNED(1) could be useful with any type, however there is another solution to choose from.
Unfortunately I haven't managed to find vld1q_<suffix> performance metrics but I suppose that there should't be big difference.

Copy link
Contributor

@seiko2plus seiko2plus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workaround would be good for ARMv8(aarch32) also I think reinterpreting and loading over u32 may be better but in the case of ARMv7-A, we still have to deal with other types since

  • LDRB/STRB - address must be byte aligned
  • LDRH/STRH - address must be 2-byte aligned
  • LDR/STR - address must be 4-byte aligned

@seiko2plus
Copy link
Contributor

can we just count on GCC -Wcast-align to detect alignment issues?

@alalek
Copy link
Member

alalek commented Dec 18, 2019

There is -fsanitize=alignment in Clang, but it doesn't work with intrinsic functions (only direct pointers access).

@asmorkalov
Copy link
Contributor

The solution replaced by #16463.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: core platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants