inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch 11002 | vst1q_u64 (uint64_t * __a, uint64x2_t __b) #834
Comments
@stefson
Unlike the p64 version, which sets extra attributes, I don't see any here. If you comment out the call to vst1q_u64, are there any other errors? For the warning, I think we can fix that by replacing 2 with 2ULL; we will soon make targets 64-bit anyway.
…#834 PiperOrigin-RevId: 460460056
…#834 PiperOrigin-RevId: 460479211
@stefson do you have any idea what might be happening? If not, it's an option to disable runtime dispatch for arm7 on this version of GCC. |
Could you guide me a little on how to disable runtime dispatch?
Sure, in detect_targets.h we have a line |
do you mean as in:
? |
Yes, looks good :) If this helps you, feel free to send this as a pull request, or we can do it if you prefer. |
it helps indeed, but I need more time to iron this out - arm hardware really is slow. |
PiperOrigin-RevId: 472208082
I wonder about a sensible strategy for a fix on the compiler side? |
I cannot reproduce any compilation issue on armhf/gcc10|11|12. For reference:
% gcc-10 --version % gcc-11 --version % gcc-12 --version |
I see the same results as you for my gcc-10 armv7a cross compiler, but still it gives me the error without the now pushed patch. |
Add '--verbose' to the compilation line that is failing and post back. Eg.:
|
hey, here is my output from --verbose, it is with commit 9b3bd6d to not hide the problem with the current workaround:
frankly I don't see any difference in the error, does the other information tell you something? |
@jan-wassenberg Do you believe it makes sense to compile highway with neon support using the default |
@malaterre, good catch, thanks for pointing to that. vfpv4 is supported since 2009, I'd be surprised if anyone still cares about vfpv3.
It makes sense that the compiler complains: arm_neon.h is compiled with the default target, and we set vfpv4 only for the Highway implementation and user code. Here's an idea @stefson : does it help to, in arm_neon-inl.h, move the following block to the line after
|
Can you please post a patch against latest git for your idea? The risk of a misunderstanding is too high if you ask me that way :D
PiperOrigin-RevId: 472271949
Sure, sent :) |
with gcc-10.4.0: latest-git+patch.log.gz this does not look good :-S |
armv7-gcc still broken with commit 9934046 , here is the build log: build.log.gz |
Thanks for sharing the result. I was unable to reproduce it with GCC 10.3 (godbolt lacks 10.4) and |
can you please name me the gcc versions (gcc-10.3.0 and later) which godbolt offers you? (Edit: I meant versions :D ) |
You can see them in the dropdown menu in the link above, where it currently says "ARM GCC 10.3.1" :) The next higher one is 11.1. |
ah, got it! :D I can offer you a log of a failed compile with gcc-11.3.0, which seems identical to me: gcc-11.3.0-armv7a.log.gz
:) |
I've opened a gentoo bug for this, hopefully someone more experienced from toolchain will take a look at it: https://bugs.gentoo.org/869077 |
Nice, glad you've got feedback from the Gentoo bug :) Is the (I still don't know why the default flags cause this conflict, but I imagine most use cases will be fine with setting compiler flags that require an Arm from 2009 or later.) |
so, I did compile on armv7a with neon and with gcc-10.4.0, and it passes. Also all tests are passing, yay! Test compiling on armv7a+musl is still in my queue though. It seems libjxl now has a more or less similar hiccup: libjxl/libjxl#1748
When using an armv7 gcc >= 8 toolchain (like [1]) with Highway configured with -DHWY_CMAKE_ARM7=OFF and HWY_ENABLE_CONTRIB=ON, compilation fails with this error:

    In file included from /build/highway-1.0.3/hwy/ops/arm_neon-inl.h:33,
    from /build/highway-1.0.3/hwy/highway.h:358,
    from /build/highway-1.0.3/hwy/contrib/sort/shared-inl.h:104,
    from /build/highway-1.0.3/hwy/contrib/sort/traits128-inl.h:27,
    from /build/highway-1.0.3/hwy/contrib/sort/vqsort_128d.cc:23,
    from /build/highway-1.0.3/hwy/foreach_target.h:81,
    from /build/highway-1.0.3/hwy/contrib/sort/vqsort_128d.cc:20:
    /toolchain/lib/gcc/arm-buildroot-linux-gnueabihf/12.2.0/include/arm_neon.h: In function ‘void hwy::N_NEON::StoreU(Vec128<long long unsigned int, 2>, Full128<long long unsigned int>, uint64_t*)’:
    /toolchain/lib/gcc/arm-buildroot-linux-gnueabihf/12.2.0/include/arm_neon.h:11052:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch
    11052 | vst1q_u64 (uint64_t * __a, uint64x2_t __b)
    | ^~~~~~~~~
    /build/highway-1.0.3/hwy/ops/arm_neon-inl.h:2786:12: note: called from here
    2786 | vst1q_u64(unaligned, v.raw);
    | ~~~~~~~~~^~~~~~~~~~~~~~~~~~

The same errors happen when configured with HWY_ENABLE_EXAMPLES=ON, or from client libraries like libjxl (at other places).

The issue is that Highway Arm NEON ops depend on the Advanced SIMD (Neon) v2 and the VFPv4 floating-point instructions. The SIMD (Neon) v1 and VFPv3 instructions are not supported.

There were several attempts to fix variants of this issue. See google#834 and google#1032.

The HWY_NEON target is selected only if __ARM_NEON is defined. See: https://github.com/google/highway/blob/1.0.3/hwy/detect_targets.h#L251

This test is not sufficient, since __ARM_NEON is predefined whenever Neon is enabled (neon-vfpv3, neon-vfpv4). The issue is that HWY_CMAKE_ARM7=ON implies VFPv4 / NEON SIMD v2. When setting HWY_CMAKE_ARM7=OFF, "neon-vfpv4" will not be forced, but the code still uses intrinsics assuming VFPv4. Gcc will fail with an error because code cannot be generated for the selected architecture.

This issue can be avoided by adding "-DHWY_DISABLED_TARGETS=HWY_NEON" in CXXFLAGS. The problem with this solution is that every client program will also need to do the same. This goes against the very purpose of "hwy/detect_targets.h".

Technically, Armv7-a processors with VFPv4 can be detected using some ACLE (Arm C Language Extensions [2]) predefined macros:

Basically, we want Highway to define HWY_NEON only when the target supports SIMDv2/VFPv4 or higher. An older target with vfpv3 only (e.g. Cortex-A8, A9, ...) would NOT define HWY_NEON, and would therefore fall back on the HWY_SCALAR implementation.

However, not all compilers completely support ACLE. There are also several versions. So we cannot easily rely on macros like "__ARM_VFPV4__" (which clang predefines, but gcc does not).

The alternative solution proposed in this patch is to declare the HWY_NEON target architecture as broken when we detect that the target is Armv7-A but mandatory features for vfpv4 (namely half-float, FMA) are missing. Half-floats are tested using the macro __ARM_NEON_FP, and FMA with the macro __ARM_FEATURE_FMA. See ACLE [2].

The intent of declaring the target as broken, rather than selecting HWY_NEON only if vfpv4 features are detected, is to remain a bit conservative, since the detection is slightly inaccurate.

For a given compiler/cflags, predefined macros for Arm/ACLE can be reviewed with commands like:

    arm-linux-gnueabihf-gcc -mcpu=cortex-a9 -mfpu=neon-vfpv3 -Wp,-dM -E -c - < /dev/null | grep -Fi arm | sort
    arm-linux-gnueabihf-gcc -mcpu=cortex-a7 -mfpu=neon-vfpv4 -Wp,-dM -E -c - < /dev/null | grep -Fi arm | sort
    clang -target armv7a -mcpu=cortex-a9 -mfpu=neon-vfpv3 -mfloat-abi=hard -Wp,-dM -E -c - < /dev/null | grep -Fi arm | sort
    clang -target armv7a -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard -Wp,-dM -E -c - < /dev/null | grep -Fi arm | sort

The different values of __ARM_NEON_FP can be seen, depending on which "-mfpu" is passed. Same for __ARM_FEATURE_FMA.

[1] https://toolchains.bootlin.com/downloads/releases/toolchains/armv7-eabihf/tarballs/armv7-eabihf--glibc--bleeding-edge-2022.08-1.tar.bz2
[2] https://github.com/ARM-software/acle/

Signed-off-by: Julien Olivain <ju.o@free.fr>
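The detection idea in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Highway's actual code: all EXAMPLE_* macros and Example* functions are hypothetical names, and treating bit 0x2 of __ARM_NEON_FP as the half-float indicator follows the ACLE description of that macro as a bitmask. The real check lives in hwy/detect_targets.h and may differ.

```cpp
#include <cassert>

// Hypothetical sketch: the EXAMPLE_* names are invented for illustration and
// are not Highway macros.

// 1 if compiling for 32-bit Armv7 with Neon enabled, else 0.
#if defined(__ARM_ARCH) && (__ARM_ARCH == 7) && defined(__ARM_NEON)
#define EXAMPLE_ARMV7_NEON 1
#else
#define EXAMPLE_ARMV7_NEON 0
#endif

// 1 if Neon half-float support is advertised (assumed: bit 0x2 of
// __ARM_NEON_FP), else 0.
#if defined(__ARM_NEON_FP) && (__ARM_NEON_FP & 0x2)
#define EXAMPLE_HAVE_HALF_FLOAT 1
#else
#define EXAMPLE_HAVE_HALF_FLOAT 0
#endif

// 1 if fused multiply-add is advertised, else 0.
#if defined(__ARM_FEATURE_FMA) && (__ARM_FEATURE_FMA == 1)
#define EXAMPLE_HAVE_FMA 1
#else
#define EXAMPLE_HAVE_FMA 0
#endif

// On Armv7, Neon is treated as "broken" when either VFPv4 prerequisite
// (half-float or FMA) is missing.
#if EXAMPLE_ARMV7_NEON && !(EXAMPLE_HAVE_HALF_FLOAT && EXAMPLE_HAVE_FMA)
#define EXAMPLE_ARMV7_NEON_BROKEN 1
#else
#define EXAMPLE_ARMV7_NEON_BROKEN 0
#endif

// True if this is an Armv7 Neon build lacking the VFPv4 prerequisites.
constexpr bool ExampleArmv7NeonBroken() {
  return EXAMPLE_ARMV7_NEON_BROKEN != 0;
}

// True if this is an Armv7 Neon build where the Neon code path may be used.
// Always false on other architectures, which this sketch does not cover.
constexpr bool ExampleArmv7NeonUsable() {
  return (EXAMPLE_ARMV7_NEON != 0) && !ExampleArmv7NeonBroken();
}

// Sanity check: a build can never be both "broken" and "usable".
inline void ExampleCheckConsistency() {
  assert(!(ExampleArmv7NeonBroken() && ExampleArmv7NeonUsable()));
}
```

Under these assumptions, building with -mfpu=neon-vfpv3 would be expected to leave ExampleArmv7NeonBroken() true, so the scalar fallback would be chosen, while -mfpu=neon-vfpv4 would make ExampleArmv7NeonUsable() true. Declaring the target broken, instead of enabling it only on positive detection, matches the conservative intent stated in the commit message.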
I believe this is now fully fixed thanks to @jolivain, please feel free to re-open if anything else comes up. |
template<typename Simd, typename Ptr>
static decltype(auto) load(const Simd simd_tag, Ptr p) {
  if constexpr (Aligned) {
    return Load(simd_tag, p);
  } else {
    return LoadU(simd_tag, p);
  }
}

Could you help please? Probably some incorrect/missing flags on aarch64, but maybe you have an idea of what I should look at? Just in case, the repo is public, but I don't expect you will look
hm, I believe the fix for this issue came after the 1.0.3 release that you seem to be using. We will do another release in the next few days. Just to confirm: you are building for aarch64, right? If it's actually Arm V7, then you will want to add |
@jan-wassenberg Thanks for the answer! Yes, it's definitely armv8; we don't have armv7 machines. I commented on this issue because I tried some time ago and ran into it; then I saw it was resolved, decided to try again, and ran into a new issue. In general I really suspect this cmake file is doing something wrong :(
I've tested all kinds of combinations, armv7 and aarch64, without any breakage with current git. What's your toolchain and cpu, please?
Thanks @stefson . I think one problem is indeed that they are overriding the compiler flags with a separate CMake file outside of Highway. Seems that these lines are insufficient and possibly counterproductive: Does it help to replace with -march=armv8.2-a? But another issue is that they are testing with Highway 1.0.3 which came before some of our fixes here. |
@stefson
unfortunately it needs some external packages: icu, boost, lz4 Then I see this from cmake.
So I assume https://github.com/iresearch-toolkit/iresearch/blob/master/cmake/OptimizeForArchitecture.cmake#L622 |
Anyway, thanks. I should probably do something about this broken cmake script before updating Highway.
Agreed, because for us the machines that run the code can have different processors but the same architecture (in general we only support x86/armv8).
I want to remove this script and measure the difference. Should I specify anything additional? Maybe I can include the Highway CMake first, so that it sets all the needed flags for me?
Ah, it's good news that we're hitting the "auto" case, then this CMake file is indeed not doing anything. On GCC+aarch64, Highway trunk works out of the box and generates code for all targets without any required flags: The 1.0.3 release might benefit from you adding |
… dispatch) This reverts the workaround in #834. The root cause was our requiring VFPv4 on Armv7, without reliably being able to detect it. Now, we have the HWY_CMAKE_ARM7 flag to explicitly opt-in and set the required compiler flags, or the check in #1143 that disables NEON when VFPv4 prereqs are detected as missing. Thus it is no longer necessary for us to set VFPv4 attributes on the intrinsics. This caused build failures in runtime-dispatch mode because the first compiled target was HWY_NEON, which added +crypto to intrinsics, but that caused HWY_NEON_WITHOUT_AES to fail because it inlined those intrinsics into non-crypto wrapper functions. PiperOrigin-RevId: 623081868
… dispatch) This reverts the workaround in #834. The root cause was our requiring VFPv4 on Armv7, without reliably being able to detect it. Now, we have the HWY_CMAKE_ARM7 flag to explicitly opt-in and set the required compiler flags, or the check in #1143 that disables NEON when VFPv4 prereqs are detected as missing. Thus it is no longer necessary for us to set VFPv4 attributes on the intrinsics. This caused build failures in runtime-dispatch mode because the first compiled target was HWY_NEON, which added +crypto to intrinsics, but that caused HWY_NEON_WITHOUT_AES to fail because it inlined those intrinsics into non-crypto wrapper functions. PiperOrigin-RevId: 623255087
hi, this is most likely a regression from d8867c9
compiler is gcc-10.4.0 on armhf
full build log: build.log.zip