Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HWY_CMAKE_ARM7:BOOL=ON vs Dynamic Dispatch #1271

Closed
malaterre opened this issue Mar 29, 2023 · 4 comments · Fixed by #1273
Closed

HWY_CMAKE_ARM7:BOOL=ON vs Dynamic Dispatch #1271

malaterre opened this issue Mar 29, 2023 · 4 comments · Fixed by #1273

Comments

@malaterre
Copy link
Contributor

hwy claims to have dynamic dispatch. However when using the cmake flag HWY_CMAKE_ARM7:BOOL=ON the complete codebase is compiled with neon instruction (-mfpu=neon-vfpv4).

It turns out the shared library is taking advantages of this compilation flag, and generate code with neon intructions:

% gdb ./obj-arm-linux-gnueabihf/hwy_list_targets
GNU gdb (Debian 13.1-2) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./obj-arm-linux-gnueabihf/hwy_list_targets...
(gdb) r
Starting program: /home/malat/highway-1.0.3/obj-arm-linux-gnueabihf/hwy_list_targets
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
hwy::(anonymous namespace)::robust_statistics::CountingSort<unsigned long long> (values=0xbeffed08, num_values=256) at ./hwy/nanobenchmark.cc:214
214       std::vector<Unique> unique;
(gdb) bt full
#0  hwy::(anonymous namespace)::robust_statistics::CountingSort<unsigned long long> (values=0xbeffed08, num_values=256) at ./hwy/nanobenchmark.cc:214
        unique = std::vector of length 0, capacity 200277584
        p = <optimized out>
#1  0xb6ff4256 in hwy::(anonymous namespace)::robust_statistics::Mode<unsigned long long> (num_values=256, values=0xbeffed08) at ./hwy/nanobenchmark.cc:286
No locals.
#2  hwy::(anonymous namespace)::robust_statistics::Mode<unsigned long long, 256> (values=...) at ./hwy/nanobenchmark.cc:292
No locals.
#3  hwy::platform::TimerResolution () at ./hwy/nanobenchmark.cc:480
        samples = {1640, 680, 720, 720, 720, 720, 720, 680, 600, 680, 720, 720, 760, 720, 720, 680, 600, 680, 720, 760, 720, 720, 760, 640, 600, 640, 760, 720, 760, 720, 720, 640, 600, 680, 720, 720, 760,
          720, 720, 680, 600, 680, 720, 719, 720, 720, 720, 680, 600, 680, 720, 720, 720, 720, 720, 680, 600, 640, 760, 720, 720, 720, 720, 640, 600, 680, 720, 720, 760, 720, 720, 640, 600, 680, 720, 760,
          720, 720, 720, 640, 600, 640, 720, 720, 720, 720, 720, 640, 600, 680, 720, 720, 720, 720, 760, 640, 600, 680, 760, 720, 760, 720, 720, 680, 599, 680, 720, 720, 720, 720, 720, 680, 640, 680, 720,
          760, 720, 720, 720, 640, 600, 680, 720, 720, 720, 720, 720, 640, 600, 680, 720, 760, 720, 720, 720, 680, 600, 640, 720, 720, 720, 720, 720, 680, 640, 680, 720, 720, 720, 720, 720, 680, 600, 640,
          720, 720, 760, 720, 720, 640, 600, 680, 720, 720, 760, 720, 720, 680, 600, 640, 720, 720, 720, 720, 760, 680, 600, 680, 720, 720, 720, 720, 720, 640, 640, 680, 720, 720, 720, 720, 720, 680, 600,
          680, 720, 720, 720, 720, 720, 680...}
        i = <optimized out>
        t0 = <optimized out>
        t1 = <optimized out>
        rep = <optimized out>
        can_use_stop = true
        repetitions = {0 <repeats 32 times>, 13186041144842129408, 1, 17592186044416, 279787914, 142542138799204, 17592186048511, 0, 1, 142541374619648, 5770084, 263640, 520, 2164, 1680014846, 279787914,
          1680014810, 752179689, 1680014810, 764179556, 1673090989, 0, 0, 4294967550, 0, 13762973729367785472, 13186435773802365071, 13185760231415605028, 1108307720798211, 0, 0, 0, 0, 1108307720798208,
          1103205299907420, 65536, 1389782697508869, 1407804380614656, 281474977038444, 12885159936, 0, 204933669240, 13185811011904471040, 150323855360, 13186435770732249088, 13170777113315222376,
          16089343920, 13762973007813279746, 13185765720249593216, 0, 16089343952, 13762973007813279746, 13185765720249593216, 0, 17595256270208, 16089343880, 13186532544935362560, 0, 13762975155296927744,
          8309530817697147188, 25770127360, 281478044361982, 1407838740021249, 8589934595, 1407841806909548, 65025, 5770084, 65025, 5770084, 4295000484, 0, 0, 263640, 4096, 520, 1680014846, 279787914,
          1673090989, 0, 1680014810, 13170903170605382208, 13186528954342703104, 13170903167535153152, 13185775014558802296, 13762975106961762556, 13185776139974469871, 13186429108013211984, 1, 3204442356,
          3204442383, 0, 4294967306, 7365196392, 1, 13762988919442350416, 2199023255566, 282579962709375, 0, 4297588739, 223338299392, 360292368236413384, 11259024840327220, 4296605722, 0, 1103205299650560,
          21475093340, 4295032832, 1406206652710640, 1597728161520, 25769804156, 8590000128, 1406241012449016, 1133871693560, 25769804040, 17179869188, 1047972020468, 292057776372, 17179869252,
          7238662637146341380, 0, 0, 25769803776, 7238662641441308688, 1406206652710640, 1168231431920, 17179869456, 17179869185, 12884901908, 1295390749551316551, 10439841620871673989, 11396069235890618897,
          68719476740, 24011439870050305, 12884901888, 2, 73014445019, 55834575104, 71198746626085, 23932014281626402, 9944168979650592768, 4632040454706176000, 13873901602337195072, 153122390170800208,
          2341871814990364692, 4614043964063155330, 83334332495831810, 1010620087226897409, 4611687117939015697, 2322250218881419280, 5343807179059625984, 141287514736965, 1152925945713262688,
          4612398501962188802, 5261757287512868864, 2305881045444609312, 558346862592, 10378898593745469700, 738783853203685377, 13762980258003298314, 13172183419976679425, 13762980260922518864,
          13170903170605359480, 13185704267991739608, 0, 13185944611157180416, 3204443540, 13762980260788111365, 3204443540, 13762977632402533780, 13185944571411950376, 7365196300, 13762979917187579656,
          13186435773936692640, 13762979401791206054, 0 <repeats 20 times>, 37429944632, 13762979195636390016, 13185881157310349312, 37429945464, 13762979195636392312, 13185881157310349312,
          13762978852173245608...}
#4  0xb6ff2e0e in __static_initialization_and_destruction_0 (__priority=65535, __initialize_p=1) at ./hwy/nanobenchmark.cc:488
No locals.
#5  _GLOBAL__sub_I_nanobenchmark.cc(void) () at ./hwy/nanobenchmark.cc:763
No locals.
#6  0xb6fd244c in ?? () from /lib/ld-linux-armhf.so.3
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) x/i $pc
=> 0xb6ff3fbe <hwy::(anonymous namespace)::robust_statistics::CountingSort<unsigned long long>(unsigned long long*, size_t)+14>:        vmov.i32        d16, #0 @ 0x00000000
@malaterre
Copy link
Contributor Author

This impact jpeg-xl on Debian/armhf:

% gdb /usr/bin/cjxl
GNU gdb (Debian 13.1-2) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/cjxl...
(No debugging symbols found in /usr/bin/cjxl)
(gdb) r
Starting program: /usr/bin/cjxl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0xb6e31fbe in ?? () from /usr/lib/arm-linux-gnueabihf/libhwy.so.1
(gdb) x/i $pc
=> 0xb6e31fbe:  vmov.i32        d16, #0 @ 0x00000000
(gdb) bt full
#0  0xb6e31fbe in ?? () from /usr/lib/arm-linux-gnueabihf/libhwy.so.1
No symbol table info available.
#1  0xb6e32256 in hwy::platform::TimerResolution() () from /usr/lib/arm-linux-gnueabihf/libhwy.so.1
No symbol table info available.
#2  0xb6e30e0e in ?? () from /usr/lib/arm-linux-gnueabihf/libhwy.so.1
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

@malaterre
Copy link
Contributor Author

malaterre commented Mar 29, 2023

@jan-wassenberg The root issue is that clang (or gcc) did not implement target-gated arm_neon.h intrinsics right ? This is the reason why the complete codebase is compiled with neon instructions, right ?

@malaterre
Copy link
Contributor Author

@jan-wassenberg
Copy link
Member

That's right, HWY_CMAKE_ARM7 is an alternative to dynamic dispatch and was necessary when the headers required a compiler flag. It is indeed expected that using it generates NEON instructions.

Now GCC supports dynamic dispatch, and clang-16 apparently does too but it's not in our package repo yet. As soon as it is and we test it, we can change HWY_HAVE_RUNTIME_DISPATCH to 1 there, and then stop using HWY_CMAKE_ARM7.

At least on Linux, which is currently the only platform on which we are able to detect the Arm CPU capabilities. If someone was interested, we could also support Android.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants