Cross-compilation for RPi3 (armv7) fails on assembly #1465

Closed
Kaned1as opened this issue May 18, 2021 · 16 comments

Comments

@Kaned1as

Trying to cross-compile TFLite 2.5.0 for RPi3 with CMake and XNNPACK enabled:

  • Followed this guide
  • GCC version: 8.3.0, built for RPi3 from Crosstool-NG (profile armv8-rpi3-linux-gnueabihf)
  • Environment: ARMCC_FLAGS="-march=armv7-a -mfpu=neon-vfpv4 -funsafe-math-optimizations"
  • Commit used: 90f520b

The build fails with the following error:

FAILED: _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/src/qs8-igemm/gen/8x16c4-minmax-neondot.c.o 
/home/build/opt/contrib/toolchains/arm-linux-gnueabihf/bin/armv8-rpi3-linux-gnueabihf-gcc -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DFXDIV_USE_INLINE_ASSEMBLY=0 -DNOMINMAX=1 -DPTHREADPOOL_NO_DEPRECATED_API=1 -DXNNPACK_EXPORTS -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_SPARSE=1 -DXNN_LOG_LEVEL=0 -Ixnnpack/include -Ixnnpack/src -Iclog-source/deps/clog/include -Icpuinfo-source/include -Ipthreadpool-source/include -IFXdiv-source/include -IFP16-source/include -march=armv7-a -mfloat-abi=hard -mfpu=neon-vfpv4 -fPIC -funsafe-math-optimizations -frecord-gcc-switches -flax-vector-conversions -fPIC -Wno-psabi -pthread -std=gnu99  -marm  -march=armv8.2-a+dotprod -mfpu=neon-fp-armv8  -O2 -MD -MT _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/src/qs8-igemm/gen/8x16c4-minmax-neondot.c.o -MF _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/src/qs8-igemm/gen/8x16c4-minmax-neondot.c.o.d -o _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/src/qs8-igemm/gen/8x16c4-minmax-neondot.c.o -c xnnpack/src/qs8-igemm/gen/8x16c4-minmax-neondot.c
/tmp/ccMSJOfk.s: Assembler messages:
/tmp/ccMSJOfk.s:380: Error: selected processor does not support `vsdot.s8 q12,q9,d11[0]' in ARM mode
/tmp/ccMSJOfk.s:384: Error: selected processor does not support `vsdot.s8 q6,q9,d10[0]' in ARM mode
/tmp/ccMSJOfk.s:395: Error: selected processor does not support `vsdot.s8 q7,q12,d6[0]' in ARM mode
/tmp/ccMSJOfk.s:400: Error: selected processor does not support `vsdot.s8 q10,q12,d11[0]' in ARM mode
/tmp/ccMSJOfk.s:401: Error: selected processor does not support `vsdot.s8 q10,q8,d11[1]' in ARM mode
/tmp/ccMSJOfk.s:405: Error: selected processor does not support `vsdot.s8 q7,q8,d6[1]' in ARM mode
/tmp/ccMSJOfk.s:408: Error: selected processor does not support `vsdot.s8 q10,q12,d10[0]' in ARM mode
/tmp/ccMSJOfk.s:409: Error: selected processor does not support `vsdot.s8 q10,q8,d10[1]' in ARM mode
/tmp/ccMSJOfk.s:415: Error: selected processor does not support `vsdot.s8 q10,q12,d2[0]' in ARM mode
/tmp/ccMSJOfk.s:416: Error: selected processor does not support `vsdot.s8 q10,q8,d2[1]' in ARM mode
/tmp/ccMSJOfk.s:422: Error: selected processor does not support `vsdot.s8 q10,q12,d3[0]' in ARM mode
/tmp/ccMSJOfk.s:423: Error: selected processor does not support `vsdot.s8 q10,q8,d3[1]' in ARM mode
/tmp/ccMSJOfk.s:429: Error: selected processor does not support `vsdot.s8 q10,q12,d4[0]' in ARM mode
/tmp/ccMSJOfk.s:430: Error: selected processor does not support `vsdot.s8 q10,q8,d4[1]' in ARM mode
/tmp/ccMSJOfk.s:435: Error: selected processor does not support `vsdot.s8 q11,q12,d7[0]' in ARM mode
/tmp/ccMSJOfk.s:438: Error: selected processor does not support `vsdot.s8 q10,q12,d5[0]' in ARM mode
/tmp/ccMSJOfk.s:439: Error: selected processor does not support `vsdot.s8 q10,q8,d5[1]' in ARM mode
/tmp/ccMSJOfk.s:445: Error: selected processor does not support `vsdot.s8 q0,q9,d2[0]' in ARM mode
/tmp/ccMSJOfk.s:448: Error: selected processor does not support `vsdot.s8 q15,q9,d3[0]' in ARM mode
/tmp/ccMSJOfk.s:449: Error: selected processor does not support `vsdot.s8 q12,q8,d7[1]' in ARM mode
/tmp/ccMSJOfk.s:454: Error: selected processor does not support `vsdot.s8 q14,q9,d4[0]' in ARM mode
/tmp/ccMSJOfk.s:457: Error: selected processor does not support `vsdot.s8 q13,q9,d5[0]' in ARM mode
/tmp/ccMSJOfk.s:460: Error: selected processor does not support `vsdot.s8 q10,q9,d6[0]' in ARM mode
/tmp/ccMSJOfk.s:463: Error: selected processor does not support `vsdot.s8 q11,q9,d7[0]' in ARM mode
/tmp/ccMSJOfk.s:467: Error: selected processor does not support `vsdot.s8 q0,q8,d2[1]' in ARM mode
/tmp/ccMSJOfk.s:468: Error: selected processor does not support `vsdot.s8 q15,q8,d3[1]' in ARM mode
/tmp/ccMSJOfk.s:469: Error: selected processor does not support `vsdot.s8 q14,q8,d4[1]' in ARM mode
/tmp/ccMSJOfk.s:470: Error: selected processor does not support `vsdot.s8 q13,q8,d5[1]' in ARM mode
/tmp/ccMSJOfk.s:471: Error: selected processor does not support `vsdot.s8 q10,q8,d6[1]' in ARM mode
/tmp/ccMSJOfk.s:472: Error: selected processor does not support `vsdot.s8 q9,q8,d7[1]' in ARM mode
/tmp/ccMSJOfk.s:476: Error: selected processor does not support `vsdot.s8 q12,q8,d11[1]' in ARM mode
/tmp/ccMSJOfk.s:477: Error: selected processor does not support `vsdot.s8 q6,q8,d10[1]' in ARM mode
/tmp/ccMSJOfk.s:480: Error: selected processor does not support `vsdot.s8 q11,q8,d11[0]' in ARM mode
/tmp/ccMSJOfk.s:486: Error: selected processor does not support `vsdot.s8 q4,q8,d10[0]' in ARM mode
/tmp/ccMSJOfk.s:491: Error: selected processor does not support `vsdot.s8 q0,q8,d2[0]' in ARM mode
/tmp/ccMSJOfk.s:496: Error: selected processor does not support `vsdot.s8 q15,q8,d3[0]' in ARM mode
/tmp/ccMSJOfk.s:501: Error: selected processor does not support `vsdot.s8 q14,q8,d4[0]' in ARM mode
/tmp/ccMSJOfk.s:506: Error: selected processor does not support `vsdot.s8 q13,q8,d5[0]' in ARM mode
/tmp/ccMSJOfk.s:509: Error: selected processor does not support `vsdot.s8 q10,q8,d6[0]' in ARM mode
/tmp/ccMSJOfk.s:512: Error: selected processor does not support `vsdot.s8 q9,q8,d7[0]' in ARM mode
/tmp/ccMSJOfk.s:514: Error: selected processor does not support `vsdot.s8 q11,q8,d11[1]' in ARM mode
/tmp/ccMSJOfk.s:515: Error: selected processor does not support `vsdot.s8 q4,q8,d10[1]' in ARM mode
/tmp/ccMSJOfk.s:516: Error: selected processor does not support `vsdot.s8 q0,q8,d2[1]' in ARM mode
/tmp/ccMSJOfk.s:517: Error: selected processor does not support `vsdot.s8 q15,q8,d3[1]' in ARM mode
/tmp/ccMSJOfk.s:518: Error: selected processor does not support `vsdot.s8 q14,q8,d4[1]' in ARM mode
/tmp/ccMSJOfk.s:519: Error: selected processor does not support `vsdot.s8 q13,q8,d5[1]' in ARM mode
/tmp/ccMSJOfk.s:520: Error: selected processor does not support `vsdot.s8 q10,q8,d6[1]' in ARM mode
/tmp/ccMSJOfk.s:521: Error: selected processor does not support `vsdot.s8 q9,q8,d7[1]' in ARM mode
/tmp/ccMSJOfk.s:528: Error: selected processor does not support `vsdot.s8 q9,q8,d11[0]' in ARM mode
/tmp/ccMSJOfk.s:542: Error: selected processor does not support `vsdot.s8 q4,q8,d10[0]' in ARM mode
/tmp/ccMSJOfk.s:547: Error: selected processor does not support `vsdot.s8 q0,q8,d2[0]' in ARM mode
/tmp/ccMSJOfk.s:550: Error: selected processor does not support `vsdot.s8 q15,q8,d3[0]' in ARM mode
/tmp/ccMSJOfk.s:553: Error: selected processor does not support `vsdot.s8 q14,q8,d4[0]' in ARM mode
/tmp/ccMSJOfk.s:556: Error: selected processor does not support `vsdot.s8 q13,q8,d5[0]' in ARM mode
/tmp/ccMSJOfk.s:559: Error: selected processor does not support `vsdot.s8 q10,q8,d6[0]' in ARM mode
/tmp/ccMSJOfk.s:562: Error: selected processor does not support `vsdot.s8 q9,q8,d7[0]' in ARM mode
/tmp/ccMSJOfk.s:564: Error: selected processor does not support `vsdot.s8 q11,q8,d11[1]' in ARM mode
/tmp/ccMSJOfk.s:565: Error: selected processor does not support `vsdot.s8 q4,q8,d10[1]' in ARM mode
/tmp/ccMSJOfk.s:566: Error: selected processor does not support `vsdot.s8 q0,q8,d2[1]' in ARM mode
/tmp/ccMSJOfk.s:567: Error: selected processor does not support `vsdot.s8 q15,q8,d3[1]' in ARM mode
/tmp/ccMSJOfk.s:568: Error: selected processor does not support `vsdot.s8 q14,q8,d4[1]' in ARM mode
/tmp/ccMSJOfk.s:569: Error: selected processor does not support `vsdot.s8 q13,q8,d5[1]' in ARM mode
/tmp/ccMSJOfk.s:570: Error: selected processor does not support `vsdot.s8 q10,q8,d6[1]' in ARM mode
/tmp/ccMSJOfk.s:571: Error: selected processor does not support `vsdot.s8 q9,q8,d7[1]' in ARM mode
/tmp/ccMSJOfk.s:1256: Error: selected processor does not support `vsdot.s8 q12,q9,d3[0]' in ARM mode
/tmp/ccMSJOfk.s:1257: Error: selected processor does not support `vsdot.s8 q6,q9,d2[0]' in ARM mode
/tmp/ccMSJOfk.s:1262: Error: selected processor does not support `vsdot.s8 q8,q9,d1[0]' in ARM mode
/tmp/ccMSJOfk.s:1268: Error: selected processor does not support `vsdot.s8 q8,q9,d0[0]' in ARM mode
/tmp/ccMSJOfk.s:1275: Error: selected processor does not support `vsdot.s8 q10,q8,d3[0]' in ARM mode
/tmp/ccMSJOfk.s:1282: Error: selected processor does not support `vsdot.s8 q10,q8,d2[0]' in ARM mode
/tmp/ccMSJOfk.s:1288: Error: selected processor does not support `vsdot.s8 q10,q8,d1[0]' in ARM mode
/tmp/ccMSJOfk.s:1293: Error: selected processor does not support `vsdot.s8 q10,q8,d0[0]' in ARM mode
/tmp/ccMSJOfk.s:1298: Error: selected processor does not support `vsdot.s8 q10,q8,d7[0]' in ARM mode
/tmp/ccMSJOfk.s:1303: Error: selected processor does not support `vsdot.s8 q10,q9,d7[0]' in ARM mode
/tmp/ccMSJOfk.s:1308: Error: selected processor does not support `vsdot.s8 q10,q8,d6[0]' in ARM mode
/tmp/ccMSJOfk.s:1314: Error: selected processor does not support `vsdot.s8 q10,q9,d6[0]' in ARM mode
/tmp/ccMSJOfk.s:1318: Error: selected processor does not support `vsdot.s8 q7,q8,d5[0]' in ARM mode
/tmp/ccMSJOfk.s:1322: Error: selected processor does not support `vsdot.s8 q10,q9,d5[0]' in ARM mode
/tmp/ccMSJOfk.s:1329: Error: selected processor does not support `vsdot.s8 q10,q8,d4[0]' in ARM mode
/tmp/ccMSJOfk.s:1333: Error: selected processor does not support `vsdot.s8 q8,q9,d4[0]' in ARM mode
/tmp/ccMSJOfk.s:1342: Error: selected processor does not support `vsdot.s8 q9,q8,d2[0]' in ARM mode
/tmp/ccMSJOfk.s:1343: Error: selected processor does not support `vsdot.s8 q11,q8,d3[0]' in ARM mode
/tmp/ccMSJOfk.s:1348: Error: selected processor does not support `vsdot.s8 q9,q8,d1[0]' in ARM mode
/tmp/ccMSJOfk.s:1353: Error: selected processor does not support `vsdot.s8 q9,q8,d0[0]' in ARM mode
/tmp/ccMSJOfk.s:1358: Error: selected processor does not support `vsdot.s8 q9,q8,d7[0]' in ARM mode
/tmp/ccMSJOfk.s:1363: Error: selected processor does not support `vsdot.s8 q9,q8,d6[0]' in ARM mode
/tmp/ccMSJOfk.s:1368: Error: selected processor does not support `vsdot.s8 q9,q8,d5[0]' in ARM mode
/tmp/ccMSJOfk.s:1373: Error: selected processor does not support `vsdot.s8 q9,q8,d4[0]' in ARM mode
/tmp/ccMSJOfk.s:1379: Error: selected processor does not support `vsdot.s8 q9,q8,d3[0]' in ARM mode
/tmp/ccMSJOfk.s:1384: Error: selected processor does not support `vsdot.s8 q9,q8,d2[0]' in ARM mode
/tmp/ccMSJOfk.s:1389: Error: selected processor does not support `vsdot.s8 q9,q8,d1[0]' in ARM mode
/tmp/ccMSJOfk.s:1394: Error: selected processor does not support `vsdot.s8 q9,q8,d0[0]' in ARM mode
/tmp/ccMSJOfk.s:1399: Error: selected processor does not support `vsdot.s8 q9,q8,d7[0]' in ARM mode
/tmp/ccMSJOfk.s:1404: Error: selected processor does not support `vsdot.s8 q9,q8,d6[0]' in ARM mode
/tmp/ccMSJOfk.s:1409: Error: selected processor does not support `vsdot.s8 q9,q8,d5[0]' in ARM mode
/tmp/ccMSJOfk.s:1414: Error: selected processor does not support `vsdot.s8 q9,q8,d4[0]' in ARM mode
ninja: build stopped: subcommand failed.
@Maratyszcza
Contributor

This is a toolchain issue. Either the assembler doesn't support NEON dot product instructions, or the compiler emits wrong directives for the assembler.

@happyalu

happyalu commented Jul 12, 2021

I ran into this also, when building tensorflow-lite 2.5.0 with a crosstool-ng based toolchain I had built.

I then used the gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf toolchain that's listed as a download on tensorflow's cross compilation with cmake page. That build succeeded. The latest toolchain (10.2-2020.11) on the arm toolchain download page also failed to build xnnpack.

I then checked to see what the differences were, for the file that has the error: 8x16c4-minmax-neondot.c.

My toolchain had gcc 8.4, whereas the suggested toolchain has gcc 8.3.
However, when I compared the assembly generated by both compilers, it was almost identical. So it seemed to be an assembler issue.

My toolchain was using binutils 2.36.1; the suggested toolchain was using binutils 2.32. I replaced the binutils in my toolchain with 2.32 and the xnnpack build succeeded with that. A bit of git bisect led me to this commit in binutils that changes the behavior of the assembler when it sees a .fpu directive. In particular, with +dotprod and vsdot, the following happens:

file test.s:

        .arch armv8.2-a
        .arch_extension dotprod

        vsdot.s8 q12,q9,d11[0]  @OK

        .fpu neon-fp-armv8      @RESETS_ARCH_EXTENSIONS_REMOVING_DOTPROD_SUPPORT

        vsdot.s8 q12,q9,d11[0]  @ERROR

        .arch_extension dotprod
        vsdot.s8 q12,q9,d11[0]  @OK_AGAIN
$ ../binutils-2.33.1/build/arm-linux-gnueabihf/bin/as test.s
$ ../binutils-2.34/build/arm-linux-gnueabihf/bin/as test.s
test.s: Assembler messages:
test.s:8: Error: selected processor does not support `vsdot.s8 q12,q9,d11[0]' in ARM mode

GCC 8, 9 and 10 all generate similar assembly, with the extension listed first and the .fpu directive after it, which causes the assembler to reject this input.

So I think the workaround at the moment is either to switch to binutils 2.33.1, or possibly to patch the latest binutils to temporarily disable the .fpu reset behavior.

I'll try to report this to binutils.
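
The test.s case above suggests one possible post-processing workaround: re-issue each `.arch_extension` right after every `.fpu` directive. Below is a minimal Python sketch of that idea, assuming the compiler output is saved with `-S` and assembled in a separate step. The function name `reassert_arch_extensions` is hypothetical and not part of any toolchain, and treating `.arch` as clearing the extension list is an assumption of this sketch, not documented assembler behavior.

```python
def reassert_arch_extensions(asm_text: str) -> str:
    """Return asm_text with active .arch_extension lines repeated after each .fpu.

    Models the behavior reported in this thread for binutils >= 2.34,
    where a .fpu directive drops previously enabled extensions.
    """
    active = []  # extensions enabled so far, in order of first appearance
    out = []
    for line in asm_text.splitlines():
        out.append(line)
        # Strip '@' comments (ARM GNU as syntax) before looking at the directive.
        tokens = line.split("@", 1)[0].split()
        if not tokens:
            continue
        if tokens[0] == ".arch":
            # Assumption: a new .arch starts again with no extensions enabled.
            active.clear()
        elif tokens[0] == ".arch_extension" and len(tokens) > 1:
            if tokens[1] not in active:
                active.append(tokens[1])
        elif tokens[0] == ".fpu":
            # Re-enable everything that the .fpu directive may have reset.
            out.extend(f"\t.arch_extension {ext}" for ext in active)
    return "\n".join(out) + "\n"
```

Applied to the first half of test.s, this would place a second `.arch_extension dotprod` directly after `.fpu neon-fp-armv8`, which matches the hand-written variant that assembles again. It is only a stopgap for experimentation; a fixed toolchain is the real solution.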

@Maratyszcza
Contributor

@happyalu Thank you for investigating! I don't think it is possible to work around this issue on XNNPACK side, please report to binutils.

@happyalu

Thanks.

Reported here. https://sourceware.org/bugzilla/show_bug.cgi?id=28078

@hotung1027

@happyalu @Maratyszcza,
I am currently cross-compiling on an Ubuntu x86_64 host for a Raspberry Pi 3 B+ running a 32-bit OS (armv7l, backward compatible),
and the assembler generates the same error message:

/tmp/cc3qZk8N.s: Assembler messages:
/tmp/cc3qZk8N.s:67: Error: selected processor does not support `vsdot.s8 q8,q13,d7[0]' in ARM mode
/tmp/cc3qZk8N.s:70: Error: selected processor does not support `vsdot.s8 q11,q12,d7[0]' in ARM mode
/tmp/cc3qZk8N.s:73: Error: selected processor does not support `vsdot.s8 q9,q12,d7[0]' in ARM mode
/tmp/cc3qZk8N.s:76: Error: selected processor does not support `vsdot.s8 q10,q12,d7[0]' in ARM mode
/tmp/cc3qZk8N.s:79: Error: selected processor does not support `vsdot.s8 q8,q12,d7[1]' in ARM mode
/tmp/cc3qZk8N.s:82: Error: selected processor does not support `vsdot.s8 q11,q12,d7[1]' in ARM mode
/tmp/cc3qZk8N.s:85: Error: selected processor does not support `vsdot.s8 q9,q12,d7[1]' in ARM mode
/tmp/cc3qZk8N.s:89: Error: selected processor does not support `vsdot.s8 q10,q12,d7[1]' in ARM mode
/tmp/cc3qZk8N.s:161: Error: selected processor does not support `vsdot.s8 q8,q13,d7[0]' in ARM mode
/tmp/cc3qZk8N.s:163: Error: selected processor does not support `vsdot.s8 q11,q12,d7[0]' in ARM mode
/tmp/cc3qZk8N.s:166: Error: selected processor does not support `vsdot.s8 q9,q12,d7[0]' in ARM mode
/tmp/cc3qZk8N.s:169: Error: selected processor does not support `vsdot.s8 q10,q12,d7[0]' in ARM mode
/tmp/cc3qZk8N.s:174: Error: selected processor does not support `vsdot.s8 q8,q12,d7[1]' in ARM mode
/tmp/cc3qZk8N.s:177: Error: selected processor does not support `vsdot.s8 q11,q12,d7[1]' in ARM mode
/tmp/cc3qZk8N.s:180: Error: selected processor does not support `vsdot.s8 q9,q12,d7[1]' in ARM mode
/tmp/cc3qZk8N.s:184: Error: selected processor does not support `vsdot.s8 q10,q12,d7[1]' in ARM mode

I checked the official Arm compiler documentation and found that the vsdot instruction is only supported on ARMv8 or later:
https://developer.arm.com/documentation/100069/0608/Advanced-SIMD-Instructions--32-bit-/VSDOT--vector-

It seems the only way to get a build is to avoid XNNPACK's vectorized operations, but then there is no performance gain.
The alternative is to use a 64-bit image, so that ARMv8-compatible instructions are available.

Hope this helps.

@Maratyszcza
Contributor

I checked the official Arm compiler documentation and found that the vsdot instruction is only supported on ARMv8 or later

Cortex-A72 cores in Raspberry Pi 4 are ARMv8 cores.

@happyalu

Update from the upstream issue: This has now been fixed in gcc (master, and active versions), thanks to Richard Earnshaw!

@danielmanu93

Hi everyone, I get a similar error at some point when building with Python 3 on an ODROID XU4 using the command "python3 setup.py build". I am new to installing PyTorch on hardware. Any ideas for fixing this would be appreciated. Thanks

@cwg968

cwg968 commented Dec 1, 2021

me too!!!

@happyalu

happyalu commented Dec 1, 2021

Until the next patch release of gcc is out, I'm not sure of any workaround besides using binutils <= 2.33.1.
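
To check which side of that boundary a given assembler falls on, something like the following Python sketch could parse the first line of `as --version`. The helper names are hypothetical, the sample banner strings are illustrative rather than copied from a real toolchain, and the 2.34 threshold is taken only from the test.s experiment earlier in this thread (2.33.1 accepts the input, 2.34 rejects it).

```python
import re


def parse_binutils_version(version_line: str) -> tuple:
    """Extract (major, minor, patch) from the first line of `as --version`.

    The version is read from the text after the last ')' so that a vendor
    string such as '(crosstool-NG 1.24.0)' is not mistaken for it.
    """
    tail = version_line.rsplit(")", 1)[-1]
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", tail)
    if match is None:
        raise ValueError(f"no version number found in: {version_line!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def fpu_resets_extensions(version: tuple) -> bool:
    """True if this binutils is in the range reported above as affected.

    Assumption: the behavior change first appears in 2.34, as observed
    in this thread; this is not taken from the binutils changelog.
    """
    return version >= (2, 34, 0)
```

For example, a banner like "GNU assembler (GNU Binutils) 2.33.1" would parse to (2, 33, 1) and be reported as unaffected, while 2.36.1 would be flagged. Note that a flagged assembler is only a problem in combination with a gcc that still emits the directives in the old order.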

law0 pushed a commit to law0/raspi_dms that referenced this issue Dec 11, 2021
Simplified and cleaned build scripts
Cleaned unnecessary *.pc files
Unified toolchain (using tensorflow one.
    Mandatory because XNNPACK needs this specific toolchain
    see : google/XNNPACK#1465 (comment))

All build fine but untested on raspberry
law0 pushed a commit to law0/raspi_dms that referenced this issue Jan 17, 2022
Tflite specific toolchain was incompatible with opencv build.
It kept failing with errors such as the following (I couldn't find any fix, despite googling it everywhere):
.../c++/8.3.0/ext/concurrence.h:122:34: error: '__PTHREAD_SPINS' was not declared in this scope
     __gthread_mutex_t _M_mutex = __GTHREAD_MUTEX_INIT;
                                  ^~~~~~~~~~~~~~~~~~~~

However, the Ubuntu 20.04 crossbuild-essential-armhf toolchain was also
incompatible with the tensorflow lite build, due to the issue mentioned in:
google/XNNPACK#1465 (comment)

This fix is a hack. I use the Ubuntu 20.04 crossbuild-essential-armhf
toolchain but replace the assembler with the one coming from the
tensorflow lite toolchain (binutils 2.32). Luckily, all is done within
the Docker image.

Both builds, local and on Raspberry Pi, work and run now.
maxisoft added a commit to maxisoft/pytorch-arm that referenced this issue Nov 17, 2022
@misterBart

Well, I believe the patch has been applied to gcc >= 9. I used Crosstool-NG to build an aarch64 toolchain with gcc 12.3 and binutils 2.29.1, but building XNNPACK still gives me these errors:

Assembler messages:
Error: unknown architectural extension `i8mm'
Error: unrecognized option -march=armv8.2-a+i8mm
gmake[2]: *** [CMakeFiles/microkernels-prod.dir/build.make:202: CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neoni8mm.c.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:1413: CMakeFiles/microkernels-prod.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
Assembler messages:
Error: unknown architectural extension `bf16'
Error: unrecognized option -march=armv8.2-a+bf16
gmake[2]: *** [CMakeFiles/microkernels-all.dir/build.make:51792: CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-2x4c8-minmax-neonbf16-bfmlal.c.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
Assembler messages:
Error: unknown architectural extension `bf16'
Error: unrecognized option -march=armv8.2-a+bf16

What works for me is gcc >= 9 with binutils >= 2.34. This is somewhat unfortunate because it prevents me from choosing a glibc with a low version number, and consequently I cannot use XNNPACK on all client systems.

Does anybody have experience with, or thoughts about, this issue that have not yet been mentioned in this thread?

@Maratyszcza
Contributor

Maratyszcza commented Aug 15, 2023

@misterBart configure with -DXNNPACK_ENABLE_ARM_BF16=OFF and -DXNNPACK_ENABLE_ARM_I8MM=OFF. This will cost you some performance on newest systems though.
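
For a build script that has to support both old and new assemblers, the choice of these two options could be automated. Below is a minimal Python sketch; the function name `xnnpack_cmake_args` is hypothetical, and the 2.34 cutoff is an assumption based only on the report in this thread that binutils 2.29.1 rejects `+i8mm` and `+bf16` while binutils >= 2.34 works.

```python
def xnnpack_cmake_args(binutils_version: tuple, source_dir: str = ".") -> list:
    """Build a cmake argument list, disabling BF16/I8MM kernels for old binutils.

    binutils_version is a (major, minor, patch) tuple, e.g. (2, 29, 1).
    """
    args = ["cmake", source_dir]
    # Assumed cutoff: assemblers older than 2.34 do not know +bf16 / +i8mm.
    if binutils_version < (2, 34, 0):
        args += [
            "-DXNNPACK_ENABLE_ARM_BF16=OFF",
            "-DXNNPACK_ENABLE_ARM_I8MM=OFF",
        ]
    return args
```

The returned list can be passed to `subprocess.run` as-is. With (2, 29, 1) both options are appended; with (2, 34, 0) or newer the list contains only the plain cmake invocation, so the faster kernels stay enabled.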

@misterBart

misterBart commented Aug 16, 2023

@Maratyszcza Yes, good that you mention that, because I used it as a workaround to build for low-glibc systems. Still, using those flags felt like a workaround and made me wonder how future-proof that solution is.
I was hoping that someone in this thread would have some extra insight into the combination of gcc, binutils, and glibc that I do not have. Something that would also work for software that does not have flags like -DXNNPACK_ENABLE_ARM_BF16=OFF and -DXNNPACK_ENABLE_ARM_I8MM=OFF.

@Maratyszcza
Contributor

You can't build software that uses recent CPU instructions with a compiler that doesn't support those instructions. The only solution is to disable the use of instructions not supported by the compiler. This is exactly what -DXNNPACK_ENABLE_ARM_BF16=OFF (and similar options) does.

@misterBart

Don't get me wrong, I am using a new compiler (gcc 12.3). It's glibc that I would like to keep old when building.

@Maratyszcza
Contributor

In addition to the compiler, you may need to use a new version of binutils, to avoid issues like in the first post.
