[NNUE] Android building issue #2860

Closed
AlexB123 opened this issue Jul 28, 2020 · 98 comments

@AlexB123

Hello SF team! I have an issue with Android compilation. I'm using the NDK's Clang 9.0.8 (the latest available) on 64-bit Windows. During compilation I get this error:
[Screenshot: compile error]
After I changed line 331 (in misc.cpp) to "std_aligned_alloc", I managed to build the Android engine.
Regarding the nets: where should I put the nn.bin file (e.g. a separate folder, "eval/nn.bin")? And what is the correct name of the net, nn.bin or nn.nnue?
Thank you!!

@AlexB123
Author

Hi again! As I mentioned above, I built the Android engine by changing the line to "std_aligned_alloc".
The engine works, but only without the network; if I enable the network option, the engine crashes instantly. Any ideas how to fix this?
Screenshot_20200729-231443

Btw, why must the network be named "nn-c157e0a5755b.nnue"? Wouldn't it be much easier to call it "nn.nnue"?

@mstembera
Contributor

You can create a PR here https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip adding an ARM section to std_aligned_alloc() and std_aligned_free() in misc.cpp. The net is named with the first 12 characters of the SHA256 hash. This is so that when the default nets change we can uniquely tell them apart. You can use any name you like for your custom net and specify it as a UCI option.
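To make the naming scheme concrete, here is a small C++ sketch, assuming the SHA-256 digest of the net file has already been computed elsewhere (for example with sha256sum). `default_net_name` and `select_net_command` are hypothetical helpers, not Stockfish functions; the option name `EvalFile` is the one Stockfish 12 uses, and other builds or forks may differ.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Builds a default-style net file name: "nn-" + first 12 hex characters of the
// SHA-256 digest + ".nnue". The digest is assumed to be computed elsewhere
// (e.g. with sha256sum) and passed in as 64 hex characters.
// Returns an empty string if the input does not look like such a digest.
std::string default_net_name(const std::string& sha256_hex) {
    if (sha256_hex.size() != 64)
        return "";
    for (char c : sha256_hex)
        if (!std::isxdigit(static_cast<unsigned char>(c)))
            return "";
    return "nn-" + sha256_hex.substr(0, 12) + ".nnue";
}

// A custom net may have any file name; it is selected with a UCI option.
// "EvalFile" is the option name used by Stockfish 12 (assumption for other builds).
std::string select_net_command(const std::string& path) {
    return "setoption name EvalFile value " + path;
}
```

Because only the first 12 hex characters are kept, two different nets get different default names with overwhelming probability, while a custom net can still be loaded under any name through the option.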

@AlexB123
Author

I've tried this change, but it didn't work...
[Screenshots: Sf error1, Sf error2]

@AlexB123
Author

> You can create a PR here https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip adding an ARM section to std_aligned_alloc() and std_aligned_free() in misc.cpp. The net is named with the first 12 characters of the SHA256 hash. This is so that when the default nets change we can uniquely tell them apart. You can use any name you like for your custom net and specify it as a UCI option.

Hello, thank you for the feedback! I don't know how to create a PR.
I'm not a programmer, so could you give an example of how to add the ARM section to the functions std_aligned_alloc() and std_aligned_free() in misc.cpp (I mean, in what order)? I tried it like this, but it didn't work; apparently I did something wrong.
[Screenshots: SF1, SF2]
Thank you!

@AlexB123
Author

Sorry, I forgot to mention: I built SF NNUE from nodchip's source code https://github.com/nodchip/Stockfish. The engine works fine and reads nn.bin, although it is much slower than normal SF, and without nn.bin the engine plays horribly; look at the analysis.
[Screenshot: analysis]

@MichaelB7
Contributor

Yes - his fork was designed for NNUE only - will not play well without bin.

@AlexB123
Author

> Yes - his fork was designed for NNUE only - will not play well without bin.

Thanks, I didn't know that. I thought that without nn.bin it was supposed to play like normal SF; anyway, now I know. :-)

@MichaelB7
Contributor

MichaelB7 commented Jul 30, 2020

Yes, he made two different executables; the Stockfish team is making it a UCI option, so a single executable can play both ways.

mstembera pushed a commit to mstembera/Stockfish that referenced this issue Jul 31, 2020
@mstembera
Contributor

mstembera commented Jul 31, 2020

@AlexB123 I created the PR for you here: #2872. Thanks.

Edit: Actually, I may have done it wrong. Is this an ARM thing or an Android thing? We can use either
defined(IS_ARM)
or
defined(__ANDROID__)
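The two macros answer different questions, which the following C++ sketch illustrates under the assumption of a GCC- or Clang-based toolchain: `__ANDROID__` is predefined by the NDK compilers and identifies the OS, while `__arm__` and `__aarch64__` identify 32- and 64-bit ARM CPUs. `IS_ARM` from the comment above is not checked here; `kIsAndroid`, `kIsArm` and `describe_target` are illustrative names only.

```cpp
#include <cassert>
#include <string>

// __ANDROID__ identifies the operating system (set by the NDK toolchains),
// while __arm__ (32-bit) and __aarch64__ (64-bit) identify the CPU family.
// Neither implies the other: Android also runs on x86, and ARM boards such as
// the Raspberry Pi usually run ordinary Linux.
constexpr bool kIsAndroid =
#if defined(__ANDROID__)
    true;
#else
    false;
#endif

constexpr bool kIsArm =
#if defined(__arm__) || defined(__aarch64__)
    true;
#else
    false;
#endif

// Returns e.g. "Android/ARM" or "non-Android/non-ARM" for the current build.
std::string describe_target() {
    std::string os  = kIsAndroid ? "Android" : "non-Android";
    std::string cpu = kIsArm ? "ARM" : "non-ARM";
    return os + "/" + cpu;
}
```

The fix eventually referenced in this thread moved to posix_memalign for platforms, "in particular android", that do not fully support C++17 std::aligned_alloc(), i.e. it is an OS/library property rather than a CPU one.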

@vondele vondele changed the title from "Android building issue" to "[NNUE] Android building issue" on Aug 2, 2020
@vondele vondele added the NNUE label Aug 2, 2020
@AlexB123
Author

AlexB123 commented Aug 6, 2020

Hello guys, just wanted to let you know that the Android version still crashes. :(
Flags that I use to build the engine:
set "compiler_options=-m64 -march=armv8-a -DIS_64BIT -fPIE -Wl,-pie -lm -DUSE_POPCNT -DNO_PREFETCH -DUSE_NEON -O3 -flto -static-libstdc++ -std=c++17 -fno-strict-aliasing -fno-strict-overflow -ffunction-sections -fdata-sections -Wl,--gc-sections -Wl,-s"

Btw, with nodchip's old source code https://github.com/nodchip/Stockfish, the engine builds and works with the following flags:
set "compiler_options=-m64 -march=armv8-a -DIS_64BIT -fPIE -Wl,-pie -lm -DUSE_POPCNT -DEVAL_NNUE -DENABLE_TEST_CMD -fopenmp -O3 -flto -static-libstdc++ -std=c++17 -fno-strict-aliasing -fno-strict-overflow -ffunction-sections -fdata-sections -Wl,--gc-sections -Wl,-s"
Maybe this can somehow help you solve the issue.

Thank you!

@vondele
Member

vondele commented Aug 6, 2020

So, it compiles but crashes at runtime?

Edit: at which point does the crash happen, i.e. do you have any output, and which UCI commands do you send?

@AlexB123
Author

AlexB123 commented Aug 7, 2020

> So, it compiles but crashes at runtime?
>
> Edit: at which point does the crash happen, i.e. do you have any output, and which UCI commands do you send?

Hi vondele! The engine compiles with a small correction in misc.cpp line 329, changing it to "return std_aligned_alloc(alignment, size);", or with the changes from af6473a.
The engine works fine in DroidFish, as normal Stockfish. But when I enable the "Use NNUE" option in the engine's settings, it crashes instantly with the message "engine terminated".
Screenshot_20200807-170521
Screenshot_20200807-170408

I have a feeling that some flag responsible for enabling NNUE in the engine is missing from the Makefile. Since I don't know which flag that is, I don't know what to write in the batch file, so the compiler generates a normal engine that is not able to use NNUE. Or else NDK's Clang is unable to work with NNUE.
I don't know how else to explain these crashes.

@vondele
Member

vondele commented Aug 7, 2020

@AlexB123 that change you make (i.e. calling std_aligned_alloc) is not OK. It will compile but crash. Can you try instead of your change the change proposed #2927
i.e. https://github.com/official-stockfish/Stockfish/pull/2927/files

@AlexB123
Author

AlexB123 commented Aug 7, 2020

Command execution in SManager for Android:
bench
Screenshot_20200807-180309

uci
Screenshot_20200807-180404

setoption name Use NNUE value true
Screenshot_20200807-180647
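The command that triggers the crash follows the UCI form `setoption name <id> [value <x>]`, where the option name may contain spaces. A minimal C++ sketch of how such a line splits into name and value is shown below; `parse_setoption` is a hypothetical helper and is much simpler than Stockfish's own UCI handling.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct SetOption {
    std::string name;
    std::string value;
};

// Splits a UCI "setoption name <id> [value <x>]" command into name and value.
// Option names may contain spaces ("Use NNUE"), so every token between
// "name" and "value" belongs to the name. Returns false for other commands.
bool parse_setoption(const std::string& line, SetOption& out) {
    std::istringstream is(line);
    std::string token;
    if (!(is >> token) || token != "setoption")
        return false;
    if (!(is >> token) || token != "name")
        return false;

    out = SetOption{};
    std::string* target = &out.name;  // fill the name first, then the value
    while (is >> token) {
        if (token == "value" && target == &out.name) {
            target = &out.value;
            continue;
        }
        if (!target->empty())
            *target += ' ';
        *target += token;
    }
    return !out.name.empty();
}
```

For the command above, the sketch yields the name `Use NNUE` and the value `true`; the crash therefore happens in what the engine does with the option, not in parsing it.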

@AlexB123
Author

AlexB123 commented Aug 7, 2020

> @AlexB123 that change you make (i.e. calling std_aligned_alloc) is not OK. It will compile but crash. Can you try instead of your change the change proposed #2927
> i.e. https://github.com/official-stockfish/Stockfish/pull/2927/files

OK, I'll try it later; I have to go now. :)

@AlexB123
Author

AlexB123 commented Aug 8, 2020

> @AlexB123 that change you make (i.e. calling std_aligned_alloc) is not OK. It will compile but crash. Can you try instead of your change the change proposed #2927
> i.e. https://github.com/official-stockfish/Stockfish/pull/2927/files

Hello! After trying the proposed change, the compiler gives a new error.
[Screenshots: SF 1, SF 2]

@vondele
Member

vondele commented Aug 8, 2020

can you try to #include <stdlib.h> in the file?

@AlexB123
Copy link
Author

AlexB123 commented Aug 8, 2020

> can you try to #include <stdlib.h> in the file?

Not sure if I did it correctly: I added "#include <stdlib.h>" at line 52 of misc.cpp, and it didn't work, same error.
I also tried adding it at line 56 of misc.cpp; that didn't work either, same error.

@vondele
Member

vondele commented Aug 8, 2020

So can you instead try to use this:

void* std_aligned_alloc(size_t alignment, size_t size) {
    // alignment must be >= sizeof(void*)
    if(alignment < sizeof(void*))
    {
        alignment = sizeof(void*);
    }
    void *pointer;
    if(posix_memalign(&pointer, alignment, size) == 0)
        return pointer;
    return nullptr;
}

Leave #include <stdlib.h> in the file near line 56.
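Memory obtained from posix_memalign is released with plain free(). The sketch below pairs a posix_memalign-based allocator (a lightly adapted version of the function above, renamed to the hypothetical `posix_aligned_alloc` for illustration) with a matching free function and an alignment check; it assumes a POSIX-like C library such as Android's or glibc. Note that posix_memalign reports failure through its return value, so the code checks that value rather than the pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::free
#include <stdlib.h>  // posix_memalign (POSIX, not part of ISO C++)

// Allocates `size` bytes aligned to `alignment` via posix_memalign, which is
// available on Android even where C++17 std::aligned_alloc is missing.
// posix_memalign requires a power-of-two alignment that is a multiple of
// sizeof(void*); it signals errors via its return code (not errno), so any
// non-zero result is mapped to nullptr here.
void* posix_aligned_alloc(std::size_t alignment, std::size_t size) {
    if (alignment < sizeof(void*))
        alignment = sizeof(void*);
    void* pointer = nullptr;
    if (posix_memalign(&pointer, alignment, size) != 0)
        return nullptr;
    return pointer;
}

// Memory from posix_memalign is released with plain free().
void posix_aligned_free(void* pointer) {
    std::free(pointer);
}

// True if `pointer` is a multiple of `alignment`.
bool is_aligned(const void* pointer, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(pointer) % alignment == 0;
}
```

For example, `posix_aligned_alloc(64, 1000)` returns a 64-byte-aligned block that must later go to `posix_aligned_free`, while an alignment of 48 (not a power of two) makes posix_memalign fail and the sketch return nullptr.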

@AlexB123
Author

AlexB123 commented Aug 8, 2020

With these changes the engine compiles without errors or warnings, but again, it crashes when I enable the "Use NNUE" option. It works only as a normal engine.
[Screenshot: build flags]

@vondele
Member

vondele commented Aug 8, 2020

That code looks right, so we probably have a different cause for the crash (unless the code returns a nullptr). I assume you have the right 'ARCH=...' option for the make command?

To move on, we need to understand where it crashes. Usually that means compiling (after make clean) with the debug=yes optimize=no flags to make, and afterwards running it under gdb like this:

gdb ./stockfish
run
setoption name Use NNUE value true
bench
[crash]
bt

@AlexB123
Author

AlexB123 commented Aug 8, 2020

I use the flag -march=armv8-a in my batch file for armv8 64-bit engines. Since the engine works, but only as normal SF, the flag/ARCH is correct. I'll try building the engine without -flto and -DUSE_POPCNT, and let you know later if anything changes.
Regards.
Alex.

@AlexB123
Author

AlexB123 commented Aug 9, 2020

Well team, I give up. I used the latest source code, and the first compile issue still remains.
[Screenshot: SF NNUE]

By using all the changes mentioned above in misc.cpp, the engine compiles but is not 100% functional.
It can execute commands like "uci" and "bench", but it fails to execute "setoption name Use NNUE value true"; simply put, it works only without the "Use NNUE" option. I've tried several flags ("-DNDEBUG", "-DUSE_NEON", "-O3") and also none of them; nothing works.
There must be a flag (or flags) in the Makefile or misc.cpp that is responsible for enabling the NNUE functions in the engine, but I don't know which one. Maybe Peter Österlund can help?
http://talkchess.com/forum3/viewtopic.php?p=853010#p853010

@notruck
Contributor

notruck commented Aug 10, 2020

Thank you, vondele. Your patch in this thread (as it appears in AlexB123's screenshot) allowed the compile to finish. I think the binary is actually working too.

I can build for both aarch64 and armv7, but I can only test armv7 binaries right now.

@AlexB123 DroidFish doesn't seem to like the Use NNUE checkbox option. It crashes and/or won't start. My binary appears to be working alright in Chess for Android, and also in a terminal emulator app. Maybe you want to try your build in those apps instead, although it sounds like yours was crashing in the terminal emulator too?

You may have already noticed this: the current official branch wants the .nnue file to be in the same folder as the engine, not in a sub-folder any more. Chess for Android requires the .nnue file to be installed the same way as an engine, so I assume they're being put in the same dir.
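As a rough illustration of "same folder as the engine", the hypothetical helper below builds the net path from the engine's own path (typically argv[0]) using C++17 std::filesystem. This is an assumption-laden sketch, not Stockfish's lookup code, which may search other locations as well; note also that argv[0] is not guaranteed to be a full path on every platform.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: given the path the engine was started from (argv[0])
// and a net file name, returns the path of that net in the engine's folder.
// If engine_path has no directory part, the bare net name is returned,
// i.e. it is resolved relative to the current working directory.
std::string net_path_next_to_engine(const std::string& engine_path,
                                    const std::string& net_name) {
    fs::path dir = fs::path(engine_path).parent_path();
    return (dir / net_name).string();
}
```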

I'll upload my aarch64 build, in case you want to test it. Let me know how it works. The only change is vondele's patch applied to misc.cpp, and I used my usual build flags (somewhat different from what you've posted above).

sf-armv8.zip

It is based on this commit iirc : ad2ad4c

@notruck
Contributor

notruck commented Aug 10, 2020

In the terminal emulator, I first ran a bench and the speed was on par with what I usually get for regular non-NNUE Stockfish.

Then, without copying the .nnue file into the terminal emulator yet, I ran setoption name Use NNUE value true followed by another bench. This time I got the warning Use of NNUE evaluation, but the file ____.nnue was not loaded successfully. and so on. The benchmark didn't run.

Once I had the correct .nnue file, I set Use NNUE again, and this time the benchmark did run, at a significantly lower speed than before. So I assumed my armv7 build was actually using NNUE.
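The behaviour described above (a warning and no search when Use NNUE is on but the net is missing) can be sketched as a simple guard. `EvalConfig`, `try_load_net` and `ready_to_search` are hypothetical names, and the sketch only checks that the file opens; a real engine would also need to check that the file is a valid net.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical evaluation settings, mirroring the "Use NNUE" option and a net file.
struct EvalConfig {
    bool use_nnue = false;
    std::string net_file;
    bool net_loaded = false;
};

// Marks the net as loaded only if the file can actually be opened.
void try_load_net(EvalConfig& cfg) {
    std::ifstream in(cfg.net_file, std::ios::binary);
    cfg.net_loaded = in.good();
}

// Returns true if a search/bench may run with the current settings.
// With NNUE requested but no net loaded, it warns and refuses instead of
// silently falling back to the classical evaluation.
bool ready_to_search(const EvalConfig& cfg) {
    if (cfg.use_nnue && !cfg.net_loaded) {
        std::cerr << "Use of NNUE evaluation, but the file " << cfg.net_file
                  << " was not loaded successfully.\n";
        return false;
    }
    return true;
}
```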

@Joachim26

Joachim26 commented Aug 17, 2020

v8 phone:
1) vondele_v8: 60 knps
2) sfndk.armv7: 26.5 knps
3) sfndk.armv7-neon: 52 knps
4) sfndk.armv8-neon: 61 knps

Error +/- 1 knps => 1) and 4) have the same speed.

v7 phone:
1) vondele_v7: 35 knps
2) sfndk.armv7: 15.5 knps
3) sfndk.armv7-neon: 35 knps

Error +/- 1 knps => 1) and 3) have the same speed.

Start position, 1 core, measured after several seconds, hash cleared before measurement, GUI = DroidFish.
If something is not clear, let me know.
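The NEON gain can be read off these measurements with simple ratios. The C++ helpers below are illustrative only; they use the knps values reported above and treat two builds as equal when they differ by no more than the stated 1 knps error.

```cpp
#include <cassert>
#include <cmath>

// Speed-up of one build over another, from two knps measurements.
double speedup(double fast_knps, double slow_knps) {
    return fast_knps / slow_knps;
}

// True if two measurements differ by no more than the stated error (knps).
bool same_within_error(double a, double b, double error = 1.0) {
    return std::fabs(a - b) <= error;
}
```

With these numbers, armv7-neon is about 1.96x faster than plain armv7 on the v8 phone (52 / 26.5) and about 2.26x faster on the v7 phone (35 / 15.5).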

@vondele
Member

vondele commented Aug 17, 2020

cool, so I can compile for android and it works... that's foolproof. I'll just have to learn how to copy the binaries to my phone ;-).

@notruck do you like the state of this branch https://github.com/vondele/Stockfish/tree/notruck-master (see also master...vondele:notruck-master). I might just want to adjust the strip target to pick the right binary.

Now we need to verify that this still works with linux on arm. @Dantist could you test that this branch compiles on RP ?

@AlexB123
Author

Hello guys, just wanted to let you know: as @Joachim26 already mentioned, I've made 6 versions of armv7 using NDK r21 and r21d. The fastest armv7 was compiled with r21, including the Makefile changes from a251ef5, and with the following flags:
set "compiler_options=-m32 -march=armv7-a -fPIE -Wl,-pie -mfloat-abi=softfp -mfpu=vfpv3-d16 -mfpu=neon-vfpv4 -DUSE_NEON -mthumb -Wl,--fix-cortex-a8 -latomic -DNDEBUG -DUSE_POPCNT -Ofast -flto -static-libstdc++ -std=c++17 -fno-strict-aliasing -fno-strict-overflow -ffunction-sections -fdata-sections -Wl,--gc-sections -Wl,-s
All 6 armv7 builds are here (in case you want to try them):
ARM7 speed test.zip

@MichaelB7
Contributor

For the RPi 4 64-bit OS (still in beta), these are the proper flags:

	ifeq ($(ARCH),armv7)
		CXXFLAGS += -mcpu=cortex-a72 -march=armv8-a+crypto+simd -mtune=cortex-a72

For the RPi 4 or lower on a 32-bit OS, these are the proper flags:

	ifeq ($(ARCH),armv7)
	        CXXFLAGS += -mcpu=cortex-a53 -mfloat-abi=hard -mfpu=neon-fp-armv8 -mneon-for-64bits -mtune=cortex-a53

A 64-bit OS produces executables that reach (in classical eval mode) 750 knps at the standard arm_freq of 1500 MHz, up to nearly 900 knps at 2100 MHz. A 32-bit OS is about 30 to 40% slower. NNUE mode runs at about 38% of the classical mode speed.
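To turn the quoted percentages into absolute figures, the short C++ sketch below applies them to the 750 and 900 knps classical numbers. The helper names are illustrative, and the results are only as precise as the rough percentages they start from.

```cpp
#include <cassert>
#include <cmath>

// "NNUE mode is about 38% of Classical mode" (figure quoted above).
constexpr double kNnueFraction = 0.38;

// Estimated NNUE speed for a given classical speed (knps).
double nnue_knps(double classical_knps) {
    return classical_knps * kNnueFraction;
}

// A 32-bit OS is quoted as 30-40% slower; `slowdown` is that fraction (0.3..0.4).
double knps_32bit(double knps_64bit, double slowdown) {
    return knps_64bit * (1.0 - slowdown);
}

// Relative increase from `before` to `after` (0.2 means +20%).
double relative_gain(double before, double after) {
    return after / before - 1.0;
}
```

This gives roughly 285 to 342 knps for NNUE on the 64-bit OS, and 450 to 525 knps in classical mode on a 32-bit OS at 1500 MHz. It also shows that raising the clock by 40% (1500 to 2100 MHz) raised the speed by about 20% (750 to 900 knps), so on these two data points the speed scales well below linearly with frequency.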

@vondele
Member

vondele commented Aug 17, 2020

@MichaelB7 which of those flags are non-essential, i.e. can be left out, and still results in a reasonable executable. I assume in the 64 bits case all of the flags can be left out, but possibly not in the 32 bit case?

Edit: Also, isn't RPI4 an armv8, why modify the flags under armv7?

The challenge with the arm target, at least for me, is the diversity, and I'd like to have a minimal working input first. For example, will the flags you post work with the RPI 1?

@MichaelB7
Contributor

MichaelB7 commented Aug 17, 2020

The active community of Pi users that I am involved with uses engines with PicoChess. PicoChess runs on the Pi 3 and higher. The PicoChess community uses a DGT-PI, or a modified DGT clock 3000, or something similar (handcrafted or modified), that lets one make moves on a wooden chessboard, primarily a DGT board, to play against various chess engines. There is no active group of chess users using RPi 1 or RPi 2 devices. The second set of flags will suffice.

https://groups.google.com/g/picochess

The armv8 flag did not compile on the RPi 4, while ARMv7 did, at least for me. This is the Raspberry Pi 64-bit OS, which is still in beta, and its user base is probably very small. The flags for the 32-bit RPi work for the Pi 3 and above, including the Pi 4 if it is running the 32-bit OS. I'm not aware of anyone using the RPi 1 or RPi 2, since they are not supported by PicoChess.

@vondele
Member

vondele commented Aug 17, 2020

Still trying to understand. The RPi 3 is also armv8, so my question is: can you compile and run on that hardware with
https://github.com/vondele/Stockfish/tree/notruck-master
using make -j ARCH=armv8 build?

@MichaelB7
Contributor

No, that does not work:

nnue/evaluate_nnue.cpp: In function ‘Value Eval::NNUE::ComputeScore(const Position&, bool)’:
nnue/evaluate_nnue.cpp:135:61: warning: requested alignment 64 is larger than 8 [-Wattributes]
         transformed_features[FeatureTransformer::kBufferSize];
                                                             ^
nnue/evaluate_nnue.cpp:137:61: warning: requested alignment 64 is larger than 8 [-Wattributes]
     alignas(kCacheLineSize) char buffer[Network::kBufferSize];
                                                             ^
In file included from nnue/../nnue/architectures/../features/../nnue_common.h:40,
                 from nnue/../nnue/architectures/../features/features_common.h:25,
                 from nnue/../nnue/architectures/../features/feature_set.h:24,
                 from nnue/../nnue/architectures/halfkp_256x2-32-32.h:24,
                 from nnue/../nnue/nnue_architecture.h:25,
                 from nnue/../nnue/nnue_accumulator.h:24,
                 from nnue/../position.h:31,
                 from nnue/evaluate_nnue.cpp:26:
/usr/lib/gcc/arm-linux-gnueabihf/8/include/arm_neon.h: In member function ‘void Eval::NNUE::FeatureTransformer::RefreshAccumulator(const Position&) const’:
/usr/lib/gcc/arm-linux-gnueabihf/8/include/arm_neon.h:589:1: error: inlining failed in call to always_inline ‘int16x8_t vaddq_s16(int16x8_t, int16x8_t)’: target specific option mismatch
 vaddq_s16 (int16x8_t __a, int16x8_t __b)
 ^~~~~~~~~
In file included from nnue/evaluate_nnue.h:24,
                 from nnue/evaluate_nnue.cpp:30:
nnue/nnue_feature_transformer.h:229:40: note: called from here
             accumulation[j] = vaddq_s16(accumulation[j], column[j]);
                               ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from nnue/../nnue/architectures/../features/../nnue_common.h:40,
                 from nnue/../nnue/architectures/../features/features_common.h:25,
                 from nnue/../nnue/architectures/../features/feature_set.h:24,
                 from nnue/../nnue/architectures/halfkp_256x2-32-32.h:24,
                 from nnue/../nnue/nnue_architecture.h:25,
                 from nnue/../nnue/nnue_accumulator.h:24,
                 from nnue/../position.h:31,
                 from nnue/evaluate_nnue.cpp:26:
/usr/lib/gcc/arm-linux-gnueabihf/8/include/arm_neon.h:589:1: error: inlining failed in call to always_inline ‘int16x8_t vaddq_s16(int16x8_t, int16x8_t)’: target specific option mismatch
 vaddq_s16 (int16x8_t __a, int16x8_t __b)

Looking now to see what works and keeps it simple.

@vondele
Member

vondele commented Aug 17, 2020

Is that the 32-bit OS? In that case, maybe try ARCH=armv7-neon?

Anyway, I've force-pushed the branch (https://github.com/vondele/Stockfish/tree/notruck-master) once more, I think the ndk changes are final. @notruck do you want your full name added to the AUTHORS file, right now I've used your github handle.

@MichaelB7
Contributor

So far, I need at least these two:

   CXXFLAGS +=  -mcpu=cortex-a53 -mfpu=neon-fp-armv8 

Using armv7-neon in lieu of the above:

g++: error: unrecognized -march target: armv7-neon
g++: note: valid arguments are: armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5t armv5e armv5te armv5tej armv6 armv6j armv6k armv6z armv6kz armv6zk armv6t2 armv6-m armv6s-m armv7 armv7-a armv7ve armv7-r armv7-m armv7e-m armv8-a armv8.1-a armv8.2-a armv8.3-a armv8.4-a armv8-m.base armv8-m.main armv8-r iwmmxt iwmmxt2 native; did you mean ‘armv7-a’?

@vondele
Member

vondele commented Aug 17, 2020

I'm surprised by that gcc error, we don't pass -march=armv7-neon to gcc, as far as I can see. Do you have anywhere in the Makefile other local changes?

@MichaelB7
Contributor

You still get this error using just -march=armv7-a:

In file included from nnue/../nnue/architectures/../features/../nnue_common.h:40,
                 from nnue/../nnue/architectures/../features/features_common.h:25,
                 from nnue/../nnue/architectures/../features/feature_set.h:24,
                 from nnue/../nnue/architectures/halfkp_256x2-32-32.h:24,
                 from nnue/../nnue/nnue_architecture.h:25,
                 from nnue/../nnue/nnue_accumulator.h:24,
                 from nnue/../position.h:31,
                 from nnue/evaluate_nnue.cpp:26:
nnue/nnue_feature_transformer.h: In member function ‘Eval::NNUE::FeatureTransformer::RefreshAccumulator(Position const&) const’:
/usr/lib/gcc/arm-linux-gnueabihf/8/include/arm_neon.h:589:1: error: inlining failed in call to always_inline ‘vaddq_s16’: target specific option mismatch
 vaddq_s16 (int16x8_t __a, int16x8_t __b)
 ^~~~~~~~~
In file included from nnue/evaluate_nnue.h:24,
                 from nnue/evaluate_nnue.cpp:30:
nnue/nnue_feature_transformer.h:229:40: note: called from here
             accumulation[j] = vaddq_s16(accumulation[j], column[j]);
                               ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from nnue/../nnue/architectures/../features/../nnue_common.h:40,
                 from nnue/../nnue/architectures/../features/features_common.h:25,
                 from nnue/../nnue/architectures/../features/feature_set.h:24,
                 from nnue/../nnue/architectures/halfkp_256x2-32-32.h:24,
                 from nnue/../nnue/nnue_architecture.h:25,
                 from nnue/../nnue/nnue_accumulator.h:24,
                 from nnue/../position.h:31,
                 from nnue/evaluate_nnue.cpp:26:
/usr/lib/gcc/arm-linux-gnueabihf/8/include/arm_neon.h:589:1: error: inlining failed in call to always_inline ‘vaddq_s16’: target specific option mismatch
 vaddq_s16 (int16x8_t __a, int16x8_t __b)
 ^~~~~~~~~
In file included from nnue/evaluate_nnue.h:24,
                 from nnue/evaluate_nnue.cpp:30:
nnue/nnue_feature_transformer.h:229:40: note: called from here
             accumulation[j] = vaddq_s16(accumulation[j], column[j]);
                               ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[1]: *** [<builtin>: evaluate_nnue.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory '/home/Al/github/Stockfish/src'
make: *** [Makefile:812: build] Error 2

@MichaelB7
Contributor

MichaelB7 commented Aug 17, 2020

So I found the minimum set of flags with which it still compiles (below); this works on the 32-bit kernel and OS:

g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -march=armv7-a  -mfpu=neon -DNDEBUG -O3 -DIS_64BIT -DUSE_NEON 

It works on the RPi 4 running the 32-bit kernel/OS, which is the current standard. It currently does not work on the RPi 4 with the 64-bit kernel/OS, which is still in beta, but for those who need that, it is not hard to figure out.

@MichaelB7
Contributor

> I'm surprised by that gcc error, we don't pass -march=armv7-neon to gcc, as far as I can see. Do you have anywhere in the Makefile other local changes?

I thought that is what you were asking me to pass.

@vondele
Member

vondele commented Aug 17, 2020

Similarly, I meant make -j ARCH=armv7-neon build.

What I'm still a bit confused about is why we need to pass the -mfpu=neon flag. It is reasonable, but somehow various people seem to be able to build for Raspberry Pi with this option. Let me try adding it for Linux only: https://github.com/vondele/Stockfish/tree/notruck-master. Can you give it a try?

@notruck
Contributor

notruck commented Aug 18, 2020

@vondele Github username is fine. Thank you!

I tried to digest the information from the NDK NEON page a little more. If I understood it correctly, -mfpu=neon was not the default until r21. I have used it explicitly before, and doing so was indeed beneficial.

Now that r21 uses it on everything, it has become optional (redundant but harmless). Other compilers and older NDKs will still benefit from an explicit -mfpu=neon or a slight variation of that base flag with extra suffixes.

While -mfpu=neon-x-y-z will benefit the runtime speeds in general, it still doesn't seem to help with NNUE network usage.

The neon = yes setting in the Makefile ensures that the -DUSE_NEON flag is passed, which ultimately leads to the inclusion of a header in the nnue/nnue_common.h file:
https://github.com/official-stockfish/Stockfish/blob/master/src/nnue/nnue_common.h#L42

So when compiling with GCC for Raspberry Pi armv7, using both -mfpu=neon-x-y-z and -DUSE_NEON may produce the best results.
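The interplay between USE_NEON and the compiler's own NEON setting can be sketched in C++. In the code below, `__ARM_NEON` is the ACLE macro that GCC and Clang define when NEON code generation is enabled; on 32-bit ARM that requires an -mfpu=neon* option (the default in NDK r21 and later, per the comment above), which is consistent with the "target specific option mismatch" errors for vaddq_s16 earlier in this thread. `accumulate` is a hypothetical function loosely modeled on the accumulator loop in nnue_feature_transformer.h, not a copy of it.

```cpp
#include <cassert>
#include <cstdint>

// The NEON code path is compiled only when USE_NEON is defined
// (the Makefile's neon = yes adds -DUSE_NEON).
#if defined(USE_NEON)
#  if !defined(__ARM_NEON)
     // On 32-bit ARM, GCC only enables NEON with an -mfpu=neon* flag; without it,
     // intrinsics such as vaddq_s16 fail with "target specific option mismatch".
#    error "USE_NEON is defined but the compiler has NEON disabled (try -mfpu=neon)"
#  endif
#  include <arm_neon.h>
#endif

// Adds `column` into `acc` element-wise.
// The NEON path requires n to be a multiple of 8 (eight int16 lanes per vector).
void accumulate(std::int16_t* acc, const std::int16_t* column, int n) {
#if defined(USE_NEON)
    for (int i = 0; i < n; i += 8) {
        int16x8_t a = vld1q_s16(acc + i);
        int16x8_t c = vld1q_s16(column + i);
        vst1q_s16(acc + i, vaddq_s16(a, c));
    }
#else
    // Portable scalar fallback, used on any target without USE_NEON.
    for (int i = 0; i < n; ++i)
        acc[i] = static_cast<std::int16_t>(acc[i] + column[i]);
#endif
}
```

The explicit `#error` turns a confusing inlining failure deep inside arm_neon.h into a single message that names the missing compiler option.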

@vondele
Member

vondele commented Aug 18, 2020

I have updated master with what I believe is the best patch so far. There might/will still be issues, let's try to improve as a follow up. Thanks for the feedback and testing.

joergoster pushed a commit to joergoster/Stockfish-old that referenced this issue Aug 18, 2020
The easiest way to use the NDK in conjunction with this Makefile (tested on linux-x86_64):

1. Download the latest NDK (r21d) from Google from https://developer.android.com/ndk/downloads
2. Place and unzip the NDK in $HOME/ndk folder
3. Export the path variable e.g., `export PATH=$PATH:$HOME/ndk/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/bin`
4. cd to your Stockfish/src dir
5. Issue `make -j ARCH=armv8 COMP=ndk build`  (use `ARCH=armv7` or `ARCH=armv7-neon` for older CPUs)
6. Optionally `make -j ARCH=armv8 COMP=ndk strip`
7. That's all. Enjoy!

Improves support from Raspberry Pi (incomplete?) and compiling on arm in general

closes official-stockfish/Stockfish#3015

fixes official-stockfish/Stockfish#2860

fixes official-stockfish/Stockfish#2641

Support is still fragile as we're missing CI on these targets. Nevertheless tested with:

```bash
  # build crosses from ubuntu 20.04 on x86 to various arch/OS combos
  # tested with suitable packages installed
  # (build-essentials, mingw-w64, g++-arm-linux-gnueabihf, NDK (r21d) from google)

  # cross to Android
  export PATH=$HOME/ndk/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH
  make clean && make -j build ARCH=armv7         COMP=ndk  && make -j build ARCH=armv7 COMP=ndk strip
  make clean && make -j build ARCH=armv7-neon    COMP=ndk  && make -j build ARCH=armv7-neon COMP=ndk strip
  make clean && make -j build ARCH=armv8         COMP=ndk  && make -j build ARCH=armv8 COMP=ndk strip

  # cross to Raspberry Pi
  make clean && make -j build ARCH=armv7         COMP=gcc COMPILER=arm-linux-gnueabihf-g++
  make clean && make -j build ARCH=armv7-neon    COMP=gcc COMPILER=arm-linux-gnueabihf-g++

  # cross to Windows
  make clean && make -j build ARCH=x86-64-modern COMP=mingw
```

No functional change
lucabrivio pushed a commit to lucabrivio/Stockfish that referenced this issue Aug 18, 2020
AlexB123 referenced this issue in syzygy1/Cfish Sep 21, 2020
This seems to speed up at least ARMv8.
Perhaps more can be gained by increasing NUM_REGS from 8 to 16 in line 166 of nnue.c.
Dantist pushed a commit to Dantist/Stockfish that referenced this issue Dec 22, 2020
Move to posix_memalign for those platforms, in particular android,
that do not fully support c++17 std::aligned_alloc() (and are not windows)

see official-stockfish#2860

closes official-stockfish#2973

No functional change
Dantist pushed a commit to Dantist/Stockfish that referenced this issue Dec 22, 2020
mizar added a commit to mizar/YaneuraOu that referenced this issue Sep 24, 2021
- misc.cpp: add definitions that were missing for Android
see yaneurao@fa10bc3
see official-stockfish/Stockfish#2860
see official-stockfish/Stockfish#2973
see official-stockfish/Stockfish@399cddf

- misc.cpp: make the number of arguments of aligned_large_pages_alloc() the same on WIN32 and other platforms