WASM SIMD for NNUE #30

Closed (wants to merge 11 commits)

Conversation

@hi-ogawa commented Feb 10, 2021

EDIT: I removed NOTE.md from the repository and moved its content here: https://github.com/hi-ogawa/stockfish.wasm/wiki/NOTE

Hi @niklasf
I experimented with WASM SIMD and wanted to share the results, which look promising even though I'm not sure they are at a usable stage yet.

I've seen the closed PR (#21), which relied on emscripten's x86 SIMD emulation feature; here, I instead implemented some linear algebra routines directly with WASM SIMD intrinsics.

These are some benchmark results (see NOTE.md for the details):

# NOTE:
#  emscripten's node (emsdk/node/12.18.1_64bit/bin/node) is old and doesn't seem to be compatible with the latest SIMD (cf. https://github.com/emscripten-core/emscripten/issues/11484),
#  so I'm using a locally installed node with the version shown below.
$ node -e 'console.log(`node: ${process.version}\nv8: ${process.versions.v8}`)'
node: v15.8.0
v8: 8.6.395.17-node.23

# [ Build type `wasm_simd_post_mvp=yes` ]
$ node --experimental-wasm-{simd,threads} --experimental-repl-await --wasm-simd-post-mvp

> const sf = await require("./stockfish")();

> sf.postMessage("bench 16 1 22 current depth classical");
Total time (ms) : 5056
Nodes searched  : 3478378
Nodes/second    : 687970

> sf.postMessage("bench 16 1 22 current depth NNUE");
Total time (ms) : 6424
Nodes searched  : 2670151
Nodes/second    : 415652


# [ Build type `wasm_simd=yes wasm_simd_post_mvp=no` ]
> sf.postMessage("bench 16 1 22 current depth NNUE");
Total time (ms) : 9472
Nodes searched  : 2670151
Nodes/second    : 281899


# [ Build type `wasm_simd=no wasm_simd_post_mvp=no` ]
> sf.postMessage("bench 16 1 22 current depth NNUE");
Total time (ms) : 54008
Nodes searched  : 2670151
Nodes/second    : 49439

For comparison, these are the results from current Stockfish master (29ed22d) on my machine.

# Built with flag `ARCH=x86-64-modern` which enables SSSE3.
bench 16 1 22 current depth classical
Total time (ms) : 4323
Nodes searched  : 4850165
Nodes/second    : 1121944

bench 16 1 22 current depth NNUE
Total time (ms) : 2992
Nodes searched  : 2655513
Nodes/second    : 887537

# Built with `ARCH=x86-64 sse=yes sse2=yes SUPPORTED_ARCH=true` (SSE3 is disabled)
# I think this setting is the closest one we can get to with WASM SIMD.
bench 16 1 22 current depth NNUE
Total time (ms) : 5202
Nodes searched  : 2655513
Nodes/second    : 510479

A few things I wanted to mention:

  • I made a separate file nnue/math.cpp where the WASM SIMD matrix-vector multiplication routines live (mostly because I wanted a separate translation unit so that I can rebuild faster).

  • I added two build types wasm_simd and wasm_simd_post_mvp, where wasm_simd_post_mvp has a substantially faster implementation thanks to the new intrinsic __builtin_wasm_dot_s_i32x4_i16x8.

  • The NNUE data is embedded in the executable by generating a C++ string literal with my little script embedded_nnue_data.py,
    which means the size of stockfish.wasm becomes 21MB.

  • I added a bench_eval command to measure the performance of NNUE inference directly, using another little helper of mine, timeit.hpp.

  • It seems both Chrome/Chromium (88.0.4324.150) and Firefox (85.0.1) already enable "post-mvp" features when SIMD is enabled.

  • On Firefox, there is a security restriction regarding SharedArrayBuffer, and some special HTTP headers need to be sent.
    So, I included misc/server.py to run an HTTP server with such headers (see the sketch after this list).
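The actual helper in this branch is misc/server.py; purely as an illustration of the same idea (not code from the PR), here is a minimal Node.js sketch of a static server that sends the cross-origin isolation headers that Firefox requires for SharedArrayBuffer. The file handling and the port number are hypothetical details.

// serve.js - static server with cross-origin isolation headers (sketch only)
const http = require("http");
const fs = require("fs");
const path = require("path");

http.createServer((req, res) => {
  const file = path.join(process.cwd(), req.url === "/" ? "index.html" : req.url);
  fs.readFile(file, (err, data) => {
    if (err) { res.writeHead(404); res.end("not found"); return; }
    res.writeHead(200, {
      // These two headers make the page "cross-origin isolated",
      // which Firefox requires before it exposes SharedArrayBuffer.
      "Cross-Origin-Opener-Policy": "same-origin",
      "Cross-Origin-Embedder-Policy": "require-corp",
      // A real server should also set a proper Content-Type (e.g. application/wasm).
    });
    res.end(data);
  });
}).listen(8080);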

At this point, I'm not really sure whether this should even be a PR, but I'm curious what you think about it.
As a next step, I'm planning to implement a way to run the JavaScript build as a UCI engine executable so that we can compare the two versions' strength directly by pitting them against each other (e.g. with cutechess-cli); a rough sketch of the idea follows.
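To make that plan concrete, here is a rough sketch (not the uci.js from this PR) of how the wasm engine could be bridged to stdin/stdout so that cutechess-cli can drive it as an ordinary UCI process. It assumes an addMessageListener(line => ...) callback for engine output; that interface is an assumption for illustration, not something shown in this thread.

// uci bridge sketch (illustration only, not part of this PR)
const readline = require("readline");

(async () => {
  const sf = await require("./stockfish")();
  // Forward every line of engine output to stdout for the GUI / cutechess-cli.
  sf.addMessageListener((line) => process.stdout.write(line + "\n"));
  // Forward every line from stdin (UCI commands) to the engine.
  const rl = readline.createInterface({ input: process.stdin });
  rl.on("line", (line) => sf.postMessage(line));
})();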

Thanks a lot for reading!

@niklasf (Member) commented Feb 10, 2021

Wow, amazing work!

Do you know if browsers have a good way to test for the required post MVP features? If so, it should be possible to get a test version out on Lichess pretty quickly.

@hi-ogawa (Author)

Thanks for the reply!

Do you know if browsers have a good way to test for the required post MVP features?

Probably, we have to check the wasm binary itself with WebAssembly.validate.
But, of course, we don't want to download the whole wasm file before knowing whether it runs, so one idea is to validate, before downloading, a small wasm binary containing just the SIMD opcode we want to use.
I think I can put together an example right away; let's see.

@hi-ogawa (Author)

I think I found a very quick way to check whether a certain wasm opcode is supported. It's something like this:

# Feature detection for `i32x4.dot_i16x8_s` (see `misc/feature_detection.wat` for how these bytes are generated)

/usr/bin/node --experimental-wasm-simd
> data = Uint8Array.from([0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0, 7, 8, 1, 4, 116, 101, 115, 116, 0, 0, 10, 15, 1, 13, 0, 65, 0, 253, 17, 65, 0, 253, 17, 253, 186, 1, 11])
> WebAssembly.validate(data)
false

/usr/bin/node --experimental-wasm-simd --wasm-simd-post-mvp
> data = Uint8Array.from([0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0, 7, 8, 1, 4, 116, 101, 115, 116, 0, 0, 10, 15, 1, 13, 0, 65, 0, 253, 17, 65, 0, 253, 17, 253, 186, 1, 11])
> WebAssembly.validate(data)
true
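Wrapped up as a helper, a page could run something like the following before fetching stockfish.wasm and fall back to a slower build when it returns false. This is just a sketch; the byte array is exactly the test module shown above.

// feature-detection helper sketch (bytes are the i32x4.dot_i16x8_s test module above)
function supportsWasmSimdDot() {
  const data = Uint8Array.from([
    0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0, 7, 8, 1,
    4, 116, 101, 115, 116, 0, 0, 10, 15, 1, 13, 0, 65, 0, 253, 17, 65, 0, 253,
    17, 253, 186, 1, 11,
  ]);
  return WebAssembly.validate(data);
}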

@hi-ogawa (Author) commented Feb 11, 2021

I implemented uci.js to run this stockfish as a UCI engine executable.
Also, I added a new GitHub Actions workflow match.yml which runs a cutechess-cli tournament between two versions (NNUE and classical).
It's only triggered manually (workflow_dispatch mode), and I just started four tournaments on my repository (unfortunately, it cannot be run from the web interface unless it's on the master branch, so I need to trigger it via the REST API).
You can check progress and results here: https://github.com/hi-ogawa/stockfish.wasm/actions?query=workflow%3AMatch.
(GitHub CI is probably not a great place to run engine testing, but I've been using this approach for my own engine because I don't have a spare machine.)

The four tournament settings are the ones listed in the EDIT below, each with 100 rounds (200 games); openings are from noob_3moves.epd.

EDIT:

Here are the results of this very rudimentary test:

  • nnue(wasm_simd_post_mvp=yes) vs classical, tc=10+0.1 (https://github.com/hi-ogawa/stockfish.wasm/runs/1877640888)
    Score of nnue vs classical: 104 - 21 - 75 [0.708] 200
    ... nnue playing White: 57 - 12 - 31 [0.725] 100
    ... nnue playing Black: 47 - 9 - 44 [0.690] 100
    ... White vs Black: 66 - 59 - 75 [0.517] 200
    Elo difference: 153.4 +/- 39.3, LOS: 100.0 %, DrawRatio: 37.5 %

  • nnue(wasm_simd_post_mvp=yes) vs classical, tc=60+0.6 (https://github.com/hi-ogawa/stockfish.wasm/runs/1877641945)
    Score of nnue vs classical: 95 - 8 - 97 [0.718] 200
    ... nnue playing White: 54 - 3 - 43 [0.755] 100
    ... nnue playing Black: 41 - 5 - 54 [0.680] 100
    ... White vs Black: 59 - 44 - 97 [0.537] 200
    Elo difference: 161.9 +/- 34.0, LOS: 100.0 %, DrawRatio: 48.5 %

  • nnue(wasm_simd=yes) vs classical, tc=10+0.1 (https://github.com/hi-ogawa/stockfish.wasm/runs/1877643010)
    Score of nnue vs classical: 80 - 28 - 92 [0.630] 200
    ... nnue playing White: 40 - 13 - 47 [0.635] 100
    ... nnue playing Black: 40 - 15 - 45 [0.625] 100
    ... White vs Black: 55 - 53 - 92 [0.505] 200
    Elo difference: 92.5 +/- 35.6, LOS: 100.0 %, DrawRatio: 46.0 %

  • nnue(wasm_simd=yes) vs classical, tc=60+0.6 (https://github.com/hi-ogawa/stockfish.wasm/runs/1877643839)
    Score of nnue vs classical: 84 - 8 - 108 [0.690] 200
    ... nnue playing White: 43 - 3 - 54 [0.700] 100
    ... nnue playing Black: 41 - 5 - 54 [0.680] 100
    ... White vs Black: 48 - 44 - 108 [0.510] 200
    Elo difference: 139.0 +/- 31.7, LOS: 100.0 %, DrawRatio: 54.0 %

@hi-ogawa (Author) commented Feb 11, 2021

I was crawling through v8's code and found that __builtin_wasm_dot_s_i32x4_i16x8 is actually not "post mvp" anymore since v8 version 8.8.101 (v8/v8@01b8b3e).
On my machine, Node.js has v8 8.6.395.17 and Chromium has v8 8.8.278.15, which is why the fast version is able to run on Chromium.
Essentially, what I thought was a "post mvp" build might not be a "post mvp" build anymore, though this churn itself suggests that the feature is not really stable yet.

EDIT:

According to https://omahaproxy.appspot.com, which lists the latest Chrome versions and their corresponding v8 versions, it seems all platforms (except iOS, which doesn't use v8) should support the latest WASM SIMD features.

EDIT:

I confirmed that the build with wasm_simd_post_mvp=yes runs on my Android phone with the latest Chrome.

@hi-ogawa (Author) commented Feb 12, 2021

I made a little frontend, misc/uci.html, where you can run UCI commands for quick testing. I published it via my fork's GitHub Pages here: https://hi-ogawa.github.io/stockfish.wasm/misc/uci.html.

EDIT: Due to the security header issue, the page doesn't work on Firefox...

@niklasf (Member) commented Feb 20, 2021

Still very impressed with your work. I also see you have created a much cleaner port of Stockfish at https://github.com/hi-ogawa/Stockfish/tree/emscripten. To me it would make sense to stop maintaining this repository in favor of yours. Are you planning to publish it to npm?

@hi-ogawa (Author)

Thanks for noticing my port!
I was just experimenting with emscripten's ASYNCIFY and PROXY_TO_PTHREAD, and (somehow) managed to make it work. There's a slightly sketchy part in the UCI command communication, but I think (hope) it doesn't cause a problem.

Are you planning to publish it to npm?

I wasn't planning to make an npm package, but sure, I can do that.
After publishing the npm package from my repo, I will update lila's PR lichess-org/lila#8154 accordingly.
