Add sse4 if bmi2 is enabled #2300
I have tested this on my Mac; it compiles OK. |
I had tried that, and it was observed that -msse4 doesn't work on AMD CPUs. IIRC the official static binary of SF8 had to be changed after a complaint by Graham on the Talk Chess forum. |
@Krgp If you look at Vondele's SPRT test, my machine (an AMD Ryzen 7) completed 500 games without any problem (see runs 81 and 257). I know it is a small sample size, but I wonder whether this is still an issue for AMD CPUs. Perhaps we could verify this. |
If somebody has a link to the talkchess thread, we should be able to figure it out. |
SSE4 has been supported by both Intel and AMD processors released since late 2007. |
I realized the resolution of this is actually easy: just document that bmi2 also enables sse4. I didn't find any CPU that has bmi2 but not sse4; the minority owning one of those will need to read the docs. I also reordered the x86-64 architectures so that the more performant builds are listed on top. |
Merged via db00e16, thanks! |
@vondele on a 48-thread Xeon (not the best CPU to test the speedup) the speedup is very small:
$ bash bench-parallel.sh ./stockfish_msys2_ss3.exe ./stockfish_msys2_ss4.exe 100
base = 1802654 +/- 10464
test = 1804985 +/- 10492
diff = 2330 +/- 3493
speedup = 0.001293 |
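To put the bench figures above in context, here is a minimal sketch that recomputes the reported speedup and checks whether the difference exceeds its own error bar. The numbers are copied from the output above; the helper names `relative_speedup` and `is_within_noise` are hypothetical, and treating the "+/-" value as a plain error bound is an assumption about how bench-parallel.sh reports it.

```python
# Figures copied from the bench-parallel.sh output quoted above.
base = 1802654      # base build (sse3)
diff = 2330         # test minus base
diff_err = 3493     # reported "+/-" on the difference (assumed to be an error bound)


def relative_speedup(diff: float, base: float) -> float:
    """Speedup expressed as a fraction of the base speed."""
    return diff / base


def is_within_noise(diff: float, diff_err: float) -> bool:
    """True when the measured difference is no larger than its reported error."""
    return abs(diff) <= diff_err


speedup = relative_speedup(diff, base)
print(f"speedup = {speedup:.6f} ({speedup:.2%})")
print("within noise" if is_within_noise(diff, diff_err) else "outside noise")
```

On these figures the difference is smaller than its reported error, which is consistent with the remark that this machine is not the best one for measuring the speedup.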
@vondele some food for thought on compiler flags:
|
@ppigazzini one option could be to switch to -march=native for the default, and detect available instructions. At least on x86 with gcc this is quite easy. To generate binaries that can be redistributed, one might still need to target a specific architecture, and the list is long https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/x86-Options.html#x86-Options |
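As a rough illustration of "detect available instructions", the sketch below reads the CPU feature flags that Linux exposes in /proc/cpuinfo and checks for the case discussed in this thread: a CPU with bmi2 but without SSE4. It is Linux and x86 only, it is not how the Makefile or gcc performs detection, and the function names `parse_cpu_flags` and `bmi2_without_sse4` are hypothetical.

```python
from pathlib import Path

# Linux reports SSE4.1 and SSE4.2 as separate flags in /proc/cpuinfo.
REQUIRED_SSE4 = {"sse4_1", "sse4_2"}


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line, e.g. on a non-x86 machine


def bmi2_without_sse4(flags: set[str]) -> bool:
    """True for a CPU that has bmi2 but lacks SSE4.1 or SSE4.2."""
    return "bmi2" in flags and not REQUIRED_SSE4 <= flags


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = parse_cpu_flags(cpuinfo.read_text())
        print("bmi2:", "bmi2" in flags, "| sse4:", REQUIRED_SSE4 <= flags)
        print("bmi2 without sse4:", bmi2_without_sse4(flags))
```

If `bmi2_without_sse4` ever returned True on a real machine, that machine would be one of the rare cases that needs a build without -msse4.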
@vondele IMO adding a
|
looking at
so at least that's consistent. For speedups, I don't know. It might be pretty specific for the processor version. |
@vondele The Makefile could be simplified by enabling only the minimum number of flags (the -mpopcnt flags are a subset of the -msse4 flags). It may be better to open a Makefile issue, run some tests, and also solve the compiler info problem, see #2327 (comment) |
The only change made to the code base to get a somewhat faster binary, as discussed in #2291, is to add -msse4 to the compile options of the bmi2 build. Since all processors supporting bmi2 also support sse4, this can be done easily. It is a useful step to avoid sending around custom and poorly tested builds.
The speedup isn't enough to pass [0,4]:
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 93009 W: 20519 L: 20316 D: 52174
But it is roughly 1.15 Elo and a LOS of 90%.
No functional change.
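For readers who want to derive an Elo estimate and a likelihood of superiority (LOS) from a W/L/D line like the one above, here is a minimal sketch. It assumes the common logistic Elo formula and the usual normal approximation for LOS, which ignores draws. The test framework may use a different model, so the results need not match the figures quoted in the message. The function names are hypothetical.

```python
import math


def elo_from_results(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by the score fraction, using the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Normal-approximation LOS; draws carry no information in this model."""
    z = (wins - losses) / math.sqrt(2.0 * (wins + losses))
    return 0.5 * (1.0 + math.erf(z))


# W/L/D copied from the SPRT result quoted above.
wins, losses, draws = 20519, 20316, 52174
elo = elo_from_results(wins, losses, draws)
los = likelihood_of_superiority(wins, losses)
print(f"Elo estimate: {elo:.2f}")
print(f"LOS estimate: {los:.1%}")
```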