Remove super cluster stuff from TT and just use a 128 bit multiply. #2744

mstembera · 2020-06-15T21:17:01Z

Remove super cluster stuff from TT and just use a 128 bit multiply.

STC https://tests.stockfishchess.org/tests/view/5ee719b3aae8aec816ab7548
LLR: 2.94 (-2.94,2.94) {-1.50,0.50}
Total: 12736 W: 2502 L: 2333 D: 7901
Ptnml(0-2): 191, 1452, 2944, 1559, 222

LTC https://tests.stockfishchess.org/tests/view/5ee732d1aae8aec816ab7556
LLR: 2.93 (-2.94,2.94) {-1.50,0.50}
Total: 27584 W: 3431 L: 3350 D: 20803
Ptnml(0-2): 173, 2500, 8400, 2511, 208

The scheme is back to being derived from https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/, as pointed out by @vondele.

Also the default optimized version of the index calculation now uses fewer instructions.
https://godbolt.org/z/Tktxbv
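For readers who have not seen the multiply-based reduction before, here is a minimal sketch of the idea, not the code from this PR. It assumes a compiler that provides `unsigned __int128` (GCC or Clang on a 64-bit target); the names `mul_hi64` and `cluster_index` are hypothetical and chosen only for illustration. The high 64 bits of `key * clusterCount` are always in the range `[0, clusterCount)`, so they can serve as a table index without a modulo and without requiring a power-of-two table size.

```cpp
#include <cstdint>

// Upper 64 bits of the full 128-bit product a * b.
// Assumption: unsigned __int128 is available (GCC/Clang extension).
inline std::uint64_t mul_hi64(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>(
        (static_cast<unsigned __int128>(a) * b) >> 64);
}

// Hypothetical helper: map a 64-bit hash key to an index in
// [0, clusterCount). Since key < 2^64, key * clusterCount < clusterCount * 2^64,
// so the high half of the product is strictly less than clusterCount.
inline std::uint64_t cluster_index(std::uint64_t key, std::uint64_t clusterCount) {
    return mul_hi64(key, clusterCount);
}
```

With this mapping, keys are spread over the table in proportion to their value: a key in the upper half of the 64-bit range lands in the upper half of the table.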

MaxHashMB is currently capped at 1073741824 MB (2^30) because that is the largest power of two that fits in a signed 32-bit int, which is what our Options use.
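The arithmetic behind that cap can be checked at compile time. This is only a sketch: the constant name `kProposedMaxHashMB` and the helper `fits_in_int32` are hypothetical, and the signed 32-bit storage type is taken from the comment above rather than from the source tree.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical name for the cap discussed above: 2^30 MB.
constexpr std::int64_t kProposedMaxHashMB = std::int64_t{1} << 30;

// True if v is representable in a signed 32-bit int.
constexpr bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min()
        && v <= std::numeric_limits<std::int32_t>::max();
}

static_assert(kProposedMaxHashMB == 1073741824, "2^30 written out in decimal");
static_assert(fits_in_int32(kProposedMaxHashMB), "2^30 fits in a signed 32-bit int");
static_assert(!fits_in_int32(std::int64_t{1} << 31), "2^31 does not fit");
```

Writing the constant as a shift makes the power of two visible in the code itself.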

Related PRs: #2722, #1349

bench: 4320954

protonspring · 2020-06-15T21:29:26Z

Could we use (1 << 30) instead of 1073741824? Then you wouldn't have to explain the value to less experienced developers.

mstembera · 2020-06-15T21:50:54Z

@protonspring I think that's reasonable as well. Whatever the maintainers prefer.

vondele · 2020-06-16T06:05:50Z

Actually, as soon as one adds bmi2, the compiler will emit mulx instead of mul:
https://godbolt.org/z/MhDHtH
That's presumably what most of noob's machines do. I would assume there is a performance benefit to this newer instruction, but I couldn't quite figure out what it is. On AMD Zen, we suggest that people stick to 'modern', which doesn't use bmi2 because pext is slow there, so they might get somewhat suboptimal hash access unless they enable bmi2 but not pext. I don't know how large the performance difference would be.
Edit: I can't measure a difference on Zen with or without -mbmi2.

Concerning the maximum size, I would leave it at the current 32TB; that's still more than current processors support.

bench: 4320954

mstembera · 2020-06-16T11:04:34Z

OK, max hash reverted to 32TB.

vondele · 2020-06-17T05:47:45Z

Thanks!

mstembera · 2020-06-18T02:11:08Z

Just for completeness, if we ever want to optimize for MSVC, the code is:
uint64_t highProduct;
_umul128(a, b, &highProduct); // returns the low 64 bits; the high 64 bits are stored in highProduct
return highProduct;
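One way the MSVC intrinsic could sit next to the GCC/Clang path is sketched below. It is not code from this PR: the names `mul_hi64_portable` and `mul_hi64_any` are hypothetical, and the fallback that splits the operands into 32-bit halves is a technique swapped in here for targets that have neither a 128-bit integer type nor `_umul128`.

```cpp
#include <cstdint>
#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>  // _umul128
#endif

// Fallback: high 64 bits of a * b using only 64-bit arithmetic.
// Each partial sum below stays under 2^64, so nothing overflows.
inline std::uint64_t mul_hi64_portable(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t aL = static_cast<std::uint32_t>(a), aH = a >> 32;
    const std::uint64_t bL = static_cast<std::uint32_t>(b), bH = b >> 32;
    const std::uint64_t c1 = (aL * bL) >> 32;                         // carry out of the low word
    const std::uint64_t c2 = aH * bL + c1;                            // first cross term plus carry
    const std::uint64_t c3 = aL * bH + static_cast<std::uint32_t>(c2); // second cross term
    return aH * bH + (c2 >> 32) + (c3 >> 32);
}

// Hypothetical dispatcher: use the widest multiply the compiler offers.
inline std::uint64_t mul_hi64_any(std::uint64_t a, std::uint64_t b) {
#if defined(__GNUC__) && defined(__SIZEOF_INT128__)
    return static_cast<std::uint64_t>(
        (static_cast<unsigned __int128>(a) * b) >> 64);
#elif defined(_MSC_VER) && defined(_M_X64)
    std::uint64_t highProduct;
    _umul128(a, b, &highProduct);  // return value (low half) is not needed
    return highProduct;
#else
    return mul_hi64_portable(a, b);
#endif
}
```

Keeping the portable version as a separate function makes it easy to cross-check the intrinsic paths against it in a unit test.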

Remove super cluster stuff from TT and just use 128 bit multiply.

7b960b8

bench: 4320954

Revert max hash back to 32TB.

90a8013

bench: 4320954

vondele added the "to be merged" label Jun 16, 2020

vondele closed this in 1ea488d Jun 17, 2020
