Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove super cluster stuff from TT and just use a 128 bit multiply. #2744

Closed
wants to merge 2 commits into from

Conversation

mstembera
Copy link
Contributor

@mstembera mstembera commented Jun 15, 2020

Remove super cluster stuff from TT and just use a 128 bit multiply.

STC https://tests.stockfishchess.org/tests/view/5ee719b3aae8aec816ab7548
LLR: 2.94 (-2.94,2.94) {-1.50,0.50}
Total: 12736 W: 2502 L: 2333 D: 7901
Ptnml(0-2): 191, 1452, 2944, 1559, 222

LTC https://tests.stockfishchess.org/tests/view/5ee732d1aae8aec816ab7556
LLR: 2.93 (-2.94,2.94) {-1.50,0.50}
Total: 27584 W: 3431 L: 3350 D: 20803
Ptnml(0-2): 173, 2500, 8400, 2511, 208

Scheme back to being derived from https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/ pointed out by @vondele

Also the default optimized version of the index calculation now uses fewer instructions.
https://godbolt.org/z/Tktxbv

The reason MaxHashMB is currently capped to 1073741824 MB is because it's the largest power of two that can be stored in a 32bit int which is what our Options use.

Related PR #2722 #1349

bench: 4320954

@protonspring
Copy link

protonspring commented Jun 15, 2020

Could we use (1 << 30) instead of 1073741824 ? Then you wouldn't have to explain the value to less experienced developers.

@mstembera
Copy link
Contributor Author

@protonspring I think that's reasonable as well. Whatever the maintainers prefer.

@vondele
Copy link
Member

vondele commented Jun 16, 2020

actually, as soon as one adds bmi2, the compiler will emit mulx instead of mul:
https://godbolt.org/z/MhDHtH
That's presumably what most of noob's machines do. I would assume there is a performance benefit for this new instruction (but couldn't quite figure out). On AMD Zen, we suggest that people stick to 'modern' which doesn't use bmi2, because pext is slower, so they might have a somewhat suboptimal hash access, unless they enable bmi2 but not pext. I don't know how much the performance difference would be.
Edit: I can't measure a difference between -mbmi2 and not on Zen.

concerning the maximum size, I would leave it at the current 32TB, that's still more than current processors support.

@mstembera
Copy link
Contributor Author

Ok max hash reverted back to 32TB.

@vondele vondele added the to be merged Will be merged shortly label Jun 16, 2020
@vondele vondele closed this in 1ea488d Jun 17, 2020
@vondele
Copy link
Member

vondele commented Jun 17, 2020

Thanks!

@mstembera
Copy link
Contributor Author

mstembera commented Jun 18, 2020

Just for completeness if we ever want to optimize for MSVC the code is:
uint64_t highProduct;
_umul128(a, b, &highProduct);
return highProduct;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
to be merged Will be merged shortly
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants