Allow for general transposition table sizes. #1341
For efficiency reasons, current master only allows transposition table sizes of N = 2^k, because the index computation (hash % N) can then be written as (hash & (2^k - 1)). On a typical computer (with 4, 8, ... GB of RAM), this implies that roughly half the RAM is left unused in analysis.
This issue was mentioned on fishcooking by Mindbreaker:
http://tests.stockfishchess.org/tests/view/5a3587de0ebc590ccbb8be04
Recently a neat trick was proposed to map a hash into the range [0,N[ more efficiently than (hash % N) for general N, nearly as efficiently as (hash % 2^k):
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
namely computing (hash * N / 2^32) for 32-bit hashes. This patch implements this trick and now allows for general hash sizes. Note that for N = 2^k this just amounts to using a different subset of bits from the hash: master uses the lower k bits, while this trick uses the upper k bits (of the 32-bit hash).
There is no slowdown, as measured with a [-3, 1] test:
http://tests.stockfishchess.org/tests/view/5a3587de0ebc590ccbb8be04
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 128498 W: 23332 L: 23395 D: 81771
There are two (smaller) caveats:
1) The patch is implemented for a 32-bit hash (so that a 64-bit multiply can be used). This effectively limits the number of clusters that can be used to 2^32, or to 128 GB of transposition table. That is a change in the maximum allowed TT size, which could bother those who regularly use 256 GB or more.
2) Already in master, an excluded move is hashed into the position key in a rather simple way, essentially only affecting the lower 16 bits of the key. This is OK in master, since bits 0-15 end up in the index, but not in the new scheme, which picks the higher bits. This is 'fixed' by shifting the excluded move a few bits up. Eventually a better hashing scheme seems wise.
Despite these two caveats, I think this is a nice improvement in usability.
Bench: 5346341
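The index mapping described above can be sketched as follows. This is only an illustration of the multiply-and-shift trick from the linked blog post, not the code of the patch itself; the function names `fast_range` and `mask_index` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the actual patch): maps a 32-bit hash to an
// index in [0, n) with one 64-bit multiply and one shift, for any n > 0.
inline std::uint32_t fast_range(std::uint32_t hash, std::uint32_t n) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(hash) * n) >> 32);
}

// The scheme described for master: only valid when n is a power of two,
// in which case hash % n equals hash & (n - 1).
inline std::uint32_t mask_index(std::uint32_t hash, std::uint32_t n) {
    return hash & (n - 1);
}

// A few spot checks with a table size that is not a power of two.
inline void check_fast_range() {
    const std::uint32_t n = 3000000;
    assert(fast_range(0u, n) == 0u);              // smallest hash -> first index
    assert(fast_range(0xFFFFFFFFu, n) == n - 1);  // largest hash -> last index
    assert(fast_range(0x80000000u, n) == n / 2);  // midpoint maps to the middle
}
```

Unlike `hash % n`, the result depends mostly on the high bits of the hash, so the hash should be well mixed in its upper bits.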
We should have checked that there is something to gain from the change. I started an LTC test of master with almost optimal hash against master with half the optimal hash.
@mcostalba
@Stefano80 it is a bit of a strange thing to test; at best you get an answer that X MB is better than Y MB under the conditions tested. It seems unlikely that power-of-two hash sizes are the best under all conditions, especially if there is no speed advantage in accessing them. The obvious use case for general hash sizes seems to be long analysis, where using as much hash as is available seems naturally beneficial.
The question is not whether power-of-two hashes are the best. The question is whether there is any play advantage in almost doubling the hash.
Btw, why didn't you test LTC?
@Stefano80 since the only concern was to measure slowdown, STC was enough in my opinion. This is not a change that affects search in any systematic way. The answer is rather obvious in my opinion: if 64 MB gains 10 Elo over 32 MB, a hash of 63 MB will gain nearly 10 Elo over 32 MB. How much Elo a doubling of hash provides will depend on the conditions tested.
Yes, you're most probably right, so let us quickly test it.
This patch is a slowdown on my machine.
See comment vondele@e467e84
For efficiency reasons, current master only allows transposition table sizes of N = 2^k, because the index computation (hash % N) can then be written as (hash & (2^k - 1)). On a typical computer (with 4, 8, ... GB of RAM), this implies that roughly half the RAM is left unused in analysis.
This issue was mentioned on fishcooking by Mindbreaker:
http://tests.stockfishchess.org/tests/view/5a3587de0ebc590ccbb8be04
Recently a neat trick was proposed to map a hash into the range [0,N[ more efficiently than (hash % N) for general N, nearly as efficiently as (hash % 2^k):
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
namely computing (hash * N / 2^32) for 32-bit hashes. This patch implements this trick and now allows for general hash sizes. Note that for N = 2^k this just amounts to using a different subset of bits from the hash: master uses the lower k bits, while this trick uses the upper k bits (of the 32-bit hash).
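The remark about power-of-two sizes can be checked directly: for N = 2^k, (hash * 2^k) >> 32 equals hash >> (32 - k). A minimal sketch, with hypothetical helper names that are not taken from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Multiply-and-shift index for a 32-bit hash (illustrative only).
constexpr std::uint32_t mul_shift_index(std::uint32_t hash, std::uint32_t n) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(hash) * n) >> 32);
}

// Lower k bits of the hash: the index master uses for N = 2^k.
constexpr std::uint32_t lower_bits(std::uint32_t hash, unsigned k) {
    return hash & ((1u << k) - 1);
}

// Upper k bits of the hash: what the new scheme yields for N = 2^k.
constexpr std::uint32_t upper_bits(std::uint32_t hash, unsigned k) {
    return hash >> (32 - k);
}

// Checks the equivalence for every k with 1 <= k <= 31.
inline void check_power_of_two(std::uint32_t hash) {
    for (unsigned k = 1; k < 32; ++k)
        assert(mul_shift_index(hash, 1u << k) == upper_bits(hash, k));
}
```

For example, with hash 0xABCD1234 and k = 16, the lower-bit index is 0x1234 while the multiply-and-shift index is 0xABCD.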
There is no slowdown, as measured with a [-3, 1] test:
http://tests.stockfishchess.org/tests/view/5a3587de0ebc590ccbb8be04
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 128498 W: 23332 L: 23395 D: 81771
There are two (smaller) caveats:
1) The patch is implemented for a 32-bit hash (so that a 64-bit multiply can be used). This effectively limits the number of clusters that can be used to 2^32, or to 128 GB of transposition table. That is a change in the maximum allowed TT size, which could bother those who regularly use 256 GB or more.
2) Already in master, an excluded move is hashed into the position key in a rather simple way, essentially only affecting the lower 16 bits of the key. This is OK in master, since bits 0-15 end up in the index, but not in the new scheme, which picks the higher bits. This is 'fixed' by shifting the excluded move a few bits up. Eventually a better hashing scheme seems wise.
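Both caveats can be illustrated with a small sketch. Several details here are assumptions rather than facts from the patch: that the low 32 bits of the 64-bit key serve as the hash, that the shift is 16 bits ("a few bits up" in the text), and that a cluster occupies 32 bytes (inferred from 2^32 clusters corresponding to 128 GB). All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using Key = std::uint64_t;  // 64-bit position key

// Assumed cluster size, inferred from "2^32 clusters or 128 GB".
constexpr std::uint64_t ClusterBytes = 32;
static_assert((std::uint64_t(1) << 32) * ClusterBytes == (std::uint64_t(128) << 30),
              "2^32 clusters of 32 bytes are 128 GiB");

// Assumed index function: multiply-and-shift on the low 32 bits of the key.
constexpr std::uint32_t cluster_index(Key key, std::uint32_t clusterCount) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(key)) * clusterCount) >> 32);
}

// Excluded move XORed into the low 16 bits of the key, as described for master.
constexpr Key with_excluded_low(Key posKey, std::uint16_t move) {
    return posKey ^ Key(move);
}

// Excluded move shifted up before the XOR (a shift of 16 is an assumption).
constexpr Key with_excluded_shifted(Key posKey, std::uint16_t move) {
    return posKey ^ (Key(move) << 16);
}

inline void demo_excluded_move() {
    const Key posKey = 0xDEADBEEF12340000ULL;  // arbitrary example key
    const std::uint16_t move = 0x0ABC;         // arbitrary 16-bit move encoding
    const std::uint32_t clusters = 1000;       // small table, not a power of two

    // Low-bit XOR: the index is unchanged, so both keys share a cluster.
    assert(cluster_index(with_excluded_low(posKey, move), clusters)
           == cluster_index(posKey, clusters));
    // Shifted XOR: the index changes.
    assert(cluster_index(with_excluded_shifted(posKey, move), clusters)
           != cluster_index(posKey, clusters));
}
```

With a small table, the low 16 bits barely influence the multiply-and-shift result, which is why the unshifted key lands in the same cluster as the original position.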
Despite these two caveats, I think this is a nice improvement in usability.
Bench: 5346341