Speed up TT store by checking for matches first #170
Could you please give it a run on fishtest? Thanks.
Thanar, maybe you will find this interesting.
Meant to mention that mine failed: http://tests.stockfishchess.org/tests/view/523b7ff70ebc59749a54ae48
Thanks for the heads up on the previous test. Were you able to measure a speed improvement using bench with your version? I did with mine, using profile-builds, but different cache alignments can always magnify or negate any particular minor speed improvement. I'm also curious whether fishtest uses profile-builds or not. Certainly the optimizations make for faster code, but the gain may be too small to measure.
Yes, fishtest uses profile builds. I asked to test on fishtest because ...
profile-builds tangent @marco @Thanar
I don't understand the reason for using profile builds. In which way ...
I don't know the reason for profile builds. You mentioned above that fishtest uses them. I just made a general observation that it seems wrong to me to use them if they are not compiled on the local machine. By tangent I meant not related to this patch in particular.
167a465 to ef14ba8
Pure speed improvement to TranspositionTable::store(). The new code first checks all four TTEntry slots in a cluster for an empty or matching one, avoiding the "replace strategy" code completely in cases where an empty or matching one is found. The loop is unrolled to the constant ClusterSize of 4 for a measurable speed improvement.
The loop that implements the replace strategy now needs to execute only 3 times instead of the original 4: in the old version, replace and tte started off pointing to the same slot, so the first iteration did nothing.
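The two-pass idea described above can be sketched roughly as follows. This is a simplified illustration, not the patch itself: the `Entry` layout, the `betterVictim` scoring (generation and depth only) and the free-function `store` signature are assumptions made so the example is self-contained; the real TTEntry and TranspositionTable::store() carry more fields and arguments.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified entry; a key32 of 0 marks an empty slot.
struct Entry {
    std::uint32_t key32;
    int           value;
    int           depth;
    std::uint8_t  generation;
};

constexpr std::size_t ClusterSize = 4;
using Cluster = std::array<Entry, ClusterSize>;

// Assumed replacement score: move away from a victim written during the
// current search, be reluctant to evict a candidate from the current
// search, and slightly prefer the shallower entry.
inline bool betterVictim(const Entry& cand, const Entry& cur, std::uint8_t gen) {
    int c1 = (cur.generation == gen) ? 2 : 0;
    int c2 = (cand.generation == gen) ? -2 : 0;
    int c3 = (cand.depth < cur.depth) ? 1 : 0;
    return c1 + c2 + c3 > 0;
}

// Stores into the cluster and returns the slot that was written.
inline Entry* store(Cluster& c, std::uint32_t key32, int value, int depth,
                    std::uint8_t gen) {
    Entry* tte = c.data();
    Entry* slot = nullptr;

    // Pass 1, unrolled for ClusterSize == 4: first empty or matching slot.
    if      (!tte[0].key32 || tte[0].key32 == key32) slot = &tte[0];
    else if (!tte[1].key32 || tte[1].key32 == key32) slot = &tte[1];
    else if (!tte[2].key32 || tte[2].key32 == key32) slot = &tte[2];
    else if (!tte[3].key32 || tte[3].key32 == key32) slot = &tte[3];

    if (!slot) {
        // Pass 2: replace strategy. Slot 0 is the initial victim, so only
        // the remaining three slots need to be compared against it.
        slot = &tte[0];
        for (std::size_t i = 1; i < ClusterSize; ++i)
            if (betterVictim(tte[i], *slot, gen))
                slot = &tte[i];
    }

    slot->key32 = key32;
    slot->value = value;
    slot->depth = depth;
    slot->generation = gen;
    return slot;
}
```

With a zero-initialized cluster (`Cluster c{};`), successive calls with new keys fill slots 0 through 3 in order, a repeated key overwrites its own slot, and only a full cluster with no match reaches the three-iteration replacement loop.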
Standard bench shows approximately 1.4% speed improvement on my machine. A local test of 10,000 games at a 10-second time control gave the following results (wins-losses-draws): 1942-1816-6242 [.506].
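For readers who want to check the bracketed score, here is a small sketch of the arithmetic, assuming the usual scoring of 1 point per win and 0.5 per draw. The helper names are hypothetical, and the Elo conversion uses the standard logistic model, which the post itself does not invoke.

```cpp
#include <cmath>

// Fraction of available points scored: win = 1, draw = 0.5, loss = 0.
inline double matchScore(int wins, int losses, int draws) {
    return (wins + 0.5 * draws) / static_cast<double>(wins + losses + draws);
}

// Elo difference implied by a score s (0 < s < 1) under the logistic model.
inline double eloFromScore(double s) {
    return -400.0 * std::log10(1.0 / s - 1.0);
}
```

Calling `matchScore(1942, 1816, 6242)` gives 5063 points out of 10,000 games, which matches the reported .506; passing that score to `eloFromScore` yields a small positive Elo difference.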
No functional change.