Speed up TT store by checking for matches first #170

Closed
Thanar2 wants to merge 1 commit

@Thanar2
Contributor

Thanar2 commented Feb 15, 2014

Pure speed improvement to TranspositionTable::store(). The new code first checks all four TTEntry slots in a cluster for an empty or matching one, avoiding the "replace strategy" code completely in cases where an empty or matching one is found. The loop is unrolled to the constant ClusterSize of 4 for a measurable speed improvement.

The loop that implements the replace strategy only needs to execute 3 times instead of the original 4, because in the old version replace and tte started off pointing to the same slot, so the first iteration did nothing.
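
For readers without the diff at hand, the two-pass idea described above can be sketched roughly as follows. This is not the patch itself: `Entry`, `pick_slot`, and the `exact` flag are hypothetical, simplified stand-ins for Stockfish's TTEntry and cluster code, and the scoring terms in the second pass are only an assumed approximation of the replace strategy of that era.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins; not Stockfish's real types.
constexpr int ClusterSize = 4;

struct Entry {
    std::uint32_t key32 = 0;  // 0 is treated as "empty slot"
    int depth = 0;
    int generation = 0;
    bool exact = false;       // stands in for an exact-bound entry
};

// Returns the slot of a 4-entry cluster that a store should overwrite.
Entry* pick_slot(Entry* cluster, std::uint32_t key32, int generation) {
    assert(cluster != nullptr);

    // Pass 1, unrolled for ClusterSize == 4: the first empty or matching
    // slot wins immediately and the replace strategy is skipped entirely.
    if (!cluster[0].key32 || cluster[0].key32 == key32) return &cluster[0];
    if (!cluster[1].key32 || cluster[1].key32 == key32) return &cluster[1];
    if (!cluster[2].key32 || cluster[2].key32 == key32) return &cluster[2];
    if (!cluster[3].key32 || cluster[3].key32 == key32) return &cluster[3];

    // Pass 2: replace strategy. Slot 0 is the initial candidate, so only
    // slots 1..3 are compared against it (3 iterations instead of 4).
    Entry* replace = &cluster[0];
    for (int i = 1; i < ClusterSize; ++i) {
        Entry* tte = &cluster[i];
        // Assumed scoring: prefer evicting entries from an older search,
        // with a non-exact bound, and with a shallower depth.
        int c1 = (replace->generation == generation ? 2 : 0);
        int c2 = (tte->generation == generation || tte->exact ? -2 : 0);
        int c3 = (tte->depth < replace->depth ? 1 : 0);
        if (c1 + c2 + c3 > 0)
            replace = tte;
    }
    return replace;
}
```

Comparing slot 0 against itself can never give a positive score under this scoring (c1 and c2 cancel or go negative, and c3 is 0), which is why starting the loop at slot 1 does not change which slot is chosen.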

Standard bench shows approximately 1.4% speed improvement on my machine. A local test of 10,000 games at 10 second time control gave the following results: 1942-1816-6242 [.506].
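
The bracketed figure can be reproduced with a little arithmetic, assuming the record is listed as wins-losses-draws (the three numbers do sum to the 10,000 games played). A minimal sketch, where `match_score` is a hypothetical helper name:

```cpp
#include <cassert>

// Score fraction for a wins-losses-draws record; a draw counts as half a point.
double match_score(int wins, int losses, int draws) {
    const int games = wins + losses + draws;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}
```

With this reading, `match_score(1942, 1816, 6242)` is (1942 + 3121) / 10000 = 0.5063, which matches the reported [.506].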

No functional change.

@mcostalba
Owner

Could you please give it a run on fishtest? Thanks.

@mstembera
Contributor

Thanar, maybe you will find this interesting:

mstembera/Stockfish@2740799...26ef246

@mstembera
Contributor

I meant to mention that mine failed: http://tests.stockfishchess.org/tests/view/523b7ff70ebc59749a54ae48

@Thanar2
Contributor Author

Thanar2 commented Feb 15, 2014

Thanks for the heads up on the previous test. Were you able to measure a speed improvement using bench with your version? I did with mine, using profile-builds, but different cache alignments can always magnify or negate any particular minor speed improvement. I'm also curious whether fishtest uses profile-builds or not. Certainly the optimizations make for faster code, but the gain may be too small to measure.

@mcostalba
Owner

Yes, fishtest uses profile builds. I asked to test on fishtest because
different hardware may behave differently, so fishtest is a good mix.

@mstembera
Contributor

profile-builds tangent @marco
Since (at least on Windows machines) the binary is not compiled locally, there will be cases where the profiled build from the fishtest machine is a bad match for the local hardware, worse than a non-profile build would have been. I always thought profiled builds were intended for hardware with identical timings?

@Thanar
I was able to measure a speedup using QueryPerformanceCounter under MSVC, but it wasn't enough overall to be conclusive using bench.

@mcostalba
Owner

I don't understand the reason for using profile builds. In what way
is this connected with the patch? What kind of optimization do profile
builds enable that the default doesn't?

@mstembera
Contributor

I don't know what the reason for profile builds is. You mentioned above that fishtest uses them. I just made a general observation that it seems wrong to me to use them if they are not compiled on the local machine. By tangent I meant not related to this patch in particular.
