Packed like sardines TT #3
Conversation
This change supports 50% more entries for a given sized transposition table (`TT`). Elo is increased as fewer time-consuming position evaluations are needed.

The old `TT` used 64-byte clusters, with each cluster containing 4 14-byte entries. This change uses 32-byte clusters, with each cluster containing 3 10-byte entries.

To reduce each entry from 14 to 10 bytes, some fields in the `TTEntry` structure have been condensed:

* The key field (32 bits) is now just 16 bits.
* The generation (8 bits) and bound (8 bits) fields have been combined into a single 8-bit field, with 6 bits used for the generation and 2 bits for the bound.
* The depth field (16 bits) is now just 8 bits, with the depth stored as an offset from the minimum depth `DEPTH_NONE` (which has been adjusted from -254 to -12). In this way, all depths can be stored as positive `uint8` values.

Since only a partial key is stored with each entry, the same entry used to store information about one position might be used to store information about a different position. This produces occasional type-2 errors during probes, where a probe returns information for a wrong position. These rare, but possible, type-2 errant probes are automatically detected with the same logic that detects corrupted entries generated during unsafe threaded access to the `TT`.

To improve speed, several optimizations have been made:

* The number of entries in each cluster has been reduced from 4 to 3.
* `first_entry` uses only 2 machine instructions and no longer uses a multiply.
* The `probe` loop has been unrolled.
* `store` writes the key only when necessary. It also evaluates only the second and third entries as potentially better replacement candidates, since the first entry starts as the best replacement candidate.
I'm against merging this, but let's let a majority vote amongst maintainers decide. @zamar, @glinscott: what's your take on this? Reasons I don't like it:
I'm also against merging this.
On Fri, Jun 20, 2014 at 8:07 AM, lucasart notifications@github.com wrote:
OK. Case closed. 2/3 is a majority vote. That being said, some valid points have been raised.
On Fri, Jun 20, 2014 at 11:42 AM, lucasart notifications@github.com wrote:
I don't think Ron was spamming the framework with these tests. He was trying different variations on the theme to find the best way to do it. I think that should be encouraged, rather than be called spamming. Yes, the idea is a fairly big change from the way the hash table is done today, but I really don't see how it's conflating different ideas. If you change the hash table structure in the way he did, you pretty much have to make most of the changes in this patch.

The one big concern I have with the patch is reducing the key from 32 bits to 16 bits. This will lead to many more hash collisions. That being said, when he ran tests with hash pressure, it seemed to be fine. Each bucket has 3 elements, so there is a 3/65536 chance of having a collision when full. If the ttable is overwritten 2-3 times per search, this is actually a fairly small chance of a collision that will have any meaningful effect.

The restriction on depth is the other concern; I haven't had time to analyze that. But otherwise, I think there is more good in this patch than bad. It's also really quite a small change. He also rewrote his original version to stay more within the coding guidelines.
Since one of us disagrees, it's probably best that I take a longer look.
I'm not convinced that 2-3 Elo points would be worth this... But I'll

On Fri, Jun 20, 2014 at 3:57 PM, Gary Linscott notifications@github.com
@zamar: regarding hash, it was demonstrated on fishtest that 16 vs 128 is equal at STC, so the hash to use is at most 16 for STC. Beyond that it's just useless, even harmful in fact (32 vs 128 was a measurable gain). So really it should be, at most, 16 for STC and 64 for LTC. And even less would be better, because it puts a bit more hash pressure, which simulates real-life conditions a little more closely. On that I fully agree with Ron, and disagree with Marco (and you, apparently).

@glinscott: what happens to a QS depth of -7 in Ron's patch? Well, it rolls up modulo 256 to some 120 or so ply, meaning that in endgame positions where you reach the max depth some funny stuff can happen, with deep QS entries being used near the root...

I cast my vote and won't change it. Up to you guys to decide. Whatever you choose I will be happy and won't complain, promised :)
@lucasart
I've queued up a test that approximates TCEC conditions: http://tests.stockfishchess.org/tests/view/53a48b7f0ebc5975754cb287

TCEC conditions are a 16GB hash table with 16 threads, so simplify down to 1GB/thread. Then, with an average searching time of 3 hours, 1GB / (3 * 60) ~= 5MB. I used 4MB, since we use powers of two anyway.

One other interesting test might be testing with an increment, since more time is spent in the endgame, where the hash table is more critical. I will queue that up at low priority.
There has been some concern about type-2 errors during probes with only a 16-bit intra-cluster key. Therefore, I instrumented the

Under low pressure (<100% full), there are no more than about 5 type-2 errors per million probes. As the pressure increases, the error rate starts a downward curve, following the function:

Even under tournament-like pressure (~300% full), there are only about 13 type-2 errors per million probes. Also,
@Britvich, thanks for the analysis! The tests are looking extremely promising. It appears that collisions are not really an issue any more for these TC/hash conditions, which are fairly stressful for the hash table. One thing that could be interesting to consider is where to steal some extra bits to increase the key length. There are a few possibilities I can see:
If we are at 13 type-2 errors per million probes at 300% full, then if we add just a few more bits, we will be down to 0-1 errors per million probes, which would greatly increase my confidence. I'd be in favor of option number 2, even though it would add a bit more code complexity.
```cpp
if (!tte->key32 || tte->key32 == key32) // Empty or overwrite old
```

```cpp
// Empty entry?
if (!tte->key)
```
I think we should remove this `if` block. `tte->key` will be 0 a lot of the time for valid keys as well, so we could replace an entry with high depth here.
But maybe this is worth a test against itself, as it will be difficult to measure the effect against normal SF.
Yes, I agree. Good point.
You are correct: `store` will always overwrite a 0-key, even if it's not the best candidate for replacement. Even though this 0-key check allows the `TT` to fill faster (as it breaks immediately upon finding an "empty" entry), that probably doesn't buy much under tournament-like conditions with a full `TT`.
I was wrong about the QS depth problem: I overlooked the fact that QS stores only 0 and -1 ply entries. So the depth is not an issue. It introduces some rigidity, though:
The 16-bit key still seems like heresy, but OTOH no test so far has shown that it causes problems. Maybe we can live with it, and if one day it causes issues, we can either:
Just one thing, though. The loop unrolling is ugly, and I think we should test without it, to compare apples to apples. If the gain of the patch is partly driven by that, we can just do the loop unrolling without messing up the `TT` structure.
```cpp
return NULL;

if (tte->key != key16 && (++tte)->key != key16 && (++tte)->key != key16)
```
This code leads to undefined behaviour. Never use several `++` operators on the same variable within an expression. Simply use `tte[1]` and `tte[2]` instead.
@lucasart, that is incorrect. It doesn't lead to "undefined behavior". According to the ISO/IEC 14882 C++ Standard, §5.14 [Logical AND operator]:

> The `&&` operator groups left-to-right. The operands are both implicitly converted to type `bool` (clause 4). The result is `true` if both operands are `true` and `false` otherwise. Unlike `&`, `&&` guarantees left-to-right evaluation: the second operand is not evaluated if the first operand is `false`.
Also, I want `tte` pointing to the matching entry when/if the statement gets short-circuited. Using `tte[1]` and `tte[2]` doesn't accomplish that.
Ah yes, you're right. The lazy boolean evaluation is guaranteed by the standard, so no compiler can screw up that code.
@glinscott, Good ideas...
I like stealing more bits, so that we can double the number of hash entries. If we could squish down each entry to 8 bytes, it would probably add a lot more Elo than if we used those extra bits to reduce the already rare type-2 errors that don't really cost any Elo. Of course, the trade-off is more packing/unpacking vs speed. I'm amazed how sensitive
I like and agree.
Yes, it's relatively easy to take advantage of those bytes. If each entry were 8 bytes, and there were 4 entries/cluster, then 4 bits would become available for each entry.
Yes, something along the lines of a 10-bit minifloat would work. That would make available 6 more bits twice. Each of the 4 8-byte entries per cluster would look like:
This would take a little work, but my estimate is that it would improve Elo more than EGTBs do, with about 2,800 fewer lines of code and a lot less disk space. 😄
Are you saying:
If so, 😟 the main Elo gain is coming from the increase in the number of entries (1.5x) for a given hash size. That requires "messing up the `TT` structure". If that's not what you're saying, then never mind. 😌
Today I spent around 1 hour understanding and analyzing this patch, and I've made up my mind... The potential Elo increase for this patch is big enough that it outweighs the potential problems that will arise from key conflicts. So in principle, I think that this is the way to go... However, the patch in its current form is not committable:
Until these issues are addressed, I'm against committing this patch. One of the reasons that Marco got tired was that people were writing low-quality patches (not following existing coding standards), and after those patches passed in the testing framework, they expected Marco to clean up the mess. This must stop. After a patch has passed in the testing framework, it's the responsibility of the patch author to clean it up, so that it's actually committable. Of course, if the patch author refuses to clean up his mess, anyone can step in instead and do the work for him. Even I might do this at some point in the future, but not during this weekend...
@zamar, before you spent time "understanding and analyzing this patch", I thought you had already made up your mind...
It negatively affects my reputation when you make false claims such as:
The changes I made aren't just "speed optimization attempts"; they were the result of extensive testing, instrumentation, examination of generated machine code, and performance profiling and analysis using Visual Studio Ultimate. Your opinion may be that they "uglify" things, but to say they have "no proven benefit" is just false. Take for example this code in `store`:

```cpp
if (  (tte->gen() == generation || tte->bound() == BOUND_EXACT)
    - (replace->gen() == generation)
    - (tte->depth8 < replace->depth8) < 0)
```

That statement is actually pretty expensive (and a hotspot, as shown by performance profiling). It is also executed unnecessarily against the first entry of each cluster, where the code performs no function except to waste cycles. Just on face value, most programmers would see that moving that code (or "uglifying" it, to use your term) so that it only considers the second and third entries in a cluster, as better candidates for replacement, is a "proven benefit" (at least that's what my "spamming [of] the test queue" has shown).
I only removed one comment: before the definition of the
Ron (the "test queue spamming, not listening, code uglifying, in general, making a mess") Britvich |
Hi Ron,

Each change to the source code should be verified individually. It might be that in the end the code ends up looking just as it is written in your patch. If that condition is really a speed-critical spot, then we should be able to verify the speed change individually. If we gain >= 1-2% speed by introducing the change, then we are going to do it. If not, then we are not going to do it...

That's why it's important that you don't post huge patches: it's very difficult for others to verify the changes. Instead, split the patches into smaller ones. Then you can say: patch A improves speed by 2%; patch B is a functional change which gains 4 Elo points, proven by these tests. Then we verify the results and commit.

So many times people have posted here 50-line patches and asked us to commit them. And in the end, they were reduced to 5 lines or less...
Hi @zamar, You know what "should be verified individually" first? Statements you make about people. Once you have those straight, perhaps looking at the code and test results too before declaring judgement on someone's 🐟 contributions:
...might gain you some respect points ✨ as a "moderator". |
`tt_sardines` 🐟 supports 50% more entries for a given sized transposition table (`TT`). Under tournament-style hash pressure, Elo is increased (+7.68 ± 2) as fewer time-consuming position evaluations are needed: `tt_sardines` passed both STC (with 4MB hash) and LTC (with 16MB hash) easily. Also, at STC with no hash pressure (with 128MB hash), the `TT` performs at least as well as it did before.

The old `TT` used 64-byte clusters, with each cluster containing 4 14-byte entries. `tt_sardines` uses 32-byte clusters, with each cluster containing 3 10-byte entries.

To reduce each entry from 14 to 10 bytes, some fields in the `TTEntry` structure have been condensed:

* The key field (32 bits) is now just 16 bits.
* The generation (8 bits) and bound (8 bits) fields have been combined into a single 8-bit field, with 6 bits used for the generation and 2 bits for the bound.
* The depth field (16 bits) is now just 8 bits, with the depth stored as an offset from the minimum depth `DEPTH_NONE` (which has been adjusted from -254 to -12). In this way, all depths can be stored as positive `uint8` values.

Since only a partial key is stored with each entry, the same entry used to store information about one position might be used to store information about a different position. This produces occasional type-2 errors during probes, where a probe returns information for a wrong position. These rare, but possible, errant probes are automatically detected with the same logic that detects corrupted entries generated during unsafe threaded access to the `TT`.

To improve speed, several optimizations have been made:

* The number of entries in each cluster has been reduced from 4 to 3.
* `first_entry` uses only 2 machine instructions and no longer uses a multiply.
* The `probe` loop has been unrolled.
* `store` writes the key only when necessary. It also evaluates only the second and third entries as potentially better replacement candidates, since the first entry starts as the best replacement candidate.

Note: No `#pragma` directives or other modern C++ constructs have been used. The `TT` contains the same entry fields and data (such as `generation`) that existed before, but in condensed form. Probes and stores are processed as they were before, just a little faster.