Defer hash table resize and avoid redundant clears
Defer the hash table resize until the hash is actually needed, that is,
until either 'isready' or a search start command is received. This has
the following benefits:

- Stockfish initializes faster, especially with large hash sizes, since
  setting 'Threads' and 'Hash' now triggers only one hash resize.
  This also removes the speed difference caused by the order in which
  'Hash' and 'Threads' are set.

- The initial default hash is never allocated if the user sets the
  hash table size.

Also add tracking of whether the hash is dirty, to avoid redundant
clears. This speeds up initialization further.
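
The two ideas above (resize only when the requested configuration
differs from the one in effect, and clear only when the table may have
been written to) can be sketched in isolation. This is a minimal sketch,
not the code of this patch: LazyTable and all of its members are
hypothetical names, the default size of 1 MB is an assumption, and the
thread-count tracking and the wait for a running search done by the
real patch are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a hash table with deferred resize and a dirty
// flag. Names are illustrative only, not Stockfish's actual class.
class LazyTable {
public:
    // Setting the option only records the request; nothing is allocated.
    void setRequestedMb(std::size_t mb) { requestedMb = mb; }

    // Called by anything that may have written to the table (e.g. a search).
    void markDirty() { dirty = true; }

    // Called when the table is actually needed. Reallocates only if the
    // requested size differs from the size currently in effect.
    void resizeIfChanged() {
        if (requestedMb == currentMb)
            return;

        currentMb  = requestedMb;
        entryCount = currentMb * 1024 * 1024 / sizeof(std::uint64_t);
        entries.reset(new std::uint64_t[entryCount]); // contents uninitialized
        ++resizes;

        markDirty(); // fresh memory always needs clearing
        clear();
    }

    // Zeroes the table, but only if something could have modified it.
    void clear() {
        if (!dirty)
            return;
        std::fill_n(entries.get(), entryCount, std::uint64_t{0});
        ++clears;
        dirty = false;
    }

    std::size_t resizeCount() const { return resizes; }
    std::size_t clearCount() const { return clears; }

private:
    std::unique_ptr<std::uint64_t[]> entries;
    std::size_t entryCount  = 0;
    std::size_t requestedMb = 1; // assumed default for this sketch
    std::size_t currentMb   = 0; // 0 means "nothing allocated yet"
    bool        dirty       = false;
    std::size_t resizes = 0, clears = 0;
};
```

With this sketch, calling setRequestedMb() twice and then
resizeIfChanged() twice performs a single allocation and a single
clear, and a later clear() does nothing until markDirty() is called.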

With these improvements, Stockfish initialization is drastically
faster on the TCEC "Blue" machine (176 threads, 128 GB hash):

Case  Init sequence style (SF ver)   Process launch to search start
===================================================================
 (1)  Default Cutechess (SF-dev)     59.0641s
 (2)  TCEC Cutechess-cli (SF-dev)     8.980s
 (3)  Either (SF-dev + this patch)    4.853s

The init sequence used in testing (default Cutechess style) is as follows:

  uci
  setoption name Hash value 131072
  setoption name Threads value 176
  setoption name OwnBook value false
  setoption name Ponder value false
  ucinewgame
  isready
  position startpos
  isready
  go depth 1

The TCEC-patched Cutechess sets "Threads" before "Hash", since this
order is much faster on many engines, including SF-dev before this
patch.
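
Why the order matters before this patch, and stops mattering after it,
can be illustrated with a small simulation. This is a sketch under
stated assumptions, not engine code: Command, Resize, simulateEager and
simulateDeferred are hypothetical names. It relies only on what the
diff shows: both option handlers used to resize the table immediately,
clear() uses the current 'Threads' value, and the defaults are
Hash = 16 and Threads = 1. Each logged entry records the table size and
the thread count available for clearing it at that moment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One simulated (re)allocation: table size and threads available to clear it.
struct Resize { std::size_t hashMb; std::size_t threads; };

// Simplified UCI command; 'value' is ignored for "isready" and "go".
struct Command { std::string name; std::size_t value; };

// Behaviour before the patch: every 'Hash' or 'Threads' change resizes at once.
std::vector<Resize> simulateEager(const std::vector<Command>& cmds) {
    std::size_t hashMb = 16, threads = 1; // option defaults
    std::vector<Resize> log;
    for (const Command& c : cmds) {
        if (c.name == "Hash")    { hashMb  = c.value; log.push_back({hashMb, threads}); }
        if (c.name == "Threads") { threads = c.value; log.push_back({hashMb, threads}); }
    }
    return log;
}

// Behaviour after the patch: options are only recorded; the resize happens
// on "isready" or "go", and only if the configuration actually changed.
std::vector<Resize> simulateDeferred(const std::vector<Command>& cmds) {
    std::size_t hashMb = 16, threads = 1; // option defaults
    std::size_t curMb = 0, curThreads = 0; // nothing allocated yet
    std::vector<Resize> log;
    for (const Command& c : cmds) {
        if (c.name == "Hash")         hashMb  = c.value;
        else if (c.name == "Threads") threads = c.value;
        else if (c.name == "isready" || c.name == "go") {
            if (hashMb != curMb || threads != curThreads) {
                curMb = hashMb;
                curThreads = threads;
                log.push_back({hashMb, threads});
            }
        }
    }
    return log;
}
```

In the eager model, setting "Hash" first allocates and clears the full
131072 MB table while only one thread is configured, and then does it
again once "Threads" is set. Setting "Threads" first resizes only the
16 MB default table before the large one. In the deferred model both
orders produce a single resize with all 176 threads available.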

Many regular users will see speedups as well. On my box (Intel Core
i7-8700k, 12 threads, 16 GB hash), the above init sequence takes the
following times for the three cases:

 (1)  2.347s
 (2)  1.430s
 (3)  0.975s

This patch also makes my chess GUI (chessx) start analysis noticeably
faster: the response time from pressing 'Analyze' to first output
improves from approx. 2s to 1s. (Same i7-8700k, 11 threads, 16 GB
hash)

No functional change
skiminki committed May 9, 2020
1 parent fcaf073 commit f078e43
Showing 5 changed files with 48 additions and 24 deletions.
3 changes: 0 additions & 3 deletions src/thread.cpp
@@ -150,9 +150,6 @@ void ThreadPool::set(size_t requested) {
           push_back(new Thread(size()));
       clear();
 
-      // Reallocate the hash with the new threadpool size
-      TT.resize(Options["Hash"]);
-
       // Init thread number dependent search params.
       Search::init();
   }
38 changes: 26 additions & 12 deletions src/tt.cpp
@@ -59,22 +59,31 @@ void TTEntry::save(Key k, Value v, bool pv, Bound b, Depth d, Move m, Value ev)
 /// measured in megabytes. Transposition table consists of a power of 2 number
 /// of clusters and each cluster consists of ClusterSize number of TTEntry.
 
-void TranspositionTable::resize(size_t mbSize) {
+void TranspositionTable::resizeIfChanged() {
 
-  Threads.main()->wait_for_search_finished();
+  const size_t optMemMb = Options["Hash"];
+  const size_t optNumThreads = Options["Threads"];
 
-  free(mem);
-
-  clusterCount = mbSize * 1024 * 1024 / sizeof(Cluster);
-  table = static_cast<Cluster*>(aligned_ttmem_alloc(clusterCount * sizeof(Cluster), mem));
-  if (!mem)
+  if (optMemMb != memMb || optNumThreads != numThreads)
   {
-      std::cerr << "Failed to allocate " << mbSize
-                << "MB for transposition table." << std::endl;
-      exit(EXIT_FAILURE);
-  }
+      Threads.main()->wait_for_search_finished();
+      free(mem);
+
+      memMb = optMemMb;
+      numThreads = optNumThreads;
 
-  clear();
+      clusterCount = memMb * 1024 * 1024 / sizeof(Cluster);
+      table = static_cast<Cluster*>(aligned_ttmem_alloc(clusterCount * sizeof(Cluster), mem));
+      if (!mem)
+      {
+          std::cerr << "Failed to allocate " << memMb
+                    << "MB for transposition table." << std::endl;
+          exit(EXIT_FAILURE);
+      }
+
+      markDirty(); // resized hash always needs clearing
+      clear();
+  }
 }
 
 
@@ -83,6 +92,9 @@ void TranspositionTable::resize(size_t mbSize) {
 
 void TranspositionTable::clear() {
 
+  if (!dirty)
+      return; // don't clear, hash already clean
+
   std::vector<std::thread> threads;
 
   for (size_t idx = 0; idx < Options["Threads"]; ++idx)
@@ -105,6 +117,8 @@ void TranspositionTable::clear() {
 
   for (std::thread& th: threads)
       th.join();
+
+  dirty = false;
 }
 
 /// TranspositionTable::probe() looks up the current position in the transposition
14 changes: 11 additions & 3 deletions src/tt.h
@@ -79,8 +79,10 @@ class TranspositionTable {
   void new_search() { generation8 += 8; } // Lower 3 bits are used by PV flag and Bound
   TTEntry* probe(const Key key, bool& found) const;
   int hashfull() const;
-  void resize(size_t mbSize);
-  void clear();
+
+  void resizeIfChanged(); // trigger resize if options changed
+  void clear(); // clear if hash is dirty
+  void markDirty() { dirty = true; } // mark the hash dirty
 
   // The 32 lowest order bits of the key are used to get the index of the cluster
   TTEntry* first_entry(const Key key) const {
@@ -92,7 +94,13 @@ class TranspositionTable {
 
   size_t clusterCount;
   Cluster* table;
-  void* mem;
+  void* mem = nullptr;
+  bool dirty = false;
+
+  // Current TT config values -- used to trigger resize in resizeIfChanged()
+  size_t memMb = 0;
+  size_t numThreads = 0;
+
   uint8_t generation8; // Size must be not bigger than TTEntry::genBound8
 };
 
12 changes: 9 additions & 3 deletions src/uci.cpp
@@ -113,6 +113,7 @@ namespace {
     bool ponderMode = false;
 
     limits.startTime = now(); // As early as possible!
+    TT.markDirty(); // search makes the hash dirty
 
     while (is >> token)
         if (token == "searchmoves") // Needs to be the last command on the line
@@ -169,7 +170,12 @@ namespace {
         }
         else if (token == "setoption") setoption(is);
         else if (token == "position") position(pos, is, states);
-        else if (token == "ucinewgame") { Search::clear(); elapsed = now(); } // Search::clear() may take some while
+        else if (token == "ucinewgame")
+        {
+            TT.resizeIfChanged();
+            Search::clear();
+            elapsed = now(); // TT.resizeIfChanged() may take a while
+        }
     }
 
     elapsed = now() - elapsed + 1; // Ensure positivity to avoid a 'divide by zero'
@@ -228,10 +234,10 @@ void UCI::loop(int argc, char* argv[]) {
                     << "\nuciok" << sync_endl;
 
       else if (token == "setoption") setoption(is);
-      else if (token == "go") go(pos, is, states);
+      else if (token == "go") { TT.resizeIfChanged(); go(pos, is, states); }
       else if (token == "position") position(pos, is, states);
       else if (token == "ucinewgame") Search::clear();
-      else if (token == "isready") sync_cout << "readyok" << sync_endl;
+      else if (token == "isready") { TT.resizeIfChanged(); sync_cout << "readyok" << sync_endl; }
 
       // Additional custom non-UCI commands, mainly for debugging.
       // Do not use these commands during a search!
5 changes: 2 additions & 3 deletions src/ucioption.cpp
@@ -37,8 +37,7 @@ UCI::OptionsMap Options; // Global object
 namespace UCI {
 
 /// 'On change' actions, triggered by an option's value change
-void on_clear_hash(const Option&) { Search::clear(); }
-void on_hash_size(const Option& o) { TT.resize(o); }
+void on_clear_hash(const Option&) { TT.markDirty(); TT.resizeIfChanged(); Search::clear(); }
 void on_logger(const Option& o) { start_logger(o); }
 void on_threads(const Option& o) { Threads.set(o); }
 void on_tb_path(const Option& o) { Tablebases::init(o); }
@@ -63,7 +62,7 @@ void init(OptionsMap& o) {
   o["Contempt"] << Option(24, -100, 100);
   o["Analysis Contempt"] << Option("Both var Off var White var Black var Both", "Both");
   o["Threads"] << Option(1, 1, 512, on_threads);
-  o["Hash"] << Option(16, 1, MaxHashMB, on_hash_size);
+  o["Hash"] << Option(16, 1, MaxHashMB);
   o["Clear Hash"] << Option(on_clear_hash);
   o["Ponder"] << Option(false);
   o["MultiPV"] << Option(1, 1, 500);
