Add support for Windows large pages #2656

Closed
45 changes: 45 additions & 0 deletions Readme.md
@@ -138,6 +138,51 @@ more compact than Nalimov tablebases, while still storing all information
needed for optimal play and in addition being able to take into account
the 50-move rule.

## Large Pages

Stockfish supports large pages on Linux and Windows. Large pages make
hash access more efficient, which improves engine speed, especially
with large hash sizes. Typical gains are 5% to 10% in nodes per second
(nps), but gains of up to 30% have been measured. Support is
automatic: Stockfish attempts to use large pages when they are
available and falls back to regular memory allocation otherwise.

### Support on Linux

Large page support on Linux is provided by the kernel's transparent
huge pages functionality. Transparent huge pages are often already
enabled, in which case no configuration is needed.
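
To illustrate what opting in to transparent huge pages can look like
for an application, here is a minimal sketch. It is not Stockfish's
actual allocator: the function name `alloc_thp_hinted` is hypothetical,
and the 2 MiB page size is an assumption (it is the common huge page
size on x86-64 Linux).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <sys/mman.h>   // madvise, MADV_HUGEPAGE (Linux)

// Assumption: 2 MiB huge pages, the common size on x86-64 Linux.
constexpr std::size_t kHugePageSize = 2 * 1024 * 1024;

// Hypothetical helper: returns a buffer of at least `size` bytes aligned to
// kHugePageSize, or nullptr on failure. Release it with std::free().
void* alloc_thp_hinted(std::size_t size) {
    assert(size > 0);

    // Round up so that the region covers whole huge pages.
    const std::size_t rounded =
        (size + kHugePageSize - 1) / kHugePageSize * kHugePageSize;

    void* mem = nullptr;
    if (posix_memalign(&mem, kHugePageSize, rounded) != 0)
        return nullptr;

#ifdef MADV_HUGEPAGE
    // Only a hint: the kernel may still back the region with regular pages,
    // so the return value is deliberately ignored.
    (void)madvise(mem, rounded, MADV_HUGEPAGE);
#endif
    return mem;
}
```

Because the call is only a hint, the allocation still succeeds with
regular pages when huge pages are unavailable, which matches the
fallback behaviour described above.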

To verify that transparent huge pages are in use, run the following
command:

```
grep AnonHugePages /proc/meminfo
```

After launching the engine, this number should increase by roughly the
configured size of the hash.
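
The same check can be scripted. The following is a minimal sketch that
extracts the `AnonHugePages` value from text in `/proc/meminfo` format;
the helper name `anon_huge_pages_kb` is hypothetical and not part of
Stockfish.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: scans text in /proc/meminfo format and returns the
// AnonHugePages value in kB, or -1 if the field is missing.
long anon_huge_pages_kb(std::istream& meminfo) {
    std::string line;
    while (std::getline(meminfo, line)) {
        std::istringstream fields(line);
        std::string key;
        long value = 0;
        // A matching line looks like: "AnonHugePages:   2048 kB"
        if (fields >> key >> value && key == "AnonHugePages:")
            return value;
    }
    return -1;
}
```

Passing a `std::ifstream` opened on `/proc/meminfo` before and after
launching the engine, and comparing the two values, gives the increase
mentioned above.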

For troubleshooting, the file `/sys/kernel/mm/transparent_hugepage/enabled`
controls whether transparent huge pages are enabled. Setting it to
`always` or `madvise` should suffice for Stockfish.

The file `/sys/kernel/mm/transparent_hugepage/defrag` controls whether
the kernel attempts to defragment memory to make room for large pages
when necessary. Setting it to `always`, `defer+madvise`, or `madvise`
is recommended.
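
Both files list every accepted value on a single line and mark the
active one with square brackets, for example `always [madvise] never`.
A minimal sketch for extracting the active value from such a line is
shown here; the helper name `active_thp_setting` is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: given the contents of a transparent huge pages sysfs
// file such as "always [madvise] never", returns the bracketed (active)
// value, or an empty string if no bracketed value is found.
std::string active_thp_setting(const std::string& contents) {
    const std::string::size_type open = contents.find('[');
    if (open == std::string::npos)
        return "";
    const std::string::size_type close = contents.find(']', open + 1);
    if (close == std::string::npos)
        return "";
    return contents.substr(open + 1, close - open - 1);
}
```

If the value returned for the `enabled` file is `never`, transparent
huge pages are disabled and Stockfish will use regular pages.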

### Support on Windows

Using large pages requires the "Lock Pages in Memory" privilege. See
[Enable the Lock Pages in Memory Option (Windows)](https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/enable-the-lock-pages-in-memory-option-windows)
for how to enable this privilege. Logging out and back in may be
needed afterwards. To determine whether large pages are in use, see
the engine log, where the hash table allocation line reports whether
Windows large pages were used.

Due to memory fragmentation, it may not always be possible to allocate
large pages even when they are enabled. When this is the case, a
reboot may be needed.

## Compiling Stockfish yourself from the sources

1 change: 1 addition & 0 deletions src/main.cpp
@@ -48,6 +48,7 @@ int main(int argc, char* argv[]) {

  UCI::loop(argc, argv);

  TT.resize(0); // release hash first to avoid segfault on waiting for search completion
  Threads.set(0);
  return 0;
}
87 changes: 87 additions & 0 deletions src/misc.cpp
@@ -308,6 +308,71 @@ void* aligned_ttmem_alloc(size_t allocSize, void*& mem) {
  return mem;
}

#elif defined(_WIN64)

static void* aligned_ttmem_alloc_large_pages(size_t allocSize) {

  HANDLE hProcessToken { };
  LUID luid { };
  void* mem = nullptr;

  const size_t largePageSize = GetLargePageMinimum();
  if (!largePageSize)
      return nullptr;

  // We need SeLockMemoryPrivilege, so try to enable it for the process
  if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hProcessToken))
      return nullptr;

  if (LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, &luid))
  {
      TOKEN_PRIVILEGES tp { };
      TOKEN_PRIVILEGES prevTp { };
      DWORD prevTpLen = 0;

      tp.PrivilegeCount = 1;
      tp.Privileges[0].Luid = luid;
      tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;

      // Try to enable SeLockMemoryPrivilege. Note that even if AdjustTokenPrivileges() succeeds,
      // we still need to query GetLastError() to ensure that the privileges were actually obtained...
      if (AdjustTokenPrivileges(
              hProcessToken, FALSE, &tp, sizeof(TOKEN_PRIVILEGES), &prevTp, &prevTpLen) &&
          GetLastError() == ERROR_SUCCESS)
      {
          // round up size to full pages and allocate
          allocSize = (allocSize + largePageSize - 1) & ~size_t(largePageSize - 1);
          mem = VirtualAlloc(
              NULL, allocSize, MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES, PAGE_READWRITE);

          // privilege no longer needed, restore previous state
          AdjustTokenPrivileges(hProcessToken, FALSE, &prevTp, 0, NULL, NULL);
      }
  }

  CloseHandle(hProcessToken);

  return mem;
}

void* aligned_ttmem_alloc(size_t allocSize, void*& mem) {

  // try to allocate large pages
  mem = aligned_ttmem_alloc_large_pages(allocSize);
  if (mem)
      sync_cout << "info string Hash table allocation: Windows large pages used." << sync_endl;
  else
      sync_cout << "info string Hash table allocation: Windows large pages not used." << sync_endl;

  // fall back to regular allocation if necessary
  if (!mem)
      mem = VirtualAlloc(NULL, allocSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

  // NOTE: VirtualAlloc returns memory at a page boundary, so there is no need
  // to align for cache lines
  return mem;
}

#else

void* aligned_ttmem_alloc(size_t allocSize, void*& mem) {
@@ -321,6 +386,28 @@ void* aligned_ttmem_alloc(size_t allocSize, void*& mem) {

#endif

/// aligned_ttmem_free will free the previously allocated ttmem
#if defined(_WIN64)

void aligned_ttmem_free(void* mem) {

  if (!VirtualFree(mem, 0, MEM_RELEASE))
  {
      DWORD err = GetLastError();
      std::cerr << "Failed to free transposition table. Error code: 0x" <<
          std::hex << err << std::dec << std::endl;
      exit(EXIT_FAILURE);
  }
}

#else

void aligned_ttmem_free(void *mem) {
  free(mem);
}

#endif
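
The size rounding in `aligned_ttmem_alloc_large_pages` relies on the large page size being a power of two: adding `pageSize - 1` and then masking off the low bits yields the next multiple of the page size. A minimal standalone sketch of that arithmetic follows. The helper name `round_up_to_page` is hypothetical, and the 2 MiB value used in the usage note is an assumption, since the real value comes from `GetLargePageMinimum()` at run time.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `size` up to the next multiple of `pageSize`.
// Precondition: pageSize is a non-zero power of two, so that
// ~(pageSize - 1) is a mask that clears exactly the low bits.
std::size_t round_up_to_page(std::size_t size, std::size_t pageSize) {
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);
    return (size + pageSize - 1) & ~(pageSize - 1);
}
```

With an assumed 2 MiB large page, a request of 2 MiB plus one byte is rounded up to 4 MiB, while a request of exactly 2 MiB is left unchanged.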


namespace WinProcGroup {

1 change: 1 addition & 0 deletions src/misc.h
@@ -34,6 +34,7 @@ const std::string compiler_info();
void prefetch(void* addr);
void start_logger(const std::string& fname);
void* aligned_ttmem_alloc(size_t size, void*& mem);
void aligned_ttmem_free(void* mem);

void dbg_hit_on(bool b);
void dbg_hit_on(bool c, bool b);
9 changes: 8 additions & 1 deletion src/tt.cpp
@@ -63,7 +63,14 @@ void TranspositionTable::resize(size_t mbSize) {

   Threads.main()->wait_for_search_finished();

-  free(mem);
+  if (mem)
+      aligned_ttmem_free(mem);
+
+  if (!mbSize)
+  {
+      mem = nullptr;
+      return;
+  }

   clusterCount = mbSize * 1024 * 1024 / sizeof(Cluster);
   table = static_cast<Cluster*>(aligned_ttmem_alloc(clusterCount * sizeof(Cluster), mem));
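
The release-then-reallocate flow in `resize` can be sketched in isolation. This is a simplified illustration, not the Stockfish implementation: `TableSketch` is a hypothetical stand-in for `TranspositionTable`, `std::malloc` and `std::free` stand in for `aligned_ttmem_alloc` and `aligned_ttmem_free`, and the 32-byte `Cluster` is an assumption made only so the arithmetic is concrete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Assumption: a 32-byte cluster, used here only for illustration.
struct Cluster { unsigned char bytes[32]; };

// Hypothetical stand-in for TranspositionTable.
struct TableSketch {
    void*       mem          = nullptr;
    Cluster*    table        = nullptr;
    std::size_t clusterCount = 0;

    void resize(std::size_t mbSize) {
        // Release the previous allocation, if any.
        if (mem)
            std::free(mem);
        mem          = nullptr;
        table        = nullptr;
        clusterCount = 0;

        // A size of zero means "release only".
        if (!mbSize)
            return;

        clusterCount = mbSize * 1024 * 1024 / sizeof(Cluster);
        mem          = std::malloc(clusterCount * sizeof(Cluster));
        table        = static_cast<Cluster*>(mem);
        if (!mem)
            clusterCount = 0;  // allocation failed, leave the table empty
    }
};
```

Under these assumptions, `resize(16)` yields 16 * 1024 * 1024 / 32 = 524288 clusters, and a following `resize(0)` releases the memory and leaves `mem` null, which is the state `main` relies on when it calls `TT.resize(0)` before shutting down the threads.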