Fix compile under Windows XP
The Windows API functions needed for processor groups may be missing on old
Windows versions, so instead of calling them directly (which forces the calls
to be resolved at link time), try to load them at runtime. To do this we first
need to define the corresponding function pointer types.

Also don't interfere with running fishtest on NUMA hardware with Windows:
avoid having all single-threaded Stockfish processes run on the same node.

No functional change.
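
The runtime-loading idea described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `find_kernel32_symbol` and `processor_group_api_available` are hypothetical helper names, and the non-Windows branch exists only so the sketch builds anywhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: look up `name` in the already-loaded kernel32.dll.
// Returns nullptr when the symbol is missing (or when not built for Windows).
void* find_kernel32_symbol(const char* name) {
#ifdef _WIN32
    HMODULE k32 = GetModuleHandleA("kernel32.dll");
    if (!k32)
        return nullptr;
    return reinterpret_cast<void*>(GetProcAddress(k32, name));
#else
    (void)name;  // unused outside Windows
    return nullptr;
#endif
}

// True only if all three processor-group functions named in the commit
// can be resolved at runtime.
bool processor_group_api_available() {
    return find_kernel32_symbol("GetLogicalProcessorInformationEx") != nullptr
        && find_kernel32_symbol("GetNumaNodeProcessorMaskEx") != nullptr
        && find_kernel32_symbol("SetThreadGroupAffinity") != nullptr;
}
```

Because nothing references the three functions by name at link time, the binary no longer carries imports for them. To actually call a resolved function, the pointer has to be cast to a matching function pointer type; the documented prototypes of these three functions return `BOOL` and use the `WINAPI` calling convention.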
mcostalba committed Nov 26, 2016
1 parent 9eccba7 commit 2ec626d
Showing 1 changed file with 37 additions and 12 deletions.
49 changes: 37 additions & 12 deletions src/misc.cpp
@@ -21,9 +21,19 @@
 #ifdef _WIN32
 #if _WIN32_WINNT < 0x0601
 #undef _WIN32_WINNT
-#define _WIN32_WINNT 0x0601 // Force to include newest API (Win 7 or later)
+#define _WIN32_WINNT 0x0601 // Force to include needed API prototypes
 #endif
-#include <windows.h> // For processor groups
+#include <windows.h>
+// The needed Windows API for processor groups could be missed from old Windows
+// versions, so instead of calling them directly (forcing the linker to resolve
+// the calls at compile time), try to load them at runtime. To do this we need
+// first to define the corresponding function pointers.
+extern "C" {
+typedef bool(*fun1_t)(LOGICAL_PROCESSOR_RELATIONSHIP,
+                      PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX, PDWORD);
+typedef bool(*fun2_t)(USHORT, PGROUP_AFFINITY);
+typedef bool(*fun3_t)(HANDLE, CONST GROUP_AFFINITY*, PGROUP_AFFINITY);
+}
 #endif

 #include <fstream>
@@ -215,23 +225,22 @@ int get_group(size_t idx) {
   DWORD returnLength = 0;
   DWORD byteOffset = 0;

-  // Early exit if the needed API are not available at runtime
+  // Early exit if the needed API is not available at runtime
   HMODULE k32 = GetModuleHandle("Kernel32.dll");
-  if ( !GetProcAddress(k32, "GetLogicalProcessorInformationEx")
-      || !GetProcAddress(k32, "GetNumaNodeProcessorMaskEx")
-      || !GetProcAddress(k32, "SetThreadGroupAffinity"))
+  auto fun1 = (fun1_t)GetProcAddress(k32, "GetLogicalProcessorInformationEx");
+  if (!fun1)
     return -1;

   // First call to get returnLength. We expect it to fail due to null buffer
-  if (GetLogicalProcessorInformationEx(RelationAll, nullptr, &returnLength))
+  if (fun1(RelationAll, nullptr, &returnLength))
     return -1;

   // Once we know returnLength, allocate the buffer
   SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *buffer, *ptr;
   ptr = buffer = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*)malloc(returnLength);

   // Second call, now we expect to succeed
-  if (!GetLogicalProcessorInformationEx(RelationAll, buffer, &returnLength))
+  if (!fun1(RelationAll, buffer, &returnLength))
   {
     free(buffer);
     return -1;
@@ -278,15 +287,31 @@ int get_group(size_t idx) {

 void bindThisThread(size_t idx) {

-  // Use a local variable instead of a static: slower but thread-safe
+  // If OS already scheduled us on a different group than 0 then don't overwrite
+  // the choice, eventually we are one of many one-threaded processes running on
+  // some Windows NUMA hardware, for instance in fishtest. To make it simple,
+  // just check if running threads are below a threshold, in this case all this
+  // NUMA machinery is not needed.
+  if (Threads.size() < 8)
+    return;
+
+  // Use only local variables to be thread-safe
   int group = get_group(idx);

   if (group == -1)
     return;

-  GROUP_AFFINITY mask;
-  if (GetNumaNodeProcessorMaskEx(group, &mask))
-    SetThreadGroupAffinity(GetCurrentThread(), &mask, nullptr);
+  // Early exit if the needed API are not available at runtime
+  HMODULE k32 = GetModuleHandle("Kernel32.dll");
+  auto fun2 = (fun2_t)GetProcAddress(k32, "GetNumaNodeProcessorMaskEx");
+  auto fun3 = (fun3_t)GetProcAddress(k32, "SetThreadGroupAffinity");
+
+  if (!fun2 || !fun3)
+    return;
+
+  GROUP_AFFINITY affinity;
+  if (fun2(group, &affinity))
+    fun3(GetCurrentThread(), &affinity, nullptr);
 }

 #endif
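
The "first call to get the length, second call to fill the buffer" pattern used in `get_group` can be illustrated in isolation. The sketch below is portable and does not call the Windows API: `query_info` is a hypothetical stand-in that only mimics the calling convention of a size-query function, and `fetch_info` is likewise an invented name. It also uses `std::vector` rather than `malloc`/`free`, which is a substitution, not what the commit does.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a size-query API: when the buffer is null or too
// small it fails and reports the required size through `length`.
bool query_info(unsigned char* buffer, std::size_t* length) {
    static const unsigned char payload[] = {1, 2, 3, 4, 5};
    if (buffer == nullptr || *length < sizeof(payload)) {
        *length = sizeof(payload);  // tell the caller how much room is needed
        return false;
    }
    std::memcpy(buffer, payload, sizeof(payload));
    *length = sizeof(payload);
    return true;
}

// Returns the filled buffer, or an empty vector on any unexpected result.
std::vector<unsigned char> fetch_info() {
    std::size_t length = 0;

    // First call: expected to fail, but it fills in the required length.
    if (query_info(nullptr, &length))
        return {};

    // Once the length is known, allocate the buffer.
    std::vector<unsigned char> buffer(length);

    // Second call: expected to succeed now that the buffer is large enough.
    if (!query_info(buffer.data(), &length))
        return {};

    buffer.resize(length);
    return buffer;
}
```

Letting a container own the memory means the early-return paths need no explicit cleanup, whereas the `malloc` version in the diff has to call `free(buffer)` before returning on failure.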

3 comments on commit 2ec626d

@mstembera
Contributor

Just a question out of curiosity, because I don't fully understand the code... Will this behave correctly when we run multiple instances of a multi-threaded test on a single NUMA machine? For example, 3 instances of a 7-threaded test scheduled on a 24-thread NUMA machine?
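
Going only by the diff in this commit (this is a reading of the code, not the author's reply), the early exit in `bindThisThread` depends solely on the thread count of the current process. A small sketch of that check, with the hypothetical names `kBindThreshold` and `would_attempt_binding`:

```cpp
#include <cstddef>

// Threshold taken from the diff: `if (Threads.size() < 8) return;`
// The constant name is hypothetical; the commit uses the literal 8.
constexpr std::size_t kBindThreshold = 8;

// True when bindThisThread would go past the early exit and try to pick a
// processor group for the thread; false when it returns immediately and
// leaves placement to the OS scheduler.
constexpr bool would_attempt_binding(std::size_t thread_count) {
    return thread_count >= kBindThreshold;
}
```

Under this reading, each 7-threaded instance is below the threshold, so it skips the affinity code entirely and the OS decides where its threads run.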

@mcostalba
Author

mcostalba commented on 2ec626d Nov 27, 2016 via email


@mstembera
Contributor

OK thanks for explaining. I don't care about speculative setups either. I was only concerned about real scenarios on fishtest.
