Add HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES=0 to MPS wrapper script to avoid GPU NUMA nodes
#179
hwloc 2.11.0 started exposing the Grace-Hopper GPU NUMA nodes by default (see the hwloc NEWS file: https://github.com/open-mpi/hwloc/blob/41030697179b16f96f7e169f4530061c5fe6803f/NEWS#L164).
This means that the MPS wrapper script would try to derive the GPU index from GPU NUMA nodes, yielding nonsensical values (GPU NUMA nodes are indexed >= 4).
This is not yet a problem with the system hwloc (on daint at least), which is at version 2.9.0. However, if the system hwloc is updated, or if a newer hwloc is visible in an environment, the MPS wrapper script would misbehave.
This PR explicitly sets `HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES=0` when getting the GPU index for a rank. Setting it doesn't hurt on older hwloc versions.
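For illustration, here is a minimal sketch of how the variable might be applied when looking up the GPU index. The wrapper script itself is not shown in this description, so the function name `gpu_index_for_rank` is hypothetical, and the use of `hwloc-bind` and `hwloc-calc` to map a rank's CPU binding to a NUMA node is an assumption about how the wrapper works rather than a copy of it.

```bash
#!/bin/bash
# Hypothetical helper: print the physical index of the NUMA node(s) that
# intersect the CPU binding of the current rank.
gpu_index_for_rank() {
    local cpu_mask
    # CPU binding of the calling process, in taskset format.
    cpu_mask=$(hwloc-bind --get --taskset)
    # Set the variable only for this hwloc-calc call, so that GPU NUMA nodes
    # are left out of the topology and only CPU NUMA nodes are reported.
    HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES=0 \
        hwloc-calc --physical --intersect NUMAnode "$cpu_mask"
}
```

A wrapper could then use the result with something like `export CUDA_VISIBLE_DEVICES=$(gpu_index_for_rank)`. Scoping the variable to the single `hwloc-calc` call keeps it out of the environment of the application launched afterwards.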