linux: don't hide the NVIDIA GPU node on non-POWER platforms
Also allow forcing it to be hidden with HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES=0.

These nodes were hidden by default on POWER because people
could use interleaved memory allocations across sockets,
but they would also interleave across GPU memory.

On NVIDIA Grace Hopper, interleaved allocation isn't much
of an issue since there's a single CPU node (and a single GPU
node per GPU slice).

Thanks to Antoine Morvan for the report.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 39fae7e)
bgoglin committed Feb 8, 2024
1 parent e099e60 commit c07b41b
Showing 3 changed files with 13 additions and 6 deletions.
1 change: 1 addition & 0 deletions NEWS
@@ -28,6 +28,7 @@ Version 2.10.1
thanks to Florent Pruvost for the help.
* Fix the enabling of CUDA in Windows CMake build,
Thanks to Moritz Kreutzer for the patch.
+* Don't hide the GPU NUMA node on NVIDIA Grace Hopper.


Version 2.10.0
11 changes: 7 additions & 4 deletions doc/hwloc.doxy
@@ -1320,10 +1320,13 @@ following environment variables.

<dt>HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES=0</dt>
<dd>show or hide NUMA nodes that correspond to NVIDIA GPU memory.
-By default they are ignored to avoid interleaved memory being allocated
-on GPU by mistake.
-Setting this environment variable to 1 exposes these NUMA nodes.
-They may be recognized by the <em>GPUMemory</em> subtype.
+By default they are ignored on POWER platforms to avoid interleaved
+memory being allocated on GPU by mistake.
+
+Setting this environment variable to 0 hides the NUMA nodes (default on POWER).
+Setting to 1 exposes these NUMA nodes (default on non-POWER platforms such as NVIDIA Grace Hopper).
+
+These NUMA nodes may be recognized by the <em>GPUMemory</em> subtype.
They also have a <em>PCIBusID</em> info attribute to identify the
corresponding GPU.
</dd>
7 changes: 5 additions & 2 deletions hwloc/topology-linux.c
@@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
- * Copyright © 2009-2023 Inria. All rights reserved.
+ * Copyright © 2009-2024 Inria. All rights reserved.
* Copyright © 2009-2013, 2015, 2020 Université Bordeaux
* Copyright © 2009-2018 Cisco Systems, Inc. All rights reserved.
* Copyright © 2015 Intel, Inc. All rights reserved.
@@ -4242,7 +4242,10 @@ look_sysfsnode(struct hwloc_topology *topology,
struct dirent *dirent;
int keep;
env = getenv("HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES");
-keep = env && atoi(env);
+/* NVIDIA GPU NUMA nodes hidden by default on POWER */
+keep = (data->arch != HWLOC_LINUX_ARCH_POWER);
+if (env)
+  keep = atoi(env);
while ((dirent = readdir(dir)) != NULL) {
char nvgpunumapath[300], line[256];
int err;
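
The decision made in the hunk above can be isolated as a small standalone function. This is only a sketch for illustration: the helper name `keep_nvidia_gpu_numa_nodes` and its `is_power` parameter are hypothetical and do not exist in hwloc; `is_power` stands in for the `data->arch == HWLOC_LINUX_ARCH_POWER` test, and `env_value` for the result of `getenv("HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES")`.

```c
#include <stdlib.h>  /* atoi, NULL */

/* Hypothetical helper, not part of hwloc: mirrors the default-then-override
 * logic applied in look_sysfsnode() by this commit.
 *   is_power  - nonzero on a POWER platform (assumed stand-in for
 *               data->arch == HWLOC_LINUX_ARCH_POWER)
 *   env_value - value of HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES,
 *               or NULL when the variable is unset
 * Returns nonzero if the NVIDIA GPU NUMA nodes should be kept. */
int keep_nvidia_gpu_numa_nodes(int is_power, const char *env_value)
{
    /* Platform default: hidden on POWER, shown everywhere else. */
    int keep = !is_power;

    /* An explicit setting overrides the platform default. */
    if (env_value)
        keep = atoi(env_value);

    return keep;
}
```

With this logic, an unset variable yields 0 on POWER and 1 elsewhere, while `"1"` and `"0"` force the result on any platform. Because the value goes through `atoi()`, a non-numeric string such as `"yes"` or an empty string converts to 0 and therefore hides the nodes.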
