linux: don't hide the NVIDIA GPU node on non-POWER platforms
Also allow hiding it explicitly with HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES=0.

These nodes were hidden by default on POWER because people
could use interleaved memory allocations across sockets,
but they would also interleave across GPU memory.

On NVIDIA Grace Hopper, interleaved allocation isn't much
of an issue since there's a single CPU node (and a single GPU
node per GPU slice).

Thanks to Antoine Morvan for the report.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
bgoglin committed Feb 8, 2024
1 parent 79f2079 commit 39fae7e
Showing 2 changed files with 12 additions and 6 deletions.
11 changes: 7 additions & 4 deletions doc/hwloc.doxy
@@ -1241,10 +1241,13 @@ following environment variables.

 <dt>HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES=0</dt>
 <dd>show or hide NUMA nodes that correspond to NVIDIA GPU memory.
-By default they are ignored to avoid interleaved memory being allocated
-on GPU by mistake.
-Setting this environment variable to 1 exposes these NUMA nodes.
-They may be recognized by the <em>GPUMemory</em> subtype.
+By default they are ignored on POWER platforms to avoid interleaved
+memory being allocated on GPU by mistake.
+
+Setting this environment variable to 0 hides the NUMA nodes (default on POWER).
+Setting to 1 exposes these NUMA nodes (default on non-POWER platforms such as NVIDIA Grace Hopper).
+
+These NUMA nodes may be recognized by the <em>GPUMemory</em> subtype.
 They also have a <em>PCIBusID</em> info attribute to identify the
 corresponding GPU.
 </dd>
7 changes: 5 additions & 2 deletions hwloc/topology-linux.c
@@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
-* Copyright © 2009-2023 Inria. All rights reserved.
+* Copyright © 2009-2024 Inria. All rights reserved.
 * Copyright © 2009-2013, 2015, 2020 Université Bordeaux
 * Copyright © 2009-2018 Cisco Systems, Inc. All rights reserved.
 * Copyright © 2015 Intel, Inc. All rights reserved.
@@ -4248,7 +4248,10 @@ look_sysfsnode(struct hwloc_topology *topology,
 struct dirent *dirent;
 int keep;
 env = getenv("HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES");
-keep = env && atoi(env);
+/* NVIDIA GPU NUMA nodes hidden by default on POWER */
+keep = (data->arch != HWLOC_LINUX_ARCH_POWER);
+if (env)
+keep = atoi(env);
 while ((dirent = readdir(dir)) != NULL) {
 char nvgpunumapath[300], line[256];
 int err;
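
The default-plus-override decision in this hunk can be sketched as a standalone function. This is an illustration only, not hwloc code: the enum `sketch_arch` and the function `keep_nvidia_gpu_numa_nodes` are hypothetical names standing in for hwloc's internal `data->arch` field and the inline logic of `look_sysfsnode()`. The result is also normalized to 0 or 1, which the original does not do.

```c
#include <stdlib.h>

/* Hypothetical stand-in for hwloc's internal architecture enum. */
enum sketch_arch {
  SKETCH_ARCH_POWER,
  SKETCH_ARCH_OTHER
};

/*
 * Decide whether NVIDIA GPU NUMA nodes should be kept in the topology.
 * env_value is the value of HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES, or NULL
 * when the variable is unset.
 */
static int keep_nvidia_gpu_numa_nodes(enum sketch_arch arch,
                                      const char *env_value)
{
  /* Default: hide on POWER, keep everywhere else. */
  int keep = (arch != SKETCH_ARCH_POWER);

  /* An explicit setting always overrides the default. Because atoi()
   * returns 0 for non-numeric input, a value such as "abc" or an empty
   * string hides the nodes, just like "0". */
  if (env_value)
    keep = atoi(env_value);

  /* Normalize to 0/1 (an assumption of this sketch). */
  return keep != 0;
}
```

A caller would pass `getenv("HWLOC_KEEP_NVIDIA_GPU_NUMA_NODES")` as `env_value`. Before this commit, the equivalent expression was `env && atoi(env)`, so the nodes were hidden on every platform unless the variable was set to a nonzero value.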