
Revamp the map-by NUMA support #1151

Merged 1 commit into openpmix:master on Nov 13, 2021
Conversation

@rhc54 (Contributor) commented Nov 13, 2021

For each unique topology in the system, compute the max os_index
of the CPU NUMAs by searching for the first instance of an
overlapping NUMA. We assume that any overlap stems from a GPU
NUMA, and that the os_index of such NUMAs starts at 255 and counts
downward. Cache that cutoff and use it when computing number of
NUMA objects and retrieving the Nth NUMA object.

Need to extend this to the distance computations and a few other
areas, but this covers the typical range of use-cases.

Signed-off-by: Ralph Castain <rhc@pmix.org>

@rhc54 (Contributor, Author) commented Nov 13, 2021

@bgoglin Updated per your corrections. As noted, this covers the primary use-cases, but we'll need to do something about the "mindist" mapper and some of the other utilities that look at NUMA domains in support of that mapper. Lower priority, at least so far as I'm concerned.

@rhc54 (Contributor, Author) commented Nov 13, 2021

bot:ibm:xl:retest

@rhc54 rhc54 merged commit e84dc26 into openpmix:master Nov 13, 2021
@rhc54 rhc54 deleted the topic/numa2 branch November 13, 2021 18:18
@bgoglin (Contributor) commented Nov 13, 2021

FYI, this will hopefully work for non-GPU heterogeneous memory too since I expect DRAM to be in the first NUMA nodes (so that the OS allocates there first), before HBM and/or NVDIMMs. At least it will work on KNL and should work on Xeon with DRAM+NVDIMMs (I'll try to test it next week).

@bgoglin (Contributor) commented Nov 15, 2021

I added some printfs in the code to verify that the cutoff is set to 2 when the machine has 2 DRAM nodes and 2 NVDIMM nodes; it looks fine. And --map-by numa seems to make my processes alternate between both sockets as expected.
I couldn't find a command line that would clearly tell me that NUMA nodes P#0 L#0 and P#1 L#2 are used (DRAM) and not P#2 L#1 and P#3 L#3 (NVDIMM). How do you get debug output from this part of PRRTE?
By the way, there's hwloc_bitmap_dup() that could replace the alloc()+copy() in prte_hwloc_base_filter_cpus(). But you could actually just store pointers to the existing bitmaps in your numas array instead of duplicating all of them.

@rhc54 (Contributor, Author) commented Nov 15, 2021

By the way, there's hwloc_bitmap_dup() that could replace the alloc()+copy() in prte_hwloc_base_filter_cpus(). But you could actually just store pointers to the existing bitmaps in your numas array instead of duplicating all of them.

Fair point - I actually went one better and now cache the hwloc_obj_t pointers so looking up the nth NUMA object can be done much faster.

How do you get debug output from this part of PRRTE?

I usually feed in a synthetic topology (so I can test a variety of scenarios), tell PRRTE not to launch the procs, and then have it output the "devel map" showing me precisely where each proc is put. If I want to watch the mapping mechanics, --prtemca rmaps_base_verbose 5 does a pretty good job. So it all looks like:

prterun --prtemca rmaps_base_verbose 5 --map-by numa --display map-devel --do-not-launch --prtemca hwloc_use_topo_file <file.xml> hostname

and you'll get output something like the following:

=================================   JOB MAP   =================================
Data for JOB prterun-Ralphs-iMac-2-68461@1 offset 0 Total slots allocated 24
Mapper requested: NULL  Last mapper: ppr  Mapping policy: BYNUMA:NOOVERSUBSCRIBE  Ranking policy: NUMA
Binding policy: NUMA:IF-SUPPORTED  Cpu set: N/A  PPR: 2:numa  Cpus-per-rank: N/A  Cpu Type: CORE
Num new daemons: 0	New daemon starting vpid INVALID
Num nodes: 1

Data for node: Ralphs-iMac-2	State: 3	Flags: flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:MAPPED:SLOTS_GIVEN
                resolved from Ralphs-iMac-2.local
                resolved from 192.168.0.4
                resolved from 192.168.1.197
                resolved from Ralphs-iMac-2
        Daemon: [prterun-Ralphs-iMac-2-68461@0,0]	Daemon launched: True
            Num slots: 24	Slots in use: 8	Oversubscribed: FALSE
            Num slots allocated: 24	Max slots: 0
            Num procs: 8	Next node_rank: 8
        Data for proc: [prterun-Ralphs-iMac-2-68461@1,0]
                Pid: 0	Local rank: 0	Node rank: 0	App rank: 0
                State: INITIALIZED	App_context: 0
        	Mapped:  package[0][core:0-5]
        	Binding: package[0][core:0-5]
        Data for proc: [prterun-Ralphs-iMac-2-68461@1,1]
                Pid: 0	Local rank: 1	Node rank: 1	App rank: 1
                State: INITIALIZED	App_context: 0
        	Mapped:  package[0][core:0-5]
        	Binding: package[0][core:0-5]
        Data for proc: [prterun-Ralphs-iMac-2-68461@1,2]
                Pid: 0	Local rank: 2	Node rank: 2	App rank: 2
                State: INITIALIZED	App_context: 0
        	Mapped:  package[0][core:6-11]
        	Binding: package[0][core:6-11]
        Data for proc: [prterun-Ralphs-iMac-2-68461@1,3]
                Pid: 0	Local rank: 3	Node rank: 3	App rank: 3
                State: INITIALIZED	App_context: 0
        	Mapped:  package[0][core:6-11]
        	Binding: package[0][core:6-11]
        Data for proc: [prterun-Ralphs-iMac-2-68461@1,4]
                Pid: 0	Local rank: 4	Node rank: 4	App rank: 4
                State: INITIALIZED	App_context: 0
        	Mapped:  package[0][core:12-17]
        	Binding: package[0][core:12-17]
        Data for proc: [prterun-Ralphs-iMac-2-68461@1,5]
                Pid: 0	Local rank: 5	Node rank: 5	App rank: 5
                State: INITIALIZED	App_context: 0
        	Mapped:  package[0][core:12-17]
        	Binding: package[0][core:12-17]
        Data for proc: [prterun-Ralphs-iMac-2-68461@1,6]
                Pid: 0	Local rank: 6	Node rank: 6	App rank: 6
                State: INITIALIZED	App_context: 0
        	Mapped:  package[0][core:18-23]
        	Binding: package[0][core:18-23]
        Data for proc: [prterun-Ralphs-iMac-2-68461@1,7]
                Pid: 0	Local rank: 7	Node rank: 7	App rank: 7
                State: INITIALIZED	App_context: 0
        	Mapped:  package[0][core:18-23]
        	Binding: package[0][core:18-23]

=============================================================
