-
Notifications
You must be signed in to change notification settings - Fork 168
/
hwloc-calc.1in
509 lines (434 loc) · 18.1 KB
/
hwloc-calc.1in
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
.\" -*- nroff -*-
.\" Copyright © 2010-2023 Inria. All rights reserved.
.\" Copyright © 2009-2020 Cisco Systems, Inc. All rights reserved.
.\" See COPYING in top-level directory.
.TH HWLOC-CALC "1" "%HWLOC_DATE%" "%PACKAGE_VERSION%" "%PACKAGE_NAME%"
.SH NAME
hwloc-calc \- Operate on cpu mask strings and objects
.
.\" **************************
.\" Synopsis Section
.\" **************************
.SH SYNOPSIS
.
.B hwloc-calc
[\fItopology options\fR] [\fIoptions\fR] \fI<location1> [<location2> [...] ]
.
.PP
Note that hwloc(7) provides a detailed explanation of the hwloc system
and of valid <location> formats;
it should be read before reading this man page.
.
.\" **************************
.\" Options Section
.\" **************************
.SH TOPOLOGY OPTIONS
.
All topology options must be given before all other options.
.
.TP 10
\fB\-\-no\-smt\fR, \fB\-\-no\-smt=<N>\fR
Only keep the first PU per core in the input locations.
If \fI<N>\fR is specified, keep the <N>-th instead, if any.
PUs are ordered by physical index during this filtering.
Note that this option is applied after searching locations.
Hence \fB\-\-no\-smt pu:2-5\fR will first select the PUs #2
to #5 in the machine before keeping one of them per core.
To rather get PUs #2 to #5 after filtering one per core,
you should combine invocations:
hwloc-calc --restrict $(hwloc-calc --no-smt all) pu:2-5
.TP
\fB\-\-cpukind\fR <n>, \fB\-\-cpukind\fR <infoname>=<infovalue>
Only keep PUs whose CPU kind match.
Either a single CPU kind is specified as an index,
or the info attribute name-value will select matching kinds.
When specified by index, it corresponds to hwloc ranking of CPU kinds
which returns energy-efficient cores first, and high-performance
power-hungry cores last.
The full list of CPU kinds may be seen with \fIlstopo --cpukinds\fR.
Note that this option is applied after searching locations.
Hence \fB\-\-cpukind 0 core:1\fR will return the second core of
the machine if it is of kind 0, and nothing otherwise.
To rather get the second core among those of kind 0, you should
combine invocations:
hwloc-calc --restrict $(hwloc-calc --cpukind 0 all) core:1
.TP
\fB\-\-restrict\fR <cpuset>
Restrict the topology to the given cpuset.
This removes some PUs and their now-child-less parents.
This is useful when combining invocations to filter some objects
before selecting among them.
Beware that restricting the PUs in a topology may change the
logical indexes of many objects, including NUMA nodes.
.TP
\fB\-\-restrict\fR nodeset=<nodeset>
Restrict the topology to the given nodeset
(unless \fB\-\-restrict\-flags\fR specifies something different).
This removes some NUMA nodes and their now-child-less parents.
Beware that restricting the NUMA nodes in a topology may change the
logical indexes of many objects, including PUs.
.TP
\fB\-\-restrict\-flags\fR <flags>
Enforce flags when restricting the topology.
Flags may be given as numeric values or as a comma-separated list of flag names
that are passed to \fIhwloc_topology_restrict()\fR.
Those names may be substrings of actual flag names as long as a single one matches,
for instance \fBbynodeset,memless\fR.
The default is \fB0\fR (or \fBnone\fR).
.TP
\fB\-\-disallowed\fR
Include objects disallowed by administrative limitations.
.TP
\fB\-i\fR <path>, \fB\-\-input\fR <path>
Read the topology from <path> instead of discovering the topology of the local machine.
If <path> is a file,
it may be a XML file exported by a previous hwloc program.
If <path> is "\-", the standard input may be used as a XML file.
On Linux, <path> may be a directory containing the topology files
gathered from another machine topology with hwloc-gather-topology.
On x86, <path> may be a directory containing a cpuid dump gathered
with hwloc-gather-cpuid.
When the archivemount program is available, <path> may also be a tarball
containing such Linux or x86 topology files.
.TP
\fB\-i\fR <specification>, \fB\-\-input\fR <specification>
Simulate a fake hierarchy (instead of discovering the topology on the
local machine). If <specification> is "node:2 pu:3", the topology will
contain two NUMA nodes with 3 processing units in each of them.
The <specification> string must end with a number of PUs.
.TP
\fB\-\-if\fR <format>, \fB\-\-input\-format\fR <format>
Enforce the input in the given format, among \fBxml\fR, \fBfsroot\fR,
\fBcpuid\fR and \fBsynthetic\fR.
.
.SH OPTIONS
.
All these options must be given after all topology options above.
.
.TP 10
\fB\-p\fR \fB\-\-physical\fR
Use OS/physical indexes instead of logical indexes for both input and output.
.TP
\fB\-l\fR \fB\-\-logical\fR
Use logical indexes instead of physical/OS indexes for both input and output (default).
.TP
\fB\-\-pi\fR \fB\-\-physical\-input\fR
Use OS/physical indexes instead of logical indexes for input.
.TP
\fB\-\-li\fR \fB\-\-logical\-input\fR
Use logical indexes instead of physical/OS indexes for input (default).
.TP
\fB\-\-po\fR \fB\-\-physical\-output\fR
Use OS/physical indexes instead of logical indexes for output.
.TP
\fB\-\-lo\fR \fB\-\-logical\-output\fR
Use logical indexes instead of physical/OS indexes for output (default, except for cpusets which are always physical).
.TP
\fB\-n\fR \fB\-\-nodeset\fR
Interpret both input and output sets as nodesets instead of CPU sets.
See \fB\-\-nodeset\-output\fR and \fB\-\-nodeset\-input\fR below for details.
.TP
\fB\-\-no\fR \fB\-\-nodeset\-output\fR
Report nodesets instead of CPU sets.
This output is more precise than the default CPU set output when memory
locality matters because it properly describes CPU-less NUMA nodes,
as well as NUMA-nodes that are local to multiple CPUs.
.TP
\fB\-\-ni\fR \fB\-\-nodeset\-input\fR
Interpret input sets as nodesets instead of CPU sets.
.TP
\fB\-\-oo\fR \fB\-\-object\-output\fR
When reporting object indexes (e.g. with \fB\-I\fR or \fB\-\-local\-memory\fR),
this option prefixes these indexes with types (e.g. \fICore:0\fR instead of \fI0\fR).
.TP
\fB\-N \-\-number\-of <type|depth>\fR
Report the number of objects of the given type or depth that intersect the CPU set.
This is convenient for finding how many cores, NUMA nodes or PUs are available
in a machine.
When combined with \fB\-\-nodeset\fR or \fB\-\-nodeset-output\fR,
the nodeset is considered instead of the CPU set for finding matching objects.
This is useful when reporting the output as a number or set of NUMA nodes.
If an OS device subtype such as \fIgpu\fR is given instead of \fIosdev\fR,
only the os devices of that subtype will be counted.
.TP
\fB\-I \-\-intersect <type|depth>\fR
Find the list of objects of the given type or depth that intersect the CPU set and
report the comma-separated list of their indexes instead of the cpu mask string.
This may be used for determining the list of objects above or below the input
objects.
When combined with \fB\-\-physical\fR, the list is convenient to pass to external
tools such as taskset or numactl \fB\-\-physcpubind\fR or \fB\-\-membind\fR.
This is different from \-\-largest since the latter requires that all reported
objects are strictly included inside the input objects.
When combined with \fB\-\-nodeset\fR or \fB\-\-nodeset-output\fR,
the nodeset is considered instead of the CPU set for finding matching objects.
This is useful when reporting the output as a number or set of NUMA nodes.
If an OS device subtype such as \fIgpu\fR is given instead of \fIosdev\fR,
only the os devices of that subtype will be returned.
If combined with \fB\-\-object\-output\fR, object indexes are prefixed
with types (e.g. \fICore:0\fR instead of \fI0\fR).
.TP
\fB\-H \-\-hierarchical <type1>.<type2>...\fR
Find the list of objects of type <type2> that intersect the CPU set and
report the space-separated list of their hierarchical indexes with respect
to <type1>, <type2>, etc.
For instance, if \fIpackage.core\fR is given, the output would be
\fIPackage:1.Core:2 Package:2.Core:3\fR if the input contains the third
core of the second package and the fourth core of the third package.
Only normal CPU-side object types should be used.
NUMA nodes may be used but they may cause redundancy in the output
on heterogeneous memory platform. For instance, on a platform with both
DRAM and HBM memory on a package, the first core will be considered both
as first core of first NUMA node (DRAM) and
as first core of second NUMA node (HBM).
.TP
\fB\-\-largest\fR
Report (in a human readable format) the list of largest objects which exactly
include all input objects (by looking at their CPU sets).
None of these output objects intersect each other, and the sum of them is
exactly equivalent to the input. No largest object is included in the input
This is different from \-\-intersect where reported objects may not be
strictly included in the input.
.TP
\fB\-\-local\-memory\fR
Report the list of NUMA nodes that are local to the input objects.
This option is similar to \fB\-I numa\fR but the way nodes are selected
is different:
The selection performed by \fB\-\-local\-memory\fR may be precisely
configured with \fB\-\-local\-memory\-flags\fR,
while \fB\-I numa\fR just selects all nodes that are somehow local to
any of the input objects.
If combined with \fB\-\-object\-output\fR, object indexes are prefixed
with types (e.g. \fINUMANode:0\fR instead of \fI0\fR).
.TP
\fB\-\-local\-memory\-flags\fR
Change the flags used to select local NUMA nodes.
Flags may be given as numeric values or as a comma-separated list of flag names
that are passed to \fIhwloc_get_local_numanode_objs()\fR.
Those names may be substrings of actual flag names as long as a single one matches.
The default is \fB3\fR (or \fBsmaller,larger\fR)
which means NUMA nodes are displayed
if their locality either contains or is contained
in the locality of the given object.
This option enables \fB\-\-local\-memory\fR.
.TP
\fB\-\-best\-memattr\fR <name>
Enable the listing of local memory nodes with \fB\-\-local\-memory\fR,
but only display the local node that has the best value for the memory
attribute given by \fI<name>\fR (or as an index).
If the memory attribute values depend on the initiator, the hwloc-calc
input objects are used as the initiator.
Standard attribute names are \fICapacity\fR, \fILocality\fR,
\fIBandwidth\fR, and \fILatency\fR.
All existing attributes in the current topology may be listed with
$ lstopo --memattrs
If combined with \fB\-\-object\-output\fR, the object index is prefixed
with its type (e.g. \fINUMANode:0\fR instead of \fI0\fR).
.TP
\fB\-\-sep <sep>\fR
Change the field separator in the output.
By default, a space is used to separate output objects
(for instance when \fB\-\-hierarchical\fR or \fB\-\-largest\fR is given)
while a comma is used to separate indexes
(for instance when \fB\-\-intersect\fR is given).
.TP
\fB\-\-single\fR
Singlify the output to a single CPU.
.TP
\fB\-\-taskset\fR
Display CPU set strings in the format recognized by the taskset command-line
program instead of hwloc-specific CPU set string format.
This option has no impact on the format of input CPU set strings,
both formats are always accepted.
.TP
\fB\-q\fR \fB\-\-quiet\fR
Hide non-fatal error messages.
It mostly includes locations pointing to non-existing objects.
.TP
\fB\-v\fR \fB\-\-verbose\fR
Verbose output.
.TP
\fB\-\-version\fR
Report version and exit.
.TP
\fB\-h\fR \fB\-\-help\fR
Display help message and exit.
.
.\" **************************
.\" Description Section
.\" **************************
.SH DESCRIPTION
.
hwloc-calc generates and manipulates CPU mask strings or objects.
Both input and output may be either objects (with physical or logical
indexes), CPU lists (with physical or logical indexes), or CPU mask strings
(always physically indexed).
Input location specification is described in hwloc(7).
.
.PP
If objects or CPU mask strings are given on the command-line,
they are combined and a single output is printed.
If no object or CPU mask strings are given on the command-line,
the program will read the standard input.
It will combine multiple objects or CPU mask strings that are
given on the same line of the standard input line with spaces
as separators.
Different input lines will be processed separately.
.
.PP
Command-line arguments and options are processed in order.
First topology configuration options should be given.
Then, for instance, changing the type of input indexes
with \fB\-\-li\fR or changing the input topology with \fB\-i\fR
only affects the processing the following arguments.
.
.PP
.B NOTE:
It is highly recommended that you read the hwloc(7) overview page
before reading this man page. Most of the concepts described in
hwloc(7) directly apply to the hwloc-calc utility.
.
.
.\" **************************
.\" Examples Section
.\" **************************
.SH EXAMPLES
.PP
hwloc-calc's operation is best described through several examples.
.
.PP
To display the (physical) CPU mask corresponding to the second package:
$ hwloc-calc package:1
0x000000f0
To display the (physical) CPU mask corresponding to the third pacakge, excluding
its even numbered logical processors:
$ hwloc-calc package:2 ~PU:even
0x00000c00
To convert a cpu mask to human-readable output, the -H option can be
used to emit a space-delimited list of locations:
$ echo 0x000000f0 | hwloc-calc -H package.core
Package:1.Core1 Package:1.Core:1 Package:1.Core:2 Package:1.Core:3
To use some other character (e.g., a comma) instead of spaces in
output, use the --sep option:
$ echo 0x000000f0 | hwloc-calc -H package.core --sep ,
Package:1.Core1,Package:1.Core:1,Package:1.Core:2,Package:1.Core:3
To combine two (physical) CPU masks:
$ hwloc-calc 0x0000ffff 0xff000000
0xff00ffff
To display the list of logical numbers of processors included in the second
package:
$ hwloc-calc --intersect PU package:1
4,5,6,7
To bind GNU OpenMP threads logically over the whole machine, we need to use
physical number output instead:
$ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU all`
$ echo $GOMP_CPU_AFFINITY
0,4,1,5,2,6,3,7
To display the list of NUMA nodes, by physical indexes, that intersect a given (physical) CPU mask:
$ hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
0,2
To find how many cores are in the second CPU kind
(those cores are likely higher-performance and more power-hungry than cores of the first kind):
$ hwloc-calc --cpukind 1 -N core all
4
To display the list of NUMA nodes, by physical indexes,
whose locality is exactly equal to a Package:
$ hwloc-calc --local-memory-flags 0 pack:1
4,7
To display the best-capacity NUMA node, by physical indexe,
whose locality is exactly equal to a Package:
$ hwloc-calc --local-memory-flags 0 --best-memattr capacity pack:1
4
Converting object logical indexes (default) from/to physical/OS indexes
may be performed with \fB--intersect\fR combined with either \fB--physical-output\fR
(logical to physical conversion) or \fB--physical-input\fR (physical to logical):
$ hwloc-calc --physical-output PU:2 --intersect PU
3
$ hwloc-calc --physical-input PU:3 --intersect PU
2
One should add \fB--nodeset\fR when converting indexes of memory objects
to make sure a single NUMA node index is returned on platforms
with heterogeneous memory:
$ hwloc-calc --nodeset --physical-output node:2 --intersect node
3
$ hwloc-calc --nodeset --physical-input node:3 --intersect node
2
To display the set of CPUs near network interface eth0:
$ hwloc-calc os=eth0
0x00005555
To display the indexes of packages near PCI device whose bus ID is 0000:01:02.0:
$ hwloc-calc pci=0000:01:02.0 --intersect Package
1
To display the list of per-package cores that intersect the input:
$ hwloc-calc 0x00003c00 --hierarchical package.core
Package:2.Core:1 Package:3.Core:0
To display the (physical) CPU mask of the entire topology except the third package:
$ hwloc-calc all ~package:3
0x0000f0ff
To combine both physical and logical indexes as input:
$ hwloc-calc PU:2 --physical-input PU:3
0x0000000c
To synthetize a set of cores into largest objects on a 2-node 2-package 2-core machine:
$ hwloc-calc core:0 --largest
Core:0
$ hwloc-calc core:0-1 --largest
Package:0
$ hwloc-calc core:4-7 --largest
NUMANode:1
$ hwloc-calc core:2-6 --largest
Package:1 Package:2 Core:6
$ hwloc-calc pack:2 --largest
Package:2
$ hwloc-calc package:2-3 --largest
NUMANode:1
To get the set of first threads of all cores:
$ hwloc-calc core:all.pu:0
$ hwloc-calc --no-smt all
This can also be very useful in order to make GNU OpenMP use exactly one thread
per core, and in logical core order:
$ export OMP_NUM_THREADS=`hwloc-calc --number-of core all`
$ echo $OMP_NUM_THREADS
4
$ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU --no-smt all`
$ echo $GOMP_CPU_AFFINITY
0,2,1,3
To export bitmask in a format that is acceptable by the resctrl Linux subsystem
(for configuring cache partitioning, etc), apply a sed regexp to the output of hwloc-calc:
$ hwloc-calc pack:all.core:7-9.pu:0
0x00000380,,0x00000380 <this format cannot be given to resctrl>
$ hwloc-calc pack:all.core:7-9.pu:0 | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
00000380,0,00000380
# echo 00000380,0,00000380 > /sys/fs/resctrl/test/cpus
# cat /sys/fs/resctrl/test/cpus
00000000,00000380,00000000,00000380 <the modified bitmask was corrected parsed by resctrl>
OS devices may also be filtered by subtype. In this example, there are
8 OS devices in the system, 4 of them are near NUMA node #1, and only
2 of these are CoProcessors:
$ utils/hwloc/hwloc-calc -I osdev all
0,1,2,3,4,5,6,7,8
$ utils/hwloc/hwloc-calc -I osdev node:1
5,6,7,8
$ utils/hwloc/hwloc-calc -I coproc node:1
7,8
.
.\" **************************
.\" Return value section
.\" **************************
.SH RETURN VALUE
Upon successful execution, hwloc-calc displays the (physical) CPU mask string,
(physical or logical) object list, or (physical or logical) object number list.
The return value is 0.
.
.
.PP
hwloc-calc will return nonzero if any kind of error occurs, such as
(but not limited to): failure to parse the command line.
.
.\" **************************
.\" See also section
.\" **************************
.SH SEE ALSO
.
.ft R
hwloc(7), lstopo(1), hwloc-info(1)
.sp