New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shell: support -o gpu-affinity=map:LIST
#5356
Conversation
Problem: We'd like to support a map:LIST option for the gpu affinity shell plugin that matches the cpu affinity support, but the functions to create, parse, and destroy lists of hwloc_cpuset_t objects are internal to the shell CPU affinity plugin. Export the cpuset array functions from affinity.c for use by other builtin shell plugins.
Problem: The gpubind plugin "per-task" works by allocating gpu ids from a shared gpus idset within the task.init callback, but this is different from how the cpu affinity plugins and thus code can't be easily shared. Rewrite the gpubind plugin to use an array of hwloc_cpuset_t objects to maintain which gpus should be assinged to each task. To ease memory management, add a context object that contains relevant data for the plugin, and create this object at initialization. The gpubind plugin now assigns GPUs to tasks during initialization, as does the affinity plugin. This allows the cpuset array code to be reused, and for the gpubind plugin to support a "map:LIST" option in the future.
Problem: The usefulness of the `-o cpu-affinity=map:LIST` option is reduced because the assigned cores must be contained within the cpu set assigned to the job. However, if a user is using the `map:LIST` option then they are presumably agreeing to take complete control of core assignments, and the option should not limit them to only the assigned cores. Drop the check in the affinity plugin that cores are contained within the job cpuset. Don't even check if the cores are valid (this would presumably fail later when the affinity is actually applied.) Fixes flux-framework#5352
Problem: Conditional registration of the task.init callback may cause code duplication if the callback is required in other situations. Always register the task.init callback. Do nothing if ctx->gpusets has not been assigned. This means there are no per-task GPUs.
Problem: The gpubind plugin returns early if no GPUs were allocated to the job, but some future gpu-affinity options may want to override the default binding in this case. Let the plugin fall through to the if/else block even if no GPUs are assigned to the job. Change the final `else` to `else if (ngpus > 0)` though, so that the plugin still does nothing (besides setting CUDA_VISIBLE_DEVICES=-1) if ngpus == 0.
Problem: The gpubind shell plugin doesn't support explicit mapping of GPUs to tasks. Support a `-o gpu-affinity=map:LIST` which works similar to the cpu-affinity of the same name. This option allows explicity specification of GPUs to tasks without regard for the actual GPU ids assigned to a job. It is mainly meant for testing, benchmarks, or when default GPU assignment is not working for a particular situation. Fixes flux-framework#5350
Problem: No tests in the testsuite ensure proper operation of the job shell gpu-affinity=map: option. Add a simple test that ensures the gpu-affinity `map:` option is working for a basic scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good idea. I didn't find anything to comment on except the man page entry.
doc/man1/flux-shell.rst
Outdated
@@ -276,7 +276,10 @@ options are supported by the builtin plugins of ``flux-shell``: | |||
sets ``CUDA_VISIBLE_DEVICES`` to the GPU IDs allocated to the job. | |||
*OPT* may be set to ``off`` to disable the plugin, or ``per-task`` | |||
to divide allocated GPUs among tasks launched by the shell (sets a | |||
different GPU ID or IDs for each launched task) | |||
different GPU ID or IDs for each launched task) If *OPT* starts with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing period?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this refers to the cpu-affinity entry in the man page, but over there, the hwloc_cpuset_t(3) reference is a 404 in the rendered page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that manpage exists on some systems but not on linux,die.net. I guess I'll have to reference the hwloc documentation directly (which won't be too nice in a manpage but oh well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I pushed an extra commit to fix the missing hwloc_cpuset_t(3)
link. Used hwlocality_bitmask(3)
instead, which has the actual *sscanf(3)
functions used in the shell plugin. Since that manpage also does not exist on linux.die.net, an explicit link directly to the latest hwloc documentation is used.
Problem: The `map:` argument of the `gpu-affinity` shell option is not documented. Add a short description of this option to flux-shell(1).
Codecov Report
@@ Coverage Diff @@
## master #5356 +/- ##
==========================================
- Coverage 83.78% 83.77% -0.02%
==========================================
Files 463 463
Lines 77559 77601 +42
==========================================
+ Hits 64986 65008 +22
- Misses 12573 12593 +20
|
Problem: The documentation of the cpu-affinity shell option in the flux-shell(1) manpage references the `hwloc_cpuset_t(3)` manpage, but the link does not exist. Reference the `hwlocality_bitmap(3)` manpage instead, and use a direct link to the hwloc docs v2.9.0 since hwlocality_bitmap(3) doesn't exist on the default manpage site (linux.die.net). Update the spelling dictionary.
Setting MWP on this one. |
This PR adds support for
gpu-affinity=map:LIST
. This can come in handy for benchmarking, testing, etc. Also, since-o cpu-affinity
supportsmap:
, it seemed like GPUs should be able to assigned in this way.As a side effect of the approach, #5352 was also fixed.