Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shell: support -o gpu-affinity=map:LIST #5356

Merged
merged 9 commits into from Jul 28, 2023

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Jul 28, 2023

This PR adds support for gpu-affinity=map:LIST. This can come in handy for benchmarking, testing, etc. Also, since -o cpu-affinity supports map:, it seemed like GPUs should be able to assigned in this way.

As a side effect of the approach, #5352 was also fixed.

Problem: We'd like to support a map:LIST option for the gpu affinity
shell plugin that matches the cpu affinity support, but the functions
to create, parse, and destroy lists of hwloc_cpuset_t objects are
internal to the shell CPU affinity plugin.

Export the cpuset array functions from affinity.c for use by other
builtin shell plugins.
Problem: The gpubind plugin "per-task" works by allocating gpu ids
from a shared gpus idset within the task.init callback, but this is
different from how the cpu affinity plugins and thus code can't be
easily shared.

Rewrite the gpubind plugin to use an array of hwloc_cpuset_t objects
to maintain which gpus should be assinged to each task. To ease memory
management, add a context object that contains relevant data for the
plugin, and create this object at initialization.

The gpubind plugin now assigns GPUs to tasks during initialization,
as does the affinity plugin. This allows the cpuset array code to be
reused, and for the gpubind plugin to support a "map:LIST" option in
the future.
Problem: The usefulness of the `-o cpu-affinity=map:LIST` option is
reduced because the assigned cores must be contained within the cpu
set assigned to the job. However, if a user is using the `map:LIST`
option then they are presumably agreeing to take complete control of
core assignments, and the option should not limit them to only the
assigned cores.

Drop the check in the affinity plugin that cores are contained within
the job cpuset. Don't even check if the cores are valid (this would
presumably fail later when the affinity is actually applied.)

Fixes flux-framework#5352
Problem: Conditional registration of the task.init callback may cause
code duplication if the callback is required in other situations.

Always register the task.init callback. Do nothing if ctx->gpusets
has not been assigned. This means there are no per-task GPUs.
Problem: The gpubind plugin returns early if no GPUs were allocated
to the job, but some future gpu-affinity options may want to override
the default binding in this case.

Let the plugin fall through to the if/else block even if no GPUs are
assigned to the job. Change the final `else` to `else if (ngpus >
0)` though, so that the plugin still does nothing (besides setting
CUDA_VISIBLE_DEVICES=-1) if ngpus == 0.
Problem: The gpubind shell plugin doesn't support explicit mapping
of GPUs to tasks.

Support a `-o gpu-affinity=map:LIST` which works similar to
the cpu-affinity of the same name. This option allows explicity
specification of GPUs to tasks without regard for the actual GPU ids
assigned to a job. It is mainly meant for testing, benchmarks, or when
default GPU assignment is not working for a particular situation.

Fixes flux-framework#5350
Problem: No tests in the testsuite ensure proper operation of the
job shell gpu-affinity=map: option.

Add a simple test that ensures the gpu-affinity `map:` option is working
for a basic scenario.
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good idea. I didn't find anything to comment on except the man page entry.

@@ -276,7 +276,10 @@ options are supported by the builtin plugins of ``flux-shell``:
sets ``CUDA_VISIBLE_DEVICES`` to the GPU IDs allocated to the job.
*OPT* may be set to ``off`` to disable the plugin, or ``per-task``
to divide allocated GPUs among tasks launched by the shell (sets a
different GPU ID or IDs for each launched task)
different GPU ID or IDs for each launched task) If *OPT* starts with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing period?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this refers to the cpu-affinity entry in the man page, but over there, the hwloc_cpuset_t(3) reference is a 404 in the rendered page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that manpage exists on some systems but not on linux,die.net. I guess I'll have to reference the hwloc documentation directly (which won't be too nice in a manpage but oh well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I pushed an extra commit to fix the missing hwloc_cpuset_t(3) link. Used hwlocality_bitmask(3) instead, which has the actual *sscanf(3) functions used in the shell plugin. Since that manpage also does not exist on linux.die.net, an explicit link directly to the latest hwloc documentation is used.

Problem: The `map:` argument of the `gpu-affinity` shell option is not
documented.

Add a short description of this option to flux-shell(1).
@codecov
Copy link

codecov bot commented Jul 28, 2023

Codecov Report

Merging #5356 (8039af3) into master (87bb7ee) will decrease coverage by 0.02%.
The diff coverage is 70.51%.

❗ Current head 8039af3 differs from pull request most recent head 3069266. Consider uploading reports for the commit 3069266 to get more accurate results

@@            Coverage Diff             @@
##           master    #5356      +/-   ##
==========================================
- Coverage   83.78%   83.77%   -0.02%     
==========================================
  Files         463      463              
  Lines       77559    77601      +42     
==========================================
+ Hits        64986    65008      +22     
- Misses      12573    12593      +20     
Files Changed Coverage Δ
src/shell/gpubind.c 74.76% <68.91%> (-11.91%) ⬇️
src/shell/affinity.c 79.27% <100.00%> (-0.53%) ⬇️

... and 12 files with indirect coverage changes

Problem: The documentation of the cpu-affinity shell option in the
flux-shell(1) manpage references the `hwloc_cpuset_t(3)` manpage,
but the link does not exist.

Reference the `hwlocality_bitmap(3)` manpage instead, and use a
direct link to the hwloc docs v2.9.0 since hwlocality_bitmap(3)
doesn't exist on the default manpage site (linux.die.net).

Update the spelling dictionary.
@grondo
Copy link
Contributor Author

grondo commented Jul 28, 2023

Setting MWP on this one.

@mergify mergify bot merged commit 55443a6 into flux-framework:master Jul 28, 2023
31 of 32 checks passed
@grondo grondo deleted the issue#5350 branch July 28, 2023 23:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants