shell: add gpu affinity support #2406

Merged
merged 7 commits into flux-framework:master from shell-gpu-support on Sep 29, 2019

Conversation

@grondo (Contributor) commented Sep 28, 2019

This PR adds support for parsing gpu resources from R, if assigned, into the shell's internal rcalc class. The gpu resources are then exposed to plugins via flux_shell_get_rank_info(), and a gpubind.c builtin plugin is introduced to read the gpu list and set CUDA_VISIBLE_DEVICES appropriately.

The plugin can be controlled via the gpu-affinity shell option: gpu-affinity="off" disables plugin operation, gpu-affinity="on" is the default and sets CUDA_VISIBLE_DEVICES once per job, and gpu-affinity="per-task" divides the available GPUs evenly among local tasks. (If GPUs do not divide evenly among tasks, the current behavior is undefined 😉. Probably should fix this, but I don't think it could actually happen in the current system.)
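
For what it's worth, per-task assignment is conceptually just an even split of the shell's assigned GPU ids across its local tasks. A rough Python sketch of the idea (the actual plugin is C, and the helper name below is made up for illustration):

def gpus_for_task(gpus, ntasks, taskid):
    """Return the slice of assigned GPU ids for one local task.

    gpus is the list of GPU ids assigned to this shell (from R), ntasks is
    the number of local tasks, and taskid is the local task index. Assumes
    len(gpus) is a multiple of ntasks.
    """
    per_task = len(gpus) // ntasks
    return gpus[taskid * per_task:(taskid + 1) * per_task]

# e.g. 4 GPUs and 2 local tasks: task 0 gets "0,1", task 1 gets "2,3"
for taskid in range(2):
    ids = gpus_for_task([0, 1, 2, 3], 2, taskid)
    print("CUDA_VISIBLE_DEVICES=" + ",".join(str(i) for i in ids))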

I couldn't directly port the cuda_visible_devices.lua plugin from 0.11 because it depended on the lua cpu_set_t bindings, which were removed in the wreck purge of '19.

@grondo (Contributor Author) commented Sep 28, 2019

Rebased and switched from jq to flux mini run --dry-run in tests.

@garlick (Member) commented Sep 28, 2019

I guess we also need a --gpus-per-task=N option for flux mini run huh?

@grondo (Contributor Author) commented Sep 28, 2019

Yeah, there's an issue open on that: #2403.
I was wondering if we could do something more... flexible than --gpus-per-task. But I guess that is what maxi run is for... 😉

@grondo (Contributor Author) commented Sep 29, 2019

> I guess we also need a --gpus-per-task=N option for flux mini run huh?

Ah, I see. Were you hinting here that this PR should add that option now that flux-mini run is merged? That actually makes sense and I could do that if you're not already tackling it.

grondo added commits:

- Support "gpu" resource type in R v1 children list. Save and return this list in struct rcalc_rankinfo. Add a total count of gpu for informational purposes.
- Support GPUs in the shell rcalc test program.
- Add an R input for rcalc tests with GPUs to sanity check rcalc gpu support.
- Add gpus list to flux_shell_get_rank_info() JSON output.
- Add a builtin gpu affinity plugin for the shell which sets CUDA_VISIBLE_DEVICES (optionally per-task), as well as CUDA_DEVICE_ORDER=PCI_BUS_ID. This builtin plugin can be overridden with the name 'gpu-affinity'. The plugin reads the shell option "gpu-affinity: string", and supports values "on", "off", and "per-task". Invalid options are ignored and the default is "on".
- t2604-job-shell-affinity no longer uses jq. Remove the jq test and requirement.
- Ensure basic operation of the shell builtin gpu affinity plugin.
@garlick (Member) commented Sep 29, 2019 via email

@grondo (Contributor Author) commented Sep 29, 2019

I was playing with something like this? 🤷‍♂️

diff --git a/src/cmd/flux-mini.py b/src/cmd/flux-mini.py
index e53539dc8..35ac78de7 100755
--- a/src/cmd/flux-mini.py
+++ b/src/cmd/flux-mini.py
@@ -34,7 +34,7 @@ from datetime import timedelta
 
 
 class JobSpec:
-    def __init__(self, command, num_tasks=1, cores_per_task=1, num_nodes=None):
+    def __init__(self, command, num_tasks=1, cores_per_task=1, gpus_per_task=0, num_nodes=None):
         """
         Constructor builds the minimum legal v1 jobspec.
         Use setters to assign additional properties.
@@ -45,12 +45,16 @@ class JobSpec:
             raise ValueError("task count must be a integer >= 1")
         if not isinstance(cores_per_task, int) or cores_per_task < 1:
             raise ValueError("cores per task must be an integer >= 1")
+        if not isinstance(gpus_per_task, int) or gpus_per_task < 1:
+            raise ValueError("gpus per task must be an integer >= 0")
         if num_nodes is not None:
             if not isinstance(num_nodes, int) or num_nodes < 1:
                 raise ValueError("node count must be an integer >= 1 (if set)")
             if num_nodes > num_tasks:
                 raise ValueError("node count must not be greater than task count")
-        core = self.__create_resource("core", cores_per_task)
+        children = [self.__create_resource("core", cores_per_task)]
+        if gpus_per_task > 0:
+            children.append (self.__create_resource("gpu", gpus_per_task))
         if num_nodes is not None:
             num_slots = int(math.ceil(num_tasks / float(num_nodes)))
             if num_tasks % num_nodes != 0:
@@ -58,11 +62,11 @@ class JobSpec:
                 task_count_dict = {"total": num_tasks}
             else:
                 task_count_dict = {"per_slot": 1}
-            slot = self.__create_slot("task", num_slots, [core])
+            slot = self.__create_slot("task", num_slots, children)
             resource_section = self.__create_resource("node", num_nodes, [slot])
         else:
             task_count_dict = {"per_slot": 1}
-            slot = self.__create_slot("task", num_tasks, [core])
+            slot = self.__create_slot("task", num_tasks, children)
             resource_section = slot
 
         self.jobspec = {
@@ -173,7 +177,6 @@ class JobSpec:
     def dumps(self):
         return json.dumps(self.jobspec)
 
-
 class SubmitCmd:
     """
     SubmitCmd submits a job, displays the jobid on stdout, and returns.
@@ -208,6 +211,14 @@ class SubmitCmd:
             default=1,
             help="Number of cores to allocate per task",
         )
+        parser.add_argument(
+            "-g",
+            "--gpus-per-task",
+            type=int,
+            metavar="N",
+            default=0,
+            help="Number of GPUs to allocate per task",
+        )
         parser.add_argument(
             "-t",
             "--time-limit",
@@ -275,6 +286,7 @@ class SubmitCmd:
             args.command,
             num_tasks=args.ntasks,
             cores_per_task=args.cores_per_task,
+            gpus_per_task=args.gpus_per_task,
             num_nodes=args.nodes,
         )
         jobspec.set_cwd(os.getcwd())

flux mini run -g 1 -c 2 -n 2 produces:

  "resources": [
    {
      "count": 2,
      "with": [
        {
          "count": 2,
          "type": "core"
        },
        {
          "count": 1,
          "type": "gpu"
        }
      ],
      "type": "slot",
      "label": "task"
    }
  ]
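
As an aside, the gpu-affinity shell option itself would presumably be requested through jobspec attributes rather than the resource section. A rough sketch of what a per-task binding request might look like, assuming shell options live under attributes.system.shell.options (that path is my assumption here, not something this PR defines):

import json

# Hypothetical jobspec fragment: the slot/core/gpu resource section from
# above, plus the gpu-affinity shell option. The attributes.system.shell.options
# path is assumed for illustration only.
jobspec = {
    "version": 1,
    "resources": [
        {
            "type": "slot",
            "label": "task",
            "count": 2,
            "with": [
                {"type": "core", "count": 2},
                {"type": "gpu", "count": 1},
            ],
        }
    ],
    "attributes": {
        "system": {"shell": {"options": {"gpu-affinity": "per-task"}}}
    },
}
print(json.dumps(jobspec, indent=2))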

@grondo (Contributor Author) commented Sep 29, 2019

However, since the scheduler in core ignores any resource besides node and core, there isn't a way to do an end-to-end test of GPU support (in flux-core anyway).

@grondo (Contributor Author) commented Sep 29, 2019

I went ahead and pushed the change to add flux-mini -g, --gpus-per-task=N (no tests yet). That would allow this PR to close #2402. If it is wrong it can easily be backed out.

"--gpus-per-task",
type=int,
metavar="N",
default=0,
@garlick (Member) commented on these lines:

Cool! Just reading through this: won't this fail if -g is not specified, since the value will be zero and the < 1 check raises an exception?

You might want to follow the pattern of --nodes / num_nodes where it is None unless the user provided the option.
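
Something along those lines, roughly (a standalone sketch with a made-up helper name, not the code that actually landed):

def validate_gpus_per_task(gpus_per_task):
    # Follow the num_nodes pattern: None means "not requested"; otherwise
    # require an integer >= 1, so an explicit 0 or negative value is rejected.
    if gpus_per_task is None:
        return None
    if not isinstance(gpus_per_task, int) or gpus_per_task < 1:
        raise ValueError("gpus per task must be an integer >= 1 (if set)")
    return gpus_per_task

# Omitting -g is fine, and a valid count passes through unchanged:
assert validate_gpus_per_task(None) is None
assert validate_gpus_per_task(2) == 2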

@grondo (Contributor Author) replied:

Yeah, that sounds better. The assertion was a typo and should allow 0, but use of None feels more right to me. I think I've got it wrong anyway, since the builds are failing in Travis. I won't have more time to work on this until this afternoon, if you'd rather get it done.

@garlick (Member) replied:

Sure, I'll have a go.

Otherwise, are you feeling like this PR is ready?

@garlick (Member) commented Sep 29, 2019

I'll add the new option to the flux-mini(1) man page in #2409 if that works for you.

Edit: 0d1e625 if you prefer to cherry-pick, or redo.

@grondo (Contributor Author) commented Sep 29, 2019

@garlick, feel free to force push this branch without that last commit. Then, I think this is merge ready. (Maybe after another spot check)

@codecov-io

Codecov Report

Merging #2406 into master will increase coverage by <.01%.
The diff coverage is 82.35%.

@@            Coverage Diff             @@
##           master    #2406      +/-   ##
==========================================
+ Coverage   81.12%   81.12%   +<.01%     
==========================================
  Files         224      225       +1     
  Lines       36022    36099      +77     
==========================================
+ Hits        29223    29287      +64     
- Misses       6799     6812      +13
Impacted Files Coverage Δ
src/shell/builtins.c 95.23% <ø> (ø) ⬆️
src/shell/shell.c 85.75% <100%> (-0.24%) ⬇️
src/shell/gpubind.c 79.68% <79.68%> (ø)
src/shell/rcalc.c 87.24% <88.88%> (+0.83%) ⬆️
src/common/libsubprocess/local.c 79.86% <0%> (-0.35%) ⬇️
src/common/libflux/message.c 80.22% <0%> (-0.14%) ⬇️
src/common/libutil/veb.c 98.95% <0%> (+0.52%) ⬆️
src/modules/job-info/guest_watch.c 74.71% <0%> (+0.56%) ⬆️

@garlick (Member) left a comment

LGTM!

@mergify (bot) merged commit 62a92b3 into flux-framework:master on Sep 29, 2019
@grondo deleted the shell-gpu-support branch on September 29, 2019 20:00
@grondo (Contributor Author) commented Sep 29, 2019

Thanks for fixing it up for me! (Off doing kid stuff all day)
