
shell: add gpu affinity support #2406

Merged
merged 7 commits into flux-framework:master from grondo:shell-gpu-support Sep 29, 2019

Conversation

@grondo (Contributor) commented Sep 28, 2019

This PR adds support for parsing GPU resources from R (if assigned) into the shell's internal rcalc class. The GPU resources are then exposed to plugins via flux_shell_get_rank_info(), and a gpubind.c builtin plugin is introduced that reads the gpu list and sets CUDA_VISIBLE_DEVICES appropriately.

The plugin can be controlled via the gpu-affinity shell option: gpu-affinity="off" disables plugin operation; gpu-affinity="on" is the default and sets CUDA_VISIBLE_DEVICES once per job; gpu-affinity="per-task" divides available GPUs evenly among local tasks. (If GPUs do not divide evenly among tasks, the current behavior is undefined 😉. We should probably fix this, but I don't think it can actually happen in the current system.)
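The per-task division described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual gpubind.c logic; the function name `gpus_for_task` and the even-split assumption are mine:

```python
def gpus_for_task(gpus, ntasks, taskid):
    """Split a node-local GPU id list evenly among local tasks.

    Illustrative sketch only: assumes len(gpus) divides evenly by
    ntasks, mirroring the "undefined otherwise" caveat above.
    """
    per_task = len(gpus) // ntasks
    start = taskid * per_task
    return gpus[start:start + per_task]

# With 4 GPUs and 2 local tasks, each task would see a disjoint
# CUDA_VISIBLE_DEVICES value:
gpus = ["0", "1", "2", "3"]
env = {i: ",".join(gpus_for_task(gpus, 2, i)) for i in range(2)}
# task 0 -> "0,1", task 1 -> "2,3"
```

In "on" mode, by contrast, the whole list would be joined once and exported identically to every task in the job.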

I couldn't directly port the cuda_visible_devices.lua plugin from 0.11 because it depended on the Lua cpu_set_t bindings, which were removed in the wreck purge of '19.

@grondo grondo force-pushed the grondo:shell-gpu-support branch from 6632df2 to 9a514e8 Sep 28, 2019
@grondo (Contributor, Author) commented Sep 28, 2019

Rebased and switched from jq to flux mini run --dry-run in tests.

@garlick (Member) commented Sep 28, 2019

I guess we also need a --gpus-per-task=N option for flux mini run huh?

@grondo (Contributor, Author) commented Sep 28, 2019

Yeah, there's an issue open on that: #2403.
I was wondering if we could do something more flexible than --gpus-per-task. But I guess that is what maxi run is for... 😉

@grondo (Contributor, Author) commented Sep 29, 2019

> I guess we also need a --gpus-per-task=N option for flux mini run huh?

Ah, I see. Were you hinting here that this PR should add that option now that flux-mini run is merged? That actually makes sense, and I could do that if you're not already tackling it.

grondo added 7 commits Sep 27, 2019

- Support "gpu" resource type in R v1 children list. Save and return this list in struct rcalc_rankinfo. Add a total count of gpus for informational purposes.
- Support GPUs in the shell rcalc test program.
- Add an R input for rcalc tests with GPUs to sanity-check rcalc gpu support.
- Add gpus list to flux_shell_get_rank_info() JSON output.
- Add a builtin gpu affinity plugin for the shell which sets CUDA_VISIBLE_DEVICES (optionally per-task), as well as CUDA_DEVICE_ORDER=PCI_BUS_ID. This builtin plugin can be overridden with the name 'gpu-affinity'. The plugin reads the shell option "gpu-affinity: string", and supports values "on", "off", and "per-task". Invalid options are ignored and the default is "on".
- Ensure basic operation of the shell builtin gpu affinity plugin.
- t2604-job-shell-affinity no longer uses jq. Remove the jq test and requirement.
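The option handling described in the plugin commit message (valid values "on", "off", and "per-task"; invalid values ignored in favor of the default) amounts to a fallback like the sketch below. This is a hypothetical Python helper for illustration, not the plugin's actual C code:

```python
def gpu_affinity_mode(opt):
    """Return the effective gpu-affinity mode for a shell option value.

    Per the commit message above: "on", "off", and "per-task" are
    accepted; anything else is ignored and the default "on" applies.
    """
    return opt if opt in ("on", "off", "per-task") else "on"
```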
@garlick (Member) commented Sep 29, 2019 (comment minimized)

@grondo (Contributor, Author) commented Sep 29, 2019

I was playing with something like this? 🤷‍♂️

diff --git a/src/cmd/flux-mini.py b/src/cmd/flux-mini.py
index e53539dc8..35ac78de7 100755
--- a/src/cmd/flux-mini.py
+++ b/src/cmd/flux-mini.py
@@ -34,7 +34,7 @@ from datetime import timedelta
 
 
 class JobSpec:
-    def __init__(self, command, num_tasks=1, cores_per_task=1, num_nodes=None):
+    def __init__(self, command, num_tasks=1, cores_per_task=1, gpus_per_task=0, num_nodes=None):
         """
         Constructor builds the minimum legal v1 jobspec.
         Use setters to assign additional properties.
@@ -45,12 +45,16 @@ class JobSpec:
             raise ValueError("task count must be a integer >= 1")
         if not isinstance(cores_per_task, int) or cores_per_task < 1:
             raise ValueError("cores per task must be an integer >= 1")
+        if not isinstance(gpus_per_task, int) or gpus_per_task < 1:
+            raise ValueError("gpus per task must be an integer >= 0")
         if num_nodes is not None:
             if not isinstance(num_nodes, int) or num_nodes < 1:
                 raise ValueError("node count must be an integer >= 1 (if set)")
             if num_nodes > num_tasks:
                 raise ValueError("node count must not be greater than task count")
-        core = self.__create_resource("core", cores_per_task)
+        children = [self.__create_resource("core", cores_per_task)]
+        if gpus_per_task > 0:
+            children.append (self.__create_resource("gpu", gpus_per_task))
         if num_nodes is not None:
             num_slots = int(math.ceil(num_tasks / float(num_nodes)))
             if num_tasks % num_nodes != 0:
@@ -58,11 +62,11 @@ class JobSpec:
                 task_count_dict = {"total": num_tasks}
             else:
                 task_count_dict = {"per_slot": 1}
-            slot = self.__create_slot("task", num_slots, [core])
+            slot = self.__create_slot("task", num_slots, children)
             resource_section = self.__create_resource("node", num_nodes, [slot])
         else:
             task_count_dict = {"per_slot": 1}
-            slot = self.__create_slot("task", num_tasks, [core])
+            slot = self.__create_slot("task", num_tasks, children)
             resource_section = slot
 
         self.jobspec = {
@@ -173,7 +177,6 @@ class JobSpec:
     def dumps(self):
         return json.dumps(self.jobspec)
 
-
 class SubmitCmd:
     """
     SubmitCmd submits a job, displays the jobid on stdout, and returns.
@@ -208,6 +211,14 @@ class SubmitCmd:
             default=1,
             help="Number of cores to allocate per task",
         )
+        parser.add_argument(
+            "-g",
+            "--gpus-per-task",
+            type=int,
+            metavar="N",
+            default=0,
+            help="Number of GPUs to allocate per task",
+        )
         parser.add_argument(
             "-t",
             "--time-limit",
@@ -275,6 +286,7 @@ class SubmitCmd:
             args.command,
             num_tasks=args.ntasks,
             cores_per_task=args.cores_per_task,
+            gpus_per_task=args.gpus_per_task,
             num_nodes=args.nodes,
         )
         jobspec.set_cwd(os.getcwd())

flux mini run -g 1 -c 2 -n 2 produces:

  "resources": [
    {
      "count": 2,
      "with": [
        {
          "count": 2,
          "type": "core"
        },
        {
          "count": 1,
          "type": "gpu"
        }
      ],
      "type": "slot",
      "label": "task"
    }
  ]
@grondo (Contributor, Author) commented Sep 29, 2019

However, since the scheduler in core ignores any resource besides node and core, there isn't a way to do an end-to-end test of GPU support (in flux-core, anyway).

@grondo grondo force-pushed the grondo:shell-gpu-support branch from 9a514e8 to 8fbae10 Sep 29, 2019
@grondo (Contributor, Author) commented Sep 29, 2019

I went ahead and pushed the change to add flux-mini -g, --gpus-per-task=N (no tests yet). That would allow this PR to close #2402. If it is wrong, it can easily be backed out.

@grondo grondo force-pushed the grondo:shell-gpu-support branch from 8fbae10 to e691204 Sep 29, 2019
"--gpus-per-task",
type=int,
metavar="N",
default=0,

@garlick (Member) Sep 29, 2019

Cool! Just reading through this: won't this fail if -g is not specified, since the value will be zero and < 1 raises an exception?

You might want to follow the pattern of --nodes / num_nodes where it is None unless the user provided the option.
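Following the --nodes / num_nodes pattern suggested here, the gpus_per_task validation might look like the sketch below, where None means the user never passed -g. This is a standalone illustration of the suggested fix, not the code that was actually committed:

```python
def validate_gpus_per_task(gpus_per_task):
    """Validate an optional gpus_per_task argument.

    None means the user did not provide the option; otherwise require
    an integer >= 1, mirroring the num_nodes handling in JobSpec.
    """
    if gpus_per_task is not None:
        if not isinstance(gpus_per_task, int) or gpus_per_task < 1:
            raise ValueError("gpus per task must be an integer >= 1 (if set)")
    return gpus_per_task
```

With a default of None in argparse, the constructor would then only append a "gpu" child resource when the option was given.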

@grondo (Contributor, Author) Sep 29, 2019

Yeah, that sounds better. The assertion was a typo and should allow 0, but the use of None feels more right to me. I think I've got it wrong anyway, since the builds are failing in Travis. I won't have more time to work on this until this afternoon, if you'd rather get it done.

@garlick (Member) Sep 29, 2019

Sure, I'll have a go.

Otherwise, are you feeling like this PR is ready?

@garlick (Member) commented Sep 29, 2019

I'll add the new option to the flux-mini(1) man page in #2409 if that works for you.

Edit: 0d1e625 if you prefer to cherry-pick, or redo.

@grondo grondo force-pushed the grondo:shell-gpu-support branch from e691204 to 2171c5d Sep 29, 2019
@grondo (Contributor, Author) commented Sep 29, 2019

@garlick, feel free to force-push this branch without that last commit. Then I think this is merge-ready. (Maybe after another spot check.)

@garlick garlick force-pushed the grondo:shell-gpu-support branch from 2171c5d to a258487 Sep 29, 2019
@codecov-io commented Sep 29, 2019

Codecov Report

Merging #2406 into master will increase coverage by <.01%.
The diff coverage is 82.35%.

@@            Coverage Diff             @@
##           master    #2406      +/-   ##
==========================================
+ Coverage   81.12%   81.12%   +<.01%     
==========================================
  Files         224      225       +1     
  Lines       36022    36099      +77     
==========================================
+ Hits        29223    29287      +64     
- Misses       6799     6812      +13
Impacted Files Coverage Δ
src/shell/builtins.c 95.23% <ø> (ø) ⬆️
src/shell/shell.c 85.75% <100%> (-0.24%) ⬇️
src/shell/gpubind.c 79.68% <79.68%> (ø)
src/shell/rcalc.c 87.24% <88.88%> (+0.83%) ⬆️
src/common/libsubprocess/local.c 79.86% <0%> (-0.35%) ⬇️
src/common/libflux/message.c 80.22% <0%> (-0.14%) ⬇️
src/common/libutil/veb.c 98.95% <0%> (+0.52%) ⬆️
src/modules/job-info/guest_watch.c 74.71% <0%> (+0.56%) ⬆️
Member left a comment:

LGTM!

@mergify mergify bot merged commit 62a92b3 into flux-framework:master Sep 29, 2019
2 checks passed
Summary: 1 rule matches
continuous-integration/travis-ci/pr: The Travis CI build passed
@grondo grondo deleted the grondo:shell-gpu-support branch Sep 29, 2019
@grondo (Contributor, Author) commented Sep 29, 2019
commented Sep 29, 2019

Thanks for fixing it up for me! (Off doing kid stuff all day)
