shell: add gpu affinity support #2406
Conversation
Force-pushed 6632df2 to 9a514e8 (compare)
Rebased and switched from
I guess we also need a `--gpus-per-task=N` option for `flux mini run` huh?
Yeah, there's an issue open on that: #2403
Ah, I see. Were you hinting here that this PR should add that option now that `flux-mini run` is merged? That actually makes sense and I could do that if you're not already tackling it.
- Support the "gpu" resource type in the R v1 children list. Save and return this list in `struct rcalc_rankinfo`, and add a total count of gpus for informational purposes.
- Support GPUs in the shell `rcalc` test program.
- Add an R input for `rcalc` tests with GPUs to sanity-check `rcalc` gpu support.
- Add a `gpus` list to `flux_shell_get_rank_info()` JSON output.
- Add a builtin gpu affinity plugin for the shell which sets `CUDA_VISIBLE_DEVICES` (optionally per-task), as well as `CUDA_DEVICE_ORDER=PCI_BUS_ID`. This builtin plugin can be overridden with the name 'gpu-affinity'. The plugin reads the shell option `gpu-affinity: string`, and supports the values "on", "off", and "per-task". Invalid options are ignored and the default is "on".
- `t2604-job-shell-affinity` no longer uses `jq`. Remove the `jq` test and requirement.
- Ensure basic operation of the shell builtin gpu affinity plugin.
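The actual plugin is C code in the shell; as a rough illustration only, here is a hedged Python sketch of the option semantics described above ("on"/"off"/"per-task", invalid values ignored). The function name and signature are hypothetical, not part of the PR:

```python
def apply_gpu_affinity(gpus, option="on", task_rank=0, ntasks=1):
    """Sketch of gpu-affinity semantics: 'on' exposes all local GPUs once
    per job, 'off' disables the plugin, 'per-task' gives each local task
    an even share. Invalid option values fall back to the default 'on'."""
    if option not in ("on", "off", "per-task"):
        option = "on"  # invalid options are ignored; default is "on"
    if option == "off" or not gpus:
        return {}
    env = {"CUDA_DEVICE_ORDER": "PCI_BUS_ID"}
    if option == "per-task":
        per = len(gpus) // ntasks  # uneven division is undefined behavior
        mine = gpus[task_rank * per:(task_rank + 1) * per]
    else:
        mine = gpus
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in mine)
    return env
```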
I wasn't hinting - just thinking out loud, and I'm happy to work on it if I
can grok the necessary jobspec generation.
…On Sat, Sep 28, 2019, 6:48 PM Mark Grondona ***@***.***> wrote:
I guess we also need a --gpus-per-task=N option for flux mini run huh?
Ah, I see. Were you hinting here that this PR should add that option now
that flux-mini run is merged? That actually makes sense and I could do
that if you're not already tackling it.
I was playing with something like this? 🤷‍♂️

```diff
diff --git a/src/cmd/flux-mini.py b/src/cmd/flux-mini.py
index e53539dc8..35ac78de7 100755
--- a/src/cmd/flux-mini.py
+++ b/src/cmd/flux-mini.py
@@ -34,7 +34,7 @@ from datetime import timedelta
 class JobSpec:
-    def __init__(self, command, num_tasks=1, cores_per_task=1, num_nodes=None):
+    def __init__(self, command, num_tasks=1, cores_per_task=1, gpus_per_task=0, num_nodes=None):
         """
         Constructor builds the minimum legal v1 jobspec.
         Use setters to assign additional properties.
@@ -45,12 +45,16 @@ class JobSpec:
             raise ValueError("task count must be a integer >= 1")
         if not isinstance(cores_per_task, int) or cores_per_task < 1:
             raise ValueError("cores per task must be an integer >= 1")
+        if not isinstance(gpus_per_task, int) or gpus_per_task < 1:
+            raise ValueError("gpus per task must be an integer >= 0")
         if num_nodes is not None:
             if not isinstance(num_nodes, int) or num_nodes < 1:
                 raise ValueError("node count must be an integer >= 1 (if set)")
             if num_nodes > num_tasks:
                 raise ValueError("node count must not be greater than task count")
-        core = self.__create_resource("core", cores_per_task)
+        children = [self.__create_resource("core", cores_per_task)]
+        if gpus_per_task > 0:
+            children.append (self.__create_resource("gpu", gpus_per_task))
         if num_nodes is not None:
             num_slots = int(math.ceil(num_tasks / float(num_nodes)))
             if num_tasks % num_nodes != 0:
@@ -58,11 +62,11 @@ class JobSpec:
                 task_count_dict = {"total": num_tasks}
             else:
                 task_count_dict = {"per_slot": 1}
-            slot = self.__create_slot("task", num_slots, [core])
+            slot = self.__create_slot("task", num_slots, children)
             resource_section = self.__create_resource("node", num_nodes, [slot])
         else:
             task_count_dict = {"per_slot": 1}
-            slot = self.__create_slot("task", num_tasks, [core])
+            slot = self.__create_slot("task", num_tasks, children)
             resource_section = slot

         self.jobspec = {
@@ -173,7 +177,6 @@ class JobSpec:
     def dumps(self):
         return json.dumps(self.jobspec)
-
 class SubmitCmd:
     """
     SubmitCmd submits a job, displays the jobid on stdout, and returns.
@@ -208,6 +211,14 @@ class SubmitCmd:
             default=1,
             help="Number of cores to allocate per task",
         )
+        parser.add_argument(
+            "-g",
+            "--gpus-per-task",
+            type=int,
+            metavar="N",
+            default=0,
+            help="Number of GPUs to allocate per task",
+        )
         parser.add_argument(
             "-t",
             "--time-limit",
@@ -275,6 +286,7 @@ class SubmitCmd:
             args.command,
             num_tasks=args.ntasks,
             cores_per_task=args.cores_per_task,
+            gpus_per_task=args.gpus_per_task,
             num_nodes=args.nodes,
         )
         jobspec.set_cwd(os.getcwd())
```
```json
"resources": [
    {
        "count": 2,
        "with": [
            {
                "count": 2,
                "type": "core"
            },
            {
                "count": 1,
                "type": "gpu"
            }
        ],
        "type": "slot",
        "label": "task"
    }
]
```
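For reference, a small standalone sketch of how that nested structure gets built. The helper names below are standalone stand-ins that mirror the private `__create_resource`/`__create_slot` methods used in the diff; their exact bodies are assumptions:

```python
def create_resource(rtype, count, with_=None):
    # Sketch of JobSpec.__create_resource: a resource is a typed, counted
    # dict, optionally containing child resources under "with".
    resource = {"type": rtype, "count": count}
    if with_:
        resource["with"] = with_
    return resource

def create_slot(label, count, children):
    # Sketch of JobSpec.__create_slot: a slot is a resource with a label
    # that ties it to a task set.
    slot = create_resource("slot", count, children)
    slot["label"] = label
    return slot

# Two cores plus one gpu per slot, two slots -- the shape shown above.
children = [create_resource("core", 2), create_resource("gpu", 1)]
resources = [create_slot("task", 2, children)]
```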
However, since the scheduler in core ignores any resource besides node and core, there isn't a way to do an end-to-end test of GPU support (in flux-core anyway).
Force-pushed 9a514e8 to 8fbae10 (compare)
I went ahead and pushed the change to add the option.
Force-pushed 8fbae10 to e691204 (compare)
src/cmd/flux-mini.py (outdated)

```python
            "--gpus-per-task",
            type=int,
            metavar="N",
            default=0,
```
Cool! Just reading through this, won't this fail if `-g` is not specified, since the value will be zero and < 1 raises an exception? You might want to follow the pattern of `--nodes` / `num_nodes`, where it is `None` unless the user provided the option.
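The `--nodes` / `num_nodes` pattern being suggested looks roughly like this sketch (the function name is hypothetical; the PR applies the check inside `JobSpec.__init__`):

```python
def validate_gpus_per_task(gpus_per_task=None):
    """Sketch of the None-unless-given pattern: None means the user did
    not pass -g/--gpus-per-task, so no validation fires and no gpu
    resource is emitted. An explicit value must be an integer >= 1."""
    if gpus_per_task is not None:
        if not isinstance(gpus_per_task, int) or gpus_per_task < 1:
            raise ValueError("gpus per task must be an integer >= 1 (if set)")
    return gpus_per_task
```

This avoids the bug in the original hunk, where the default of 0 tripped the `< 1` check even when the option was never used.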
Yeah, that sounds better. The assertion was a typo and should allow 0, but use of `None` feels more right to me. I think I've got it wrong anyway since the builds are failing in Travis. I won't have more time to work on this until this afternoon, if you'd rather get it done.
Sure, I'll have a go.
Otherwise, are you feeling like this PR is ready?
Force-pushed e691204 to 2171c5d (compare)
@garlick, feel free to force push this branch without that last commit. Then, I think this is merge-ready (maybe after another spot check).
Force-pushed 2171c5d to a258487 (compare)
Codecov Report
```text
@@            Coverage Diff             @@
##           master    #2406      +/-   ##
==========================================
+ Coverage   81.12%   81.12%    +<.01%
==========================================
  Files         224      225        +1
  Lines       36022    36099       +77
==========================================
+ Hits        29223    29287       +64
- Misses       6799     6812       +13
```
LGTM!
Thanks for fixing it up for me! (Off doing kid stuff all day)
This PR adds support for parsing `gpu` resources from R, if assigned, into the shell's internal `rcalc` class. The gpu resources are then exposed to plugins via `flux_shell_get_rank_info()`, and a `gpubind.c` builtin plugin is introduced to read the gpu list and set `CUDA_VISIBLE_DEVICES` appropriately.

The plugin can be controlled via the `gpu-affinity` shell option. `gpu-affinity="off"` disables plugin operation, `gpu-affinity="on"` is the default and sets `CUDA_VISIBLE_DEVICES` once per job, and `gpu-affinity="per-task"` divides available GPUs evenly among local tasks. (If GPUs do not divide evenly among tasks, the current behavior is undefined 😉. Probably should fix this, but I don't think it could actually happen in the current system.)

I couldn't directly port the `cuda_visible_devices.lua` plugin from 0.11 because that depended on the Lua `cpu_set_t` bindings, which were removed in the wreck purge of '19.
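As a hedged sketch of how a plugin might consume the `gpus` list added to the `flux_shell_get_rank_info()` JSON output: the `gpus` field name comes from the commit message above, but the surrounding JSON shape and the other field names here are assumptions for illustration only.

```python
import json

def gpus_from_rank_info(rank_info_json):
    """Pull the gpu id list out of rank-info JSON. Only the 'gpus' key is
    taken from this PR's commit messages; everything else is assumed."""
    info = json.loads(rank_info_json)
    return info.get("gpus", [])

# Hypothetical rank-info payload for a rank assigned two GPUs.
example = '{"broker_rank": 0, "ntasks": 2, "gpus": [0, 1]}'
```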