idea: add shell plugin to work around non-locality-aware scheduling of cores and GPUs #5342
Comments
grondo added a commit to grondo/flux-core that referenced this issue on Jul 26, 2023:

> Problem: An epilog-start event emitted for a job that never had an alloc event will proceed immediately to inactive, because there is no pending "free" event to hold the job until the matching epilog-finish is posted. A pending epilog-start event should prevent the "clean" event for a job as well as the "free" event. This means a job cannot become inactive until all epilogs are complete.
>
> Fixes flux-framework#5342
grondo added a commit to grondo/flux-core that referenced this issue on Jul 27, 2023:

> Problem: An epilog-start event emitted for a job that never had an alloc event will proceed immediately to inactive, because there is no pending "free" event to hold the job until the matching epilog-finish is posted. A pending epilog-start event should prevent the "clean" event for a job as well as the "free" event. This means a job cannot become inactive until all epilogs are complete.
>
> Fixes flux-framework#5342
@grondo, was this the intended target of that commit?

Dang no. Typo 🤦
@trws had the idea to build a simple shell plugin which could work around the current lack of guaranteed locality-aware scheduling of cores and GPUs. The idea would be to implement a `-o follow-gpus` (or similar) option which would ignore the cores assigned to the job and instead pick one or more cores near the allocated GPUs.

From @trws:

> The case in question was a user that wanted to run 2 jobs per node, each with 4 tasks using 1 core and 1 gpu each.
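No such plugin exists yet, but as a rough illustration of the shape it might take, here is a minimal sketch in C against the flux shell plugin API (`flux/shell.h`) and hwloc's CUDA helper. The plugin name, the option name `follow-gpus`, the build line, and especially the mapping from `CUDA_VISIBLE_DEVICES` entries to hwloc device indices are assumptions for illustration, not code from flux-core:

```c
/* follow-gpus.c - hypothetical sketch of the proposed shell plugin.
 *
 * Assumed build line (untested):
 *   cc -shared -fPIC follow-gpus.c -o follow-gpus.so \
 *      $(pkg-config --cflags --libs flux-core hwloc) -lcudart
 */
#define FLUX_SHELL_PLUGIN_NAME "follow-gpus"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <hwloc.h>
#include <hwloc/cudart.h> /* hwloc_cudart_get_device_cpuset() */
#include <flux/shell.h>

/* task.exec callbacks run in the task process just before exec(2),
 * so binding the current process here binds the task itself.
 */
static int follow_gpus (flux_plugin_t *p,
                        const char *topic,
                        flux_plugin_arg_t *args,
                        void *data)
{
    flux_shell_t *shell = flux_plugin_get_shell (p);
    flux_shell_task_t *task;
    flux_cmd_t *cmd;
    const char *devices;
    hwloc_topology_t topo;
    hwloc_cpuset_t cpus;
    char *s, *tok, *saveptr = NULL;
    int rc = 0;

    /* Do nothing unless the job was run with -o follow-gpus */
    if (flux_shell_getopt (shell, "follow-gpus", NULL) != 1)
        return 0;

    if (!(task = flux_shell_current_task (shell))
        || !(cmd = flux_shell_task_cmd (task))
        || !(devices = flux_cmd_getenv (cmd, "CUDA_VISIBLE_DEVICES")))
        return 0; /* no GPUs assigned to this task: leave binding alone */

    if (hwloc_topology_init (&topo) < 0 || hwloc_topology_load (topo) < 0)
        return -1;
    cpus = hwloc_bitmap_alloc ();

    /* Union the cpusets local to each assigned GPU, ignoring whatever
     * cores the scheduler originally handed the task.  (Hand-waved:
     * the entries in CUDA_VISIBLE_DEVICES are assumed to line up with
     * the device indices the cudart helper expects.)
     */
    s = strdup (devices);
    for (tok = strtok_r (s, ",", &saveptr); tok != NULL;
         tok = strtok_r (NULL, ",", &saveptr)) {
        hwloc_cpuset_t devcpus = hwloc_bitmap_alloc ();
        if (hwloc_cudart_get_device_cpuset (topo, atoi (tok), devcpus) == 0)
            hwloc_bitmap_or (cpus, cpus, devcpus);
        hwloc_bitmap_free (devcpus);
    }
    free (s);

    /* Bind to the GPU-local cores, if any were found */
    if (!hwloc_bitmap_iszero (cpus))
        rc = hwloc_set_cpubind (topo, cpus, HWLOC_CPUBIND_PROCESS);

    hwloc_bitmap_free (cpus);
    hwloc_topology_destroy (topo);
    return rc;
}

int flux_plugin_init (flux_plugin_t *p)
{
    flux_plugin_set_name (p, FLUX_SHELL_PLUGIN_NAME);
    return flux_plugin_add_handler (p, "task.exec", follow_gpus, NULL);
}
```

With something along these lines loaded, the user in the example above could run each of their two per-node jobs as, e.g., `flux run -n4 -c1 -g1 -o follow-gpus app`, and each task would be re-bound to cores near its assigned GPU regardless of which cores the scheduler picked.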