-
Notifications
You must be signed in to change notification settings - Fork 117
[feat] Full support for flexible task allocation #522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Add support for flexible node allocation triggered when `self.num_tasks` is set to 0. * Add multiple unittests covering the main functionality of the feature. * The flexible node allocation respects the cmd options and the regression test options.
Codecov Report
@@ Coverage Diff @@
## master #522 +/- ##
==========================================
- Coverage 91.59% 91.59% -0.01%
==========================================
Files 72 72
Lines 8801 8954 +153
==========================================
+ Hits 8061 8201 +140
- Misses 740 753 +13
Continue to review full report at Codecov.
|
|
I've just had a nice talk with @victorusu on how this feature should behave depending on the flags specified (i.e., reservation, partition etc.) and we have had an idea that I think it simplifies the implementation and also the reasoning for the behaviour of this feature. The bottomline is that we should add a command-line option, if is_number(flexible_alloc_tasks):
self.num_tasks = flexible_alloc_tasks
else:
all_nodes = get_all_nodes_of_virtual_partition()
all_nodes &= reservation_nodes
all_nodes &= partition_nodes
all_nodes.filter_by_constraint()
all_nodes -= excluded_nodes
if flexible_alloc_tasks == 'idle':
self.num_tasks = len(get_idle(all_nodes))
else:
self.num_tasks = len(all_nodes) |
vkarak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR! Apart from comments for enhancements, I'd like to ask whether it is possible to hoist the implementation of Slurm's guess_num_tasks() inside the base class Job and have subclasses implement only very specific tasks, such as "get the reservation nodes" etc. It seems that the algorithm you are using in guess_num_tasks is generic enough.
- Rename `get_available_nodes()` to `get_partition_nodes()`. - Move try/except clause for backends not implementing the feature inside the `guess_num_tasks()` method. - Other minor style changes.
Add support for flexible node allocation triggered when
self.num_tasksis set to 0.Add multiple unittests covering the main functionality of the
feature.
The flexible node allocation respects the cmd options and the
regression test options.
Closes #105
Closes #104
Closes #75