@teojgo
Contributor

@teojgo teojgo commented Oct 17, 2018

  • Add support for flexible node allocation, triggered when
    self.num_tasks is set to 0.

  • Add unit tests covering the main functionality of the
    feature.

  • Flexible node allocation respects both the command-line options
    and the regression test options.

Closes #105
Closes #104
Closes #75

@teojgo teojgo added this to the ReFrame sprint 2018w41 milestone Oct 17, 2018
@teojgo teojgo self-assigned this Oct 17, 2018
@teojgo teojgo requested a review from vkarak October 17, 2018 12:48
@codecov-io

codecov-io commented Oct 17, 2018

Codecov Report

Merging #522 into master will decrease coverage by <.01%.
The diff coverage is 91.76%.

@@            Coverage Diff             @@
##           master     #522      +/-   ##
==========================================
- Coverage   91.59%   91.59%   -0.01%     
==========================================
  Files          72       72              
  Lines        8801     8954     +153     
==========================================
+ Hits         8061     8201     +140     
- Misses        740      753      +13
| Impacted Files | Coverage Δ |
|---|---|
| reframe/frontend/executors/policies.py | 96.6% <ø> (ø) ⬆️ |
| reframe/core/pipeline.py | 91.7% <100%> (ø) ⬆️ |
| reframe/frontend/executors/__init__.py | 98.07% <100%> (ø) ⬆️ |
| unittests/test_schedulers.py | 98.11% <100%> (+0.36%) ⬆️ |
| unittests/test_launchers.py | 91.86% <66.66%> (-1%) ⬇️ |
| reframe/frontend/cli.py | 80.43% <75%> (-0.25%) ⬇️ |
| reframe/core/schedulers/local.py | 98.91% <75%> (-1.09%) ⬇️ |
| reframe/core/schedulers/slurm.py | 60.61% <81.57%> (+2.65%) ⬆️ |
| reframe/core/schedulers/pbs.py | 66.66% <85.71%> (-0.41%) ⬇️ |
| reframe/core/schedulers/__init__.py | 94.03% <94.28%> (-0.03%) ⬇️ |
... and 1 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d1e5566...ceaea12.

@vkarak vkarak requested a review from victorusu October 17, 2018 14:41
@vkarak
Contributor

vkarak commented Oct 17, 2018

I've just had a nice talk with @victorusu about how this feature should behave depending on the flags specified (i.e., reservation, partition, etc.), and we came up with an idea that I think simplifies both the implementation and the reasoning about the feature's behaviour. The bottom line is that we should add a command-line option, --flexible-alloc-tasks={all,idle,NUM}, that specifies how the node allocation of a flexible test is determined. If the option is set to all, num_tasks should be set to the number of nodes of the virtual partition that meet any additional constraints (i.e., reservation, partition, node list, etc.). If it is set to idle, num_tasks should be set to the number of idle nodes of the virtual partition that meet those constraints. If it is set to a specific number, num_tasks is set to that number regardless. In pseudocode, this should look like this:

if is_number(flexible_alloc_tasks):
    self.num_tasks = flexible_alloc_tasks
else:
    all_nodes = get_all_nodes_of_virtual_partition()
    all_nodes &= reservation_nodes
    all_nodes &= partition_nodes
    all_nodes.filter_by_constraint()
    all_nodes -= excluded_nodes
    if flexible_alloc_tasks == 'idle':
        self.num_tasks = len(get_idle(all_nodes))
    else:
        self.num_tasks = len(all_nodes)
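
The pseudocode above can be made concrete with a small, self-contained sketch. This is not ReFrame's implementation: the `Node` record, the function names `parse_flexible_alloc_tasks()` and `resolve_num_tasks()`, and all of their parameters are hypothetical and exist only for illustration. Like the pseudocode, it assumes one task per selected node.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Union


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a scheduler node record."""
    name: str
    idle: bool
    features: FrozenSet[str] = frozenset()


def parse_flexible_alloc_tasks(value: str) -> Union[str, int]:
    """Accept 'all', 'idle' or a positive integer, as in the proposal."""
    if value in ('all', 'idle'):
        return value
    try:
        num = int(value)
    except ValueError:
        raise ValueError(f'invalid policy: {value!r}') from None
    if num <= 0:
        raise ValueError('the number of tasks must be positive')
    return num


def resolve_num_tasks(
    policy: Union[str, int],
    virtual_partition_nodes: Iterable[Node],
    reservation_nodes: Optional[Iterable[Node]] = None,
    partition_nodes: Optional[Iterable[Node]] = None,
    required_features: FrozenSet[str] = frozenset(),
    excluded_names: FrozenSet[str] = frozenset(),
) -> int:
    """Mirror the pseudocode: a number wins, otherwise count the nodes."""
    if isinstance(policy, int):
        return policy

    candidates = set(virtual_partition_nodes)
    # Intersect with each optional node set only when it was requested.
    if reservation_nodes is not None:
        candidates &= set(reservation_nodes)
    if partition_nodes is not None:
        candidates &= set(partition_nodes)

    # Keep the nodes that offer every required feature (the "constraint").
    candidates = {n for n in candidates if required_features <= n.features}
    # Drop explicitly excluded nodes.
    candidates = {n for n in candidates if n.name not in excluded_names}

    if policy == 'idle':
        candidates = {n for n in candidates if n.idle}
    return len(candidates)
```

With four nodes of which three are idle, `resolve_num_tasks('all', nodes)` counts all four and `resolve_num_tasks('idle', nodes)` counts three, while an integer policy bypasses the node lookup entirely.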

Contributor

@vkarak vkarak left a comment

Nice PR! Apart from the comments on enhancements, I'd like to ask whether it is possible to hoist the implementation of Slurm's guess_num_tasks() into the base class Job and have subclasses implement only very specific tasks, such as "get the reservation nodes", etc. The algorithm you are using in guess_num_tasks() seems generic enough.
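
One way to picture the hoisting suggested here is the template-method pattern: the base class owns the generic algorithm and each backend overrides only narrow hooks. The sketch below is an illustration under assumptions, not the code merged in this PR. `Job`, `guess_num_tasks()` and `get_partition_nodes()` are names that appear in this conversation; `filter_nodes()`, `FlexAllocUnsupported`, `FakeSlurmJob`, `LocalJob` and every constructor argument are hypothetical.

```python
class FlexAllocUnsupported(Exception):
    """Hypothetical error for backends without flexible allocation."""


class Job:
    """Base class that owns the generic node-counting algorithm."""

    def __init__(self, name, excluded=()):
        self.name = name
        self.excluded = frozenset(excluded)

    # Backend-specific hooks: subclasses override only these.
    def get_partition_nodes(self):
        raise NotImplementedError

    def filter_nodes(self, nodes):
        raise NotImplementedError

    def guess_num_tasks(self):
        """Generic algorithm; assumes one task per remaining node."""
        try:
            nodes = set(self.get_partition_nodes())
            nodes = set(self.filter_nodes(nodes))
        except NotImplementedError:
            # The backend does not implement the hooks.
            raise FlexAllocUnsupported(
                f'{type(self).__name__} does not support flexible allocation'
            ) from None
        return len(nodes - self.excluded)


class FakeSlurmJob(Job):
    """Hypothetical backend fed with in-memory node names."""

    def __init__(self, name, partition_nodes, reservation_nodes, excluded=()):
        super().__init__(name, excluded)
        self._partition_nodes = set(partition_nodes)
        self._reservation_nodes = set(reservation_nodes)

    def get_partition_nodes(self):
        return self._partition_nodes

    def filter_nodes(self, nodes):
        # Keep only the nodes that belong to the reservation.
        return nodes & self._reservation_nodes


class LocalJob(Job):
    """Hypothetical backend that leaves the hooks unimplemented."""
```

In this arrangement a new scheduler backend would only need to supply the node queries, and a backend that supplies none of them fails with a single, well-defined error raised from the base class.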

@teojgo teojgo changed the title [WIP] [feat] Support for flexible node allocation [feat] Support for flexible node allocation Oct 19, 2018
Theofilos Manitaras and others added 3 commits October 30, 2018 10:23
- Rename `get_available_nodes()` to `get_partition_nodes()`.
- Move try/except clause for backends not implementing the feature
  inside the `guess_num_tasks()` method.
- Other minor style changes.
@vkarak vkarak changed the title [feat] Support for flexible node allocation [feat] Full support for flexible task allocation Oct 30, 2018
@vkarak vkarak merged commit 7b9f158 into reframe-hpc:master Oct 30, 2018
@teojgo teojgo deleted the feature/flexible_nodelist branch November 2, 2018 13:17

3 participants