Conversation

@ethanjjjjjjj commented Nov 26, 2022

I believe that being able to explicitly ask for a number of nodes would be a useful feature, especially where the same set of tests is reused across clusters and partitions. It would make it possible to request full nodes from the Slurm scheduler without being explicit about how many tasks to run or how many cores each node has.

Helps solve some of the issues mentioned in #2093

@jenkins-cscs (Collaborator)

Can I test this patch?

@ethanjjjjjjj marked this pull request as ready for review November 26, 2022 16:32
@ethanjjjjjjj (Author)

I'd appreciate some guidance on how to cleanly introduce this feature into ReFrame. As it stands, my needs are met by this small change, but I'm happy to keep working on it until it satisfies other users as well.

Fix my poor formatting
@vkarak (Contributor) commented Nov 27, 2022

Wouldn't setting num_tasks = N and num_tasks_per_node = 1 have the same effect? Also, you can instruct reframe to always emit the -N option by setting the use_nodes_option configuration option.

If you use that option, note that in 4.0 all the scheduler-specific options will be moved inside the partition definition (see also issue #2669).
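
For reference, a minimal sketch of the first suggestion (the test name and the node count of 4 are purely illustrative):

import reframe as rfm


class four_node_variant(rfm.RegressionTest):
    # With one task per node, asking for N tasks spreads the job
    # over exactly N nodes
    num_tasks = 4
    num_tasks_per_node = 1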

@ethanjjjjjjj (Author) commented Nov 27, 2022

> Wouldn't setting num_tasks = N and num_tasks_per_node = 1 have the same effect? Also, you can instruct reframe to always emit the -N option by setting the use_nodes_option configuration option.

This is very close to what I want: it adds the --nodes line to the job script, but it still only lets me allocate a fixed number of MPI tasks per node.

I'd like that number to be filled in automatically by the scheduler, which Slurm will do as long as you leave --ntasks out of your script. Letting Slurm come up with the number of tasks means that I can run the test on a new cluster without having to specify how many tasks are required to fill a node (assuming one CPU per task).

For example with this PR:

class hpcg(rfm.RegressionTest):
    sourcepath = 'hpcg-3.1'
    num_cpus_per_task = 1
    num_tasks = None
    exclusive_access = True
    num_nodes = 4

Generates:

#!/bin/bash
#SBATCH --job-name="rfm_job"
#SBATCH --cpus-per-task=1
#SBATCH --nodes=4
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=1:0:0
#SBATCH --exclusive
srun --cpus-per-task=1 hpcg-3.1/bin/xhpcg

Which in turn allocates 28 tasks per node for a total of 112 tasks on my system.

@vkarak (Contributor) commented Nov 28, 2022

You are right; I wasn't aware of this capability of --cpus-per-task:

> If -c is specified without -n, as many tasks will be allocated per node as possible while satisfying the -c restriction.

My concern regarding the implementation is that simply introducing a num_nodes variable in parallel with num_tasks will require changes in many places and might have strange side effects if both are specified. It's not only the Slurm backend that would have to be updated: all the other backends would have to handle that option and do something sensible if it is specified along with num_tasks and num_tasks_per_node. Slurm gives much more meaning to these options than other schedulers do, and I would rather avoid reimplementing Slurm's behaviour in all the other backends. That was also the rationale for keeping what a test specifies by default minimal and well defined. Adding a num_nodes variable would require us to introduce logic, and possibly reframe-specific interpretations, for what to do when a job spec is over-specified.

There are also the flexible node allocation and the --distribute option, which would have to take care of a test that specifies num_nodes, num_tasks and num_tasks_per_node.

On the other hand, what you propose here is a valid request, and reframe should have a way to support this behaviour for the Slurm backend. I'm thinking about what an alternative implementation could look like that would be Slurm-specific and, if possible, would not be exposed to the test API.

@vkarak changed the title from "Add ability to specify number of nodes for slurm scheduler" to "[feat] Add ability to specify number of nodes for slurm scheduler" Nov 28, 2022
@ethanjjjjjjj (Author) commented Nov 28, 2022

It seems to me that Slurm already gets special treatment for many options, as shown by the table here. While adding another one is definitely not the cleanest way to deal with this in the long run, it certainly wouldn't be out of place given the current state of things. That said, I agree, and I'm on board with finding a longer-term solution for all those other cases too.

In this case, my requirement depends on omitting --ntasks from the job script, so I believe there should still be a way to set num_tasks=None, and that requires at least a small modification to the test API.

I also think that many people would want the option of being explicit in the frontend about exactly how the scheduler runs the job behind the scenes. They might prefer to set num_nodes and num_tasks explicitly, as you would in a job script, which is how this PR currently handles assigning both of these options. Perhaps a dictionary of options within something like RegressionTest.scheduler_opts.slurm could satisfy this, along the lines of the sketch below?
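
Purely as a hypothetical illustration of that idea (neither scheduler_opts nor these keys exist in ReFrame today; all names are made up):

import reframe as rfm


class hpcg(rfm.RegressionTest):
    # Hypothetical API: per-scheduler options that map directly to
    # scheduler directives and are ignored by backends that don't know them
    scheduler_opts = {
        'slurm': {
            'nodes': 4,       # would emit '#SBATCH --nodes=4'
            'ntasks': None,   # None would omit '--ntasks' so Slurm fills it in
        }
    }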

I think --distribute still launches one test per node that matches the filter, and those can still be multi-node jobs, so in this case specifying num_nodes explicitly doesn't change the behaviour much. Possibly, if --distribute filters down to a list of 20 nodes and you request num_nodes=4, then ReFrame should spawn 5 jobs of 4 nodes each from the 20. What do you think?

@vkarak (Contributor) commented Nov 29, 2022

What about the following? If num_tasks could be set to None, which I agree is unavoidable for achieving this scenario, then you could write your test as follows:

class my_test(...):
    num_tasks = None

    @run_before('run')    # or anywhere after setup
    def setup_job(self):
        self.job.options = [f'--nodes={N}']  # N: the desired node count

Setting num_tasks to None, which is the only scheduler-specific variable that is not allowed to be None at the moment, would be the equivalent of telling the framework "I know what scheduler I'm using and I know what I'm doing with my job options." That's fine, I think, and then you can pass any additional options as shown above. Regarding the implementation, we would also have to handle this case for the other backends, but I believe that is straightforward: we simply don't emit anything derived from num_tasks and we process the user-specified job options (which is already done).
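
To make this concrete, here is a fuller sketch under the same assumption, i.e. that num_tasks may be set to None (the test name, executable and node count of 4 are purely illustrative):

import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class four_node_check(rfm.RegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    executable = 'hostname'
    exclusive_access = True
    num_tasks = None          # assumes the proposed change: no '--ntasks' emitted

    @run_before('run')
    def set_node_count(self):
        # Request whole nodes and let Slurm derive the task count
        self.job.options += ['--nodes=4']

    @sanity_function
    def assert_output(self):
        # Pass if the job produced any output at all
        return sn.assert_found(r'\S', self.stdout)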

@vkarak modified the milestone: ReFrame sprint 22.12.1 Nov 30, 2022
@ethanjjjjjjj (Author)

Yep, that makes sense to me. It would definitely give me the control I need over the job script.

@vkarak added this to the ReFrame Sprint 23.01 milestone Jan 12, 2023
@vkarak self-assigned this Jan 12, 2023
@vkarak (Contributor) commented Jan 26, 2023

@ethanjjjjjjj I will close and reopen this against the develop branch.
