Getting 2 nodes on what was supposed to be an intranode test #3580

@casparvl

Description

I have an OSU test that is supposed to test point-to-point GPU communication. Essentially, it sets num_tasks=2 and num_tasks_per_node=2. The job script produced is:

#!/bin/bash
#SBATCH --job-name="rfm_EESSI_OSU_pt2pt_GPU_87fbf5ce"
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=32
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=0:30:0
#SBATCH -p gpu_h100
#SBATCH --export=None
#SBATCH --mem=737280M
#SBATCH --gpus-per-node=4
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0
...
mpirun -np 2 osu_bw -m 4194304 -x 5 -i 10 -c -d cuda D D
...
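To make the problem concrete, here is an illustrative sketch (not ReFrame's actual internals; the function name is made up) of how a test's num_tasks and num_tasks_per_node could be turned into the SBATCH directives seen above:

```python
# Illustrative sketch (not ReFrame's actual code) of how a test's
# num_tasks and num_tasks_per_node settings map to SBATCH directives.
def emit_directives(num_tasks, num_tasks_per_node):
    directives = [
        f'#SBATCH --ntasks={num_tasks}',
        f'#SBATCH --ntasks-per-node={num_tasks_per_node}',
    ]
    # Note: no '--nodes' directive is emitted, which is what allows Slurm
    # to treat --ntasks-per-node as a mere upper bound.
    return directives

for d in emit_directives(2, 2):
    print(d)
```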

I saw strongly varying performance: either 25 GB/s or 120 GB/s. Based on our interconnect and the connectivity between GPUs, 25 GB/s matches our internode GPU-to-GPU performance, whereas 120 GB/s matches the intranode GPU-to-GPU performance. Checking the run report, I saw:

          "outputdir": "/home/jenkins/EESSI/reframe_CI_runs/output/snellius/gpu_H100/default/EESSI_OSU_pt2pt_GPU_87fbf5ce",
...
          "job_nodelist": [
            "gcn114",
            "gcn149"
          ],

I.e. this particular test was being scheduled across two nodes. I was a bit surprised by this behavior, but reading the SLURM documentation carefully, it becomes clear why:

--ntasks-per-node=
Request that ntasks be invoked on each node. If used with the --ntasks option, the --ntasks option will take precedence and the --ntasks-per-node will be treated as a maximum count of tasks per node. Meant to be used with the --nodes option. This is related to --cpus-per-task=ncpus, but does not require knowledge of the actual number of cpus on each node. In some cases, it is more convenient to be able to request that no more than a specific number of tasks be invoked on each node. Examples of this include submitting a hybrid MPI/OpenMP app where only one MPI "task/rank" should be assigned to each node while allowing the OpenMP portion to utilize all of the parallelism present in the node, or submitting a single setup/cleanup/monitoring job to each node of a pre-existing allocation as one step in a larger job script.

Note in particular

If used with the --ntasks option, the --ntasks option will take precedence and the --ntasks-per-node will be treated as a maximum count of tasks per node. Meant to be used with the --nodes option.

I.e. they basically say: you should use it with --nodes, and if you use it with --ntasks instead, it's considered a maximum count of tasks per node. That gives SLURM the liberty of actually scheduling 2 nodes with 1 task per node each, which is what happened in my case. From a regression testing perspective, this is clearly undesirable, as it leads to unexpected changes in performance from one run to the next. Actually, I'd consider it a bug, because the ReFrame docs specify:

num_tasks_per_node= None
Number of tasks per node required by this test.

Which suggests that's exactly the number of tasks per node you'll get (and not a maximum, as it is for SLURM). But by specifying --ntasks and --ntasks-per-node (and not --nodes), ReFrame doesn't give the SLURM backend the right instructions to trigger the promised behavior.
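To see why treating --ntasks-per-node as a maximum permits the two-node placement, here is a toy enumeration of the placements Slurm could legally pick; this is an illustrative model, not Slurm's actual scheduling algorithm:

```python
# Toy model of Slurm's freedom when given only --ntasks=N and
# --ntasks-per-node=M (a maximum): enumerate every legal distribution of
# N tasks over at most `max_nodes` nodes. Not Slurm's actual algorithm.
def valid_placements(num_tasks, max_tasks_per_node, max_nodes):
    results = []

    def distribute(remaining, placement):
        if remaining == 0:
            results.append(tuple(placement))
            return
        if len(placement) == max_nodes:
            return
        for n in range(1, min(remaining, max_tasks_per_node) + 1):
            distribute(remaining - n, placement + [n])

    distribute(num_tasks, [])
    return results

# With --ntasks=2 and --ntasks-per-node=2, both a two-node and a
# single-node placement satisfy the request:
print(valid_placements(2, 2, 2))  # → [(1, 1), (2,)]
```

Both placements are legal, so which one you get depends on what the scheduler happens to find free, which is exactly the run-to-run variability observed above.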

Now, I know the use_nodes_option exists, and it does resolve the issue, but its default value is False. I'd consider it preferable to change the default to True, so that the behavior of num_tasks_per_node as documented in the ReFrame docs matches the behavior it triggers on the SLURM side.
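In the meantime, the workaround can be enabled per partition in the site configuration. A minimal sketch, assuming a recent ReFrame where use_nodes_option lives under sched_options; the system and partition names are placeholders, and the exact nesting may differ between ReFrame versions:

```python
# Sketch of a ReFrame site-configuration fragment enabling the workaround
# per partition. System/partition names are placeholders, and the exact
# location of 'use_nodes_option' may differ between ReFrame versions;
# check the configuration reference for your release.
site_configuration = {
    'systems': [
        {
            'name': 'example_system',   # placeholder name
            'hostnames': ['login.*'],   # placeholder pattern
            'partitions': [
                {
                    'name': 'gpu',      # placeholder partition
                    'scheduler': 'slurm',
                    'launcher': 'mpirun',
                    'sched_options': {
                        # Make ReFrame emit an explicit --nodes directive,
                        # so num_tasks_per_node behaves as documented.
                        'use_nodes_option': True,
                    },
                },
            ],
        },
    ],
}
```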
