2 changes: 1 addition & 1 deletion docs/running/slurm.md
@@ -9,21 +9,21 @@

<div class="grid cards" markdown>

- :fontawesome-solid-mountain-sun: __Configuring jobs__

Specific guidance for configuring Slurm jobs on different node types.

[:octicons-arrow-right-24: GH200 nodes (Daint, Clariden, Santis)][ref-slurm-gh200]

[:octicons-arrow-right-24: AMD CPU-only nodes (Eiger)][ref-slurm-amdcpu]

- :fontawesome-solid-mountain-sun: __Node sharing__

    Guides on how to effectively use all resources on nodes by running more than one job per node.

[:octicons-arrow-right-24: Node sharing][ref-slurm-sharing]

[:octicons-arrow-right-24: Multiple MPI jobs per node][ref-slurm-exclusive]

</div>

@@ -68,7 +68,7 @@
!!! note
The flags `--account` and `-Cmc` that were required on the old [Eiger][ref-cluster-eiger] cluster are no longer required.

## Prioritization and scheduling

Job priorities are determined based on each project's resource usage relative to its quarterly allocation, as well as in comparison to other projects.
An aging factor is also applied to each job in the queue to ensure fairness over time.
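
To see how these factors contribute to a pending job's priority, Slurm's `sprio` utility (where exposed on the cluster) reports the per-factor breakdown:

```console
$ sprio -l -u $USER    # long format: one row per pending job, with age and fair-share columns
```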
@@ -138,7 +138,7 @@

3. Enable CUDA support on systems that provide NVIDIA GPUs.

4. Enable ROCm support on systems that provide AMD GPUs.

The build generates the following executables:

@@ -219,7 +219,7 @@

1. Test GPU affinity: note how all 4 ranks see the same 4 GPUs.

2. Test GPU affinity: note how the `--gpus-per-task=1` parameter assigns a unique GPU to each rank.

!!! info "Quick affinity checks"

@@ -242,7 +242,7 @@

The [GH200 nodes on Alps][ref-alps-gh200-node] have four GPUs per node, and Slurm job submissions must be configured appropriately to best make use of the resources.
Applications that can saturate the GPUs with a single process per GPU should generally prefer this mode.
[Configuring Slurm jobs to use a single GPU per rank][ref-slurm-gh200-single-rank-per-gpu] is also the most straightforward setup.

Some applications perform badly with a single rank per GPU, and require use of [NVIDIA's Multi-Process Service (MPS)] to oversubscribe GPUs with multiple ranks per GPU.

The best Slurm configuration is application- and workload-specific, so it is worth testing which works best in your particular case.
@@ -254,12 +254,12 @@
Unlike "exclusive process" mode, "default" mode allows multiple processes to submit work to a single GPU simultaneously.
This also means that different ranks on the same node can inadvertently use the same GPU, leading to suboptimal performance or unused GPUs, rather than job failures.

Some applications benefit from using multiple ranks per GPU. However, [MPS should be used][ref-slurm-gh200-multi-rank-per-gpu] in these cases.

If you are unsure which GPU is being used by a particular rank, print the `CUDA_VISIBLE_DEVICES` variable, along with e.g. the `SLURM_LOCALID`, `SLURM_PROCID`, and `SLURM_NODEID` variables, in your job script.
If the variable is unset or empty, all GPUs are visible to the rank, and the rank will in most cases use only the first GPU.
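
A quick way to do this without a helper program is to echo the variables from each rank (a sketch for a 4-rank job on a single GH200 node):

```console
$ srun -N1 -n4 --gpus-per-task=1 bash -c 'echo "node $SLURM_NODEID rank $SLURM_PROCID: CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'
```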

[](){#ref-slurm-gh200-single-rank-per-gpu}

### One rank per GPU

Configuring Slurm to use one GH200 GPU per rank is most easily done using the `--ntasks-per-node=4` and `--gpus-per-task=1` Slurm flags.
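
In a batch script this configuration might look as follows (a sketch; `./myapp` is a placeholder for your executable):

```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1

# 2 nodes x 4 ranks per node = 8 ranks, each assigned its own GH200 GPU
srun ./myapp
```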
@@ -278,7 +278,7 @@

Omitting the `--gpus-per-task` results in `CUDA_VISIBLE_DEVICES` being unset, which will lead to most applications using the first GPU on all ranks.

[](){#ref-slurm-gh200-multi-rank-per-gpu}

### Multiple ranks per GPU

Using multiple ranks per GPU can improve the performance of applications that don't generate enough work for a GPU with a single rank, or that scale badly to all 72 cores of the Grace CPU.
@@ -347,13 +347,13 @@
[](){#ref-slurm-amdcpu}
## AMD CPU nodes

Alps has nodes with two AMD EPYC Rome CPU sockets per node for CPU-only workloads, most notably in the [Eiger][ref-cluster-eiger] cluster provided by the [HPC Platform][ref-platform-hpcp].

For a detailed description of the node hardware, see the [AMD Rome node][ref-alps-zen2-node] hardware documentation.

??? info "Node description"
- The node has 2 x 64 core sockets
- Each socket is divided into 4 NUMA regions

    - The 16 cores in each NUMA region have faster access to their own 32 GB of memory

- Each core has two processing units (PUs)

![Screenshot](../images/slurm/eiger-topo.png)
@@ -401,7 +401,7 @@
srun --nodes=4 --ntasks-per-node=2
```

It is often more efficient to only run one task per core instead of the default two PU, which can be achieved using the `--hint=nomultithreading` option.
It is often more efficient to only run one task per core instead of the default two PU, which can be achieved using the `--hint=nomultithread` option.
```console title="One MPI rank per socket with 1 PU per core"
$ srun -n2 -N1 -c64 --hint=nomultithread ./affinity.mpi
affinity test for 2 MPI ranks
@@ -413,8 +413,8 @@
The best configuration for performance is highly application specific, with no one-size-fits-all configuration.
Take the time to experiment with `--hint=nomultithread`.

Memory on the node is divided into NUMA (non-uniform memory access) regions.

The 256 GB of a standard-memory node are divided into 8 NUMA nodes of 32 GB, with 16 cores associated with each node:


* memory access is optimal when all the cores of a rank are on the same NUMA node;
* memory access to NUMA regions on the other socket is significantly slower.
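
One common layout that respects these constraints is one rank per NUMA region, i.e. 8 ranks of 16 cores each on a node (a sketch; `./myapp` is a placeholder for your executable):

```console
$ srun --nodes=1 --ntasks-per-node=8 --cpus-per-task=16 --hint=nomultithread ./myapp
```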
@@ -491,7 +491,7 @@
In the above examples all threads on each rank can run on any of that rank's cores -- we are effectively allowing the OS to schedule the threads on the available set of cores as it sees fit.
This often gives the best performance; however, it is sometimes beneficial to bind threads to explicit cores.

The OpenMP threading runtime provides additional options for controlling the pinning of threads to the cores assigned to each MPI rank.

Use the `--omp` flag with `affinity.mpi` to get more detailed information about OpenMP thread affinity.
For example, four MPI ranks on one node with four cores and four OpenMP threads:
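
A sketch of such a run, using the standard OpenMP `OMP_NUM_THREADS`, `OMP_PROC_BIND`, and `OMP_PLACES` variables to pin each thread to its own core (output omitted, as it varies by node):

```console
$ OMP_NUM_THREADS=4 OMP_PROC_BIND=close OMP_PLACES=cores srun -N1 -n4 -c4 ./affinity.mpi --omp
```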