[](){#ref-gb2025}
# Gordon Bell and HPL runs 2025

For Gordon Bell and HPL runs in March-April 2025, CSCS has expanded Santis to 1333 nodes (12 cabinets).

For the runs, CSCS has applied some updates and changes that aim to improve performance and scaling, particularly for NCCL.
If you are already familiar with running on Daint, you might have to make some small changes to your current job scripts and parameters, which will be documented here.
Expand All @@ -27,6 +27,18 @@ Host santis

The `normal` partition is used with no reservation, which means that jobs can be submitted without the `--partition` and `--reservation` flags.
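
For example, both batch and interactive jobs can be launched without those flags. This is a minimal sketch, where `job.sh` and `./myapp` are placeholder names:

```console
# no --partition or --reservation flags are needed on the normal partition
$ sbatch job.sh
$ srun -N1 -n4 -c71 ./myapp   # the choice of -c is explained in the SLURM section below
```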

Timeline:

1. Friday 4th April:
    * HPE finishes the HPL runs at 10:30am.
    * CSCS performs testing on the reconfigured system for ~1 hour on the `GB_TESTING_2` reservation.
    * The reservation is then removed, and all GB teams have access to test and tune their applications.
2. Monday 7th April:
    * At 4pm the runs will start for the first team.

!!! note
    There will be no special reservation during the open testing and tuning between Friday and Monday.

### Storage

Your data sets from Daint are available on Santis

## Low Noise Mode

!!! note
    Low noise mode has been disabled, so the previous requirement that you set `OMP_PLACES` and `OMP_PROC_BIND` no longer applies.

!!! warning "Unable to allocate resources: Requested node configuration is not available"
    If you try to use all 72 cores on each socket, SLURM will give a hard error, because only 71 are available:

    ```console
    # try to run 4 ranks per node, with 72 cores each
    $ srun -n4 -N1 -c72 ./build/affinity.mpi
    srun: error: Unable to allocate resources: Requested node configuration is not available
    ```

One consequence of these changes is that thread affinity and OpenMP settings that worked on Daint might cause a large slowdown in the new configuration.

### SLURM

Explicitly set the number of cores per task using the `--cpus-per-task/-c` flag, for example:
```
#SBATCH --cpus-per-task=71
```
or
```
srun -N1 -n4 -c71 ...
```
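
Putting these flags together, a complete job script might look like the following sketch; the node count, time limit, and executable name (`./myapp`) are placeholders rather than recommended values:

```bash
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4   # four ranks per node, as in the srun examples above
#SBATCH --cpus-per-task=71    # at most 71 cores per socket can be requested
#SBATCH --time=01:00:00

srun ./myapp
```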

**Do not** use the `--cpu-bind` flag to control affinity:

* it can cause a large slowdown, particularly with `--cpu-bind=socket`. We are investigating how to fix this.

If you see a significant slowdown and want to report it, please provide the output generated with the `--cpu-bind=verbose` flag.
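
For example, adding the flag to an otherwise unchanged `srun` line (again with a placeholder executable) makes SLURM report the CPU binding it applied for each task, which can be attached to the report:

```console
# prints the CPU masks that SLURM applied for each task, without changing the binding
$ srun -N1 -n4 -c71 --cpu-bind=verbose ./myapp
```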

## NCCL

!!! todo