[](){#ref-gb2025}
# Gordon Bell and HPL runs 2025

For Gordon Bell and HPL runs in March-April 2025, CSCS has expanded Santis to 1333 nodes (12 cabinets).

For the runs, CSCS has applied some updates and changes that aim to improve performance and scaling, particularly for NCCL.
If you are already familiar with running on Daint, you might have to make some small changes to your current job scripts and parameters, which will be documented here.
Expand All @@ -27,6 +27,18 @@ Host santis

The `normal` partition is used with no reservation, which means that jobs can be submitted without the `--partition` and `--reservation` flags.
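
For example, both batch and interactive jobs can be launched without those flags. This is a minimal sketch, where `job.sh` and `./myapp` are placeholder names:

```console
# no --partition or --reservation flags are needed on the normal partition
$ sbatch job.sh
$ srun -N1 -n4 -c71 ./myapp   # the choice of -c is explained in the SLURM section below
```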

Timeline:

1. Friday 4th April:
    * HPE finishes the HPL runs at 10:30am.
    * CSCS performs testing on the reconfigured system for ~1 hour on the `GB_TESTING_2` reservation.
    * The reservation is then removed, and all GB teams have access to test and tune their applications.
2. Monday 7th April:
    * At 4pm the runs will start for the first team.

!!! note
    There will be no special reservation during the open testing and tuning between Friday and Monday.

### Storage

Your data sets from Daint are available on Santis

## Low Noise Mode

!!! note
    Low noise mode has been disabled, so the previous requirement that you set `OMP_PLACES` and `OMP_PROC_BIND` no longer applies.

!!! warning "Unable to allocate resources: Requested node configuration is not available"
    If you try to use all 72 cores on each socket, SLURM will give a hard error, because only 71 are available:

    ```console
    # try to run 4 ranks per node, with 72 cores each
    $ srun -n4 -N1 -c72 ./build/affinity.mpi
    srun: error: Unable to allocate resources: Requested node configuration is not available
    ```

One consequence of these changes is that thread affinity and OpenMP settings that worked on Daint might cause a large slowdown in the new configuration.

### SLURM

Explicitly set the number of cores per task using the `--cpus-per-task/-c` flag, for example:
```
#SBATCH --cpus-per-task=71
```
or
```
srun -N1 -n4 -c71 ...
```
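
Putting these flags together, a complete job script might look like the following sketch; the node count, time limit, and executable name (`./myapp`) are placeholders rather than recommended values:

```bash
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4   # four ranks per node, as in the srun examples above
#SBATCH --cpus-per-task=71    # at most 71 cores per socket can be requested
#SBATCH --time=01:00:00

srun ./myapp
```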

**Do not** use the `--cpu-bind` flag to control affinity:

* it can cause a large slowdown, particularly with `--cpu-bind=socket`. We are investigating how to fix this.

If you see a significant slowdown and want to report it, please provide the output generated with the `--cpu-bind=verbose` flag.
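
For example, adding the flag to an otherwise unchanged `srun` line (again with a placeholder executable) makes SLURM report the CPU binding it applied for each task, which can be attached to the report:

```console
# prints the CPU masks that SLURM applied for each task, without changing the binding
$ srun -N1 -n4 -c71 --cpu-bind=verbose ./myapp
```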

## NCCL

!!! todo