Merged
15 changes: 10 additions & 5 deletions .github/actions/spelling/allow.txt
@@ -10,12 +10,9 @@ CHARMM
CHF
COSMA
CPE
cpe
CPMD
CSCS
CWP
CXI
capstor
Ceph
Containerfile
DNS
@@ -46,7 +43,6 @@ HPCP
HPE
HSN
Hartree
iopsstor
Jax
Jira
Keycloak
@@ -101,23 +97,32 @@ acl
biomolecular
bristen
bytecode
capstor
clariden
concretise
concretizer
containerised
cpe
cscs
customised
diagonalisation
eiger
filesystems
groundstate
ijulia
inodes
iopsstor
lexer
libfabric
miniconda
mpi
multitenancy
nsight
podman
prioritised
prgenv
prioritised
proactively
pytorch
quickstart
santis
sbatch
1 change: 1 addition & 0 deletions .github/workflows/spelling.yaml
@@ -33,6 +33,7 @@ jobs:
suppress_push_for_open_pull_request: ${{ github.actor != 'dependabot[bot]' && 1 }}
checkout: true
check_file_names: 0
only_check_changed_files: 1
post_comment: 1
use_magic_file: 1
warnings: bad-regex,binary-file,deprecated-feature,large-file,limited-references,no-newline-at-eof,noisy-file,non-alpha-in-dictionary,token-is-substring,unexpected-line-ending,whitespace-in-dictionary,minified-file,unsupported-configuration,no-files-to-check
2 changes: 1 addition & 1 deletion docs/access/jupyterlab.md
@@ -5,7 +5,7 @@

The JupyterHub service enables the interactive execution of JupyterLab on the compute nodes of [Daint][ref-cluster-daint], [Clariden][ref-cluster-clariden], [Santis][ref-cluster-santis] and [Eiger][ref-cluster-eiger].

The service is accessed at [jupyter-daint.cscs.ch](https://jupyter-daint.cscs.ch/), [jupyter-clariden.cscs.ch](https://jupyter-clariden.cscs.ch/), [jupyter-santis.cscs.ch](https://jupyter-santis.cscs.ch/) and [jupyter-eiger.cscs.ch](https://jupyter-eiger.cscs.ch), respectively. As the notebook servers are executed on compute nodes, you must have a project with compute resources available on the respective cluster.

Check warning on line 8 in docs/access/jupyterlab.md (GitHub Actions / Check Spelling): `jupyter` is not a recognized word. (unrecognized-spelling)

Once logged in, you will be redirected to the JupyterHub Spawner Options form, where typical job configuration options can be selected. These options might include the type and number of compute nodes, the wall time limit, and your project account.

@@ -51,7 +51,7 @@
git+https://github.com/eth-cscs/firecrestspawner.git
```

The package [nvdashboard](https://github.com/rapidsai/jupyterlab-nvdashboard) is also installed here, which allows monitoring system metrics at runtime.

Check failure on line 54 in docs/access/jupyterlab.md (GitHub Actions / Check Spelling): `nvdashboard` is not a recognized word. (unrecognized-spelling)

A corresponding TOML file can look like

@@ -148,7 +148,7 @@
!!! important "Pass a [`julia`][ref-uenv-julia] uenv and the view `jupyter`."

When Julia is first used within Jupyter, IJulia and one or more Julia kernels need to be installed.
Type the following command in a shell within JupyterHub to install IJulia, the default Julia kernel and, on systems whith Nvidia GPUs, a Julia kernel running under Nvidia Nsight Systems:
Type the following command in a shell within JupyterHub to install IJulia, the default Julia kernel and, on systems with Nvidia GPUs, a Julia kernel running under Nvidia Nsight Systems:
```bash
install_ijulia
```
@@ -167,7 +167,7 @@

## Parallel computing

### MPI in the notebook via IPyParallel and MPI4Py

Check failure on line 170 in docs/access/jupyterlab.md (GitHub Actions / Check Spelling): `IPy` is not a recognized word. (unrecognized-spelling)

MPI for Python provides bindings of the Message Passing Interface (MPI) standard for Python, allowing any Python program to exploit multiple processors.

@@ -199,13 +199,13 @@

While it is generally recommended to submit long-running machine learning training and inference jobs via `sbatch`, certain use cases can benefit from an interactive Jupyter environment.

A popular approach to run multi-GPU ML workloads is with [`accelerate`](https://github.com/huggingface/accelerate) and [`torchrun`](https://docs.pytorch.org/docs/stable/elastic/run.html) as demonstrated in the [tutorials][ref-guides-mlp-tutorials]. In particular, the `accelerate launch` script in the [LLM fine-tuning tutorial][ref-mlp-llm-finetuning-tutorial] can be directly carried over to a Jupyter cell with a `%%bash` header (to run its contents interpreted by bash). For `torchrun`, one can adapt the command from the multi-node [nanotron tutorial][ref-mlp-llm-nanotron-tutorial] to run on a single GH200 node using the following line in a Jupyter cell

Check failure on line 202 in docs/access/jupyterlab.md (GitHub Actions / Check Spelling): `nanotron` is not a recognized word. (unrecognized-spelling)

```bash
!python -m torch.distributed.run --standalone --nproc_per_node=4 run_train.py ...
```

!!! warning "torchrun with virtual environments"

Check failure on line 208 in docs/access/jupyterlab.md (GitHub Actions / Check Spelling): `torchrun` is not a recognized word. (unrecognized-spelling)
When using a virtual environment on top of a base image with PyTorch, always replace `torchrun` with `python -m torch.distributed.run` so that the correct Python environment is picked up. Otherwise, the system Python environment is used and the virtual environment's packages are not available. When no virtual environment is used, such as with a self-contained PyTorch container, `torchrun` is equivalent to `python -m torch.distributed.run`.
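For illustration, the substitution described above could look as follows in a shell cell; the venv path and the training script name are placeholders:

```shell
# Hypothetical sketch: virtual environment layered on a PyTorch base image.
source ./venv/bin/activate
# Do NOT call `torchrun` here; it would resolve to the base environment's Python
# and miss the venv's packages. Use the module form instead:
python -m torch.distributed.run --standalone --nproc_per_node=4 train.py
```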

!!! note "Notebook structure"