Merged
15 changes: 10 additions & 5 deletions .github/actions/spelling/allow.txt
@@ -10,12 +10,9 @@ CHARMM
CHF
COSMA
CPE
cpe
CPMD
CSCS
CWP
CXI
capstor
Ceph
Containerfile
DNS
@@ -46,7 +43,6 @@ HPCP
HPE
HSN
Hartree
iopsstor
Jax
Jira
Keycloak
@@ -101,23 +97,32 @@ acl
biomolecular
bristen
bytecode
capstor
clariden
concretise
concretizer
containerised
cpe
cscs
customised
diagonalisation
eiger
filesystems
groundstate
ijulia
inodes
iopsstor
lexer
libfabric
miniconda
mpi
multitenancy
nsight
podman
prioritised
prgenv
prioritised
proactively
pytorch
quickstart
santis
sbatch
1 change: 1 addition & 0 deletions .github/workflows/spelling.yaml
@@ -33,6 +33,7 @@ jobs:
suppress_push_for_open_pull_request: ${{ github.actor != 'dependabot[bot]' && 1 }}
checkout: true
check_file_names: 0
only_check_changed_files: 1
post_comment: 1
use_magic_file: 1
warnings: bad-regex,binary-file,deprecated-feature,large-file,limited-references,no-newline-at-eof,noisy-file,non-alpha-in-dictionary,token-is-substring,unexpected-line-ending,whitespace-in-dictionary,minified-file,unsupported-configuration,no-files-to-check
2 changes: 1 addition & 1 deletion docs/access/jupyterlab.md
@@ -5,7 +5,7 @@

The JupyterHub service enables the interactive execution of JupyterLab on the compute nodes of [Daint][ref-cluster-daint], [Clariden][ref-cluster-clariden], [Santis][ref-cluster-santis] and [Eiger][ref-cluster-eiger].

The service is accessed at [jupyter-daint.cscs.ch](https://jupyter-daint.cscs.ch/), [jupyter-clariden.cscs.ch](https://jupyter-clariden.cscs.ch/), [jupyter-santis.cscs.ch](https://jupyter-santis.cscs.ch/) and [jupyter-eiger.cscs.ch](https://jupyter-eiger.cscs.ch), respectively. As the notebook servers are executed on compute nodes, you must have a project with compute resources available on the respective cluster.

Check warning on line 8 in docs/access/jupyterlab.md (GitHub Actions / Check Spelling): `jupyter` is not a recognized word. (unrecognized-spelling)

Once logged in, you will be redirected to the JupyterHub Spawner Options form, where typical job configuration options can be selected. These options might include the type and number of compute nodes, the wall time limit, and your project account.

@@ -51,7 +51,7 @@
git+https://github.com/eth-cscs/firecrestspawner.git
```

The package [nvdashboard](https://github.com/rapidsai/jupyterlab-nvdashboard) is also installed here, which allows monitoring system metrics at runtime.

Check failure on line 54 in docs/access/jupyterlab.md (GitHub Actions / Check Spelling): `nvdashboard` is not a recognized word. (unrecognized-spelling)

A corresponding TOML file can look like

@@ -148,7 +148,7 @@
!!! important "Pass a [`julia`][ref-uenv-julia] uenv and the view `jupyter`."

When Julia is first used within Jupyter, IJulia and one or more Julia kernels need to be installed.
Type the following command in a shell within JupyterHub to install IJulia, the default Julia kernel and, on systems whith Nvidia GPUs, a Julia kernel running under Nvidia Nsight Systems:
Type the following command in a shell within JupyterHub to install IJulia, the default Julia kernel and, on systems with Nvidia GPUs, a Julia kernel running under Nvidia Nsight Systems:
```bash
install_ijulia
```
@@ -167,7 +167,7 @@

## Parallel computing

### MPI in the notebook via IPyParallel and MPI4Py

Check failure on line 170 in docs/access/jupyterlab.md (GitHub Actions / Check Spelling): `IPy` is not a recognized word. (unrecognized-spelling)

MPI for Python provides bindings of the Message Passing Interface (MPI) standard for Python, allowing any Python program to exploit multiple processors.

@@ -199,13 +199,13 @@

While it is generally recommended to submit long-running machine learning training and inference jobs via `sbatch`, certain use cases can benefit from an interactive Jupyter environment.

A popular approach to run multi-GPU ML workloads is with [`accelerate`](https://github.com/huggingface/accelerate) and [`torchrun`](https://docs.pytorch.org/docs/stable/elastic/run.html) as demonstrated in the [tutorials][ref-guides-mlp-tutorials]. In particular, the `accelerate launch` script in the [LLM fine-tuning tutorial][ref-mlp-llm-finetuning-tutorial] can be directly carried over to a Jupyter cell with a `%%bash` header (to run its contents interpreted by bash). For `torchrun`, one can adapt the command from the multi-node [nanotron tutorial][ref-mlp-llm-nanotron-tutorial] to run on a single GH200 node using the following line in a Jupyter cell

Check failure on line 202 in docs/access/jupyterlab.md (GitHub Actions / Check Spelling): `nanotron` is not a recognized word. (unrecognized-spelling)

```bash
!python -m torch.distributed.run --standalone --nproc_per_node=4 run_train.py ...
```

!!! warning "torchrun with virtual environments"

Check failure on line 208 in docs/access/jupyterlab.md (GitHub Actions / Check Spelling): `torchrun` is not a recognized word. (unrecognized-spelling)
When using a virtual environment on top of a base image with PyTorch, always replace `torchrun` with `python -m torch.distributed.run` so that the correct Python environment is picked up. Otherwise, the system Python environment is used and the virtual environment's packages are not available. When no virtual environment is used, such as with a self-contained PyTorch container, `torchrun` is equivalent to `python -m torch.distributed.run`.
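For illustration, the substitution described above could look as follows in a shell cell; the venv path and the training script name are placeholders:

```shell
# Hypothetical sketch: virtual environment layered on a PyTorch base image.
source ./venv/bin/activate
# Do NOT call `torchrun` here; it would resolve to the base environment's Python
# and miss the venv's packages. Use the module form instead:
python -m torch.distributed.run --standalone --nproc_per_node=4 train.py
```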

!!! note "Notebook structure"