44 changes: 34 additions & 10 deletions docs/alps/hardware.md
@@ -38,22 +38,44 @@ This approach to cooling provides greater efficiency for the rack-level cooling,

Alps was installed in phases, starting with the installation of 1,024 AMD Rome dual-socket CPU nodes in 2020, through to the main installation of 2,688 Grace-Hopper nodes in 2024.

There are currently four node types in Alps, with another becoming available in 2025:
There are currently five node types in Alps:

| type | blades | nodes | CPU sockets | GPU devices |
| ---- | ------:| -----:| -----------:| -----------:|
| NVIDIA GH200 | 1344 | 2688 | 10,752 | 10,752 |
| AMD Rome | 256 | 1024 | 2,048 | -- |
| NVIDIA A100 | 72 | 144 | 144 | 576 |
| AMD MI250x | 12 | 24 | 24 | 96 |
| AMD MI300A | 64 | 128 | 512 | 512 |
| type | abbreviation | blades | nodes | CPU sockets | GPU devices |
| ---- | ------- | ------:| -----:| -----------:| -----------:|
| NVIDIA GH200 | gh200 | 1344 | 2688 | 10,752 | 10,752 |
| AMD Rome | zen2 | 256 | 1024 | 2,048 | -- |
| NVIDIA A100 | a100 | 72 | 144 | 144 | 576 |
| AMD MI250x | mi200 | 12 | 24 | 24 | 96 |
| AMD MI300A | mi300 | 64 | 128 | 512 | 512 |

[](){#ref-alps-gh200-node}
### NVIDIA GH200 GPU Nodes

!!! todo
!!! under-construction
The description of the GH200 nodes is a work in progress.
We will add more detailed information soon.
Please [get in touch](https://github.com/eth-cscs/cscs-docs/issues) if there is information that you want to see here.

There are 24 cabinets, in 4 rows with 6 cabinets per row, and each cabinet contains 112 nodes (for a total of 448 GH200 modules per cabinet):
* 8 chassis per cabinet
* 7 blades per chassis
* 2 nodes per blade
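
As a quick sanity check, the cabinet arithmetic above can be reproduced in a few lines of Python. This is only an illustrative sketch: the variable names are made up for this example, and the figure of four GH200 modules per node is taken from the node description and table in this section.

```python
# Topology figures quoted in this section; the names are illustrative only.
CABINETS = 24            # 4 rows x 6 cabinets per row
CHASSIS_PER_CABINET = 8
BLADES_PER_CHASSIS = 7   # a chassis can hold 8 blades, but these are underpopulated
NODES_PER_BLADE = 2
GH200_PER_NODE = 4       # four Grace-Hopper modules per node

nodes_per_cabinet = CHASSIS_PER_CABINET * BLADES_PER_CHASSIS * NODES_PER_BLADE
gh200_per_cabinet = nodes_per_cabinet * GH200_PER_NODE
total_blades = CABINETS * CHASSIS_PER_CABINET * BLADES_PER_CHASSIS
total_nodes = CABINETS * nodes_per_cabinet
total_gh200 = total_nodes * GH200_PER_NODE

print(f"nodes per cabinet: {nodes_per_cabinet}")   # 112
print(f"GH200 per cabinet: {gh200_per_cabinet}")   # 448
print(f"blades in total:   {total_blades}")        # 1344
print(f"nodes in total:    {total_nodes}")         # 2688
print(f"GH200 in total:    {total_gh200}")         # 10752
```

The totals agree with the GH200 row of the node type table (1344 blades, 2688 nodes, 10,752 GPU devices).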

!!! info "Why 7 blades per chassis?"
A chassis can contain up to 8 blades; however, Alps' gh200 chassis are underpopulated so that we can increase the amount of power delivered to each GPU.

Blanca Peak
Each node contains four Grace-Hopper modules and four corresponding network interface cards (NICs), as illustrated below:

![](../images/alps/gh200-schematic.svg)

??? info "Node xname"
There are two boards per blade with one node per board.
This differs from the `zen2` CPU-only nodes (used, for example, in Eiger), which have two nodes per board for a total of four nodes per blade.
As such, there are no `n1` nodes in the xname list, e.g.:
```
x1100c0s6b0n0
x1100c0s6b1n0
```
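
For scripts that need to reason about node locations, a minimal Python sketch of splitting an xname into its numeric fields is shown below. It assumes the fields follow the common HPE Cray EX reading of `x` (cabinet), `c` (chassis), `s` (slot, i.e. blade), `b` (board) and `n` (node); the `XName`, `parse_xname` and `gh200_blade_xnames` names are hypothetical helpers written for this example, not part of any CSCS tool.

```python
import re
from typing import NamedTuple


class XName(NamedTuple):
    """Numeric fields of a node xname (field meanings are an assumption)."""
    cabinet: int
    chassis: int
    slot: int
    board: int
    node: int


_XNAME_RE = re.compile(r"x(\d+)c(\d+)s(\d+)b(\d+)n(\d+)")


def parse_xname(xname: str) -> XName:
    """Split a node xname such as 'x1100c0s6b0n0' into its fields."""
    match = _XNAME_RE.fullmatch(xname)
    if match is None:
        raise ValueError(f"not a node xname: {xname!r}")
    return XName(*(int(group) for group in match.groups()))


def gh200_blade_xnames(cabinet: int, chassis: int, slot: int) -> list[str]:
    """List the node xnames on one gh200 blade: two boards, one node (n0) each."""
    return [f"x{cabinet}c{chassis}s{slot}b{board}n0" for board in range(2)]


print(parse_xname("x1100c0s6b0n0"))
print(gh200_blade_xnames(1100, 0, 6))
```

The second call reproduces the two xnames listed above; because each gh200 board carries a single node, no `n1` entry is generated.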

[](){#ref-alps-zen2-node}
### AMD Rome CPU Nodes
@@ -79,6 +101,8 @@ Bard Peak
[](){#ref-alps-mi300-node}
### AMD MI300A GPU Nodes

![](../images/alps/mi300-schematic.svg)

!!! todo

Parry Peak
60 changes: 60 additions & 0 deletions docs/contributing/index.md
@@ -229,6 +229,66 @@ They stand out better from the main text, and can be collapsed by default if nee

If an admonition is collapsed by default, it should have a title.

We provide some custom admonitions.

#### Change

Use this to record a change. It was originally designed for recording updates to clusters.

=== "Rendered"
!!! change "2025-04-17"
* SLURM was upgraded to version 25.1.
* uenv was upgraded to v0.8

Old changes can be folded:

??? change "2025-02-04"
* The new Scratch cleanup policy was implemented
* NVIDIA driver was updated

=== "Markdown"
```
!!! change "2025-04-17"
* SLURM was upgraded to version 25.1.
* uenv was upgraded to v0.8
```

Old changes can be folded:

```
??? change "2025-02-04"
* The new Scratch cleanup policy was implemented
* NVIDIA driver was updated
```

#### Under construction

For marking incomplete sections.

=== "Rendered"
!!! under-construction
This is not finished yet!

=== "Markdown"
```
!!! under-construction
This is not finished yet!
```

#### Todo

Use this as a placeholder for documentation that still needs to be written.

=== "Rendered"
!!! todo
Add some common error messages and how to fix them.

=== "Markdown"
```
!!! todo
Add some common error messages and how to fix them.
```

### Code blocks

Use [code blocks](https://squidfunk.github.io/mkdocs-material/reference/code-blocks/) when you want to display monospace text in a programming language, terminal output, configuration files etc.