# Using HIP on Setonix

## Access to Setonix

First, you need a username and password to access Setonix. Your **username** and **password** will be given to you before the workshop begins. If you are using your regular Pawsey account, you can reset your password [here](https://support.pawsey.org.au/password-reset/).

Access to Setonix is via Secure Shell (SSH). On Linux, macOS, and Windows 10 and higher, an SSH client is available from the command line or terminal application. Otherwise you need to use a client program like [PuTTY](https://www.putty.org/) or [MobaXterm](https://mobaxterm.mobatek.net/download-home-edition.html).

### Access with SSH on the command line

On the command line use the command **ssh** to access Setonix.

```bash
ssh -Y <username>@setonix.pawsey.org.au
```

#### Passwordless login with SSH

To avoid typing a password on each login, you can generate an SSH key pair (a private key and a matching public key) on your computer by running the following on the command line.

```bash
ssh-keygen -t rsa
```

Then copy the public key (the file whose name ends in `.pub`) to your account on Setonix and append it to the authorized keys in `.ssh`. On your machine run this command:

```bash
scp <filename>.pub <username>@setonix.pawsey.org.au:
```

Then log in to Setonix and run these commands:

```bash
mkdir -p ${HOME}/.ssh
cat <filename>.pub >> ${HOME}/.ssh/authorized_keys
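# The directory needs the execute bit to be usable; keep both private to your user
chmod 700 ${HOME}/.ssh
chmod 600 ${HOME}/.ssh/authorized_keys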
```

Finally, if you are using macOS or Linux you can add these lines to `${HOME}/.ssh/config` on your computer:

```text
Host setonix
    Hostname setonix.pawsey.org.au
    IdentityFile <private_key_file>
    User <username>
    ForwardX11 yes
    ForwardAgent yes
    ServerAliveInterval 300
    ServerAliveCountMax 2
    TCPKeepAlive no
```

Then you can run 

```bash
ssh setonix
```

without a password.

### Access from Windows with the MobaXterm client

If you have an OS that is older than Windows 10 and need a client in a hurry, just download **MobaXterm Home (Portable Edition)** from [this location](https://mobaxterm.mobatek.net/download-home-edition.html). Extract the Zip file and run the application. You might need to accept a firewall notification.

Now go to **Settings -> SSH** and uncheck **"Enable graphical SSH-browser"** in the SSH-browser settings pane. Also enable **"SSH keepalive"** to keep SSH connections active.

<figure style="margin-bottom: 3em; margin-top: 2em; margin-left:auto; margin-right:auto; width:100%">
    <img style="vertical-align:middle" src="../images/MobaXTerm_Settings.svg"> <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">Figure: MobaXTerm settings.</figcaption>
</figure>

Then close the settings and start a local terminal.



## Hardware environment

<figure style="margin: 1em; margin-left:auto; margin-right:auto; width:100%;">
    <img src="../images/MI250x.png">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">AMD Instinct<span>&trade;</span> MI250X compute architecture. Image credit: <a href="https://hc34.hotchips.org/">AMD Instinct<span>&trade;</span> MI200 Series Accelerator and Node Architectures | Hot Chips 34</a></figcaption>
</figure>


### Setonix CPU specifications

| Computer | CPU | Nominal clock frequency (GHz) | Cores | Hardware threads | L1 Cache (KB) | L2 Cache (KB) | L3 cache (MB) | FP SIMD width (bits) | Tflops (FP32 calculated) |
|:----:|:----:|-----:| -----: | -----: | :----: | :----: | :----: | :----: | :----: |
| Setonix |AMD EPYC 7A53 | 2.0 | 64 | 128 | 64x32 | 64x512 | 8x32 | 256 | ~2 |

### Setonix GPU specifications

| Card | Boost clock (GHz) | Compute units | FP32 processing elements | FP64 processing elements (equivalent compute capacity) | L1 cache (KB) | L2 cache (KB) | Device memory (GB) | Peak Tflops (FP32) | Peak Tflops (FP64) |
|:----:|:-----| :----- | :----- | :---- | :---- | :---- | :---- | :---- | :---- |
| AMD Instinct MI250X | 1.7 | 220 | 14080 | 14080 | 220x16 | 16000 | 128 | 47.9 | 47.9 |


## Job queues

On Setonix the following queues are available for general use:

| Queue | Max time limit | Processing elements (CPU) | Sockets | Cores per socket | Processing elements per CPU core | Host memory (GB) | Number of GPUs | Memory per GPU (GB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| work | 24 hours | 256 | 2 | 64 | 2 | 256 | 0 | 0 |
| long | 96 hours | 256 | 2 | 64 | 2 | 256 | 0 | 0 |
| debug | 1 hour | 256 | 2 | 64 | 2 | 256 | 0 | 0 |
| himem | 24 hours | 256 | 2 | 64 | 2 | 1000 | 0 | 0 |
| gpu | 24 hours | 128 | 1 | 64 | 2 | 128 | 4 | 128 |

## Interactive jobs on GPU nodes

```bash
salloc --account ${PAWSEY_PROJECT} --ntasks 1 --mem 4GB --cpus-per-task 1 --time 1:00:00 --gpus-per-task 2 --partition gpu
```

## Building software for Setonix

### Software modules

There are three main programming environments available on Setonix. Each provides C/C++ and Fortran compilers that build software with knowledge of the MPI libraries available on Setonix. The **PrgEnv-gnu** programming environment uses the GNU compilers, **PrgEnv-aocc** uses the AMD AOCC optimising compilers to try to get the best performance from the AMD CPUs on Setonix, and **PrgEnv-cray** uses the compilers from Cray. Use these commands to find which module to load.

| Programming environment | command to use |
| :--- | :--- |
| AMD | ```module avail PrgEnv-aocc``` |
| Cray | ```module avail PrgEnv-cray``` |
| GNU | ```module avail PrgEnv-gnu``` |

When compiling HIP sources you have the choice of either the ROCm **hipcc** compiler wrapper or the Cray compiler wrapper **CC** from **PrgEnv-cray**. If you use the Cray compiler wrapper you need to swap to the module **PrgEnv-cray**, as the GNU programming environment (**PrgEnv-gnu**) is loaded by default.

```bash
module swap PrgEnv-gnu PrgEnv-cray
```

Then the following compiler wrappers are available for use to compile source files:

| Command | Explanation |
| :--- | :--- |
| cc | C compiler |
| CC | C++ compiler |
| ftn | Fortran compiler |

In order to use the GPU-aware MPI library you also need to load the **craype-accel-amd-gfx90a** module, which works in all three programming environments. To see which version to load run this command.

```bash
module avail craype-accel-amd-gfx90a
```

Load the module **craype-accel-amd-gfx90a** then set the environment variable

```bash
export MPICH_GPU_SUPPORT_ENABLED=1
```

Finally, to have ROCm software (such as hipcc and rocgdb) and libraries available you need to have the **rocm** module loaded. To see which one to load run this command:

```bash
module avail rocm
```

The **rocm** module is independent of the programming environment module loaded. 
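
The steps in this section can be gathered into one short setup snippet. This is a minimal sketch for the **hipcc** route, assuming that loading a module without a version number picks a suitable site default; use the versions reported by `module avail` if you need a specific one. The guard around the `module` calls is only there so the snippet can be read and run on a machine that has no module system.

```bash
# Sketch: environment for building HIP + MPI code on Setonix.
# Assumption: the default module versions are the ones you want.
if type module >/dev/null 2>&1; then
    module load craype-accel-amd-gfx90a   # needed for the GPU-aware MPI library
    module load rocm                      # hipcc, rocgdb and the ROCm libraries
    module list                           # confirm what is now loaded
else
    echo "module command not found: run this on Setonix" >&2
fi

# Enable GPU support in the MPI library
export MPICH_GPU_SUPPORT_ENABLED=1
```

If you plan to compile with the Cray **CC** wrapper instead, run `module swap PrgEnv-gnu PrgEnv-cray` before loading the other modules.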

### Compiling software with HIP and MPI support

According to this [documentation](https://docs.amd.com/bundle/HIP-Programming-Guide-v5.0/page/Transitioning_from_CUDA_to_HIP.html) the AMD compiler wrapper **hipcc** can be used for compiling HIP source files and is the suggested linker for program objects.

#### Compiling and linking with the **hipcc** compiler wrapper

You can use these compiler flags to bring in the MPI headers and make sure **hipcc** compiles kernels for the MI250X GPUs on Setonix.

| Function | flags |
| :--- | :--- |
| Compile | ```-I${MPICH_DIR}/include --offload-arch=gfx90a ``` |
| Link | ```-L${MPICH_DIR}/lib -lmpi ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}``` |
| Debug (compile and link) | ```-ggdb``` |
| OpenMP (compile and link)| ```-fopenmp``` |
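
As an illustration of how the compile and link flags in the table fit together, the sketch below assembles them into two **hipcc** command lines and prints them as a dry run rather than executing them. The file names `my_app.cpp`, `my_app.o` and `my_app.exe` are hypothetical, and the variables `HIP_CXXFLAGS`, `HIP_LDFLAGS`, `COMPILE_CMD` and `LINK_CMD` are names chosen for this example only. `MPICH_DIR` and the `PE_MPICH_GTL_*` variables are expected to be set by the modules on Setonix and expand to empty strings elsewhere.

```bash
# Sketch: dry run of a hipcc compile and link with the flags from the table.
# my_app.cpp / my_app.exe are hypothetical file names.
HIP_CXXFLAGS="-I${MPICH_DIR:-}/include --offload-arch=gfx90a"
HIP_LDFLAGS="-L${MPICH_DIR:-}/lib -lmpi ${PE_MPICH_GTL_DIR_amd_gfx90a:-} ${PE_MPICH_GTL_LIBS_amd_gfx90a:-}"

# Compile the source to an object file, then link the object into an executable
COMPILE_CMD="hipcc ${HIP_CXXFLAGS} -c my_app.cpp -o my_app.o"
LINK_CMD="hipcc my_app.o -o my_app.exe ${HIP_LDFLAGS}"

# Print the commands for inspection instead of running them
echo "${COMPILE_CMD}"
echo "${LINK_CMD}"
```

Once the printed commands look right on Setonix, they can be pasted into a terminal or moved into a Makefile. Add `-ggdb` or `-fopenmp` to both strings if you need debugging or OpenMP support.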

If you want **hipcc** to behave like Cray **CC**, make sure the **PrgEnv-cray** and **craype-accel-amd-gfx90a** modules are also loaded. Then you can add the output of this command,

```bash
$(CC --cray-print-opts=cflags)
```

to the hipcc compile flags, and the output of this command,

```bash
$(CC --cray-print-opts=libs)
```

to the hipcc linker flags.

#### Compiling and linking with the Cray **CC** compiler wrapper 

If you are using the Cray compiler wrapper **CC** you can add these flags to compile and link HIP code for the MI250X GPUs on Setonix. You need to have the **rocm** module loaded.

| Function | flags |
| :--- | :--- |
| Compile | ```-D__HIP_ROCclr__ -D__HIP_ARCH_GFX90A__=1 --offload-arch=gfx90a -x hip``` |
| Link |  |
| Debug (compile and link) | ```-g``` |
| OpenMP (compile and link)| ```-fopenmp``` |

#### Mixing hipcc and Cray compilation

This [documentation](https://docs.amd.com/bundle/HIP-Programming-Guide-v5.0/page/Transitioning_from_CUDA_to_HIP.html) notes that it is important for all code to link against the same C++ standard libraries. The command ```hipconfig --cxx``` generates extra compile flags that might be useful for including in the build process with the Cray wrapper.

## Batch jobs on GPU nodes

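The following Slurm options are useful when launching tasks on GPU nodes:

**--distribution=cyclic:cyclic:cyclic** distributes tasks in round-robin fashion across nodes, sockets, and cores.

**--gpu-bind=closest** binds each task to the GPU(s) closest to it.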
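
To show where these options might go, the sketch below writes a small batch script to a file named `hip_job.sh`. The resource values are copied from the `salloc` example in the interactive jobs section; the script name and the executable `my_app.exe` are hypothetical, and placing both options on the `srun` line is one reasonable choice rather than a site requirement.

```bash
# Sketch: write a minimal batch script for a GPU job.
# hip_job.sh and my_app.exe are hypothetical names.
cat > hip_job.sh << 'EOF'
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-task=2
#SBATCH --mem=4GB
#SBATCH --time=1:00:00

# Enable GPU support in the MPI library
export MPICH_GPU_SUPPORT_ENABLED=1

# Distribute tasks cyclically and bind each task to its closest GPU(s)
srun --distribution=cyclic:cyclic:cyclic --gpu-bind=closest ./my_app.exe
EOF

# Submit on Setonix with:
#   sbatch --account ${PAWSEY_PROJECT} hip_job.sh
```

The account is passed on the `sbatch` command line because environment variables are not expanded inside `#SBATCH` directives.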

## Exercise: compile your first HIP application with MPI 

The files [hello_devices_mpi.cpp](hello_devices_mpi.cpp) and [hello_devices_mpi_onefile.cpp](hello_devices_mpi_onefile.cpp) implement a HIP application that fills a vector. The difference between the two is that [hello_devices_mpi.cpp](hello_devices_mpi.cpp) has its kernels located in the file [kernels.hip.cpp](kernels.hip.cpp) for separate compilation. Your task is to compile these files.

In [1]:
!make clean; make

rm -r *.exe
g++ -std=c++11 -g -O2 -fopenmp -I/usr/include -I../include -L/usr/lib64 hello_devices.cpp\
	-o hello_devices.exe -lOpenCL -lomp
In file included from hello_devices.cpp:2:0:
../include/cl_helper.hpp: In function ‘_cl_command_queue** h_create_command_queues(_cl_device_id**, _cl_context**, cl_uint, cl_uint, cl_bool, cl_bool)’:
         );
         ^
In file included from /usr/include/CL/opencl.h:24:0,
                 from ../include/cl_helper.hpp:15,
                 from hello_devices.cpp:2:
/usr/include/CL/cl.h:1906:1: note: declared here
 clCreateCommandQueue(cl_context                     context,
 ^~~~~~~~~~~~~~~~~~~~


In [2]:
!rocminfo -l

ROCk module is loaded
HSA System Attributes    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

HSA Agents               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 6800H with Radeon Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 6800H with Radeon Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                   