# Running jobs using multiple GPUs
## Running NVIDIA Modulus on Sunbird
The location of the Apptainer image can be found with `module display`:

```sh
[s.1915438@sl1 ~]$ module display modulus/22.07
-------------------------------------------------------------------
/apps/local/modules/tools/modulus/22.07:

module		 load apptainer/1.0.3 
module-whatis	 add NVIDIA Modulus to PATH environment variables 
setenv		 MODULUS_IMG /apps/local/tools/modulus/22.07/modulus_apptainer/modulus.img 
-------------------------------------------------------------------
```

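The `setenv` line shows that, after `module load modulus/22.07`, the environment variable `MODULUS_IMG` points to `/apps/local/tools/modulus/22.07/modulus_apptainer/modulus.img`. We can therefore start the image while bypassing Apptainer's default host bind mounts (`--contain`) and the import of the host environment (`--cleanenv`). `--nv` enables GPU support, and only the current directory (mounted at `/data`) and `/tmp` are bound: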

```sh
apptainer shell --nv --contain --cleanenv --bind "$(pwd)":/data,/tmp:/tmp $MODULUS_IMG
```
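Before starting the container, it can help to confirm that the module has actually set `MODULUS_IMG`. The helper below is a minimal sketch; `check_modulus_img` is a hypothetical function name, not part of the module or of Apptainer, and it only checks that the variable is set and names an existing file.

```sh
# Hypothetical helper: verify that `module load modulus/22.07` has set
# MODULUS_IMG to an image file that exists.
check_modulus_img() {
    if [ -z "${MODULUS_IMG-}" ]; then
        echo "MODULUS_IMG is not set; run 'module load modulus/22.07' first" >&2
        return 1
    fi
    if [ ! -f "$MODULUS_IMG" ]; then
        echo "MODULUS_IMG points to a missing file: $MODULUS_IMG" >&2
        return 1
    fi
    echo "Using image: $MODULUS_IMG"
}
```

Chaining it as `check_modulus_img && apptainer shell --nv --contain --cleanenv --bind "$(pwd)":/data,/tmp:/tmp $MODULUS_IMG` starts the container only when the image is found.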

This works fine with a single GPU. For multiple GPUs, however, Modulus has to be launched with `mpirun` ([Modulus documentation](https://docs.nvidia.com/deeplearning/modulus/user_guide/features/performance.html?highlight=mpirun#running-jobs-using-multiple-gpus)).

The command is `mpirun -np #GPUs`, i.e. the number of processes passed to `-np` equals the number of GPUs, one process per GPU. For example, with 2 GPUs:

```sh
Apptainer> mpirun -np 2 python ldc/ldc_2d.py
Initialized process 0 of 2 using method "openmpi". Device set to cuda:0
Initialized process 1 of 2 using method "openmpi". Device set to cuda:1
```
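Rather than typing the GPU count by hand, it can be derived from `CUDA_VISIBLE_DEVICES`, which Slurm sets to the comma-separated list of allocated GPUs (e.g. `2,3`). A minimal sketch, with `gpu_count` as a hypothetical helper name. Inside a container started with `--cleanenv` alone the variable is not set, so the count is only right on the host or in a container started as in the Best Practice section.

```sh
# Hypothetical helper: count the GPUs listed in CUDA_VISIBLE_DEVICES
# (or in the first argument, if one is given), e.g. "2,3" -> 2.
gpu_count() {
    local devices="${1-${CUDA_VISIBLE_DEVICES-}}"
    if [ -z "$devices" ]; then
        echo 0
        return 0
    fi
    # One line per comma-separated entry, then count the non-empty lines
    printf '%s\n' "$devices" | tr ',' '\n' | grep -c .
}
```

Used as `mpirun -np "$(gpu_count)" python ldc/ldc_2d.py`, the process count follows the allocation.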

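## Problem with `mpirun` (skip if not needed)
`$CUDA_VISIBLE_DEVICES` has to be exported into the Apptainer container, because `--cleanenv` discards the host environment. Without it, `mpirun` with $n$ processes uses the first $n$ GPUs of the node (0 to $n-1$), regardless of which GPUs were actually allocated. For example, I was allocated GPUs 2 and 3, but the two `python` processes ran on GPUs 0 and 1: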

```sh
[s.1915438@scs2043 examples]$ echo $CUDA_VISIBLE_DEVICES
2,3
[s.1915438@scs2043 ~]$ nvidia-smi
Mon Sep 12 01:20:59 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:27:00.0 Off |                    0 |
| N/A   53C    P0   231W / 250W |   5487MiB / 40960MiB |     57%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:28:00.0 Off |                    0 |
| N/A   56C    P0    94W / 250W |   5487MiB / 40960MiB |     71%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCI...  On   | 00000000:43:00.0 Off |                    0 |
| N/A   45C    P0    47W / 250W |      2MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCI...  On   | 00000000:44:00.0 Off |                    0 |
| N/A   45C    P0    45W / 250W |      2MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-PCI...  On   | 00000000:A3:00.0 Off |                    0 |
| N/A   40C    P0    47W / 250W |      2MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-PCI...  On   | 00000000:A4:00.0 Off |                    0 |
| N/A   40C    P0    47W / 250W |      2MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-PCI...  On   | 00000000:C3:00.0 Off |                    0 |
| N/A   39C    P0    48W / 250W |      2MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-PCI...  On   | 00000000:C4:00.0 Off |                    0 |
| N/A   39C    P0    46W / 250W |      2MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2282      C   python                           5485MiB |
|    1   N/A  N/A      2283      C   python                           5485MiB |
+-----------------------------------------------------------------------------+
```
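A quick guard before calling `mpirun` catches this situation early. This is a sketch; `check_cuda_mask` is a hypothetical function name, and it only tests whether `CUDA_VISIBLE_DEVICES` is set in the current shell, which inside a `--cleanenv` container means it was passed in explicitly.

```sh
# Hypothetical guard: refuse to continue when CUDA_VISIBLE_DEVICES is unset,
# since mpirun ranks would then fall back to GPUs 0..n-1 of the node.
check_cuda_mask() {
    if [ -z "${CUDA_VISIBLE_DEVICES-}" ]; then
        echo "CUDA_VISIBLE_DEVICES is not set; ranks may run on GPUs not allocated to you" >&2
        return 1
    fi
    echo "Using GPUs: $CUDA_VISIBLE_DEVICES"
}
```

Inside the container this would be used as `check_cuda_mask && mpirun -np 2 python ldc/ldc_2d.py`.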

## Use `mpirun`

With `$CUDA_VISIBLE_DEVICES` exported into the container (as in the Best Practice steps), the two processes run on the allocated GPUs 2 and 3:

```sh
[s.1915438@scs2043 ~]$ nvidia-smi -i 2,3
Mon Sep 12 01:29:42 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   2  NVIDIA A100-PCI...  On   | 00000000:43:00.0 Off |                    0 |
| N/A   63C    P0   112W / 250W |   5487MiB / 40960MiB |     66%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCI...  On   | 00000000:44:00.0 Off |                    0 |
| N/A   62C    P0   206W / 250W |   5487MiB / 40960MiB |     82%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    2   N/A  N/A      3255      C   python                           5485MiB |
|    3   N/A  N/A      3256      C   python                           5485MiB |
+-----------------------------------------------------------------------------+
```
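Even when GPUs 2 and 3 are used, Modulus prints `Device set to cuda:0` and `Device set to cuda:1` at start-up, because CUDA renumbers the visible devices from 0: ordinal `cuda:k` refers to the (k+1)-th entry of `CUDA_VISIBLE_DEVICES`. The hypothetical helper `physical_gpu` below sketches that mapping. It assumes CUDA enumerates GPUs in the same PCI bus order as `nvidia-smi`, which can be enforced by setting `CUDA_DEVICE_ORDER=PCI_BUS_ID`.

```sh
# Hypothetical helper: map a CUDA ordinal seen inside the job (cuda:<k>)
# to the GPU index listed in CUDA_VISIBLE_DEVICES.
# Usage: physical_gpu <ordinal> [device-list]
physical_gpu() {
    local ordinal="$1"
    local devices="${2-${CUDA_VISIBLE_DEVICES-}}"
    local i=0 id
    if [ -z "$devices" ]; then
        # No mask: CUDA ordinals are the node's own GPU indices
        echo "$ordinal"
        return 0
    fi
    # Split the list on commas; cuda:0 is the first entry, cuda:1 the second, ...
    local IFS=','
    for id in $devices; do
        if [ "$i" -eq "$ordinal" ]; then
            echo "$id"
            return 0
        fi
        i=$((i + 1))
    done
    echo "cuda:$ordinal is not in the list '$devices'" >&2
    return 1
}
```

For example, `physical_gpu 1 2,3` prints `3`, matching the second `python` process on GPU 3 above.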

## Best Practice
* Allocate the resources (here 2 GPUs on node `scs2043` in the `accel_ai` partition)

```sh
salloc --nodes=1 --account=scw1901 --partition=accel_ai --gres=gpu:2 --nodelist=scs2043
```

* Switch to the compute node
```sh
srun --pty bash
```

* Load NVIDIA Modulus 22.07

```sh
module load modulus/22.07
```

* Start the container with [--env](https://apptainer.org/user-docs/master/environment_and_metadata.html#env-option) so that `CUDA_VISIBLE_DEVICES` is passed into it
```sh
apptainer shell --nv --contain --cleanenv --bind "$(pwd)":/data,/tmp:/tmp --env CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES $MODULUS_IMG
```

One can check that `$CUDA_VISIBLE_DEVICES` was imported successfully:
```sh
Apptainer> env | grep CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=2,3
```

* Run an example (the lid-driven cavity case `ldc_2d.py`):
```sh
Apptainer> cd /data/ldc/
Apptainer> ls
conf  conf_zeroEq  ldc_2d.py  ldc_2d_importance_sampling.py  ldc_2d_zeroEq.py  openfoam
Apptainer> mpirun -np 2 python ldc_2d.py 
Initialized process 0 of 2 using method "openmpi". Device set to cuda:0
Initialized process 1 of 2 using method "openmpi". Device set to cuda:1
[01:28:14] - attempting to restore from: outputs/ldc_2d
[01:28:14] - optimizer checkpoint not found
[01:28:14] - model flow_network.pth not found
[01:28:14] - attempting to restore from: outputs/ldc_2d
[01:28:14] - optimizer checkpoint not found
[01:28:14] - model flow_network.pth not found
[01:28:24] - [step:          0] record constraint batch time:  3.484e-01s
[01:28:40] - [step:          0] record validators time:  1.606e+01s
[01:28:52] - [step:          0] record inferencers time:  1.134e+01s
[01:28:52] - [step:          0] saved checkpoint to outputs/ldc_2d
[01:28:52] - [step:          0] loss:  5.037e-02
[01:28:52] - Reducer buckets have been rebuilt in this iteration.
[01:28:52] - Reducer buckets have been rebuilt in this iteration.
[01:28:52] - Reducer buckets have been rebuilt in this iteration.
[01:28:52] - Reducer buckets have been rebuilt in this iteration.
[01:28:52] - Reducer buckets have been rebuilt in this iteration.
[01:28:52] - Reducer buckets have been rebuilt in this iteration.
[01:28:54] - Attempting cuda graph building, this may take a bit...
[01:28:54] - Attempting cuda graph building, this may take a bit...
[01:29:00] - [step:        100] loss:  7.916e-03, time/iteration:  8.412e+01 ms
[01:29:07] - [step:        200] loss:  5.416e-03, time/iteration:  6.566e+01 ms
[01:29:14] - [step:        300] loss:  4.992e-03, time/iteration:  6.685e+01 ms
[01:29:20] - [step:        400] loss:  3.430e-03, time/iteration:  6.476e+01 ms
[01:29:28] - [step:        500] loss:  2.418e-03, time/iteration:  8.192e+01 ms
[01:29:35] - [step:        600] loss:  2.150e-03, time/iteration:  6.622e+01 ms
[01:29:41] - [step:        700] loss:  1.699e-03, time/iteration:  6.515e+01 ms
```

* Checking from a second session on the node (`ssh scs2043`), `nvidia-smi -i 2,3` shows the two processes on the allocated GPUs:

```sh
[s.1915438@sl1 ~]$ ssh scs2043
Last login: Mon Sep 12 01:13:49 2022 from sl1
[s.1915438@scs2043 ~]$ nvidia-smi -i 2,3
Mon Sep 12 01:29:42 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   2  NVIDIA A100-PCI...  On   | 00000000:43:00.0 Off |                    0 |
| N/A   63C    P0   112W / 250W |   5487MiB / 40960MiB |     66%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCI...  On   | 00000000:44:00.0 Off |                    0 |
| N/A   62C    P0   206W / 250W |   5487MiB / 40960MiB |     82%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    2   N/A  N/A      3255      C   python                           5485MiB |
|    3   N/A  N/A      3256      C   python                           5485MiB |
+-----------------------------------------------------------------------------+
[s.1915438@scs2043 ~]$ 
```
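Once the allocation is in place (the `salloc` and `srun` steps above) and the module is loaded, the interactive container steps could be collapsed into one non-interactive `apptainer exec` call. This is a sketch under stated assumptions, not a tested site recipe: `run_modulus_example` and its `DRY_RUN` switch are hypothetical, it reuses the flags from the steps above, sets `-np` to the number of entries in `CUDA_VISIBLE_DEVICES`, and assumes it is called from the directory that should be mounted at `/data` (here the examples directory containing `ldc/`).

```sh
# Hypothetical wrapper: run a Modulus script on all GPUs in CUDA_VISIBLE_DEVICES
# without an interactive container shell.
# Usage: run_modulus_example <script relative to $(pwd)>, e.g. ldc/ldc_2d.py
# Set DRY_RUN=1 to print the apptainer command instead of running it.
run_modulus_example() {
    local script="$1"
    local np runner=""
    # One MPI process per allocated GPU ("2,3" -> 2)
    np=$(printf '%s\n' "${CUDA_VISIBLE_DEVICES-}" | tr ',' '\n' | grep -c . || true)
    if [ "$np" -eq 0 ]; then
        echo "CUDA_VISIBLE_DEVICES is empty; is this a GPU allocation?" >&2
        return 1
    fi
    if [ "${DRY_RUN-0}" = 1 ]; then
        runner="echo"
    fi
    # Same flags as the interactive shell above, plus the GPU list via --env
    $runner apptainer exec --nv --contain --cleanenv \
        --bind "$(pwd)":/data,/tmp:/tmp \
        --env CUDA_VISIBLE_DEVICES="$CUDA_VISIBLE_DEVICES" \
        "$MODULUS_IMG" \
        sh -c "cd /data/$(dirname "$script") && mpirun -np $np python $(basename "$script")"
}
```

For the example above, this would be `module load modulus/22.07`, `cd` to the examples directory, then `run_modulus_example ldc/ldc_2d.py`; prefixing the call with `DRY_RUN=1` shows the generated command first.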