## OSU G2G Bandwidth Benchmark with MPI4Py
In this example we use [IPCMagic](https://github.com/eth-cscs/ipcluster_magic/tree/master) to run a test from the [OSU Bandwidth benchmark](http://mvapich.cse.ohio-state.edu/benchmarks/) with MPI4Py from a Jupyter notebook.
Using [this example](https://mpi4py.readthedocs.io/en/stable/tutorial.html#cuda-aware-mpi-python-gpu-arrays), we adapted the [osu_bw.py](https://github.com/mpi4py/mpi4py/blob/d0228f0397403ff73d8f41d90d97b411efda6128/demo/osu_bw.py) script from the MPI4Py repository so it uses an array allocated on the GPU.

* From a shell on Piz Daint, the benchmark can be run with this Slurm job script:

```bash
#!/bin/bash -l

#SBATCH --job-name=osubw
#SBATCH --time=00:05:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12
#SBATCH --partition=normal
#SBATCH --constraint=gpu
#SBATCH --account=<project>

# source python environment with cupy and mpi4py

export MPICH_RDMA_ENABLED_CUDA=1

srun python osu_bw_cupy.py
```
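
The job script runs `osu_bw_cupy.py`, which is not reproduced in this notebook. The following is only a rough sketch of the idea behind it, not the actual script: the send and receive buffers are CuPy arrays, and rank 0 reports the bandwidth from the timed transfers. The names `bandwidth_mb_per_s` and `gpu_bandwidth` are hypothetical, the warm-up iterations and message-size sweep of the original benchmark are omitted, and the sketch assumes mpi4py 3.1 or later built against a CUDA-aware MPI.

```python
def bandwidth_mb_per_s(size, loop, window_size, elapsed):
    """MB/s for `loop` iterations of `window_size` messages of `size` bytes."""
    return size / 1e6 * loop * window_size / elapsed


def gpu_bandwidth(size=1 << 20, loop=20, window_size=64):
    """Time windows of non-blocking sends of a GPU buffer from rank 0 to rank 1.

    Hypothetical helper; must be launched with exactly two MPI ranks.
    """
    # Imported lazily so the helper above stays usable without MPI or a GPU.
    import cupy as cp
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    if comm.Get_size() != 2:
        raise SystemExit("This test requires exactly two processes")

    # Both buffers live in GPU memory; a CUDA-aware MPI reads them directly.
    s_buf = cp.arange(size, dtype='i')
    r_buf = cp.empty_like(s_buf)
    cp.cuda.get_current_stream().synchronize()  # data must be ready before MPI

    comm.Barrier()
    t_start = MPI.Wtime()
    for _ in range(loop):
        if rank == 0:
            reqs = [comm.Isend([s_buf, size, MPI.BYTE], dest=1, tag=100)
                    for _ in range(window_size)]
            MPI.Request.Waitall(reqs)
            comm.Recv([r_buf, 4, MPI.BYTE], source=1, tag=101)  # short ack
        else:
            reqs = [comm.Irecv([r_buf, size, MPI.BYTE], source=0, tag=100)
                    for _ in range(window_size)]
            MPI.Request.Waitall(reqs)
            comm.Send([s_buf, 4, MPI.BYTE], dest=0, tag=101)
    elapsed = MPI.Wtime() - t_start

    # Only rank 0 timed a complete send/ack cycle, so only it reports a value.
    if rank == 0:
        return bandwidth_mb_per_s(size, loop, window_size, elapsed)
    return None
```

Keeping the arithmetic in a separate function makes it possible to check the reported units without a GPU or an MPI launcher.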

In [1]:
import os
import ipcmagic

In [2]:
os.environ['MPICH_RDMA_ENABLED_CUDA'] = '1'  # Enable direct communication between GPUs

In [3]:
%ipcluster --version

1.1.0


In [4]:
%ipcluster start -n 2

100%|██████████| 2/2 [00:06<00:00,  3.18s/engine]


In [5]:
# Disable IPyParallel's progress bar
%pxconfig --progress-after -1

In [6]:
%%px
import socket

socket.gethostname()

Out[1:1]: 'nid07225'

Out[0:1]: 'nid07225'

In [7]:
%%px
from osu_bw_cupy import osu_bw
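
The `socket.gethostname()` output above shows both engines on the same node, `nid07225`. This differs from the Slurm job script, which places one task on each of two nodes, and it may matter for a GPU-to-GPU test. As a small sketch, a helper such as the hypothetical `shared_hosts` below flags hostnames that are reported by more than one engine; the input list is assumed to be collected by hand from the `%%px` output.

```python
from collections import Counter


def shared_hosts(hostnames):
    """Return {hostname: engine count} for hosts running more than one engine."""
    counts = Counter(hostnames)
    return {host: n for host, n in counts.items() if n > 1}


# Hostnames as reported by the two engines in this notebook.
print(shared_hosts(['nid07225', 'nid07225']))  # {'nid07225': 2}
```

An empty result would mean that every engine runs on its own node, as in the job script.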

In [None]:
%%px
osu_bw()

[0:execute]
---------------------------------------------------------------------------
CUDARuntimeError                          Traceback (most recent call last)
Cell In[3], line 1
----> 1 osu_bw()

File /scratch/snx3000/class272/SummerUniversity2024/pyhpc/mpi/osu_bw_cupy.py:32, in osu_bw(BENCHMARH, skip, loop, window_size, skip_large, loop_large, window_size_large, large_message_size, MAX_MSG_SIZE)
     29         errmsg = None
     30     raise SystemExit(errmsg)
---> 32 s_buf = cp.arange(MAX_MSG_SIZE, dtype='i')
     33 r_buf = cp

[stdout:1] # MPI G2G Bandwidth Test
# Size [B]    Bandwidth [MB/s]


Received Keyboard Interrupt. Sending signal SIGINT to engines...


In [None]:
%ipcluster stop