## OSU GPU-to-GPU (G2G) Bandwidth Benchmark with MPI4Py
This example uses [IPCMagic](https://github.com/eth-cscs/ipcluster_magic/tree/master) to run the bandwidth test from the [OSU benchmarks](http://mvapich.cse.ohio-state.edu/benchmarks/) with MPI4Py from a Jupyter notebook.
Following the [CUDA-aware MPI example](https://mpi4py.readthedocs.io/en/stable/tutorial.html#cuda-aware-mpi-python-gpu-arrays) in the MPI4Py tutorial, we adapted the [osu_bw.py](https://github.com/mpi4py/mpi4py/blob/d0228f0397403ff73d8f41d90d97b411efda6128/demo/osu_bw.py) script from the MPI4Py repository so that it sends and receives arrays allocated on the GPU. The adapted script is `osu_bw_cupy.py`.
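
For orientation, the adaptation described above might look roughly like the sketch below. It is not the actual `osu_bw_cupy.py`. It is a simplified version of the upstream `osu_bw.py` loop, with the host buffers replaced by CuPy arrays. It assumes a CUDA-aware MPI library and an mpi4py release that accepts GPU arrays (3.1 or later). The helper `bandwidth_mb_per_s` and the reduced parameter list are illustrative choices, not part of the original script.

```python
def bandwidth_mb_per_s(nbytes, loop, window_size, elapsed):
    """Bandwidth in MB/s for loop * window_size messages of nbytes bytes each."""
    return nbytes / 1e6 * loop * window_size / elapsed


def osu_bw(skip=10, loop=100, window_size=64, max_msg_size=1 << 22):
    # Imported here so the module can be imported on a machine without MPI or a GPU.
    import cupy
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    if comm.Get_size() != 2:
        raise SystemExit("This test requires exactly two processes")

    # Send and receive buffers live in GPU memory instead of host memory.
    s_buf = cupy.zeros(max_msg_size, dtype=cupy.uint8)
    r_buf = cupy.zeros(max_msg_size, dtype=cupy.uint8)
    # Make sure the buffers are initialised before MPI touches them.
    cupy.cuda.get_current_stream().synchronize()

    if rank == 0:
        print("# %-8s%20s" % ("Size [B]", "Bandwidth [MB/s]"))

    size = 1
    while size <= max_msg_size:
        requests = [MPI.REQUEST_NULL] * window_size
        comm.Barrier()
        if rank == 0:
            s_msg = [s_buf, size, MPI.BYTE]
            r_msg = [r_buf, 4, MPI.BYTE]
            for i in range(loop + skip):
                if i == skip:
                    # The first `skip` iterations are warm-up and are not timed.
                    t_start = MPI.Wtime()
                for j in range(window_size):
                    requests[j] = comm.Isend(s_msg, 1, 100)
                MPI.Request.Waitall(requests)
                comm.Recv(r_msg, 1, 101)  # short acknowledgement from rank 1
            elapsed = MPI.Wtime() - t_start
            print("%-10d%20.2f" % (size, bandwidth_mb_per_s(size, loop, window_size, elapsed)))
        else:
            s_msg = [s_buf, 4, MPI.BYTE]
            r_msg = [r_buf, size, MPI.BYTE]
            for i in range(loop + skip):
                for j in range(window_size):
                    requests[j] = comm.Irecv(r_msg, 0, 100)
                MPI.Request.Waitall(requests)
                comm.Send(s_msg, 0, 101)
        size *= 2
```

Rank 0 posts a window of non-blocking sends, waits for them, and then waits for a short acknowledgement from rank 1. The reported bandwidth is the total number of bytes sent in the timed iterations divided by the elapsed time.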

* From a shell on Piz Daint, the benchmark can be run with the following Slurm job script:

```bash
#!/bin/bash -l

#SBATCH --job-name=osubw
#SBATCH --time=00:05:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12
#SBATCH --partition=normal
#SBATCH --constraint=gpu
#SBATCH --account=<project>

# Activate a Python environment that provides cupy and mpi4py here

# Enable direct communication between GPUs
export MPICH_RDMA_ENABLED_CUDA=1

srun python osu_bw_cupy.py
```
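
The job script exports `MPICH_RDMA_ENABLED_CUDA=1` before launching Python, so each MPI process sees the variable in its environment. When running from a notebook it can be useful to confirm that the engines see it too. The helper below is a minimal sketch of such a check. The function name `cuda_rdma_enabled` is hypothetical and not part of IPCMagic or MPI4Py.

```python
import os


def cuda_rdma_enabled(env=None):
    """Return True if MPICH_RDMA_ENABLED_CUDA is set to '1' in the given environment."""
    # Default to the environment of the current process.
    env = os.environ if env is None else env
    return env.get("MPICH_RDMA_ENABLED_CUDA") == "1"
```

Called inside a `%%px` cell, `cuda_rdma_enabled()` is evaluated on every engine, which shows whether the setting reached the processes that actually run the benchmark.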

In [None]:
import os
import ipcmagic

In [None]:
os.environ['MPICH_RDMA_ENABLED_CUDA'] = '1'  # Enable direct communication between GPUs

In [None]:
%ipcluster --version

In [None]:
%ipcluster start -n 2

In [None]:
# Disable IPyParallel's progress bar
%pxconfig --progress-after -1

In [None]:
%%px
import socket

socket.gethostname()

In [None]:
%%px
from osu_bw_cupy import osu_bw

In [None]:
%%px
osu_bw()

In [None]:
%ipcluster stop