  1. How to use CUDA-Aware MPI

  2. How to use node-local MPI rank IDs to manually map MPI ranks to GPUs

  3. A simple CUDA vector addition program

  4. OpenMP programming tips for GPU offloading

  5. A quick overview of the jsrun job launcher



