Multi GPU Programming Models

This project implements the well-known multi GPU Jacobi solver with different multi GPU programming models:

Single Threaded using cudaMemcpy for inter GPU communication (single_threaded_copy)
Multi Threaded with OpenMP using cudaMemcpy for inter GPU communication (multi_threaded_copy)
Multi Threaded with OpenMP using cudaMemcpy for inter GPU communication with overlapping communication (multi_threaded_copy_overlapp)
Multi Threaded with OpenMP using GPUDirect P2P mappings for inter GPU communication (multi_threaded_p2p)
Multi Threaded with OpenMP using GPUDirect P2P mappings for inter GPU communication with delayed norm execution (multi_threaded_p2p_opt)
Multi Threaded with OpenMP relying on transparent peer mappings with Unified Memory for inter GPU communication (multi_threaded_um)
Multi Process with MPI using CUDA-aware MPI for inter GPU communication (mpi)
Multi Process with MPI using CUDA-aware MPI for inter GPU communication with overlapping communication (mpi_overlapp)
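All variants listed above share one pattern: the grid is split among the GPUs, each GPU runs a Jacobi sweep on its part, and boundary rows are then exchanged with the neighbours. The following is a minimal CPU-only sketch of that pattern, not the code from this repository. The names Strip, make_strips, jacobi_sweep, exchange_halos and solve are hypothetical, the row-wise split and the fixed boundary values (1 on top, 0 elsewhere) are assumptions, and plain std::copy stands in for the inter GPU transfer (cudaMemcpy, P2P, Unified Memory or MPI).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One horizontal strip of the grid; stands in for the memory of one GPU.
// Row 0 and row rows+1 are halo rows (neighbour data or fixed boundary).
struct Strip {
    int rows;                  // interior rows owned by this strip
    int nx;                    // columns, including the two boundary columns
    std::vector<float> a;      // current iterate, (rows + 2) * nx values
    std::vector<float> a_new;  // next iterate
};

// Split ny interior rows over num_strips strips (assumed row-wise split).
std::vector<Strip> make_strips(int ny, int nx, int num_strips) {
    assert(num_strips >= 1 && ny >= num_strips && nx >= 3);
    std::vector<Strip> strips;
    for (int k = 0; k < num_strips; ++k) {
        int rows = ny / num_strips + (k < ny % num_strips ? 1 : 0);
        Strip s{rows, nx, std::vector<float>((rows + 2) * nx, 0.0f), {}};
        // Assumed boundary condition: top edge of the whole grid held at 1.
        if (k == 0) std::fill(s.a.begin(), s.a.begin() + nx, 1.0f);
        s.a_new = s.a;
        strips.push_back(std::move(s));
    }
    return strips;
}

// One Jacobi sweep on a strip; on a GPU this would be one kernel launch.
void jacobi_sweep(Strip& s) {
    const int nx = s.nx;
    for (int i = 1; i <= s.rows; ++i)
        for (int j = 1; j < nx - 1; ++j)
            s.a_new[i * nx + j] =
                0.25f * (s.a[(i - 1) * nx + j] + s.a[(i + 1) * nx + j] +
                         s.a[i * nx + j - 1] + s.a[i * nx + j + 1]);
    std::swap(s.a, s.a_new);
}

// Copy boundary rows between neighbouring strips; this is the step the
// variants implement differently (cudaMemcpy, P2P, Unified Memory, MPI).
void exchange_halos(std::vector<Strip>& strips) {
    for (std::size_t k = 0; k + 1 < strips.size(); ++k) {
        Strip& up = strips[k];
        Strip& down = strips[k + 1];
        const int nx = up.nx;
        // Last interior row of the upper strip -> top halo of the lower one.
        std::copy(up.a.begin() + up.rows * nx,
                  up.a.begin() + (up.rows + 1) * nx, down.a.begin());
        // First interior row of the lower strip -> bottom halo of the upper.
        std::copy(down.a.begin() + nx, down.a.begin() + 2 * nx,
                  up.a.begin() + (up.rows + 1) * nx);
    }
}

// Run iter_max iterations and return all interior rows, top to bottom.
std::vector<float> solve(int ny, int nx, int num_strips, int iter_max) {
    std::vector<Strip> strips = make_strips(ny, nx, num_strips);
    for (int iter = 0; iter < iter_max; ++iter) {
        for (Strip& s : strips) jacobi_sweep(s);
        exchange_halos(strips);
    }
    std::vector<float> out;
    for (const Strip& s : strips)
        out.insert(out.end(), s.a.begin() + s.nx,
                   s.a.begin() + (s.rows + 1) * s.nx);
    return out;
}
```

In this sketch, solve(16, 8, 1, 50) and solve(16, 8, 4, 50) should give the same values, because the halo exchange supplies each strip with exactly the neighbour rows a single undivided grid would read.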

Each variant is a standalone Makefile project, and all variants are described in the GTC EU 2017 talk "Multi GPU Programming Models".

Requirements

CUDA: version 9.2 or later is required by all variants.
OpenMP capable compiler: Required by the Multi Threaded variants. The examples have been developed and tested with gcc.
CUDA-aware MPI: Required by the MPI variants. The examples have been developed and tested with OpenMPI.
CUB: Optional, used for optimized residual reductions. Set CUB_HOME to your CUB installation directory. The examples have been developed and tested with CUB 1.8.0.

Building

Each variant comes with a Makefile and can be built by simply issuing make, e.g.:

multi-gpu-programming-models$ cd multi_threaded_copy
multi_threaded_copy$ make CUB_HOME=../cub
nvcc -DHAVE_CUB -I../cub -Xcompiler -fopenmp -lineinfo -DUSE_NVTX -lnvToolsExt -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70  -std=c++11 jacobi.cu -o jacobi
multi_threaded_copy$ ls jacobi
jacobi

Run instructions

All variants have the following command line options:

-niter: How many iterations to carry out (default 1000)
-nccheck: How often to check for convergence (default 1)
-nx: Size of the domain in x direction (default 7168)
-ny: Size of the domain in y direction (default 7168)
-csv: Print performance results as CSV
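As an illustration of how these options and their defaults fit together, here is a small sketch of a parser for them. It is not taken from the repository: Options, option_value, option_present and parse_options are hypothetical names, and the sketch assumes that each value follows its option as a separate argument (for example -niter 500) while -csv is a bare flag.

```cpp
#include <algorithm>
#include <sstream>
#include <string>

// Defaults as listed in this README.
struct Options {
    int niter = 1000;   // iterations to carry out
    int nccheck = 1;    // how often to check for convergence
    int nx = 7168;      // domain size in x direction
    int ny = 7168;      // domain size in y direction
    bool csv = false;   // print performance results as CSV
};

// Return the value following `name`, or `fallback` if the option is
// missing or its value cannot be parsed (hypothetical helper).
template <typename T>
T option_value(char** begin, char** end, const std::string& name, T fallback) {
    char** it = std::find(begin, end, name);
    if (it != end && ++it != end) {
        std::istringstream in(*it);
        T parsed;
        if (in >> parsed) return parsed;
    }
    return fallback;
}

// True if the bare flag `name` appears on the command line.
bool option_present(char** begin, char** end, const std::string& name) {
    return std::find(begin, end, name) != end;
}

// Collect all options, keeping the documented default for any not given.
Options parse_options(int argc, char** argv) {
    Options o;
    char** end = argv + argc;
    o.niter = option_value(argv, end, "-niter", o.niter);
    o.nccheck = option_value(argv, end, "-nccheck", o.nccheck);
    o.nx = option_value(argv, end, "-nx", o.nx);
    o.ny = option_value(argv, end, "-ny", o.ny);
    o.csv = option_present(argv, end, "-csv");
    return o;
}
```

Under these assumptions, an invocation such as ./jacobi -niter 500 -nx 4096 -csv would run 500 iterations on a 4096 x 7168 domain, check convergence every iteration, and print results as CSV.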

The provided script bench.sh contains some examples executing all the benchmarks presented in the GTC EU 2017 Talk Multi GPU Programming Models.
