This project demonstrates how FTI can be used with CUDA. It performs a simple vector addition, C = A + B, by dividing the vector evenly among the MPI processes. Each process then launches a CUDA kernel to compute its partition of the vector.
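The partitioning described above can be sketched on the host side in plain C. This is a minimal illustration rather than the project's actual code: the names partition_t, partition_for_rank and vector_add_partition are hypothetical, and the sketch assumes the vector size is an exact multiple of the number of processes. The loop in vector_add_partition is a CPU stand-in for the work each rank's CUDA kernel would do on its slice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type (not from the project): the slice of the vector
 * owned by one MPI rank. */
typedef struct {
    size_t offset; /* index of the first element owned by this rank */
    size_t count;  /* number of elements owned by this rank */
} partition_t;

/* Split vector_size elements evenly across nprocs ranks.
 * Assumption: vector_size is an exact multiple of nprocs. */
partition_t partition_for_rank(size_t vector_size, int rank, int nprocs)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    assert(vector_size % (size_t)nprocs == 0);

    partition_t p;
    p.count = vector_size / (size_t)nprocs;
    p.offset = (size_t)rank * p.count;
    return p;
}

/* CPU reference for the per-rank work: C = A + B restricted to the
 * rank's slice. In the real project this would run as a CUDA kernel. */
void vector_add_partition(const float *a, const float *b, float *c,
                          partition_t p)
{
    for (size_t i = p.offset; i < p.offset + p.count; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

Under these assumptions, a vector of 10000 elements split across 8 processes gives each rank 1250 elements, and rank 3 starts at index 3750.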
To compile, the following environment variables need to be set:
- MPI_HOME
- CUDA_HOME
- FTI_HOME
These variables should point to the home directories of MPI, CUDA and FTI, respectively. To compile, run make.
Execute the binary with the following two arguments:
- vector-size: Specifies the length of the vector
- iterations: Specifies how many times each MPI process should launch its kernel
You will need to have FTI built and configured for a successful run. For more information on FTI, see its GitHub repository.
The following command spawns 8 MPI processes on a vector of length 10000, and each process executes its kernel 10 times.
mpirun -np 8 ./fti_cuda.out 10000 10