gpu_cluster

Boiler-plate framework for job scheduling on an HPC GPU cluster. Not every project is finished, but that does not mean the effort is wasted. This project began when TensorFlow and PyTorch were not as popular as they are now, so the plan was to use a custom boiler-plate framework to move data within the cluster and run the calculations on GPUs. This little project demonstrates that such a task is doable and can be coded in C++ and CUDA C.

Main challenges

  • Moving data between CPU and GPU is solved with Unified Memory, which physically resides on the GPU but is mapped into the CPU's virtual address space. This makes it possible to create objects (inheriting from Managed.cuh) that are constructed and exist only within this memory. As a result, a serialized object sent from node A can be deserialized and stored directly in the GPU memory of node B, saving many allocation steps and piece-by-piece data moves (see the sketch after this list).
  • Synchronizing multiple processes residing on unknown nodes. The PBS Pro scheduler and the OpenMPI communication interface co-exist very closely here. While a CPU is not conscious of where in the cluster it sits (and thus neither is its MPI process), the PBS Pro scheduler allocates resources in a predictable manner. It is then a matter of arranging the MPI processes into a hierarchical structure using the commRank MPI variable.
  • Compilation of code for different architectures. C++ and CUDA C are different languages executed on different processors, so they need different compilers. Although this is straightforward, making the process seamless and easy to work with (e.g. compiling on a remote machine) was a nice way to practice writing a Makefile.
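
A minimal sketch of the first point, assuming a hypothetical Payload type that inherits from the Managed base class (so its storage comes from unified memory) and is serializable with Cereal; the function names and the fixed message size are illustrative, not the repository's actual API:

    #include <sstream>
    #include <string>
    #include <vector>
    #include <mpi.h>
    #include <cereal/archives/binary.hpp>
    #include "Managed.cuh"  // base class providing unified-memory allocation

    // Hypothetical payload: lives in unified memory because it inherits Managed.
    struct Payload : public Managed {
        float data[256];
        template <class Archive>
        void serialize(Archive& ar) { ar(cereal::binary_data(data, sizeof(data))); }
    };

    // Node A: serialize the object into a byte buffer and ship it over MPI.
    void send_payload(const Payload& p, int dest) {
        std::ostringstream os;
        { cereal::BinaryOutputArchive ar(os); ar(p); }
        std::string buf = os.str();
        MPI_Send(buf.data(), (int)buf.size(), MPI_BYTE, dest, 0, MPI_COMM_WORLD);
    }

    // Node B: receive the bytes and deserialize straight into unified memory.
    // 'new Payload' goes through Managed::operator new (cudaMallocManaged),
    // so the reconstructed object is immediately visible to GPU kernels.
    Payload* recv_payload(int src, int nbytes) {
        std::vector<char> buf(nbytes);
        MPI_Recv(buf.data(), nbytes, MPI_BYTE, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::istringstream is(std::string(buf.begin(), buf.end()));
        Payload* p = new Payload;  // allocated in unified memory
        cereal::BinaryInputArchive ar(is);
        ar(*p);
        return p;
    }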

About the project

The project is self-contained: all source files are provided (including the required parts of the Cereal library) to compile it. The operating system on the cluster is Linux, but the source code requires very little support beyond a Makefile.

This was fun to program, as I gained some experience with:

  • template programming in C++
  • CUDA programming on NVIDIA GPUs
  • object serialization with the Cereal library to prepare data for transmission
  • OpenMPI transmission and synchronization for sending/receiving data
  • PBS Pro scripting for job distribution on cluster
  • general Object Oriented Programming concepts to not get lost on the way :D

Code execution

The image below illustrates how the cluster is structured into nodes, which are divided into a main process and worker processes. The program proceeds by spreading the data from main to the workers. The workers process the data on the GPU and send the results back to main. This is repeated TIME_TESTING_ITERATIONS times to generate statistics (a minimal sketch of this loop follows the figure).

[Figure: program_execution_flow — execution flow of the program across the main and worker processes]
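
A minimal sketch of this loop, with illustrative stand-ins for the constants from const.h (the real logic lives in main.cpp and Process.cpp, and the real workers run a CUDA kernel instead of the placeholder loop):

    #include <mpi.h>

    static const int ROOT_PROCESS = 0;             // stand-in, defined in const.h
    static const int TIME_TESTING_ITERATIONS = 10; // stand-in, defined in const.h
    static const int CHUNK = 256;                  // illustrative message size

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int commRank, commSize;
        MPI_Comm_rank(MPI_COMM_WORLD, &commRank);
        MPI_Comm_size(MPI_COMM_WORLD, &commSize);
        float buf[CHUNK] = {0};

        for (int it = 0; it < TIME_TESTING_ITERATIONS; ++it) {
            if (commRank == ROOT_PROCESS) {
                // main: spread data to every worker, then collect the results
                for (int w = 0; w < commSize; ++w) if (w != ROOT_PROCESS)
                    MPI_Send(buf, CHUNK, MPI_FLOAT, w, 0, MPI_COMM_WORLD);
                for (int w = 0; w < commSize; ++w) if (w != ROOT_PROCESS)
                    MPI_Recv(buf, CHUNK, MPI_FLOAT, w, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                // worker: receive, process (on the GPU in the real code), send back
                MPI_Recv(buf, CHUNK, MPI_FLOAT, ROOT_PROCESS, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                for (int i = 0; i < CHUNK; ++i) buf[i] += 1.0f; // placeholder for the kernel
                MPI_Send(buf, CHUNK, MPI_FLOAT, ROOT_PROCESS, 0, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }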

File description

PBS scripts

pbs_script.scr: PBS Pro script to allocate resources within the cluster and run the executable (line 109)

CPU (C++)

main.cpp: Main file representing main and worker nodes

  • main node is differentiated by [const int] commRank == ROOT_PROCESS (defined in const.h)
  • worker nodes are all other nodes with commRank != ROOT_PROCESS

Process.cpp, Process.hpp: Object representing both main and worker nodes (NOTICE: the same kind of Process object represents different kinds of nodes)

  • NOTICE: The object has a regular OOP structure:
    • Process.cpp contains implementations of classes and functions
    • Process.hpp contains declarations, plus the implementations of template functions (template functions cannot be compiled separately from an implementation in the .cpp file; their bodies must be visible wherever they are instantiated, as illustrated below)
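
For illustration (this snippet is not taken from the repository), the split looks like this; the template body must sit in the header because the compiler needs it at every point of instantiation:

    // Process.hpp (illustrative)
    class Process {
    public:
        void barrier();            // regular function: declared here,
                                   // implemented in Process.cpp
        template <typename T>      // template function: declared AND
        void send(const T& value); // implemented in this header
    };

    template <typename T>
    void Process::send(const T& value) {
        // serialize 'value' and hand it to MPI ...
    }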

Unified memory (CUDA<->C++)

Managed.cuh: Inheriting from this class makes an object reside in unified memory (memory on the GPU that is visible from the CPU)
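
The well-known CUDA pattern for such a base class (a sketch; the repository's actual Managed.cuh may differ in details) overloads operator new/delete with cudaMallocManaged:

    #include <cuda_runtime.h>
    #include <cstddef>

    // Any class inheriting from Managed gets its storage from unified
    // (managed) memory, which both the CPU and the GPU can address.
    class Managed {
    public:
        void* operator new(std::size_t len) {
            void* ptr = nullptr;
            cudaMallocManaged(&ptr, len);
            cudaDeviceSynchronize();
            return ptr;
        }
        void operator delete(void* ptr) {
            cudaDeviceSynchronize(); // make sure the GPU is done with it
            cudaFree(ptr);
        }
    };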

GPU (CUDA)

Cuda_GPU.cu: Class representing the GPU from the CPU's point of view. Therefore this object, while using CUDA functions and being compiled by the CUDA compiler, does not run directly on the GPU.
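
Host-side code of this kind simply drives the GPU through the CUDA runtime API, for example (illustrative, not the repository's class):

    #include <cuda_runtime.h>
    #include <cstdio>

    // Runs on the CPU; queries the GPU through the CUDA runtime.
    void print_gpu_info(int device) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, device);
        std::printf("GPU %d: %s, %zu MB of global memory\n",
                    device, prop.name, prop.totalGlobalMem / (1024 * 1024));
    }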

Cuda_kernel.cu: This file defines the kernel_execute() function, which is a wrapper for a function executed on the GPU. The actual kernel function executed on the GPU is implemented in kernel_by_ref.cu. See Cuda_kernel.cuh for a detailed explanation of why C and C++ do not mix: in short, C++ mangles names during compilation, while C does not.
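
The usual bridge (a sketch; the repository's actual declarations live in Cuda_kernel.cuh, and the kernel body here is a placeholder) gives the wrapper C linkage so that both compilers agree on the symbol name:

    #include <cuda_runtime.h>

    __global__ void kernel_by_ref(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;  // placeholder computation
    }

    // extern "C" disables C++ name mangling, so the symbol kernel_execute
    // can be linked from code compiled as C.
    extern "C" void kernel_execute(float* data, int n) {
        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        kernel_by_ref<<<blocks, threads>>>(data, n);
        cudaDeviceSynchronize();
    }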

Other files

Makefile: Makefile to compile the source files locally (on the cluster) and generate the executable test_framework.
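
The core of such a two-compiler build (a sketch with illustrative rules; recipes in a real Makefile must be indented with tabs) is one rule per compiler, with nvcc doing the final link:

    CXX  = g++
    NVCC = nvcc

    main.o: main.cpp
            $(CXX) -std=c++11 -c $< -o $@

    Cuda_kernel.o: Cuda_kernel.cu
            $(NVCC) -c $< -o $@

    test_framework: main.o Cuda_kernel.o
            $(NVCC) $^ -o $@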

compile.sh: Shell script to compile the source code.

jobrun.sh: Shell script to schedule resource allocation using pbs_script.scr.

recompile_jobrun.sh: Recompile and schedule resource allocation (combines compile.sh and jobrun.sh).

return_values.h: Defines unified return values.

const.h: Constants for the CPU code.

config_GPU.h: Constants for the GPU code.

/cereal: Parts of the Cereal library required to compile the project.

300000_messages_1009.frontnode.OU: Output file produced by the main node after the test finishes. Notice line 110, which shows that the total amount of data transferred during the test was 15 [TB].
