jigangkim/nvidia-gpu-scheduler

Manage multiple NVIDIA GPU compute tasks

Supports per-GPU compute limits (number of processes, utilization rate, memory usage) on a per-UNIX-user/worker basis, as well as load balancing, multiple nodes (machines), and more.

Tested on tensorflow-gpu tasks.
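
The limited quantities (process count, utilization rate, memory usage) are the per-GPU metrics reported by NVML. Purely as a point of reference, a minimal sketch of querying them with the pynvml bindings (a separate package, not this scheduler's code) might look like this:

# Illustrative sketch: query the per-GPU quantities the scheduler limits.
# Requires the pynvml bindings (pip install nvidia-ml-py); not this package's actual code.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)  # number of processes
    util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu      # utilization rate (%)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)                 # memory usage (bytes)
    print('GPU %d: %d procs, %d%% util, %d/%d bytes' % (i, len(procs), util, mem.used, mem.total))
pynvml.nvmlShutdown()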


Installation (a virtual Python environment such as venv or conda is recommended)

cd /path/to/install
git clone https://github.com/jigangkim/nvidia-gpu-scheduler.git
cd /path/to/install/nvidia-gpu-scheduler

pip install . # standard installation
pip install -e . # or: editable (develop mode) installation

Usage (dummy example: json)

cd /path/to/install/nvidia-gpu-scheduler
# Run job server
python example.py --identity scheduler --config_ext .json
# Run worker
python example.py --identity worker --config_ext .json
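
Each job is described by a config file that the scheduler dispatches to workers; this dummy example uses .json configs. A minimal sketch of generating a small batch of JSON job configs is shown below; the directory name and keys are placeholders, not the schema example.py actually expects:

# Illustrative sketch: write a batch of JSON job configs for the scheduler/workers.
# The directory name and config keys are placeholders, not example.py's actual schema.
import json
import os

config_dir = 'configs'  # placeholder directory
os.makedirs(config_dir, exist_ok=True)
for seed in range(3):
    config = {'seed': seed, 'learning_rate': 1e-3}  # placeholder hyperparameters
    with open(os.path.join(config_dir, 'job_%d.json' % seed), 'w') as f:
        json.dump(config, f, indent=2)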

Usage (dummy example: gin)

cd /path/to/install/nvidia-gpu-scheduler
# Run job server
python example.py --identity scheduler --config_ext .gin
# Run worker
python example.py --identity worker --config_ext .gin
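
The .gin variant works the same way, with gin-config bindings in place of JSON. As a rough illustration of the format (the function and parameter names below are hypothetical, not the ones example.py uses), a gin-configurable task might look like this:

# Illustrative sketch of gin-config usage (pip install gin-config).
# The function name and parameters are placeholders, not example.py's actual code.
import gin

@gin.configurable
def run_task(learning_rate=1e-3, num_steps=1000):
    print('training with lr=%g for %d steps' % (learning_rate, num_steps))

# A job's .gin config would hold bindings such as:
#   run_task.learning_rate = 3e-4
#   run_task.num_steps = 10000
gin.parse_config(['run_task.learning_rate = 3e-4', 'run_task.num_steps = 10000'])
run_task()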

Usage (OpenAI baselines example)

cd /path/to/install/nvidia-gpu-scheduler
# Run job server
python example_openaibaselines.py --identity scheduler
# Run worker
python example_openaibaselines.py --identity worker