GPU-CR is a system designed to support efficient Checkpoint and Restore (C/R) for GPU-accelerated applications. Its key advantage is completely yielding the GPU memory of the checkpointed app (reducing VRAM usage to 0), seamlessly freeing up space for other workloads to swap in and execute.
A quick demonstration of executing the GPU-CR tool via the command-line interface.
- Cross-Vendor Support: Experimental support for both NVIDIA and AMD GPUs.
- Transparent C/R: Uses LD_PRELOAD to inject a vGPU library that intercepts memory allocations and resource management.
- Client CLI: Simple command-line interface (cr_client) to trigger checkpoint and restore operations.
- Performance Optimization: Support for Huge Pages to accelerate memory saving.
We are actively working on expanding GPU-CR's capabilities:
- 🚀 Broader Hardware Support: Extending compatibility to more architectures, such as Huawei Ascend.
We compare GPU-CR with existing GPU checkpoint solutions on four LLM workloads:
- Llama-8B
- Phi-4-mini-instruct
- pythia-1b
- Qwen3-1.7B
For GPU-CR, the latency is split into:
- Data — GPU data buffers
- Control — GPU control states
Total latency = Data + Control
- GPU: NVIDIA A100-PCIE-40GB
- Driver Version: 580.95.05
- CUDA Version: 13.0
- vLLM Version: 0.14.1
- Operating System: Linux (Tested on Ubuntu 22.04).
- Build Tools: CMake, GCC/G++, Make.
- Checkpoint Backend & Drivers:
  - NVIDIA:
    - Requires CUDA Toolkit 12.x or later.
    - Uses cuda-checkpoint (included in this repository).
    - Note: If updates are needed, please update the parameters within the source code manually. [cuda-checkpoint]
  - AMD:
    - Requires ROCm 6.x or later.
    - Requires a custom-built criu with the AMD plugin enabled (manual compilation required).
    - Note: This custom CRIU is not included in this repository. Users must manually compile and install CRIU with the AMD plugin before using GPU-CR. [CRIU AMDGPU Plugin Documentation]
This project utilizes CMake for building. Please choose ONE of the following build options based on your target GPU vendor. Do not build both simultaneously in the same environment.
mkdir build && cd build
export GPU_VENDOR=NVIDIA
cmake ..
make -j$(nproc)
This generates vGPU-NVIDIA.so and cr_client.
mkdir build && cd build
export GPU_VENDOR=AMD
cmake ..
make -j$(nproc)
This generates vGPU-AMD.so and cr_client.
Before running, configure the necessary environment variables.
- VRAM Storage Strategy: By default, GPU memory is saved to Huge Pages. You can optionally save it to a file system path using EXPORT_FILE_PATH.
# Optional: Path to save video memory content as a file.
# If NOT set, the system defaults to saving VRAM to Huge Pages.
export EXPORT_FILE_PATH=/path/to/save/vram_dump_path
- Huge Pages (Recommended for Acceleration): Huge pages can significantly accelerate the save process for both vendors.
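The value written to nr_hugepages is a page count rather than a byte size, so the right number depends on the system's huge page size. As a minimal sketch (the helper names hugepage_size_kib and pages_needed are hypothetical and not part of GPU-CR), the following Python computes the page count for a target reservation, assuming the usual Hugepagesize line format of /proc/meminfo:

```python
import math
import re


def hugepage_size_kib(meminfo_text: str) -> int:
    """Extract the default huge page size (in KiB) from /proc/meminfo content."""
    match = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("Hugepagesize entry not found")
    return int(match.group(1))


def pages_needed(target_gib: float, page_kib: int) -> int:
    """Return the number of huge pages needed to hold target_gib GiB, rounded up."""
    target_kib = target_gib * 1024 * 1024
    return math.ceil(target_kib / page_kib)


# Sample text standing in for the real /proc/meminfo (illustrative only).
SAMPLE_MEMINFO = "HugePages_Total:       0\nHugepagesize:       2048 kB\n"

page_kib = hugepage_size_kib(SAMPLE_MEMINFO)
print(pages_needed(80, page_kib))  # 40960
```

On a live system the sample text would be replaced by the contents of /proc/meminfo. With 2048 kB pages, an 80 GB reservation works out to 40960 pages.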
# Example: reserve 80 GB of huge pages (40960 pages, assuming the common 2 MB huge page size)
sudo bash -c "echo 40960 > /proc/sys/vm/nr_hugepages"
sudo mkdir -p /mnt/huge-ckpt
sudo mount -t hugetlbfs nodev /mnt/huge-ckpt
sudo chmod 777 -R /mnt/huge-ckpt
If you are using AMD GPUs, you must specify the directory where CRIU will store its checkpoint files.
export AMD_CKPT_DIR=/path/to/save/criu_files
Launch the target application (e.g., a Python script using PyTorch/vLLM or a C++ binary) using LD_PRELOAD.
(1) Example (NVIDIA):
LD_PRELOAD=/path/to/build/vGPU-NVIDIA.so python3 ./apps/vllm/serving_vllm_nvidia.py
(2) Example (AMD):
LD_PRELOAD=/path/to/build/vGPU-AMD.so ./apps/vllm/serving_vllm_amd.sh
Use the cr_client tool to trigger a checkpoint.
# -i: Initialization mode
# -c: Checkpoint mode
# -p: Target PID
# -m: (Optional) The PID of the original parent process (Master) that CRIU needs to control (for CRIU in AMD mode).
./cr_client -c -p <TARGET_PID>
# or
./cr_client -c -p <GPU_CHILD_PID> -m <PARENT_PID>
Restore the process from the checkpoints.
# -r: Restore mode
# -p: Target PID (the original PID)
./cr_client -r -p <TARGET_PID>
- src/: Source code for the vGPU library and cr_client.
  - GPUs/NVIDIA/: NVIDIA-specific implementation (CUDA hooks).
  - GPUs/AMD/: AMD-specific implementation (HIP hooks).
  - cr_client.cpp: Control client implementation.
- apps/: Example scripts and applications (e.g., vLLM examples).
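The launch, checkpoint, and restore steps described earlier can also be driven from a script. The sketch below is one possible way to do that in Python, not part of GPU-CR itself: build_launch_env and build_cr_client_args are hypothetical helper names, and the only environment variables and cr_client flags used are the ones documented in this README (LD_PRELOAD, EXPORT_FILE_PATH, AMD_CKPT_DIR, -c, -r, -p, -m). It assumes -m is only meaningful for checkpointing, since that is the only place it is documented.

```python
import os
from typing import Dict, List, Optional


def build_launch_env(
    vgpu_lib: str,
    base_env: Optional[Dict[str, str]] = None,
    export_file_path: Optional[str] = None,
    amd_ckpt_dir: Optional[str] = None,
) -> Dict[str, str]:
    """Return an environment that preloads the vGPU library into the target app."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = vgpu_lib
    if export_file_path is not None:
        # Save VRAM to a file instead of the default huge pages.
        env["EXPORT_FILE_PATH"] = export_file_path
    if amd_ckpt_dir is not None:
        # Directory for CRIU checkpoint files (AMD only).
        env["AMD_CKPT_DIR"] = amd_ckpt_dir
    return env


def build_cr_client_args(
    mode: str,
    pid: int,
    master_pid: Optional[int] = None,
    cr_client: str = "./cr_client",
) -> List[str]:
    """Build the cr_client argument list for a checkpoint or restore request."""
    flags = {"checkpoint": "-c", "restore": "-r"}
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    if master_pid is not None and mode != "checkpoint":
        # Assumption: the README documents -m only alongside -c.
        raise ValueError("master_pid is only supported in checkpoint mode")
    args = [cr_client, flags[mode], "-p", str(pid)]
    if master_pid is not None:
        args += ["-m", str(master_pid)]
    return args


# Possible usage with subprocess (not executed here):
#   proc = subprocess.Popen(
#       ["python3", "./apps/vllm/serving_vllm_nvidia.py"],
#       env=build_launch_env("/path/to/build/vGPU-NVIDIA.so"),
#   )
#   subprocess.run(build_cr_client_args("checkpoint", proc.pid), check=True)
#   subprocess.run(build_cr_client_args("restore", proc.pid), check=True)
```

Keeping the argument construction separate from process execution makes it easy to check the exact command line before anything is run against a live GPU workload. Whether the PID returned by the launcher is the right target PID depends on how the application spawns its GPU process.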
This project is based on our paper:
@inproceedings{GCR,
author = {Shaoxun Zeng and Tingxu Ren and Jiwu Shu and Youyou Lu},
title = {GPU Checkpoint/Restore Made Fast and Lightweight},
booktitle = {24th USENIX Conference on File and Storage Technologies (FAST'26)},
year = {2026},
address = {Santa Clara, CA},
month = feb,
publisher = {USENIX Association},
url = {https://www.usenix.org/conference/fast26/presentation/zeng}
}
The implementation of the paper is available at: https://github.com/thustorage/GCR



