Gdev: Open-Source GPGPU Runtime and Driver Software
Gdev is a rich set of open-source software for NVIDIA GPGPU technology, containing device drivers, CUDA runtimes, CUDA/PTX compilers, and some utility tools. Currently it only supports NVIDIA GPUs and Linux but is, by design, portable to other GPUs and platforms as well. The supported API implementaions include:
- Gdev API: A low-level API to manage details of GPUs.
- CUDA Driver API: A low-level API adovocated by NVIDIA.
- CUDA Runtime API: A high-level API adovocated by NVIDIA.
The implementation of CUDA Driver API and CUDA Runtime API is built on top of Gdev API. For CUDA Runtime API we make use of GPU Ocelot as a front-end implementation. You can add your favorite high-level API to Gdev other than CUDA Driver/Runtime APIs on top of Gdev API.
Gdev provides runtime support in both the device driver and the user- space library. Device-driver runtime support is a unique feature of Gdev while most existing GPGPU programming frameworks take user-space approaches. With device-driver runtime support, Gdev allows the OS to manage GPUs as first-class citizens and execute CUDA programs itself. Gdev's user-space runtime support is also unique in a sense that it is available for multiple open-source and proprietary device drivers. The supported device drivers include:
- Nouveau: An open-source driver developed by the Linux community.
- PSCNV: An open-source driver developed by PathScale.
- NVRM: A proprietary binary driver provided by NVIDIA.
To summarize, Gdev offers the following advantages:
- You have open-source access to GPGPU runtime and driver software.
- You can execute CUDA in the OS using loabable kernel modules.
- You can investigate GPU resource management in research.
- You can enhance OS and user-space runtime support capabilities.
- You can compare device drivers performance under the same runtime.
How to download, install, and use Gdev
The recommended way to build/install Gdev is building/installing it with CMake.
Otherwise, you can choose one of the following for what driver to be used (obsolete):
- Do you want to use runtime support in the OS?
- Do you want to use user-space runtime with Nouveau?
- Do you want to use user-space runtime with PSCNV?
- Do you want to use user-space runtime with NVRM (NVIDIA Driver)?
Once the driver is successfully installed, you can install high-level API:
- Do you want to use CUDA?
The publication of the Gdev project
- S. Kato, M. McThrow, C. Maltzahn, and S. Brandt. "Gdev: First-Class GPU Resource Management in the Operating System", In Proceedings of the 2012 USENIX Annual Technical Conference (USENIX ATC'12), 2012.
Related research papers
- Y. Abe, H. Sasaki, M. Peres, K. Inoue, K. Murakami, and S. Kato. "Power and Performance Analysis of GPU-Accelerated Systems", In Proceedings of the 5th UESNIX Workshop on Power-Aware Computing and Systems (HotPower'12) , 2012.
- S. Kato. "Implementing Open-Source CUDA Runtime", In Proceedings of the 54th Programming Symposium, Jan, 2013.
- S. Kato, J. Aumiller, and S. Brandt. "Zero-Copy I/O Processing for Low-Latency GPU Computing", In Proceedings of the 4th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS'13), 2013.
- Y. Fujii, T. Azumi, N. Nishio, and S. Kato. "Exploring Microcontrollers in GPUs", In Proceedings of the 4th Asia-Pacific Workshop on Systems, (APSys'13), 2013.
Reclocking the GPU
NVIDIA's graphics cards are set very low clocks by default. To get
performance, you need to reclock your card at the maximum level.
How? Be root first, and then echo
3 to the following file:
echo 3 > /sys/class/drm/card0/device/performance_level
You can downclock your card by echoing
0 to the same file, i.e.,
echo 0 > /sys/class/drm/card0/device/performance_level
There are middleground levels
2, too. Note that Reclocking
is not completely supported by the open-source solution yet.
There are still some performance levels missing, and hence you may
not get as high performance as the blob. If you really need the same
level of performance as the blob, you can run some long-running
CUDA program with the blob, and do
kexec -f your kernel before the
program is finished. Then the clock remains at the maximum level.
Benchmarks and Applications
Today many CUDA programs are written using CUDA Runtime API. If you want to test CUDA Driver API, try the following benchmarks and apps.
- Yuki ABE, Kyushu University
- Jason AUMILLER, University of California at Santa Cruz
- Takuya AZUMI, Ritsumeikan University
- Masato EDAHIRO, Nagoya University
- Yusuke FUJII, Ritsumeikan University
- Tsuyoshi HAMADA, Nagasaki University
- Masaki IWATA, AXE Inc.
- Shinpei KATO, Nagoya University (Maintainer)
- Marcin KOSCIELNICKI, University of Warsaw
- Michael MCTHROW, University of California at Santa Cruz
- Martin PERES, University of Bordeaux
- Hiroshi SASAKI, Kyushu University
- Yusuke SUZUKI, Keio University
- Hisashi USUDA, AXE Inc.
- Kaibo WANG, Ohio State University
- Hiroshi YAMADA, Tokyo University of Agriculture and Technology
Copyright (C) Shinpei Kato Nagoya University Parallel and Distributed Systems Lab (PDSL) http://pdsl.jp University of California, Santa Cruz Systems Research Lab (SRL) http://systems.soe.ucsc.edu All Rights Reserved.