Skip to content
Find file
Fetching contributors…
Cannot retrieve contributors at this time
124 lines (116 sloc) 6.22 KB
\vspace{-1em}
\section{Introduction}
\label{sec:introduction}
\vspace{-0.25em}
Recent innovations in heterogeneous compute devices with many-core
technology have achieved an order-of-magnitude gain in computing
performance.
In particular, the graphics processing unit (GPU) receives considerable
attention as a mature heterogeneous compute device that embraces a
concept of many-core computing.
A recent announcement from the TOP500 supercomputing sites in November
2011~\cite{TOP500} disclosed that three of the top five supercomputers
use GPUs.
For example, scientific climate applications can gain 80x speedups using
such GPU-based supercomputers~\cite{Shimokawabe10}.
The benefits of GPUs appear not only in high-performance computing but
also in general-purpose and embedded computing domains.
According to previous research, GPUs can provide up to an order of 10x
speedups for software routers~\cite{Han_SIGCOMM10}, 20x speedups for
encrypted networks~\cite{Jang_NSDI11}, and 15x speedups for motion
planning in autonomous vehicles~\cite{McNaughton_ICRA11}.
Such a rapid growth of general-purpose computing on GPUs,
\textit{a.k.a.}, GPGPU, is brought by recent advances in GPU
programming languages, such as CUDA~\cite{CUDA40}.
Although the use of GPUs in general-purpose domains provides significant
performance improvements, system software support for GPUs in the market
is especially tailored to accelerate particular applications dedicated
to the system, but is not well-designed to integrate GPUs into more
general time-sharing systems.
The research community thereby has articulated the need to manage GPU
resources in the operating system (OS)~\cite{Bautin_MCNC08, Kato_ATC11,
Rossbach_SOSP11}.
However, these pieces of OS support still limit a class of applications
that can use GPUs due to a lack of first-class resource management
primitives, such as virtual memory management, inter-process
communication (IPC), and time-and-space resource partitioning.
On time-sharing systems where multiple independent users are logged in,
for instance, if one executes a high-workload GPGPU program, the
GPU resources available for the rest of users could be limited.
GPU schedulers in the state of the arts~\cite{Kato_ATC11,
Rossbach_SOSP11} provide priority schemes for the GPU, yet users
are required to specify priorities and other parameters by themselves.
The system thus cannot partition GPU resources in time-sharing systems.
More essentially, the current GPGPU framework does not permit
users to share memory resources among GPU contexts, and could limit the
total amount of effective data by the size of physical memory.
Such constraints may not be acceptable in general-purpose programming.
The current OS and system support for GPUs also leaves the
application programming interface (API) to the user space, which
restricts the availability of GPUs to user-space applications.
In addition, employing the API in the user-space library implies that
the device driver exposes its resource management primitives to the user
space through a system call, which allows malicious programs to abuse
GPUs using this system call.
As a matter of fact, non-privileged user-space programs can directly
allocate memory and launch computation on NVIDIA GPUs, via the
\texttt{ioctl} system call in Linux.
This explains that the API should also be protected by the OS.
This paper presents \textbf{Gdev}, a new GPGPU ecosystem that addresses
the current limitations of GPU resource management.
Specifically, Gdev integrates GPU runtime support, including the API,
into the OS to allow a wide class of user-space applications and the OS
itself to use GPUs in a reliable manner.
While OS applications can directly call the API, user-space
programs can also use it through the system call, such as
\texttt{ioctl}.
We also provide an implementation of CUDA API~\cite{CUDA40} wrapping
Gdev API so that legacy CUDA applications can work with Gdev in both the
user space and the OS.
This runtime-unified OS approach forces GPU applications running in
the same system to be managed by the identical resource management
entity, which eliminates the concern about malicious programs attempting
to access GPUs directly.
The contributions of Gdev also include first-class support for GPU
resource management in time-sharing systems.
Specifically, Gdev allows programmers to share device memory resources
among GPU contexts explicitly using the API.
We also leverage this shared memory scheme to allow the system to allocate
data beyond the size of physical memory space, exploiting implicit data
eviction and reload between the host and device memory.
Moreover, Gdev devises virtual GPU support to isolate GPU users in
time-sharing systems, using a new GPU scheduler to deal with the
non-preemptive and burst nature of GPU workloads.
As a proof-of-concept, we finally provide an open-source implementation of Gdev,
including a device driver and runtime/API library.
In summary, this paper makes the following contributions:
\begin{itemize}
\vspace{-0.25em}
\item Identifies the advantage/disadvantage of integrating GPU runtime
support into the OS.
\vspace{-0.5em}
\item Enables the OS to use GPUs for computation.
\vspace{-0.5em}
\item Makes GPUs as first-class computing resources in time-sharing
systems -- support for shared memory, memory swapping,
and virtual GPUs, in addition to device memory management and GPU
scheduling.
\vspace{-0.5em}
\item Provides open-source implementations of the GPU device driver,
runtime/API libraries, utility tools, and Gdev resource
management components.
\vspace{-0.5em}
\item Demonstrates the capabilities of Gdev using real-world benchmarks
and applications.
\vspace{-0.25em}
\end{itemize}
The rest of this paper is organized as follows.
Section~\ref{sec:model} provides the model and assumptions behind
this paper.
Section~\ref{sec:ecosystem} outlines the Gdev ecosystem.
Section~\ref{sec:memory_management} and \ref{sec:scheduling} propose new
memory management and scheduling schemes for the GPU.
Section~\ref{sec:implementation} describes a prototype implementation,
and its capabilities are demonstrated in Section~\ref{sec:evaluation}.
Related work are discussed in Section~\ref{sec:related_work}.
We provide our concluding remarks in Section~\ref{sec:conclusion}.
Something went wrong with that request. Please try again.