updated system_implementation.tex

1 parent 4c06116 commit 66e8613eda16529ba56d6ece4ac726aa22f7e00c Shinpei Kato committed Oct 26, 2012
Showing with 31 additions and 10 deletions.
  1. BIN draft/draft.pdf
  2. +3 −1 draft/draft.tex
  3. +17 −9 draft/introduction.tex
  4. +11 −0 draft/related_work.tex
Binary file not shown.
@@ -116,13 +116,15 @@
-%\keywords{GPGPU, Zero-Copy I/O, Fusion and Plasma, CPS}
+\keywords{GPGPU, Zero-Copy I/O, Fusion and Plasma, CPS}
@@ -15,13 +15,6 @@ \section{Introduction}
In this paper, we tackle this problem with a specific example of plasma
- \centering
- \includegraphics[width=0.8\hsize]{eps/tokamak.eps}
- \caption{The HBT-EP ``Tokamak'' at Columbia University.}
- \label{fig:tokamak}
Plasma control for fusion is an application of energy CPS, where
complex algorithms must be computed at a very high rate.
Figure~\ref{fig:tokamak} shows the HBT-EP Tokamak at Columbia
@@ -44,6 +37,13 @@ \section{Introduction}
This is a significant problem not only for plasma control but also for
any applications of CPS that are augmented with compute devices.
+ \centering
+ \includegraphics[width=0.78\hsize]{eps/tokamak.eps}
+ \caption{The HBT-EP ``Tokamak'' at Columbia University.}
+ \label{fig:tokamak}
In order to utilize the GPU for applications of CPS, the system is
required to support a method of bypassing data transfer between the CPU
and the GPU, instead connecting the GPU and I/O devices directly.
@@ -85,5 +85,13 @@ \section{Introduction}
The rest of this paper is organized as follows.
Section~\ref{sec:system_model} describes the system model and
assumptions behind this paper.
-Section~\ref{sec:io_processing} presents our zero-copy I/O processing
scheme, and differentiates it from the existing schemes.
+Section~\ref{sec:io_processing} proposes our zero-copy I/O processing
+scheme, and differentiates it from the existing schemes.
+Section~\ref{sec:implementation} presents details of system implementation.
+In Section~\ref{sec:case_study}, a case study of plasma control is
+provided to demonstrate the real-world impact of our contribution.
+Microbenchmarks are also used to evaluate more generic properties of the
+I/O processing schemes in Section~\ref{sec:benchmarking}.
+Section~\ref{sec:related_work} introduces related work, and this paper
+concludes in Section~\ref{sec:conclusion}.
@@ -0,0 +1,11 @@
+\section{Related Work}
+NVIDIA's \emph{Fermi} architecture~\cite{Fermi} supports unified virtual addressing (UVA), which creates a single address space for host memory and multiple GPUs. This allows the {\tt cuMemcpyPeer()} function to copy data directly between GPUs over the PCIe bus without involving the host CPU or memory. It is, however, restricted to transfers between NVIDIA GPUs and does not extend to arbitrary I/O devices.
+CGCM~\cite{Jablin:2011:ACC:1993316.1993516} is an automatic system for managing and optimizing CPU-GPU communication. It combines compiler modifications with a runtime library to manipulate the timing of events in a way that reduces transfer overhead. By analyzing source code, it identifies operations that can be temporally promoted while still preserving dependencies. This is made possible in part by the ability to concurrently copy data and execute kernel functions.
+PTask~\cite{Rossbach_SOSP11} provides GPU programming abstractions that are supported by the OS. One aspect of the project is the use of a data-flow programming model to minimize data communication between the host and the GPU.
+High-priority compute tasks may suffer long blocking times due to data transfers. RGEM~\cite{Kato_RTSS11} aims to bound these blocking times by dividing data into chunks. This effectively creates preemption points that allow finer-grained scheduling of GPU tasks and fully exploit the ability to concurrently copy data and execute code.
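
The chunking idea described for RGEM above can be illustrated with a small back-of-the-envelope model. The sketch below is not RGEM's implementation. It assumes a copy cannot be interrupted once it has started, so a newly arrived high-priority task waits at most for the largest chunk in flight. The function names, chunk sizes, and bandwidth figure are all hypothetical.

```python
# Illustrative sketch only: models chunked host-to-GPU copies as plain arithmetic.
# All names and numbers here are hypothetical and are not taken from RGEM.


def split_into_chunks(total_bytes, chunk_bytes):
    """Return the sizes of the chunks that a transfer of total_bytes is split into."""
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes must be > 0")
    full, rest = divmod(total_bytes, chunk_bytes)
    # The last chunk carries whatever does not fill a whole chunk.
    return [chunk_bytes] * full + ([rest] if rest else [])


def worst_case_blocking(total_bytes, chunk_bytes, bytes_per_second):
    """Longest wait (in seconds) imposed on a newly arrived high-priority task.

    Assumes a copy is non-preemptible once started, so the wait is bounded by
    the largest single chunk rather than by the whole transfer.
    """
    chunks = split_into_chunks(total_bytes, chunk_bytes)
    return max(chunks, default=0) / bytes_per_second


if __name__ == "__main__":
    MIB = 2 ** 20
    bandwidth = 2 ** 30  # assumed 1 GiB/s link, purely for illustration
    print(worst_case_blocking(64 * MIB, 64 * MIB, bandwidth))  # one monolithic copy
    print(worst_case_blocking(64 * MIB, 1 * MIB, bandwidth))   # 1 MiB chunks
```

Under these assumptions, smaller chunks tighten the blocking bound. The model ignores per-transfer setup cost, which a real system would have to weigh against that benefit.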
