
refactored Section 4-6

Shinpei Kato committed Sep 20, 2012 (commit a19450010299c0c8e62532b99eb7c69707a3c24c, 1 parent 4c199e3)
Showing with 155 additions and 207 deletions.
  1. +2 −2 Sec1_Introduction.tex
  2. +1 −1 Sec2_Technology.tex
  3. +79 −50 Sec4_Evaluation.tex
  4. +8 −4 Sec5_Related_Works.tex
  5. +28 −16 Sec6_Conclusion.tex
  6. BIN farm.pdf
  7. +1 −1 farm.tex
  8. +36 −133 refer.bib
4 Sec1_Introduction.tex
@@ -12,7 +12,7 @@ \section{Introduction}\label{sec:intro}
cores on a chip, which is nearly equivalent to 19 times that of
traditional microprocessors, such as the Intel Core i7 series.
Such a rapid growth of GPUs is due to recent advances in programming
-support, such as CUDA\cite{cuda} and OpenCL\cite{opencl}, for
+support, such as CUDA\cite{CUDA} and OpenCL\cite{OPENCL}, for
general-purpose computing on GPUs, also known as GPGPU.
@@ -89,6 +89,6 @@ \section{Introduction}\label{sec:intro}
Section~\ref{sec:tech} describes the design and implementation of our
compiler and debugging environment for NVIDIA's GPU microcontrollers,
and Section~\ref{sec:evaluation} evaluates its basic performance.
-Related work are discussed in Section~\ref{sec:related}.
+%Related work is discussed in Section~\ref{sec:related}.
This paper is concluded in Section~\ref{sec:con}.
2 Sec2_Technology.tex
@@ -44,7 +44,7 @@ \subsection{LLVM Infrastructure}
The LLVM (Low Level Virtual Machine) project is a collection of
open-source modular and reusable compiler tool sets.
-Since the microcontroller has its own instrunction set architecture, we
+Since the microcontroller has its own instruction set architecture, we
develop an architecture-dependent backend of LLVM so that we can make
use of all the front-end modules of LLVM.
129 Sec4_Evaluation.tex
@@ -1,72 +1,101 @@
%!TEX root = farm.tex
-The evaluation compare the performance of the NVIDIA's standard firmware and we developed firmware used FARM.
-Table \ref{tab:environment} shows the evaluation environment.
-This evaluation measure the overhead of measuring the execution time in the NVIDIA's firmware and the we developed firmware on Gdev\cite{kato:gdev}\cite{kato:gdev2} of GPGPU runtime and resource management engine set.
-This execution time is the time to copy of the data into device, the process execution,
-the copy of the data into host.
-Measurement results were concentrated in the following over 7msec and less 2msec.
-This phenomenon occurred both firmware.
-That because GPU's program are many things influence compared to the CPU Program.
-Specifically, these are device memory, device driver, GPU cache and GPU memory.
-Thus we divide over 7 msec are ``case A'', and less 2 msec are ``case B''.
-We compare the average each case A and case B.
-Figure \ref{fig:goodcase} shows result of case A.
-Also figure \ref{fig:badcase} shows result of case B.
-The abscissa axis is Sample program name.
-The vertical axis is Execution time (msec).
-The blue is the NVIDIA's standard firmware, also the red is the we developed firmware.
-Figure \ref{fig:goodcase}, \ref{fig:badcase} can be as seen almost no overheads.
-It is the largest overhead, the case A is 0.003msec of madd, it was 2.31\%.
-also it was the lowest overhead, the case B is -0,002msec, it was -1.74\%.
-In this way, result has been increased execution time and decreased execution time.
-Thus, it is within the range of error at execution time from a range of numbers that was measured is wide.
-In addition, if you want to use the GPU application, there will be less affected because the processor core for processing time increases.
-For example, The total time of madd in NVIDIA'S standard firmware is 21.842msec, this total time is between finished program from the start program by host.
-The execution time overhead occupy relatively small 0.01\% of the total time
-Thus, the overhead of firmware developed by our development environment is within the allowable range.
-It follows from what has been said that developing firmware by this our development environment is a valid one.
-The total time includes the time required to generate GPU context, Memory allocation and Memory release time.
-madd by the running NVIDIA's standard firmware
-The total time is 214 msec at the madd first execution times by the NVIDIA standard firmware.
-The second total time is 20 msec, this result is a big gap to the first execution time.
-Further, the our developing firmware get the same results to NVIDIA standard firmware.
-Because the firmware generate the GPU context when the first run, and then, the firmware secondly running use GPU context generated by the first run.
-Thus, it takes a long time to generate the GPU context.
-This problem findings obtained by firmware development and evaluate.
+We now evaluate the basic performance of the firmware developed using
+our compiler and debugging environment.
+We also discuss what we have found in our experiments.
+\subsection{Performance Evaluation}
- \caption{Evaluate Environment}
+ \caption{Experimental setup.}
\hbox to\hsize{\hfil
  CPU & Intel Core i7 2600 \\\hline
  GPU & NVIDIA GeForce GTX 480 \\\hline
  Memory & 8 GB \\\hline
Kernel & Linux\_64 \\\hline
- Device driver & PSCNV \\\hline
+ Device driver & Gdev~\cite{Kato_ATC12} \\\hline
+We evaluate the performance of our firmware as compared with NVIDIA's
+proprietary firmware blob.
+Table~\ref{tab:environment} shows our experimental setup.
+For a fair comparison, we use Gdev~\cite{Kato_ATC12} as the underlying GPU
+device driver and runtime, given that our firmware is available only for
+the open-source environment.
+The performance metric of our experiments is the total execution time,
+including GPU executions and data transfers between the host and the
-\caption{Execution Time of Gdev Sample Program : Case A}
+ \begin{center}
+ \hfil
+ \includegraphics[width=8cm]{./img/good_case.pdf}
+ \end{center}
+ \caption{Execution time of microbenchmark programs: Case (i).}
+ \label{fig:goodcase}
-\caption{Execution Time of Gdev Sample Program : Case B}
+ \begin{center}
+ \includegraphics[width=8cm]{./img/bad_case.pdf}
+ \end{center}
+ \caption{Execution time of microbenchmark programs: Case (ii).}
+ \label{fig:badcase}
+To average out variations in the execution time, we run each test program 100
+We observe that our firmware and NVIDIA's firmware exhibit
+similar behavior: the execution time is mostly either less than 2~ms or
+more than 7~ms.
+We hence categorize the results into two cases: (i) less than 2~ms and
+(ii) more than 7~ms.
+Figures~\ref{fig:goodcase} and \ref{fig:badcase} depict the experimental
+results of these two cases, respectively, where the x-axis lists our
+microbenchmark programs and the y-axis shows their execution time in ms.
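The two-case classification above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the scripts used in the experiments: the 2 ms and 7 ms bounds come from the text, while the type and function names are hypothetical, and samples that fall between the two bounds are assumed to be counted separately rather than folded into either average.

```c
#include <assert.h>
#include <stddef.h>

/* Cluster boundaries taken from the text (milliseconds). */
#define CASE_I_MAX_MS  2.0   /* Case (i): below this bound  */
#define CASE_II_MIN_MS 7.0   /* Case (ii): above this bound */

/* Per-case sums and counts over a set of timing samples (hypothetical type). */
struct case_stats {
    double sum_i, sum_ii;   /* sums of samples in each case (ms) */
    size_t n_i, n_ii;       /* number of samples in each case    */
    size_t n_other;         /* samples between the two bounds    */
};

/* Classify each sample into Case (i), Case (ii), or neither. */
static struct case_stats classify_samples(const double *ms, size_t n)
{
    struct case_stats s = {0};
    assert(ms != NULL || n == 0);
    for (size_t k = 0; k < n; k++) {
        if (ms[k] < CASE_I_MAX_MS) {
            s.sum_i += ms[k];
            s.n_i++;
        } else if (ms[k] > CASE_II_MIN_MS) {
            s.sum_ii += ms[k];
            s.n_ii++;
        } else {
            s.n_other++;    /* counted, but excluded from both averages */
        }
    }
    return s;
}

/* Average of one case; returns 0.0 if the case has no samples. */
static double case_mean(double sum, size_t n)
{
    return n ? sum / (double)n : 0.0;
}
```

For example, the hypothetical samples {1.2, 1.4, 7.5, 8.1, 1.3, 4.0} yield three Case (i) samples averaging 1.3 ms, two Case (ii) samples averaging 7.8 ms, and one sample counted in neither case.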
+As can be seen from the experimental results, the performance difference
+between our firmware and NVIDIA's firmware is very small in Case (i).
+The largest difference is 0.003~ms in the madd program, which is
+equivalent to 2.31\% of its execution time.
+In Case (ii), on the other hand, our firmware even outperforms
+NVIDIA's firmware by up to 0.002~ms, i.e., 1.74\% of the execution time.
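Assuming the percentages above are computed relative to the execution time under NVIDIA's firmware (the text does not state the baseline explicitly), the relative overhead can be expressed as the small C helper below; the name relative_overhead_pct is hypothetical. A negative value means that our firmware finished earlier than NVIDIA's firmware.

```c
#include <assert.h>

/* Relative overhead (%) of our firmware against the baseline (assumed to be
 * NVIDIA's firmware), both times in ms. Negative: ours finished earlier. */
static double relative_overhead_pct(double t_ours_ms, double t_nvidia_ms)
{
    assert(t_nvidia_ms > 0.0);   /* the baseline must be a positive time */
    return (t_ours_ms - t_nvidia_ms) / t_nvidia_ms * 100.0;
}
```

With hypothetical times, relative_overhead_pct(1.03, 1.00) gives about 3\%, and relative_overhead_pct(0.98, 1.00) gives about -2\%.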
+Such performance differences are negligible for the following two
+reasons.
+ \item The observed performance difference is much smaller than the
+ error margin of the measurements.
+ \item The total execution time is dominated by GPU and host-side
+ executions rather than by firmware execution.
+ For instance, the total execution time of the madd program under
+ NVIDIA's firmware is 21.842~ms, which is mostly spent in host-side
+ execution, whereas the firmware execution time accounts for no more
+ than 0.01\% of the total execution time.
+The above experimental results imply that our compiler and debugging
+environment for NVIDIA's GPU microcontrollers is reliable in
+Given that firmware developers can use the C language rather than
+hand-written assembly code, we believe that the contribution of this
+paper is significant in this line of work.
+Through the experiment, we observed that the execution time of the madd
+program was 214~ms in the first trial, whereas it was around 20~ms from
+the second trial onward.
+We found that this large gap arises because the firmware has to
+generate the GPU context at the first run, while it can reuse the same
+context information from the second run onward.
+This gap thus reflects the cost of generating the GPU context.
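The first-run behavior described above follows the familiar lazy-initialization pattern: the costly setup runs once, and later runs reuse its result. The C sketch below illustrates only that pattern; it is not the firmware's or Gdev's actual code, and struct gpu_ctx, create_context(), and get_context() are hypothetical names. The counter makes the "create once, reuse afterwards" property easy to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GPU context state. */
struct gpu_ctx {
    bool initialized;
    unsigned create_count;  /* how many times the costly path ran */
};

/* Placeholder for the expensive setup performed on the first run;
 * here it only records that it ran. */
static void create_context(struct gpu_ctx *ctx)
{
    ctx->create_count++;
    ctx->initialized = true;
}

/* Returns the context, creating it only on the first call;
 * subsequent calls reuse the existing context. */
static struct gpu_ctx *get_context(struct gpu_ctx *ctx)
{
    assert(ctx != NULL);
    if (!ctx->initialized)
        create_context(ctx);
    return ctx;
}
```

Calling get_context() three times on a zero-initialized struct gpu_ctx runs create_context() only once, mirroring the gap between the first trial and the later ones.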
+Without self-firmware development, we would never have these
12 Sec5_Related_Works.tex
@@ -1,10 +1,14 @@
%!TEX root = farm.tex
\section{Related Work}\label{sec:related}
-This Section introduce related work and draw a comparison between our works and related work.
-\subsection{Helios: Heterogeneous Multiprocessing with Satellite Kernels}
-A study on Helios\cite{NightingaleEB:SOSP09:2009} was made by Edmoud at the Microsoft Research, and it revealed that is an operating system designed to simplify the task of writing, deploying, and tuning applications for heterogeneous platforms.
-There says required the using programmable devices such as GPU and NIC for the high-performance vector processing and high-speed communications.
+We now briefly discuss related work and draw a comparison with our
+Helios~\cite{Nightingale_SOSP09} is an operating system component
+designed to simplify the task of writing, deploying, and tuning
+applications for heterogeneous platforms.
+The authors argue that programmable devices such as GPUs and NICs are needed for high-performance vector processing and high-speed communications.
When GPUs and NICs are leveraged only through device drivers, the amount of data that can be transferred is limited by the communication between the CPU and the device.
Furthermore, device drivers are complex and do not provide an interface for running tasks on the device.
Helios is proposed as an operating system to address these problems.
44 Sec6_Conclusion.tex
@@ -1,19 +1,31 @@
%!TEX root = farm.tex
-It has been presented FARM, a new GPU firmware development environment.
-We advance the approaches by firmware for this problem,
-however, we pointed the problem of the productivity on firmware development.
-Then, we proposed the implementation of GPU microcontroller firmware development environment to solve.
-Then, we evaluated the overhead to run the Gdev sample program on developing firmware by the our development environment.
-The results of evaluated were less than 2.31\%, and result looked cases that were completed earlier than NVIDIA standard firmware. Further we confirmed the within an acceptable range of the application was less affected by microcotroller overheads.
-Finally, for reasons mentioned above, our development environment is a valid one.
-Further more, we found the overhead of generate GPU context.
-Our development environment is all open-source, and can be download from our web site \cite{yukke:farm},\cite{yukke:nvfc},\cite{kato:gdev}.
-Our future, we pursue a new direction for GPU resource management.
-In particular, we think the CPU load reduce by we shifts firmware on the microcontroller from the device driver works.
-Further, the scheduling of GPU processing can not be preempted because scheduling performed in the device driver and the runtime engine.
-However, the microcontroller has scheduling, it can be preempt, Available resource effectively of the GPU.
-On the other hand, we expect getting improvement result in Section \ref{sec:evaluation} of the overhead of generate the GPU context.
+In this paper, we have presented a new compiler and debugging
+environment for NVIDIA's GPU microcontrollers.
+As a basis of future work toward fine-grained GPU resource management,
+we developed new firmware for those microcontrollers using our
+development environment.
+We executed several microbenchmark programs to demonstrate that the
+overhead introduced by our firmware was no greater than 2.31\% of the
+total execution time, as compared to NVIDIA's proprietary firmware blob,
+while our firmware even outperformed NVIDIA's firmware depending on test
+One of the interesting findings obtained through the experiments was the
+overhead of generating the GPU context, which must be minimized and
+bounded in real-time systems.
+Our development environment is entirely open-source and may be downloaded
+from our web sites~\cite{GIT_GDEV, GIT_FARM, GIT_NVFC}.
+In future work, we will pursue a new direction of GPU resource management
+using microcontrollers.
+First, the CPU load could be reduced by offloading GPU resource
+management functions onto the GPU microcontroller.
+This idea is in fact inspired by the Helios
+project~\cite{Nightingale_SOSP09}, where networking resource management
+functions are offloaded onto the NIC microcontroller.
+Preemption support and power management for the GPU could also be
+achieved by extending the firmware, as discussed in \cite{Kato_OSPERT11}.
+We believe that such a fine-grained GPU resource management approach is
+significant for real-time systems augmented with the GPU.
BIN farm.pdf
Binary file not shown.
2 farm.tex
@@ -380,7 +380,7 @@
% Chapter4
% Chapter5
% Chapter6
169 refer.bib
@@ -1,29 +1,3 @@
-@TECHREPORT { weko_81351_1,
- author = "Min Si and Yutaka Ishikawa",
- institution = "CASS2012 in conjunction with IPDPS2012",
- mark = "ON",
- modified = "2012-09-09 20:19:16 +0000",
- number = 16,
- title = "Design of Direct Communication Facility for Manycore-based Accelerators",
- year = 2012
-@MISC { xorg,
- author = " Foundation",
- howpublished = "\url{}",
- mark = "ON",
- modified = "2012-06-05 17:19:05 +0000",
- title = ""
-@MISC { freedesktop,
- author = "",
- howpublished = "\url{}",
- mark = "ON",
- modified = "2012-06-05 17:19:05 +0000",
- title = ""
@MISC { nouveau,
author = " and X.Org Foundation",
howpublished = "\url{}",
@@ -32,38 +6,12 @@ @MISC { nouveau
title = "Nouveau"
-@ARTICLE { shimosawa2010inter,
- author = "Taku Shimosawa and Yutaka Ishikawa",
- journal = "Information and Media Technologies",
- mark = "ON",
- number = 1,
- pages = "13--31",
- publisher = "J-STAGE",
- title = "Inter-kernel Communication between Multiple Kernels on Multicore Machines",
- volume = 5,
- year = 2010
-@INPROCEEDINGS { bautin2008graphic,
- author = "Bautin, M. and Dwarakinath, A. and Chiueh, T.",
- booktitle = "Proceedings of SPIE",
- mark = "ON",
- pages = "68180O",
- title = "Graphic engine resource management",
- volume = 6818,
- year = 2008
-@ARTICLE { kato2011operating,
- author = "Shinpei Kato and Brandt, S. and Yutaka Ishikawa and Rajkumar, R. R.",
- citeulike-article-id = 9989710,
- journal = "OSPERT 2011",
- mark = "ON",
- pages = 21,
- posted-at = "2011-11-04 21:18:27",
- priority = 2,
- title = "{Operating Systems Challenges for GPU Resource Management}",
- year = 2011
+author = {S. Kato and S. Brandt and Y. Ishikawa and R. Rajkumar},
+booktitle = {Proc. of the International Workshop on Operating Systems Platforms for Embedded Real-Time Applications},
+pages = {21--30},
+title = {{Operating Systems Challenges for GPU Resource Management}},
+year = 2011
@@ -156,14 +104,6 @@ @article{Ferreira_JRTIP11
-@MISC { pathscale:enzo,
- author = "PathScale",
- howpublished = "\url{}",
- mark = "ON",
- modified = "2012-06-05 17:19:05 +0000",
- title = "ENZO"
@MISC { nvidia:linux:driver,
author = "NVIDIA",
howpublished = "\url{}",
@@ -186,14 +126,6 @@ @MISC { envytools
title = {{Envytools}}
-@MISC { pathscale:pscnv,
- author = "PathScale",
- howpublished = "\url{}",
- mark = "ON",
- modified = "2012-06-05 17:19:31 +0000",
- title = "PSCNV GPU Device Driver"
@MASTERSTHESIS { llvm:start,
address = "Urbana, IL",
author = "Chris Lattner",
@@ -205,73 +137,44 @@ @MASTERSTHESIS { llvm:start
year = 2002
- author = "Ashok Dwarakinath",
- mark = "ON",
- modified = "2012-06-12 14:26:47 +0000",
- school = "{Computer Science , Stony Brook University}",
- title = "A Fair-Share Scheduler for the Graphics Processing Unit",
- year = 2008
+@inproceedings {Nightingale_SOSP09,
+author = {E. B. Nightingale and O. Hodson and R. McIlroy and C. Hawblitzel and G. C. Hunt},
+booktitle = {ACM Symposium on Operating Systems Principles},
+pages = {221--234},
+title = {{Helios: heterogeneous multiprocessing with satellite kernels}},
+year = {2009}
-@MISC { kato:gdev2,
- author = "Shinpei Kato",
- howpublished = "\url{}",
- mark = "ON",
- modified = "2012-06-20 23:38:56 +0000",
- title = "Gdev Project.",
- year = 2012
-@INPROCEEDINGS { kato:gdev,
- author = "Shinpei Kato and McThrow, M. and Maltzahn, C. and Brandt, S.",
- booktitle = "USENIX ATC",
- mark = "ON",
- modified = "2012-06-20 23:36:02 +0000",
- title = "Gdev: First-class GPU resource management in the operating system",
- volume = 12,
- year = 2012
-@INPROCEEDINGS { NightingaleEB:SOSP09:2009,
- author = "Edmund B. Nightingale and Orion Hodson and Ross McIlroy and Chris Hawblitzel and Galen C. Hunt",
- booktitle = "SOSP'09",
- mark = "ON",
- modified = "2012-07-05 03:51:55 +0000",
- pages = "221 - 234",
- title = "Helios: heterogeneous multiprocessing with satellite kernels",
- year = 2009
+@misc {GIT_GDEV,
+author = {S. Kato},
+howpublished = {\url{}},
+title = {{Gdev Project}},
+year = {2012}
-@MISC { yukke:farm,
- author = "Yusuke Fujii and Takuya Azumi and Shinpei Kato",
- howpublished = "\url{}",
- mark = "ON",
- modified = "2012-07-01 04:57:05 +0000",
- title = "Farm Project.",
- year = 2012
+@misc {GIT_NVFC,
+author = {Y. Fujii and T. Azumi and S. Kato},
+howpublished = {\url{}},
+title = {{NVIDIA Firmware Compiler Project}},
+year = {2012}
-@MISC { cuda,
- author = {NVIDIA},
- howpublished = {\url{}},
- mark = {ON},
- title = {{CUDA}}
+@misc {GIT_FARM,
+author = {Y. Fujii and T. Azumi and S. Kato},
+howpublished = {\url{}},
+title = {{FARM Project}},
+year = {2012}
-@MISC { opencl,
- author = {KHRONOS},
- howpublished = {\url{}},
- mark = {ON},
- title = {{OpenCL - The open standard for parallel programming of heterogeneous systems}}
+@misc {CUDA,
+author = {NVIDIA},
+howpublished = {\url{}},
+title = {{CUDA}}
-@MISC { yukke:nvfc,
- author = "Yusuke Fujii and Takuya Azumi and Shinpei Kato",
- howpublished = "\url{}",
- mark = "ON",
- modified = "2012-09-11 07:04:25 +0000",
- title = "Nvidia Firmware Compiler",
- year = 2012
+@misc {OPENCL,
+author = {KHRONOS},
+howpublished = {\url{}},
+title = {{OpenCL - The open standard for parallel programming of heterogeneous systems}}
