
init with Abstract, Introduction, and Platform Technology

Shinpei Kato committed Sep 17, 2012
1 parent 206301d commit d1e4c8c0eee4d4746e30bd8b469907e109fe3afc
2,425 IEEEtran.bst

Large diffs are not rendered by default.

4,722 IEEEtran.cls

Large diffs are not rendered by default.

@@ -0,0 +1,12 @@
+TARGET = farm
+TEX = latex
+
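+# Build flow: run latex, resolve citations with bibtex, run latex twice more
+# to settle cross-references, then convert the resulting DVI to PDF.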
+all:
+ $(TEX) $(TARGET)
+ bibtex $(TARGET)
+ $(TEX) $(TARGET)
+ $(TEX) $(TARGET)
+ dvipdfm $(TARGET).dvi
+
+clean:
+ rm -fr *~ *.aux *.ps *.pdf *.dvi *.log *.bbl *.blg *.ent
@@ -0,0 +1,94 @@
+%!TEX root = farm.tex
+
+\section{Introduction}\label{sec:intro}
+% no \IEEEPARstart
+
+Graphics processing units (GPUs) are becoming more and more commonplace
+to support compute-intensive and data-parallel computing.
+In many application domains, GPU-accelerated systems provide significant
+performance gains over traditional multi-core CPU-based systems.
+As shown in Table~\ref{tab:cpu-gpu}, state-of-the-art GPUs integrate more
+than 1,500 cores on a chip and exceed 3,000 GFLOPS of peak performance,
+which is roughly 19 times that of traditional microprocessors such as the
+Intel Core i7 series.
+Such rapid growth of GPUs is due to recent advances in programming
+support, such as CUDA~\cite{cuda} and OpenCL~\cite{opencl}, for
+general-purpose computing on GPUs, also known as GPGPU.
+
+\begin{table*}[tb]
+ \caption{Comparison of the Intel CPU Architectures and the NVIDIA GPU
+ Architectures}
+ \label{tab:cpu-gpu}
+ \begin{center}
+ \hbox to\hsize{\hfil
+ \begin{tabular}{|c|c|c|c|c|c|}\hline
+ & Core i7 980XE & Core i7 3960X & GeForce GTX285 & GeForce GTX480 &
+ GeForce GTX680 \\ \hline
+ \# of processing cores & 6 & 6 & 240 & 480 & 1536 \\ \hline
+ Single-precision performance (GFLOPS) & 108.0 & 158.4 & 933.0 & 1350.0
+ & 3090.0 \\ \hline
+ Memory bandwidth (GB/sec) & 37.55 & 51.2 & 159.0 & 177.0 & 192.2 \\ \hline
+ Power consumption (W) & 130 & 278 & 183 & 250 & 195 \\ \hline
+ Release date & 2010/03 & 2011/11 & 2009/01 & 2010/04 & 2012/03 \\ \hline
+ \end{tabular}\hfil}
+ \end{center}
+\end{table*}
+
+\par
+In recent years, real-time systems have been augmented with
+the GPU~\cite{Kato_ATC11, Kato_RTSS11, Kato_RTAS11, Basaran_ECRTS12,
+Elliott_ECRTS12, Elliott_RTS12}.
+The motivation for using the GPU in real-time systems is mainly found in
+emerging applications of cyber-physical systems~\cite{Aumiller_CPSNA12,
+McNaughton_ICRA11, Ferreira_JRTIP11}, where a large
+amount of data acquired from the physical world needs to be processed in
+real time.
+Given that the workload of such applications is highly compute-intensive and
+data-parallel, many-core computing on the GPU is best suited to meet the
+real-fast requirements of computation.
+What is challenging in this line of work is to control the GPU under
+real-time constraints.
+The GPU is a coprocessor independent of the CPU, and hence two different
+pieces of code are running concurrently on the GPU and the CPU, respectively.
+This heterogeneity poses a core challenge in resource management.
+Since the GPU is designed to accelerate particular workloads, resource
+management functions are often performed on the CPU.
+In other words, the GPU and the CPU must be synchronized in some way to
+ensure timeliness.
+Unfortunately, this can be a major source of latency that makes
+real-time systems unpredictable~\cite{Kato_ATC11}, though previous
+work has been forced to take this approach due to a lack of functionality
+that allows resource management functions to be offloaded onto the GPU.
+While the compute cores or shaders of the GPU are not available to
+perform resource management, recent GPUs integrate on-chip
+microcontrollers on which firmware code is launched to control the
+functional units of the GPU.
+These microcontrollers are well suited to extending GPU resource
+management: special pieces of firmware code can be launched on them to
+control GPU execution and data transfers.
+
+\par
+This paper presents a compiler and debugging environment for NVIDIA's
+GPU microcontrollers based on the well-known portable LLVM compiler
+infrastructure.
+The main purpose of this environment is to enhance the productivity of
+GPU firmware development so that the community can facilitate future
+research on fine-grained GPU resource management using microcontrollers.
+The firmware is self-contained within the GPU: once it is uploaded by
+the device driver, it is free from interference from background jobs
+running on the CPU.
+Therefore, we believe that GPU computing would be more timely and
+reliable for real-time systems if the firmware could support GPU
+resource management by itself.
+In this paper, we develop an initial version of the firmware, and
+evaluate its basic performance.
+
+\par
+The rest of this paper is organized as follows.
+Section~\ref{sec:tech} introduces the underlying platform technology.
+Section~\ref{sec:design} describes the design and implementation of our
+compiler and debugging environment for NVIDIA's GPU microcontrollers,
+and Section~\ref{sec:evaluation} evaluates its basic performance.
+Related work is discussed in Section~\ref{sec:related}.
+This paper is concluded in Section~\ref{sec:con}.
+\stepcounter{footnote}
@@ -0,0 +1,101 @@
+%!TEX root = farm.tex
+
+\section{Platform Technology}\label{sec:tech}
+
+First of all, we describe the platform technology underlying our
+development.
+We focus intensively on NVIDIA's GPU architectures, although the idea of
+integrating GPU resource management into on-chip microcontrollers is not
+limited to these specific architectures.
+All pieces of technology presented herein are open-source and may be
+downloaded from their corresponding websites.
+
+\subsection{Assembler for GPU Microcontrollers}\label{sec:envy}
+
+The assembler is included in the envytools suite~\cite{envytools}.
+The envytools suite is a rich set of open-source tools to compile or
+decompile GPU shader code, firmware code, macro code, and so on.
+It is also used to generate header files of GPU command definitions used
+by the device driver and the runtime library.
+Many other useful tools and documentation for NVIDIA's GPU
+architectures are also included in the envytools suite.
+
+\subsection{GPU Device Driver}\label{sec:driver}
+
+In general, the application programming interface (API) for the GPU is
+provided by the runtime library.
+GPU resource management, on the other hand, is often supported by the
+device driver and the operating system (OS) module~\cite{Kato_ATC11,
+Kato_ATC12, Bautin_MCNC08}.
+As part of resource management, the device driver communicates with
+microcontrollers integrated on the GPU.
+The communication is typically managed by specific commands, which can
+be handled by firmware running on each microcontroller.
+
+\par
+The firmware is built into the device driver in the form of byte code,
+and is uploaded onto the GPU at boot time.
+This requires open-source software, because we have to build the
+firmware into the device driver.
+In this paper, we use Gdev~\cite{Kato_ATC12}, an open-source module of
+the GPGPU device driver and runtime library.
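+
+As a minimal sketch, the firmware image built into the driver can be
+thought of as a static byte array that the driver hands to the GPU at
+boot time.
+The byte values and helper functions below are hypothetical and only
+illustrate the idea; the actual upload path in Gdev differs in detail.
+\begin{verbatim}
+#include <stdint.h>
+#include <stddef.h>
+
+/* Assumed helpers provided by the driver; names are illustrative.  */
+extern void write_ucode(const uint8_t *image, size_t len);
+extern void start_microcontroller(void);
+
+/* Byte code emitted by the compiler (contents are placeholders).   */
+static const uint8_t hub_firmware[] = { 0x00, 0x00, 0x00, 0x00 };
+
+/* At boot, the driver copies the image into the microcontroller's
+ * code segment and then starts it running.                         */
+void upload_firmware(void)
+{
+    write_ucode(hub_firmware, sizeof(hub_firmware));
+    start_microcontroller();
+}
+\end{verbatim}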
+
+\subsection{LLVM Infrastructure}
+
+The LLVM (Low Level Virtual Machine) project is a collection of
+open-source, modular, and reusable compiler tool sets.
+Since the microcontroller has its own instruction set architecture, we
+develop an architecture-dependent backend for LLVM so that we can make
+use of all the frontend modules of LLVM.
+
+Figure~\ref{fig:llvm} illustrates the structure of LLVM.
+The frontend first generates LLVM IR (Intermediate Representation) from
+the source code.
+The LLVM backend then compiles this IR code into assembly code, which is
+finally translated into object code for the target machine.
+
+\begin{figure}
+\begin{center}
+\includegraphics[scale = 0.5]{./img/llvmflow.pdf}
+\end{center}
+\caption{Compiling stages in LLVM.}
+\label{fig:llvm}
+\end{figure}
+
+\subsubsection{LLVM IR}
+
+The LLVM IR is the intermediate language used in LLVM, also called
+bitcode or the LLVM assembly language.
+This intermediate language is powerful, scalable, light-weight, and
+low-level enough to support many source languages on many
+architectures.
+The LLVM IR is in SSA (Static Single Assignment) form, which is well
+suited to many compiler optimization algorithms.
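+
+As a small illustration of SSA form, consider the following C function;
+the comments sketch how the compiler renames each assignment so that
+every value is defined exactly once (this example is ours, not taken
+from the LLVM documentation).
+\begin{verbatim}
+/* In SSA form, each assignment gets a fresh name, and values that
+ * merge at a join point are combined with a phi node.             */
+int clamp(int x)
+{
+    int y = x * 2;      /* SSA: y1 = mul x, 2                      */
+    if (y > 255)
+        y = 255;        /* SSA: y2 = 255                           */
+    return y;           /* SSA: y3 = phi(y1, y2); ret y3           */
+}
+\end{verbatim}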
+
+\subsubsection{LLVM frontend}\label{set:clang}
+
+The LLVM frontend translates a high-level language into the LLVM IR.
+It is mainly responsible for code generation and its optimization.
+In particular, we use Clang for our development, which is an open-source
+compiler frontend for the C family of programming languages provided by
+the LLVM project.
+
+\subsubsection{LLVM backend}\label{set:backend}
+
+The LLVM backend generates target code from the LLVM IR.
+It features a target-independent code generator that can produce output
+for several types of target processors, including x86, PowerPC, ARM,
+and SPARC.
+This backend framework may also be used to generate code targeted at
+accelerators such as Cell B/E and GPUs.
+In fact, NVIDIA has recently announced that their CUDA compiler is
+based on LLVM.
+The backend tools include LLC (the LLVM static compiler) and LLI (the
+LLVM interpreter).
+LLI interprets the LLVM IR and is also available as a JIT compiler,
+while LLC statically compiles the LLVM IR into target code.
+We use this backend part of LLVM to generate code targeted at NVIDIA's
+GPU microcontrollers.
+
@@ -0,0 +1,161 @@
+%!TEX root = farm.tex
+
+\section{Compiler and Debugging Environment}\label{sec:design}
+
+This section describes the design and implementation of our compiler and
+debugging environment for NVIDIA's GPU microcontrollers.
+
+\subsection{GPU microcontroller}
+
+\begin{table}[tb]
+\caption{Microcontroller Specifications in GF100}
+\label{tab:fermi}
+\hbox to\hsize{\hfil
+\begin{tabular}{|l|r|r|}\hline
+Name & HUB & GPC\\\hline
+Architecture & Fermi & Fermi \\\hline
+Number & 1 & 4\\\hline
+Word size & 32 bit & 32 bit\\\hline
+Code size & 16,384 bytes & 8,192 bytes\\\hline
+Data size & 4,096 bytes & 2,048 bytes\\\hline
+\end{tabular}\hfil}
+\end{table}
+
+This research targets the microcontrollers of NVIDIA's Fermi architecture, such as GF100 (GeForce GTX480).
+In GF100, a Streaming Multiprocessor (SM) consists of 32 CUDA cores, a Graphics Processing Cluster (GPC) consists of 4 SMs, and the chip consists of 4 GPCs, so GF100 carries 512 CUDA cores in total. Since one full SM is disabled, the GeForce GTX480 has 480 usable CUDA cores.
+Since the maximum code size of a microcontroller is limited to 16KB, as indicated in Table \ref{tab:fermi}, developers must design the firmware carefully.
+
+\subsection{The compiler for GPU microcontrollers}
+The compiler generates object code for the GPU microcontrollers manufactured by NVIDIA.
+\subsubsection{The overall flow}\label{section:flow}
+\begin{figure*}
+\begin{center}
+\includegraphics[width=12cm]{./img/step_compiler.pdf}
+\end{center}
+\caption{Detail of Compiler for GPU Microcontroller}
+\label{fig:compiler}
+\end{figure*}
+
+
+The compiler for GPU microcontrollers is implemented using LLVM.
+Figure \ref{fig:compiler} shows an overall view of the compiler.
+The main flow of the compilation is as follows: Clang first generates LLVM IR from the C source code;
+LLC then generates assembly code from the LLVM IR.
+After that, the assembly code is divided into a code part and a data part, and the code part is combined with the bootstrap code.
+
+Finally, envyas generates an executable file.
+The executable file can be run using the debugging support tool or the device driver.
+With this development environment, developers only need to write the firmware code in the C language.
+
+\begin{description}
+\item[ (1) Clang]\mbox{}\\
+Clang is the LLVM frontend that generates LLVM IR from C source code.
+
+\begin{figure}
+\begin{center}
+\includegraphics[width=6cm]{./img/llc.pdf}
+\end{center}
+\caption{Steps of code generation in LLC}
+\label{fig:llc}
+\end{figure}
+
+\begin{figure*}
+\begin{center}
+\includegraphics[width=12cm]{./img/llvm_code.pdf}
+\end{center}
+\caption{C source code and generated code (left: C, center: LLVM IR, right: assembly).}
+\label{fig:llvm_code}
+\end{figure*}
+\item[ (2) LLC with nvfuc]\mbox{}\\
+As mentioned in Section \ref{set:backend}, LLC is the LLVM backend that compiles LLVM IR code into assembly language for a specified architecture.
+Figure \ref{fig:llc} shows the LLC flow.
+There are five steps to convert LLVM IR into target-specific assembly code: flow analysis, optimization, instruction selection, register allocation, and code generation.
+This flow is standardized and does not depend on the target machine.
+LLC reads the configuration of the target machine at instruction selection time and selects the instructions and registers that meet the specifications of each machine.
+A new target configuration called nvfuc (NVidia Firmware Compiler) for the GPU microcontrollers manufactured by NVIDIA is added to the target machine configurations.
+
+\item[ (3) LLVM to envyas]\mbox{}\\
+``LLVM to envyas'' divides the generated assembly code into a code section and a data section.
+It then combines the code section with the bootstrap code, which sets up an interrupt handler and calls the main function; a sketch of what such bootstrap code does is given after this list.
+Finally, it replaces the labels in the code section with data addresses.
+\item[ (4) envyas]\mbox{}\\
+As mentioned in Section \ref{sec:envy},
+``envyas'' is the assembler for the GPU microcontrollers and is included in the envytools suite.
+``envyas'' generates the executable file from the code section produced in Step (3).
+
+\item[ (5) hex to bin]\mbox{}\\
+``hex to bin'' converts the data portion split off in Step (3) into a binary file.
+
+\item[ (6) Running the firmware]\mbox{}\\
+There are two ways to run the firmware: incorporating it into the device driver, or using the debugging support tool.
+In either case, the binary file of the firmware is loaded at boot time.
+\end{description}
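+
+The following is a minimal sketch, written in C for readability, of what
+the bootstrap code combined in Step (3) conceptually does: install the
+interrupt handler and call the firmware's main function.
+The symbol and helper names are hypothetical; the actual bootstrap code
+is emitted as assembly by our toolchain.
+\begin{verbatim}
+/* Hypothetical bootstrap sketch (the real code is assembly).      */
+extern void ih(void);                        /* interrupt handler  */
+extern int  main(void);                      /* firmware entry     */
+extern void set_irq_vector(void (*h)(void)); /* assumed helper     */
+
+void _start(void)
+{
+    set_irq_vector(ih);   /* register the interrupt handler        */
+    main();               /* enter the developer-written firmware  */
+    for (;;)              /* never fall through into undefined code*/
+        ;
+}
+\end{verbatim}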
+
+\subsubsection{The generated code}
+Figure \ref{fig:llvm_code} shows an example of C source code, LLVM IR code, and assembly code.
+The left is the C source code, the center is the LLVM IR code generated in Step (1) of Section \ref{section:flow}, and the right is the assembly code generated in Step (3) of Section \ref{section:flow}.
+
+\begin{figure}
+\begin{center}
+\includegraphics[width=3cm]{./img/loader.pdf}
+\end{center}
+\caption{Flowchart of Debugging Support Tool}
+\label{fig:loader}
+\end{figure}
+
+\begin{figure*}
+\begin{center}
+\includegraphics[width=12cm]{./img/firmware.pdf}
+\end{center}
+\caption{Flowchart of Our Firmware }
+\label{fig:firmware}
+\end{figure*}
+
+
+\subsection{Debugging support tool}
+The debugging support tool loads the firmware, sends commands and data, and displays GPU register values.
+Figure \ref{fig:loader} shows the flow of this tool, and we describe its use below.
+The memory space of the microcontrollers is mapped into the CPU memory space via MMIO (Memory-Mapped I/O).
+
+\begin{description}
+\item[ (1) Load the firmware]\mbox{}\\
+The debugging support tool loads the HUB and GPC firmware executables to their mapped addresses via MMIO.
+After loading completes, the firmware starts running when a flag is set in the specified register (see the sketch after this list).
+\item[ (2) Send commands and data]\mbox{}\\
+Processing on the microcontroller is suspended until a command is received.
+The debugging support tool sends the command; the command raises an interrupt, and processing is resumed.
+
+\item[ (3) Display a register] \mbox{}\\
+The microcontroller has a register that may be used freely by the host side.
+The conventional firmware uses this register as an execution completion flag.
+We assume the register is used for the same purpose during debugging,
+and the tool displays its value.
+\end{description}
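+
+The following is a minimal sketch of how such a loader could look in C,
+assuming the microcontroller's code segment and control registers are
+accessible through an MMIO mapping.
+The register offsets and helper names are hypothetical placeholders, not
+the actual GPU register layout.
+\begin{verbatim}
+#include <stdint.h>
+#include <stddef.h>
+
+/* Hypothetical MMIO offsets (placeholders, not real hardware).      */
+#define UC_CODE_WINDOW   0x0000  /* window into the code segment     */
+#define UC_START_FLAG    0x0100  /* write 1 here to start firmware   */
+#define UC_STATUS_REG    0x0104  /* register the firmware may set    */
+
+/* Base of the MMIO mapping (set up elsewhere, e.g., by mmap).       */
+static volatile uint32_t *mmio;
+
+void load_firmware(const uint32_t *code, size_t words)
+{
+    for (size_t i = 0; i < words; i++)        /* copy code via MMIO  */
+        mmio[UC_CODE_WINDOW / 4 + i] = code[i];
+    mmio[UC_START_FLAG / 4] = 1;              /* set the start flag  */
+}
+
+uint32_t read_status(void)
+{
+    return mmio[UC_STATUS_REG / 4];           /* value to display    */
+}
+\end{verbatim}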
+
+
+\subsection{Firmware development}
+In this section, we describe the firmware we are developing for the HUB microcontroller.
+Figure \ref{fig:firmware} shows the flowchart of the firmware.
+The firmware is started by setting a value in the register.
+
+\begin{description}
+\item[ (1) initialize]\mbox{}\\
+When started, the firmware sets up the interrupt handler and fetches its data, and then proceeds to Step (2).
+\item[ (2) sleep]\mbox{}\\
+The firmware shifts to a standby state, waiting to receive a command from the device driver or the debugging support tool.
+When a command is received, an interrupt occurs and ``ihbody'' is started.
+
+\item[ (3) ihbody] \mbox{}\\
+``ihbody'' enqueues the command and then releases the firmware from its wait state.
+\item[ (4) work] \mbox{}\\
+The ``work'' function is called when the firmware is released from its wait state.
+``work'' dequeues the command and calls the corresponding function.
+After the function finishes, it checks the end flag of the firmware.
+\end{description}
+As this flow shows, the firmware is command-driven: it executes the function that corresponds to each received command, as sketched below.
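+
+The following is a minimal sketch, in C, of the structure described
+above: the interrupt handler enqueues incoming commands, and the main
+loop sleeps, dequeues, and dispatches them until the end flag is set.
+All function and variable names are hypothetical; the sketch only
+illustrates the control flow of our firmware, not its actual code.
+\begin{verbatim}
+#include <stdint.h>
+
+/* Assumed helpers: low-level primitives of the microcontroller.    */
+extern void     wait_for_interrupt(void); /* standby until interrupt */
+extern uint32_t read_command(void);       /* fetch received command  */
+extern void     dispatch(uint32_t cmd);   /* run matching function   */
+
+#define QLEN 16
+static volatile uint32_t queue[QLEN];
+static volatile unsigned head, tail;
+static volatile int      end_flag;
+
+/* (3) ihbody: enqueue the command and release the wait state.      */
+void ihbody(void)
+{
+    queue[tail++ % QLEN] = read_command();
+}
+
+/* (1) initialization is done by the bootstrap code; (2) sleep and
+ * (4) work: wait, dequeue, dispatch, and check the end flag.       */
+int main(void)
+{
+    while (!end_flag) {
+        wait_for_interrupt();               /* (2) standby state    */
+        while (head != tail)                /* (4) work             */
+            dispatch(queue[head++ % QLEN]);
+    }
+    return 0;
+}
+\end{verbatim}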
+