%!TEX root = farm.tex

\section{Compiler and Debugging Environment}\label{sec:design}

This section describes the design and implementation of our compiler and
debugging environment for NVIDIA's GPU microcontrollers.

\subsection{Microcontroller}

\begin{table}[!t]
 \caption{Specification of the GF100 microcontrollers.}
 \label{tab:fermi}
 \hbox to\hsize{\hfil
 \begin{tabular}{|l|r|r|}\hline
 Name & HUB & GPC\\\hline
 Number of units & 1 & 4\\\hline
 Word size & 32 bits & 32 bits\\\hline
 Code size & 16,384 bytes & 8,192 bytes\\\hline
 Data size & 4,096 bytes & 2,048 bytes\\\hline
 \end{tabular}\hfil}
\end{table}

This paper targets the microcontrollers of NVIDIA's Fermi architecture.
In particular, we target the GeForce GTX 480 graphics card, which is
based on the GF100 architecture.
In this architecture, a streaming multiprocessor (SM) consists of
32 CUDA cores, and a graphics processing cluster (GPC) consists of four
SMs.
The GF100 architecture contains four GPCs in total, and hence the
maximum number of CUDA cores is 512.
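This total follows directly from the hierarchy of cores, SMs, and GPCs:
\[
 \underbrace{32}_{\mathrm{cores/SM}} \times
 \underbrace{4}_{\mathrm{SMs/GPC}} \times
 \underbrace{4}_{\mathrm{GPCs}} = 512 .
\]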

Table~\ref{tab:fermi} summarizes the specification of the GF100
microcontrollers.
There are two types of microcontrollers relevant to the CUDA engines:
HUB and GPC.
The HUB microcontroller broadcasts accesses to all GPCs, while each GPC
engine has its own dedicated GPC microcontroller.
Since the code size is limited to 16\,KB for the HUB and 8\,KB for each
GPC, as indicated in Table~\ref{tab:fermi}, developers must design the
firmware carefully to fit within these limits.
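
As an illustration of how these limits constrain firmware images, the
following is a minimal sketch of a host-side size check.
The capacities are taken from Table~\ref{tab:fermi}; the type name
\verb|uc_kind| and the function \verb|firmware_fits| are hypothetical and
are not part of our toolchain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Capacities in bytes, copied from the specification table. */
#define HUB_CODE_BYTES 16384u
#define HUB_DATA_BYTES  4096u
#define GPC_CODE_BYTES  8192u
#define GPC_DATA_BYTES  2048u

/* Hypothetical tag for the two microcontroller types. */
typedef enum { UC_HUB, UC_GPC } uc_kind;

/* Return true when both the code and the data image fit on the target. */
bool firmware_fits(uc_kind kind, size_t code_len, size_t data_len)
{
    size_t code_cap = (kind == UC_HUB) ? HUB_CODE_BYTES : GPC_CODE_BYTES;
    size_t data_cap = (kind == UC_HUB) ? HUB_DATA_BYTES : GPC_DATA_BYTES;

    return code_len <= code_cap && data_len <= data_cap;
}
```

A check of this kind could run right after assembly, so that an oversized
image is rejected before it is ever uploaded.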

\subsection{Compiler Implementation}

\begin{figure*}[!t]
 \begin{center}
 \includegraphics[width=12cm]{./img/step_compiler.pdf}
 \end{center}
 \caption{Overview of Compiler Implementation.}
 \label{fig:compiler}
\end{figure*}

Figure~\ref{fig:compiler} shows an overview of our compiler
implementation.
The main flow of compilation is driven by Clang, which generates
LLVM IR from the C source file.
LLC then generates assembly code, which is split into separate code and
data files.
Finally, Envytools outputs an executable file.
This executable file can be launched by the device driver, and can also
be tested with our debugging tool described later in this section.
To summarize, our compiler proceeds through the following stages:

\begin{figure}[!t]
 \begin{center}
 \includegraphics[width=6cm]{./img/llc.pdf}
 \end{center}
 \caption{Code generation stages of LLC.}
 \label{fig:llc}
\end{figure}

\begin{description}
 \item[(1) Clang]\mbox{}\\
 This is the C language frontend, which generates LLVM IR code
 from the source file.

 \item[(2) LLC with nvuc]\mbox{}\\
 This is the LLVM backend, which compiles LLVM IR code into
 assembly code.
 As shown in Figure~\ref{fig:llc}, compilation proceeds in five
 steps: (i) flow analysis, (ii) optimization,
 (iii) instruction selection, (iv) register allocation, and
 (v) code generation.
 This flow does not depend on the target machine.
 LLC reads the configuration of the target machine at
 instruction selection time, and selects the instructions and
 registers that match the specification of that machine.
 Our implementation adds a new configuration called nvuc
 (NVIDIA Micro-Controller) to support NVIDIA's GPU
 microcontrollers under LLVM.

 \item[(3) LLVM to envyas]\mbox{}\\
 This stage divides the generated assembly code into code and
 data sections so that we can create binary images using
 ``envyas'', the microcontroller assembler provided by
 the Envytools suite.
 The bootstrap code is also merged into the binary images
 in this stage.

 \item[(4) envyas]\mbox{}\\
 This is the final assembly stage for the microcontroller, which
 generates the byte code of the firmware.

 \item[(5) hex to bin]\mbox{}\\
 This stage translates the hexadecimal byte code into binary
 format so that the firmware can execute on the
 microcontroller.

 \item[(6) Running the microcontroller]\mbox{}\\
 The compiled firmware is loaded onto the microcontroller by the
 device driver.
 We also provide a debugging tool that launches the firmware
 in the same way as the device driver for development
 purposes.
\end{description}
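
To make stage (5) concrete, the following is a minimal sketch of a
hex-to-binary conversion.
It assumes that the hexadecimal byte code is a text listing of two-digit
byte values separated by whitespace or commas; the actual output format
of ``envyas'' may differ, and the function name \verb|hex_to_bin| is
hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit; the caller guarantees isxdigit(c). */
static int hex_digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/*
 * Convert a listing of two-digit hexadecimal bytes (assumed format, for
 * example "0f a0,3c") into raw bytes. Whitespace and commas between the
 * pairs are skipped. Returns the number of bytes written, or -1 on
 * malformed input or when the output buffer is too small.
 */
long hex_to_bin(const char *hex, unsigned char *out, size_t out_cap)
{
    size_t n = 0;

    while (*hex != '\0') {
        unsigned char hi = (unsigned char)hex[0];

        if (isspace(hi) || hi == ',') {
            hex++;
            continue;
        }

        /* hex[1] is safe to read: hex[0] is not the terminator here. */
        unsigned char lo = (unsigned char)hex[1];

        if (!isxdigit(hi) || !isxdigit(lo))
            return -1;          /* stray character or odd digit count */
        if (n == out_cap)
            return -1;          /* image does not fit in the buffer */

        out[n++] = (unsigned char)(hex_digit_value(hi) * 16 +
                                   hex_digit_value(lo));
        hex += 2;
    }
    return (long)n;
}
```

Passing the code size of the target microcontroller as the buffer
capacity makes an oversized image fail at this stage rather than at load
time.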

\subsection{Debugging Support}

\begin{figure}[!t]
 \begin{center}
 \includegraphics[width=3cm]{./img/loader.pdf}
 \end{center}
 \caption{Flowchart of our debugging tool.}
 \label{fig:loader}
\end{figure}

We provide a debugging tool that loads the firmware, sends commands and
data, monitors the status, and displays register and memory values of the
microcontroller.
Figure~\ref{fig:loader} shows the control flow of our debugging tool.
The details of each block in the flow are as follows:

\begin{description}
 \item[(1) Load Firmware]\mbox{}\\
 Our debugging tool uploads a set of firmware programs onto
 the HUB and GPC microcontrollers.
 The uploaded firmware programs start execution when a flag is
 set in a designated register.
 \item[(2) Send Command and Data]\mbox{}\\
 The microcontroller is event-driven: it remains fully
 suspended while idle.
 When it receives a command from the debugging tool through
 the PCI bus, the interrupt handler is invoked and
 execution resumes.
 \item[(3) Display Register Value]\mbox{}\\
 The microcontroller has a set of registers that may be used
 by the debugging tool for any purpose.
 There are also several important registers relevant to
 firmware execution.
 The debugging tool therefore displays the values of these
 registers to show what is happening.
\end{description}
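
The three blocks above can be sketched as a small host-side simulation.
This is only an illustration under stated assumptions: the structure
\verb|sim_uc|, its fields, and all function names are hypothetical
stand-ins, and no real register offsets or PCI accesses are modeled.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SIM_CODE_BYTES 16384u /* HUB code size from the specification table */

/* Hypothetical stand-in for one microcontroller as seen by the tool. */
struct sim_uc {
    uint8_t  code[SIM_CODE_BYTES];
    uint32_t start_flag;     /* firmware runs once this flag is set */
    uint32_t cmd;            /* last command delivered */
    uint32_t data;           /* last data word delivered */
    uint32_t commands_seen;  /* bumped by the simulated interrupt handler */
};

/* (1) Load Firmware: copy the image, refusing one that does not fit. */
int sim_load_firmware(struct sim_uc *uc, const uint8_t *img, size_t len)
{
    if (len > sizeof uc->code)
        return -1;
    memset(uc, 0, sizeof *uc);
    memcpy(uc->code, img, len);
    return 0;
}

/* Setting the flag stands in for writing the designated start register. */
void sim_start(struct sim_uc *uc)
{
    uc->start_flag = 1;
}

/* (2) Send Command and Data: models the interrupt waking the firmware. */
int sim_send_command(struct sim_uc *uc, uint32_t cmd, uint32_t data)
{
    if (!uc->start_flag)
        return -1;           /* firmware has not been started yet */
    uc->cmd = cmd;
    uc->data = data;
    uc->commands_seen++;
    return 0;
}

/* (3) Display Register Value: format the register values into buf. */
int sim_format_registers(const struct sim_uc *uc, char *buf, size_t cap)
{
    return snprintf(buf, cap, "start=%u cmd=0x%08x data=0x%08x seen=%u",
                    (unsigned)uc->start_flag, (unsigned)uc->cmd,
                    (unsigned)uc->data, (unsigned)uc->commands_seen);
}
```

Keeping the three steps as separate functions mirrors the blocks of the
flowchart, so each can be exercised on its own during development.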

\subsection{Firmware Development}

In this paper, we present a basic firmware program for NVIDIA's
GPU microcontrollers.
We developed this firmware entirely with our compiler and debugging
environment.
It is an initial step toward fine-grained GPU resource
management using microcontrollers, and more advanced functions can be
built upon this work.

\begin{figure*}[!t]
 \begin{center}
 \includegraphics[width=12cm]{./img/firmware.pdf}
 \end{center}
 \caption{Flowchart of our basic firmware.}
 \label{fig:firmware}
\end{figure*}

Figure~\ref{fig:firmware} shows the control flow of the basic firmware
developed in this paper.
The details of each block in the flow are as follows:

\begin{description}
 \item[(1) initialize]\mbox{}\\
 The firmware configures the interrupt handler and receives
 the default set of data when it starts.
 \item[(2) sleep]\mbox{}\\
 The firmware enters standby mode in the main event loop,
 waiting for the next command issued by the device driver or
 the debugging tool.
 Each arriving command generates an interrupt
 on the microcontroller, which wakes the firmware through the
 ``ihbody'' function.
 \item[(3) ihbody]\mbox{}\\
 This is the interrupt handler invoked by a command.
 It simply enqueues the corresponding
 command and leaves standby mode so that firmware
 execution resumes.
 \item[(4) work]\mbox{}\\
 This is the main body of the firmware.
 It is called when the firmware is released from standby
 mode.
 It dequeues pending commands one by one and calls the
 function corresponding to each command.
 If the specified flag is cleared, the firmware terminates.
\end{description}
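
The flow above can be sketched in C as follows.
This is a simplified host-side model rather than our actual firmware:
the queue depth, the command codes, and all identifiers are hypothetical,
and interrupt delivery is modeled by calling the handler directly.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u  /* hypothetical queue depth (a power of two) */

/* Hypothetical command codes. */
enum { CMD_NOP = 0, CMD_ADD = 1, CMD_STOP = 2 };

struct firmware {
    uint32_t queue[QUEUE_SLOTS];
    uint32_t head;     /* index of the next command to dequeue */
    uint32_t tail;     /* index of the next free slot */
    bool standby;      /* true while the firmware sleeps */
    bool running;      /* the flag checked by the work function */
    uint32_t counter;  /* example state modified by CMD_ADD */
};

/* (1) initialize: reset all state and enter standby. */
void fw_initialize(struct firmware *fw)
{
    *fw = (struct firmware){0};
    fw->running = true;
    fw->standby = true;
}

/* (3) ihbody: enqueue the command and leave standby mode.
 * On real hardware this would run in interrupt context, so the shared
 * fields would need volatile or equivalent protection. */
bool fw_ihbody(struct firmware *fw, uint32_t cmd)
{
    if (fw->tail - fw->head == QUEUE_SLOTS)
        return false;                      /* queue is full */
    fw->queue[fw->tail % QUEUE_SLOTS] = cmd;
    fw->tail++;
    fw->standby = false;
    return true;
}

/* (4) work: dequeue pending commands one by one and dispatch them. */
void fw_work(struct firmware *fw)
{
    while (fw->running && fw->head != fw->tail) {
        uint32_t cmd = fw->queue[fw->head % QUEUE_SLOTS];
        fw->head++;
        switch (cmd) {
        case CMD_ADD:
            fw->counter++;
            break;
        case CMD_STOP:
            fw->running = false;           /* clear the flag: terminate */
            break;
        default:
            break;                         /* CMD_NOP and unknown codes */
        }
    }
    fw->standby = true;                    /* (2) sleep until next command */
}
```

Keeping the handler limited to an enqueue keeps the time spent in
interrupt context short, while all command processing happens in the
main loop.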