-
Notifications
You must be signed in to change notification settings - Fork 9
/
linkage.html
290 lines (287 loc) · 13.1 KB
/
linkage.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
<html>
<head>
<title>Stack Frames and Linkage Conventions</title>
</head>
<body>
<center>
<h1>Stack Frames and Linkage Conventions</h1>
</center>
<h2>Introduction</h2>
<P>
Two fundamental questions that quickly arise in the implementation of an operating system
are:
<ol type="1">
<li> what constitutes the state of a computation, and how can that state be
saved and restored?</li>
<li> what are the mechanisms by which one software component can request
services, and receive results from another?</li>
</ol>
We can begin the exploration of these questions by examining subroutine linkage
conventions.
Most students will have already been exposed to this in previous courses (e.g.
computer architecture, programming languages, compiler construction);
This brief review is intended to refresh your understanding of the basic
concepts so that we can build upon them.
</P>
<h2>The Stack Model of Programming Languages</h2>
<P>
Most modern programming languages support procedure-local variables:
<ul>
<li> They are automatically allocated whenever the procedure (or block) is entered.</li>
<li> They are only visible to code within that procedure (or block).
They cannot be seen or accessed by code in calling or called procedures.</li>
<li> Each distinct (e.g. recursive or parallel) invocation of the procedure (or block) has
its own distinct set of local variables.</li>
<li> They are automatically deallocated when the procedure (or block) exits/returns.</li>
</ul>
These local variables, as well as parameters and intermediate computational results are most
commonly stored in a Last-In-First-Out stack: with new call frames being pushed onto the
stack whenever a procedure is called (or a block entered) and old frames being popped off
the stack whenever a procedure returns (or a block is exited).
Consider the recursive factorial implementation:
<ul>
<pre><tt>
int factorial( int value ) {
if (value <= 1)
return(value);
else
return(value * factorial(value - 1));
}
</tt></pre>
</ul>
</p>
If I call <tt>factorial(4)</tt>, just before the final invocation
of <tt>factorial</tt> returns, there would be four instances of
<tt>factorial</tt> on the stack, in addition to the call frames
for the code that called <tt>factorial</tt> in the first place.
<p>
<IMG src="factorial4.gif" alt="stack frames for factorial(4)"/>
</p>
<p>
The stack model for
procedure calls is so universal that most processor architectures provide hardware
instructions for stack management, and depend on the presence of a stack to make
subroutine calls or handle exceptions.
It should be remembered, however, that the stack model does not work for all
memory allocation needs. Many data items must last longer than the procedure that
created them. For example, when read a document into memory, I do not expect
that document to disappear as soon as the read function returns.
Long lived resources should not be allocated on the stack, but rather from the
<em>heap</em> (which we will discuss in a few weeks).
</P>
<h2>Subroutine Linkage Conventions</h2>
<P>
The details of subroutine linkage conventions are highly Instruction Set Architecture
specific, and in some cases language-specific.
The following examples will be based on the Intel x86 architecture.
The instructions and registers will be very different for other ISAs,
but the basic concepts will remain valid.
</p>
<p>
The basic elements of subroutine linkage are:
<ul>
<li> parameter passing ... marshaling the information that will be passed
as parameters to the called routine.</li>
<li> subroutine call ... save the return address (in the calling routine)
on the stack, and transfer control to the entry point (of the called routine).</li>
<li> register saving ... saving the contents of registers that the linkage
conventions declare to be <em>non-volatile</em>, so that they can
be restored when the called routine returns.</li>
<li> allocating space for the local variables (and perhaps computational
temporaries) in the called routine.</li>
</ul>
When the called routine completes, the process of returning is fairly symmetric:
<ul>
<li> return value ... placing the return value in the place where the calling routine
expects to find it.</li>
<li> popping the local storage (for the called routine) off the stack.</li>
<li> register restoring ... restore the non-volatile registers to the values they had
when the called routine was entered.</li>
<li> subroutine return ... transfer control to the return address that the
calling routine saved at the beginning of the call.</li>
</ul>
The corresponding x86 code (for the above <tt>factorial</tt> example illustrates
the register conventions and respective responsibilities of the caller and callee.
The code in <font color="green">green</font> is executed by <em>callees</em>.
The code in <font color="red">red</font> is executed by <em>callers</em>.
</p>
<ul><pre>
factorial:
</pre><font color="green"><pre>
subq $8, %rsp // 4-byte local, 16-byte frame alignment
</pre></font><pre>
movl %edi, 4(%rsp) // keep local copy of value
cmpl $1, %edi // compare value with one
jg .L2 // if greater, recurse
movl 4(%rsp), %eax // recursion has ended, result is parameter
jmp .L3 // and return
.L2: movl 4(%rsp), %eax // recover value
subl $1, %eax // value - 1
</pre><font color="red"><pre>
movl %eax, %edi // value-1 as new parameter
call factorial
</pre></font><pre>
imul 4(%rsp), %eax // multiply result by value
</pre><font color="green"><pre>
.L3:
addq $8, %rsp // restore stack to value at entry
ret
</pre></font>
</ul>
X86 Register Conventions:
<ul>
<li><tt>%rsp</tt> is the hardware-defined stack pointer.</li>
<li>the X86 stack grows downwards:<br>
A push operation causes the top of stack to be the next lower address.<br>
A Pop operation causes the top of stack to be the next higher address.
<li><tt>%eax/%rax</tt> is a volatile register that is expected to contain the
return value when the called routine returns.</li>
<li>the first six parameters are passed in registers,
the first being passed in <tt>%rdi</tt> (or in this case <tt>%edi</tt>)</li>
<li>the <tt>call</tt> instruction pushes the address of the
next instruction onto the stack, and then transfers
control to the next location.</li>
<li>the <tt>ret</tt> instruction pops the return address off of the
top of the stack and transfers back to that location.</li>
</ul>
While these details can be ISA specific, we also observe that:
<ul>
<li>If the called routine needs to use any non-volatile registers
<u>it must save them</u>.
This is done because only the called routine knows
which registers it will use (and therefore which it needs
to save and restore).</li>
<li>Cleaning callee-allocated locals (and saved registers) off the stack is the
callee's responsibility. It knows what it allocated, so it must deallocate
them, returning the stack to its state at entry.</li>
<li>Cleaning (stack-passed) parameters off of the stack is the responsibility
of the calling routines. This is often done, because only
the calling routine knows how many parameters is actually
passed.</li>
<li>The clear delineation of responsibilities between the
caller and callee make it possible to have procedures
written in one language (e.g. C) called by programs
written in another language (e.g. FORTRAN).</li>
</ul>
<p>
Even though the details may be different in other Instruction Set Architectures
there is value in understanding each step of the above process,
because the same basic steps (some, perhaps automated by hardware)
can be seen in almost all linkage conventions. After you have
understood one set of linkage conventions, you should have little
trouble understanding others.
</p>
<p>
It is also interesting to note how relatively simple it is to
save and restore the state of a procedure; It is merely a stack
frame and a few registers ... most of whose values are stored in
the next stack frame.
</p>
<h2>Traps and Interrupts</h2>
<P>
Most Instruction Set Architectures include support for interrupts
(to inform the software than an external event has happened) and
traps (to inform the software of an execution fault). These are
similar to procedure calls in that:
<UL>
<LI> we want to transfer control to an interrupt or trap handler.</li>
<LI> we need to save the state of the running computation before doing so.</li>
<LI> after the event has been handled, we want to restore the saved state
and resume the interrupted computation.</li>
</UL>
The key differences between a procedure call and an interrupt/trap are:
<UL>
<LI> a procedure call is requested by the running software, and the
calling software expects that, upon return, some function will
have been performed, and an appropriate value returned.</li>
<LI> because a procedure call is initiated by software, all
of the linkage conventions are under software control.
Because interrupts and traps are initiated by the hardware,
the linkage conventions are strictly defined by the hardware.
<LI> the running software was not expecting a trap or interrupt,
and after the event is handled, the computer state should
be restored as if the trap/interrupt had never happened.</li>
</UL>
</P>
<P>
A typical interrupt or trap mechanism works as follows:
<UL>
<LI> there is a number (0, 1, 2 ...) associated with every possible
external interrupt or execution exception.</li>
<LI> there is a table, initialized by the OS software, that associates
a Program Counter and Processor Status word (PC/PS pair) with
each possible external interrupt or execution exception.</li>
<LI> when an event happens that would trigger an interrupt or trap:
<OL type="1">
<li> the CPU uses the associated interrupt/trap number to index
into the appropriate interrupt/trap vector table.</li>
<li> the CPU loads a new program counter and processor status word
from the appropriate interrupt/trap vector.</li>
<li> the CPU pushes the program counter and processor status word
associated with the interrupted computation onto
the CPU stack (associated with the new processor status word).</li>
<li> execution continues at the address specified by the new
program counter.</li>
<li> the selected code (usually written in assembler, and called a <em>first level handler</em>:
<ul>
<li>saves all of the general registers on the stack</li>
<li>gathers information from the hardware on the cause of the interrupt/trap</li>
<li>chooses the appropriate second level handler</li>
<li>makes a normal procedure call to the second level handler,
which actually deals with the interrupt or exception.</li>
</ul>
</ol>
<LI> after the second level handler has dealt with the event, it returns
to the first level handler, after which ...
<OL type="1">
<li> the 1st level handler restores all of the saved registers (from the interrupted computation).</li>
<li> the 1st level handler executes a privileged <tt>return from interrupt</tt> or <tt>return from trap</tt>
instruction.</li>
<li> the CPU re-loads the program counter and processor status word from the
values saved at the time of interrupt/trap.</li>
<li> execution resumes at the point of interruption.</li>
</ol>
</li>
</UL>
</P>
<P>
<IMG src="trap_ctl.gif" alt="TRAP vectoring and return"/><br>
<strong>Flow-of-control associated with a (system call) trap</strong>
</P>
<P>
<IMG src="trap_stack.gif" alt="STACK during trap handling"/><br>
<strong>Stacking and un-stacking of a (system call) trap</strong>
</P>
<P>
This interrupt/trap mechanism:
<ul>
<li> does a more complete job of saving and restoring the state of the interrupted computation.</li>
<li> translates the (much simpler) hardware-driven call to the 1st level handler into a normal
higher-level-language procedure call to the chosen 2nd level hander.</li>
</ul>
The similarities might make it seem that an interrupt/trap is only a little bit more expensive than
a procedure call (more registers to save, plus the added computation to decide what 2nd level handler
to call. But this ignores the fact that an interrupt/trap causes a new processor status word to
be loaded into the CPU ... which will probably move us to a new (likely more privileged) processor mode,
running in a new (unrelated to the interrupted program) address space, and almost surely
involve a complete loss of the CPU caches. And the same things will happen again upon return.
Consequently, taking an interrupt or trap is likely to be somewhere between 100x and 1000x as
expensive as making a procedure call.
</P>
<H2>Summary</H2>
<P>
Procedure and trap/interrupt linkage instructions provide us with:
<UL>
<li> a preliminary low level definition of <em>the state of a computation</em> and
what it means to <em>save</em> and <em>restore</em> that state.</li>
<li> an introduction to the basic CPU-supported mechanisms for synchronous and
and asynchronous transfers of control.</li>
<li> an initial example of how it might be possible to interrupt an on-going
computation, do other things, and then return to the interrupted computation
as if it had never been interrupted.</li>
</ul>
In the following chapters we will build on these foundations to implement
processes and system calls.
</P>
</body>
</html>