Subroutines provide the basic abstraction mechanism in programs. They can be used to abstract over the specific values bound to variables when evaluating an expression. One often distinguishes between functions and procedures as two different kinds of subroutines:
-
Functions correspond to the mathematical notion of computation, i.e. they are viewed as mapping from input to output values. They can be viewed as abstractions of side-effect free expressions.
-
Procedures can be viewed as abstractions over statements. That is, they affect the environment (mutable variables, hard disk, network, ...), and are called for their side-effects.
-
A pure functional model is possible but rare (e.g. Haskell)
-
Hybrid model is most common: functions can have (limited) side effects
Recall that subroutine calls rely heavily on use of the stack.
Each time a subroutine is called, space on the stack is allocated for the objects needed by the subroutine. This space is called a stack frame or activation record.
The stack pointer contains the address of either the last used location or the next unused location on the stack.
The frame pointer points into the activation record of a subroutine so that any objects allocated on the stack can be referenced with a static offset from the frame pointer.
Question: Why not use an offset from the stack pointer to reference subroutine objects?
Answer: There may be objects that are allocated on the stack whose size is unknown at compile time.
These objects get allocated last so that objects whose size is known at compile time can still be accessed quickly via a known offset from the frame pointer. Example:
void foo (int size) {
char arr[size]; // allocates char array of length size on the stack
...
}
When a subroutine is called, a new activation record is created and populated with data.
The management of this task involves both the caller and the callee and is referred to as the calling sequence.
-
The prologue refers to activation record management code executed at the beginning of a subroutine call.
-
The epilogue refers to activation record management code executed at the end of a subroutine call.
Upon calling a subroutine, the prologue has to perform the following tasks:
- Pass parameters
- Save return address
- Update static chain (only needed for certain forms of nested subroutines)
- Change program counter
- Move stack pointer
- Save register values, including frame pointer
- Move frame pointer
- Initialize objects
Upon returning from the subroutine, the epilogue has to perform the following tasks:
- Finalize (destroy) objects
- Pass return value(s) back to caller
- Restore register values, including frame pointer
- Restore stack pointer
- Restore program counter
Question: Are there advantages to having the caller or callee perform various tasks?
Answer: If possible, have the callee perform tasks: task code needs to occur only once, rather than at every call site.
Some tasks (e.g. parameter passing) must be performed by the caller.
Typical calling sequence:
Stack (before call to subroutine):
| |
├──────────────────────────┤<- stack pointer
| Caller activation record |
├──────────────────────────┤<- frame pointer
| ... |
Step 1: Save caller-save registers:
| |
├──────────────────────────┤<- stack pointer
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
├==========================|<- frame pointer
| ... |
Step 2: Push arguments on stack:
| |
├==========================|<- stack pointer
| Arguments |
├──────────────────────────┤
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
├──────────────────────────┤<- frame pointer
| ... |
Step 3: Jump to subroutine, saving return address on stack
| |
├==========================|<- stack pointer
| Return address |
├──────────────────────────┤
| Arguments |
├──────────────────────────┤
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
├──────────────────────────┤<- frame pointer
| ... |
Step 4: Save caller's fp, set new fp
| |
├──────────────────────────┤<- stack pointer
| Saved fp (dynamic link) |
├==========================|<- frame pointer
| Return address |
├──────────────────────────┤
| Arguments |
├──────────────────────────┤
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
├==========================|
| ... |
Step 5: Save callee-saved registers
| |
├──────────────────────────┤<- stack pointer
| Callee-saved registers |
├──────────────────────────┤
| Saved fp (dynamic link) |
|==========================|<- frame pointer
| Return address |
├──────────────────────────┤
| Arguments |
├──────────────────────────┤
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
|==========================|
| ... |
Step 7: Allocate and initialize locals
| |
├──────────────────────────┤<- stack pointer
| Local variables |
├──────────────────────────┤
| Callee-saved registers |
├──────────────────────────┤
| Saved fp (dynamic link) |
|==========================|<- frame pointer
| Return address |
├──────────────────────────┤
| Arguments |
├──────────────────────────┤
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
|==========================|
| ... |
Step 7: Push and pop temporaries
| Temporaries |
├──────────────────────────┤<- stack pointer
| Local variables |
├──────────────────────────┤
| Callee-saved registers |
├──────────────────────────┤
| Saved fp (dynamic link) |
|==========================|<- frame pointer
| Return address |
├──────────────────────────┤
| Arguments |
├──────────────────────────┤
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
|==========================|
| ... |
Step 8: Pop locals
| |
├──────────────────────────┤<- stack pointer
| Callee-saved registers |
├──────────────────────────┤
| Saved fp (dynamic link) |
|==========================|<- frame pointer
| Return address |
├──────────────────────────┤
| Arguments |
├──────────────────────────┤
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
|==========================|
| ... |
Step 9: Restore callee-saved registers
| |
├──────────────────────────┤<- stack pointer
| Saved fp (dynamic link) |
|==========================|<- frame pointer
| Return address |
├──────────────────────────┤
| Arguments |
├──────────────────────────┤
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
|==========================|
| ... |
Step 10: Restore caller's fp
| |
|==========================|<- stack pointer
| Return address |
├──────────────────────────┤
| Arguments |
├──────────────────────────┤
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
|==========================|<- frame pointer
| ... |
Step 11: Jump to return address
| |
├──────────────────────────┤<- stack pointer
| Arguments |
├──────────────────────────┤
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
|==========================|<- frame pointer
| ... |
Step 12: Pop arguments
| |
├──────────────────────────┤<- stack pointer
| Caller-saved registers |
├──────────────────────────┤
| Caller activation record |
|==========================|<- frame pointer
| ... |
Step 13: Restore caller-saved registers
| |
├──────────────────────────┤<- stack pointer
| Caller activation record |
|==========================|<- frame pointer
| ... |
Summary of calling sequence:
Prologue (caller)
- Save caller-save registers
- Push arguments on stack
- Jump to subroutine, saving return address on stack
Prologue (callee)
- Save old fp, set new fp
- Save callee-save registers
- Allocate and initialize locals Execute callee
- Push and pop temporaries
Epilogue (callee)
- Pop locals
- Restore callee-save registers
- Restore frame pointer
- Jump to return address
Epilogue (caller)
- Pop arguments
- Restore caller-save registers
One difficult question is whether the caller or callee should be in charge of saving registers.
Question: What would the caller have to do to ensure proper saving of registers?
Answer: Save all registers currently being used by caller.
Question: What would the callee have to do to ensure proper saving of registers?
Answer: Save all registers that will be used by callee.
Question: Which is better?
Answer: Could be either one -- no clear answer in general. In practice, many processors (including MIPS and x86) compromise: half the registers are caller-save and half are callee-save.
To see the rational for this design, let us first recall some background from basic computer systems architecture: a CPU has a limited number of registers for storing the operands and results of the executed machine instructions (arithmetic and logical operations, comparisons, etc.) as well as for storing intermediate results of computations (some of which may correspond to local variables in high-level code).
An important step during compilation of high-level language code to machine language code is register allocation. This is the phase where the compiler decides for each value computed in a function in which register it should be held, respectively, if there are not enough registers to hold all values, whether the value should be stored on the stack (i.e. in RAM) and only be loaded into registers when it is needed. The compiler will optimize for avoiding the use of the stack as much as possible because RAM accesses are much slower than register accesses. On a typical modern architecture, a register access can be in the order of 100 times faster than a RAM access, depending on the state of caches.
Now, when the compiler does register allocation for a particular
function f
, one of the factors that enters the equation is the callee
portion of the calling sequence for when f
itself is called, as well
as the caller portion of the calling sequences for calls that f
makes
to other functions. All of this calling sequence code is going to be
part of the binary code that make up the compiled function f
.
What does the compiler know when it has decided on a particular
register allocation for f
:
-
it knows all the registers whose values the execution of
f
may modify -
at each point where another function is called in
f
, it knows which values that are stored in registers before the call will still be needed byf
after the call
Thus, suppose we consider a scenario where the calling sequence leaves
all the work of saving registers to the callees. In this case, the
callee portion of the calling sequence code in f
would be responsible
for saving and restoring all registers that are modified by the
execution of f
. This is because if we look at f
in isolation, we don't
know which of the original values stored in the registers modified by
f
, the caller of f
will still need after f
returns. In the worst case,
it will need all of them. So f
would have to save and restore them
all. If the caller only needs a subset of the modified registers, f
would potentially be doing a lot of unnecessary work (involving RAM
accesses which are expensive)
The other extreme scenario is to leave all the work of saving the
registers to the callers. In this case, for every call to another
function that f
makes, the caller portion of the calling sequence for
that call would be responsible for saving and restoring all the
registers whose values f
still needs after the call. Again, this is
because if we look at f
in isolation, we don't know which of those
registers will be modified by the callee. In the worst case, all of
them will be modified. So f
would have to save and restore them
all. If the callee only modifies a subset of the registers that f
still needs after the call, f
would again be doing a lot of
unnecessary work.
For this reason, many architectures make a compromise where some of
the registers are caller-saved and others are callee-saved. This way,
the compiler can optimize register allocation so that it minimizes the
number of registers that need to be saved onto the stack while f
is
executed (there are other optimization goals factoring into the
equation, but this is one of them). In particular, if we view f
from
the perspective of being a callee, we'd like to minimize the number of
callee-saved registers that f
modifies because those are the ones that
f
would have to save and restore. If f
only modifies caller-saved
registers, then f
does not need to save/restore any registers in the
callee portion of its calling sequence (because all the work now falls
onto f
's caller).
On the other hand, if we now view f
as a caller
of another function g
, then we want to
minimize the number of caller-saved registers whose values f
will
still need after the call to g
returns. If f
no longer needs any of the
values in these registers after the call, then it will not have to do
any work for saving/restoring them in f
's caller portion of the
calling sequence for that call to g
. So it is now possible for f
to both
modify registers and use registers across calls that it makes itself
without ever having to save/restore any of these registers.
You might ask: why not just solve register allocation for the whole program at once instead of looking at one function at a time (making the distinction between caller/callee-saved registers mood)? The compiler will do a little bit of this kind of global optimization. However, compilation is inherently modular. For instance, a program that you write will often be linked against precompiled libraries. So register allocation for the function in the library code happens at a different time than register allocation for the program's code. The two can't be solved at the same time.
Even if one compiles all code (program + libraries) at the same time, a global optimization is usually not possible. The register allocation problem can be reduced to the graph coloring problem, which is NP-complete. So solving register allocation precisely and globally for a large program is computationally infeasible. This is why a more modular approach needs to be used to keep the time needed for compilation reasonable. One approach is to solve register allocation per function in isolation. By distinguishing between caller-saved and callee-saved registers, the compiler can make informed decisions when doing the optimization modularly, i.e., where it does not have perfect information about the entire program.
Register windows offer an alternative to architectures that split between caller/callee-saved registers: each routine has access only to a small window of a large number of registers; when a subroutine is called, the window moves, overlapping a bit to allow parameter passing.
A leaf routine is one which does not call any subroutines.
Leaf routines can avoid pushing the return address on the stack: it can just be left in a register.
If a leaf routine is sufficiently simple (no local variables), it may not even need a stack frame at all.
Another optimization is to inline a function: inserting the code for the function at every call site.
Question: What are advantages and disadvantages of inlining?
-
Advantages: avoid overhead, enable more compiler optimizations
-
Disadvantages: increases code size, can't always do inlining (e.g. recursive procedures)
We distinguish between two types of parameters:
-
Formal parameters: these are the names that appear in the declaration of the subroutine.
-
Actual parameters or arguments: these refer to the expressions passed to a subroutine at a particular call site.
// formal parameters: a, b, c
def f (a: Int, b: Int, c: Int): Unit = ...
// actual parameters: i, 2/i, g(i,j)
f(i, 2/i, g(i,j))
An important consideration for the execution of a subroutine call is: what does a reference to a formal parameter in the subroutine mean in terms of the actual parameters?
The answer to this question depends on the evaluation strategy for the actual parameter. We distinguish between two basic strategies:
-
Strict evaluation: the actual parameter is evaluated before the call to the function
-
Lazy evaluation: the actual parameter is evaluated only if and when its value is needed during execution of the call.
Each evaluation strategy additionally supports different parameter passing modes. Many languages support multiple parameter passing modes. These modes are specified per parameter of a subroutine, either by adding qualifiers to the parameter that indicate the mode or by dedicated parameter types.
Parameter passing modes for strict evaluation:
-
by value:
- formal is bound to fresh memory location that holds a copy of the value of the actual parameter
- assignment to formal, if allowed, changes value at location of local copy, not at location of the actual that stores the original value.
Most languages (including Scala/Java/C) use call-by-value semantics for parameters by default. Here is an example in
C
:void incr (int x) { x = x + 1; } int counter = 0; incr(counter); /* passes copy of value stored in counter */ printf("%d", counter); // prints 0 since incr only increments local copy
-
by reference:
- only applicable if actual parameter is an l-value
- formal is bound to the memory location that holds the original actual parameter value, forming an alias
- assignment to formal, if allowed, also affects memory location of actual
An example of a language that supports call-by-reference parameters is C++. Here, the fact that a parameter is passed by reference is indicated using a so-called reference type:
void incr (int& x) { x = x + 1; } int counter = 0; // compiler knows declaration of incr, // builds reference incr(counter); std::cout << counter; // prints 1
The type
int&
of parameterx
inincr
indicates thatx
is a reference to a memory location that holds a value of typeint
. -
by copy-return:
- formal is bound to copy of value of actual
- upon return from routine, actual gets copy of formal
Pass by cop-return is fairly uncommon in modern programming languages. We include it here only for completeness.
Parameter passing modes for lazy evaluation:
-
by name:
- formal is bound to expression for actual
- expression is (re)evaluated each time formal is read when executing callee
- can be viewed as textually substituting every occurrence of the formal parameter in the body of the subroutine by the expression of the actual parameter
- formal parameter cannot be assigned in body of subroutine
Scala supports call-by-name parameters which similar to reference parameters in C++ are indicated by dedicated parameter types:
var debugEnabled = false; def debug(msg: => String): Unit = if (debugEnabled) println(msg) // some expensive operation that generates a debug message def complexAnalysis(): String = ... // call to complexAnalysis is only executed // if debugEnabled is set to true debug(complexAnalysis())
The
=>
in front of the argument type ofmsg
indebug
indicates that the parameter is passed by name. -
by need:
- formal is bound to expression for actual
- expression is evaluated the first time its value is needed
- subsequent reads from the formal will use the value computed earlier
- formal parameter cannot be assigned in body of subroutine
Question: What are the advantages and disadvantages of passing by need?
-
Advantage: The argument is only evaluated if it is actually used while executing the call.
-
Disadvantage: implementation of parameter passing is more complex; behavior can be confusing if evaluation of actual has side effects. Hence, it is usually only used in purely functional languages.
Programming languages differ in their evaluation strategies and the specific parameter passing modes that they support. As we have already seen, some languages support multiple strategies and modes. Here is a brief overview for some languages:
-
Evaluation strategy is strict. The order in which actuals are evaluated is unspecified.
-
Parameter passing is always by value: assignment to formal is assignment to local copy.
-
Passing by reference can be simulated by using pointers
void incr (int *x) { *x = *x + 1; } int counter = 0; incr(&counter); /* pass pointer to counter */ printf("%d", counter); // prints 1
-
No need to distinguish between functions and procedures: return type
void
indicates side-effects only.
-
Evaluation strategy is strict. The order in which actuals are evaluated is unspecified.
-
Default mode is by-value (same semantics as C)
-
Explicit by-reference parameters are also supported.
-
Semantic intent such as read-only references can be indicated by adding appropriate qualifiers:
// 'x' is passed by reference, but 'f' cannot modify it void f(const double& x);
-
Evaluation strategy is strict. The order in which actuals are evaluated is left to right.
-
Parameters are passed by value.
-
However, object types have reference semantics.
-
That is, if the type of an argument is a class or interface, it will evaluate to a reference pointing to a heap-allocated object.
-
This reference is passed by value (similar to a C pointer).
-
Consequence:
- Methods can modify objects through the passed reference.
- Assignment to the formal only affects the local copy of the reference, not the actual (like a C pointer).
- Confusingly, this is sometimes referred to as call-by-reference even though it is not (because the reference stored in the actual cannot be modified by assignment to the formal).
- Sometimes, this semantics is referred to as call-by-sharing.
-
For parameters with object types:
final
means that formal is read-only.
-
Default is same as in Java (strict evaluation with call-by-value/call-by-sharing, evaluation order left to right).
-
However, formals cannot be assigned (their declarations are
val
rather thanvar
). -
By-name parameters are also supported (see above).
-
By-need parameters can be simulated using by-name parameters and lazy values.
-
Evaluation strategy is strict. The order in which actuals are evaluated is unspecified.
-
Parameters are passed by value/sharing (as in Java/Scala).
-
Lazy evaluation can be simulated using higher-order functions.
-
Goal: separate semantic intent from implementation
-
Parameter modes:
in
: read-only in subroutineout
: write-only in subroutinein out
: read-write in subprogram
-
Mode is independent of whether binding by value, by reference, or by copy-return
in
: bind by value or referenceout
: bind by reference or copy-returnin out
: bind by reference or by value/copy-return
-
Functions can only have
in
parameters. Otherwise, they must be declared as procedures.
-
Evaluation strategy is lazy.
-
Default parameter passing mode is by-need.
Some languages allow functions with a variable number of parameters.
Example (in C):
printf("this is %d a format %d string", x, y);
-
Within body of
printf
, need to locate as many actuals as placeholders in the format string -
Solution: place parameters on stack in reverse order
| ... | ├──────────────────────────┤ | return address | ├──────────────────────────┤ | actual 1 (format string) | ├──────────────────────────┤ | ... | ├──────────────────────────┤ | actual n-1 | ├──────────────────────────┤ | actual n | ├──────────────────────────┤ | ... |
C and C++ allow parameters which are pointers to subroutines:
void (*pf) (int);
// pf is a pointer to a function that takes
// an int argument and returns void
typedef void (*PROC)(int);
// type abbreviation clarifies syntax
void do_it (int d) { ... }
void use_it (PROC p) { ... }
PROC ptr = &do_it;
use_it(ptr);
use_it(&do_it);
Question: Are there any implementation challenges for this kind of subroutine call?
Not really. This feature can be implemented in the same way as a usual subroutine call: in particular the reference environment remains the same for the point where the subroutine is called and the point where it was defined. Here, the reference environment refers to the binding (i.e. mapping) of variables that are in the scope of the function definition to the actual values that these variables hold at run-time (respectively the memory locations that store these values).
The situation is more complex for languages where subroutines can escape the environment where they are created. A subroutine escapes the environment where it is created if it can somehow be called in a different environment where the variables that where in scope at the point of its definition are bound to different memory locations/values. This can happen if the language allows nested subroutines to be returned by the surrounding subroutine or passed to other subroutines.
Consider the following Scala program:
def outer(i: Int, f: () => Unit): Unit = {
def inner(): Unit = println(i)
if (i > 1) f()
else outer(2, inner)
}
def foo(): Unit = ()
outer(1, foo)
Note that the type () => Unit
of the parameter f
in function outer
indicates that f
itself is a function that takes no arguments and
returns a value of type Unit
. An example of such a function is foo
which on the last line is passed to outer
as argument for the
function parameter f
. Another example is the nested function inner
inside of outer
which is passed to the parameter f
of outer
in
the recursive call in the else branch of the conditional.
The reference environment of function inner
includes the binding of the
variable i
, which is used in the print statement inside of the body
of inner
.
Note that when this program executes, we first execute the call outer(1, foo)
which will take the else branch of the conditional inside outer
because the condition i > 1
is false for i=1
. In the recursive
call, however, we take the then branch because now i=2
. The then
branch then calls the passed function f
, which is inner
from the
previous call to outer
.
What does this call to inner
print? That is, does i
in the body of inner
now use the binding in the original reference environment where inner
was defined, i.e. i=1
, or does it use the binding in the environment
where inner
is called, i.e. i=2
?
There are two possible semantics:
-
Deep Binding:
-
When a subroutine is passed as argument to another subroutine, a closure must be created and passed in place of the subroutine.
-
A closure is a pair of a subroutine together with its reference environment.
-
When a subroutine is called through a closure, the reference environment from when the closure was created is restored as part of the calling sequence.
-
With deep binding, the above program prints
1
because in the reference environment where the closure forinner
was created when it was passed to the recursive call ofouter
, the variablei
was bound to1
.
-
-
Shallow Binding:
-
When a subroutine is called, it uses the current reference environment at the call site.
-
With shallow binding, the program prints
2
because at the point whereinner
is called viaf
, the variablei
is now bound to2
.
-
Passing a nested subroutine to another subroutine is just one example how a subroutine can escape the reference environment of its definition site. Another way this can happen is if subroutines are allowed to return nested subroutines.
Consider the following variant of the example above:
def outer(i: Int): Int => Unit = {
def inner(j: Int): Unit = println(i + j)
inner
}
var i = 2
outer(1)(2)
Note the return type Int => Unit
of the function outer
indicates
that outer
returns a function that takes an Int
as argument and
produces a return value of type Unit
. The last line in the body of
outer
determines the return value of outer
which is the function
inner
(note that we are not calling inner
here but merely
returning the function inner
itself to the caller of outer
).
The call outer(1)
in the last line will thus evaluate to the
function inner
defined inside of outer
which is then immediately
called with value 2
.
With static scoping and deep binding, the occurrence of i
inside of
inner
will always refer to the parameter i
of outer
which is
bound to 1
when inner
is returned. Hence, this program will print
3
in this case.
On the other hand, if the language uses dynamic scoping and shallow
binding, the i
inside of inner
will always refer to the current
i
in the reference environment where inner
is called. When
we call inner
returned by outer(1)
the i
in the reference
environment is the global variable i
whose value is 2
. Thus, in
this case the program prints 4
.
In general, static scoping demands the use of deep binding for nested subroutines. Hence, Scala uses deep binding semantics. Shallow binding is typically the default in languages with dynamic scoping that support nested subroutines.
Functional programming languages treat functions as first-class values (i.e. they can be passed to and returned by other functions). When using deep binding, this entails that the current activation record must be allocated on the heap when a closure is created.
-
Environment of function definition must be preserved until the point of call: activation record cannot be reclaimed if it creates functions.
-
Functional languages therefore require more complex run-time management.
-
Higher-order functions: functions that take (other) functions as arguments and/or return functions
-
very powerful
-
but complex to implement efficiently
-
imperative languages (traditionally) restrict their use.
-