

CS 410/510

Languages & Low-Level Programming

Mark P Jones Portland State University

Spring 2016

Week 1: Introduction, Assembly Language

#### Introduction and Goals

2

## **Origins**

- For a long time, a group of us at PSU have been looking at the role that high-level programming languages can play in the construction of (very) low-level software.
- By using high-level languages, we can hope to increase programmer productivity, and improve software quality
- By focussing on very **low-level software**, we hope to provide strong foundations for the complete software stack



House (2005)

Kernel, GUI, drivers, network stack, and apps

Boots and runs in a bare metal environment

... all written in Haskell, a "purely functional" programming language



4

# Why "House"?

"The Haskell User's Operating System Environment"

You are more secure in a house ...



than if you only have Windows ...

#### Performance concerns

- By design, higher-level languages abstract away from the details of how the underlying machine works
- Can we obtain the levels of performance and predictability that are typically required/expected in the systems programming domain?
- Can we write good systems software in a language that intentionally distances users from details of memory layout, representation, instruction selection, alignment, caching, etc.?
- Traditional approaches to building system software resort to using old, low-level languages like assembly and C
- Do "modern" languages have anything to offer in this area?

#### The Habit programming language

- "a dialect of Haskell that is designed to meet the needs of high assurance systems programming"
- How do you design a programming language for a specific domain?
- Experiment with existing languages
- Understand the domain ...



## The seL4 experience

 In 2009, a group from NICTA, UNSW, and OK Labs in Australia announced seL4, as "the world's first operating system kernel with an end-to-end proof of implementation correctness and security enforcement."



 A landmark achievement for formal verification, and a strong foundation for building trustworthy systems

.

## seL4 and capabilities

- Even without the verification result, the design of seL4 is interesting in its own right:
  - seL4 is a "capability enhanced" version of an earlier microkernel design called L4
  - The "capability" abstraction in seL4 provides facilities for implementing "least privilege" security policies and novel mechanisms for controlling resource usage

## Proof vs performance?

- Security properties established in the seL4 verification include:
  - Absence of code injection attacks
  - Absence of buffer overflows
  - Absence of null pointer dereferences
  - ...
- Many of these properties could be established "for free" if the implementation had been written in a "safer" language
- How might things be different if we built something like seL4 in Habit?

10

# The CEMLaBS project

- "Using a Capability-Enhanced Microkernel as a Testbed for Language-Based Security"
- Started October 2014, funded by The National Science Foundation
- Three main questions:
  - Feasibility: Is it possible to build an inherently "unsafe" system like seL4 in a "safe" language like Habit?
  - Benefit: What benefits might this have, for example, in reducing verification costs?
  - Performance: Is it possible to meet reasonable performance goals for this kind of system?

# CEMLaBS workplan

- The original plan was to use Habit to design and build HAL4, a "capability-enhanced" microkernel in the style of seL4 ...
- At the time, neither the specification or the implementation of seL4 was publicly available ... but some key aspects of its design had been revealed in published papers, dissertations, and talks
- In July 2014, seL4 was released as open source software!
- Now we are free to build HAL4 as an implementation of seL4, which will allow use to make more direct comparisons between the two implementations ...

П

#### **CEMLaBS** status

- The original HAL4 implementation is incomplete (and will need to be reworked to match the seL4 specification)
- The Habit compiler is not complete (a full compiler pipeline has been implemented, but it is not feature complete, and it is not robust)
- We'll be working on addressing both of these shortcomings this coming summer
- In the meantime, there is still much to study ...

## Course description

- An overview of conventional low-level programming techniques:
  - · Bare metal programming
  - Fundamental programmable hardware components
- Case studies of practical microkernel implementations:
  - OS abstractions (address spaces, threads, capabilities, ...)
  - The L4 and seL4 microkernels
- Reflections on the design of programming languages for use in this application domain:
  - Assembly, C, Rust, Habit, domain specific languages, ...

13

14

#### Course learning objectives

Upon the successful completion of this course, students will be able to:

- I. Write simple programs that can run in a bare-metal environment using low-level programming languages.
- Discuss common challenges in low-level systems software development, including debugging in a bare-metal environment.
- Explain how conventional operating system features (multiple address spaces, context switching, protection, etc.) motivate the desire for (and benefit from) hardware support.

#### Course learning objectives, continued

- 4. Develop code to configure and use programmable hardware components such as a memory management unit (MMU), interrupt controller (PIC), and interval timer (PIT).
- Describe the key steps in a typical boot process, including the role of a bootloader.
- 6. Describe the motivation, implementation, and application of microkernel abstractions for managing address spaces, threads, and interprocess communication (IPC).
- 7. Explain the use and implementation of capabilities in access control and resource management.
- 8. Develop programs using the capability abstraction provided by the seL4 microkernel.

16

15

# Course learning objectives, continued

- 9. Illustrate the use of a range of domain specific languages in the development of systems software.
- Use practical case studies to evaluate and compare language design proposals.
- II. Describe features of modern, high-level programming languages—including abstract datatypes and higher-order functions—and show how they can be leveraged in the construction of low-level software.
- 12. Explain how the requirements of low-level systems programming motivate the desire for (and benefit from) language-based support.

# The "programming languages" perspective

- We will survey and evaluate a range of programming languages during this course:
  - · Low-level machine and assembly languages
  - Systems programming languages (e.g., C, Rust, ...)
  - Object-oriented languages (e.g., the seL4 API)
  - Domain specific languages
  - Functional languages (e.g., Habit, Haskell, ...)
- What are the driving needs of the systems domain?
- How can a programming language design best meet those needs?

17

#### Context

- Basic Platform: Generic "IBM PC" compatible
  - 32 bits ... not 64
  - IA32 ... not x86 64 or ARM
  - BIOS ... not EFI or UEFI
  - int and iret ... not sysenter/sysexit
  - PIC ... not APIC
  - No PAE, PCI, ACPI, MMX, SSE, SMM, SMP, VTx, ...
  - etc., ...
- · Already complicated enough for our purposes!
- Well supported by current hardware, emulators, and tools
- · Underlying concepts still very broadly applicable

#### Development environment

- Ubuntu Linux
  - Week I: using the LinuxLab machines (Mac OS also an option)
  - Weeks 2+: using a VirtualBox virtual machine, preconfigured with appropriate development tools (can be used on Linux, Mac OS, or Windows)
- Bare metal emulation using the QEMU emulator

20

## Rough schedule

| Week | Торіс                                      |  |
|------|--------------------------------------------|--|
| I    | Assembly language programming              |  |
| 2    | Bare metal programming                     |  |
| 3    | Hardware support for OS abstractions       |  |
| 4    | Memory management & protection             |  |
| 5    | Case Study: L4 use & implementation        |  |
| 6    |                                            |  |
| 7    | Casa Study Dead 4 was 8 implementation     |  |
| 8    | Case Study 2: seL4 use & implementation    |  |
| 9    | Language design for lavy level anogramming |  |
| 10   | Language design for low-level programming  |  |

#### Lectures and labs

- Monday Lectures (UTS 210), Wednesday Labs (FAB 88-09)
  - Students are expected to attend all lectures and labs
  - Sets of guided practical exercises will be provided for each lab
  - Students will work on labs during lab sessions with instructor supervision
  - Students are expected to complete each set of exercises before the following week's lab session
  - We will use D2L forums for discussion of exercises outside lab sessions
  - Lab sessions may also be used for direct instruction, as necessary

22

#### Exams

- No midterm or final exams
  - Students are still required to attend the "final" on 6/9
  - This will be used for student presentations or demos, or else for other instruction

#### Portfolio assessment

- Final grades will be assigned on the basis of a "Portfolio" comprising the results of lab work and independent projects
- Based on previously listed collection of 12 course objectives
- General process:
  - Student uploads work item (typically a tar ball or zip file), accompanied by a write up explaining how the submitted work addresses the expectations of one or more objectives
  - Grading assigns a score of N/A, Below, Meets or Exceeds for each objective (with +/- versions)
  - Different final letter grades awarded depending on the combination of individual scores

23

21

#### Portfolio assessment, continued

- Students may request consideration of at most two objectives in any given week (Saturday-Friday)
  - · discourages procrastination, balances grading load
- I will attempt to provide feedback on submitted items within a week
- Students may resubmit in an attempt to improve score for any given objective
  - (Each resubmission counts as a new submission)
- Completion of lab exercises for a given week can typically be used to satisfy the "Meets" condition for specific objectives
- Process starts in Week 2, runs until end of finals week

#### General approach and caveats

- This course will be experimental, open-ended, hands-on, and interactive
- Your flexibility, patience, and tolerance of informal or underspecified aspects of the course will be very much appreciated
- Questions, suggestions, and contributions are welcome at any point

26

An introduction to IA32 assembly language programming

#### What is IA32?

- We'll be using the IA32 (x86) architecture as our main target:
  - A "32-bit" instruction set
  - · Broadly adopted by:
    - processors from Intel, AMD, Via, ...
    - laptops, desktops, servers, gaming consoles, ...
    - Linux, Mac OS X, Windows, ...
  - Arguably, a bit dated ... but still very relevant, and a good platform for learning and exploration
  - (... and one of the two architectures supported by seL4)

28

#### Other architectures:

- Not to be confused with:
  - x86-64/AMD64: a 64 bit architecture supported (in addition to IA32) by more recent AMD/Intel designs
  - IA64: a completely different 64-bit Intel architecture (Itanium)
  - · ARM: widely used in phones, tablets, and more
  - IBM Power: used in Xbox 360, PS3, Wii, servers, and more
  - SPARC: used by some of the college's Unix servers
- Except for x86-64, you can't run IA32 code directly on a machine that uses one of these alternatives instruction sets!

#### **Notes**

- No prior or in-depth knowledge of IA32 programming will be assumed
- We will only use a small subset of the full instruction set
- If you're looking to become an expert on IA32 programming, you'll want to look for another class!
- We'll be using the AT&T syntax for IA32 assembly language rather than the Intel syntax. This is the default syntax used by the free GNU tools in Linux, MacOS, and DJGPP or Cygwin on Windows, and others
- For simplicity, I recommend: ssh yourid@linuxlab.cs.pdx.edu (or, on Windows, the equivalent using PuTTY)

29

## A (greatly) simplified view of the IA32



#### Programming for IA32

- In concrete terms, an IA32 program is just a collection of byte values (machine code)
- Once it has been loaded in to memory, the processor can execute a program by interpreting the byte values as instructions for the processor to act on
- For practical purposes, we will usually write IA32 programs in a textual format called assembly language that is easier to read than the raw byte values
- The program that translates assembly language programs in to machine code is called an assembler

#### The GNU assembler, as

- · Assembly code goes in files with a .s suffix
- We will typically use gcc to invoke the assembler gcc -m32 -o output assemblyCode.s extras.c
- You can also invoke the assembler directly: detailed documentation is available from:

http://sourceware.org/binutils/docs/as/ For IA32 programming, look in particular at the section on "80386 Dependent Features"

33

## An assembly code listing

```
.globl
        pushl
                 %ebp
                  %esp,%ebp
        movl
        pushl
                 %ebx
                 8(%ebp), %ebx
        movl
                                # initialize length count in eax
                 test
loop:
                  %eax
                                # increment count
                 $4, %ebx
                                # and move to next array element
test:
                 (%ebx), %ecx # load array element
$0, %ecx # test for end of array
                                # repeat if we're not done ...
        jne
        popl
        movl
                 %ebp.%esp
                                                    Assembly code
```

# An assembly code listing

```
.globl
                            pushl
                                      %ebp
0001 89E5
0003 53
                                      %esp,%ebp
                            movl
                            pushl
0004 8B5D08
                            movl
                                      8(%ebp), %ebx
0007 B8000000
                            movl
                                      $0, %eax
                                                      # initialize length count in eax
00
000c EB04
                                      test
                            ami
000f 83C304
                            addl
                                      $4, %ebx
                                                      # and move to next array element
                                      (%ebx), %ecx # load array element
$0, %ecx # test for end of array
loop # repeat if we're not done ...
0012 8B0B
0014 83F900
0017 75F5
                            ine
                            popl
001a 89EC
                                       %ebp,%esp
001c 5D
                            popl
                                      %ebp
Machine code
                                                                            Assembly code
```

addresses labels /offsets .globl f directive %ebp %esp,%ebp %ebx 0001 89E5 movl comments pushl 0004 8B5D08 movl 8(%ebp), %eb 0007 B8000000 movl \$0, %eax initialize length count in eax 000c EB04 test 000e loop: 000f 83C304 addl \$4, %ebx and move to next array element load array element
test for end of array
repeat if we're not done ... 0012 8B0B test: (%ebx), %ecx 83F900 0017 75F5 ine 1000 0019 5B popl 001a 89EC 001c 5D 001d C3 %ebp,%esp popl %ebp machine instructions code

# IA32 registers

# 8-bit registers (holding a single byte, 0-255)



Introduced in 1978 as part of the 8086 architecture

or the oood arenitecture

## 16-bit registers ("word")

accumulator ax ah al base bx bh bl count cx ch cl data dx dh dl source index di base pointer bp stack pointer sp

Introduced in 1978 as part of the 8086 architecture

32-bit registers ("double word")

 accumulator
 eax
 ax
 ah
 al

 base
 ebx
 bx
 bh
 bl

 count
 ecx
 cx
 ch
 cl

 edx
 dx
 dh
 dl

 source index
 esi
 si

 destination index
 edi
 di

 base pointer
 ebp
 bp

 esp
 sp

"e" for extended

sometimes referred to as "long word"s Introduced in 1985 as part of the 80386 architecture

40

# Special vs. general purpose registers

- eip: the instruction pointer register
- esp: the stack pointer register
- eflags: the flags register, stores information about the results of the most recent arithmetic or logic instruction
- Other registers can typically be used for any purpose (although some instructions—division, for example—work only with specific registers)

IA32 instructions

I |

#### Instruction format

• A typical IA32 instruction has the form:



- A suffix on the opcode indicates the size of the data that is being operated on:
  - 32-bit values use the suffix I(ong)
  - 16-bit values use the suffix w(ord)
  - 8-bit values use the suffix **b**(yte)

#### Addressing modes

- Register access, reg:
  - %eax: the value in register eax
  - Can typically use any registers except eip and eflags
- Memory access, mem:
  - var: the value in the memory location at address var
  - (%eax): the value in memory at the address in eax
  - 8 ( %eax): the value in memory at the address given by adding 8 to the value in eax
- Immediate, immed:
  - \$42: the constant value 42 (decimal; use \$0x2A for hex)
  - \$var: the address of memory location var

43

## Directives for "declaring" variables

```
.data
                         # put variables in the "data" section
                         # (code usually goes in .text)
         .align 4
                         \# make sure address is multiple of 4
       .long 42
                        # Simple variable, initialized to 42
myvar:
        .global days # A globally accessible array of ints
        .long 31, 28, 31, 30, 30, 30
.long 31, 31, 30, 31, 30, 31
days:
scratch:.space 4*100 # reserve uninitialized space
                        # a 32-bit integer (takes 4 bytes)
medium: .long 123
regular:.short 123
small: .byte 123
                        # a 16-bit integer (takes 2 bytes)
                        # an 8-bit integer (takes 1 byte)
```

#### How values are stored in memory

• A double word holds 32 binary digits ("bits") (i.e., 4 bytes)



- 0xBE1A3910 can be interpreted as -1,105,577,712 (signed) or 3,189,389,584 (unsigned)
- Stored in memory with the least significant byte at the lowest address ("little endian"):

| stored byte | 0x10 | 0x39 | 0x1A | 0xBE |
|-------------|------|------|------|------|
| address     | 400  | 401  | 402  | 403  |

46

IA32 instructions:

data movement

#### Move instructions

• Copy data from a source to a destination (where X is one of the size suffixes: b,w,1):

movX src, dst

• Any of the following combinations of arguments is allowed:

• Note that you can't move mem to mem in one instruction

47

#### **Examples**

Suppose that the memory (starting at address 0) contains the following (four byte) values:



Then

| instruction        | contents of eax |
|--------------------|-----------------|
| movl \$12, %eax    | 12              |
| movl (%eax), %eax  | 8               |
| movl 8(%eax), %eax | 0               |

## Zero and sign-extension

 Suppose we want to copy a value from a 16-bit register in to a 32-bit register

- Two common strategies:
  - Zero extension: for unsigned values



• Sign extension: for signed values



50

#### Move with sign, move with zero extension

• Copy from source to larger destination with sign extension:

• Copy from source to larger destination with zero extension:

- F and T are the "from" and "to" sizes (either b, w, or 1)
- Valid combinations: bw, b1, or w1
- Examples:

```
movsbw %al, %dx  # byte to word
movzwl %ax, %edx  # word to long
```

51

## Scaled indexed addressing

- •[base]([reg<sub>1</sub>], reg<sub>2</sub>[, index])
  - a memory operand whose address is the value in reg<sub>1</sub>, plus the specified base constant, plus the value of reg<sub>2</sub> times the index (which must be 1, 2, 4, or 8)
- •Any of the parts in [...] can be omitted
- Examples:

(eax,ebx,4) the ebx<sup>th</sup> element in the array of 32-bit words starting at the address in eax days(,ebx,4) the ebx<sup>th</sup> element in the array of 32-bit words starting at the address days

52

# More examples

Suppose that the memory (starting at address 0) contains the following (four byte) values:



Then

| instruction                | eax | ebx |
|----------------------------|-----|-----|
| movl \$12, %eax            | 12  |     |
| movl 8(%eax), %ebx         | 12  | 2   |
| movl 12(%eax,%ebx,4), %eax | 7   | 2   |

# The lea (load effective address) instruction

• Load the address of the source operand (must be memory) to a destination (where X is one of the size suffixes: b,w,1):

• Can also be used to co-opt the addressing mode circuitry into performing arithmetic operations:

```
leal 4(%eax),%eax # eax += 4
leal 1(%eax,%eax,2),%eax # eax = 3*eax + 1
leal 1(%eax,%eax), %eax # eax = 2*eax + 1
leal 4(,%eax,8), %eax # eax = 8*eax + 4
```

• These instructions just do an address calculation and do not attempt to read the data at that address.

53

#### The exchange instruction

• Exchange data between two locations

• Consider the following instructions in a high-level language:

 If x and y are held in registers, then a "clever enough" compiler can translate this code into a single xchgl instruction

#### The instruction pointer, eip

- The eip register holds the address of the next instruction to be executed
- As the processor reads each instruction, it increments the value in eip by the appropriate number of bytes to point to the following instruction
- This mechanism allows the processor to execute a sequence of instructions stored in contiguous locations in memory
- What would happen if we "move" a different value in to eip?

56

## Jumping and labels

 We can transfer control and start executing instructions at address addr by using a jump instruction

• Labels can be attached to instructions in an assembly language program: jmp b

a: jmp c
b: jmp a
c: ...

 Modern, pipelined machines work well with sequences of instructions that appear in consecutive locations. Jumps can be expensive: one of the goals of an optimizing compiler is to avoid unnecessary jumps. IA32 instructions: arithmetic and logic operations

58

#### Arithmetic instructions

• Combine a given src with a given dst value and leave the result in dst:

• Similar to dst += src, dst -= src, etc.. in C/C++

# **Examples**

• To compute  $x^2 + y^2$  and store the result in z:

movl x, %eax
imull %eax, %eax
movl y, %ebx
imull %ebx, %ebx
addl %ebx, %eax
movl %eax, z

| register | contents |
|----------|----------|
| eax      | x2+y2    |
| ebx      | $v^2$    |

.data

x: .long 4 y: .long 3

z: .long 0

59

# IA32 instructions: conditional execution

# Flags

• In addition to performing the required operation, arithmetic instructions also change bits in the eflags register



- The flags record details about the last operation, such as:
  - Was the result zero?
  - Was the result positive?
  - Did a carry occur?
  - etc...

62

## Conditional jumps, jCC

We can test these flags in conditional jump instructions

```
jz addr
             (jump to addr if the zero flag is set)
             (jump to addr if the zero flag is not set)
jnz addr
je addr
             (jump to addr if equal; same as jz)
             (jump to addr if not equal; same as jnz)
ine addr
jl addr
             (jump to addr if less than)
inl addr
             (jump to addr if not less than)
                                               (signed)
             (jump to addr if greater than)
jg addr
             (jump to addr if not greater than)
jng addr
```

Examples

```
jump to addr
subl
         %eax,%ebx
                           if ebx = eax
         addr
jΖ
                          jump to addr
         %eax,%ebx
subl
                          if ebx \neq eax
         addr
jnz
                          jump to addr
         %eax,%ebx <
subl
                           if ebx < eax
         addr
                          jump to addr
subl
         %eax, %ebx <
                          if ebx >= eax
jnl
         addr
```

If the specified condition does not apply, then execution just continues with the next instruction ...

64

63

#### The compare instruction

- The cmpX instruction behaves like subX except that the result is not saved; only the flags are changed
- For example: cmpl %eax, %ebx jl addr

will jump to addr if the value in ebx is less than the value in eax, but it will not change the values in either register

#### Other conditional instructions

- There are some other instructions that perform an action based on the conditional flags without the cost of a jump
- setCC reg8 sets the value in a specified 8-bit register to 0 or 1, based on the condition specified by CC:

```
cmpl %ecx,%ebx # set eax to 1 if
setl %al # ebx < ecx, or
movzbl %al,%eax # else to 0</pre>
```

• cmovCC src, dst copies data from the specified src to dst, but only if the condition specified by CC holds:

```
cmpl %ebx,%eax # set eax to the max of
cmovl %ebx,%eax # eax and ebx

condition code; no size suffix here!
```

65

# IA32 instructions: more arithmetic

# Unary operations

 The following arithmetic operations have only one argument (which serves as both source and destination)

• Like the binary operators, these instructions also set the flags for subsequent testing

68

## Bitwise shift operations

• Shift operations are handled using instructions of the form:

op count,(reg | mem)



- count is either a constant or else the %cl register.
- In all cases, the count value will be masked to 31 bits.

69

## Example

• Given two 32 bit input values:

• base: Each box is one nibble (4 bits), least significant bits on the right

• Calculate a 64 bit descriptor:



• (Needed for the calculation of "GDT entries")

70



## Bitwise rotate operations

• Rotate operations use the same instruction format:



• [Aside: Curiously, "higher level" languages often include shift operators, but not rotates, even though the latter have more interesting/uniform behavior ...]

#### Division

• Divide implicit destination (edx:eax) (a 64-bit quantity) by a specified argument with result in eax and remainder in edx

 Often used in conjunction with the cltd instruction ("convert long to double", a.k.a. cdq), which converts a signed 32-bit value in eax into the corresponding signed 64bit value in edx:eax.



Example I

Divide 4,660 (i.e., 0x1234) by 25:

movl \$0x1234, %eax
cltd
movl \$25, %ecx
idivl %ecx

Results: eax = 0xBA (186)edx = 0xA (10)

Sure enough: 186\*25 + 10 = 4,660

74

## Example 2

Divide -1,105,577,712 (i.e., 0xBE1A3910) by 256

movl \$0xBE1A3910, %eax cltd movl \$256, %ecx idivl %ecx

Results: eax = 0xFFBE1A3A (-4,318,662)edx = 0xffffff10 (-240)

Sure enough: -4,318,662 \* 256 - 240 = -1,105,577,712

Complications of division

- Division produces multiple results: a quotient and a remainder
- Division uses special registers: we'd better not store any other values in eax or edx if there's a chance that a division instruction might be executed
- Doesn't set flags: requires separate tests, for example, to determine whether quotient or remainder was zero
- Division can raise an exception if the src is zero (or -1)

76

IA32 instructions: using the stack

#### Stack

- The IA32 includes features that allow the programmer to use a region of memory as a simple stack:
  - the esp (stack pointer) register
  - $^{\bullet}$  special instructions like push, pop, call, ret, ...
- There is no obligation for the programmer to use these features, but it is often convenient to do so:
  - for temporary/scratch storage when a calculation needs more storage than the CPU registers can provide
  - to support calling and returning from functions

7

75

## A typical memory layout

• A typical operating system reserves an area of scratch memory for each program, and sets the esp register to point to the end of this region when the program begins



- The stack pointer moves
  - · down (decreases) as values are pushed on to the stack
  - up (increases) as values are popped off of the stack
- So long as they never overlap, the data and stack areas can grow or shrink as necessary as the program runs

#### Stack operations

• Push a value onto the stack

• Pop a value of the stack

· Roughly speaking:

```
pushl src = subl $4, %esp; movl src, (%esp)
popl dst = movl (%esp), dst; addl $4, %esp
```

30

# Spilling temporaries on the stack

- The stack is often used for saving the contents of a register on the stack ("spilling") so that the register can be used, temporarily, for some other reason
- For example: push1 %eax

pushl %edx

... code that changes eax and/or edx ...

popl %edx =
popl %eax

pop values in reverse order that was used to push them!

• Note that values on the stack can still be accessed, from memory, using (%esp), 4(%esp), 8(%esp), 12(%esp), ...

Call and return

• There is a special instruction for calling a function

call addr ~ pushl \$lab jmp addr lab:...

• And a special instruction for returning from a function

ret \( \approx \) popl \( \approx \) eax \( \approx \) used for something elso \( \text{...} \)

• In practice, additional instructions are often needed to deal with parameter passing, etc. ...

special syntax: jump to the address given by the contents of eax

\_\_\_\_

Functions and the System V ABI

# Implementing functions

- How do we pass arguments to a function?
- · How does a function return a result?
- How do we handle local variables?
- In principle, especially in a bare metal setting, we can implement these features any way we like, using the basic tools that the IA32 instruction set provides
- But there are some existing standards we can follow, notably the "System V IA32 Application Binary Interface (ABI)":

http://www.sco.com/developers/devspecs/abi386-4.pdf particularly Section 3-9

83

#### Stack frames

The code for any given function/procedure call runs in the context of a stack frame of the form:



- Frame (base) pointer: ebp points to the stack frame; the caller's frame pointer is stored in old (i.e., (%ebp))
- Return address: retn is the return address
- <u>Actual parameters</u>: a<sub>1</sub>,...,a<sub>n</sub> are the function's arguments. We can access a<sub>1</sub> as 8 (%ebp), etc...
- Local variables:  $l_1, ..., l_m$  are the function's local variables. We can access  $l_1$  as -4 ( %ebp ), etc...

## Building the stack frame ... in the caller



• The <u>caller</u> starts by pushing the arguments:



• Then it executes a call instruction, which pushes the return address:



• ... and jumps to the code for the callee ...

36

## Building the stack frame ... in the callee



• The <u>callee</u> saves the old frame pointer, and sets a new value:



• Then it decrements the stack pointer to reserve space for any local variables:



• ... and now the callee can start work ...

## Function prologue

- The code that builds the stack frame at the start of a function body is called the <u>prologue</u>:
  - At the beginning of a function body, the parameters and return address have already been pushed on to the stack.
     We need to:

pushl %ebp # save old frame pointer
movl %esp, %ebp # and set new value

• If local variables taking M bytes of storage are required, then we need to reserve space for them:

88

# Function epilogue

- When a function completes, we must dismantle the stack frame and return the machine to the state it was in before the call. The code to do this is called the <a href="mailto:epilogue">epilogue</a>:
  - Running the previous process in reverse:

```
movl %ebp, %esp # discard locals/temps
popl %ebp # restore frame pointer
ret # return to caller
```

 The first two instructions here can be replaced with the more efficient, but otherwise equivalent leave instruction

# Removing the parameters

• Once we return to the caller, the result of the function is in eax, but the parameters are still on the stack:



• We restore the stack pointer to its original value by adding on the number of bytes that are used by the parameters:

• If no parameters were passed, then this step can be omitted

#### Example: a leaf function



## Example: multiple parameters + call

```
int f(int x,
                            f: pushl %ebp
      int y,
                               movl
                                     %esp, %ebp
      int z) {
                               movl
                                     8(%ebp), %eax
  return g(x+y);
                               addl
                                     12(%ebp), %eax
                               pushl %eax
                               call
                               addl
                                     $4, %esp
                               movl
                                     %ebp, %esp
                               popl
                                     %ebp
                               ret
             old retn
                         X
                              У
        esp=ebp
                 ebp+4 ebp+8 ebp+12 ebp+16
```

91

#### Example: spilling

```
int h(int x,
                           h: pushl %ebp
      int y,
                              movl
                                    %esp, %ebp
     int z) {
                              pushl 8(%ebp)
    return g(x)+g(y);
                              call q
                              addl $4,%esp
                              pushl %eax -- spill
                              pushl 12(%ebp)
                              call
                              addl $4, %esp
                              popl %ecx -- unspill
                              addl %ecx, %eax
                              movl
                                    %ebp, %esp
                              popl
                                    %ebp
                              ret
        spill
              old
                    retn
                               ebp+12 ebp+16
                         ebp+8
```

#### **Observations**

- There is a four instruction overhead for each function that uses the frame pointer
  - · Increases execution time
  - Prevents use of ebp as a general purpose register
- For larger functions, the four instruction overhead is less of an issue
- For small functions, we would prefer to inline rather than copy
- Nevertheless, it is common to produce code that doesn't use ebp as a frame pointer (e.g., -fomit-frame-pointer in gcc)

94

#### Caller and callee saves

We (System V) can designate some registers as:

- caller saves (eax, ecx, and edx)
  - · can be freely used by the callee
  - the caller is responsible for saving (and later restoring) the value of a caller save register before a call
- callee saves (ebp, ebx, esi, and edi)
  - can be freely used by the caller
  - the callee is responsible for saving (and later restoring) the value of a callee saves register before using it to store temporary values

# Revisiting the previous example: h

```
int h(int x,
                            h: push1 %ebp
                                movl %esp, %ebp
      int y,
      int z) {
    return g(x)+g(y);
                                pushl 8(%ebp) -- x
                                call
                                addl
                                      $4,%esp
                                pushl %eax -- spill
      instead of having the
                                pushl 12(%ebp) -- y
     compiler save this value
                                call q
        on the stack ...
                                addl $4, %esp
                                popl
                                      %ecx -- unspill
                                addl
                                      %ecx, %eax
                                movl
                                      %ebp, %esp
                                popl
                                      %ebp
                                ret.
```

95

#### Revisiting the previous example: h

```
int h(int x,
      int y,
                                  movl %esp, %ebp
      int z) {
    return g(x)+g(y);
                                   pushl 8(%ebp) -- x
                                   call
                                   addl $4,%esp
                                  movl %eax, %esi
      ... we can move it to a
                                   pushl 12(%ebp) -- y
     callee saves register, esi,
                                   call
                                         q
     g will preserve the value
                                   addl
                                         $4, %esp
       in esi, if necessary
                                   addl
                                         %esi, %eax
     so it will still contain the
                                   movl
                                         %ebp, %esp
      correct value here...
                                   popl
                                         %ebp
                                   ret.
```

#### Revisiting the previous example: h

```
int h(int x,
                               h: pushl %ebp
      int y,
                                  movl %esp, %ebp
      int z) {
                                  pushl %esi
    return g(x)+g(y);
                                  pushl 8(%ebp)
                                  call
                                  addl
                                         $4,%esp
     ... now h has to save the
                                  movl %eax, %esi
                                  pushl 12(%ebp) -- y
      value in register, esi
                                  call
                                  addl
                                         $4, %esp
      one save in h is better
    than one saves in each of
                                  addl %esi, %eax
         two calls to g
                                         %esi
                                  popl
                                  movl
                                         %ebp, %esp
      empirically, more than
                                  popl
                                         %ebp
     50% of calls are to leaf
                                  ret.
           functions
```

# Closing thoughts

## Assembly "Language"?

- Highly imperative, primitive instructions, no expressions
- No high-level abstractions, but all the building blocks:
  - No arrays, records, variants, objects, closures, ...
  - No loops, switch statements, functions, local variables, ...
- Type System?
  - Values classified by size (e.g., 8 vs 32 bits) and storage class (e.g., memory, flag, integer register, floating point register, ...)
  - Limited protection against common programming mistakes
  - Programmer has full control over data representation

100

```
Macros
                    macro name
                                     parameters
                                                           defaults
       .macro idtcalc handler, slot, dpl=0, type=IDT_INTR, seg=KERN_CS
               $\seg, %ax
                                     # eax = seg # 0
# edx = hhi # hlo
               S\handler. %edx
        movl
               movl
               %edx, idt + (4 + 8*\slot)
                                                   loop constructs
       .endm
initIDT:.irp
              num, 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,16,17,18,19
       idtcalc handler=syscall, slot=0x80, dpl=3 <
       lidt
```

Macros support compile-time code generation:

- · Macros are like "used-defined instructions"
- Also supported: conditional code generation, loops, ...
- · One way to compensate for language weaknesses?

#### Summary

- IA32 provides a very basic programming language:
  - A fixed set of registers
  - · Instructions for moving and operating on data
  - Instructions for testing and control transfer
- In programming language terms:
  - Low-level, primitive instructions, loosely typed
  - No high-level abstractions, but all the building blocks
  - Very close to the metal, low-level control, "predictable" performance
- Let's write some programs!