

• A high-level understanding of System Call interface
• Mostly from the user's perspective
• From textbook (section 1.6)

• Understanding of how the application-kernel boundary is crossed with system calls in general
• Including an appreciation of the relationship between a case study (OS/161 system call handling) and the general case.

• Exposure to architectural details of the MIPS R3000
• Detailed understanding of the exception handling mechanism
• From "Hardware Guide" on class web site




System Calls

Can be viewed as special function calls
Provides for a controlled entry into the kernel
While in kernel, they perform a privileged operation
Returns to original caller with the result
The system call interface represents the abstract machine provided by the operating system.


The System Call Interface:
A Brief Overview

• From the user's perspective
• Process Management
• File I/O
• Directory management
• Some other selected Calls
• There are many more
• On Linux, see man syscalls for a list






System Calls

• A stripped down shell:

    while (TRUE) {                              /* repeat forever */
        type_prompt();                          /* display prompt */
        read_command(command, parameters);      /* input from terminal */

        if (fork() != 0) {                      /* fork off child process */
            /* Parent code */
            waitpid(-1, &status, 0);            /* wait for child to exit */
        } else {
            /* Child code */
            execve(command, parameters, 0);     /* execute command */
        }
    }










Privileged-mode Operation

Memory Address Space

• The accessibility of addresses within an address space changes depending on operating mode
 • To protect kernel code and data
• Note: The exact memory ranges are usually configurable, and vary between CPU architectures and/or operating systems.

Figure: memory address space. 0x80000000 to 0xFFFFFFFF is accessible only to kernel-mode; 0x00000000 up to 0x80000000 is accessible to user- and kernel-mode.




Questions we'll answer

• There is only one register set
• How is register use managed?
• What does an application expect a system call to look like?

• How is the transition to kernel mode triggered?
• Where is the OS entry point (system call handler)?
• How does the OS know what to do?


# System Call Mechanism Overview

- System call transitions triggered by special processor instructions
  - User to Kernel
    - System call instruction
  - Kernel to User
    - Return from privileged mode instruction


# System Call Mechanism Overview

- Processor mode
  - Switched from user-mode to kernel-mode
  - Switched back when returning to user mode
- Stack Pointer (SP)
  - User-level SP is saved and a kernel SP is initialised
  - User-level SP restored when returning to user-mode
- Program Counter (PC)
  - User-level PC is saved and PC set to kernel entry point
  - User-level PC restored when returning to user-level
  - Kernel entry via the designated entry point must be strictly enforced


System Call Mechanism Overview

- Registers
  - Set at user-level to indicate system call type and its arguments
    - A convention between applications and the kernel
  - Some registers are preserved at user-level or kernel-level in order to restart user-level execution
    - Depends on language calling convention etc.
  - Result of system call placed in registers when returning to user-level
    - Another convention

Steps in Making a System Call

Why do we need system calls?

- Why not simply jump into the kernel via a function call?
  - Function calls do not
    - Change from user to kernel mode
      - and eventually back again
    - Restrict possible entry points to secure locations
      - To prevent entering after any security checks


There are 11 steps in making the system call read(fd, buffer, nbytes)

The MIPS R2000/R3000

• Before looking at system call mechanics in some detail, we need a basic understanding of the MIPS R3000











c0_cause

| 31 | 30 | 29-28 | 27-16 | 15-8 | 7 | 6-2     | 1-0 |
|----|----|-------|-------|------|---|---------|-----|
| BD | 0  | CE    | 0     | IP   | 0 | ExcCode | 0   |

Figure 3.3. Fields in the Cause register

• IP
 • Interrupts pending
 • 8 bits indicating current state of interrupt lines
• CE
 • Coprocessor error
 • Attempt to access disabled Copro.
• BD
 • If set, the instruction that caused the exception was in a branch delay slot
• ExcCode
 • The code number of the exception taken

| ExcCode Value | Mnemonic | Description |
|---------------|----------|-------------|
| 0             | Int      | Interrupt |
| 1             | Mod      | "TLB modification" |
| 2             | TLBL     | "TLB load" |
| 3             | TLBS     | "TLB store" |
| 4             | AdEL     | Address error on load/I-fetch. Either an attempt to access outside kuseg when in user mode, or an attempt to read a word or half-word at a misaligned address. |
| 5             | AdES     | Address error on store; same conditions as AdEL. |

Table 3.2. ExcCode values: different kinds of exceptions






| Program address | "segment" | Physical address | Description |
|-----------------|-----------|------------------|-------------|
| 0x8000 0000     | kseg0     | 0x0000 0000      | TLB miss on kuseg reference only. |
| 0x8000 0080     | kseg0     | 0x0000 0080      | All other exceptions. |
| 0xbfc0 0100     | kseg1     | 0x1fc0 0100      | Uncached alternative kuseg TLB miss entry point (used if SR bit BEV set). |
| 0xbfc0 0180     | kseg1     | 0x1fc0 0180      | Uncached alternative for all other exceptions (used if SR bit BEV set). |
| 0xbfc0 0000     | kseg1     | 0x1fc0 0000      | The "reset exception". |

Table 4.1. Reset and exception entry points (vectors) for R30xx family













Hardware exception handling

• Address of general exception vector placed in PC

Registers shown: PC = 0x80000080, EPC = 0x12345678, Cause, Status (KUo IEo KUp IEp KUc IEc)



Returning from an exception

• For now, let's ignore
 • how the exception is actually handled
 • how user-level registers are preserved
• Let's simply look at how we return from the exception








Returning from an exception

• We are now back in the same state we were in when the exception happened

Registers shown: PC = 0x12345678, Cause, Status (KUo IEo KUp IEp KUc IEc = ? ? ? ? 1 1)

• System calls are invoked via a syscall instruction.
 • The syscall instruction causes an exception and transfers control to the general exception handler
 • A convention (an agreement between the kernel and applications) is required as to how user-level software indicates
  • Which system call is required
  • Where its arguments are
  • Where the result should go


OS/161 System Calls

• OS/161 uses the following conventions
 • Arguments are passed and returned via the normal C function calling convention
 • Additionally
  • Reg v0 contains the system call number
  • On return, reg a3 contains
   • 0: if success, v0 contains successful result
   • not 0: if failure, v0 has the errno.
    • v0 stored in errno
    • -1 returned in v0

Convention for kernel entry





User-Level System Call Walk Through - Calling read()

int read(int filehandle, void *buffer, size_t size)

• Three arguments, one return value
• Code fragment calling the read function

    400124: 02602021 move a0,s3
    400128: 27a50010 addiu a1,sp,16
    40012c: 0c1001a3 jal 40068c <read>
    400130: 24060400 li a2,1024
    400134: 00408021 move s0,v0
    400138: 1a000016 blez s0,400194 <docat+0x94>

• Args are loaded, return value is tested


Inside the read() syscall function
part 1

0040068c <read>:
 40068c: 08100190 j 400640 <\_\_syscall>
 400690: 24020005 li v0,5

• Appropriate registers are preserved
 • Arguments (a0-a3), return address (ra), etc.

• The syscall number (5) is loaded into v0

• Jump (not jump and link) to the common syscall routine

The read() syscall function part 2

• Generate a syscall exception

    00400640 <__syscall>:
      400640: 0000000c syscall
      400644: 10e00005 beqz a3,40065c <__syscall+0x1c>
      400648: 00000000 nop
      40064c: 3c011000 lui at,0x1000
      400650: ac220000 sw v0,0(at)
      400654: 2403ffff li v1,-1
      400658: 2402ffff li v0,-1
      40065c: 03e00008 jr ra
      400660: 00000000 nop


The read() syscall function part 2

• Test success, if yes, branch to return from function

      400644: 10e00005 beqz a3,40065c <__syscall+0x1c>
      400648: 00000000 nop

The read() syscall function part 2

• If failure, store code in errno

      40064c: 3c011000 lui at,0x1000
      400650: ac220000 sw v0,0(at)


The read() syscall function part 2

• Set read() result to -1

      400654: 2403ffff li v1,-1
      400658: 2402ffff li v0,-1
      40065c: 03e00008 jr ra
      400660: 00000000 nop




### Summary

- From the caller's perspective, the read() system call behaves like a normal function call
  - It preserves the calling convention of the language
- However, the actual function implements its own convention by agreement with the kernel
- Our OS/161 example assumes the kernel preserves appropriate registers (s0-s8, sp, gp, ra).
- Most languages have similar *libraries* that interface with the operating system.


• Things left to do
 • Change to kernel stack
 • Preserve registers by saving to memory (on the kernel stack)
 • Leave saved registers somewhere accessible to
  • Read arguments
  • Store return values
 • Do the "read()"
 • Restore registers
 • Switch back to user stack
 • Return to application


## OS/161 Exception Handling

Note: The following code is from the uniprocessor variant of OS/161 (v1.x).
It is simpler than, but broadly similar to, the current version.


```
exception:
  move k1, sp              /* Save previous stack pointer in k1 */
  mfc0 k0, c0_status       /* Get status register */
  andi k0, k0, CST_KUp     /* Check the we-were-in-user-mode bit */
  beq  k0, $0, 1f          /* If clear, from kernel, already have stack */
  nop                      /* delay slot */

  /* Coming from user mode - load kernel stack into sp */
  la k0, curkstack         /* get address of "curkstack" */
  lw sp, 0(k0)             /* get its value */
  nop                      /* delay slot for the load */
1:
  mfc0 k0, c0_cause        /* Now, load the exception cause. */
  j common_exception       /* Skip to common code */
  nop                      /* delay slot */
```

    common_exception:
      /*
       * At this point:
       *    Interrupts are off. (The processor did this for us.)
       *    k0 contains the exception cause value.
       *    k1 contains the old stack pointer.
       *    sp points into the kernel stack.
       *    All other registers are untouched.
       */

      /*
       * Allocate stack space for 37 words to hold the trap frame,
       * plus four more words for a minimal argument block.
       */
      addi sp, sp, -164


```
/* The order here must match mips/include/trapframe.h. */
sw ra, 160(sp)     /* dummy for gdb */
sw s8, 156(sp)     /* save s8 */
sw sp, 152(sp)     /* dummy for gdb */
sw gp, 148(sp)     /* save gp */
sw k1, 144(sp)     /* dummy for gdb */
sw k0, 140(sp)     /* dummy for gdb */

sw k1, 152(sp)     /* real saved sp */
nop                /* delay slot for store */

mfc0 k1, c0_epc    /* Copr.0 reg 14 == PC for exception */
sw k1, 160(sp)     /* real saved PC */
```

These six stores are a "hack" to avoid confusing GDB. You can ignore the details of why and how.

The real work starts here: after the stores above, sw k1, 152(sp) saves the real user-level sp (held in k1), and sw k1, 160(sp) saves the real exception PC read from c0_epc.


```
sw t9, 136(sp)
sw t8, 132(sp)
sw s7, 128(sp)
sw s6, 124(sp)
sw s5, 120(sp)
sw s4, 116(sp)
sw s3, 112(sp)
sw s2, 108(sp)
sw s1, 104(sp)
sw s0, 100(sp)
sw t7, 96(sp)
sw t6, 92(sp)
sw t5, 88(sp)
sw t4, 84(sp)
sw t3, 80(sp)
sw t2, 76(sp)
sw t1, 72(sp)
sw t0, 68(sp)
sw a3, 64(sp)
sw a2, 60(sp)
sw a1, 56(sp)
sw a0, 52(sp)
sw v1, 48(sp)
sw v0, 44(sp)
sw AT, 40(sp)
sw ra, 36(sp)
```

We can now use the other registers (t0, t1) that we have preserved on the stack

    /*
     * Save special registers.
     */
    mfhi t0
    mflo t1
    sw t0, 32(sp)
    sw t1, 28(sp)

    /*
     * Save remaining exception context information.
     */
    sw k0, 24(sp)         /* k0 was loaded with cause earlier */
    mfc0 t1, c0_status    /* Copr.0 reg 12 == status */
    sw t1, 20(sp)
    mfc0 t2, c0_vaddr     /* Copr.0 reg 8 == faulting vaddr */
    sw t2, 16(sp)

    /*
     * Pretend to save $0 for gdb's benefit.
     */
    sw $0, 12(sp)






Now we arrive in the 'C' kernel

    /*
     * General trap (exception) handling function for mips.
     * This is called by the assembly-language exception handler once
     * the trapframe has been set up.
     */
    void
    mips_trap(struct trapframe *tf)
    {
        u_int32_t code, isutlb, iskern;
        int savespl;

        /* The trap frame is supposed to be 37 registers long. */
        assert(sizeof(struct trapframe)==(37*4));

        /* Save the value of curspl, which belongs to the old context. */
        savespl = curspl;

        /* Right now, interrupts should be off. */
        curspl = SPL_HIGH;

What happens next?

• The kernel deals with whatever caused the exception
 • Syscall
 • Interrupt
 • Page fault
• It potentially modifies the trapframe, etc
 • E.g., Store return code in v0, zero in a3
• 'mips\_trap' eventually returns


```
lw t0, 68(sp)
lw t1, 72(sp)
lw t2, 76(sp)
lw t3, 80(sp)
lw t4, 84(sp)
lw t5, 88(sp)
lw t6, 92(sp)
lw t7, 96(sp)
lw s0, 100(sp)
lw s1, 104(sp)
lw s2, 108(sp)
lw s3, 112(sp)
lw s4, 116(sp)
lw s5, 120(sp)
lw s6, 124(sp)
lw s7, 128(sp)
lw t8, 132(sp)
lw t9, 136(sp)

/* 140(sp) "saved" k0 was dummy garbage anyway */
/* 144(sp) "saved" k1 was dummy garbage anyway */
```
