

# The Principle of Computer System

#### Hardware/Boftware interface

#### 楼学庆

浙江大学计算机学院

http://10.214.47.99/

Email:hzlou@163.com





浙江大学计算机学院

## 联系方式

- 网站:
  - http://10.214.47.99
- 邮箱:
  - <u>hzlou@163.com</u>(不收作业)







#### **Course outline**

Name:

#### Computer Systems

Students:

Undergraduate students in department.

• Score: 4.5

Hours/week: 3.5-2

Total: 88 hours







### 《计算机系统原理》课程目标

- 利用一个学期,覆盖《逻辑与计算机设计》、《计算机组成》、《体系结构》与《汇编与接口》等计算机硬件系统系列主要课程。作为非计算机技术方向,学习了解掌握计算机硬件、计算机系统方面知识的主要课程。
  - ☆前导课程:《C语言程序设计》
- 强烈建议不是只想玩软件的同学,改选 《**份**算**见**组 **②**》!







### **Virtual Memory: Concepts**

15-213: Introduction to Computer Systems 17<sup>th</sup> Lecture, Oct. 27, 2015

**Instructors:** 





### **Today**

- Address spaces
- VM as a tool for caching
- VM as a tool for memory management
- VM as a tool for memory protection
- Address translation





### A System Using Physical Addressing



 Used in "simple" systems like embedded microcontrollers in devices like cars, elevators, and digital picture frames

09:33:37



### A System Using Virtual Addressing



- Used in all modern servers, laptops, and smart phones
- One of the great ideas in computer science



浙江大学计算机学院



#### **Address Spaces**

■ Linear address space: Ordered set of contiguous non-negative integer addresses:

$$\{0, 1, 2, 3 \dots \}$$

- Virtual address space: Set of N = 2<sup>n</sup> virtual addresses {0, 1, 2, 3, ..., N-1}
- Physical address space: Set of  $M = 2^m$  physical addresses  $\{0, 1, 2, 3, ..., M-1\}$



### Why Virtual Memory (VM)?

#### Uses main memory efficiently

Use DRAM as a cache for parts of a virtual address space

#### Simplifies memory management

Each process gets the same uniform linear address space

#### Isolates address spaces

- One process can't interfere with another's memory
- User program cannot access privileged kernel information and code





### **Today**

- Address spaces
- VM as a tool for caching
- VM as a tool for memory management
- VM as a tool for memory protection
- Address translation





### VM as a Tool for Caching

- Conceptually, virtual memory is an array of N contiguous bytes stored on disk.
- The contents of the array on disk are cached in physical memory (DRAM cache)
  - These cache blocks are called pages (size is P = 2<sup>p</sup> bytes)





浙江大学计算机学院



#### **DRAM Cache Organization**

- DRAM cache organization driven by the enormous miss penalty
  - DRAM is about 10x slower than SRAM
  - Disk is about 10,000x slower than DRAM

#### Consequences

- Large page (block) size: typically 4 KB, sometimes 4 MB
- Fully associative
  - Any VP can be placed in any PP
  - Requires a "large" mapping function different from cache memories
- Highly sophisticated, expensive replacement algorithms
  - Too complicated and open-ended to be implemented in hardware
- Write-back rather than write-through





### **Enabling Data Structure: Page Table**

- A page table is an array of page table entries (PTEs) that maps virtual pages to physical pages.
  - Per-process kernel data structure in DRAM





#### Page Hit

**Page hit:** reference to VM word that is in physical memory (DRAM cache hit)





#### Page Fault

Page fault: reference to VM word that is not in physical memory (DRAM cache miss)



浙江大学计算机学院



Page miss causes page fault (an exception)





- Page miss causes page fault (an exception)
- Page fault handler selects a victim to be evicted (here VP 4)





- Page miss causes page fault (an exception)
- Page fault handler selects a victim to be evicted (here VP 4)





- Page miss causes page fault (an exception)
- Page fault handler selects a victim to be evicted (here VP 4)
- Offending instruction is restarted: page hit!





#### **Allocating Pages**

Allocating a new page (VP 5) of virtual memory.







### Locality to the Rescue Again!

- Virtual memory seems terribly inefficient, but it works because of locality.
- At any point in time, programs tend to access a set of active virtual pages called the working set
  - Programs with better temporal locality will have smaller working sets
- If (working set size < main memory size)</p>
  - Good performance for one process after compulsory misses
- If (SUM(working set sizes) > main memory size)
  - Thrashing: Performance meltdown where pages are swapped (copied) in and out continuously



### **Today**

- Address spaces
- VM as a tool for caching
- VM as a tool for memory management
- VM as a tool for memory protection
- Address translation





### VM as a Tool for Memory Management

- Key idea: each process has its own virtual address space
  - It can view memory as a simple linear array
  - Mapping function scatters addresses through physical memory
    - Well-chosen mappings can improve locality





### VM as a Tool for Memory Management

- Simplifying memory allocation
  - Each virtual page can be mapped to any physical page
  - A virtual page can be stored in different physical pages at different times
- Sharing code and data among processes
  - Map virtual pages to the same physical page (here: PP 6)





#### Linking

- Each program has similar virtual address space
- Code, data, and heap always start at the same addresses.

#### Loading

- execve allocates virtual pages for .text and .data sections & creates PTEs marked as invalid
- The .text and .data sections are copied, page by page, on demand by the virtual memory system

Memory invisible to **Kernel virtual memory** user code User stack (created at runtime) %rsp (stack pointer) Memory-mapped region for shared libraries brk **Run-time heap** (created by malloc) Loaded Read/write segment from (.data, .bss) the **Read-only segment** executable (.init,.text,.rodata) file

Unused

0x400000

浙江大学计算机华



### **Today**

- Address spaces
- VM as a tool for caching
- VM as a tool for memory management
- VM as a tool for memory protection
- Address translation





### VM as a Tool for Memory Protection

- **Extend PTEs with permission bits**
- MMU checks these bits on each access



浙江大学计算机学院



### **Today**

- Address spaces
- VM as a tool for caching
- VM as a tool for memory management
- VM as a tool for memory protection
- Address translation





#### VM Address Translation

#### Virtual Address Space

#### Physical Address Space

$$P = \{0, 1, ..., M-1\}$$

#### Address Translation

- MAP:  $V \rightarrow P \cup \{\emptyset\}$
- For virtual address a:
  - MAP(a) = a' if data at virtual address a is at physical address a' in P
  - $MAP(a) = \emptyset$  if data at virtual address a is not in physical memory
    - Either invalid or stored on disk

### **Summary of Address Translation Symbols**

#### Basic Parameters

- N = 2<sup>n</sup>: Number of addresses in virtual address space
- M = 2<sup>m</sup>: Number of addresses in physical address space
- **P = 2**<sup>p</sup> : Page size (bytes)

#### Components of the virtual address (VA)

TLBI: TLB index

TLBT: TLB tag

VPO: Virtual page offset

VPN: Virtual page number

#### Components of the physical address (PA)

PPO: Physical page offset (same as VPO)

PPN: Physical page number





### **Address Translation With a Page Table**

#### Virtual address



Physical address



浙江大学计算机学院



#### **Address Translation: Page Hit**



- 1) Processor sends virtual address to MMU
- 2-3) MMU fetches PTE from page table in memory
- 4) MMU sends physical address to cache/memory
- 5) Cache/memory sends data word to processor





#### **Address Translation: Page Fault**



- 1) Processor sends virtual address to MMU
- 2-3) MMU fetches PTE from page table in memory
- 4) Valid bit is zero, so MMU triggers page fault exception
- 5) Handler identifies victim (and, if dirty, pages it out to disk)
- 6) Handler pages in new page and updates PTE in memory
- 7). Handler returns to original processy fresharting faulting instruction



#### **Integrating VM and Cache**



VA: virtual address, PA: physical address, PTE: page table entry, PTEA = PTE address



## Speeding up Translation with a TLB

- Page table entries (PTEs) are cached in L1 like any other memory word
  - PTEs may be evicted by other data references
  - PTE hit still requires a small L1 delay
- Solution: Translation Lookaside Buffer (TLB)
  - Small set-associative hardware cache in MMU
  - Maps virtual page numbers to physical page numbers
  - Contains complete page table entries for small number of pages



#### **Accessing the TLB**

MMU uses the VPN portion of the virtual address to access the TLB:





#### **TLB Hit**



#### A TLB hit eliminates a memory access





#### **TLB Miss**



#### A TLB miss incurs an additional memory access (the PTE)

Fortunately, TLB misses are rare. Why?



浙江大学计算机学院



## **Multi-Level Page Tables**

- Suppose:
  - 4KB (2<sup>12</sup>) page size, 48-bit address space, 8-byte PTE
- Problem:
  - Would need a 512 GB page table!
    - $2^{48} * 2^{-12} * 2^3 = 2^{39}$  bytes
- Common solution: Multi-level page table
- Example: 2-level page table
  - Level 1 table: each PTE points to a page table (always memory resident)
  - Level 2 table: each PTE points to a page (paged in and out like any other data)





## A Two-Level Page Table Hierarchy





#### Translating with a k-level Page Table





#### **Summary**

#### Programmer's view of virtual memory

- Each process has its own private linear address space
- Cannot be corrupted by other processes

#### System view of virtual memory

- Uses memory efficiently by caching virtual memory pages
  - Efficient only because of locality
- Simplifies memory management and programming
- Simplifies protection by providing a convenient interpositioning point to check permissions



## **Virtual Memory: Systems**

15-213: Introduction to Computer Systems 18<sup>th</sup> Lecture, Oct. 29, 2015

**Instructors:** 





# **Today**

- Simple memory system example
- Case study: Core i7/Linux memory system
- Memory mapping





#### **Review of Symbols**

#### Basic Parameters

- N = 2<sup>n</sup>: Number of addresses in virtual address space
- M = 2<sup>m</sup>: Number of addresses in physical address space
- **P = 2**<sup>p</sup> : Page size (bytes)

#### Components of the virtual address (VA)

- TLBI: TLB index
- TLBT: TLB tag
- VPO: Virtual page offset
- VPN: Virtual page number

#### Components of the physical address (PA)

- PPO: Physical page offset (same as VPO)
- PPN: Physical page number
- CO: Byte offset within cache line
- CI: Cache index
- CT: Cache tag





# **Simple Memory System Example**

#### Addressing

- 14-bit virtual addresses
- 12-bit physical address
- Page size = 64 bytes





**Physical Page Number** 

**Physical Page Offset** 





49

## 1. Simple Memory System TLB

- 16 entries
- 4-way associative



| Set | Tag | PPN | Valid |
|-----|-----|-----|-------|-----|-----|-------|-----|-----|-------|-----|-----|-------|
| 0   | 03  | _   | 0     | 09  | 0D  | 1     | 00  | _   | 0     | 07  | 02  | 1     |
| 1   | 03  | 2D  | 1     | 02  | _   | 0     | 04  | _   | 0     | 0A  | _   | 0     |
| 2   | 02  | _   | 0     | 08  | _   | 0     | 06  | _   | 0     | 03  | _   | 0     |
| 3   | 07  | _   | 0     | 03  | 0D  | 1     | 0A  | 34  | 1     | 02  | _   | 0     |

浙江大学计算机学院



## 2. Simple Memory System Page Table

Only show first 16 entries (out of 256)

| VPN | PPN | Valid |
|-----|-----|-------|
| 00  | 28  | 1     |
| 01  | _   | 0     |
| 02  | 33  | 1     |
| 03  | 02  | 1     |
| 04  | _   | 0     |
| 05  | 16  | 1     |
| 06  | _   | 0     |
| 07  | _   | 0     |

| VPN | PPN | Valid |
|-----|-----|-------|
| 80  | 13  | 1     |
| 09  | 17  | 1     |
| 0A  | 09  | 1     |
| ОВ  | _   | 0     |
| OC  | -   | 0     |
| OD  | 2D  | 1     |
| 0E  | 11  | 1     |
| OF  | 0D  | 1     |



# 3. Simple Memory System Cache

- 16 lines, 4-byte block size
- Physically addressed
- Direct mapped



| Idx | Tag | Valid | В0 | B1 | B2 | В3        |
|-----|-----|-------|----|----|----|-----------|
| 0   | 19  | 1     | 99 | 11 | 23 | 11        |
| 1   | 15  | 0     | _  | _  | _  | _         |
| 2   | 1B  | 1     | 00 | 02 | 04 | 08        |
| 3   | 36  | 0     | _  | -  | _  | _         |
| 4   | 32  | 1     | 43 | 6D | 8F | 09        |
| 5   | 0D  | 1     | 36 | 72 | F0 | 1D        |
| 6   | 31  | 0     | -  | _  | _  | _         |
|     | 16  | 1     | 11 | C2 | DF | <u>03</u> |

| ldx      | Tag | Valid | <i>B0</i> | B1 | B2 | В3 |
|----------|-----|-------|-----------|----|----|----|
| 8        | 24  | 1     | 3A        | 00 | 51 | 89 |
| 9        | 2D  | 0     | _         | _  | _  | _  |
| Α        | 2D  | 1     | 93        | 15 | DA | 3B |
| В        | 0B  | 0     | _         | _  | _  | _  |
| С        | 12  | 0     | _         | _  | _  | _  |
| D        | 16  | 1     | 04        | 96 | 34 | 15 |
| E        | 13  | 1     | 83        | 77 | 1B | D3 |
| <b>F</b> | 14  | 0     | _         | _  | _  | _  |



## **Address Translation Example #1**

Virtual Address: 0x03D4



#### **Physical Address**

CO 0







## **Address Translation Example #2**

Virtual Address: 0x0020



#### **Physical Address**

CO 0





浙江大学计算机学院



## **Address Translation Example #3**

Virtual Address: 0x0020



#### **Physical Address**

CO 0







## **Today**

- Simple memory system example
- Case study: Core i7/Linux memory system
- Memory mapping





56

## **Intel Core i7 Memory System**





## **Review of Symbols**

#### Basic Parameters

- N = 2<sup>n</sup>: Number of addresses in virtual address space
- M = 2<sup>m</sup>: Number of addresses in physical address space
- **P = 2**<sup>p</sup> : Page size (bytes)

#### Components of the virtual address (VA)

- TLBI: TLB index
- TLBT: TLB tag
- VPO: Virtual page offset
- VPN: Virtual page number

#### Components of the physical address (PA)

- PPO: Physical page offset (same as VPO)
- PPN: Physical page number
- CO: Byte offset within cache line
- CI: Cache index
- CT: Cache tag



#### **End-to-end Core i7 Address Translation**







#### **Core i7 Level 1-3 Page Table Entries**

| 63 | 62 52  | 51 12                            | 11 9   | 8 | 7  | 6 | 5 | 4  | 3  | 2   | 1   | 0   |
|----|--------|----------------------------------|--------|---|----|---|---|----|----|-----|-----|-----|
| XD | Unused | Page table physical base address | Unused | G | PS |   | Α | CD | WT | U/S | R/W | P=1 |

Available for OS (page table location on disk)

P=0

#### Each entry references a 4K child page table. Significant fields:

**P:** Child page table present in physical memory (1) or not (0).

**R/W:** Read-only or read-write access access permission for all reachable pages.

**U/S:** user or supervisor (kernel) mode access permission for all reachable pages.

**WT:** Write-through or write-back cache policy for the child page table.

**A:** Reference bit (set by MMU on reads and writes, cleared by software).

PS: Page size either 4 KB or 4 MB (defined for Level 1 PTEs only).

Page table physical base address: 40 most significant bits of physical page table address (forces page tables to be 4KB aligned)

**XD:** Disable or enable instruction fetches from all pages reachable from this PTE.



## **Core i7 Level 4 Page Table Entries**

| 63 | 62 52  | 51 12                      | 11 9   | 8 | 7 | 6 | 5 | 4  | 3  | 2   | 1   | 0   |
|----|--------|----------------------------|--------|---|---|---|---|----|----|-----|-----|-----|
| XD | Unused | Page physical base address | Unused | G |   | D | Α | CD | WT | U/S | R/W | P=1 |

Available for OS (page location on disk)

P=0

#### Each entry references a 4K child page. Significant fields:

P: Child page is present in memory (1) or not (0)

R/W: Read-only or read-write access permission for child page

**U/S:** User or supervisor mode access

**WT:** Write-through or write-back cache policy for this page

A: Reference bit (set by MMU on reads and writes, cleared by software)

**D:** Dirty bit (set by MMU on writes, cleared by software)

**Page physical base address:** 40 most significant bits of physical page address (forces pages to be 4KB aligned)

**XD:** Disable or enable instruction fetches from this page.





#### **Core i7 Page Table Translation**





## **Cute Trick for Speeding Up L1 Access**



#### Observation

- Bits that determine CI identical in virtual and physical address
- Can index into cache while address translation taking place
- Generally we hit in TLB, so PPN bits (CT bits) available next
- "Virtually indexed, physically tagged"
- Cache carefully sized to make this possible

浙江大学计算机学院 62



## Virtual Address Space of a Linux Process



# Linux Organizes VM as Collection of "Areas"





# **Linux Page Fault Handling**



Segmentation fault:

accessing a non-existing page

Normal page fault

#### **Protection exception:**

e.g., violating permission by writing to a read-only page (Linux reports as Segmentation fault)

浙江大学计算机学院 65



## **Today**

- Simple memory system example
- Case study: Core i7/Linux memory system
- Memory mapping





## **Memory Mapping**

- VM areas initialized by associating them with disk objects.
  - Process is known as memory mapping.
- Area can be backed by (i.e., get its initial values from) :
  - Regular file on disk (e.g., an executable object file)
    - Initial page bytes come from a section of a file
  - Anonymous file (e.g., nothing)
    - First fault will allocate a physical page full of 0's (demand-zero page)
    - Once the page is written to (dirtied), it is like any other page
- Dirty pages are copied back and forth between memory and a special swap file.



# **Sharing Revisited: Shared Objects**



Process 1 maps the shared object.



# **Sharing Revisited: Shared Objects**



- **Process 2 maps** the shared object.
- Notice how the virtual addresses can be different.



# **Sharing Revisited: Private Copy-on-write (COW) Objects**



- Two processes mapping a private copy-on-write (COW) object.
- Area flagged as private copy-onwrite
- PTEs in private areas are flagged as read-only



# **Sharing Revisited: Private Copy-on-write (COW) Objects**



- **Instruction writing** to private page triggers protection fault.
- **Handler creates** new R/W page.
- Instruction restarts upon handler return.
- **Copying deferred** as long as possible!



#### The fork Function Revisited

- VM and memory mapping explain how fork provides private address space for each process.
- To create virtual address for new new process
  - Create exact copies of current mm\_struct, vm\_area\_struct, and page tables.
  - Flag each page in both processes as read-only
  - Flag each vm\_area\_struct in both processes as private COW
- On return, each process has exact copy of virtual memory
- Subsequent writes create new pages using COW mechanism.



**73** 

#### The execve Function Revisited



- To load and run a new program a.out in the current process using execve:
- Free vm\_area\_struct's and page tables for old areas
- Create vm\_area\_struct's and page tables for new areas
  - Programs and initialized data backed by object files.
  - .bss and stack backed by anonymous files.
- Set PC to entry point in . text
  - Linux will fault in code and data pages as needed.

通程 09:

浙江大学计算机学院



#### **User-Level Memory Mapping**

- Map len bytes starting at offset offset of the file specified by file description fd, preferably at address start
  - start: may be 0 for "pick an address"
  - prot: PROT\_READ, PROT\_WRITE, ...
  - flags: MAP\_ANON, MAP\_PRIVATE, MAP\_SHARED, ...
- Return a pointer to start of mapped area (may not be start)



## **User-Level Memory Mapping**

```
void *mmap(void *start, int len,
           int prot, int flags, int fd, int offset)
```



Disk file specified by file descriptor fd

**Process virtual memory** 



# Example: Using mmap to Copy Files

Copying a file to stdout without transferring data to user space.

```
#include "csapp.h"
void mmapcopy (int fd, int size)
   /* Ptr to memory mapped area */
    char *bufp:
    bufp = Mmap(NULL, size,
                PROT READ,
                MAP PRIVATE,
                fd, 0);
    Write(1, bufp, size):
   return:
                             mmapcopy.c
```

```
/* mmapcopy driver */
int main(int argc, char **argv)
    struct stat stat:
   int fd:
   /* Check for required cmd line arg */
    if (argc != 2) {
       printf("usage: %s <filename>\n",
               argv[0]):
       exit(0):
   /* Copy input file to stdout */
   fd = Open(argv[1], O RDONLY, 0);
   Fstat(fd, &stat):
   mmapcopy(fd, stat.st size);
    exit(0):
                                    mmapcopy.c
```