# Operating Systems Lecture 12: Paging + TLB

Nipun Batra Aug 28, 2018

#### CS stories

https://www.youtube.com/watch?v=kTn56jJW4zY

#### Revision

- 1. Segmentation
  - 1. Registers containing:
    - 1. Start VA
    - 2. Bounds
    - 3. .... (think Stack)
    - 4. ... (save memory using identical code segment)
  - 2. Segment = (VA & SEG\_MASK) >>SEG\_SHIFT
  - 3. Offset = VA & OFF\_Mask
  - 4. Segmentation cons:
    - Requires \_\_\_\_ block of memory for each segment
      - 1. Can lead to \_\_\_\_ and \_\_\_\_

#### Revision

- 1.Large contiguous memory causes problems
  - 1. What happens if we map every byte of VA to a byte of PA?
    - 1.Reduces fragmentation?
      - 1.External?
      - 2.Internal?
    - 2. How much space needed per-process to store mapping?
- 2.Middle ground?

## Revision: Paging



#### Example

movl 21, %eax





#### Address Translation Summary



#### Page Table Storage

- Let's consider 32 bit address space
- 32 bit address space with 4 KB pages
- 4 KB pages -> \_\_\_\_ bits?
  - 12 bits Offset
- Remaining bits = 32 12 = 20
  - 20 bit VPN
  - # pages =  $2^2$
  - # translations required = \_\_\_\_\_
    - 2^20
- 4 bytes per translation -> 4 \* 2^20 MB = 4 MB/ process

#### Page Size Tradeoffs?

- Small size
  - More # of translations
    - More memory overhead/process
    - Less chances of fragmentation
- Large size
  - Less # of translations
    - Less memory overhead/process
    - More chances of fragmentation

## Page Table Storage



#### What else is in the Page Table?

- Protection bit : Read/Write/Execute?
- Present bit: On Memory or HDD/SSD?
- Reference bit: Is the page popular/being referenced?
  - Else?
- Valid bit: Is translation valid?
- Dirty bit: Modified since brought to memory?

```
int array[1000];
...
for (i = 0; i < 1000; i++)
    array[i] = 0;</pre>
```

```
1024 movl $0x0, (%edi,%eax,4) ← *(EDI + 4*EAX) = 0
1028 incl %eax
1032 cmpl $0x03e8,%eax
1036 jne 0x1024
```

```
int array[1000];
...
for (i = 0; i < 1000; i++)
    array[i] = 0;</pre>
```

Address of array[0]

```
1024 movl $0x0, (%edi,%eax,4) *(EDI + 4*EAX) = 0
1028 incl %eax
1032 cmpl $0x03e8,%eax
1036 jne 0x1024
```

```
int array[1000];
...
for (i = 0; i < 1000; i++)
    array[i] = 0;</pre>
```

```
1024 movl $0x0, (%edi,%eax,4) ← *(EDI + 4*EAX) = 0
1028 incl %eax
1032 cmpl $0x03e8,%eax
1036 jne 0x1024
```

```
int array[1000];
...
for (i = 0; i < 1000; i++)
    array[i] = 0;</pre>
```



```
int array[1000];
...
for (i = 0; i < 1000; i++)
    array[i] = 0;</pre>
```



```
int array[1000];
...
for (i = 0; i < 1000; i++)
    array[i] = 0;</pre>
```





























#### Example Summary

- 1. Extract VPN (virt page num) from VA (virt addr)
- 2. Calculate addr of PTE (page table entry)
- 3. Read PTE from memory
- 4. Extract PFN (page frame num) SLOW!
- 5. Build PA (phys addr)
- 6. Read contents of PA from memory into register

## Caching Makes Sense!

Factorial with and without memoization

#### Caching - Translation Lookaside Buffer (TLB)

- 1. Get the VPN from VA
- 2. Check if TLB has VA
- 3. If found, it is a TLB Hit. Yay!
  - Extract the PFN from TLB (PFN = TLB[VPN])
  - Generate PA from PFN (Add offset)
  - Access memory assuming protection checks work
- 4. If not found, it is a TLB Miss. :(
  - Access Page table to find the translation
  - Add translation to TLB (TLB[VPN] = PFN)
  - Goto Step 2

#### FETCH VA 1024



#### Get VPN for VA 1024. VPN = 1



#### LOOK IN TLB for VPN = 1. Not found. TLB Miss!



#### Find PFN for VPN = 1 by accessing Page Table



#### Add entry to TLB



#### Search for translation of VPN = 1 on TLB



#### Goto PFN 4 and create PA by adding offset



#### READ INSTRUCTION at PA(1024)



#### READ INSTRUCTION at PA(1024)



#### Find EDI + 4\*EAX -> VA = 40000



#### Find VPN for VA 40000. VPN = 39



#### Check TLB for VPN = 39. Miss!



#### Get PFN for VPN = 39 from Page Table



#### Store translation in TLB



#### Find PFN of VPN = 39 from TLB. Add offset to get PA.



#### FIND PA FOR VA = 1028



#### VPN = 1. Find Translation in TLB for VPN = 1. Found!



#### PFN = TLB[1] = 4



#### Get PA by adding offset to PFN = 4 and execute



#### 1032, 1036, 1024, 1028,......



EDI + 4\*EAX = 40000 + 4\*(1024/4) = 40000 + 1K -> VPN = 40



#### TLB miss for VPN = 40...



### Spatial and Temporal Locality

- 1. Hit rate = TLB Hit/(TLB Hit + TLB Miss)
- 2. Spatial locality -> TLB has good hit rate
  - 1. Arrays elements are spatially close (EDX + 4\*EAX)
  - 2. Instructions are spatially close (1024, ...)
- 3. Temporal locality -> TLB has a good hit rate
  - Loop. Re-using same instructions which exist in TLB

## Memory Cycle Rate Example

- 1. Hit = 1 clock cycle
- 2. Miss = 30 clock cycles
- 3. Miss rate = 1%
- 4. Cycle rate = .99\*1 + .01\*(30 + 1) = 1.3 cycles

#### Context Switch

**TLB** 

P1 running

VPN14397

P2 running

#### Context Switch

**TLB** 

P1 running

P2 running

| VPN | PFN |
|-----|-----|
| 1   | 4   |
| 39  | 7   |
|     |     |
| 1   | 30  |

What will VPN 1 be mapped to?