#### SHARED MEMORY SYSTEMS

Mahdi Nazm Bojnordi

**Assistant Professor** 

School of Computing

University of Utah



#### Overview

- Shared memory systems
  - Inconsistent vs. consistent data
- Cache coherence with write-back policy
  - MSI protocol
  - MESI protocol
- Memory consistency
  - Sequential consistency

#### Recall: Shared Memory Systems

- Multiple threads employ a shared memory system
  - Easy for programmers
- Complex synchronization mechanisms are required
  - Cache coherence
    - All processors see the same data for a given memory address, as they would if there were no caches in the system
    - e.g., snoopy protocol with write-through, write no-allocate
      - Inefficient
  - Memory consistency
    - All memory instructions appear to execute in the program order
    - e.g., sequential consistency

#### Snooping with Writeback Policy

- Problem: writes are not propagated to memory until eviction
  - Cached data may be different from main memory
- Solution: identify the owner of the most recently updated replica
  - Every data block may have only one owner at any time
  - Only the owner can update the replica
  - Multiple readers can share the data
    - No one can write without gaining ownership first

#### Modified-Shared-Invalid Protocol

- Every cache block transitions among three states
  - Invalid: no replica in the cache
  - Shared: a read-only copy in the cache
    - Multiple units may have the same copy
  - Modified: a writable copy of the data in the cache
    - The replica has been updated
    - The cache has the only valid copy of the data block
- Processor actions
  - Load, store, evict
- Bus messages
  - BusRd, BusRdX, BusInv, BusWB, BusReply
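The states, processor actions, and bus messages listed above can be written down as two lookup tables. This is a minimal sketch that assumes the common textbook MSI transitions (a store to a Shared block upgrades with BusInv, a store to an Invalid block issues BusRdX, and a Modified block answers a snooped request with BusReply); the names `PROCESSOR`, `SNOOP`, `on_processor`, and `on_snoop` are illustrative and do not come from the slides.

```python
# Processor-side transitions: (state, action) -> (next_state, bus_message).
# A bus_message of None means the access is served without bus traffic.
PROCESSOR = {
    ("I", "load"):  ("S", "BusRd"),    # read miss: fetch a read-only copy
    ("I", "store"): ("M", "BusRdX"),   # write miss: fetch with ownership
    ("S", "load"):  ("S", None),       # read hit
    ("S", "store"): ("M", "BusInv"),   # upgrade: invalidate other sharers
    ("S", "evict"): ("I", None),       # clean copy, dropped silently
    ("M", "load"):  ("M", None),       # read hit
    ("M", "store"): ("M", None),       # write hit, already the owner
    ("M", "evict"): ("I", "BusWB"),    # dirty copy must be written back
}

# Snoop-side transitions: (state, observed_message) -> (next_state, response).
SNOOP = {
    ("S", "BusRdX"): ("I", None),        # another cache wants to write
    ("S", "BusInv"): ("I", None),        # another cache upgrades to M
    ("M", "BusRd"):  ("S", "BusReply"),  # owner supplies data, keeps a copy
    ("M", "BusRdX"): ("I", "BusReply"),  # owner supplies data, gives up copy
}


def on_processor(state, action):
    """Next state and bus message for a local load, store, or evict."""
    return PROCESSOR[(state, action)]


def on_snoop(state, message):
    """Next state and response when another cache's message is observed.

    Pairs that are not listed (e.g., a Shared block seeing BusRd) leave
    the state unchanged and produce no response.
    """
    return SNOOP.get((state, message), (state, None))
```

For example, a store by one cache moves its block from I to M with BusRdX, and a later load by a second cache makes the first cache see BusRd, reply with the data, and fall back to S.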























#### Modified, Exclusive, Shared, Invalid

- Also known as the Illinois protocol
  - Employed by real processors
  - A cache may have an exclusive copy of the data
  - The exclusive copy may be copied between caches
- Pros
  - No invalidation traffic on write-hits in the E state
  - Lower overheads in sequential applications
- Cons
  - More complex protocol
  - Longer memory latency due to the protocol
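The benefit of the E state can be seen in the processor-side transitions. The sketch below assumes the usual textbook behavior: a read miss installs the block in E when no other cache holds it and in S otherwise. The function name `mesi_on_processor` and the `shared_line` flag (standing in for the bus signal that reports other sharers) are hypothetical.

```python
def mesi_on_processor(state, action, shared_line=False):
    """Return (next_state, bus_message) for a local action under MESI.

    shared_line is only consulted on a read miss; it models whether some
    other cache reported that it already holds the block.
    """
    if action == "load":
        if state == "I":
            return ("S" if shared_line else "E", "BusRd")
        return (state, None)          # read hit in S, E, or M
    if action == "store":
        if state == "I":
            return ("M", "BusRdX")    # write miss
        if state == "S":
            return ("M", "BusInv")    # other copies must be invalidated
        return ("M", None)            # E or M: silent upgrade or write hit
    if action == "evict":
        # Only a Modified block holds data that memory does not have.
        return ("I", "BusWB" if state == "M" else None)
    raise ValueError(f"unknown action: {action}")
```

A thread that reads and then writes private data goes I to E to M with a single bus transaction, whereas under MSI the same sequence would also need an invalidation message.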

#### Alternatives to Snoopy Protocols

- Problem: snooping-based protocols are not scalable
  - Shared bus bandwidth is limited
  - Every node broadcasts messages and monitors the bus
- Solution: limit the traffic using directory structures
  - Home directory keeps track of sharers of each block
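The idea of tracking sharers per block can be sketched with a small data structure. This is an illustrative model only: the class name `Directory`, its methods, and the use of integer node IDs are assumptions, and real directory protocols also track ownership state and handle races between requests.

```python
class Directory:
    """Home directory mapping each block address to the nodes caching it."""

    def __init__(self):
        self.sharers = {}  # block address -> set of node ids holding a copy

    def read(self, block, node):
        """Record that `node` now holds a read-only copy of `block`."""
        self.sharers.setdefault(block, set()).add(node)

    def write(self, block, node):
        """Make `node` the only holder of `block`.

        Returns the sorted list of nodes that must receive an invalidation.
        Only these nodes are contacted, instead of broadcasting on a bus.
        """
        holders = self.sharers.get(block, set())
        to_invalidate = sorted(holders - {node})
        self.sharers[block] = {node}
        return to_invalidate
```

With nodes 0 and 2 sharing a block, a write by node 1 sends exactly two point-to-point invalidations; a second write by node 1 sends none.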



#### Memory Consistency Model

- Memory operations are reordered to improve performance
- A memory consistency model for a shared address space specifies constraints on the order in which memory operations must appear to be performed with respect to one another.

Initially A = flag = 0

| P1        | P2                |
|-----------|-------------------|
| A=1;      | while (flag==0);  |
| flag = 1; | printf ("%d", A); |

What is the expected output of this application?

#### Memory Consistency

- Recall: load-store queue architecture
  - Check availability of operands
  - Compute the effective address
  - Send the request to memory if no memory hazards

```
Initially A = flag = 0

P1                  P2

(2) A = 1;          while (flag == 0);
(1) flag = 1;       printf("%d", A);
```
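The effect of reordering P1's two stores in the flag example can be checked by brute force. The sketch below is a deliberately simplified model, not a description of any real load-store queue: P1's stores become visible one at a time in some order, and P2 may leave its spin loop and read `A` at any point once `flag` is 1. The function name `possible_outputs` is illustrative.

```python
from itertools import permutations


def possible_outputs(allow_store_reordering):
    """Return the set of values P2 can print for A."""
    program = [("A", 1), ("flag", 1)]          # P1's stores in program order
    if allow_store_reordering:
        orders = list(permutations(program))   # either store may go first
    else:
        orders = [tuple(program)]
    outputs = set()
    for order in orders:
        memory = {"A": 0, "flag": 0}
        for variable, value in order:
            memory[variable] = value
            if memory["flag"] == 1:
                # P2 may exit the loop here and read the current A.
                outputs.add(memory["A"])
    return outputs
```

With program order preserved the only possible output is 1; once the stores may be reordered, 0 becomes possible as well.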

#### Dekker's Algorithm Example

- Critical region with mutually exclusive access
  - At any time, at most one process is allowed to be in the region
- Reordering in load-store queue may result in failure

```
P1                          P2

(2) LOCK_A: A = 1;          LOCK_B: B = 1;
(1) if (B != 0) {           if (A != 0) {
        A = 0;                  B = 0;
        goto LOCK_A;            goto LOCK_B;
    }                       }

    // ...                  // ...
    A = 0;                  B = 0;
```
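Why reordering breaks the lock can be checked by enumerating executions of the entry code. In this minimal sketch each process performs one store (setting its own variable to 1) and one load (reading the other variable); both processes enter the critical region if both loads return 0. The function name `can_both_enter` and the event encoding are assumptions made for illustration.

```python
from itertools import permutations


def can_both_enter(allow_load_bypass):
    """True if some execution lets P1 and P2 both read 0 and enter."""
    events = [
        ("P1", "W", "A"),  # P1: A = 1
        ("P1", "R", "B"),  # P1: if (B != 0)
        ("P2", "W", "B"),  # P2: B = 1
        ("P2", "R", "A"),  # P2: if (A != 0)
    ]
    for order in permutations(events):
        if not allow_load_bypass:
            # Program order: each process's store precedes its load.
            if order.index(events[0]) > order.index(events[1]):
                continue
            if order.index(events[2]) > order.index(events[3]):
                continue
        memory = {"A": 0, "B": 0}
        reads = {}
        for process, kind, variable in order:
            if kind == "W":
                memory[variable] = 1
            else:
                reads[process] = memory[variable]
        if reads["P1"] == 0 and reads["P2"] == 0:
            return True
    return False
```

When program order is kept, no interleaving lets both loads return 0, so mutual exclusion holds. Allowing a load to bypass the earlier store admits an execution in which both loads run first, and both processes enter the region.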

#### Sequential Consistency

1. Within a program, program order is preserved
2. Each instruction executes atomically
3. Instructions from different threads can be interleaved arbitrarily

| P1 | P2 |
|----|----|
| a  | A  |
| b  | B  |
| c  | C  |
| d  | D  |
| e  | E  |

1. abAcBCDdeE
2. aAbBcCdDeE
3. ABCDEabcde



**Bad Performance!** 
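Whether a given global order is allowed under sequential consistency can be checked mechanically: it is allowed exactly when each thread's operations appear in their program order. The sketch below takes each thread's program to be the five operations that appear in the listed interleavings and assumes operation names are unique across threads; `is_sc_interleaving` is an illustrative name.

```python
from math import comb


def is_sc_interleaving(trace, programs):
    """True if `trace` merges `programs` while keeping each program's order."""
    positions = [0] * len(programs)      # next unexecuted op of each thread
    for op in trace:
        for i, program in enumerate(programs):
            if positions[i] < len(program) and program[positions[i]] == op:
                positions[i] += 1
                break
        else:
            return False                 # op is not next in any thread
    # Every thread must have executed all of its operations.
    return all(p == len(prog) for p, prog in zip(positions, programs))


programs = ["abcde", "ABCDE"]
# Two threads of five operations each: choose which 5 of the 10 slots go to P1.
total_interleavings = comb(10, 5)
```

All three interleavings listed above pass this check, while an order such as `baAcBCDdeE` fails because `b` runs before `a`.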

#### Relaxed Consistency Model

- Real processors do not implement sequential consistency
  - Not all instructions need to be executed in program order
  - e.g., a read can bypass earlier writes
- A fence instruction can be used to enforce ordering among memory instructions
  - e.g., Dekker's algorithm with fence



```
P1                      P2

LOCK_A: A = 1;          LOCK_B: B = 1;
fence;                  fence;
if (B != 0) {           if (A != 0) {
    A = 0;                  B = 0;
    goto LOCK_A;            goto LOCK_B;
}                       }
```
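The role of the fence can be illustrated with a toy store-buffer machine. This is a sketch under stated assumptions, not a model of any particular processor: each thread has a FIFO store buffer, a load reads the thread's own buffer first and memory otherwise (so it can bypass earlier stores that have not drained), and a fence can only execute once the thread's buffer is empty. The function name `both_loads_zero` and the instruction tuples are hypothetical; every variable starts at 0.

```python
def both_loads_zero(programs):
    """True if some execution makes every load in `programs` return 0.

    Instructions: ("store", var, val), ("load", var), ("fence",).
    """
    def explore(pcs, buffers, memory, loads):
        if all(pc == len(p) for pc, p in zip(pcs, programs)):
            return all(value == 0 for value in loads.values())
        for t, program in enumerate(programs):
            # Option 1: drain thread t's oldest buffered store to memory.
            if buffers[t]:
                var, val = buffers[t][0]
                drained = buffers[:t] + (buffers[t][1:],) + buffers[t + 1:]
                if explore(pcs, drained, {**memory, var: val}, loads):
                    return True
            # Option 2: execute thread t's next instruction.
            if pcs[t] == len(program):
                continue
            ins = program[pcs[t]]
            advanced = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if ins[0] == "store":
                entry = ((ins[1], ins[2]),)
                queued = buffers[:t] + (buffers[t] + entry,) + buffers[t + 1:]
                if explore(advanced, queued, memory, loads):
                    return True
            elif ins[0] == "load":
                pending = [v for (x, v) in buffers[t] if x == ins[1]]
                value = pending[-1] if pending else memory.get(ins[1], 0)
                if explore(advanced, buffers, memory, {**loads, t: value}):
                    return True
            elif ins[0] == "fence" and not buffers[t]:
                # The fence waits until all earlier stores are visible.
                if explore(advanced, buffers, memory, loads):
                    return True
        return False

    n = len(programs)
    return explore((0,) * n, ((),) * n, {}, {})


# Entry code of the two processes, without and with a fence.
no_fence = [[("store", "A", 1), ("load", "B")],
            [("store", "B", 1), ("load", "A")]]
with_fence = [[("store", "A", 1), ("fence",), ("load", "B")],
              [("store", "B", 1), ("fence",), ("load", "A")]]
```

In this model `both_loads_zero(no_fence)` finds an execution where both processes read 0 and enter the critical region, while `both_loads_zero(with_fence)` finds none, because each store must reach memory before the following load executes.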



#### Fence Example

```
P1                        P2

Region of code            Region of code
with no races             with no races
Fence                     Fence
Acquire_lock              Acquire_lock
Fence                     Fence
Racy code                 Racy code
Fence                     Fence
Release_lock              Release_lock
Fence                     Fence
```