# COL380

# Introduction to Parallel & Distributed Programming

# Sequential Consistency

"A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program." [Lamport, 1979]

Weaker than Linearizability
Does not depend on global time
(Still not efficient to guarantee.)

Read X (5)

Read X (3)

time@A Thread A: 
$$X = 5$$

not related to
time@B Thread B:

- No global notion of time
  - Only consistent Order



- No global notion of time
  - → Only consistent Order

Thread A: EnQ(5)

Thread B:

EnQ(3)

- DeQ is 3
- DeQ is 5

- No global notion of time
  - → Only consistent Order

Thread A:  $A_1 \longrightarrow A_2$ 

Thread B:  $B_1 \longrightarrow B_2$ 

Thread C:  $C_1 \longrightarrow C_2 \longrightarrow C_3$ 

Sequential History

time@B

$$A_1 \rightarrow A_2$$
  $C_1 \rightarrow C_2 \rightarrow C_3$   $B_1 \rightarrow B_2$ 

$$A_1$$
  $C_1$   $A_2$   $B_1 \rightarrow B_2$   $C_2 \rightarrow C_3$ 

DeQ is 5

Thread A: EnQ(5) not related to

Thread B:

- No global notion of time
  - → Only consistent Order



DeQ is 3

[EnQ(3)]

Sequential History

time@B

$$A_1 \rightarrow A_2 \quad C_1 \rightarrow C_2 \rightarrow C_3 \quad B_1 \rightarrow B_2$$

$$A_1 \quad C_1 \quad A_2 \rightarrow B_1 \rightarrow B_2 \rightarrow C_2 \rightarrow C_3$$

DeQ is 5

Thread A: EnQ(5)
not related to
Thread B:

time@B

- No global notion of time
  - Only consistent Order



DeQ is 3

EnQ(3)



DeQ is 5

Thread A: EnQ(5)

not related to

Thread B:

time@B

- No global notion of time
  - → Only consistent Order



DeQ is 3

EnQ(3)



DeQ is 5

Thread A: EnQ(5)
not related to
Thread B:

time@B

- No global notion of time
  - → Only consistent Order



DeQ is 3

EnQ(3)





# Applying Sequential Consistency

- Threads always see values written by some thread
  - → No garbage (update is atomic)
- The value seen is constrained by thread-order
  - → for every thread

```
initially: ready=0, data=0

thread P

thread C

data = 10; while(!ready);
ready = 1; pvt = data;
```

# Applying Sequential Consistency

- Threads always see values written by some thread
  - → No garbage (update is atomic)
- The value seen is constrained by thread-order
  - → for every thread

Show: If C sees the updated ready (=1), C must also see the updated data (=10)

# Applying Sequential Consistency

- Threads always see values written by some thread
  - → No garbage (update is atomic)
- The value seen is constrained by thread-order
  - → for every thread





| If C sees ready = | then C sees data = |  |
|-------------------|--------------------|--|
| 0                 | 0 or 10            |  |
| 1                 | 10                 |  |

Show: If C sees the updated ready (=1), C must also see the updated data (=10)

Initially: 
$$X = 0$$
;  $Y = 0$ ;

#### Thread A

$$I = Y$$

$$r = X$$

$$[3]$$

$$\downarrow$$

$$[0]$$

#### Thread B

Initially: 
$$X = 0$$
;  $Y = 0$ ;



Initially: 
$$X = 0$$
;  $Y = 0$ ;



Initially: 
$$X = 0$$
;  $Y = 0$ ;



### Memory inconsistency

- Consistency is about global state (not per-variable)
  - → Program must not assume higher consistency than available
- Only local memory dependencies visible to compiler/architecture
  - Can allocate a register or stack entry for some shared variable
  - → Batching of memory transactions
  - → Network can also reorder two memory messages

### Memory inconsistency

- Consistency is about global state (not per-variable)
  - → Program must not assume higher consistency than available
- Only local memory dependencies visible to compiler/architecture
  - → Can allocate a register or stack er  $\chi=1$ ;  $\gamma=1$ ;  $\chi=2$ ;
  - → Batching of memory transactions
  - → Network can also reorder two mer
- → Second write to X may happen before Y's
- → 1st write may never happen

#### Memory inconsistency

- Consistency is about global state (not per-variable)
  - → Program must not assume higher consistency than available
- Only local memory dependencies visible to compiler/architecture
  - Can allocate a register or stack entry for some shared variable
  - → Batching of memory transactions
  - → Network can also reorder two memory messages
- ★ Solutions: Handle inconsistency, force synchronization

# Example: Not SC



# SC Inefficiency

#### Hard to implement efficiently

- → Need to enforce serialization of operations
- → No re-ordering of instructions allowed

| thread A | thread B | thread C                 | thread D             |
|----------|----------|--------------------------|----------------------|
| x = a    | x = b    | y1 = x (b)<br>y2 = x (a) | z1=x (a)<br>z2=x (b) |

#### Some solutions:

- → Allow out-of-order execution, Detect and recover from SC violation
- Only enforce ordering when required
  - programmer enforced

# Relaxing SC

initially: ready=0, data=0

thread P

thread C

data1 = 1

data2 = 1

Say OK

wait for OK

sav1 = data1

sav2 = data2

# Weak Consistency

- Special synchronization accesses are sequentially consistent
- Regular accesses ordered only with respect to synchronization accesses
  - ▶ Before any regular read/write is allowed to be visible to any other thread, all previous synchronization accesses must become visible
  - Before a synchronization access is allowed to complete, all previous ordinary read/write accesses must be completed
- Suitable for many optimizations



### Weak Consistency

- Special synchronization accesses are sequentially consistent
- Regular accesses ordered only with respect to synchronization accesses
  - ▶ Before any regular read/write is allowed to be visible to any other thread, all previous synchronization accesses must become visible
  - Before a synchronization access is allowed to complete, all previous ordinary read/write accesses must be completed
- Suitable for many optimizations

 $\Rightarrow x \text{ before } y$ (wrt every thread)



```
      flagA = flagB = 0

      Thread A
      Thread B

      flagA = 1;
      flagB = 1;

      #pragma omp flush
      #pragma omp flush

      if (flagB == 0) {
      flush-B

      shared ++;
      shared++;

      }
```

```
flagA = flagB = 0
  Thread A
                                    Thread B
                                                                 flagA = 1
                                                                             flagB = 1
flagA = 1;
                                    flagB = 1;
                                                                 flush-A
                                                                              flush-B
#pragma omp flush
                                    #pragma omp flush
if (flagB == 0)
                                   if (flagA == 0)
                                                                              flush-A
                                                                  flush-B
  shared ++;
                                      shared++;
                                                                 flagA == 0? flagB == 0?
```

```
flagA = flagB = 0
                                                                     flagB = 1
  Thread A
                                     Thread B
                                                                    flagA = 1
                                                                                flagB = 1
flagA = 1;
                                     flagB = 1;
                                                                    flush-A
                                                                                 flush-B
                                     #pragma omp flush
#pragma omp flush
if (flagB == 0)
                                     if (flagA == 0)
                                                                                 flush-A
                                                                    flush-B
                                        shared++;
  shared ++;
                                                                    flagA == 0? flagB == 0? flagB == 0?
```

```
flagA = flagB = 0
                                                                flagB = 1
                                   Thread B
 Thread A
                                                               flagA = 1
                                                                          flagB = 1
flagA = 1;
                                   flagB = 1;
                                                               flush-A
                                                                          flush-B
                                                               flagB == 0?
                                  #pragma omp flush
#pragma omp flush
if (flagB == 0)
                                  if (flagA == 0)
                                                                           flush-A
                                                               flush-B
                                     shared++;
  shared ++;
                                                               flagA == 0? flagB == 0?
```

```
Thread A

flagA = 1;

#pragma omp flush

if (flagB == 0) {

    shared ++;
}
```

flagA = flagB = 0

```
Thread B
```

```
flagB = 1;
#pragma omp flush
if (flagA == 0) {
    shared++;
}
```

```
flagB = 1
    flagA = 1
                flagB = 1
    flush-A
                flush-B
   flagB == 0?
                flush-A
    flush-B
    flagA == 0? flagB == 0?
∀ admissible orders
flagA=1 \rightarrow flagA==0?
```

```
flagA = flagB = 0

Thread A

Thread B

flagA = 1;

#pragma omp flush

if (flagB == 0) {
    shared ++;
    }

flagA = 0;

#pragma omp flush

#pragma omp flush

flagB = 0;

#pragma omp flush

#pragma omp flush
```

```
flagB = 1
    flagA = 1
                flagB = 1
    flush-A
                flush-B
   flagB == 0?
                flush-A
    flush-B
    flagA == 0? flagB == 0?
∀ admissible orders
flagA=1 \rightarrow flagA==0?
```

```
flagA = flagB = 0

Thread A

Thread B

flagA = 1;

#pragma omp flush

if (flagB == 0) {

shared ++; ← mutual exclusion → shared++;
}

flagA = 0;

#pragma omp flush

flagB = 0;

#pragma omp flush
```

```
flagB = 1
flagA = 1
flush-A
flush-B
flush-B
flush-B
flagA == 0?
flagB == 0?
```

```
flagA = flagB = 0

Thread A

Thread B

flagA = 1;

#pragma omp flush (flagA, flagB)

if (flagB == 0) {

    shared ++;

}

Thread B

flagB = 1;

#pragma omp flush (flagA, flagB)

if (flagA == 0) {

    shared++;

}
```

Access other variables

```
flagA = flagB = 0
```

#### Thread A

```
flagA = 1;
#pragma omp flush (flagA, flagB)
if (flagB == 0) {
    shared ++;
}
```

Access other variables

#### Thread B

```
flagB = 1;
#pragma omp flush (flagA, flagB)
if (flagA == 0) {
    shared++;
}
```

```
flagA = flagB = 0
```

#### Thread A

```
flagA = 1;
#pragma omp flush (flagA, flagB)
if (flagB == 0) {
    shared ++;
}
```

Access other variables

#### Thread B

Implicit Flush for all

synchronization operations

#pragma omp atomic write

var = expr();

```
flagA = flagB = 0
```

#### Thread A

```
flagA = 1;
#pragma omp flush (flagA, flagB)
if (flagB == 0) {
    shared ++;
}
```

Access other variables

#### Thread B

# Operations before Release flush must appear before (Completes) Operations after Acquire flush must appear after (Initiates)

# OpenMP Flush one-sided

would work)

```
flagA = flagB = 0
                                                                             or
                                     Thread B
 Thread A
                                                                   flagA = 1
                                                                               flagB = 1
                                    flagB = 1;
flagA = 1;
                                                                   flush-A
                                    #pragma omp flush acquire
#pragma omp flush release
                                                                                flush-B
                                    if (flagA == 0)
if (flagB == 0)
                                                                                flush-A
                                                                   flush-B
                                       shared++;
  shared ++;
                                                                   flagA == 0? flagB == 0?
                                                                (only this order
```

```
int data, flag = 0;
                                         Thread C
    Thread P
                                    // Busy-wait until flag is signalled
// Produce data
data = 42;
                                    while (flag != 1) {
// Set flag to signal Thread 1
flag = 1;
                                    // Consume data
                                    printf(data=%d\n", data);
```

```
int data, flag = 0;
                                            Thread C
     Thread P
                                      // Busy-wait until flag is signalled
 // Produce data
data = 42;
                                      while (flag != 1) {
// Set flag to signal Thread 1 flag = 1;
                                        Consume data
                                      printf(data=%d\n", data);
```

```
int data, flag = 0;
                                         Thread C
    Thread P
                                   // Busy-wait until flag is signalled
 Produce data
                                   #pragma omp flush(flag)
data = 42;
                                   while (flag != 1) {
                                      #pragma omp flush(flag)
// Set flag to signal Thread 1
flag = 1; -
                                    // Consume data
// Flush
                                    printf(data=%d\n", data);
#pragma omp flush(flag)
```

```
int data, flag = 0;
                                        Thread C
    Thread P
                                   // Busy-wait until flag is signalled
 Produce data
                                   #pragma omp flush(flag)
                                                              consume
data = 42;
                                   while (flag != 1) {
                                      #pragma omp flush(flag)
// Set flag to signal Thread 1
flag = 1;
                                     Consume data
// Flush
#pragma omp nusn(flag)
                                   printf(data=%d\n", data);
```

```
int data, flag = 0;
                                          Thread C
      Thread P
                                     // Busy-wait until flag is signalled
    Produce data
                                     #pragma omp flush(flag)
  data = 42;
                                     while (flag != 1) { R
   // Flush
                                        #pragma omp flush(flag, data) F4
F1 #pragma omp flush(flag, data)
  // Set flag to signal Thread 1
wflag = 1;
                                     // Consume data
  // Flush
                                      printf(data=%d\n", data);
F2 #pragma omp flush(flag)
   F2 F4 guarantees that Th.1 sees 'flag 1.'
                                                (aside: 'flag 1' → F2 → F4)
```

```
int data, flag = 0;
                                            Thread C
       Thread P
                                       // Busy-wait until flag is signalled
    Produce data
                                       #pragma omp flush(flag)
   data = 42;
                                      while (flag!= 1) { R
   // Flush
                                          #pragma omp flush(flag, data) F4
F1 #pragma omp flush(flag, data)
   // Set flag to signal Thread 1
w flag = 1;
                                        Consume data
  // Flush
                                       printf(data=%d\n", data);
F2 #pragma omp flush(flag)
   F2 F4 guarantees that Th.1 sees 'flag 1.'
                                                  (aside: 'flag 1' \Rightarrow F2 \rightarrow F4)
```

F2 must eventually finish.

```
int data, flag = 0;
                                            Thread C
       Thread P
                                       // Busy-wait until flag is signalled
    Produce data
                                       #pragma omp flush(flag)
   data = \overline{42};
                                                                         F3
                                      while (flag != 1) { R
   // Flush
                                         #pragma omp flush(flag, data) F4
F1 #pragma omp flush(flag, data)
   Set flag to signal Thread 1
w flag = 1;
                                        Consume data
  // Flush
                                       printf(data=%d\n", data);
F2 #pragma omp flush(flag)
   F2 F4 guarantees that Th.1 sees 'flag 1.'
```

'flag 1' in Th.1  $\Rightarrow$  W has started and hence F1 has happened  $\Rightarrow$  F1 R

Subodh Kumar

```
int data, flag = 0;
                                          Thread C
       Thread P
                                     // Busy-wait until flag is signalled
    Produce data
                                     #pragma omp flush(flag)
                                                                       F3
  data = 42;
                                    while (flag != 1) { R
     ∃lush
                                        #pragma omp flush(flag, data) F4
  #pragma omp flush(flag, data)
   Set flag to signal Thread 1
w flag = 1;
                                       Consume data
  // Flush
                                      printf(data=%d\n", data);
F2 #pragma omp flush(flag)
   F2 F4 guarantees that Th.1 sees 'flag 1.'
```

'flag 1' in Th.1  $\Rightarrow$  W has started and hence F1 has happened  $\Rightarrow$  F1 R

Subodh Kumar

```
int data, flag = 0;
                                          Thread C
       Thread P
                                     // Busy-wait until flag is signalled
    Produce data
                                     #pragma omp flush(flag)
                                                                       F3
  data = 42;
                                    while (flag != 1) { R
     Flush
                                        #pragma omp flush(flag, data) F4
F1 #pragma omp flush(flag, data)
   Set flag to signal Thread.
                                     #pragma omp flush(flag, data)
w flag = 1;
                                       Consume data
  // Flush
                                      printf(data=%d\n", data);
F2 #pragma omp flush(flag)
   F2 F4 guarantees that Th.1 sees 'flag 1.'
```

'flag 1' in Th.1  $\Rightarrow$  W has started and hence F1 has happened  $\Rightarrow$  F1 R F5

```
int data, flag = 0;
                                          Thread C
       Thread P
                                     // Busy-wait until flag is signalled
    Produce data
                                     #pragma omp flush(flag)
                                                                       F3
  data = 42;
                                     while (flag!= 1) { R
     Flush
                                        #pragma omp flush(flag, data) F4
F1 #pragma omp flush(flag, data)
   Set flag to signal Thread.
                                     #pragma omp flush(flag, data)
w flag = 1;
                                       Consume data
  // Flush
                                      printf(data=%d\n", data);
F2 #pragma omp flush(flag)
   F2 F4 guarantees that Th.1 sees 'flag 1.'
```

'flag 1' in Th.1  $\Rightarrow$  W has started and hence F1 has happened  $\Rightarrow$  F1 R

F5