### Semantics and invariance proof method for weakly consistent parallelism

Jade Alglave (MSR-Cambridge, UCL, UK)
Patrick Cousot (NYU, Emer. ENS, PSL)

IFIP WG 2.3

Château de Villebrumier, France October 4<sup>th</sup>, 2016

### Weakly consistent parallel programs

© I. Alglave & P. Cousot

#### Weakly consistent parallel programs

```
var x_1,...,x_m; // shared variables P_0; // prelude initializing x_1,...,x_m [P_1 \parallel P_2 \parallel ... \parallel P_n]
```

- P<sub>1</sub>, P<sub>2</sub>, ..., P<sub>n</sub> are the processes modifying the shared variables and their local registers R, ...
- The execution of a write x := E to a shared variable and the read R := x of a shared variable is not instantaneous (as in sequential consistency)

#### Example (1b, load buffer)

#### Algorithm A:

#### • Specification S<sub>inv</sub>:

at 3 
$$\wedge$$
 at 13  $\Rightarrow \neg(r1=1 \land r2=1)$ 

#### Example (Peterson)

#### Algorithm A:

```
0:{ w F1 false; w F2 false; w T 0; }

P0:

1:w[] F1 true

2:w[] T 2

3:do {i}

4: r[] R1 F2

5: r[] R2 T

6:while R1 \wedge R2 \neq 1

7:skip (* CS1 *)

8:w[] F1 false

9:

P1:

10:w[] F2 true;

11:w[] T 1;

12:do {j}

13: r[] R3 F1;

14: r[] R4 T;

15:while R3 \wedge R4 \neq 2;

16:skip (* CS2 *)

17:w[] F2 false;

18:
```

• Specification S<sub>inv</sub>:

```
1: {true} 10: {true} ... 7: {¬at{16}} 16: {¬at{7}} ... 9: {true} 18: {true}
```

G. L. Peterson. Myths about the mutual exclusion problem. *Inf. Process. Lett.*, 12(3):115–116, 1981. doi: 10.1016/0020-0190(81)90106-X. URL http://dx.doi.org/10.1016/0020-0190(81)90106-X.

#### Weak memory/consistency models

Sequential consistency:



atomic instantaneous communications

Weak memory models:



communication network

(anticipations, delays, shuffles...)

#### Read/write matching

 In the worst case a read x can read from any past or future write x of any process (including for the reading process)



#### Example (lb, incorrect)

 $\bullet$  at 3  $\wedge$  at 13  $\wedge$  r1=1  $\wedge$  r2=1

This erroneous behavior can be observed on TSO machines

#### Example: Peterson (incorrect)

Can read the wrong flags

```
0:{ w F1 false; w F2 false; w T 0; }
P0:
                        10:w[] F2 true;
1:w | F1 true
2:w[]
                        11:w[]
3:do
     r[] R1 F2
     r[] R2 T
                              r[] R4 T;
                       15:while R3 \wedge R4 \neq 2;
6:while R1 \wedge R2 \neq 1
                        16:skip (* CS2 *)
7:skip (* CS1 *)
                        17:w[] F2 false;
8:w[] F1 false
9:
```

at 6  $\wedge$  at 16:  $\neg R1 \wedge R2 = 1 \wedge \neg R3 \wedge R4 = 2$  holds

⇒ both processes simultaneously enter their critical section

#### Example: Peterson (incorrect)

Can read the wrong turns

```
0:{ w F1 false; w F2 false; w T 0; }
P0:
                        10:w[] F2 true;
                        11:w[] T 1;
2:w[] T 2
3:do
                       12:do
     r[] R1 F2
                              r[] R3 F1;
5: r[] R2 T
                       14: r[] R4 T;
6: while R1 \wedge R2 \neq 1 | 15: while R3 \wedge R4 \neq 2;
                       16:skip (* CS2 *)
7:skip (* CS1 *)
                       17:w[] F2 false;
8:w[] F1 false
9:
```

at 6  $\wedge$  at 16:  $\neg R1 \wedge R2 = 1 \wedge \neg R3 \wedge R4 = 2$  holds

⇒ both processes simultaneously enter their critical section

# A hierarchy of semantics of weakly consistent parallelism

#### Hierarchy of semantics

Hierarchy of semantic domains:



Induces a hierarchy of semantics

#### Sets of interleaved traces

- Traces: maximal finite or infinite sequence of states separated by events generated by computation and communication steps  $\longrightarrow$  global time
- States: shared memory assigning values to global variables, store buffers, ... program point of each process, assignment to local registers
- Events e: P(e) process executed, A(e): labelled action executed, X(e): shared variable involved, V(e): value involved, ...
- No restriction on who can read which write on the same shared variable!

#### Example of interleaved trace for 1b

© I. Alglave & P. Cousot

#### Sets of truly parallel execution traces

- project traces per process → local time on computations
- get rid of shared memory states using a read-from relation rf → no time on communications

$$\langle r, w \rangle \in \mathsf{rf} \iff \tau = \tau_0 \langle \nu, \ldots \rangle \xrightarrow{r} \langle \nu', \ldots \rangle \tau_1 \land \nu'(\mathsf{X}(r)) = w$$

- keep local states on process control points and values of registers
- keep computation progress information using cuts of parallel traces  $\longrightarrow$  global time

#### Example of truly parallel execution for 1b

J. Alglave & P. Cousot, Semantics and invariance proof methods for weakly consistent parallelism, WG 2.3, Villebrunier, France, 3–7 October 2016



#### Sets of histories

- Get rid of cuts  $\rightarrow$  no global time
- A processor cannot know where the others parallel processors are in their computations

#### Example of history for 1b



#### Sets of candidate executions

- Keep the set of events
- Keep the read-from relation rf
- Represent process traces  $\tau_0 \prod_{i=1} \tau_i$ , rf by
  - the set of initial writes IW in  $\tau_0$
  - the program order po

$$\langle e, e' \rangle \in \mathsf{po} \iff \tau_i = \tau_i' \xrightarrow{e} \tau_i'' \xrightarrow{e'} \tau_i'''$$

- $\rightarrow$  relational on events
- Get rid of states
  - $\rightarrow$  no values

#### Example of candidate execution for 1b



#### Auxiliary relations

- oc: between events on the same shared variable
- ext: between events on different processes
- coherence order co: between a write and the later ones on the same shared variable
- from-read fr: between a read reading from a write and the later writes to the same shared variable

$$fr = rf^{-1}; co$$

#### Auxiliary relations



#### co in cat

"co.cat" let fold f =let rec fold\_rec (es,y) = match es with || {} -> y  $| | e ++ es -> fold_rec (es, f(e,y))$ end in fold rec let map  $f = \text{fun es} \rightarrow \text{fold (fun (e,y)} \rightarrow \text{f e} + + \text{y) (es, {}})$ let rec cross S = match S with || {} -> { 0 } || S1 ++ S -> let yss = cross S infold  $(\text{fun } (\text{el,r}) \rightarrow \text{map } (\text{fun } \text{t} \rightarrow \text{el} \mid \text{t}) \text{ yss } \mid \text{r})$  $(S1,\{\})$  end let  $co0 = loc & (IW * (W\setminus IW))$ let makeCo(s) = linearisations(s,co0)let same-loc-writes = loc & (W\*W) let allCoL = map makeCo (classes (same-loc-writes)) let allCo = cross allCoL with co from allCo

Example of specification of weakly consistent parallelism in the semantic hierarchy: sequential consistency

© I. Alglave & P. Cousot

#### Sequential consistency

 Interleaved semantics: a read can only read from the last past write

#### Example: sequential consistency for 1b

 Parallel executions with cuts: a read can read only the <u>last</u> write <u>before</u> its cut

• lb:





#### Example: sequential consistency for 1b

- Parallel histories: abstract to candidate execution and check it is allowed
- Candidate executions: irreflexive po; rf; po; rf



## Analytic semantics of weakly consistent parallelism

#### Analytic semantics

- Anarchic semantics: all possible executions with cuts/ histories with no restriction on rf (any read can read any value from any write to the same shared variable)
- Communication consistency: requirements on rf specified on an abstraction to a candidate execution
- Analytic semantics: all executions with cuts/histories which rf satisfies the consistency requirements

#### Example of anarchic semantics: LB



J. Alglave and L. Maranget. herd7. virginia.cs.ucl.ac.uk/herd, 31 Aug. 2015.

### Example of communication specification in the cat language for LB

#### Rejects only the anarchic execution:



J. Alglave. A Shared Memory Poetics. PhD thesis, Université Paris 7, 2010.

J. Alglave, P. Cousot, and L. Maranget. Syntax and semantics of the cat language. *HSA Foundation*, Version 1.1:38 p., 16 Oct 2015b. URL http://www.hsafoundation.com/?ddownload=5382.

#### Examples of architecture specification

SC (sequential consistency):

```
let co = (IW*W) & loc
let fr = (rf^-1;co)
acyclic po | rf | co | fr as sc
```

TSO:

```
let co = (IW*W) & loc
let fr = (rf^-1;co)
let po-loc = po & loc
acyclic po-loc | rf | co | fr as scpv
let ppo = po \ (W*R)
let rfe = rf & ext
acyclic ppo | rfe | co | fr as tso
```

For lb:

```
acyclic (po | rf) as lb
```

 $sc \Rightarrow lb$ ,  $tso \not\Rightarrow lb$ 

#### Fence specification:

• In Lisa:

Implementation with dependencies and fences in TSO:

#### cat

- Handles one history at a time
- For each execution relies on:
  - the set E (\_) of events of the execution (partitionned into initial writes IW, writes W, read R, fences F, ...
  - the program order po of events per process
  - the read-from relation rf per variable
- Has predefined relations loc, ext,...
- Can define new relations e.g.  $*,;, |, \&, \backslash, +, ^-1,...$
- Accepts/eliminates the execution by defining relations r and checking irreflexive r, acyclic r, empty r, not empty r

#### ARM in cat

```
let fr = rf^-1;co
acyclic po-loc | rf | co | fr as scpv
let deps = addr | data
let rdw = po-loc & (fre;rfe)
let detour = po-loc & (coe; rfe)
let ii0 = deps | rfi | rdw
let ic0 = 0
let ci0 = ctrlcfence(ISB) | detour
let cc0 = deps | ctrl | (addr;po)
let rec ii = ii0 | ci | (ic;ci) | (ii;ii)
and ic = ic0 \mid ii \mid cc \mid (ic;cc) \mid (ii;ic)
and ci = ci0 \mid (ci;ii) \mid (cc;ci)
and cc = cc0 \mid ci \mid (ci;ic) \mid (cc;cc)
let ppo = ii & R*R | ic & R*W
let dmb = fencerel(DMB)
let dsb = fencerel(DSB)
let fences = dmb \mid dsb
let A-cumul = rfe; fences
let hb = ppo | fences | rfe
acyclic hb as no-thin-air
let prop-base = (fences | A-cumul);hb*
let prop = (prop-base & W*W)| (com*; prop-base*; fences; hb*)
irreflexive fre;prop;hb* as observation
acyclic co | prop as propagation
```

# Invariance proof method for weakly consistent parallelism

#### Difficulties

- There is no longer a notion of instantaneous value of the shared variables:
  - ⇒ pythia variables (denoting values of variables when read)
  - ⇒ communications rf (keeping track of which writes events the pythia variables take there values from)
  - ⇒ stamps (keeping track of events to distinguish different instruction executions)

#### Difficulties

- We have to make hypotheses on how communications do happen:
  - $\Rightarrow$  communication specification  $S_{com}$
- We have to show that the communication specification is correctly implemented on an architecture:
  - $\Rightarrow$  a way to mix invariant  $S_{com}$  and cat specifications

## Methodology



## Invariant



#### Pythia variables

Unique name given to communicated values during execution (using stamps)

```
0:{ w F1 false; w F2 false; w T 0; }

P0:

1:w[] F1 true

2:w[] T 2

3:repeat {i}

4: r[] R1 F2 {\rightarrow F2\(^i_4\)}

5: r[] R2 T {\rightarrow T\(^i_5\)}

6:until \negR1 \lor R2 = 1 {iend}

7:skip (* CS1 *)

8:w[] F1 false

9: | P1:

10:w[] F2 true;

11:w[] T 1;

12:repeat {j}

13: r[] R3 F1; {\rightarrow F1\(^i_{13}\)}

14: r[] R4 T; {\rightarrow T\(^i_{14}\)}

15:until \negR3 \lor R4 = 2; {jend}

16:skip (* CS2 *)

17:w[] F2 false;

18:
```

Stamp: label, counter

Pythia variables

#### Invariance abstraction



#### Invariance abstraction

$$\begin{split} &\alpha_{a}(\{\pi^{i} \mid i \in \Delta\}) \triangleq \prod_{p \in \mathbb{P} \widehat{i}} \bigcup_{\ell \in \mathbb{L}(p)} \bigcup_{i \in \Delta} \{\langle \kappa_{0,k_{0}}^{i}, \theta_{0,k_{0}}^{i}, \rho_{0,k_{0}}^{i}, \nu_{0,k_{0}}^{i}, \dots, \\ &\nu_{p-1,k_{p-1}}^{i}, \theta_{p,k_{p}}^{i}, \rho_{p,k_{p}}^{i}, \nu_{p,k_{p}}^{i}, \kappa_{p+1,k_{p+1}}^{i}, \dots, \kappa_{n-1,k_{n-1}}^{i}, \theta_{n-1,k_{n-1}}^{i}, \\ &\rho_{n-1,k_{n-1}}^{i}, \nu_{n-1,k_{n-1}}^{i}, \operatorname{rf}^{i} \rangle \mid \forall q \in [0, n[ \setminus \{p\} : \underline{\tau_{q}^{i}}_{k_{q}} = \\ & \mathfrak{s} \langle \kappa_{q,k_{q}}^{i}, \ \theta_{q,k_{q}}^{i}, \ \rho_{q,k_{q}}^{i}, \ \nu_{q,k_{q}}^{i} \rangle \wedge \underline{\tau_{p}^{i}}_{k_{p}} = \mathfrak{s} \langle \ell, \ \theta_{p,k_{p}}^{i}, \ \rho_{p,k_{p}}^{i}, \ \nu_{p,k_{p}}^{i} \rangle \}. \end{split}$$

#### Invariant

- An invariant  $S_{inv}(p)$  at point p of process  $P_i$  is a statement relating
  - the program points p<sub>1</sub>,...,p<sub>i-1</sub>, p<sub>i+1</sub>,..., p<sub>m</sub> of the other processes
  - the pythia variables (forbidden to mention of shared variables)
  - the local registers of all processes
  - the communications (rf)

which always holds when at the cut where execution reaches point p of process Pi and the other processes are at  $p_1,...,p_{i-1},p_{i+1},...,p_m$ 

```
0: { w F1 false; w F2 false; w T 0; }
                                                                                                                  \{F1=false \land F2=false \land T=0\}
                                                                                                                1: \{R1=0 \land R2=0\}
                                                                                                                                                              w[] F1 true
                                                                                                                2: \{R1=0 \land R2=0\}
3: \{R1=0 \land R2=0\}
do \{i\}

4: \{(i=0 \land R1=0 \land R2=0) \lor (i>0 \land R1=F2_i^{i-1} \land R2=T_5^{i-1})\}
r[] R1 F2 \{ \leadsto F2_i^{i} \}

5: \{R1=F2_i^{i} \land (i=0 \land R2=0) \lor (i>0 \land R2=T_5^{i-1})\}
r[] R2 T \{ \leadsto T_5^{i} \}
r[] R4 T; \{ \leadsto T_{14}^{j} \}
6: \{R1=F2_4^{i} \land R2=T_5^{i} \}
r[] R4 T; \{ \leadsto T_{14}^{j} \}
r[] R4 T
                                                                                                                                                               w[] T 2
```

```
10: \{R3=0 \land R4=0\}
13: \{(j=0 \land R3=0 \land R4=0) \lor (j>0 \land R3=F1_{13}^{j-1} \land R4=T_{14}^{j-1})\}
```

(these invariants are for the anarchic semantics, so all communications are possible, no constraints on rf)

## Invariance proof



#### Verification conditions

- Sequential proof
- Absence of interference proof
- Communication proof

#### **Examples:**

```
• { P(R, ..., rf) \land \langle w(x, v), r(\theta, x) \rangle \in rf } read x R \{ \rightarrow x_{\theta} \} communication { P[R \leftarrow x_{\theta}, x_{\theta} \leftarrow v, ..., rf]}
```

- { P } fence { P } (fences are markers in the execution)
- { P } write R x { P } (a write has no local effect)

#### Communication proof

- The communications rf must be checked to be well-formed (none allowed by  $H_{cm}$  should miss, see later)
- If  $\langle w(P, p, \theta, x, v), r(P', p', \theta', x, x_{\theta'}) \rangle \in rf$  then:
  - The read instruction of at point p' process P' must read from an initial or a reachable write
  - A read event (for a given stamp  $\theta$ ) must read from a unique write event with the same variable x
  - The value assigned to the read pythia variable  $x_{\theta}$  must be that of v the matching write

# Communication Specification $S_{com}$



#### Communication specification

- The algorithm A is often incorrect for the anarchic semantics
- The allowable communications are specified by a communication specification  $S_{com}$  (i.e. an invariant constraining the allowed communications rf)
- This communication specification can often be calculated from the anarchic invariant and the inductive invariant  $S_{ind}$

at 
$$7 \wedge$$
 at 16  

$$\Rightarrow (\neg F2_4^{i_{\text{end}}} \vee T_5^{i_{\text{end}}} = 1) \wedge (\neg F1_{13}^{j_{\text{end}}} \vee T_{14}^{j_{\text{end}}} = 2)\}$$
(i.e. the invariants at lines 7: and 16: hold)  

$$\Rightarrow \neg S_{com} \quad \{\text{since by taking } i = i_{\text{end}} \text{ and } j = j_{\text{end}}, \text{ we have}$$

$$(F2_4^i = \text{false} \vee T_5^i = 1) \wedge (F1_{13}^j = \text{false} \vee T_{14}^j = 2)\}$$

so that Peterson has been proved correct under the hypothesis that the communication specification  $S_{com}$  holds:

$$S_{com} \triangleq \neg[\exists i, j. [\mathfrak{rf}\langle F2_4^i, \langle 0:, false \rangle\rangle \vee \mathfrak{rf}\langle F2_4^i, \langle 17:, false \rangle\rangle \\ \vee \mathfrak{rf}\langle T_5^i, \langle 11:, 1 \rangle\rangle] \wedge [\mathfrak{rf}\langle F1_{13}^j, \langle 0:, false \rangle\rangle \\ \vee \mathfrak{rf}\langle F1_{13}^j, \langle 8:, false \rangle\rangle \vee \mathfrak{rf}\langle T_{14}^j, \langle 2:, 2 \rangle\rangle]]$$

(preventing the incorrect case)

### Soundness and completeness

- The invariance proof method is derived from the truly parallel semantics with cuts by calculational design
   ⇒ soundness and (relative) completeness
  - algorithm Ainvariant invariance proof algorithm Aspecification  $S_{inv}$  $S_{com} \Rightarrow S_{inv}$ proved correct  $H_{com}$  and  $S_{inv}$ communication  $H_{com} \Rightarrow S_{inv}$ incompleteness specification  $S_{com}$ inclusion proof  $H_{com} \Rightarrow S_{com}$ concistency algorithm Ahypothesis  $H_{com}$ proved correct w.r.t. M and  $S_{inv}$ consistency proof consistency  $M \mapsto S_{inv}$  $M \Rightarrow H_{com}$ model M

• A consistency specification  $H_{com}$  may be less expressive than  $S_{com} \Rightarrow incompleteness$  (\*)

<sup>(\*)</sup> e.g. hardware cannot restrict a read to input from writes writing odd numbers.

# Consistency hypothesis and inclusion proof



### Consistency hypothesis

- The communication specification  $S_{com}$  is useful to reason on invariance, but not on machine architecture
- We express S<sub>com</sub> as a consistency hypothesis H<sub>com</sub> expressed in the cat language
- $H_{com}$  is derived from  $S_{com}$  by calculations design while doing the inclusion proof

#### Inclusion proof

- Inclusion proof:  $\neg S_{com} \Rightarrow \neg H_{com}$
- Calculational design of H<sub>com</sub>:
  - Calculate all possible execution scenarios violating
     S<sub>com</sub>

$$S_{com} \triangleq \neg [\exists i, j. [\mathfrak{rf}\langle \mathtt{F2}_{4}^{i}, \langle \mathtt{0:, false} \rangle\rangle \vee \mathfrak{rf}\langle \mathtt{F2}_{4}^{i}, \langle \mathtt{17:, false} \rangle\rangle \\ \vee \mathfrak{rf}\langle \mathtt{T}_{5}^{i}, \langle \mathtt{11:, 1} \rangle\rangle] \wedge [\mathfrak{rf}\langle \mathtt{F1}_{13}^{j}, \langle \mathtt{0:, false} \rangle\rangle \\ \vee \mathfrak{rf}\langle \mathtt{F1}_{13}^{j}, \langle \mathtt{8:, false} \rangle\rangle \vee \mathfrak{rf}\langle \mathtt{T}_{14}^{j}, \langle \mathtt{2:, 2} \rangle\rangle]]$$

- Prevent each of them by a cat specification
- H<sub>com</sub> is their conjunction

```
0:{ w F1 false; w F2 false; w T 0; }
    0:{ w F1 false; w F2 false; w T 0; }
                                                                  P0:
                              P1: ) co
    P0:
                                                                  1:w[] F1 true
                                                                                               10:w[] F2 true;
    1:w[] F1 true,
                              10:w[] F2 true;
                                                                  2:w[] T 2
                                                                                              11:w[] T 1;
    2:w[] T 2
                               11:w[] T 1;
                                                                                               12:do
                                                                  3:do
                              12:do
    3:do
                                                                                               13: r[] R3 F1
                                                                        r[] R1 F2
         r[] R1 F2
                               4:
                                                                  5:
                                                                        r[] R2 T
                                                                                               14:
                                                                                                      r[] R4 T;
                              14:
          r[] R2 T
                                     r[] R4 T;
                                                                  6:while R1 \wedge R2 \neq 1
                                                                                              15:while R3 \wedge R4 \neq 2;
    6:while R1 \wedge R2 \neq 1
                              15:while R3 \wedge R4 \neq 2;
                                                                        f[p0] (* CS1 *)
                                                                                              16: f[p1] (* CS2 *)
          f[p0] (* CS1 *)
                                     f[p1] (* CS2 *)
                              16:
                                                                  8:w[] F1 false
                                                                                              17:w[] F2 false;
    8:w[] F1 false
                              17:w[] F2 false;
                                                                              14 - \text{fr} \rightarrow 11 - \text{po} \rightarrow 14
4 - \text{fr} \rightarrow 10 - \text{po} \rightarrow 13 - \text{fr} \rightarrow 1 - \text{po} \rightarrow 4
                    case 1: 0:F2,0:F1
                                                                         case 2a: 0:F2,1:F1 (2 - co \rightarrow 11)
                                                           0:{ w F1 false; w F2 false; w T 0; }
                                                           P0:
   0:{ w F1 false; w F2 false; w T 0; }
                                                                  (co
                                                           1:w[] F1 true fr
                                                                                          10:w[] F2 true;
   P0:
                                                                                          11:W[] T 1; — po
   1:w[] F1 true
                              10:w[] F2 true;
                                                           2:w[] T 2
                              11:w[] T 1; 	
   2:w[] T 2 _
                                                                                          12:do
                                                           3:do
                              12:do
   3:do
                                                                                          13: r[] R3 F1;
                                                                  r[] R1 F2
                                                           4:
                        CO
         r[] R1 F2
                                     r[] R3 F1;
                              13:
                                                           5:
                                                                  r[] R2 T
                                                                                                  r[] R4 T;
                                                                                          14:
         r[] R2 T
                              14:
                                     r[] R4 T;
                                                           6:while R1 \wedge R2 \neq 1
                                                                                         15:while R3 \wedge R4 \neq 2;
   6:while R1 \wedge R2 \neq 1 | 15:while R3 \wedge R4 \neq 2;
                                                                                          16: f[p1] (* CS2 *)
                                                                  f[p0] (* CS1 *)
         f[p0] (* CS1 *)
                                     f[p1] (* CS2 *)
                              16:
                                                           8:w[] F1 false
                                                                                         17:w[] F2 false;
   8:w[] F1 false
                              17:w[] F2 false;
   11 - co \rightarrow 2 - po \rightarrow 4 - fr \rightarrow 10 - po \rightarrow 11
                                                               2 \longrightarrow 11 \longrightarrow po \longrightarrow 13 \longrightarrow fr \longrightarrow 1 \longrightarrow po \longrightarrow 2
              case 2b: 0:F2,1:F1 (11 - co \rightarrow 2)
                                                                          case 3a: 10:F2,0:F1 (2 - co \rightarrow 11)
```

```
0:{ w F1 false; w F2 false; w T 0; }
        0:{ w F1 false; w F2 false; w T 0; }
                                                              P0:
                                                                                           P1:
        P0:
                                                              1:w[] F1 true
                                                                                           10:w[] F2 true;
        1:w[] F1 true
                                   10:w[] F2 true;
                                                              2:w[] T 2
                                                                                           11:w[] T 1;_
                                   11:w[] T 1;
       ·2:w[] T 2 🚤
                                                                                           12:do
                                                              3:do
                                   12:do
        3:do
                                                              4: r[] R1 F2
                                                                                                   r[] R3 F1;
                                   13:
                                        r[] R3 F1;
              r[] R1 F2
                                                                    r[] R2 T
                                                                                           14: Yr[] R4 T; ✓
                                   14: r[] R4 T;
              r[] R2 T
                                                             6:while R1 \wedge R2 \neq 1 |
                                                                                           15:while R3 \wedge R4 \neq 2;
        6: while R1 \wedge R2 \neq 1 | 15: while R3 \wedge R4 \neq 2;
                                                                                           16:
                                                                    f[p0] (* CS1 *) ||
                                                                                                   f[p1] (* CS2 *)
              f[p0] (* CS1 *) || 16: f[p1] (* CS2 *) 7:
        8:w[] F1 false
                                                              8:w[] F1 false
                                                                                           17:w[] F2 false;
                                 | 17:w[] F2 false;
                   5 - fr \rightarrow 2 - po \rightarrow 5
                                                                      14 ---fr \rightarrow 11 ---po \rightarrow 14
           case 3b: 10:F2,0:F1 (11 - co \rightarrow 2)
                                                                    case 4a: 10:F2,1:F1 (2 - co \rightarrow 11)
  0:{ w F1 false; w F2 false; w T 0; }
                                                             0:{ w F1 false; w F2 false; w T 0; }
  P0:
                                                            P0:
   1:w[] F1 true
                              10:w[] F2 true;
                                                             1:w[] F1 true
                                                                                        10:w[] F2 true;
                              11:w[] T 1;
PQ 2:w[] T 2 _
                                                             2:w[] T 2
                                                                                        11:w[] T 1;
                               12:do
  3:do/
                                                             3:do
                                                                                        12:do
        r[] R1 F2
                              13:
                                     r[] R3 F1;
                                                             4:
                                                                  r[] R1 F2
                                                                                        13: r[] R3 F1; po
        \r[] R2 T
                              14:
                                     r[] R4 T;
                                                                  r[] R2 T
                                                                                        14:
                                                                                               r[] R4 T;
  6: while R1 \wedge R2 \neq 1 | 15: while R3 \wedge R4 \neq 2;
                                                             6:while R1 \wedge R2 \neq 1 15:while R3 \wedge R4 \neq 2;
        f[p0] (* CS1 *)
                              16:
                                      f[p1] (* CS2 *)
                                                                  f[p0] (* C$1 *) || 16  f[p1] (* CS2 *)
                              17:w[] F2 false;
  8:w[] F1 false
                                                            8:w[] F1 false
                                                                                      || 17:w[] F2 false; <
               5 - \text{fr} \rightarrow 2 - \text{po} \rightarrow 5
                                                           4 \longrightarrow po \longrightarrow 8 \longrightarrow rf \longrightarrow 13 \longrightarrow po \longrightarrow 17 \longrightarrow rf \longrightarrow 4
        case 4b: 10:F2,1:F1 (11 - co \rightarrow 2)
                                                                        case 5: 17:F2,8:F1
```

```
0:{ w F1 false; w F2 false; w T 0; }
                                                                                0:{ w F1 false; w F2 false; w T 0; }
P0:
                                                                                P0:
1:w[] F1 true
                                     10:w[] F2 true;
                                                                                1:w[] F1 true
                                                                                                                    10:w[] F2 true;
2:w[] T 2
                                     11:w[] T 1;
                                                                                                                  | 11:w[] T 1;
                                                                                2:w[] T 2
3:do
                                     12:do
                                                                                3:do
                                    13: r[] R3 F1; po 4: r[] R1 F2 13: r[] R3 F1; 14: r[] R4 T; 5: r[] R2 T 14: r[] R4 T; 15:while R3 \wedge R4 \neq 2; 6:while R1 \wedge R2 \neq 1 15:while R3 \wedge R4 \neq 2;
        r[] R2 T
6:while R1 \wedge R2 \neq 1
                                     16:__f[p1] (* CS2 *)
                                                                                        f[p0] (* CS1 *) | 16; f[p1] (* CS2 *)
                                     17:w[] F2 false;
                                                                                                                 17:w[] F2 false;
8:w[] F1 false
                                                                                8:w[] F1 false
7 \longrightarrow po \longrightarrow 8 \longrightarrow rf \longrightarrow 13 \longrightarrow po \longrightarrow 17 \longrightarrow cut \longrightarrow 7 16 \longrightarrow po \longrightarrow 17 \longrightarrow rf \longrightarrow 4 \longrightarrow po \longrightarrow 7 \longrightarrow cut \longrightarrow 16
                         case 6: 8:F1
                                                                                                   case 7: 17:F2
```

- the cut relation can be expressed in cat using tags on fence markers f[p0] and f[p1]
- H<sub>com</sub> is irreflexive fr;po;fr;po irreflexive fr;po irreflexive co;po;fr;po irreflexive po;rf;po;rf irreflexive po;rf;po;cut

# Consistency model and proof



#### Example: Peterson in SC

H<sub>com</sub> is

irreflexive fr;po;fr;po irreflexive fr;po irreflexive co;po;fr;po irreflexive po;rf;po;rf irreflexive po;rf;po;cut

Sequential consistency in cat:

```
let fr = (rf^-1; co)
acyclic po | rf | co | fr as sc
```

Forbid all first 4 cases

#### Example: Peterson in SC

```
0:{ w F1 false; w F2 false; w T 0; }
                                                                         0:{ w F1 false; w F2 false; w T 0; }
P0:
                                                                         P0:
                                  10:w[] F2 true;
1:w[] F1 true
                                                                         1:w[] F1 true
                                                                                                          10:w[] F2 true;
2:w[] T 2
                                  11:w[] T 1;
                                                                         2:w[] T 2
                                                                                                          11:w[] T 1;
3:do
                                  12:do
                                                                                                                  r[] R3 F1;
                                                                                r[] R2 T
                                                                                                                  r[] R4 T;
6:while R1 \wedge R2 \#
                                  15:while R3 \wedge R4 \neq 2;
                                                                         6: while R1 \wedge R2 \neq 1 15: while R3 \wedge R4 \neq 2;
                                 16:__f[p1] (* CS2 *)
                                                                                f[p0] (* CS1 *) | 16; f[p1] (* CS2 *)
                                 17:w[] F2 false;
                                                                         8:w[] F1 false
8:w[] F1 false
                                                                                                        17:w[] F2 false;
7 \longrightarrow po \longrightarrow 8 \longrightarrow rf \longrightarrow 13 \longrightarrow po \longrightarrow 17 \longrightarrow cut \longrightarrow 7
                                                                    16 \longrightarrow po \longrightarrow 17 \longrightarrow rf \longrightarrow 4 \longrightarrow po \longrightarrow 7 \longrightarrow cut \longrightarrow 16
                      case 6: 8:F1
                                                                                          case 7: 17:F2
```

 The last case follows from the truly parallel execution trace semantics with cuts for sequential consistency



#### Example: Peterson in TSO

H<sub>com</sub> is <u>not</u> forbidden by TSO:

```
let fr = (rf^-1;co)
let po-loc = po & loc
acyclic po-loc | rf | co | fr as scpv
let ppo = po \ (W*R)
let rfe = rf & ext
acyclic ppo | rfe | co | fr as tso
```

For example the case 1,

```
\langle w_1, r_4 \rangle \in \text{fr ; po ; fr ; po}
```

is not forbidden by TSO since  $\langle w,r \rangle$  pairs on different variables are excluded from ppo.

# Implementation with (weak) cat fences

#### Implementation with fences

```
0:\{ F1 = 0; F2 = 0; T = 0; \}
   w[] F1 1
2: w[] T 2
                          11: w[] T 1
3:
                          12: do
   do
      f[fhw]
                               f[fhw]
                    | 13: r[] r3 F1
  r[] r1 F2
4:
5: r[] r2 T
                | 14: r[] r4 T
6: while r1 \wedge r2 \neq 1 | 15: while r3 \wedge r4 \neq 1
                        | 16: (* CS2 *)
7: (* CS1 *)
      f[fhw]
                          f[fhw]
8:
   w[] F1 0
                          L17: w[] F2 0
       let fhw = (po & (_ * F)); po
       let fre = (rf^-1;co) & ext
       irreflexive fhw;fre; fhw;fre
```

- Invariance proof unchanged (fence = skip)
- Proved to imply the previous fenceless cat specification
- so  $S_{com}$  unchanged

## consistency proof



- The proof is valid for the virtual machine defined by the cat specification Peterson
- Porting the algorithm to a different machine M' just need refencing (and redoing the proof  $M' \Rightarrow H_{cm}$ )
- On machine architecture stronger fences have to be used:
  - SC: fhw = no fence
  - TSO: fhw = mfence
  - ARM: fhw = dbm | dsb

## Conclusion

### Algorithm design methodology

- I. Design the algorithm A and its specification S in the sequential consistency model of parallelism
- 2. Consider the anarchic semantics of algorithm A
- 3. Add communication specifications  $S_{com}$  to restrict anarchic communications and ensure the correctness of A with respect to specification S
- 4. Do the invariance proof under WCM with S<sub>com</sub>
- 5. Infer H<sub>cm</sub> in cat from S<sub>com</sub>
- 6. Prove that the machine memory model M in cat implies H<sub>cm</sub>

#### Conclusion

- Modern machines have complex memory models
  - ⇒ portability has a price (refencing)
  - ⇒ debugging is very hard/quasi-impossible
  - ⇒ proofs are much harder than with sequential consistency (but still feasible?, mechanically?)
  - ⇒ static analysis parameterized by a WCM will be a challenge

## The End, Thank You