# **Verifying Optimizers for Concurrent Programs on Promising semantics**

## **ANONYMOUS**

#### **ACM Reference Format:**

Anonymous. 2018. Verifying Optimizers for Concurrent Programs on Promising semantics. *Proc. ACM Meas. Anal. Comput. Syst.* 37, 4, Article 111 (August 2018), 102 pages. https://doi.org/10.1145/1122445.1122456

#### 1 INTRODUCTION

Code optimizers are important components of compilers. A correct code optimizer must generate a target program that preserves the semantics of the source program. Proving the correctness of a code optimizer is usually more difficult than proving the correctness of a translation pass, which translates a program from one language to another, since optimizations may modify the memory accesses of the source program. In this document, we discuss the correctness proof of optimizers for concurrent programs under *promising semantics* [11], as shown in Fig. 1. There is existing work on proving the correctness of code optimization algorithms in compilers for sequential programs [6, 17, 19], but there is little discussion of how to prove the correctness of optimizers for concurrent programs under weak memory models.

Fig. 1. What do we focus on in this work

• Jiang et al. [7] develop CASCompCert for the correct compilation of data-race-free concurrent programs. However, the concurrent programs they focus on are defined under the SC memory model [10]. The behaviors of some atomic memory accesses defined in C11 [1] cannot be described under the SC memory model. For example, we cannot define the behaviors of atomic release writes and atomic acquire reads under the SC memory model. The following program behavior, which is well-defined under C11, cannot be produced under the SC memory model.

$$\begin{array}{l||l} x_{rel} := 1; & y_{rel} := 1; \\ r_1 := y_{acq}; \ //0 & r_2 := x_{acq}; \ //0 \end{array}$$


Moreover, the simulation in CASCompCert preserves the data race freedom of source programs. Thus, it cannot be used to prove some optimizers, such as *loop invariant code motion* in LLVM, which may introduce read-write races during code optimization.

• Ševčík et al. [15] develop CompCertTSO. However, TSO is still a strong memory model, and source programs defined under the TSO memory model cannot be compiled to efficient ARM or Power programs. Moreover, CompCertTSO relies on a strong simulation relation, which requires that the source and the target always generate the same memory accesses. This restriction is too strong: many common optimizations, such as eliminating redundant reads/writes and instruction reordering, break it. For example, the following constant propagation optimization, which eliminates a redundant memory access, cannot be proved with the simulation relation in CompCertTSO.

$$\begin{array}{lcl} x_{na} := 2; & & x_{na} := 2; \\ r := x_{na}; & \sim & r := 2; \end{array}$$

The constant propagation in CompCertTSO only optimizes the operations on registers.

- Most weak memory models for platform-independent concurrent programming languages are axiomatic models [1, 3, 9]. They express the concurrency semantics in terms of global properties of complete executions, which makes them difficult to use in compiler correctness proofs if we want to achieve modular reasoning.
- Promising semantics [5, 8, 11] is an operational model. The work on promising semantics defines a simulation relation to validate many code transformations on adjacent instructions. This simulation is simple and may be applied to prove the correctness of some optimization algorithms. However, the relation between the target and source memory in this simulation is fixed. It is often too strong, which makes it difficult to apply to the correctness proofs of some code optimization algorithms. For example, the simulation relation requires that each message in the target memory has a corresponding message with the same timestamp in the source memory. Maintaining this restriction is not a trivial task when proving the correctness of some optimization algorithms, as the following example of dead code elimination shows.

$$\begin{array}{lll} \text{while}(r_1 < r_2) \; \{ & & \text{while}(r_1 < r_2) \; \{ \\ & \text{$\mathsf{x}_{\mathsf{na}} := 1$;} & & \text{$\mathsf{skip}$;} \\ & r_1 := r_1 + 1; & \sim & r_1 := r_1 + 1; \\ \} & & & \} \\ & \text{$\mathsf{x}_{\mathsf{na}} := 2$;} & & \text{$\mathsf{x}_{\mathsf{na}} := 2$;} \end{array}$$

To maintain the restriction that the timestamp of the message generated by the instruction " $x_{na} := 2$ " in the target program equals the timestamp of the message generated by " $x_{na} := 2$ " in the source program, we cannot establish the simulation relation between the target and source programs in the following form. The source program generates some messages for " $x_{na} := 1$ " before executing " $x_{na} := 2$ ", which causes the timestamp of the message generated by " $x_{na} := 2$ " in the target program to be greater than that of the message generated by " $x_{na} := 2$ " in the source program.



- Vafeiadis et al. [18] discuss the validity of many standard compiler optimizations under the C/C++11 memory model. Chakraborty and Vafeiadis [3] validate compiler optimizations under the LLVM memory model. Ševčík [4] assumes that the transformed program runs sequentially consistently and presents the correctness of compiler optimizations under the data race freedom assumption. However, these works focus on whether standard compiler optimizations, such as eliminating redundant reads/writes and instruction reordering, are valid under specific memory models. They do not discuss how to use their conclusions to prove concrete code optimization algorithms correct. Their results are mainly used to validate, for a given source program, whether the optimizer performs a correct optimization on it, e.g., [2, 12].
- There are many works [9, 11, 13, 14] that discuss and prove the correctness of compilation from concurrent programming languages with weak memory consistency semantics, such as promising semantics and C11 memory model, to mainstream multi-core architectures, such as x86TSO, ARM and POWER, by standard compilation schemes. However, the standard compilation schemes just map the high-level primitives to instructions of major modern architectures directly and do not include any code optimizations.

In this work, we consider how to prove the correctness of optimizers for concurrent programs under promising semantics, since it is an operational memory model whose definition does not rely on complete executions of programs, and it has been proved to validate many code transformations on adjacent instructions as well as standard compilation schemes to x86-TSO, ARM and Power. We focus on proving the correctness of the optimization passes for two reasons.

- (1) A compiler usually has optimization passes and translation passes. An optimization pass can modify the memory accesses of the source program. A translation pass, in contrast, transforms a program in one language into a program in another language and usually does not modify the memory accesses of the source program. Thus, from the perspective of memory accesses, a translation pass, which performs the identity optimization on the memory accesses of the source program, can be regarded as a special case of an optimization pass.
- (2) Correct translation from programs under promising semantics to major modern architectures has been discussed in previous work, e.g., [14].

In this document, we make the following contributions.

(1) We find that promising semantics can be converted to a non-preemptive semantics, which does not permit thread switching or promise steps of the current thread after the execution of non-atomic steps, as shown below. Here, "let  $\pi$  in  $f_1 \parallel f_2$ " means two threads, one starting from the entry  $f_1$  in the code  $\pi$  and the other starting from the entry  $f_2$  in the code  $\pi$ , executing under promising semantics, and "let  $\pi$  in  $f_1 \mid f_2$ " means that the same two threads execute under our non-preemptive semantics.

$$\mathbf{let} \ \pi \ \mathbf{in} \ \mathsf{f}_1 \parallel \mathsf{f}_2 \ \approx \ \mathbf{let} \ \pi \ \mathbf{in} \ \mathsf{f}_1 \mid \mathsf{f}_2$$

Proving the correctness of code optimizers under the non-preemptive semantics is more convenient, since the non-preemptive semantics is simpler than promising semantics and there is no interaction with the environment after the execution of non-atomic accesses. In this work, we only consider code optimizations on non-atomic memory accesses. GCC does not optimize atomic memory accesses. As for LLVM, we only find that it optimizes atomic memory accesses in register promotion. However, register promotion only optimizes accesses to memory locations that are thread-local.

- (2) We define a thread-local simulation relation under the non-preemptive semantics to prove the correctness of the compiler optimizations that we care about. Our thread-local simulation has the following advantages:
  - It is defined under the non-preemptive semantics, which is simpler than promising semantics. This simplifies the definition of our thread-local simulation and the correctness proofs of some code optimizations, such as instruction reordering, since we only need to consider the interaction between the current thread and the environment at specific program points.
  - The invariant *I* on shared memory in our thread-local simulation is general and can be instantiated when proving specific code optimization algorithms. Thus, when proving optimizers that do not eliminate write operations, such as common subexpression elimination and instruction reordering, the invariant can be simple: the memories of the target and source programs can be strictly equal, which simplifies the proof.
  - Our thread-local simulation is *parallel compositional*, as shown below, if the source program is *write-write race free* on non-atomic memory accesses (written ww-RF( $S_1 \mid S_2$ )).

$$(I \vdash \pi_t(\mathsf{f}_1) \preccurlyeq \pi_s(\mathsf{f}_1) \land I \vdash \pi_t(\mathsf{f}_2) \preccurlyeq \pi_s(\mathsf{f}_2) \land \mathsf{ww-RF}(\mathbf{let} \ \pi_s \ \mathbf{in} \ \mathsf{f}_1 \mid \mathsf{f}_2)) \\ \Longrightarrow \ \mathbf{let} \ \pi_t \ \mathbf{in} \ \mathsf{f}_1 \mid \mathsf{f}_2 \preccurlyeq \mathbf{let} \ \pi_s \ \mathbf{in} \ \mathsf{f}_1 \mid \mathsf{f}_2$$

Our work does not consider the multi-language linking.

- We show that our thread-local simulation is able to prove the correctness of common code optimization algorithms for eliminating redundant reads/writes and instruction reordering.
- (3) We formulate the correctness of optimizers for concurrent programs defined under promising semantics. We require that the source programs do not contain *write-write races* on non-atomic memory accesses (defined by plain memory accesses in promising semantics), and formulate the correctness of optimizers in the following form.

$$\forall \pi_t, \pi_s. (\mathsf{Optimizer}(\pi_s) = \pi_t \land \mathsf{ww-RF}(\mathbf{let} \ \pi_s \ \mathbf{in} \ \mathsf{f}_1 \parallel \mathsf{f}_2))$$

$$\implies \mathbf{let} \ \pi_t \ \mathbf{in} \ \mathsf{f}_1 \parallel \mathsf{f}_2 \subseteq \mathbf{let} \ \pi_s \ \mathbf{in} \ \mathsf{f}_1 \parallel \mathsf{f}_2$$

We provide a verification framework in Fig. 2, introduced in more detail below, to show how to establish the correctness of optimizers. We permit source programs to have read-write data races because some optimizers, such as loop invariant code motion in LLVM, may introduce read-write data races on non-atomic memory accesses. Our thread-local simulation preserves write-write race freedom, which makes the correctness of optimizers *transitive*.

(4) We use our method to prove three common compiler optimization algorithms: (1) common subexpression elimination, which eliminates redundant reads in programs; (2) dead code elimination, which eliminates redundant writes in programs; (3) loop invariant code motion, which recognizes computations in loops that produce the same value on every iteration and moves them out of the loop. Loop invariant code motion, like the algorithm in LLVM, may introduce read-write data races on non-atomic memory accesses. This explains why our work allows the source program to have read-write data races. The algorithms of common subexpression elimination and dead code elimination are based on those in CompCert [6], and the algorithm of loop invariant code motion is based on the algorithm in Steven S. Muchnick's textbook on code optimizations [16]. However, we extend these algorithms, since we consider code optimizations across atomic memory accesses and need to handle the atomic memory access operations in the program. Below, we give some examples to show that code optimizations can cross atomic memory accesses. Some of these optimizations have already been observed in LLVM.

$$x_{na} := 4;$$
  $skip;$   $r_1 := x_{na};$   $y_{rel} := 1;$   $cset in LLVM$  \*)

 $x_{na} := 4;$   $skip;$   $y_{rel} := 1;$   $cset in LLVM$  \*)

 $x_{na} := 4;$   $x_{na} := 2;$   $x_{na}$ 



We also show that our thread-local simulation is able to support verifying the correctness of instruction reordering optimizations.

In this work, we focus on the correctness proofs of code optimizations that eliminate redundant reads/writes and reorder instructions. We do not discuss code optimizations on stack memory, such as function inlining, tail calls, register promotion, and register allocation and spilling. The reason is that, if we do not consider the escape of a thread's stack pointer to another thread, the stack memory is thread-local. It is plausible that we can convert the thread-local part of a thread's memory into a partial mapping from memory locations to variables and make it part of the thread's local state. Interference from the environment does not influence these code optimizations.

We establish a verification framework in Fig. 2. We define a non-preemptive semantics, which is equivalent to promising semantics as shown in step ⑤ in Fig. 2, and prove the correctness of optimizers under this non-preemptive semantics. A well-defined optimizer should ensure that each target thread is related to its corresponding source thread by the thread-local simulation, as shown in step ②. Our thread-local simulation is compositional under the write-write race freedom assumption and preserves write-write race freedom, as shown in step ③. We require that the thread-local simulation preserves write-write race freedom to make sure that the correctness of the optimization is transitive. The whole program simulation ensures the refinement relation between the source and target programs under the non-preemptive semantics, as shown in step ④. From the equivalence between promising semantics and the non-preemptive semantics, we establish the correctness of optimizers for concurrent programs under promising semantics.

In the following, we explain in more detail our approach to establishing the correctness of optimizers for concurrent programs defined under promising semantics.

- Why do we define the thread-local simulation on the non-preemptive semantics? As introduced above, the non-preemptive semantics does not permit thread switching or promise steps of the current thread after the execution of non-atomic steps. This simplifies the definition of the thread-local simulation in two ways.
- (1) The thread-local simulation should include an invariant on the shared memory of the target and source programs to describe the interaction with the environment. Defining the thread-local simulation on the non-preemptive semantics allows us to maintain the invariant only at specific program points.
- (2) Since we only consider code optimizations on non-atomic memory accesses, we can define the thread-local simulation on thread steps in a simple form: if the target thread takes a non-atomic step, the source thread takes some non-atomic steps to preserve the thread-local simulation; and if the target thread takes an atomic or promise step, the source thread takes an atomic step or some promise steps to preserve the thread-local simulation.

We use the following *instruction reordering* transformation to show that these two advantages can simplify the proofs of some code optimizations.

$$r := x_{na};$$
  $y_{na} := 2;$   $y_{na} := 2;$   $r := x_{na};$   $y_{na} := 3;$   $y_{n$ 

For the source program, we call the thread on the left side  $S_1$  and the thread on the right side  $S_2$ . For the target program, we call the thread on the left side  $T_1$  and the thread on the right side  $T_2$ . We try to establish the thread-local upward simulation between  $S_1$  and  $T_1$ . We define a simple invariant I, which just says that the memories of the target program and the source program are strictly equal. Suppose that  $T_1$  executes " $y_{na} := 2$ " first, as shown below.



If we want to take advantage of (2) above,  $S_1$  should not take any step. The reason is that  $S_1$  needs to execute " $r := x_{na}$ " first, but it cannot determine which message to read before  $T_1$  executes " $r := x_{na}$ ". Letting  $S_1$  take zero steps at this moment breaks the invariant I introduced previously, since the execution of " $y_{na} := 2$ " in  $T_1$  generates a new message in the target memory, so the memories of the target and source programs are no longer the same. Thus, if we defined the thread-local simulation on a preemptive semantics, which permits the current thread to interact with the environment at any program point, a problem would arise, since we would have to maintain the invariant on shared memory at every program point. However, our thread-local simulation, which is defined under the non-preemptive semantics, can be established between  $S_1$  and  $T_1$  as shown below.



The invariant I only needs to be reestablished after the execution of "print(r)", and the interactions of  $S_1$  and  $T_1$  with the environment do not need to be considered when proving the reordering of " $r := x_{na}$ ;  $y_{na} := 2$ ". They only need to be considered after the instruction reordering proof and after the execution of "print(r)".

• Why do we require the write-write race freedom assumption? The assumption of write-write race freedom on non-atomic memory accesses plays an important role in proving that our thread-local simulation composes to a whole program simulation (shown as ③ in Fig. 2). The reason is that a whole program step (called a machine step in promising semantics) includes two components: (1) the current thread takes some steps; (2) after these steps, the promises of the current thread are consistent (or *certified*). The whole program step has the following form, where  $\mathcal{TP}$  represents the thread pool.

$$\frac{(\mathcal{TP}(\mathsf{t}), M) \longrightarrow (\mathit{TS'}, M') \qquad \mathsf{consistent}(\mathit{TS'}, M')}{(\mathcal{TP}, \mathsf{t}, M) \Longrightarrow (\mathcal{TP}\{\mathsf{t} \leadsto \mathit{TS'}\}, \mathsf{t}, M')}$$

The promise consistency certification (shown as consistent(TS', M')) ensures that the current thread t is able to fulfill all its promises when executing in isolation. However, the certification does not start from the current memory M'. It starts from a capped memory [5] constructed from M', in which all timestamp intervals between existing messages are blocked by reservations. This raises a problem in our work. The whole program simulation in our work has the following form, which is defined on the whole program step.



If the target program takes a whole program step, the source program can take some whole program steps and preserve the whole program simulation. From the definition of the whole program step, in the proof of the compositionality of our thread-local simulation, we need to prove that, if the current target thread can reach a thread configuration whose promises can be certified, the current source thread can also take some steps to a thread configuration whose promises can be certified. However, the problem here is that our thread-local simulation is defined from the current memory, but the promise certification starts from the capped memory. Thus, our thread-local simulation can not ensure the property which says, if the promises of the current target thread can be certified, the promises of the current source thread can also be certified.

$$\neg ((consistent(T) \land I \vdash T \preccurlyeq S) \implies consistent(S))$$

Thus, we introduce the write-write race freedom assumption, under which our thread-local simulation does ensure this property. Write-write race freedom forbids the subsequent execution from inserting messages into the timestamp intervals between existing messages, which are the timestamps that the capped memory blocks.

$$(consistent(T) \land I \vdash T \preccurlyeq S \land \mathsf{ww-RF}(S)) \implies consistent(S)$$

It means that a thread from a write-write-race-free program can certify promises (for non-atomic writes) against the current memory instead of the capped memory.
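To make the gap-blocking construction concrete, the following is a minimal Python sketch under simplifying assumptions. The names `Message` and `block_gaps` are hypothetical, timestamps are plain integers, and only the part of the capped memory described above (reservations between existing messages) is modeled; views and any further details of the construction in [5] are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    var: str                   # memory location
    frm: int                   # "from" timestamp (exclusive bound)
    to: int                    # "to" timestamp (inclusive bound)
    val: Optional[int] = None  # None marks a reservation (assumption of this sketch)


def block_gaps(memory):
    """Return memory plus a reservation for every unused interval
    between two adjacent messages of the same location."""
    capped = set(memory)
    for x in {m.var for m in memory}:
        msgs = sorted((m for m in memory if m.var == x), key=lambda m: m.to)
        for left, right in zip(msgs, msgs[1:]):
            if left.to < right.frm:  # the interval (left.to, right.frm] is unused
                capped.add(Message(x, left.to, right.frm))
    return capped


memory = {Message("x", 0, 1, val=1), Message("x", 3, 4, val=2)}
for m in sorted(block_gaps(memory), key=lambda m: m.to):
    print(m)
```

In this sketch a thread certifying against `block_gaps(memory)` can no longer place a new message for `x` between the two existing ones, which is the situation that write-write race freedom rules out directly.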

Intuitively, a *write-write race* means that two threads both (non-atomically) write to the same location, and neither write happens before the other. So, write-write race freedom forbids a thread t from writing to a location when the memory contains a write to the same location made by another thread t' and unobserved by t. This gives the same technical effect as the capped memory: t cannot write a message m when the memory already contains another message m' at the same location with a higher timestamp written by t'. In Sec. 8.3, we show in detail that our thread-local simulation preserves promise certification under the write-write race freedom assumption, and how a thread from a write-write-race-free program can certify promises (for non-atomic writes) against the current memory instead of the capped memory.
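The intuition above can be sketched as a simple check. This is only an illustration under stated assumptions: the `writer` field and the function `has_unobserved_foreign_write` are hypothetical (messages in promising semantics do not record their writer), and a thread's view is reduced to a map from locations to the largest timestamp it has observed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    var: str     # memory location
    to: int      # timestamp of the message
    writer: str  # id of the writing thread (hypothetical bookkeeping)


def has_unobserved_foreign_write(tid, x, view, memory):
    """True if memory holds a write to x by another thread whose
    timestamp is above what thread tid has observed for x."""
    observed = view.get(x, 0)
    return any(m.var == x and m.writer != tid and m.to > observed
               for m in memory)


memory = [Write("x", 2, "t2")]
# t1 has not observed t2's write to x, so a non-atomic write by t1 would race.
print(has_unobserved_foreign_write("t1", "x", {"x": 0}, memory))
# Once t1 has observed timestamp 2 on x, the write is ordered after it.
print(has_unobserved_foreign_write("t1", "x", {"x": 2}, memory))
```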

• Why do we permit read-write data races? We permit read-write races because some optimizations, such as loop invariant code motion in LLVM, may introduce read-write races in the target program. The following is an example of loop invariant code motion as performed in LLVM.

$$\begin{array}{c} x_{na} := 20; \\ y_{rel} := 1; \\ z_{na} := 5; \end{array} \begin{array}{c} r := y_{acq}; \\ if(r == 1) \; \{ \\ r_1 := x_{na}; \\ while(r_1 < 10) \; \{ \\ r_2 := z_{na}; \\ r_1 := r_1 + 1; \\ \} \\ \} \end{array} \begin{array}{c} x_{na} := 20; \\ y_{rel} := 1; \\ z_{na} := 5; \\ \end{array} \begin{array}{c} r := y_{acq}; \\ if(r == 1) \; \{ \\ r_1 := x_{na}; \\ r' := z_{na}; \\ while(r_1 < 10) \; \{ \\ r_2 := r'; \\ r_1 := r_1 + 1; \\ \} \\ \} \end{array}$$

In the source program, the thread on the right side never executes " $r_2 := z_{na}$ ", since it never enters the loop. However, after the loop invariant code motion optimization, there is a read-write data race on the variable z between the two threads.

The method in CASCompCert does not support proving the correctness of loop invariant code motion in this form, since it requires that the data race freedom of the source program is preserved during compilation.
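The effect of the hoisted read can be checked with a small Python sketch that runs the right-hand thread of the example sequentially and records which shared locations it reads. The helper names `run_source` and `run_target` and the memory snapshot are assumptions made for illustration; the snapshot models the moment after the left thread has executed its release write to y but before it writes z.

```python
def run_source(mem):
    """Right-hand thread before loop invariant code motion."""
    reads = []

    def load(x):
        reads.append(x)
        return mem[x]

    r = load("y")
    if r == 1:
        r1 = load("x")
        while r1 < 10:
            r2 = load("z")   # only reached if the loop is entered
            r1 = r1 + 1
    return reads


def run_target(mem):
    """Right-hand thread after loop invariant code motion."""
    reads = []

    def load(x):
        reads.append(x)
        return mem[x]

    r = load("y")
    if r == 1:
        r1 = load("x")
        r_hoisted = load("z")  # hoisted read, executed unconditionally
        while r1 < 10:
            r2 = r_hoisted
            r1 = r1 + 1
    return reads


snapshot = {"x": 20, "y": 1, "z": 0}
print(run_source(snapshot))
print(run_target(snapshot))
```

The source thread never touches z because x holds 20 and the loop is skipped, while the target thread reads z once, concurrently with the non-atomic write to z in the left thread.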

• How do we make our thread-local simulation preserve write-write race freedom? As shown in Fig. 2, our thread-local simulation preserves write-write race freedom. To this end, it must ensure that the memory locations written by an execution of the target program are the same as or fewer than those written by the corresponding execution of the source program, and we introduce a delay set in the thread-local simulation to enforce this restriction. The delay set records the memory locations that have been written by the target thread but not yet by the source thread. Each memory location in the delay set also has an index, which requires the source thread to write that memory location within a finite number of steps. Consider the following instruction reordering transformation.

$$\begin{array}{lcl} r := x_{na}; & & y_{na} := 2; \\ y_{na} := 2; & \sim & r := x_{na}; \end{array}$$

We establish the thread-local simulation between the target and source programs of the above transformation. Suppose that the delay set is  $\mathcal{D}$  before the instruction reordering proof. In the first step, the target thread executes " $y_{na} := 2$ " and the source thread does not execute. We record the variable y in the delay set together with an index i, which requires the source thread to write the variable y within i steps. In the next step, the source thread executes " $y_{na} := 2$ " and the variable y can be removed from the delay set.
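The bookkeeping performed by the delay set can be sketched in Python as follows. This is a minimal illustration, not the formal definition: the class `DelaySet` and its methods are hypothetical names, and the index is modeled as a plain integer budget that decreases on every simulation step in which the pending write stays unmatched.

```python
class DelaySet:
    """Locations written by the target thread but not yet by the source
    thread, each with an index bounding how long the source may wait."""

    def __init__(self):
        self.pending = {}  # location -> remaining budget (assumed integer index)

    def target_write(self, loc, index):
        # The target wrote loc while the source did not: record the obligation.
        self.pending[loc] = index

    def source_write(self, loc):
        # The source caught up on loc: discharge the obligation.
        self.pending.pop(loc, None)

    def tick(self):
        """Account for one step in which pending writes stay unmatched.
        Returns False if some obligation has run out of budget."""
        if any(budget == 0 for budget in self.pending.values()):
            return False
        for loc in self.pending:
            self.pending[loc] -= 1
        return True


# Reordering example: the target executes y := 2 first, the source waits.
d = DelaySet()
d.target_write("y", 1)
print(d.pending)
# In the next step the source executes y := 2 and y leaves the delay set.
d.source_write("y")
print(d.pending)
```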



#### 2 LANGUAGE

In this section, we define the language on which optimizations are performed, which we call concur-SimpRTL. We present its syntax and its transitions on thread-local states below.

##### 2.1 Syntax of concur-SimpRTL

$$\begin{array}{llcl}
(Fid) & \mathsf{f} & \in & \mathbb{N} \qquad (Lab) \quad l \in \mathbb{N} \qquad (Var) \quad \mathsf{x}, \mathsf{y}, \mathsf{z} \ ::= \ \ldots \\
(Val) & v & \in & \mathsf{Int}32 \\
(Expr) & e & ::= & r \mid v \mid e + e \mid e - e \mid e * e \\
(Instr) & c & ::= & r := e \mid r := \mathsf{x}_{o_r} \mid \mathsf{x}_{o_w} := e \mid \mathsf{skip} \mid \mathsf{print}(e) \\
 & & \mid & r := \mathsf{CAS}_{o_r,o_w}(\mathsf{x},e_r,e_w) \\
 & & \mid & \mathsf{fence-rel} \mid \mathsf{fence-acq} \mid \mathsf{fence-sc} \\
(BBlock) & B & ::= & c, B \mid \mathsf{jmp} \ l \mid \mathsf{call}(\mathsf{f},l_{ret}) \mid \mathsf{be} \ e, l_1, l_2 \mid \mathsf{return} \\
(Cdhp) & C & \in & Lab \rightarrow BBlock \qquad (FunDef) \quad Fd \ ::= \ (C,l) \\
(Code) & \pi & \in & Fid \rightarrow FunDef \\
(VarType) & \iota & \in & \mathcal{P}(Var) \\
(Prog) & \mathbb{P} & ::= & \mathbf{let} \ (\pi,\iota) \ \mathbf{in} \ \mathsf{f}_1 \parallel \cdots \parallel \mathsf{f}_n
\end{array}$$

Fig. 3. Syntax of concur-SimpRTL language

We define the syntax of the language in Fig. 3. The code  $\pi$  is defined as a partial mapping from function identifiers to function definitions. A function definition Fd is a tuple consisting of the code heap C and the entry label l of the function. The code heap C maps labels to basic blocks B, each of which is a sequence of instructions. We define a set  $\iota$  to record the variables on which atomic memory accesses can be performed. Obtaining this set is not difficult. For example, C programs use the keyword "\_Atomic" to mark variables that can be accessed atomically; Java programs have classes for atomic memory accesses, such as "AtomicInteger", or can use the keyword "volatile"; Rust also has a number of atomic types, such as "AtomicBool" and "AtomicU16". Technically, dividing the locations into atomic locations and non-atomic locations also plays an important role in proving that our thread-local simulation preserves promise certification under the write-write race freedom assumption. We explain this point in Sec. 8.3.
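As an illustration of how the grammar in Fig. 3 could be represented, the following Python sketch encodes expressions and three of the instruction forms as data classes and evaluates an expression over a register file. All class and function names are hypothetical, access modes are plain strings, and 32-bit overflow of Int32 values is ignored for brevity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str      # one of "+", "-", "*"
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Reg, Const, BinOp]


@dataclass(frozen=True)
class Assign:    # r := e
    dst: str
    src: Expr


@dataclass(frozen=True)
class Load:      # r := x_{o_r}
    dst: str
    var: str
    mode: str


@dataclass(frozen=True)
class Store:     # x_{o_w} := e
    var: str
    mode: str
    src: Expr


def eval_expr(e, regs):
    """Evaluate expression e over the register file regs (a dict)."""
    if isinstance(e, Reg):
        return regs.get(e.name, 0)  # unset registers read as 0
    if isinstance(e, Const):
        return e.value
    lhs, rhs = eval_expr(e.lhs, regs), eval_expr(e.rhs, regs)
    return {"+": lhs + rhs, "-": lhs - rhs, "*": lhs * rhs}[e.op]


# Instruction sequence: r1 := x_na; r1 := r1 + 1; y_rel := r1
block = (
    Load("r1", "x", "na"),
    Assign("r1", BinOp("+", Reg("r1"), Const(1))),
    Store("y", "rel", Reg("r1")),
)
print(eval_expr(block[1].src, {"r1": 4}))
```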

We instantiate the thread-local state in Fig. 4. The thread-local state  $\sigma$  is a tuple consisting of the register file R, the current basic block B, the current code heap C, the continuation K and the set of functions  $\pi$ . The continuation K records the levels of function calls. As defined in Fig. 4, each level of the continuation K is a tuple (R, B, C), which saves the register file R, the basic block B and the code heap C of the caller.

##### 2.2 Thread-local transition

We define the thread-local transition of concur-SimpRTL language in Fig. 5, which has the form of " $\sigma \xrightarrow{te} \sigma'$ ". Some auxiliary definitions used in defining the thread local transition are shown below. The register file in the initial state of the execution of a function has the following form.

$$R_{\perp} \ ::= \ \lambda r.\, 0$$

$$\begin{array}{llcl} (Cont) & K & ::= & \epsilon \mid (R, B, C) :: K \\ (RegFile) & R & \in & Reg \rightarrow Val \\ (ThrdLocSt) & \sigma & ::= & (R, B, C, K, \pi) \end{array}$$

Fig. 4. Thread local state of concur-SimpRTL language

$$\frac{B = (r := e) :: B' \quad R' = R\{r \leadsto \llbracket e \rrbracket_R\}}{(R, B, C, K, \pi) \xrightarrow{\tau} (R', B', C, K, \pi)} \qquad \frac{B = (r := x_{o_r}) :: B' \quad R' = R\{r \leadsto v\}}{(R, B, C, K, \pi) \xrightarrow{R(o_r, x, v)} (R', B', C, K, \pi)} \qquad \frac{B = (x_{o_w} := e) :: B' \quad \llbracket e \rrbracket_R = v}{(R, B, C, K, \pi) \xrightarrow{W(o_w, x, v)} (R, B', C, K, \pi)} \qquad \cdots$$

Fig. 5. Thread local transition of concur-SimpRTL language

We instantiate the initialization function.

$$\operatorname{Init}(\pi,\mathsf{f})\quad ::=\quad \left\{ \begin{array}{ll} (R_{\perp},B,C,\epsilon,\pi) & \quad \text{if } \pi(\mathsf{f})=(C,l) \text{ and } C(l)=B \\ \\ \text{undef} & \quad \text{otherwise} \end{array} \right.$$

Fig. 6. The memory defined as a message pool

#### 3 PROMISING SEMANTICS

In this section, we define the semantics of concur-SimpRTL language based on *promising semantics*, which is taken from Lee, et al. [11].

We first define the memory. In promising semantics, the memory is a message pool, defined formally in Fig. 6. A message is either a concrete message, of the form " $\langle x : v@(f,t],V\rangle$ ", or a reservation, of the form " $\langle x : (f,t]\rangle$ ". We use the following notations for the components of a message.

$$\begin{array}{llll} m.\mathsf{var} & ::= & x & \text{if } m \in \{\langle x : \_@(\_, \_], \_\rangle, \langle x : (\_, \_]\rangle\} \\ m.\mathsf{from} & ::= & f & \text{if } m \in \{\langle \_ : \_@(f, \_], \_\rangle, \langle \_ : (f, \_]\rangle\} \\ m.\mathsf{to} & ::= & t & \text{if } m \in \{\langle \_ : \_@(\_, t], \_\rangle, \langle \_ : (\_, t]\rangle\} \\ m.\mathsf{val} & ::= & v & \text{if } m = \langle \_ : v@(\_, \_], \_\rangle \\ m.\mathsf{view} & ::= & V & \text{if } m = \langle \_ : \_@(\_, \_], V\rangle \end{array}$$

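Two messages  $m_1$  and  $m_2$  are *disjoint*, denoted by  $m_1 \# m_2$ , if they have different locations or disjoint timestamp intervals. Since the intervals are half-open, two intervals that share only an endpoint are disjoint:

$$m_1 \# m_2 \ ::= \ m_1.\mathsf{var} \neq m_2.\mathsf{var} \ \vee \ m_1.\mathsf{to} \leq m_2.\mathsf{from} \ \vee \ m_2.\mathsf{to} \leq m_1.\mathsf{from}$$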

Two sets  $M_1$  and  $M_2$  of messages are disjoint, denoted by  $M_1 \# M_2$ :

$$M_1 \# M_2 ::= \forall m_1 \in M_1, m_2 \in M_2. m_1 \# m_2.$$
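The disjointness checks can be sketched in Python as follows. This is only an illustrative sketch: the `Message` class, its field names, and the use of integer timestamps are assumptions made here, and the message view is omitted. Intervals are read as half-open $(f, t]$, so two intervals that merely touch count as disjoint.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Message:
    """A message on location `var` occupying the interval (frm, to].

    `val is None` marks a reservation (hypothetical encoding); views are omitted.
    """
    var: str
    frm: int
    to: int
    val: Optional[int] = None


def disjoint(m1: Message, m2: Message) -> bool:
    # m1 # m2: different locations, or non-overlapping half-open intervals.
    return m1.var != m2.var or m1.to <= m2.frm or m2.to <= m1.frm


def disjoint_sets(ms1: Iterable[Message], ms2: Iterable[Message]) -> bool:
    # M1 # M2: every pair of messages across the two sets is disjoint.
    ms2 = list(ms2)
    return all(disjoint(a, b) for a in ms1 for b in ms2)
```

For instance, a message on `x` over $(0, 2]$ and one over $(1, 3]$ overlap, while $(0, 1]$ and $(1, 2]$ are disjoint under this reading.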

We write M(x) for the sub-memory of the messages whose location is x, and  $\widetilde{M}$  for the set of concrete messages in M.

$$\begin{array}{lll} M(\mathsf{x}) & ::= & \{m \in M \mid m.\mathsf{var} = \mathsf{x}\} \\ \widetilde{M} & ::= & \{m \in M \mid m = \langle\_: \_@(\_, \_], \_\rangle\} \end{array}$$

Given a timemap T and a memory M, we write  $T \in M$  if every timestamp recorded in T is the to-timestamp of a concrete message in M. The notation is lifted to views componentwise.

$$T \in M ::= \forall x \in Var. \exists m \in \widetilde{M}. T(x) = m.\mathsf{to}$$

$$V \in M ::= V.T_{\mathsf{na}} \in M \land V.T_{\mathsf{rlx}} \in M$$

$$(ThrdView) \quad \mathcal{V} \quad \in \quad \{(cur, acq, rel) \mid cur, acq \in View \\ \qquad \qquad \land rel \in Var \rightarrow View \\ \qquad \qquad \land (\forall \mathsf{x} \in Var. \ rel(\mathsf{x}) \leq cur \leq acq)\}$$

$$(ThrdState) \quad TS \quad \in \quad \{(\sigma, \mathcal{V}, P) \mid \forall \mathsf{m} \in P. \ \mathcal{V}. \mathsf{cur}. T_{\mathsf{rlx}}(\mathsf{m}. \mathsf{var}) < \mathsf{m}. \mathsf{to}\}$$

$$(Tid) \quad \mathsf{t} \quad \in \quad \mathbb{N}$$

$$(ThrdPool) \quad \mathcal{TP} \quad \in \quad Tid \rightarrow ThrdState$$

$$(World) \quad W \quad ::= \quad (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota}$$

$$(MemOrdR) \quad o_r \quad ::= \quad \mathsf{na} \mid \mathsf{rlx} \mid \mathsf{acq} \quad (MemOrdW) \quad o_w \quad ::= \quad \mathsf{na} \mid \mathsf{rlx} \mid \mathsf{rel}$$

$$(ThrdEvt) \quad te \quad ::= \quad \tau \mid \mathsf{R}(o_r, \mathsf{x}, v) \mid \mathsf{W}(o_w, \mathsf{x}, v) \mid \mathsf{U}(o_r, o_w, \mathsf{x}, v_r, v_w) \mid \mathsf{F}_{\mathsf{rel}} \mid \mathsf{F}_{\mathsf{acq}} \mid \mathsf{F}_{\mathsf{sc}} \mid \mathsf{out}(v) \mid \mathsf{prm} \mid \mathsf{ccl} \mid \mathsf{rsv}$$

$$(ProgEvt) \quad pe \quad ::= \quad \tau \mid \mathsf{out}(v) \mid \mathsf{sw}$$

Fig. 7. Program state and Events

Inserting a new message into memory is defined below.

$$M \overset{\triangle}{\longleftarrow} m \quad ::= \left\{ \begin{array}{ll} M \cup \{m\} & \text{ if } \{m\} \# M, \\ & (m = \langle \mathtt{x} : \_@(f,t], \_\rangle \Longrightarrow \\ & \neg (\exists m' \in M.\ m'. \mathtt{var} = \mathtt{x} \land m'. \mathsf{from} = t)) \\ \mathsf{undef} & \textit{otherwise} \end{array} \right.$$
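As a rough illustration of the insertion operation, the following Python sketch returns `None` for the undefined case. The `Msg` tuple, the encoding of a reservation as `val is None`, and the integer timestamps are assumptions of this sketch, and message views are left out. Disjointness is read over half-open intervals.

```python
from typing import FrozenSet, NamedTuple, Optional


class Msg(NamedTuple):
    var: str
    frm: int
    to: int
    val: Optional[int]  # None marks a reservation (hypothetical encoding)


def disjoint(m1: Msg, m2: Msg) -> bool:
    return m1.var != m2.var or m1.to <= m2.frm or m2.to <= m1.frm


def insert(memory: FrozenSet[Msg], m: Msg) -> Optional[FrozenSet[Msg]]:
    """Return memory plus m, or None (undef) when a side condition fails."""
    # Side condition 1: {m} # M.
    if not all(disjoint(m, other) for other in memory):
        return None
    # Side condition 2: a concrete message must not end exactly where an
    # existing message on the same location begins.
    if m.val is not None and any(o.var == m.var and o.frm == m.to for o in memory):
        return None
    return memory | {m}
```

Under these assumptions, a reservation may be placed directly before an existing message, whereas a concrete message may not.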

Splitting an existing message is defined below.

$$M \stackrel{\mathbb{S}}{\leftarrow} m ::= \begin{cases} (M \backslash m) \cup & \text{if} \langle \mathbf{x} : v@(f,t], V \rangle \in M, f \leq t' \leq t \\ \text{undef} & \text{otherwise} \end{cases}$$

$$\text{where } m = \langle \mathbf{x} : v@(f,t'], V' \rangle$$

The domain of the memory is the set of pairs of a variable and a timestamp.

$$dom(M) ::= \{(x, t) \mid \exists m \in M. m. var = x \land m. to = t\}$$

We define the program state in Fig. 7. A thread state TS is a triple consisting of a local state  $\sigma$ , a thread view  $\mathcal{V}$ , and a set of promises P. The definition of the local state  $\sigma$  is instantiated for each language. Fig. 7 also defines the memory orders  $o_r$  and  $o_w$ , the thread events te, and the program events pe. The program state is a tuple containing the thread pool  $\mathcal{TP}$ , the identifier  $\mathsf{t}$  of the current thread, the global timemap  $\mathcal{S}$  used to define the semantics of fence-sc, and the memory M. The program state should be well-defined, as shown below.

$$\begin{aligned} \mathsf{wdSt}(\mathcal{TP},\mathcal{S},M) &::= & (\forall \mathsf{t} \in \mathsf{dom}(\mathcal{TP}). \ \mathcal{TP}(\mathsf{t}).P \subseteq M) \land \\ & (\forall \mathsf{t}_1,\mathsf{t}_2 \in \mathsf{dom}(\mathcal{TP}),\mathsf{t}_1 \neq \mathsf{t}_2. \ \mathcal{TP}(\mathsf{t}_1).P\#\mathcal{TP}(\mathsf{t}_2).P) \land \\ & (\forall \mathsf{t} \in \mathsf{dom}(\mathcal{TP}). \ \mathcal{TP}(\mathsf{t}).\mathcal{V} \in M) \land \mathcal{S} \in M \land (\forall m \in M. \ m. \mathsf{view} \in M) \end{aligned}$$

$$\frac{\begin{array}{c} \text{for any } i \in \{1, \dots, n\}. \ \operatorname{Init}(\pi, f_i) = \sigma_i \quad TS_i = (\sigma_i, \mathcal{V}_\perp, \emptyset) \\ \mathcal{TP} = \{1 \leadsto TS_1, \dots, n \leadsto TS_n\} \quad \mathsf{t} \in \{1, \dots, n\} \quad M = \{\langle \mathsf{x} : 0@(0, 0], V_\perp \rangle \mid \mathsf{x} \in Var\} \end{array}}{\mathbf{let}\ (\pi, \iota)\ \mathbf{in}\ f_1 \parallel \dots \parallel f_n \stackrel{load}{\Longrightarrow} (\mathcal{TP}, \mathsf{t}, \lambda \mathsf{x}. 0, M)^{\iota}} \text{ (Load)}$$

$$\frac{\iota \vdash (\mathcal{TP}(\mathsf{t}), \mathcal{S}, M) \longrightarrow^{+} (TS', \mathcal{S}', M')}{(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \stackrel{\tau}{\Longrightarrow} (\mathcal{TP}\{\mathsf{t} \leadsto TS'\}, \mathsf{t}, \mathcal{S}', M')^{\iota}}$$

$$\frac{\iota \vdash (\mathcal{TP}(\mathsf{t}), \mathcal{S}, M) \longrightarrow \mathbf{done} \quad \mathsf{t}' \in \mathrm{dom}(\mathcal{TP} \setminus \{\mathsf{t}\})}{(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \stackrel{\mathsf{sw}}{\Longrightarrow} (\mathcal{TP} \setminus \{\mathsf{t}\}, \mathsf{t}', \mathcal{S}, M)^{\iota}}$$

[Remaining rules of Fig. 8: thread switch, output, termination ( $W \Longrightarrow \mathbf{done}$ ), and abort ( $W \Longrightarrow \mathbf{abort}$ )]

Fig. 8. Machine step

We now present the definition of *promising semantics*, taken from Lee, et al. [11]. The machine steps are given in Fig. 8. The (Load) rule initializes the program. The initial view  $V_{\perp}$  is defined below.

$$V_{\perp} ::= (\lambda x.0, \lambda x.0)$$

Then, we define the thread view in the initial state.

$$\mathcal{V}_{\perp} \ ::= \ (V_{\perp}, V_{\perp}, \lambda x. V_{\perp})$$

consistent( $TS, M, \iota$ ) holds iff, for any  $M_c \in \widehat{M}$ ,

$$\exists TS'. \ \iota \vdash (TS, \widehat{T}(M), M_c) \longrightarrow^* (TS', \_, \_) \land TS'.P = \emptyset$$

where  $\widehat{M}$  denotes the set of *capped memories* of M, a construction proposed by Cho et al. [5]. We present the construction of the *capped memory* in Appendix A.

$$\{ \mathbf{x} @ t \} \quad ::= \quad V_{\perp} \{ \mathbf{x} \leadsto t \} \qquad \qquad T_1 \sqcup T_2 \quad ::= \quad \{ \mathbf{x} \leadsto t \mid t = \max(T_1(\mathbf{x}), T_2(\mathbf{x})) \}$$

$$V_1 \sqcup V_2 \quad ::= \quad (V_1.T_{\mathsf{na}} \sqcup V_2.T_{\mathsf{na}}, V_1.T_{\mathsf{rlx}} \sqcup V_2.T_{\mathsf{rlx}})$$
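The joins can be illustrated with a short Python sketch. It assumes that timemaps are dictionaries from location names to integer timestamps, that a missing location stands for timestamp 0 (as in $\lambda x.0$), and that a view is a pair of timemaps $(T_{\mathsf{na}}, T_{\mathsf{rlx}})$ joined componentwise. The names `join_tm`, `join_view`, and `singleton` are hypothetical.

```python
from typing import Dict, Tuple

TimeMap = Dict[str, int]
View = Tuple[TimeMap, TimeMap]  # (T_na, T_rlx)


def singleton(x: str, t: int) -> TimeMap:
    # {x@t}: the bottom timemap with location x raised to timestamp t.
    return {x: t}


def join_tm(t1: TimeMap, t2: TimeMap) -> TimeMap:
    # Pointwise maximum; locations absent from a map default to 0.
    return {x: max(t1.get(x, 0), t2.get(x, 0)) for x in t1.keys() | t2.keys()}


def join_view(v1: View, v2: View) -> View:
    # Join the non-atomic and relaxed components separately.
    return (join_tm(v1[0], v2[0]), join_tm(v1[1], v2[1]))
```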

$$\frac{\begin{array}{c} o = \mathsf{na} \implies cur.T_{\mathsf{na}}(\mathsf{x}) \leq t \qquad o \in \{\mathsf{rlx}, \mathsf{ra}\} \implies cur.T_{\mathsf{rlx}}(\mathsf{x}) \leq t \\ cur' = cur \sqcup V \sqcup (o \sqsupseteq \mathsf{ra}\ ?\ V_r) \qquad acq' = acq \sqcup V \sqcup (o \sqsupseteq \mathsf{rlx}\ ?\ V_r) \\ \text{where } V = (o \sqsupseteq \mathsf{rlx}\ ?\ \{\mathsf{x}@t\} : V_\perp, \ \{\mathsf{x}@t\}) \end{array}}{(cur, acq, rel) \xrightarrow{\mathsf{R}:o, \mathsf{x}, t, V_r} (cur', acq', rel)}$$

[Fig. 9 also contains a rule updating  $(cur, acq, rel)$  to  $(cur', acq', rel')$  for a label carrying  $o, \mathsf{x}, t, V_r, V_w$ ]

Fig. 9. Auxiliary definitions for thread local step

We present the thread-local step in Fig. 10, omitting the semantics of fence operations. The auxiliary definitions used by the thread-local step are given in Fig. 9. The notation ra stands for a memory order that is either release (for writes) or acquire (for reads), and memory orders are compared by strength:

$$\mathsf{ra} \ ::= \ \mathsf{rel} \mid \mathsf{acq} \qquad\qquad \mathsf{ra} \sqsupset \mathsf{rlx} \sqsupset \mathsf{na}$$
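The view update performed by a read can be sketched in Python as below. The sketch follows the read rule of Fig. 9 as we read it, under several assumptions: views are pairs of dictionaries $(T_{\mathsf{na}}, T_{\mathsf{rlx}})$, missing locations default to timestamp 0, memory orders are the strings `"na"`, `"rlx"`, `"ra"`, and a conditional without an else-branch yields the bottom view. The function name `read_step` is hypothetical.

```python
from typing import Dict, Optional, Tuple

TimeMap = Dict[str, int]
View = Tuple[TimeMap, TimeMap]  # (T_na, T_rlx)

STRENGTH = {"na": 0, "rlx": 1, "ra": 2}  # ra ⊐ rlx ⊐ na
BOTTOM: View = ({}, {})  # missing locations default to timestamp 0


def join_tm(t1: TimeMap, t2: TimeMap) -> TimeMap:
    return {x: max(t1.get(x, 0), t2.get(x, 0)) for x in t1.keys() | t2.keys()}


def join_view(v1: View, v2: View) -> View:
    return (join_tm(v1[0], v2[0]), join_tm(v1[1], v2[1]))


def read_step(cur: View, acq: View, o: str, x: str, t: int,
              v_r: View) -> Optional[Tuple[View, View]]:
    """Return (cur', acq') for reading x at timestamp t, or None if forbidden."""
    # The message read must not be older than what the thread already observed.
    if o == "na" and cur[0].get(x, 0) > t:
        return None
    if o in ("rlx", "ra") and cur[1].get(x, 0) > t:
        return None
    # V = (o ⊒ rlx ? {x@t} : ⊥, {x@t})
    v: View = ({x: t} if STRENGTH[o] >= STRENGTH["rlx"] else {}, {x: t})
    # The message view V_r joins cur only for acquire reads,
    # and joins acq for any atomic read.
    cur2 = join_view(join_view(cur, v), v_r if STRENGTH[o] >= STRENGTH["ra"] else BOTTOM)
    acq2 = join_view(join_view(acq, v), v_r if STRENGTH[o] >= STRENGTH["rlx"] else BOTTOM)
    return cur2, acq2
```

In this reading, a relaxed read records the message view only in `acq`, which a later acquire fence could transfer to `cur`.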

We use the following notation for a thread-local step that produces no observable event.

$$\longrightarrow \in \{ \xrightarrow{te} \mid te \neq \text{out}(v) \}$$

We define the abort step below. A thread step aborts if the thread can neither take a thread-local transition nor terminate, or if it accesses a non-atomic location with an atomic memory access, or an atomic location with a non-atomic memory access.

$$\begin{array}{l} \iota \vdash ((\sigma, \mathcal{V}, P), \mathcal{S}, M) \longrightarrow \mathbf{abort} \quad ::= \\ \quad \neg ((\exists \sigma'. \sigma \longrightarrow \sigma') \lor (\sigma \longrightarrow \mathbf{done})) \lor (\exists \sigma', \mathsf{x}. \sigma \xrightarrow{\mathsf{U}(\_,\_,\mathsf{x},\_,\_)} \sigma' \land \mathsf{x} \notin \iota) \ \lor \\ \quad (\exists \sigma', \mathsf{x}, o, v. (\sigma \xrightarrow{\mathsf{R}(o,\mathsf{x},v)} \sigma' \lor \sigma \xrightarrow{\mathsf{W}(o,\mathsf{x},v)} \sigma') \land ((o = \mathsf{na} \land \mathsf{x} \in \iota) \lor (o \neq \mathsf{na} \land \mathsf{x} \notin \iota))) \end{array}$$
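The atomicity-mismatch part of the abort condition can be illustrated by the following Python sketch. It assumes that $\iota$ is represented as the set of atomic locations and that memory orders are plain strings; the function names are hypothetical, and the first disjunct (a stuck local state) is not modeled.

```python
from typing import Set


def read_write_aborts(order: str, x: str, atomics: Set[str]) -> bool:
    # A non-atomic access to an atomic location, or an atomic access
    # to a non-atomic location, makes the step abort.
    return (order == "na" and x in atomics) or (order != "na" and x not in atomics)


def update_aborts(x: str, atomics: Set[str]) -> bool:
    # An update (read-modify-write) is only allowed on atomic locations.
    return x not in atomics
```

For example, with `atomics = {"y"}`, a relaxed write to `x` aborts, while a release write to `y` does not.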

We define the condition that the current thread is done.

$$\iota \vdash ((\sigma, \mathcal{V}, P), \mathcal{S}, M) \longrightarrow \mathbf{done} ::= \sigma \longrightarrow \mathbf{done} \land P = \emptyset$$

$$\frac{\sigma \xrightarrow{\tau} \sigma'}{\iota \vdash ((\sigma, \mathcal{V}, P), \mathcal{S}, M) \xrightarrow{\tau} ((\sigma', \mathcal{V}, P), \mathcal{S}, M)} \text{ (Silent)}$$

[Remaining rules of Fig. 10: (Output), (Read), (Write), (Update), (Promise), (Reserve), (Cancel)]

Fig. 10. Thread step

*Behaviors of promising semantics.* We use the event trace to depict the behaviors of programs under promising semantics.

$$(EvtTrace) \quad \mathcal{B} \ ::= \ \mathbf{done} \ | \ \mathbf{abort} \ | \ \epsilon \ | \ \mathrm{out}(v) :: \mathcal{B}$$
 
$$ProgEtr(\mathbb{P},\mathcal{B}) \ \text{iff} \ \exists W,n. \ (\mathbb{P} \overset{load}{\Longrightarrow} W) \land Etr^n(W,\mathcal{B})$$
 
$$\frac{W \Longrightarrow \mathbf{abort}}{Etr^{n+1}(W,\mathbf{abort})} \quad \frac{W \Longrightarrow \mathbf{done}}{Etr^{n+1}(W,\mathbf{done})}$$
$$\frac{W \overset{\mathrm{out}(v)}{\Longrightarrow} W' \quad Etr^n(W',\mathcal{B})}{Etr^{n+1}(W,\mathrm{out}(v) :: \mathcal{B})} \qquad \frac{W \overset{\tau/\operatorname{sw}}{\Longrightarrow} W' \quad Etr^n(W',\mathcal{B})}{Etr^{n+1}(W,\mathcal{B})}$$

$$\frac{\mathcal{TP}(\mathsf{t}) = (\sigma, \mathcal{V}, P) \quad \sigma \xrightarrow{W(\mathsf{na}, \mathsf{x}, \_)} \_ \quad \langle \mathsf{x} : v@(\_, t], \_\rangle \in (M \setminus P) \quad \mathcal{V}.\mathsf{cur}.T_{\mathsf{rlx}}(\mathsf{x}) < t}{(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \Longrightarrow \mathsf{ww}\text{-Race}}$$

$$\frac{\mathbb{P} \xrightarrow{load} W \quad W \Longrightarrow^* W' \quad W' \Longrightarrow \mathsf{ww}\text{-Race}}{\mathbb{P} \Longrightarrow \mathsf{ww}\text{-Race}}$$

$$\mathsf{ww}\text{-RF}(\mathbb{P}) ::= \neg(\mathbb{P} \Longrightarrow \mathsf{ww}\text{-Race})$$

Fig. 11. Write-write race freedom under promising semantics



Fig. 12. The view of the thread t<sub>2</sub> to the location of the variable x (write-write race)

#### 4 WRITE-WRITE RACE FREEDOM

In this section, we explain write-write race freedom under promising semantics, which is defined in Fig. 11. A program is write-write race free if none of its executions contains a write-write race. To illustrate what a write-write race is, consider the following program.

$$x_{na} := 8;$$
  $x_{rel} := 1;$   $x_{rel} := 1;$   $x_{rel} := 2;$ 

For this program, we call the thread on the left side t<sub>1</sub> and the thread on the right side t<sub>2</sub>. This program contains a write-write race, since the execution of " $x_{na} := 8$ " in the thread  $t_1$  and the execution of " $x_{na} := 2$ " in the thread t<sub>2</sub> are not ordered by the happens-before relation. That is, when the thread t<sub>2</sub> executes " $x_{na} := 2$ ", it does not know whether " $x_{na} := 8$ " in the thread  $t_1$  has already been executed, as shown in Fig. 12. The execution of " $x_{na} := 2$ " in the thread  $t_2$  can insert its message before the message with value 8, meaning that " $x_{na} := 2$ " is executed before " $x_{na} := 8$ ", or after the message with value 8, meaning that " $x_{na} := 2$ " is executed after " $x_{na} := 8$ ".
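The race condition of Fig. 11 on a single machine state can be sketched in Python as follows. The `Msg` tuple, the encoding of reservations as `val is None`, and the function name `ww_race` are assumptions of this sketch; `write_loc` stands for the location that the current thread is about to write non-atomically (or `None` if its next step is not such a write), and `cur_rlx` stands for $\mathcal{V}.\mathsf{cur}.T_{\mathsf{rlx}}$.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Set


class Msg(NamedTuple):
    var: str
    frm: int
    to: int
    val: Optional[int]  # None marks a reservation (hypothetical encoding)


def ww_race(write_loc: Optional[str], cur_rlx: Dict[str, int],
            memory: Iterable[Msg], promises: Set[Msg]) -> bool:
    """True if a concrete message on write_loc, not promised by this thread,
    lies beyond the thread's current view of that location."""
    if write_loc is None:
        return False
    seen = cur_rlx.get(write_loc, 0)
    return any(
        m.var == write_loc and m.val is not None and m not in promises and seen < m.to
        for m in memory
    )
```

In the example above, if  $t_1$  has already written the message with value 8 and  $t_2$  has not observed it, the check reports a race when  $t_2$  is about to execute its non-atomic write.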

#### 5 PROOF GOAL

In this section, we formulate the correctness of optimizers that optimize programs under promising semantics. An optimizer is a transformation from a source program to a target program.

$$(Optimizer) \quad \mathsf{Optimizer} \ \in \ (Code \times VarType) \rightarrow Code$$

The correctness of the optimizers is defined in Def. 5.1.

Definition 5.1 (correctness of optimizers).

Correct(Optimizer) 
$$\triangleq \forall \pi_s, \pi_t, \iota$$
. Optimizer $(\pi_s, \iota) = \pi_t \land \text{ww-RF}(\mathbf{let}(\pi_s, \iota) \text{ in } f_1 \parallel \cdots \parallel f_n) \land \text{Safe}(\mathbf{let}(\pi_s, \iota) \text{ in } f_1 \parallel \cdots \parallel f_n) \land \text{Safe}(\mathbf{let}(\pi_t, \iota) \text{ in } f_1 \parallel \cdots \parallel f_n) \land \text{ww-RF}(\mathbf{let}(\pi_t, \iota) \text{ in } f_1 \parallel \cdots \parallel f_n) \land \text{Safe}(\mathbf{let}(\pi_t, \iota) \text{ in } f_1 \parallel \cdots \parallel f_n))$ 

The correctness of an optimizer that consists of multiple optimization passes follows from the transitivity of the definition of a correct optimizer. We first define the composition of two optimizers below.

$$(\mathsf{Optimizer}_1 \circ \mathsf{Optimizer}_2)(\pi_s, \iota) \ \triangleq \ \left\{ \begin{array}{ll} \mathsf{Optimizer}_2(\pi_m, \iota) & \textit{if } \pi_m = \mathsf{Optimizer}_1(\pi_s, \iota) \\ \mathsf{undef} & \textit{otherwise} \end{array} \right.$$
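The composition can be sketched in Python by modeling an optimizer as a function that returns `None` when it is undefined. The type aliases and the two toy passes `drop_skips` and `reject_empty` are purely hypothetical and serve only to exercise `compose`; they are not optimizations from this work.

```python
from typing import Callable, Dict, List, Optional

Code = List[str]          # toy stand-in for the code of a program
VarType = Dict[str, str]  # toy stand-in for the atomicity map
Optimizer = Callable[[Code, VarType], Optional[Code]]


def compose(opt1: Optimizer, opt2: Optimizer) -> Optimizer:
    """Run opt1 first, then opt2 on its result; undefined if either is undefined."""
    def composed(code: Code, iota: VarType) -> Optional[Code]:
        mid = opt1(code, iota)
        if mid is None:
            return None
        return opt2(mid, iota)
    return composed


def drop_skips(code: Code, iota: VarType) -> Optional[Code]:
    # Hypothetical pass: remove "skip" instructions.
    return [instr for instr in code if instr != "skip"]


def reject_empty(code: Code, iota: VarType) -> Optional[Code]:
    # Hypothetical pass: undefined on empty code, identity otherwise.
    return code if code else None
```

Note that, as in the definition, the first argument of the composition is applied first.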

We state the transitivity of correct optimizers in Lemma 5.2.

Lemma 5.2 (correct optimizer transitive).

$$\forall \mathsf{Optimizer}_1, \mathsf{Optimizer}_2.$$

$$(\mathsf{Correct}(\mathsf{Optimizer}_1) \land \mathsf{Correct}(\mathsf{Optimizer}_2))$$

$$\implies \mathsf{Correct}(\mathsf{Optimizer}_1 \circ \mathsf{Optimizer}_2)$$

Write-write race freedom (ww-RF) is defined in Sec. 4. Below, we define program safety: the execution of a safe program never aborts.

$$\mathsf{Safe}(W) \triangleq \neg(\exists W'. \ W \Longrightarrow^* W' \land W' \Longrightarrow \mathbf{abort})$$
$$\mathsf{Safe}(\mathbb{P}) \triangleq (\exists W. \ \mathbb{P} \stackrel{load}{\Longrightarrow} W) \land (\forall W. \ (\mathbb{P} \stackrel{load}{\Longrightarrow} W) \Longrightarrow \mathsf{Safe}(W))$$
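For intuition, the two safety definitions can be read as a reachability check. The Python sketch below assumes a finite, explicitly enumerable state space, which does not hold for the real semantics; `step` and `aborts` are hypothetical callbacks standing for  $W \Longrightarrow W'$  and  $W \Longrightarrow \mathbf{abort}$ .

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def safe_world(w0: Hashable,
               step: Callable[[Hashable], Iterable[Hashable]],
               aborts: Callable[[Hashable], bool]) -> bool:
    """Safe(W): no world reachable from w0 can take an abort step."""
    seen = {w0}
    queue = deque([w0])
    while queue:
        w = queue.popleft()
        if aborts(w):
            return False
        for nxt in step(w):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


def safe_program(initial_worlds: Iterable[Hashable],
                 step: Callable[[Hashable], Iterable[Hashable]],
                 aborts: Callable[[Hashable], bool]) -> bool:
    """Safe(P): the program loads to at least one world, and every such world is safe."""
    worlds = list(initial_worlds)
    return bool(worlds) and all(safe_world(w, step, aborts) for w in worlds)
```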

For example, the following program is not safe, since it performs an atomic memory access on a non-atomic location.

$$x_{rlx} := 3 \quad (\text{where } \iota(x) = na)$$

The following program is also not safe, since its execution accesses an undefined variable.

$$x_{rlx} := 3 \quad (\text{where } x \notin dom(\iota))$$

However, a safe program in our work does not mean that the program can always execute the current instruction successfully. Consider the following example.

$$CAS_{rlx,rlx}(x, 0, 1); \quad || \quad y_{rlx} := 1;$$

The above program is trivially safe by our definition. Nevertheless, a thread of this program may be unable to execute its current instruction. Consider the following execution.

|     | $t_1$                  |   | $t_2$                               |
|-----|------------------------|---|-------------------------------------|
| (1) |                        |   | reserve $\langle x : (0,2] \rangle$ |
| (2) | $CAS_{rlx,rlx}(x,0,1)$ | ? |                                     |

Since the thread  $t_2$  may reserve the timestamps that will be used by the execution of the instruction CAS<sub>rlx,rlx</sub>(x, 0, 1) in the thread  $t_1$ , the thread  $t_1$  can not execute its current instruction successfully.
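The blocking effect of the reservation can be illustrated with a small Python sketch. It assumes, as in promising semantics, that a successful update must write a message whose interval starts exactly at the to-timestamp of the message it reads; the `Msg` tuple, the function names, and the finite list of candidate timestamps are assumptions of this sketch.

```python
from typing import Iterable, List, NamedTuple, Optional


class Msg(NamedTuple):
    var: str
    frm: float
    to: float
    val: Optional[int] = None  # None marks a reservation (hypothetical encoding)


def interval_free(memory: Iterable[Msg], var: str, frm: float, to: float) -> bool:
    # (frm, to] is free if it overlaps no message on the same location.
    return all(m.var != var or m.to <= frm or to <= m.frm for m in memory)


def cas_can_write(memory: List[Msg], var: str, read_to: float,
                  candidate_tos: Iterable[float]) -> bool:
    # The update needs some free interval (read_to, t] adjacent to the message read.
    return any(t > read_to and interval_free(memory, var, read_to, t)
               for t in candidate_tos)
```

With the initial message on `x` at timestamp 0 and the reservation  $\langle x : (0,2] \rangle$  in memory, every interval starting at 0 overlaps the reservation, so the update is blocked; without the reservation, an interval such as  $(0,1]$  is available.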

We will prove that constant propagation (ConstProp), dead code elimination (DCE), common subexpression elimination (CSE) and loop invariant code motion (LICM) satisfy the definition of correct optimizer defined in Def. 5.1.

THEOREM 5.3 (CORRECT OPTIMIZERS).

$$Correct(ConstProp) \land Correct(DCE) \land Correct(CSE) \land Correct(LICM)$$

We will give the implementations of these optimizers in the following sections.

The thread-local upward simulation, written in the form "I,  $\iota \models \pi_t \preccurlyeq \pi_s$ ", will be defined in the following section. We need to prove that the results of the constant propagation and dead code elimination optimizations satisfy this simulation.

Definition 5.4 (Well-formed optimizer).

$$\mathsf{wfOpt}(\mathsf{Optimizer}) \ \triangleq \ \forall \pi_t, \pi_s, \iota. \ \mathsf{Optimizer}(\pi_s, \iota) = \pi_t \implies \exists I. \ I, \iota \models \pi_t \preccurlyeq \pi_s$$

The definition of the well-formed optimizer shows the correctness of the step ② in Fig. 2.

LEMMA 5.5 (Well-formed optimizer implies correct optimizer).

∀Optimizer. wfOpt(Optimizer) ⇒ Correct(Optimizer)

#### 6 NON-PREEMPTIVE SEMANTICS

We aim to define a non-preemptive semantics that is equivalent to promising semantics.

In the following discussion, we write "atom { C }" to indicate that the execution of C cannot be interrupted by other threads or by promise steps. We now consider at which program points thread switching must be permitted. In each two-thread example, we call the thread on the left side  $t_1$  and the thread on the right side  $t_2$ .

(1) After the execution of the atomic memory accesses: Permitting thread switching after the atomic (release) memory write is essential. Consider the program shown below.

$$\begin{array}{l} x_{na} := 2; \\ y_{rel} := 1; \end{array} \quad \parallel \quad \begin{array}{l} r_1 := y_{acq}; \\ \mathsf{if}\ (r_1)\ r_2 := x_{na}; \\ \mathsf{while}(\mathsf{true}); \end{array}$$

The execution of the above program shown below demonstrates that it is essential to permit thread switching after atomic (release) write.

| Step | Execution                 |
|------|---------------------------|
| 1    | $t_1: x_{na} := 2$        |
| 2    | $t_1: y_{rel} := 1$       |
| 3    | $t_2: r_1 := y_{acq} //1$ |
| 4    | $t_2: r_2 := x_{na} //2$  |
| 5    | $t_2: while(true)$        |
However, we have not found thread switching after an atomic read or after a relaxed read/write to be essential. To keep the presentation simple, we still permit thread switching after the execution of an atomic read.

(2) After the execution of the promise step: Permitting the thread switching after the execution of the promise step is also essential, since one thread will read the promise generated by the other thread. Consider the following example.

$$\begin{array}{l} r_1 := \mathsf{x}_{\mathsf{rlx}}; \\ \mathsf{y}_{\mathsf{rlx}} := 1; \end{array} \quad \parallel \quad \begin{array}{l} \mathsf{do}\ \{ \\ \quad r_2 := \mathsf{y}_{\mathsf{rlx}}; \\ \}\ \mathsf{while}(r_2 == 1); \end{array}$$

Consider the following execution. The thread  $t_1$  promises " $y_{rlx}$  := 1", and the thread  $t_2$  reads this promise and loops forever. It seems that the only way for a non-preemptive semantics to admit this behavior is to allow a thread switch after a promise step.

| Step | Execution                                              |
|------|--------------------------------------------------------|
| 1    | $t_1$ : promise $\langle y : 1@(1,2], \{y@1\} \rangle$ |
| 2    | $t_2: r_2 := y_{rlx} //1$                              |
| 3    | $t_2: r_2 := y_{rlx} //1$                              |
| 4    | $t_2: r_2 := y_{rlx} //1$                              |
|      | •••                                                    |

Fig. 13. Syntax and State of Non-preemptive semantics

(3) After the execution of the fence operations: Permitting thread switching after the execution of a fence operation is also essential. Consider the example shown below.

$$\begin{array}{l} x_{na} := 8; \\ \text{fence-rel}; \\ r_1 := y_{rlx}; \\ z_{rlx} := 1; \end{array} \quad \parallel \quad \begin{array}{l} r_2 := z_{rlx}; \\ y_{rlx} := r_2; \end{array}$$

The above program may generate the following execution.

| Step | Execution                                            |
|------|------------------------------------------------------|
| 1    | $t_1: x_{na} := 8$                                   |
| 2    | $t_1:$ fence-rel                                     |
| 3    | $t_1 : promise \langle z : 1@(2,3], \{x@3\} \rangle$ |
| 4    | $t_2: r_2 := z_{rlx} //1$                            |
| 5    | $t_2: y_{rlx} := r_2$                                |
| 6    | $t_1: r_1 := y_{rlx} //1$                            |
| 7    | $t_1: z_{rlx} := 1 $ (* fulfill promise *)           |
Consider whether the example shown above is equivalent to the following program.

$$\begin{array}{l} \text{atom } \{ \\ \quad x_{\text{na}} := 8; \\ \quad \text{fence-rel}; \\ \quad r_1 := y_{\text{rlx}}; \\ \} \\ z_{\text{rlx}} := 1; \end{array} \quad \parallel \quad \begin{array}{l} r_2 := z_{\text{rlx}}; \\ y_{\text{rlx}} := r_2; \end{array}$$

The answer is no. Promising semantics does not permit an atomic memory access that follows a release fence in program order to be promised before the release fence is executed, so the program needs to exit the atomic block after executing the release fence.

Permitting thread switching after the execution of an acquire fence does not seem essential. To keep the presentation simple, we still permit thread switching after the execution of an acquire fence.

(4) After generating observable events and after the current thread is done: These two cases are straightforward.

*Non-preemptive semantics.* We now define the non-preemptive semantics. consistentNP $(TS, M, \beta, \iota)$  holds iff, for any  $M_c \in \widehat{M}$ ,

$$\exists \mathit{TS'}.\ \iota \vdash (\mathit{TS},\widehat{\mathit{T}}(M),M_c,\beta) \longmapsto^* (\mathit{TS'},\_,\_,\_) \ \land \ \mathit{TS'}.P = \emptyset$$

$$\begin{array}{llll} (NA) & na & \in & \{\tau, \mathsf{W}(\mathsf{na}, \_, \_), \mathsf{R}(\mathsf{na}, \_, \_)\} \\ (PRC) & prc & \in & \{\mathsf{prm}, \mathsf{rsv}, \mathsf{ccl}\} \\ (AT) & at & \in & \{te \mid te \notin (NA \cup PRC)\} \end{array}$$

$$\begin{array}{l} \iota \vdash (TS, \mathcal{S}, M, \beta) \stackrel{te}{\longmapsto} (TS', \mathcal{S}', M', \beta') \quad ::= \\ \quad \iota \vdash (TS, \mathcal{S}, M) \stackrel{te}{\longrightarrow} (TS', \mathcal{S}', M') \ \wedge \\ \quad (te \in \{\mathsf{prm}, \mathsf{rsv}\} \implies \beta = \beta' = \circ) \wedge (te = \mathsf{ccl} \implies \beta = \beta') \ \wedge \\ \quad (te \in NA \implies \beta' = \bullet) \wedge (te \in AT \implies \beta' = \circ) \end{array}$$

Fig. 14. Auxiliary definitions in auxiliary promising semantics
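The side conditions on the atomic bit in Fig. 14 can be read as a small predicate over the event label, the current bit and the next bit. The following Python sketch is only an illustration of that reading: the string labels for events, the encoding of the bits as `"o"` and `"*"`, and the names `event_class` and `bit_update_ok` are assumptions made here, not part of the formal development.

```python
# Hypothetical encoding: "o" is the bit ∘ (switching allowed), "*" is the bit •.
NA = {"tau", "W_na", "R_na"}   # silent step, non-atomic write, non-atomic read
PRC = {"prm", "rsv", "ccl"}    # promise, reserve, cancel


def event_class(te):
    """Classify an event label; everything outside NA and PRC counts as AT."""
    if te in NA:
        return "NA"
    if te in PRC:
        return "PRC"
    return "AT"


def bit_update_ok(te, beta, beta_next):
    """Check the atomic-bit side conditions attached to one thread step."""
    if te in ("prm", "rsv"):
        # Promise and reserve steps are only allowed outside atomic blocks.
        return beta == "o" and beta_next == "o"
    if te == "ccl":
        # Cancel steps leave the bit unchanged.
        return beta == beta_next
    if event_class(te) == "NA":
        # Non-atomic steps enter (or stay in) an atomic block.
        return beta_next == "*"
    # Atomic steps end the atomic block.
    return beta_next == "o"
```

Under this encoding, a cancel step is the only step that can follow a non-atomic step without changing the bit, which is the property the discussion of cancel steps relies on.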

$$\begin{array}{c}
\dfrac{\begin{array}{c} \text{for any } i \in \{1, \dots, n\}. \ \operatorname{Init}(\pi, \mathsf{f}_i) = \sigma_i \quad TS_i = (\sigma_i, \mathcal{V}_\perp, \emptyset) \\ \mathcal{TP} = \{1 \leadsto TS_1, \dots, n \leadsto TS_n\} \quad \mathsf{t} \in \{1, \dots, n\} \quad M = \{\langle \mathsf{x} : 0@(0, 0], V_\perp \rangle \mid \mathsf{x} \in \mathit{Var}\} \end{array}}{\mathbf{let} \ (\pi, \iota) \ \mathbf{in} \ \mathsf{f}_1 \mid \dots \mid \mathsf{f}_n \ \stackrel{load}{\Longrightarrow} \ (\mathcal{TP}, \mathsf{t}, \lambda \mathsf{x}. \, 0, M, \circ)^\iota} \\[3ex]
\dfrac{\iota \vdash (\mathcal{TP}(\mathsf{t}), S, M, \beta) \longmapsto^+ (TS', S', M', \beta')}{(\mathcal{TP}, \mathsf{t}, S, M, \beta)^\iota :\stackrel{\tau}{\Longrightarrow} (\mathcal{TP}\{\mathsf{t} \leadsto TS'\}, \mathsf{t}, S', M', \beta')^\iota} \\[3ex]
\dfrac{\iota \vdash (\mathcal{TP}(\mathsf{t}), S, M, \beta) \stackrel{\mathsf{out}(v)}{\longrightarrow} (TS', S', M', \circ)}{(\mathcal{TP}, \mathsf{t}, S, M, \beta)^\iota :\stackrel{\mathsf{out}(v)}{\Longrightarrow} (\mathcal{TP}\{\mathsf{t} \leadsto TS'\}, \mathsf{t}, S', M', \circ)^\iota} \qquad
\dfrac{}{(\mathcal{TP}, \mathsf{t}, S, M, \circ)^\iota :\stackrel{\mathsf{sw}}{\Longrightarrow} (\mathcal{TP}, \mathsf{t}', S, M, \circ)^\iota} \\[3ex]
\dfrac{\iota \vdash (\mathcal{TP}(\mathsf{t}), S, M, \beta) \longmapsto^* (TS', S', M', \beta') \quad \cdots}{(\mathcal{TP}, \mathsf{t}, S, M, \beta)^\iota :\Longrightarrow \mathsf{done}}
\end{array}$$

Fig. 15. Non-preemptive semantics

We permit a thread to take cancel steps after a non-atomic step, because additional cancel steps are needed to remove the thread's reservations. Consider the following example.

$$\begin{array}{l} x_{na} := 1; \\ while(true); \\ y_{rlx} := 2; \end{array}$$

Consider the following execution of the above program under the non-preemptive semantics.

| Step | Action                                                |
|------|-------------------------------------------------------|
| (1)  | reserve $\langle y:(0,1] \rangle$; $x_{na} := 1$      |
| (2)  | while(true)                                           |

Observe that after step (1) the atomic bit is " $\bullet$ ". If cancel steps were not permitted during the promise consistency certification, the reservation " $\langle y : (0,1] \rangle$ " could not be cancelled, so the promise set could never become empty.

Note that we cannot treat the cancel step as a non-atomic step, since this would break the proof of the equivalence between the na promise-free semantics and the non-preemptive semantics under the data-race-freedom assumption. The problem is that reordering non-atomic steps with atomic steps becomes unsound when the cancel step is treated as a non-atomic step. Consider the following program.

$$x_{na} := 1 \quad \left\| \begin{array}{c} z_{na} := 2; \\ \text{FADD}_{acq,rlx}(y,1); \end{array} \right.$$

We consider the following execution of the above program under promising semantics.

|       | $t_1$                                                 | $t_2$                 |
|-------|-------------------------------------------------------|-----------------------|
| (P-1) | promise $\langle y : (0,2] \rangle$ (* reservation *) |                       |
| (P-2) |                                                       | $z_{na} := 2$         |
| (P-3) | $x_{na} := 1;$ cancel $\langle y : (0, 2] \rangle$    |                       |
| (P-4) |                                                       | $FADD_{acq,rlx}(y,1)$ |
| (P-5) |                                                       | done                  |
| (P-6) | done                                                  |                       |

If we treated the cancel step as a non-atomic step, we would need to construct the following execution of the above program under the non-preemptive semantics.

|        | $t_1$                                                  | $t_2$                                  |
|--------|--------------------------------------------------------|----------------------------------------|
| (NP-1) | promise $\langle y : (0, 2] \rangle$ (* reservation *) |                                        |
| (NP-2) |                                                        | $z_{na} := 2;$ $FADD_{acq,rlx}(y, 1);$ |
| (NP-3) |                                                        | done                                   |
| (NP-4) | $x_{na} := 1;$ cancel $\langle y:(0,2]\rangle$         |                                        |
| (NP-5) | done                                                   |                                        |

We find that step (NP-2) cannot be taken, since the timestamps required to execute "FADD $_{acq,rlx}(y, 1)$ " have been reserved.
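The blocking effect of the reservation can be illustrated with a small interval check. The sketch below is a simplification that assumes the usual PS2.1 requirement that an update writes a message whose "from" timestamp equals the "to" timestamp of the message it reads; the representation of memory as a list of `(from, to)` pairs and the names `overlaps` and `can_update` are hypothetical.

```python
def overlaps(a, b):
    """Intervals (f, t] overlap iff each one starts before the other ends."""
    (f1, t1), (f2, t2) = a, b
    return f1 < t2 and f2 < t1


def can_update(intervals, read_to, new_to):
    """An update reading the message ending at read_to must write (read_to, new_to]."""
    new = (read_to, new_to)
    return new_to > read_to and not any(overlaps(new, iv) for iv in intervals)


# Location y: the initial message occupies (0, 0], the reservation occupies (0, 2].
y_with_reservation = [(0, 0), (0, 2)]
y_after_cancel = [(0, 0)]
```

With the reservation in place, every interval of the form `(0, t]` overlaps `(0, 2]`, so `can_update(y_with_reservation, 0, t)` is false for every `t`. Once the reservation is cancelled, `can_update(y_after_cancel, 0, 1)` holds.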

*Behaviors of programs under the non-preemptive semantics.* We define the behaviors of programs under the non-preemptive semantics below.

$$\begin{split} \textit{NPProgEtr}(\hat{\mathbb{P}},\mathcal{B}) & \text{ iff } \exists \hat{W}, n. \ (\hat{\mathbb{P}} : \stackrel{load}{\Longrightarrow} \hat{W}) \land \textit{NPEtr}^n(\hat{W},\mathcal{B}) \\ & \frac{\hat{W} : \Longrightarrow \mathbf{abort}}{\textit{NPEtr}^n(\hat{W}, \epsilon)} & \frac{\hat{W} : \Longrightarrow \mathbf{abort}}{\textit{NPEtr}^{n+1}(\hat{W}, \mathbf{abort})} & \frac{\hat{W} : \Longrightarrow \mathbf{done}}{\textit{NPEtr}^{n+1}(\hat{W}, \mathbf{done})} \\ & \frac{\hat{W} : \stackrel{\text{out}(v)}{\Longrightarrow} \hat{W}' \quad \textit{NPEtr}^n(\hat{W}', \mathcal{B})}{\textit{NPEtr}^{n+1}(\hat{W}, \text{out}(v) :: \mathcal{B})} & \frac{\hat{W} : \stackrel{\tau/\text{sw}}{\Longrightarrow} \hat{W}' \quad \textit{NPEtr}^n(\hat{W}', \mathcal{B})}{\textit{NPEtr}^{n+1}(\hat{W}, \mathcal{B})} \end{split}$$

We need to prove the equivalence between the na promise-free semantics and the non-preemptive semantics under the DRF assumption, as stated in Lemma 6.1.

LEMMA 6.1 (SEMANTICS EQUIVALENCE - P2NP).

$$\forall \pi, f_1, \dots, f_n, \iota, \mathcal{B}.$$

$$ProgEtr(\textbf{let} (\pi, \iota) \textbf{ in } f_1 \parallel \dots \parallel f_n, \mathcal{B}) \Longleftrightarrow NPProgEtr(\textbf{let} (\pi, \iota) \textbf{ in } f_1 \mid \dots \mid f_n, \mathcal{B})$$

Lemma 6.1 corresponds to step ⑦ in Fig. 2. The details of its proof are given in Appendix B. The intuition behind the semantics equivalence proof is that a write step in promising semantics whose execution generates a new message can be split into a promise step and a later step that fulfills the promise.



Consider the following program.

$$\begin{array}{l} x_{na} := 1; \\ r_2 := y_{rlx}; \\ r_3 := x_{na}; \\ print(r_2); \end{array} \quad \left\| \quad \begin{array}{l} r_1 := x_{na}; \\ if(r_1) \ \{ \\ \quad y_{rel} := 1; \\ \quad x_{na} := 2; \\ \quad while(true); \ \} \end{array} \right.$$

Consider the following execution of the above program.

|       | $t_1$                                                | $t_2$                              |
|-------|------------------------------------------------------|------------------------------------|
| (P-1) | $x_{na} := 1$                                        |                                    |
| (P-2) |                                                      | $r_1 := x_{na} //1$ $y_{rel} := 1$ |
| (P-3) | $r_2 := y_{rlx} //1$                                 |                                    |
| (P-4) |                                                      | $x_{na} := 2$                      |
| (P-5) | $r_3 := x_{na}$                                      |                                    |
| (P-6) |                                                      | while(true)                        |

We can construct an execution under the non-preemptive semantics from the above execution. The construction needs to reorder steps (P-1) and (P-2). They cannot be reordered directly, since " $r_1 := x_{na}$ " in (P-2) reads the message generated by " $x_{na} := 1$ " in (P-1). However, we can let " $x_{na} := 1$ " generate the message with value 1 through a promise step, as shown in the non-preemptive execution below.

$$\dfrac{\mathcal{TP}(\mathsf{t}) = (\sigma, \mathcal{V}, P) \qquad \sigma \xrightarrow{\mathsf{W}(\mathsf{na}, \mathsf{x}, \_)} \_ \qquad \langle \mathsf{x} : v@(\_, t], \_ \rangle \in (M \backslash P) \qquad \mathcal{V}.\mathsf{cur}.T_{\mathsf{rlx}}(\mathsf{x}) < t}{(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \beta)^{\iota} : \Longrightarrow \mathsf{ww}\text{-Race}}$$

$$\dfrac{\hat{\mathbb{P}} : \stackrel{load}{\Longrightarrow} \hat{W} \qquad \hat{W} : \Longrightarrow^{*} \hat{W}' \qquad \hat{W}' : \Longrightarrow \mathsf{ww}\text{-Race}}{\hat{\mathbb{P}} : \Longrightarrow \mathsf{ww}\text{-Race}}$$

$$\mathsf{ww}\text{-NPRF}(\hat{\mathbb{P}}) \quad ::= \quad \neg(\hat{\mathbb{P}} : \Longrightarrow \mathsf{ww}\text{-Race})$$

Fig. 16. Write-write race freedom under the non-preemptive semantics

|        | $t_1$                                              | $t_2$                                            |
|--------|----------------------------------------------------|--------------------------------------------------|
| (NP-1) | promise $\langle x : 1@(0,1], V_{\perp} \rangle$   |                                                  |
| (NP-2) |                                                    | $r_1 := x_{na} //1$ $y_{rel} := 1$               |
| (NP-3) | $x_{na} := 1$ (* fulfill *) $r_2 := y_{rlx} //1$   |                                                  |
| (NP-4) |                                                    | promise $\langle x : 2@(1,2], V_{\perp} \rangle$ |
| (NP-5) | $r_3 := x_{na} //2$ print $(r_3)$                  |                                                  |
| (NP-6) |                                                    | $x_{na} := 2$ (* fulfill *) while(true)          |

Data race under the non-preemptive semantics. We define the data race under the non-preemptive semantics in Fig. 16. We need to prove that data race freedom under promising semantics is equivalent to data race freedom under the non-preemptive semantics, as stated in Lemma 6.2.

Lemma 6.2 (ww-race free equivalence - P2NP).

$$\forall \pi, \mathsf{f}_1, \dots, \mathsf{f}_n, \iota.$$
 
$$\mathsf{ww-RF}(\mathbf{let}\ (\pi, \iota)\ \mathbf{in}\ \mathsf{f}_1\ \|\ \cdots\ \|\ \mathsf{f}_n) \Longleftrightarrow \mathsf{ww-NPRF}(\mathbf{let}\ (\pi, \iota)\ \mathbf{in}\ \mathsf{f}_1\ |\ \dots\ |\ \mathsf{f}_n)$$

Lemma 6.2 corresponds to step (1) and step (6) in Fig. 2.
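The premise of the race rule in Fig. 16 can be sketched as a check on one thread: the thread is about to perform a non-atomic write to x, and memory contains a concrete message on x that is not one of the thread's own promises and lies above the thread's relaxed view of x. The Python below is a minimal sketch under simplifying assumptions: views are reduced to a dictionary from locations to timestamps, message views are dropped, and the names `Msg` and `ww_race` are hypothetical.

```python
from typing import NamedTuple


class Msg(NamedTuple):
    var: str   # location
    val: int   # written value
    frm: int   # "from" timestamp
    to: int    # "to" timestamp


def ww_race(next_write_var, cur_rlx_view, memory, promises):
    """True iff a non-atomic write to next_write_var races with an unobserved message.

    memory and promises are sets of Msg; cur_rlx_view maps locations to timestamps.
    """
    return any(
        m.var == next_write_var and cur_rlx_view.get(m.var, 0) < m.to
        for m in memory - promises
    )


# Another thread has already written x at timestamp 1; this thread has only seen timestamp 0.
memory = {Msg("x", 0, 0, 0), Msg("x", 1, 0, 1)}
```

Here `ww_race("x", {"x": 0}, memory, set())` reports a race, while a thread whose view of x is already 1, or a thread for which the message at timestamp 1 is its own promise, does not race.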

$$(MMap) \quad \varphi \in (Var \times Time) \rightarrow Time$$

$$\operatorname{mon}(\varphi) \quad \stackrel{\triangle}{=} \quad \forall \mathsf{x}, t_1, t_2, t_1', t_2'. \ (\varphi(\mathsf{x}, t_1) = t_1' \land \varphi(\mathsf{x}, t_2) = t_2' \land t_1 < t_2) \implies t_1' < t_2'$$

$$\varphi(M) \quad \stackrel{\triangle}{=} \quad \{(\mathsf{x}, t') \mid \langle \mathsf{x} : \_@(\_, t], \_\rangle \in M \land \varphi(\mathsf{x}, t) = t'\}$$

$$\parallel M \parallel \quad \stackrel{\triangle}{=} \quad \{(\mathsf{x}, t) \mid \langle \mathsf{x} : \_@(\_, t], \_\rangle \in M\}$$

$$\varphi(M_t, M_s) \quad \stackrel{\triangle}{=} \quad \varphi(M_t) \subseteq \|M_s\| \land \operatorname{dom}(\varphi) = \|M_t\| \land \operatorname{mon}(\varphi)$$

Fig. 17.  $\varphi$ -related messages



Fig. 18. An example of message injection

## 7 THREAD LOCAL UPWARD SIMULATION

In this section, we define a *thread-local* simulation as the formal correctness definition of optimizations. Before presenting the simulation, we first introduce several relations that relate (part of) the program configurations at the target and the source levels.

Timestamp mapping. In PS2.1, the memory is defined as a set of messages. We introduce a partial mapping  $\varphi$  whose type is defined in Fig. 17 to relate the "to"-timestamps of the messages in the target and source levels as shown in Fig. 18. We use  $mon(\varphi)$  to represent that  $\varphi$  is monotonic.
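The definitions of Fig. 17 can be made concrete on finite data. The sketch below assumes a memory is already reduced to its set of (location, "to"-timestamp) pairs, that is, to $\|M\|$, and represents $\varphi$ as a dictionary; the function names `mon`, `image` and `related` are chosen here for illustration only.

```python
def mon(phi):
    """phi maps (x, t) to t'. Monotonic: per location, larger t gives larger t'."""
    return all(
        p1 < p2
        for (x1, t1), p1 in phi.items()
        for (x2, t2), p2 in phi.items()
        if x1 == x2 and t1 < t2
    )


def image(phi, stamps):
    """phi(M): the mapped (location, timestamp) pairs of the messages phi covers."""
    return {(x, phi[(x, t)]) for (x, t) in stamps if (x, t) in phi}


def related(phi, target, source):
    """phi(M_t, M_s): image inside the source, domain equal to the target, monotonic."""
    return image(phi, target) <= source and set(phi) == target and mon(phi)


target = {("x", 0), ("x", 1)}
source = {("x", 0), ("x", 1), ("x", 2)}
phi_ok = {("x", 0): 0, ("x", 1): 2}    # the source may have extra messages in between
phi_bad = {("x", 0): 2, ("x", 1): 1}   # order of timestamps is reversed
```

In this toy instance `related(phi_ok, target, source)` holds, whereas `phi_bad` is rejected because it is not monotonic.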

Invariant parameter and rely conditions. Since we allow the simulation to be established for individual threads without relying on the code of other threads, we use the invariant I and the rely condition R to describe the behaviors of other threads for thread-local reasoning and compositionality. Our simulation is parameterized with an invariant I (in Fig. 19), which needs to hold over the shared states at *every switch point*. It can be instantiated differently when verifying different optimizations. We use the quadruple $\mathbb{S} = (S_t, M_t, S_s, M_s)$ to represent the global time maps for SC fences ($S_t$ and $S_s$) and the memories ($M_t$ and $M_s$) at the target and source levels. Users instantiating $I$ are expected to specify the application-dependent invariant over $\varphi$ and $\mathbb{S}$ with the help of the set $\iota$ of atomic variables at the switch point. An invariant $I$ is well-formed ($\text{wf}(I)$, defined in Fig. 19) if each concrete message in the target level has a related one in the source level through a monotonic timestamp mapping $\varphi$ (shown as $\varphi(M_t, M_s)$, defined in Fig. 17).

The rely condition (R) defined in Fig. 19 specifies the environment's transitions at the source and the target levels. The parameters  $\mathbb{S}$  and  $\mathbb{S}'$  represent the shared states at the points when the current thread switches out and back, respectively. The first item in R is guaranteed by our non-preemptive semantics as well as PS2.1

$$(Sst) \quad \mathbb{S} \quad \triangleq \quad (S_t, M_t, S_s, M_s)$$

$$(Inv) \quad I \quad \in \quad Atms \rightarrow MMap \rightarrow Sst \rightarrow Prop$$

$$T \in M \quad \triangleq \quad \forall x \in Var. \ \exists m \in \widetilde{M}(x). \ T(x) = m. \text{to}$$

$$V \in M \quad \triangleq \quad V.T_{na} \in M \land V.T_{rlx} \in M$$

$$\text{closed}(M) \quad \triangleq \quad \forall m \in \widetilde{M}. \ m. \text{view} \in M$$

$$[M]_\iota \quad \triangleq \quad \{m \mid m \in M \land m. \text{var} \in \iota\}$$

$$M \approx M' \quad \triangleq \quad (\forall x, f, t, v. \ \langle x : v@(f, t], \_\rangle \in M \iff \langle x : v@(f, t], \_\rangle \in M')$$

$$\land (\forall x, f, t. \ \langle x : (f, t]\rangle \in M \iff \langle x : (f, t]\rangle \in M')$$

$$T \leq T' \quad \triangleq \quad \forall x \in Var. \ T(x) \leq T'(x)$$

$$\text{wf}(I) \quad \triangleq \quad \forall \iota, \varphi, \mathbb{S}. \ I(\iota, \varphi, \mathbb{S}) \implies \varphi(\mathbb{S}.M_t, \mathbb{S}.M_s)$$

$$\text{env}(\mathbb{S}, \mathbb{S}', P_t, P_s) \quad \triangleq \quad \widetilde{M}_t \subseteq \widetilde{M}'_t \land \widetilde{M}_s \subseteq \widetilde{M}'_s \land S_t \leq S'_t \land S_s \leq S'_s \land \text{closed}(M'_t) \land S'_t \in M'_t \land P_t \subseteq M'_t \land P_s \subseteq M'_s$$

$$\text{where} \quad \mathbb{S} = (S_t, M_t, S_s, M_s) \ \text{and} \quad \mathbb{S}' = (S'_t, M'_t, S'_s, M'_s)$$

$$R(\iota, (\varphi, \mathbb{S}), (\varphi', \mathbb{S}'), P_t, P_s) \quad \triangleq \quad \text{env}(\mathbb{S}, \mathbb{S}', P_t, P_s) \land \varphi \subseteq \varphi' \land [\mathbb{S}'.M_t]_\iota \approx [\mathbb{S}'.M_s]_\iota$$

Fig. 19. Invariant parameter and rely condition

$$\begin{array}{lll} (\mathit{Index}) & i \in \dots & (\mathit{DlyItem}) \quad d \in (\mathit{Var} \times \mathit{Time}) \\ (\mathit{Dlyset}) & \mathcal{D} \in \mathit{DlyItem} \rightharpoonup \mathit{Index} \\ \\ \mathcal{D}' < \mathcal{D} & \triangleq \operatorname{dom}(\mathcal{D}) = \operatorname{dom}(\mathcal{D}') \land \forall d \in \operatorname{dom}(\mathcal{D}). \ \mathcal{D}'(d) < \mathcal{D}(d) \\ \\ \varphi(\mathcal{D}) & \triangleq \{(\mathsf{x},t') \mid \mathcal{D}(\mathsf{x},t) = i \land \varphi(\mathsf{x},t) = t'\} \\ \\ \mathcal{D}[d \mapsto i] & \triangleq \begin{cases} \mathcal{D} & \textit{if } d \in \operatorname{dom}(\mathcal{D}) \\ \mathcal{D}\{d \rightsquigarrow i\} & \textit{otherwise} \end{cases} \end{array}$$

Fig. 20. Delayed write set and corresponding definitions

(shown as env defined in Fig. 19, where $\widetilde{M}$ consists of only the concrete messages in M; see Sec. 3). The second item states that the timestamp mapping can only grow, since the environment may insert additional messages. Since we do not perform optimizations on atomic accesses, the sub-memories for atomic variables at the target and source levels must be equal except for the message views (shown as $[\mathbb{S}'.M_t]_\iota \approx [\mathbb{S}'.M_s]_\iota$ in item 3).
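The third item of R compares the atomic sub-memories up to message views. A minimal Python sketch of that comparison follows; it assumes a tagged-tuple encoding in which a concrete message is `("msg", x, v, f, t, view)` and a reservation is `("rsv", x, f, t)`, and the names `restrict`, `erase_view` and `approx` are hypothetical.

```python
def restrict(M, atomics):
    """[M]_iota: keep only the messages on atomic locations."""
    return {m for m in M if m[1] in atomics}


def erase_view(m):
    """Drop the view of a concrete message; reservations carry no view."""
    return m[:5] if m[0] == "msg" else m


def approx(M1, M2):
    """M ~ M': same concrete messages up to views, and the same reservations."""
    return {erase_view(m) for m in M1} == {erase_view(m) for m in M2}


atomics = {"y"}
M_t = {("msg", "y", 1, 0, 1, "V_t"), ("rsv", "y", 1, 2), ("msg", "x", 8, 0, 1, "V")}
M_s = {("msg", "y", 1, 0, 1, "V_s"), ("rsv", "y", 1, 2), ("msg", "x", 8, 0, 2, "V")}
```

In this instance the two memories differ on the non-atomic location x, so `approx(M_t, M_s)` fails, but `approx(restrict(M_t, atomics), restrict(M_s, atomics))` holds because the messages on y differ only in their views.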

Delayed write set and step invariant. In our simulation, we allow the source state to be temporarily "left behind" the target. That is, when the target thread takes a write step, the source may not perform the corresponding step at present, but it must eventually do the step. We introduce the delayed write set  $\mathcal{D}$  to record the set of writes which must be caught up by the source thread later. A pair of location x and timestamp t (called delayed item) in  $\mathcal{D}$  means that the target thread has performed a write on x at the timestamp t, but such write has not been caught up by the source thread.  $\mathcal{D}$  maps d to a well-founded index i to require the source thread to catch up a

$$\dfrac{(\mathsf{x},t) \in \big\| (P-P') \cup (M'-M) \big\| \qquad \mathcal{D}' = \mathcal{D}[(\mathsf{x},t) \mapsto i]}{(P,M),(P',M') \vdash \mathcal{D} \overset{\mathsf{W}(\mathsf{na},\mathsf{x},v)}{\leadsto} \mathcal{D}'} \qquad \dfrac{te \neq \mathsf{W}(\mathsf{na},\_,\_)}{(P,M),(P',M') \vdash \mathcal{D} \overset{te}{\leadsto} \mathcal{D}}$$

Fig. 21. Delayed item introduction rules

delayed write within finite steps. We also define some operations about  $\mathcal{D}$  in Fig. 20, which will be used in our thread-local simulation.
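The operations on the delayed write set can be sketched with a dictionary from delayed items to indexes. The sketch assumes natural numbers as the well-founded index set, which is one possible choice and not prescribed by the definitions; the function names are hypothetical.

```python
def add_delayed(D, d, i):
    """D[d -> i]: keep D if d is already delayed, otherwise record d with index i."""
    return D if d in D else {**D, d: i}


def strictly_below(D_new, D_old):
    """D' < D: same domain, and every index strictly smaller."""
    return D_new.keys() == D_old.keys() and all(
        D_new[d] < D_old[d] for d in D_old
    )


def image(phi, D):
    """phi(D): map each delayed item (x, t) to (x, phi(x, t))."""
    return {(x, phi[(x, t)]) for (x, t) in D if (x, t) in phi}


D0 = add_delayed({}, ("x", 1), 3)     # the target wrote x at timestamp 1
D1 = add_delayed(D0, ("x", 1), 7)     # already delayed: the index is not reset
D2 = {("x", 1): 2}                    # one simulation step later, the index decreased
```

Because the index of an existing item is never reset and must strictly decrease while the item stays in the set, an item with index 3 can remain delayed for at most three further decreasing steps under this choice of indexes.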

The step invariant SI relates the target and source thread states and holds in *every step* of the thread-local simulation.

Definition 7.1 (Step invariant).  $SI(\iota, \varphi, (TS_t, M_t), (TS_s, M_s), \mathcal{D})$  iff,

- (1) for any  $(x, t) \in \text{dom}(\varphi)$ , if  $TS_t.\mathcal{V}.\text{cur}.T_{rlx}(x) < t$ , then  $TS_s.\mathcal{V}.\text{cur}.T_{rlx}(x) < \varphi(x, t)$ ;
- (2) there exists  $\mathcal{D}_p \subseteq \mathcal{D}$ , such that  $(\varphi(\mathcal{D}_p) \cup \varphi(TS_t.P)) = ||TS_s.P||$ ;
- (3) for any  $\sigma'$ , if  $(TS_t.\sigma \xrightarrow{at} \sigma')$ , then  $\mathcal{D} = \emptyset$ ;
- (4)  $[M_t]_\iota \approx [M_s]_\iota$ .

Item 1 of SI requires that if a target level message has not been observed by the target thread, its  $\varphi$ -related message in the source level should not be observed by the source thread either. Item 2 requires a one-to-one mapping between the source and target threads' promises, as shown below (including those in  $\mathcal{D}$ , which have been fulfilled by the target thread but not yet by the source).

$$\mathcal{D}_p \cup TS_t.P \ \xrightarrow{\ \varphi\ } \ TS_s.P$$

Item 3 says that if the current instruction at the target is an atomic memory access  $(TS_t.\sigma \xrightarrow{at} \sigma')$ , all locations written by the target thread have also been written by the source  $(\mathcal{D} = \emptyset)$ . Item 4 gives the same restriction as in R.
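Item 2 of the step invariant can be checked by brute force on small instances, which may help when reading it. The sketch below assumes promise sets are reduced to (location, timestamp) pairs, that $\varphi$ is defined on every pair it is applied to, and that the delayed write set is given by its domain; `item2_holds` is a hypothetical name.

```python
from itertools import chain, combinations


def image(phi, stamps):
    """Map each (x, t) to (x, phi(x, t)); phi is assumed to cover all stamps."""
    return {(x, phi[(x, t)]) for (x, t) in stamps}


def item2_holds(phi, delayed, target_promises, source_promises):
    """Search for D_p inside delayed with phi(D_p) | phi(P_t) == ||P_s||."""
    items = sorted(delayed)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return any(
        image(phi, dp) | image(phi, target_promises) == source_promises
        for dp in subsets
    )


phi = {("x", 1): 2, ("y", 1): 1}
delayed = {("x", 1)}            # the target fulfilled its promise on x, the source has not
target_promises = {("y", 1)}    # still promised at the target
```

With these values, `item2_holds(phi, delayed, target_promises, {("x", 2), ("y", 1)})` holds with $\mathcal{D}_p$ containing the delayed write on x, while a source promise set that omits the promise on y is rejected.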

Simulation. We define the thread-local simulation between the target and source programs  $\pi_t$  and  $\pi_s$  in Def. 7.2. We use  $M_0 = \{\langle x : 0@(0,0], V_{\perp} \rangle \mid x \in Var\}$ ,  $S_{\perp} = \{x \leadsto 0 \mid x \in Var\}$  and  $\varphi_0 = \{(x,0) \leadsto 0 \mid x \in Var\}$  to represent the memory, the global time map for SC fence and the message mapping in the initial state respectively.

Definition 7.2 (Thread-local upward simulation).  $I, \iota \models \pi_t \preccurlyeq \pi_s$  iff,

- (1)  $I(\iota, \varphi_0, (S_\perp, M_0, S_\perp, M_0))$  and wf(I);
- (2) for any  $\sigma_t$  and f, if  $Init(\pi_t, f) = \sigma_t$ , then there exists  $\sigma_s$  such that  $Init(\pi_s, f) = \sigma_s$  and

$$I, \iota \models ((\sigma_t, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_0) \preccurlyeq^{\circ, \emptyset}_{\varphi_0} ((\sigma_s, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_0),$$

where  $I, \iota \models (TS_t, \mathcal{S}_t, M_t) \preccurlyeq_{\varphi}^{\beta, \mathcal{D}} (TS_s, \mathcal{S}_s, M_s)$  is defined in Def. 7.3.

Def. 7.2 first requires that the user-provided invariant I should hold at initial states and I should be well-formed. Second, if the execution of a target thread starts from the function f in  $\pi_t$  in the initial state, the source thread can also start from f in  $\pi_s$  and they have the simulation defined in Def. 7.3.

$$\dfrac{\iota \vdash (TS, \mathcal{S}, M) \xrightarrow{\mathsf{R}(\mathsf{na}, \mathsf{x}, v)} (TS', \mathcal{S}', M') \qquad \forall (\mathsf{x}, t) \in \mathsf{dom}(\mathcal{D}). \ TS.\mathcal{V}.\mathsf{cur}.T_{\mathsf{rlx}}(\mathsf{x}) = TS'.\mathcal{V}.\mathsf{cur}.T_{\mathsf{rlx}}(\mathsf{x})}{\iota \vdash (TS, \mathcal{S}, M, \mathcal{D}) \xrightarrow{\mathsf{R}(\mathsf{na}, \mathsf{x}, v)} (TS', \mathcal{S}', M', \mathcal{D})}$$

$$\dfrac{\iota \vdash (TS, \mathcal{S}, M) \xrightarrow{\mathsf{W}(\mathsf{na}, \mathsf{x}, v)} (TS', \mathcal{S}', M') \qquad \mathcal{D}' = \mathcal{D} \setminus \{(\mathsf{x}, t)\} \lor \mathcal{D}' = \mathcal{D}}{\iota \vdash (TS, \mathcal{S}, M, \mathcal{D}) \xrightarrow{\mathsf{W}(\mathsf{na}, \mathsf{x}, v)} (TS', \mathcal{S}', M', \mathcal{D}')} \qquad \dfrac{\iota \vdash (TS, \mathcal{S}, M) \xrightarrow{\tau} (TS', \mathcal{S}', M')}{\iota \vdash (TS, \mathcal{S}, M, \mathcal{D}) \xrightarrow{\tau} (TS', \mathcal{S}', M', \mathcal{D})}$$

Fig. 22. Delayed item elimination rules

Definition 7.3.  $I, \iota \models (TS_t, \mathcal{S}_t, M_t) \preccurlyeq_{\varphi}^{\beta, \mathcal{D}} (TS_s, \mathcal{S}_s, M_s)$  is the largest relation such that: whenever  $I, \iota \models (TS_t, S_t, M_t) \preccurlyeq_{\varphi}^{\beta, \mathcal{D}} (TS_s, S_s, M_s)$ , then either  $(\iota \vdash (TS_s, S_s, M_s, \beta) \mapsto^+ \mathbf{abort}),$ or  $SI(\iota, \varphi, (TS_t, M_t), (TS_s, M_s), \mathcal{D})$  and the following are true:

- (1)  $\forall TS'_t, S'_t, M'_t, te$ , if  $\iota \vdash (TS_t, S_t, M_t) \xrightarrow{te} (TS'_t, S'_t, M'_t)$ , then the following hold:
  - (a) if  $te \in AT$ , there exist  $TS'_s$ ,  $S'_s$ ,  $M'_s$ , and  $\varphi'$  such that:
    - $\iota \vdash (TS_s, S_s, M_s) \xrightarrow{na}{}^* \xrightarrow{te} (TS_s', S_s', M_s');$
    - $\varphi \subseteq \varphi'$  and  $I, \iota \models (TS_t', S_t', M_t') \preccurlyeq_{\varphi'}^{\circ, \emptyset} (TS_s', S_s', M_s');$
  - (b) if  $te \in NA$ , there exist  $TS_s', S_s', M_s', \mathcal{D}_1$  and  $\mathcal{D}_2$  such that:
    - $(TS_t.P, M_t), (TS'_t.P, M'_t) \vdash \mathcal{D} \stackrel{te}{\leadsto} \mathcal{D}_1;$
    - $\iota \vdash (TS_s, S_s, M_s, \mathcal{D}_1) \xrightarrow{na}{}^* (TS_s', S_s', M_s', \mathcal{D}_2);$
    - $\exists \mathcal{D}_2' < \mathcal{D}_2$ .  $I, \iota \models (TS_t', S_t', M_t') \preccurlyeq_{\varphi}^{\bullet, \mathcal{D}_2'} (TS_s', S_s', M_s');$
  - (c) if  $te \in \{\text{prm}, \text{rsv}\}$  and  $\beta = \circ$ , there exist  $TS_s', S_s', M_s'$  and  $\varphi'$  such that:
    - $\iota \vdash (TS_s, S_s, M_s) \xrightarrow{prc'} (TS'_s, S'_s, M'_s);$
    - $\varphi \subseteq \varphi'$  and  $I, \iota \models (TS'_t, S'_t, M'_t) \preccurlyeq^{\circ, \emptyset}_{\varphi'} (TS'_s, S'_s, M'_s);$
  - (d) if  $te = \text{ccl}$ , there exist  $TS'_s, S'_s, M'_s$  such that:
    - $\iota \vdash (TS_s, S_s, M_s) \xrightarrow{\operatorname{ccl}}{}^* (TS'_s, S'_s, M'_s);$
    - $I, \iota \models (TS'_t, S'_t, M'_t) \preccurlyeq_{\varphi}^{\beta, \mathcal{D}} (TS'_s, S'_s, M'_s);$
- (2) if  $\beta = \circ$ , then  $I(\iota, \varphi, \mathbb{S})$  and  $\forall \mathbb{S}', \varphi'$ , if  $R(\iota, (\varphi, \mathbb{S}), (\varphi', \mathbb{S}'), TS_t.P, TS_s.P)$  and  $I(\iota, \varphi', \mathbb{S}')$ , then  $I, \iota \models (TS_t, S_t', M_t') \preccurlyeq_{\varphi'}^{\circ, \emptyset} (TS_s, S_s', M_s')$ , where  $\mathbb{S} = (S_t, M_t, S_s, M_s)$ ,  $\mathbb{S}' = (S_t', M_t', S_s', M_s')$ ;

- (3) if  $\iota \vdash (TS_t, S_t, M_t) \longrightarrow \mathbf{done}$ , there exist  $TS'_s, S'_s, M'_s$ , and  $\varphi'$  such that:
  - $\iota \vdash (TS_s, S_s, M_s, \mathcal{D}) \xrightarrow{na} (TS'_s, S'_s, M'_s, \emptyset), \iota \vdash (TS'_s, S'_s, M'_s) \longrightarrow \mathbf{done};$
  - $\varphi \subseteq \varphi'$  and  $I(\iota, \varphi', (S_t, M_t, S_s', M_s'));$
- (4) if  $\iota \vdash (TS_t, S_t, M_t) \longrightarrow \mathbf{abort}$ , then  $\iota \vdash (TS_s, S_s, M_s, \beta) \longmapsto^+ \mathbf{abort}$ .

The simulation  $I, \iota \models (TS_t, S_t, M_t) \preccurlyeq_{\varphi}^{\beta, \mathcal{D}} (TS_s, S_s, M_s)$  carries a delayed write set  $\mathcal{D}$  for the writes that the source thread has to perform later. The parameter  $\varphi$  records the timestamp mapping between the target and source memory at the last switch point. The atomic bit  $\beta$  indicates whether the thread can switch.

If the source thread aborts in finite steps (shown as  $\iota \vdash (TS_s, S_s, M_s, \beta) \longmapsto^+$  **abort**), the simulation trivially holds. So the correctness of the optimization is meaningful only when the source program never aborts.



Fig. 23. Simulation diagram of atomic step (t and t' are target and source threads respectively)



Fig. 24. Simulation diagrams of non-atomic step (t' and t are target and source threads respectively, and  $\mathcal{D}_2$  records the delayed writes that have not been caught up by the current source thread steps)

Otherwise, we require that the step invariant SI always holds, and discuss the different cases of target steps. Case (1-a) in Def. 7.3 covers an atomic step of the target. The simulation follows the diagram in Fig. 23. The step invariant SI ensures that the delayed write set is empty when an atomic step is taken. After the step, the timestamp mapping grows (shown as  $\varphi \subseteq \varphi'$ ) and the invariant I needs to be reestablished, since the target and source threads reach a switch point.

Case (1-b) covers a non-atomic step of the target thread. If the target thread takes a non-atomic write step at the timestamp t as shown in Fig. 24(a), we add a new delayed item (x, t) with an index i into the delayed write set  $\mathcal{D}$  according to the rules in Fig. 21. The target write step may fulfill a message in its promise set  $((x, t) \in ||TS_t.P-TS_t'.P||)$ , or add a new message to memory  $((x, t) \in ||M_t'-M_t||)$ . Then, the source thread takes some non-atomic steps to catch up with some delayed writes according to the rules in Fig. 22. Here, we forbid non-atomic reads in the source thread steps from reading unobserved messages whose locations are recorded in the delayed write set. Reading such an unobserved message on some location x could force the source thread to insert its message on x at an unexpected timestamp when it catches up with the delayed write on x, which could break the preservation of write-write race freedom. Finally, we decrease the indexes of the elements in the delayed write set (shown as  $\mathcal{D}'_2 < \mathcal{D}_2$ ) to ensure that the source thread will eventually write to the locations in the delayed write set.

Cases (1-c) and (1-d) in Def. 7.3 cover the situations in which the target thread takes a promise/reserve step or a cancel step (shown in Fig. 25). As introduced earlier, a thread takes a promise step for one of its future writes. To preserve write-write race freedom, we require that the source thread writes to at least the locations written by the target thread, so each memory write at the target level should have a corresponding memory write at the source level. Thus, if the target thread takes a promise step for a future write, the simulation requires the source thread to take a promise step for the corresponding future write in the source program. In the cancel step case (shown in Fig. 25(b)), the indexes of elements in the delayed write set do not need to be reduced, even if the

Fig. 25. Simulation diagrams of promise/reserve and cancel steps (t' and t are target and source threads respectively)

target and source threads are in an atomic block, since the number of reservations in the thread's promise set is finite and the thread cannot take cancel steps forever.

If the atomic bit is  $\circ$  (case 2), we consider the interaction with other threads (following the diagram below). When the thread switches back after environment steps that satisfy R and reestablish the invariant I, the simulation must hold over the new states.

$$\begin{array}{cccc}
t & \circ & ---- & \circ \\
\emptyset, I & & R & \emptyset, I \\
t' & \circ & ---- & \circ
\end{array}$$

Finally, case 3 (or 4) says, if the target thread terminates (or aborts), so does the source thread. When both target and source threads terminate, we require that the invariant I holds and a larger message mapping  $\varphi'$  is well-formed from the restrictions of guarantee condition (G).

We now show why item 3 of the step invariant in Def. 7.1 is essential for proving that write-write races are preserved. Consider the following example. For the source program, we call the left thread  $t_1$  and the right thread  $t_2$ . For the target program, we call the left thread  $t_1'$  and the right thread  $t_2'$ .

The execution of the target program may have the following execution to generate the write-write race.

|     | $t_1'$                              | $t_2'$               |
|-----|-------------------------------------|----------------------|
| (1) | reserve $\langle z : (0,1] \rangle$ |                      |
| (2) | $x_{na} := 1$                       |                      |
| (3) |                                     | $x_{na} := 3 //Race$ |

Suppose that we have already established the thread-local simulation between $t'_2$ and $t_2$ shown below, and that we need to use this simulation to construct the write-write race in the source program.



To ensure that the source thread eventually writes to location x and produces the write-write race, our thread-local simulation must guarantee that the delayed write set is empty when $CAS_{rlx,rlx}(z,0,1)$ is executed (item 3 of the step invariant). Note that we cannot state this restriction in the atomic step case of the thread-local simulation, because $CAS_{rlx,rlx}(z,0,1)$ cannot actually be executed here: the timestamps that its execution would use have already been reserved by $t_1'$.
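The blocking effect of the reservation can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions that are not part of the formal development: a location's memory is a list of half-open timestamp intervals `(frm, to]` (messages and reservations alike), and the hypothetical helper `can_update` checks whether an atomic update that reads the message ending at `read_to` could still attach a new interval starting exactly at `read_to`.

```python
def can_update(intervals, read_to):
    """Return True if some interval (read_to, t] with t > read_to is still free.

    `intervals` is a list of (frm, to) pairs standing for (frm, to].
    An update must write a message whose from-timestamp equals `read_to`,
    so it is blocked as soon as an existing interval covers the points
    immediately after `read_to`.
    """
    return all(not (frm <= read_to < to) for (frm, to) in intervals)


# Location z holds only its initial message, modelled as the interval (0, 0].
z_initial = [(0, 0)]

# Same memory after t1' reserves <z : (0, 1]>, as in step (1) of the table.
z_reserved = [(0, 0), (0, 1)]

print(can_update(z_initial, 0))   # the CAS on z could run here
print(can_update(z_reserved, 0))  # the reservation occupies the needed slot
```

In this toy model the second call reports that the update is blocked, which mirrors why the restriction on the delayed write set has to be placed in the step invariant rather than in the atomic step case.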

## 8 WHOLE PROGRAM SIMULATION AND COMPOSITIONALITY

In this section, we define the whole program simulation in Sec. 8.1 and then sketch the proof of compositionality in Sec. 8.2. In Sec. 8.3, we show that our thread-local simulation preserves promise certification, and we illustrate in detail how we prove that a certification against the current memory for non-atomic locations ensures the existence of a certification against the capped memory, as mentioned in Sec. 1.

## 8.1 Whole program simulation

The role of the whole program simulation in our work, as shown in Fig. 2, is to ensure that there is a refinement relation between the target and source programs. We define the whole program simulation in the following.

Definition 8.1 (Whole program simulation). $\mathbf{let}\ (\pi_t, \iota)\ \mathbf{in}\ f_1 \mid \ldots \mid f_n \leqslant \mathbf{let}\ (\pi_s, \iota)\ \mathbf{in}\ f_1 \mid \ldots \mid f_n$ iff, for any $\hat{W}_t$, if $\mathbf{let}\ (\pi_t, \iota)\ \mathbf{in}\ f_1 \mid \ldots \mid f_n : \stackrel{load}{\Longrightarrow} \hat{W}_t$, there exists $\hat{W}_s$ such that:

- $\mathbf{let}\ (\pi_s, \iota)\ \mathbf{in}\ f_1 \mid \ldots \mid f_n : \stackrel{load}{\Longrightarrow} \hat{W}_s$;
- $\hat{W}_t \leqslant \hat{W}_s$,

where $\hat{W}_t \leqslant \hat{W}_s$ is defined in Def. 8.2.

Definition 8.2. Whenever $(\mathcal{TP}_t, \mathsf{t}, \mathcal{S}_t, M_t, \beta_t)^{\iota} \leqslant (\mathcal{TP}_s, \mathsf{t}, \mathcal{S}_s, M_s, \beta_s)^{\iota}$, the following are true:

- (1) for any $\mathcal{TP}_t'$, $\mathcal{S}_t'$, $M_t'$ and $\beta_t'$, if $(\mathcal{TP}_t, \mathsf{t}, \mathcal{S}_t, M_t, \beta_t)^{\iota} : \stackrel{\tau}{\Longrightarrow} (\mathcal{TP}_t', \mathsf{t}, \mathcal{S}_t', M_t', \beta_t')^{\iota}$, then there exist $\mathcal{TP}_s'$, $\mathcal{S}_s'$, $M_s'$ and $\beta'_s$ such that:
  - $(\mathcal{TP}_{s}, \mathsf{t}, \mathcal{S}_{s}, M_{s}, \beta_{s})^{\iota} : \stackrel{\tau}{\Longrightarrow}^{*} (\mathcal{TP}_{s}', \mathsf{t}, \mathcal{S}'_{s}, M'_{s}, \beta'_{s})^{\iota}$;
  - $(\mathcal{TP}_t', \mathsf{t}, \mathcal{S}_t', M_t', \beta_t')^{\iota} \leqslant (\mathcal{TP}_s', \mathsf{t}, \mathcal{S}_s', M_s', \beta_s')^{\iota}$.
- (2) for any $\mathcal{TP}_t'$, $\mathcal{S}_t'$ and $M_t'$, if $(\mathcal{TP}_t, \mathsf{t}, \mathcal{S}_t, M_t, \beta_t)^{\iota} : \stackrel{\mathsf{out}(v)}{\Longrightarrow} (\mathcal{TP}_t', \mathsf{t}, \mathcal{S}_t', M_t', \circ)^{\iota}$, then there exist $\mathcal{TP}_s'$, $\mathcal{S}_s'$, $M_s'$, $\beta_s'$, $\mathcal{TP}_s''$, $\mathcal{S}_s''$ and $M_s''$ such that:
  - $(\mathcal{TP}_{s}, \mathsf{t}, \mathcal{S}_{s}, M_{s}, \beta_{s})^{\iota} : \stackrel{\tau}{\Longrightarrow}^{*} (\mathcal{TP}_{s}', \mathsf{t}, \mathcal{S}_{s}', M_{s}', \beta_{s}')^{\iota}$ and $(\mathcal{TP}_{s}', \mathsf{t}, \mathcal{S}_{s}', M_{s}', \beta_{s}')^{\iota} : \stackrel{\mathsf{out}(v)}{\Longrightarrow} (\mathcal{TP}_{s}'', \mathsf{t}, \mathcal{S}_{s}'', M_{s}'', \circ)^{\iota}$;
  - $(\mathcal{TP}_{t}', \mathsf{t}, \mathcal{S}_{t}', M_{t}', \circ)^{\iota} \leqslant (\mathcal{TP}_{s}'', \mathsf{t}, \mathcal{S}_{s}'', M_{s}'', \circ)^{\iota}$.
- (3) for any $\mathcal{TP}_t'$ and $\mathsf{t}'$, if $(\mathcal{TP}_t, \mathsf{t}, \mathcal{S}_t, M_t, \beta_t)^{\iota} : \stackrel{\mathsf{sw}}{\Longrightarrow} (\mathcal{TP}_t', \mathsf{t}', \mathcal{S}_t, M_t, \circ)^{\iota}$, then there exists $\mathcal{TP}_s'$ such that:
  - $(\mathcal{TP}_{s}, \mathsf{t}, \mathcal{S}_{s}, M_{s}, \beta_{s})^{\iota} : \stackrel{\mathsf{sw}}{\Longrightarrow} (\mathcal{TP}_{s}', \mathsf{t}', \mathcal{S}_{s}, M_{s}, \circ)^{\iota}$;
  - $(\mathcal{TP}_t', \mathsf{t}', \mathcal{S}_t, M_t, \circ)^{\iota} \leqslant (\mathcal{TP}_s', \mathsf{t}', \mathcal{S}_s, M_s, \circ)^{\iota}$.
- (4) if $(\mathcal{TP}_t, \mathsf{t}, \mathcal{S}_t, M_t, \beta_t)^{\iota} :\Longrightarrow \mathbf{done}$, then there exist $\mathcal{TP}_s'$, $\mathcal{S}_s'$, $M_s'$ and $\beta_s'$ such that:
  - $(\mathcal{TP}_s, \mathsf{t}, \mathcal{S}_s, M_s, \beta_s)^{\iota} :\Longrightarrow^* (\mathcal{TP}_s', \mathsf{t}, \mathcal{S}_s', M_s', \beta_s')^{\iota}$ and $(\mathcal{TP}_s', \mathsf{t}, \mathcal{S}_s', M_s', \beta_s')^{\iota} :\Longrightarrow \mathbf{done}$.
- (5) if $(\mathcal{TP}_t, \mathsf{t}, \mathcal{S}_t, M_t, \beta_t)^{\iota} :\Longrightarrow \mathbf{abort}$, then $(\mathcal{TP}_s, \mathsf{t}, \mathcal{S}_s, M_s, \beta_s)^{\iota} :\Longrightarrow \mathbf{abort}$.

We use the following figures to illustrate the whole program simulation defined in Def. 8.1.

• If the target program takes a tau step, the source program is permitted to take multiple steps.



• If the target program takes an output step, the source program is restricted to generate the same output.



• If the target program performs a thread switch, the source program must switch to the same thread.
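The tau and output cases listed above can be tried out on small finite transition systems. The following is a minimal sketch, not the paper's definition: it assumes that each program is given as a dictionary from states to lists of `(label, next_state)` pairs, uses the hypothetical label `"tau"` for silent steps, and ignores thread switching, termination, abort and the consistency side conditions. It computes the largest relation in which every target tau step is matched by zero or more source tau steps and every target output step is matched by source tau steps followed by the same output.

```python
from itertools import product

TAU = "tau"


def tau_closure(lts, state):
    """States reachable from `state` by zero or more tau steps."""
    seen, stack = {state}, [state]
    while stack:
        current = stack.pop()
        for label, nxt in lts.get(current, []):
            if label == TAU and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def weak_successors(lts, state, label):
    """Source states that can answer a target step labelled `label`."""
    reachable = tau_closure(lts, state)
    if label == TAU:
        return reachable  # zero or more tau steps
    return {nxt for r in reachable for (l, nxt) in lts.get(r, []) if l == label}


def greatest_simulation(tgt, src, tgt_states, src_states):
    """Largest relation satisfying the tau and output matching conditions."""
    rel = set(product(tgt_states, src_states))
    changed = True
    while changed:
        changed = False
        for (t, s) in list(rel):
            for label, t_next in tgt.get(t, []):
                answers = weak_successors(src, s, label)
                if not any((t_next, s_next) in rel for s_next in answers):
                    rel.discard((t, s))
                    changed = True
                    break
    return rel


# Target outputs 1 directly; the source first takes a silent step.
target = {"t0": [("out(1)", "t1")]}
source = {"s0": [(TAU, "s1")], "s1": [("out(1)", "s2")]}
rel = greatest_simulation(target, source, ["t0", "t1"], ["s0", "s1", "s2"])
print(("t0", "s0") in rel)
```

Replacing the source output by a different value removes the pair `("t0", "s0")` from the computed relation, which corresponds to the requirement that the source must produce the same output as the target.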

## 8.2 Compositionality

We need to prove that the thread-local upward simulation composes into the whole program simulation defined in Def. 8.1.

LEMMA 8.3 (COMPOSITIONALITY).

$$\begin{array}{l}
\forall \pi_{t}, \pi_{s}, I, \iota, f_{1}, \dots, f_{n}. \\
\quad I, \iota \models \pi_{t} \preccurlyeq \pi_{s} \ \land \\
\quad \mathsf{Safe}(\mathbf{let}\ (\pi_{s}, \iota)\ \mathbf{in}\ f_{1} \mid \dots \mid f_{n}) \ \land \\
\quad \mathsf{ww\text{-}NPRF}(\mathbf{let}\ (\pi_{s}, \iota)\ \mathbf{in}\ f_{1} \mid \dots \mid f_{n}) \\
\quad \implies \mathbf{let}\ (\pi_{t}, \iota)\ \mathbf{in}\ f_{1} \mid \dots \mid f_{n} \leqslant \mathbf{let}\ (\pi_{s}, \iota)\ \mathbf{in}\ f_{1} \mid \dots \mid f_{n}
\end{array}$$

PROOF. From the premises, we have the following.

$$I, \iota \models \pi_t \preccurlyeq \pi_s \tag{1}$$

$$\mathsf{Safe}(\mathbf{let}\ (\pi_s, \iota)\ \mathbf{in}\ f_1 \mid \ldots \mid f_n) \tag{2}$$

$$\mathsf{ww\text{-}NPRF}(\mathbf{let}\ (\pi_s, \iota)\ \mathbf{in}\ f_1 \mid \ldots \mid f_n) \tag{3}$$

We unfold (1) and have the following.

$$I(\iota, \varphi_0, (\mathcal{S}_\perp, M_0, \mathcal{S}_\perp, M_0)) \wedge \mathsf{wf}(I) \tag{4}$$

$$\begin{array}{l}
\forall \sigma_{t}, f.\ \operatorname{Init}(\pi_{t}, f) = \sigma_{t} \\
\quad \implies \exists \sigma_{s}.\ (\operatorname{Init}(\pi_{s}, f) = \sigma_{s} \land I, \iota \models ((\sigma_{t}, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_{0}) \preccurlyeq_{\varphi}^{\circ, \emptyset} ((\sigma_{s}, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_{0}))
\end{array} \tag{5}$$

We unfold the proof goal. We have the following premise.

$$\mathbf{let} (\pi_t, \iota) \mathbf{in} \, \mathsf{f}_1 \mid \ldots \mid \mathsf{f}_n : \stackrel{load}{\Longrightarrow} \hat{W}_t \tag{6}$$

We need to prove that there exists $\hat{W}_s$ such that:

$$\mathbf{let}\ (\pi_s, \iota)\ \mathbf{in}\ f_1 \mid \ldots \mid f_n : \stackrel{load}{\Longrightarrow} \hat{W}_s \tag{g1}$$

$$\hat{W}_t \leqslant \hat{W}_s \tag{g2}$$

Let $\hat{W}_t = (\mathcal{TP}_t, \mathsf{t}, \mathcal{S}_{\perp}, M_0, \circ)^{\iota}$. From (6) and (5), there exists $\mathcal{TP}_s$ such that:

$$\mathbf{let}\ (\pi_s, \iota)\ \mathbf{in}\ f_1 \mid \ldots \mid f_n : \stackrel{load}{\Longrightarrow} (\mathcal{TP}_s, \mathsf{t}, \mathcal{S}_{\perp}, M_0, \circ)^{\iota} \tag{7}$$

Thus, we finish the proof of (g1).

By applying Lemma. 8.5 on (2), we have the following.

$$\neg (\exists \hat{W}_s. (\mathcal{TP}_s, t, \mathcal{S}_\perp, M_0, \circ)^\iota :\Longrightarrow^* \hat{W}_s \land \hat{W}_s :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}) \tag{8}$$

By applying Lemma. 8.8 on (3), we have the following.

$$\neg (\exists \hat{W}_{s}. (\mathcal{TP}_{s}, \mathsf{t}, \mathcal{S}_{\perp}, M_{0}, \circ)^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \wedge \hat{W}_{s} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{9}$$

By applying Lemma. 8.4 on (5), (8) and (9), we prove (g2).

LEMMA 8.4 (COMPOSITIONALITY - AUX).

$$\begin{array}{l}
\forall \mathcal{TP}_{t}, i, \mathcal{S}_{t}, M_{t}, \mathcal{TP}_{s}, \mathcal{S}_{s}, M_{s}, \beta, \beta_{s}, \iota, \mathcal{D}, \varphi, n. \\
\quad (\forall j \in \{1, \dots, n\} \setminus \{i\}.\ I, \iota \models (\mathcal{TP}_{t}(j), \mathcal{S}_{t}, M_{t}) \preccurlyeq^{\circ, \emptyset}_{\varphi} (\mathcal{TP}_{s}(j), \mathcal{S}_{s}, M_{s})) \ \wedge \\
\quad I, \iota \models (\mathcal{TP}_{t}(i), \mathcal{S}_{t}, M_{t}) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (\mathcal{TP}_{s}(i), \mathcal{S}_{s}, M_{s}) \ \wedge \\
\quad \neg (\exists \hat{W}_{s}. (\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \beta_{s})^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \wedge \hat{W}_{s} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}) \ \wedge \\
\quad \neg (\exists \hat{W}_{s}. (\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \beta_{s})^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \wedge \hat{W}_{s} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \ \wedge \\
\quad (\beta = \circ \implies \beta_{s} = \circ) \wedge \mathsf{wf}(I) \\
\quad \Longrightarrow (\mathcal{TP}_{t}, i, \mathcal{S}_{t}, M_{t}, \beta)^{\iota} \leqslant (\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \beta_{s})^{\iota}
\end{array}$$

PROOF. By cofix. From the premises, we have the following.

$$\forall j \in \{1, \dots, n\} \setminus \{i\}.\ I, \iota \models (\mathcal{TP}_t(j), \mathcal{S}_t, M_t) \preccurlyeq^{\circ, \emptyset}_{\varphi} (\mathcal{TP}_s(j), \mathcal{S}_s, M_s) \tag{1}$$

$$I, \iota \models (\mathcal{TP}_t(i), \mathcal{S}_t, M_t) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (\mathcal{TP}_s(i), \mathcal{S}_s, M_s) \tag{2}$$

$$\neg (\exists \hat{W}_{s}. (\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \beta_{s})^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \land \hat{W}_{s} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}) \tag{3}$$

$$\neg (\exists \hat{W}_{s}. (\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \beta_{s})^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \land \hat{W}_{s} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{4}$$

$$(\beta = \circ \implies \beta_s = \circ) \land \mathsf{wf}(I) \tag{5}$$

We unfold the proof goal and we need to prove the following.

(1) If the current target thread takes a tau step, we have the following.

$$(\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \beta)^{\iota} : \stackrel{\tau}{\Longrightarrow} (\mathcal{TP}_t', i, \mathcal{S}_t', M_t', \beta')^{\iota} \tag{6.1}$$

We unfold (6.1) and have that there exist $TS'_t$, $\mathcal{S}'_t$ and $M'_t$ such that:

$$\iota \vdash (\mathcal{TP}_t(i), \mathcal{S}_t, M_t, \beta) \longmapsto^+ (TS'_t, \mathcal{S}'_t, M'_t, \beta') \tag{6.11}$$

$$\mathcal{TP}_t' = \mathcal{TP}_t \{ i \leadsto TS_t' \} \tag{6.12}$$

$$\mathsf{consistent}_{\mathsf{NP}}(TS'_t, M'_t, \beta', \iota) \tag{6.13}$$

We apply Lemma. 8.11 on (6.11) and (2). We have that there exist $TS'_s$, $\mathcal{S}'_s$, $M'_s$ and $\beta'_s$ such that:

$$\iota \vdash (\mathcal{TP}_{s}(i), \mathcal{S}_{s}, M_{s}, \beta_{s}) \longmapsto^{*} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}, \beta'_{s}) \tag{6.14}$$

$$I, \iota \models (TS'_t, \mathcal{S}'_t, M'_t) \preccurlyeq^{\beta', \mathcal{D}'}_{\varphi'} (TS'_s, \mathcal{S}'_s, M'_s) \tag{6.15}$$

$$(\beta' = \circ \implies \beta'_s = \circ) \land \varphi \subseteq \varphi' \tag{6.16}$$

We do a case analysis on (6.14). If the current source thread takes zero steps, we finish the proof of this case by the co-inductive hypothesis. We focus on the case in which the current source thread takes one or more steps. By applying Theorem 8.13 on (6.13), (6.15), (3) and (4), we have the following.

$$\mathsf{consistent}_{\mathsf{NP}}(TS'_{s}, M'_{s}, \beta'_{s}, \iota) \tag{6.17}$$

From (6.14), (6.17) and co-inductive hypothesis, we finish the proof.

$$\frac{\begin{array}{c}
\iota \vdash (\mathcal{TP}(\mathsf{t}), \mathcal{S}, M) \longrightarrow^* (TS', \mathcal{S}', M') \qquad TS'.\sigma \xrightarrow{W(\mathsf{na}, \mathsf{x}, \_)} \_ \\
\langle \mathsf{x} : \_@(\_, t], \_\rangle \in (M' \setminus TS'.P) \qquad TS'.V.\mathsf{cur}.T_{\mathsf{rlx}}(\mathsf{x}) < t \\
\iota \vdash (TS', \mathcal{S}', M') \longrightarrow^* ((\_, \_, \emptyset), \_, \_)
\end{array}}{(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \beta)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}}$$

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \beta)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort} \ ::= \ (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \Longrightarrow \mathsf{abort}$$

Fig. 26. Auxiliary write-write race and abort step under non-preemptive semantics

(2) If the current target thread takes an output step, we have the following.

$$(\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \beta)^{\iota} : \stackrel{\mathsf{out}(v)}{\Longrightarrow} (\mathcal{TP}_t', i, \mathcal{S}_t', M_t', \circ)^{\iota} \tag{6.2}$$

We unfold (6.2) and have that there exist $TS'_t$, $\mathcal{S}'_t$ and $M'_t$ such that:

$$\iota \vdash (\mathcal{TP}_t(i), \mathcal{S}_t, M_t, \beta) \xrightarrow{\text{out}(\upsilon)} (TS'_t, S'_t, M'_t, \beta') \tag{6.21}$$

$$\mathcal{TP}_t' = \mathcal{TP}_t \{ i \leadsto TS_t' \} \tag{6.22}$$

$$\mathsf{consistent}_{\mathsf{NP}}(TS'_t, M'_t, \beta', \iota) \tag{6.23}$$

By applying Lemma. 8.12 on (6.21) and (2), we have that there exist  $TS_s'$ ,  $S_s'$ ,  $M_s'$  and  $\varphi'$  such that:

$$\iota \vdash (TS_s, \mathcal{S}_s, M_s, \beta) \longmapsto^* \stackrel{\mathsf{out}(v)}{\longmapsto} (TS'_s, \mathcal{S}'_s, M'_s, \circ) \tag{6.24}$$

$$I, \iota \models (TS'_t, \mathcal{S}'_t, M'_t) \preccurlyeq^{\circ, \emptyset}_{\varphi'} (TS'_s, \mathcal{S}'_s, M'_s) \tag{6.25}$$

$$\varphi \subseteq \varphi' \wedge I(\iota, \varphi', (S'_t, M'_t, S'_s, M'_s)) \tag{6.26}$$

From (6.24), (6.25), (6.26) and co-inductive hypothesis, we finish the proof.

- (3) The case in which the target program takes a switch step is straightforward.
- (4) The case in which the target program takes a done step is straightforward.
- (5) If the current target thread takes an abort step, we have the following.

$$(\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \beta)^{\iota} :\Longrightarrow \mathbf{abort} \tag{6.5}$$

We unfold (6.5) and have the following.

$$\iota \vdash (\mathcal{TP}_t(i), \mathcal{S}_t, M_t, \beta) \longmapsto^* (TS'_t, \mathcal{S}'_t, M'_t, \beta') \tag{6.51}$$

$$\iota \vdash (TS'_t, \mathcal{S}'_t, M'_t) \longrightarrow \mathbf{abort} \tag{6.52}$$

By applying Lemma. 8.11 on (6.51) and (2), we have that there exist $TS'_s$, $\mathcal{S}'_s$, $M'_s$ and $\beta'_s$ such that:

$$\iota \vdash (\mathcal{TP}_{s}(i), \mathcal{S}_{s}, M_{s}, \beta_{s}) \longmapsto^{*} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}, \beta'_{s}) \tag{6.53}$$

$$I, \iota \models (TS'_t, \mathcal{S}'_t, M'_t) \preccurlyeq^{\beta', \mathcal{D}'}_{\varphi'} (TS'_s, \mathcal{S}'_s, M'_s) \tag{6.54}$$

$$\beta' = \circ \implies \beta'_{s} = \circ \tag{6.55}$$

From (6.52) and (6.54), we construct an abort step of the source program.

LEMMA 8.5 (SOUND NP-ABORT).

$$\begin{split} \forall \mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \iota. \\ \neg (\exists \hat{W}. \, (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{*} \hat{W} \wedge \hat{W} :\Longrightarrow \mathsf{abort}) \\ \Longrightarrow \neg (\exists \hat{W}. \, (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{*} \hat{W} \wedge \hat{W} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}) \end{split}$$

PROOF. We prove the contrapositive, that is, the following.

$$(\exists \hat{W}. (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{*} \hat{W} \land \hat{W} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort})$$

$$\Longrightarrow (\exists \hat{W}. (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{*} \hat{W} \land \hat{W} :\Longrightarrow \mathsf{abort})$$

From the premise, there exists $\hat{W}$ such that:

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{*} \hat{W} \tag{1}$$

$$\hat{W} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort} \tag{2}$$

We have the following.

$$((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}) \vee \neg ((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}) \tag{3}$$

We destruct (3) and discuss each case respectively.

• We first consider that the current thread will abort.

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort} \tag{3.1}$$

We finish the proof of such case by applying Lemma. 8.6 on (3.1).

• Then, we consider that the current thread will not abort.

$$\neg((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}) \tag{3.2}$$

By applying Lemma. 8.7 on (1), (2) and (3.2), we finish the proof.

LEMMA 8.6 (SOUND NP-ABORT AUX-1).

$$\forall \mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \iota.$$

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}$$

$$\Longrightarrow (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow \mathsf{abort}$$

Lemma 8.7 (Sound NP-Abort Aux-2).

$$\begin{array}{ccc} \forall \hat{W}, \hat{W_0}, n. \\ & \hat{W} :\Longrightarrow^n \hat{W_0} \wedge \hat{W_0} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort} \wedge \\ & \neg (\hat{W} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}) \\ &\Longrightarrow \exists \hat{W}'. \ \hat{W} :\Longrightarrow^* \hat{W}' \wedge \hat{W}' :\Longrightarrow \mathsf{abort} \end{array}$$

Lemma 8.8 (Sound aux ww-np-race).

$$\forall \mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \iota.$$

$$\neg (\exists \hat{W}. (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{*} \hat{W} \land \hat{W} :\Longrightarrow \mathsf{ww}\text{-Race})$$

$$\Longrightarrow \neg (\exists \hat{W}. (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{*} \hat{W} \land \hat{W} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race})$$

PROOF. We prove the contrapositive. Assume that there exists $\hat{W}$ such that:

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{*} \hat{W} \tag{1}$$

$$\hat{W} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race} \tag{2}$$

We need to prove the following.

$$\exists \hat{W}. (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{*} \hat{W} \land \hat{W} :\Longrightarrow \mathsf{ww}\text{-Race} \tag{g}$$

We have the following.

$$((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \lor \neg ((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{3}$$

We destruct (3) and discuss each case respectively.

• We first consider the case in which the current thread generates a write-write race.

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race} \tag{3.1}$$

We apply Lemma. 8.9 on (3.1) and finish the proof of this case.

• Then, we consider the case in which the current thread does not generate a write-write race.

$$\neg ((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{3.2}$$

We apply Lemma. 8.10 on (1), (2) and (3.2) and finish the proof of this case.

Lemma 8.9 (Sound aux ww-np-race aux-1).

$$\forall \mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \iota.$$

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}$$

$$\Longrightarrow \exists \hat{W}. (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} : \Longrightarrow^{*} \hat{W} \land \hat{W} : \Longrightarrow \mathsf{ww}\text{-Race}$$

LEMMA 8.10 (SOUND AUX WW-NP-RACE AUX-2).

$$\forall \hat{W}, \hat{W}_0, n.$$

$$\hat{W} :\Longrightarrow^n \hat{W}_0 \land \hat{W}_0 :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race} \land \neg(\hat{W} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race})$$

$$\Longrightarrow \exists \hat{W}'.\ \hat{W} :\Longrightarrow^* \hat{W}' \land \hat{W}' :\Longrightarrow \mathsf{ww}\text{-Race}$$

LEMMA 8.11 (SIMULATION: TAU STEP).

$$\begin{split} \forall TS_t, \mathcal{S}_t, M_t, TS_t', \mathcal{S}_t', M_t', TS_s, \mathcal{S}_s, M_s, n, \beta, \beta_s, \beta', \mathcal{D}, \varphi. \\ \iota \vdash (TS_t, \mathcal{S}_t, M_t, \beta) &\longmapsto^n (TS_t', \mathcal{S}_t', M_t', \beta') \land \\ I, \iota \models (TS_t, \mathcal{S}_t, M_t) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (TS_s, \mathcal{S}_s, M_s) \land (\beta = \circ \implies \beta_s = \circ) \\ &\Longrightarrow \exists TS_s', \mathcal{S}_s', M_s', \mathcal{D}', \varphi', \beta_s'. \\ \iota \vdash (TS_s, \mathcal{S}_s, M_s, \beta_s) &\longmapsto^* (TS_s', \mathcal{S}_s', M_s', \beta_s') \land \\ I, \iota \models (TS_t', \mathcal{S}_t', M_t') \preccurlyeq^{\beta', \mathcal{D}'}_{\varphi'} (TS_s', \mathcal{S}_s', M_s') \land \varphi \subseteq \varphi' \land \\ (\beta' = \circ \implies \beta_s' = \circ) \end{split}$$

LEMMA 8.12 (SIMULATION: OUTPUT STEP).

$$\begin{split} \forall TS_{t}, \mathcal{S}_{t}, M_{t}, TS'_{t}, \mathcal{S}'_{t}, M'_{t}, TS_{s}, \mathcal{S}_{s}, M_{s}, \beta, \beta_{s}, \mathcal{D}, \varphi. \\ \iota \vdash (TS_{t}, \mathcal{S}_{t}, M_{t}, \beta) &\xrightarrow{\mathsf{out}(v)} (TS'_{t}, \mathcal{S}'_{t}, M'_{t}, \circ) \land \\ I, \iota \models (TS_{t}, \mathcal{S}_{t}, M_{t}) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (TS_{s}, \mathcal{S}_{s}, M_{s}) \land (\beta = \circ \implies \beta_{s} = \circ) \\ &\Longrightarrow \exists TS'_{s}, \mathcal{S}'_{s}, M'_{s}, \varphi'. \\ \iota \vdash (TS_{s}, \mathcal{S}_{s}, M_{s}, \beta) &\longmapsto^{*} \stackrel{\mathsf{out}(v)}{\longmapsto} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}, \circ) \land \\ I, \iota \models (TS'_{t}, \mathcal{S}'_{t}, M'_{t}) \preccurlyeq^{\circ, \emptyset}_{\varphi'} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}) \land \varphi \subseteq \varphi' \end{split}$$

## 8.3 Promise certification preservation

Theorem 8.13 shows that, under the assumption of write-write race freedom, our thread-local simulation preserves promise certification. In Lemma. 8.18, we illustrate how we prove that a certification against the current memory for non-atomic locations ensures the existence of a certification against the capped memory. This lemma also shows why the locations in our work are divided into atomic locations and non-atomic locations. In the following presentation, the conditions stating that the thread's promise set is a subset of the memory and that the thread view is closed are omitted from some lemmas, since these conditions are ensured by the promising semantics and are not the main points of our proof.

THEOREM 8.13 (PROMISE CONSISTENCY PRESERVING).

$$\begin{array}{l}
\forall \mathcal{TP}_{t}, \mathsf{t}, M_{t}, \mathcal{TP}_{s}, M_{s}, \iota, \beta, \mathcal{D}, \varphi. \\
\quad \mathsf{consistent}_{\mathsf{NP}}(\mathcal{TP}_{t}(\mathsf{t}), M_{t}, \beta, \iota) \ \wedge \\
\quad I, \iota \models (\mathcal{TP}_{t}(\mathsf{t}), \mathcal{S}_{t}, M_{t}) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (\mathcal{TP}_{s}(\mathsf{t}), \mathcal{S}_{s}, M_{s}) \ \wedge \\
\quad \neg(\iota \vdash (\mathcal{TP}_{s}(\mathsf{t}), \mathcal{S}_{s}, M_{s}) \longrightarrow^{*} \mathbf{abort}) \ \wedge \\
\quad \neg((\mathcal{TP}_{s}, \mathsf{t}, \mathcal{S}_{s}, M_{s}, \beta)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \\
\quad \Longrightarrow \mathsf{consistent}_{\mathsf{NP}}(\mathcal{TP}_{s}(\mathsf{t}), M_{s}, \beta, \iota)
\end{array}$$

PROOF. From the premises, we have that the following hold.

$$\mathsf{consistent}_{\mathsf{NP}}(\mathcal{TP}_t(\mathsf{t}), M_t, \beta, \iota) \tag{1}$$

$$I, \iota \models (\mathcal{TP}_t(\mathsf{t}), \mathcal{S}_t, M_t) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (\mathcal{TP}_s(\mathsf{t}), \mathcal{S}_s, M_s) \tag{2}$$

$$\neg(\iota \vdash (\mathcal{TP}_t(\mathsf{t}), \mathcal{S}_t, M_t) \longrightarrow^* \mathbf{abort}) \tag{3}$$

$$\neg ((\mathcal{TP}_s, \mathsf{t}, \mathcal{S}_s, M_s, \beta)^\iota :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{4}$$

We unfold (1) and have the following:

$$\iota \vdash (\mathcal{TP}_t(\mathsf{t}), \widehat{T}(M_t), \widehat{M}_t, \beta) \longmapsto^* ((\_, \_, \emptyset), \_, \_, \_) \tag{5}$$

Let $M_{sc} = (\{m \in M_s \mid \iota(m.\text{var}) = \text{na}\} \cup \{m \in \widehat{M_s} \mid \iota(m.\text{var}) = \text{at}\})$. We apply Lemma. 8.14 on (5), (2) and (3) and have the following.

$$\iota \vdash (\mathcal{TP}_s(\mathsf{t}), \widehat{T}(M_s), M_{sc}, \beta) \longmapsto^* ((\_, \_, \emptyset), \_, \_, \_) \tag{6}$$

We apply Lemma. 8.17 on (4) and have the following.

$$\neg ((\mathcal{TP}_{s}, \mathsf{t}, \widehat{T}(\widehat{M}_{s}), M_{sc}, \beta)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{7}$$

By applying Lemma. 8.18 on (6) and (7), we finish the proof.
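The mixed memory $M_{sc}$ used in this proof can be made concrete with a small sketch. The code below is an illustration only and rests on assumptions that are not taken from the formal development: a memory is a set of hypothetical `Msg` records with integer timestamps, and the capped memory is obtained by adding reservations that fill every gap between adjacent messages of a location plus one final reservation after the latest message (which matches the side condition that every added element has the form $\langle\_ : (\_, \_]\rangle$). The function `m_sc` then keeps the original messages for non-atomic locations and the capped ones for atomic locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    loc: str
    frm: int
    to: int
    reserved: bool = False  # True for a reservation <loc : (frm, to]>


def capped(memory):
    """Assumed capping: fill all gaps and add a final reservation per location."""
    result = set(memory)
    for loc in {m.loc for m in memory}:
        msgs = sorted((m for m in memory if m.loc == loc), key=lambda m: m.to)
        for left, right in zip(msgs, msgs[1:]):
            if left.to < right.frm:
                result.add(Msg(loc, left.to, right.frm, reserved=True))
        last = msgs[-1].to
        result.add(Msg(loc, last, last + 1, reserved=True))
    return result


def m_sc(memory, iota):
    """Original messages on non-atomic locations, capped ones on atomic locations."""
    cap = capped(memory)
    non_atomic = {m for m in memory if iota[m.loc] == "na"}
    atomic = {m for m in cap if iota[m.loc] == "at"}
    return non_atomic | atomic


iota = {"x": "na", "z": "at"}
memory = {Msg("x", 0, 0), Msg("x", 2, 3), Msg("z", 0, 0)}
mixed = m_sc(memory, iota)
for m in sorted(mixed, key=lambda m: (m.loc, m.to, m.frm)):
    print(m)
```

In this example the gap on the non-atomic location `x` stays open in the mixed memory, while the atomic location `z` receives its capping reservation.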

LEMMA 8.14 (LSIM ENSURES PROMISE FULFILLING - CAPPED).

$$\forall \iota, TS_t, \mathcal{S}_t, M_t, TS_s, \mathcal{S}_s, M_s, \varphi, I, \iota, M_{sc}, \beta, n.$$

$$\iota \vdash (TS_t, \widehat{T}(M_t), \widehat{M_t}, \beta) \longmapsto^n ((\_, \_, \emptyset), \_, \_, \_) \land$$

$$I, \iota \models (TS_t, \mathcal{S}_t, M_t) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (TS_s, \mathcal{S}_s, M_s) \land$$

$$M_{sc} = (\{m \in M_s \mid \iota(m.var) = na\} \cup \{m \in \widehat{M_s} \mid \iota(m.var) = at\}) \land$$

$$\neg(\iota \vdash (TS_t, \mathcal{S}_t, M_t) \longrightarrow^* \mathbf{abort})$$

$$\Longrightarrow \iota \vdash (TS_s, \widehat{T}(\widehat{M_s}), M_{sc}, \beta) \stackrel{\mathsf{pf}}{\longrightarrow}^* ((\_, \_, \emptyset), \_, \_, \_)$$

PROOF. Prove by applying Lemma. 8.15.

LEMMA 8.15 (LSIM ENSURES PROMISE FULFILLING - CAPPED AUX).

$$\forall TS_{t}, S_{tc}, M_{tc}, S_{t}, M_{t}, TS_{s}, S_{sc}, M_{sc}, S_{s}, M_{s}, \beta, \mathcal{D}, \varphi.$$

$$\iota \vdash (TS_{t}, S_{tc}, M_{tc}, \beta) \longmapsto^{n} ((\_, \_, \emptyset), \_, \_, \_) \land$$

$$M_{t} \subseteq M_{tc} \land (\forall m \in (M_{tc} - M_{t}). \ m = \langle\_: (\_, \_] \rangle) \land$$

$$I, \iota \models (TS_{t}, S_{t}, M_{t}) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (TS_{s}, S_{s}, M_{s}) \land$$

$$(\forall x \notin \iota. \ M_{s}(x) = M_{sc}(x)) \land ([M_{tc}]_{\iota} \approx [M_{sc}]_{\iota}) \land$$

$$\neg (\iota \vdash (TS_{t}, S_{tc}, M_{tc}) \longrightarrow^{*} \mathbf{abort})$$

$$\Longrightarrow \iota \vdash (TS_{s}, S_{sc}, M_{sc}, \beta) \stackrel{\mathsf{pf}}{\longrightarrow}^{*} ((\_, \_, \emptyset), \_, \_, \_)$$

PROOF. Prove by induction on n. If n is zero, we prove by applying Lemma. 8.16. And if n is greater than zero, we prove by applying inductive hypothesis.

LEMMA 8.16 (LSim ensures promise fulfilling - capped with target prm empty).

$$\forall TS_{t}, S_{t}, M_{t}, TS_{s}, S_{sc}, M_{sc}, S_{s}, M_{s}, \beta, \mathcal{D}, \varphi.$$

$$TS_{t}.P = \emptyset \land M_{t} \subseteq M_{tc} \land (\forall m \in (M_{tc} - M_{t}). \ m = \langle \_: (\_, \_] \rangle) \land$$

$$I, \iota \models (TS_{t}, S_{t}, M_{t}) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (TS_{s}, S_{s}, M_{s}) \land$$

$$(\forall x \notin \iota. \ M_{s}(x) = M_{sc}(x)) \land ([M_{tc}]_{\iota} \approx [M_{sc}]_{\iota}) \land$$

$$\neg(\iota \vdash (TS_{t}, S_{t}, M_{t}) \xrightarrow{}^{*} \mathbf{abort})$$

$$\Longrightarrow \iota \vdash (TS_{s}, S_{sc}, M_{sc}) \longrightarrow^{*} ((\_, \_, \emptyset), \_, \_)$$

Proof. Prove by induction on the well-ordered delayed write set  $\mathcal{D}$ . The order of the delayed write set is  $\mathcal{D} \ll \mathcal{D}' \triangleq \exists \mathcal{D}_0 . \mathcal{D} \subseteq \mathcal{D}_0 \land \mathcal{D}_0 < \mathcal{D}'.$ 

Lemma 8.17 (Race-free implies capped race-free).

$$\begin{split} \forall \mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \beta, \iota, \mathcal{S}_c, M_c. \\ \neg (& (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \beta)^\iota :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \land \\ & M \subseteq M_c \land (\forall m \in (M_c - M). \ m = \langle\_:(\_,\_]\rangle) \land \mathcal{S} \leq \mathcal{S}_c \\ \Longrightarrow \neg (& (\mathcal{TP}, \mathsf{t}, \mathcal{S}_c, M_c, \beta)^\iota :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \end{split}$$

Lemma. 8.18 shows that a thread from a write-write race free program can certify its promises (for non-atomic writes) against the current memory instead of the capped memory. The intuition behind the correctness of this lemma rests on two points: (1) write-write race freedom forbids a thread t from writing to a location when the memory contains a write to the same location that was made by another thread t' and has not been observed by t; (2) no atomic update operation is performed on a non-atomic location.
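Point (2) can be illustrated on a toy interval model. The sketch below is only an assumption-laden illustration, not the construction used in the proof: messages are `(frm, to)` pairs of rational timestamps standing for intervals `(frm, to]`, `m` is an observed message, `m0` is a later promise of the thread, and the hypothetical helpers `cap_gap` and `split_promise` model, respectively, the reservation that a capped memory would place in the gap between `m` and `m0`, and the splitting of `m0` that makes room for a new write when that gap is no longer available.

```python
from fractions import Fraction as F


def cap_gap(m, m0):
    """Reservation assumed to fill the gap between m and m0 (None if adjacent)."""
    return (m[1], m0[0]) if m[1] < m0[0] else None


def split_promise(m0, at):
    """Split the promise interval m0 = (frm, to] at timestamp `at`."""
    frm, to = m0
    if not frm < at < to:
        raise ValueError("split point must lie strictly inside the promise")
    return (frm, at), (at, to)


m = (F(0), F(1))    # observed message on the non-atomic location x
m0 = (F(3), F(4))   # promise of the current thread on x

# Against the current memory the gap (1, 3] is free, so a new message can be
# placed there, even one whose from-timestamp equals the to-timestamp of m.
m1 = (F(1), F(2))

# Against the capped memory the gap is reserved, so room is made by splitting m0.
reserved = cap_gap(m, m0)
m1_capped, m0_rest = split_promise(m0, F(7, 2))

print(reserved)
print(m1_capped, m0_rest)
print(m1_capped[0] == m[1])  # adjacency to m is lost after the split
```

The new message still sits between `m` and the remainder of the promise, so the order of writes is kept, but its from-timestamp no longer coincides with the to-timestamp of `m`. An atomic update that reads `m` would need exactly that adjacency, which is why the argument relies on updates never targeting non-atomic locations.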

$$T \sim_{\varphi} T' \triangleq \forall \mathsf{x}. \ \varphi(\mathsf{x}, T(\mathsf{x})) = T'(\mathsf{x})$$

$$V \sim_{\varphi} V' \triangleq V.T_{\mathsf{na}} \sim_{\varphi} V'.T_{\mathsf{na}} \wedge V.T_{\mathsf{rlx}} \sim_{\varphi} V'.T_{\mathsf{rlx}}$$

$$\mathcal{V} \sim_{\varphi} \mathcal{V}' \triangleq \mathcal{V}.cur \sim_{\varphi} \mathcal{V}'.cur \wedge \mathcal{V}.acq \sim_{\varphi} \mathcal{V}'.acq \wedge (\forall \mathsf{x}. \mathcal{V}.rel(\mathsf{x}) \sim_{\varphi} \mathcal{V}'.rel(\mathsf{x}))$$

$$M \sim_{\varphi} M' \triangleq \varphi(M_t, M_s) \wedge (\forall m_t \in M_t. \exists m_s \in M_s. \ \varphi(m_t.\mathsf{var}, m_t.\mathsf{to}) = m_s.\mathsf{to} \wedge m_t.\mathsf{var} = m_s.\mathsf{var} \wedge m_t.\mathsf{val} = m_s.\mathsf{val} \wedge m_t.\mathcal{V} \sim_{\varphi} m_s.\mathcal{V})$$

Fig. 27. Auxiliary definitions in promise certification preservation

Lemma 8.18 (Fulfilled under race-free implies promise consistent).

$$\forall \mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \iota, M_{sc}, n, \beta.$$

$$\iota \vdash (\mathcal{TP}(\mathsf{t}), \widehat{T}(M), M_{sc}, \beta) \longmapsto^{n} ((\_, \_, \emptyset), \_, \_, \_) \land$$

$$\neg ((\mathcal{TP}, \mathsf{t}, \widehat{T}(\widehat{M}), M_{sc}, \beta)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \land$$

$$M_{sc} = (\{m \in M \mid \iota(m.\mathsf{var}) = \mathsf{na}\} \cup \{m \in \widehat{M} \mid \iota(m.\mathsf{var}) = \mathsf{at}\})$$

$$\Longrightarrow \mathsf{consistent}_{\mathsf{NP}}(\mathcal{TP}(\mathsf{t}), M, \beta, \iota)$$

PROOF. By unfolding the definitions of consistent<sub>NP</sub>, we need to prove that for any  $\mathcal{TP}$ , t,  $\mathcal{S}$ , M,  $\iota$ ,  $M_{sc}$ , n,  $\beta$ , if

$$\iota \vdash (\mathcal{TP}(\mathsf{t}), \widehat{T}(M), M_{sc}, \beta) \longmapsto^{n} ((\_, \_, \emptyset), \_, \_, \_) \tag{1}$$

$$\neg ((\mathcal{TP}, \mathsf{t}, \widehat{T}(\widehat{M}), M_{sc}, \beta)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{2}$$

$$M_{sc} = (\{m \in M \mid \iota(m.\text{var}) = \text{na}\} \cup \{m \in \widehat{M} \mid \iota(m.\text{var}) = \text{at}\}) \tag{3}$$

then

$$\iota \vdash (\mathcal{TP}(\mathsf{t}), \widehat{T}(M), \widehat{M}, \beta) \longmapsto^* ((\_, \_, \emptyset), \_, \_, \_) \tag{g}$$

We apply Lemma. 8.19 on (g). The proof of that lemma illustrates the main idea of why a thread from a write-write race free program can certify promises (for non-atomic writes) against the current memory instead of the capped memory, and why we divide locations into non-atomic and atomic locations. Letting $\varphi = \{(x, t) \leadsto t \mid (x, t) \in [\![M_{sc}]\!]\}$, we need to prove the following.

$$\iota \vdash (\mathcal{TP}(\mathsf{t}), \widehat{T}(M), M_{sc}, \beta) \longmapsto^{n} ((\_, \_, \emptyset), \_, \_, \_) \tag{g1}$$

$$V \sim_{\varphi} V \tag{g2}$$

$$P \approx P \tag{g3}$$

$$M_{sc} \sim_{\varphi} \widehat{M} \tag{g4}$$

$$[M]_{\iota} \approx [\widehat{M}]_{\iota} \tag{g5}$$

$$M_{sc} \subseteq \widehat{M} \land (\forall m \in (\widehat{M} \backslash M_{sc}). \ m = \langle \_ : (\_, \_] \rangle) \tag{g6}$$

$$\neg ((\mathcal{TP}, \mathsf{t}, \widehat{T}(\widehat{M}), M_{sc}, \beta)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{g7}$$

We prove (g1) by applying (1).

(g2) and (g3) can be proved according to the definitions in Fig. 27 directly.

(g4), (g5) and (g6) can be proved from (3).

We prove (g7) from (2).

Lemma 8.19 (Promise certification from current memory to capped memory).

$$\begin{split} \forall n, \sigma, \mathcal{V}, P, \mathcal{S}, M, \beta, \mathcal{V}', P', M', \mathcal{TP}, \mathsf{t}. \\ & \iota \vdash ((\sigma, \mathcal{V}, P), \mathcal{S}, M, \beta) \longmapsto^n ((\_, \_, \emptyset), \_, \_, \_) \land \\ & \mathcal{V} \sim_{\varphi} \mathcal{V}' \land P \approx P' \land M \sim_{\varphi} M' \land [M]_{\iota} \approx [M']_{\iota} \land \\ & M \subseteq M' \land (\forall m \in (M' \backslash M). \ m = \langle \_ : (\_, \_] \rangle) \land \\ & \neg ((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \longmapsto_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \land \mathcal{TP}(\mathsf{t}) = (\sigma, \mathcal{V}, P) \\ \Longrightarrow & \iota \vdash ((\sigma, \mathcal{V}', P'), \mathcal{S}, M', \beta) \longmapsto^* ((\_, \_, \emptyset), \_, \_, \_). \end{split}$$

PROOF. We only sketch the main idea of the proof. The atomic locations in  $M_{sc}$  and  $\widehat{M}$  are the same, so writing to an atomic location makes no difference between  $M_{sc}$  and  $\widehat{M}$. We therefore focus on the non-atomic locations. Consider the following situation for a non-atomic location x: the left side is the current memory on location x and the right side is the capped version of the current memory on location x. We assume that m and  $m_0$  are both concrete messages.



Consider that the thread does a memory write from the current state.

• If it inserts a new message  $m_1$  between m and  $m_0$, then  $m_0$  must be a promise of the thread; otherwise, a write-write race arises. The corresponding memory write on the capped memory splits  $m_0$  and inserts  $m'_1$, the message corresponding to  $m_1$. Note that  $m_1$  cannot be generated by an atomic update operation (e.g. CAS), since we prohibit atomic update operations on non-atomic locations. If  $m_1$  were generated by an atomic update operation, the "from"-timestamp of  $m_0$  would have to equal the "to"-timestamp of m, which is impossible to achieve on the capped memory.



• If it inserts a new message  $m_2$  whose timestamp is larger than that of  $m_0$, the corresponding memory write on the capped memory (generating  $m'_2$, the message corresponding to  $m_2$) inserts a message whose timestamp is larger than that of the capped message. This case is shown in the following figure.



## 9 PROOF OF WRITE-WRITE RACE FREEDOM PRESERVING

In this section, we prove that write-write race freedom is preserved.

LEMMA 9.1 (WW-RF PRESERVING).

$$\begin{split} \forall \pi_t, \pi_s, I, \iota, f_1, \dots, f_n. \\ & \text{ww-NPRF}(\mathbf{let}\,(\pi_s, \iota)\,\mathbf{in}\, f_1 \mid \dots \mid f_n) \land I, \iota \models \pi_t \preccurlyeq \pi_s \land \mathsf{Safe}(\mathbf{let}\,(\pi_s, \iota)\,\mathbf{in}\, f_1 \mid \dots \mid f_n) \\ \implies & \text{ww-NPRF}(\mathbf{let}\,(\pi_t, \iota)\,\mathbf{in}\, f_1 \mid \dots \mid f_n) \end{split}$$

PROOF. From the premises, we have the following.

$$\text{ww-NPRF}(\mathbf{let}\,(\pi_s, \iota)\,\mathbf{in}\, f_1 \mid \ldots \mid f_n) \tag{1}$$

$$I, \iota \models \pi_t \preccurlyeq \pi_s \tag{2}$$

$$\mathsf{Safe}(\mathbf{let}\,(\pi_s, \iota)\,\mathbf{in}\, f_1 \mid \ldots \mid f_n) \tag{3}$$

We need to prove the following.

$$\mathsf{ww}\mathsf{-NPRF}(\mathbf{let}\,(\pi_t,\iota)\,\mathbf{in}\,\mathsf{f}_1\,|\,\ldots\,|\,\mathsf{f}_n) \tag{g}$$

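We prove (g) by contradiction. Unfolding write-write race freedom under the non-preemptive semantics, we assume that the target program can reach a write-write race:

$$\mathbf{let}\,(\pi_t, \iota)\,\mathbf{in}\, f_1 \mid \ldots \mid f_n :\Longrightarrow \text{ww-Race} \tag{4}$$

It then suffices to prove that the source program can also reach a write-write race, which contradicts (1):

$$\mathbf{let}\,(\pi_s, \iota)\,\mathbf{in}\, f_1 \mid \dots \mid f_n :\Longrightarrow \text{ww-Race} \tag{g1}$$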

We unfold (4) and have that there exist  $\mathcal{TP}_t$ , t and  $\hat{W}_t$  such that:

$$\mathbf{let}\,(\pi_t, \iota)\,\mathbf{in}\, f_1 \mid \ldots \mid f_n : \stackrel{load}{\Longrightarrow} (\mathcal{TP}_t, \mathsf{t}, \mathcal{S}_{\perp}, M_0, \circ)^{\iota} \tag{4.1}$$

$$(\mathcal{TP}_t, \mathsf{t}, \mathcal{S}_{\perp}, M_0, \circ)^{\iota} :\Longrightarrow^* \hat{W}_t \tag{4.2}$$

$$\hat{W}_t :\Longrightarrow \text{ww-Race} \tag{4.3}$$

We unfold (2) and have the following.

$$I(\varphi_0, \iota, (\mathcal{S}_\perp, M_0, \mathcal{S}_\perp, M_0)) \wedge \mathsf{wf}(I) \tag{2.1}$$

$$\begin{split} & \forall \sigma_{t}. \operatorname{Init}(\pi_{t}, \mathsf{f}) = \sigma_{t} \\ & \Longrightarrow \exists \sigma_{s}. \left( \operatorname{Init}(\pi_{s}, \mathsf{f}) = \sigma_{s} \wedge I, \iota \models ((\sigma_{t}, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_{0}) \preccurlyeq_{\varphi}^{\circ, \emptyset} ((\sigma_{s}, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_{0}) \right) \end{split} \tag{2.2}$$

From (2.1), (2.2) and (4.1), we have that there exists  $\mathcal{TP}_s$  such that:

$$\mathbf{let}\,(\pi_s, \iota)\,\mathbf{in}\, f_1 \mid \dots \mid f_n : \stackrel{load}{\Longrightarrow} (\mathcal{TP}_s, \mathsf{t}, \mathcal{S}_{\perp}, M_0, \circ)^{\iota} \tag{5}$$

We unfold (g1). From (5), we need to prove that there exists  $\hat{W}_s$  such that:

$$(\mathcal{TP}_{s}, \mathsf{t}, \mathcal{S}_{\perp}, M_{0}, \circ)^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \tag{g2.1}$$

$$\hat{W}_s :\Longrightarrow \text{ww-Race} \tag{g2.2}$$

We finish the proof from Lemma 9.2.

LEMMA 9.2 (WW-RF PRESERVING - AUX).

$$\forall \mathcal{TP}_{t}, i, \mathcal{S}_{t}, M_{t}, \hat{W}_{t}, \mathcal{TP}_{s}, \mathcal{S}_{s}, M_{s}, n, m, \varphi.$$

$$(\mathcal{TP}_{t}, i, \mathcal{S}_{t}, M_{t}, \circ)^{\iota} :\Longrightarrow^{n} \hat{W}_{t} \wedge \hat{W}_{t} :\Longrightarrow \text{ww-Race} \wedge$$

$$(\forall j \in \{1, \dots, m\}. \text{ consistent}_{\mathsf{NP}}(\mathcal{TP}_{t}(j), M_{t}, \circ, \iota)) \wedge$$

$$(\forall j \in \{1, \dots, m\}. I, \iota \models (\mathcal{TP}_{t}(j), \mathcal{S}_{t}, M_{t}) \preccurlyeq^{\circ, \emptyset}_{\varphi} (\mathcal{TP}_{s}(j), \mathcal{S}_{s}, M_{s})) \wedge$$

$$I(\iota, \varphi, (\mathcal{S}_{t}, M_{t}, \mathcal{S}_{s}, M_{s})) \wedge \mathsf{wf}(I) \wedge$$

$$\neg (\exists \hat{W}_{s}. (\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \circ)^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \wedge \hat{W}_{s} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort})$$

$$\Longrightarrow \exists \hat{W}_{s}. (\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \circ)^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \wedge \hat{W}_{s} :\Longrightarrow \text{ww-Race}$$

PROOF. From the premises, we have the following.

$$(\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \circ)^{\iota} :\Longrightarrow^n \hat{W}_t \tag{1}$$

$$\hat{W}_t :\Longrightarrow \text{ww-Race} \tag{2}$$

$$(\forall j \in \{1, \dots, m\}. \text{ consistent}_{\mathsf{NP}}(\mathcal{TP}_t(j), M_t, \circ, \iota)) \tag{3}$$

$$(\forall j \in \{1, \dots, m\}. \ I, \iota \models (\mathcal{TP}_t(j), \mathcal{S}_t, M_t) \preccurlyeq_{\varphi}^{\circ, \emptyset} (\mathcal{TP}_s(j), \mathcal{S}_s, M_s)) \tag{4}$$

$$I(\iota, \varphi, (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s)) \wedge \mathsf{wf}(I) \tag{5}$$

$$\neg (\exists \hat{W}_{s}. (\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \circ)^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \wedge \hat{W}_{s} :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}) \tag{6}$$

We have the following.

$$((\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \circ)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \vee \neg ((\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \circ)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{7}$$

We destruct (7) and discuss each case respectively.

• We first consider the case where the current target thread can generate a write-write race.

$$(\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race} \tag{7.1}$$

By applying Lemma 9.3 to (7.1), (4) and (5), we have the following.

$$(\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race} \tag{8}$$

We finish the proof by applying Lemma 8.8 to (8).

• Then, we consider the case where the current target thread cannot generate a write-write race.

$$\neg ((\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \circ)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{7.2}$$

We finish the proof by applying Lemma 9.7 to (1), (2), (7.2), (3), (4) and (5).

LEMMA 9.3 (WW-RF PRESERVING - AUX CURRENT RACE).

$$\forall \mathcal{TP}_{t}, t, \mathcal{S}_{t}, M_{t}, \mathcal{TP}_{s}, \mathcal{S}_{s}, M_{s}, \iota, \varphi.$$

$$(\mathcal{TP}_{t}, t, \mathcal{S}_{t}, M_{t}, \circ)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww-Race} \land$$

$$I, \iota \models (\mathcal{TP}_{t}(t), \mathcal{S}_{t}, M_{t}) \preccurlyeq^{\circ, \emptyset}_{\varphi} (\mathcal{TP}_{s}(t), \mathcal{S}_{s}, M_{s}) \land$$

$$I(\iota, \varphi, (\mathcal{S}_{t}, M_{t}, \mathcal{S}_{s}, M_{s})) \land \mathsf{wf}(I) \land$$

$$\neg((\mathcal{TP}_{s}(t), \mathcal{S}_{s}, M_{s}) \longrightarrow^{*} \mathbf{abort})$$

$$\Longrightarrow (\mathcal{TP}_{s}, t, \mathcal{S}_{s}, M_{s}, \circ)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww-Race}$$

PROOF. From the premises, we have the following.

$$(\mathcal{TP}_t, t, \mathcal{S}_t, M_t, \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race} \tag{1}$$

$$I, \iota \models (\mathcal{TP}_t(t), \mathcal{S}_t, M_t) \preccurlyeq_{\varphi}^{\circ, \emptyset} (\mathcal{TP}_s(t), \mathcal{S}_s, M_s) \tag{2}$$

$$I(\iota, \varphi, (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s)) \wedge \mathsf{wf}(I) \tag{3}$$

$$\neg((\mathcal{TP}_s(t), \mathcal{S}_s, M_s) \longrightarrow^* \mathbf{abort}) \tag{4}$$

We unfold (1) and have the following.

$$\iota \vdash (\mathcal{TP}_t(\mathsf{t}), \mathcal{S}_t, M_t) \longrightarrow^* (TS'_t, \mathcal{S}'_t, M'_t) \tag{1.1}$$

$$TS'_t.\sigma \xrightarrow{W(\mathsf{na},\mathsf{x},\_)} \_ \tag{1.2}$$

$$\langle \mathsf{x} : \_@(\_, t], \_\rangle \in (M'_t \backslash TS'_t.P) \tag{1.3}$$

$$TS'_t.\mathcal{V}.\mathsf{cur}.T_{\mathsf{rlx}}(\mathsf{x}) < t \tag{1.4}$$

$$\iota \vdash (TS'_t, \mathcal{S}'_t, M'_t) \xrightarrow{\mathsf{pf}}{}^* ((\_, \_, \emptyset), \_, \_) \tag{1.5}$$

From (1.1), we have that there exists  $\beta'$  such that:

$$\iota \vdash (\mathcal{TP}_t(t), \mathcal{S}_t, M_t, \circ) \longmapsto^* (TS'_t, \mathcal{S}'_t, M'_t, \beta') \tag{5}$$

According to the thread-local simulation, we have that there exist  $TS'_s$ ,  $S'_s$ ,  $M'_s$ ,  $\varphi'$  and  $\beta'_s$  such that:

$$\iota \vdash (\mathcal{TP}_{s}(\mathsf{t}), \mathcal{S}_{s}, M_{s}, \circ) \longmapsto^{*} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}, \beta'_{s}) \tag{6}$$

$$I, \iota \models (TS'_t, \mathcal{S}'_t, M'_t) \preccurlyeq^{\beta', \mathcal{D}'}_{\varphi'} (TS'_s, \mathcal{S}'_s, M'_s) \tag{7}$$

$$\varphi \subseteq \varphi' \tag{8}$$

By applying Lemma 9.4 to (1.1), (1.3) and (1.4), we have the following, where  $TS_t$  denotes  $\mathcal{TP}_t(\mathsf{t})$.

$$\langle \mathsf{x} : \_@(\_, t], \_\rangle \in (M_t \backslash TS_t.P) \tag{9}$$

From (3), there exists an injection between the target memory and the source memory. Thus, there exists t' such that the following hold, where  $TS_s$  denotes  $\mathcal{TP}_s(\mathsf{t})$:

$$\varphi(\mathsf{x},t) = t' \tag{10}$$

$$\langle \mathsf{x} : \_@(\_, t'], \_\rangle \in (M_s \backslash TS_s.P) \tag{11}$$

We apply Lemma 9.5 to (11) and (6) and have the following.

$$\langle \mathsf{x} : \_@(\_, t'], \_\rangle \in (M'_{s} \backslash TS'_{s}.P) \tag{12}$$

From (7), (8), (10) and (1.4), we have the following.

$$TS_s'.\mathcal{V}.\mathsf{cur}.T_{\mathsf{rlx}}(\mathsf{x}) < t' \tag{13}$$

By applying Lemma 9.6 to (1.2), (1.5), (7), (12), (13) and (4), we construct a write-write race in the source execution.  $\Box$ 

LEMMA 9.4 (RACE MESSAGE IN STARTING MEMORY).

$$\forall TS, S, M, TS', S', M', \iota, n.$$

$$\iota \vdash (TS, S, M) \longrightarrow^{n} (TS', S', M') \land$$

$$\langle x : \_@(\_, t], \_\rangle \in (M' \backslash TS'.P) \land TS'.V.cur.T_{rlx}(x) < t$$

$$\Longrightarrow \langle x : \_@(\_, t], \_\rangle \in (M \backslash TS.P)$$

LEMMA 9.5 (NON-PROMISE MESSAGE PRESERVING).

$$\forall TS, S, M, TS', S', M', x, t, n, \iota.$$

$$\langle x : \_@(\_, t], \_\rangle \in (M \backslash TS.P) \land$$

$$\iota \vdash (TS, S, M) \longrightarrow^{n} (TS', S', M')$$

$$\Longrightarrow \langle x : \_@(\_, t], \_\rangle \in (M' \backslash TS'.P)$$

LEMMA 9.6 (SOURCE WRITE-WRITE RACE CONSTRUCTION).

$$\begin{split} \forall TS_{t}, S_{t}, M_{t}, TS_{s}, S_{s}, M_{s}, \iota, \mathsf{x}, t', \beta, \mathcal{D}, \varphi, n. \\ & TS_{t}.\sigma \xrightarrow{\mathsf{W}(\mathsf{na}, \mathsf{x}, \_)} \_ \wedge \\ & \iota \vdash (TS_{t}, S_{t}, M_{t}) \xrightarrow{\mathsf{pf}}{}^{n} ((\_, \_, \emptyset), \_, \_) \wedge \\ & I, \iota \models (TS_{t}, S_{t}, M_{t}) \preccurlyeq_{\varphi}^{\beta, \mathcal{D}} (TS_{s}, S_{s}, M_{s}) \wedge \\ & \langle \mathsf{x} : \_@(\_, t'], \_\rangle \in (M_{s} \backslash TS_{s}.P) \wedge TS_{s}.V.\mathsf{cur}.T_{\mathsf{rlx}}(\mathsf{x}) < t' \wedge \\ & \neg (\iota \vdash (TS_{s}, S_{s}, M_{s}) \longrightarrow^{*} \mathbf{abort}) \\ \Longrightarrow & \exists TS_{s0}, S_{s0}, M_{s0}. \\ & \iota \vdash (TS_{s}, S_{s}, M_{s}) \longrightarrow^{*} (TS_{s0}, S_{s0}, M_{s0}) \wedge \\ & \langle \mathsf{x} : \_@(\_, t'], \_\rangle \in (M_{s0} \backslash TS_{s0}.P) \wedge TS_{s0}.V.\mathsf{cur}.T_{\mathsf{rlx}}(\mathsf{x}) < t' \wedge \\ & \iota \vdash (TS_{s0}, S_{s0}, M_{s0}) \longrightarrow^{*} ((\_, \_, \emptyset), \_, \_) \end{split}$$

LEMMA 9.7 (WW-RF PRESERVING - AUX CURRENT NOT RACE).

$$\begin{split} \forall \mathcal{TP}_{t}, i, \mathcal{S}_{t}, M_{t}, \hat{W}_{t}, \mathcal{TP}_{s}, \mathcal{S}_{s}, M_{s}, \beta, \beta_{s}, \mathcal{D}, n, m. \\ & (\mathcal{TP}_{t}, i, \mathcal{S}_{t}, M_{t}, \beta)^{\iota} :\Longrightarrow^{n} \hat{W}_{t} \wedge \hat{W}_{t} :\Longrightarrow \text{ww-Race} \wedge \\ & \neg ((\mathcal{TP}_{t}, i, \mathcal{S}_{t}, M_{t}, \beta)^{\iota} :\Longrightarrow_{\text{ax}} \text{ww-Race}) \wedge \\ & (\forall j \in \{1, \ldots, m\}. \text{ consistent}_{\text{NP}}(\mathcal{TP}_{t}(j), M_{t}, \circ, \iota)) \wedge \\ & (\forall j \in \{1, \ldots, m\} \setminus \{i\}. \ I, \iota \models (\mathcal{TP}_{t}(j), \mathcal{S}_{t}, M_{t}) \preccurlyeq^{\circ, \emptyset}_{\varphi} (\mathcal{TP}_{s}(j), \mathcal{S}_{s}, M_{s})) \wedge \\ & I, \iota \models (\mathcal{TP}_{t}(i), \mathcal{S}_{t}, M_{t}) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (\mathcal{TP}_{s}(i), \mathcal{S}_{s}, M_{s}) \wedge \\ & (\beta = \circ \implies \beta_{s} = \circ) \wedge \text{wf}(I) \wedge \\ & \neg (\exists \hat{W}_{s}. (\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \beta_{s})^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \wedge \hat{W}_{s} :\Longrightarrow_{\text{ax}} \text{abort}) \\ & \Longrightarrow \exists \hat{W}_{s}. (\mathcal{TP}_{s}, i, \mathcal{S}_{s}, M_{s}, \beta_{s})^{\iota} :\Longrightarrow^{*} \hat{W}_{s} \wedge \hat{W}_{s} :\Longrightarrow \text{ww-Race} \end{split}$$

PROOF. By induction on n.

0: From the premises, we have the following.

$$(\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \beta)^{\iota} :\Longrightarrow \text{ww-Race} \tag{1}$$

$$\neg ((\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \beta)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{2}$$

$$\text{consistent}_{\mathsf{NP}}(\mathcal{TP}_t(i), M_t, \circ, \iota) \tag{3}$$

From (1) and (3), we have the following.

$$(\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \beta)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race} \tag{4}$$

This contradicts (2).

n + 1: From the premises, we have the following.

$$(\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \beta)^{\iota} :\Longrightarrow^{n+1} \hat{W}_t \tag{5}$$

$$\hat{W}_t :\Longrightarrow \text{ww-Race} \tag{6}$$

$$\neg ((\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \beta)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{7}$$

$$(\forall j \in \{1, \dots, m\}. \text{ consistent}_{\mathsf{NP}}(\mathcal{TP}_t(j), M_t, \circ, \iota)) \tag{8}$$

$$(\forall j \in \{1, \dots, m\} \setminus \{i\}. \ I, \iota \models (\mathcal{TP}_t(j), \mathcal{S}_t, M_t) \preccurlyeq_{\varphi}^{\circ, \emptyset} (\mathcal{TP}_s(j), \mathcal{S}_s, M_s)) \tag{9}$$

$$I, \iota \models (\mathcal{TP}_t(i), \mathcal{S}_t, M_t) \preccurlyeq^{\beta, \mathcal{D}}_{\varphi} (\mathcal{TP}_s(i), \mathcal{S}_s, M_s) \tag{10}$$

$$(\beta = \circ \implies \beta_s = \circ) \land \mathsf{wf}(I) \tag{11}$$

$$\neg (\exists \hat{W}_s. (\mathcal{TP}_s, i, \mathcal{S}_s, M_s, \beta_s)^\iota :\Longrightarrow^* \hat{W}_s \wedge \hat{W}_s :\Longrightarrow_{\mathsf{ax}} \mathsf{abort}) \tag{12}$$

We unfold (5) and have that there exists  $\hat{W}'_t$  such that:

$$(\mathcal{TP}_t, i, \mathcal{S}_t, M_t, \beta)^i :\Longrightarrow \hat{W}_t' \tag{5.1}$$

$$\hat{W}'_t :\Longrightarrow^n \hat{W}_t \tag{5.2}$$

We unfold (5.1) and discuss each case respectively. We let  $\hat{W}_t' = (\mathcal{TP}_t', i', \mathcal{S}_t', M_t', \beta')^{\iota}$.

• The current target thread does not take an output step. We have that there exists  $TS'_t$  such that:

$$\iota \vdash (\mathcal{TP}_t(i), \mathcal{S}_t, M_t, \beta) \longmapsto^+ (TS'_t, \mathcal{S}'_t, M'_t, \beta') \tag{6.1.1}$$

$$\text{consistent}_{\mathsf{NP}}(TS'_t, M'_t, \beta', \iota) \tag{6.1.2}$$

$$\mathcal{TP}_t' = \mathcal{TP}_t \{ i \leadsto TS_t' \} \tag{6.1.3}$$

We apply Lemma 8.11 to (6.1.1) and (10) and have that there exist  $TS'_s$,  $\mathcal{S}'_s$,  $M'_s$,  $\mathcal{D}'$,  $\varphi'$  and  $\beta'_s$  such that:

$$\iota \vdash (\mathcal{TP}_s(i), \mathcal{S}_s, M_s, \beta_s) \longmapsto^* (TS'_s, \mathcal{S}'_s, M'_s, \beta'_s) \tag{7.1.1}$$

$$I, \iota \models (TS'_t, \mathcal{S}'_t, M'_t) \preccurlyeq^{\beta', \mathcal{D}'}_{\varphi'} (TS'_s, \mathcal{S}'_s, M'_s) \tag{7.1.2}$$

$$\mathcal{TP}_{s}' = \mathcal{TP}_{s}\{i \leadsto TS_{s}'\} \tag{7.1.3}$$

$$(\beta' = \circ \implies \beta'_s = \circ) \land \text{wf}(I) \tag{7.1.4}$$

By applying Lemma 8.13 to (6.1.2), (7.1.2), (12) and (7), we have the following.

$$\text{consistent}_{\mathsf{NP}}(TS'_{s}, M'_{s}, \beta'_{s}, \iota) \tag{13}$$

From (7.1.1) and (13), we construct a source program transition. This case then follows from the induction hypothesis.

• The case where the current target thread takes an output step is similar to the previous one.

• We consider the case where the target program takes a switch step. We have  $\beta' = \circ$  and we discuss whether the new target thread can generate a write-write race.

$$((\mathcal{TP}'_t, i', \mathcal{S}'_t, M'_t, \circ)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \vee \neg ((\mathcal{TP}'_t, i', \mathcal{S}'_t, M'_t, \circ)^{\iota} : \Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{14}$$

We destruct (14) and discuss each case respectively.

– We first consider the case where the new target thread can generate a write-write race.

$$(\mathcal{TP}_t', i', \mathcal{S}_t', M_t', \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race} \tag{14.1}$$

We finish the proof from Lemma 9.3.

– Then, we consider the case where the new target thread cannot generate a write-write race.

$$\neg ((\mathcal{TP}_t', i', \mathcal{S}_t', M_t', \circ)^{\iota} :\Longrightarrow_{\mathsf{ax}} \mathsf{ww}\text{-Race}) \tag{14.2}$$

We finish the proof from the induction hypothesis.

In the proof of write-write race freedom preserving, we require that the thread-local transition is *deterministic*.

DEFINITION 9.8 (DETERMINISTIC THREAD-LOCAL TRANSITION).

$$\begin{split} & \forall \sigma, \sigma_1, te_1, \sigma_2, te_2. \ (\sigma \xrightarrow{te_1} \sigma_1 \land \sigma \xrightarrow{te_2} \sigma_2) \\ & \implies (te_1 = te_2 \land \sigma_1 = \sigma_2) \lor (\exists x, o. \ te_1, te_2 \in \{R(o, x, \_)\}) \lor \\ & \qquad (te_1 = U(\_, \_, \_, \_, \_) \lor te_2 = U(\_, \_, \_, \_, \_)) \end{split}$$

$$(AI) \quad L \quad \triangleq \quad \dots$$

$$(AIB) \quad LB \quad \triangleq \quad \epsilon \mid L :: LB$$

$$(AIF) \quad \mathbb{L} \quad \triangleq \quad \{l_1 \leadsto LB_1, \dots, l_n \leadsto LB_n\}$$

$$(AResP) \quad A \quad \triangleq \quad \{f_1 \leadsto \mathbb{L}_1, \dots, f_n \leadsto \mathbb{L}_n\}$$

Fig. 28. Definition of analysis result

#### 10 DEFINITION OF OPTIMIZERS

In this work, we focus on the correctness proof of optimizers. Many optimizers, such as *constant propagation*, *dead code elimination* and *common subexpression elimination*, are implemented on top of a program analysis. Such an optimizer has the following form.



The optimizer is composed of an analyzer and a translator. The analyzer analyzes the source code  $\pi_s$  and produces the analysis result A. Then, the translator optimizes the source code  $\pi_s$  according to A. In this section, we define the form of the program analysis result in Subsec. 10.1. In Subsec. 10.2, we define the value analysis, on which the constant propagation optimization is based.

# 10.1 The result of program analysis

In this subsection, we define the result of the program analysis, which is shown in Fig. 28. Here, L represents the abstract interpretation at a program point, LB, a sequence of L, represents the abstract interpretation of a basic block, and  $\mathbb{L}$  represents the abstract interpretation of a code heap, which is a partial mapping from each label to the corresponding LB. The analysis result of the whole program is A, which collects the analysis result of each code heap. The following figure illustrates their meaning.

$$\begin{array}{ll} l: & L_1 = \emptyset \\ r:=3; & L_2 = \{r \leadsto 3\} \\ x:=r+1; & L_3 = \{r \leadsto 3, x \leadsto 4\} \\ \dots & L_4 = \{r \leadsto 3, x \leadsto 4\} \\ & LB = L_1 :: L_2 :: L_3 :: L_4 :: \epsilon \end{array}$$

The above figure shows the result of the abstract interpretation of a basic code block in the value analysis. We give some auxiliary definitions on *LB* in Fig. 29.

$$\begin{array}{rcl} \text{IN}[LB] & \triangleq & \begin{cases} L & \text{if } LB = L :: LB' \\ \text{undef} & \text{otherwise} \end{cases} \\[2ex] \text{OUT}[LB] & \triangleq & \begin{cases} L & \text{if } LB = LB' \cdot L \\ \text{undef} & \text{otherwise} \end{cases} \\[2ex] \text{Succ}(B) & \triangleq & \begin{cases} \{l\} & \text{if } B = B' \cdot (\mathsf{jmp} \ l) \text{ or } B = B' \cdot (\mathsf{call} \ f, l_{ret}) \\ \{l_1, l_2\} & \text{if } B = B' \cdot (\mathsf{be} \ e, l_1, l_2) \\ \emptyset & \text{otherwise} \end{cases} \\[2ex] \text{Pred}(C, l) & \triangleq & \{l_p \mid l \in \mathsf{succ}(C(l_p))\} \\[1ex] B[i \dots] & \triangleq & B_2 \quad \text{where } B = (B_1 \cdot B_2) \text{ and } |B_1| = i \\[1ex] LB[i \dots] & \triangleq & LB_2 \quad \text{where } LB = (LB_1 \cdot L \cdot LB_2) \text{ and } |LB_1| = i \\[1ex] LB(i) & \triangleq & L \quad \text{where } LB = (LB_1 \cdot L \cdot LB_2) \text{ and } |LB_1| = i \end{array}$$

Fig. 29. Auxiliary definitions on the abstract interpretation of code block

# 10.2 Value analysis

The set  $L_v$  records the abstract interpretation of the values of variables and registers in the current state.

$$L_v \in \mathcal{P}((Var \cup Reg) \rightarrow Val) \cup \{\top\}$$

The implementation of the value analysis is shown below, where *n* is a large constant that bounds the number of iterations.

$$Val\_Analyzer(C, l_0) \triangleq Val\_Analyzer'(C, \mathbb{L}_0, dom(C), n)$$

$$where \quad \mathbb{L}_0 = \{l \leadsto (\top :: \epsilon) \mid l \in dom(C)\}\{l_0 \leadsto (\emptyset :: \epsilon)\}$$

$$\mathsf{Val\_Analyzer'}(C,\mathbb{L},W,n) \triangleq \begin{cases} \mathsf{Val\_Analyzer'}(C,\mathbb{L}\{l \leadsto L_v\},W',n-1) & \text{if } l \in W,\ L_v = \bigcap\limits_{l_p \in \mathsf{pred}(C,l)} \mathsf{OUT}[\mathbb{L}(l_p)], \\ & L'_v = \mathsf{TF}_v(L_v,C(l)), \\ & (L'_v \neq \mathsf{OUT}[\mathbb{L}(l)] \implies W' = ((W\backslash\{l\}) \cup \mathsf{succ}(l))), \\ & (L'_v = \mathsf{OUT}[\mathbb{L}(l)] \implies W' = (W\backslash\{l\})) \\ \mathbb{L} & \text{if } W = \emptyset \\ \mathsf{undef} & \text{otherwise} \end{cases}$$

The value analysis on the whole program is defined as follows. In this work, we only consider intraprocedural analysis.

$$PVal\_Analyzer(\pi) \triangleq \{f \rightsquigarrow Val\_Analyzer(C, l) \mid \pi(f) = (C, l)\}$$
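The worklist iteration behind Val_Analyzer' can be sketched in Python as follows. This is a simplified illustration and not the definition above: it assumes a hypothetical encoding of blocks as lists of tuples, stores one abstract state per block exit instead of a full sequence LB, treats the entry block as starting from the empty map, and takes the block transfer function as a parameter `transfer`. The names `TOP`, `meet` and `val_analyzer` are introduced here for illustration only.

```python
TOP = "TOP"  # assumed sentinel for the top element (no information yet)

def succ(block):
    """Successor labels of a block, read off its terminator (hypothetical encoding)."""
    term = block[-1]
    if term[0] == "jmp":
        return {term[1]}
    if term[0] == "be":
        return {term[2], term[3]}
    return set()

def meet(a, b):
    """Merge two abstract states: keep only the bindings on which both agree."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return {k: v for k, v in a.items() if k in b and b[k] == v}

def val_analyzer(code, entry, transfer, fuel=10_000):
    """Iterate to a fixed point; return None (undef) if the fuel runs out."""
    out = {l: TOP for l in code}   # abstract state at the exit of each block
    work = set(code)               # W = dom(C)
    while work:
        if fuel == 0:
            return None
        fuel -= 1
        l = work.pop()
        # Entry state: merge the exit states of all predecessors of l.
        state = {} if l == entry else TOP
        for lp, block in code.items():
            if l in succ(block):
                state = meet(state, out[lp])
        new_out = transfer(state, code[l])
        if new_out != out[l]:      # the result changed: revisit the successors
            out[l] = new_out
            work |= succ(code[l])
    return out
```

The `fuel` parameter plays the role of the constant *n*: it only guards against non-termination and is not expected to be exhausted for a monotone transfer function over a finite set of bindings.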

The transfer function TF<sub>v</sub> for basic code blocks in the value analysis is defined below.

$$\mathsf{TF}_v(L_v,B) \ \triangleq \begin{cases} \ \top :: \epsilon & \text{if } L_v = \top \\ L_v :: \mathsf{TF}_v(L_v',B') & \text{elif } B = c :: B' \wedge L_v' = f_v(c,L_v) \\ L_v :: f_v(B,L_v) :: \epsilon & \text{elif } B \in \{\mathsf{return}, \mathsf{call}(\mathsf{f},l_{ret}), \mathsf{jmp}\ l, \mathsf{be}\ e, l_1, l_2\} \\ \mathsf{undef} & \text{otherwise} \end{cases}$$

$$\llbracket e \rrbracket_{L_v} \ \triangleq \begin{cases} L_v(r) & \text{if } e = r \\ v & \text{if } e = v \\ v_1 + v_2 & \text{if } e = e_1 + e_2 \wedge \llbracket e_1 \rrbracket_{L_v} = v_1 \wedge \llbracket e_2 \rrbracket_{L_v} = v_2 \\ v_1 - v_2 & \text{if } e = e_1 - e_2 \wedge \llbracket e_1 \rrbracket_{L_v} = v_1 \wedge \llbracket e_2 \rrbracket_{L_v} = v_2 \\ v_1 * v_2 & \text{if } e = e_1 * e_2 \wedge \llbracket e_1 \rrbracket_{L_v} = v_1 \wedge \llbracket e_2 \rrbracket_{L_v} = v_2 \\ \text{undef} & \text{otherwise} \end{cases}$$

Fig. 30. Auxiliary definitions in value analysis
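The abstract evaluation of expressions in Fig. 30 can be sketched in Python as follows. The encoding is an assumption made for illustration: integers are constants, strings are register names, binary expressions are tuples `(op, e1, e2)`, the abstract state is a dictionary, and `None` stands for undef.

```python
def abs_eval(e, lv):
    """Abstract value of expression e under the state lv, or None when undefined."""
    if isinstance(e, int):            # e = v
        return e
    if isinstance(e, str):            # e = r: defined only if lv binds r
        return lv.get(e)
    op, e1, e2 = e                    # e = e1 op e2
    v1, v2 = abs_eval(e1, lv), abs_eval(e2, lv)
    if v1 is None or v2 is None:      # an undefined operand makes e undefined
        return None
    if op == "+":
        return v1 + v2
    if op == "-":
        return v1 - v2
    if op == "*":
        return v1 * v2
    return None
```

For instance, under the state `{"r": 3}` the expression `("+", "r", 1)` evaluates to a constant, while any expression mentioning an unbound register is undefined.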

We show the transfer function for each instruction in the following. Some auxiliary definitions used in defining the transfer function for each instruction are shown in Fig. 30.

• Assignment operation

$$f_v(r:=e,L_v) \triangleq \begin{cases} L_v\{r \leadsto v\} & \text{if } \llbracket e \rrbracket_{L_v} = v \\ L_v \backslash \{r\} & \text{otherwise} \end{cases}$$

• Memory store operation

$$f_v(\mathsf{x}_{o_w} := e, L_v) \quad \triangleq \quad \left\{ \begin{array}{ll} L_v\{\mathsf{x} \leadsto v\} & \quad \text{if } o_w = \mathsf{na} \text{ and } [\![e]\!]_{L_v} = v \\ L_v\backslash\{\mathsf{x}\} & \quad \text{otherwise} \end{array} \right.$$

We focus on the case where the store operation is an atomic write (where  $o_w \in \{rlx, rel\}$ ). Here, we do not simply view the atomic write as an external function call, which may modify memory arbitrarily. Consider the following example.

$$\begin{array}{l} \{x \leadsto 3\} \\ y_{rel} := 3; \\ \{x \leadsto 3\} \end{array}$$

Before the execution of the instruction " $y_{rel} := 3$ ", the abstract interpretation of the program state " $\{x \leadsto 3\}$ " means that the last message on x that the current thread can read has value 3. After the execution of the instruction  $y_{rel} := 3$ , the current thread can still read the value 3 from the variable x. Thus, we still have " $\{x \leadsto 3\}$ ". Since we do not optimize atomic memory accesses, we assign  $\top$ , which represents any value, to the variable y. We can therefore do constant propagation across the atomic memory access as shown below.

The soundness of the above optimization can also be shown by "roach-motel reordering". The sound transformations that achieve the above optimization by "roach-motel reordering" are shown below.

• Memory load operation

$$\begin{array}{lll} f_v(r:=\mathsf{x_{na}},L_v) & \triangleq & \left\{ \begin{array}{ll} L_v\{r \leadsto v\} & \text{if } L_v(\mathsf{x})=v \\ L_v\backslash\{r\} & \text{otherwise} \end{array} \right. \\ \\ f_v(r:=\mathsf{x_{rlx}},L_v) & \triangleq & L_v\backslash\{r\} \\ \\ f_v(r:=\mathsf{x_{acq}},L_v) & \triangleq & \{r'\leadsto v\mid L_v(r')=v\land r'\neq r\} \end{array}$$

Code optimizations across a relaxed atomic read are sound, since the execution of a relaxed atomic read does not synchronize threads. Thus, a value of a location that can be read before the execution of the relaxed atomic read can still be read after it.

$$\begin{array}{l} \{\} \\ x_{na} := 2; \\ \{x \leadsto 2\} \\ r_1 := y_{rlx}; \\ \{x \leadsto 2\} \\ r_2 := x_{na}; \end{array} \xrightarrow{ConstProp} \begin{array}{l} x_{na} := 2; \\ r_1 := y_{rlx}; \\ r_2 := 2; \end{array}$$

The above optimization can be achieved by the sound code transformations shown below.

Doing constant propagation across an acquire atomic read is not sound, since the execution of the acquire atomic read may synchronize two threads.

• Compare and set operation

$$f_{v}(r := \mathsf{CAS}_{\mathsf{rlx},o_{w}}(\mathsf{x},e_{r},e_{w}),L_{v}) \quad \triangleq \quad L_{v} \setminus \{r,\mathsf{x}\}$$

$$f_{v}(r := \mathsf{CAS}_{\mathsf{acq},o_{w}}(\mathsf{x},e_{r},e_{w}),L_{v}) \quad \triangleq \quad L_{v} \setminus (\mathit{Var} \cup \{r\})$$

The transfer function for the CAS operation can be viewed as a composition of the transfer functions for the memory load operation and the memory store operation. If the memory order of the load in CAS is relaxed, the transfer function for CAS is the composition of those for the relaxed atomic read and the atomic write. If the memory order of the load in CAS is acquire, the transfer function for CAS is the composition of those for the acquire atomic read and the atomic write. The following constant propagation optimization is correct.

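$$\begin{array}{l} \{\} \\ x_{na} := 2; \\ \{x \leadsto 2\} \\ r_1 := CAS_{rlx,rel}(y, 0, 1); \\ \{x \leadsto 2\} \\ r_2 := x_{na}; \end{array} \xrightarrow{ConstProp} \begin{array}{l} x_{na} := 2; \\ r_1 := CAS_{rlx,rel}(y, 0, 1); \\ r_2 := 2; \end{array}$$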

• Release and acquire fence operations

$$\begin{array}{lll} f_v(\text{fence-rel}, L_v) & \triangleq & L_v \\ f_v(\text{fence-acq}, L_v) & \triangleq & L_v \setminus Var \end{array}$$

The transfer function for the release fence operation is similar to the transfer function for the release atomic write, and the transfer function for the acquire fence operation is similar to the transfer function for the acquire atomic read.

The correctness of the above optimization can be established by applying sound code transformations as shown below.

$$\begin{array}{l} x_{na} := 1; \\ \text{fence-rel}; \\ r := x_{na}; \end{array} \Longrightarrow \begin{array}{l} x_{na} := 1; \\ \text{fence-rel}; \\ r := 1; \end{array}$$

Doing constant propagation across fence-acq is not sound, since the execution of the acquire fence operation will update the view to each location of the current thread.

• Unconditional and conditional branch

$$\begin{array}{ccc} f_{v}(\texttt{jmp}\ l, L_{v}) & \triangleq & L_{v} \\ f_{v}(\texttt{be}\ e, l_1, l_2, L_{v}) & \triangleq & L_{v} \end{array}$$

• Function call, return, system call and SC fence

$$\begin{array}{lll} f_v(\operatorname{call}(\mathsf{f}, l_{ret}), L_v) & \triangleq & L_v \setminus Addr \\ f_v(\operatorname{print}(e), L_v) & \triangleq & L_v \setminus Addr \\ f_v(\operatorname{fence-sc}, L_v) & \triangleq & L_v \setminus Addr \end{array}$$

Since we only consider intraprocedural analysis in this work, we conservatively assume that the callee may modify the memory state arbitrarily. The system call is defined in the same way as fence-sc, so their transfer functions are the same.

$$f_v(\text{return}, L_v) \triangleq \emptyset$$
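The per-instruction transfer functions of the value analysis can be sketched in Python as follows. This is an illustration under assumptions that are not part of the definitions above: instructions are encoded as tuples, expressions are restricted to integer constants and register names, the set `variables` plays the role of both *Var* and *Addr*, and in the acquire-read case r' is assumed to range over registers only. The helper names `const_of`, `drop` and `f_v` are introduced here for illustration.

```python
def const_of(e, lv):
    """Constant value of e (an int or a register name) under lv, else None."""
    return e if isinstance(e, int) else lv.get(e)

def drop(lv, names):
    """Forget the bindings of all given names."""
    return {k: v for k, v in lv.items() if k not in names}

def f_v(instr, lv, variables):
    kind = instr[0]
    if kind == "assign":                      # r := e
        _, r, e = instr
        v = const_of(e, lv)
        return {**lv, r: v} if v is not None else drop(lv, {r})
    if kind == "store":                       # x_o := e
        _, x, o, e = instr
        v = const_of(e, lv)
        if o == "na" and v is not None:
            return {**lv, x: v}
        return drop(lv, {x})                  # atomic write: only x is forgotten
    if kind == "load":                        # r := x_o
        _, r, x, o = instr
        if o == "na":
            return {**lv, r: lv[x]} if x in lv else drop(lv, {r})
        if o == "rlx":
            return drop(lv, {r})
        return drop(lv, variables | {r})      # acquire read: forget all variables
    if kind == "cas":                         # r := CAS_{o_r,o_w}(x, ...)
        _, r, x, o_r = instr
        return drop(lv, {r, x}) if o_r == "rlx" else drop(lv, variables | {r})
    if kind == "fence":                       # release keeps lv; acq and sc do not
        _, o = instr
        return dict(lv) if o == "rel" else drop(lv, variables)
    if kind in ("call", "print"):
        return drop(lv, variables)
    raise ValueError("unsupported instruction: %r" % (instr,))
```

Running `f_v` over `x_na := 2; r1 := y_rlx; r2 := x_na` keeps the binding of x across the relaxed read, whereas replacing the relaxed read by an acquire read discards it, which matches the discussion of the two kinds of atomic reads.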

$$\mathsf{fv}(e) \ \stackrel{\triangle}{=} \ \begin{cases} \{r\} & \text{if } e = r \\ \emptyset & \text{if } e = v \\ \mathsf{fv}(e_1) \cup \mathsf{fv}(e_2) & \text{if } (e = e_1 + e_2) \vee (e = e_1 - e_2) \vee (e = e_1 * e_2) \end{cases}$$

Fig. 31. Auxiliary definitions in liveness analysis

# 10.3 Liveness analysis

The set  $L_{nl}$  records the set of registers and memory locations that will not be read before the next assignment.

$$L_{nl} \in \mathcal{P}(Var \cup Reg)$$

The Implementation of the liveness analysis is shown below. It relies on the result of the value analysis.

$$\begin{array}{ll} \mathsf{Lv\_Analyzer}(C) & \triangleq & \mathsf{Lv\_Analyzer'}(C,\mathbb{L}_0,\mathsf{dom}(C),n) \\ & where & \mathbb{L}_0 = \{l \leadsto ((Var \cup Addr) :: \epsilon) \mid l \in \mathsf{dom}(C)\}\{l' \leadsto (\emptyset :: \epsilon) \mid C(l') = \_; \mathsf{return}\} \\ \mathsf{Lv\_Analyzer'}(C,\mathbb{L},W,n) & \triangleq & \\ & \left\{ \begin{array}{ll} \mathsf{Lv\_Analyzer'}(C,\mathbb{L}\{l \leadsto L_{nl}\},W',n-1) & \text{ if } l \in W,L_{nl} = \bigcap\limits_{l_s \in \mathsf{succ}(C(l))} \mathsf{IN}[\mathbb{L}_l(l_s)], \\ & L'_{nl} = \mathsf{TF}_l(L_{nl},C(l)), \\ & (L'_{nl} \neq \mathsf{IN}[\mathbb{L}(l)] \implies (W' = (W \setminus \{l\}) \cup \mathsf{pred}(l))), \\ & (L'_{nl} = \mathsf{IN}[\mathbb{L}(l)] \implies W' = (W \setminus \{l\})) \\ & \mathbb{L} & \text{ if } W = \emptyset \\ \end{array} \right.$$

The liveness analysis on the whole program is defined as follows. In this work, we only consider the intra-procedural analysis.

otherwise

$$PLv\_Analyzer(\pi) \triangleq \{f \sim Lv\_Analyzer(C) \mid \pi(f) = (C, l)\}$$

The transfer function $\mathsf{TF}_l$ for basic code blocks in liveness analysis is shown below.

$$\mathsf{TF}_l(L_{nl},B) \quad \triangleq \quad \begin{cases} f_L(c,L'_{nl}) :: \mathsf{TF}_l(L_{nl},B') & \text{if } B=c :: B' \wedge \mathsf{TF}_l(L_{nl},B') = L'_{nl} :: \_ \\ f_L(B,L_{nl}) :: L_{nl} :: \epsilon & \text{otherwise} \end{cases}$$

The merging of two abstract interpretations in the liveness analysis is just the intersection of the two sets. We show the transfer function for each instruction in the following. Some auxiliary definitions are shown in Fig. 31.

• Assignment operation.

$$f_L(r:=e,L_{nl}) \triangleq \begin{cases} L_{nl} & \text{if } r \in L_{nl} \\ (L_{nl} \cup \{r\}) \setminus \mathsf{fv}(e) & \text{otherwise} \end{cases}$$

• Memory load operation.

$$f_L(r := \mathsf{x}_{o_r}, L_{nl}) \triangleq \begin{cases} L_{nl} & \text{if } r \in L_{nl} \\ (L_{nl} \cup \{r\}) \setminus \{\mathsf{x}\} & \text{otherwise} \end{cases}$$

The transfer function for non-atomic read is taken from CompCert. Here, if the register r is dead after the execution of "$r := x_{na}$", the abstract interpretation before its execution is still $L_{nl}$, since the instruction "$r := x_{na}$" is dead code. The transfer function for atomic read does not have such a case, since we do not optimize atomic memory accesses.

The transfer function for atomic memory accesses shows that the dead code elimination in our work supports the following optimization.

$$\begin{array}{ll} x_{\text{na}} := 2; \\ \{x, r\} \\ r := y_{\text{acq}}; \\ \{x\} \\ x_{\text{na}} := 3; \end{array} \xrightarrow{DCE} \begin{array}{ll} \text{skip;} \\ r := y_{\text{acq}}; \\ x_{\text{na}} := 3; \end{array}$$

For the current thread, the memory write "$x_{na} := 2$" is not read before the next assignment to the variable x. Thus, it is dead code for the current thread. Since there is no release write after "$x_{na} := 2$" and before the next assignment to the variable x, other threads are not required to read this write. The correctness of the above optimization can also be established by the sound code transformations shown below. The reordering shown below is called "roach-motel reordering".

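$$\begin{array}{l} x_{na} := 2; \\ r := y_{acq}; \\ x_{na} := 3; \end{array} \ \leadsto \ \begin{array}{l} r := y_{acq}; \\ x_{na} := 2; \\ x_{na} := 3; \end{array} \ \leadsto \ \begin{array}{l} r := y_{acq}; \\ \text{skip}; \\ x_{na} := 3; \end{array} \ \leadsto \ \begin{array}{l} \text{skip}; \\ r := y_{acq}; \\ x_{na} := 3; \end{array}$$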
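The backward computation behind the annotations in the example above can be replayed with a small Python sketch. It is a simplified model under stated assumptions: instructions and expressions are encoded as tuples, register and variable names are assumed to be disjoint, and only the assignment, load and non-atomic store cases of $f_L$ are covered.

```python
def fv(e):
    """Registers occurring in an expression encoded as nested tuples."""
    tag = e[0]
    if tag == "reg":
        return {e[1]}
    if tag == "const":
        return set()
    return fv(e[1]) | fv(e[2])


def f_L(c, dead):
    """Backward transfer: `dead` holds after c, the result holds before c."""
    kind = c[0]
    if kind == "assign":                      # r := e
        _, r, e = c
        return dead if r in dead else (dead | {r}) - fv(e)
    if kind == "load":                        # r := x_o
        _, r, x, _order = c
        return dead if r in dead else (dead | {r}) - {x}
    if kind == "store" and c[2] == "na":      # x_na := e
        _, x, _order, e = c
        return dead if x in dead else (dead | {x}) - fv(e)
    raise NotImplementedError(kind)


def annotate(block, dead_at_exit):
    """Return the sets [before c1, before c2, ..., before cn, after cn]."""
    sets = [set(dead_at_exit)]
    for c in reversed(block):
        sets.insert(0, f_L(c, sets[0]))
    return sets


block = [
    ("store", "x", "na", ("const", 2)),   # x_na := 2
    ("load", "r", "y", "acq"),            # r := y_acq
    ("store", "x", "na", ("const", 3)),   # x_na := 3
]
sets = annotate(block, set())
print(sets[1])   # the set that holds right after x_na := 2
```

In this sketch the set after "$x_{na} := 2$" contains x, which is the condition under which dead code elimination replaces that store by skip.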

• Memory store operation.

$$\begin{array}{lll} f_L(\mathsf{x}_\mathsf{na} := e, L_\mathit{nl}) & \triangleq & \left\{ \begin{array}{ll} L_\mathit{nl} & \text{if } \mathsf{x} \in L_\mathit{nl} \\ & (L_\mathit{nl} \cup \{\mathsf{x}\}) \backslash \mathsf{fv}(e) & \text{otherwise} \end{array} \right. \\ \\ f_L(\mathsf{x}_\mathsf{rlx} := e, L_\mathit{nl}) & \triangleq & L_\mathit{nl} \backslash \mathsf{fv}(e) \\ \\ f_L(\mathsf{x}_\mathsf{rel} := e, L_\mathit{nl}) & \triangleq & L_\mathit{nl} \backslash (\mathit{Var} \cup \mathsf{fv}(e)) \end{array}$$

We focus on the transfer function for atomic write. Doing dead code elimination across the relaxed atomic write is correct. Consider the following dead code elimination optimization.

$$\begin{array}{lll} x_{\text{na}} := 2; & & \text{skip}; \\ \{x\} & & \\ y_{\text{rlx}} := 1; & \xrightarrow{\textit{DCE}} & y_{\text{rlx}} := 1; \\ \{x\} & & \\ x_{\text{na}} := 4; & & x_{\text{na}} := 4; \\ \{\} & & \end{array}$$

For the current thread, the memory write "$x_{na} := 2$" is not read before the next assignment to the variable x. Thus, the instruction "$x_{na} := 2$" is dead code for the current thread. Since the execution of the relaxed atomic write "$y_{rlx} := 1$" does not release information about the memory writes of the current thread to other threads, other threads are not required to read the memory write "$x_{na} := 2$". Hence "$x_{na} := 2$" is also dead code for other threads.
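The difference between the three store rules can be made concrete with the following Python sketch. The tuple encoding of expressions and the explicit `variables` parameter standing for *Var* are assumptions of this illustration.

```python
def fv(e):
    """Registers occurring in an expression encoded as nested tuples."""
    tag = e[0]
    if tag == "reg":
        return {e[1]}
    if tag == "const":
        return set()
    return fv(e[1]) | fv(e[2])


def f_store(x, order, e, dead, variables):
    """Backward liveness transfer for the store x_order := e."""
    if order == "na":
        return dead if x in dead else (dead | {x}) - fv(e)
    if order == "rlx":
        return dead - fv(e)                  # memory locations stay dead
    if order == "rel":
        return dead - (variables | fv(e))    # every memory location becomes live
    raise ValueError(order)


VARS = {"x", "y"}
one = ("const", 1)

before_last = f_store("x", "na", ("const", 4), set(), VARS)   # before x_na := 4
across_rlx = f_store("y", "rlx", one, before_last, VARS)       # before y_rlx := 1
across_rel = f_store("y", "rel", one, before_last, VARS)       # before y_rel := 1
print(across_rlx, across_rel)
```

Under these assumptions x remains in the set across the relaxed store, so an earlier "$x_{na} := 2$" is eliminated, while a release store removes x from the set and the earlier store is kept.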

Doing dead code elimination across the release atomic writes is not correct under any context, since the release write operation may send information about memory writes to other threads. Consider the following optimization if we permit doing dead code elimination across release atomic writes.

• Compare and set operation.

$$\begin{array}{lcl} f_L(r := \mathsf{CAS}_{o_r,\mathsf{rlx}}(\mathsf{x},e_r,e_w),L_{nl}) & \triangleq & (L_{nl} \cup \{r\}) \backslash (\mathsf{fv}(e_r) \cup \mathsf{fv}(e_w)) \\ f_L(r := \mathsf{CAS}_{o_r,\mathsf{rel}}(\mathsf{x},e_r,e_w),L_{nl}) & \triangleq & (L_{nl} \cup \{r\}) \backslash (Var \cup \mathsf{fv}(e_r) \cup \mathsf{fv}(e_w)) \end{array}$$

The transfer function for the CAS operation can be viewed as a composition of the transfer functions for the memory load operation and the memory store operation. The dead code elimination optimization shown below is correct.

$$\begin{array}{lll} x_{\text{na}} := 2; & & \text{skip}; \\ \{x, r\} & & \\ r := \text{CAS}_{\text{acq,rlx}}(y, 0, 1); & \xrightarrow{\textit{DCE}} & r := \text{CAS}_{\text{acq,rlx}}(y, 0, 1); \\ \{x\} & & \\ x_{\text{na}} := 4; & & x_{\text{na}} := 4; \\ \{\} & & \end{array}$$

• Release and acquire fence operations.

$$f_L(\text{fence-rel}, L_{nl}) \triangleq L_{nl} \setminus Var$$
  
 $f_L(\text{fence-acq}, L_{nl}) \triangleq L_{nl}$ 

The transfer function for the release fence operation is similar to the transfer function for the release atomic write, and the transfer function for the acquire fence operation is similar to the transfer function for the acquire atomic read.

We permit dead code elimination across the acquire fence, as shown below.

$$\begin{array}{lll} x_{na} := 2; & & \text{skip}; \\ \{x\} & & \\ \text{fence-acq}; & \xrightarrow{\textit{DCE}} & \text{fence-acq}; \\ \{x\} & & \\ x_{na} := 4; & & x_{na} := 4; \end{array}$$

The correctness of the above optimization can also be shown by applying sound code transformations, as shown below.

Doing dead code elimination across the release fence is forbidden, since the execution of the release fence may send the information of memory writes of the current thread to other threads.

$$\begin{array}{lll} x_{na} := 2; \\ \{\} \\ \text{fence-rel}; & \xrightarrow{\textit{DCE}} & x_{na} := 2; \\ \{x\} & \text{fence-acq}; \\ x_{na} := 4; \\ \{\} \end{array}$$

• Unconditional and conditional branch

$$f_L(\text{jmp }l, L_{nl}) \triangleq L_{nl}$$
 
$$f_L(\text{be }e, l_1, l_2, L_{nl}) \triangleq L_{nl} \backslash \text{fv}(e)$$

• Function call, return, system call and SC fence

$$\begin{array}{lcl} f_L(\mathsf{call}(l,l_{ret}),L_{nl}) & \triangleq & L_{nl}\backslash Var \\ \\ f_L(\mathsf{print}(e),L_{nl}) & \triangleq & L_{nl}\backslash (Var\cup \mathsf{fv}(e)) \\ \\ f_L(\mathsf{fence\text{-}sc},L_{nl}) & \triangleq & L_{nl}\backslash Var \end{array}$$

Since this work considers only intra-procedural analysis, the analysis must assume that the callee may modify the memory state arbitrarily. The system call is treated in the same way as fence-sc, so their transfer functions are the same.

$$f_L(\text{return}, L_{nl}) \triangleq Reg$$

$$\langle e \rangle_{L_a} \ \triangleq \ \begin{cases} v & \text{if } e = v \\ r & \text{if } (r,e) \in L_a \wedge e \neq v \\ e'_1 + e'_2 & \text{if } e = e_1 + e_2 \wedge \langle e_1 \rangle_{L_a} = e'_1 \wedge \langle e_2 \rangle_{L_a} = e'_2 \\ e'_1 - e'_2 & \text{if } e = e_1 - e_2 \wedge \langle e_1 \rangle_{L_a} = e'_1 \wedge \langle e_2 \rangle_{L_a} = e'_2 \\ e'_1 * e'_2 & \text{if } e = e_1 * e_2 \wedge \langle e_1 \rangle_{L_a} = e'_1 \wedge \langle e_2 \rangle_{L_a} = e'_2 \\ e & \text{otherwise} \end{cases}$$
 
$$L_a \cap \top \ \triangleq \ L_a \qquad \top \cap L'_a \ \triangleq \ L'_a$$
 
$$L_a \cap L'_a \ \triangleq \ \{(r,e) \mid (r,e) \in L_a \wedge (r,e) \in L'_a\} \cup \{(r,x) \mid (r,x) \in L_a \wedge (r,x) \in L'_a\}$$
 
$$\text{Kill}(L_a,r_0) \ \triangleq \ \{(r,e) \mid (r,e) \in L_a \} \cup \{(r,y) \in L_a \mid x \neq y\}$$
 
$$\text{Kill}(L_a,x) \ \triangleq \ \{(r,e) \mid (r,e) \in L_a\} \cup \{(r,y) \in L_a \mid x \neq y\}$$

Fig. 32. Auxiliary definitions in available expression analysis

## 10.4 Available expression analysis

The set  $L_a$  records the abstract interpretation of the available expressions in the current state.

$$L_a \in \mathcal{P}((Reg \times Expr) + (Reg \times Var)) \cup \{\top\}$$

The implementation of the available expression analysis is shown below.

$$\begin{array}{lcl} \mathsf{Ave\_Analyzer}(\mathit{C}, l_0) & \triangleq & \mathsf{Ave\_Analyzer'}(\mathit{C}, \mathbb{L}_0, \mathsf{dom}(\mathit{C}), n) \\ & & \mathit{where} \ \mathbb{L}_0 = \{l \leadsto (\top :: \epsilon) \mid l \in \mathsf{dom}(\mathit{C})\} \{l_0 \leadsto (\emptyset :: \epsilon)\} \end{array}$$

$$\begin{array}{ll} \mathsf{Ave\_Analyzer'}(C,\mathbb{L},W,n) & \triangleq \\ & \qquad \qquad \\ & \qquad \qquad \\ \mathsf{Ave\_Analyzer'}(C,\mathbb{L}\{l \leadsto L_a\},W',n-1) & \qquad if \ l \in W, L_a = \bigcap\limits_{l_p \in \mathsf{pred}(C,l_p)} \mathsf{OUT}[\mathbb{L}(l_p)], \\ & \qquad \qquad \qquad \\ & \qquad \qquad \\ &$$

The available expression analysis on the whole program is defined as follows. In this work, we only consider the intra-procedural analysis.

$$\mathsf{Ave\_Analyzer}(\pi) \triangleq \{f \leadsto \mathsf{Ave\_Analyzer}(C, l_0) \mid \pi(f) = (C, l_0)\}$$

The join of two abstract interpretations and some auxiliary definitions can be found in Fig. 32. The transfer function $\mathsf{TF}_a$ for basic code blocks in the available expression analysis is defined below.

$$\mathsf{TF}_a(L_a,B) \triangleq \begin{cases} \top :: \epsilon & \text{if } L_a = \top \\ L_a :: \mathsf{TF}_a(L_a',B') & \text{elif } B = c :: B' \land L_a' = f_a(c,L_a) \\ L_a :: f_a(B,L_a) :: \epsilon & \text{elif } B \in \{\mathsf{return}, \mathsf{call}(\mathsf{f},l_{ret}), \mathsf{jmp}\ l, \mathsf{be}\ e,l_1,l_2\} \\ \mathsf{undef} & \text{otherwise} \end{cases}$$

We show the transfer function for each instruction in the following.

• Assignment operation

$$f_a(r := e, L_a) \triangleq \begin{cases} L'_a \cup \{(r', r)\} & \text{if } (r', \langle e \rangle_{L'_a}) \in L'_a \text{ and } r \notin \mathsf{fv}(\langle e \rangle_{L'_a}) \\ L'_a \cup \{(r, \langle e \rangle_{L'_a})\} & \text{if } (r', \langle e \rangle_{L'_a}) \notin L'_a \text{ and } r \notin \mathsf{fv}(\langle e \rangle_{L'_a}) \\ L'_a & \text{otherwise} \end{cases}$$

$$where \ L'_a = \mathsf{Kill}(L_a, r)$$

• Memory store operation

$$f_a(\mathsf{x}_{o_w} := e, L_a) \triangleq \mathsf{Kill}(L_a, \mathsf{x})$$

• Memory load operation

$$\begin{split} f_a(r := \mathsf{x}_\mathsf{na}, L_a) & \triangleq \begin{cases} L_a' \cup \{(r',r)\} & \text{if } (r',\mathsf{x}) \in L_a' \\ L_a' \cup \{(r,\mathsf{x})\} & \text{otherwise} \end{cases} \\ where \ L_a' = \mathsf{Kill}(L_a,r) \\ f_a(r := \mathsf{x}_\mathsf{rlx}, L_a) & \triangleq \mathsf{Kill}(L_a,r) \\ f_a(r := \mathsf{x}_\mathsf{acq}, L_a) & \triangleq \mathsf{Kill}(\{(r',e') \mid (r',e') \in L_a\}, r) \end{split}$$

• Compare and set operation

$$f_a(r := \mathsf{CAS}_{\mathsf{rlx},o_w}(\mathsf{x},e_r,e_w),L_a) \quad \triangleq \quad \mathsf{Kill}(\mathsf{Kill}(L_a,\mathsf{x}),r)$$

$$f_a(r := \mathsf{CAS}_{\mathsf{acq},o_w}(\mathsf{x},e_r,e_w),L_a) \quad \triangleq \quad \mathsf{Kill}(\{(r',e') \mid (r',e') \in L_a\},r)$$

• Release and acquire fence operations

$$f_a(\text{fence-rel}, L_a) \triangleq L_a$$
  
 $f_a(\text{fence-acq}, L_a) \triangleq \{(r, e) \mid (r, e) \in L_a\}$ 

• Unconditional and conditional branch

$$f_a(\text{jmp } l, L_a) \triangleq L_a$$
  $f_a(\text{be } e, l_1, l_2, L_a) \triangleq L_a$ 

• Function call, system call and SC fence

$$f_a(\mathsf{call}(l, l_{ret}), L_a) \triangleq \{(r, e) \mid (r, e) \in L_a\}$$

$$f_a(\mathsf{print}(e'), L_a) \triangleq \{(r, e) \mid (r, e) \in L_a\}$$

$$f_a(\mathsf{fence\text{-}sc}, L_a) \triangleq \{(r, e) \mid (r, e) \in L_a\}$$

• Return

$$f_a(\text{return}, L_a) \triangleq \emptyset$$
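As an illustration of the assignment rule of $f_a$ and the rewriting $\langle e \rangle_{L_a}$, the following Python sketch models $L_a$ as a set of pairs. Several points are assumptions of this sketch rather than definitions from the analysis: the tuple encoding of expressions, the representation of a pair $(r, \mathsf{x})$ as `(r, ("var", x))`, the choice of the smallest register name when several registers hold the same expression, and in particular the reading of Kill on a register as "drop every pair that defines or mentions the register".

```python
def fv(e):
    """Registers occurring in an expression encoded as nested tuples."""
    tag = e[0]
    if tag == "reg":
        return {e[1]}
    if tag in ("const", "var"):
        return set()
    return fv(e[1]) | fv(e[2])


def rewrite(e, avail):
    """<e>_{L_a}: replace available (sub)expressions by the register holding them."""
    if e[0] == "const":
        return e
    holders = [r for (r, e2) in avail if e2 == e]
    if holders:
        return ("reg", min(holders))
    if e[0] in ("+", "-", "*"):
        return (e[0], rewrite(e[1], avail), rewrite(e[2], avail))
    return e


def kill_reg(avail, r0):
    # Assumption of this sketch: remove pairs that define or mention r0.
    return {(r, e) for (r, e) in avail if r != r0 and r0 not in fv(e)}


def f_a_assign(r, e, avail):
    """Forward transfer for r := e."""
    kept = kill_reg(avail, r)
    e2 = rewrite(e, kept)
    if r in fv(e2):
        return kept
    holders = [r2 for (r2, e3) in kept if e3 == e2]
    if holders:
        return kept | {(min(holders), ("reg", r))}
    return kept | {(r, e2)}


a_plus_b = ("+", ("reg", "a"), ("reg", "b"))
s1 = f_a_assign("r1", a_plus_b, set())      # r1 := a + b
s2 = f_a_assign("r2", a_plus_b, s1)         # r2 := a + b, rewritten to r1
s3 = f_a_assign("a", ("const", 0), s2)      # a := 0 invalidates (r1, a + b)
print(s2)
```

After the second assignment the sketch records that r2 holds r1, which is the information that common subexpression elimination later exploits.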

## 10.5 Constant propagation

• Transformation for an individual instruction.

$$\mathsf{TransI}_c(c,L_v) \ \triangleq \left\{ \begin{array}{ll} r := v & \textit{if } c = (r := e) \land \llbracket e \rrbracket_{L_v} = v \\ r := v & \textit{if } c = (r := \mathsf{x}_\mathsf{na}) \land L_v(\mathsf{x}) = v \\ c & \textit{otherwise} \end{array} \right.$$

• Transformation for a basic code block.

$$\mathsf{TransB}_c(B, LB) \quad \triangleq \quad \left\{ \begin{array}{ll} \mathsf{TransI}_c(c, L_v), \mathsf{TransB}_c(B', LB') & \quad \textit{if } B = c, B' \land LB = L_v :: LB' \\ B & \quad \textit{otherwise} \end{array} \right.$$

• Transformation for a code heap.

$$\operatorname{TransC}_c(C, \mathbb{L}) \triangleq \{l \sim \operatorname{TransB}_c(B, \mathbb{L}(l)) \mid C(l) = B\}$$

• Transformation for a program.

$$\mathsf{Translater}_c(\pi, A) \triangleq \{\mathsf{f} \leadsto \mathsf{TransC}_c(C, \mathbb{L}) \mid \pi(\mathsf{f}) = (C, l_0) \land A(\mathsf{f}) = \mathbb{L}\}$$

• Implementation of constant propagation.

$$ConstProp(\pi, \iota) \triangleq Translater_c(\pi, A)$$
 where  $A = PVal\_Analyzer(\pi)$ 
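The instruction-level and block-level transformations above can be sketched in Python as follows. This is only an illustration under assumptions: instructions and expressions are encoded as tuples, the abstract state $L_v$ is modelled as a dictionary from register and variable names (assumed disjoint) to known constants, and `eval_abs` is a hypothetical stand-in for $\llbracket e \rrbracket_{L_v}$ that returns `None` when the value is not a known constant.

```python
def eval_abs(e, env):
    """Constant value of e under env, or None if it is not known."""
    tag = e[0]
    if tag == "const":
        return e[1]
    if tag == "reg":
        return env.get(e[1])
    left, right = eval_abs(e[1], env), eval_abs(e[2], env)
    if left is None or right is None:
        return None
    return {"+": left + right, "-": left - right, "*": left * right}[tag]


def trans_instr_c(c, env):
    """TransI_c: rewrite c using the abstract state that holds before c."""
    if c[0] == "assign":                       # r := e
        v = eval_abs(c[2], env)
        if v is not None:
            return ("assign", c[1], ("const", v))
    elif c[0] == "load" and c[3] == "na":      # r := x_na
        v = env.get(c[2])
        if v is not None:
            return ("assign", c[1], ("const", v))
    return c


def trans_block_c(block, envs):
    """TransB_c: pair every instruction with its abstract state."""
    return [trans_instr_c(c, env) for c, env in zip(block, envs)]


block = [
    ("store", "x", "na", ("const", 2)),   # x_na := 2
    ("fence", "rel"),                     # fence-rel
    ("load", "r", "x", "na"),             # r := x_na
]
envs = [{}, {"x": 2}, {"x": 2}]           # states before each instruction
optimized = trans_block_c(block, envs)
print(optimized[2])
```

The abstract states in `envs` are written by hand here; in the development they are produced by the value analysis.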

Our constant propagation optimization supports the following optimizations across the atomic memory access and the fence operation.

• Optimization across release store.

• Optimization across relaxed store.

• Optimization across release fence.

$$\begin{array}{lll} \{\} & & \\ \mathsf{x}_{\mathsf{na}} := 2; & & \mathsf{x}_{\mathsf{na}} := 2; \\ \{\mathsf{x} \leadsto 2\} & & \\ \mathsf{fence\text{-}rel}; & \xrightarrow{\mathit{ConstProp}} & \mathsf{fence\text{-}rel}; \\ \{\mathsf{x} \leadsto 2\} & & \\ r := \mathsf{x}_{\mathsf{na}}; & & r := 2; \\ \{\mathsf{x} \leadsto 2, r \leadsto 2\} & & \end{array}$$

(\* Performed in LLVM \*)

• Optimize across CAS with relaxed read and release write.

(\* Not Performed in LLVM \*)

• Optimization across relaxed read.

## 10.6 Dead code elimination

• Transformation for an individual instruction.

$$\mathsf{TransI}_d(c, L_{nl}) \triangleq \begin{cases} \mathsf{skip} & \textit{if } c = (r := e) \land r \in L_{nl} \\ \mathsf{skip} & \textit{if } c = (r := \mathsf{x}_{\mathsf{na}}) \land r \in L_{nl} \\ \mathsf{skip} & \textit{if } c = (\mathsf{x}_{\mathsf{na}} := \_) \land \mathsf{x} \in L_{nl} \\ c & \textit{otherwise} \end{cases}$$

• Transformation for a basic code block.

$$\begin{array}{l} \mathsf{TransB}_d(B,LB_l) \ \triangleq \\ \quad \left\{ \begin{array}{ll} \mathsf{TransI}_d(c,L_{nl}) :: \mathsf{TransB}_d(B',LB'_l) & \textit{if } B=c :: B' \land LB_l = L_{nl} :: LB'_l \\ B & \textit{otherwise} \end{array} \right. \end{array}$$

• Transformation for a code heap.

$$\mathsf{TransC}_d(C, \mathbb{L}_l) \triangleq \{l \leadsto \mathsf{TransB}_d(B, \mathbb{L}_l(l)[1 \dots]) \mid C(l) = B \land \mathsf{IN}[\mathbb{L}_l(l)] \neq \top \}$$

• Transformation for a program.

$$\mathsf{Translater}_d(\pi, A_l) \triangleq \{f \leadsto \mathsf{TransC}_d(C, \mathbb{L}_l) \mid \pi(f) = (C, l_0) \land A_l(f) = \mathbb{L}_l\}$$

• Implementation of dead code elimination.

$$DCE(\pi, \iota) \triangleq Translater_d(\pi, A_l)$$
 where  $A_l = PLv\_Analyzer(\pi)$ 
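A minimal Python sketch of $\mathsf{TransI}_d$ and $\mathsf{TransB}_d$ is given below. It assumes a tuple encoding of instructions and takes, for each instruction, the set $L_{nl}$ that holds after it; these sets are written by hand here instead of being computed by the liveness analysis.

```python
def trans_instr_d(c, dead_after):
    """TransI_d: replace c by skip when its target is in L_nl after c."""
    kind = c[0]
    if kind == "assign" and c[1] in dead_after:                      # r := e
        return ("skip",)
    if kind == "load" and c[3] == "na" and c[1] in dead_after:       # r := x_na
        return ("skip",)
    if kind == "store" and c[2] == "na" and c[1] in dead_after:      # x_na := _
        return ("skip",)
    return c


def trans_block_d(block, dead_sets):
    """TransB_d: pair every instruction with the set that holds after it."""
    return [trans_instr_d(c, dead) for c, dead in zip(block, dead_sets)]


block = [
    ("store", "x", "na", ("const", 1)),    # x_na := 1
    ("store", "y", "rlx", ("const", 1)),   # y_rlx := 1
    ("store", "x", "na", ("const", 2)),    # x_na := 2
]
dead_sets = [{"x"}, {"x"}, set()]
optimized = trans_block_d(block, dead_sets)
print(optimized)
```

Only non-atomic accesses and register assignments are candidates for elimination in this sketch, so the relaxed store is returned unchanged even though x is in the set that follows it.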

Our dead code elimination optimization supports the following optimizations across the atomic memory access and the fence operation.

• Optimize across acquire read.

$$\begin{array}{lll} \{\mathsf{x}, r\} & & \\ \mathsf{x}_{\mathsf{na}} := 1; & & \mathsf{skip}; \\ \{\mathsf{x}, r\} & & \\ r := \mathsf{y}_{\mathsf{acq}}; & \xrightarrow{\mathit{DCE}} & r := \mathsf{y}_{\mathsf{acq}}; \\ \{\mathsf{x}\} & & \\ \mathsf{x}_{\mathsf{na}} := 2; & & \mathsf{x}_{\mathsf{na}} := 2; \\ \{\} & & \end{array}$$

(\* Not Performed in LLVM \*)

• Optimize across relaxed read.

$$\begin{array}{lll} \{x, r\} & & \\ x_{na} := 1; & & \text{skip}; \\ \{x, r\} & & \\ r := y_{rlx}; & \xrightarrow{DCE} & r := y_{rlx}; \\ \{x\} & & \\ x_{na} := 2; & & x_{na} := 2; \\ \{\} & & \end{array}$$

(\* Performed in LLVM \*)

• Optimize across acquire fence.

$$\begin{array}{lll} \{x\} & & \\ x_{na} := 1; & & \text{skip}; \\ \{x\} & & \\ \text{fence-acq}; & \xrightarrow{\textit{DCE}} & \text{fence-acq}; \\ \{x\} & & \\ x_{na} := 2; & & x_{na} := 2; \\ \{\} & & \end{array}$$

(\* Not Performed in LLVM \*)

• Optimize across CAS with acquire read and relaxed write.

$$\begin{array}{lll} \{\mathsf{x},r\} & & \\ \mathsf{x}_{\mathsf{na}} := 1; & & \mathsf{skip}; \\ \{\mathsf{x},r\} & & \\ r := \mathsf{CAS}_{\mathsf{acq},\mathsf{rlx}}(\mathsf{y},0,1); & \xrightarrow{\mathit{DCE}} & r := \mathsf{CAS}_{\mathsf{acq},\mathsf{rlx}}(\mathsf{y},0,1); \\ \{\mathsf{x}\} & & \\ \mathsf{x}_{\mathsf{na}} := 2; & & \mathsf{x}_{\mathsf{na}} := 2; \\ \{\} & & \end{array}$$

(\* Not Performed in LLVM \*)

• Optimize across relaxed write.

$$\begin{array}{lll} \{x\} & & \\ x_{na} := 1; & & \text{skip}; \\ \{x\} & & \\ y_{rlx} := 1; & \xrightarrow{\textit{DCE}} & y_{rlx} := 1; \\ \{x\} & & \\ x_{na} := 2; & & x_{na} := 2; \\ \{\} & & \end{array}$$

(\* Performed in LLVM \*)

## 10.7 Loop invariant code motion

The implementation of the *loop invariant code motion* is divided into three steps:

- (1) detecting loops and loop invariants in the program;
- (2) inserting pre-header nodes to hoist the evaluations of loop invariants before entering loops;
- (3) reusing *common subexpression elimination* and *dead code elimination* to eliminate redundant reads and writes.

Note that the *loop invariant code motion* optimization does not move division operations out of loops, since hoisting a division may make an originally safe program abort. We use the following example to illustrate the implementation.

while  $(r_1 < 100)$  {  
 $r_2 := z_{na};$   
 $r_1 := r_1 + 1;$   
}

- (1) We find that the instruction "$r_2 := z_{na}$" is a loop invariant;
- (2) We allocate a new register to save the expression in the loop invariant, as shown below.

$$t := z_{na};$$
  
while  $(r_1 < 100)$  {  
 $r_2 := z_{na};$   
 $r_1 := r_1 + 1;$   
}

- (3) We use the common subexpression elimination optimization to eliminate redundant reads.

$$t := z_{na};$$
  
while  $(r_1 < 100)$  {  
 $r_2 := t;$   
 $r_1 := r_1 + 1;$   
}

Detecting loops. To detect the loops in the code, we need to evaluate the dominators of each block.

(Dominators) 
$$\mathbb{D} \in Lab \rightarrow \mathcal{P}(Lab)$$

We use the data flow analysis to evaluate the dominators of each block (where n is a very large constant).

$$\begin{array}{l} \mathsf{Dominator}(C,l_0) \ \triangleq \ \mathsf{Dominator}'(C,\mathbb{D}_0, \mathsf{dom}(C), n) \\ \quad where \ \mathbb{D}_0 = \{l \leadsto \mathsf{dom}(C) \mid l \in \mathsf{dom}(C)\} \{l_0 \leadsto \emptyset\} \\ \mathsf{Dominator}'(C,\mathbb{D},W,n) \ \triangleq \\ \quad \left\{ \begin{array}{ll} \mathsf{Dominator}'(C,\mathbb{D}\{l \leadsto D\},W',n-1) & \text{if } l \in W, D = \bigcap\limits_{l_p \in \mathsf{pred}(C,l)} (\mathbb{D}(l_p) \cup \{l_p\}), \\ & (D \neq \mathbb{D}(l) \implies W' = ((W \setminus \{l\}) \cup \mathsf{succ}(l))), \\ & (D = \mathbb{D}(l) \implies W' = (W \setminus \{l\})) \\ \mathbb{D} & \text{otherwise} \end{array} \right. \end{array}$$

After evaluating the dominators of each block, we can find the loops. $l_{entry}$ and $l_{exit}$ are the entry and the exit of a loop if $l_{exit}$ points to $l_{entry}$ and $l_{entry}$ dominates $l_{exit}$. In this case, the edge from $l_{exit}$ to $l_{entry}$ forms a back edge.

$$\mathsf{back\_edge}(l_{exit}, l_{entry}, \mathbb{D}, C) \triangleq l_{entry} \in \mathsf{succ}(C(l_{exit})) \land (l_{entry} \in \mathbb{D}(l_{exit}))$$
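The dominator computation and the back-edge test can be sketched in Python with a standard worklist iteration. The sketch makes several assumptions: the control-flow graph is given as a dictionary from each label to the list of its successors, every label appears as a key, the entry label always keeps the empty dominator set, and the iteration runs until the worklist is empty instead of using a fuel parameter n.

```python
def dominators(succ, entry):
    """Map each label to the set of labels that strictly dominate it."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)

    dom = {n: set(nodes) for n in nodes}   # start from the full set
    dom[entry] = set()                     # nothing dominates the entry
    work = nodes - {entry}
    while work:
        n = work.pop()
        if pred[n]:
            new = set.intersection(*[dom[p] | {p} for p in pred[n]])
        else:
            new = set(nodes)               # unreachable label: keep the full set
        if new != dom[n]:
            dom[n] = new
            work |= set(succ[n]) - {entry}
    return dom


def back_edges(succ, dom):
    """All pairs (l_exit, l_entry) such that l_exit -> l_entry is a back edge."""
    return [(l_exit, l_entry)
            for l_exit in sorted(succ)
            for l_entry in succ[l_exit]
            if l_entry in dom[l_exit]]


# 0 -> 1, 1 -> 2 | 3, 2 -> 1 (loop), 3 is the exit of the function
cfg = {0: [1], 1: [2, 3], 2: [1], 3: []}
dom = dominators(cfg, 0)
print(dom)
print(back_edges(cfg, dom))
```

On this graph the only back edge goes from label 2 to label 1, so label 1 is the entry of the loop and label 2 is its exit.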

Fig. 33. The result of detecting loops and loop invariants

The blocks in the loop whose entry and exit are  $l_{entry}$  and  $l_{exit}$  are evaluated below. natural\_loop defined below returns the identifiers of blocks in the loop. The function det\_loops returns the loops in the code C. A block with identifier l is in a loop, whose entry and exit are  $l_{entry}$  and  $l_{exit}$ , if  $l_{entry}$  is the dominator of l and l can reach the exit  $l_{exit}$ .

$$\begin{array}{lcl} \mathsf{Reach}(l,l',C,l_{entry}) & \triangleq & \exists l_0,\dots,l_n \in \mathsf{dom}(C). \ (\forall i \in \{1,\dots,n\}.\ l_i \in \mathsf{succ}(C(l_{i-1})) \land l_{i-1} \neq l_{entry}) \\ & & \land \ l_0 \in \mathsf{succ}(C(l)) \land l' \in \mathsf{succ}(C(l_n)) \\ \mathsf{natural\_loop}(l_{entry},l_{exit},C,\mathbb{D}) & \triangleq & \{l \mid l_{entry} \in \mathbb{D}(l) \land \mathsf{Reach}(l,l_{exit},C,l_{entry})\} \\ \mathsf{det\_loops}(C,l_0) & \triangleq & \{(l_{entry},l_{exit},ls) \mid \mathsf{back\_edge}(l_{exit},l_{entry},\mathbb{D},C) \land l_{entry} \in \mathsf{dom}(C) \land l_{exit} \in \mathsf{dom}(C) \\ & & \ \land \ ls = \mathsf{natural\_loop}(l_{entry},l_{exit},C,\mathbb{D})\} \\ & & where \ \mathbb{D} = \mathsf{Dominator}(C,l_0) \end{array}$$

Loop invariants. A loop invariant is a non-atomic read of a variable or an evaluation of an expression whose result is the same on every iteration of the loop. B[i] represents the i-th instruction in the block B. We use  $a \in ls$  to represent that a is an element of the list ls. The evaluation of the expression e is a loop invariant if the registers in e are not updated in the loop. The reading of a variable x is a loop invariant if there is no write to x in the loop. The parameter $\mathbb{B}$ in loop\_invB is the set of blocks in the loop.

$$\begin{array}{l} \mathsf{loop\_invB}(B,\mathbb{B},RS,\mathsf{loop\_inv},\iota) \ \triangleq \\ \quad \left\{ \begin{array}{ll} \mathsf{loop\_invB}(B',\mathbb{B},RS \cup \{r'\},(e,r') :: \mathsf{loop\_inv},\iota) & if \ B = (r := e) :: B', \\ & (\forall r_0 \in \mathsf{fv}(e). \ \neg (\exists B_0 \in \mathbb{B},i. \ B_0[i] = (r_0 := \_))), \\ & (e,\_) \notin \mathsf{loop\_inv},r' \notin RS \\ \mathsf{loop\_invB}(B',\mathbb{B},RS \cup \{r'\},(\mathsf{x},r') :: \mathsf{loop\_inv},\iota) & elif \ B = (r := \mathsf{x_{na}}) :: B',\mathsf{x} \notin \iota, \\ & \neg (\exists B_0 \in \mathbb{B},i. \ B_0[i] = (\mathsf{x}_{o_w} := e)), \\ & (\mathsf{x},\_) \notin \mathsf{loop\_inv},r' \notin RS \\ \mathsf{loop\_invB}(B',\mathbb{B},RS,\mathsf{loop\_inv},\iota) & elif \ B = c :: B' \\ (\mathsf{loop\_inv},RS) & otherwise \end{array} \right. \\ \mathsf{loop\_invBS}(\mathbb{B}_0,\mathbb{B},RS,\mathsf{loop\_inv},\iota) \ \triangleq \\ \quad \left\{ \begin{array}{ll} \mathsf{loop\_invBS}(\mathbb{B}_0',\mathbb{B},RS',\mathsf{loop\_inv}',\iota) & if \ B \in \mathbb{B}_0,\mathbb{B}_0' = \mathbb{B}_0 \backslash \{B\}, \\ & \mathsf{loop\_invB}(B,\mathbb{B},RS,\mathsf{loop\_inv},\iota) = (\mathsf{loop\_inv}',RS') \\ (\mathsf{loop\_inv},RS) & otherwise \end{array} \right. \end{array}$$
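The detection of loop invariants described above can be approximated by the following Python sketch. It rests on several assumptions that are not taken from the formal definition: instructions are encoded as tuples, $\iota$ is modelled as a set `iota` of variables excluded from hoisting, a register counts as updated when it is the target of an assignment or of a load, and fresh registers are named `t0`, `t1`, and so on.

```python
def fv(e):
    """Registers occurring in an expression encoded as nested tuples."""
    tag = e[0]
    if tag == "reg":
        return {e[1]}
    if tag == "const":
        return set()
    return fv(e[1]) | fv(e[2])


def loop_invariants(blocks, iota, used_regs):
    """Return a list of (invariant, fresh register) pairs for one loop."""
    instrs = [c for block in blocks for c in block]
    written_regs = {c[1] for c in instrs if c[0] in ("assign", "load")}
    written_vars = {c[1] for c in instrs if c[0] == "store"}
    used = set(used_regs)

    def fresh():
        i = 0
        while f"t{i}" in used:
            i += 1
        used.add(f"t{i}")
        return f"t{i}"

    invariants, seen = [], set()
    for c in instrs:
        if c[0] == "assign" and not (fv(c[2]) & written_regs):
            item = c[2]                          # invariant expression e
        elif (c[0] == "load" and c[3] == "na"
              and c[2] not in iota and c[2] not in written_vars):
            item = ("var", c[2])                 # invariant non-atomic read of x
        else:
            continue
        if item not in seen:                     # record each invariant once
            seen.add(item)
            invariants.append((item, fresh()))
    return invariants


# Body of: while (r1 < 100) { r2 := z_na; r1 := r1 + 1; }
body = [[
    ("load", "r2", "z", "na"),
    ("assign", "r1", ("+", ("reg", "r1"), ("const", 1))),
]]
result = loop_invariants(body, iota=set(), used_regs={"r1", "r2"})
print(result)
```

For the running example only the read of z is reported, since r1 is updated inside the loop.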

loop\_invC defined below returns the loop invariants of each loop in the code *C*.

$$\begin{array}{l} \mathsf{loop\_invC'}(lps, RS, C, \iota) \ \triangleq \\ \quad \left\{ \begin{array}{ll} \{(l_{entry}, l_{exit}, \mathsf{loop\_inv})\} \cup \mathsf{loop\_invC'}(lps', RS', C, \iota) & \text{if } lps = (\{(l_{entry}, l_{exit}, ls)\} \cup lps'), (l_{entry}, l_{exit}, ls) \notin lps', \\ & \mathbb{B} = \{B \mid C(l) = B \land l \in ls\}, \\ & \mathsf{loop\_invBS}(\mathbb{B}, \mathbb{B}, RS, \epsilon, \iota) = (\mathsf{loop\_inv}, RS') \\ \emptyset & \text{otherwise} \end{array} \right. \\ \mathsf{loop\_invC}(C, l_0, \iota) \ \triangleq \ \mathsf{loop\_invC'}(lps, \mathsf{fv}(C), C, \iota) \\ \quad where \ lps = \mathsf{det\_loops}(C, l_0) \end{array}$$

We define det\_loop\_inv to evaluate the loop invariants in each function.

$$det\_loop\_inv(\pi, \iota) \triangleq \{f \leadsto loop\_invC(C, l_0, \iota) \mid \pi(f) = (C, l_0)\}$$

The implementation of detecting loops and loop invariants needs to ensure the following property.

LEMMA 10.1 (Well-formed detecting loops and loop invariants).

$$\begin{split} \forall \pi, \mathsf{loops\_P}, \mathsf{f}, l_{entry}, l_{exit}, C, \iota. \\ & \mathsf{det\_loop\_inv}(\pi, \iota) = \mathsf{loops\_P} \land \\ & (l_{entry}, l_{exit}, \mathsf{loop\_inv}) \in \mathsf{loops\_P}(\mathsf{f}) \land \pi(\mathsf{f}) = (C, \_) \land \\ & (\_, r) \in \mathsf{loop\_inv} \\ \Longrightarrow r \notin \mathsf{fv}(C) \land l_{entry} \in \mathsf{dom}(C) \land l_{exit} \in \mathsf{dom}(C) \land \\ & (\forall (\mathsf{x}, \_) \in \mathsf{loop\_inv}. \ \mathsf{x} \notin \iota) \end{split}$$

PROOF. Lemma 10.1 follows directly from the implementation of det\_loop\_inv. Since we always allocate a new register to save the result of a loop invariant, we have  $r \notin \mathsf{fv}(C)$. According to the definition of det\_loops, we have  $l_{entry} \in \mathsf{dom}(C)$  and  $l_{exit} \in \mathsf{dom}(C)$. According to the definition of loop\_invB, only a non-atomic read whose result is the same on every iteration of the loop is viewed as an invariant, so we have  $\forall (x, \_) \in \mathsf{loop\_inv}.\ x \notin \iota$.

We define the allocation of pre-headers below, in two steps. Consider a loop of the following form.



(1) We first allocate a new block according to the loop invariants of such loop as the pre-header of the entry of such loop.



(2) We let the nodes that are not the exit but point to the entry node of the loop point to the pre-header.



We first give the definition of allocating pre-header.

$$\text{alloc\_ph(loop\_inv}, l_{entry}) \triangleq \begin{cases} r := e, \text{alloc\_ph(loop\_inv'}, l_{entry}) & \text{if loop\_inv} = (e, r) :: \text{loop\_inv'} \\ r := x_{\text{na}}, \text{alloc\_ph(loop\_inv'}, l_{entry}) & \text{if loop\_inv} = (x, r) :: \text{loop\_inv'} \\ \text{jmp } l_{entry} & \text{otherwise} \end{cases}$$

$$\text{consInv(loop\_inv}, B) \triangleq \begin{cases} \text{consInv(loop\_inv'}, (r := e, B)) & \text{if loop\_inv} = (e, r) :: \text{loop\_inv'} \\ \text{consInv(loop\_inv'}, (r := x_{\text{na}}, B)) & \text{if loop\_inv} = (x, r) :: \text{loop\_inv'} \\ B & \text{otherwise} \end{cases}$$
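The construction of a fresh pre-header block by alloc\_ph can be sketched in Python as follows. The tuple encoding of instructions and the representation of an invariant read of x as `("var", x)` are assumptions of this illustration.

```python
def alloc_ph(loop_inv, l_entry):
    """Build a pre-header: one instruction per invariant, then jump to the entry."""
    block = []
    for item, r in loop_inv:
        if item[0] == "var":
            block.append(("load", r, item[1], "na"))   # r := x_na
        else:
            block.append(("assign", r, item))          # r := e
    block.append(("jmp", l_entry))
    return block


# One invariant read of z saved in the fresh register t, loop entry labelled "L1".
pre_header = alloc_ph([(("var", "z"), "t")], "L1")
print(pre_header)
```

The resulting block evaluates every invariant once and then transfers control to the loop entry, which matches step (1) of the allocation described above.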

We give a mapping that records the pre-header of the entry of each loop.

(PreHeader) 
$$\text{pre-header} \in Lab \rightarrow Lab$$

We define the transformation for a function in *loop invariant code motion* formally below.

$$\begin{array}{l} \mathsf{TransC}'(C, \text{pre-header}, \mathsf{loops}, \mathsf{loops}_0) \ \triangleq \\ \quad \left\{ \begin{array}{ll} \mathsf{TransC}'(C', \text{pre-header}, \mathsf{loops}', \mathsf{loops}_0) & \textit{if } (l_{entry}, l_{exit}, \mathsf{loop\_inv}) \uplus \mathsf{loops}' = \mathsf{loops}, \\ & \text{pre-header}(l_{entry}) = l_{ph}, C(l_{ph}) = B_{ph}, \\ & B' = \mathsf{consInv}(\mathsf{loop\_inv}, B_{ph}) \textit{ and } C' = C\{l_{ph} \leadsto B'\} \\ \mathsf{TransC}'(C', \text{pre-header}', \mathsf{loops}', \mathsf{loops}_0) & \textit{if } (l_{entry}, l_{exit}, \mathsf{loop\_inv}) \uplus \mathsf{loops}' = \mathsf{loops}, \\ & \text{pre-header}(l_{entry}) = \bot, l_{ph} \notin \mathsf{dom}(C), \\ & B_{ph} = \mathsf{alloc\_ph}(\mathsf{loop\_inv}, l_{entry}), \\ & \text{pre-header}' = \text{pre-header}\{l_{entry} \leadsto l_{ph}\} \textit{ and } \\ & C' = \mathsf{ptC\text{-}ph}(C, l_{ph}, l_{entry}, \mathsf{loops}_0) \cup \{l_{ph} \leadsto B_{ph}\} \\ (C, \text{pre-header}) & \textit{otherwise} \end{array} \right. \end{array}$$

$$\mathsf{TransC}(C, l_0, \mathsf{loops}) \ \triangleq \left\{ \begin{array}{ll} (C', l_0') & \textit{if } (C', \mathsf{pre-header}) = \mathsf{TransC}'(C, \emptyset, \mathsf{loops}, \mathsf{loops}) \textit{ and } \\ & \mathsf{pre-header}(l_0) = l_0' \\ (C', l_0) & \textit{if } (C', \mathsf{pre-header}) = \mathsf{TransC}'(C, \emptyset, \mathsf{loops}, \mathsf{loops}) \textit{ and } \\ & \mathsf{pre-header}(l_0) = \bot \\ & \mathsf{undef} & \textit{otherwise} \end{array} \right.$$

We give the transformation for a program in loop invariant code motion formally below.

$$\begin{array}{lcl} \mathsf{LInv}(\pi, \iota) & \triangleq & \{\mathsf{f} \leadsto (C', l_0') \mid \pi(\mathsf{f}) = (C, l_0) \land \mathsf{loops\_P}(\mathsf{f}) = \mathsf{loops} \land \mathsf{TransC}(C, l_0, \mathsf{loops}) = (C', l_0') \} \\ & & \textit{where } \mathsf{loops\_P} = \mathsf{det\_loop\_inv}(\pi, \iota) \\ \mathsf{LICM} & \triangleq & \mathsf{LInv} \circ \mathsf{CSE} \end{array}$$

## 10.8 Common subexpression elimination

• Transformation for an individual instruction.

$$\mathsf{TransI}_\mathit{cse}(c, L_a) \ \triangleq \left\{ \begin{array}{ll} r := r' & if \ c = (r := e) \wedge (r', e) \in L_a \\ r := r' & if \ c = (r := \mathsf{x}_\mathsf{na}) \wedge (r', \mathsf{x}) \in L_a \\ c & otherwise \end{array} \right.$$

• Transformation for a basic code block.

$$\mathsf{TransB}_{\mathit{cse}}(B, LB) \triangleq \begin{cases} \mathsf{TransI}_{\mathit{cse}}(c, L_a) :: \mathsf{TransB}_{\mathit{cse}}(B', LB') & \textit{if } B = c :: B' \land LB = L_a :: LB' \\ B & \textit{otherwise} \end{cases}$$

• Transformation for a code heap.

$$\mathsf{TransC}_{\mathit{cse}}(C, \mathbb{L}) \triangleq \{l \leadsto \mathsf{TransB}_{\mathit{cse}}(B, \mathbb{L}(l)) \mid C(l) = B\}$$

• Transformation for a program.

$$\mathsf{Translater}_{\mathit{cse}}(\pi, A) \triangleq \{\mathsf{f} \leadsto \mathsf{TransC}_{\mathit{cse}}(C, \mathbb{L}) \mid \pi(\mathsf{f}) = (C, l') \land A(\mathsf{f}) = \mathbb{L}\}$$

• Implementation of common subexpression elimination.

$$\mathsf{CSE}(\pi, \iota) \triangleq \mathsf{Translater}_{\mathit{cse}}(\pi, A) \quad \textit{where } A = \mathsf{Ave\_Analyzer}(\pi)$$
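The instruction-level and block-level definitions above can be read as a simple rewriting pass. The following Python sketch is only an illustration under assumptions that are not part of the paper: instructions are encoded as tuples such as `("assign", r, e)` and `("load_na", r, x)`, each analysis result $L_a$ is a set of pairs describing the facts assumed to hold before the instruction, and when several registers hold the same value the sketch picks the lexicographically smallest one.

```python
from typing import List, Set, Tuple

Instr = Tuple[str, ...]          # assumed encoding, e.g. ("load_na", "r", "x")
Avail = Set[Tuple[str, str]]     # pairs (r', e) or (r', x): r' already holds e or x


def trans_i_cse(c: Instr, avail: Avail) -> Instr:
    """One instruction: reuse a register if the analysis says the value is available."""
    if c[0] in ("assign", "load_na"):
        _, r, rhs = c
        holders = sorted(r2 for (r2, e) in avail if e == rhs)
        if holders:
            # r := e (or r := x_na) becomes r := r'
            return ("assign", r, holders[0])
    return c  # atomic accesses, fences and everything else stay unchanged


def trans_b_cse(block: List[Instr], avails: List[Avail]) -> List[Instr]:
    """One basic block: pair each instruction with the set assumed to hold before it."""
    if not block or not avails:
        return list(block)  # the 'otherwise' case: leave the remainder as it is
    return [trans_i_cse(block[0], avails[0])] + trans_b_cse(block[1:], avails[1:])


# A non-atomic load of x, a relaxed load of y, then a second non-atomic load of x.
source = [("load_na", "r", "x"), ("load_rlx", "r1", "y"), ("load_na", "r2", "x")]
analysis = [set(), {("r", "x")}, {("r", "x")}]
target = trans_b_cse(source, analysis)
```

In this run the second non-atomic load of `x` is replaced by a register copy from `r`, while the relaxed load in between is left untouched.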

Our common subexpression elimination supports the following optimizations across atomic memory accesses and fence operations.

• Optimization across release store.

• Optimization across relaxed store.

$$\begin{array}{l} r := x_{na}; \\ \{(r, x)\} \\ y_{rlx} := 1; \\ \{(r, x)\} \\ r' := x_{na}; \\ \{(r, x), (r, r'), (r', x)\} \end{array} \quad \xrightarrow{\mathsf{CSE}} \quad \begin{array}{l} r := x_{na}; \\ y_{rlx} := 1; \\ r' := r; \end{array}$$

• Optimization across release fence.

• Optimization across CAS with relaxed read and release write.

• Optimization across relaxed read.

$$\begin{array}{l} r := x_{na}; \\ \{(r, x)\} \\ r_1 := y_{rlx}; \\ \{(r, x)\} \\ r' := x_{na}; \\ \{(r, x), (r, r'), (r', x)\} \end{array} \quad \xrightarrow{\mathsf{CSE}} \quad \begin{array}{l} r := x_{na}; \\ r_1 := y_{rlx}; \\ r' := r; \end{array}$$

$$\frac{M_t = M_s \quad S_t = S_s \quad \|M_t\| = \operatorname{dom}(\varphi) \quad (\forall (\mathsf{x}, t) \in \operatorname{dom}(\varphi).\ \varphi(\mathsf{x}, t) = t)}{I_{cp}(\varphi, (S_t, M_t), (S_s, M_s))}$$

Fig. 34. Invariant in constant propagation proof

$$(R, \mathcal{V}, M) \models \{r \leadsto v\} \quad ::= \quad R(r) = v$$

$$(R, \mathcal{V}, M) \models \{x \leadsto v\} \quad ::= \quad \exists t. \ \mathcal{V}. \text{cur.} T_{\text{na}}(x) = t \land \langle x : v@(\_, t], \_) \in M$$

$$(R, \mathcal{V}, M) \models_{t} L_{v} \quad ::= \quad (\forall r, v. \ L_{v}(r) = v \implies (R, \mathcal{V}, M) \models_{t} \{r \leadsto v\}) \land \\ \quad (\forall x, v. \ L_{v}(x) = v \implies ((R, \mathcal{V}, M) \models_{t} \{x \leadsto v\} \land x \notin t))$$

$$\frac{\begin{array}{c} \mathsf{Val\_Analyzer}(C_{s}, l_{0}) = \mathbb{L}_{v} \quad \mathsf{TransC}_{c}(C_{s}, \mathbb{L}_{v}) = C_{t} \\ B_{s} = C_{s}(l')[i \dots] \quad \mathsf{TransB}_{c}(B_{s}, LB_{v}) = B_{t} \\ R_{t} = R_{s} \quad (R_{s}, \mathcal{V}_{s}, M_{s}) \models_{t} \mathsf{IN}[LB_{v}] \\ \forall l_{p} \in \mathsf{succ}(B_{s}).\ \mathsf{OUT}[LB_{v}] \geq \mathsf{IN}[\mathbb{L}_{v}(l_{p})] \end{array}}{\mathcal{V}_{s}, M_{s}, \iota \vdash (R_{t}, B_{t}, C_{t}) \sim_{cp} (R_{s}, B_{s}, C_{s})}$$

$$\frac{K_{t} = K_{s} = \epsilon}{\iota \vdash K_{t} \sim_{cp} K_{s}} \quad \iota \vdash ((R_{t}, B_{t}, C_{t}) :: K'_{t}) \sim_{cp} ((R_{s}, B_{s}, C_{s}) :: K'_{s})$$

$$PVal\_\text{Analyzer}(\pi_{s}) = A \quad \text{Translater}_{c}(\pi_{s}, A) = \pi_{t}$$

$$\frac{\mathcal{V}_{s}, M_{s}, \iota \vdash ((R_{t}, B_{t}, C_{t}) :: K'_{t}) \sim_{cp} ((R_{s}, B_{s}, C_{s}) :: K'_{s})}{\iota \vdash (R_{t}, B_{t}, C_{t}, K_{t}, \pi_{t}) \sim_{cp} (R_{s}, B_{s}, C_{s}, K_{s}, \pi_{s})}$$

$$\frac{\mathcal{V}_{s}, M_{s}, \iota \vdash \sigma_{t} \sim_{cp} \sigma_{s} \quad \mathcal{V}_{t} = \mathcal{V}_{s} \quad P_{t} = P_{s}}{\iota \vdash (\sigma_{t}, \mathcal{V}_{t}, P_{t}) \sim_{cp} ((\sigma_{s}, \mathcal{V}_{s}, P_{s}), M_{s})}$$

Fig. 35. Match state in constant propagation proof

# 11 CORRECTNESS PROOF OF OPTIMIZERS

In this section, we present the correctness proofs of constant propagation, dead code elimination, loop invariant code motion and common subexpression elimination.

### 11.1 Correctness proof of Constant Propagation

*Invariant in constant propagation proof.* We show the invariant $I_{cp}$ for shared resource in Fig. 34.

*Match state in constant propagation proof.* We define the match state in constant propagation proof in Fig. 35.

*Correctness proof of constant propagation optimizer.* We present the correctness proof of constant propagation optimizer in the following.

LEMMA 11.1 (WELL-DEFINED CONSTANT PROPAGATION).

$$\forall \pi_s, \pi_t, \iota.\ \mathsf{ConstProp}(\pi_s, \iota) = \pi_t \implies I_{cp}, \iota \models \pi_t \preccurlyeq \pi_s$$

PROOF. From the premise, we have the following.

$$ConstProp(\pi_s, \iota) = \pi_t \tag{1}$$

We unfold the proof goal and need to prove that the following subgoals hold.

$$I_{cp}(\iota, \varphi_0, (\mathcal{S}_\perp, M_0), (\mathcal{S}_\perp, M_0)) \tag{g-1}$$

$$\forall \sigma_t, f.\ \mathsf{Init}(\pi_t, f) = \sigma_t \implies \exists \sigma_s.\ \big(\mathsf{Init}(\pi_s, f) = \sigma_s \land I_{cp}, \iota \models ((\sigma_t, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_0) \preccurlyeq_{\varphi_0}^{\circ, \emptyset} ((\sigma_s, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_0)\big) \tag{g-2}$$

The subgoal (g-1) can be proved by definitions directly.

We consider the correctness proof of the subgoal (g-2). We have the following.

$$I_{cp}(\iota, \varphi_0, (\mathcal{S}_\perp, M_0, \mathcal{S}_\perp, M_0)) \tag{2}$$

$$Init(\pi_t, f) = \sigma_t \tag{3}$$

We unfold (3) and have that there exist $C_t$, $l_0$ and $B_t$ such that:

$$\pi_t(f) = (C_t, l_0) \wedge C_t(l_0) = B_t \tag{4}$$

$$\sigma_t = (R_\perp, B_t, C_t, \epsilon, \pi_t) \tag{5}$$

We unfold (1) and have that there exists *A* such that:

$$\mathsf{PVal\_Analyzer}(\pi_s) = A \tag{6}$$

$$Translater_c(\pi_s, A) = \pi_t \tag{7}$$

From (4) and (7), we have that there exist $C_s$, $B_s$ and $\sigma_s$ such that:

$$\pi_s(f) = (C_s, l_0) \wedge C_s(l_0) = B_s \tag{8}$$

$$\mathsf{TransC}_c(C_s, A(f)) = C_t \tag{9}$$

$$\mathsf{TransB}_c(B_s, A(f)(l_0)) = B_t \tag{10}$$

$$Init(\pi_s, f) = \sigma_s \tag{11}$$

$$\sigma_{s} = (R_{\perp}, B_{s}, C_{s}, \epsilon, \pi_{s}) \tag{12}$$

From (2), (4), (5), (6), (7), (8), (9), (10) and (12), we prove that the following holds.

$$\Phi_{cp}(\varphi_0, \iota, ((\sigma_t, \mathcal{V}_\perp, \emptyset), \mathcal{S}_\perp, M_0), ((\sigma_s, \mathcal{V}_\perp, \emptyset), \mathcal{S}_\perp, M_0), \circ, \emptyset) \tag{13}$$

By applying Lemma 11.2 to (13), we obtain the following, which together with (11) proves (g-2).

$$I_{cp}, \iota \models ((\sigma_t, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_0) \preccurlyeq^{\circ, \emptyset}_{\varphi_0} ((\sigma_s, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_0)$$

LEMMA 11.2 (MATCH STATE IMPLIES SIMULATION - CONSTPROP).

$$\begin{aligned} &\forall \varphi, \iota, TS_t, \mathcal{S}_t, M_t, TS_s, \mathcal{S}_s, M_s, \beta. \\ &\quad \Phi_{cp}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \beta) \\ &\quad \Longrightarrow I_{cp}, \iota \models (TS_t, \mathcal{S}_t, M_t) \preccurlyeq_{\varphi}^{\beta, \emptyset} (TS_s, \mathcal{S}_s, M_s) \end{aligned}$$

PROOF. By co-induction. From the premise, we know

$$\Phi_{cp}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \beta) \tag{1}$$

We need to prove that the following hold.

(1) for any  $TS'_t$ ,  $S'_t$ ,  $M'_t$  and te, if

$$\iota \vdash (TS_t, \mathcal{S}_t, M_t) \xrightarrow{te} (TS_t', \mathcal{S}_t', M_t') \tag{2}$$

then, we need to prove that the following hold:

• if $te \in AT$, there exist $TS'_s$, $\mathcal{S}'_s$, $M'_s$ and $\varphi'$ such that

$$\iota \vdash (TS_s, \mathcal{S}_s, M_s) \xrightarrow{na}^* \xrightarrow{te} (TS'_s, \mathcal{S}'_s, M'_s) \tag{g1.1}$$

$$\varphi \subseteq \varphi' \wedge I_{cp}(\iota, \varphi', (\mathcal{S}'_t, M'_t, \mathcal{S}'_s, M'_s)) \tag{g1.2}$$

$$I_{cp}, \iota \models (TS'_t, \mathcal{S}'_t, M'_t) \preccurlyeq^{\circ, \emptyset}_{\varphi'} (TS'_s, \mathcal{S}'_s, M'_s) \tag{g1.3}$$

Applying Lemma 11.3 to (1) and (2), we obtain the source step required by (g1.1) and the preservation of the match state $\Phi_{cp}$:

$$\Phi_{cp}(\varphi', \iota, (TS'_t, \mathcal{S}'_t, M'_t), (TS'_s, \mathcal{S}'_s, M'_s), \circ) \tag{3}$$

We prove the subgoals (g1.2) and (g1.3) by the co-inductive hypothesis and (3).

• if $te \in NA$, there exist $TS'_s$, $\mathcal{S}'_s$, $M'_s$ and $\mathcal{D}_1$ such that:

$$(TS_t.P, M_t), (TS_t'.P, M_t') \vdash \emptyset \stackrel{te}{\sim} \mathcal{D}_1 \tag{g2.1}$$

$$\iota \vdash (TS_s, \mathcal{S}_s, M_s, \mathcal{D}_1) \xrightarrow{na} (TS'_s, \mathcal{S}'_s, M'_s, \emptyset) \tag{g2.2}$$

$$I_{cp}, \iota \models (TS'_t, \mathcal{S}'_t, M'_t) \preccurlyeq_{\varphi}^{\bullet, \emptyset} (TS'_s, \mathcal{S}'_s, M'_s) \tag{g2.3}$$

If $te \in \{R(na, x, v), W(na, x, v)\}$, the subgoals (g2.1), (g2.2) and (g2.3) are proved by applying Lemma 11.4 to (1) and (2). If $te = \tau$, they are proved by applying Lemma 11.5 to (1) and (2). In both cases, the match state $\Phi_{cp}$ is preserved:

$$\Phi_{cp}(\varphi, \iota, (TS'_t, \mathcal{S}'_t, M'_t), (TS'_s, \mathcal{S}'_s, M'_s), \bullet) \tag{4}$$

We prove the subgoal (g2.3) by the co-inductive hypothesis and (4).

• if $te \in \{\text{prm, rsv, ccl}\}$, the proof is similar to that of the case $te \in (Atm \cup \text{out}(v))$, so we omit it.

(2) if $\beta = \circ$, let $\mathbb{S} = (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s)$ and, for any $\varphi'$ and $\mathbb{S}' = (\mathcal{S}_t', M_t', \mathcal{S}_s', M_s')$, if

$$R(\iota, (\varphi, \mathbb{S}), (\varphi', \mathbb{S}'), TS_t.P, TS_s.P) \wedge I_{cp}(\iota, \varphi', \mathbb{S}') \tag{5}$$

we need to prove that the following holds:

$$I_{cp}, \iota \models (TS_t, \mathcal{S}'_t, M'_t) \preccurlyeq^{\circ, \emptyset}_{\varphi'} (TS_s, \mathcal{S}'_s, M'_s) \tag{g3.2}$$

By applying Lemma 11.6 to (1) and (5), we have the following.

$$\Phi_{cp}(\varphi', \iota, (TS_t, \mathcal{S}'_t, M'_t), (TS_s, \mathcal{S}'_s, M'_s), \circ) \tag{6}$$

We prove the subgoal (g3.2) by the co-inductive hypothesis and (6).

(3) if $\iota \vdash (TS_t, \mathcal{S}_t, M_t) \longrightarrow \mathbf{done}$, the proof is straightforward and we omit the details.

(4) if $\iota \vdash (TS_t, \mathcal{S}_t, M_t) \longrightarrow \mathbf{abort}$, there exist $TS'_s$, $\mathcal{S}'_s$ and $M'_s$ such that:

$$\iota \vdash (TS_s, \mathcal{S}_s, M_s) \xrightarrow{na}^* (TS_s', \mathcal{S}_s', M_s') \tag{7}$$

$$\iota \vdash (TS'_s, \mathcal{S}'_s, M'_s) \longrightarrow \mathbf{abort} \tag{8}$$

We finish the proof of this case by applying Lemma 11.7 to (1), taking $(TS'_s, \mathcal{S}'_s, M'_s) = (TS_s, \mathcal{S}_s, M_s)$ so that (7) holds with zero steps.

LEMMA 11.3 (MATCH STATE CP PRESERVING - ATOMIC & OUTPUT).

$$\begin{aligned} &\forall \varphi, \iota, TS_{t}, \mathcal{S}_{t}, M_{t}, TS_{s}, \mathcal{S}_{s}, M_{s}, \beta, TS'_{t}, \mathcal{S}'_{t}, M'_{t}, te. \\ &\quad \Phi_{cp}(\varphi, \iota, (TS_{t}, \mathcal{S}_{t}, M_{t}), (TS_{s}, \mathcal{S}_{s}, M_{s}), \beta) \wedge {} \\ &\quad \iota \vdash (TS_{t}, \mathcal{S}_{t}, M_{t}) \xrightarrow{te} (TS'_{t}, \mathcal{S}'_{t}, M'_{t}) \wedge te \in AT \\ &\quad \Longrightarrow \exists TS'_{s}, \mathcal{S}'_{s}, M'_{s}, \varphi'. \\ &\qquad \iota \vdash (TS_{s}, \mathcal{S}_{s}, M_{s}) \xrightarrow{te} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}) \wedge {} \\ &\qquad \varphi \subseteq \varphi' \wedge \Phi_{cp}(\varphi', \iota, (TS'_{t}, \mathcal{S}'_{t}, M'_{t}), (TS'_{s}, \mathcal{S}'_{s}, M'_{s}), \circ) \end{aligned}$$

LEMMA 11.4 (MATCH STATE CP PRESERVING - NON-ATOMIC READ/WRITE).

$$\begin{aligned} &\forall \varphi, \iota, TS_{t}, \mathcal{S}_{t}, M_{t}, TS_{s}, \mathcal{S}_{s}, M_{s}, \beta, TS'_{t}, \mathcal{S}'_{t}, M'_{t}, te. \\ &\quad \Phi_{cp}(\varphi, \iota, (TS_{t}, \mathcal{S}_{t}, M_{t}), (TS_{s}, \mathcal{S}_{s}, M_{s}), \beta) \wedge {} \\ &\quad \iota \vdash (TS_{t}, \mathcal{S}_{t}, M_{t}) \xrightarrow{te} (TS'_{t}, \mathcal{S}'_{t}, M'_{t}) \wedge te \in \{\mathsf{R}(\mathsf{na}, \mathsf{x}, \_), \mathsf{W}(\mathsf{na}, \mathsf{x}, \_)\} \\ &\quad \Longrightarrow \exists TS'_{s}, \mathcal{S}'_{s}, M'_{s}. \\ &\qquad \iota \vdash (TS_{s}, \mathcal{S}_{s}, M_{s}) \xrightarrow{te} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}) \wedge {} \\ &\qquad \Phi_{cp}(\varphi, \iota, (TS'_{t}, \mathcal{S}'_{t}, M'_{t}), (TS'_{s}, \mathcal{S}'_{s}, M'_{s}), \bullet) \end{aligned}$$

LEMMA 11.5 (MATCH STATE CP PRESERVING - TAU).

$$\begin{aligned} &\forall \varphi, \iota, TS_{t}, \mathcal{S}_{t}, M_{t}, TS_{s}, \mathcal{S}_{s}, M_{s}, \beta, TS'_{t}, \mathcal{S}'_{t}, M'_{t}. \\ &\quad \Phi_{cp}(\varphi, \iota, (TS_{t}, \mathcal{S}_{t}, M_{t}), (TS_{s}, \mathcal{S}_{s}, M_{s}), \beta) \wedge {} \\ &\quad \iota \vdash (TS_{t}, \mathcal{S}_{t}, M_{t}) \xrightarrow{\tau} (TS'_{t}, \mathcal{S}'_{t}, M'_{t}) \\ &\quad \Longrightarrow \exists TS'_{s}, \mathcal{S}'_{s}, M'_{s}, te. \\ &\qquad \iota \vdash (TS_{s}, \mathcal{S}_{s}, M_{s}) \xrightarrow{te} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}) \wedge te \in \{\tau, \mathsf{R}(\mathsf{na}, \mathsf{x}, \_)\} \wedge {} \\ &\qquad \Phi_{cp}(\varphi, \iota, (TS'_{t}, \mathcal{S}'_{t}, M'_{t}), (TS'_{s}, \mathcal{S}'_{s}, M'_{s}), \bullet) \end{aligned}$$

LEMMA 11.6 (MATCH STATE CP PRESERVING - RELY).

$$\begin{aligned} &\forall \iota, \varphi, \varphi', TS_t, TS_s, \mathbb{S} = (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s), \mathbb{S}' = (\mathcal{S}_t', M_t', \mathcal{S}_s', M_s'). \\ &\quad \Phi_{cp}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \circ) \wedge {} \\ &\quad R(\iota, (\varphi, \mathbb{S}), (\varphi', \mathbb{S}'), TS_t.P, TS_s.P) \wedge I_{cp}(\iota, \varphi', \mathbb{S}') \\ &\quad \Longrightarrow \Phi_{cp}(\varphi', \iota, (TS_t, \mathcal{S}_t', M_t'), (TS_s, \mathcal{S}_s', M_s'), \circ) \end{aligned}$$

LEMMA 11.7 (MATCH STATE IMPLIES ABORT PRESERVING).

$$\begin{aligned} &\forall \varphi, \iota, TS_t, \mathcal{S}_t, M_t, TS_s, \mathcal{S}_s, M_s, \beta. \\ &\quad \Phi_{cp}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \beta) \wedge {} \\ &\quad \iota \vdash (TS_t, \mathcal{S}_t, M_t) \longrightarrow \mathbf{abort} \\ &\quad \Longrightarrow \iota \vdash (TS_s, \mathcal{S}_s, M_s) \longrightarrow \mathbf{abort} \end{aligned}$$

Fig. 36. Auxiliary definitions in defining  $I_{dce}$ 

### 11.2 Correctness proof of Dead Code Elimination

*Invariant in dead code elimination.* We instantiate the invariant for shared memory in the proof of dead code elimination.

Definition 11.8 (Invariant in dead code elimination proof).

$$\begin{aligned} I_{dce}(\iota, \varphi, (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s)) \triangleq{} & \varphi(\mathcal{S}_t, \mathcal{S}_s) \wedge (\varphi, \iota \vdash M_t \sim M_s) \wedge {} \\ & (\forall \mathsf{x} \notin \iota, t > 0.\ \langle \mathsf{x} : v@(\_, t], \_\rangle \in M_t \\ & \quad \Longrightarrow \exists \langle \mathsf{x} : v@(f', t'], \_\rangle \in M_s, t_r. \\ & \qquad \varphi(\mathsf{x}, t) = t' \wedge t_r < f' \wedge {} \\ & \qquad (\forall m \in M_s(\mathsf{x}).\ m.\mathsf{to} \leq t_r \vee f' \leq m.\mathsf{from})) \end{aligned}$$

The most important restriction in $I_{dce}$ is item (3), which says that each message in the source memory that is not an initial message and has a corresponding message in the target memory reserves a timestamp interval before it. Consider the following dead code elimination transformation.

$$\begin{array}{l} x_{na} := 1; \\ x_{na} := 2; \\ r := x; \end{array} \quad \leadsto \quad \begin{array}{l} \mathbf{skip}; \\ x_{na} := 2; \\ r := x; \end{array}$$

If we try to establish the simulation relation for the above program in the following form, a problem arises.



Suppose the target thread executes "skip" and the source thread correspondingly executes " $x_{na} := 1$ ". The execution of the source thread generates a message with value 1. Establishing the simulation requires us to find a place in the source memory to insert this message.

To handle this problem, we need to describe the timestamps reserved for inserting messages generated by the source thread. Suppose the states of the target memory and the source memory, before the target thread executes "skip" and the source thread executes " $x_{na} := 1$ ", are as shown in Fig. 37. The messages with values 5 and 8 are generated by other threads. We can now find a proper place to insert the messages generated by the source thread within the reserved timestamps.

• Suppose the target thread executes "skip" and the source thread executes " $x_{na} := 1$ ". The message following the latest message viewed by the target thread is the message with value 8. From $\varphi_r$, we know that the range of timestamps from $f_r$ to the lower bound $f'$ of the message with value 8 in the source memory is reserved. Thus, we can insert the message generated by the execution of " $x_{na} := 1$ " as shown in Fig. 38. For example, the new message can have the form " $\langle x : 1@(f_r, (f_r + f')/2], V_\perp \rangle$ ". Note that the source thread does not need to care about the message with value 5, which is generated by a redundant write of another thread.

Fig. 37. Timestamps reservation for source writes

Fig. 38. Function of timestamps reservation for source write - I

• Then, suppose the target and source threads both execute " $x_{na} := 2$ ". The message following the one generated by " $x_{na} := 2$ " in the target memory is the message with value 8. From the invariant, we know that the range of timestamps from $f_r'$ to the lower bound of the message with value 8 in the source memory is reserved. Thus, we can insert the message of " $x_{na} := 2$ " as shown in Fig. 39. The inserted message must itself keep some timestamps reserved before it. For example, the new message can have the form " $\langle x : 2@((f_r' + t')/2, t'], V_{\perp} \rangle$ ", where $t' = (f' + f_r')/2$. The range $(f_r', (f_r' + t')/2)$ remains reserved for inserting messages.



Fig. 39. Function of timestamps reservation for source write - II
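The arithmetic of the two insertion steps can be checked with a small sketch. It is only an illustration under assumed concrete values: the numbers chosen for $f_r$ and $f'$, the helper names `insert_dead_write` and `insert_matched_write`, and the choice of $f_r'$ as the upper bound of the first inserted message are all hypothetical and do not come from the paper.

```python
from fractions import Fraction


def insert_dead_write(f_r, f_next):
    """Place a source-only message in the lower half of the reserved gap (f_r, f_next)."""
    to = (f_r + f_next) / 2
    return (f_r, to)                  # message interval (from, to]


def insert_matched_write(f_r, f_next):
    """Place a message that has a target counterpart, keeping a reserved gap before it."""
    to = (f_r + f_next) / 2           # t' in the text
    frm = (f_r + to) / 2              # lower bound (f_r' + t') / 2
    return (frm, to), (f_r, frm)      # message interval, gap still reserved before it


f_r, f_next = Fraction(4), Fraction(8)    # assumed bounds of the reserved gap before value 8
m1 = insert_dead_write(f_r, f_next)       # interval for x_na := 1
f_r2 = m1[1]                              # assumption: f_r' is the upper bound of m1
m2, reserved = insert_matched_write(f_r2, f_next)   # interval for x_na := 2
```

With these values the first message occupies $(4, 6]$, the second occupies $(13/2, 7]$, and the range $(6, 13/2)$ stays free for later source-only writes, so both messages fit strictly below the message with value 8.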



Fig. 40. Necessity to require no reservation on non-atomic locations

We also require that there are no reservations on non-atomic locations in $\varphi, \iota \vdash M_t \sim M_s$, defined in Fig. 36. This prevents other source threads from inserting new messages that have corresponding messages at the target level and thereby breaking item (1) of the step invariant. Consider the situation in Fig. 40. The execution of the environment (Rely condition) may cancel such a reservation and insert a new message between the messages with values 0 and 5. This breaks item (1) of the step invariant, as shown in Fig. 41.

*Match state in dead code elimination proof.* We define the match state used in proving dead code elimination in Fig. 43. Auxiliary definitions used in the match state are shown in Fig. 42.

*Correctness proof of dead code elimination.* To prove the correctness of dead code elimination, we need to prove that the following Lemma 11.9 holds.



Fig. 41. Necessity to require no reservation on non-atomic locations - II

$$\begin{aligned} &\operatorname{covered\_C}(\mathsf{x}, t', M_s) \triangleq \exists \langle \mathsf{x} : v@(f, t], V \rangle \in M_s.\ t' \in (f, t] \\ &\operatorname{TM}(\varphi, \mathsf{x}, T_t, (T_s, M_s)) \triangleq (\forall (\mathsf{x}, t) \in \operatorname{dom}(\varphi).\ T_t(\mathsf{x}) < t \implies T_s(\mathsf{x}) < \varphi(\mathsf{x}, t)) \wedge {} \\ &\quad (\exists t'.\ \varphi(\mathsf{x}, T_t(\mathsf{x})) = t' \wedge t' \leq T_s(\mathsf{x}) \wedge (\forall t_0 \in (t', T_s(\mathsf{x})].\ \operatorname{covered\_C}(\mathsf{x}, t_0, M_s))) \\ &\operatorname{InvView}_{dce}(\varphi, \iota, V_t, (V_s, M_s)) \triangleq \\ &\quad (*\ \textit{For current view}\ *) \\ &\quad (\forall \mathsf{x} \in \iota.\ (\varphi(\mathsf{x}, \mathit{cur}_t.T_{\mathsf{na}}(\mathsf{x})) = \mathit{cur}_s.T_{\mathsf{na}}(\mathsf{x}) \wedge \varphi(\mathsf{x}, \mathit{cur}_t.T_{\mathsf{rlx}}(\mathsf{x})) = \mathit{cur}_s.T_{\mathsf{rlx}}(\mathsf{x}))) \wedge {} \\ &\quad (\forall \mathsf{x} \notin \iota.\ \operatorname{TM}(\varphi, \mathsf{x}, \mathit{cur}_t.T_{\mathsf{rlx}}, (\mathit{cur}_s.T_{\mathsf{rlx}}, M_s))) \wedge {} \\ &\quad (*\ \textit{For acquire view}\ *) \\ &\quad (\forall \mathsf{x} \in \iota.\ (\varphi(\mathsf{x}, \mathit{acq}_t.T_{\mathsf{na}}(\mathsf{x})) = \mathit{acq}_s.T_{\mathsf{na}}(\mathsf{x}) \wedge \varphi(\mathsf{x}, \mathit{acq}_t.T_{\mathsf{rlx}}(\mathsf{x})) = \mathit{acq}_s.T_{\mathsf{rlx}}(\mathsf{x}))) \wedge {} \\ &\quad (\forall \mathsf{x} \notin \iota.\ \operatorname{TM}(\varphi, \mathsf{x}, \mathit{acq}_t.T_{\mathsf{rlx}}, (\mathit{acq}_s.T_{\mathsf{rlx}}, M_s))) \wedge {} \\ &\quad (*\ \textit{For release view}\ *) \\ &\quad (\forall \mathsf{x}.\ \mathsf{x} \in \iota \implies \varphi(\mathit{rel}_t(\mathsf{x}), \mathit{rel}_s(\mathsf{x}))) \\ &\quad \textit{where } V_t = (\mathit{cur}_t, \mathit{acq}_t, \mathit{rel}_t) \textit{ and } V_s = (\mathit{cur}_s, \mathit{acq}_s, \mathit{rel}_s) \\ &(\varphi, V_t, V_s) \models \{\mathsf{x}\} \triangleq \varphi(\mathsf{x}, V_t.\mathit{cur}.T_{\mathsf{na}}(\mathsf{x})) = V_s.\mathit{cur}.T_{\mathsf{na}}(\mathsf{x}) \wedge \varphi(\mathsf{x}, V_t.\mathit{acq}.T_{\mathsf{na}}(\mathsf{x})) = V_s.\mathit{acq}.T_{\mathsf{rlx}}(\mathsf{x}) \\ &(R_t, R_s) \models \{r\} \triangleq R_t(r) = R_s(r) \\ &(\varphi, (R_t, V_t), (R_s, V_s)) \models L_{nl} \triangleq (\forall \mathsf{x} \notin L_{nl}.\ (\varphi, V_t, V_s) \models \{\mathsf{x}\}) \wedge (\forall r \notin L_{nl}.\ (R_t, R_s) \models \{r\}) \\ &\varphi, \iota \vdash P_t \sim_{dce} P_s \triangleq [P_t]_t \approx [P_s]_t \wedge \varphi(P_t) = \|P_s\| \wedge (\forall m \in \widetilde{P}_t.\ m.\mathsf{from} < m.\mathsf{to}) \end{aligned}$$

Fig. 42. Auxiliary definitions in match state for dead code elimination
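The timestamp-matching predicate $\operatorname{TM}$ can be made concrete with a small checker. The sketch below is an illustration only and rests on assumptions not made in the paper: a memory is a list of `(location, from, to)` triples, $\varphi$ is a dictionary from `(location, target timestamp)` to a source timestamp, and the quantification over all timestamps in $(t', T_s(\mathsf{x})]$ is decided by sweeping the message intervals rather than by enumerating timestamps.

```python
def covered_range(x, lo, hi, mem):
    """True if every timestamp in (lo, hi] lies inside some message interval (f, t] on x."""
    reach = lo
    for f, t in sorted((f, t) for (loc, f, t) in mem if loc == x):
        if f > reach:
            break            # nothing covers the timestamps just above `reach`
        reach = max(reach, t)
    return reach >= hi


def tm(phi, x, T_t, T_s, mem_s):
    """Sketch of TM(phi, x, T_t, (T_s, M_s)) for a single location x."""
    # Target messages not yet seen by the target map strictly above the source view.
    unseen_ok = all(T_s[x] < ts
                    for (loc, t), ts in phi.items()
                    if loc == x and T_t[x] < t)
    # The source view may run ahead of the image of the target view,
    # but only over timestamps that are covered by source messages.
    t_img = phi.get((x, T_t[x]))
    ahead_ok = (t_img is not None and t_img <= T_s[x]
                and covered_range(x, t_img, T_s[x], mem_s))
    return unseen_ok and ahead_ok


# Assumed scenario: the source holds two extra messages (0, 2] and (2, 3] on x,
# and the target message at timestamp 2 corresponds to the source message (4, 5].
phi = {("x", 0): 0, ("x", 2): 5}
mem_s = [("x", 0, 0), ("x", 0, 2), ("x", 2, 3), ("x", 4, 5)]
ok = tm(phi, "x", {"x": 0}, {"x": 3}, mem_s)
in_gap = tm(phi, "x", {"x": 0}, {"x": 4}, mem_s)
```

In this scenario `ok` holds because the source view at timestamp 3 is reached only through covered timestamps, while `in_gap` fails because the range $(3, 4]$ is not covered by any source message.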

$$\begin{aligned} & \operatorname{cur\_acq}(\iota, \varphi, (\mathit{cur}_t, \mathit{acq}_t), (\mathit{cur}_s, \mathit{acq}_s)) \triangleq \\ & \quad \forall \mathsf{x} \notin \iota.\ (\mathit{cur}_t.T_{\mathsf{rlx}}(\mathsf{x}) < \mathit{acq}_t.T_{\mathsf{rlx}}(\mathsf{x}) \wedge \varphi(\mathsf{x}, \mathit{acq}_t.T_{\mathsf{rlx}}(\mathsf{x})) = \mathit{acq}_s.T_{\mathsf{rlx}}(\mathsf{x})) \vee {} \\ & \qquad (\mathit{cur}_t.T_{\mathsf{rlx}}(\mathsf{x}) = \mathit{acq}_t.T_{\mathsf{rlx}}(\mathsf{x}) \wedge \varphi(\mathsf{x}, \mathit{acq}_t.T_{\mathsf{rlx}}(\mathsf{x})) = \mathit{acq}_s.T_{\mathsf{rlx}}(\mathsf{x})) \vee {} \\ & \qquad (\mathit{cur}_t.T_{\mathsf{rlx}}(\mathsf{x}) = \mathit{acq}_t.T_{\mathsf{rlx}}(\mathsf{x}) \wedge \mathit{cur}_s.T_{\mathsf{rlx}}(\mathsf{x}) = \mathit{acq}_s.T_{\mathsf{rlx}}(\mathsf{x})) \end{aligned}$$

Fig. 43. Match state for dead code elimination

LEMMA 11.9 (WELL-DEFINED DEAD CODE ELIMINATION).

$$\forall \pi_s, \pi_t, \iota.\ \mathsf{DCE}(\pi_s, \iota) = \pi_t \implies I_{dce}, \iota \models \pi_t \preccurlyeq \pi_s$$

PROOF. By applying Lemma 11.10.

LEMMA 11.10 (MATCH STATE IMPLIES SIMULATION - DEAD CODE ELIMINATION).

$$\begin{aligned} &\forall \varphi, \iota, TS_t, \mathcal{S}_t, M_t, TS_s, \mathcal{S}_s, M_s, \beta. \\ &\quad \Phi_{dce}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \beta) \\ &\quad \Longrightarrow I_{dce}, \iota \models (TS_t, \mathcal{S}_t, M_t) \preccurlyeq_{\varphi}^{\beta, \emptyset} (TS_s, \mathcal{S}_s, M_s) \end{aligned}$$

PROOF. By co-induction, applying Lemmas 11.11, 11.12, 11.13 and 11.14.

LEMMA 11.11 (MATCH STATE DCE PRESERVING - TAU).

$$\begin{aligned} &\forall \varphi, \iota, TS_{t}, \mathcal{S}_{t}, M_{t}, TS_{s}, \mathcal{S}_{s}, M_{s}, \beta, TS'_{t}, \mathcal{S}'_{t}, M'_{t}. \\ &\quad \Phi_{dce}(\varphi, \iota, (TS_{t}, \mathcal{S}_{t}, M_{t}), (TS_{s}, \mathcal{S}_{s}, M_{s}), \beta) \wedge {} \\ &\quad \iota \vdash (TS_{t}, \mathcal{S}_{t}, M_{t}) \xrightarrow{\tau} (TS'_{t}, \mathcal{S}'_{t}, M'_{t}) \\ &\quad \Longrightarrow (\exists TS'_{s}, \mathcal{S}'_{s}, M'_{s}. \\ &\qquad \iota \vdash (TS_{s}, \mathcal{S}_{s}, M_{s}) \xrightarrow{na} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}) \wedge {} \\ &\qquad \Phi_{dce}(\varphi, \iota, (TS'_{t}, \mathcal{S}'_{t}, M'_{t}), (TS'_{s}, \mathcal{S}'_{s}, M'_{s}), \bullet)) \vee {} \\ &\qquad \iota \vdash (TS_{s}, \mathcal{S}_{s}, M_{s}) \xrightarrow{na} \mathbf{abort} \end{aligned}$$

LEMMA 11.12 (MATCH STATE DCE PRESERVING - NA).

$$\begin{split} \forall \varphi, \iota, TS_t, \mathcal{S}_t, M_t, TS_s, \mathcal{S}_s, M_s, \beta, TS_t', \mathcal{S}_t', M_t', te &\in \{\mathsf{R}(\mathsf{na}, \mathsf{x}, \_), \mathsf{W}(\mathsf{na}, \mathsf{x}, \_)\}. \\ &\Phi_{dce}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \beta) \wedge \\ &\iota \vdash (TS_t, \mathcal{S}_t, M_t) \stackrel{te}{\longrightarrow} (TS_t', \mathcal{S}_t', M_t') \\ &\Longrightarrow \exists TS_s', \mathcal{S}_s', M_s'. \\ &\iota \vdash (TS_s, \mathcal{S}_s, M_s) \stackrel{te}{\longrightarrow} (TS_s', \mathcal{S}_s', M_s') \wedge \\ &\Phi_{dce}(\varphi, \iota, (TS_t', \mathcal{S}_t', M_t'), (TS_s', \mathcal{S}_s', M_s'), \bullet) \end{split}$$

LEMMA 11.13 (MATCH STATE DCE PRESERVING - ATM).

$$\begin{aligned} &\forall \varphi, \iota, TS_{t}, \mathcal{S}_{t}, M_{t}, TS_{s}, \mathcal{S}_{s}, M_{s}, \beta, TS'_{t}, \mathcal{S}'_{t}, M'_{t}, te \in AT. \\ &\quad \Phi_{dce}(\varphi, \iota, (TS_{t}, \mathcal{S}_{t}, M_{t}), (TS_{s}, \mathcal{S}_{s}, M_{s}), \beta) \wedge {} \\ &\quad \iota \vdash (TS_{t}, \mathcal{S}_{t}, M_{t}) \xrightarrow{te} (TS'_{t}, \mathcal{S}'_{t}, M'_{t}) \\ &\quad \Longrightarrow \exists TS'_{s}, \mathcal{S}'_{s}, M'_{s}. \\ &\qquad \iota \vdash (TS_{s}, \mathcal{S}_{s}, M_{s}) \xrightarrow{te} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}) \wedge {} \\ &\qquad \Phi_{dce}(\varphi, \iota, (TS'_{t}, \mathcal{S}'_{t}, M'_{t}), (TS'_{s}, \mathcal{S}'_{s}, M'_{s}), \circ) \end{aligned}$$

LEMMA 11.14 (MATCH STATE DCE PRESERVING - RELY).

$$\begin{aligned} &\forall \iota, \varphi, \varphi', TS_t, TS_s, \mathbb{S} = (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s), \mathbb{S}' = (\mathcal{S}_t', M_t', \mathcal{S}_s', M_s'). \\ &\quad \Phi_{dce}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \circ) \wedge {} \\ &\quad R(\iota, (\varphi, \mathbb{S}), (\varphi', \mathbb{S}'), TS_t.P, TS_s.P) \wedge I_{dce}(\iota, \varphi', \mathbb{S}') \\ &\quad \Longrightarrow \Phi_{dce}(\varphi', \iota, (TS_t, \mathcal{S}_t', M_t'), (TS_s, \mathcal{S}_s', M_s'), \circ) \end{aligned}$$

$$V \leq V' \triangleq V.T_{\mathsf{na}} \leq V'.T_{\mathsf{na}} \wedge V.T_{\mathsf{rlx}} \leq V'.T_{\mathsf{rlx}}$$

$$\mathcal{V} \leq \mathcal{V}' \triangleq cur \leq cur' \wedge acq \leq acq' \wedge (\forall \mathsf{x}.\ rel(\mathsf{x}) \leq rel'(\mathsf{x})) \quad \textit{where } \mathcal{V} = (cur, acq, rel) \textit{ and } \mathcal{V}' = (cur', acq', rel')$$

$$M \leq M' \triangleq M \approx M' \wedge (\forall \langle \mathsf{x}: v@(f, t], V \rangle \in M.\ \exists V'.\ \langle \mathsf{x}: v@(f, t], V' \rangle \in M' \wedge V \leq V')$$

$$\frac{M_s \leq M_t \quad S_s \leq S_t \quad \|M_t\| = \mathsf{dom}(\varphi) \quad (\forall (\mathsf{x}, t) \in \mathsf{dom}(\varphi).\ \varphi(\mathsf{x}, t) = t)}{I_{licm}(\iota, \varphi, (S_t, M_t, S_s, M_s))}$$

Fig. 44. Invariant in loop invariant code motion
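The orderings on views used by this invariant are pointwise comparisons, which the following sketch spells out. The representation is an assumption made only for this illustration: a timemap is a dictionary from locations to timestamps (a missing location is treated as timestamp 0), a view is a dictionary with the keys `"na"` and `"rlx"`, and a thread view is a triple `(cur, acq, rel)` whose `rel` component maps each location to a view.

```python
def timemap_le(T1, T2):
    """Pointwise order on timemaps; locations missing from T2 count as timestamp 0."""
    return all(T1[x] <= T2.get(x, 0) for x in T1)


def view_le(V1, V2):
    """V <= V': both the non-atomic and the relaxed timemap are ordered."""
    return timemap_le(V1["na"], V2["na"]) and timemap_le(V1["rlx"], V2["rlx"])


def tview_le(TV1, TV2):
    """Thread views (cur, acq, rel): cur and acq are views, rel maps locations to views."""
    cur1, acq1, rel1 = TV1
    cur2, acq2, rel2 = TV2
    return (view_le(cur1, cur2) and view_le(acq1, acq2)
            and all(view_le(rel1[x], rel2[x]) for x in rel1))


bot = {"na": {"x": 0}, "rlx": {"x": 0}}       # assumed bottom view on one location
v1 = {"na": {"x": 1}, "rlx": {"x": 2}}
smaller = (bot, bot, {"x": bot})
larger = (bot, v1, {"x": v1})
```

With these values `tview_le(smaller, larger)` holds and `tview_le(larger, smaller)` does not, which mirrors the direction $M_s \leq M_t$ required by the invariant: message views in the source may be smaller than their target counterparts.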

$$\mathsf{wdph}(B_t, \mathsf{f}_{entry}, C_s, \iota) \triangleq \begin{cases} \mathsf{wdph}(B_t', \mathsf{f}_{entry}, C_s, \iota) & \textit{if } B_t = (r := e, B_t') \textit{ and } r \notin \mathsf{fv}(C_s) \\ \mathsf{wdph}(B_t', \mathsf{f}_{entry}, C_s, \iota) & \textit{if } B_t = (r := \mathsf{x}_{\mathsf{na}}, B_t') \textit{ and } r \notin \mathsf{fv}(C_s) \textit{ and } \iota(\mathsf{x}) = \mathsf{na} \\ \mathsf{true} & \textit{if } B_t = \mathsf{jmp}\ \mathsf{f}_{entry} \\ \mathsf{false} & \textit{otherwise} \end{cases}$$

$$\frac{\mathsf{ptB\_ph\_rel}(B_t, B_s, \mathsf{pre-header})}{\mathsf{ptB\_ph\_rel}((c, B_t), (c, B_s), \mathsf{pre-header})} \qquad \frac{\mathsf{pre-header}(\mathsf{f}) = \mathsf{f}'}{\mathsf{ptB\_ph\_rel}(\mathsf{jmp}\ \mathsf{f}', \mathsf{jmp}\ \mathsf{f}, \mathsf{pre-header})}$$

$$\frac{\mathsf{pre-header}(\mathsf{f}_1) = \mathsf{f}'_1 \quad \mathsf{pre-header}(\mathsf{f}_2) = \mathsf{f}'_2}{\mathsf{ptB\_ph\_rel}((\mathsf{be}\ e, \mathsf{f}_1, \mathsf{f}_2), (\mathsf{be}\ e, \mathsf{f}'_1, \mathsf{f}'_2), \mathsf{pre-header})}$$

Fig. 45. Auxiliary definitions in the match state of loop invariant code motion proof

### 11.3 Correctness proof of Loop Invariant Code Motion

*Invariant in loop invariant code motion.* We show the invariant $I_{licm}$ for shared resource in Fig. 44.

*Match state in loop invariant code motion.* We define the match state in loop invariant code motion in Fig. 46. Some auxiliary definitions used in defining the match state are shown in Fig. 45.

*Correctness proof of loop invariant code motion.* We present the correctness proof of loop invariant code motion in the following.

LEMMA 11.15 (WELL-DEFINED LOOP INVARIANT CODE MOTION).

$$\forall \pi_s, \pi_t, \iota. \text{ Trans\_licm}(\pi_s, \iota) = \pi_t \implies I_{licm}, \iota \models \pi_t \preccurlyeq \pi_s$$

PROOF. From the premises, we have the following.

$$Trans\_licm(\pi_s, \iota) = \pi_t \tag{1}$$

$$\frac{\begin{array}{c} \mathsf{TransC}'(C_s, \emptyset, \mathsf{loops}, \mathsf{loops}) = (C_t, \mathsf{pre-header}) \\ \mathsf{loops\_P}(\mathsf{f}) = \mathsf{loops} \quad \forall r \in \mathsf{fv}(C_s).\ R_t(r) = R_s(r) \quad \mathsf{fv}(B_s) \subseteq \mathsf{fv}(C_s) \\ B_t = B_s \vee \mathsf{ptB\_ph\_rel}(B_t, B_s, \mathsf{pre-header}) \end{array}}{\mathsf{loops\_P}, \iota \vdash (R_t, B_t, C_t, \pi_t) \sim_{licm} (R_s, B_s, C_s, \pi_s)}$$

$$\frac{\begin{array}{c} \mathsf{TransC}'(C_s, \emptyset, \mathsf{loops}, \mathsf{loops}) = (C_t, \mathsf{pre-header}) \\ \mathsf{loops\_P}(\mathsf{f}) = \mathsf{loops} \quad \forall r \in \mathsf{fv}(C_s).\ R_t(r) = R_s(r) \\ \mathsf{wdph}(B_t, \mathsf{f}_{entry}, C_s, \iota) \quad C_s(\mathsf{f}_{entry}) = B_s \end{array}}{\mathsf{loops\_P}, \iota \vdash (R_t, B_t, C_t, \pi_t) \sim_{licm} (R_s, B_s, C_s, \pi_s)}$$

Fig. 46. Match state in loop invariant code motion proof

We need to prove the following.

$$I_{licm}, \iota \models \pi_t \preccurlyeq \pi_s \tag{g}$$

We unfold (g) and need to prove the following.

$$I_{licm}(\iota, \varphi_0, (\mathcal{S}_\perp, M_0, \mathcal{S}_\perp, M_0)) \tag{g1}$$

$$\forall \sigma_t, \mathsf{f}.\ \mathsf{Init}(\pi_t, \mathsf{f}) = \sigma_t \implies \exists \sigma_s.\ \mathsf{Init}(\pi_s, \mathsf{f}) = \sigma_s \land I_{licm}, \iota \models ((\sigma_t, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_0) \preccurlyeq_{\varphi_0}^{\circ, \emptyset} ((\sigma_s, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_0) \tag{g2}$$

The goal (g1) can be proved by definitions directly.

We focus on the correctness proof of (g2). We have the following assumptions.

$$I_{licm}(\iota, \varphi_0, (\mathcal{S}_\perp, M_0, \mathcal{S}_\perp, M_0)) \tag{2}$$

$$Init(\pi_t, f) = \sigma_t \tag{3}$$

By applying Lemma. 11.16 on (3), (2) and (1), we have that there exists  $\sigma_s$  such that:

$$Init(\pi_s, f) = \sigma_s \tag{4}$$

$$\Phi_{licm}(\varphi_0, \iota, ((\sigma_t, \mathcal{V}_\perp, \emptyset), \mathcal{S}_\perp, M_0), ((\sigma_s, \mathcal{V}_\perp, \emptyset), \mathcal{S}_\perp, M_0), \circ) \tag{5}$$

By applying Lemma. 11.17 on (5), we have the following.

$$I_{licm}, \iota \models ((\sigma_t, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_0) \preccurlyeq^{\circ, \emptyset}_{\varphi_0} ((\sigma_s, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{\perp}, M_0) \tag{6}$$

From (4) and (6), we finish the proof.

LEMMA 11.16 (MATCH STATE HOLDING IN INITIAL STATE - LICM).

$$\begin{aligned} &\forall \pi_{s}, \pi_{t}, \mathsf{f}, \sigma_{t}, \varphi, \iota, \mathcal{S}_{t}, M_{t}, \mathcal{S}_{s}, M_{s}. \\ &\quad \mathsf{Init}(\pi_{t}, \mathsf{f}) = \sigma_{t} \land I_{licm}(\iota, \varphi, (\mathcal{S}_{t}, M_{t}, \mathcal{S}_{s}, M_{s})) \land \mathsf{Trans\_licm}(\pi_{s}, \iota) = \pi_{t} \\ &\quad \implies \exists \sigma_{s}. \, \mathsf{Init}(\pi_{s}, \mathsf{f}) = \sigma_{s} \land \Phi_{licm}(\varphi, \iota, ((\sigma_{t}, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{t}, M_{t}), ((\sigma_{s}, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_{s}, M_{s}), \circ) \end{aligned}$$

PROOF. From the premises, we have the following.

$$Init(\pi_t, f) = \sigma_t \tag{1}$$

$$I_{licm}(\iota, \varphi, (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s)) \tag{2}$$

$$Trans\_licm(\pi_s, \iota) = \pi_t \tag{3}$$

We unfold (1) and have that there exist  $C_t$ ,  $B_t$  and  $f_t$  such that:

$$\sigma_t = (R_\perp, B_t, C_t, \epsilon, \pi_t) \tag{1.1}$$

$$\pi_t(\mathsf{f}) = (C_t, \mathsf{f}_t) \tag{1.2}$$

$$C_t(\mathsf{f}_t) = B_t \tag{1.3}$$

We unfold (3) and have that there exists loops\_P such that:

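$$\mathsf{loops}_P = \mathsf{det\_loop\_inv}(\pi_s, \iota) \tag{3.1}$$

$$\forall \mathsf{f}, C_t, \mathsf{f}_t.\ \pi_t(\mathsf{f}) = (C_t, \mathsf{f}_t) \implies \exists C_s, \mathsf{f}_s, \mathsf{loops}.\ \mathsf{loops}_P(\mathsf{f}) = \mathsf{loops} \land \pi_s(\mathsf{f}) = (C_s, \mathsf{f}_s) \land \mathsf{TransC}(C_s, \mathsf{f}_s, \mathsf{loops}) = (C_t, \mathsf{f}_t) \tag{3.2}$$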

We apply (3.2) on (1.2) and have that there exist  $C_s$ ,  $f_s$  and loops such that:

$$\mathsf{loops}_P(\mathsf{f}) = \mathsf{loops} \tag{4}$$

$$\pi_s(\mathsf{f}) = (C_s, \mathsf{f}_s) \tag{5}$$

$$\mathsf{TransC}(C_s, \mathsf{f}_s, \mathsf{loops}) = (C_t, \mathsf{f}_t) \tag{6}$$

We unfold (6) and have that there exists pre-header such that:

$$\mathsf{TransC'}(C_s, \emptyset, \mathsf{loops}, \mathsf{loops}) = (C_t, \text{pre-header}) \tag{7}$$

We discuss whether  $f_s$  is in the domain of pre-header.

• We first consider that  $f_s$  is in the domain of pre-header.

$$\text{pre-header}(\mathsf{f}_s) = \mathsf{f}_t \tag{8}$$

From Lemma. 10.1, we have the following.

$$\begin{aligned} &\forall (l_{entry}, l_{exit}, \mathsf{loop\_inv}) \in \mathsf{loops},\ (\_, r) \in \mathsf{loop\_inv}. \\ &\quad r \notin \mathsf{fv}(C_s) \land l_{entry} \in \mathsf{dom}(C_s) \land l_{exit} \in \mathsf{dom}(C_s) \land (\forall (x, \_) \in \mathsf{loop\_inv}.\ x \notin \iota) \end{aligned} \tag{9}$$

By applying Lemma. 11.18 on (8), (1.3), (9) and (7), we have that there exists  $B_s$  such that:

$$C_{s}(f_{s}) = B_{s} \tag{8.1}$$

$$wdph(B_t, f_s, C_s, \iota) \tag{8.2}$$

From (5) and (8.1), we have that there exists  $\sigma_s$  such that:

$$\sigma_{s} = (R_{\perp}, B_{s}, C_{s}, \epsilon, \pi_{s}) \tag{10}$$

$$Init(\pi_s, f) = \sigma_s \tag{11}$$

We focus on the proof of the match state holding.

$$\Phi_{licm}(\varphi, \iota, ((\sigma_t, \mathcal{V}_\perp, \emptyset), \mathcal{S}_t, M_t), ((\sigma_s, \mathcal{V}_\perp, \emptyset), \mathcal{S}_s, M_s), \circ) \tag{g1}$$

We unfold (g1) and need to prove the following.

$$I_{licm}(\iota, \varphi, (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s)) \tag{g1.1}$$

$$\iota \vdash ((\sigma_t, \mathcal{V}_\perp, \emptyset), \mathcal{S}_t, M_t) \sim_{licm} ((\sigma_s, \mathcal{V}_\perp, \emptyset), \mathcal{S}_s, M_s) \tag{g1.2}$$

From (2), we prove (g1.1). We unfold (g1.2) and we need to prove the following.

$$Trans\_licm(\pi_s, \iota) = \pi_t \tag{g1.2.1}$$

$$\mathsf{det\_loop\_inv}(\pi_s, \iota) = \mathsf{loops}_P \tag{g1.2.2}$$

$$\mathsf{loops}_P \vdash (R_\perp, B_t, C_t, \pi_t) \sim_{licm} (R_\perp, B_s, C_s, \pi_s) \tag{g1.2.3}$$

From (3), we prove (g1.2.1). From (3.1), we prove (g1.2.2). From (7), (4), (8.2) and (8.1), we prove (g1.2.3).

• Then, we consider that  $f_s$  is not in the domain of pre-header.

$$f_s = f_t \tag{12}$$

By applying Lemma. 11.19 on (1.3) and (7), we have that there exists  $B_s$  such that:

$$C_{s}(\mathsf{f}_{s}) = B_{s} \tag{12.1}$$

$$B_t = B_s \lor \mathsf{ptB\_ph\_rel}(B_t, B_s, \text{pre-header}) \tag{12.2}$$

From (5) and (12.1), we have that there exists  $\sigma_s$  such that:

$$\sigma_s = (R_\perp, B_s, C_s, \epsilon, \pi_s) \tag{13}$$

$$Init(\pi_s, f) = \sigma_s \tag{14}$$

We focus on the proof of the match state holding.

$$\Phi_{licm}(\varphi, \iota, ((\sigma_t, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_t, M_t), ((\sigma_s, \mathcal{V}_{\perp}, \emptyset), \mathcal{S}_s, M_s), \circ) \tag{g2}$$

We unfold (g2) and mainly need to prove the following.

$$I_{licm}(\iota, \varphi, (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s)) \tag{g2.1}$$

$$\iota \vdash ((\sigma_t, \mathcal{V}_\perp, \emptyset), \mathcal{S}_t, M_t) \sim_{licm} ((\sigma_s, \mathcal{V}_\perp, \emptyset), \mathcal{S}_s, M_s) \tag{g2.2}$$

From (2), we prove (g2.1). We unfold (g2.2) and we need to prove the following.

$$Trans\_licm(\pi_s, \iota) = \pi_t \tag{g2.2.1}$$

$$\mathsf{det\_loop\_inv}(\pi_s, \iota) = \mathsf{loops}_P \tag{g2.2.2}$$

$$\mathsf{loops}_P \vdash (R_\perp, B_t, C_t, \pi_t) \sim_{licm} (R_\perp, B_s, C_s, \pi_s) \tag{g2.2.3}$$

From (3), we prove (g2.2.1). From (3.1), we prove (g2.2.2). From (7), (4), (12.1) and (12.2), we prove (g2.2.3).
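The case split in the proof of Lemma 11.16 follows how the entry label of the target function relates to the source entry label: facts (8) and (12) say that it is either the pre-header of the source entry or the source entry itself. A minimal Python sketch of that choice, assuming (hypothetically) that the pre-header map is represented as a dictionary from source labels to fresh pre-header labels; the name `target_entry` and the label strings are illustrative and not part of the formal development:

```python
def target_entry(f_s, pre_header):
    """Pick the entry label of the transformed function.

    pre_header is an assumed dict from source labels to pre-header labels.
    Case 1: f_s is in dom(pre_header), so the target starts at pre_header[f_s].
    Case 2: f_s is not in dom(pre_header), so the entry label is unchanged.
    """
    return pre_header.get(f_s, f_s)


# Hypothetical labels: the loop entered at "L1" received the pre-header "L1_ph".
pre_header = {"L1": "L1_ph"}
entry_when_loop_head = target_entry("L1", pre_header)   # first case of the proof
entry_otherwise = target_entry("L0", pre_header)        # second case of the proof
```

In the first case the match state is established through `wdph`, and in the second through `B_t = B_s` or `ptB_ph_rel`, as in the two bullets of the proof.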

LEMMA 11.17 (MATCH STATE IMPLIES SIMULATION - LICM).

$$\forall \varphi, \iota, TS_t, \mathcal{S}_t, M_t, TS_s, \mathcal{S}_s, M_s, \beta.$$

$$\Phi_{licm}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \beta)$$

$$\implies I_{licm}, \iota \models (TS_t, \mathcal{S}_t, M_t) \preccurlyeq^{\beta, \emptyset}_{\varphi} (TS_s, \mathcal{S}_s, M_s)$$

PROOF. By coinduction. From the premises, we have the following.

$$\Phi_{licm}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \beta) \tag{1}$$

We need to prove the following.

$$I_{licm}, \iota \models (TS_t, \mathcal{S}_t, M_t) \preccurlyeq^{\beta, \emptyset}_{\varphi} (TS_s, \mathcal{S}_s, M_s) \tag{g}$$

We unfold (g) and need to prove the following.

• The invariant between the target thread and source thread configurations holds.

$$SI(\iota, \varphi, (TS_t, M_t), (TS_s, M_s), \emptyset) \tag{g1}$$

We prove (g1) by applying Lemma. 11.20.

• For any  $TS'_t$ ,  $\mathcal{S}'_t$ ,  $M'_t$  and te, if

$$\iota \vdash (TS_t, \mathcal{S}_t, M_t) \xrightarrow{te} (TS'_t, \mathcal{S}'_t, M'_t) \tag{2}$$

holds, we need to prove the following.

− If  $te \in AT$ , we need to prove that there exist  $TS'_s$ ,  $\mathcal{S}'_s$ ,  $M'_s$  and  $\varphi'$  such that:

$$\iota \vdash (TS_s, \mathcal{S}_s, M_s) \xrightarrow{na}^* \xrightarrow{te} (TS_s', \mathcal{S}_s', M_s') \tag{g2.1}$$

$$\varphi \subseteq \varphi' \wedge I_{licm}(\iota, \varphi', (\mathcal{S}'_t, M'_t, \mathcal{S}'_s, M'_s)) \tag{g2.2}$$

$$I_{licm}, \iota \models (TS'_t, \mathcal{S}'_t, M'_t) \preccurlyeq^{\circ, \emptyset}_{\varphi'} (TS'_s, \mathcal{S}'_s, M'_s) \tag{g2.3}$$

We finish the proof by applying Lemma. 11.21 on (1) and (2) and from the co-inductive hypothesis.

- If  $te \in NA$ , we need to prove that there exist  $TS'_s$ ,  $\mathcal{S}'_s$ ,  $M'_s$  and  $\mathcal{D}_1$  such that:

$$(TS_t.P, M_t), (TS_t'.P, M_t') \vdash \emptyset \stackrel{te}{\sim} \mathcal{D}_1 \tag{g3.1}$$

$$\iota \vdash (TS_s, \mathcal{S}_s, M_s, \mathcal{D}_1) \xrightarrow{na}^* (TS'_s, \mathcal{S}'_s, M'_s, \emptyset) \tag{g3.2}$$

$$I_{licm}, \iota \models (TS'_t, \mathcal{S}'_t, M'_t) \preccurlyeq_{\varphi}^{\bullet, \emptyset} (TS'_s, \mathcal{S}'_s, M'_s) \tag{g3.3}$$

We finish the proof from Lemma. 11.22 and the co-inductive hypothesis.

- The case  $te \in PRC$  is simpler, so we omit the proof details.

- If  $\beta = \circ$ , let  $\mathbb{S} = (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s)$ . We need to prove that for any  $\varphi'$  and  $\mathbb{S}' = (\mathcal{S}'_t, M'_t, \mathcal{S}'_s, M'_s)$ , if

$$R(\iota, (\varphi, \mathbb{S}), (\varphi', \mathbb{S}'), TS_t.P, TS_s.P) \wedge I_{licm}(\iota, \varphi', \mathbb{S}') \tag{3}$$

holds, then the following holds.

$$I_{licm}, \iota \models (TS_t, \mathcal{S}'_t, M'_t) \preccurlyeq^{\circ, \emptyset}_{\varphi'} (TS_s, \mathcal{S}'_s, M'_s) \tag{g4.2}$$

By applying Lemma. 11.23 on (1) and (3), we have the following.

$$\Phi_{licm}(\varphi', \iota, (TS_t, \mathcal{S}'_t, M'_t), (TS_s, \mathcal{S}'_s, M'_s), \circ) \tag{4}$$

We finish the proof of such case from (4) and co-inductive hypothesis.

- The done case is similar to the case  $te \in AT$ , and we omit the proof details here.
- Finally, we consider the abort case. From the premises, we have the following.

$$\iota \vdash (TS_t, \mathcal{S}_t, M_t) \longrightarrow \mathbf{abort}$$
 (5)

And we need to prove that there exist  $TS'_s$ ,  $S'_s$  and  $M'_s$  such that:

$$\iota \vdash (TS_s, \mathcal{S}_s, M_s) \xrightarrow{na} (TS'_s, \mathcal{S}'_s, M'_s) \land \iota \vdash (TS'_s, \mathcal{S}'_s, M'_s) \longrightarrow \mathbf{abort} \tag{g6}$$

Lemma 11.18 (well-defined preheader - I).

$$\forall \mathsf{pre-header}, \mathsf{f}_s, \mathsf{f}_t, \mathsf{loops}, C_s, C_t, B_t, \iota.$$

$$\mathsf{pre-header}(\mathsf{f}_s) = \mathsf{f}_t \land C_t(\mathsf{f}_t) = B_t \land$$

$$\mathsf{TransC'}(C_s, \emptyset, \mathsf{loops}, \mathsf{loops}) = (C_t, \mathsf{pre-header}) \land$$

$$(\forall (\mathsf{f}_{entry}, \_, \mathsf{loop}_{inv}) \in \mathsf{loops}. \ \mathsf{f}_{entry} \in \mathsf{dom}(C_s) \land$$

$$(\forall (\mathsf{x}, \_) \in \mathsf{loop}_{inv}. \ \mathsf{x} \notin \iota))$$

$$\Longrightarrow \exists B_s. \ C_s(\mathsf{f}_s) = B_s \land \mathsf{wdph}(B_t, \mathsf{f}_s, C_s, \iota)$$

Lemma 11.19 (Well-Defined Preheader - II).

$$\forall \text{pre-header}, f_s, f_t, \text{loops}.$$

$$C_t(f) = B_t \land$$

$$\mathsf{Trans}C'(C_s, \emptyset, \text{loops}, \text{loops}) = (C_t, \text{pre-header})$$

$$\Longrightarrow \exists B_s. C_s(f) = B_s \land$$

$$(B_s = B_t \lor \mathsf{ptB\_ph\_rel}(B_t, B_s, \text{pre-header}))$$

LEMMA 11.20 (MATCH STATE IMPLIES INVT - LICM).

$$\forall \varphi, \iota, TS_t, \mathcal{S}_t, M_t, TS_s, \mathcal{S}_s, M_s, \beta.$$

$$\Phi_{licm}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \beta)$$

$$\implies \text{invT}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \beta, \emptyset)$$

LEMMA 11.21 (MATCH STATE LICM PRESERVING - ATOMIC&OUTPUT).

$$\forall \varphi, \iota, TS_{t}, \mathcal{S}_{t}, M_{t}, TS'_{t}, \mathcal{S}'_{t}, M'_{t}, TS_{s}, \mathcal{S}_{s}, M_{s}, \beta, te \in (Atm \cup \{out(v)\}).$$

$$\Phi_{licm}(\varphi, \iota, (TS_{t}, \mathcal{S}_{t}, M_{t}), (TS_{s}, \mathcal{S}_{s}, M_{s}), \beta) \wedge$$

$$\iota \vdash (TS_{t}, \mathcal{S}_{t}, M_{t}) \xrightarrow{te} (TS'_{t}, \mathcal{S}'_{t}, M'_{t})$$

$$\Longrightarrow \exists TS'_{s}, \mathcal{S}'_{s}, M'_{s}, \varphi'.$$

$$\Phi_{licm}(\varphi', \iota, (TS'_{t}, \mathcal{S}'_{t}, M'_{t}), (TS'_{s}, \mathcal{S}'_{s}, M'_{s}), \circ) \wedge$$

$$\varphi \subseteq \varphi' \wedge \iota \vdash (TS_{s}, \mathcal{S}_{s}, M_{s}) \xrightarrow{te} (TS'_{s}, \mathcal{S}'_{s}, M'_{s})$$

LEMMA 11.22 (MATCH STATE LICM PRESERVING - NON-ATOMIC).

$$\forall \varphi, \iota, TS_{t}, \mathcal{S}_{t}, M_{t}, TS'_{t}, \mathcal{S}'_{t}, M'_{t}, TS_{s}, \mathcal{S}_{s}, M_{s}, \beta, ws.$$

$$\Phi_{licm}(\varphi, \iota, (TS_{t}, \mathcal{S}_{t}, M_{t}), (TS_{s}, \mathcal{S}_{s}, M_{s}), \beta) \wedge$$

$$\iota \vdash (TS_{t}, \mathcal{S}_{t}, M_{t}) \xrightarrow{na} (TS'_{t}, \mathcal{S}'_{t}, M'_{t})$$

$$\Longrightarrow \exists TS'_{s}, S'_{s}, M'_{s}, ws'.$$

$$\Phi_{licm}(\varphi, \iota, (TS'_{t}, \mathcal{S}'_{t}, M'_{t}), (TS'_{s}, \mathcal{S}'_{s}, M'_{s}), \bullet) \wedge$$

$$\iota \vdash (TS_{s}, \mathcal{S}_{s}, M_{s}) \xrightarrow{na} {}^{*} (TS'_{s}, \mathcal{S}'_{s}, M'_{s}) \wedge ws \subseteq ws'$$

LEMMA 11.23 (MATCH STATE LICM PRESERVING - RELY).

$$\forall \varphi, \iota, TS_t, \mathcal{S}_t, M_t, TS_s, \mathcal{S}_s, M_s, \varphi', \mathcal{S}_t', M_t', \mathcal{S}_s', M_s'.$$

$$\Phi_{licm}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \circ) \land$$

$$R(\iota, (\varphi, (\mathcal{S}_t, M_t, \mathcal{S}_s, M_s)), (\varphi', (\mathcal{S}_t', M_t', \mathcal{S}_s', M_s')), TS_t.P, TS_s.P) \land$$

$$I_{licm}(\iota, \varphi', (\mathcal{S}_t', M_t', \mathcal{S}_s', M_s'))$$

$$\Longrightarrow \Phi_{licm}(\varphi', \iota, (TS_t, \mathcal{S}_t', M_t'), (TS_s, \mathcal{S}_s', M_s'), \circ)$$

LEMMA 11.24 (MATCH STATE LICM PRESERVING ABORT STEP).

$$\forall \varphi, \iota, TS_t, \mathcal{S}_t, M_t, TS_s, \mathcal{S}_s, M_s, \beta.$$

$$\Phi_{licm}(\varphi, \iota, (TS_t, \mathcal{S}_t, M_t), (TS_s, \mathcal{S}_s, M_s), \beta) \land$$

$$\iota \vdash (TS_t, \mathcal{S}_t, M_t) \longrightarrow \mathbf{abort}$$

$$\Longrightarrow \iota \vdash (TS_s, \mathcal{S}_s, M_s) \longrightarrow \mathbf{abort}$$

$$\frac{M_t = M_s \quad \mathcal{S}_t = \mathcal{S}_s \quad \|M_t\| = \operatorname{dom}(\varphi) \quad \forall (\mathsf{x}, t) \in \operatorname{dom}(\varphi). \ \varphi(\mathsf{x}, t) = t}{I_{cse}(\iota, \varphi, (\mathcal{S}_t, M_t), (\mathcal{S}_s, M_s))}$$

Fig. 47. Invariant in common subexpression elimination proof

$$(R, \mathcal{V}, M) \models (r, e) \quad ::= \quad R(r) = \llbracket e \rrbracket_R$$

$$(R, \mathcal{V}, M) \models (r, x, t) \quad ::= \quad \langle x : R(r)@(\_, t], \_\rangle \in M \land (\mathcal{V}.\mathsf{cur}.T_\mathsf{na}(x) \le t \le \mathcal{V}.\mathsf{cur}.T_\mathsf{rlx}(x))$$

$$(R, \mathcal{V}, M) \models_t L_a \quad ::= \quad (\forall (r, e) \in L_a. \ (R, \mathcal{V}, M) \models_t (r, e)) \land \\ \quad (\forall (r, x) \in L_a. \ (R, \mathcal{V}, M) \models_t (r, x) \land x \notin t)$$

$$\frac{\begin{array}{c} \mathsf{Ave\_Analyzer}(C_s, l_0) = \mathbb{L}_a \quad \mathsf{TransC}_\mathsf{cse}(C_s, \mathbb{L}_a) = C_t \\ B_s = C_s(l)[i \dots] \quad \mathsf{TransB}_\mathsf{cse}(B_s, LB_a) = B_t \\ R_t = R_s \quad (R_s, \mathcal{V}_s, M_s) \models_t \mathsf{IN}[LB_a] \\ \forall l_p \in \mathsf{succ}(B_s). \; \mathsf{OUT}[LB_a] \ge \mathsf{IN}[\mathbb{L}_a(l_p)] \end{array}}{\mathcal{V}_s, M_s, \iota \vdash (R_t, B_t, C_t) \sim_\mathsf{cse} (R_s, B_s, C_s)}$$

$$\mathcal{V}_s, M_s, \iota \vdash (R_t, B_t, C_t) :: K'_t \sim_\mathsf{cse} ((R_s, B_s, C_s) :: K'_s)$$

$$\frac{\begin{array}{c} \mathsf{Ave\_Analyzer}(\pi_s) = A_a \quad \mathsf{Translater}_\mathsf{cse}(\pi_s, A_a) = \pi_t \\ \mathcal{V}_s, M_s, \iota \vdash (R_t, B_t, C_t) \sim_\mathsf{cse} (R_s, B_s, C_s) \quad \iota \vdash K_t \sim_\mathsf{cse} K_s \end{array}}{\mathcal{V}_s, M_s, \iota \vdash (R_t, B_t, C_t, K_t, \pi_t) \sim_\mathsf{cse} (R_s, B_s, C_s, K_s, \pi_s)}$$

$$\mathcal{V}_s, M_s, \iota \vdash (\sigma_t, \mathcal{V}_t, P_t) \sim_\mathsf{cse} ((\sigma_s, \mathcal{V}_s, P_s), M_s)$$

Fig. 48. Match state in common subexpression elimination proof

## 11.4 Correctness proof of Common Subexpression Elimination

Invariant in common subexpression elimination proof. We show the invariant  $I_{cse}$  for shared resource in Fig. 47.

*Match state in common subexpression elimination proof.* We define the match state in common subexpression elimination proof in Fig. 48.

Correctness proof of common subexpression elimination optimizer. The correctness proof of the common subexpression elimination optimizer is similar to the correctness proof of constant propagation.

# A CAPPED MEMORY

We give the formal definition of constructing capped memory below.

• The last message of a memory *M* to a location x.

$$\overline{m}(M, \mathbf{x}) ::= \underset{m \in M(\mathbf{x})}{\arg \max}\ m.\text{to}$$

• The cap timemap of a memory *M*.

$$\widehat{T}(M) ::= \lambda x. \ \overline{m}(M, x).\text{to}$$

• Cap message of a memory *M* to a location x.

$$\widehat{m}(M, x) ::= \langle x : (\overline{m}(M, x).\text{to}, \overline{m}(M, x).\text{to} + 1] \rangle$$

• Capped memory.

Definition A.1 (Capped Memory).  $M_c \in \widehat{M}$  holds iff  $M \subseteq M_c$  and the following hold:

- (1) for any  $m_1, m_2 \in M$ , if  $m_1.\text{var} = m_2.\text{var}$ ,  $m_1.\text{to} < m_2.\text{to}$ ,  $\neg (\exists m \in M. m.\text{var} = m_1.\text{var} \land m_1.\text{to} < m.\text{to} < m_2.\text{to})$  and  $m_1.\text{to} < m_2.\text{from}$ , then  $\langle m_1.\text{var} : (m_1.\text{to}, m_2.\text{from}] \rangle \in M_c$ ;
- (2)  $\forall x \in Var. \ \widehat{m}(M, x) \in M_c;$
- (3) for any  $m \in M_c$ , if  $m \notin M$ , then there exists  $m' \in M$ , such that m'.var = m.var and m'.to < m.from.
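The construction in Definition A.1 can be sketched in Python. This is only an illustration under stated assumptions: messages are modelled as hypothetical `Msg` tuples with fields `var`, `frm`, `to` and `val`, a message without a value (`val=None`) stands for the value-less entries added by conditions (1) and (2), and the set of locations is taken to be the locations that already occur in the memory.

```python
from collections import namedtuple

# Hypothetical message representation: the interval is (frm, to].
Msg = namedtuple("Msg", "var frm to val")


def last_message(mem, x):
    """The message to location x with the largest `to` timestamp."""
    return max((m for m in mem if m.var == x), key=lambda m: m.to)


def cap(mem):
    """Build the smallest memory satisfying conditions (1) and (2)."""
    capped = set(mem)
    for x in {m.var for m in mem}:
        msgs = sorted((m for m in mem if m.var == x), key=lambda m: m.to)
        # Condition (1): fill every gap between adjacent messages.
        for m1, m2 in zip(msgs, msgs[1:]):
            if m1.to < m2.frm:
                capped.add(Msg(x, m1.to, m2.frm, None))
        # Condition (2): add the cap message after the last message.
        last = last_message(mem, x)
        capped.add(Msg(x, last.to, last.to + 1, None))
    return capped


mem = {Msg("x", 0, 0, 0), Msg("x", 2, 3, 1), Msg("y", 0, 0, 0)}
capped = cap(mem)
```

Every entry added by `cap` starts at the `to` timestamp of some message of the original memory, which is the shape that condition (3) requires of the entries of the capped memory that are not in *M*.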

$$\frac{\begin{array}{c} \text{for any } i \in \{1, \dots, n\}. \ \operatorname{Init}(\pi, \mathsf{f}_i) = \sigma_i \quad TS_i = (\sigma_i, \mathcal{V}_i, \emptyset) \\ \mathcal{TP} = \{1 \leadsto TS_1, \dots, n \leadsto TS_n\} \quad \mathsf{t} \in \{1, \dots, n\} \quad M = \{\langle x : 0 @ (0, 0], V_{\perp} \rangle \mid x \in Var \} \end{array}}{\mathbf{let}\ (\pi, \iota)\ \mathbf{in}\ \mathsf{f}_1 \parallel \cdots \parallel \mathsf{f}_n \stackrel{load}{==} \rhd (\mathcal{TP}, \mathsf{t}, \lambda x.0, M)^{\iota}}$$

$$\frac{\iota \vdash (\mathcal{TP}(\mathsf{t}), \mathcal{S}, M) \xrightarrow{na}{}^{+} (TS', \mathcal{S}', M')}{(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \stackrel{na}{==} \rhd (\mathcal{TP} \{\mathsf{t} \leadsto TS'\}, \mathsf{t}, \mathcal{S}', M')^{\iota}} \qquad \frac{\iota \vdash (\mathcal{TP}(\mathsf{t}), \mathcal{S}, M) \xrightarrow{prc}{}^{+} (TS', \mathcal{S}', M')}{(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \stackrel{prc}{==} \rhd (\mathcal{TP} \{\mathsf{t} \leadsto TS'\}, \mathsf{t}, \mathcal{S}', M')^{\iota}}$$

$$\frac{\iota \vdash (\mathcal{TP}(\mathsf{t}), \mathcal{S}, M) \xrightarrow{\operatorname{atmBlk}}{}^{+} (TS', \mathcal{S}', M')}{(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \stackrel{\operatorname{atmBlk}}{==} \rhd (\mathcal{TP} \{\mathsf{t} \leadsto TS'\}, \mathsf{t}, \mathcal{S}', M')^{\iota}}$$

Fig. 49. Machine step in auxiliary promising semantics

### B PROOF OF SEMANTICS EQUIVALENCE

We show the correctness proof of Lemma. 6.1 in this section. In this proof, we first define an auxiliary promising semantics.

Auxiliary promising semantics. To facilitate the proof of the semantics equivalence between the promising semantics and the non-preemptive semantics introduced in Sec. 6, we provide the auxiliary promising semantics defined in Fig. 49. It relies on the following auxiliary definition.

$$\begin{aligned} &\iota \vdash (TS, \mathcal{S}, M) \xrightarrow{\operatorname{atmBlk}} (TS', \mathcal{S}', M') ::= \\ &\quad \exists TS_0, \mathcal{S}_0, M_0, TS_1, \mathcal{S}_1, M_1. \ \iota \vdash (TS, \mathcal{S}, M) \xrightarrow{na} (TS_0, \mathcal{S}_0, M_0) \land {} \\ &\quad \iota \vdash (TS_0, \mathcal{S}_0, M_0) \xrightarrow{at} (TS_1, \mathcal{S}_1, M_1) \land \iota \vdash (TS_1, \mathcal{S}_1, M_1) \xrightarrow{prc} (TS', \mathcal{S}', M') \end{aligned}$$

We can prove that the following conclusion holds.

Lemma B.1 (PS to Aux-PS). For any W and W', if  $W \Longrightarrow W'$ , then  $W \Longrightarrow W'$ .

PROOF. Prove by applying Lemma. B.2.

$$AProgEtr(\mathbb{P},\mathcal{B}) \quad \text{iff} \quad \exists W, n. \ (\mathbb{P} \stackrel{load}{==} \rhd W) \land AEtr^n(W,\mathcal{B})$$

$$\frac{}{AEtr^0(W,\epsilon)} \qquad \frac{W == \rhd \mathsf{abort}}{AEtr^{n+1}(W,\mathsf{abort})} \qquad \frac{W == \rhd \mathsf{done}}{AEtr^{n+1}(W,\mathsf{done})}$$

$$\frac{W \stackrel{\mathsf{out}(v)}{==} \rhd W' \qquad AEtr^n(W',\mathcal{B})}{AEtr^{n+1}(W,\mathsf{out}(v) :: \mathcal{B})} \qquad \frac{W == \rhd W' \qquad AEtr^n(W',\mathcal{B})}{AEtr^{n+1}(W,\mathcal{B})}$$

Fig. 50. Event trace under the auxiliary promising semantics
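The event trace relation of Fig. 50 can be read as a function from a sequence of machine steps to the observable trace, with the index n acting as a bound on the number of steps examined. A minimal Python sketch, assuming a hypothetical encoding in which silent steps are the string `"tau"`, output steps are tuples `("out", v)`, and termination is `"done"` or `"abort"`; the name `event_trace` is illustrative only:

```python
def event_trace(steps, fuel):
    """Collect the observable events of at most `fuel` machine steps."""
    trace = []
    for label in steps[:fuel]:
        if label in ("done", "abort"):
            # Terminal rules: the trace ends with done or abort.
            trace.append(label)
            break
        if isinstance(label, tuple) and label[0] == "out":
            # Output rule: the event is prepended to the rest of the trace.
            trace.append(label)
        # Silent rule: the step leaves the trace unchanged.
    return trace


steps = ["tau", ("out", 1), "tau", ("out", 2), "done"]
full = event_trace(steps, 5)
prefix = event_trace(steps, 2)
empty = event_trace(steps, 0)
```

With zero fuel the result is the empty trace, which corresponds to the axiom for $AEtr^0$; smaller bounds yield prefixes of the traces obtained with larger bounds.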

LEMMA B.2 (PS STEPS SPLIT).

$$\forall TS, S, M, TS', S', M', \iota, n.$$

$$\iota \vdash (TS, S, M) \longrightarrow^{n} (TS', S', M')$$

$$\Longrightarrow \exists TS_{0}, S_{0}, M_{0}, TS_{1}, S_{1}, M_{1}.$$

$$\iota \vdash (TS, S, M) \xrightarrow{prc}^{*} (TS_{0}, S_{0}, M_{0}) \wedge$$

$$\iota \vdash (TS_{0}, S_{0}, M_{0}) \xrightarrow{\text{atmBlk}}^{*} (TS_{1}, S_{1}, M_{1}) \wedge$$

$$\iota \vdash (TS_{1}, S_{1}, M_{1}) \xrightarrow{na}^{*} (TS', S', M')$$

From Lemma. B.1, we can prove the equivalence between the promising semantics and the auxiliary promising semantics, as shown below.

LEMMA B.3 (SEMANTICS EQUIVALENCE - PS2APS).

$$\forall \pi, \iota, f_1, \dots, f_n, \mathcal{B}.$$

$$ProgEtr(\mathbf{let} \ (\pi, \iota) \ \mathbf{in} \ f_1 \parallel \dots \parallel f_n, \mathcal{B}) \iff AProgEtr(\mathbf{let} \ (\pi, \iota) \ \mathbf{in} \ f_1 \parallel \dots \parallel f_n, \mathcal{B})$$

PROOF. Prove by applying Lemma. B.1.

Equivalence between the auxiliary promising semantics and the non-preemptive semantics. Then, we prove that the auxiliary promising semantics and the non-preemptive semantics are equivalent.

LEMMA B.4 (SEMANTICS EQUIVALENCE - APS2NP).

$$\forall \pi, f_1, \dots, f_n, \iota, \mathcal{B}.$$

$$AProgEtr(\mathbf{let} (\pi, \iota) \mathbf{in} f_1 \parallel \dots \parallel f_n, \mathcal{B}) \iff NPProgEtr(\mathbf{let} (\pi, \iota) \mathbf{in} f_1 \mid \dots \mid f_n, \mathcal{B})$$

PROOF. For the direction $NPProgEtr(\mathbf{let}\ (\pi, \iota)\ \mathbf{in}\ f_1 \mid \dots \mid f_n, \mathcal{B}) \implies AProgEtr(\mathbf{let}\ (\pi, \iota)\ \mathbf{in}\ f_1 \parallel \dots \parallel f_n, \mathcal{B})$, every step in the non-preemptive semantics can be easily converted to a step of the promising semantics, so every event trace of the non-preemptive semantics can be produced in the promising semantics.

We show the proof of the direction $AProgEtr(\mathbf{let}\ (\pi, \iota)\ \mathbf{in}\ f_1 \parallel \cdots \parallel f_n, \mathcal{B}) \implies NPProgEtr(\mathbf{let}\ (\pi, \iota)\ \mathbf{in}\ f_1 \mid \cdots \mid f_n, \mathcal{B})$. After introducing the premise, we have the following.

$$AProgEtr(\mathbf{let}(\pi, \iota) \mathbf{in} f_1 \parallel \cdots \parallel f_n, \mathcal{B}) \tag{1}$$

We unfold (1) and get that there exist  $\mathcal{TP}$ , t,  $\mathcal{S}$ , M and n such that the following hold.

$$\mathbf{let}\ (\pi, \iota)\ \mathbf{in}\ f_1 \parallel \cdots \parallel f_n \stackrel{load}{==} \rhd (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \tag{2}$$

$$AEtr^{n}((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota}, \mathcal{B}) \tag{3}$$

$$\frac{W \stackrel{na/sw}{==} \rhd^* W'}{\text{sw-procs}^0(W, W', \epsilon)} \qquad \frac{W \stackrel{na/sw}{==} \rhd^* W' \quad W' == \rhd \text{done}}{\text{sw-procs}^{n+1}(W, W', \text{done})}$$

$$\frac{W \stackrel{na/sw}{==} \rhd^* W' \quad W' == \rhd \text{abort}}{\text{sw-procs}^{n+1}(W, W', \text{abort})}$$

$$\frac{W \stackrel{na/sw}{==} \rhd^* W_1 \quad W_1 \stackrel{\text{out}(v)}{==} \rhd W_2 \quad \text{sw-procs}^n(W_2, W', \mathcal{B})}{\text{sw-procs}^{n+1}(W, W', \text{out}(v) :: \mathcal{B})}$$

$$\frac{W \stackrel{na/sw}{==} \rhd^* W_1 \quad (W_1 \stackrel{at}{==} \rhd W_2) \vee (W_1 \stackrel{prc}{==} \rhd W_2) \vee (W_1 \stackrel{\text{tterm}}{==} \rhd W_2) \quad \text{sw-procs}^n(W_2, W', \mathcal{B})}{\text{sw-procs}^{n+1}(W, W', \mathcal{B})}$$

$$\frac{W \stackrel{na}{==} \rhd W_0 \quad \text{NAStep}^n(W_0^{t_0}, W')}{\text{NAStep}^{n+1}(W, W')} \qquad \frac{W \stackrel{prc}{==} \rhd W_0 \quad \text{PRCStep}^n(W_0, W')}{\text{PRCStep}^{n+1}(W, W')}$$

Fig. 51. Auxiliary definitions in semantics equivalent proof

We apply Lemma. B.5 on (3) and get that there exist n' and W' such that the following holds:

$$\text{sw-procs}^{n'}((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota}, W', \mathcal{B}) \tag{4}$$

We can construct a non-preemptive program state such that:

$$\mathbf{let} (\pi, \iota) \mathbf{in} f_1 | \dots | f_n \stackrel{load}{\Longrightarrow} (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota}$$
 (5)

We apply Lemma. B.6 on (4) and get that the following holds.

$$NPEtr^*((\mathcal{TP}, t, \mathcal{S}, M, \circ)^{\iota}, \mathcal{B})$$

LEMMA B.5 (AETR TO SW-PROCS).

$$\forall W, n, \mathcal{B}. \ AEtr^n(W, \mathcal{B}) \Longrightarrow \\ \exists n', W'. \ \text{sw-procs}^{n'}(W, W', \mathcal{B})$$

PROOF. Prove by induction on n.

LEMMA B.6 (SW-PROCS TO NPETR).

$$\forall \mathcal{TP}, \mathcal{S}, M, \mathsf{t}, \iota, W', \hat{W}, n, \mathcal{B}.$$

$$\text{sw-procs}^{n}((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota}, W', \mathcal{B}) \wedge wdSt(\mathcal{TP}, \mathcal{S}, M)$$

$$\implies \exists \mathsf{t}'. NPEtr^{*}((\mathcal{TP}, \mathsf{t}', \mathcal{S}, M, \circ)^{\iota}, \mathcal{B})$$

PROOF. Prove by induction on n.

0: We get that  $\mathcal{B} = \epsilon$ . We can prove that  $NPEtr^0((\mathcal{TP}, t, \mathcal{S}, M)^{\iota}, \epsilon)$ .

n+1: We do intro and have the following.

$$\text{sw-procs}^{n+1}((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota}, W', \mathcal{B}) \tag{1}$$

$$wdSt(\mathcal{TP}, \mathcal{S}, M) \tag{2}$$

We unfold (1) and discuss each case respectively.

- $\mathcal{B} \in \{\text{done}, \text{abort}\}$ . We finish the proof directly.
- $\mathcal{B} = \operatorname{out}(v) :: \mathcal{B}'$ . We have that there exist  $W_1$  and  $W_2$  such that:

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \xrightarrow{na/\mathsf{sw}} \rhd^* W_1 \tag{3}$$

$$W_1 \xrightarrow{\operatorname{out}(v)} \rhd W_2$$
 (4)

$$\text{sw-procs}^n(W_2, W', \mathcal{B}') \tag{5}$$

We apply Lemma. B.7 on (3) and (4) and have that there exist  $W_{01}$  and  $W_{02}$  such that:

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \xrightarrow{prc/\mathsf{sw}} \rhd^* W_{01} \tag{6}$$

$$W_{01} \xrightarrow{\operatorname{out}(v)} \rhd W_{02}$$
 (7)

$$W_{02} \xrightarrow{na/\mathsf{sw}} \rhd^* W_2 \tag{8}$$

Let  $W_{02} = (\mathcal{TP}_{02}, \mathsf{t}_{02}, \mathcal{S}_{02}, M_{02})^{\iota}$ . From (6) and (7), we have that there exists  $\hat{W}_{01}$  such that:

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{*} \hat{W}_{01} \tag{9}$$

$$\hat{W}_{01} : \xrightarrow{\operatorname{out}(v)} (\mathcal{TP}_{02}, \mathsf{t}_{02}, \mathcal{S}_{02}, M_{02}, \circ)^{\iota} \tag{10}$$

From (8) and (5), we have the following.

$$\text{sw-procs}^{n}((\mathcal{TP}_{02}, \mathsf{t}_{02}, \mathcal{S}_{02}, M_{02})^{\iota}, W', \mathcal{B}') \tag{11}$$

We finish the proof by applying the inductive hypothesis on (11).

• We consider the case that the program takes an atomic step. We have that there exist  $W_1$  and  $W_2$  such that:

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \xrightarrow{na/\mathsf{sw}} \rhd^* W_1 \tag{12}$$

$$W_1 \stackrel{at}{=\!\!\!=\!\!\!=\!\!\!=} \triangleright W_2 \tag{13}$$

$$sw-procs^{n}(W_{2},W',\mathcal{B}) \tag{14}$$

We apply Lemma. B.8 on (12) and (13) and have that there exist  $W_{01}$  and  $W_{02}$  such that:

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \xrightarrow{prc/\mathsf{sw}} \rhd^* W_{01} \tag{15}$$

$$W_{01} \stackrel{at}{=} \triangleright W_{02}$$
 (16)

$$W_{02} \xrightarrow{na/\mathsf{sw}} \rhd^* W_2 \tag{17}$$

Let  $W_{02} = (\mathcal{TP}_{02}, \mathsf{t}_{02}, \mathcal{S}_{02}, M_{02})^{\iota}$ . From (15) and (16), we have the following.

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M, \circ)^{\iota} :\Longrightarrow^{+} (\mathcal{TP}_{02}, \mathsf{t}_{02}, \mathcal{S}_{02}, M_{02}, \circ)^{\iota} \tag{18}$$

From (17) and (14), we have the following.

$$\text{sw-procs}^{n}((\mathcal{TP}_{02}, \mathsf{t}_{02}, \mathcal{S}_{02}, M_{02})^{\iota}, W', \mathcal{B}) \tag{19}$$

We finish the proof by applying the inductive hypothesis on (19).

• Similarly, if the program takes a PRC-step, we finish the proof by applying Lemma. B.9 and the inductive hypothesis. And if the program takes a thread termination step, we finish the proof by applying Lemma. B.10 and the inductive hypothesis.

LEMMA B.7 (SWITCH POINT FORWARDING - OUTPUT).

$$\forall W, W', W'', n.$$

$$W \xrightarrow{na/\text{sw}} \rhd^n W' \wedge W' \xrightarrow{\text{out}(\upsilon)} \rhd W''$$

$$\implies \exists W_0, W_1. W \xrightarrow{prc/\text{sw}} \rhd^* W_0 \wedge W_0 \xrightarrow{\text{out}(\upsilon)} \rhd W_1 \wedge W_1 \xrightarrow{na/\text{sw}} \rhd^* W''$$

PROOF. From the premises, we have the following.

$$W \xrightarrow{na/\text{sw}} \rhd^n W' \tag{1}$$

$$W' \xrightarrow{\operatorname{out}(v)} \rhd W'' \tag{2}$$

By applying Lemma. B.11 on (1), we have that there exists t such that:

$$NAStep^*(W^t, W') \tag{3}$$

By applying Lemma. B.12 on (3) and (2), we have that there exist  $W_0$ ,  $W_1$ , t' and  $t_0$  such that:

$$\mathsf{PRCStep}^*(W^{\mathsf{t'}}, W_0) \tag{4}$$

$$W_0 \xrightarrow{\operatorname{out}(v)} \rhd W_1 \tag{5}$$

$$W_1 \xrightarrow{na/\text{sw}} \rhd^* W'' \tag{6}$$

From (4), we have the following.

$$W^{\mathsf{t'}} \xrightarrow{prc/\mathsf{sw}} \rhd^* W_0 \tag{7}$$

We finish the proof.

LEMMA B.8 (SWITCH POINT FORWARDING - ATOMIC).

$$\forall W, W', W'', n.$$

$$W \xrightarrow{na/sw} \rhd^n W' \wedge W' \xrightarrow{at} \rhd W''$$

$$\implies \exists W_0, W_1. W \xrightarrow{prc/sw} \rhd^* W_0 \wedge W_0 \xrightarrow{at} \rhd W_1 \wedge W_1 \xrightarrow{na/sw} \rhd^* W''$$


PROOF. From the premise, we have the following.

$$W \xrightarrow{na/sw} \rhd^n W' \tag{8}$$

$$W' \xrightarrow{at} \rhd W'' \tag{9}$$

By applying Lemma. B.11 on (8), we have that there exists t such that:

$$NAStep^*(W^t, W') \tag{10}$$

By applying Lemma. B.13 on (10), (9), we have that there exist  $W_0$ ,  $W_1$ , t' and  $t_0$  such that:

$$PRCStep^*(W^{t'}, W_0) \tag{11}$$

$$W_0 \stackrel{at}{=\!\!\!=\!\!\!=\!\!\!=} \triangleright W_1 \tag{12}$$

$$W_1 \xrightarrow{na/\text{sw}} \rhd^* W'' \tag{13}$$

From (11), we have the following.

$$W \xrightarrow{prc/\text{sw}} \rhd^* W_0 \tag{14}$$

We finish the proof.

Lemma B.9 (Switch point forwarding - prc).

$$\forall W, W', W'', n.$$

$$W \xrightarrow{na/sw} \rhd^n W' \land W' \xrightarrow{prc} \rhd W''$$

$$\Longrightarrow \exists W_0. W \xrightarrow{prc/sw} \rhd^* W_0 \land W_0 \xrightarrow{na/sw} \rhd^* W''$$

PROOF. From the premises, we have the following.

$$W \xrightarrow{na/sw} \rhd^n W' \tag{15}$$

$$W' \xrightarrow{prc} \rhd W'' \tag{16}$$

We apply Lemma. B.11 on (15) and have that there exists t such that:

$$NAStep^*(W^t, W') \tag{17}$$

By applying Lemma. B.14 on (17) and (16), we have that there exist  $W_0$  and t' such that:

$$PRCStep^*(W^{t'}, W_0) \tag{18}$$

$$W_0 \xrightarrow{na/\text{sw}} \rhd^* W'' \tag{19}$$

From (18), we have the following.

$$W \xrightarrow{prc/\text{sw}} \rhd^* W_0 \tag{20}$$

We finish the proof.

Lemma B.10 (Switch point forwarding - thread termination).

The following auxiliary lemmas are used in the proofs of the lemmas above.

LEMMA B.11 (NA/SW TO NA STEP).

$$\forall W, W', n. (W \xrightarrow{na/\text{sw}} \rhd^n W') \implies \exists t. \, \text{NAStep}^*(W^t, W')$$

Lemma B.12 (Switch point forwarding - output aux).

$$\begin{split} \forall W, W', W'', n. \\ & \mathsf{NAStep}^n(W, W') \land W' \xrightarrow{\mathsf{out}(v)} \rhd W'' \\ & \Longrightarrow \exists W_0, W_1, \mathsf{t}, \mathsf{t}_0. \\ & \mathsf{PRCStep}^*(W^\mathsf{t}, W_0) \land W_0^{\mathsf{t}_0} \xrightarrow{\mathsf{out}(v)} \rhd W_1 \land W_1 \xrightarrow{\mathit{na/sw}} \rhd^* W'' \end{split}$$

Lemma B.13 (Switch point forwarding - Atomic Aux).

$$\begin{split} \forall W, W', W'', n. \\ & \mathsf{NAStep}^n(W, W') \land W' \xrightarrow{at} \rhd W'' \\ & \Longrightarrow \exists W_0, W_1, \mathsf{t}, \mathsf{t}_0. \\ & \mathsf{PRCStep}^*(W^\mathsf{t}, W_0) \land W_0^{\mathsf{t}_0} \xrightarrow{at} \rhd W_1 \land W_1 \xrightarrow{na/\mathsf{sw}} \rhd^* W'' \end{split}$$

PROOF. By induction on n.

0: We finish the proof directly.

n+1: From the premises, we have the following.

$$NAStep^{n+1}(W, W') \tag{21}$$

$$W' \stackrel{at}{=\!\!\!=\!\!\!=\!\!\!=} \triangleright W'' \tag{22}$$

We unfold (21) and have that there exist  $W_0$  and  $t_0$  such that:

$$W \stackrel{na}{=} \triangleright W_0 \tag{23}$$

$$NAStep^{n}(W_0^{t_0}, W') \tag{24}$$

We apply the inductive hypothesis on (24) and (22), and have that there exist  $W_1$ ,  $W_2$ ,  $t_0'$ ,  $t_1$  such that:

$$PRCStep^*(W_0^{t_0'}, W_1) \tag{25}$$

$$W_1^{\mathsf{t}_1} \stackrel{at}{=\!\!\!=\!\!\!=\!\!\!=} \triangleright W_2 \tag{26}$$

$$W_2 \xrightarrow{na/\text{sw}} \rhd^* W'' \tag{27}$$

We apply Lemma. B.15 on (23) and (25), and have that there exist  $W_3$, t and  $t_3$  such that:

$$PRCStep^*(W^t, W_3) \tag{28}$$

$$W_3^{t_3} \stackrel{na}{\Longrightarrow} \triangleright W_1^{t_3} \tag{29}$$

By applying Lemma. B.16 on (29) and (26), we have that there exist  $W_4$  and  $W_5$  such that:

$$PRCStep^*(W_3^{t_3}, W_4) \tag{30}$$

$$W_4^{\mathsf{t}_1} \stackrel{at}{=\!\!\!=\!\!\!=\!\!\!=} \triangleright W_5 \tag{31}$$

$$W_5 \xrightarrow{na/\text{sw}} \rhd^* W_2 \tag{32}$$

Thus, we are done.  $\Box$ 
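Spelled out, the last step of the proof above chains the collected facts as follows. This is only a sketch, and it assumes that $\mathsf{PRCStep}^*$ is closed under composition across the thread switch from $W_3$ to $W_3^{\mathsf{t}_3}$, which the proof appears to use implicitly:

$$\begin{split} & \mathsf{PRCStep}^*(W^{\mathsf{t}}, W_3) \land \mathsf{PRCStep}^*(W_3^{\mathsf{t}_3}, W_4) \implies \mathsf{PRCStep}^*(W^{\mathsf{t}}, W_4) \\ & W_4^{\mathsf{t}_1} \stackrel{at}{=\!\!\!=\!\!\!=\!\!\!=} \triangleright W_5 \\ & W_5 \xrightarrow{na/\mathsf{sw}} \rhd^* W_2 \land W_2 \xrightarrow{na/\mathsf{sw}} \rhd^* W'' \implies W_5 \xrightarrow{na/\mathsf{sw}} \rhd^* W'' \end{split}$$

The first line combines (28) and (30), the second line is (31), and the third line combines (32) and (27). The goal then holds with $W_4$, $W_5$ and $\mathsf{t}_1$ as witnesses for the existentially quantified $W_0$, $W_1$ and $\mathsf{t}_0$ of the lemma statement.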

$$\begin{split} M \approx_{\mathsf{at}}^{\iota} M' & ::= & (\forall \langle \mathtt{x} : v@(f,t], V \rangle \in M. \\ & \exists V'. \, (\langle \mathtt{x} : v@(f,t], V' \rangle \in M' \wedge (\iota(\mathtt{x}) = \mathsf{at} \implies V = V'))) \wedge \\ & (\forall \langle \mathtt{x} : v@(f,t], V' \rangle \in M'. \, \langle \mathtt{x} : v@(f,t], \_ \rangle \in M) \wedge \\ & (\forall \mathtt{x}, f, t. \, \langle \mathtt{x} : (f,t] \rangle \in M \iff \langle \mathtt{x} : (f,t] \rangle \in M') \end{split}$$

Fig. 52. Auxiliary definitions in proving semantics equivalence
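The relation $M \approx_{\mathsf{at}}^{\iota} M'$ of Fig. 52 can be read as an executable check. The following Python sketch is illustrative only: the classes `Message` and `Reservation`, the function `approx_at`, and the encoding of a view as a tuple are assumptions made for this example and are not part of the formal development.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Concrete message <x : v@(f, t], V> (hypothetical encoding)."""
    loc: str
    val: int
    frm: int
    to: int
    view: tuple = ()  # assumed encoding of the message view V


@dataclass(frozen=True)
class Reservation:
    """Reservation <x : (f, t]> (hypothetical encoding)."""
    loc: str
    frm: int
    to: int


def approx_at(mem1, mem2, iota):
    """Check the three clauses of Fig. 52; iota maps a location to "at" or "na"."""
    msgs1 = [m for m in mem1 if isinstance(m, Message)]
    msgs2 = [m for m in mem2 if isinstance(m, Message)]

    def key(m):
        return (m.loc, m.val, m.frm, m.to)

    # Clause 1: every message of M has a counterpart in M';
    # the views must agree only for atomic locations.
    for m in msgs1:
        partners = [p for p in msgs2 if key(p) == key(m)]
        if iota[m.loc] == "at":
            partners = [p for p in partners if p.view == m.view]
        if not partners:
            return False

    # Clause 2: every message of M' occurs in M with some view.
    keys1 = {key(m) for m in msgs1}
    if any(key(p) not in keys1 for p in msgs2):
        return False

    # Clause 3: both memories contain exactly the same reservations.
    res1 = {r for r in mem1 if isinstance(r, Reservation)}
    res2 = {r for r in mem2 if isinstance(r, Reservation)}
    return res1 == res2


# Example: x is non-atomic, y is atomic.
iota = {"x": "na", "y": "at"}
m_left = {Message("x", 1, 0, 1, (("y", 0),)),
          Message("y", 2, 0, 1, (("x", 1),)),
          Reservation("x", 1, 2)}
m_right = {Message("x", 1, 0, 1, (("y", 1),)),   # view differs, but x is non-atomic
           Message("y", 2, 0, 1, (("x", 1),)),
           Reservation("x", 1, 2)}
print(approx_at(m_left, m_right, iota))
```

In this example the two memories differ only in the view attached to the message on the non-atomic location `x`, so the check succeeds; changing the view of the message on `y` would make it fail.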

LEMMA B.14 (SWITCH POINT FORWARDING - PRC AUX).

$$\forall W, W', W'', n.$$

$$\mathsf{NAStep}^n(W, W') \land W' \xrightarrow{prc} \rhd W''$$

$$\implies \exists W_0, \mathsf{t}. \ \mathsf{PRCStep}^*(W^\mathsf{t}, W_0) \land W_0 \xrightarrow{na/\mathsf{sw}} \rhd^* W''$$

LEMMA B.15 (NA STEP DELAY - PRC).

$$\forall W, W_0, W_1, t_0, n.$$

$$W \stackrel{na}{\Longrightarrow} \triangleright W_0 \land \mathsf{PRCStep}^n(W_0^{t_0}, W_1)$$

$$\Longrightarrow \exists W_2, \mathsf{t}, \mathsf{t}_1.$$

$$\mathsf{PRCStep}^*(W^\mathsf{t}, W_2) \land W_2^{\mathsf{t}_1} \stackrel{na}{\Longrightarrow} \triangleright W_1^{\mathsf{t}_1}$$

LEMMA B.16 (NA STEP DELAY - ATOMIC STEP).

$$\forall W, W_0, W', t_0.$$

$$W \stackrel{na}{\Longrightarrow} \triangleright W_0 \wedge W_0^{t_0} \stackrel{at}{=\!\!\!=\!\!\!=\!\!\!=} \triangleright W'$$

$$\Longrightarrow \exists W_1, W_2.$$

$$\mathsf{PRCStep}^*(W, W_1) \wedge W_1^{t_0} \stackrel{at}{=\!\!\!=\!\!\!=\!\!\!=} \triangleright W_2 \wedge W_2 \xrightarrow{na/\mathsf{sw}} \rhd^* W'$$

PROOF. In this proof, we assume that the program configuration is well-defined. From the premises, we have the following.

$$W \stackrel{na}{\Longrightarrow} \triangleright W_0 \tag{1}$$

$$W_0^{t_0} \stackrel{at}{=\!\!\!=\!\!\!=\!\!\!=} \triangleright W' \tag{2}$$

We let $W = (\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota}$ and have $\mathsf{wdSt}(\mathcal{TP}, \mathcal{S}, M)$. We unfold (1) and have that there exist $TS_0$, $\mathcal{S}_0$ and $M_0$ such that:

$$\iota \vdash (\mathcal{TP}(\mathsf{t}), \mathcal{S}, M) \xrightarrow{na} (TS_0, \mathcal{S}_0, M_0) \tag{1.1}$$

$$\mathsf{consistent}(TS_0, \mathcal{S}_0, M_0, \iota) \tag{1.2}$$

$$\mathcal{TP}_0 = \mathcal{TP}\{\mathsf{t} \leadsto TS_0\} \tag{1.3}$$

$$W_0 = (\mathcal{TP}_0, \mathsf{t}, \mathcal{S}_0, M_0)^{\iota} \tag{1.4}$$

We unfold (2) and have that there exist $TS'_0$, $\mathcal{S}'_0$, $M'_0$, $TS'$, $\mathcal{S}'$ and $M'$ such that:

$$\iota \vdash (\mathcal{TP}_0(\mathsf{t}_0), \mathcal{S}_0, M_0) \xrightarrow{\mathit{prc}} {}^* (\mathit{TS}'_0, \mathcal{S}'_0, M'_0) \tag{2.1}$$

$$\iota \vdash (TS'_0, \mathcal{S}'_0, M'_0) \xrightarrow{\text{atmBlk}} (TS', \mathcal{S}', M') \tag{2.2}$$

$$\mathsf{consistent}(TS', \mathcal{S}', M', \iota) \tag{2.3}$$

$$W' = (\mathcal{TP}_0\{\mathsf{t}_0 \leadsto \mathit{TS}'\}, \mathsf{t}_0, \mathcal{S}', M')^{\iota} \tag{2.4}$$

We distinguish two cases, according to whether t equals $t_0$.

• We first consider  $t = t_0$ . We apply Lemma. B.17 on (1.1) and (2.1) and have that there exist  $TS_1$ ,  $S_1$  and  $M_1$  such that:

$$\iota \vdash (\mathcal{TP}(\mathsf{t}), \mathcal{S}, M) \xrightarrow{prc} {}^* (TS_1, \mathcal{S}_1, M_1) \tag{3}$$

$$\iota \vdash (TS_1, \mathcal{S}_1, M_1) \xrightarrow{na} {}^* (TS_0', \mathcal{S}_0', M_0') \tag{4}$$

From (3), (4), (2.2) and (2.3), we have the following.

$$(\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota} \stackrel{at}{=\!\!\!=\!\!\!=} \triangleright (\mathcal{TP}\{\mathsf{t} \leadsto \mathit{TS}'\}, \mathsf{t}, \mathcal{S}', M')^{\iota} \tag{5}$$

This finishes the proof of this case.

• Then, we consider the case $t \neq t_0$. We apply Lemma. B.18 on (1.1) and have that there exist $TS_1$ and $M_1$ such that:

$$\iota \vdash (\mathcal{TP}(\mathsf{t}), \mathcal{S}, M) \xrightarrow{\mathsf{prm}} {}^* (TS_1, \mathcal{S}_1, M_1) \tag{6}$$

$$\iota \vdash (TS_1, \mathcal{S}_1, M_1) \xrightarrow{na} {}^+ (TS_0, \mathcal{S}_0, M_0) \tag{7}$$

$$M_1 \approx_{\mathsf{at}}^{\iota} M_0 \tag{8}$$

We apply Lemma. B.19 on (7), (8) and (1.2) and have the following.

$$\mathsf{consistent}(TS_1, \mathcal{S}_1, M_1, \iota) \tag{9}$$

By applying Lemma. B.20 on (7), (8) and (2.1), we have that there exists $M_1''$ such that:

$$\iota \vdash (\mathcal{TP}_0(\mathsf{t}_0), \mathcal{S}_1, M_1) \xrightarrow{prc} {}^* (TS_0', \mathcal{S}_1, M_1'') \tag{10}$$

$$\iota \vdash (TS_1, \mathcal{S}_1, M_1^{\prime\prime}) \xrightarrow{na}^+ (TS_0, \mathcal{S}_0^{\prime}, M_0^{\prime}) \tag{11}$$

$$M_1^{\prime\prime} \approx_{\mathsf{at}}^{\iota} M_0^{\prime} \tag{12}$$

We apply Lemma. B.21 on (11), (12) and (2.2) and have that there exists M'' such that:

$$\iota \vdash (TS'_0, \mathcal{S}_1, M''_1) \xrightarrow{\text{atmBlk}} (TS', \mathcal{S}', M'') \tag{13}$$

$$\iota \vdash (TS_1, \mathcal{S}', M'') \xrightarrow{na} (TS_0, \mathcal{S}', M') \tag{14}$$

$$M'' \approx_{\mathsf{at}}^{\iota} M' \tag{15}$$

From (6) and (9), we have the following.

$$\mathsf{PRCStep}^*((\mathcal{TP}, \mathsf{t}, \mathcal{S}, M)^{\iota}, (\mathcal{TP}\{\mathsf{t} \leadsto TS_1\}, \mathsf{t}, \mathcal{S}_1, M_1)^{\iota}) \tag{16}$$

Since we have $\mathsf{wdSt}(\mathcal{TP}, \mathcal{S}, M)$, according to (6), we have $\mathsf{wdSt}(\mathcal{TP}\{\mathsf{t} \leadsto TS_1\}, \mathcal{S}, M_1)$. Then, according to (10) and (13), we have $\mathsf{wdSt}(\mathcal{TP}\{\mathsf{t} \leadsto TS_1, \mathsf{t}_0 \leadsto TS'\}, \mathcal{S}', M'')$. Thus, we have the following.

$$TS'.P \subseteq M'' \tag{17}$$

By applying Lemma. B.22 on (2.3), (15) and (17), we have the following.

$$\mathsf{consistent}(TS', \mathcal{S}', M'', \iota) \tag{18}$$

From (10), (13) and (18), we have the following.

$$(\mathcal{TP}\{\mathsf{t} \leadsto TS_1\}, \mathsf{t}, \mathcal{S}_1, M_1)^{\iota} \stackrel{at}{=\!\!\!=\!\!\!=} \triangleright (\mathcal{TP}\{\mathsf{t} \leadsto TS_1, \mathsf{t}_0 \leadsto TS'\}, \mathsf{t}_0, \mathcal{S}', M'')^{\iota} \tag{19}$$

By applying Lemma. B.23 on (1.2), (2.1) and (2.2), we have the following.

$$\mathsf{consistent}(TS_0, \mathcal{S}', M', \iota) \tag{20}$$

From (14) and (20), we have the following.

$$(\mathcal{TP}\{\mathsf{t} \leadsto TS_1, \mathsf{t}_0 \leadsto TS'\}, \mathsf{t}_0, \mathcal{S}', M'')^{\iota} \stackrel{na}{=} \triangleright (\mathcal{TP}\{\mathsf{t} \leadsto TS_0, \mathsf{t}_0 \leadsto TS'\}, \mathsf{t}_0, \mathcal{S}', M')^{\iota} \tag{21}$$

From (16), (19) and (21), we finish the proof.
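Spelled out, the case $\mathsf{t} \neq \mathsf{t}_0$ assembles the witnesses $W_1$ and $W_2$ of the lemma statement from (16), (19) and (21). The sketch below assumes that the non-atomic step of thread $\mathsf{t}$ in (21) is preceded by a switch from $\mathsf{t}_0$ back to $\mathsf{t}$, which is why it is counted as an $na/\mathsf{sw}$ sequence:

$$\begin{split} & \mathsf{PRCStep}^*(W, W_1) \quad \text{with } W_1 = (\mathcal{TP}\{\mathsf{t} \leadsto TS_1\}, \mathsf{t}, \mathcal{S}_1, M_1)^{\iota} \\ & W_1^{\mathsf{t}_0} \stackrel{at}{=\!\!\!=\!\!\!=\!\!\!=} \triangleright W_2 \quad \text{with } W_2 = (\mathcal{TP}\{\mathsf{t} \leadsto TS_1, \mathsf{t}_0 \leadsto TS'\}, \mathsf{t}_0, \mathcal{S}', M'')^{\iota} \\ & W_2 \xrightarrow{na/\mathsf{sw}} \rhd^* (\mathcal{TP}\{\mathsf{t} \leadsto TS_0, \mathsf{t}_0 \leadsto TS'\}, \mathsf{t}_0, \mathcal{S}', M')^{\iota} = W' \end{split}$$

The final equality follows from (1.3) and (2.4), since $\mathcal{TP}_0\{\mathsf{t}_0 \leadsto TS'\} = \mathcal{TP}\{\mathsf{t} \leadsto TS_0, \mathsf{t}_0 \leadsto TS'\}$.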

LEMMA B.17 (PRC STEPS FORWARDING IN THE SAME THREAD).

$$\forall TS, \mathcal{S}, M, TS', \mathcal{S}', M', TS'', \mathcal{S}'', M'', \iota, n_1, n_2.$$

$$\iota \vdash (TS, \mathcal{S}, M) \xrightarrow{na} {}^{n_1} (TS', \mathcal{S}', M') \land$$

$$\iota \vdash (TS', \mathcal{S}', M') \xrightarrow{prc} {}^{n_2} (TS'', \mathcal{S}'', M'')$$

$$\Longrightarrow \exists TS_0, \mathcal{S}_0, M_0.$$

$$\iota \vdash (TS, \mathcal{S}, M) \xrightarrow{prc} (TS_0, \mathcal{S}_0, M_0) \land$$

$$\iota \vdash (TS_0, \mathcal{S}_0, M_0) \xrightarrow{na} (TS'', \mathcal{S}'', M'')$$

LEMMA B.18 (PROMISES FORWARDING NON-ATOMIC STEPS).

$$\forall TS, S, M, TS', S', M', \iota, n.$$

$$\iota \vdash (TS, S, M) \xrightarrow{na} {}^{n} (TS', S', M')$$

$$\Longrightarrow \exists TS_{0}, M_{0}. \ \iota \vdash (TS, S, M) \xrightarrow{\text{prm}} {}^{*} (TS_{0}, S, M_{0}) \land$$

$$\iota \vdash (TS_{0}, S, M_{0}) \xrightarrow{na} {}^{n} (TS', S', M') \land M_{0} \approx_{\mathsf{at}}^{\iota} M'$$

PROOF. By induction on n.

LEMMA B.19 (CONSISTENCY FORWARDING NON-ATOMIC STEPS).

$$\forall TS, S, M, TS', S', M', \iota, n.$$

$$\iota \vdash (TS, S, M) \xrightarrow{na}^{n} (TS', S', M') \land$$

$$M \approx_{\mathsf{at}}^{\iota} M' \land consistent(TS', S', M', \iota)$$

$$\implies consistent(TS, S, M, \iota)$$

Lemma B.20 (non-atomic steps and prc step reordering).

$$\begin{split} \forall TS_1, \mathcal{S}, M, TS_1', \mathcal{S}_1, M_1, TS_2, TS_2', \mathcal{S}_2, M_2, n_1, n_2. \\ \iota \vdash (TS_1, \mathcal{S}, M) & \stackrel{na}{\longrightarrow}^{n_1} (TS_1', \mathcal{S}_1, M_1) \land M \approx_{\mathsf{at}}^{\iota} M_1 \land \\ \iota \vdash (TS_2, \mathcal{S}_1, M_1) & \stackrel{prc}{\longrightarrow}^{n_2} (TS_2', \mathcal{S}_2, M_2) \\ \Longrightarrow & \exists M_{20}. \ \iota \vdash (TS_2, \mathcal{S}, M) & \stackrel{prc}{\longrightarrow}^{n_2} (TS_2', \mathcal{S}, M_{20}) \land \\ \iota \vdash (TS_1, \mathcal{S}, M_{20}) & \stackrel{na}{\longrightarrow}^{n_1} (TS_1', \mathcal{S}_2, M_2) \land M_{20} \approx_{\mathsf{at}}^{\iota} M_2 \end{split}$$

LEMMA B.21 (NON-ATOMIC STEPS AND ATOMIC-BLOCK STEP REORDERING).

$$\forall TS_1, \mathcal{S}, M, TS'_1, \mathcal{S}_1, M_1, TS_2, TS'_2, \mathcal{S}_2, M_2, n_1, n_2.$$

$$\iota \vdash (TS_1, \mathcal{S}, M) \xrightarrow{na} {}^{n_1} (TS'_1, \mathcal{S}_1, M_1) \land M \approx_{\mathsf{at}}^{\iota} M_1 \land$$

$$\iota \vdash (TS_2, \mathcal{S}_1, M_1) \xrightarrow{\mathsf{atmBlk}} {}^{n_2} (TS'_2, \mathcal{S}_2, M_2)$$

$$\Longrightarrow \exists M_{20}. \ \iota \vdash (TS_2, \mathcal{S}, M) \xrightarrow{\mathsf{atmBlk}} {}^{n_2} (TS'_2, \mathcal{S}_2, M_{20}) \land$$

$$\iota \vdash (TS_1, \mathcal{S}_2, M_{20}) \xrightarrow{na} {}^{n_1} (TS'_1, \mathcal{S}_2, M_2) \land M_{20} \approx_{\mathsf{at}}^{\iota} M_2$$




Fig. 53. Proof sketch

LEMMA B.22 (MEM APPROX EQUAL CONSISTENCY PRESERVING).

$$\begin{split} & \forall TS, \mathcal{S}, M, \iota, M_0. \\ & \mathsf{consistent}(TS, \mathcal{S}, M, \iota) \land M \approx^{\iota}_{\mathsf{at}} M_0 \land TS.P \subseteq M_0 \\ & \implies \mathsf{consistent}(TS, \mathcal{S}, M_0, \iota) \end{split}$$

Lemma B.23 (consistent forwarding).

$$\forall TS, S, M, TS_0, n, S', M', \iota.$$

$$consistent(TS, S, M, \iota) \land TS.P \# TS_0.P \land$$

$$\iota \vdash (TS_0, S, M) \longrightarrow {}^{n}(\_, S', M')$$

$$\implies consistent(TS, S', M', \iota)$$

*Equivalence between the promising semantics and the non-preemptive semantics.* We show the correctness proof of Lemma. 6.1 in the following.

PROOF. We finish the proof by applying Lemma. B.3 and 6.1.



$$t \circ \xrightarrow{r := x_{na}} \bullet \xrightarrow{y_{na} := 2} \bullet \xrightarrow{print(r)} \bullet$$

$$I_{rr} \mid \qquad \qquad \downarrow I_{rr} \qquad \downarrow I_{rr}$$

$$t' \circ \xrightarrow{y_{na} := 2} \bullet \xrightarrow{r := x_{na}} \bullet \xrightarrow{print(r)} \bullet$$

$$\text{where } I_{rr}(\varphi, \iota, (S_t, M_t, S_s, M_s)) \triangleq (M_t = M_s \land S_t = S_s)$$
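The diagram above relates two programs that differ only in the order of a non-atomic read of x and a non-atomic write to y. The following Python sketch replays both orders on a toy store to show where an equality invariant in the style of $I_{rr}$ holds. It is a simplification and not the promising semantics: memory is a plain dictionary, the register file stands in for the remaining state, and the names `run`, `i_rr`, `prog_top` and `prog_bottom` are hypothetical.

```python
def run(program, memory):
    """Run a straight-line program on a toy store.

    Returns one (registers, memory) snapshot per program point and the outputs.
    """
    regs, mem, out = {}, dict(memory), []
    snapshots = [(dict(regs), dict(mem))]
    for op in program:
        kind = op[0]
        if kind == "load":       # ("load", r, x) models r := x_na
            regs[op[1]] = mem[op[2]]
        elif kind == "store":    # ("store", x, v) models x_na := v
            mem[op[1]] = op[2]
        elif kind == "print":    # ("print", r) models print(r)
            out.append(regs[op[1]])
        snapshots.append((dict(regs), dict(mem)))
    return snapshots, out


def i_rr(snap_a, snap_b):
    """Toy invariant: both sides agree on registers and memory."""
    return snap_a == snap_b


# Top row of the diagram: read x, then write y, then print.
prog_top = [("load", "r", "x"), ("store", "y", 2), ("print", "r")]
# Bottom row of the diagram: write y, then read x, then print.
prog_bottom = [("store", "y", 2), ("load", "r", "x"), ("print", "r")]

init = {"x": 0, "y": 0}
snaps_top, out_top = run(prog_top, init)
snaps_bottom, out_bottom = run(prog_bottom, init)

# The invariant holds initially, is broken between the two reordered
# instructions, and is re-established once both have executed.
related = [i_rr(a, b) for a, b in zip(snaps_top, snaps_bottom)]
print(related, out_top, out_bottom)
```

In this toy model the invariant holds at the start, fails at the intermediate point between the two reordered instructions, and holds again afterwards, which matches the three points at which the diagram places $I_{rr}$.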

#### **REFERENCES**

- [1] Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. 2011. Mathematizing C++ Concurrency. In Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL'11). 55–66.
- [2] Soham Chakraborty and Viktor Vafeiadis. 2016. Validating optimizations of concurrent C/C++ programs. In Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO '16). 216–226.
- [3] Soham Chakraborty and Viktor Vafeiadis. 2017. Formalizing the concurrency semantics of an LLVM fragment. In Proceedings of the 2017 International Symposium on Code Generation and Optimization (CGO '17). 100–110.
- [4] Jaroslav Ševčík. 2011. Safe Optimisations for Shared-Memory Concurrent Programs. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11).
- [5] Minki Cho, Sung-Hwan Lee, Chung-Kil Hur, and Ori Lahav. 2021. Modular Data-Race-Freedom Guarantees in the Promising Semantics. In Proceedings of the 42nd annual ACM SIGPLAN conference on Programming Languages Design and Implementation (PLDI '21).
- [6] CompCert Developers. 2020. CompCert-3.7. http://compcert.inria.fr/release/compcert-3.7.tgz
- [7] Hanru Jiang, Hongjin Liang, Siyang Xiang, Junpeng Zha, and Xinyu Feng. 2019. Towards Certified Separate Compilation for Concurrent Programs. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '19).
- [8] Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. 2017. A Promising Semantics for Relaxed-Memory Concurrency. In Proceedings of the 44th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '17).
- [9] Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. 2017. Repairing Sequential Consistency in C/C++11. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'17). 618–632.
- [10] Leslie Lamport. 1979. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Trans. Comput. C-28, 9 (1979), 690–691.
- [11] Sung-Hwan Lee, Minki Cho, Anton Podkopaev, Soham Chakraborty, Chung-Kil Hur, Ori Lahav, and Viktor Vafeiadis. 2020. Promising 2.0: Global Optimizations in Relaxed Memory Concurrency. In *Proceedings of the 41st annual ACM SIGPLAN conference on Programming Languages Design and Implementation (PLDI '20)*.

- [12] Robin Morisset, Pankaj Pawan, and Francesco Zappa Nardelli. 2013. Compiler testing via a theory of sound optimisations in the C11/C++11 memory model. In *Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '13)*. 187–196.
- [13] Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. 2017. Promising Compilation to ARMv8 POP. In 31st European Conference on Object-Oriented Programming (ECOOP 2017) (Leibniz International Proceedings in Informatics (LIPIcs)), Vol. 74. 22:1–22:28.
- [14] Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. 2018. Bridging the gap between programming languages and hardware weak memory models. In *Proceedings of the ACM on Programming Languages (POPL'18)*, Vol. 3. 1–32.
- [15] Jaroslav Ševčík, Viktor Vafeiadis, Francesco Zappa Nardelli, Suresh Jagannathan, and Peter Sewell. 2013. CompCertTSO: A Verified Compiler for Relaxed-Memory Concurrency. J. ACM 60, 3 (2013), 22.
- [16] Steven S. Muchnick. 1997. Advanced Compiler Design and Implementation. Academic Press.
- [17] Youngju Song, Minki Cho, Dongjoo Kim, Yonghyun Kim, Jeehoon Kang, and Chung-Kil Hur. 2020. CompCertM: CompCert with C-assembly linking and lightweight modular verification. *Proceedings of the ACM on Programming Languages* 4, 23 (2020), 1–31. Issue POPL.
- [18] Viktor Vafeiadis, Thibaut Balabonski, Soham Chakraborty, Robin Morisset, and Francesco Zappa Nardelli. 2015. Common Compiler Optimisations are Invalid in the C11 Memory Model and what we can do about it. In *Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '15)*. 209–220.
- [19] Yuting Wang, Pierre Wilke, and Zhong Shao. 2019. An abstract stack based approach to verified compositional compilation to machine code. *Proceedings of the ACM on Programming Languages* 3, 62 (2019), 1–30. Issue POPL.