Memory model

Stephen Dolan edited this page May 11, 2017 · 7 revisions

The OCaml memory model

This is the (draft!) memory model for multicore OCaml, which answers the question of what you get when you read a ref. In single-threaded code, the answer is "whatever you most recently wrote", but in multi-threaded code with complex synchronisation exactly what "most recently" means becomes murky.

The first part of this document explains how to avoid the subtleties, and write reasonably straightforward synchronisation code in which "most recent" is not a complicated concept. The later parts delve into some of the details, but these are useful mainly for people writing very low-level synchronisation libraries or hacking on the compiler (if you're relying on such low-level details when writing application code, you're doing it wrong).

For all of the details, you should read and play with the formal specification of the memory ordering guarantees, specified using herd. The formal spec is more precise and much, much shorter than this document, but if you're not familiar with axiomatic memory models it may take longer to understand.

Happens-before

We can reason about the interaction between a running OCaml program and its memory by considering the trace of the program, a sequence of events, such as reading a value from or writing a value to a ref (This document uses ref as a shorthand for "mutable location": the same rules apply to mutable record fields and mutable object members as to ref cells). Each event corresponds to a read or a write operation in the source code, but a particular operation in the source code generates a new event every time it runs.

The order that a single-threaded program generates events is called program order. For multi-threaded programs, we still talk about program order, but it is no longer a total order: events generated by two different threads are neither before nor after each other in program order.

Program order describes the order operations were written in the source code, but doesn't necessarily specify the order in which events actually happen: events can be reordered, combined or optimised away both by the compiler and by the CPU. The orderings that can be relied upon are specified as a relation called happens-before: if one event happens before a second, then no thread will ever observe the second but not the first. We say that e₁ happens after e₂ if e₂ happens before e₁, and that they happen concurrently if neither happens before the other. Happens before is transitive (if e₁ happens before e₂ and e₂ happens before e₃, then e₁ happens before e₃) and irreflexive (no event happens before itself).

The API for synchronisation constructs will generally specify happens-before relationships. For instance, a mutex specifies that one thread's Lock operation happens before the corresponding Unlock, and that the Unlock happens before the next thread's Lock. Below, the happens-before relationships are specified for refs and atomic references (Atomic.t).

This model doesn't talk about high-level synchronisation constructs (channels, etc.) yet, mainly because we haven't implemented many of them. However, the guarantees made about atomics are sufficient to implement high-level synchronisation on top, and this model will be updated with details of high-level synchronisation features.

Reading, writing and races

The most basic happens-before guarantee is that made about a single thread's accesses to a particular ref:

If e₁ and e₂ are events in program order (that is, performed in order by the same thread), they access the same ref, and they're not both reads, then e₁ happens before e₂.

Suppose we have the following program:

1. let r = ref 0 in
2. let a = !r in
3. r := 1
4. let b = !r in
5. let c = !r in
    ...

The above means that lines 1, 2, 3 happen in order (for memory model purposes, the initialisation of the reference on line 1 counts as a "write"). Line 3 happens before lines 4 and 5, but lines 4 and 5 happen concurrently since they are both reads.

When a read event r occurs, it reads the value written by some write w to the same location (again, initialisation counts as a "write" here). Which write that is can usually be determined by the following rule:

If a write w happens before a read r on the same ref, and every other write w' either happens before w or after r, then r reads the value written by w.

This rule does not always apply. It may be the case that there is no such w: for instance, there may be a write that happens concurrently with r, or there may be two writes that happen before r but happen concurrently with each other. When this rule does not apply to a read event, we say that there is a data race.

Programs should avoid data races by using synchronisation mechanisms, such as the atomics described below. Unlike e.g. the C++ memory model, this memory model does give relatively strong guarantees even in the presence of data races. However, programs should still strive to avoid races: the guarantees provided by this model (see "Coherence and data races" below) serve to limit the damage done by a data race, rather than to make data races a reasonable way of writing programs (for instance, in the OCaml memory model a data race on a particular ref will only cause future reads of that ref to produce odd results, rather than making the entire program have undefined behaviour as per C++).

The rule above is sufficient to ensure the expected behaviour for references used only by one thread. In the example program given previously, the read on line 2 will read the value written by the initialisation on line 1, since the only other write (line 3) happens after line 2. Similarly, the reads on lines 4 and 5 read the value written on line 3, since the only other write (line 1) happens before line 3.

Atomics

While the above rules ensure that ref cells have sensible behaviour when accessed by a single thread, they do not work well for synchronising multiple threads. For instance, consider this message-passing idiom, where message is an int ref initialised to 0 and flag is a bool ref initialised to false:

thread 1:
1. message := 42
2. flag := true

thread 2:
3. let seen = !flag in
4. let value = !message in
5. if seen then print_int value;

This program contains data races. The read of flag on line 3 is racy, since it happens concurrently with the write on line 2, and the read of message on line 4 similarly races with the write on line 1. This program might well print 0, if the read on line 3 reads from the write on line 2, and the read on line 4 reads from the initialisation of message.

Concretely, this can happen for a number of reasons. For instance, suppose thread 2 accesses message beforehand:

thread 2:
let old = !message in
let seen = !flag in
let value = !message in
if seen then print_int value

The compiler might optimise this by removing the second access to message:

thread 2:
let old = !message in
let seen = !flag in
let value = old in
if seen then print_int value

Now, this program can have seen = true yet value = 0, even if executed on a sequentially consistent machine.

Even if executed without any optimisations, this program can print 0 if run on a weakly ordered machine such as an ARM or PowerPC. Those machines will happily reorder the reads from or the writes to flag and message on their own.

In the memory model, such bad behaviour (printing 0) is permitted because there are no happens-before relations between the operations in thread 1 and thread 2. To provide the needed synchronisation, atomics should be used. We rewrite the program, making flag an Atomic.t:

thread 1:
1. message := 42
2. Atomic.set flag true

thread 2:
3. let seen = Atomic.get flag in
4. let value = !message in
5. if seen then print_int value;

This program will never print 0: it may not print anything, but if it prints it will print 42. This is due to the two rules about atomics, one specifying happens-before on a single thread:

Everything program-order-before an atomic event happens before it, and everything program-order-after an atomic event happens after it.

and one specifying happens-before between threads:

For any two events on the same atomic reference which are not both reads, one of them happens before the other.

By the first rule, the event on line 1 happens before line 2, and line 3 happens before line 4. By the second, one of lines 2 and 3 happens before the other. If 2 happens before 3, then 1 happens before 4 and the program prints 42. Otherwise, 3 reads false from the initialisation of flag, and the program does not print anything.

Note that since the events on a particular atomic reference are totally ordered by happens-before, data races cannot happen on atomics.

Comparison with other memory models

This model specifies behaviour in the presence of data races, a feature shared by Java's memory model but not by that of C++.

The atomic references of this model are analogous to the volatile variables of Java or seq_cst atomics in C++, but OCaml atomics have stronger properties regarding reordering with nonatomic variables. For instance, consider this program:

thread 1:
Atomic.set x 1
let a = !y

thread 2:
y := 1
let b = Atomic.get x

In OCaml, one of the atomic accesses to x happens before the other. If the set happens first, then b = 1. If the get happens first, then the assignment to y happens before the get, before the set, before the read of y, so a = 1. However, in both Java and C++ a = b = 0 is a possible outcome.

Low-level details

This (incomplete!) section is useful mainly for reasoning about behaviour in the presence of data races, and verifying that compiler optimisations are sound. Applications (other than low-level synchronisation libraries) should avoid depending on any of this.

The formal memory model is specified by defining the happens-before relation as above, and asserting two conditions: Causality and Coherence

Causality and thin air

The Causality axiom specifies:

acyclic(hb | po | rf)

where hb is the happens-before relation, po is program order, and rf is the reads-from relation (where w rf r if r reads the value written by w). This axiom forbids all load-buffering behaviours, which removes the "values out of thin air" problem facing C++ and Java.

Demanding that there be no cycles in hb | po | rf has been proposed as a solution to the thin-air problem in C++, but rejected on performance grounds since it requires on weakly-ordered machines (Power, ARM) that subsequent stores be made dependent on prior loads, which requires inserting many useless branches. I believe this is an acceptable trade-off in OCaml in order to ensure causality, because:

  • Most loads in OCaml are of immutable locations, for which an additional branch need not be inserted.

  • Multicore OCaml's GC design already has a branching read-barrier, so the desired branches are generally already present.

Coherence and data races

The Coherence axiom specifies:

acyclic(co | rf | fr | (hb & loc))

where co is the coherence order, which is a total order on writes to any given location, fr is the from-reads relation (defined as rf^(-1); co) and loc relates events to the same location.

This means that even if there are data races on a non-atomic location, one of the racing writes will win the race and the value of the location will eventually stabilise.