The OCaml memory model
This is the (draft!) memory model for multicore OCaml, which answers the question of what you get when you read a `ref`. In single-threaded code, the answer is "whatever you most recently wrote", but in multi-threaded code with complex synchronisation, exactly what "most recently" means becomes murky.
The first part of this document explains how to avoid the subtleties, and write reasonably straightforward synchronisation code in which "most recent" is not a complicated concept. The later parts delve into some of the details, but these are useful mainly for people writing very low-level synchronisation libraries or hacking on the compiler (if you're relying on such low-level details when writing application code, you're doing it wrong).
For all of the details, you should read and play with the formal specification of the memory ordering guarantees, specified using herd. The formal spec is more precise and much, much shorter than this document, but if you're not familiar with axiomatic memory models it may take longer to understand.
We can reason about the interaction between a running OCaml program and its memory by considering the trace of the program, a sequence of events, such as reading a value from or writing a value to a `ref`. (This document uses `ref` as a shorthand for "mutable location": the same rules apply to mutable record fields and mutable object members as to `ref` cells.) Each event corresponds to a read or a write operation in the source code, but a particular operation in the source code generates a new event every time it runs.
The order that a single-threaded program generates events is called program order. For multi-threaded programs, we still talk about program order, but it is no longer a total order: events generated by two different threads are neither before nor after each other in program order.
Program order describes the order operations were written in the source code, but doesn't necessarily specify the order in which events actually happen: events can be reordered, combined or optimised away both by the compiler and by the CPU. The orderings that can be relied upon are specified as a relation called happens-before: if one event happens before a second, then no thread will ever observe the second but not the first. We say that e₁ happens after e₂ if e₂ happens before e₁, and that they happen concurrently if neither happens before the other. Happens-before is transitive (if e₁ happens before e₂ and e₂ happens before e₃, then e₁ happens before e₃) and irreflexive (no event happens before itself).
The API for synchronisation constructs will generally specify happens-before relationships. For instance, a mutex specifies that one thread's `Lock` operation happens before the corresponding `Unlock`, and that the `Unlock` happens before the next thread's `Lock`. Below, the happens-before relationships are specified for plain `ref`s and atomic references.
This model doesn't talk about high-level synchronisation constructs (channels, etc.) yet, mainly because we haven't implemented many of them. However, the guarantees made about atomics are sufficient to implement high-level synchronisation on top, and this model will be updated with details of high-level synchronisation features.
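As an illustration of how such API guarantees are used, here is a hypothetical sketch assuming OCaml 5's stdlib `Mutex` and `Domain` modules (the helper `with_lock` is invented for this example, not part of the model):

```ocaml
(* Sketch: a mutex's Unlock happens before the next thread's Lock,
   so data written under the lock is visible to the next locker. *)

(* Hypothetical helper: run f while holding mutex m. *)
let with_lock m f =
  Mutex.lock m;
  let result = f () in
  Mutex.unlock m;
  result

let run () =
  let m = Mutex.create () in
  let data = ref 0 in
  let writer = Domain.spawn (fun () ->
    with_lock m (fun () -> data := 42))  (* write happens before the Unlock... *)
  in
  Domain.join writer;
  with_lock m (fun () -> !data)          (* ...which happens before this Lock *)
```

(Here `Domain.join` also synchronises, so the read is guaranteed to see `42`; the comments mark where the Lock/Unlock happens-before edges lie.)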
Reading, writing and races
The most basic happens-before guarantee is that made about a single thread's accesses to a particular `ref`:

If e₁ and e₂ are events in program order (that is, performed in order by the same thread), they access the same `ref`, and they're not both reads, then e₁ happens before e₂.
Suppose we have the following program:
```
1. let r = ref 0 in
2. let a = !r in
3. r := 1
4. let b = !r in
5. let c = !r in ...
```
The above means that lines 1, 2, 3 happen in order (for memory model purposes, the initialisation of the reference on line 1 counts as a "write"). Line 3 happens before lines 4 and 5, but lines 4 and 5 happen concurrently since they are both reads.
When a read event r occurs, it reads the value written by some write w to the same location (again, initialisation counts as a "write" here). Which write that is can usually be determined by the following rule:
If a write w happens before a read r on the same `ref`, and every other write w' either happens before w or after r, then r reads the value written by w.
This rule does not always apply. It may be the case that there is no such w: for instance, there may be a write that happens concurrently with r, or there may be two writes that happen before r but happen concurrently with each other. When this rule does not apply to a read event, we say that there is a data race.
Programs should avoid data races by using synchronisation mechanisms, such as the atomics described below. Unlike e.g. the C++ memory model, this memory model does give relatively strong guarantees even in the presence of data races. However, programs should still strive to avoid races: the guarantees provided by this model (see "Coherence and data races" below) serve to limit the damage done by a data race, rather than to make data races a reasonable way of writing programs (for instance, in the OCaml memory model a data race on a particular `ref` will only cause future reads of that `ref` to produce odd results, rather than making the entire program have undefined behaviour as it does in C++).
The rule above is sufficient to ensure the expected behaviour for references used only by one thread. In the example program given previously, the read on line 2 will read the value written by the initialisation on line 1, since the only other write (line 3) happens after line 2. Similarly, the reads on lines 4 and 5 read the value written on line 3, since the only other write (line 1) happens before line 3.
While the above rules ensure that `ref` cells have sensible behaviour when accessed by a single thread, they do not work well for synchronising multiple threads. For instance, consider this message-passing idiom, where `message` is an `int ref` initialised to `0` and `flag` is a `bool ref` initialised to `false`:
```
thread 1:
1. message := 42
2. flag := true

thread 2:
3. let seen = !flag in
4. let value = !message in
5. if seen then print_int value
```
This program contains data races. The read of `flag` on line 3 is racy, since it happens concurrently with the write on line 2, and the read of `message` on line 4 similarly races with the write on line 1. This program might well print `0`, if the read on line 3 reads from the write on line 2, and the read on line 4 reads from the initialisation of `message`.
Concretely, this can happen for a number of reasons. For instance, suppose thread 2 accesses `message` twice:
```
thread 2:
let old = !message in
let seen = !flag in
let value = !message in
if seen then print_int value
```
The compiler might optimise this by removing the second access to `message`:

```
thread 2:
let old = !message in
let seen = !flag in
let value = old in
if seen then print_int value
```
Now, this program can have `seen = true` yet `value = 0`, even if executed on a sequentially consistent machine.
Even if executed without any optimisations, this program can print `0` if run on a weakly ordered machine such as an ARM or PowerPC. Those machines will happily reorder the reads from or the writes to `flag` and `message` on their own.
In the memory model, such bad behaviour (printing `0`) is permitted because there are no happens-before relations between the operations in thread 1 and thread 2. To provide the needed synchronisation, atomics should be used. We rewrite the program, making `flag` an atomic reference:

```
thread 1:
1. message := 42
2. Atomic.set flag true

thread 2:
3. let seen = Atomic.get flag in
4. let value = !message in
5. if seen then print_int value
```
This program will never print `0`: it may not print anything, but if it prints it will print `42`. This is due to the two rules about atomics, one specifying happens-before on a single thread:
Everything program-order-before an atomic event happens before it, and everything program-order-after an atomic event happens after it.
and one specifying happens-before between threads:
For any two events on the same atomic reference which are not both reads, one of them happens before the other.
By the first rule, the event on line 1 happens before line 2, and line 3 happens before line 4. By the second, one of lines 2 and 3 happens before the other. If 2 happens before 3, then 1 happens before 4 and the program prints 42. Otherwise, 3 reads `false` from `flag`, and the program does not print anything.
Note that since the events on a particular atomic reference are totally ordered by happens-before, data races cannot happen on atomics.
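The fixed idiom above can be run directly; here is a sketch using OCaml 5's `Domain` and `Atomic` modules (the `Domain.spawn`/`Domain.join` plumbing is added for the example):

```ocaml
(* Message passing with an atomic flag: the receiver either misses the
   message entirely (None) or sees the full 42 (Some 42), never Some 0. *)
let run () =
  let message = ref 0 in
  let flag = Atomic.make false in
  let sender = Domain.spawn (fun () ->
    message := 42;                  (* 1: happens before the atomic set *)
    Atomic.set flag true)           (* 2 *)
  in
  let receiver = Domain.spawn (fun () ->
    let seen = Atomic.get flag in   (* 3: happens before the read below *)
    let value = !message in         (* 4 *)
    if seen then Some value else None)
  in
  Domain.join sender;
  Domain.join receiver
```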
Comparison with other memory models
This model specifies behaviour in the presence of data races, a feature shared by Java's memory model but not by that of C++.
The atomic references of this model are analogous to the `volatile` variables of Java or `seq_cst` atomics in C++, but OCaml atomics have stronger properties regarding reordering with nonatomic variables. For instance, consider this program:
```
thread 1:
Atomic.set x 1
let a = !y

thread 2:
y := 1
let b = Atomic.get x
```
In OCaml, one of the atomic accesses to `x` happens before the other. If the `set` happens first, then `b = 1`. If the `get` happens first, then the assignment to `y` happens before the `get`, which happens before the `set`, which happens before the read of `y`, so `a = 1`. However, in both Java and C++, `a = b = 0` is a possible outcome.
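This comparison can be sketched as a runnable litmus test with OCaml 5 `Domain`s (a hypothetical harness; a single run only exercises one interleaving, but the asserted outcome is the one the OCaml model guarantees):

```ocaml
(* Store-buffering shape: under the OCaml model, a = b = 0 is impossible,
   because one of the two accesses to the atomic x happens before the other. *)
let run () =
  let x = Atomic.make 0 in
  let y = ref 0 in
  let t1 = Domain.spawn (fun () -> Atomic.set x 1; !y) in
  let t2 = Domain.spawn (fun () -> y := 1; Atomic.get x) in
  let a = Domain.join t1 in
  let b = Domain.join t2 in
  (a, b)
```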
This (incomplete!) section is useful mainly for reasoning about behaviour in the presence of data races, and verifying that compiler optimisations are sound. Applications (other than low-level synchronisation libraries) should avoid depending on any of this.
The formal memory model is specified by defining the happens-before relation as above, and asserting two conditions: Causality and Coherence.
Causality and thin air
The Causality axiom specifies:
acyclic(hb | po | rf)
Here, `hb` is the happens-before relation, `po` is program order, and `rf` is the reads-from relation (where `w rf r` if `r` reads the value written by `w`). This axiom forbids all load-buffering behaviours, which removes the "values out of thin air" problem facing C++ and Java.
Demanding that there be no cycles in `hb | po | rf` has been proposed as a solution to the thin-air problem in C++, but rejected on performance grounds, since on weakly-ordered machines (Power, ARM) it requires that subsequent stores be made dependent on prior loads, which means inserting many useless branches. I believe this is an acceptable trade-off in OCaml in order to ensure causality, because:
Most loads in OCaml are of immutable locations, for which an additional branch need not be inserted.
Multicore OCaml's GC design already has a branching read-barrier, so the desired branches are generally already present.
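The load-buffering shape that Causality forbids can be sketched with OCaml 5 `Domain`s (a hypothetical harness; the nonatomic accesses here race, which the OCaml model permits without undefined behaviour):

```ocaml
(* Load buffering: each thread reads one ref, then writes the other.
   The outcome a = b = 1 would need a cycle in hb | po | rf, so
   Causality forbids it. *)
let run () =
  let x = ref 0 in
  let y = ref 0 in
  let t1 = Domain.spawn (fun () -> let a = !x in y := 1; a) in
  let t2 = Domain.spawn (fun () -> let b = !y in x := 1; b) in
  let a = Domain.join t1 in
  let b = Domain.join t2 in
  (a, b)
```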
Coherence and data races
The Coherence axiom specifies:
acyclic(co | rf | fr | (hb & loc))
Here, `co` is the coherence order, which is a total order on the writes to any given location, `fr` is the from-reads relation (defined as `rf^(-1); co`), and `loc` relates events to the same location.
This means that even if there are data races on a non-atomic location, one of the racing writes will win the race and the value of the location will eventually stabilise.
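As a sketch of what this stabilisation means (a hypothetical example using OCaml 5 `Domain`s): two domains race on a nonatomic `ref`, and once both have finished, the location holds one of the racing writes.

```ocaml
(* Coherence: co totally orders the writes to r, so after both racing
   writes complete, r holds the co-final value: 1 or 2, never anything else. *)
let run () =
  let r = ref 0 in
  let t1 = Domain.spawn (fun () -> r := 1) in
  let t2 = Domain.spawn (fun () -> r := 2) in
  Domain.join t1;
  Domain.join t2;
  !r
```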