
Updating the Go memory model #47141

rsc announced in Discussions
Jul 12, 2021 · 15 comments · 49 replies

I posted a blog post Updating the Go memory model about some changes I am planning to propose regarding Go's memory model.

This discussion is for collecting feedback about those changes before filing an official proposal.

Please make good use of the limited threading below: post a top-level comment for something new and use the reply feature to reply to someone else's comment. We hope this will help keep discussion manageable.

Thank you!

Replies


A hearty +1 to atomic.Bool and friends.

5 replies
@cristaloleg

Some implementations already exist; some examples: https://github.com/cristalhq/atomix or the more popular https://github.com/uber-go/atomic (and many more inside the stdlib as unexported types).

@cespare

cespare Jul 12, 2021
Collaborator

In general the implementation of those types is trivial; the API design is the only interesting bit.

(Edit: To be clear, I mean it's trivial to build things like Bool on top of existing atomic operations. Getting the underlying atomic ops right is obviously tricky and requires compiler/hardware awareness, but that work has already been done.)
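To illustrate the point, here is a minimal sketch of a Bool built on the existing sync/atomic operations. The type and method names are illustrative, not a real or proposed API:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Bool is a sketch of an atomic boolean built on the existing
// uint32-based atomic operations.
type Bool struct{ v uint32 }

func b32(x bool) uint32 {
	if x {
		return 1
	}
	return 0
}

func (b *Bool) Load() bool       { return atomic.LoadUint32(&b.v) != 0 }
func (b *Bool) Store(x bool)     { atomic.StoreUint32(&b.v, b32(x)) }
func (b *Bool) Swap(x bool) bool { return atomic.SwapUint32(&b.v, b32(x)) != 0 }

func main() {
	var b Bool
	b.Store(true)
	fmt.Println(b.Load())      // true
	fmt.Println(b.Swap(false)) // true: Swap returns the old value
	fmt.Println(b.Load())      // false
}
```

As noted, the hard part (correct underlying atomics) is already done; the wrapper itself is a few lines.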

@rsc

rsc Jul 12, 2021
Maintainer Author

I'd be interested to see representative examples of use of atomic.Bool. I wonder if there is a higher-level API we should be targeting for package sync as well. I mean, something like sync.Once is roughly an atomic.Bool, but a bit more, and the bit more makes it much better.

@cespare

We have an internal type at work that's essentially atomic.Bool (Get, Set, and Swap; though they should probably be named Load, Store, and Swap). For comparison, we use it roughly as much as we use sync.Once.

I read through all the call sites and they are straightforward uses where you could imagine using a sync.Mutex + a bool but the atomic.Bool is more convenient and avoids a lock; there isn't any existing mutex or other synchronization on which to piggyback. Two examples:

  • A server with a "killswitch" feature where it can be set to temporarily refuse all requests with some error (but stay up and running). The killswitch is an atomic bool; it is set to true to turn on the killswitch and every request begins by consulting the value of the killswitch.
  • A couple of TCP servers with Close methods have an atomic closed bool that we can use to return a particular error when a server has been closed (which doesn't normally happen except in tests). I checked and net/http.Server does the same thing: Server.inShutdown uses that package's atomicBool type for the same purpose.

None of our use cases seem like they want a higher-level abstraction.
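For concreteness, the killswitch pattern described above might look like the following sketch, using a uint32 accessed atomically (the Server/handle names and error message are hypothetical):

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Server sketches the killswitch pattern: every request consults an
// atomic flag, and no existing mutex or other synchronization is needed.
type Server struct {
	killed uint32 // accessed atomically; 1 means refuse all requests
}

// SetKillswitch turns the killswitch on or off.
func (s *Server) SetKillswitch(on bool) {
	var v uint32
	if on {
		v = 1
	}
	atomic.StoreUint32(&s.killed, v)
}

// handle begins by consulting the killswitch, as described above.
func (s *Server) handle() error {
	if atomic.LoadUint32(&s.killed) == 1 {
		return errors.New("server is temporarily refusing requests")
	}
	// ... normal request handling ...
	return nil
}

func main() {
	s := &Server{}
	fmt.Println(s.handle()) // <nil>
	s.SetKillswitch(true)
	fmt.Println(s.handle()) // server is temporarily refusing requests
}
```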

@neild

I'd be interested to see representative examples of use of atomic.Bool.

The protobuf internals include several lazily initialized data structures, each guarded by an atomic uint32 that is set to 1 after the structure is initialized. This could be a sync.Once, but as I recall, handwriting the init operation allowed inlining the fast path when the object is already initialized. We would have used an atomic.Bool if one existed.

https://go.googlesource.com/protobuf/+/refs/heads/master/internal/filedesc/desc.go#41

There's a minor typo in a code sample towards the end:

i := *p
if i < 0 || i >= len(fns) {
	panic("invalid function index")
}
... complex code ...
// compiler must NOT reload i = *p here
funcs[i]()

The call to len should be len(funcs).

That aside, the proposal looks great. I too would prefer not to add unsynchronized atomics.

3 replies
@cespare

Other typos: search for "sequental" (there are two instances).

@rsc

rsc Jul 12, 2021
Maintainer Author

Thanks, fixed both, and a straighforward.

@kortschak

"This section list the adjustments..." s/list/lists/

@rsc, I don't quite follow this:

To answer an obvious suggestion, I don't see a clean way to use generics to provide just a single atomic.Atomic[T] that would let us avoid introducing Bool, Int, and so on as separate types, at least not without special cases in the compiler.

I see a specific issue with Bool in that it doesn't have an Add operation. What's the problem with defining a single parameterized type for the numeric ones? (Maybe you still want special compiler support for performance, but it sounds like you're talking about a type system issue.)

6 replies
@szabba

With a future extension to generics like switch type T:

  • the implementations could be kept type safe,
  • which branch is taken in a given generic type instantiation would be known at compile time.

Then an optimization that removed the other branches would not have to be a special case. Or am I confused and is that different from what you're talking about?

@ianlancetaylor

Note that we already have special cases in the compiler for all of the individual sync/atomic functions. The actual code in sync/atomic is not used for normal builds. So if we introduced atomic.Atomic[T atomicTypes] we would indeed require a special case in the compiler, but it is a special case that is essentially already there. The fallback method implementations could use a type switch as you suggest.
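A sketch of such a fallback, using the generics syntax that has since shipped. The constraint and names here are hypothetical, and a real implementation would have the compiler replace the type switch with the existing intrinsics per instantiation:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// atomicTypes is a hypothetical constraint listing types with
// intrinsic atomic support.
type atomicTypes interface {
	int32 | int64 | uint32 | uint64
}

// Atomic sketches the generic wrapper: the fallback methods dispatch
// with a type switch, as described above.
type Atomic[T atomicTypes] struct{ v T }

func (a *Atomic[T]) Load() T {
	switch p := any(&a.v).(type) {
	case *int32:
		return T(atomic.LoadInt32(p))
	case *int64:
		return T(atomic.LoadInt64(p))
	case *uint32:
		return T(atomic.LoadUint32(p))
	case *uint64:
		return T(atomic.LoadUint64(p))
	}
	panic("unreachable")
}

func (a *Atomic[T]) Store(x T) {
	switch p := any(&a.v).(type) {
	case *int32:
		atomic.StoreInt32(p, int32(x))
	case *int64:
		atomic.StoreInt64(p, int64(x))
	case *uint32:
		atomic.StoreUint32(p, uint32(x))
	case *uint64:
		atomic.StoreUint64(p, uint64(x))
	}
}

func main() {
	var a Atomic[int64]
	a.Store(7)
	fmt.Println(a.Load()) // 7
}
```

Note the type switch matches only the exact types in the constraint; supporting named types (~int32 and friends) would need more care, which is part of why special cases in the compiler keep coming up.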

@prattmic

I wonder if we can gather data about how frequently the non-intrinsic atomic implementations are used. My guess would be not very often at all (deferred atomic operation may be the most common?).

@ianlancetaylor

I think the race detector uses the non-intrinsic versions. Not 100% sure, though.

@prattmic

Ah yes, good call. That said, the race atomics may already be sufficiently slow that a switch won't matter. (They are already assembly, so no inlining is happening).

I like the idea of being clear about what kind of invalid programs are allowed in the case when you have a data-race and how they might behave. I also like the call-out that concurrent multi-word reads/writes allow you to observe inconsistent (and potentially corrupt) structures.

One thing that I think could be more explicit is the behaviour when you have concurrent reads and writes to a memory location X that is smaller than (or equal in size to) a machine-word.

  var x int8 = 0

  go func() {
    x = 1
  }()
  go func() {
    fmt.Println(x) // could this print anything except 0 or 1?
  }()

I think either of the following options make sense, and experimentally the first seems true (and definitely the most expected!):

  • For a memory location X where an earlier write set X=A: if a write that sets X=B and a read of X happen concurrently, then the read may observe X having a value of either A or B.
  • For a memory location X where an earlier write set X=A: if a write that sets X=B and a read of X happen concurrently, then the read may observe X having any value.
7 replies
@ConradIrwin

I don't think the current rules actually say this (though it is quite possible that I'm missing an implication in the text – as you say, it is very subtle). The two cases I am worried about:

  1. (from your blog post) "[a compiler] must not allow a single write to write multiple values.".

    // *p = 2, i = 2

    *p = i + *p/2
    // is "optimized" to
    *p /= 2
    *p += i


    There are three possibilities for a concurrent read in the "optimized" case: it either sees 1, 2, or 3. As in your blog post, we'd like to exclude reading a 1, but (at least by my reading) the current Go memory model's rules do not:

    • In this case there is no synchronization to establish "happens before" relationships, so the read is not "guaranteed to observe" any particular write.
    • The read is "allowed to observe" any of the writes.
    • There seems to be a missing exclusion of "The read is not allowed to observe anything else"
    • I'm not sure the best way to explain the guarantee, maybe: "if any write w to a memory location l happens before a read r of l, r is guaranteed to observe either w or a write w' to l that happens before or concurrently to r and which does not happen before w."). It may need additional context for reading from uninitialized memory before any writes to that location (though I think that's only possible with unsafe/cgo?), "If no writes to l happen before a read, that read may see an arbitrary value (though this is unlikely in practice, as the write during initialization happens before any read to a variable)".
  2. A (maybe hypothetical?) worry about the behavior of writes to "machine-word" sized values or smaller. As someone who's not very familiar with hardware architecture, it's not clear what guarantees I get. The current text calls out "Reads and writes of values larger than a single machine word behave as multiple machine-word-sized operations in an unspecified order." This doesn't say anything about reads and writes of values smaller than (or equal in size to) machine words (though it does imply they probably are defined in some way). I think the suggestion for 1. is sufficient to handle this; but it could be made clearer for future readers by saying: "Even in the presence of data races, reads of a machine-word-sized location will not see partially written values."

Either way this may just be a scope-creep request on top of your change to make the implications more explicit, not push-back on the idea. I'd love to see the page significantly expanded and clarified as you propose.

@rsc

rsc Jul 13, 2021
Maintainer Author

Thanks. I will work to make that text clearer.

@bshanks

Expanding on this, you say "a read must observe some other value that was actually written there. Unlike C++, the read is not allowed to observe "any value"." But as ConradIrwin points out, "There seems to be a missing exclusion of "The read is not allowed to observe anything else"".

In section Happens Before, the existing wording does say:

 A read r of a variable v is _allowed_ to observe a write w to v if both of the following hold:

    r does not happen before w.
    There is no other write w' to v that happens after w but before r.

As written, that does not disallow anything, it only allows. You might consider making it:

A read r of a variable v is allowed to observe a write w to v only if both of the following hold:

However, is this intended to prevent acausal/out-of-thin-air reads? I'm not sure that it does. In the example from https://research.swtch.com/plmm#acausality , my understanding was that there IS a write of "42" to the variable, and it does not happen-before the corresponding read (because neither happens-before the other). In other words, the read and the write of 42 are concurrent and there is no intervening write, therefore both "r does not happen before w" and "There is no other write w' to v that happens after w but before r" are satisfied. Or is the intention to allow acausal reads?

@rsc

rsc Jul 14, 2021
Maintainer Author

However, is this intended to prevent acausal/out-of-thin-air reads? I'm not sure that it does.

Acausal and out-of-thin-air reads are, informally, not allowed. However, these rules do not attempt to do that formally. As far as I can tell no one has any idea how to do that formally. Go won't be the first. I'm not concerned about that particular detail.

@roudkerk

Is it worth explicitly saying that "word tearing" cannot happen, as Java does? https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6

In the list of ordering of operations, the word "happens" seems imprecise:

  • A receive from an unbuffered channel happens before the send on that channel completes.
  • The k'th receive on a channel with capacity C happens before the k+C'th send from that channel completes.

If you read "happens" to mean "begins and ends," then it would seem to be impossible for a read to end before the corresponding send completes. Maybe "happens" should be "begins," if that's the correct semantics.

3 replies
@rsc

rsc Jul 13, 2021
Maintainer Author

The sentences are correct, if subtle.

If you think of the send as a function call send(channel, value), then "the send completes" means "the call returns". The actual communication happens during the call, before the return, so it does make sense for the receive to happen before the send completes (unblocks and lets the code doing the send keep running).

@bshanks

(i made a similar comment below): As others noted, a potential problem is that we have both "send < receive" (from the existing text in https://golang.org/ref/mem section channel communication) and also "receive < send" (where < means happens-before). Perhaps channel send/receive could be modeled as consisting of separate begin and end events: send_begin < receive_begin < send_end < receive_end

@kylelemons

The way that I think about it, it does make sense, because synchronous channel operations do establish a mutual happens-before relationship.

ch = make(chan int)
{ x = A ;    <-ch ; return y }
{ y = B ; ch <- 1 ; return x }

It is safe to write to x/y and to read y/x because you can trace the happens-before in both directions (for sync channels):

  • x = A happens before <-ch happens before ch <- 1 happens before return x
  • y = B happens before ch <- 1 happens before <-ch happens before return y
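The sketch above can be made runnable. This is my illustration of the same chains, with an extra channel standing in for "return x" so the result can be observed (names are mine, not from the original sketch):

```go
package main

import "fmt"

// demo shows the mutual happens-before edges of an unbuffered channel:
// each goroutine can safely read what the other wrote.
func demo() (y, x int) {
	ch := make(chan int)
	done := make(chan int)
	var xv, yv int

	go func() {
		yv = 2     // happens before ch <- 1
		ch <- 1    // the receive in demo happens before this send completes
		done <- xv // safe: xv = 1 happens before <-ch, which happens before this line
	}()

	xv = 1 // happens before <-ch
	<-ch   // happens before the send ch <- 1 completes

	// Safe to read yv: yv = 2 happened before ch <- 1,
	// which happened before <-ch completed.
	return yv, <-done
}

func main() {
	y, x := demo()
	fmt.Println(y, x) // 2 1
}
```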

Some minor comments from the first pass.

s/Like in Java/As in Java/

Two bullet points in "Go's memory model today" are mutually incompatible:

  • A send on a channel happens before the corresponding receive from that channel completes.
  • A receive from an unbuffered channel happens before the send on that channel completes.

Did you miss a "buffered" in the first bullet?

s/This section list/&s/

In the summary in the next section, s/useful/practical/.

The definition of data race doesn't feel quite right to me. Maybe concurrently is the issue. Try this: A data race is defined as a write to a memory location happening while another read or write to that same location is also happening.

I like the point about encouraging external (please don't say third-party; I don't even know who the second party is) packages documenting their semantics.

Do you need to say more about the operations around atomic operations? I think there is a happens-before claim you can make about code on either side of an atomic update, but I'm not sure. What you have there now talks only about the atomics, not the rest. Look at:

a = b
<atomic operation on c>

What can you say about a here?

In "C++ provides both, of course", drop "of course". No need to editorialize.

Outside the parens, say "barriers" not "fences"

I would argue strongly against adding unsynchronized atomics, to keep with the philosophy of enough is enough, and the subtlety of adding more operations with different semantics.

Where would you document the disallowed compiler optimizations? Also, can you find a categorical definition instead of a catalog of bad examples?

3 replies
@bshanks

I think something like "A send on a channel happens before the corresponding receive from that channel completes" is needed even in unbuffered channels to prevent a paradox like printing "0" in:

// Thread 1           // Thread 2
x = 1;                while(done == 0) { /* loop */ }
done = 1;             print(x);

(a similar Go example is in https://golang.org/ref/mem section Channel communication)

Perhaps a solution is to specify channel send/receive as consisting of separate begin and end events: send_begin < receive_begin < send_end < receive_end (where < means happens-before)

@bshanks

"Concurrently" is defined in the existing text at https://golang.org/ref/mem in section Happens Before. In the definition of data race, the new text could refer to the existing definition of concurrently by adding something like "(see below for a precise definition of concurrently)".

@bshanks

I think the happens-before between "a=b" and "<atomic operation on c>" is already covered by the existing text at https://golang.org/ref/mem : "Within a single goroutine, the happens-before order is the order expressed by the program."

Possibly related: cache alignment and false sharing.

It would be great if there was a way to cause cache aligned allocation so as to avoid false sharing when concurrent access occurs.

There are some workarounds, such as adding padding to ensure one thing is not on the same cache line as another, but having more precision, and especially having a way to align the start of a struct, would be very helpful.

1 reply
@rsc

rsc Jul 13, 2021
Maintainer Author

There have been proposals about alignment that we have not gotten to agreement on yet. Happily that's only an optimization problem and not a "different semantics" problem, so it's separable from the memory model more generally.

In general, if you do a fresh allocation of a size that is a multiple of some power of two, you can assume it will be aligned to that power of two, which is enough for programmers being very careful about this sort of thing.

While I agree with the statement about the loss of consistency of internal (pointer, length) and (pointer, type) pairs, I would like to see a rationale for it, even though it may be obvious. Not being an expert, I guess consistency would require synchronization, which would lead to higher memory requirements and performance penalties, and would break programs relying on the current internal representation of the affected data structures. The benefit would also be small, because parallel access to multiple fields of a structure or to multiple variables would still be inconsistent.

1 reply
@thepudds

Hi @ulikunitz, If you haven’t already read this, you might be interested in:

https://research.swtch.com/gorace

…especially the section titled “The Fix”.

The conclusion includes:

The races exist because the data representations were chosen for performance: the race-free versions introduce an extra pointer, which carries with it the cost of extra indirection and extra allocation.

…and there is also some discussion there of additional instructions required.

Hi @rsc, I don't quite understand this sentence: "If the effect of an atomic operation A is observed by atomic operation B, then A happens before B." In my opinion this is obvious, so I may be missing some point. Could you elaborate? Thanks.

3 replies
@Merovius

I think that is because you consider "happens before" colloquially, instead of formally. It is obvious that a write must occur before a read observing it, in real time. It is not obvious that a read observing a write creates an edge in the happens-before partial ordering.

Note that it's also obvious, for example, that "The go statement that starts a new goroutine happens before the goroutine's execution begins". But this isn't about making statements about the "real time", it's about what specific partial ordering edges exist in any given execution of a program.

(Personally, that's why I dislike the term "happens before" - it is not clearly enough distinguished from its colloquial use, for my taste. But it's a standard term in the field, so we're stuck with it)

@bshanks

To extend on what Merovius said, it may be helpful to think of "happens-before" as something with a weird name; for example, pretend it is called "ax343" instead of "happens before". We are building up a mathematical definition of this thing called "ax343". The purpose of doing this is to allow us to make precise mathematical proofs (or even to use an automated theorem prover) regarding the properties of the mathematical relation ax343.

@erifan

Oh, I misunderstood the "happens before" relationship, thanks for your answers.

Thanks for all of your work on this. I also really loved the other blog posts in that series. I have a question about the goals for Go's memory model in the case of races, and how these goals differ from JavaScript's (for my future reference: https://tc39.es/ecma262/#sec-memory-model). At first glance it sounds like Go wants the same as what JavaScript wants; both JavaScript and Go eschew "DRF or catch fire", and, like JavaScript, you also say that with Go "programs with data races have defined semantics with a limited number of outcomes". But you also imply that Go wants to sit in between JavaScript and C in some fashion.

Is it accurate to say that you want less specification complexity than JavaScript, and in exchange you are willing to pay the price of more formal ambiguity in the following form: sometimes Go will permit a variable to take on a 'paradoxical' value, when this would have been ruled out by JavaScript's more extensive formal semantics?

1 reply
@rsc

rsc Jul 23, 2021
Maintainer Author

Go and JavaScript match well until you get to multiword values like slices or interface values. Those can lead to corruption in Go; there's no equivalent in JavaScript. But for single-word values, yes, they match.

The proposal describes the happens-before relationships that sync/atomic creates. It also, less formally, says that memory locations that are accessed with sync/atomic must never be accessed without sync/atomic. (The next section says "Whether a particular value should be accessed with atomics is therefore a property of the value and not of a particular access.")

If we can arrange for happens-before relationships that eliminate racy access to a memory location that was previously accessed with sync/atomic, can a correct program later do a non-atomic read of that location? Can a correct program do non-atomic initialization of locations that are later accessed with sync/atomic?

I think the answer is "yes, that's allowed and will work according to the usual happens-before rules". But the way that sync/atomic is discussed—both in the current memory model and with the proposed changes—makes it seem like those operations somehow taint the memory location and make the normal set of happens-before rules not completely describe the situation.

https://play.golang.org/p/Cb73lX6ZPuR

package main

import (
	"fmt"
	"sync/atomic"
)

var a int32 = 1

func init() {
	a += 2
}

func main() {
	a += 4 // hb line 18 via line 17, and hb line 21
	c := make(chan struct{})
	go func() {
		atomic.AddInt32(&a, 8) // hb line 23 via 22 and 19
		close(c)
	}()
	atomic.AddInt32(&a, 16) // hb line 23
	<-c
	a += 32
	fmt.Println(a) // could this print anything but 63?
}
1 reply
@rsc

rsc Jul 23, 2021
Maintainer Author

I think the answer is "yes, that's allowed and will work according to the usual happens-before rules".

Yes, that's the answer, and the text is carefully chosen to make that the answer.
(The only other possible answer is "or else catch fire", which we're avoiding.)

Atomic types would make that question impossible to ask, which I think is also fine,
but we still need to answer the question for the current atomic operations.

If we adopt the atomic.Int32 etc (which I'm in favor of), would that make it possible to verify the alignment requirements of atomic variables? Could the compiler (or go vet) check that atomic fields are aligned correctly and reject a program if they are not?

As discussed below, it would probably be preferable if the compiler would just automatically align the dedicated atomic types to fit the alignment guarantees.

6 replies
@Merovius

Speaking personally, I'm in favor of new atomic types just having an alignment that is guaranteed to work on the platform.

If we can, sure, that would likely be preferable. I wasn't sure the compiler is allowed to pad, but I guess it is.

FWIW I did think about the issues you bring up, but I don't think they are a problem. But there's no point arguing that - clearly, if we can just align them correctly, that's better.

@zephyrtronium

Unfortunately, forcing this new type to have a special alignment does result in additional special cases in the compiler. I don't think it's to the same extent of special casing as a generic atomic type, but that additional complexity is still a factor.

Notably, those special cases are effectively already implemented in https://go-review.googlesource.com/c/go/+/308971/, pending acceptance of #19057.

@mknyszek

Makes sense. I don't think the issues with compiler enforcement are insurmountable. The allocator could also try to do more to make the compiler error out in fewer ways, too.

RE: the compiler being allowed to pad, that's a good point and I'd like to note my thinking here explicitly. I believe it's totally fine with respect to the language specification, since it appears to be very, very loose on alignment guarantees in general. My impression is that the spirit of this looseness is to give the compiler the freedom it needs to pick an alignment so things "just work." The fact that we can't easily alter alignments a la #36606 is really just an unfortunate consequence of code depending on it. I'm no expert on the language spec however, so don't quote me on this. :)

@mknyszek

@zephyrtronium Oh, I totally forgot about that. Yeah, that works, too. :) Though, that change is only internal at the moment.

@Merovius

@mknyszek I agree that the spec doesn't preclude the compiler from padding (after I skimmed the spec again for this discussion). I fully support your suggestion that the compiler should probably automatically align the atomic types and pad structs as necessary.

I have a module which makes significant use of atomic Booleans. Having atomic.Bool would simplify some parts of its implementation, but it is insufficient to fully replace how I use atomics in it. In particular, there are situations in which I need to inspect and update multiple flags simultaneously.

This code had me on the verge of writing a proposal for an atomic.Flags type, providing bitwise rather than arithmetic operations on either uint8 or uint32. I think that such a type would be valuable even if atomic.Bool comes to exist, but there is obviously overlap between a type that describes one Boolean and a type that describes a set of related Booleans. Maybe it's time to write that proposal.

3 replies
@rsc

rsc Jul 23, 2021
Maintainer Author

I'm not sure we need atomic.Flags any more than we need non-atomic Flags.
You can always build one yourself out of a typed uint32, and building one yourself lets you give the actual flag bits a unique type.

@zephyrtronium

Non-atomic flags can be implemented on any integer type using f |= m, f &= m, &c. There are no equivalent atomic operations in sync/atomic; the only solution is CAS loops, even on e.g. amd64 where LOCK ORL would be an option for the generated code. So, I disagree with that comparison. The building blocks are different.

I should mention, I only ever considered writing a proposal because I saw these operations were available (on some platforms?) in runtime/internal/atomic. The suggestion would have been essentially to export that API, with refinements for usability.

@rsc

rsc Jul 23, 2021
Maintainer Author

Fair enough, although that would probably be better on Int etc than on a separate Flags. Thanks.

I think if I were reading the Go Memory Model page without having read much about memory models and data races before I would find this section to be a little misleading:

Programs with data races are invalid in the sense that an implementation may report the race and terminate the program. But otherwise, programs with data races have defined semantics with a limited number of outcomes, making errant programs more reliable and easier to debug.

Programs with data races may also be "invalid" in another less formal sense, which is that they can no longer rely on the language performing to spec. To give the example from elsewhere in the doc, a racing write to a map could corrupt arbitrary memory, which sure doesn't sound like "defined semantics with a limited number of outcomes"! If I understand correctly, that phrase refers to each individual memory read/write, but that may not be clear to someone who is not familiar with the underlying issues/history.

0 replies

I found the proposal missed a discussion regarding the runtime.SetFinalizer, which was previously reported in #9442 (probably the only API that falls outside the channel and sync package?) As @dvyukov pointed out:

package main

import (
	"runtime"
	"time"
)

type X struct {
	y *int
}

func main() {
	x := new(X)
	runtime.SetFinalizer(x, func(x *X) {
		if x.y != nil && *x.y != 42 {
			panic("race")
		}
	})
	i := 42
	x.y = &i
	runtime.GC()
	time.Sleep(time.Second)
}

If the compiler reorders "i := 42" and "x.y = &i", the finalizer may observe *x.y != 42 and panic. This unexpected result falls into the category of compiler optimization rules.

@ianlancetaylor also suggested the description "Given a pointer p, there is no possible ordering such that the finalizer happens-before any read or write of *p.", because the inverse statement would be too strong and would prevent possible compiler optimizations.

Maybe this is more understandable (?): Given a pointer p, any read or write of *p either happens-before or happens concurrently with its finalizer.

6 replies
@Merovius

Maybe this is more understandable (?): Given a pointer p, any read or write of *p either happens-before or happens concurrently with its finalizer.

"Given a pointer p, no read or write of *p happens before its finalizer" is equivalent and snappier (though maybe more obscure in its consequences).

@rsc

rsc Jul 23, 2021
Maintainer Author

I think the rule is "A call to SetFinalizer(x, f) happens before the finalizer invocation f(x)."

@ConradIrwin

Should there be a related rule that "A call to KeepAlive(x) happens before the finalizer invocation f(x)" ? That seems like the intent from the documentation, but may be tricky to guarantee in non-trivial examples.

// If the finalizer runs, it could print 0 or 42.
type X struct{ y int }
x := &X{}
runtime.SetFinalizer(x, func(x *X) { fmt.Println(x.y) })
x.y = 42 // [the compiler is allowed to elide this write, as it does not happen before any read]

// If the finalizer runs, it must print 42.
type X struct{ y int }
x := &X{}
runtime.SetFinalizer(x, func(x *X) { fmt.Println(x.y) })
x.y = 42 // [the compiler must not elide this write, as it happens before the read in the finalizer]
runtime.KeepAlive(x)
@rsc

rsc Jul 23, 2021
Maintainer Author

If we did that, KeepAlive would have to be some kind of atomic release on x instead of a run-time no-op.
That seems unnecessary in general.

Think of runtime.SetFinalizer(x, f) as 'go f(x)' where the call doesn't start until all the references to x are gone.
If code needs to synchronize between that goroutine and other goroutines, the usual mechanisms apply, such as a mutex or atomic load/store.

@ConradIrwin

Agreed, thanks for confirming!
