
Updating the Go memory model #47141

rsc announced in Discussions
Jul 12, 2021 · 15 comments · 48 replies

I posted a blog post Updating the Go memory model about some changes I am planning to propose regarding Go's memory model.

This discussion is for collecting feedback about those changes before filing an official proposal.

Please make good use of the limited threading below: post a top-level comment for something new and use the reply feature to reply to someone else's comment. We hope this will help keep discussion manageable.

Thank you!

Replies


A hearty +1 to atomic.Bool and friends.

5 replies
@cristaloleg

Some implementations already exist, for example https://github.com/cristalhq/atomix or the more popular https://github.com/uber-go/atomic (ah, and many more inside the stdlib as unexported types)

@cespare

cespare Jul 12, 2021
Collaborator

In general the implementation of those types is trivial; the API design is the only interesting bit.

(Edit: To be clear, I mean it's trivial to build things like Bool on top of existing atomic operations. Getting the underlying atomic ops right is obviously tricky and requires compiler/hardware awareness, but that work has already been done.)
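To illustrate cespare's point, here is a minimal sketch of a `Bool` built on the existing `sync/atomic` operations; the type and method names are hypothetical, chosen to mirror the proposal's direction, not an actual API.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Bool is a hypothetical atomic boolean built on existing sync/atomic
// primitives. The hard part (correct underlying atomic ops) is already
// done; the wrapper itself is trivial, as noted above.
type Bool struct{ v uint32 }

func (b *Bool) Load() bool     { return atomic.LoadUint32(&b.v) != 0 }
func (b *Bool) Store(val bool) { atomic.StoreUint32(&b.v, b32(val)) }
func (b *Bool) CAS(old, new bool) bool {
	return atomic.CompareAndSwapUint32(&b.v, b32(old), b32(new))
}

// b32 converts a bool to the 0/1 encoding used internally.
func b32(b bool) uint32 {
	if b {
		return 1
	}
	return 0
}

func main() {
	var flag Bool
	flag.Store(true)
	fmt.Println(flag.Load()) // true
}
```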

@rsc

rsc Jul 12, 2021
Maintainer Author

I'd be interested to see representative examples of use of atomic.Bool. I wonder if there is a higher-level API we should be targeting for package sync as well. I mean, something like sync.Once is roughly an atomic.Bool, but a bit more, and the bit more makes it much better.
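A sketch of the "bit more" rsc mentions, under the assumption that the difference he means is blocking: a bare atomic flag can decide who runs initialization, but losers of the race can proceed before it finishes, while `sync.Once` makes them wait. All names below are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var (
	flag  uint32 // 0 = init not claimed, 1 = claimed
	once  sync.Once
	ready int
)

// initWithFlag guarantees setup runs at most once, but callers that
// lose the CAS race may return before setup has completed.
func initWithFlag() {
	if atomic.CompareAndSwapUint32(&flag, 0, 1) {
		ready = 42 // racing callers do NOT wait for this write
	}
}

// initWithOnce additionally blocks racing callers until the
// function passed to Do has returned.
func initWithOnce() {
	once.Do(func() { ready = 42 })
}

func main() {
	initWithOnce()
	fmt.Println(ready) // safe: once.Do returning happens after the write
}
```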


There's a minor typo in a code sample towards the end:

i := *p
if i < 0 || i >= len(fns) {
	panic("invalid function index")
}
... complex code ...
// compiler must NOT reload i = *p here
funcs[i]()

The call to len should be len(funcs).

That aside, the proposal looks great. I too would prefer not to add unsynchronized atomics.

3 replies
@cespare

Other typos: search for "sequental" (there are two instances).

@rsc

rsc Jul 12, 2021
Maintainer Author

Thanks, fixed both, and a "straighforward".

@kortschak

"This section list the adjustments..." s/list/lists/


@rsc, I don't quite follow this:

To answer an obvious suggestion, I don't see a clean way to use generics to provide just a single atomic.Atomic[T] that would let us avoid introducing Bool, Int, and so on as separate types, at least not without special cases in the compiler.

I see a specific issue with Bool in that it doesn't have an Add operation. What's the problem with defining a single parameterized type for the numeric ones? (Maybe you still want special compiler support for performance, but it sounds like you're talking about a type system issue.)

6 replies
@rsc

rsc Jul 12, 2021
Maintainer Author

I don't see how to write the function bodies. One option is that they have no bodies and are provided by the compiler. Another option is a giant type switch plus logic in the compiler to make sure those get small enough to inline down to a single line. That would be a fair amount of compiler special casing too. Generics+privileged special cases is pretty different from generics.

@szabba

With a future extension to generics like switch type T:

  • the implementations could be kept type safe,
  • which branch is taken in a given generic type instantiation would be known at compile time.

Then an optimization that removed the other branches would not have to be a special case. Or am I confused and is that different from what you're talking about?

@ianlancetaylor

Note that we already have special cases in the compiler for all of the individual sync/atomic functions. The actual code in sync/atomic is not used for normal builds. So if we introduced atomic.Atomic[T atomicTypes] we would indeed require a special case in the compiler, but it is a special case that is essentially already there. The fallback method implementations could use a type switch as you suggest.
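A sketch of the type-switch fallback Ian describes, assuming a constraint listing exact supported types (the constraint, type, and method names here are hypothetical; for normal builds the compiler would replace these bodies with intrinsics):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// atomicTypes is a hypothetical constraint of exactly the types with
// dedicated sync/atomic operations.
type atomicTypes interface {
	int32 | int64 | uint32 | uint64
}

// Atomic is a hypothetical generic atomic cell. The type switch on
// any(&a.v) resolves at instantiation time, so each branch is only
// ever taken for the matching T.
type Atomic[T atomicTypes] struct{ v T }

func (a *Atomic[T]) Load() T {
	switch p := any(&a.v).(type) {
	case *int32:
		return any(atomic.LoadInt32(p)).(T)
	case *int64:
		return any(atomic.LoadInt64(p)).(T)
	case *uint32:
		return any(atomic.LoadUint32(p)).(T)
	case *uint64:
		return any(atomic.LoadUint64(p)).(T)
	}
	panic("unreachable")
}

func (a *Atomic[T]) Store(v T) {
	switch p := any(&a.v).(type) {
	case *int32:
		atomic.StoreInt32(p, any(v).(int32))
	case *int64:
		atomic.StoreInt64(p, any(v).(int64))
	case *uint32:
		atomic.StoreUint32(p, any(v).(uint32))
	case *uint64:
		atomic.StoreUint64(p, any(v).(uint64))
	}
}

func main() {
	var n Atomic[int64]
	n.Store(7)
	fmt.Println(n.Load()) // 7
}
```

Note this constraint deliberately lists exact types rather than `~int32` etc., since defined types would not match the `*int32` case in the switch.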


I like the idea of being clear about what kind of invalid programs are allowed in the case when you have a data-race and how they might behave. I also like the call-out that concurrent multi-word reads/writes allow you to observe inconsistent (and potentially corrupt) structures.

One thing that I think could be more explicit is the behaviour when you have concurrent reads and writes to a memory location X that is smaller than (or equal in size to) a machine-word.

  var x int8 = 0

  go func() {
    x = 1
  }()
  go func() {
    fmt.Println(x) // could this print anything except 0 or 1?
  }()

I think either of the following options makes sense, and experimentally the first seems true (and is definitely the most expected!):

  • For a memory location X where an earlier write set X=A: if a write that sets X=B and a read of X happen concurrently, then the read may observe X having a value of either A or B.
  • For a memory location X where an earlier write set X=A: if a write that sets X=B and a read of X happen concurrently, then the read may observe X having any value.
6 replies
@ConradIrwin

(Similarly if you have two concurrent writes, for X=B and X=C. Is it the case that a read of X that happens after both writes must observe either X having the value B or C?)

@rsc

rsc Jul 12, 2021
Maintainer Author

The existing text at https://golang.org/ref/mem in "Happens Before" covers this. The post is concerned with changes, but this is one thing the existing text already gets right (I think). For up-to-machine-word-sized values, a read must observe some value that was actually written there. Unlike C++, the read is not allowed to observe "any value".

@ConradIrwin

I don't think the current rules actually say this (though it is quite possible that I'm missing an implication in the text – as you say, it is very subtle). The two cases I am worried about:

  1. (from your blog post) "[a compiler] must not allow a single write to write multiple values.".

    // initially *p = 2, i = 2

    *p = i + *p/2
    // is "optimized" to
    *p /= 2
    *p += i


    There are three possibilities for a concurrent read in the "optimized" case: it sees either 1, 2, or 3. As in your blog post we'd like to exclude reading a 1, but (at least on my reading) the current Go memory model's rules do not:

    • In this case there is no synchronization to establish "happens before" relationships, so the read is not "guaranteed to observe" any particular write.
    • The read is "allowed to observe" any of the writes.
    • There seems to be a missing exclusion of "The read is not allowed to observe anything else"
    • I'm not sure the best way to explain the guarantee, maybe: "if any write w to a memory location l happens before a read r of l, r is guaranteed to observe either w or a write w' to l that happens before or concurrently to r and which does not happen before w."). It may need additional context for reading from uninitialized memory before any writes to that location (though I think that's only possible with unsafe/cgo?), "If no writes to l happen before a read, that read may see an arbitrary value (though this is unlikely in practice, as the write during initialization happens before any read to a variable)".
  2. A (maybe hypothetical?) worry about the behavior of writes to machine-word-sized values or smaller. As someone who's not very familiar with hardware architecture, it's not clear to me what guarantees I get. The current text calls out "Reads and writes of values larger than a single machine word behave as multiple machine-word-sized operations in an unspecified order." This doesn't say anything about reads and writes of values smaller than (or equal in size to) machine words (though it does imply they probably are defined in some way). I think the suggestion for 1. is sufficient to handle this; but it could be made clearer for future readers by saying: "Even in the presence of data races, reads of a machine-word-sized location will not see partially written values."

Either way this may just be a scope-creep request on top of your change to make the implications more explicit, not push-back on the idea. I'd love to see the page significantly expanded and clarified as you propose.


In the list of ordering of operations, the word "happens" seems imprecise:

  • A receive from an unbuffered channel happens before the send on that channel completes.
  • The k'th receive on a channel with capacity C happens before the k+C'th send from that channel completes.

If you read "happens" to mean "begins and ends," then it would seem to be impossible for a read to end before the corresponding send completes. Maybe "happens" should be "begins," if that's the correct semantics.

3 replies
@rsc

rsc Jul 13, 2021
Maintainer Author

The sentences are correct, if subtle.

If you think of the send as a function call send(channel, value), then "the send completes" means "the call returns". The actual communication happens during the call, before the return, so it does make sense for the receive to happen before the send completes (unblocks and lets the code doing the send keep running).

@bshanks

(i made a similar comment below): As others noted, a potential problem is that we have both "send < receive" (from the existing text in https://golang.org/ref/mem section channel communication) and also "receive < send" (where < means happens-before). Perhaps channel send/receive could be modeled as consisting of separate begin and end events: send_begin < receive_begin < send_end < receive_end

@kylelemons

The way that I think about it, it does make sense, because synchronous channel operations do establish a mutual happens-before relationship.

ch = make(chan int)
{ x = A ;    <-ch ; return y }
{ y = B ; ch <- 1 ; return x }

It is safe to write to x/y and to read y/x because you can trace the happens-before in both directions (for sync channels):

  • x = A happens before <-ch happens before ch <- 1 happens before return x
  • y = B happens before ch <- 1 happens before <-ch happens before return y
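The two-line sketch above can be made runnable; this is my own illustrative arrangement (the result channel is just a way to collect the reads), showing that with an unbuffered channel each goroutine can safely read the other's plain write:

```go
package main

import "fmt"

func main() {
	var x, y int
	ch := make(chan int)
	res := make(chan int, 2)
	go func() { x = 1; <-ch; res <- y }()   // x = A ;    <-ch ; return y
	go func() { y = 2; ch <- 1; res <- x }() // y = B ; ch <- 1 ; return x
	// Both reads are race-free: x=1 happens before <-ch happens before
	// ch<-1 completing happens before the read of x, and symmetrically
	// for y. So the sum is always 3.
	fmt.Println(<-res + <-res) // 3
}
```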

Some minor comments from the first pass.

s/Like in Java/As in Java/

Two bullet points in "Go's memory model today" are mutually incompatible:

  • A send on a channel happens before the corresponding receive from that channel completes.
  • A receive from an unbuffered channel happens before the send on that channel completes.

Did you miss a "buffered" in the first bullet?

s/This section list/&s/

In the summary in the next section, s/useful/practical/.

The definition of data race doesn't feel quite right to me. Maybe concurrently is the issue. Try this: A data race is defined as a write to a memory location happening while another read or write to that same location is also happening.

I like the point about encouraging external (please don't say third-party; I don't even know who the second party is) packages documenting their semantics.

Do you need to say more about the operations around atomic operations? I think there is a happens-before claim you can make about code on either side of an atomic update, but I'm not sure. What you have there now talks only about the atomics, not the rest. Look at:

a = b
<atomic operation on c>

What can you say about a here?

In "C++ provides both, of course", drop "of course". No need to editorialize.

Outside the parens, say "barriers" not "fences"

I would argue strongly against adding unsynchronized atomics, to keep with the philosophy of enough is enough, and the subtlety of adding more operations with different semantics.

Where would you document the disallowed compiler optimizations? Also, can you find a categorical definition instead of a catalog of bad examples?

3 replies
@bshanks

I think something like "A send on a channel happens before the corresponding receive from that channel completes" is needed even in unbuffered channels to prevent a paradox like printing "0" in:

// Thread 1           // Thread 2
x = 1;                while(done == 0) { /* loop */ }
done = 1;             print(x);

(a similar Go example is in https://golang.org/ref/mem section Channel communication)

Perhaps a solution is to specify channel send/receive as consisting of separate begin and end events: send_begin < receive_begin < send_end < receive_end (where < means happens-before)
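For reference, the race-free Go analogue of the loop above (in the spirit of the example in https://golang.org/ref/mem, here using close on a channel instead of the racy done flag):

```go
package main

import "fmt"

func main() {
	var x int
	done := make(chan struct{})
	go func() {
		x = 1
		close(done) // x = 1 happens before the close
	}()
	<-done // the close happens before this receive completes
	fmt.Println(x) // guaranteed to print 1
}
```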

@bshanks

"Concurrently" is defined in the existing text at https://golang.org/ref/mem in section Happens Before. In the definition of data race, the new text could refer to the existing definition of concurrently by adding something like "(see below for a precise definition of concurrently)".

@bshanks

I think the happens-before between "a=b" and "<atomic operation on c>" is already covered by the existing text at https://golang.org/ref/mem : "Within a single goroutine, the happens-before order is the order expressed by the program."


Possibly related: cache alignment and false sharing.

It would be great if there were a way to force cache-aligned allocation, so as to avoid false sharing when concurrent access occurs.

There are some workarounds, such as adding padding to ensure one thing is not on the same cache line as another, but having more precision, and especially having a way to align the start of a struct, would be very helpful.

1 reply
@rsc

rsc Jul 13, 2021
Maintainer Author

There have been proposals about alignment that we have not gotten to agreement on yet. Happily that's only an optimization problem and not a "different semantics" problem, so it's separable from the memory model more generally.

If you do a fresh allocation of a size that is a multiple of some power of two, you can generally assume it will be that-power-of-two aligned, which is enough for programmers being very careful about this sort of thing.


While I agree with the statement about the loss of consistency of internal (pointer, length) and (pointer, type) pairs, I would like to see a rationale for it, even though it may be obvious. Not being an expert, I guess consistency would require synchronization, which would lead to higher memory requirements and performance penalties, and would break programs relying on the current internal representation of the affected data structures; and the benefit would be small, because parallel access to multiple fields of a structure or to multiple variables would still be inconsistent.

1 reply
@thepudds

Hi @ulikunitz, If you haven’t already read this, you might be interested in:

https://research.swtch.com/gorace

…especially the section titled “The Fix”.

The conclusion includes:

The races exist because the data representations were chosen for performance: the race-free versions introduce an extra pointer, which carries with it the cost of extra indirection and extra allocation.

…and there is also some discussion there of additional instructions required.


Hi @rsc, I don't quite understand this sentence: "If the effect of an atomic operation A is observed by atomic operation B, then A happens before B." In my opinion this is obvious, so I don't know if I'm missing some point. Could you elaborate? Thanks.

3 replies
@Merovius

I think that is because you consider "happens before" colloquially, instead of formally. It is obvious that a write must occur before a read observing it, in real time. It is not obvious that a read observing a write creates an edge in the happens-before partial ordering.

Note that it's also obvious, for example, that "The go statement that starts a new goroutine happens before the goroutine's execution begins". But this isn't about making statements about the "real time", it's about what specific partial ordering edges exist in any given execution of a program.

(Personally, that's why I dislike the term "happens before" - it is not clearly enough distinguished from its colloquial use, for my taste. But it's a standard term in the field, so we're stuck with it)

@bshanks

To extend on what Merovius said, it may be helpful to think of "happens-before" as something with a weird name; for example, pretend it is called "ax343" instead of "happens before". We are building up a mathematical definition of this thing called "ax343". The purpose of doing this is to allow us to make precise mathematical proofs (or even to use an automated theorem prover) regarding the properties of the mathematical relation ax343.

@erifan

Oh, I misunderstood the "happens before" relationship, thanks for your answers.


Thanks for all of your work on this. I also really loved the other blog posts in that series. I have a question about the goals for Go's memory model in the case of races, and how these goals differ from Javascript's (for my future reference: https://tc39.es/ecma262/#sec-memory-model ). At first glance it sounds like Go wants the same as what Javascript wants; both Javascript and Go eschew "DRF or catch fire", and, like Javascript, you also say that with Go "programs with data races have defined semantics with a limited number of outcomes". But you also imply that Go wants to sit in-between Javascript and C in some fashion.

Is it accurate to say that you want less specification complexity than Javascript, and in exchange you are willing to pay the price of more formal ambiguity in the following form: sometimes Go will permit a variable to take on a 'paradoxical' value, when this would have been ruled out by Javascript's more extensive formal semantics?

1 reply
@rsc

rsc Jul 23, 2021
Maintainer Author

Go and JavaScript match well until you get to multiword values like slices or interface values. Those can lead to corruption in Go; there's no equivalent in JavaScript. But for single-word values, yes, they match.


The proposal describes the happens-before relationships that sync/atomic creates. It also, less formally, says that memory locations that are accessed with sync/atomic must never be accessed without sync/atomic. (The next section says "Whether a particular value should be accessed with atomics is therefore a property of the value and not of a particular access.")

If we can arrange for happens-before relationships that eliminate racy access to a memory location that was previously accessed with sync/atomic, can a correct program later do a non-atomic read of that location? Can a correct program do non-atomic initialization of locations that are later accessed with sync/atomic?

I think the answer is "yes, that's allowed and will work according to the usual happens-before rules". But the way that sync/atomic is discussed—both in the current memory model and with the proposed changes—makes it seem like those operations somehow taint the memory location and make the normal set of happens-before rules not completely describe the situation.

https://play.golang.org/p/Cb73lX6ZPuR

package main

import (
	"fmt"
	"sync/atomic"
)

var a int32 = 1

func init() {
	a += 2
}

func main() {
	a += 4 // hb line 18 via line 17, and hb line 21
	c := make(chan struct{})
	go func() {
		atomic.AddInt32(&a, 8) // hb line 23 via 22 and 19
		close(c)
	}()
	atomic.AddInt32(&a, 16) // hb line 23
	<-c
	a += 32
	fmt.Println(a) // could this print anything but 63?
}
1 reply
@rsc

rsc Jul 23, 2021
Maintainer Author

I think the answer is "yes, that's allowed and will work according to the usual happens-before rules".

Yes, that's the answer, and the text is carefully chosen to make that the answer.
(The only other possible answer is "or else catch fire", which we're avoiding.)

Atomic types would make that question impossible to ask, which I think is also fine,
but we still need to answer the question for the current atomic operations.


If we adopt the atomic.Int32 etc (which I'm in favor of), would that make it possible to verify the alignment requirements of atomic variables? Could the compiler (or go vet) check that atomic fields are aligned correctly and reject a program if they are not?

As discussed below, it would probably be preferable if the compiler would just automatically align the dedicated atomic types to fit the alignment guarantees.

6 replies
@mknyszek

Speaking personally, I'm in favor of new atomic types just having an alignment that is guaranteed to work on the platform.

A go vet check only ever goes so far, because if the compiler chooses to heap allocate an atomic.Uint64 for some reason, the alignment of the object containing it could vary. The compiler (as you also suggest) would have to identify that the value does actually escape to the heap, and then report an error because it can't guarantee that the type will be correctly aligned. Note that although this specific situation is somewhat contrived, it extends to heap-allocated types containing these values as well. To me, that seems like an unnecessary restriction. For larger types it's not an issue today in practice, because the minimum alignment of most types in the allocator is quite high. However, even today this is an issue for small types (see #37262). To really drive home the degree to which the compiler might need to be conservative here, consider a slice type of structures containing such a tiny structure. If its backing store is of length 1, it might have a different guaranteed alignment than length 10.

My apologies for my thoroughness here. I don't intend to shut down your suggestion by any means. :) I think it's a completely valid and useful one.

Back to my favored path forward: forcing the right alignment for a new type also doesn't have the same backwards compatibility issues that changing the default alignment of uint64 on i386 does (see #36606). That's a big plus, IMO. Unfortunately, forcing this new type to have a special alignment does result in additional special cases in the compiler. I don't think it's to the same extent of special casing as a generic atomic type, but that additional complexity is still a factor.

@Merovius

Speaking personally, I'm in favor of new atomic types just having an alignment that is guaranteed to work on the platform.

If we can, sure, that would likely be preferable. I wasn't sure the compiler is allowed to pad, but I guess it is.

FWIW I did think about the issues you bring up, but I don't think they are a problem. But there's no point arguing that - clearly, if we can just align them correctly, that's better.

@zephyrtronium

Unfortunately, forcing this new type to have a special alignment does result in additional special cases in the compiler. I don't think it's to the same extent of special casing as a generic atomic type, but that additional complexity is still a factor.

Notably, those special cases are effectively already implemented in https://go-review.googlesource.com/c/go/+/308971/, pending acceptance of #19057.


I have a module which makes significant use of atomic Booleans. Having atomic.Bool would simplify some parts of its implementation, but it is insufficient to fully replace how I use atomics in it. In particular, there are situations in which I need to inspect and update multiple flags simultaneously.

This code had me on the verge of writing a proposal for an atomic.Flags type, providing bitwise rather than arithmetic operations on either uint8 or uint32. I think that such a type would be valuable even if atomic.Bool comes to exist, but there is obviously overlap between a type that describes one Boolean and a type that describes a set of related Booleans. Maybe it's time to write that proposal.
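A sketch of what such a hypothetical atomic.Flags type might look like, built from CAS loops over the existing sync/atomic primitives (the type and method names are illustrative, not a real package):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Flags is a hypothetical set of atomic boolean flags packed in a
// uint32, with bitwise operations implemented as CAS loops.
type Flags struct{ v uint32 }

// Set atomically ORs mask into the flags, returning the old value.
func (f *Flags) Set(mask uint32) (old uint32) {
	for {
		old = atomic.LoadUint32(&f.v)
		if atomic.CompareAndSwapUint32(&f.v, old, old|mask) {
			return old
		}
	}
}

// Clear atomically AND-NOTs mask out of the flags, returning the old value.
func (f *Flags) Clear(mask uint32) (old uint32) {
	for {
		old = atomic.LoadUint32(&f.v)
		if atomic.CompareAndSwapUint32(&f.v, old, old&^mask) {
			return old
		}
	}
}

// Has reports whether all bits in mask are set.
func (f *Flags) Has(mask uint32) bool {
	return atomic.LoadUint32(&f.v)&mask == mask
}

func main() {
	const ready, closed = 1 << 0, 1 << 1
	var f Flags
	f.Set(ready | closed)
	f.Clear(closed)
	fmt.Println(f.Has(ready), f.Has(closed)) // true false
}
```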

3 replies
@rsc

rsc Jul 23, 2021
Maintainer Author

I'm not sure we need atomic.Flags any more than we need non-atomic Flags.
You can always build one yourself out of a typed uint32, and building one yourself lets you give the actual flag bits a unique type.

@zephyrtronium

Non-atomic flags can be implemented on any integer type using f |= m, f &= m, &c. There are no equivalent atomic operations in sync/atomic; the only solution is CAS loops, even on e.g. amd64 where LOCK ORL would be an option for the generated code. So, I disagree with that comparison. The building blocks are different.

I should mention, I only ever considered writing a proposal because I saw these operations were available (on some platforms?) in runtime/internal/atomic. The suggestion would have been essentially to export that API, with refinements for usability.

@rsc

rsc Jul 23, 2021
Maintainer Author

Fair enough, although that would probably be better on Int etc than on a separate Flags. Thanks.


I think if I were reading the Go Memory Model page without having read much about memory models and data races before I would find this section to be a little misleading:

Programs with data races are invalid in the sense that an implementation may report the race and terminate the program. But otherwise, programs with data races have defined semantics with a limited number of outcomes, making errant programs more reliable and easier to debug.

Programs with data races may also be "invalid" in another less formal sense, which is that they can no longer rely on the language performing to spec. To give the example from elsewhere in the doc, a racing write to a map could corrupt arbitrary memory, which sure doesn't sound like "defined semantics with a limited number of outcomes"! If I understand correctly, that phrase refers to each individual memory read/write, but that may not be clear to someone who is not familiar with the underlying issues/history.

0 replies

I found the proposal missing a discussion of runtime.SetFinalizer, which was previously reported in #9442 (probably the only API that falls outside the channel and sync packages?). As @dvyukov pointed out:

package main

import (
	"runtime"
	"time"
)

type X struct {
	y *int
}

func main() {
	x := new(X)
	runtime.SetFinalizer(x, func(x *X) {
		if x.y != nil && *x.y != 42 {
			panic("race")
		}
	})
	i := 42
	x.y = &i
	runtime.GC()
	time.Sleep(time.Second)
}

If the compiler reorders "i := 42" and "x.y = &i", the result can be unexpected; this falls into the category of compiler optimization rules.

@ianlancetaylor also suggested the description "Given a pointer p, there is no possible ordering such that the finalizer happens-before any read or write of *p." because the inverse statement would be too strong and would prevent possible compiler optimizations.

Maybe this is more understandable (?): Given a pointer p, any read or write of *p either happens-before or happens concurrently with its finalizer.

6 replies
@dvyukov

FWIW Java does not guarantee any ordering between a finalizer and any accesses to the object. But the Go race detector currently assumes they are ordered.

@Merovius

Maybe this is more understandable (?): Given a pointer p, any read or write of *p either happens-before or happens concurrently with its finalizer.

"Given a pointer p, no read or write of *p happens before its finalizer" is equivalent and snappier (though maybe more obscure in its consequences).

@rsc

rsc Jul 23, 2021
Maintainer Author

I think the rule is "A call to SetFinalizer(x, f) happens before the finalizer invocation f(x)."
