Skip to content

runtime/race: running with -race misses races (mismatch with memory model) #37355

@dfava

Description

@dfava

What version of Go are you using (go version)?

$ go version
go version go1.13.7 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/danielsf/Library/Caches/go-build"
GOENV="/Users/danielsf/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/danielsf/Dropbox/edu/uio.no/favasynthesis/src/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/sb/5mzmh37n6bl7pt2yzf0sy3s80000gq/T/go-build206592919=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do

This issue was found from inspecting the code in chan.go as opposed to observing behavior during execution. I did created a program to illustrate the issue. I believe the program (shown below) is racy, but the race detector does not flag the race (no output in stdout/stderr).

(I believe the culprit are four lines of code in chan.go, as detailed in "What did you expect", with a patch suggested at the end.)

$ go build -race test_chan.go
$ ./test_chan

The program is as follows:

$ cat test_chan.go
package main

import(
  "sync"
  "time"
)

var wg sync.WaitGroup
var z = 0

func t1(c chan bool){
  defer wg.Done()

  c <- true
}

func t2(c chan bool){
  defer wg.Done()

  z = 42
  <- c
}

func t3(c chan bool){
  defer wg.Done()

  time.Sleep(1)
  c <- true
}

func t4(c chan bool){
  defer wg.Done()

  time.Sleep(1)
  <- c
  z = 13
}

func main() {
  c := make (chan bool, 1)
  wg.Add(4)
  go t1(c)
  go t2(c)
  go t3(c)
  go t4(c)
  wg.Wait()
}

What did you observe

I observed no output. The program does not print to stdout or stderr and, when running the program with the race checker enabled, the race checker also does not produce any output.

What did you expect instead

I expected the race checker to flag a race between the write in t1 and the write in t4.

The program can (and often does) produce the following trace:

sd(1,c)   # t1 sends on c
wr(2,z)   # t2 writes to shared variable z
rv(2,c)   # t2 receives on c
sd(3,c)   # t3 sends on c
rv(4,c)   # t4 receives from c
wr(4,z)   # t4 writes to shared variable z

The question is: are the two writes, wr(2,z) and wr(4,z), in a race?

According to the memory model:

  • rv(2,c) happens-before the completion of sd(3,c),
  • sd(3,c) happens-before the completion of rv(4,c),
  • however, rv(2,c) is not in happens-before with respect to rv(4,c). They are not in happens-before relation because sd(3,c) and the completion of sd(3,c) are not the same. Thus we cannot use transitivity of happens-before to derive a relation between rv(2,c) and rv(4,c).

This race is not flagged by the race detector. I believe the race detector interpret the events as follows:

  • rv(2,c) happens-before sd(3,c)
  • sd(3,c) happens-before rv(4,c)
  • therefore, rv(2,c) is in happens-before with respect to rv(4,c), thus the write by thread 2 is in the past of the write by thread 4.

Instead of not getting any output when running the program with race check enabled, I expected to see output like this:

$ /Users/danielsf/Dropbox/go/go/bin/go build -race test_chan.go
$ ./test_chan
==================
WARNING: DATA RACE
Write at 0x00000117cc18 by goroutine 9:
  main.t4()
      /Users/danielsf/Dropbox/edu/uio.no/favasynthesis/src/go/src/test_chan.go:36 +0x94

Previous write at 0x00000117cc18 by goroutine 7:
  main.t2()
      /Users/danielsf/Dropbox/edu/uio.no/favasynthesis/src/go/src/test_chan.go:20 +0x70

Goroutine 9 (running) created at:
  main.main()
      /Users/danielsf/Dropbox/edu/uio.no/favasynthesis/src/go/src/test_chan.go:45 +0xf2

Goroutine 7 (finished) created at:
  main.main()
      /Users/danielsf/Dropbox/edu/uio.no/favasynthesis/src/go/src/test_chan.go:43 +0xae
==================
Found 1 data race(s)

The output above is from the following patch:

Consider the following piece of code present in both chansend() and chanrecv() functions of runtime/chan.go:

    if raceenabled {
      raceacquire(qp)
      racerelease(qp)
    }

If in chansend(), the raceacquire() comes before racerelease(), then a thread gains happens-before information (via the raceacquire) before the completion of the send operation, as opposed to at the point of completion as specified by the memory model. Similar for the order of raceacquire() and racereceive() in chanrevc().

In practice, having raceacquire() before racerelease() can prevent the race detector from flagging certain race conditions, such as the race condition illustrated with the example program and trace above.

The fix is simple: just swap the order of raceacquire() and racerelease() in both chansend() and chanrecv() functions of runtime/chan.go:

    if raceenabled {
      racerelease(qp)
      raceacquire(qp)
    }

The patch passes the tests in all.bash.

In more detail...

Another way to look at the trace above is by transforming the sends and receives into sequences of acquire and releases. If the sends and receives are interpreted as raceacquire(); racerelease() as it is currently the case in chan.go, then the trace would be as shown below, and a race detector will not flag the trace as racy:

acq(1,c)  # this sequence of acq
rel(1,c)  # followed by rel stands in place of the send sd(1,c)
wr(2,z)   # thread 2 writes to shared variable z
acq(2,c)  # this sequence of acq
rel(2,c)  # followed by rel stands in place of the receive rv(2,c)
acq(3,c)  # this sequence of acq
rel(3,c)  # followed by rel stands in place of the send sd(3,c)
acq(4,c)  # this sequence of acq
rel(4,c)  # followed by rel stands in place of the receive rv(4,c)
wr(4,z)   # thread 4 writes to shared variable z

The absence of a race in the trace above comes from the fact that we can find a happens-before chain from wr(2,z) to wr(4,z):

  • wr(2,z) is in happens-before with rel(2,c) by program order
  • rel(2,c) is in happens-before with acq(3,c) and rel(3,c) by lock semantics and program order
  • rel(3,c) is in happens-before with acq(4,c) by lock semantics
  • acq(4,c) is in happens-before with wr(4,z) by program order

If, however, we swap the order of the raceacquire() and racerelease() in the interpretation of sends and receives, then a race detector will then report the trace as racy. The race exists because, given the trace below, it is not possible to derive that the write from thread 2 is in happens-before with respect to the write from thread 4.

rel(1,c)  # this sequence of rel
acq(1,c)  # followed by acq stands in place of the send sd(1,c)
wr(2,z)
rel(2,c)  # this sequence of rel
acq(2,c)  # followed by acq stands in place of the receive rv(2,c)
rel(3,c)  # this sequence of rel
acq(3,c)  # followed by acq stands in place of the send sd(3,c)
rel(4,c)  # this sequence of rel
acq(4,c)  # followed by acq stands in place of the receive rv(4,c)
wr(4,z)

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions