Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/race: running with -race misses races (mismatch with memory model) #37355

Closed
dfava opened this issue Feb 21, 2020 · 13 comments
Closed

runtime/race: running with -race misses races (mismatch with memory model) #37355

dfava opened this issue Feb 21, 2020 · 13 comments

Comments

@dfava
Copy link
Contributor

@dfava dfava commented Feb 21, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.7 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/danielsf/Library/Caches/go-build"
GOENV="/Users/danielsf/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/danielsf/Dropbox/edu/uio.no/favasynthesis/src/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/sb/5mzmh37n6bl7pt2yzf0sy3s80000gq/T/go-build206592919=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do

This issue was found from inspecting the code in chan.go as opposed to observing behavior during execution. I did created a program to illustrate the issue. I believe the program (shown below) is racy, but the race detector does not flag the race (no output in stdout/stderr).

(I believe the culprit are four lines of code in chan.go, as detailed in "What did you expect", with a patch suggested at the end.)

$ go build -race test_chan.go
$ ./test_chan

The program is as follows:

$ cat test_chan.go
package main

import(
  "sync"
  "time"
)

var wg sync.WaitGroup
var z = 0

func t1(c chan bool){
  defer wg.Done()

  c <- true
}

func t2(c chan bool){
  defer wg.Done()

  z = 42
  <- c
}

func t3(c chan bool){
  defer wg.Done()

  time.Sleep(1)
  c <- true
}

func t4(c chan bool){
  defer wg.Done()

  time.Sleep(1)
  <- c
  z = 13
}

func main() {
  c := make (chan bool, 1)
  wg.Add(4)
  go t1(c)
  go t2(c)
  go t3(c)
  go t4(c)
  wg.Wait()
}

What did you observe

I observed no output. The program does not print to stdout or stderr and, when running the program with the race checker enabled, the race checker also does not produce any output.

What did you expect instead

I expected the race checker to flag a race between the write in t1 and the write in t4.

The program can (and often does) produce the following trace:

sd(1,c)   # t1 sends on c
wr(2,z)   # t2 writes to shared variable z
rv(2,c)   # t2 receives on c
sd(3,c)   # t3 sends on c
rv(4,c)   # t4 receives from c
wr(4,z)   # t4 writes to shared variable z

The question is: are the two writes, wr(2,z) and wr(4,z), in a race?

According to the memory model:

  • rv(2,c) happens-before the completion of sd(3,c),
  • sd(3,c) happens-before the completion of rv(4,c),
  • however, rv(2,c) is not in happens-before with respect to rv(4,c). They are not in happens-before relation because sd(3,c) and the completion of sd(3,c) are not the same. Thus we cannot use transitivity of happens-before to derive a relation between rv(2,c) and rv(4,c).

This race is not flagged by the race detector. I believe the race detector interpret the events as follows:

  • rv(2,c) happens-before sd(3,c)
  • sd(3,c) happens-before rv(4,c)
  • therefore, rv(2,c) is in happens-before with respect to rv(4,c), thus the write by thread 2 is in the past of the write by thread 4.

Instead of not getting any output when running the program with race check enabled, I expected to see output like this:

$ /Users/danielsf/Dropbox/go/go/bin/go build -race test_chan.go
$ ./test_chan
==================
WARNING: DATA RACE
Write at 0x00000117cc18 by goroutine 9:
  main.t4()
      /Users/danielsf/Dropbox/edu/uio.no/favasynthesis/src/go/src/test_chan.go:36 +0x94

Previous write at 0x00000117cc18 by goroutine 7:
  main.t2()
      /Users/danielsf/Dropbox/edu/uio.no/favasynthesis/src/go/src/test_chan.go:20 +0x70

Goroutine 9 (running) created at:
  main.main()
      /Users/danielsf/Dropbox/edu/uio.no/favasynthesis/src/go/src/test_chan.go:45 +0xf2

Goroutine 7 (finished) created at:
  main.main()
      /Users/danielsf/Dropbox/edu/uio.no/favasynthesis/src/go/src/test_chan.go:43 +0xae
==================
Found 1 data race(s)

The output above is from the following patch:

Consider the following piece of code present in both chansend() and chanrecv() functions of runtime/chan.go:

    if raceenabled {
      raceacquire(qp)
      racerelease(qp)
    }

If in chansend(), the raceacquire() comes before racerelease(), then a thread gains happens-before information (via the raceacquire) before the completion of the send operation, as opposed to at the point of completion as specified by the memory model. Similar for the order of raceacquire() and racereceive() in chanrevc().

In practice, having raceacquire() before racerelease() can prevent the race detector from flagging certain race conditions, such as the race condition illustrated with the example program and trace above.

The fix is simple: just swap the order of raceacquire() and racerelease() in both chansend() and chanrecv() functions of runtime/chan.go:

    if raceenabled {
      racerelease(qp)
      raceacquire(qp)
    }

The patch passes the tests in all.bash.

In more detail...

Another way to look at the trace above is by transforming the sends and receives into sequences of acquire and releases. If the sends and receives are interpreted as raceacquire(); racerelease() as it is currently the case in chan.go, then the trace would be as shown below, and a race detector will not flag the trace as racy:

acq(1,c)  # this sequence of acq
rel(1,c)  # followed by rel stands in place of the send sd(1,c)
wr(2,z)   # thread 2 writes to shared variable z
acq(2,c)  # this sequence of acq
rel(2,c)  # followed by rel stands in place of the receive rv(2,c)
acq(3,c)  # this sequence of acq
rel(3,c)  # followed by rel stands in place of the send sd(3,c)
acq(4,c)  # this sequence of acq
rel(4,c)  # followed by rel stands in place of the receive rv(4,c)
wr(4,z)   # thread 4 writes to shared variable z

The absence of a race in the trace above comes from the fact that we can find a happens-before chain from wr(2,z) to wr(4,z):

  • wr(2,z) is in happens-before with rel(2,c) by program order
  • rel(2,c) is in happens-before with acq(3,c) and rel(3,c) by lock semantics and program order
  • rel(3,c) is in happens-before with acq(4,c) by lock semantics
  • acq(4,c) is in happens-before with wr(4,z) by program order

If, however, we swap the order of the raceacquire() and racerelease() in the interpretation of sends and receives, then a race detector will then report the trace as racy. The race exists because, given the trace below, it is not possible to derive that the write from thread 2 is in happens-before with respect to the write from thread 4.

rel(1,c)  # this sequence of rel
acq(1,c)  # followed by acq stands in place of the send sd(1,c)
wr(2,z)
rel(2,c)  # this sequence of rel
acq(2,c)  # followed by acq stands in place of the receive rv(2,c)
rel(3,c)  # this sequence of rel
acq(3,c)  # followed by acq stands in place of the send sd(3,c)
rel(4,c)  # this sequence of rel
acq(4,c)  # followed by acq stands in place of the receive rv(4,c)
wr(4,z)
@ALTree ALTree changed the title [runtime/chan.go] Running with -race misses races (mismatch with memory model) runtime/race: running with -race misses races (mismatch with memory model) Feb 21, 2020
@ALTree
Copy link
Member

@ALTree ALTree commented Feb 21, 2020

Loading

@gopherbot
Copy link

@gopherbot gopherbot commented Feb 21, 2020

Change https://golang.org/cl/220419 mentions this issue: runtime/chan: swapping the order of raceacquire() and racerelease()

Loading

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 2, 2020

Change https://golang.org/cl/226981 mentions this issue: runtime/race: update some .syso files

Loading

gopherbot pushed a commit that referenced this issue Apr 3, 2020
Update race detector syso files for some platforms.

There's still 2 more to do, but they might take a while so I'm
mailing the ones I have now.

Note: some arm64 tests did not complete successfully due to out
of memory errors, but I suspect the .syso is correct.

Update #14481
Update #37485 (I think?)
Update #37355

Change-Id: I7e7e707a1fd7574855a538ba89dc11acc999c760
Reviewed-on: https://go-review.googlesource.com/c/go/+/226981
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot
Copy link

@gopherbot gopherbot commented Apr 7, 2020

Change https://golang.org/cl/227443 mentions this issue: runtime/race: update ppc64 .syso file

Loading

gopherbot pushed a commit that referenced this issue Apr 8, 2020
Update #14881
Update #37355

Change-Id: I5edd53b7532836cfe6037fb668b1b8fe8f7a32f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/227443
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
@gopherbot
Copy link

@gopherbot gopherbot commented Apr 11, 2020

Change https://golang.org/cl/227867 mentions this issue: runtime/race: rebuild netbsd .syso

Loading

@gopherbot gopherbot closed this in 3afa741 Apr 12, 2020
@randall77
Copy link
Contributor

@randall77 randall77 commented Apr 16, 2020

Tsan changes are in, but the Go side isn't done yet.

Loading

@randall77 randall77 reopened this Apr 16, 2020
@gopherbot
Copy link

@gopherbot gopherbot commented Apr 30, 2020

Change https://golang.org/cl/231222 mentions this issue: [release-branch.go1.14] runtime/race: update some .syso files

Loading

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 30, 2020

Change https://golang.org/cl/231297 mentions this issue: runtime/race: rebuild race detector .syso files

Loading

gopherbot pushed a commit that referenced this issue May 1, 2020
Update #37355

Change-Id: I90cc121c158a9d44df01772083a7a9301598532e
Reviewed-on: https://go-review.googlesource.com/c/go/+/231297
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
gopherbot pushed a commit that referenced this issue May 4, 2020
Update race detector syso files for some platforms.

There's still 2 more to do, but they might take a while so I'm
mailing the ones I have now.

Note: some arm64 tests did not complete successfully due to out
of memory errors, but I suspect the .syso is correct.

For #14481
For #37485 (I think?)
For #37355
Fixes #38321

Change-Id: I7e7e707a1fd7574855a538ba89dc11acc999c760
Reviewed-on: https://go-review.googlesource.com/c/go/+/226981
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
(cherry picked from commit 041bcb3)
Reviewed-on: https://go-review.googlesource.com/c/go/+/231222
Reviewed-by: Keith Randall <khr@golang.org>
@gopherbot
Copy link

@gopherbot gopherbot commented May 4, 2020

Change https://golang.org/cl/232159 mentions this issue: [release-branch.go1.14] runtime/race: update ppc64 .syso file

Loading

gopherbot pushed a commit that referenced this issue May 6, 2020
For #14881
For #37355
For #38321

Change-Id: I5edd53b7532836cfe6037fb668b1b8fe8f7a32f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/227443
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
(cherry picked from commit 7a35d39)
Reviewed-on: https://go-review.googlesource.com/c/go/+/232159
Reviewed-by: Keith Randall <khr@golang.org>
@gopherbot
Copy link

@gopherbot gopherbot commented May 6, 2020

Change https://golang.org/cl/232417 mentions this issue: [release-branch.go1.14] runtime/race: rebuild netbsd .syso

Loading

gopherbot pushed a commit that referenced this issue May 6, 2020
For #14481
For #37355
For #38321

Change-Id: Idfceaf0e64d340b7304ce9562549a82ebfc27e3c
Reviewed-on: https://go-review.googlesource.com/c/go/+/227867
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
(cherry picked from commit 3afa741)
Reviewed-on: https://go-review.googlesource.com/c/go/+/232417
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
xujianhai666 added a commit to xujianhai666/go-1 that referenced this issue May 21, 2020
Update golang#37355

Change-Id: I90cc121c158a9d44df01772083a7a9301598532e
Reviewed-on: https://go-review.googlesource.com/c/go/+/231297
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
@gopherbot gopherbot closed this in 35455ff Nov 13, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 20, 2020

Reopening because CL 220419 is being reverted; see #42598.

Loading

@ianlancetaylor ianlancetaylor added this to the Backlog milestone Nov 20, 2020
@dfava
Copy link
Contributor Author

@dfava dfava commented Nov 25, 2020

Fix to #42598 in df68e01

Loading

@dfava
Copy link
Contributor Author

@dfava dfava commented Dec 2, 2020

This CL was not reverted given the fix to #42598 in df68e01.
I think it is safe to close it.

Loading

@dfava dfava closed this Dec 2, 2020
@golang golang locked and limited conversation to collaborators Dec 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants