runtime: race detector: more aggressive scheduler perturbations #22569

bcmills opened this Issue Nov 3, 2017 · 4 comments


bcmills commented Nov 3, 2017

A program was posted to golang-nuts, with the question “is this code thread [safe]?”

The program has a race: it executes a select with a default case many times in parallel, and the default case calls close on a channel. (You can see the race by inserting a runtime.Gosched(): https://play.golang.org/p/_PWTTCwPgi.)

To my surprise, the program runs without error even with the race detector enabled.

The Go Memory Model defines an order between sends and receives but not between closes and sends or closes and other closes, so I think this is technically a data race (and not just a logic race). The race detector should report it.

src/racy/main.go:

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

func main() {
	die := make(chan struct{})
	i := int32(0)
	closed := func() {
		select {
		case <-die:
			atomic.AddInt32(&i, 1)
		default:
			close(die)
			fmt.Println("closed")
		}
	}
	N := 100000
	for i := 0; i < N; i++ {
		go closed()
	}
	time.Sleep(10 * time.Second)
	fmt.Println(atomic.LoadInt32(&i))
}
$ go version
go version devel +1852573521 Wed Nov 1 16:58:36 2017 +0000 linux/amd64
$ go run -race src/racy/main.go
closed
99999

@ianlancetaylor ianlancetaylor changed the title from race detector: detect close-after-close and send-after-close on channels to runtime: race detector: detect close-after-close and send-after-close on channels Nov 3, 2017

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Nov 3, 2017


ianlancetaylor commented Nov 3, 2017


dvyukov commented Nov 6, 2017

Calling recv and close concurrently is perfectly fine and in fact a common idiom. The behavior is defined and useful.
It's calling close and send concurrently that is problematic.

@dvyukov dvyukov closed this Nov 6, 2017


bcmills commented Nov 8, 2017

> Calling recv and close concurrently is perfectly fine and in fact a common idiom. Behavior is defined and is useful.

Yes, that's true. However, a select does not induce a happens-before relationship if its default case is taken (i.e., if the receive operation does not execute).

In this particular example, that makes the existence of the race itself nondeterministic: if goroutine scheduling is favorable, the first default case executes before any other goroutines are scheduled. Then subsequent goroutines take the receive case, and no race occurs.

However, if goroutine scheduling is unfavorable, the default case is taken by multiple goroutines before the close executes, and there is a close-after-close race.


Perhaps one way to make the race detector more useful for select-with-default races would be to make the scheduler more aggressive (i.e., less deterministic) when the race detector is active.

A more deterministic option might be to check whether the first (or last) send or close event on the channel happens before the select statement itself: if not, evaluate the default case up to the first observable write (or next synchronization point) in addition to whatever branch was actually taken.

@dvyukov dvyukov reopened this Nov 8, 2017


dvyukov commented Nov 9, 2017

The problem is more general and is not specific to selects. Consider:

if atomic.LoadUint32(&x) == 0 {
  atomic.StoreUint32(&x, 1)
  y++
}

If 2 goroutines sneak into the if before one of them sets x=1, there will be a race on y. But most of the time there won't be a race.
In another race detector (Relacy) I implemented systematic checking of all possible thread interleavings, but that requires executing the same code gazillions of times and is not generally feasible. Our best bet is randomizing execution order more. In the old tsan I implemented a feature called "scheduler shake" for this, but I never got around to implementing it in the new tsan.

@dvyukov dvyukov changed the title from runtime: race detector: detect close-after-close and send-after-close on channels to runtime: race detector: more aggressive scheduler perturbations Feb 26, 2018
