Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/playground: falsely detects a deadlock when running Cmd.StderrPipe example code #41202

Open
hundt opened this issue Sep 3, 2020 · 10 comments
Open

Comments

@hundt
Copy link

@hundt hundt commented Sep 3, 2020

What version of Go are you using (go version)?

The playground

Does this issue reproduce with the latest release?

Works on latest released web site

What did you do?

Clicked "Run" on the example for Cmd.StderrPipe

What did you expect to see?

It runs

What did you see instead?

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x7efd408c4ee8, 0x72, 0x505300)
	/usr/local/go-faketime/src/runtime/netpoll.go:220 +0x55
internal/poll.(*pollDesc).wait(0xc000104198, 0x72, 0xc00011c001, 0x200, 0x200)
	/usr/local/go-faketime/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go-faketime/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000104180, 0xc00011c000, 0x200, 0x200, 0x0, 0x0, 0x0)
	/usr/local/go-faketime/src/internal/poll/fd_unix.go:159 +0x1b1
os.(*File).read(...)
	/usr/local/go-faketime/src/os/file_posix.go:31
os.(*File).Read(0xc000102018, 0xc00011c000, 0x200, 0x200, 0xc00011c000, 0x0, 0x0)
	/usr/local/go-faketime/src/os/file.go:116 +0x71
bytes.(*Buffer).ReadFrom(0xc000068e88, 0x505200, 0xc000102018, 0x505200, 0x4eb2f0, 0x0)
	/usr/local/go-faketime/src/bytes/buffer.go:204 +0xb1
io/ioutil.readAll(0x505200, 0xc000102018, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go-faketime/src/io/ioutil/ioutil.go:36 +0xe5
io/ioutil.ReadAll(...)
	/usr/local/go-faketime/src/io/ioutil/ioutil.go:45
main.main()
	/tmp/sandbox971301426/prog.go:21 +0x1ca

Program exited.
@rsc
Copy link
Contributor

@rsc rsc commented Sep 3, 2020

This is a bug in the faketime logic. It's now being run in a sandbox that allows interaction with external processes, but the deadlock detector assumes that's not the case, that wakeups can only come up from inside the program. Here the wakeup will come from outside the program.

@dmitshur dmitshur changed the title Playground falsely detects a deadlock when running Cmd.StderrPipe example code x/playground: falsely detects a deadlock when running Cmd.StderrPipe example code Sep 3, 2020
@gopherbot gopherbot added this to the Unreleased milestone Sep 3, 2020
@dmitshur
Copy link
Contributor

@dmitshur dmitshur commented Sep 3, 2020

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 5, 2020

@prattmic
Copy link
Member

@prattmic prattmic commented Sep 9, 2020

This doesn't strictly have to come from another process.

e.g., https://play.golang.org/p/YLeam5VkpRZ

This hits the same checkdead throw. Though in this case this is arguably helpful, since the standard behavior would be to hang forever.

Note https://play.golang.org/p/TFCeA6hhJX8 does not fail. This is because there is now a M really running in a syscall, while the original example is fully blocked in netpoll.

@prattmic
Copy link
Member

@prattmic prattmic commented Sep 9, 2020

The fundamental problem here is that findrunnable immediately stops the M if nothing is ready.

The assumption here (correct me if I'm wrong), is that at this point if none of the netpoll FDs are ready, then only a timer/sleep could be preventing forward progress, and checkdead will jump time to restart progress.

Unfortunately, that simply isn't true in this expanded environment. Situations I can think of where this assumption breaks down:

  1. Waiting on an FD written by a child process (this original bug).
  2. Waiting on a kernel-written FD (timerfd comes to mind).
  3. Waiting on a socket that doesn't have synchronous responses (e.g., some netlink sockets, (maybe) loopback inet sockets).
    (Plus probably more)

There are a few options I see:

  1. If netpoll is waiting on an FD, try to jump time, but if there is no jump to be made, just wait like a normal Go program. I feel like this is the best approach, but the behavior will be a bit unpredictable. (e.g., this would make the os.Pipe hang actually turn back into a hang instead of a checkdead throw.

  2. Drop time jumps from netpoll entirely if there are FDs. This is more predictable, but will make some programs that used to work now timeout on the playground, since their sleeps don't necessarily time jump. (This is effectively a partial removal of faketime altogether, which is also an option, but presumably unacceptable).

  3. Make faketime a feature of the sandbox rather than the Go runtime, so it affects all programs equally, and could be removed from the runtime. This is likely quite difficult to make work with arbitrary Linux programs, but something to consider.

(3) is mostly listed as a bit of a strawman, but unless we have some sandbox-wide mechanism, there will always be cases we can't cover. If a subprocess is depending on timing of its input on some level, then faketime in the main Go program is likely to break that interaction.

@rsc
Copy link
Contributor

@rsc rsc commented Sep 15, 2020

As long as there is an overall wall time limit on the playground execution, we should probably just implement time as real time there and rip out nearly all the fake time code. We could still make the time when the program begins be reported as the magic date so that caching doesn't look funny.

@aclements
Copy link
Member

@aclements aclements commented Sep 15, 2020

Is it actually viable to run the playground in real time? I would love to drop faketime. Based on #30439 I thought there were reasons we couldn't just drop it, including the present tool.

@prattmic
Copy link
Member

@prattmic prattmic commented Sep 15, 2020

AIUI, there is a 5s timeout on playground execution. I'd also love to see faketime gone, but I imagine that would break lots of programs with non-trivial amounts of sleep.

@rsc
Copy link
Contributor

@rsc rsc commented Sep 15, 2020

We could instrument the playground to record what fraction of executions use more than 5s of sleep.
Or we could decide those are simply not supported anymore.
I'm honestly OK with either - the playground is for test cases and the like. Hardly any of those need 5s of sleep.

@rsc
Copy link
Contributor

@rsc rsc commented Sep 15, 2020

(And the ones that really do need 5s of sleep may not work right with fake time anyway.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants