Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: racy Timer access should either work or throw, not panic #37400

Closed
bcmills opened this issue Feb 24, 2020 · 21 comments
Closed

time: racy Timer access should either work or throw, not panic #37400

bcmills opened this issue Feb 24, 2020 · 21 comments
Milestone

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Feb 24, 2020

2020-02-21T20:22:19-13d73b2/linux-arm

panic: runtime error: racy use of timers
	panic: err must be non-nil

goroutine 3372 [running]:
net/http.(*http2pipe).closeWithError(0xd6f774, 0xd6f7a8, 0x0, 0x0, 0x0)
	/workdir/go/src/net/http/h2_bundle.go:3558 +0x210
net/http.(*http2pipe).CloseWithError(...)
	/workdir/go/src/net/http/h2_bundle.go:3545
net/http.(*http2clientConnReadLoop).cleanup(0xc43fd4)
	/workdir/go/src/net/http/h2_bundle.go:8228 +0x21c
panic(0x467b60, 0x580c18)
	/workdir/go/src/runtime/panic.go:967 +0x118
time.stopTimer(0x10c9e84, 0x10a9ca8)
	/workdir/go/src/runtime/time.go:224 +0x1c
time.(*Timer).Stop(...)
	/workdir/go/src/time/sleep.go:78
net/http.http2bodyWriterState.on100(0xd6f760, 0x10c9e80, 0x10a9c80, 0xefc780, 0x10c9e40, 0x3b9aca00, 0x0)
	/workdir/go/src/net/http/h2_bundle.go:9057 +0x38
net/http.(*http2clientConnReadLoop).handleResponse(0xc43fd4, 0xd6f760, 0xefc800, 0xd6f760, 0x58a650, 0x2cdc78)
	/workdir/go/src/net/http/h2_bundle.go:8433 +0x2fc
net/http.(*http2clientConnReadLoop).processHeaders(0xc43fd4, 0xefc800, 0x0, 0x0)
	/workdir/go/src/net/http/h2_bundle.go:8355 +0x14c
net/http.(*http2clientConnReadLoop).run(0xc43fd4, 0x0, 0x0)
	/workdir/go/src/net/http/h2_bundle.go:8277 +0x648
net/http.(*http2ClientConn).readLoop(0xdb2000)
	/workdir/go/src/net/http/h2_bundle.go:8174 +0x60
created by net/http.(*http2Transport).newClientConn
	/workdir/go/src/net/http/h2_bundle.go:7174 +0x538
FAIL	net/http	6.612s

CC @ianlancetaylor @bradfitz @tombergan

@bcmills bcmills added this to the Go1.15 milestone Feb 24, 2020
@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 25, 2020

Could this be related to #37196? (I'm not sure.)

@rsc
Copy link
Contributor

@rsc rsc commented Feb 25, 2020

The panic inside time.Timer.Stop should really be runtime.throw. Code does not expect panic from Stop. If there's really a racy use of timers and timers are corrupt, stop executing.

@rsc
Copy link
Contributor

@rsc rsc commented Feb 25, 2020

Also, time.stopTimer clearly called deltimer, and deltimer called badTimer, which panicked. Do we not have a stack frame for deltimer showing what line it is at? Do we not have a stack frame for badTimer? Why not?

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 25, 2020

That may be #26206..?

@rsc
Copy link
Contributor

@rsc rsc commented Feb 25, 2020

This program crashes very quickly:

package main

import "time"

func main() {
	for i := 0; i < 1000; i++ {
		t := time.AfterFunc(1*time.Second, func() {})
		go t.Stop()
		go t.Reset(1 * time.Second)
		go t.Stop()
		go t.Reset(1 * time.Second)
	}
	time.Sleep(10 * time.Second)
}

I don't believe this program should panic. The claim in the runtime code is that the runtime timer data structures may be corrupted, but if so then there is not enough locking protecting against bad user call sequences. But the bad user call sequences should find a way to complete (serialize them somehow), no?

@rsc
Copy link
Contributor

@rsc rsc commented Feb 25, 2020

Seems like a possible Go 1.14 release blocker.

@rsc rsc modified the milestones: Go1.15, Go1.14 Feb 25, 2020
@rsc rsc added the release-blocker label Feb 25, 2020
@rsc rsc changed the title net/http: "racy use of timers" in (*http2pipe).CloseWithError time: Timer sequences newly panic with "racy use of timers" Feb 25, 2020
@rsc
Copy link
Contributor

@rsc rsc commented Feb 25, 2020

@dmitshur
Copy link
Member

@dmitshur dmitshur commented Feb 25, 2020

This program crashes very quickly

I can reproduce this with go1.14rc1, go1.14beta1, but also with go1.13.7 on darwin/amd64:


$ go run .
panic: runtime error: racy use of timers

goroutine 195 [running]:
time.stopTimer(0xc00005af08, 0x1)
	/usr/local/go/src/runtime/time.go:224 +0x2b
time.(*Timer).Stop(0xc00005af00, 0x3b9aca00)
	/usr/local/go/src/time/sleep.go:78 +0x36
created by main.main
	/var/folders/zb/5p8cwfhj29gf_m8vdy8ylmlr00jwcj/T/tmp.5VGZN9cY/main.go:10 +0xb5
exit status 2
$ go1.14beta1 run .
panic: runtime error: racy use of timers

goroutine 177 [running]:
time.stopTimer(0xc000060c88, 0x1)
	/Users/dmitshur/sdk/go1.14beta1/src/runtime/time.go:219 +0x2b
time.(*Timer).Stop(0xc000060c80, 0x3b9aca00)
	/Users/dmitshur/sdk/go1.14beta1/src/time/sleep.go:78 +0x36
created by main.main
	/var/folders/zb/5p8cwfhj29gf_m8vdy8ylmlr00jwcj/T/tmp.5VGZN9cY/main.go:10 +0xb5
panic: runtime error: racy use of timers

goroutine 198 [running]:
time.resetTimer(0xc000060e18, 0xb9845fda967c)
	/Users/dmitshur/sdk/go1.14beta1/src/runtime/time.go:228 +0x35
time.(*Timer).Reset(0xc000060e10, 0x3b9aca00, 0x0)
	/Users/dmitshur/sdk/go1.14beta1/src/time/sleep.go:126 +0x7d
created by main.main
	/var/folders/zb/5p8cwfhj29gf_m8vdy8ylmlr00jwcj/T/tmp.5VGZN9cY/main.go:11 +0xe0
exit status 2
$ go1.13.7 run .   
panic: runtime error: racy use of timers

goroutine 355 [running]:
time.startTimer(0xc000087548)
	/Users/dmitshur/sdk/go1.13.7/src/runtime/time.go:114 +0x2b
time.(*Timer).Reset(0xc000087540, 0x3b9aca00, 0x0)
	/Users/dmitshur/sdk/go1.13.7/src/time/sleep.go:126 +0x81
created by main.main
	/var/folders/zb/5p8cwfhj29gf_m8vdy8ylmlr00jwcj/T/tmp.5VGZN9cY/main.go:9 +0x93
exit status 2
@toothrot
Copy link
Contributor

@toothrot toothrot commented Feb 25, 2020

This also seems to reproduce on go1.12.17, although less reliably:

$ go1.12.17 run main.go 
panic: runtime error: racy use of timers

goroutine 248 [running]:
time.startTimer(0xc000086dc8)
	/home/rakoczy/sdk/go1.12.17/src/runtime/time.go:114 +0x2b
time.(*Timer).Reset(0xc000086dc0, 0x3b9aca00, 0x45b834)
	/home/rakoczy/sdk/go1.12.17/src/time/sleep.go:127 +0x81
created by main.main
	/tmp/37400/main.go:11 +0xe0
exit status 2
@dmitshur
Copy link
Member

@dmitshur dmitshur commented Feb 25, 2020

This issue is blocking the Go 1.14 release, which is currently underway (but has not been cut yet, so we can still safely abort it or make changes if necessary). See #37445 (comment).

@rsc
Copy link
Contributor

@rsc rsc commented Feb 25, 2020

This panic was introduced in Go 1.11, so I guess it is not a Go 1.14 release blocker. Still think it should probably either be fixed or changed to a runtime.throw. The unexpected panic is problematic (see the nested panic in the trace above). Stop should either work or crash the program, not panic.

@rsc rsc modified the milestones: Go1.14, Go1.15 Feb 25, 2020
@rsc rsc removed the release-blocker label Feb 25, 2020
@aclements aclements changed the title time: Timer sequences newly panic with "racy use of timers" time: racy Timer access should either work or throw, not panic Feb 25, 2020
@rsc
Copy link
Contributor

@rsc rsc commented Feb 25, 2020

I filed details about the net/http timer usage in #37449. The code, from 2016, is aware of the fact that Reset and Stop are racing and handled either ordering correctly. What it did not expect is the panic introduced in Go 1.11. If we decide that this panic is important to keep, we will need to rewrite that code in http2.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 25, 2020

Just to be clear, the problem was not introduced in 1.11. The program above also fails with 1.10.
What happened in 1.11 was that the problem changed from a throw saying "fatal error: panic holding locks" to a panic saying "runtime error: racy use of timers". Changing from a throw to a panic was a mistake on my part. But changing from "panic holding locks" to "racy use of timers" was not.

Before 1.10 the problem still exists but is harder to reproduce. What changed in 1.10 was that we switched to many different timer buckets. With fewer timers in each bucket it's easier to create a case in which siftupTimer gets an index out of range.

In any case we can do better with the new timers.

@gopherbot
Copy link

@gopherbot gopherbot commented Feb 26, 2020

Change https://golang.org/cl/221077 mentions this issue: runtime: don't panic on racy use of timers

@makhov
Copy link

@makhov makhov commented Feb 26, 2020

We started to get panics right after updating to Go 1.14.

runtime error: racy use of timers
at time.stopTimer (time.go:224)
at time.(*Timer).Stop (sleep.go:78)
at net/http.(*http2ClientConn).roundTrip (h2_bundle.go:7480)
at net/http.(*http2Transport).RoundTripOpt (h2_bundle.go:6948)
at net/http.(*http2Transport).RoundTrip (h2_bundle.go:6909)
at net/http.http2noDialH2RoundTripper.RoundTrip (h2_bundle.go:9103)
at net/http.(*Transport).roundTrip (transport.go:515)
at net/http.(*Transport).RoundTrip (roundtrip.go:17)
at net/http.send (client.go:252)
at net/http.(*Client).send (client.go:176)
at net/http.(*Client).do (client.go:699)
at net/http.(*Client).Do (client.go:567)
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 27, 2020

@gopherbot Please open a backport to 1.14.

@gopherbot
Copy link

@gopherbot gopherbot commented Feb 27, 2020

Backport issue(s) opened: #37494 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot gopherbot closed this in 98858c4 Feb 27, 2020
@gopherbot
Copy link

@gopherbot gopherbot commented Feb 27, 2020

Change https://golang.org/cl/221298 mentions this issue: [release-branch.go1.14] runtime: don't panic on racy use of timers

@thepudds
Copy link

@thepudds thepudds commented Feb 27, 2020

Hello @makhov, would you be interested in confirming it is fixed for you by trying with tip?

Using tip via https://godoc.org/golang.org/dl/gotip is very convenient, or you could directly compile from source if you prefer.

@makhov
Copy link

@makhov makhov commented Feb 27, 2020

@thepudds Thanks! I'll try it in a canary. Panics happen no such often, so I'll keep it running for the weekend.

gopherbot pushed a commit that referenced this issue Feb 27, 2020
If we see a racy use of timers, as in concurrent calls to Timer.Reset,
do the operations in an unpredictable order, rather than crashing.

Updates #37400
Updates #37449
Fixes #37494

Change-Id: Idbac295df2dfd551b6d762909d5040fc532c1b34
Reviewed-on: https://go-review.googlesource.com/c/go/+/221077
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
(cherry picked from commit 98858c4)
Reviewed-on: https://go-review.googlesource.com/c/go/+/221298
@makhov
Copy link

@makhov makhov commented Mar 3, 2020

@thepudds The panic seems to be fixed. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants
You can’t perform that action at this time.