Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: concurrent map writes #31352

Closed
OneOfOne opened this issue Apr 8, 2019 · 11 comments

Comments

Projects
None yet
6 participants
@OneOfOne
Copy link
Contributor

commented Apr 8, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.1 linux/amd64
# happens with 1.11.x too

Does this issue reproduce with the latest release?

Yes, maybe?

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
linux/amd64

What did you do?

We have a long running service that makes http requests using a single http.Client

	client := &http.Client{
		Jar:     some.Cookiejar,
		Timeout: time.Minute * 3,
	}

Every few days we get the provided crash/panic, gonna try to run it with -race but I'm not sure we can afford that for longer than a few hours.

I'm trying to make a reliable reproducer but I can't figure it out, the crash points to runtime.

What did you expect to see?

No panic.

What did you see instead?

Apr 08 21:37:04 some someAPI[25339]: fatal error: concurrent map writes
Apr 08 21:37:04 some someAPI[25339]: goroutine 3592 [running]:
Apr 08 21:37:04 some someAPI[25339]: runtime.throw(0xd39b24, 0x15)
Apr 08 21:37:04 some someAPI[25339]: /usr/local/go/src/runtime/panic.go:617 +0x72 fp=0xc087ef6400 sp=0xc087ef63d0 pc=0x42de12
Apr 08 21:37:04 some someAPI[25339]: runtime.mapassign_fast64ptr(0xbc1460, 0xc0002891d0, 0xc000c46700, 0xbea701)
Apr 08 21:37:04 some someAPI[25339]: /usr/local/go/src/runtime/map_fast64.go:266 +0x35d fp=0xc087ef6440 sp=0xc087ef6400 pc=0x41327d
Apr 08 21:37:04 some someAPI[25339]: net/http.(*Transport).setReqCanceler(0xc044368ee8, 0xc000c46700, 0xc05e8ea380)
Apr 08 21:37:04 some someAPI[25339]: /usr/local/go/src/net/http/transport.go:903 +0xa1 fp=0xc087ef6478 sp=0xc087ef6440 pc=0x704e21
Apr 08 21:37:04 some someAPI[25339]: net/http.(*Transport).getConn(0xc044368ee8, 0xc079ae21e0, 0x0, 0xc0555fe0b0, 0x5, 0xc07c673d80, 0x13,
Apr 08 21:37:04 some someAPI[25339]: /usr/local/go/src/net/http/transport.go:991 +0x3f1 fp=0xc087ef6968 sp=0xc087ef6478 pc=0x705681
Apr 08 21:37:04 some someAPI[25339]: net/http.(*Transport).roundTrip(0xc044368ee8, 0xc000c46700, 0xc079ae21b0, 0xc04f5631bc, 0xc04f5631b0)
Apr 08 21:37:04 some someAPI[25339]: /usr/local/go/src/net/http/transport.go:467 +0x6ef fp=0xc087ef6be8 sp=0xc087ef6968 pc=0x701d0f
Apr 08 21:37:04 some someAPI[25339]: net/http.(*Transport).RoundTrip(0xc044368ee8, 0xc000c46700, 0xc044368ee8, 0xbf230dcd0a0eb5db, 0x30b39
Apr 08 21:37:04 some someAPI[25339]: /usr/local/go/src/net/http/roundtrip.go:17 +0x35 fp=0xc087ef6c20 sp=0xc087ef6be8 pc=0x6e8a35
Apr 08 21:37:04 some someAPI[25339]: net/http.send(0xc000c46100, 0xe45340, 0xc044368ee8, 0xbf230dcd0a0eb5db, 0x30b395f3547, 0x14232e0, 0xc
Apr 08 21:37:04 some someAPI[25339]: /usr/local/go/src/net/http/client.go:250 +0x461 fp=0xc087ef6d88 sp=0xc087ef6c20 pc=0x6a7ea1
Apr 08 21:37:04 some someAPI[25339]: net/http.(*Client).send(0xc079ae20c0, 0xc000c46100, 0xbf230dcd0a0eb5db, 0x30b395f3547, 0x14232e0, 0xc
Apr 08 21:37:04 some someAPI[25339]: /usr/local/go/src/net/http/client.go:174 +0xfb fp=0xc087ef6e08 sp=0xc087ef6d88 pc=0x6a789b
Apr 08 21:37:04 some someAPI[25339]: net/http.(*Client).do(0xc079ae20c0, 0xc000c46100, 0x0, 0x0, 0x0)
Apr 08 21:37:04 some someAPI[25339]: /usr/local/go/src/net/http/client.go:641 +0x279 fp=0xc087ef6fd8 sp=0xc087ef6e08 pc=0x6a8ea9
Apr 08 21:37:04 some someAPI[25339]: net/http.(*Client).Do(...)
Apr 08 21:37:04 some someAPI[25339]: /usr/local/go/src/net/http/client.go:509
@OneOfOne

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

With -race after < 2 minutes:

[oneofone@some ~]$ sudo journalctl -n10000 -u srv | grep -A10 -i 'write at'
Apr 08 22:08:59 some someAPI[32261]: Write at 0x00c0001c9bf0 by goroutine 3030:
Apr 08 22:08:59 some someAPI[32261]: runtime.mapassign_fast64()
Apr 08 22:08:59 some someAPI[32261]: /usr/local/go/src/runtime/map_fast64.go:92 +0x0
Apr 08 22:08:59 some someAPI[32261]: net/http.(*Transport).setReqCanceler()
Apr 08 22:08:59 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:903 +0xdc
Apr 08 22:08:59 some someAPI[32261]: net/http.(*Transport).getConn()
Apr 08 22:08:59 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:991 +0x5b9
Apr 08 22:08:59 some someAPI[32261]: net/http.(*Transport).roundTrip()
Apr 08 22:08:59 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:467 +0xa37
Apr 08 22:08:59 some someAPI[32261]: net/http.(*Transport).RoundTrip()
Apr 08 22:08:59 some someAPI[32261]: /usr/local/go/src/net/http/roundtrip.go:17 +0x42
--
Apr 08 22:08:59 some someAPI[32261]: Previous write at 0x00c0001c9bf0 by goroutine 3024:
Apr 08 22:08:59 some someAPI[32261]: runtime.mapassign_fast64()
Apr 08 22:08:59 some someAPI[32261]: /usr/local/go/src/runtime/map_fast64.go:92 +0x0
Apr 08 22:08:59 some someAPI[32261]: net/http.(*Transport).setReqCanceler()
Apr 08 22:08:59 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:903 +0xdc
Apr 08 22:08:59 some someAPI[32261]: net/http.(*Transport).getConn()
Apr 08 22:08:59 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:991 +0x5b9
Apr 08 22:08:59 some someAPI[32261]: net/http.(*Transport).roundTrip()
Apr 08 22:08:59 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:467 +0xa37
Apr 08 22:08:59 some someAPI[32261]: net/http.(*Transport).RoundTrip()
Apr 08 22:08:59 some someAPI[32261]: /usr/local/go/src/net/http/roundtrip.go:17 +0x42
--
Apr 08 22:09:16 some someAPI[32261]: Write at 0x00c000032d50 by goroutine 308:
Apr 08 22:09:16 some someAPI[32261]: runtime.mapassign_fast64()
Apr 08 22:09:16 some someAPI[32261]: /usr/local/go/src/runtime/map_fast64.go:92 +0x0
Apr 08 22:09:16 some someAPI[32261]: net/http.(*Transport).setReqCanceler()
Apr 08 22:09:16 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:903 +0xdc
Apr 08 22:09:16 some someAPI[32261]: net/http.(*Transport).getConn()
Apr 08 22:09:16 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:991 +0x5b9
Apr 08 22:09:16 some someAPI[32261]: net/http.(*Transport).roundTrip()
Apr 08 22:09:16 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:467 +0xa37
Apr 08 22:09:16 some someAPI[32261]: net/http.(*Transport).RoundTrip()
Apr 08 22:09:16 some someAPI[32261]: /usr/local/go/src/net/http/roundtrip.go:17 +0x42
--
Apr 08 22:09:16 some someAPI[32261]: Previous write at 0x00c000032d50 by goroutine 203:
Apr 08 22:09:16 some someAPI[32261]: runtime.mapassign_fast64()
Apr 08 22:09:16 some someAPI[32261]: /usr/local/go/src/runtime/map_fast64.go:92 +0x0
Apr 08 22:09:16 some someAPI[32261]: net/http.(*Transport).setReqCanceler()
Apr 08 22:09:16 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:903 +0xdc
Apr 08 22:09:16 some someAPI[32261]: net/http.(*Transport).getConn()
Apr 08 22:09:16 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:991 +0x5b9
Apr 08 22:09:16 some someAPI[32261]: net/http.(*Transport).roundTrip()
Apr 08 22:09:16 some someAPI[32261]: /usr/local/go/src/net/http/transport.go:467 +0xa37
Apr 08 22:09:16 some someAPI[32261]: net/http.(*Transport).RoundTrip()
Apr 08 22:09:16 some someAPI[32261]: /usr/local/go/src/net/http/roundtrip.go:17 +0x42

@OneOfOne OneOfOne changed the title runtime: concurrent map writes net/http: concurrent map writes Apr 8, 2019

@OneOfOne

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

@dgryski

This comment has been minimized.

Copy link
Contributor

commented Apr 8, 2019

Looking at the code it seems every place t.reqCanceler is accessed it's protected by t.reqMu as expected. I wonder if this is a side effect of memory corruption from something else?

@OneOfOne

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

we don't use any cgo or any atomic tricks afaik, only external library we use is boltdb, I'm still digging and I looked at every single file from the full trace (including our code), everything is either using a mutex or only set once on initalization.

Here's the weird part though, without -race, it happens randomly every 1-2 days, with -race it happens once on server init every time.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 9, 2019

That’s not too surprising; the race detector is far more sensitive than the best effort concurrent map write detection in the runtime.

Can you start removing chunks of startup code to narrow down the source of the race?

@OneOfOne

This comment has been minimized.

Copy link
Contributor Author

commented Apr 9, 2019

I will try sometime tomorrow.

@davecheney

This comment has been minimized.

Copy link
Contributor

commented Apr 9, 2019

@OneOfOne are you copying or deriving a value from net/http.DefaultClient ?

@davecheney

This comment has been minimized.

Copy link
Contributor

commented Apr 9, 2019

Possibly related, #30597

@OneOfOne

This comment has been minimized.

Copy link
Contributor Author

commented Apr 9, 2019

@davecheney We do not, we create a new http.Client per call, only thing shared is an http.Transport (a part of the struct, not the default one) and that only contains Proxy: http.ProxyURL(...).

Me and a co-worker have been scratching our heads trying to figure it out for a few days now.

Also #30597 uses GetBody with a custom reader, we just use strings.NewReader for the body in a new http.NewRequest.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 12, 2019

You are making a value copy of the http.Transport somewhere, instead of copying a pointer to the Transport. That copies the map between the two Transports but now protected by different locks. Vet should be able to tell you where.

$ cat x.go
package p

import "net/http"

var x http.Transport

func f() {
	y := x
	_ = &y
}
$ go vet x.go
# command-line-arguments
./x.go:8:7: assignment copies lock value to y: net/http.Transport contains sync.Mutex
$ 
@OneOfOne

This comment has been minimized.

Copy link
Contributor Author

commented Apr 12, 2019

@rsc thank you for pointing that out, the Transport wasn't directly copied, the struct holding it was.

We run go vet before building but apparently we had that disabled for a while and we didn't see it.

@OneOfOne OneOfOne closed this Apr 12, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.