Go packages were inaccessible from origin server on Oct 28, 2019 from 3:37 am to 7:56 am (UTC−04) #31

Closed
JehandadK opened this issue Oct 28, 2019 · 4 comments

@JehandadK JehandadK commented Oct 28, 2019

Fetching https://dmitri.shuralyov.com/gpu/mtl?go-get=1
https fetch failed: Get https://dmitri.shuralyov.com/gpu/mtl?go-get=1: dial tcp 172.93.50.41:443: connect: connection refused

Hi, I wonder if this can be fixed again. I see you already faced this once. I guess our servers are in Japan.

Thanks!

@dmitshur dmitshur commented Oct 28, 2019

Hi @JehandadK,

Thanks a lot for letting me know. There was an issue on the web server causing the website to be unavailable. It should be fixed now. Please try again and let me know if you're still seeing any issues.

I'm going to look into improving it to prevent (and detect more quickly) this kind of problem in the future.

I suggest using a module proxy, for example the Go module mirror (https://proxy.golang.org), in order to be able to download modules even when the origin server is temporarily unavailable. @katiehockman's excellent GopherCon 2019 talk covered how module proxies can help mitigate issues such as this one in more detail.

@dmitshur dmitshur changed the title Go packages inaccessible for private server Go packages were inaccessible from origin server on Oct 28, 2019 from 3:37 am to 7:56 am EST Oct 28, 2019
@dmitshur dmitshur changed the title Go packages were inaccessible from origin server on Oct 28, 2019 from 3:37 am to 7:56 am EST Go packages were inaccessible from origin server on Oct 28, 2019 from 3:37 am to 7:56 am (UTC−04) Oct 28, 2019

@benjaminkomen benjaminkomen commented Oct 28, 2019

Am I interpreting https://proxy.golang.org/ correctly that if you use Go 1.13, you already use the Go module mirror without any specific configuration? When I run go env on my CircleCI build server, I see GOPROXY="https://proxy.golang.org,direct"

@dmitshur dmitshur commented Oct 28, 2019

@benjaminkomen That is correct. Also see the second paragraph of the introduction in the Go 1.13 release notes.
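
To illustrate how a GOPROXY value like the one quoted above can be read, here is a minimal sketch. It treats the value as a comma-separated list of sources tried in order, where direct means fetching from the origin server. The function name proxySources is hypothetical, and this is not the go command's actual implementation; in Go 1.13 the go command only moves on to the next entry when a proxy answers with 404 or 410.

```go
package main

import (
	"fmt"
	"strings"
)

// proxySources splits a GOPROXY value into its ordered list of sources.
// Hypothetical helper for illustration only.
func proxySources(goproxy string) []string {
	var out []string
	for _, s := range strings.Split(goproxy, ",") {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	for i, s := range proxySources("https://proxy.golang.org,direct") {
		if s == "direct" {
			// "direct" means: download from the origin server itself.
			fmt.Printf("%d: origin server (direct)\n", i+1)
			continue
		}
		fmt.Printf("%d: module proxy %s\n", i+1, s)
	}
}
```

With the default value, the mirror is consulted first, so a module it has already cached can still be downloaded while the origin server is down.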

@dmitshur dmitshur commented Oct 30, 2019

Here's a timeline of the outage, showing a graph of go get requests being handled:

[Image: graph of go get requests handled during the outage]

(Times are in EDT, aka UTC−04 timezone.)

The root cause was a bug in the golang.org/x/crypto/acme/autocert package that caused a nil pointer dereference in the HTTPS proxy in front of the home server:

Stack trace
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x80 pc=0x6cba45]

goroutine 17196253 [running]:
golang.org/x/crypto/acme/autocert.(*Manager).verifyRFC.func1(0xc0000e0000, 0xc00008e0a8)
	/Users/dmitri/go/src/golang.org/x/crypto/acme/autocert/autocert.go:774 +0x25
golang.org/x/crypto/acme/autocert.(*Manager).verifyRFC(0xc0000e0000, 0x802040, 0xc000414ea0, 0xc0001ea5b0, 0xc000014be0, 0xc, 0x0, 0x7fc500, 0xa46700)
	/Users/dmitri/go/src/golang.org/x/crypto/acme/autocert/autocert.go:769 +0x7f0
golang.org/x/crypto/acme/autocert.(*Manager).authorizedCert(0xc0000e0000, 0x802040, 0xc000414ea0, 0x7fdf80, 0xc00037ee40, 0xc000014be0, 0xc, 0x0, 0x0, 0x8, ...)
	/Users/dmitri/go/src/golang.org/x/crypto/acme/autocert/autocert.go:676 +0x4b5
golang.org/x/crypto/acme/autocert.(*domainRenewal).do(0xc0003b0440, 0x802040, 0xc000414ea0, 0x802040, 0xc000414ea0, 0xc000436500)
	/Users/dmitri/go/src/golang.org/x/crypto/acme/autocert/renewal.go:110 +0xfb
golang.org/x/crypto/acme/autocert.(*domainRenewal).renew(0xc0003b0440)
	/Users/dmitri/go/src/golang.org/x/crypto/acme/autocert/renewal.go:65 +0x132
created by time.goFunc
	/usr/local/go/src/time/sleep.go:168 +0x44

The HTTPS proxy program is not set up to be automatically restarted on crash, so all requests stopped being served until it was manually restarted. Had automatic restarts been in place, the outage would've been largely mitigated, but it would also have been less noticeable, making it less likely that the root cause would be found and fixed (since it'd be easier to ignore). My personal website prioritizes experimentation and development over stability, so automatic restarts are not used.
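
For readers who do want automatic restarts, here is a minimal sketch of a restart loop. It is not what runs on this server. The names supervise and errCrash are hypothetical, and the crash is simulated; in real use, run would start the server as a child process (for example via os/exec) and return its exit error. A process-level restart is shown because the stack trace above says the panicking goroutine was created by time.goFunc inside the autocert package, so a recover call in the proxy's own goroutines would not have caught it.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errCrash stands in for a child process exiting abnormally (assumption).
var errCrash = errors.New("simulated crash")

// supervise calls run until it returns nil or maxRestarts is exhausted,
// sleeping delay between attempts. It returns the number of restarts made.
func supervise(run func() error, maxRestarts int, delay time.Duration) int {
	restarts := 0
	for {
		err := run()
		if err == nil || restarts >= maxRestarts {
			return restarts
		}
		// Log every crash so it stays visible despite the restart.
		fmt.Printf("process exited: %v; restarting\n", err)
		restarts++
		time.Sleep(delay)
	}
}

func main() {
	calls := 0
	// Simulated workload: fails twice, then exits cleanly.
	run := func() error {
		calls++
		if calls < 3 {
			return errCrash
		}
		return nil
	}
	n := supervise(run, 5, 10*time.Millisecond)
	fmt.Println("restarts:", n)
}
```

Logging each crash before restarting addresses part of the concern above: the outage is mitigated, but the failure still leaves a trace that can be investigated.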

The panic happened due to issue golang/go#35225. That issue has since been fixed via CL golang.org/cl/203919, so it should not re-occur.

I've also added an alert that should help notify me of similar issues in the future.
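
As an illustration of what such an alert could check (a sketch under assumptions, not the alert actually in use here), the probe below requests a ?go-get=1 URL and reports a failure if the connection fails, the status is not 200, or the response lacks a go-import meta tag. The names probe, probeTimeout, and newFakeOrigin are hypothetical, and the demo runs against a local stand-in server with a made-up example.org import path rather than the real site.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// probeTimeout bounds each check so a hung server also counts as a failure.
const probeTimeout = 5 * time.Second

// probe fetches a ?go-get=1 URL and returns an error if the request fails,
// the status is not 200, or the body lacks a go-import meta tag.
func probe(url string) error {
	client := &http.Client{Timeout: probeTimeout}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return err
	}
	if !strings.Contains(string(body), `name="go-import"`) {
		return fmt.Errorf("no go-import meta tag in response")
	}
	return nil
}

// newFakeOrigin starts a local stand-in for an origin server (demo only).
func newFakeOrigin() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `<meta name="go-import" content="example.org/pkg git https://example.org/pkg">`)
	}))
}

func main() {
	srv := newFakeOrigin()
	url := srv.URL + "/pkg?go-get=1"
	fmt.Println("while up:", probe(url))
	srv.Close()
	fmt.Println("after shutdown:", probe(url))
}
```

Run on a schedule from a machine other than the server itself, a check like this would surface a connection refused error of the kind reported at the top of this issue.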

As mentioned in #31 (comment), if reliability of your build is of high importance, then it's recommended to use a caching module proxy (such as the Go module mirror at https://proxy.golang.org), so that your module's build can be successful even when some origin servers are temporarily unavailable. My personal website only has a 95%+ uptime SLA.

Closing since this is resolved.

@dmitshur dmitshur closed this Oct 30, 2019