Go packages were inaccessible from origin server on Feb 10, 2020 from 3:02~4:30 pm until 9:55 pm EST (UTC -5) #37

Closed
zlin3 opened this issue Feb 10, 2020 · 5 comments

@zlin3 zlin3 commented Feb 10, 2020

Hi, the link (http://dmitri.shuralyov.com/gpu/mtl) is not accessible right now, which is causing a build failure:
'''
go: dmitri.shuralyov.com/gpu/mtl@v0.0.0-20190408044501-666a987793e9: unrecognized import path "dmitri.shuralyov.com/gpu/mtl" (https fetch: Get https://dmitri.shuralyov.com/gpu/mtl?go-get=1: dial tcp 172.93.50.41:443: i/o timeout)
'''

@vfouzdar vfouzdar commented Feb 10, 2020

Same here

go: dmitri.shuralyov.com/service/change@v0.0.0-20181023043359-a85b471d5412: unrecognized import path "dmitri.shuralyov.com/service/change" (https fetch: Get https://dmitri.shuralyov.com/service/change?go-get=1: dial tcp 172.93.50.41:443: i/o timeout)
go: dmitri.shuralyov.com/app/changes@v0.0.0-20180602232624-0a106ad413e3: unrecognized import path "dmitri.shuralyov.com/app/changes" (https fetch: Get https://dmitri.shuralyov.com/app/changes?go-get=1: dial tcp 172.93.50.41:443: i/o timeout)
go: dmitri.shuralyov.com/state@v0.0.0-20180228185332-28bcc343414c: unrecognized import path "dmitri.shuralyov.com/state" (https fetch: Get https://dmitri.shuralyov.com/state?go-get=1: dial tcp 172.93.50.41:443: i/o timeout)
go: dmitri.shuralyov.com/html/belt@v0.0.0-20180602232347-f7d459c86be0: unrecognized import path "dmitri.shuralyov.com/html/belt" (https fetch: Get https://dmitri.shuralyov.com/html/belt?go-get=1: dial tcp 172.93.50.41:443: i/o timeout)
go: error loading module requirements

@ljagiello ljagiello commented Feb 10, 2020

Got the same problem.

go: dmitri.shuralyov.com/gpu/mtl@v0.0.0-20190408044501-666a987793e9: unrecognized import path "dmitri.shuralyov.com/gpu/mtl" (https fetch: Get https://dmitri.shuralyov.com/gpu/mtl?go-get=1: dial tcp 172.93.50.41:443: i/o timeout)

@dmitshur dmitshur commented Feb 11, 2020

Thanks for reporting.

In general, I suggest using the module mirror at https://proxy.golang.org (e.g., by setting the GOPROXY environment variable to point to it) for more reliable builds.
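To illustrate why the mirror helps when the origin server is down: a client speaking the GOPROXY protocol asks the proxy, not the origin, for version metadata at a URL built from the proxy base, the module path, and the version. Below is a minimal Go sketch of that URL construction, using the module path and version from the error message reported above. The helper names `escapePath` and `versionInfoURL` are made up for illustration and are not part of the go command.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath applies the module proxy protocol's case encoding:
// each uppercase ASCII letter becomes '!' followed by its lowercase form.
// (Hypothetical helper name, written for this sketch.)
func escapePath(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// versionInfoURL returns the URL a GOPROXY-protocol client would request
// for metadata about a single module version.
func versionInfoURL(proxy, modulePath, version string) string {
	return strings.TrimSuffix(proxy, "/") + "/" +
		escapePath(modulePath) + "/@v/" + escapePath(version) + ".info"
}

func main() {
	// Module path and version are taken from the error message in this issue.
	fmt.Println(versionInfoURL(
		"https://proxy.golang.org",
		"dmitri.shuralyov.com/gpu/mtl",
		"v0.0.0-20190408044501-666a987793e9",
	))
}
```

Note that the request goes to proxy.golang.org rather than dmitri.shuralyov.com. Whether the mirror has this particular version cached is not something this thread confirms.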

This particular outage should be resolved by now. See https://twitter.com/dmitshur/status/1227061882722361344 for some additional information.

I'll keep watching the server more closely to make sure it continues to be stable before closing this. Sorry for the trouble.

@dmitshur dmitshur self-assigned this Feb 11, 2020
dmitshur added a commit that referenced this issue Feb 11, 2020
These annotations help see where the errors are coming from.

Updates #37.

@zlin3 zlin3 commented Feb 11, 2020

Thanks!

@zlin3 zlin3 closed this Feb 11, 2020
@dmitshur dmitshur changed the title from "http://dmitri.shuralyov.com/gpu/mtl not accessible" to "Go packages were inaccessible from origin server on Feb 10, 2020 from 3:02~4:30 pm until 9:55 pm EST (UTC -5)" on Feb 13, 2020

@dmitshur dmitshur commented Feb 13, 2020

Root Cause

The root cause was a hardware failure at the server provider, which made the VM hosting the dmitri.shuralyov.com website unavailable.

Timeline

Here's an annotated timeline of the outage, showing a partial graph of go get requests being handled. Note that the Prometheus server was running on the same VM that was down, so the gap in metrics data corresponds to the outage.

[Image: annotated timeline graph of go get requests during the outage]

I learned about the outage while at work, and realized I couldn't even SSH into the server where my site is hosted. I didn't have credentials on me to sign into the provider's portal, so I couldn't do anything until I got home. By the time I did, I saw a confirmation from the provider that it was indeed an issue on their side and they were already working on resolving it:

[Image: the provider's confirmation of the issue]

Based on the description (a "hardware failure") and the lack of an ETA, I wasn't confident it would be resolved soon, so I started setting up a second instance of the site on another server, using the data backups I had. I was doing this for the first time, so it took about 40 minutes. However, the DNS TTL on the dmitri.shuralyov.com host was set to 1 hour. By the time that hour was nearly up, the original VM had been restored, so I just restarted my site there and reverted the DNS to point back to it.

Followup

I plan to investigate setting up either a second instance of my site on another provider, which I could switch to if the primary provider goes down, or a simpler static page that can point to copies of existing module versions elsewhere. I've opened issue #38 to track that.

I've also opened golang.org/issue/37175 about investigating what can or should be done in golang.org/x/exp to reduce the large number of dependencies from unrelated experimental packages.

As mentioned previously in #31 (comment), if the reliability of your build is of high importance, then it's recommended to use a caching module proxy (such as the Go module mirror at https://proxy.golang.org), so that your module's build can succeed even when some origin servers are temporarily unavailable. My personal website only has a 95%+ uptime SLA.

Note that as of Go 1.13, the https://proxy.golang.org module mirror is used by default without needing configuration, so this issue is most likely to affect users who haven't updated yet.
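For reference, the default GOPROXY value in Go 1.13 is `https://proxy.golang.org,direct`: the go command tries the mirror first and falls back to the origin server (`direct`) only if the mirror responds with 404 or 410. The Go sketch below just splits such a value into its ordered sources; the helper name `proxySources` is hypothetical, and this is not how the go command itself is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// proxySources splits a GOPROXY value into the ordered list of sources
// that would be tried. Entries may be separated by ',' or, in newer Go
// releases, '|'; this sketch ignores the difference in fallback rules
// between the two separators.
func proxySources(goproxy string) []string {
	return strings.FieldsFunc(goproxy, func(r rune) bool {
		return r == ',' || r == '|'
	})
}

func main() {
	// Default value as of Go 1.13.
	for i, src := range proxySources("https://proxy.golang.org,direct") {
		fmt.Printf("%d: %s\n", i+1, src)
	}
}
```

With the mirror listed first, a module version it already has cached can be served without contacting the origin server at all.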
