
x/build: monitor/graph GCE instance-create-to-buildlet latencies #21148

Closed
bradfitz opened this issue Jul 24, 2017 · 5 comments

@bradfitz
Member

commented Jul 24, 2017

In the past few days, our Windows GCE instances seem to get created, but then the buildlet doesn't come up within 5 minutes.

Why?

Also, we need to monitor & alert on this.

/cc @adams-sarah @cybrcodr @johnsonj

@gopherbot gopherbot added this to the Unreleased milestone Jul 24, 2017

@gopherbot gopherbot added the Builders label Jul 24, 2017

@bradfitz

Member Author

commented Jul 24, 2017

(The build system does retry, though, and it seems to eventually work. But something is flaky, so our builds and trybots are slow.)

@johnsonj

Member

commented Jul 24, 2017

+1 on monitoring/alerting. It looks like the buildlet process starts, but then nothing happens:

Serial console output for buildlet-windows-amd64-2012-rnb5c1b2a

 SeaBIOS (version 1.8.2-20170419_170401-google)
Total RAM Size = 0x00000000e6600000 = 3686 MiB
CPUs found: 4     Max CPUs supported: 4
found virtio-scsi at 0:3
virtio-scsi vendor='Google' product='PersistentDisk' rev='1' type=0 removable=0
virtio-scsi blksize=512 sectors=104857600 = 51200 MiB
drive 0x000f31a0: PCHS=0/0/0 translation=lba LCHS=1024/255/63 s=104857600
Booting from Hard Disk 0...
7/24/2017 7:55:26 PM UTC: GCE Agent started (version 3.5.1.0).
7/24/2017 7:55:28 PM UTC: Starting startup scripts (version 3.5.1.0).
7/24/2017 7:55:33 PM UTC: Finished running startup scripts.
2017/07/24 19:55:51 buildlet starting.
@johnsonj

Member

commented Jul 24, 2017

Created a builder and captured console output:

2017/07/24 20:31:07 network is up.
2017/07/24 20:31:07 Downloading https://storage.googleapis.com/go-builder-data/buildlet.windows-amd64 to .\buildlet.exe ...
2017/07/24 20:31:07 Downloaded .\buildlet.exe (7617536 bytes)
fatal error: unexpected signal during runtime execution
[signal 0xc0000005 code=0x0 addr=0xffffffffffffffff pc=0x427e42]

runtime stack:
runtime.throw(0x7620f5, 0x2a)
        /home/bradfitz/go/src/runtime/panic.go:605 +0x9c
runtime.sigpanic()
        /home/bradfitz/go/src/runtime/signal_windows.go:155 +0x184
runtime.netpoll(0xc042019901, 0xc042019901)
        /home/bradfitz/go/src/runtime/netpoll_windows.go:105 +0x332
runtime.findrunnable(0xc042016000, 0x0)
        /home/bradfitz/go/src/runtime/proc.go:2107 +0x610
runtime.schedule()
        /home/bradfitz/go/src/runtime/proc.go:2245 +0x13a
runtime.goexit0(0xc04213a480)
        /home/bradfitz/go/src/runtime/proc.go:2396 +0x24b
runtime.mcall(0x0)
        /home/bradfitz/go/src/runtime/asm_amd64.s:286 +0x5e

goroutine 1 [select]:
net/http.(*Transport).getConn(0x8fe240, 0xc0421720f0, 0x0, 0xc04217e000, 0x4, 0xc04216c120, 0x12, 0x0, 0x0, 0xc042067648)
        /home/bradfitz/go/src/net/http/transport.go:948 +0x5c6
net/http.(*Transport).RoundTrip(0x8fe240, 0xc042182000, 0x8fe240, 0x0, 0x0)
        /home/bradfitz/go/src/net/http/transport.go:400 +0x6ad
net/http.send(0xc042182000, 0x8c92c0, 0x8fe240, 0x0, 0x0, 0x0, 0xc04216e020, 0x100, 0xc0420679c8, 0x1)
        /home/bradfitz/go/src/net/http/client.go:249 +0x1b0
net/http.(*Client).send(0x8f92a0, 0xc042182000, 0x0, 0x0, 0x0, 0xc04216e020, 0x0, 0x1, 0x4)
        /home/bradfitz/go/src/net/http/client.go:173 +0x104
net/http.(*Client).Do(0x8f92a0, 0xc042182000, 0xa, 0x757505, 0x11)
        /home/bradfitz/go/src/net/http/client.go:602 +0x294
cloud.google.com/go/compute/metadata.getETag(0x8f92a0, 0xc04216c100, 0x1c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/bradfitz/src/cloud.google.com/go/compute/metadata/metadata.go:132 +0x1f7
cloud.google.com/go/compute/metadata.Get(0xc04216c100, 0x1c, 0x14, 0x7543c6, 0x8, 0xc04216c100)
        /home/bradfitz/src/cloud.google.com/go/compute/metadata/metadata.go:107 +0x48
cloud.google.com/go/compute/metadata.InstanceAttributeValue(0x7543c6, 0x8, 0xc042067d98, 0x42b07d, 0x76ce20, 0xc042067da8)
        /home/bradfitz/src/cloud.google.com/go/compute/metadata/metadata.go:405 +0x76
main.metadataValue(0x7543c6, 0x8, 0x0, 0x0)
        /home/bradfitz/src/golang.org/x/build/cmd/buildlet/buildlet.go:329 +0x407
main.defaultListenAddr(0x757af6, 0x12)
        /home/bradfitz/src/golang.org/x/build/cmd/buildlet/buildlet.go:87 +0x4e
main.main()
        /home/bradfitz/src/golang.org/x/build/cmd/buildlet/buildlet.go:139 +0x8ba

goroutine 49 [IO wait]:
internal/poll.runtime_pollWait(0x2d4e40, 0x77, 0xc04218a0b8)
        /home/bradfitz/go/src/runtime/netpoll.go:173 +0x5e
internal/poll.(*pollDesc).wait(0xc04218a158, 0x77, 0xc04216a000, 0x0, 0x0)
        /home/bradfitz/go/src/internal/poll/fd_poll_runtime.go:85 +0xb5
internal/poll.(*ioSrv).ExecIO(0x900e28, 0xc04218a0b8, 0x76c4b8, 0xc04214b1a8, 0xc04214b1b0, 0xc04214b1a0)
        /home/bradfitz/go/src/internal/poll/fd_windows.go:191 +0x126
internal/poll.(*FD).ConnectEx(0xc04218a000, 0x8c9b00, 0xc04216c140, 0xc042162240, 0xc04218a000)
        /home/bradfitz/go/src/internal/poll/fd_windows.go:721 +0x80
net.(*netFD).connect(0xc04218a000, 0x8cdf80, 0xc042162240, 0x0, 0x0, 0x8c9b00, 0xc04216c140, 0x0, 0x0, 0x0, ...)
        /home/bradfitz/go/src/net/fd_windows.go:116 +0x243
net.(*netFD).dial(0xc04218a000, 0x8cdf80, 0xc042162240, 0x8cf240, 0x0, 0x8cf240, 0xc0421721b0, 0xc04214b3a0, 0x56a395)
        /home/bradfitz/go/src/net/sock_posix.go:142 +0xf3
net.socket(0x8cdf80, 0xc042162240, 0x7533a6, 0x3, 0x2, 0x1, 0x0, 0x0, 0x8cf240, 0x0, ...)
        /home/bradfitz/go/src/net/sock_posix.go:93 +0x1c1
net.internetSocket(0x8cdf80, 0xc042162240, 0x7533a6, 0x3, 0x8cf240, 0x0, 0x8cf240, 0xc0421721b0, 0x1, 0x0, ...)
        /home/bradfitz/go/src/net/ipsock_posix.go:141 +0x158
net.doDialTCP(0x8cdf80, 0xc042162240, 0x7533a6, 0x3, 0x0, 0xc0421721b0, 0x920f40, 0x0, 0x0)
        /home/bradfitz/go/src/net/tcpsock_posix.go:62 +0xc0
net.dialTCP(0x8cdf80, 0xc042162240, 0x7533a6, 0x3, 0x0, 0xc0421721b0, 0xbe55b423665a8374, 0x77d9bc38, 0x902ee0)
        /home/bradfitz/go/src/net/tcpsock_posix.go:58 +0xeb
net.dialSingle(0x8cdf80, 0xc042162240, 0xc042180080, 0x8cbd00, 0xc0421721b0, 0x0, 0x0, 0x0, 0x0)
        /home/bradfitz/go/src/net/dial.go:547 +0x3e9
net.dialSerial(0x8cdf80, 0xc042162240, 0xc042180080, 0xc042186090, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0)
        /home/bradfitz/go/src/net/dial.go:515 +0x24e
net.(*Dialer).DialContext(0xc042096120, 0x8cdf40, 0xc04204c078, 0x7533a6, 0x3, 0xc04216c120, 0x12, 0x0, 0x0, 0x0, ...)
        /home/bradfitz/go/src/net/dial.go:397 +0x6f5
net.(*Dialer).Dial(0xc042096120, 0x7533a6, 0x3, 0xc04216c120, 0x12, 0x1240042176120, 0x110, 0x110, 0xc042188000)
        /home/bradfitz/go/src/net/dial.go:320 +0x7c
net.(*Dialer).Dial-fm(0x7533a6, 0x3, 0xc04216c120, 0x12, 0xc042186060, 0xc042117998, 0x403580, 0x60)
        /home/bradfitz/src/cloud.google.com/go/compute/metadata/metadata.go:72 +0x59
net/http.(*Transport).dial(0x8fe240, 0x8cdf40, 0xc04204c078, 0x7533a6, 0x3, 0xc04216c120, 0x12, 0x0, 0x0, 0x0, ...)
        /home/bradfitz/go/src/net/http/transport.go:887 +0x82
net/http.(*Transport).dialConn(0x8fe240, 0x8cdf40, 0xc04204c078, 0x0, 0xc04217e000, 0x4, 0xc04216c120, 0x12, 0x0, 0xc042130120, ...)
        /home/bradfitz/go/src/net/http/transport.go:1060 +0x1d69
net/http.(*Transport).getConn.func4(0x8fe240, 0x8cdf40, 0xc04204c078, 0xc042172120, 0xc042176060)
        /home/bradfitz/go/src/net/http/transport.go:943 +0x7f
created by net/http.(*Transport).getConn
        /home/bradfitz/go/src/net/http/transport.go:942 +0x39a

goroutine 50 [select]:
net.(*netFD).connect.func2(0x8cdf80, 0xc042162240, 0xc04218a000, 0xc0421761e0)
        /home/bradfitz/go/src/net/fd_windows.go:105 +0xf9
created by net.(*netFD).connect
        /home/bradfitz/go/src/net/fd_windows.go:104 +0x218
2017/07/24 20:31:07 Error running buildlet: exit status 2
2017/07/24 20:31:07 (sleeping for 1 minute before failing)
@gopherbot


commented Jul 24, 2017

CL https://golang.org/cl/50880 mentions this issue.

@bradfitz

Member Author

commented Jul 24, 2017

A few days ago I'd replaced the Windows buildlet with a Go 1.9-built one.

I've reverted it to a Go 1.8-built one and it now seems to work again.

That's disconcerting, so I'm hoping the new binary also had unrelated code changes that explain it. I'm going to try to repro in staging. I really hope we don't have Go 1.9-on-Windows/GCE problems.

@golang golang locked and limited conversation to collaborators Jul 31, 2018
