build: plan9/arm builder failing since 2017-08-16 #21493

Open
0intro opened this issue Aug 17, 2017 · 5 comments

Comments

@0intro
Member

commented Aug 17, 2017

Since 2017-08-16, the buildlet running on the plan9/arm builder has been unable to fetch the tgz archive provided by the coordinator. Consequently, it cannot run any Go build.

The last successful build on the plan9/arm builder was at commit 6aa3866, on 2017-08-16 at 05:12.

After receiving the PUT /writetgz request from the coordinator, the buildlet starts receiving the tgz archive, but the connection stalls after approximately 2 to 4 MB of data.

Buildlet logs:

cpu% buildlet -coordinator'='farmer.golang.org --reverse-type'='host-plan9-arm-0intro -halt'='false
2017/08/17 13:02:14 buildlet starting.
2017/08/17 13:02:15 Dialing coordinator farmer.golang.org:443 ...
2017/08/17 13:02:15 Doing TLS handshake with coordinator (verifying hostname "farmer.golang.org")...
2017/08/17 13:02:16 Registering reverse mode with coordinator...
2017/08/17 13:02:16 Connected to coordinator; reverse dialing active
2017/08/17 13:02:16 Removing .
2017/08/17 13:02:16 writetgz: untarring Request.Body into /tmp/workdir-host-plan9-arm-0intro/go
2017/08/17 13:02:17 extracted tarball into /tmp/workdir-host-plan9-arm-0intro/go: 1 files, 1 dirs (48.21324ms)
2017/08/17 13:02:21 writetgz: untarring Request.Body into /tmp/workdir-host-plan9-arm-0intro/go
2017/08/17 13:13:31 buildlet reverse mode exiting.

Farmer logs:

  builder: plan9-arm
      rev: 8127dbf76ad16350b72bbe29728836a90e60e3dc
 buildlet: http://rpi31 reverse peer rpi31/10.240.0.8:65216 for host type host-plan9-arm-0intro
  started: 2017-08-16 23:22:29.684144144 +0000 UTC
    ended: 2017-08-17 13:03:16.744451079 +0000 UTC
  success: false

Events:
  2017-08-16T23:22:29Z checking_for_snapshot 
  2017-08-16T23:22:29Z finish_checking_for_snapshot after 0s
  2017-08-16T23:22:29Z get_buildlet 
  2017-08-16T23:22:29Z wait_static_builder host-plan9-arm-0intro
  2017-08-16T23:22:29Z waiting_machine_in_use 
  2017-08-17T13:02:16Z finish_wait_static_builder after 13h39m47s; host-plan9-arm-0intro
  2017-08-17T13:02:16Z clean_buildlet http://rpi31 reverse peer rpi31/10.240.0.8:65216 for host type host-plan9-arm-0intro
  2017-08-17T13:02:17Z finish_clean_buildlet after 160.5ms; http://rpi31 reverse peer rpi31/10.240.0.8:65216 for host type host-plan9-arm-0intro
  2017-08-17T13:02:17Z finish_get_buildlet after 13h39m47.4s
  2017-08-17T13:02:17Z using_buildlet rpi31
  2017-08-17T13:02:17Z write_version_tar 
  2017-08-17T13:02:17Z get_source 
  2017-08-17T13:02:17Z get_source_from_gitmirror 
  2017-08-17T13:02:21Z finish_get_source_from_gitmirror after 3.84s
  2017-08-17T13:02:21Z finish_get_source after 3.95s
  2017-08-17T13:02:21Z write_go_src_tar 
  2017-08-17T13:03:16Z finish_write_go_src_tar after 55.1s; err=writing tarball from Gerrit: Put http://rpi31/writetgz?dir=go: net/http: request canceled

Build log:
Error: writing tarball from Gerrit: Put http://rpi31/writetgz?dir=go: net/http: request canceled

Looking at the network traffic with snoopy, I can see that at some point the builder (192.168.2.75) stops receiving packets from the coordinator (107.178.219.46).

4446  75.206661 41511 192.168.2.75 443 107.178.219.46 TCP 60 41511→443 [ACK] Seq=1131 Ack=3162010 Win=1047136 Len=0
4447  75.208371 443 107.178.219.46 41511 192.168.2.75 TLSv1.2 1478 Continuation Data
4448  75.211319 443 107.178.219.46 41511 192.168.2.75 TLSv1.2 1478 Continuation Data
4449  75.212044 41511 192.168.2.75 443 107.178.219.46 TCP 60 41511→443 [ACK] Seq=1131 Ack=3164850 Win=1047136 Len=0
4450  75.216010 443 107.178.219.46 41511 192.168.2.75 TLSv1.2 1478 Continuation Data
4451  75.218233 443 107.178.219.46 41511 192.168.2.75 TLSv1.2 1478 Continuation Data
4452  75.218933 41511 192.168.2.75 443 107.178.219.46 TCP 60 41511→443 [ACK] Seq=1131 Ack=3167690 Win=1047136 Len=0
4453  75.221868 443 107.178.219.46 41511 192.168.2.75 TLSv1.2 1478 Continuation Data
4454  75.224720 443 107.178.219.46 41511 192.168.2.75 TLSv1.2 1478 [TCP Previous segment not captured] Continuation Data
4455  75.225429 41511 192.168.2.75 443 107.178.219.46 TCP 60 41511→443 [ACK] Seq=1131 Ack=3169110 Win=1048560 Len=0
4456  75.228128 443 107.178.219.46 41511 192.168.2.75 TLSv1.2 1478 [TCP Previous segment not captured] Continuation Data
4457  75.228856 41511 192.168.2.75 443 107.178.219.46 TCP 60 [TCP Dup ACK 4455#1] 41511→443 [ACK] Seq=1131 Ack=3169110 Win=1048560 Len=0
4458  75.437238 41511 192.168.2.75 443 107.178.219.46 TLSv1.2 219 [TCP Retransmission] Application Data
4459  76.837654 41511 192.168.2.75 443 107.178.219.46 TLSv1.2 219 [TCP Retransmission] Application Data
4460  79.638258 41511 192.168.2.75 443 107.178.219.46 TLSv1.2 219 [TCP Retransmission] Application Data
4461  85.239376 41511 192.168.2.75 443 107.178.219.46 TLSv1.2 219 [TCP Retransmission] Application Data
4462  96.441390 41511 192.168.2.75 443 107.178.219.46 TLSv1.2 219 [TCP Retransmission] Application Data
4463 111.449309 41511 192.168.2.75 443 107.178.219.46 SSL 60 [TCP Retransmission] [Malformed Packet]
4464 118.850812 41511 192.168.2.75 443 107.178.219.46 TLSv1.2 219 [TCP Retransmission] Application Data
4465 133.853783 41511 192.168.2.75 443 107.178.219.46 SSL 60 [TCP Retransmission] [Malformed Packet]
4466 148.856594 41511 192.168.2.75 443 107.178.219.46 SSL 60 [TCP Retransmission] [Malformed Packet]
4467 163.661236 41511 192.168.2.75 443 107.178.219.46 TLSv1.2 219 [TCP Retransmission] Application Data
4468 178.663927 41511 192.168.2.75 443 107.178.219.46 SSL 60 [TCP Retransmission] [Malformed Packet]
4469 193.666656 41511 192.168.2.75 443 107.178.219.46 SSL 60 [TCP Retransmission] [Malformed Packet]
4470 208.669393 41511 192.168.2.75 443 107.178.219.46 SSL 60 [TCP Retransmission] [Malformed Packet]
4471 223.675221 41511 192.168.2.75 443 107.178.219.46 SSL 60 [TCP Retransmission] [Malformed Packet]

@0intro 0intro added this to the Go1.10 milestone Aug 17, 2017

@0intro 0intro self-assigned this Aug 17, 2017

@0intro

Member Author

commented Aug 17, 2017

Could this be an issue on the coordinator side? I'm failing to see anything wrong on the builder's side so far.

@0intro

Member Author

commented Aug 17, 2017

I've restarted the former old-style builder while I investigate this issue.

@bradfitz

Member

commented Aug 17, 2017

@bradfitz bradfitz modified the milestones: Go1.10, Unreleased Nov 29, 2017

@bradfitz

Member

commented Nov 29, 2017

It's sometimes "ok" now. But still 95% red.

Looks like lots of flakes.

This needs love.

@0intro

Member Author

commented Feb 14, 2018

The flakes on the plan9/arm builder were caused by an exec/postnote race in the Plan 9 kernel. Richard Miller wrote a patch to mitigate the issue.

The patch was applied to the plan9/arm builder on January 22, so it should be better now.
