misc/cgo/testcshared: sometimes stalls on windows-amd64-longtest builder in non-sharded longtest mode #39665

Open
dmitshur opened this issue Jun 17, 2020 · 10 comments

Comments

@dmitshur
Contributor

@dmitshur dmitshur commented Jun 17, 2020

I've observed the following failure scenario while testing all.bash in non-sharded longtest mode via x/build/cmd/release. The build stalled on the ##### ../misc/cgo/testcshared test and did not make progress for several hours:

[...]
PASS
scatter = 0000000000EDD070
sqrt is: 0
hello from C
ok  	misc/cgo/test	7.344s

##### ../misc/cgo/testgodefs
PASS

##### ../misc/cgo/testso
ok  	misc/cgo/testso	1.977s

##### ../misc/cgo/testsovar
ok  	misc/cgo/testsovar	2.309s

##### ../misc/cgo/testcarchive
PASS

##### ../misc/cgo/testcshared

I've observed this just twice using the windows-amd64-longtest builder: once on the latest commit of release-branch.go1.14, and once on a recent commit of master (the future Go 1.15):

$ release -target=windows-amd64-longtest -watch -version go1.14beta2 -rev=e98cafae04b78f1e994d52ea66d228451c8e6f81
$ release -target=windows-amd64-longtest -watch -version go1.15beta2 -rev=dea6d928f6c293631ce93bd3a3bb8b4020188954

It did not happen again when I retried. I don't yet know how common this is.

This is the tracking issue to collect information and investigate.

/cc @cagedmantis @toothrot @andybons @ianlancetaylor @bcmills

@dmitshur dmitshur added this to the Backlog milestone Jun 17, 2020
@dmitshur dmitshur changed the title misc/cgo/testcshared: sometimes stalls on windows-amd64-longtest builder in release testing mode misc/cgo/testcshared: sometimes stalls on windows-amd64-longtest builder in non-sharded longtest mode Jun 17, 2020
@dmitshur
Contributor Author

@dmitshur dmitshur commented Jun 24, 2020

This problem, or a similar one, may have just occurred in the SlowBot run of CL 239738 on the windows-amd64-2016 builder:

[...]
scatter = 0000000000569860
sqrt is: 0
hello from C
ok  	misc/cgo/test	16.696s

##### ../misc/cgo/testgodefs
PASS

##### ../misc/cgo/testso
ok  	misc/cgo/testso	1.981s

##### ../misc/cgo/testsovar
ok  	misc/cgo/testsovar	2.256s

##### ../misc/cgo/testcarchive
PASS


Error: runTests: dist test failed: all buildlets had network errors or timeouts, yet tests remain

@bcmills
Member

@bcmills bcmills commented Nov 18, 2020

Depending on how often this occurs, it might be a blocker for #42661. 😞

@bcmills
Member

@bcmills bcmills commented Nov 18, 2020

Hmm, I wonder if this is related to #39349

@bcmills
Member

@bcmills bcmills commented Jan 25, 2021

From CL 285720
(https://farmer.golang.org/temporarylogs?name=windows-amd64-2016&rev=26b0deca44e0cbf06661927dcd8b8546c5903aaf&st=0xc011c041a0):

  2021-01-25T14:15:33Z no_new_tests_remain 10.240.0.109:80
  2021-01-25T14:15:33Z closed_helper 10.240.0.109:80
  2021-01-25T14:16:03Z still_waiting_on_test testcshared
  2021-01-25T14:16:33Z still_waiting_on_test testcshared
  2021-01-25T14:17:03Z still_waiting_on_test testcshared
  2021-01-25T14:17:33Z still_waiting_on_test testcshared
  2021-01-25T14:18:03Z still_waiting_on_test testcshared
  2021-01-25T14:18:33Z still_waiting_on_test testcshared
  2021-01-25T14:19:03Z still_waiting_on_test testcshared
  2021-01-25T14:19:33Z still_waiting_on_test testcshared
  2021-01-25T14:20:03Z still_waiting_on_test testcshared
  2021-01-25T14:20:33Z still_waiting_on_test testcshared
  2021-01-25T14:21:03Z still_waiting_on_test testcshared
  2021-01-25T14:21:33Z still_waiting_on_test testcshared
  2021-01-25T14:22:03Z still_waiting_on_test testcshared
  2021-01-25T14:22:33Z still_waiting_on_test testcshared
  2021-01-25T14:23:03Z still_waiting_on_test testcshared
  2021-01-25T14:23:33Z still_waiting_on_test testcshared
  2021-01-25T14:24:03Z still_waiting_on_test testcshared
  2021-01-25T14:24:33Z still_waiting_on_test testcshared
  2021-01-25T14:25:03Z still_waiting_on_test testcshared
  2021-01-25T14:25:33Z still_waiting_on_test testcshared
  2021-01-25T14:26:03Z still_waiting_on_test testcshared
  2021-01-25T14:26:33Z still_waiting_on_test testcshared
  2021-01-25T14:27:03Z still_waiting_on_test testcshared
  2021-01-25T14:27:33Z still_waiting_on_test testcshared
  2021-01-25T14:28:03Z still_waiting_on_test testcshared
  2021-01-25T14:28:33Z still_waiting_on_test testcshared
  2021-01-25T14:29:03Z still_waiting_on_test testcshared
  2021-01-25T14:29:33Z still_waiting_on_test testcshared
  2021-01-25T14:30:03Z still_waiting_on_test testcshared
  2021-01-25T14:30:33Z still_waiting_on_test testcshared
  2021-01-25T14:31:03Z still_waiting_on_test testcshared
  2021-01-25T14:31:33Z still_waiting_on_test testcshared
  +12.0s (now)

@cuonglm
Member

@cuonglm cuonglm commented Oct 26, 2021

@aclements
Member

@aclements aclements commented Jan 11, 2022

I have a theory about this. For most tests, the dist tool lets go test compile and run the test, but for testcshared it compiles and runs the test binary itself. go test implements a backstop timeout in case the test binary wedges too badly to time itself out, but dist does not implement this logic for test binaries it runs directly. Hence, if testcshared wedges, nothing kills it, and the builder itself eventually times out with no further output.

@bcmills
Member

@bcmills bcmills commented Jan 11, 2022

Still only windows-amd64-2008; curiously, there have been no new failures since October. 🤔

That suggests a possible connection to CL 365994 / #49457 (CC @ianlancetaylor, @bufflig, @cherrymui): if TestGo2C2Go was the one timing out, the timeouts would have stopped once that test was skipped.

greplogs --dashboard -md -l -e '(?m)##### \.\./misc/cgo/testcshared.*\n\z' --since=2021-08-28

2021-10-26T14:24:17-283d8a3/windows-amd64-2008
2021-10-14T07:18:59-1349c6e/windows-amd64-2008
2021-09-27T18:14:10-ecac351/windows-amd64-2008
2021-09-20T22:14:47-6e81f78/windows-amd64-2008
2021-09-20T16:20:33-2d9b486/windows-amd64-2008
2021-09-08T16:19:36-409434d/windows-amd64-2008
2021-08-31T23:45:48-2d98a4b/windows-amd64-2008
2021-08-30T22:07:53-3342aa5/windows-amd64-2008
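The `-e` pattern above selects logs whose last line is the `##### ../misc/cgo/testcshared` header, i.e. runs where nothing was printed after that test started. Here is a small Go sketch of the same check, assuming greplogs applies the pattern to the full log text with Go's `regexp` package (RE2 syntax, where `\z` means end of text). `stalledInTestcshared` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// stalledRe is the pattern from the greplogs command above. `.` does not
// match a newline by default, so `.*\n\z` requires the testcshared header
// to be the final line of the log.
var stalledRe = regexp.MustCompile(`(?m)##### \.\./misc/cgo/testcshared.*\n\z`)

// stalledInTestcshared reports whether a build log ends immediately after
// the testcshared section header.
func stalledInTestcshared(text string) bool {
	return stalledRe.MatchString(text)
}

func main() {
	stalled := "##### ../misc/cgo/testcarchive\nPASS\n\n##### ../misc/cgo/testcshared\n"
	finished := "##### ../misc/cgo/testcshared\nPASS\n"
	fmt.Println(stalledInTestcshared(stalled))  // true
	fmt.Println(stalledInTestcshared(finished)) // false
}
```

Note that a log ending with an extra blank line after the header would not match, because `.*` cannot consume the first newline.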

@cherrymui
Contributor

@cherrymui cherrymui commented Jan 11, 2022

It is possible that this is related to TestGo2C2Go. When I tried to reproduce #49457, I saw that it sometimes hangs (#49457 (comment)), and the underlying issue of #49457 can definitely cause it to hang. Since there have been no new failures after that CL, I think the connection is very likely.

5 participants