x/build: frequent "communication error to buildlet" failures on all plan9 builders #49756

Open
bcmills opened this issue Nov 23, 2021 · 2 comments

@bcmills (Member) commented Nov 23, 2021

greplogs --dashboard -md -l -e 'communication error to buildlet' --since=2021-11-01

2021-11-22T23:51:43-83bfed9/plan9-amd64-0intro
2021-11-22T23:09:35-9678f79/plan9-amd64-0intro
2021-11-22T20:34:40-f13fcd9/plan9-amd64-0intro
2021-11-22T16:53:57-cd0bf38/plan9-arm
2021-11-22T04:27:29-e30ebaa/plan9-amd64-0intro
2021-11-20T00:32:49-57aba32/plan9-386-0intro
2021-11-19T21:59:14-5e774b0/plan9-amd64-0intro
2021-11-19T21:41:33-ba9f0f6/plan9-amd64-0intro
2021-11-18T16:18:50-feb330d/plan9-amd64-0intro
2021-11-17T19:51:32-9a33945/plan9-amd64-0intro
2021-11-17T19:51:32-9a33945/plan9-arm
2021-11-17T19:18:24-aa34ea2/plan9-amd64-0intro
2021-11-17T19:18:24-aa34ea2/plan9-arm
2021-11-17T18:14:23-9bdbed1/plan9-arm
2021-11-16T14:02:36-bddb79f/plan9-arm
2021-11-12T18:10:48-0c6a6cd/plan9-arm
2021-11-10T21:32:50-f410786/plan9-arm
2021-11-10T17:15:54-8a3be15/plan9-arm
2021-11-09T20:08:48-f48115c/plan9-arm
2021-11-07T04:57:22-9e6ad46/plan9-amd64-0intro
2021-11-05T21:13:38-091948a/plan9-arm
2021-11-05T17:46:27-4f543b5/plan9-arm
2021-11-05T17:39:43-0bc98b3/plan9-arm
2021-11-05T17:23:06-37951d8/plan9-arm
2021-11-05T16:35:00-f249fa2/plan9-arm
2021-11-04T18:22:03-b2149ac/plan9-386-0intro
2021-11-04T18:22:03-b2149ac/plan9-amd64-0intro
2021-11-04T17:07:48-5772877/plan9-amd64-0intro
2021-11-04T14:54:46-00d6d20/plan9-amd64-0intro
2021-11-04T14:17:18-901bf29/plan9-arm
2021-11-04T07:05:31-2622235/plan9-amd64-0intro
2021-11-04T02:57:53-2cf85b1/plan9-amd64-0intro
2021-11-04T02:57:48-5fd0c49/plan9-amd64-0intro
2021-11-04T00:15:18-be0cd9e/plan9-arm
2021-11-02T03:54:24-3c61cb3/plan9-arm
2021-11-01T22:55:50-02e5913/plan9-arm
2021-11-01T15:55:20-d2b5121/plan9-arm

CC @millerresearch @0intro

@golang/release, what provokes this error message in the coordinator? (Is this a failing keepalive of some sort?)

@dmitshur (Contributor) commented Nov 23, 2021

This looks relevant to:

// If a build fails multiple times due to communication
// problems with the buildlet, assume something's wrong with
// the buildlet or machine and fail the build, rather than
// looping forever. This promotes the err (communication
// error) to a remoteErr (an error that occurred remotely and
// is terminal).
if rerr := st.repeatedCommunicationError(err); rerr != nil {
	remoteErr = rerr
	err = nil
	doneMsg = "communication error to buildlet (promoted to terminal error): " + rerr.Error()
	fmt.Fprintf(st, "\n%s\n", doneMsg)
}

repeatedCommunicationError currently applies only to Plan 9 builders, and it doesn't allow for any retries yet:

// For now, only do this for plan9, which is flaky (Issue 31261)
if strings.HasPrefix(st.Name, "plan9-") && execErr == errBuildletsGone {
	// TODO: give it two tries at least later (store state
	// somewhere; global map?). But for now we're going to
	// only give it one try.
	return fmt.Errorf("network error promoted to terminal error: %v", execErr)
}

So this is related to issues #31261 and #13026.


@millerresearch commented Nov 23, 2021

When I observe this on the plan9-arm builders, it's because the builder machine has crashed or rebooted for some reason unrelated to the Go tests (e.g. a power flicker or a full file server disk). I often use the retrybuilds utility to clear and restart the failed build, and it then runs fine.

