
all: occasional "resource temporarily unavailable" flakes on linux-s390x builder #32328

Open
bcmills opened this issue May 30, 2019 · 9 comments

@bcmills (Member) commented May 30, 2019

It's not clear to me whether this is related to CL 177599 and #32205.

From https://build.golang.org/log/8842ba4fe354ba0d2a48ea5918280b3a2a202dcb:

##### ../misc/cgo/errors
removing /data/golang/workdir/tmp/TestPointerChecks471752625
--- FAIL: TestPointerChecks (0.98s)
    --- FAIL: TestPointerChecks/exportok (0.00s)
        ptr_test.go:596: 
        ptr_test.go:597: failed unexpectedly: fork/exec /data/golang/workdir/tmp/TestPointerChecks471752625/src/ptrtest/ptrtest.exe: resource temporarily unavailable
FAIL

CC @ianlancetaylor @rsc

@bcmills bcmills added this to the Go1.13 milestone May 30, 2019

@bcmills (Member Author) commented May 30, 2019

Here's a seemingly-related failure in the runtime test on the same builder:

--- FAIL: TestPanicTraceback (0.00s)
    crash_test.go:67: starting testprog PanicTraceback: fork/exec /data/golang/workdir/tmp/go-build726730629/testprog.exe: resource temporarily unavailable
FAIL

(https://build.golang.org/log/e518ab1802ac73754f4f3a51d7fb3cba86868b4e)

So perhaps it's not specific to the misc/cgo/errors test.

@bcmills bcmills changed the title misc/cgo/errors: TestPointerChecks flake on linux-s390x all: occasional "resource temporarily unavailable" flakes on linux-s390x builder May 30, 2019


@gopherbot commented May 30, 2019

Change https://golang.org/cl/179603 mentions this issue: misc/cgo/errors: limit number of parallel executions

@gopherbot gopherbot closed this in d53f380 May 31, 2019

@bcmills (Member Author) commented Jun 14, 2019

I haven't seen misc/cgo/errors flake again, but this still occurs sporadically in other tests:
https://build.golang.org/log/75ddd2b8e6749643c9150bf8846ed69f5afdcddf
https://build.golang.org/log/7028d294ba985b9bd6e5cb2024af1fa2a07f7b37
https://build.golang.org/log/2cc3638b3235b7a7b63a474df5991ec468dc404d

I think this may need a deeper fix in our fork/exec wrapper.
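For illustration, fork/exec surfaces this condition as EAGAIN, which `errors.Is` can detect through the wrapped `*os.PathError`. The sketch below is hypothetical (the name `runWithRetry` is not real wrapper code), and as noted later in the thread, retrying may only mask a hard task limit rather than fix it:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runWithRetry is a hypothetical sketch: rerun a command a few times when
// fork/exec fails with EAGAIN ("resource temporarily unavailable"). If the
// task limit is persistently exhausted, these retries will also fail.
func runWithRetry(name string, arg ...string) error {
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		err = exec.Command(name, arg...).Run()
		if !errors.Is(err, syscall.EAGAIN) {
			return err // success, or a non-retryable failure
		}
		// Back off briefly in the hope that other tasks exit.
		time.Sleep(time.Duration(attempt+1) * 100 * time.Millisecond)
	}
	return err
}

func main() {
	if err := runWithRetry("true"); err != nil {
		fmt.Println("failed:", err)
	} else {
		fmt.Println("ok")
	}
}
```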

@bcmills bcmills reopened this Jun 14, 2019

@mundaym (Member) commented Jun 14, 2019

I've done a bit of digging (systemd isn't something I'm very familiar with) and I think this might be due to the default systemd TasksMax setting in SLES 12. It is only 512 tasks, which includes stage0, the buildlet, and everything they spawn...

I don't know if the fork/exec wrapper can do much if it hits a limit like this; I don't think retrying will necessarily resolve the situation.

I've added the following lines to the buildlet service:

TasksMax=65536
LimitNOFILE=65536
LimitNPROC=65536

Hopefully this will make the s390x builder less flaky in future...
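For context, settings like these typically live in a systemd drop-in file for the unit; the path and unit name below are illustrative, not the actual builder configuration:

```ini
# /etc/systemd/system/buildlet.service.d/limits.conf (hypothetical path)
[Service]
TasksMax=65536
LimitNOFILE=65536
LimitNPROC=65536
```

followed by `systemctl daemon-reload` and a restart of the service to pick up the new limits.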

@mundaym (Member) commented Jun 14, 2019

Does anyone know if there is a way to check the current cgroup's resource limits? Maybe we can get the buildlet to print some of them.
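One possible approach (not from this thread; a hypothetical sketch assuming cgroup v1 with the pids controller mounted at the conventional /sys/fs/cgroup/pids path) is to resolve the current process's pids cgroup from /proc/self/cgroup and read its pids.max file, which is the limit that TasksMax translates into:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// pidsMaxPath maps a /proc/self/cgroup line such as
// "7:pids:/system.slice/buildlet.service" to the corresponding pids.max
// file, assuming cgroup v1 with the pids controller mounted at the
// conventional location. It returns "" for lines naming other controllers.
func pidsMaxPath(line string) string {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) == 3 && parts[1] == "pids" {
		return "/sys/fs/cgroup/pids" + parts[2] + "/pids.max"
	}
	return ""
}

func main() {
	f, err := os.Open("/proc/self/cgroup")
	if err != nil {
		fmt.Println("cannot read /proc/self/cgroup:", err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if path := pidsMaxPath(sc.Text()); path != "" {
			data, err := os.ReadFile(path)
			if err != nil {
				fmt.Println("cannot read", path, ":", err)
				return
			}
			// "max" means unlimited; otherwise this is the task limit
			// (e.g. 512 under the SLES 12 default described above).
			fmt.Println("pids.max =", strings.TrimSpace(string(data)))
			return
		}
	}
	fmt.Println("no pids controller found (perhaps cgroup v2)")
}
```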

@mundaym (Member) commented Jun 14, 2019

@bradfitz and @dmitshur: any thoughts on the systemd task settings we should be using for buildlets?

@bradfitz (Member) commented Jun 14, 2019

> any thoughts on the systemd task settings we should be using for buildlets?

For Go stuff I've always just used the defaults. But maybe the defaults have changed, or your distro has lower limits, or s390x ends up creating more threads for some reason?

> Does anyone know if there is a way to check the current cgroup's resource limits? Maybe we can get the buildlet to print some of them.

I don't. But that's a good idea.

@mundaym (Member) commented Jun 14, 2019

> But maybe the defaults have changed or your distro has lower limits or s390x ends up creating more threads for some reason?

I think it's a distro defaults thing. Ubuntu 18.04 defaults to 4915 tasks, which is a lot more headroom.

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
