x/build: solaris-amd64-oraclerel failures with "no space left on device" #46362
Comments
"Bryan C. Mills" ***@***.***> writes:
The `solaris-amd64-oraclerel` builder seems to be failing moderately
frequently with `no space left on device` errors:
[...]
It's not obvious to me whether the buildlet script is failing to clean
something up, the device's disk is getting too full for other reasons, or
perhaps the builder is just configured to run too many builds in parallel.
I've looked around and there may be several issues:
* In every one of the failing builds, /tmp was full. It's ca. 40 GB on
the build host, but resides in tmpfs, thus shares space with swap.
* While golang left ca. 931 MB around in
/tmp/workdir-host-solaris-oracle-amd64-oraclerel (almost half of that
in go/pkg/obj/go-build) even after the build service was stopped,
there still was plenty of free space left.
* I don't think the parallelism is too high: the golang buildlet uses 4 cores
max, and an llvm buildbot running on the same host another 8, while
the host has 24 cores.
* Given all this, I suspect (but this is just a hunch) that some llvm
testcase either exhausts /tmp (unlikely given that the llvm tmp files
seem to reside in /var/tmp exclusively) or VM/swap (way more likely: I
had runaway llvm testcases like this in the past), and if they are as
lazy with resource control as they are with cleaning up tmp files
(350k files in /var/tmp), this seems to be the most plausible cause.
For remedy, there are several options:
* Increase RAM, swap, or both.
* Limit the VM consumption of the services.
I'll look into either of those.
This has started occurring intermittently again.
2022-04-23T05:38:56-9717e8f/solaris-amd64-oraclerel
"Bryan C. Mills" ***@***.***> writes:
This has started occurring intermittently again.
`greplogs --dashboard -md -l -e '(?ms)\Asolaris-amd64-oraclerel.* no space
left on device' --since=2021-03-26`
[2022-04-23T05:38:56-9717e8f/solaris-amd64-oraclerel](https://build.golang.org/log/e886bcdb39aa314c4c2020f982cb1c08abc0b0ec)
[2022-04-19T17:05:22-4804c43-689dc17/solaris-amd64-oraclerel](https://build.golang.org/log/2a5a54de4df66c6483c78610d7bf4cfa8a237c8f)
[note 11-month gap!]
[...]
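For readers unfamiliar with the pattern passed to `greplogs` above, here is a minimal Go sketch of what it selects. It assumes the line break inside the quoted command is mail wrapping and stands for a single space. The helper name `matchesNoSpace` and both sample logs are hypothetical and not taken from build.golang.org.

```go
package main

import (
	"fmt"
	"regexp"
)

// noSpaceRE is the pattern from the quoted greplogs command, with the
// wrapped line joined back into one space (an assumption about the wrap).
// (?m) enables multi-line mode, (?s) lets "." match newlines, and \A
// anchors the match at the very start of the log text.
var noSpaceRE = regexp.MustCompile(`(?ms)\Asolaris-amd64-oraclerel.* no space left on device`)

// matchesNoSpace is a hypothetical helper: it reports whether a log begins
// with the builder name and contains the error text anywhere after it.
func matchesNoSpace(log string) bool {
	return noSpaceRE.MatchString(log)
}

func main() {
	// Made-up log excerpts for illustration only.
	failing := "solaris-amd64-oraclerel at 0000000\n\nmkdir /tmp/example: no space left on device\n"
	otherBuilder := "linux-amd64 at 0000000\n\nmkdir /tmp/example: no space left on device\n"

	fmt.Println(matchesNoSpace(failing))      // true
	fmt.Println(matchesNoSpace(otherBuilder)) // false
}
```

Because of `\A`, a log from any other builder is rejected even when it contains the same error text.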
I was recently forced to migrate the zone hosting the builder to a
different machine. In the process, swap was inadvertently reduced from
32 GB to 4 GB. With WORKDIR residing in /tmp (tmpfs), VM shortage could
lead to those errors.
I've now restored the previous swap size, which should make the problem
vanish, like it did for the last year.
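As an illustration of how such a failure surfaces in Go code, the sketch below checks whether an error wraps `ENOSPC`, the errno whose message is `no space left on device`. This is a hedged example and not part of the buildlet: `isNoSpace` and `simulatedWriteError` are hypothetical helpers, and the path is made up. With a work directory on tmpfs, the same errno can plausibly come from a shortage of virtual memory as well as from a full disk, which is the situation described above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// isNoSpace reports whether err, or any error it wraps, is ENOSPC.
func isNoSpace(err error) bool {
	return errors.Is(err, syscall.ENOSPC)
}

// simulatedWriteError builds the kind of error a failed write to a full
// tmpfs-backed work directory might return. The path is hypothetical.
func simulatedWriteError() error {
	return &os.PathError{Op: "write", Path: "/tmp/workdir/example", Err: syscall.ENOSPC}
}

func main() {
	err := simulatedWriteError()
	fmt.Println(isNoSpace(err)) // true
}
```

`errors.Is` unwraps the `*os.PathError`, so the check does not depend on matching the error string.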
The `solaris-amd64-oraclerel` builder seems to be failing moderately frequently with `no space left on device` errors:
2021-05-24T20:15:56-15d9d4a/solaris-amd64-oraclerel
2021-05-10T15:11:50-ecb7392/solaris-amd64-oraclerel
2021-05-03T16:42:22-169155d/solaris-amd64-oraclerel
2021-04-28T19:13:50-ad989c7/solaris-amd64-oraclerel
2021-04-26T21:27:41-9f60169/solaris-amd64-oraclerel
2021-04-08T20:55:59-0243799/solaris-amd64-oraclerel
2021-04-08T02:08:45-b261fe9/solaris-amd64-oraclerel
2021-03-30T21:06:17-4fbd30e/solaris-amd64-oraclerel
It's not obvious to me whether the buildlet script is failing to clean something up, the device's disk is getting too full for other reasons, or perhaps the builder is just configured to run too many builds in parallel.
CC @golang/release @rorth