x/build: linux-arm builder can't finish an all.bash run when test sharding isn't used #40872

Open
randall77 opened this issue Aug 18, 2020 · 9 comments

Contributor

@randall77 randall77 commented Aug 18, 2020

When I run a trybot on linux/arm, I get "out of memory" or "no space left on device" errors.

ok  	reflect	3.259s
ok  	regexp	0.949s
ok  	regexp/syntax	4.910s
# cmd/compile/internal/ssa
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x29ca71, 0x16)
	/workdir/go/src/runtime/panic.go:1116 +0x5c
runtime.sysMap(0x11400000, 0x3000000, 0x42bc70)
	/workdir/go/src/runtime/mem_linux.go:169 +0xa8
runtime.(*linearAlloc).alloc(0x41d280, 0x3000000, 0x400000, 0x42bc70, 0x0)
	/workdir/go/src/runtime/malloc.go:1447 +0x94
ok  	cmd/addr2line	28.924s
ok  	cmd/api	113.949s
ok  	cmd/asm/internal/asm	33.778s
ok  	cmd/asm/internal/lex	0.233s
# cmd/fix.test
panic: no space left on device

goroutine 1 [running]:
cmd/link/internal/ld.Main(0x3f33c0, 0x4, 0x8, 0x1, 0xd, 0xe, 0x0, 0x0, 0x285316, 0x12, ...)
	/workdir/go/src/cmd/link/internal/ld/main.go:319 +0x1c08
main.main()
	/workdir/go/src/cmd/link/main.go:68 +0x12c
/workdir/go/pkg/tool/linux_arm/vet: fork/exec /workdir/go/pkg/tool/linux_arm/vet: cannot allocate memory
/workdir/go/pkg/tool/linux_arm/vet: fork/exec /workdir/go/pkg/tool/linux_arm/vet: cannot allocate memory
/workdir/go/pkg/tool/linux_arm/vet: fork/exec /workdir/go/pkg/tool/linux_arm/vet: cannot allocate memory

Is there anything we can do to fix this? Can we get more memory and/or disk space on these builders?

I could work on making cmd/compile/internal/ssa tests take less memory, perhaps. Not sure how much we could save.

@gopherbot gopherbot added the Builders label Aug 18, 2020
@gopherbot gopherbot added this to the Unreleased milestone Aug 18, 2020
Contributor Author

@randall77 randall77 commented Aug 18, 2020

How do the trybots succeed? Is it because they shard out the tests to multiple machines?

Member

@dmitshur dmitshur commented Aug 18, 2020

Can you please include links to CLs where you've seen this? That way we'll have more information (e.g., which commit was being tested exactly, etc.). How often is this happening? Did it start recently?

The linux-arm builder is defined at https://github.com/golang/build/blob/148ff27ab5b70970002d390c9e1da4b861f6da9f/dashboard/builders.go#L1736-L1756. The builders run on Scaleway (also see here), so adjusting resources will be limited to what's available there (we might already be maxed out, but I need to look again to be more confident).

I see that linux-arm trybots are currently disabled because of other issues:

tryBot:            nil, // Issue 22748, Issue 22749

Is this issue about that builder when requested via SlowBots or something else?

/cc @cagedmantis @toothrot @andybons per builder owners.

Contributor Author

@randall77 randall77 commented Aug 18, 2020

This happens when using gomote to run all.bash manually:

gomote create linux-arm
gomote push user-khr-linux-arm-0
gomote run go/src/all.bash

Sorry, I guess I'm using the term "trybot" to mean both the thing that tests CLs as well as manual gomotes. I mean the latter (except in the context of my second comment).

@dmitshur dmitshur changed the title from "x/build: linux-arm trybots can't finish a run" to "x/build: linux-arm builder can't finish an all.bash run when test sharding isn't used" Aug 18, 2020
Member

@dmitshur dmitshur commented Aug 18, 2020

@cagedmantis Do you expect #36841 will be able to help with this (by enabling a linux-arm builder with bigger limits)?

Contributor

@cagedmantis cagedmantis commented Aug 18, 2020

@dmitshur Yes, I'm actively working on the linux-arm-aws builder with more resources. I will assign myself to this issue.

@cagedmantis cagedmantis self-assigned this Aug 18, 2020
Member

@dmitshur dmitshur commented Aug 19, 2020

Oh, I believe this is the same issue as #35628. /cc @cherrymui I'll close it in favor of that one, and move your assignment @cagedmantis if you don't mind.

Contributor

@cherrymui cherrymui commented Aug 19, 2020

This is not exactly the same. #35628 is about trybot, this is about "when test sharding isn't used" e.g. manual gomote runs. The trybot one has weird STALE errors, whereas this one is OOMing or out of disk space.
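
To illustrate why sharding changes the resource picture, here is a minimal Go sketch. It assumes a plain round-robin split, which may not be how the coordinator actually partitions tests; `shard` is a hypothetical helper, and the package names are taken from the log earlier in this thread. The point is only that each machine in a sharded run sees a fraction of the work, while an unsharded all.bash puts everything on one machine.

```go
package main

import "fmt"

// shard splits names into n groups round-robin.
// Hypothetical helper for illustration; not the coordinator's real logic.
func shard(names []string, n int) [][]string {
	if n < 1 {
		n = 1
	}
	groups := make([][]string, n)
	for i, name := range names {
		groups[i%n] = append(groups[i%n], name)
	}
	return groups
}

func main() {
	// Package names copied from the log output in this issue.
	pkgs := []string{
		"reflect",
		"regexp",
		"regexp/syntax",
		"cmd/compile/internal/ssa",
		"cmd/fix",
	}
	// With sharding, each machine handles only its own group.
	for i, g := range shard(pkgs, 3) {
		fmt.Printf("machine %d: %v\n", i, g)
	}
	// Without sharding, one machine handles the whole list.
	fmt.Printf("unsharded: %v\n", shard(pkgs, 1)[0])
}
```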

About disk space, if I remember correctly, last time I looked, the machine actually has reasonably sizable disk space, but we're running on a very small partition.
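
One way to check the partition theory is to compare free space on the directories the build uses. The following is a rough sketch, assuming a Unix-like system where `syscall.Statfs` is available; `freeBytes` is a hypothetical helper, and `/workdir` is assumed only because it appears in the stack trace paths above.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// freeBytes reports the space available to unprivileged users on the
// filesystem that contains dir. Hypothetical helper for illustration.
func freeBytes(dir string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(dir, &st); err != nil {
		return 0, err
	}
	// Field types differ between platforms, so convert both explicitly.
	return uint64(st.Bavail) * uint64(st.Bsize), nil
}

func main() {
	// "/workdir" comes from the paths in the stack traces; the temp
	// directory and "/" are included for comparison.
	for _, dir := range []string{"/workdir", os.TempDir(), "/"} {
		free, err := freeBytes(dir)
		if err != nil {
			fmt.Printf("%s: %v\n", dir, err)
			continue
		}
		fmt.Printf("%s: %d MiB free\n", dir, free>>20)
	}
}
```

If the directories report very different amounts of free space, they sit on different partitions, which would be consistent with the small-partition explanation.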

Member

@dmitshur dmitshur commented Aug 19, 2020

We can re-open this if it'd be helpful to confirm this issue is fixed when #35628 is fixed, but as I understand, this builder is broken in all contexts other than as a post-submit builder (on build.golang.org).

@dmitshur dmitshur reopened this Aug 19, 2020

@gopherbot gopherbot commented Aug 19, 2020

Change https://golang.org/cl/249420 mentions this issue: cmd/coordinator: warn about known linux-arm SlowBot issue

gopherbot pushed a commit to golang/build that referenced this issue Aug 20, 2020
The current linux-arm builder is known to have trouble when used as
a SlowBot. Start warning about it when the builder is requested via
the TRY= SlowBot UI.

I've considered also removing or disabling the "arm" SlowBot alias,
but that would make it easier to miss that there's an issue, since
SlowBots don't warn about unknown builders:

	If you specify an unknown TRY= token, it'll just ignore it
	and won't report an error.

We can consider making further changes as this situation evolves.
The goal here is to start notifying about a known problem sooner.

For golang/go#35628.
For golang/go#40872.

Change-Id: Ibc1205720c44ec4823c632c04fc2f887368258c1
Reviewed-on: https://go-review.googlesource.com/c/build/+/249420
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
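
The behavior described in the commit message above (warn when a known-problematic builder is requested, silently ignore unknown tokens) can be sketched roughly as follows. This is not the cmd/coordinator code: `knownIssues` and `warningsFor` are hypothetical names, and the assumption that tokens are separated by commas or spaces is mine.

```go
package main

import (
	"fmt"
	"strings"
)

// knownIssues maps a SlowBot token to a warning message.
// The single entry is illustrative, based on this issue.
var knownIssues = map[string]string{
	"arm": "linux-arm is known to have trouble as a SlowBot (golang/go#35628, golang/go#40872)",
}

// warningsFor returns one warning per token in a "TRY=" line that has a
// known issue. Unknown tokens are ignored without an error, mirroring the
// behavior quoted in the commit message.
func warningsFor(line string) []string {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "TRY=") {
		return nil
	}
	tokens := strings.FieldsFunc(strings.TrimPrefix(line, "TRY="), func(r rune) bool {
		return r == ',' || r == ' '
	})
	var warnings []string
	for _, tok := range tokens {
		if msg, ok := knownIssues[tok]; ok {
			warnings = append(warnings, msg)
		}
	}
	return warnings
}

func main() {
	for _, w := range warningsFor("TRY=arm, no-such-builder") {
		fmt.Println("warning:", w)
	}
}
```

Keeping the alias and attaching a warning, rather than removing the alias, matches the reasoning in the commit message: a removed alias would be treated as an unknown token and ignored without any notice.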