x/build: spacemonkey arm5 builders broken, clean tmp dirs automatically #28041

Closed
bradfitz opened this issue Oct 5, 2018 · 16 comments

5 participants
@bradfitz
Member

commented Oct 5, 2018

Looks like the spacemonkey builders were broken by https://go-review.googlesource.com/c/go/+/139418 (os: add UserHomeDir) because they're running without $HOME set and are using an old stage0 from before we started setting it automatically in https://go-review.googlesource.com/30599?

I could update cmd/buildlet instead, though. It'd be slightly redundant with stage0, but wouldn't require changes on the arm5 hosts.

/cc @zeebo

@gopherbot gopherbot added this to the Unreleased milestone Oct 5, 2018

@gopherbot gopherbot added the Builders label Oct 5, 2018

@gopherbot

commented Oct 5, 2018

Change https://golang.org/cl/140177 mentions this issue: cmd/buildlet: set USER and HOME if unset on the arm5 builder

@bradfitz

Member Author

commented Oct 5, 2018

Looks like the arm5 builders aren't reloading the buildlet per run like they're supposed to.

I don't know how they're configured.

@zeebo?

@bradfitz bradfitz reopened this Oct 5, 2018

@zeebo

Contributor

commented Oct 6, 2018

I have since moved on from there. I’ll try to get in contact with whoever owns these now and point them here.

I don’t know how long that will take or if they will care, so any temporary measures to fix this are fine by me.

@onionjake

commented Oct 9, 2018

I can look into this. I haven't connected to them before so it might take a bit for me to sort that out.

@onionjake

commented Oct 11, 2018

@bradfitz Looks like the process is failing because the builders run as the user builder, whose home is /home/builder, and it gets access denied trying to write to /root. Would you prefer that I set $USER and $HOME properly so you don't need the workaround at all?

@bradfitz

Member Author

commented Oct 11, 2018

Which process is failing?

Actually, let's back up. Can you describe how those machines are configured? Is there a systemd unit? What's it run, or what's its complete definition? If not systemd, what runs the loop that connects to the coordinator?

I can then document all this in our repo.

@onionjake

commented Oct 11, 2018

It uses daemontools. I added the exports for USER and HOME just now.

root@go-builder-3:~# cat /etc/service/stage0/run 
#!/bin/bash

set -e

SCRATCH=~builder/stage0scratch
export TMPDIR=${SCRATCH}/tmp
export WORKDIR=${SCRATCH}/workdir

mkdir -p ${TMPDIR}
mkdir -p ${WORKDIR}
chown -R builder:builder ${SCRATCH}
cd ${SCRATCH}

export GO_BUILDER_ENV=linux-arm-arm5spacemonkey
export GO_TEST_TIMEOUT_SCALE=5
export USER=builder
export HOME=/home/builder
exec bash -c "setuidgid builder /usr/local/bin/stage0 2>&1 | logger"

The builder was looking for the build key in /root/, which failed with permission denied. I think the builder is running again now that I set USER and HOME correctly.

@bradfitz

Member Author

commented Oct 11, 2018

I see 2 connected now at https://farmer.golang.org/ ...

host-linux-arm5spacemonkey: 2/2 (1 missing)

There used to be 3, though. Is one still coming online?

@bradfitz

Member Author

commented Oct 11, 2018

Never mind, the third just showed up.

@onionjake

commented Oct 12, 2018

Looks like two different failures now?

# _/home/builder/stage0scratch/workdir/go/misc/cgo/test.test
/home/builder/stage0scratch/workdir/go/pkg/tool/linux_arm/link: flushing /home/builder/stage0scratch/tmp/go-link-851625167/go.o: write /home/builder/stage0scratch/tmp/go-link-851625167/go.o: no space left on device
FAIL	_/home/builder/stage0scratch/workdir/go/misc/cgo/test [build failed]
##### ../misc/cgo/testplugin
PASS
something
# command-line-arguments
/home/builder/stage0scratch/workdir/go/pkg/tool/linux_arm/link: running arm-linux-gnueabi-gcc failed: exit status 1
collect2: error: ld returned 1 exit status

2018/10/11 23:21:42 Failed: exit status 2
2018/10/11 23:21:47 FAILED

I will double check on the disk space.

@onionjake

commented Oct 12, 2018

It looks like there is a lot of stuff left around in tmp. Should that be cleaned between each run?

root@go-builder-2:/home/builder/stage0scratch# du -hs workdir/ tmp
619M    workdir/
1.2G    tmp

@bradfitz

Member Author

commented Oct 12, 2018

@dmitshur, can you investigate the tempdir situation and find out who's responsible for cleaning it but isn't doing so?

I suspect what should happen (but likely isn't) is that the buildlet should figure out the temp dir it's been given (using https://golang.org/pkg/os/#TempDir) and then it should make a subdirectory under that and then set the appropriate environment variable(s) for all child processes (in handleExec). Then on graceful shutdown (handleHalt) it can clean up its own directories (as best it can), and on ungraceful shutdowns it can instead nuke it if it exists.

So os.TempDir returns (likely) /tmp, and then on start-up we nuke /tmp/buildlet, and then all child processes run with, say, TMP=/tmp/buildlet and on shutdown we nuke /tmp/buildlet.

I'm not sure what the story is now.

@bradfitz bradfitz added the NeedsFix label Oct 12, 2018

@bradfitz bradfitz changed the title x/build: spacemonkey arm5 builders broken x/build: spacemonkey arm5 builders broken, clean tmp dirs automatically Oct 12, 2018

@onionjake

commented Oct 12, 2018

I will stop the process, clean the dirs, and start it again to see if we can get some passing and make sure there are not other issues.

@onionjake

commented Oct 14, 2018

With the space cleaned it looks like all the builds are consistently failing with:

##### API check
Error running API checker: exit status 1
...
exit status 1
2018/10/12 16:39:17 Failed: exit status 1
2018/10/12 16:39:23 FAILED

I will try to find out whether something else might be causing this issue.

@gopherbot

commented Oct 25, 2018

Change https://golang.org/cl/144637 mentions this issue: cmd/buildlet: set up & clean TMPDIR and GOCACHE for child processes

@bradfitz

Member Author

commented Oct 25, 2018

Okay, the new arm5 buildlet binary is pushed.

I'll watch it for any new problems.

It'll keep itself cleaned for new stuff, but you might have to clean legacy messes.
