x/build: cross-compile ARM for speed? #17105

Closed
bradfitz opened this issue Sep 14, 2016 · 32 comments

8 participants
@bradfitz
Member

commented Sep 14, 2016

As part of #17104 to improve trybot speed and get Trybots down to 5 minutes, I now see that linux-arm is the slowest builder, even sharded 8 machines wide.

The problem is that just make.bash on linux-arm takes 5 minutes itself, even without running any tests, so sharding 8 machines wide doesn't help much.

What do people think about cross-compiling the linux-arm make.bash on linux-amd64 (on Kubernetes) first (which takes about 33 seconds), and then pushing that out to 7 real ARM machines for tests? (instead of pushing out the same everything-built tarball from the ARM machine)

In parallel, we could run a real ARM make.bash (for 5 minutes) to verify it works, but never use its output for testing.

Thoughts?

/cc @ianlancetaylor @quentinmit @davecheney @minux @cherrymui

@bradfitz bradfitz added the Builders label Sep 14, 2016

@bradfitz bradfitz added this to the Unreleased milestone Sep 14, 2016

@bradfitz bradfitz self-assigned this Sep 14, 2016

@ianlancetaylor

Contributor

commented Sep 14, 2016

SGTM

@cherrymui

Contributor

commented Sep 14, 2016

SGTM. It should be fine as there are already many tests that do invoke the compiler/linker/etc.

@minux

Member

commented Sep 14, 2016

@bradfitz

Member Author

commented Sep 14, 2016

> SGTM. It should be fine as there are already many tests that do invoke the compiler/linker/etc.

Good point. I didn't consider that. So maybe the parallel make.bash on real hardware is a little pointless.

@ianlancetaylor

Contributor

commented Sep 14, 2016

I think running make.bash on real hardware is still useful.

@bradfitz

Member Author

commented Sep 14, 2016

Yeah, it makes me feel more comfortable too. And it's easy enough and still basically within my time goal.

@gopherbot


commented Sep 23, 2016

CL https://golang.org/cl/29670 mentions this issue.

gopherbot pushed a commit to golang/build that referenced this issue Sep 23, 2016

dashboard: add a make.bash-only builder on real ARM hardware
This is a new builder in prep for the change to the "linux-arm"
builder where the GOARCH=arm make.bash will be cross-compiled from a
Kubernetes container on fast hardware.

Updates golang/go#17105 (cross-compile ARM builders' make.bash)
Updates golang/go#17104 (5 minute trybots)

Change-Id: Icfd2644d77639f731151abe54839322960418254
Reviewed-on: https://go-review.googlesource.com/29670
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@gopherbot


commented Sep 23, 2016

CL https://golang.org/cl/29677 mentions this issue.

@bradfitz

Member Author

commented Sep 23, 2016

With @jfrazelle's help, we've almost got this working.

But we just saw a crash when running the cross-built GOROOT files on the normal Scaleway.com ARM machine:

--- FAIL: TestCgoExternalThreadPanic (0.01s)
    crash_test.go:105: testprogcgo CgoExternalThreadPanic exit status: signal: segmentation fault
    crash_cgo_test.go:72: want failure containing "panic: BOOM". output:

--- FAIL: TestEnsureDropM (0.01s)
    crash_test.go:105: testprogcgo EnsureDropM exit status: signal: segmentation fault
    crash_cgo_test.go:154: expected "OK\n", got 
FAIL
FAIL    runtime 68.340s

Not sure what to make of that.

@ianlancetaylor?

@bradfitz

Member Author

commented Sep 23, 2016

But on the same machine, it seems to work by hand:

# go test -c runtime
#  ./runtime.test -test.v -test.run='Panic$|EnsureDrop'   
=== RUN   TestCallersPanic
--- PASS: TestCallersPanic (0.00s)
    callers_test.go:46: functions seen: runtime.Callers runtime.call16 runtime.gopanic runtime_test.f2 runtime_test.TestCallersPanic testing.tRunner runtime.goexit runtime_test.TestCallersPanic.func1 runtime_test.f3 runtime_test.f1
=== RUN   TestCgoExternalThreadPanic
--- PASS: TestCgoExternalThreadPanic (9.49s)
    crash_test.go:105: testprogcgo CgoExternalThreadPanic exit status: exit status 2
=== RUN   TestEnsureDropM
--- PASS: TestEnsureDropM (0.01s)
=== RUN   TestRecursivePanic
--- PASS: TestRecursivePanic (2.79s)
    crash_test.go:105: testprog RecursivePanic exit status: exit status 2
=== RUN   TestGoexitInPanic
--- PASS: TestGoexitInPanic (0.04s)
    crash_test.go:105: testprog GoexitInPanic exit status: exit status 2
=== RUN   TestDeferPtrsPanic
--- PASS: TestDeferPtrsPanic (0.04s)
=== RUN   TestStackPanic
--- PASS: TestStackPanic (0.00s)
PASS
@ianlancetaylor

Contributor

commented Sep 23, 2016

It's weird that those tests failed with a segmentation fault but that there was no output. One thing that can cause that is if the signal handler itself gets a signal, but neither of those tests is expected to get a signal. Both of those tests involve a non-Go thread calling a Go function, so my guess is that it has something to do with that. I can't think of anything else useful.

@bradfitz

Member Author

commented Sep 23, 2016

And it failed on the staging builder again in the same way. Doesn't seem to be a flake.

@bradfitz

Member Author

commented Sep 23, 2016

@ianlancetaylor, the only difference I can see between how I'm running it "by hand" vs by the builders is that when I run it by hand and it works, it's running under bash. The builders run it under the Go buildlet binary.

Is there some environment difference I'm not considering?

@ianlancetaylor

Contributor

commented Sep 23, 2016

For these tests the test itself will run go build for a program that uses cgo (runtime/testdata/testprogcgo), which means that the test will invoke the C compiler. Are you sure that you are getting the same C compiler when you run it by hand as when it fails?

@ianlancetaylor

Contributor

commented Sep 23, 2016

That is, what is PATH for bash and for the buildlet?

@bradfitz

Member Author

commented Sep 23, 2016

@ianlancetaylor, ah hah! I bet that's the issue. I can totally believe the CC_FOR_TARGET or CGO_ENABLED isn't being set in the tests.

Thanks!

@bradfitz

Member Author

commented Sep 23, 2016

Er, on second thought: we're not cross-compiling when running the tests, so we're using the system default:

# gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

In the Kubernetes container where we cross-compile make.bash, we use https://packages.debian.org/stretch/gcc-arm-linux-gnueabihf ...

# arm-linux-gnueabihf-gcc --version
arm-linux-gnueabihf-gcc (Debian 6.1.1-9) 6.1.1 20160705
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Are you saying that mixing those is the problem?

Maybe we need an older arm-linux-gnueabihf-gcc version?

@bradfitz

Member Author

commented Sep 23, 2016

The Scaleway machines are running Debian GNU/Linux 8.1 (jessie).

@bradfitz

Member Author

commented Sep 24, 2016

gopherbot pushed a commit to golang/build that referenced this issue Sep 24, 2016

build/env: change armhf builder from stretch to jessie
The far superior linux distro of champions.

Updates golang/go#17105

Change-Id: I5ea0cd2361753f61bb74bf3d4dea6c181f1427fa
Reviewed-on: https://go-review.googlesource.com/29687
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@bradfitz

Member Author

commented Sep 24, 2016

We switched to Jessie in the cross-compiling Kubernetes container, but no luck. It still fails, and I see this on the Scaleway builder just before it fails while running the test:

16303 ?        Ss   131:35 /lib/systemd/systemd-journald
 7634 ?        Ssl    0:01 /usr/local/bin/buildlet-stage0
 7641 ?        Sl     0:35  \_ ./buildlet.exe --workdir=/workdir --hostname=scaleway-staging-02 --halt=false --reverse=linux-arm,linux-
 7706 ?        Sl     0:00      \_ /workdir/go/bin/go tool dist test --no-rebuild --banner=XXXBANNERXXX: go_test:runtime
 7717 ?        Sl     0:00          \_ /workdir/go/pkg/tool/linux_arm/dist test --no-rebuild --banner=XXXBANNERXXX: go_test:runtime
 7746 ?        Sl     0:00              \_ go test -short -tags= -timeout=6m0s -gcflags= runtime
 7819 ?        Sl     0:02                  \_ /tmp/go-build528979954/runtime/_test/runtime.test -test.short=true -test.timeout=6m0s
 7843 ?        Sl     0:00                      \_ go build -o /tmp/go-build961658414/testprogcgo.exe
 7856 ?        Sl     0:00                          \_ /workdir/go/pkg/tool/linux_arm/cgo -objdir /tmp/go-build444027289/_/workdir/go/s
 7863 ?        S      0:00                              \_ arm-linux-gnueabihf-gcc -w -Wno-error -o/tmp/go-build444027289/_/workdir/go/
 7864 ?        R      0:00                                  \_ /usr/lib/gcc/arm-linux-gnueabihf/4.9/cc1 -quiet -I /tmp/go-build44402728

And on that same Scaleway machine:

# gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

# uname -a
Linux buildlet-prep 3.2.34-30 #17 SMP Mon Apr 13 15:53:45 UTC 2015 armv7l GNU/Linux

# lsb_release  -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.1 (jessie)
Release:    8.1
Codename:   jessie

# arm-linux-gnueabihf-gcc --version
arm-linux-gnueabihf-gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
@bradfitz

Member Author

commented Sep 24, 2016

And in the Kubernetes container:

root@85ec3929230a:/# arm-linux-gnueabihf-gcc --version
arm-linux-gnueabihf-gcc ( 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So at least now we seem to be running the same compiler? (albeit one on an amd64 host and one on an armhf host)

@bradfitz

Member Author

commented Sep 24, 2016

@jfrazelle and I are stumped. Going to take a break from this for now. Clues welcome.

@ianlancetaylor

Contributor

commented Sep 24, 2016

What system libraries are available on the cross-compiler host and on the real host?

Is there any way for you to snag a copy of the testprogcgo program that is failing?

@crawshaw

Contributor

commented Sep 24, 2016

Can you run one of the failing binaries under gdb?

@bradfitz

Member Author

commented Sep 24, 2016

I'll work on both of those things. I finally got it to reproduce by hand in a shell.

I made the buildlet log the environment it runs the test with:

In a browser, watching the hacked-up coordinator:

:: Running /workdir/go/bin/go with args ["/workdir/go/bin/go" "tool" "dist" "test" "--no-rebuild" "--banner=XXXBANNERXXX:" "go_test:runtime"] and env ["PATH=/workdir/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" "GOROOT_BOOTSTRAP=/usr/local/go" "WORKDIR=/workdir" "GO_BUILDER_NAME=linux-arm" "GO_BUILDER_FLAKY_NET=1" "GOROOT=/workdir/go"] in dir /workdir

--- FAIL: TestCgoExternalThreadPanic (0.01s)
    crash_test.go:105: testprogcgo CgoExternalThreadPanic exit status: signal: segmentation fault
    crash_cgo_test.go:72: want failure containing "panic: BOOM". output:

--- FAIL: TestEnsureDropM (0.01s)
    crash_test.go:105: testprogcgo EnsureDropM exit status: signal: segmentation fault
    crash_cgo_test.go:154: expected "OK\n", got 
FAIL
FAIL    runtime 78.894s
2016/09/24 01:31:09 Failed: exit status 1
:: Running /workdir/go/bin/go with args ["/workdir/go/bin/go" "tool" "dist" "test" "--no-rebuild" "--banner=XXXBANNERXXX:" "go_test:runtime"] and env ["PATH=/workdir/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" "GOROOT_BOOTSTRAP=/usr/local/go" "WORKDIR=/workdir" "GO_BUILDER_NAME=linux-arm" "GO_BUILDER_FLAKY_NET=1" "GOROOT=/workdir/go"] in dir /workdir

And then I was able to make it do it by hand:

In ssh:

root@buildlet-prep:/workdir# PATH=/workdir/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin GOROOT_BOOTSTRAP=/usr/local/go WORKDIR=/workdir GO_BUILDER_NAME=linux-arm GO_BUILDER_FLAKY_NET=1 GOROOT=/workdir/go /workdir/go/bin/go tool dist test --no-rebuild go_test:runtime

##### Testing packages.
--- FAIL: TestCgoExternalThreadPanic (0.01s)
    crash_test.go:105: testprogcgo CgoExternalThreadPanic exit status: signal: segmentation fault
    crash_cgo_test.go:72: want failure containing "panic: BOOM". output:

--- FAIL: TestEnsureDropM (0.01s)
    crash_test.go:105: testprogcgo EnsureDropM exit status: signal: segmentation fault
    crash_cgo_test.go:154: expected "OK\n", got 
FAIL
FAIL    runtime 65.839s
2016/09/24 01:36:48 Failed: exit status 1

So now I can actually modify things easily and see what's happening I hope.
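The by-hand command above still inherits every other variable from the interactive shell. A small helper can turn the logged env and argv into a command that runs with only those variables, via the standard env -i utility. reproLine and shellQuote are hypothetical names for illustration; the quoting covers the POSIX single-quote case and is not meant to be exhaustive.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote (hypothetical) leaves "safe" words alone and otherwise wraps s
// in single quotes, escaping embedded single quotes as '\''.
func shellQuote(s string) string {
	unsafe := strings.IndexFunc(s, func(r rune) bool {
		return !(r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z' || r >= '0' && r <= '9' ||
			strings.ContainsRune("-_./:=,@+", r))
	})
	if s != "" && unsafe < 0 {
		return s
	}
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// reproLine (hypothetical) builds one shell command that runs args with
// exactly the logged env; env -i clears everything else first.
func reproLine(env, args []string) string {
	parts := []string{"env", "-i"}
	for _, kv := range env {
		parts = append(parts, shellQuote(kv))
	}
	for _, a := range args {
		parts = append(parts, shellQuote(a))
	}
	return strings.Join(parts, " ")
}

func main() {
	// Environment and arguments as logged by the hacked-up coordinator.
	env := []string{
		"PATH=/workdir/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
		"GOROOT_BOOTSTRAP=/usr/local/go",
		"WORKDIR=/workdir",
		"GO_BUILDER_NAME=linux-arm",
		"GO_BUILDER_FLAKY_NET=1",
		"GOROOT=/workdir/go",
	}
	args := []string{"/workdir/go/bin/go", "tool", "dist", "test", "--no-rebuild", "go_test:runtime"}
	fmt.Println(reproLine(env, args))
}
```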

@bradfitz

Member Author

commented Sep 24, 2016

Mystery/clue: running go test passes, but go tool dist test fails!?

# PATH=/workdir/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin GOROOT_BOOTSTRAP=/usr/local/go WORKDIR=/workdir GO_BUILDER_NAME=linux-arm GO_BUILDER_FLAKY_NET=1 GOROOT=/workdir/go /workdir/go/bin/go test -v -short runtime
....
PASS
ok      runtime 137.799s

(Almost all that time is TestCollisions, it turns out... #17217)

@bradfitz

Member Author

commented Sep 24, 2016

Okay, got a binary.

root@buildlet-prep:/workdir# file /tmp/go-build332353381/testprogcgo.exe 
/tmp/go-build332353381/testprogcgo.exe: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 2.6.32, BuildID[sha1]=52eb3d9211e9b2896578e92f9bce32d439670bc6, not stripped

root@buildlet-prep:/workdir# ldd /tmp/go-build332353381/testprogcgo.exe 
    libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0x402e5000)
    libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x40308000)
    /lib/ld-linux-armhf.so.3 (0x400d6000)

root@buildlet-prep:/workdir# gdb /tmp/go-build332353381/testprogcgo.exe
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
This GDB was configured as "arm-linux-gnueabihf".
...
Reading symbols from /tmp/go-build332353381/testprogcgo.exe...done.
warning: File "/workdir/go/src/runtime/runtime-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
    add-auto-load-safe-path /workdir/go/src/runtime/runtime-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
(gdb) run EnsureDropM
Starting program: /tmp/go-build332353381/testprogcgo.exe EnsureDropM
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0x40939460 (LWP 21409)]
[New Thread 0x41199460 (LWP 21410)]
[New Thread 0x41999460 (LWP 21411)]
[New Thread 0x42aff460 (LWP 21413)]
[New Thread 0x42199460 (LWP 21412)]
[New Thread 0x432ff460 (LWP 21414)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x432ff460 (LWP 21414)]
_sfloat () at /workdir/go/src/runtime/vlop_arm.s:75
75      MOVW    m_locks(R8), R1
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
runtime.raise () at /workdir/go/src/runtime/sys_linux_arm.s:137
137     RET
(gdb) c
Continuing.
[Thread 0x432ff460 (LWP 21414) exited]
[Thread 0x42aff460 (LWP 21413) exited]
[Thread 0x42199460 (LWP 21412) exited]
[Thread 0x41199460 (LWP 21410) exited]
[Thread 0x41999460 (LWP 21411) exited]
[Thread 0x40939460 (LWP 21409) exited]

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) 

Wrong GOARM= level?

@bradfitz

Member Author

commented Sep 24, 2016

From cmd/dist:

func xgetgoarm() string {
	// ...
	if gohostarch != "arm" || goos != gohostos {
		// Conservative default for cross-compilation.
		return "5"
	}
	// ...
}

That's my best guess at the moment.

@bradfitz

Member Author

commented Sep 24, 2016

On Scaleway,

# ./go/pkg/tool/linux_arm/dist -check-goarm
VFPv1 OK.
VFPv3 OK.

So I should probably make the Kubernetes cross-compiler set GOARM=7
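For reference, here is one way to read that output. Judging from the two lines it prints, dist's check appears to execute VFPv1 and then VFPv3 instructions and print a line after each, so a CPU lacking one would presumably die before printing the corresponding line. goarmFromCheckOutput is a hypothetical sketch of that mapping, not a copy of cmd/dist's code.

```go
package main

import (
	"fmt"
	"strings"
)

// goarmFromCheckOutput (hypothetical) maps `dist -check-goarm` output to a
// GOARM value, assuming each "OK." line is printed only after the matching
// VFP instructions ran successfully:
//   both lines    -> VFPv3 usable               -> GOARM=7
//   only VFPv1    -> VFPv1 usable               -> GOARM=6
//   neither       -> software floating point    -> GOARM=5
func goarmFromCheckOutput(out string) string {
	v1 := strings.Contains(out, "VFPv1 OK.")
	v3 := strings.Contains(out, "VFPv3 OK.")
	switch {
	case v1 && v3:
		return "7"
	case v1:
		return "6"
	default:
		return "5"
	}
}

func main() {
	// The output reported on the Scaleway builder in this thread.
	fmt.Println("GOARM =", goarmFromCheckOutput("VFPv1 OK.\nVFPv3 OK.\n"))
}
```

On this reading, the Scaleway machines support GOARM=7, while the cross-compiled toolchain had been built with the conservative GOARM=5 (software floating point, which matches the crash in _sfloat above).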

@bradfitz

Member Author

commented Sep 24, 2016

Yup! That was it.

Thanks @jfrazelle, @ianlancetaylor, and @crawshaw!
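A sketch of the environment a cross-compiling container could pass to make.bash so that GOARM no longer falls back to the cross-compilation default of 5 from the cmd/dist excerpt. crossEnv is a hypothetical helper; the CC_FOR_TARGET value assumes the arm-linux-gnueabihf-gcc cross compiler discussed above is installed, and GOARM=7 assumes every target machine passes the VFPv3 check, as the Scaleway hosts do.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// crossEnv (hypothetical) returns a copy of base plus the variables for
// cross-compiling a linux/arm GOROOT from a linux/amd64 host. GOARM is set
// explicitly because cmd/dist defaults to "5" when cross-compiling.
// os/exec uses the last value for duplicate keys, so these entries win
// over any same-named variables already present in base.
func crossEnv(base []string, goarm string) []string {
	return append(append([]string(nil), base...),
		"GOOS=linux",
		"GOARCH=arm",
		"GOARM="+goarm,
		"CGO_ENABLED=1",
		"CC_FOR_TARGET=arm-linux-gnueabihf-gcc", // assumed cross compiler name
	)
}

func main() {
	// Assumes the current directory is $GOROOT/src on the amd64 host.
	cmd := exec.Command("./make.bash")
	cmd.Env = crossEnv(os.Environ(), "7")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "make.bash:", err)
		os.Exit(1)
	}
}
```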

@jessfraz

Contributor

commented Sep 24, 2016

Omg yay!!!


@davecheney

Contributor

commented Sep 24, 2016

That default cross compilation setting gets you every time. We should probably update it to be 6, which is the default for local compiles.


@golang golang locked and limited conversation to collaborators Sep 30, 2017
