x/build: openbsd-amd64-64 trybots are too slow #29223

Open
bradfitz opened this Issue Dec 13, 2018 · 21 comments

bradfitz commented Dec 13, 2018

openbsd-amd64-64 trybots are taking 11+ minutes, which causes TryBots as a whole to take 11+ minutes rather than ~5.

We need to figure out what's slow on them, and/or just shard it out more.
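One way to picture the sharding option: a run finishes only when its slowest shard does, so spreading tests over more buildlets helps only until the single longest test becomes the limit. The sketch below uses a greedy longest-first assignment. The test names and durations are made up for illustration, and this is not a description of how the build coordinator actually schedules work.

```python
import heapq

def shard_greedy(durations, n_shards):
    """Assign tests to shards, longest first, always onto the least-loaded shard.

    Returns (assignment, makespan), where makespan is the load of the
    busiest shard, i.e. the wall time if all shards run in parallel.
    """
    heap = [(0.0, i) for i in range(n_shards)]  # (current load, shard index)
    assignment = [[] for _ in range(n_shards)]
    for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        assignment[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

# Hypothetical per-test durations in seconds (not measurements from the builders).
TESTS = {"test_a": 150, "test_b": 90, "test_c": 80, "test_d": 60, "test_e": 60}

for n in (2, 3, 5):
    _, makespan = shard_greedy(TESTS, n)
    print(f"{n} shards: {makespan:.0f}s")
```

With these made-up numbers, going from 3 to 5 shards gains nothing, because the 150-second test alone sets the floor. Any step that runs unsharded before the tests would add to that floor.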

/cc @dmitshur @bcmills @andybons

bradfitz added the NeedsFix label Dec 13, 2018

gopherbot added this to the Unreleased milestone Dec 13, 2018

gopherbot added the Builders label Dec 13, 2018

bradfitz commented Jan 18, 2019

SELECT Builder, AVG(Seconds) AS Sec
FROM builds.Builds
WHERE IsTry = True
  AND StartTime > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 100 HOUR)
  AND Repo = "go"
  AND FailureURL = ""
GROUP BY 1
ORDER BY Sec DESC;


Row | Builder | Sec |  
-- | -- | -- | --
1 | openbsd-amd64-64 | 579.7221071917958 |  
2 | linux-amd64-race | 488.40762166018374 |  
3 | nacl-386 | 434.8139809606734 |  
4 | windows-amd64-2016 | 424.604860819551 |  
5 | nacl-amd64p32 | 418.4696299015715 |  
6 | windows-386-2008 | 414.7469431190204 |  
7 | js-wasm | 371.9747404238125 |  
8 | misc-vet-vetall | 358.80661270393875 |  
9 | linux-386 | 353.81094730244905 |  
10 | linux-amd64 | 345.036077108898 |  
11 | misc-compile | 337.44598333253055 |  
12 | misc-compile-mips | 335.70810570520416 |  
13 | freebsd-amd64-12_0 | 328.52744295724483 |  
14 | misc-compile-openbsd | 293.41003601271416 |  
15 | misc-compile-netbsd | 292.8116776015307 |  
16 | misc-compile-freebsd | 292.80485985481636 |  
17 | misc-compile-nacl | 288.17948818259185 |  
18 | misc-compile-plan9 | 273.5849724516735 |  
19 | misc-compile-ppc | 251.7265086680816

bradfitz commented Jan 18, 2019

SELECT Builder, Event, AVG(Seconds) AS Sec
FROM builds.Spans
WHERE Builder LIKE 'openbsd-amd64%'
  AND Error = ''
  AND IsTry = True
  AND StartTime > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 100 HOUR)
  AND Repo = "go"
GROUP BY 1, 2
ORDER BY Sec DESC;

Row | Builder | Event | Sec |  
-- | -- | -- | -- | --
1 | openbsd-amd64-64 | make_and_test | 534.0490917608572 |  
2 | openbsd-amd64-64 | make | 292.82514376291306 |  
3 | openbsd-amd64-64 | run_test:go_test:cmd/go | 150.02306724040426 |  
4 | openbsd-amd64-64 | run_test:cgo_test | 88.3467336194681 |  
5 | openbsd-amd64-64 | run_test:runtime:cpu124 | 78.4045334334468 |  
6 | openbsd-amd64-64 | run_test:go_test:cmd/compile/internal/gc | 63.46040540948936 |  
7 | openbsd-amd64-64 | run_test:go_test:net | 62.921281942893614 |  
8 | openbsd-amd64-64 | run_test:go_test:runtime | 59.65075333531915 |  
9 | openbsd-amd64-64 | run_test:cgo_errors | 39.74668672670213 |  
10 | openbsd-amd64-64 | get_helper | 36.28992712104489 |  
11 | openbsd-amd64-64 | create_gce_buildlet | 35.651753116401366 |  
12 | openbsd-amd64-64 | get_buildlet | 33.989285913102044 |  
13 | openbsd-amd64-64 | run_test:go_test:cmd/compile/internal/ssa | 31.60199228663415 |  
14 | openbsd-amd64-64 | create_gce_instance | 30.614607733554422 |  
15 | openbsd-amd64-64 | run_test:go_test:cmd/vet | 25.966327777999997 |  
16 | openbsd-amd64-64 | run_test:go_test:net/http | 23.022429393063828 |  
17 | openbsd-amd64-64 | write_snapshot_to_gcs | 19.260539691499996 |  
18 | openbsd-amd64-64 | run_test:doc_progs | 16.33576500525532 |  
19 | openbsd-amd64-64 | run_test:go_test:runtime/pprof | 13.31036519180851 |  
20 | openbsd-amd64-64 | run_test:go_test:reflect | 11.999205834765958 |  
21 | openbsd-amd64-64 | run_test:go_test:time | 11.741334522617022 |  
22 | openbsd-amd64-64 | run_test:sync_cpu | 11.533254003170732 |  
23 | openbsd-amd64-64 | write_snapshot_tar | 10.8773563 |  
24 | openbsd-amd64-64 | run_test:go_test:cmd/compile | 10.752162821125 |  
25 | openbsd-amd64-64 | run_test:go_test:cmd/fix | 10.435803139355556 |  
26 | openbsd-amd64-64 | run_tests_multi | 10.243288797263473 |  
27 | openbsd-amd64-64 | run_test:go_test:strings | 10.110943746 |  
28 | openbsd-amd64-64 | run_test:go_test:cmd/link | 10.061246428116279 |  
29 | openbsd-amd64-64 | run_test:go_test:cmd/link/internal/ld | 10.037645228209302 |  
30 | openbsd-amd64-64 | run_test:go_test:go/types | 9.813017223727273 |  
31 | openbsd-amd64-64 | run_test:go_test:syscall | 9.579807829382979 |  
32 | openbsd-amd64-64 | run_test:go_test:strconv | 9.096669728574469 |  
33 | openbsd-amd64-64 | run_test:nolibgcc:net | 8.353404652658536 |  
34 | openbsd-amd64-64 | run_test:go_test:os/signal | 8.149411695148936 |  
35 | openbsd-amd64-64 | run_test:go_test:os | 8.032081740425532 |  
36 | openbsd-amd64-64 | run_test:go_test:math | 7.8759232913157895 |  
37 | openbsd-amd64-64 | run_test:go_test:net/http/httptrace | 7.7749100352500005 |  
38 | openbsd-amd64-64 | run_test:go_test:math/big | 7.75858940580851 |  
39 | openbsd-amd64-64 | run_test:go_test:cmd/internal/obj/x86 | 7.6362746259500005 |  
40 | openbsd-amd64-64 | get_source_from_gitmirror | 7.515116951666666 |  
41 | openbsd-amd64-64 | get_source | 7.277790430666666 |  
42 | openbsd-amd64-64 | run_test:bench_go1 | 7.0304439564893615 |  
43 | openbsd-amd64-64 | run_test:moved_goroot | 6.851026539365853 |  
44 | openbsd-amd64-64 | run_test:go_test:cmd/nm | 6.5756059088085115 |  
45 | openbsd-amd64-64 | run_test:go_test:cmd/cover | 6.451060486723404 |  
46 | openbsd-amd64-64 | run_test:go_test:cmd/objdump | 6.444223596553191 |  
47 | openbsd-amd64-64 | run_test:go_test:runtime/trace | 6.383058027941177 |  
48 | openbsd-amd64-64 | run_test:go_test:testing | 5.998117573319149 |  
49 | openbsd-amd64-64 | run_test:go_test:cmd/vendor/github.com/google/pprof/internal/driver | 5.980447624906977 |  
50 | openbsd-amd64-64 | run_test:wiki | 5.823946847042554

bradfitz commented Jan 18, 2019

Wow, just running make.bash (which isn't sharded out over N buildlets) is more than twice as slow as other platforms:

SELECT Builder, Event, AVG(Seconds) AS Sec
FROM builds.Spans
WHERE Event = 'make'
  AND Error = ''
  AND IsTry = True
  AND StartTime > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 100 HOUR)
  AND Repo = "go"
GROUP BY 1, 2
ORDER BY Sec DESC;

Row | Builder | Event | Sec |  
-- | -- | -- | -- | --
1 | openbsd-amd64-64 | make | 292.82514376291306 |  
2 | nacl-386 | make | 176.9535785543913 |  
3 | nacl-amd64p32 | make | 169.24032677876087 |  
4 | windows-386-2008 | make | 158.65642536708697 |  
5 | windows-amd64-2016 | make | 142.23586712976086 |  
6 | js-wasm | make | 137.46539279367394 |  
7 | linux-386 | make | 134.50720768395655 |  
8 | freebsd-amd64-12_0 | make | 124.52324519041304 |  
9 | misc-vet-vetall | make | 124.14415335852175 |  
10 | linux-amd64-race | make | 123.95929911093478 |  
11 | linux-amd64 | make | 123.54718755441306

bradfitz commented Jan 18, 2019

Likely suspect: #18314 (use a tmpfs on OpenBSD)

bradfitz commented Jan 18, 2019

I tried doing the memory filesystem on /tmp/ in an OpenBSD 6.4 amd64 instance (via gomote ssh) and it works, but it's still not any faster.

Still 5 minutes ....

bradfitz@gdev:~/src/golang.org/x/build$ time gomote run user-bradfitz-openbsd-amd64-64-0 go/src/make.bash
Building Go cmd/dist using /tmp/workdir/go1.4.
Building Go toolchain1 using /tmp/workdir/go1.4.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for openbsd/amd64.
---
Installed Go for openbsd/amd64 in /tmp/workdir/go
Installed commands in /tmp/workdir/go/bin

real    5m3.824s
user    0m0.136s
sys     0m0.024s
bradfitz@gdev:~/src/golang.org/x/build$ time gomote run -system user-bradfitz-openbsd-amd64-64-0 mount
/dev/sd0a on / type ffs (local, wxallowed)
mfs:85198 on /tmp type mfs (asynchronous, local, nodev, nosuid, size=2097152 512-blocks)

real    0m0.108s
user    0m0.064s
sys     0m0.044s
bradfitz@gdev:~/src/golang.org/x/build$ time gomote run -system user-bradfitz-openbsd-amd64-64-0 df
Filesystem  512-blocks      Used     Avail Capacity  Mounted on
/dev/sd0a     18153212   1652976  15592576    10%    /
mfs:85198      2057756   1656516    298356    85%    /tmp

real    0m0.107s
user    0m0.096s
sys     0m0.012s

It sees 4 cores:

buildlet$ sysctl hw.ncpufound
hw.ncpufound=4

buildlet$ sysctl -a | grep cpu
kern.ccpu=1948
hw.ncpu=4
hw.cpuspeed=2500
hw.ncpufound=4
hw.ncpuonline=4
machdep.cpuvendor=GenuineIntel
machdep.cpuid=0x306e4
machdep.cpufeature=0x1f9bfbff

buildlet$ dmesg | grep -i core
cpu0: smt 0, core 0, package 0
cpu1: smt 0, core 1, package 0
cpu2: smt 1, core 0, package 0
cpu3: smt 1, core 1, package 0
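The dmesg lines above can be read as 4 logical CPUs spread over 2 physical cores, each with 2 SMT threads. A small sketch that counts both from lines in that format follows; the function name is illustrative, and the regular expression assumes exactly the line layout shown above.

```python
import re

# dmesg excerpt copied from the buildlet output above.
DMESG = """\
cpu0: smt 0, core 0, package 0
cpu1: smt 0, core 1, package 0
cpu2: smt 1, core 0, package 0
cpu3: smt 1, core 1, package 0
"""

CPU_RE = re.compile(r"^cpu(\d+): smt (\d+), core (\d+), package (\d+)$", re.MULTILINE)

def count_cpus(dmesg_text):
    """Return (logical CPUs, physical cores) found in the dmesg text."""
    logical = set()
    physical = set()
    for cpu, _smt, core, package in CPU_RE.findall(dmesg_text):
        logical.add(int(cpu))
        # A physical core is identified by its (package, core) pair;
        # SMT siblings share that pair.
        physical.add((int(package), int(core)))
    return len(logical), len(physical)

print(count_cpus(DMESG))  # (4, 2)
```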

The kernel we're running is:

OpenBSD 6.4 (GENERIC.MP) #364: Thu Oct 11 13:30:23 MDT 2018
    deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Are Spectre/Meltdown mitigations shutting down SMT? Can we disable that for the builders?

/cc @mdempsky

mdempsky commented Jan 18, 2019

@bradfitz I think you can try setting "sysctl hw.smt=1" to re-enable hyper threading.

https://man.openbsd.org/sysctl.2#HW_SMT_2

bradfitz commented Jan 18, 2019

It's already enabled:

$ sysctl -a | grep -i smt
hw.smt=1

So, that's not it. It's crazy that OpenBSD is 2x slower. If it were 10% slower I'd assume, "Oh, OpenBSD prioritizes security over performance" and be fine with that. But 2x makes me think we have a configuration problem somewhere.

stmuk commented Jan 18, 2019

Have you tried increasing the login.conf limits (as I suggested on Twitter)?

bradfitz commented Jan 18, 2019

Which would you increase? We have:

default:\
        :path=/usr/bin /bin /usr/sbin /sbin /usr/X11R6/bin /usr/local/bin /usr/local/sbin:\
        :umask=022:\
        :datasize-max=768M:\
        :datasize-cur=768M:\
        :maxproc-max=256:\
        :maxproc-cur=128:\
        :openfiles-max=1024:\
        :openfiles-cur=512:\
        :stacksize-cur=4M:\
        :localcipher=blowfish,a:\
        :tc=auth-defaults:\
        :tc=auth-ftp-defaults:

stmuk commented Jan 18, 2019

The default settings are low. You could try setting datasize-max/cur and stacksize-cur to "unlimited"

mdempsky commented Jan 18, 2019

@stmuk Wouldn't the resource limits being too low just cause the build to fail rather than to proceed slowly?

bradfitz commented Jan 18, 2019

Yeah. The issue is speed, not failure to build.

stmuk commented Jan 18, 2019

@mdempsky Maybe but it's easy enough to try.
@bradfitz How does your build work? Do you build bootstrap 1.4 with OpenBSD clang and then compile Go? If so, do you see the slowdown with both steps?

bradfitz commented Jan 18, 2019

> @mdempsky Maybe but it's easy enough to try.

This is all very tedious & slow to work on, so I don't eagerly pursue avenues that don't at least make sense. Maybe if I were really desperate. But given limited time, I'd rather spend it on trying to collect system-wide profiling information or otherwise getting more visibility into the problem, rather than just changing random things.

> How does your build work? Do you build bootstrap 1.4 with OpenBSD clang and then compile Go? If so, do you see the slowdown with both steps?

We push a pre-built Go 1.4 to it and use that.

mdempsky commented Jan 18, 2019

@bradfitz Maybe a first step would be to use cmd/dist's GOBUILDTIMELOGFILE to see if any particular steps are slower, or the whole thing is proportionally slower?

$ GOBUILDTIMELOGFILE=/tmp/buildtime.txt ./make.bash
Building Go cmd/dist using /usr/lib/google-golang.
Building Go toolchain1 using /usr/lib/google-golang.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for linux/amd64.
---
Installed Go for linux/amd64 in /usr/local/google/home/mdempsky/wd/go
Installed commands in /usr/local/google/home/mdempsky/wd/go/bin
$ cat /tmp/buildtime.txt
Fri Jan 18 13:37:01 PST 2019 start make.bash
Fri Jan 18 13:37:03 PST 2019 +2.2s start dist bootstrap
Fri Jan 18 13:37:03 PST 2019 +2.6s build toolchain1
Fri Jan 18 13:37:18 PST 2019 +18.0s build go_bootstrap
Fri Jan 18 13:37:28 PST 2019 +27.9s build toolchain2
Fri Jan 18 13:37:45 PST 2019 +44.1s build toolchain3
Fri Jan 18 13:38:00 PST 2019 +59.9s build toolchain
Fri Jan 18 13:38:11 PST 2019 +70.3s end dist bootstrap

stmuk commented Jan 18, 2019

@bradfitz Too many negatives in that for me to parse that or motivate me to try and help further. I just regret wasting my time trying to help.

bradfitz commented Jan 18, 2019

@stmuk, sorry, I didn't mean to waste your time. But with me and @mdempsky both thinking that such a tweak wouldn't do anything, it's not a high priority of mine to try. I appreciate you throwing it out there, even if it's not the answer. I at least went and read the OpenBSD man pages for those knobs.

stmuk commented Jan 19, 2019

@bradfitz You were right: relaxing the login.conf limits made no difference whatsoever.

@mdempsky Running on an i5 Gen 5 VirtualBox host with OEL 7.6 and OpenBSD 6.4 guests under Vagrant, I get the unexpected result of a slightly faster OpenBSD build!

Different compilers were used to build the Go 1.4 that I bootstrapped tip from: OpenBSD has its patched clang 6, whereas Linux has gcc 4.8.5. The OpenBSD guest has a noatime mount, but otherwise no changes were made.

I'm wondering if we are just seeing differences due to the underlying virtualisation. I may experiment with QEMU and more similar C compilers if I get a chance.

go version devel +5538a9a Fri Jan 18 22:41:47 2019 +0000 linux/amd64
real 3m10.367s
user 2m41.822s
sys 0m14.216s
Sat Jan 19 12:37:19 UTC 2019 start make.bash
Sat Jan 19 12:37:20 UTC 2019 +1.4s start dist bootstrap
Sat Jan 19 12:37:20 UTC 2019 +1.4s build toolchain1
Sat Jan 19 12:37:38 UTC 2019 +19.0s build go_bootstrap
Sat Jan 19 12:37:57 UTC 2019 +38.2s build toolchain2
Sat Jan 19 12:38:57 UTC 2019 +98.2s build toolchain3
Sat Jan 19 12:39:48 UTC 2019 +149.1s build toolchain
Sat Jan 19 12:40:29 UTC 2019 +190.6s end dist bootstrap

go version devel +5538a9a Fri Jan 18 22:41:47 2019 +0000 openbsd/amd64
real 2m41.425s
user 1m55.670s
sys 2m0.150s
Sat Jan 19 04:51:44 PST 2019 start make.bash
Sat Jan 19 04:51:46 PST 2019 +2.2s start dist bootstrap
Sat Jan 19 04:51:46 PST 2019 +2.3s build toolchain1
Sat Jan 19 04:52:07 PST 2019 +23.6s build go_bootstrap
Sat Jan 19 04:52:25 PST 2019 +41.4s build toolchain2
Sat Jan 19 04:53:14 PST 2019 +90.0s build toolchain3
Sat Jan 19 04:53:51 PST 2019 +127.2s build toolchain
Sat Jan 19 04:54:26 PST 2019 +162.1s end dist bootstrap
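To compare the two logs step by step, the cumulative "+N.Ns" offsets can be turned into per-phase durations. The sketch below assumes each line marks the start of the named phase, so that a phase lasts until the next line's offset. That reading of the GOBUILDTIMELOGFILE output is an assumption, and the function name is illustrative.

```python
import re

# Matches the cumulative offset and the label, e.g. "+19.0s build go_bootstrap".
MARK_RE = re.compile(r"\+(\d+(?:\.\d+)?)s\s+(.+)$")

def phase_durations(log_text):
    """Map each phase label to the seconds until the next logged mark."""
    marks = []
    for line in log_text.strip().splitlines():
        m = MARK_RE.search(line)
        if m:  # the first line ("start make.bash") has no offset and is skipped
            marks.append((float(m.group(1)), m.group(2)))
    return {label: round(nxt - start, 1)
            for (start, label), (nxt, _) in zip(marks, marks[1:])}

# Logs copied from the two runs above.
LINUX = """\
Sat Jan 19 12:37:19 UTC 2019 start make.bash
Sat Jan 19 12:37:20 UTC 2019 +1.4s start dist bootstrap
Sat Jan 19 12:37:20 UTC 2019 +1.4s build toolchain1
Sat Jan 19 12:37:38 UTC 2019 +19.0s build go_bootstrap
Sat Jan 19 12:37:57 UTC 2019 +38.2s build toolchain2
Sat Jan 19 12:38:57 UTC 2019 +98.2s build toolchain3
Sat Jan 19 12:39:48 UTC 2019 +149.1s build toolchain
Sat Jan 19 12:40:29 UTC 2019 +190.6s end dist bootstrap
"""

OPENBSD = """\
Sat Jan 19 04:51:44 PST 2019 start make.bash
Sat Jan 19 04:51:46 PST 2019 +2.2s start dist bootstrap
Sat Jan 19 04:51:46 PST 2019 +2.3s build toolchain1
Sat Jan 19 04:52:07 PST 2019 +23.6s build go_bootstrap
Sat Jan 19 04:52:25 PST 2019 +41.4s build toolchain2
Sat Jan 19 04:53:14 PST 2019 +90.0s build toolchain3
Sat Jan 19 04:53:51 PST 2019 +127.2s build toolchain
Sat Jan 19 04:54:26 PST 2019 +162.1s end dist bootstrap
"""

linux = phase_durations(LINUX)
openbsd = phase_durations(OPENBSD)
for phase in linux:
    print(f"{phase:22s} linux={linux[phase]:5.1f}s  openbsd={openbsd[phase]:5.1f}s")
```

Under that assumption, "build toolchain2" takes 60.0 s in the Linux guest and 48.6 s in the OpenBSD guest in these two runs.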

juanfra684 commented Jan 20, 2019

The problem is not the compiler or the VM software or the FS. I'm the maintainer of BaCon which also runs a big bash script and it's slow as hell. Something happens between bash and the OpenBSD base which makes the bash scripts slow. Maybe something related to the memory protections.

bradfitz commented Jan 21, 2019

@juanfra684, our bash script is just a few lines that call into a Go program that does the rest. Our performance issue is not bash related.

juanfra684 commented Jan 21, 2019

You're right, sorry for the misunderstanding. Go generates static binaries and the bootstrap launches a lot of threads, so the problem is in the kernel and it's something related to memory, threads or launching new processes.

I've built the Go port (make only, no tests) on 6.4 and on -current, and there is a -14% difference:

6.4
real       205.50
user       303.09
sys        138.58

-current
real       186.69
user       300.99
sys         73.94
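The comment does not say which measurement the -14% figure refers to, so as a rough cross-check, this sketch computes the relative change for each of the three timings quoted above. The helper name is illustrative.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

# Timings in seconds, copied from the two runs above.
RELEASE_64 = {"real": 205.50, "user": 303.09, "sys": 138.58}
CURRENT = {"real": 186.69, "user": 300.99, "sys": 73.94}

for key in ("real", "user", "sys"):
    print(f"{key}: {pct_change(RELEASE_64[key], CURRENT[key]):+.1f}%")
# real: -9.2%
# user: -0.7%
# sys: -46.6%
```

Almost all of the change in these two runs is in system time, while user time is essentially unchanged.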

Recently the malloc code was changed to work better with multithreaded programs.

OpenBSD doesn't have magic knobs to speed things up, but you could tune a few things to help the bootstrap. First, if the VM host uses flash drives for storage, forget mfs. It isn't equivalent in speed to Linux tmpfs, and FS operations usually run faster on a simple consumer-grade SSD.

As for mount options, use noatime and softdep. Linux uses relatime and a log-backed FS by default, so the comparison with a plain OpenBSD installation is not fair.

You could also add a few entries to /etc/sysctl.conf:

  • kern.pool_debug=0: 0 is the default for stable versions, but it's easy to forget when comparing the performance of stable with -current.
  • kern.bufcachepercent=80: the default is 20 and the maximum is 80. The right percentage depends on how much RAM the system has.
  • kern.maxvnodes: check the current value and increase it. The default is very conservative. It is the maximum number of vnodes the system can keep in memory.