Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: all.bash fails to saturate 6 cores #17751

Open
bcmills opened this issue Nov 2, 2016 · 25 comments

Comments

@bcmills
Copy link
Member

commented Nov 2, 2016

go version devel +eed2bb7 Wed Nov 2 19:21:16 2016 +0000 linux/amd64

What did you do?

./clean.bash
time ./all.bash

What did you expect to see?

I do my development on a virtualized 6-core Xeon running Ubuntu. (runtime.NumCPU() returns 6.)

Per @rsc's comments on #10571, I expected "Both the build half and the test half of all.bash should consume nearly all the CPUs for most of the time."

What did you see instead?

all.bash seems to saturate at ~2 CPU cores for a significant fraction of the run. It consumes less than twice as much user time than real. Oddly, sys usage is even higher than user; I'm not sure what to make of that.

real    8m55.199s
user    14m13.511s
sys     8m51.402s

Here's a chart of CPU saturation over time as measured by mpstat. You can see that there are a couple bursts of parallelism with mostly poor saturation in between. The machine doesn't appear to be hitting swap during the build.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 2, 2016

I don't have access to your CPU saturation chart. google.com accounts can't share outside of Google, IIRC. Use your golang account?

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Nov 2, 2016

That's annoying. I don't have a golang account; I'll see if I can just upload an image instead.

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Nov 2, 2016

image

@minux

This comment has been minimized.

Copy link
Member

commented Nov 2, 2016

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 2, 2016

on a virtualized 6-core Xeon

Which virtualization? (might explain high sys time?)

@josharian

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2016

#15736 will help with understanding this. Sorry I didn't get it into 1.8. Making compilation more concurrent will also help saturate the cores (as would #15734, but the consensus is to focus on intra-package concurrency instead).

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 3, 2016

Also, the make.bash portion of Bryan's graph doesn't look super terrible.

The terrible part is running the tests, almost none of which use t.Parallel so are run serialized.

If people could go around and sprinkle some magic t.Parallel fairy dusty on tests where it's safe to do (where it doesn't introduce flakiness under load), that would be great.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2016

Here are the 20 slowest packages to test, at least on my machine:

$ pbpaste | grep "^ok" | awk '{print $3"\t"$2}' | sort -n -r | head -n 20
67.474s cmd/go
54.461s runtime
42.232s cmd/compile/internal/gc
10.048s net/http
9.891s  go/build
9.311s  cmd/objdump
8.648s  cmd/cover
8.295s  cmd/vet
7.839s  net
6.487s  cmd/nm
5.483s  cmd/pack
4.813s  cmd/compile
4.534s  os/signal
3.899s  cmd/addr2line
2.784s  net/http/cgi
2.692s  time
2.615s  go/types
2.246s  log/syslog
2.209s  compress/flate
2.176s  runtime/pprof

Of these, cmd/go, runtime, net/http, net, os/signal, net/http/cgi, time, and runtime/pprof seem the most likely to become flakier under load. I'll take a look at cmd/compile/internal/gc and leave the rest for others.

@dsnet

This comment has been minimized.

Copy link
Member

commented Nov 3, 2016

I can hammer on compress/flate. It's a package whose tests I want to clean up anyways.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 3, 2016

Here are the 20 slowest packages to test, at least on my machine

I have the timing data for every trybot + build run on all machines in BigQuery, btw.

@gopherbot

This comment has been minimized.

Copy link

commented Nov 3, 2016

CL https://golang.org/cl/32585 mentions this issue.

gopherbot pushed a commit that referenced this issue Nov 3, 2016
cmd/compile/internal/gc: make tests run faster
TestAssembly takes 20s on my machine,
which is too slow for normal operation.
Marking as -short has its dangers (#17472),
but hopefully we'll soon have a builder for that.

All the SSA tests are hermetic and not time sensitive
and can thus be run in parallel.
Reduces the cmd/compile/internal/gc test time during
all.bash on my laptop from 42s to 7s.

Updates #17751

Change-Id: Idd876421db23b9fa3475e8a9b3355a5dc92a5a29
Reviewed-on: https://go-review.googlesource.com/32585
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

@josharian josharian changed the title all.bash fails to saturate 6 cores build: all.bash fails to saturate 6 cores Nov 3, 2016

@gopherbot

This comment has been minimized.

Copy link

commented Nov 3, 2016

CL https://golang.org/cl/32673 mentions this issue.

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Nov 3, 2016

@bradfitz

Which virtualization? (might explain high sys time?)

Xen (via Ganeti), I believe.

I suspect that I/O performance on my system is pretty poor, so that might explain the high sys time.

gopherbot pushed a commit that referenced this issue Nov 4, 2016
cmd/objdump: speed up tests
Rebuild cmd/objdump once instead of twice.
Speeds up standalone 'go test cmd/objdump' on my
machine from ~1.4s to ~1s.

Updates #17751

Change-Id: I15fd79cf18c310f892bc28a9e9ca47ee010c989a
Reviewed-on: https://go-review.googlesource.com/32673
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot

This comment has been minimized.

Copy link

commented Nov 4, 2016

CL https://golang.org/cl/32684 mentions this issue.

gopherbot pushed a commit that referenced this issue Nov 4, 2016
net/http: speed up tests, use t.Parallel when it's safe
Before: 8.9 seconds for go test -short
 After: 2.8 seconds

There are still 250 tests without t.Parallel, but I got the important
onces using a script:

    $ go test -short -v 2>&1 | go run ~/slowtests.go

Where slowtests.go is https://play.golang.org/p/9mh5Wg1nLN

The remaining 250 (the output lines from slowtests.go) all have a
reported duration of 0ms, except one 50ms test which has to be serial.

Where tests can't be parallel, I left a comment at the top saying why,
so people don't add t.Parallel later and get surprised at failures.

Updates #17751

Change-Id: Icbe32cbe2b996e23c89f1af6339287fa22af5115
Reviewed-on: https://go-review.googlesource.com/32684
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
@gopherbot

This comment has been minimized.

Copy link

commented Nov 4, 2016

CL https://golang.org/cl/32751 mentions this issue.

gopherbot pushed a commit that referenced this issue Nov 4, 2016
all: sprinkle t.Parallel on some slow tests
I used the slowtests.go tool as described in
https://golang.org/cl/32684 on packages that stood out.

go test -short std drops from ~56 to ~52 seconds.

This isn't a huge win, but it was mostly an exercise.

Updates #17751

Change-Id: I9f3402e36a038d71e662d06ce2c1d52f6c4b674d
Reviewed-on: https://go-review.googlesource.com/32751
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot

This comment has been minimized.

Copy link

commented Nov 4, 2016

CL https://golang.org/cl/32754 mentions this issue.

gopherbot pushed a commit that referenced this issue Nov 6, 2016
cmd/vet: parallelize tests
Was 2.3 seconds. Now 1.4 seconds.

Next win would be not running a child process and refactoring main so
it could be called from tests easily. But that would also require
rewriting the errchk written in Perl. This appears to be the last user
of errchk in the tree.

Updates #17751

Change-Id: Id7c3cec76f438590789b994e756f55b5397be07f
Reviewed-on: https://go-review.googlesource.com/32754
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
@gopherbot

This comment has been minimized.

Copy link

commented Nov 7, 2016

CL https://golang.org/cl/32850 mentions this issue.

gopherbot pushed a commit that referenced this issue Nov 7, 2016
cmd/go: parallelize some tests
Cuts tests from 35 to 25 seconds.

Many of these could be parallel if the test runner were modified to
give each test its own workdir cloned from the tempdir files they
use. But later. This helps for now.

Updates #17751

Change-Id: Icc2ff87cca60a33ec5fd8abb1eb0a9ca3e85bf95
Reviewed-on: https://go-review.googlesource.com/32850
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

@rsc rsc added this to the Unplanned milestone Jan 4, 2017

@josharian

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

I'm curious whether concurrent compilation in 1.9 helps much here.

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Sep 13, 2017

I'm curious whether concurrent compilation in 1.9 helps much here.

The initial build transient seems to saturate a bit better, and a few of the tests spike to saturation, but the overall utilization is still pretty similar.

From a clean build at tip:
chart

FiloSottile pushed a commit to FiloSottile/go that referenced this issue Oct 12, 2018
all: sprinkle t.Parallel on some slow tests
I used the slowtests.go tool as described in
https://golang.org/cl/32684 on packages that stood out.

go test -short std drops from ~56 to ~52 seconds.

This isn't a huge win, but it was mostly an exercise.

Updates golang#17751

Change-Id: I9f3402e36a038d71e662d06ce2c1d52f6c4b674d
Reviewed-on: https://go-review.googlesource.com/32751
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
FiloSottile pushed a commit to FiloSottile/go that referenced this issue Oct 12, 2018
all: sprinkle t.Parallel on some slow tests
I used the slowtests.go tool as described in
https://golang.org/cl/32684 on packages that stood out.

go test -short std drops from ~56 to ~52 seconds.

This isn't a huge win, but it was mostly an exercise.

Updates golang#17751

Change-Id: I9f3402e36a038d71e662d06ce2c1d52f6c4b674d
Reviewed-on: https://go-review.googlesource.com/32751
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@rsc

This comment has been minimized.

Copy link
Contributor

commented May 16, 2019

When I run "time all.bash" now I get

real 5m26.407s
user 24m59.594s
sys 11m20.201s

on my 6-core 12-hyperthread 2018 MacBook Pro.

That looks like (user+sys)/real = (1499+680)/326 = 6.7 cores used on average during all.bash.

Going to call that fixed.
Also note that in the original report it says 6-core but it may well be 3-core 6-hyperthread, in which case the observed (853+531)/535 = 2.6X is not too bad.

@rsc rsc closed this May 16, 2019

@rsc rsc added help and removed help wanted labels May 16, 2019

@bcmills

This comment has been minimized.

Copy link
Member Author

commented May 16, 2019

FWIW, the chart (measured by mpstat 1) still shows spotty utilization as before, although the trough is less severe than it was.

This is on a non-virtualized Xeon E5-1650 v4 (6 cores, 12 hardware threads, and I'm not sure whether HT is enabled).

I expect the initial bootstrap phase not to parallelize very well (because it's using Go 1.4), and the initial build and the tail end of the tests do seem to parallelize well, but the ordinary package tests in the middle still don't even come close to saturating.

all bash CPU saturation

@bcmills bcmills removed help labels May 16, 2019

@bcmills

This comment has been minimized.

Copy link
Member Author

commented May 16, 2019

time for the above run reported:

real    5m32.000s
user    15m8.107s
sys     2m58.300s

That's only ~3 cores saturated on average.

@bcmills bcmills added the ToolSpeed label May 16, 2019

@bcmills

This comment has been minimized.

Copy link
Member Author

commented May 16, 2019

on my 6-core 12-hyperthread 2018 MacBook Pro.

We've noticed that cmd/go performance on macOS tends to be a lot slower in general (#28739). I wonder whether that slowness allows make.bash to saturate more easily.

It's also possible that the thermal limits of a laptop make saturation easier: laptops tend to have lower sustained turbo frequencies, so perhaps parallel operations that complete quickly with a higher turbo end up doing more of that work in parallel at a lower clock speed.

@josharian

This comment has been minimized.

Copy link
Contributor

commented May 16, 2019

@bcmills what is going on at each of the interesting transition points in that graph? (The tracing charts of #15736 would be really helpful here, but maybe the -debug-actiongraph timing information added by @rsc will help clarify as well) Are we spending a bunch of time waiting for stragglers? Or are we not starting enough compilations/tests in parallel? Should cmd/dist be doing higher level parallelization? Or...?

@rsc

This comment has been minimized.

Copy link
Contributor

commented May 16, 2019

export GOBUILDTIMELOGFILE=/tmp/time.log before all.bash.

@bcmills bcmills reopened this May 16, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.