Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/dist: huge increase in runtime for 'dist test' on slow builders after CL 432535 #57734

Closed
millerresearch opened this issue Jan 11, 2023 · 7 comments
Assignees
Labels
NeedsFix The path to resolution is known, but the work has not been done. ToolSpeed
Milestone

Comments

@millerresearch
Copy link
Contributor

CL 432535 caused a 75% increase in total runtime for go tests on plan9-arm builders. I believe the effect will have been similar on other builders with slow filesystems, for example openbsd-arm. Looking at local logs for a raspberry pi in the plan9-arm cluster around the time the CL was merged, I see this [the last column is total runtime in minutes]:

pi4h.1666769752 2022/10/26 16:28:48 79
pi4h.1666812046 2022/10/26 21:42:13 81
pi4h.1666823232 2022/10/27 00:47:48 79
pi4h.1666839373 2022/10/27 06:54:21 80
pi4h.1666850781 2022/10/27 12:57:53 79
pi4h.1666892893 2022/10/27 20:09:07 80
pi4h.1666912765 2022/10/28 04:08:04 82
pi4h.1666972603 2022/10/28 20:01:41 84
pi4h.1667253819 2022/11/01 00:26:54 142
pi4h.1667262442 2022/11/01 02:48:46 140
pi4h.1667270954 2022/11/01 05:09:20 139
pi4h.1667519311 2022/11/04 03:44:27 140
pi4h.1667538040 2022/11/04 07:36:40 76
pi4h.1667572836 2022/11/04 17:02:27 141
pi4h.1667582128 2022/11/04 19:37:00 141
pi4h.1667593858 2022/11/04 23:15:51 164
pi4h.1667603779 2022/11/05 01:38:11 141
pi4h.1667612320 2022/11/05 04:00:40 141
pi4h.1667620868 2022/11/05 06:22:59 140
pi4h.1667631635 2022/11/05 09:23:03 142
pi4h.1667686330 2022/11/07 13:04:59 141
pi4h.1667844527 2022/11/07 20:31:07 142

The increased time comes from this new call in cmd/dist/test.go:

// The cache used by dist when building is different from that used when
// running dist test, so rebuild (but don't install) std and cmd to make
// sure packages without install targets are cached so they are not stale.
goCmd("go", "build", "std", "cmd") // make sure dependencies of targets are cached

(Parenthetically, could someone explain what "packages without install targets" refers to?)

Note that this extra build is performed at the start of every dist test invocation in a test run. Only the first one after the make.bash/make.rc might actually build something; subsequent builds will just amount to a staleness test. On a fast linux machine with aggressive caching of directories and inodes, the staleness test may seem like a "no-op", but at last count it involves 9999 open and 6326 stat calls on 4580 files and directories. On a plan9-arm builder this adds nearly a minute of overhead to each dist test:

cpu% time go test -count 1 container/list
ok  	container/list	0.076s
0.27u 0.24s 3.83r 	 go test -count 1 container/list ...
cpu% time go tool dist test --no-rebuild go_test:container/list

##### Test execution environment.
# GOARCH: arm
# CPU: 
# GOOS: plan9
# OS Version: 2000

##### Testing packages.
ok  	container/list	0.083s

ALL TESTS PASSED (some were excluded)
0.02u 0.09s 59.67r 	 go tool dist test --no-rebuild ...

Because the tests in a run are partitioned (badly, and in my opinion unnecessarily - see #49343), each test run of 275 tests currently involves 131 invocations of dist test - so, potentially more than two hours spent on redundant dist test overhead

Solutions? I would suggest one or more of:

  • use the same cache for building and running dist test
  • using a parameter to dist test to indicate that a build has just been done so no staleness test is needed (maybe - radical idea - just respect --no-rebuild?)
  • stop partitioning the tests for slow builders without helpers
@bcmills
Copy link
Member

bcmills commented Jan 11, 2023

use the same cache for building and running dist test

That is https://go.dev/cl/452679, which I plan to merge when the tree reopens for Go 1.21.

using a parameter to dist test to indicate that a build has just been done so no staleness test is needed

I think that is essentially https://go.dev/cl/452775? (But without a parameter — skipping the staleness check always.)

@bcmills
Copy link
Member

bcmills commented Jan 11, 2023

Parenthetically, could someone explain what "packages without install targets" refers to?

Prior to Go 1.20, all packages in std were installed to locations in $GOROOT/pkg. As of 1.20, unless GODEBUG=installgoroot=all is set, those packages are not installed in $GOROOT— only stored in the build cache like any other non-main package.

@bcmills bcmills added ToolSpeed NeedsFix The path to resolution is known, but the work has not been done. labels Jan 11, 2023
@bcmills bcmills added this to the Go1.21 milestone Jan 11, 2023
@bcmills bcmills self-assigned this Jan 11, 2023
@millerresearch
Copy link
Contributor Author

use the same cache for building and running dist test

That is https://go.dev/cl/452679, which I plan to merge when the tree reopens for Go 1.21.

using a parameter to dist test to indicate that a build has just been done so no staleness test is needed

I think that is essentially https://go.dev/cl/452775? (But without a parameter — skipping the staleness check always.)

Great, thank you! Those changes should both make a big improvement on filesystem-limited builders.

@gopherbot
Copy link

Change https://go.dev/cl/452775 mentions this issue: cmd/dist: skip rebuilding packages during builder testing

@gopherbot
Copy link

Change https://go.dev/cl/452679 mentions this issue: cmd/dist: restore the original GOCACHE before building std and cmd

gopherbot pushed a commit that referenced this issue Jan 26, 2023
The user is likely to run other commands that need these libraries
immediately after they are built.

For #57734.
Updates #56889.

Change-Id: I2a1a234e6031d85f017ee692ea1ace8c6e0e7355
Reviewed-on: https://go-review.googlesource.com/c/go/+/452679
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
gopherbot pushed a commit that referenced this issue Jan 30, 2023
Since packages in "std" no longer have install targets, checking them
for staleness is somewhat meaningless: if they are not cached they
will be rebuilt anyway, and without installed archives against which
we can compare them the staleness check will not detect builder skew.

It would still be meaningful to check "cmd" for staleness, but
(especially on sharded VM-based builders) that is a fairly expensive
operation relative to its benefit. If we are really interested in
detecting builder skew and/or build reproducibility, we could instead
add a "misc" test (similar to "misc/reboot", or perhaps even a part of
that test) that verifies that bootstrapped binaries are reproducible.

For #57734.
Updates #47257.
Updates #56896.

Change-Id: I8683ee81aefe8fb59cce9484453df9729bdc587c
Reviewed-on: https://go-review.googlesource.com/c/go/+/452775
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Austin Clements <austin@google.com>
@bcmills
Copy link
Member

bcmills commented Feb 2, 2023

I believe this should be fixed as of CL 452775, with perhaps an additional boost from CL 464736.

Looking at the dashboard, it also seems to me that the plan9-arm builder is having an easier time keeping up now.

@millerresearch, I'm going to close this out but feel free to reopen if we need to consider other mitigations.

@bcmills bcmills closed this as completed Feb 2, 2023
@millerresearch
Copy link
Contributor Author

millerresearch commented Feb 2, 2023

Thanks, test time on the biggest plan9-arm builder has dropped from about 150 minutes to about 80 minutes. Much better!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. ToolSpeed
Projects
None yet
Development

No branches or pull requests

3 participants