cmd/dist: "dist test" overhead increased by more than x10 from 1.9 to 1.10 on plan9/arm #24300

@millerresearch

The new-style plan9/arm buildlet (running on a Raspberry Pi 3) has become impossibly slow in go1.10 because of increased overhead in "dist test", which is run once for each test shard.

Previously:

cpu% go version
go version go1.9.4 plan9/arm
cpu% time go tool dist test go_test:bufio

##### Testing packages.
ok  	bufio	0.634s

ALL TESTS PASSED (some were excluded)
0.05u 0.32s 14.84r 	 go tool dist test go_test:bufio ...

But now:

cpu% go version
go version go1.10 plan9/arm
cpu% time go tool dist test go_test:bufio

##### Testing packages.
ok  	bufio	0.677s

ALL TESTS PASSED (some were excluded)
0.49u 1.80s 194.42r 	 go tool dist test go_test:bufio ...

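That is an extra 180s or so of real time for each shard (194.42s versus 14.84s), although the bufio test itself still takes well under a second. Multiplied by 50+ shards, this adds about 2.5 hours to the buildlet run time.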

The relevant change appears to be this new code in src/cmd/dist/test.go:

// Complete rebuild bootstrap, even with -no-rebuild.
// If everything is up-to-date, this is a no-op.
....
if !t.listMode {
    goInstall("go", append([]string{"-i"}, toolchain...)...)
    goInstall("go", append([]string{"-i"}, toolchain...)...)
    goInstall("go", "std", "cmd")
    checkNotStale("go", "std", "cmd")
}

Note the comment; this must be a candidate for the most expensive no-op ever.
In the buildlet context, where we have just built everything with make.rc, could we possibly skip this staleness check?
Alternatively, in the context of a single buildlet with no helpers, could we skip the sharding and run all the tests in a single invocation of "dist test" to avoid multiplying the overhead?
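
As a rough illustration of the first suggestion, the guard could be extended to skip the rebuild when running under the build infrastructure. This is only a sketch: it assumes the buildlet environment sets a variable such as GO_BUILDER_NAME that can serve as the marker, and `shouldRebuild` is a hypothetical helper, not a function in cmd/dist.

```go
package main

import (
	"fmt"
	"os"
)

// shouldRebuild reports whether "dist test" should run the rebuild and
// staleness check before testing. builderName is the value of an
// environment variable that the builder is assumed to set; an empty value
// is taken to mean an ordinary developer run. (Hypothetical helper.)
func shouldRebuild(listMode bool, builderName string) bool {
	return !listMode && builderName == ""
}

func main() {
	// Assumption: GO_BUILDER_NAME is non-empty only on a buildlet, where
	// make.rc has just built everything and the check is redundant.
	name := os.Getenv("GO_BUILDER_NAME")
	if shouldRebuild(false, name) {
		fmt.Println("would run goInstall and checkNotStale")
	} else {
		fmt.Println("would skip the rebuild and go straight to the tests")
	}
}
```

A developer running "dist test" by hand would still get the rebuild, so the workflow the comment is protecting stays intact; only the per-shard builder invocations would skip it.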
