
x/build: improve LUCI builder test sharding strategy for the main Go repository #65814

@mknyszek

Description


The current test sharding strategy for the LUCI builders is to create 4 test shards and distribute the tests listed by `go tool dist test -list` across them via a hash of their names. This makes test execution order and grouping deterministic, which is useful for reproducibility, but it doesn't take into account how long each test takes to run.
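A minimal sketch of name-hash sharding, assuming 32-bit FNV-1a from Go's `hash/fnv` as the hash. The issue does not say which hash the LUCI builders actually use, and the helpers `shardFor` and `assignShards` are hypothetical. The test names are illustrative stand-ins for `go tool dist test -list` output. Note that nothing here looks at run time: a shard that happens to receive several slow tests will simply take longer.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// shardFor returns the shard index for a test name. It hashes the name
// with 32-bit FNV-1a (an assumption for illustration) and reduces the
// result modulo the shard count, so the same name always maps to the
// same shard.
func shardFor(name string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(name)) // hash.Hash.Write never returns an error
	return int(h.Sum32() % uint32(numShards))
}

// assignShards groups test names into numShards buckets by name hash.
// Each bucket is sorted so that the order within a shard is
// deterministic as well.
func assignShards(names []string, numShards int) [][]string {
	shards := make([][]string, numShards)
	for _, n := range names {
		i := shardFor(n, numShards)
		shards[i] = append(shards[i], n)
	}
	for _, s := range shards {
		sort.Strings(s)
	}
	return shards
}

func main() {
	// Illustrative names only; real names come from `go tool dist test -list`.
	names := []string{"go_test:runtime", "go_test:net/http", "go_test:cmd/go", "go_test:reflect", "go_test:os"}
	for i, s := range assignShards(names, 4) {
		fmt.Printf("shard %d: %v\n", i, s)
	}
}
```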

As a result, we've observed differences in shard run times of up to 2x between the longest and shortest shards. The worst cases tend to be on builders where certain tests are disproportionately slow, either because of the build mode or the platform. (The race-mode builders and Windows builders are hit particularly hard.)

There are a few ways to fix this. The easiest is to find a hash that distributes the tests more evenly. This seems fragile at first, but the names produced by `go tool dist test -list` change very infrequently, so it might work well. Another is to weight the tests according to their historical run times on a particular builder and bucket them using a load-balancing scheme. (Probably not recomputed on every run: we still value determinism, so the weights would likely be hard-coded and updated only occasionally.)

We can probably get back a few minutes of build latency for presubmit runs this way.

Metadata

Labels

Builders (x/build issues: builders, bots, dashboards), NeedsFix (the path to resolution is known, but the work has not been done), ToolSpeed

Status

In Progress
