x/build: linux-arm.*-aws TryBots are often stragglers #54679

Open
bcmills opened this issue Aug 25, 2022 · 2 comments
Labels: Builders (x/build issues: builders, bots, dashboards); NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one)
Milestone: Unreleased

bcmills (Member) commented Aug 25, 2022

On several CLs lately, I've seen nearly all of the builders complete within 10 minutes or so, but the linux-arm-aws and linux-arm64-aws TryBots always seem to take 15 minutes or more.

It appears that these builders aren't currently sharded at all:
https://cs.opensource.google/go/x/build/+/master:dashboard/builders.go;l=2417;drc=9ca9dc28e477c63197a65122b31a98146c49d07d
https://cs.opensource.google/go/x/build/+/master:dashboard/builders.go;l=2434;drc=9ca9dc28e477c63197a65122b31a98146c49d07d

Is there something we can do to bring the latency for these builders up to par with the other TryBots? (Perhaps enable sharding?)

(CC @prattmic, who I think has done some recent latency measurements)
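To illustrate why sharding would help, here is a minimal sketch that greedily assigns test packages to N shards (longest-processing-time-first) and reports the resulting wall-clock time, which is the duration of the slowest shard. The package names, durations, and the `shardLPT` helper are hypothetical; this is not how x/build's dashboard actually splits work, it only shows the latency effect of spreading the same work across more machines.

```go
package main

import (
	"fmt"
	"sort"
)

// pkgTime is a hypothetical test package and its run time in seconds.
type pkgTime struct {
	name string
	secs int
}

// shardLPT assigns packages to n shards using the
// longest-processing-time-first heuristic: sort packages by
// descending duration, then give each one to the currently
// least-loaded shard. It returns the per-shard total times.
func shardLPT(pkgs []pkgTime, n int) []int {
	sorted := append([]pkgTime(nil), pkgs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].secs > sorted[j].secs })
	loads := make([]int, n)
	for _, p := range sorted {
		least := 0
		for i := 1; i < n; i++ {
			if loads[i] < loads[least] {
				least = i
			}
		}
		loads[least] += p.secs
	}
	return loads
}

// makespan is the wall-clock time of a sharded run: the slowest shard.
func makespan(loads []int) int {
	worst := 0
	for _, l := range loads {
		if l > worst {
			worst = l
		}
	}
	return worst
}

func main() {
	// Hypothetical durations; not measured data.
	pkgs := []pkgTime{
		{"runtime", 240}, {"net/http", 120}, {"cmd/go", 180},
		{"reflect", 60}, {"math/big", 45}, {"os", 30},
		{"sync", 25}, {"strings", 20},
	}
	for _, n := range []int{1, 2, 4} {
		fmt.Printf("%d shard(s): wall clock %ds\n", n, makespan(shardLPT(pkgs, n)))
	}
}
```

With these made-up numbers the wall-clock time drops from 720s (1 shard) to 365s (2 shards) to 240s (4 shards). Sharding cannot go below the single longest package (240s here), so it complements, rather than replaces, faster instances.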

gopherbot added the Builders label on Aug 25, 2022
gopherbot added this to the Unreleased milestone on Aug 25, 2022
bcmills added the NeedsInvestigation label on Aug 25, 2022
prattmic (Member) commented Aug 25, 2022

I did some measurements a while ago analyzing builds from May 1 to June 10. Paraphrasing the results I found:

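| Rank (by median) | Builder | Median total time (s) | p90 total time (s) |
|---|---|---|---|
| 1 | darwin-arm64-12_0-toothrot | 424 | 1257 |
| ... | | | |
| 6 | linux-arm64-aws | 495 | 820 |
| ... | | | |
| 11 | linux-amd64 | 530 | 853 |
| ... | | | |
| 32 | linux-arm-aws | 668 | 1105 |
| ... | | | |
| 82 | linux-amd64-longtest | 2760 | 3107 |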
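The table reports a median and a p90 per builder. The comment does not say how those percentiles were computed, so the following is only a sketch assuming the nearest-rank definition (other tools interpolate between samples and can give slightly different values). The `percentile` helper and the sample durations are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of xs using the
// nearest-rank method: the smallest sample such that at least p% of the
// samples are less than or equal to it. xs is not modified.
func percentile(xs []float64, p float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	rank := int(math.Ceil(p * float64(len(s)) / 100)) // 1-based rank
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

func main() {
	// Hypothetical per-build total times in seconds for one builder.
	times := []float64{900, 300, 600, 320, 450, 350, 420, 360, 400, 380}
	fmt.Printf("median=%.0fs p90=%.0fs\n", percentile(times, 50), percentile(times, 90))
}
```

For these samples the median is 380s and the p90 is 600s; the single 900s outlier does not affect either, which is why a p99 (as quoted later in the thread) is useful for spotting stragglers.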

Rerunning the analysis on builds since August 1, I get:

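| Rank (by median) | Builder | Median total time (s) | p90 total time (s) |
|---|---|---|---|
| 1 | misc-compile-mips | 264 | 366 |
| ... | | | |
| 18 | linux-amd64 | 385 | 659 |
| ... | | | |
| 27 | linux-arm64-aws | 525 | 867 |
| ... | | | |
| 36 | linux-arm-aws | 720 | 1204 |
| ... | | | |
| 87 | linux-amd64-longtest | 2541 | 2814 |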

I think the main difference is that https://go.dev/cl/419077 moved most GCE builders from 4vCPU/16GB to 16vCPU/64GB instances, speeding them up significantly, while linux-arm-aws and linux-arm64-aws are still on 4vCPU/16GB instances (m6g.xlarge).

There is a 16vCPU/64GB instance type available (m6g.4xlarge), which we could potentially switch to.
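The medians in the two tables above make the gap concrete. This snippet only does the arithmetic on those published figures; it is not part of the original analysis, and the `pctChange` helper is a hypothetical name.

```go
package main

import "fmt"

// medians holds median total build times in seconds, copied from the
// two tables above (May 1 to June 10 vs. builds since August 1).
var medians = []struct {
	builder       string
	before, after float64
}{
	{"linux-amd64", 530, 385},
	{"linux-arm64-aws", 495, 525},
	{"linux-arm-aws", 668, 720},
}

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	for _, m := range medians {
		fmt.Printf("%-16s %4.0fs -> %4.0fs (%+.1f%%)\n",
			m.builder, m.before, m.after, pctChange(m.before, m.after))
	}
}
```

linux-amd64 gets about 27.4% faster, while linux-arm64-aws and linux-arm-aws get about 6.1% and 7.8% slower, so their ranks slip from 6 and 32 to 27 and 36. That is consistent with (though not proof of) the GCE instance upgrade being the main cause.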

prattmic (Member) commented Aug 25, 2022

These are post-submit builds, but the data is similar for TryBots (both ARM builders are actually slightly faster as TryBots).

That said, this data doesn't seem to match your observation that "linux-arm-aws and linux-arm64-aws TryBots always seem to take 15 minutes or more." For linux-arm64-aws, even the p90 case is faster than 15 minutes (p99 is 916s). Maybe something is missing from the analysis (it does include time spent waiting for a buildlet, which is ~60s median for both builders).

heschi added this to Planned in Go Release Team on Aug 30, 2022

3 participants