x/build: darwin-amd64 trybot waiting for 84+ minutes #23856

Closed
mdempsky opened this issue Feb 15, 2018 · 6 comments

@mdempsky
Member

commented Feb 15, 2018

https://farmer.golang.org/try?commit=99b72794

darwin-amd64-10_11 running 84.4 min

darwin-amd64-10_11 rev 99b72794 (trybot set for Ic3a9667); running; (nil *buildlet.Client), 1h24m22.84109737s ago
  2018-02-15T20:33:47Z checking_for_snapshot 
  2018-02-15T20:34:11Z finish_checking_for_snapshot after 176.3ms
  2018-02-15T20:34:11Z get_buildlet 
  2018-02-15T20:34:11Z wait_static_builder host-darwin-10_11
  2018-02-15T20:34:11Z waiting_machine_in_use 
 +2775.3s (now)

Is this expected? Are the darwin trybots just this heavily overloaded at the moment?

/cc @bradfitz

@gopherbot gopherbot added this to the Unreleased milestone Feb 15, 2018

@gopherbot gopherbot added the Builders label Feb 15, 2018

@mdempsky

Member Author

commented Feb 15, 2018

/cc @andybons

@bradfitz

Member

commented Feb 15, 2018

Expected. @aclements just committed a bazillion CLs, and we only have 20 Mac VMs.

[Image: screen shot 2018-02-15 at 1 45 36 pm]

We have an open bug for a scheduler (#19178) to smartly assign buildlets to builds, with priorities, and that's up soon on my list, as I'm ramping back up to work.
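To illustrate what "assign buildlets to builds, with priorities" could mean in the simplest case, here is a minimal sketch using a priority queue. It is not the design tracked in #19178: the `build` type, the priority scale, the build names, and the `assign` helper are all hypothetical and exist only for this example.

```go
package main

import (
	"container/heap"
	"fmt"
)

// build is a HYPOTHETICAL pending build request (not a coordinator type).
type build struct {
	name     string
	priority int // higher runs first; the scale is made up for illustration
	seq      int // arrival order, used to break ties first-come-first-served
}

// queue implements heap.Interface over pending builds.
type queue []build

func (q queue) Len() int { return len(q) }
func (q queue) Less(i, j int) bool {
	if q[i].priority != q[j].priority {
		return q[i].priority > q[j].priority
	}
	return q[i].seq < q[j].seq
}
func (q queue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *queue) Push(x interface{}) { *q = append(*q, x.(build)) }
func (q *queue) Pop() interface{} {
	old := *q
	n := len(old)
	b := old[n-1]
	*q = old[:n-1]
	return b
}

// assign hands the free buildlets to the highest-priority pending builds
// and returns the chosen build names in the order they were picked.
func assign(pending []build, free int) []string {
	q := make(queue, len(pending))
	copy(q, pending)
	heap.Init(&q)
	var order []string
	for free > 0 && q.Len() > 0 {
		order = append(order, heap.Pop(&q).(build).name)
		free--
	}
	return order
}

func main() {
	pending := []build{
		{name: "post-submit", priority: 1, seq: 0},
		{name: "trybot-A", priority: 2, seq: 1},
		{name: "trybot-B", priority: 2, seq: 2},
	}
	// Only two buildlets are free, so the lower-priority build keeps waiting.
	fmt.Println(assign(pending, 2))
}
```

With two free buildlets this prints `[trybot-A trybot-B]`: both higher-priority builds are served in arrival order and the post-submit build waits. Ranking trybots above post-submit builds is an assumption made for the example, not a policy stated in this thread.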

@bradfitz bradfitz closed this Feb 15, 2018

@mdempsky

Member Author

commented Feb 15, 2018

I see. I saw the build dashboard looked pretty busy, but it seemed odd that darwin-amd64 (which I thought was usually on the fast side of things) was the only hanging trybot.

Is darwin-amd64 the only trybot that can't scale with demand?

@bradfitz

Member

commented Feb 15, 2018

> Is darwin-amd64 the only trybot that can't scale with demand?

Currently, yes. There was also linux-arm in the trybot set, which had a fixed number (50), but those were disabled due to #22748 (first) and #22749 (most recently). I could probably re-enable them and wait until their networking sucks again, or redesign things to not depend on the network as much, which is unfortunate.

We could buy more Macs, or hope that #19178 and #23858 are sufficient.

@aclements

Member

commented Feb 15, 2018

Sorry :(

I'm not sure this is the only problem, though. When I was trybot'ing those changes earlier today the dashboard was pretty quiet and I don't think any trybots other than mine were running, but it still took over two hours for all of the darwin-amd64 trybots to finish. It's only 16 CLs, so shouldn't 20 Mac VMs be enough to handle this plus a bit?

@bradfitz

Member

commented Feb 15, 2018

@aclements, well, each CL consumes 3 Mac VMs for sharding (or up to 4 for Trybots). And some of the VMs are currently statically partitioned into distinct roles (some for macOS 10.8, some for macOS 10.12 Sierra, etc).

We expect only 15 VMs for macOS 10.11, which is what we run for TryBots.

But, ah --- only 7 are connected at the moment, which I see via https://farmer.golang.org/status/reverse.json and the front page of https://farmer.golang.org/
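The numbers above can be turned into a rough back-of-the-envelope estimate. The sketch below assumes the upper bound of 4 VMs per CL, assumes every shard takes about the same time, and ignores other builds competing for the same machines; `vmSlots` and `waves` are hypothetical helper names, not part of the coordinator.

```go
package main

import "fmt"

// vmSlots returns how many VM allocations a batch of CLs asks for,
// given how many VMs each CL's trybot run is sharded across.
func vmSlots(cls, vmsPerCL int) int {
	return cls * vmsPerCL
}

// waves returns how many sequential rounds of work are needed when only
// `connected` VMs are available (ceiling division). It returns -1 when
// there is no capacity at all.
func waves(slots, connected int) int {
	if connected <= 0 {
		return -1
	}
	return (slots + connected - 1) / connected
}

func main() {
	slots := vmSlots(16, 4) // 16 CLs, up to 4 Mac VMs each for trybots
	for _, connected := range []int{15, 7} {
		fmt.Printf("%d VMs connected: %d slots -> %d waves\n",
			connected, slots, waves(slots, connected))
	}
}
```

Under those assumptions, 16 CLs ask for 64 VM slots: 5 waves with the expected 15 VMs, but 10 waves with only 7 connected, so each trybot run can wait roughly twice as long as it would at full capacity.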

We don't alert on that, of course. #22603 and #21315 and #15760 track that.
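As a rough idea of what such an alert could check, here is a minimal sketch that compares connected against expected counts per host type. The JSON shape (`HostTypes`, `Connected`, `Expect`) is an assumption made for illustration and may not match what `reverse.json` actually returns; the sample document is hand-written to mirror the numbers in this thread, and `underCapacity` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// hostStatus and reverseStatus are ASSUMED shapes for the status document;
// the field names are not confirmed against the real endpoint.
type hostStatus struct {
	Connected int `json:"Connected"`
	Expect    int `json:"Expect"`
}

type reverseStatus struct {
	HostTypes map[string]hostStatus `json:"HostTypes"`
}

// underCapacity returns one message per host type that has fewer
// connected machines than expected, sorted for stable output.
func underCapacity(data []byte) ([]string, error) {
	var st reverseStatus
	if err := json.Unmarshal(data, &st); err != nil {
		return nil, err
	}
	var alerts []string
	for name, h := range st.HostTypes {
		if h.Connected < h.Expect {
			alerts = append(alerts,
				fmt.Sprintf("%s: %d of %d connected", name, h.Connected, h.Expect))
		}
	}
	sort.Strings(alerts)
	return alerts, nil
}

func main() {
	// Hand-written sample, not real output from farmer.golang.org.
	sample := []byte(`{"HostTypes":{"host-darwin-10_11":{"Connected":7,"Expect":15}}}`)
	alerts, err := underCapacity(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, a := range alerts {
		fmt.Println(a)
	}
}
```

For the sample input this prints `host-darwin-10_11: 7 of 15 connected`. A real check would fetch the status document periodically and route the messages to whatever alerting channel the team uses.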

@andybons and I need to come up with a plan, now that the go-cloud team has stopped working on it.
