x/build/cmd/coordinator: active gomote sessions and coordinator deploys are mutually exclusive #39280

dmitshur · 2020-05-27T17:39:04Z

Sometimes people need to investigate issues that require a configuration or environment that is hard to reproduce locally, and they use gomote instances for those debugging sessions.

A limitation of the current implementation of this system is that all remote buildlets are lost when cmd/coordinator restarts. This limitation is documented in doc/remote-buildlet.txt:

Currently, if the coordinator dies or restarts, all buildlets are lost.

An unfortunate consequence is that this can reduce the window of time when cmd/coordinator can be re-deployed without disrupting investigative work being done by others.

This can generally be worked around by coordinating with people who are using gomotes to find a good time for a deploy, but it scales poorly when there are more concurrent investigations being done, especially during business hours.

This issue is for tracking how much of a problem this is, and for investigating ways to improve the situation if it ends up becoming a bottleneck.

/cc @cagedmantis @toothrot @andybons

dmitshur · 2020-05-27T17:44:09Z

An ultimate solution would be a mechanism that makes it possible to redeploy coordinator without losing gomote sessions, or at least makes it possible to re-connect to them without their state getting lost.

Short of that solution, if finding a deploy window becomes difficult and starts to block other work, then perhaps some sort of queue can be arranged so that a future deploy can be scheduled in advance, and people don't start new gomote sessions just as older ones wrap up.
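To make the queue idea a bit more concrete, here is a minimal sketch of a gate that stops handing out new sessions once a deploy has been scheduled, and reports when the last existing session has ended. Everything in it (`DeployGate`, `ScheduleDeploy`, `StartSession`, `EndSession`, `ReadyToDeploy`, `ErrDeployPending`) is a hypothetical name for illustration; nothing like this exists in cmd/coordinator today.

```go
package main

import (
	"errors"
	"sync"
)

// ErrDeployPending is returned when a new session is requested while a
// deploy is waiting for existing sessions to finish. (Hypothetical name.)
var ErrDeployPending = errors.New("coordinator deploy pending; not starting new gomote sessions")

// DeployGate is a hypothetical helper, not part of x/build. It counts
// active sessions and refuses new ones once a deploy has been scheduled.
type DeployGate struct {
	mu      sync.Mutex
	pending bool // true once a deploy has been scheduled
	active  int  // number of sessions currently in use
}

// ScheduleDeploy marks a deploy as pending. Existing sessions keep
// running, but StartSession fails from this point on.
func (g *DeployGate) ScheduleDeploy() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.pending = true
}

// StartSession registers a new session, or returns ErrDeployPending
// if a deploy is waiting.
func (g *DeployGate) StartSession() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.pending {
		return ErrDeployPending
	}
	g.active++
	return nil
}

// EndSession records that one session has finished.
func (g *DeployGate) EndSession() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.active > 0 {
		g.active--
	}
}

// ReadyToDeploy reports whether a deploy is pending and no sessions remain.
func (g *DeployGate) ReadyToDeploy() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.pending && g.active == 0
}
```

The gate only addresses the "new sessions start just as older ones wrap up" problem. It does nothing to preserve sessions across a restart, so it is a scheduling aid rather than a fix for the underlying limitation.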

In my experience this isn't yet a big enough problem to warrant spending a lot of time on, but I just wanted to share some initial thoughts.

toothrot · 2020-05-27T17:53:11Z

An interim solution could be to break out the gomote ssh proxy endpoint into its own service, as it likely needs to be deployed less frequently than the coordinator itself. This would still require some persistent state to be managed across coordinator deploys, though.
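As a rough illustration of the kind of persistent state that would need to survive a coordinator deploy, here is a minimal sketch of a session registry that can be serialized before shutdown and restored on startup. The `Session` fields, the `Registry` type, and the `Snapshot` and `Restore` functions are all assumptions made for the example; the real coordinator tracks remote buildlets differently, and where the snapshot would be stored is left open.

```go
package main

import (
	"encoding/json"
	"sync"
)

// Session is a hypothetical record of what a proxy would need in order
// to find a buildlet again after a restart. The fields are illustrative.
type Session struct {
	Name         string `json:"name"`          // gomote instance name
	Owner        string `json:"owner"`         // user who created it
	BuildletAddr string `json:"buildlet_addr"` // where the buildlet listens
}

// Registry is a hypothetical in-memory index of sessions, keyed by name.
type Registry struct {
	mu       sync.Mutex
	sessions map[string]Session
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{sessions: make(map[string]Session)}
}

// Add records a session, replacing any existing one with the same name.
func (r *Registry) Add(s Session) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.sessions[s.Name] = s
}

// Lookup returns the session with the given name, if any.
func (r *Registry) Lookup(name string) (Session, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.sessions[name]
	return s, ok
}

// Snapshot encodes all sessions as JSON so they can be written to
// storage that outlives the process.
func (r *Registry) Snapshot() ([]byte, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return json.Marshal(r.sessions)
}

// Restore rebuilds a registry from data produced by Snapshot.
func Restore(data []byte) (*Registry, error) {
	r := NewRegistry()
	if err := json.Unmarshal(data, &r.sessions); err != nil {
		return nil, err
	}
	if r.sessions == nil { // data was JSON "null"
		r.sessions = make(map[string]Session)
	}
	return r, nil
}
```

Restoring the records is only half of the problem: the buildlets themselves would also have to stay alive and reachable while the coordinator is down, which this sketch does not attempt to cover.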

dmitshur added Builders NeedsInvestigation labels May 27, 2020

dmitshur added this to the Unreleased milestone May 27, 2020