There are times when people need to investigate issues that require a configuration or environment that is hard to reproduce locally, and they use gomote instances for such debugging sessions.
A limitation of the current implementation of this system is that all remote buildlets are lost when cmd/coordinator restarts. This limitation is documented in doc/remote-buildlet.txt:
Currently, if the coordinator dies or restarts, all buildlets are lost.
An unfortunate consequence is that this reduces the window of time during which cmd/coordinator can be re-deployed without disrupting investigative work being done by others.
This can generally be worked around by coordinating with the people who are using gomotes to find a good time for a deploy, but that scales poorly as the number of concurrent investigations grows, especially during business hours.
This is a tracking issue to gauge how much of a problem this is, and to investigate ways to improve the situation if it becomes a bottleneck.
The ideal solution would be a mechanism that makes it possible to redeploy the coordinator without losing gomote sessions, or at least makes it possible to reconnect to them without their state being lost.
Short of that, if finding a deploy window becomes difficult and starts to block other work, perhaps some sort of queue could be arranged so that a future deploy can be scheduled in advance, and people don't start new gomote sessions just as older ones wrap up.
In my experience this isn't yet a big enough problem to warrant spending a lot of time on, but I wanted to share some initial thoughts.
An interim solution could be to break out the gomote ssh proxy endpoint into its own service, as it likely needs to be deployed less frequently than the coordinator itself. This would still require some persistent state to be managed across coordinator deploys, though.