Proposal: remote runtime #17672

vsekhar opened this issue Oct 30, 2016 · 8 comments

@vsekhar vsekhar commented Oct 30, 2016


Provide a way to compile binaries with a split runtime: a runtime stub compiled with the user's binary and a separately developed runtime server. This would allow for alternative implementations of the runtime, possibly running on remote machines, enabling remote debugging and providing a building block for distributed execution.


Distributed computing often involves expressing conceptually simple algorithms within a morass of plumbing.

Go's concurrency primitives mean a Go program already expresses the concurrent nature of a program. Go's default of static linking means binaries can already migrate between machines without needing extensive dependency preparation.

Go is thus uniquely positioned to automate many of the boilerplate tasks involved in running distributed code.

While this is a broader endeavor, an initial step might be enabling the compiler and runtime to automatically use remote resources for existing Go programs. This is the scope of the current proposal.


The current runtime assumes the binary it is executing is colocated with the machine and operating system that are providing the underlying resources:

|--------Process 1--------|
 User code <--> Go runtime <--> OS
|------------Machine 1------------|

External inspection of a running Go process requires the user's code to interrogate the runtime package and expose an API. External control of a running Go process requires changes to gdb or rewriting source code. Even with such techniques, the ability to control a process or provide alternative implementations of the resources it requires is limited.

Thus externalizing runtime functions cannot be done at the library level and modifications to the runtime itself are required.


Create a new build mode that divides the services of the runtime across two processes:

|--------Process 1----------|           |---Process 2--|
 User code <--> runtime stub <--(API)--> runtime server <--> server OS
|--------Machine 1----------|           |----------Machine ?----------|

The runtime stub would be implemented as part of the go compiler.

A passthrough runtime server would be provided as part of the go command (see below).

The API between runtime stub and runtime server would be standardized and published so that developers can implement their own runtime servers.
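As a rough illustration of what a published stub-to-server API could look like, here is a minimal sketch in Go. Everything in it is an assumption made for illustration: the proposal does not define the API, so the names `RuntimeServer`, `VarID`, `Alloc`, `Get`, `Set`, and the in-memory `passthrough` type are hypothetical, and a real server would sit behind a network transport rather than a direct method call.

```go
package main

import (
	"errors"
	"fmt"
)

// VarID identifies a variable allocated on the runtime server (hypothetical).
type VarID uint64

// RuntimeServer is a hypothetical slice of the stub-to-server API.
// The proposal only says the API would be standardized and published.
type RuntimeServer interface {
	Alloc(size int) (VarID, error)
	Get(id VarID) ([]byte, error)
	Set(id VarID, value []byte) error
}

// passthrough is an in-memory stand-in for a passthrough runtime server.
type passthrough struct {
	next VarID
	vars map[VarID][]byte
}

func newPassthrough() *passthrough {
	return &passthrough{vars: make(map[VarID][]byte)}
}

// Alloc reserves size bytes and returns an identifier for them.
func (p *passthrough) Alloc(size int) (VarID, error) {
	if size < 0 {
		return 0, errors.New("negative size")
	}
	p.next++
	p.vars[p.next] = make([]byte, size)
	return p.next, nil
}

// Get returns a copy of the variable's current bytes.
func (p *passthrough) Get(id VarID) ([]byte, error) {
	v, ok := p.vars[id]
	if !ok {
		return nil, errors.New("unknown variable")
	}
	out := make([]byte, len(v))
	copy(out, v)
	return out, nil
}

// Set overwrites the variable's bytes; the size must match the allocation.
func (p *passthrough) Set(id VarID, value []byte) error {
	v, ok := p.vars[id]
	if !ok {
		return errors.New("unknown variable")
	}
	if len(value) != len(v) {
		return errors.New("size mismatch")
	}
	copy(v, value)
	return nil
}

func main() {
	var srv RuntimeServer = newPassthrough()
	id, _ := srv.Alloc(4)
	_ = srv.Set(id, []byte{1, 2, 3, 4})
	got, _ := srv.Get(id)
	fmt.Println(got) // prints [1 2 3 4]
}
```

Keeping the API behind an interface is what would let a developer swap the passthrough server for a tracing or distributed implementation without recompiling user code.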


User code would be compiled for this arrangement via a flag:

$ go build -buildmode=remote main.go


The resulting binary will not be able to run directly on the local machine since it contains only a runtime stub.

$ ./main
panic: remote binary requires runtime server

Developers write their own runtime servers that manage execution of Go processes over the API:

$ go build -buildmode=remote main.go
$ start tracelogger -port=11235
$ ./main -go.remote=localhost:11235

Service providers can implement runtime servers on reliable distributed infrastructure:

$ GOARCH=amd64 go build -buildmode=remote main.go
$ ./main

Runtime server implementations determine what resources the user's code will 'see,' e.g. local CPU information, local + remote memory, remote filesystems.

TODO: environment var?

Local execution

A new go remote command can run a remote binary attached to a simple passthrough runtime server that uses the local machine's resources:

$ go build -buildmode=remote main.go
$ go remote ./main

In this case, the binary and its runtime server are run on the same machine

|--------Process 1----------|           |---Process 2--|
 User code <--> runtime stub <--(API)--> runtime server <--> server OS
|----------------------------Machine 1--------------------------------|

producing the same result as would be produced with:

$ go build main.go
$ ./main

Server-side execution

Running go remote without a local binary causes it to wait for a binary to execute from the runtime server:

$ go remote


Runtime servers are responsible for providing remote memory and OS resources to running processes and stepping through their code (as a debugger would).

This means runtime servers are also able to migrate running goroutines from an existing process to new uninitialized instances of the same binary, possibly on a different host:

              |----Runtime server----|
              | Remote vars          |
              | OS resources         |
                   ^             ^
                   |    (api)    |
                   v             v
|-------Process 1-------|    |-------Process 2-------|
| goroutine 1 (main())  |    | goroutine 2 (func2()) |
|  - program counter    |    |  - program counter    |
|  - stacks             |    |  - stacks             |
|  - heap               |    |  - heap               |
|-----------------------|    |-----------------------|

The original developer need only ensure they have structured their program using the standard concurrency primitives provided by Go and the remote runtime can migrate workloads as needed.


A new standard library package runtime/remote will be added to provide information about the remote execution environment and support the implementation of runtime servers.



When a program is compiled with -buildmode=remote, the compiler packages it with a runtime stub and modifies the program for execution with a remote runtime.

GOOS and GOARCH must match the architecture of the system running the remote binary, not that of the runtime server to which it is attached.

GOOSREMOTE specifies the OS abstraction presented by the runtime server. In the case of a passthrough server, it is simply the OS on which the runtime server is running. Other runtime servers may require different GOOSREMOTE values (or new ones). If not specified, GOOSREMOTE defaults to GOOS.


Before main() or any init() functions are called, the runtime stub will attempt to connect to its runtime server. If it fails, the process panics. If an error occurs during communication with the runtime server, the process panics. If the connection is lost, the process panics.

Concurrency escape analysis

With a remote runtime, there are three possible places for variable allocation:

  • Stack
  • Heap
  • Remote

Just as escape analysis is used today to make the stack vs. heap decision, when building a remote binary the compiler computes the lexical closure of each goroutine and determines which variables are referenced in more than one goroutine or which addresses escape between them. These variables are assigned 'remote' allocation via the runtime server, and all accesses to them are converted to get/set API calls to the runtime server.

Built-in data types such as slices, maps, channels, or their underlying data structures may receive remote allocation. This means the runtime server must implement these built-in types and provide appropriate methods for manipulating them.
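To make the rewrite concrete, the sketch below shows a variable shared by two goroutines and what its accesses might look like after conversion to get/set calls. It is a hand-written approximation under stated assumptions: `varServer`, `Get`, and `Set` are hypothetical, the server is an in-process map rather than a remote service, and the channel is left local for brevity even though the proposal says channels may also receive remote allocation.

```go
package main

import (
	"fmt"
	"sync"
)

// varServer is an in-process stand-in for the runtime server's
// variable store (hypothetical; a real one would be remote).
type varServer struct {
	mu   sync.Mutex
	vars map[string]int
}

func newVarServer() *varServer {
	return &varServer{vars: make(map[string]int)}
}

// Get returns the current value of a remote variable.
func (s *varServer) Get(name string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.vars[name]
}

// Set stores a new value for a remote variable.
func (s *varServer) Set(name string, v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.vars[name] = v
}

// run is the rewritten form of this original program fragment:
//
//	result := 0
//	done := make(chan struct{})
//	go func() { result = 42; close(done) }()
//	<-done
//	return result
//
// result is referenced by two goroutines, so it gets remote allocation.
func run(s *varServer) int {
	s.Set("result", 0) // was: result := 0
	done := make(chan struct{})
	go func() {
		s.Set("result", 42) // was: result = 42
		close(done)
	}()
	<-done
	return s.Get("result") // was: read of result
}

func main() {
	fmt.Println(run(newVarServer())) // prints 42
}
```

Note that separate get and set calls do not make a read-modify-write atomic; a program that needs that would still rely on Go's usual synchronization, exactly as it would without a remote runtime.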

OS resources

Accesses to external operating system resources (e.g. file descriptors) are always translated into calls to the runtime server.


The runtime stub instruments each goroutine to track CPU time, memory pressure and runtime API usage along each code block. This data is used in any migration decisions the runtime server may make.


The runtime server can read the state of a goroutine (program counter, stacks, memory containing pointers), launch another instance of the remote binary, initialize it with the given state, and resume just that one goroutine in the new process. Concurrency escape analysis ensures shared variables required by the migrated goroutine will have already been allocated on the runtime server and that all accesses to those variables were re-written into API calls to the runtime server.

Local blocks

As an optimization, the compiler can identify code blocks that access only variables and resources not allocated on the runtime server. These 'local blocks' can be executed in a single step from the runtime server, rather than stepping through individual instructions.

Open issues (if applicable)

  • Many
@robpike robpike added the Proposal label Oct 31, 2016

@robpike robpike commented Oct 31, 2016

This is an interesting idea but it's hard to see a route forward with the current runtime. If you can build a working prototype that would be interesting to share.

I will say this is a big change and is unlikely to land in the core, but it is an interesting proposal that I'd love to see explored, and thanks for doing such a good job writing it up.

Closing for now, but with encouragement.

@robpike robpike closed this Oct 31, 2016

@minux minux commented Oct 31, 2016

Contributor Author

@vsekhar vsekhar commented Nov 1, 2016

Thanks for the encouragement. I'll give some thought to a suitable subset for prototyping.

@minux I don't have all the details, but here's what I think follows from the above:

All goroutine creation (including the first one) happens via the runtime server. The runtime server assigns goroutines to runtime stubs. Stubs are responsible for scheduling assigned goroutines to their local OS threads. In addition to the usual blocking events for goroutines, the goroutine would also block anytime it accesses a variable with remote allocation while it waits for the API call to the server to complete.

Garbage collection for stack and heap variables (which the compiler will have previously determined to be local to a goroutine) is the responsibility of the runtime stub. Runtime stubs also garbage collect remote variables for their own slice of the universe (perhaps using proxies stored on the heap). When a remote variable is garbage collected on a stub, the stub reports this to the server. The server (via the compiler) knows which goroutines (and thus which runtime stubs) might access which remote variables. It can thus determine when all possible runtime stubs have garbage collected that variable, and the server can garbage collect the variable itself.
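The server-side bookkeeping for that scheme could be sketched roughly like this. It is only a model of the idea, with hypothetical names throughout (`gcServer`, `Track`, `ReportCollected`): the server records which stubs might reference each remote variable and frees the variable once every one of them has reported collecting it.

```go
package main

import "fmt"

// gcServer tracks, per remote variable, the set of stubs that may
// still reference it (hypothetical bookkeeping, not a real API).
type gcServer struct {
	holders map[string]map[string]bool
}

func newGCServer() *gcServer {
	return &gcServer{holders: make(map[string]map[string]bool)}
}

// Track records which stubs might access a variable. In the proposal
// this information would come from the compiler's analysis.
func (s *gcServer) Track(variable string, stubs ...string) {
	set := make(map[string]bool, len(stubs))
	for _, id := range stubs {
		set[id] = true
	}
	s.holders[variable] = set
}

// ReportCollected is called when a stub has garbage collected its
// reference. It returns true if the server freed the variable.
func (s *gcServer) ReportCollected(variable, stub string) bool {
	set, ok := s.holders[variable]
	if !ok {
		return false // unknown or already freed
	}
	delete(set, stub)
	if len(set) > 0 {
		return false // other stubs may still reference it
	}
	delete(s.holders, variable)
	return true
}

func main() {
	s := newGCServer()
	s.Track("counter", "stub-A", "stub-B")
	fmt.Println(s.ReportCollected("counter", "stub-A")) // prints false
	fmt.Println(s.ReportCollected("counter", "stub-B")) // prints true
}
```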


@randall77 randall77 commented Nov 1, 2016

It definitely sounds like an interesting idea.
By "interesting", I think I mean a combination of "wacky way-out-there idea" and "potentially revolutionary if it could work". I agree with Rob, it seems too researchy to consider adding it to the current runtime at the moment. We're too real-world at this point to consider as a proposal something this radical.

I would ask a higher-level question - what is the programming model? What is a "runtime service"? Is that just scheduling and memory allocation? What if I receive on a socket? What if I write to a file? How can I ever hope to write something performant? What if the remote runtime service is unavailable? I'm paying a remote operation for each runtime service, what am I getting in return?

Contributor Author

@vsekhar vsekhar commented Nov 1, 2016

Scheduling, memory allocation and I/O are indeed the primary responsibilities of the runtime server, though I see these more as hooks rather than services themselves.

re: performance, on any cloud infrastructure your IO is already running through multiple layers of RPCs. The runtime server can be co-located with binaries on the same machine, rack, or at the very least the same cloud service but importantly they need not be just another single process. Runtime services can be internally arbitrarily complex, including running as a distributed and transparently redundant global service. Runtime servers can take advantage of any internal infrastructure (e.g. RDMA NICs) without exposing its complexity to customers. In addition, runtime servers can be silently upgraded without changing (or perhaps even stopping) customer code. So I suspect one more layer of RPCs could be made performant enough for most non-concurrent workloads and very much better for concurrent ones.

Sockets and files may not be amenable to concurrency. Scaling sockets requires load balancers which necessarily takes us out of the world of a single Go binary. But service providers can make load balanced endpoints more natively available to users through libraries:

runtimeservicelib.ListenAndServe("my_cloud_endpoint_id", handler)

and their implementation can spawn goroutines to which their runtime can then route packets.

The infrastructure could manipulate the running process in the usual ways (spreading out CPU, I/O and memory load) but also in novel ways (clustering goroutines that communicate a lot together on the same machine) or even just by experimenting and learning about a given binary and its workload. The infrastructure could also respond to external signals like the diurnal cycle or geographic traffic patterns by firing up "Asia goroutines" or "Europe goroutines," all without any burden on the developer. The abstraction of a single process containing goroutines is preserved, it just happens to be running on a planet-scale machine.

File concurrency might require record-structured or KV data, but the right solution here is likely some distributed database service which, like load balancing, is outside the scope of the language and runtime.

One last benefit is reliable execution. All computation of a goroutine that happens between calls to the runtime server has by definition no externally visible effect. In other words each call to the runtime server is a chance to "checkpoint" a running goroutine (stacks, heap, PC--likely not a lot of data). If a machine fails, the goroutines running on it could be transparently restarted elsewhere. This could also happen across runtime server calls that can be implemented with transactions and playback. As always this would be invisible to the developer, their binary just runs forever.
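A toy model of the checkpoint idea, assuming a goroutine whose only state is a loop index and a running sum: each call to the server saves that state, and after a simulated crash the work resumes from the last saved checkpoint. The names `checkpoint`, `ckptServer`, `Call`, and `runFrom` are hypothetical, and real goroutine state (stacks, heap, program counter) would be far richer than this.

```go
package main

import "fmt"

// checkpoint is the toy goroutine state saved at each server call.
type checkpoint struct {
	next int // next loop index to process
	sum  int // running total so far
}

// ckptServer stands in for a runtime server that persists checkpoints.
type ckptServer struct {
	saved checkpoint
}

// Call models any runtime server call; it doubles as a checkpoint.
func (s *ckptServer) Call(cp checkpoint) {
	s.saved = cp
}

// runFrom sums the integers up to n, starting from the server's last
// checkpoint. If crashAt is reached, it simulates a machine failure
// by returning early with ok == false.
func runFrom(s *ckptServer, n, crashAt int) (sum int, ok bool) {
	cp := s.saved
	for i := cp.next; i <= n; i++ {
		if i == crashAt {
			return 0, false // local state is lost with the machine
		}
		cp.sum += i
		cp.next = i + 1
		s.Call(cp) // no externally visible effect until this point
	}
	return cp.sum, true
}

func main() {
	s := &ckptServer{saved: checkpoint{next: 1}}

	// First attempt fails partway through.
	if _, ok := runFrom(s, 10, 6); !ok {
		fmt.Println("crashed; last checkpoint:", s.saved.next, s.saved.sum)
	}

	// A new instance resumes from the checkpoint and finishes the work.
	sum, _ := runFrom(s, 10, 0)
	fmt.Println(sum) // prints 55
}
```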

To summarize, the (long-term) benefit is accessing the performance, reach, and redundancy of a modern distributed system with few code changes and almost no new concepts to learn. The infrastructure has the hooks into your program to measure and scale it how it wants, and as the infrastructure gets better your program does too.


@minux minux commented Nov 1, 2016

Contributor Author

@vsekhar vsekhar commented Nov 1, 2016

My goal is the former. My hypothesis is that the design of the language and runtime make this possible. Though I'm a little stuck trying to determine how to test this hypothesis without re-implementing the compiler and runtime...


@minux minux commented Nov 1, 2016

@golang golang locked and limited conversation to collaborators Nov 1, 2017