Provide a way to compile binaries with a split runtime: a runtime stub compiled with the user's binary and a separately-developed runtime server. This would allow for alternative implementations of the runtime, possibly running on remote machines, which would enable remote debugging and provide a building block for distributed execution.
Distributed computing often involves expressing conceptually simple algorithms within a morass of plumbing.
Go's concurrency primitives mean a Go program already expresses the concurrent nature of a program. Go's default of static linking means binaries can already migrate between machines without needing extensive dependency preparation.
Go is thus uniquely positioned to automate many of the boilerplate tasks involved in running distributed code.
While this is a broader endeavor, an initial step might be enabling the compiler and runtime to automatically use remote resources for existing Go programs. This is the scope of the current proposal.
The current runtime assumes the binary it is executing is colocated with the machine and operating system that are providing the underlying resources:
External inspection of a running Go process requires the user's code to interrogate the
Thus externalizing runtime functions cannot be done at the library level and modifications to the runtime itself are required.
Create a new build mode that divides the services of the runtime across two processes:
The runtime stub would be implemented as part of the go compiler.
A passthrough runtime server would be provided as part of the
The API between runtime stub and runtime server would be standardized and published so that developers can implement their own runtime servers.
User code would be compiled for this arrangement via a flag:
$ go build -buildmode=remote main.go
The resulting binary will not be able to run directly on the local machine since it contains only a runtime stub.
$ ./main
panic: remote binary requires runtime server
Developers write their own runtime servers that manage execution of Go processes over the API:
$ go build -buildmode=remote main.go
$ start tracelogger -port=11235
$ ./main -go.remote=localhost:11235
Service providers can implement runtime servers on reliable distributed infrastructure:
$ GOARCH=amd64 go build -buildmode=remote main.go
$ ./main -go.remote=goruntime.myproject.cloud.com
Runtime server implementations determine what resources the user's code will 'see,' e.g. local CPU information, local + remote memory, remote filesystems.
TODO: environment var?
$ go build -buildmode=remote main.go
$ go remote ./main
In this case, the binary and its runtime server are run on the same machine, producing the same result as would be produced with:
$ go build main.go
$ ./main
$ go remote -runtime=myhost.com:11235
Runtime servers are responsible for providing remote memory and OS resources to running processes and stepping through their code (as a debugger would).
This means runtime servers are also able to migrate running goroutines from an existing process to new uninitialized instances of the same binary, possibly on a different host:
The original developer need only ensure they have structured their program using the standard concurrency primitives provided by Go and the remote runtime can migrate workloads as needed.
A new standard library package
When a program is compiled with
Concurrency escape analysis
With a remote runtime, there are three possible places for variable allocation:
Just as escape analysis is used today to make the stack vs. heap decision, when building a remote binary the compiler computes the lexical closure of each goroutine and determines which variables are referenced in more than one goroutine or which addresses escape between them. These variables are assigned 'remote' allocation via the runtime server, and all accesses to them are converted to get/set API calls to the runtime server.
Built-in data types such as slices, maps, channels, or their underlying data structures may receive remote allocation. This means the runtime server must implement these built-in types and provide appropriate methods for manipulating them.
Accesses to external operating system resources (e.g. file descriptors) are always translated into calls to the runtime server.
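The get/set rewriting described above could look roughly like the following. This is a minimal sketch, not the proposal's API: the `RemoteStore` interface, the `memStore` in-process fake, and the `"sumSquares.total"` handle are all hypothetical names chosen for illustration, and the real wire protocol is left unspecified by the proposal.

```go
package main

import (
	"fmt"
	"sync"
)

// RemoteStore is a hypothetical stand-in for the runtime server's variable
// API. The proposal does not define this interface.
type RemoteStore interface {
	Get(id string) int
	Set(id string, v int)
}

// memStore is an in-process fake so the sketch runs without a network.
type memStore struct {
	mu   sync.Mutex
	vars map[string]int
}

func newMemStore() *memStore { return &memStore{vars: make(map[string]int)} }

func (s *memStore) Get(id string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.vars[id]
}

func (s *memStore) Set(id string, v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.vars[id] = v
}

// sumSquares mimics what a compiler might emit for source code in which a
// variable "total" is written by one goroutine and read by another. Because
// "total" is shared, every access to it becomes a Set or Get call.
// The "done" channel would also be remote under the proposal; it is kept
// local here only to keep the sketch short.
func sumSquares(store RemoteStore, a, b int) int {
	const totalID = "sumSquares.total" // hypothetical variable handle
	store.Set(totalID, 0)
	done := make(chan struct{})
	go func(a, b int) {
		sq := a*a + b*b // referenced by one goroutine only: stays local
		store.Set(totalID, sq)
		close(done)
	}(a, b)
	<-done
	return store.Get(totalID)
}

func main() {
	fmt.Println(sumSquares(newMemStore(), 3, 4)) // prints 25
}
```

Passing `a` and `b` as arguments gives the goroutine its own copies, so only `total` needs remote allocation in this example.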
The runtime stub instruments each goroutine to track CPU time, memory pressure and runtime API usage along each code block. This data is used in any migration decisions the runtime server may make.
The runtime server can read the state of a goroutine (program counter, stacks, memory containing pointers), launch another instance of the remote binary, initialize it with the given state, and resume just that one goroutine in the new process. Concurrency escape analysis ensures shared variables required by the migrated goroutine will have already been allocated on the runtime server and that all accesses to those variables were re-written into API calls to the runtime server.
As an optimization, the compiler can identify code blocks that access only variables and resources not allocated on the runtime server. These 'local blocks' can be executed in a single step from the runtime server, rather than stepping through individual instructions.
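The migration hand-off described in this section (read a goroutine's state, ship it, resume it in a fresh instance) might be sketched as follows. The `GoroutineState` type and its fields are assumptions made for illustration, and `encoding/gob` is simply a convenient stand-in for whatever wire format a real runtime server would use. Real stacks contain pointers that would need relocation, which this sketch ignores.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// GoroutineState is a hypothetical snapshot of one goroutine. The proposal
// lists program counter, stacks, and pointer-containing memory; the exact
// layout here is illustrative only.
type GoroutineState struct {
	PC     uint64   // where to resume execution
	Stack  []byte   // raw stack contents
	Remote []string // handles of remote variables the goroutine uses
}

// snapshot serializes the state so it can be sent to another instance.
func snapshot(st GoroutineState) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(st); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// restore rebuilds the state inside the newly launched instance.
func restore(data []byte) (GoroutineState, error) {
	var st GoroutineState
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&st)
	return st, err
}

func main() {
	src := GoroutineState{
		PC:     0x4a10,
		Stack:  []byte{1, 2, 3},
		Remote: []string{"main.total"},
	}
	wire, err := snapshot(src)
	if err != nil {
		panic(err)
	}
	dst, err := restore(wire)
	if err != nil {
		panic(err)
	}
	fmt.Printf("resume at %#x with %d stack bytes and %d remote handles\n",
		dst.PC, len(dst.Stack), len(dst.Remote))
}
```

Only handles to remote variables travel with the goroutine; their values stay on the runtime server, which is what makes the snapshot small.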
Open issues (if applicable)
This is an interesting idea but it's hard to see a route forward with the current runtime. If you can build a working prototype that would be interesting to share.
I will say this is a big change and is unlikely to land in the core, but it is an interesting proposal that I'd love to see explored, and thanks for doing such a good job writing it up.
Closing for now, but with encouragement.
I'd like to know exactly what parts of the runtime will be externalized. The proposal only details how to use an external runtime but gives no concrete definition of what a remote runtime is. For example, is the garbage collector local or remote? What about the goroutine scheduler?
Thanks for the encouragement. I'll give some thought to a suitable subset for prototyping.
@minux I don't have all the details, but here's what I think follows from the above:
All goroutine creation (including the first one) happens via the runtime server. The runtime server assigns goroutines to runtime stubs. Stubs are responsible for scheduling assigned goroutines to their local OS threads. In addition to the usual blocking events for goroutines, the goroutine would also block anytime it accesses a variable with remote allocation while it waits for the API call to the server to complete.
Garbage collection for stack and heap variables (which the compiler will have previously determined to be local to a goroutine) is the responsibility of the runtime stub. Runtime stubs also garbage collect remote variables for their own slice of the universe (perhaps using proxies stored on the heap). When a remote variable is garbage collected on a stub, the stub reports this to the server. The server (via the compiler) knows which goroutines (and thus which runtime stubs) might access which remote variables. It can thus determine when all possible runtime stubs have garbage collected that variable, and the server can garbage collect the variable itself.
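The server-side bookkeeping for this could be as simple as tracking, per remote variable, which stubs have not yet reported a collection. The sketch below is one possible shape, with hypothetical names throughout (`releaseTracker`, `Register`, `Release`); it is single-threaded for brevity and would need locking in a real server.

```go
package main

import "fmt"

// releaseTracker is hypothetical server-side state: for each remote variable
// handle, the set of stubs that may still hold a reference to it.
type releaseTracker struct {
	pending map[string]map[string]bool
}

func newReleaseTracker() *releaseTracker {
	return &releaseTracker{pending: make(map[string]map[string]bool)}
}

// Register records which stubs might access a variable. In the scheme
// described above this set would come from the compiler's analysis.
func (t *releaseTracker) Register(varID string, stubs ...string) {
	holders := make(map[string]bool, len(stubs))
	for _, s := range stubs {
		holders[s] = true
	}
	t.pending[varID] = holders
}

// Release records that a stub has collected its local reference. It returns
// true once no stub remains, meaning the server may collect the variable.
func (t *releaseTracker) Release(varID, stub string) bool {
	holders, ok := t.pending[varID]
	if !ok {
		return false // unknown or already collected
	}
	delete(holders, stub)
	if len(holders) > 0 {
		return false
	}
	delete(t.pending, varID)
	return true
}

func main() {
	tr := newReleaseTracker()
	tr.Register("main.total", "stubA", "stubB")
	fmt.Println(tr.Release("main.total", "stubA")) // false: stubB still holds it
	fmt.Println(tr.Release("main.total", "stubB")) // true: server can collect
}
```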
It definitely sounds like an interesting idea.
I would ask a higher-level question - what is the programming model? What is a "runtime service"? Is that just scheduling and memory allocation? What if I receive on a socket? What if I write to a file? How can I ever hope to write something performant? What if the remote runtime service is unavailable? I'm paying a remote operation for each runtime service, what am I getting in return?
Scheduling, memory allocation and I/O are indeed the primary responsibilities of the runtime server, though I see these more as hooks rather than services themselves.
re: performance, on any cloud infrastructure your IO is already running through multiple layers of RPCs. The runtime server can be co-located with binaries on the same machine, rack, or at the very least the same cloud service but importantly they need not be just another single process. Runtime services can be internally arbitrarily complex, including running as a distributed and transparently redundant global service. Runtime servers can take advantage of any internal infrastructure (e.g. RDMA NICs) without exposing its complexity to customers. In addition, runtime servers can be silently upgraded without changing (or perhaps even stopping) customer code. So I suspect one more layer of RPCs could be made performant enough for most non-concurrent workloads and very much better for concurrent ones.
Sockets and files may not be amenable to concurrency. Scaling sockets requires load balancers, which necessarily take us out of the world of a single Go binary. But service providers can make load balanced endpoints more natively available to users through libraries:
and their implementation can spawn goroutines to which their runtime can then route packets.
The infrastructure could manipulate the running process in the usual ways (spreading out CPU, I/O and memory load) but also in novel ways (clustering goroutines that communicate a lot together on the same machine) or even just by experimenting and learning about a given binary and its workload. The infrastructure could also respond to external signals like the diurnal cycle or geographic traffic patterns by firing up "Asia goroutines" or "Europe goroutines," all without any burden on the developer. The abstraction of a single process containing goroutines is preserved, it just happens to be running on a planet-scale machine.
File concurrency might require record-structured or KV data, but the right solution here is likely some distributed database service which, like load balancing, is outside the scope of the language and runtime.
One last benefit is reliable execution. All computation of a goroutine that happens between calls to the runtime server has by definition no externally visible effect. In other words each call to the runtime server is a chance to "checkpoint" a running goroutine (stacks, heap, PC--likely not a lot of data). If a machine fails, the goroutines running on it could be transparently restarted elsewhere. This could also happen across runtime server calls that can be implemented with transactions and playback. As always this would be invisible to the developer, their binary just runs forever.
To summarize, the (long-term) benefit is accessing the performance, reach, and redundancy of a modern distributed system with few code changes and almost no new concepts to learn. The infrastructure has the hooks into your program to measure and scale it how it wants, and as the infrastructure gets better your program does too.
Is it your goal to move goroutines that belong to one process to another physical machine with the help of the proposed runtime servers? That will be a very hard research problem for distributed computing. Or is it your goal to stub out, say, network access from Go programs, so that, say, service providers could provide a proprietary "socket" implementation for clients at the application level? If that's the case, my question is: does it have to be tightly integrated with the runtime? Can it be implemented as a set of packages that replaces the existing net, net/http packages? AppEngine does just that. Anyway, I think you proposed a solution without first giving specifics of what you're trying to achieve with the solution.