Proposal: remote runtime #17672

@vsekhar

Abstract

Provide a way to compile binaries with a split runtime: a runtime stub compiled with the user's binary and a separately developed runtime server. This would allow alternative implementations of the runtime, possibly running on remote machines, enabling remote debugging and providing a building block for distributed execution.

Motivation

Distributed computing often involves expressing conceptually simple algorithms within a morass of plumbing.

Go's concurrency primitives mean a Go program already expresses the concurrent nature of a program. Go's default of static linking means binaries can already migrate between machines without needing extensive dependency preparation.

Go is thus well positioned to automate many of the boilerplate tasks involved in running distributed code.

While this is a broader endeavor, an initial step might be enabling the compiler and runtime to automatically use remote resources for existing Go programs. This is the scope of the current proposal.

Background

The current runtime assumes the binary it is executing is colocated with the machine and operating system that are providing the underlying resources:

|--------Process 1--------|
 User code <--> Go runtime <--> OS
|------------Machine 1------------|

External inspection of a running Go process requires the user's code to interrogate the runtime package and expose an API. External control of a running Go process requires changes to gdb or rewriting source code. Even with such techniques, the ability to control a process or provide alternative implementations of the resources it requires is limited.

Thus externalizing runtime functions cannot be done at the library level and modifications to the runtime itself are required.

Proposal

Create a new build mode that divides the services of the runtime across two processes:

|--------Process 1----------|           |---Process 2--|
 User code <--> runtime stub <--(API)--> runtime server <--> server OS
|--------Machine 1----------|           |----------Machine ?----------|

The runtime stub would be implemented as part of the Go compiler.

A passthrough runtime server would be provided as part of the go command (see below).

The API between runtime stub and runtime server would be standardized and published so that developers can implement their own runtime servers.

Build

User code would be compiled for this arrangement via a flag:

$ go build -buildmode=remote main.go

Execution

The resulting binary will not be able to run directly on the local machine since it contains only a runtime stub.

$ ./main
panic: remote binary requires runtime server

Developers write their own runtime servers that manage execution of Go processes over the API:

$ go build -buildmode=remote main.go
$ start tracelogger -port=11235
$ ./main -go.remote=localhost:11235

Service providers can implement runtime servers on reliable distributed infrastructure:

$ GOARCH=amd64 go build -buildmode=remote main.go
$ ./main -go.remote=goruntime.myproject.cloud.com

Runtime server implementations determine what resources the user's code will 'see,' e.g. local CPU information, local + remote memory, remote filesystems.

TODO: environment var? GORUNTIMESERVER=myhost.com:11235

Local execution

A new go remote command can run a remote binary attached to a simple passthrough runtime server that uses the local machine's resources:

$ go build -buildmode=remote main.go
$ go remote ./main

In this case, the binary and its runtime server are run on the same machine

|--------Process 1----------|           |---Process 2--|
 User code <--> runtime stub <--(API)--> runtime server <--> server OS
|----------------------------Machine 1--------------------------------|

producing the same result as would be produced with:

$ go build main.go
$ ./main

Server-side execution

Running go remote without a local binary causes it to wait for a binary to execute from the runtime server:

$ go remote -runtime=myhost.com:11235

Migration

Runtime servers are responsible for providing remote memory and OS resources to running processes and stepping through their code (as a debugger would).

This means runtime servers are also able to migrate running goroutines from an existing process to new uninitialized instances of the same binary, possibly on a different host:

              |----Runtime server----|
              | Remote vars          |
              | OS resources         |
              |----------------------|
                   ^             ^
                   |    (api)    |
                   v             v
|-------Process 1-------|    |-------Process 2-------|
| goroutine 1 (main())  |    | goroutine 2 (func2()) |
|  - program counter    |    |  - program counter    |
|  - stacks             |    |  - stacks             |
|  - heap               |    |  - heap               |
|-----------------------|    |-----------------------|

The original developer need only ensure they have structured their program using the standard concurrency primitives provided by Go and the remote runtime can migrate workloads as needed.

Compatibility

A new standard library package runtime/remote will be added to provide information about the remote execution environment and support the implementation of runtime servers.

Implementation

Compilation

When a program is compiled with -buildmode=remote, the compiler packages it with a runtime stub and modifies the program for execution with a remote runtime.

GOOS and GOARCH must match the architecture of the system running the remote binary, not that of the runtime server to which it is attached.

GOOSREMOTE specifies the OS abstraction presented by the runtime server. In the case of a passthrough server, it is simply the OS on which the runtime server is running. Other runtime servers may require different GOOSREMOTE values (or new ones). If not specified, GOOSREMOTE defaults to GOOS.

Connection

Before main() or any init() functions are called, the runtime stub will attempt to connect to its runtime server. The process panics if the connection attempt fails, if an error occurs during communication with the runtime server, or if the connection is later lost.

Concurrency escape analysis

With a remote runtime, there are three possible places for variable allocation:

  • Stack
  • Heap
  • Remote

Just as escape analysis is used today to make the stack vs. heap decision, when building a remote binary the compiler computes the lexical closure of each goroutine and determines which variables are referenced in more than one goroutine or which addresses escape between them. These variables are assigned 'remote' allocation via the runtime server, and all accesses to them are converted to get/set API calls to the runtime server.

Built-in data types such as slices, maps, channels, or their underlying data structures may receive remote allocation. This means the runtime server must implement these built-in types and provide appropriate methods for manipulating them.
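To make the rewrite concrete, here is a minimal sketch of what the transformation might amount to for a single shared integer. RemoteStore, memStore, and the string-keyed Get/Set methods are hypothetical; the proposal does not define the actual API, and a real compiler would emit these calls itself rather than have the user write them.

```go
package main

import (
	"fmt"
	"sync"
)

// RemoteStore is a hypothetical get/set API for remotely allocated ints.
type RemoteStore interface {
	Get(name string) int
	Set(name string, v int)
}

// memStore is an in-process stand-in for a runtime server.
type memStore struct {
	mu   sync.Mutex
	vars map[string]int
}

func newMemStore() *memStore { return &memStore{vars: make(map[string]int)} }

func (s *memStore) Get(name string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.vars[name]
}

func (s *memStore) Set(name string, v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.vars[name] = v
}

// sumOriginal is what the user writes. total is referenced by two
// goroutines, so it is a candidate for remote allocation; i and the
// goroutine's copy of n stay local.
func sumOriginal(n int) int {
	var total int
	var wg sync.WaitGroup
	wg.Add(1)
	go func(n int) {
		defer wg.Done()
		for i := 1; i <= n; i++ {
			total += i
		}
	}(n)
	wg.Wait()
	return total
}

// sumRewritten shows the same function with accesses to total turned into
// get/set calls. wg is also shared between goroutines and would presumably
// be remote as well; it is left local here to keep the sketch short.
func sumRewritten(store RemoteStore, n int) int {
	store.Set("total", 0)
	var wg sync.WaitGroup
	wg.Add(1)
	go func(n int) {
		defer wg.Done()
		for i := 1; i <= n; i++ {
			store.Set("total", store.Get("total")+i)
		}
	}(n)
	wg.Wait()
	return store.Get("total")
}

func main() {
	fmt.Println(sumOriginal(10), sumRewritten(newMemStore(), 10)) // prints "55 55"
}
```

Note that a separate Get followed by Set is not an atomic read-modify-write. The sketch is safe only because a single goroutine writes total; a real API would need to preserve whatever synchronization the original program relied on.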

OS resources

Accesses to external operating system resources (e.g. file descriptors) are always translated into calls to the runtime server.

Instrumentation

The runtime stub instruments each goroutine to track CPU time, memory pressure and runtime API usage along each code block. This data is used in any migration decisions the runtime server may make.

Migration

The runtime server can read the state of a goroutine (program counter, stacks, memory containing pointers), launch another instance of the remote binary, initialize it with the given state, and resume just that one goroutine in the new process. Concurrency escape analysis ensures shared variables required by the migrated goroutine will have already been allocated on the runtime server and that all accesses to those variables were re-written into API calls to the runtime server.
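The read, transfer, and resume steps imply some serializable description of a goroutine. A minimal sketch of such a snapshot follows, assuming a hypothetical GoroutineState type and using encoding/gob purely as a convenient wire format. Actually capturing or resuming a goroutine is well beyond what this shows.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// GoroutineState is a hypothetical snapshot of the state listed above.
type GoroutineState struct {
	ID    uint64
	PC    uint64            // program counter
	Stack []byte            // raw stack contents
	Heap  map[uint64][]byte // pointer-containing memory, keyed by address
}

// snapshot serializes the state so it can be sent to another process.
func snapshot(st GoroutineState) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(st); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// restore decodes a snapshot in the receiving process.
func restore(b []byte) (GoroutineState, error) {
	var st GoroutineState
	err := gob.NewDecoder(bytes.NewReader(b)).Decode(&st)
	return st, err
}

func main() {
	st := GoroutineState{
		ID:    2,
		PC:    0x45f2a0,
		Stack: []byte{0x01, 0x02, 0x03},
		Heap:  map[uint64][]byte{0xc000010000: {0xff}},
	}
	wire, err := snapshot(st)
	if err != nil {
		panic(err)
	}
	got, err := restore(wire)
	if err != nil {
		panic(err)
	}
	fmt.Printf("goroutine %d would resume at pc=%#x\n", got.ID, got.PC)
}
```

Because both processes run the same binary, a program counter captured in one should be meaningful in the other only if code is loaded at the same addresses; the sketch takes that for granted.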

Local blocks

As an optimization, the compiler can identify code blocks that access only variables and resources not allocated on the runtime server. These 'local blocks' can be executed in a single step from the runtime server, rather than stepping through individual instructions.

Open issues (if applicable)

  • Many
