cmd/compile, runtime: GOEXPERIMENT to add two non-pointer words to iface/eface #45494

Open
@zephyrtronium

Background

Due to GC shape requirements, storing a scalar (non-pointer) value into any interface-typed variable forces the value to be stored indirectly, usually allocated on the heap. In some applications, this can lead to many unexpected allocations and extraordinary load on the allocator and garbage collector, causing significant performance degradation. In the worst case, this could be a DoS vector for services that are not extensively optimized, especially those using packages like image or encoding/json.

Interface values are concretely represented as two distinct types: runtime.iface for interfaces with non-empty method sets and runtime.eface for interface{}.

go/src/runtime/runtime2.go

Lines 202 to 210 in 1129a60

type iface struct {
	tab  *itab
	data unsafe.Pointer
}

type eface struct {
	_type *_type
	data  unsafe.Pointer
}
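
The allocation cost described in this section can be observed directly. The following is a minimal sketch, not part of the proposal, that uses testing.AllocsPerRun to compare storing a scalar and storing a pointer in an interface{}. The names sink, scalar, pointer, and the two helper functions are illustrative. Exact counts depend on the compiler and runtime version (the gc toolchain is assumed here, and it avoids the allocation for some small or constant values), so treat the numbers as indicative only.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so that values stored in it escape to the heap.
var sink interface{}

var (
	scalar  = 1 << 20  // assumed large enough that no preallocated small value is reused
	pointer = new(int) // pointer-shaped, so it fits in the data word
)

// allocsStoringScalar reports the average number of heap allocations made
// when a changing non-pointer value is converted to interface{}.
func allocsStoringScalar() float64 {
	return testing.AllocsPerRun(1000, func() {
		scalar++ // keep the value non-constant
		sink = scalar
	})
}

// allocsStoringPointer reports the same measurement for a pointer value.
func allocsStoringPointer() float64 {
	return testing.AllocsPerRun(1000, func() { sink = pointer })
}

func main() {
	fmt.Println("allocations per scalar store: ", allocsStoringScalar())
	fmt.Println("allocations per pointer store:", allocsStoringPointer())
}
```

With the current two-word layout, the scalar store is expected to allocate on each iteration while the pointer store is not; the proposal aims to make the first case behave like the second.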

Proposal

In order to reduce allocations in programs using small types in interfaces, I propose adding a value of GOEXPERIMENT, such as GOEXPERIMENT=largeiface, to change runtime.iface and runtime.eface to the following:

type iface2 struct {
	tab  *itab
	data unsafe.Pointer
	sdat [2]uintptr // scalar data
}

type eface2 struct {
	_type *_type
	data  unsafe.Pointer
	sdat  [2]uintptr
}

Then, whenever a type contains no more than one pointer and two scalar words, in any order, among any number of fields plus padding, values of that type may be copied directly into an iface2 or eface2 value without being allocated on the heap. If a type contains more than one pointer or more than two scalar words, then values of that type are stored indirectly: only a pointer to the value is stored when it is assigned to an interface-typed variable. This extends the current behavior, which applies the same rule with zero scalar words instead of two.

Note that these types are named differently from the existing ones. The names iface and eface would not exist in the runtime while the GOEXPERIMENT is enabled, and iface2 and eface2 would not exist while it is disabled. This improves maintainability by ensuring that code always refers to the layout matching the experiment setting.

Examples

With this proposal, the following types would become directly assignable to interface values on all supported targets:

int
int64
string
[]T // for any type T
struct {
	b [8]byte
	p *T
}

// color.(N)RGBA64
struct {
	R, G, B, A uint16
}

// assuming unsafe.Alignof(new(T)) == unsafe.Sizeof(uintptr(0))
// and unsafe.Sizeof(thistype{}) % unsafe.Alignof(new(T)) == 0
struct {
	a uint8
	p *T
	b uint8
}

The following types would remain assignable only indirectly to interface values:

interface{} // too many pointers
[2]*T // too many pointers

// reflect.SliceHeader; too many scalar words
struct {
	Data uintptr
	Len  int
	Cap  int
}

// too many scalar words with padding,
// assuming the compiler never reorders struct fields
struct {
	a uint8
	u uintptr
	b uint8
}

Assignment combinations

This section assumes that the compiler never reorders struct fields.

There are two possible approaches to implement transfers of fields between iface2 (eface2) values and dynamic values, in order to support fields in any order. The first is to add a new uint8 field to runtime._type with three two-bit fields describing whether each successive data field of the iface2 is transferred to the first, second, or third word of the dynamic value, or not transferred at all. This is very simple to implement for both convT2E/I and assertions.
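
To make the first approach concrete, here is a small simulation of the two-bit encoding. It is a sketch only: the bit order, the use of the value 3 for "not transferred", and the names transferMap, slots, pack, and unpack are assumptions made for illustration, and plain uintptr words stand in for real pointers and scalars.

```go
package main

import "fmt"

// notTransferred marks an interface slot that the dynamic type does not use.
const notTransferred = 3

// transferMap packs three two-bit destinations into one byte. Slot 0 is
// data, slot 1 is sdat[0], and slot 2 is sdat[1]. (Hypothetical encoding.)
type transferMap uint8

func newTransferMap(data, sdat0, sdat1 uint8) transferMap {
	return transferMap(data&3 | (sdat0&3)<<2 | (sdat1&3)<<4)
}

// dest returns the word index of the dynamic value that the given slot
// maps to, or notTransferred.
func (m transferMap) dest(slot int) int {
	return int(m>>(2*uint(slot))) & 3
}

// slots stands in for the data, sdat[0], and sdat[1] fields of iface2.
type slots [3]uintptr

// pack copies the words of a dynamic value into interface slots.
func pack(m transferMap, value []uintptr) slots {
	var s slots
	for i := range s {
		if d := m.dest(i); d != notTransferred {
			s[i] = value[d]
		}
	}
	return s
}

// unpack rebuilds the words of the dynamic value from interface slots.
func unpack(m transferMap, s slots, nwords int) []uintptr {
	value := make([]uintptr, nwords)
	for i, w := range s {
		if d := m.dest(i); d != notTransferred {
			value[d] = w
		}
	}
	return value
}

func main() {
	// struct{ a uint8; p *T; b uint8 }: the pointer is word 1 of the value,
	// so data maps to word 1, sdat[0] to word 0, and sdat[1] to word 2.
	sps := newTransferMap(1, 0, 2)
	s := pack(sps, []uintptr{0xA, 0xB, 0xC})
	fmt.Println(s, unpack(sps, s, 3))
}
```

Both directions use the same table lookup, which is what keeps the conversion and assertion paths simple in this scheme.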

The second approach is to enumerate the 9 unique assignment combinations and store the appropriate combination either in a new uint8 field or in unused bits of the tflag field. Using tflag requires no additional storage, while a new uint8 field leaves room for additional data about the permutation, such as whether each scalar data field is a floating-point value for more efficient interaction with the new register ABI.

Denoting a pointer as P and a scalar word as S, the unique assignment combinations are: (no assignment, i.e. size zero type), P, S, PS, SP, SS, PSS, SPS, SSP. Adding consideration for floating-point values creates two alternatives for each S, increasing the number of combinations to 24 (1 + 1 + 2 + 2 + 2 + 4 + 4 + 4 + 4). Alternatively, the no-assignment case could be transformed to the P case by storing a reference to runtime.zerobase, reducing the 9 basic combinations to 8.
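
The combination counts can be checked by brute force. The sketch below enumerates every ordering with at most one pointer word and at most two non-pointer words; the function name combos and the letter F for a floating-point scalar are illustrative choices, and the empty string stands for the no-assignment case.

```go
package main

import "fmt"

// combos returns every ordering of words drawn from kinds that uses
// at most one 'P' and at most two non-'P' words. The empty string
// represents a zero-size type with nothing to assign.
func combos(kinds string) []string {
	var out []string
	var walk func(prefix string, ptrs, scalars int)
	walk = func(prefix string, ptrs, scalars int) {
		out = append(out, prefix)
		for _, k := range kinds {
			switch {
			case k == 'P' && ptrs < 1:
				walk(prefix+"P", ptrs+1, scalars)
			case k != 'P' && scalars < 2:
				walk(prefix+string(k), ptrs, scalars+1)
			}
		}
	}
	walk("", 0, 0)
	return out
}

func main() {
	plain := combos("PS")      // pointer or scalar
	withFloat := combos("PSF") // scalar split into integer (S) and float (F)
	fmt.Println(len(plain), plain)
	fmt.Println(len(withFloat))
}
```

Counting each scalar word independently as integer or floating-point, this enumeration yields 9 plain combinations and 24 float-aware ones, so a four-bit field covers the first case and a five-bit field the second.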

Impact

The choice is specifically two scalar words because this is sufficient to avoid allocations for nearly every non-composite type in Go. Booleans, all numeric types except complex128 on targets where uintptr is four bytes, strings, slices, maps, and channels (and function values?) can be stored this way. Of course, qualifying struct and array types are included. Many common implementations of various standard library interfaces, especially color.Color, also fit in this representation.

Because these types would no longer force allocations, the change could lead to significant performance improvements within the standard library. There are several alternative implementations of the functionality of packages like fmt, log, and encoding/json which are specifically designed to avoid interfaces for the sake of throughput. This GOEXPERIMENT would likely narrow the gap between the standard library and highly optimized APIs from a few orders of magnitude to a few percent for common uses with small types like int and string.

I propose adding this as a GOEXPERIMENT rather than an outright change for two reasons. First, doubling the size of every interface value may significantly penalize some Go programs, especially those running in environments like cloud functions where available memory may be very small. Second, there are several Go repositories (not an exhaustive search; notably, some repositories show up many times in vendor directories but not in this search) that depend on the current layout of interface values, using unsafe or assembly. GOEXPERIMENT provides mechanisms – build tags and assembly definitions – for such code to be updated to work with the experiment both enabled and disabled.

The cost is that, as a GOEXPERIMENT, this will require a significant amount of parallel maintenance. cmd/compile and other compilers that implement the experiment will need to generate different code for interface assignments and conversions, in addition to detecting eligible types and computing their assignment combinations. The runtime, reflect, and internal/reflectlite will need significant duplicated code paths to handle the different layouts. The implementation of atomic.Value in package sync/atomic will need to be duplicated, and sync.Pool will need some minor duplication. Some third-party packages will require duplicated code if they intend to support the experiment.

As an experiment, the goal should be to collect data about the space–performance tradeoff. We should find which programs benefit most from reduced allocations and garbage collection, and how much benefit they experience. Moreover, we should measure the change in memory usage across many programs and find which programs experience unacceptable increases. If the experiment reveals a result like "CPU usage decreases, and memory usage is within a few percent for almost all applications due to fewer spans devoted to small objects," then it could be promoted to the default case.

Related issues

Open issues that this would (entirely or mostly) resolve:

Some related closed issues:

Labels: NeedsInvestigation, help wanted
