Description
Background
Due to GC shape requirements, storing scalar (non-pointer) values into any interface-typed variable forces the value to be stored indirectly, usually allocated on the heap. In some applications, this can lead to many unexpected allocations and extraordinary load on the allocator and garbage collector, causing significant performance degradation. In the worst case, this could be a DoS vector for services that are not extensively optimized, especially those using packages like image or encoding/json.
Interface values are concretely represented as two distinct types: runtime.iface for interfaces with non-empty method sets and runtime.eface for interface{}.
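The allocation behavior described above can be observed with a small program. This is only a sketch: the helper names intAllocs and ptrAllocs are made up for illustration, and it assumes the current gc toolchain, where a non-constant int above the runtime's small-value table is boxed on the heap while a pointer is stored directly.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so that every value stored in it escapes.
var sink interface{}

// intAllocs reports the average number of heap allocations per store of a
// non-constant int into an interface value.
func intAllocs() float64 {
	n := 1000 // start above the small values the runtime can serve without allocating
	return testing.AllocsPerRun(100, func() {
		n++
		sink = n
	})
}

// ptrAllocs does the same for a pointer, which fits the data word directly.
func ptrAllocs() float64 {
	p := new(int)
	return testing.AllocsPerRun(100, func() {
		sink = p
	})
}

func main() {
	fmt.Println("int stored in interface:    ", intAllocs(), "allocs/op")
	fmt.Println("pointer stored in interface:", ptrAllocs(), "allocs/op")
}
```

The int case is the one this proposal targets; the pointer case already avoids the allocation.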
Proposal
In order to reduce allocations in programs using small types in interfaces, I propose adding a value of GOEXPERIMENT, such as GOEXPERIMENT=largeiface, to change runtime.iface and runtime.eface to the following:
type iface2 struct {
	tab  *itab
	data unsafe.Pointer
	sdat [2]uintptr // scalar data
}

type eface2 struct {
	_type *_type
	data  unsafe.Pointer
	sdat  [2]uintptr // scalar data
}
Then, whenever a type contains no more than one pointer and two scalar words, in any order, among any number of fields plus padding, values of that type may be copied into an iface2 or eface2 value, without being allocated on the heap. If a type contains more than one pointer or more than two scalar words, then only pointers to values of that type are stored when assigned to interface-typed variables. This extends the current behavior, which is the same for zero rather than two scalar words.
Note that these types are named differently from the existing ones. The names iface and eface would not exist in the runtime while the GOEXPERIMENT is enabled, and iface2 and eface2 would not exist while it is disabled. This improves maintainability by ensuring the correct name for the experiment setting is always used.
Examples
With this proposal, the following types would become directly assignable to interface values on all supported targets:
int
int64
string
[]T // for any type T
struct {
	b [8]byte
	p *T
}
// color.(N)RGBA64
struct {
	R, G, B, A uint16
}
// assuming unsafe.Alignof(new(T)) == unsafe.Sizeof(uintptr(0))
// and unsafe.Sizeof(thistype{}) % unsafe.Alignof(new(T)) == 0
struct {
	a uint8
	p *T
	b uint8
}
The following types would remain assignable only indirectly to interface values:
interface{} // too many pointers
[2]*T // too many pointers
// reflect.SliceHeader; too many scalar words
struct {
	Data uintptr
	Len  int
	Cap  int
}
// too many scalar words with padding,
// assuming the compiler never reorders struct fields
struct {
	a uint8
	u uintptr
	b uint8
}
Assignment combinations
This section assumes that the compiler never reorders struct fields.
There are two possible approaches to implement transfers of fields between iface2 (eface2) values and dynamic values, in order to support fields in any order. The first is to add a new uint8 field to runtime._type with three two-bit fields describing whether each successive data field of the iface2 is transferred to the first, second, or third word of the dynamic value, or not transferred at all. This is very simple to implement for both convT2E/I and assertions.
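As an illustration of the first approach, here is one way the three two-bit fields could be packed and used. The type name xferMap, the bit order, and the choice of 0 for "not transferred" are assumptions of this sketch; plain uintptr arrays stand in for the interface fields (data, sdat[0], sdat[1]) and for the words of the dynamic value.

```go
package main

import "fmt"

// xferMap packs three two-bit fields, one per interface data field
// (data, sdat[0], sdat[1]). Each field is 0 for "not transferred" or
// 1, 2, 3 for the first, second, or third word of the dynamic value.
type xferMap uint8

// newXferMap builds a map from the per-field destination codes.
func newXferMap(dst [3]uint8) xferMap {
	var m xferMap
	for i, d := range dst {
		m |= xferMap(d&3) << (2 * i)
	}
	return m
}

// dest returns the zero-based destination word for interface field i,
// or -1 if that field is not transferred.
func (m xferMap) dest(i int) int {
	return int(m>>(2*i)&3) - 1
}

// scatter copies interface fields into the words of a dynamic value,
// as a type assertion would.
func (m xferMap) scatter(fields [3]uintptr) (words [3]uintptr) {
	for i, f := range fields {
		if d := m.dest(i); d >= 0 {
			words[d] = f
		}
	}
	return words
}

// gather is the inverse, as a conversion to an interface would do.
func (m xferMap) gather(words [3]uintptr) (fields [3]uintptr) {
	for i := range fields {
		if d := m.dest(i); d >= 0 {
			fields[i] = words[d]
		}
	}
	return fields
}

func main() {
	// Layout scalar, pointer, scalar: data goes to the second word,
	// sdat[0] to the first, sdat[1] to the third.
	m := newXferMap([3]uint8{2, 1, 3})
	fields := [3]uintptr{0xc000, 10, 20}
	words := m.scatter(fields)
	fmt.Println(words, m.gather(words) == fields)
}
```

Both directions are a fixed three-iteration loop over the same byte, which is the simplicity the first approach is after.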
The second approach is to enumerate the 9 unique assignment combinations and store the appropriate combination either in a new uint8 field or in unused bits of the tflag field. This either uses no additional storage or leaves room for additional data about the permutation, such as whether each scalar data field is a floating-point value for more efficient interaction with the new register ABI.
Note that the unique assignment combinations are, denoting a pointer as P and a scalar word as S: (no assignment, i.e. size zero type), P, S, PS, SP, SS, PSS, SPS, SSP. Adding consideration for floating-point values creates two alternatives for each S, increasing the number of combinations to 24 (1 + 1 + 2 + 2 + 2 + 4 + 4 + 4 + 4). Alternatively, the no-assignment case could be transformed to the P case by storing a reference to runtime.zerobase, reducing the combinations to 8, or to 23 with the floating-point distinction.
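The count of combinations can be checked by brute-force enumeration. The sketch below uses hypothetical helper names (combos, withFloats) and assumes every S word may independently be an integer or a floating-point word. Counted that way, the total with the floating-point distinction is 24 when the zero-size case is included and 23 when it is folded into P.

```go
package main

import "fmt"

// combos lists every ordering of at most one pointer word (P) and at most
// two scalar words (S), including the empty layout of a zero-size type.
func combos() []string {
	var out []string
	var rec func(prefix string, p, s int)
	rec = func(prefix string, p, s int) {
		out = append(out, prefix)
		if p < 1 {
			rec(prefix+"P", p+1, s)
		}
		if s < 2 {
			rec(prefix+"S", p, s+1)
		}
	}
	rec("", 0, 0)
	return out
}

// withFloats counts the layouts when every S may independently be an
// integer word or a floating-point word.
func withFloats(layouts []string) int {
	total := 0
	for _, l := range layouts {
		n := 1
		for _, c := range l {
			if c == 'S' {
				n *= 2
			}
		}
		total += n
	}
	return total
}

func main() {
	c := combos()
	fmt.Printf("%d layouts: %q\n", len(c), c)
	fmt.Println("with int/float distinction:", withFloats(c))
}
```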
Impact
The choice is specifically two scalar words because this is sufficient to avoid allocations for nearly every non-composite type in Go. Booleans, all numeric types except complex128 on targets where uintptr is four bytes, strings, slices, maps, and channels (and function values?) can be stored this way. Of course, qualifying struct and array types are included. Many common implementations of various standard library interfaces, especially color.Color, also fit in this representation.
The fact that these types would no longer force allocations could lead to significant performance improvements within the standard library. There are several alternative implementations of the functionality of packages like fmt, log, and encoding/json which are specifically designed to avoid interfaces for the sake of throughput. This GOEXPERIMENT would likely narrow the gap between the standard library and highly optimized APIs from a few orders of magnitude to a few percent for common uses with small types like int and string.
I propose adding this as a GOEXPERIMENT rather than an outright change for two reasons. First, doubling the size of every interface value may significantly penalize some Go programs, especially those running in environments like cloud functions where available memory may be very small. Second, there are several Go repositories (not an exhaustive search; notably, some repositories show up many times in vendor directories but not in this search) that depend on the current layout of interface values, using unsafe or assembly. GOEXPERIMENT provides mechanisms – build tags and assembly definitions – for such code to be updated to work with the experiment both enabled and disabled.
The cost is that, as a GOEXPERIMENT, this will require a significant amount of parallel maintenance. cmd/compile and other compilers that implement the experiment will need to generate different code for interface assignments and conversions, in addition to detecting eligible types and computing their assignment combinations. The runtime, reflect, and internal/reflectlite will need significant duplicated code paths to handle the different layouts. The implementation of atomic.Value in package sync/atomic will need to be duplicated, and sync.Pool will need some minor duplication. Some third-party packages will require duplicated code if they intend to support the experiment.
As an experiment, the goal should be to collect data about the space–performance tradeoff. We should find which programs benefit most from reduced allocations and garbage collection, and how much benefit they experience. Moreover, we should measure the change in memory usage across many programs and find which programs experience unacceptable increases. If the experiment reveals a result like "CPU usage decreases, and memory usage is within a few percent for almost all applications due to fewer spans devoted to small objects," then it could be promoted to the default case.
Related issues
Open issues that this would (entirely or mostly) resolve:
- database/sql: provide optional way to mitigate convT2E allocations #6918 – Slow performance due to frequent assignments of []byte to eface in database/sql causing many allocations. The issue proposes a new API that database drivers can use to avoid allocations. With this experiment, those assignments instead no longer allocate.
- runtime: remove unnecessary allocations in convT2E #8892 – Avoid allocating for 4-byte scalars assigned to iface/eface on 64-bit platforms by using bit masks in the data pointer. This experiment subsumes that case.
- image: optimize Image.At().RGBA() #15759 – Retrieving pixel data from images causes O(m×n) allocations that are each immediately discarded. This experiment allows all concrete color types in the standard library to be assigned to color.Color without allocating.
- cmd/compile: stack allocate string and slice headers when passed through non-escaping interfaces #23676 – Avoid allocating for string and slice headers assigned to iface/eface through better escape analysis. This experiment allows those headers to be assigned without allocating, regardless of whether the value escapes.
- cmd/compile, runtime: pack info into low bits of interface type pointers #26680 – Reinterpret the low bits of the itab/type pointer of iface/eface values as bit fields describing properties of the value. This experiment subsumes every potential use case described.
- reflect: map iteration does unnecessary excessive allocation for non-pointer types #32424 – Using reflect to iterate over maps with scalar key or element types causes each map value to be allocated. When the key or element type in the map is small, including every type mentioned in the issue, this experiment would prevent those allocations. (Retracted: I forgot that reflect would use Value, not interfaces. 🙂)
- proposal: encoding/json: garbage-free reading of tokens #40128 – The encoding/json decoder "is a garbage factory" because it assigns tokens to the json.Token interface. Every type currently assigned to json.Token would avoid allocations with this experiment.
- image, image/draw: add interfaces for using RGBA64 directly #44808 – Similar to image: optimize Image.At().RGBA() #15759 above, but proposing to add interfaces to image and image/draw to avoid interface allocations explicitly. This experiment would largely obviate the need for those interfaces.
Some related closed issues:
- cmd/gc: make interface updates atomic wrt garbage collector #8405 – The original issue relating to assigning values to interfaces indirectly. Notably, this includes discussion about adding one scalar field to iface and eface.
- runtime: don't allocate when putting a bool into an interface #17725 – Allocation-free assignments of small integers to interfaces. This experiment greatly extends the advantages of this change.
- cmd/compile: don't allocate when putting constant strings in an interface #18704 – Preventing allocations for constants assigned to interface values. This experiment may be able to reclaim the binary size increase introduced by the corresponding CL, since all typed constants would be allocation-free anyway (except complex128 constants on 32-bit targets).