cmd/compile, runtime: GOEXPERIMENT to add two non-pointer words to iface/eface #45494

Open
zephyrtronium opened this issue Apr 10, 2021 · 4 comments

Comments

@zephyrtronium
Contributor

@zephyrtronium zephyrtronium commented Apr 10, 2021

Background

Due to GC shape requirements, storing a scalar (non-pointer) value in any interface-typed variable forces the value to be stored indirectly, usually in a heap allocation. In some applications, this leads to many unexpected allocations and heavy load on the allocator and garbage collector, causing significant performance degradation. In the worst case, this could be a DoS vector for services that are not extensively optimized, especially those using packages like image or encoding/json.
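
The allocation described above can be observed directly. The following is a minimal sketch, not part of the original proposal: the names sink, storeInt, and storePtr are made up for illustration, and it assumes the gc toolchain, where a pointer is stored in the interface data word as-is while a scalar outside the runtime's small-value cache is copied to the heap.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so that anything stored in it escapes to the heap.
var sink interface{}

// storeInt converts a scalar to an interface. noinline keeps the compiler
// from turning the argument into a constant, which would avoid the allocation.
//
//go:noinline
func storeInt(v int) { sink = v }

// storePtr converts a pointer to an interface; the pointer itself fits in
// the interface's data word.
//
//go:noinline
func storePtr(p *int) { sink = p }

// allocsForInt reports the average number of allocations per scalar store.
func allocsForInt() float64 {
	v := 1 << 20 // large enough to miss the runtime's cache of small integers
	return testing.AllocsPerRun(1000, func() { storeInt(v) })
}

// allocsForPtr reports the average number of allocations per pointer store.
func allocsForPtr() float64 {
	p := new(int)
	return testing.AllocsPerRun(1000, func() { storePtr(p) })
}

func main() {
	fmt.Println("int stored in interface: ", allocsForInt(), "allocs/op")
	fmt.Println("*int stored in interface:", allocsForPtr(), "allocs/op")
}
```

The gap between the two measurements is the cost this proposal aims to remove for small scalar types.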

Interface values are concretely represented as two distinct types: runtime.iface for interfaces with non-empty method sets and runtime.eface for interface{}.

go/src/runtime/runtime2.go

Lines 202 to 210 in 1129a60

type iface struct {
	tab  *itab
	data unsafe.Pointer
}

type eface struct {
	_type *_type
	data  unsafe.Pointer
}

Proposal

In order to reduce allocations in programs using small types in interfaces, I propose adding a value of GOEXPERIMENT, such as GOEXPERIMENT=largeiface, to change runtime.iface and runtime.eface to the following:

type iface2 struct {
	tab  *itab
	data unsafe.Pointer
	sdat [2]uintptr // scalar data
}

type eface2 struct {
	_type *_type
	data  unsafe.Pointer
	sdat  [2]uintptr
}

Then, whenever a type contains no more than one pointer and two scalar words, in any order, among any number of fields plus padding, values of that type may be copied into an iface2 or eface2 value without being allocated on the heap. If a type contains more than one pointer or more than two scalar words, then only pointers to values of that type are stored when assigned to interface-typed variables. This extends the current behavior, which applies the same rule with a limit of zero scalar words rather than two.

Note that these types are named differently from the existing ones. The names iface and eface would not exist in the runtime while the GOEXPERIMENT is enabled, and iface2 and eface2 would not exist while it is disabled. This improves maintainability: code written against the wrong layout for the current experiment setting fails to compile instead of silently misbehaving.

Examples

With this proposal, the following types would become directly assignable to interface values on all supported targets:

int
int64
string
[]T // for any type T
struct {
	b [8]byte
	p *T
}

// color.(N)RGBA64
struct {
	R, G, B, A uint16
}

// assuming unsafe.Alignof(new(T)) == unsafe.Sizeof(uintptr(0))
// and unsafe.Sizeof(thistype{}) % unsafe.Alignof(new(T)) == 0
struct {
	a uint8
	p *T
	b uint8
}

The following types would remain assignable only indirectly to interface values:

interface{} // too many pointers
[2]*T // too many pointers

// reflect.SliceHeader; too many scalar words
struct {
	Data uintptr
	Len  int
	Cap  int
}

// too many scalar words with padding,
// assuming the compiler never reorders struct fields
struct {
	a uint8
	u uintptr
	b uint8
}
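
As a rough way to check the examples above, the eligibility rule can be sketched with reflect. This is only an illustration under stated assumptions, not the compiler's algorithm: the helper names pointerWords and fits are hypothetical, and every non-pointer word (including words that hold only padding) is counted as a scalar word, following the "reinterpret as [n]uintptr" phrasing used later in this thread.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

const wordSize = unsafe.Sizeof(uintptr(0))

// pointerWords counts the words of a value of type t that hold pointers.
func pointerWords(t reflect.Type) int {
	switch t.Kind() {
	case reflect.Ptr, reflect.UnsafePointer, reflect.Chan, reflect.Map, reflect.Func:
		return 1
	case reflect.String, reflect.Slice:
		return 1 // data pointer; length and capacity are scalar words
	case reflect.Interface:
		return 2 // type or itab word plus data word
	case reflect.Array:
		return t.Len() * pointerWords(t.Elem())
	case reflect.Struct:
		n := 0
		for i := 0; i < t.NumField(); i++ {
			n += pointerWords(t.Field(i).Type)
		}
		return n
	default:
		return 0 // booleans and numeric types
	}
}

// fits reports whether v's type has at most one pointer word and at most
// two scalar words, i.e. whether it could be stored directly under the proposal.
func fits(v interface{}) bool {
	t := reflect.TypeOf(v)
	words := int((t.Size() + wordSize - 1) / wordSize) // round up to whole words
	ptrs := pointerWords(t)
	return ptrs <= 1 && words-ptrs <= 2
}

func main() {
	type T struct{}
	fmt.Println("int:", fits(0))
	fmt.Println("string:", fits(""))
	fmt.Println("[]T:", fits([]T(nil)))
	fmt.Println("[2]*T:", fits([2]*T{}))
	fmt.Println("a/p/b struct:", fits(struct {
		a uint8
		p *T
		b uint8
	}{}))
	fmt.Println("a/u/b struct:", fits(struct {
		a uint8
		u uintptr
		b uint8
	}{}))
}
```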

Assignment combinations

This section assumes that the compiler never reorders struct fields.

There are two possible approaches to implement transfers of fields between iface2 (eface2) values and dynamic values, in order to support fields in any order. The first is to add a new uint8 field to runtime._type with three two-bit fields describing whether each successive data field of the iface2 is transferred to the first, second, or third word of the dynamic value, or not transferred at all. This is very simple to implement for both convT2E/I and assertions.
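
A minimal sketch of the first approach follows, assuming one possible encoding that the proposal does not spell out: slot 0 stands for data, slots 1 and 2 for sdat[0] and sdat[1], and each two-bit field holds 0 for "not transferred" or 1 to 3 for the destination word. The type xfer and its methods are hypothetical names, and plain uintptr words stand in for real interface storage.

```go
package main

import "fmt"

// xfer packs three two-bit fields, one per iface2 slot (data, sdat[0], sdat[1]).
// Each field is 0 for "not transferred" or 1, 2, 3 for the destination word
// of the dynamic value.
type xfer uint8

// makeXfer builds a descriptor from one destination per slot.
func makeXfer(dst [3]uint8) xfer {
	var x xfer
	for slot, d := range dst {
		x |= xfer(d&3) << (2 * slot)
	}
	return x
}

// dest returns the destination word (1 to 3) for a slot, or 0 if unused.
func (x xfer) dest(slot int) int { return int(x>>(2*slot)) & 3 }

// toIface gathers the words of a dynamic value into interface slots,
// the direction used when converting a value to an interface.
func (x xfer) toIface(val [3]uintptr) (slots [3]uintptr) {
	for slot := range slots {
		if d := x.dest(slot); d != 0 {
			slots[slot] = val[d-1]
		}
	}
	return slots
}

// toValue scatters interface slots back into the words of the dynamic value,
// the direction used by type assertions.
func (x xfer) toValue(slots [3]uintptr) (val [3]uintptr) {
	for slot := range slots {
		if d := x.dest(slot); d != 0 {
			val[d-1] = slots[slot]
		}
	}
	return val
}

func main() {
	// Layout S P S, e.g. struct{ a uint8; p *T; b uint8 }:
	// data takes word 2, sdat[0] takes word 1, sdat[1] takes word 3.
	x := makeXfer([3]uint8{2, 1, 3})
	val := [3]uintptr{0xA, 0xB, 0xC}
	slots := x.toIface(val)
	fmt.Printf("descriptor=%#02x slots=%#x restored=%#x\n", uint8(x), slots, x.toValue(slots))
}
```

Both directions are a three-iteration loop over the same descriptor, which is the sense in which this approach is simple for conversions and assertions alike.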

The second approach is to enumerate the 9 unique assignment combinations and store the appropriate combination either in a new uint8 field or in unused bits of the tflag field. This either uses no additional storage or leaves room for additional data about the permutation, such as whether each scalar data field is a floating-point value for more efficient interaction with the new register ABI.

Note that the unique assignment combinations are, denoting a pointer as P and a scalar word as S: (no assignment, i.e. size zero type), P, S, PS, SP, SS, PSS, SPS, SSP. Adding consideration for floating-point values creates two alternatives for each S, increasing the number of combinations to 24 (1 + 1 + 2 + 2 + 2 + 4 + 4 + 4 + 4). Alternatively, the no-assignment case could be transformed to the P case by storing a reference to runtime.zerobase, reducing the base combinations to 8.
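
The combination counts can be checked by brute force. The sketch below (function names are hypothetical) enumerates every word sequence with at most one P and at most two S, then counts the float-aware variants by doubling once per S. Counted this way there are 9 base layouts; the float-aware total is 24 when the empty layout is kept as its own case and 23 when it is folded into P.

```go
package main

import (
	"fmt"
	"strings"
)

// layouts enumerates every sequence of words with at most one pointer (P)
// and at most two scalars (S), including the empty sequence.
func layouts() []string {
	var out []string
	var rec func(prefix string, p, s int)
	rec = func(prefix string, p, s int) {
		out = append(out, prefix)
		if p < 1 {
			rec(prefix+"P", p+1, s)
		}
		if s < 2 {
			rec(prefix+"S", p, s+1)
		}
	}
	rec("", 0, 0)
	return out
}

// withFloats counts the variants when each scalar word is additionally
// tagged as integer or floating-point (two choices per S).
func withFloats(ls []string) int {
	n := 0
	for _, l := range ls {
		n += 1 << strings.Count(l, "S")
	}
	return n
}

func main() {
	ls := layouts()
	fmt.Printf("%d layouts: %q\n", len(ls), ls)
	fmt.Println("float-aware variants:", withFloats(ls))
}
```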

Impact

The choice is specifically two scalar words because this is sufficient to avoid allocations for nearly every non-composite type in Go. Booleans, all numeric types except complex128 on targets where uintptr is four bytes, strings, slices, maps, and channels (and function values?) can be stored this way. Of course, qualifying struct and array types are included. Many common implementations of various standard library interfaces, especially color.Color, also fit in this representation.

The fact that these types would no longer force allocations could lead to significant performance improvements within the standard library. There are several alternative implementations of the functionality of packages like fmt, log, and encoding/json which are specifically designed to avoid interfaces for the sake of throughput. This GOEXPERIMENT would likely narrow the gap between the standard library and highly optimized APIs from a few orders of magnitude to a few percent for common uses with small types like int and string.

I propose adding this as a GOEXPERIMENT rather than an outright change for two reasons. First, doubling the size of every interface value may significantly penalize some Go programs, especially those running in environments like cloud functions where available memory may be very small. Second, there are several Go repositories (not an exhaustive search; notably, some repositories show up many times in vendor directories but not in this search) that depend on the current layout of interface values, using unsafe or assembly. GOEXPERIMENT provides mechanisms – build tags and assembly definitions – for such code to be updated to work with the experiment both enabled and disabled.
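
To make the second concern concrete, here is a sketch of the kind of code that depends on the current layout. The mirror struct and helper names are illustrative and not taken from any particular repository; the code assumes the gc toolchain's present two-word interface{} representation. With the experiment, such a file would need a variant with the extra two words, selected by a build constraint along the lines of a goexperiment.largeiface tag (a hypothetical tag name following the proposal's example setting).

```go
package main

import (
	"fmt"
	"unsafe"
)

// eface mirrors the current two-word layout of interface{}.
// Under the proposed experiment this mirror would be too small.
type eface struct {
	typ  unsafe.Pointer
	data unsafe.Pointer
}

// dataWord returns the data word of v by reinterpreting the interface header.
func dataWord(v interface{}) unsafe.Pointer {
	return (*eface)(unsafe.Pointer(&v)).data
}

// holdsPointer reports whether the data word of v is exactly p, which is
// the case today when a pointer is stored in an interface.
func holdsPointer(v interface{}, p *int) bool {
	return dataWord(v) == unsafe.Pointer(p)
}

// layoutMatches reports whether the mirror struct has the same size as a
// real interface{} value; code like this silently breaks if it does not.
func layoutMatches() bool {
	var i interface{}
	return unsafe.Sizeof(eface{}) == unsafe.Sizeof(i)
}

func main() {
	x := 42
	fmt.Println("data word is the stored pointer:", holdsPointer(&x, &x))
	fmt.Println("mirror matches interface size:", layoutMatches())
}
```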

The cost is that, as a GOEXPERIMENT, this will require a significant amount of parallel maintenance. cmd/compile and other compilers that implement the experiment will need to generate different code for interface assignments and conversions, in addition to detecting eligible types and computing their assignment combinations. The runtime, reflect, and internal/reflectlite will need significant duplicated code paths to handle the different layouts. The implementation of atomic.Value in package sync/atomic will need to be duplicated, and sync.Pool will need some minor duplication. Some third-party packages will require duplicated code if they intend to support the experiment.

As an experiment, the goal should be to collect data about the space–performance tradeoff. We should find which programs benefit most from reduced allocations and garbage collection, and how much benefit they experience. Moreover, we should measure the change in memory usage across many programs and find which programs experience unacceptable increases. If the experiment reveals a result like "CPU usage decreases, and memory usage is within a few percent for almost all applications due to fewer spans devoted to small objects," then it could be promoted to the default case.

Related issues

Open issues that this would (entirely or mostly) resolve:

  • #6918 – Slow performance due to frequent assignments of []byte to eface in database/sql causing many allocations. The issue proposes a new API that database drivers can use to avoid allocations. With this experiment, those assignments instead no longer allocate.
  • #8892 – Avoid allocating for 4-byte scalars assigned to iface/eface on 64-bit platforms by using bit masks in the data pointer. This experiment subsumes that case.
  • #15759 – Retrieving pixel data from images causes O(m×n) allocations that are each immediately discarded. This experiment allows all concrete color types in the standard library to be assigned to color.Color without allocating.
  • #23676 – Avoid allocating for string and slice headers assigned to iface/eface through better escape analysis. This experiment allows those headers to be assigned without allocating, regardless of whether the value escapes.
  • #26680 – Reinterpret the low bits of the itab/type pointer of iface/eface values as bit fields describing properties of the value. This experiment subsumes every potential use case described.
  • #32424 – Using reflect to iterate over maps with scalar key or element types causes each map value to be allocated. When the key or element type in the map is small, including every type mentioned in the issue, this experiment would prevent those allocations. (Correction: I forgot that reflect would use Value, not interfaces. 🙂)
  • #40128 – The encoding/json decoder "is a garbage factory" because it assigns tokens to the json.Token interface. Every type currently assigned to json.Token would avoid allocations with this experiment.
  • #44808 – Similar to #15759 above, but proposing to add interfaces to image and image/draw to avoid interface allocations explicitly. This experiment would largely obviate the need for those interfaces.

Some related closed issues:

  • #8405 – The original issue relating to assigning values to interfaces indirectly. Notably, this includes discussion about adding one scalar field to iface and eface.
  • #17725 – Allocation-free assignments of small integers to interfaces. This experiment greatly extends the advantages of this change.
  • #18704 – Preventing allocations for constants assigned to interface values. This experiment may be able to reclaim the binary size increase introduced by the corresponding CL, since all typed constants would be allocation-free anyway (except complex128 constants on 32-bit targets).
@gopherbot gopherbot added this to the Proposal milestone Apr 10, 2021
@josharian
Contributor

@josharian josharian commented Apr 12, 2021

Is your intent that this experiment might live on forever, or that at some point we would decide which representation to use and then switch as necessary? I'm reluctant to maintain two representations forever—it ends up being a knob, and Go generally eschews knobs.

@josharian
Contributor

@josharian josharian commented Apr 12, 2021

Does this struct allocate, in the proposal?

struct {
  a [10]byte
  p *int
  b [2]byte
}

That is, are you proposing to re-pack structs, or only to re-order fields?

How about [2]int64 on a 64 bit system? That requires "splitting" a field into its components.

@zephyrtronium
Contributor Author

@zephyrtronium zephyrtronium commented Apr 12, 2021

@josharian

> Is your intent that this experiment might live on forever, or that at some point we would decide which representation to use and then switch as necessary? I'm reluctant to maintain two representations forever; it ends up being a knob, and Go generally eschews knobs.

To be honest, I expect this proposal to be declined primarily because of the amount of work it will be to implement and maintain. It would touch a lot of code. Even finding everything that needs to be updated could be difficult; that was an argument against adding a scalar word to iface/eface in #8405. On the other hand, "it might be hard" is a poor reason not to try, as they say.

The idea of this becoming a knob is a concern for me; it should be an experiment, not the new -O2. Deciding ahead of time that the experiment will exist for some fixed number of releases – perhaps long enough to include a developer survey – is a nice solution.

> Does this struct allocate, in the proposal?
>
> [omitted because I feel like I'm being way too long-winded in this thread]
>
> That is, are you proposing to re-pack structs, or only to re-order fields?

This would allocate on all current targets, because the alignment requirement for *int adds padding between a and p and after b, such that there are too many scalar words. I am assuming that pointer types require alignment equal to their size, but I believe that is true everywhere.

The proposal is to reorder fields at run-time when transferring between concretely typed values and interfaces. This does not include packing. Another way to phrase the proposed behavior is to say that a value must be able to be reinterpreted (unsafely, ignoring pointers, but otherwise without loss of data) as [n]uintptr where n is 0, 1, 2, or 3; then the rules about one pointer and two scalar words apply.

> How about [2]int64 on a 64 bit system? That requires "splitting" a field into its components.

This would not allocate. In general, I use the term "scalar word" rather than "scalar field," because the assignment combination applies regardless of how a type is divided into fields – it is at the memory level, not the type level. For example, this type would not allocate on 64-bit targets:

struct {
	a, b, c, d             byte
	e, f                   int16
	p                      *int
	g, h, i, j, k, l, m, n int8
}

It occurs to me now, though, that the alignment of a type might need to be at least that of uintptr for this to apply.
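
The two layouts discussed in this comment can be inspected with unsafe.Sizeof and unsafe.Offsetof. This is a small sketch for checking the word counts; the type names padded and packed are labels chosen here, and the concrete numbers depend on the target's word size and alignment rules.

```go
package main

import (
	"fmt"
	"unsafe"
)

const wordSize = unsafe.Sizeof(uintptr(0))

// padded is the struct from the question: alignment of p forces padding
// after a, and the struct size is rounded up after b.
type padded struct {
	a [10]byte
	p *int
	b [2]byte
}

// packed is the struct from the answer: the small fields on each side of p
// fill whole words exactly on 64-bit targets.
type packed struct {
	a, b, c, d             byte
	e, f                   int16
	p                      *int
	g, h, i, j, k, l, m, n int8
}

// words rounds a size in bytes up to a whole number of machine words.
func words(size uintptr) uintptr { return (size + wordSize - 1) / wordSize }

func paddedWords() uintptr { return words(unsafe.Sizeof(padded{})) }
func packedWords() uintptr { return words(unsafe.Sizeof(packed{})) }

func main() {
	var x padded
	var y packed
	fmt.Println("padded:", unsafe.Sizeof(x), "bytes,", paddedWords(), "words, p at offset", unsafe.Offsetof(x.p))
	fmt.Println("packed:", unsafe.Sizeof(y), "bytes,", packedWords(), "words, p at offset", unsafe.Offsetof(y.p))
}
```

Each struct has exactly one pointer word, so the scalar word count is the total minus one; the proposal's limit is two.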

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Apr 12, 2021

I don't see a reason for this to go through the proposal process. The only purpose of an experiment like this would be to see whether it should become the new default implementation. That is really a decision for the runtime and compiler team to make: whether this is worth trying, who will do the work, how to decide whether to adopt the new idea or not.

So I'm going to take this out of the proposal process. (I don't have an opinion as to whether this is a good idea or not.)

Thanks for raising the idea.

@ianlancetaylor ianlancetaylor changed the title from "proposal: cmd/compile, runtime: GOEXPERIMENT to add two non-pointer words to iface/eface" to "cmd/compile, runtime: GOEXPERIMENT to add two non-pointer words to iface/eface" Apr 12, 2021
@ianlancetaylor ianlancetaylor modified the milestones: Proposal, Unplanned Apr 12, 2021