Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: teach escape analysis to conditionally stack alloc interface method call parameters like a slice in w.Write(b) #72036

Open
thepudds opened this issue Feb 28, 2025 · 4 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Implementation Issues describing a semantics-preserving change to the Go implementation. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance
Milestone

Comments

@thepudds
Copy link
Contributor

Go version

tip

What did you do?

Run this benchmark:

func CallWrite(w io.Writer, x byte) {
	// The backing array for this slice currently escapes
	// because it is used as an interface method call arg.
	b := make([]byte, 0, 64)
	b = append(b, x)
	w.Write(b) // Interface method call.
}

func BenchmarkCallWrite(b *testing.B) {
	g := &GoodWriter{}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		CallWrite(g, 0)
	}
}

What did you see happen?

Interface method arguments are marked as escaping. In our example, the byte slice is heap allocated due to its use as the argument to the w.Write:

BenchmarkCallWrite	31895619	        47.36 ns/op	     64 B/op	       1 allocs/op

What did you expect to see?

It would be nice to see 0 allocations.

Suggestion

I suspect in many cases we can teach escape analysis to only conditionally heap allocate an interface method parameter (the byte slice b in our example) depending on what concrete implementation is passed in at execution time behind the interface.

Today, escape analysis cannot generally reach different conclusions for distinct control flow paths, nor set up a single value to be conditionally stack or heap allocated, but we could teach it to do those things in certain situations, including for the type assertions that are automatically introduced at compile time by PGO-based devirtualization. The suggested approach doesn't solely rely on PGO, but people who are concerned about performance really should be using PGO, including PGO has some inlining superpowers that are helpful for avoiding allocations. (Today, PGO devirtualization doesn't avoid the allocation we are discussing).

This is to help target cases where the concrete type behind an interface is not statically known. (That is, the compiler today is able to recognize in cases like w := &GoodWriter{}; w.Write(b) that statically w must have a specific concrete type, but that does not help today when the concrete type for w is not statically known, such as if w is passed in as a parameter as in our example above).

I have a basic working version, and I plan to send a CL. It still needs cleanup, more tests, I need to look at a broader set of results, etc., and it has a couple of shortcuts that I am in the middle of removing, but I am cautiously hopeful it can be made to work robustly.

Benchmark results for the example from the playground link above using that CL with PGO enabled:


             │    1.24.out       │             new.out                  │
             │     sec/op        │     sec/op     vs base               │
CallWrite-4     48.850n ± 2%        4.385n ± 1%  -91.02% (p=0.000 n=20)

             │    1.24.out       │             new.out                  │
             │      B/op         │     B/op     vs base                 │
CallWrite-4       64.00 ± 0%        0.00 ± 0%  -100.00% (p=0.000 n=20)

             │    1.24.out       │             new.out                  │
             │    allocs/op      │   allocs/op   vs base                │
CallWrite-4      1.000 ± 0%         0.000 ± 0%  -100.00% (p=0.000 n=20)

Any feedback welcome, especially of the flavor "This will never work for reasons X and Y" (which would save everyone time 😅).

Discussion

When compiling CallWrite, the compiler does not know the concrete implementation hiding behind interface w. For example, CallWrite might be invoked with a nice concrete implementation that does not retain/leak the w.Write byte slice argument (like GoodWriter in our example above), or CallWrite might be invoked with something like LeakingWriter:

// LeakingWriter leaks its Write argument to the heap.
type LeakingWriter struct{ a, b int }

func (w *LeakingWriter) Write(p []byte) (n int, err error) {
	global = p
	return len(p), nil
}

var global []byte

io.Writer happens to document that implementations must not retain the byte slice, but the compiler doesn't read documentation and this type of allocation also affects other interfaces as well that don't document similar prohibitions.

If we collect a profile that is able to observe that CallWrite is frequently called with GoodWriter, the compiler can use PGO to conditionally devirtualize w in most cases, and effectively rewrite the IR at compile time to something akin to:

func OneTypeAssert(w io.Writer, x byte) {
	b := make([]byte, 0, 64)
	b = append(b, x)

	// PGO-based rewrite of w.Write(b)
	tmpw, ok := w.(*GoodWriter)
	if ok {
		tmpw.Write(b) // concrete method call
	} else {
		w.Write(b) // interface method call
	}
}

However, the byte slice still must be heap allocated given the function can be called with something other than GoodWriter.

We might be tempted to help things by manually adding a second type assertion around the make:

func TwoTypeAsserts(w io.Writer, x byte) {
	// Human attempt to help.
	var b []byte
	_, ok := w.(*GoodWriter)
	if ok {
		b = make([]byte, 0, 64)
	} else {
		b = make([]byte, 0, 64)
	}

	b = append(b, x)

	// PGO-based rewrite of w.Write(b)
	tmpw, ok := w.(*GoodWriter)
	if ok {
		tmpw.Write(b) // concrete method call
	} else {
		w.Write(b) // interface method call
	}
}

However, that still heap allocates with today's escape analysis, which constructs a directed data-flow graph that is insensitive to control flow branches.

WIP CL

Our WIP CL teaches escape analysis to recognize type assertions like those created by PGO devirtualization (or by a human), track what happens in the concrete call case (e.g., does the concrete call also cause b to escape), propagate the interface method argument use back to locations like the make in our original example, and if warranted, rewrite a single potentially allocating location (like a single make) to have instead have two locations protected by appropriate type assertions in the IR.

The net result is the slice backing array in one location has been proven to be safe to place on the stack and while the other location is heap allocated (and which one of those happens depends on what interface is passed in at execution time).

In other words, a human can write the original three-line CallWrite from above, and the compiler can conclude it is worthwhile to transform it to the TwoTypeAsserts form and do the work to make the heap allocation conditional.

In the WIP implementation, some current limitations include:

  1. The new tracking information does not yet flow across function boundaries for interprocedural analysis, though I have hopefully set things up to be able to tackle that piece soon.
  2. For simplicity, the interface (w in our example) must currently be passed in as a function parameter. I think this restriction could be relaxed, though a tricky case is if w is a field on another struct. (I'll briefly comment more on that below).
  3. The interface variable (e.g., w) must not be reassigned in view of the allocating location or interface method call. This might be a reasonable restriction for a first cut. (OTOH, the potentially allocating value can be reassigned, as shown in our examples above).
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 28, 2025
@prattmic
Copy link
Member

We've been calling this "dynamic escapes". @cherrymui has done some experiments in this area. I'm not sure if we have an existing issue filed, I couldn't find one.

cc @golang/compiler

@prattmic prattmic added this to the Backlog milestone Feb 28, 2025
@prattmic prattmic added Performance NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Feb 28, 2025
@gabyhelp gabyhelp added the Implementation Issues describing a semantics-preserving change to the Go implementation. label Feb 28, 2025
@thepudds
Copy link
Contributor Author

Some possible extensions:

  1. Slices can be heap allocated for multiple reasons, including when they are a variable size that is only known at execution time. To help with that (and for other reasons), I have a mostly working runtime.free that can be inserted by the compiler (and not for use by normal Go code of course), and I'm looking at hooking it up a few different ways, including here when a slice is both variable size and also used as an interface method arg. (I'll probably open a separate issue on this, but I'm also attempting to free memory in some runtime map growth cases, some append cases, etc. The runtime.free is "mostly working" in the sense that it runs and allows memory to be reused quickly, though it's still proof-of-concept and for example does not yet correctly update the GC stats).

  2. As alluded to above, if the interface variable is stored as a field elsewhere, that is tricky for the current approach outlined in this issue in part because for something like b := make(...); Foo(); x.y.z.w.Write(b);, we don't know at the time
    of the make if Foo changed w on us between the make and the Write. One approach might be to try to prove when that cannot happen (e.g., an allowlist of simple-enough IR nodes that we know can't allow w to chance, for example excluding function calls). It might also be possible to prove it is safe to optimistically stack allocate and copy if disappointed (though that could not be proven to be safely applicable in all cases). A different approach might be to try to prove when it is safe to heap allocate but then free depending on the concrete type. There are pros/cons to each. Free currently seems it could be reasonably proven to be applicable in more cases, but maybe both make sense, or there's a better solution than either of those.

  3. A different extension is that this machinery involves passing type information within escape analysis. (I have a version for cmd/compile: avoid always escaping the receivers in interface method calls #62653 that does as well). I think it might be able to helps us mark the set of concrete types for example Bar returns in w := Bar(); w.Write(b) that can be statically proven for Bar to ever return (e.g., say, GoodWriter1 and GoodWriter2) and could then we could use the machinery from this issue to rewrite the w.Write in a analogous manner that PGO does today via measured profiles.

  4. In general, I'm guessing the approach outlined above could be viewed as a building block. (For example, a year or so ago I had sketched what I thought was a plausible way to do this fully dynamically with additional information encoded in the runtime type & interface information, which at the time I thought would be necessary to avoid interface method arg escapes, but a little later it occurred to me (a) the trickiest piece might be teaching escape analysis to conditionally allocate, so try to work on that first, and (b) it seemed possible to do more statically than I had initially thought, while still setting things up to use type information already available at execution time, which is now the approach of the WIP CL. That said, if this works out, a good chunk of the machinery could plausibly be reused in a subsequent phase that encodes more escape-related info at runtime).

Or, maybe not. 😅

@thepudds
Copy link
Contributor Author

Thanks @prattmic.

Hi @cherrymui, I'll be curious to hear your thoughts, especially if you've done some experiments. (FWIW, I've been making progress on #62653 and #8618 (comment), though I switched back to this because (1) this actually has fewer moving parts and (2) I'm currently planning or at least hoping to use some of the machinery from here back on #62653).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Implementation Issues describing a semantics-preserving change to the Go implementation. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance
Projects
Development

No branches or pull requests

5 participants