
cmd/compile: function signature based optimizations #71628

Open
mateusz834 opened this issue Feb 8, 2025 · 14 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Implementation Issues describing a semantics-preserving change to the Go implementation. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance

@mateusz834
Member

It is widely known that passing pointer-containing arguments to an interface method call causes those values to be heap allocated. This happens because the compiler's escape analysis cannot prove that the call will not cause the values to escape. Even though we don't know exactly which function is going to be called, there is one thing we do know at the call site: the signature of the function being called. Using that fact, the compiler could collect all functions with the same signature, run escape analysis over them, and then use the per-signature result to decide whether dynamic dispatch calls need to escape their arguments. You can think of this as a fallback: currently the fallback is to heap allocate; with this approach, if none of the functions with a given signature cause their arguments to escape, then dynamic dispatch calls through that signature would not require heap allocation either.

If this was implemented in the compiler, I would expect a reduced heap allocation count in most programs. Some examples:

  • The byte slice passed to io.Reader/io.Writer would likely not need to be allocated on the heap.
  • Arguments passed to http.Handler could be stack allocated (they are often wrapped, especially the http.ResponseWriter).
  • Log functions in log/slog would not cause allocations, and the optimization in slog.Record would no longer really be needed:

    go/src/log/slog/logger.go

    Lines 271 to 272 in 215de81

    r := NewRecord(time.Now(), level, msg, pc)
    r.AddAttrs(attrs...)

    // Allocation optimization: an inline array sized to hold
    // the majority of log calls (based on examination of open-source
    // code). It holds the start of the list of Attrs.
    front [nAttrsInline]Attr
    // The number of Attrs in front.
    nFront int
    // The list of Attrs except for those in front.
    // Invariants:
    // - len(back) > 0 iff nFront == len(front)
    // - Unused array elements are zero. Used to detect mistakes.
    back []Attr

    It is also worth noting that the log/slog API would likely not need the Enabled() method if this kind of optimization were present:
    type Handler interface {
    // Enabled reports whether the handler handles records at the given level.
    // The handler ignores records whose level is lower.
    // It is called early, before any arguments are processed,
    // to save effort if the log event should be discarded.
    // If called from a Logger method, the first argument is the context
    // passed to that method, or context.Background() if nil was passed
    // or the method does not take a context.
    // The context is passed so Enabled can use its values
    // to make a decision.
    Enabled(context.Context, Level) bool
    }
  • This would reduce the number of people reaching for sync.Pool (proposal: net/http: add Request.CopyTo #68501)

Obviously, this would not work in all cases and has some drawbacks: it only takes one function to force heap allocation at every other dynamic dispatch site for functions with the same signature. This could easily prevent signatures based solely on basic types from benefiting from this optimization, say func(b []byte) { go func() { /* do sth with b */ }() }, so folks might pollute functions with unused and unneeded parameters just to make this optimization work: func(b []byte, _ ...struct{}). This optimization would be most beneficial for signatures that take/return custom (package-local) types, like log/slog's Handle(context.Context, slog.Record) error.

This is just an idea that came to my mind while reading #62653. I'm not even sure whether this change is practical, because changes in escape analysis results would most likely require the dependencies to be recompiled, changes to the build cache, and probably other things I'm not even aware of.

The no-copy string -> []byte optimization could also benefit from this kind of change.

CC @golang/compiler

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 8, 2025
@gabyhelp gabyhelp added the Implementation Issues describing a semantics-preserving change to the Go implementation. label Feb 8, 2025
@randall77
Contributor

Unfortunately this kind of thing really requires whole-program analysis, which we are loath to do (because Go should scale to really large codebases).
It also runs afoul of the plugin package and maybe one day, reflect or similar things (#16522, #4146, #21670, #47487) that can make interfaces dynamically.

@mateusz834
Member Author

It also runs afoul of the plugin package and maybe one day, reflect or similar things (#16522, #4146, #21670, #47487) that can make interfaces dynamically.

It could of course be turned off for programs importing the plugin package; it is not used that much.

@Jorropo
Member

Jorropo commented Feb 9, 2025

Currently we do backward escape analysis.
Ideally, with perfect forward and backward escape analysis, when we see a virtual call we would look up, down, left, and right in the call graph to find out exactly which implementations are possible, and do the best we can given all of them.
Instead you are proposing that we assume it could be any method in the program on a type that implements the interface, which is probably simpler, and even simpler to do incrementally (useful caches for similar programs).

So let's say all the implementations of io.Writer in your program do not leak; then no call through io.Writer would leak.
However, this is quite an unpredictable optimization regime: importing a new library in a completely unrelated piece of code could impact the results of escape analysis everywhere.
Import-time side effects are not usually a good thing, and this would be an import-time side effect for performance; it's everywhere, and you can't grep for it.
For example, for a simple interface like io.Writer: if you import io anywhere (which you almost certainly do), the argument would leak, since the analysis would match the writer returned by io.Pipe, which leaks.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 10, 2025
@dmitshur dmitshur added this to the Backlog milestone Feb 10, 2025
@adonovan
Member

the compiler could collect all functions with the same function signature

As Keith said, the compiler can see only the functions in the current package (plus summaries of their transitive dependencies), but it cannot predict what is to come in packages higher up within the build. So, this kind of optimization is limited to a single compilation unit. For example, if an interface method is foo(bar), where both the method name foo and the parameter type name bar are unexported, the compiler can safely assume that all implementations of that method reside in the same package, and simplify accordingly. (Although a type from a higher-level package may also have a foo(bar) method thanks to embedding, the implementation of that method must be defined in the same package as foo.)

@mateusz834
Member Author

if an interface method is foo(bar), where both the method name foo and the parameter type name bar are unexported, the compiler can safely assume that all implementations of that method reside in the same package, and simplify accordingly.

Not sure about that; you can use type inference to work around that in some edge cases:

package foo

type bar struct{ a [128]byte }

type iface interface {
	foo(*bar)
}

type fooImpl func(*bar) *bar

func (f fooImpl) foo(b *bar) { f(b) }

func Iface(f func(b *bar) *bar) {
	var iface iface = fooImpl(f)
	iface.foo(&bar{})
}

package main

import (
	"aa/foo"
	"runtime"
)

func Val[T any](t T) T {
	go func() {
		runtime.KeepAlive(t)
	}()
	return t
}

func main() {
	foo.Iface(Val)
}

@adonovan
Member

Yeah, I realized just after hitting send that my mention of the parameter type bar was a distracting irrelevance, because you don't need to name a type to create values of it: the name of the method is all that matters and all that the compiler can count on.

In any case, in your example, the call iface.foo(&bar{}) must nonetheless dispatch to an implementation of method foo in the same package: in this case, there is only one.

BTW I'm not sure what go func() { KeepAlive(t) } () is supposed to achieve: it doesn't ensure anything about the lifetime of t.

@mateusz834
Member Author

BTW I'm not sure what go func() { KeepAlive(t) } () is supposed to achieve: it doesn't ensure anything about the lifetime of t.

My idea was: in package foo the signature func(*bar) *bar does not force allocations, but Val[T] from main would (the goroutine was there to force an escape).

@adonovan
Member

My idea was: in package foo the signature func(*bar) *bar does not force allocations, but Val[T] from main would (the goroutine was there to force an escape).

A Go compiler is free to optimize away the goroutine entirely since it has no dependable consequence. Even assuming Val's parameter does escape, I don't think it changes anything: the key point is that iface.foo is a reference to the abstract method foo, whose concrete implementation's source (even if generic) must reside in the same package.

@mateusz834
Member Author

A Go compiler is free to optimize away the goroutine entirely since it has no dependable consequence.

This is now off-topic, but shouldn't runtime.KeepAlive guarantee that it will not be eliminated? Based on this comment I would assume so:

Introduce a use of x that the compiler can't eliminate.

go/src/runtime/mfinal.go

Lines 568 to 575 in a704d39

func KeepAlive(x any) {
	// Introduce a use of x that the compiler can't eliminate.
	// This makes sure x is alive on entry. We need x to be alive
	// on entry for "defer runtime.KeepAlive(x)"; see issue 21402.
	if cgoAlwaysFalse {
		println(x)
	}
}

@adonovan
Member

adonovan commented Feb 10, 2025

This is now offtopic, but shouldn't runtime.KeepAlive guarantee that it would not be eliminated?

It does, in sequential code, but in your example there are no happens-before relationships between the two goroutines, so the effect of the KeepAlive is to ensure only that parameter t is live at some unspecified moment in the future. The compiler is free to assume that moment is in the past by the time return t is executed, making the KeepAlive ineffective.

In any case I don't see how the question of t's escape matters.

@Splizard

Splizard commented Feb 18, 2025

Unfortunately this kind of thing really requires whole-program analysis

It doesn't. This can be runtime-supported and can work with plugins by including escape-analysis bits in function pointers and interfaces. Any time a function pointer or interface is constructed, associate up to two bits with each value that could escape, then query these bits at runtime to determine whether to allocate on the stack, a goroutine-local heap (i.e. an arena), or the program heap.

This won't catch everything but could still provide significant performance improvements. Heap allocations are usually the biggest cost in hot functions. In some cases the compiler may be able to forward escape bits through trivial function/interface wrappers; e.g. the following MyReader could refer to the escape bits of the internal io.Reader.

type MyReader struct {
    io.Reader  // escape bits for Read are forwarded from this value.
}

func (my MyReader) Read(buf []byte) (int, error) {
    // do something that doesn't cause buf to escape
    return my.Reader.Read(buf)
}

@adonovan
Member

adonovan commented Feb 18, 2025

This can be runtime-supported and can work with plugins by including escape-analysis bits in function pointers and interfaces. Any time a function pointer or interface is constructed, associate up to two bits with each value that could escape, then query these bits at runtime to determine whether to allocate on the stack, a goroutine-local heap (i.e. an arena), or the program heap.

How does this compose in the large? Static escape analysis composes by induction: for a chain of static calls f -> g -> h, you can prove things about f based on h. But I don't see how a dynamic scheme can scale, because the information about whether h will escape its argument doesn't exist at the time you call f, forcing the caller of f to be conservative.

Has this approach been implemented before in a production allocator? Has it been written up?

@Splizard

Splizard commented Feb 19, 2025

How does this compose in the large?

I believe you are referring to dynamic cases, where whether x escapes depends on the escape bits of a number of internal y interfaces/functions, as in the MyReader example above, or, say, in an io.MultiReader. In such cases, the escape bits of each y io.Reader can be ANDed together (assuming 1 means "does not escape"). As long as the set of ys is proven to be immutable for the duration between the allocation and the interface/function call, the escape bits can be used ahead of time to decide where to allocate x.

If it cannot be determined whether the MyReader wrapper might add a new dynamic io.Reader to the set of ys within that duration, then yes, the compiler would have to set all the escape bits for buf to zero (escapes) and the buffer would have to be heap allocated. This all depends on local information, though; it does not require whole-program analysis.

Most function/interface wrappers (think io.Reader) wouldn't arbitrarily inject new dynamic ys into themselves as part of their implementations; this is an edge case where conservative heap allocation is more than reasonable. Many more functions/interfaces will handle their arguments directly without passing them to another dynamic function/interface, so their escape bits are constant and can be inlined in some cases.

Has this approach been implemented before in a production allocator? Has it been written up?

I'm not aware of any implementation of or papers on this. I do think this approach is worth considering, as it's the only alternative I'm aware of if you don't want to do whole-program analysis at compile time.

As far as implementation suggestions go, the easiest approach to start from would probably be:

For each method/function that has any pointer-like arguments, generate an additional method/function that can be used to query escape information.

type MyDiscardWriter struct{}
func (MyDiscardWriter) Write(buf []byte) (int, error) { return len(buf), nil }
func (MyDiscardWriter) noescape_Write(arg_index int) bool { return true } // no argument escapes, so always true

type MyReader struct{ io.Reader }
func (my MyReader) Read(buf []byte) (int, error) { return my.Reader.Read(buf) }
// Forwards the query to the generated noescape_Read of the dynamic inner reader.
func (my MyReader) noescape_Read(arg_index int) bool { return my.Reader.noescape_Read(arg_index) }

Then use the result of the relevant noescape_ function with the argument index to decide whether to allocate on the stack, provided it can be proven that none of the implementations are modified before they are called (in practice, that means if any dynamic/non-trivial functions are called between the allocation and an interface/function call with dynamic escape information, then x escapes). Do the same thing with functions/closures, so that each function pointer consists of a pointer to the function and a pointer to its noescape_ equivalent.

There would still be room for improvement, but it would probably be a good place to start.
