cmd/compile: stack reuse for aggregate data #62077

Open
jinlin-bayarea opened this issue Aug 16, 2023 · 3 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@jinlin-bayarea

We propose adding stack slot reuse for aggregate data to the Go compiler. The Go 1.20 compiler does not reuse stack memory for local aggregates, such as arrays and structs, even when their lifetimes do not overlap. Consider the following example.

package simple

type Field struct {
	Integer   int64
	String    string
	Interface interface{}
}

func a(i int8) Field {
	return Field{}
}

func b(i int16) Field {
	return Field{}
}

func c(i int32) Field {
	return Field{}
}

// Simple dispatches on the dynamic type of i and returns a Field.
func Simple(i interface{}) Field {
	switch val := i.(type) {
	case int8:
		return a(val)
	case int16:
		return b(val)
	case int32:
		return c(val)
	default:
		return Field{}
	}
}

The corresponding amd64 disassembly for the call at simple.go:29 (return c(val)) includes the following stores:

	0x01a5 00421 (simple.go:29)	MOVQ	AX, command-line-arguments..autotmp_12+48(SP)
	0x01aa 00426 (simple.go:29)	MOVQ	BX, command-line-arguments..autotmp_12+56(SP)
	0x01af 00431 (simple.go:29)	MOVQ	CX, command-line-arguments..autotmp_12+64(SP)
	0x01b4 00436 (simple.go:29)	MOVQ	DI, command-line-arguments..autotmp_12+72(SP)
	0x01b9 00441 (simple.go:29)	MOVQ	SI, command-line-arguments..autotmp_12+80(SP)

In this example, the compiler allocates a distinct temporary stack slot (such as autotmp_12 above) for the Field returned by each callee, so these temporaries never share stack memory. This is suboptimal because their live ranges do not overlap: only one switch case executes per call, so at most one of the temporaries is live at any time. Because each temporary occupies its own exclusive space, the frame grows with every case, even though the temporaries could safely share one slot. Optimal code would reserve only the maximum size among the temporaries across all switch cases, rather than the sum of their individual sizes.
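The arithmetic behind that last point can be sketched as follows, assuming one Field-sized temporary per switch case (the results of a, b and c). Separate slots cost the sum of the sizes; a single shared slot costs only the maximum. The helpers sumSizes and maxSize are illustrative names, not compiler APIs.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Field struct {
	Integer   int64
	String    string
	Interface interface{}
}

// sumSizes is the frame space used when every temporary gets its own slot.
func sumSizes(sizes []uintptr) uintptr {
	var total uintptr
	for _, s := range sizes {
		total += s
	}
	return total
}

// maxSize is the frame space needed when temporaries with disjoint
// live ranges share a single slot.
func maxSize(sizes []uintptr) uintptr {
	var m uintptr
	for _, s := range sizes {
		if s > m {
			m = s
		}
	}
	return m
}

func main() {
	sz := unsafe.Sizeof(Field{})
	temps := []uintptr{sz, sz, sz} // one temporary per switch case
	fmt.Println("separate slots:", sumSizes(temps)) // 120 on 64-bit targets
	fmt.Println("shared slot:   ", maxSize(temps))  // 40 on 64-bit targets
}
```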

Many compilers for C/C++ and other languages can reuse stack memory for scalar variables and/or arrays. For instance, LLVM's stack coloring pass lets local arrays whose lifetimes do not overlap share the same stack memory.
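To make the idea concrete, here is a minimal sketch of slot sharing in the spirit of stack coloring. It assumes each temporary's live range has already been linearized into a half-open interval [start, end) of instruction indices; real compilers compute liveness over the control-flow graph. It is a simple greedy scan, not a reproduction of LLVM's pass or of any Go compiler code, and it ignores alignment and GC pointer metadata. The types liveTemp and slot and the function assignSlots are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// liveTemp is a hypothetical stack temporary live over [start, end).
type liveTemp struct {
	name       string
	size       int64
	start, end int
}

// slot is a region of the frame; it can be reused once freeAt is reached.
type slot struct {
	offset int64
	size   int64
	freeAt int
}

// assignSlots gives each temporary a frame offset, reusing a slot whose
// previous occupant is dead and which is large enough. It returns the
// offsets and the total frame bytes used for temporaries.
func assignSlots(temps []liveTemp) (map[string]int64, int64) {
	sorted := append([]liveTemp(nil), temps...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	var slots []*slot
	var frame int64
	offsets := make(map[string]int64)
	for _, t := range sorted {
		var chosen *slot
		for _, s := range slots {
			if s.freeAt <= t.start && s.size >= t.size {
				chosen = s // previous occupant is dead: share the slot
				break
			}
		}
		if chosen == nil {
			// No reusable slot: grow the frame.
			chosen = &slot{offset: frame, size: t.size}
			frame += t.size
			slots = append(slots, chosen)
		}
		chosen.freeAt = t.end
		offsets[t.name] = chosen.offset
	}
	return offsets, frame
}

func main() {
	// Three 40-byte results, one per switch case; live ranges do not overlap.
	temps := []liveTemp{
		{"tmpA", 40, 10, 12},
		{"tmpB", 40, 20, 22},
		{"tmpC", 40, 30, 32},
	}
	offs, frame := assignSlots(temps)
	fmt.Println(offs, frame) // map[tmpA:0 tmpB:0 tmpC:0] 40
}
```

Overlapping temporaries still receive distinct offsets, so the frame only grows when two values are genuinely live at the same time.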

In summary, we propose adding stack reuse for aggregate data to the Go compiler. Input is welcome.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 16, 2023
@cherrymui
Member

Yeah, currently the compiler doesn't reuse stack slots for named variables and some temporaries. It might be a little trickier for data containing pointers, due to GC metadata. But (as you mentioned) for scalars, it should be fine.

for aggregate data

I don't think it matters whether the data is an aggregate. We could reuse stack slots for a local variable of type int (non-aggregate), as well as for [3]int or struct{ a, b int }.

Do you have a sense of how much it helps in real code? It will reduce some stack sizes, but I'm not sure how much that matters in practice.

@cherrymui cherrymui added this to the Unplanned milestone Aug 16, 2023
@rabbbit

rabbbit commented Aug 16, 2023

uber-go/zap#1310 implemented a fix for this that reduced stack usage for zap.Any from 4856 to 192 bytes.

Because of goroutine stack growth/copying, this was enough to have a visible impact on profiles (~5% of total CPU in some workloads).
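A rough model of why frame size matters here, under simplifying assumptions: a new goroutine traditionally starts with a 2 KiB stack (since Go 1.19 the runtime may pick a larger starting size based on average observed stack usage), and when a frame does not fit, the runtime allocates a stack twice as large and copies the old one over. The sketch counts how many doublings a single frame of a given size forces, ignoring the stack guard and any caller frames already on the stack; growsNeeded is a hypothetical helper, not a runtime function.

```go
package main

import "fmt"

// initialStack is the classic default starting size of a goroutine stack.
const initialStack = 2048

// growsNeeded returns how many times a stack starting at initialStack bytes
// must double before a single frame of frameSize bytes fits.
// Simplified: ignores the stack guard and frames already on the stack.
func growsNeeded(frameSize int) int {
	n := 0
	for size := initialStack; size < frameSize; size *= 2 {
		n++
	}
	return n
}

func main() {
	// zap.Any stack usage before and after uber-go/zap#1310.
	for _, fs := range []int{4856, 192} {
		fmt.Printf("%5d bytes -> %d doubling(s)\n", fs, growsNeeded(fs))
	}
}
```

Under this model a 4856-byte frame forces two doublings (2 KiB to 4 KiB to 8 KiB), each of which copies the existing stack, while a 192-byte frame fits in the initial stack. In CPU profiles this work is typically attributed to runtime functions such as runtime.newstack and runtime.copystack.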

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 17, 2023
@rabbbit

rabbbit commented Aug 18, 2023

yarpc/yarpc-go#2220 also hit a similar issue:

TL;DR: reduced stack usage of the RPC handler function from 2520 bytes to 608 bytes.

The call object was allocated three times on the stack.

A large refactoring was initially proposed to avoid this, but in the end the fix was to reduce the size of the call object.

Projects
Status: No status
Development

No branches or pull requests

5 participants