Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: minimize stack occupation before function calls #36067

Open
CAFxX opened this issue Dec 10, 2019 · 7 comments
Open

cmd/compile: minimize stack occupation before function calls #36067

CAFxX opened this issue Dec 10, 2019 · 7 comments
Labels
NeedsInvestigation
Milestone

Comments

@CAFxX
Copy link
Contributor

@CAFxX CAFxX commented Dec 10, 2019

In go 1.13 variables on the stack are kept alive allocated until the function returns. This is normally harmless, but can lead to surprising results: e.g. in the following example with a buf size of 1<<15 the process crashes because we reach the maximum stack size, but with a size of 1<<16 or more the process does not crash (as the slice is not allocated on the stack).

package main 

func main() {
    rec(0)
}

func rec(n int) (m int) {
	buf := make([]byte, 1<<15)
	foo(buf)
	// buf is dead now
	if n >= 1<<16 {
		return n
	}
    return rec(n+1)
}

//go:noinline
func foo([]byte) {}

This is obviously a contrived example, but it would help if the set of live allocated variables were pruned as stack variables die, or if that is too expensive, at least immediately before performing a function call (especially if recursive).

Such an optimization would not only help in case of recursive functions that allocate a lot on the stack, but would rather help minimize time spent in morestack (#18138) by reducing stack growth on any call tree that does not contain only live variables (as a further example, in API servers it is somewhat common to have handlers wrapped in middleware, each middleware being often used to execute some logic before and/or after the wrapped handler is executed. In the case of logic executed only or mostly before the wrapped handler, it is likely that most variables in the stack frame of the middleware are dead during the call to the wrapped handler)

A more compact stack, besides requiring less time to be spent in morestack, would also help dcache efficiency.

@mvdan
Copy link
Member

@mvdan mvdan commented Dec 10, 2019

Would this have any advantages beyond reaching the stack size limit slower, such as lower memory usage, or any form of performance gain?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 10, 2019

This is not a matter of liveness as we normally use that term. In your example, buf is dead at the point of the recursive call, and the compiler knows that. The problem is not that buf is live, it's that it continues to occupy space on the stack. Changing that would be difficult, would lead to less efficient code for almost all programs, and would help essentially no real programs. It doesn't seem like a good tradeoff.

@randall77
Copy link
Contributor

@randall77 randall77 commented Dec 10, 2019

I think there is some room to adjust the heuristics for when we stack allocate. Right now, we stack allocate any variable <= 10MB. That's regardless of how many such variables live in a frame, whether the frame's function might be recursive, etc.
i don't think any such heuristic adjustment would help this example, though.

@randall77
Copy link
Contributor

@randall77 randall77 commented Dec 10, 2019

Related: #27447.

@thanm
Copy link
Contributor

@thanm thanm commented Dec 10, 2019

A somewhat related problem comes up in the context of inlining-- say you have a program fragment like

func foo() int {
	largebuf := make([]byte, 8192)
	//...
	return expr
}

func bar() {
	//...
	if somecondition {
		foo()
	}
	baz()
}

In the Go compiler, if 'foo' is inlined into 'bar', the local variables in 'foo' get promoted into local variables in 'bar' (this is a fairly standard approach, but it means that inlining tends to drive up stack usage overall).

I worked on a C++ compiler back in the late 90's where at one point the inliner would insert alloca() and dealloca() operations around the body of an inlined function, e.g.

func bar() {
	//...
	if somecondition {
	        stackpointer -= [size of foo's stack frame]
		...body of foo...
	        stackpointer += [size of foo's stack frame]
	}
	baz()
}

This forced the use of a frame pointer and (as I understand it) made unwind-gen more complicated, but the back end had to support alloca() ops already, so it wasn't too bad.

These days I'm not aware of anything like this in either clang or GCC.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 10, 2019

I suppose we could have heuristics specifically around recursive calls: if a large variable is dead at the point of a recursive call, don't stack allocate it. But I don't know how many real programs that would help.

@toothrot toothrot added this to the Backlog milestone Dec 10, 2019
@toothrot toothrot added the NeedsInvestigation label Dec 10, 2019
@CAFxX
Copy link
Contributor Author

@CAFxX CAFxX commented Jan 20, 2020

In your example, buf is dead at the point of the recursive call, and the compiler knows that. The problem is not that buf is live, it's that it continues to occupy space on the stack. Changing that would be difficult, would lead to less efficient code for almost all programs, and would help essentially no real programs. It doesn't seem like a good tradeoff.

I haven't given this a lot of thought, but what I was thinking was something along the lines of:

  • at compile-time
    • sort stack allocations so that slots of variables that die first are at the top of the frame;
    • when emitting a callsite (ideally, that we can not prove to be non-recursive) and only if from the previous callsite some variables died, adjust the stack pointer before the call instruction so that the space used by the slots of dead variables can be reused by the stack frame of the callee

At runtime this would entail an additional SP change per call, so the overhead wouldn't be 0 - but it should not be significant either (no copying of stack slots would be required). This wouldn't solve all problems, sure, as it would be a best-effort optimization, and in any case it wouldn't apply to the caller arguments (otherwise stack traces would be incomplete), but it would help alleviate heavy stack usage.

edit: I added another example in the issue description about other cases in which this would help

@CAFxX CAFxX changed the title cmd/compile: minimize stack live set before a recursive call cmd/compile: minimize stack live set before a function call Feb 18, 2022
@CAFxX CAFxX changed the title cmd/compile: minimize stack live set before a function call cmd/compile: minimize stack occupation before function calls Feb 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

6 participants