Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: minimize stack live set before a recursive call #36067

Open
CAFxX opened this issue Dec 10, 2019 · 7 comments
Open

cmd/compile: minimize stack live set before a recursive call #36067

CAFxX opened this issue Dec 10, 2019 · 7 comments
Milestone

Comments

@CAFxX
Copy link
Contributor

@CAFxX CAFxX commented Dec 10, 2019

In go 1.13 variables on the stack are kept alive until the function returns. This is normally harmless, but can lead to surprising results: e.g. in the following example with a buf size of 1<<15 the process crashes because we reach the maximum stack size, but with a size of 1<<16 or more the process does not crash (as the slice is not allocated on the stack).

package main 

func main() {
    rec(0)
}

func rec(n int) (m int) {
	buf := make([]byte, 1<<15)
	foo(buf)
	// buf is dead now
	if n >= 1<<16 {
		return n
	}
    return rec(n+1)
}

//go:noinline
func foo([]byte) {}

This is obviously a contrived example, but it would help if the set of live variables were pruned as stack variables die, or if that is too expensive, at least immediately before performing a function call (especially if recursive).

@mvdan
Copy link
Member

@mvdan mvdan commented Dec 10, 2019

Would this have any advantages beyond reaching the stack size limit slower, such as lower memory usage, or any form of performance gain?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 10, 2019

This is not a matter of liveness as we normally use that term. In your example, buf is dead at the point of the recursive call, and the compiler knows that. The problem is not that buf is live, it's that it continues to occupy space on the stack. Changing that would be difficult, would lead to less efficient code for almost all programs, and would help essentially no real programs. It doesn't seem like a good tradeoff.

@randall77
Copy link
Contributor

@randall77 randall77 commented Dec 10, 2019

I think there is some room to adjust the heuristics for when we stack allocate. Right now, we stack allocate any variable <= 10MB. That's regardless of how many such variables live in a frame, whether the frame's function might be recursive, etc.
i don't think any such heuristic adjustment would help this example, though.

@randall77
Copy link
Contributor

@randall77 randall77 commented Dec 10, 2019

Related: #27447.

@thanm
Copy link
Member

@thanm thanm commented Dec 10, 2019

A somewhat related problem comes up in the context of inlining-- say you have a program fragment like

func foo() int {
	largebuf := make([]byte, 8192)
	//...
	return expr
}

func bar() {
	//...
	if somecondition {
		foo()
	}
	baz()
}

In the Go compiler, if 'foo' is inlined into 'bar', the local variables in 'foo' get promoted into local variables in 'bar' (this is a fairly standard approach, but it means that inlining tends to drive up stack usage overall).

I worked on a C++ compiler back in the late 90's where at one point the inliner would insert alloca() and dealloca() operations around the body of an inlined function, e.g.

func bar() {
	//...
	if somecondition {
	        stackpointer -= [size of foo's stack frame]
		...body of foo...
	        stackpointer += [size of foo's stack frame]
	}
	baz()
}

This forced the use of a frame pointer and (as I understand it) made unwind-gen more complicated, but the back end had to support alloca() ops already, so it wasn't too bad.

These days I'm not aware of anything like this in either clang or GCC.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 10, 2019

I suppose we could have heuristics specifically around recursive calls: if a large variable is dead at the point of a recursive call, don't stack allocate it. But I don't know how many real programs that would help.

@toothrot toothrot added this to the Backlog milestone Dec 10, 2019
@CAFxX
Copy link
Contributor Author

@CAFxX CAFxX commented Jan 20, 2020

In your example, buf is dead at the point of the recursive call, and the compiler knows that. The problem is not that buf is live, it's that it continues to occupy space on the stack. Changing that would be difficult, would lead to less efficient code for almost all programs, and would help essentially no real programs. It doesn't seem like a good tradeoff.

I haven't given this a lot of thought, but what I was thinking was something along the lines of:

  • at compile-time
    • sort stack allocations so that slots of variables that die first are at the top of the frame;
    • when emitting a callsite (ideally, that we can not prove to be non-recursive) and only if from the previous callsite some variables died, adjust the stack pointer before the call instruction so that the space used by the slots of dead variables can be reused by the stack frame of the callee

At runtime this would entail an additional SP change per call, so the overhead wouldn't be 0 - but it should not be significant either (no copying of stack slots would be required). This wouldn't solve all problems, sure, as it would be a best-effort optimization, and in any case it wouldn't apply to the caller arguments (otherwise stack traces would be incomplete), but it would help alleviate heavy stack usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.