Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: better escape analysis of `[]byte` -> `string` conversions #43752

Open
mdempsky opened this issue Jan 17, 2021 · 6 comments
Open

cmd/compile: better escape analysis of `[]byte` -> `string` conversions #43752

mdempsky opened this issue Jan 17, 2021 · 6 comments
Assignees
Milestone

Comments

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Jan 17, 2021

@aclements points out 3 cases that escape analysis could be better at:

  1. unsafe conversion from []byte to string:
func unsafeBytesToString(buf []byte) string {
	if len(buf) == 0 {
		return ""
	}
	var str string
	pStr := (*reflect.StringHeader)(unsafe.Pointer(&str))
	pStr.Data = uintptr(unsafe.Pointer(&buf[0]))
	pStr.Len = len(buf)
	return str
}

This currently forces buf to the heap, because the pStr.Data = uintptr(unsafe.Pointer(&buf[0])) assignment is treated as (*pStr).Data = unsafe.Pointer(&buf[0]) and assignments through a pointer indirection are currently always treated as escaping to the heap.

This could be improved if we noticed during escape analysis that pStr always points to str. Then instead of flowing to the heap, &buf[0] could just flow to str directly. Consequently, the function should overall analyze to "parameter buf leaks to ~r0" instead of "leaks to heap".

  1. A direct, non-escaping conversion like:
var buf [32]byte
... populate buf ...
strconv.Atoi(string(buf[:]))
... no more uses of buf ...

In this case, strconv.Atoi doesn't leak its string argument anywhere, so we should be able to pretty easily recognize that the OBYTES2STR argument can be safely promoted to OBYTES2STRTMP.

  1. A direct, escaping conversion like:
var buf [32]byte
... populate buf ...
return string(buf[:])
... no more uses of buf ...

Similar to 2 above, if we recognize that buf is never reused after its conversion to string, we should be able to directly reuse the byte array memory for the string without copying. On dev.regabi, closure analysis (i.e., deciding whether to capture variables by value or reference) is now part of escape analysis, and it seems reasonable to extend it to handle this use case as well.

We'd need to recognize that operations like string(buf[:]) are logically copying buf by value, rather than taking a reference to it.

@mdempsky mdempsky added this to the Go1.17 milestone Jan 17, 2021
@mdempsky mdempsky self-assigned this Jan 17, 2021
@martisch
Copy link
Contributor

@martisch martisch commented Jan 19, 2021

Doesnt strconv.Atoi leak the argument ( not the copy of the header but the backing array itself) into the returned error (in the general case where there is error handling)?

"Similar to 2 above, if we recognize that buf is never reused after its conversion to string".
Note that the byte array memory is currently allocated on the stack and then copied to the heap. The new way presumably be to allocate 32byte on the heap and then reuse it directly in the return. However that can lead to a performance degradation if there is a fast path returning a constant string in e.g. an error case. This may now heap allocate the buf that is never returned.

var buf [32]byte
... populate buf ...
if SOMECONDITION {
 return ""
}
return string(buf[:])
@bcmills bcmills added the Performance label Jan 19, 2021
@mdempsky
Copy link
Member Author

@mdempsky mdempsky commented Jan 19, 2021

@martisch Thanks, good points.

I forgot that strconv.Atoi has an error result that leaks the string argument. Agreed that would prevent it from being optimized (unless we had additional call context information to know the error never leaks). It also occurred to me that we would need to somehow know strconv.Atoi can't actually modify buf through any obscure side channels.

And yeah, that's a fair point about case 3. Similarly, if the strings are usually small, but the code conservatively allocates a large buffer just in case, we'd end up holding onto the entire thing all the time. So probably we should only optimize away the copy when the string doesn't leak either. I think that's doable and should still be beneficial.

@gopherbot
Copy link

@gopherbot gopherbot commented Jan 21, 2021

Change https://golang.org/cl/285232 mentions this issue: [dev.regabi] cmd/compile: more zero-copy []byte-to-string conversions

@mdempsky
Copy link
Member Author

@mdempsky mdempsky commented Jan 21, 2021

CL 285232 implements case 3, with restrictions based on @martisch 's feedback (i.e., only reusing the byte slice memory when it's on the stack already and can stay there, so it can't introduce new heap allocations or cause references to retain more memory than they would otherwise).

It triggers a handful of times in the standard library, but nothing that seemed like a particularly big win to me:

$GOROOT/src/syscall/str.go:9:21: zero-copy string conversion
$GOROOT/src/os/str.go:12:21: zero-copy string conversion
$GOROOT/src/cmd/vendor/golang.org/x/sys/unix/str.go:11:21: zero-copy string conversion
$GOROOT/src/net/parse.go:178:21: zero-copy string conversion
$GOROOT/src/net/dnsclient.go:36:15: zero-copy string conversion
$GOROOT/src/net/dnsclient.go:36:43: zero-copy string conversion
$GOROOT/src/net/dnsclient.go:36:71: zero-copy string conversion
$GOROOT/src/net/dnsclient.go:36:99: zero-copy string conversion
$GOROOT/src/net/ip.go:534:34: zero-copy string conversion
$GOROOT/src/net/http/server.go:1844:98: zero-copy string conversion
$GOROOT/src/net/http/server.go:3544:15: zero-copy string conversion
$GOROOT/src/cmd/compile/internal/escape/escape.go:2180:20: zero-copy string conversion
$GOROOT/src/cmd/compile/internal/escape/escape.go:2206:20: zero-copy string conversion
$GOROOT/src/cmd/compile/internal/escape/escape.go:2234:19: zero-copy string conversion
$GOROOT/src/cmd/compile/internal/escape/escape.go:1890:22: zero-copy string conversion

The related refinements also help escape analysis a little too; these allocations can now be stack allocated (i.e., diagnostics here used to be printed by the compiler, but no longer do after CL 285232):

$GOROOT/src/cmd/dist/test.go:748:3: moved to heap: nShards
$GOROOT/src/cmd/go/internal/work/exec.go:61:22: moved to heap: ctx
$GOROOT/src/cmd/vendor/golang.org/x/tools/go/analysis/internal/facts/facts.go:203:43: f.PkgPath escapes to heap
$GOROOT/src/go/internal/srcimporter/srcimporter.go:167:2: moved to heap: open
$GOROOT/src/go/printer/printer.go:805:21: "negative indentation:" escapes to heap
$GOROOT/src/go/printer/printer.go:805:47: p.indent escapes to heap
$GOROOT/src/go/printer/printer.go:948:22: "whitespace buffer not empty" escapes to heap
$GOROOT/src/testing/benchmark.go:754:2: moved to heap: grain

Finally, it also looks like 21 variables in std cmd can now be captured by value by a function literal, whereas they used to be captured by reference. I accidentally changed the line numbering in the diagnostics though, so I can't immediately list out which ones. I'll identify them later when I tease apart the CL and it's easier to check again.

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 22, 2021

@mdempsky, as another interesting data point you could perhaps check whether there is any impact on functions like my unsafeslice.OfString and unsafeslice.AsString as written when built with -tags=unsafe.

Semantically they are just like case (1), but the actual code is a bit more aggressive about using headers and Data fields instead of len checks.

@mdempsky
Copy link
Member Author

@mdempsky mdempsky commented Jan 22, 2021

@bcmills Thanks. I expect handling case 1 should naturally handle those 2 functions as well, but I'll be sure to verify that once I get around to looking into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants