-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: poor escape analysis of strings.SplitSeq
#73524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
A simple patch equalizes performance with callback version, though: Patchdiff --git a/src/strings/iter.go b/src/strings/iter.go
--- a/src/strings/iter.go
+++ b/src/strings/iter.go
@@ -32,25 +32,24 @@ func Lines(s string) iter.Seq[string] {
}
// explodeSeq returns an iterator over the runes in s.
-func explodeSeq(s string) iter.Seq[string] {
- return func(yield func(string) bool) {
- for len(s) > 0 {
- _, size := utf8.DecodeRuneInString(s)
- if !yield(s[:size]) {
- return
- }
- s = s[size:]
+func explodeSeq(s string, yield func(string) bool) {
+ for len(s) > 0 {
+ _, size := utf8.DecodeRuneInString(s)
+ if !yield(s[:size]) {
+ return
}
+ s = s[size:]
}
}
// splitSeq is SplitSeq or SplitAfterSeq, configured by how many
// bytes of sep to include in the results (none or all).
func splitSeq(s, sep string, sepSave int) iter.Seq[string] {
- if len(sep) == 0 {
- return explodeSeq(s)
- }
return func(yield func(string) bool) {
+ if len(sep) == 0 {
+ explodeSeq(s, yield)
+ return
+ }
for {
i := Index(s, sep)
if i < 0 {
$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: example.com/module
cpu: AMD Ryzen 9 5950X 16-Core Processor
BenchmarkRowContainsSeq-32 52767405 22.72 ns/op 0 B/op 0 allocs/op
PASS
ok example.com/module 1.202s |
strings.SplitSeq
strings.SplitSeq
Good catch, do you want to send a CL ? At first glance I don't see why the compiler couldn't compile the previous one without the heap but your patch is a simple quick win. |
An even smaller patch to avoid the (apparent) escape is to observe that s is never loaded/stored outside the closure, so the closure can use a distinct variable initialized to copy of the parameter instead, like so: func explodeSeq(s string) iter.Seq[string] {
return func(yield func(string) bool) {
+ s := s
for len(s) > 0 { A smarter compiler could make this change automatically. I'm not yet convinced the (apparent) escape of s is actually the problem though. When explodeSeq is considered in isolation, its s parameter must escape because it is updated by the function closure. In other words, the escape of the result implies the escape of s. But considering functions in isolation may be pessimistic, as in this example, where f's parameter x must escape (for the same reason as above), but when f is actually called from main, the closure clearly doesn't escape, so there's no need for x to escape either: func f(x int) any { // <---- x appears to escape
return func() int { x++; return x }
}
func main() {
println(f(1).(func() int)()) // <--- yet this is compiled without allocating
} In other words, escape analysis is synergistic with inlining, so the escape that the compiler reports for a source function might be more pessimistic than what you get when you actually call it. I suspect that may be happening in your case (but I haven't checked yet). If so, the slowdown of the iterator version may be because range-over-func always allocates a heap variable to hold the continuation state. [I misspoke here; see below for correction.] |
Change https://go.dev/cl/669735 mentions this issue: |
This CL slightly changes flow of splitSeq to help compiler to inline the iterator closure. goos: linux goarch: amd64 pkg: strings cpu: AMD Ryzen 9 5950X 16-Core Processor │ sec/op │ sec/op vs base │ SplitSeqEmptySeparator-32 3.590m ± 0% 3.430m ± 2% -4.46% (p=0.000 n=30) SplitSeqSingleByteSeparator-32 647.0µ ± 0% 656.1µ ± 0% +1.41% (p=0.000 n=30) SplitSeqMultiByteSeparator-32 423.9µ ± 1% 384.5µ ± 0% -9.31% (p=0.000 n=30) SplitAfterSeqEmptySeparator-32 3.372m ± 4% 3.514m ± 0% +4.20% (p=0.000 n=30) SplitAfterSeqSingleByteSeparator-32 648.5µ ± 2% 537.6µ ± 0% -17.10% (p=0.000 n=30) SplitAfterSeqMultiByteSeparator-32 423.3µ ± 2% 364.4µ ± 2% -13.91% (p=0.000 n=30) geomean 984.7µ 917.3µ -6.85% │ B/op │ B/op vs base │ SplitSeqEmptySeparator-32 24.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=30) SplitSeqSingleByteSeparator-32 24.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=30) SplitSeqMultiByteSeparator-32 24.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=30) SplitAfterSeqEmptySeparator-32 24.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=30) SplitAfterSeqSingleByteSeparator-32 24.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=30) SplitAfterSeqMultiByteSeparator-32 24.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=30) geomean 24.00 ? For #73524 Change-Id: Ic83c5751a41c65030356a208e4ad1f500723e695 Reviewed-on: https://go-review.googlesource.com/c/go/+/669735 Auto-Submit: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: qiu laidongfeng2 <2645477756@qq.com> Commit-Queue: Alan Donovan <adonovan@google.com>
Is this completely fixed with https://go.dev/cl/669735? |
@adonovan anything left to do? |
No; that CL reduced the complexity of the function so that it became inlineable, causing the entire iterator to be melted down, thus saving an escaping allocation, making it faster. The task of this issue is to make the compiler's escape analysis notice that when an outer variable is assigned once and never read by the outer function after the closure is created, the closure can be transformed to use a local variable initialized from the outer variable, so the outer var becomes a "by value" free variable, and needn't escape. [I misspoke here; see below for correction.] |
As much as I'd like to encourage inlining and avoid unnecessary escapes, I don't think the compiler should automatically perform such a transformation, simply because it would change the semantics of the program. It would essentially turn a stateful/resumable (or "single-use", for lack of a better term) iterator into a stateless/non-resumable one, a change that can be observed. Let's take package main
import (
"fmt"
"iter"
"unicode/utf8"
)
func main() {
seq := explodeSeq("Hello, 世界")
for s := range seq {
if s == "o" {
break
}
fmt.Printf("%q, ", s)
}
fmt.Println()
for s := range seq {
fmt.Printf("%q, ", s)
}
}
func explodeSeq(s string) iter.Seq[string] {
return func(yield func(string) bool) {
// s := s // <---
for len(s) > 0 {
_, size := utf8.DecodeRuneInString(s)
if !yield(s[:size]) {
return
}
s = s[size:]
}
}
} Output when
Output when
Related discussion: #61901 (comment) |
An even simpler case that demonstrates that @adonovan's suggested heuristic is unsound is func main() {
f := F(0)
fmt.Println(f())
fmt.Println(f())
}
func F(i int) func() int {
return func() int {
i++
return i
}
} This fits the criteria of the heuristic, but it would obviously change behavior if the compiler did that transformation. In other words: The outer function is not the only place that could read the closed-over variable. Other calls to the closure do as well. That's pretty much the point of a closure. I think the important addition is, that the compiler can notice that there is only a single call to the closure and it does not escape. In that case, @adonovan's heuristic works and after sufficient inlining it would catch the case of |
Thanks @jub0bs and @Merovius for pointing out my error: the transformation from an outer, var shared across calls, to a local var, belonging to each call, is unsound. Nonetheless, there is still a redundant allocation here that could be avoided, even if the change cannot be expressed neatly at source level. In the latter example, evaluating the func literal allocates a closure that refers to the outer variable i indirectly; i escapes and thus must be heap allocated too. But the closure could contain the variable i as a direct field, so that there is only one allocation (and better locality). This wouldn't change the semantics. |
Go version
go version go1.24.2 linux/amd64
Output of
go env
in your module/workspace:What did you do?
https://go.dev/play/p/ug9pN6sRb5M
What did you see happen?
Parameters, result and
#state1
leak to the heap.tip (at f9ce1dd) is slightly better:
What did you expect to see?
No allocations at all.
Callback version (https://go.dev/play/p/IRc-1K3qP6M) of this code yields better results:
The text was updated successfully, but these errors were encountered: