Go version
go version go1.23.0 darwin/arm64 (gotip shows the same behavior)
Output of go env in your module/workspace:
GO111MODULE='on'
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/admin/Library/Caches/go-build'
GOENV='/Users/admin/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/admin/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/admin/go'
GOPRIVATE=''
GOPROXY=''
GOROOT='/opt/homebrew/Cellar/go/1.23.0/libexec'
GOSUMDB='sum.golang.google.cn'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.23.0/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='on'
GOTELEMETRYDIR='/Users/admin/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/admin/Developer/test/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/7s/9gl4fgf97rlgr4gt47nythtw0000gn/T/go-build1880376490=/tmp/go-build -gno-record-gcc-switches -fno-common'
What did you do?
Related Go files:
iter: https://go.dev/play/p/iRuU4kNXngq
iter_test: https://go.dev/play/p/4C_EbsSnlQH
go test -bench=. -benchmem
goos: darwin
goarch: arm64
pkg: ksco/test
cpu: Apple M3 Pro
BenchmarkSliceFunctions/AllForLoop-10-12 321253351 3.475 ns/op 0 B/op 0 allocs/op
BenchmarkSliceFunctions/All-10-12 250128255 4.530 ns/op 0 B/op 0 allocs/op
BenchmarkSliceFunctions/BackwardForLoop-10-12 344788078 3.509 ns/op 0 B/op 0 allocs/op
BenchmarkSliceFunctions/Backward-10-12 87433018 13.84 ns/op 0 B/op 0 allocs/op
BenchmarkSliceFunctions/ValuesForLoop-10-12 344200261 3.476 ns/op 0 B/op 0 allocs/op
BenchmarkSliceFunctions/Values-10-12 263804847 4.544 ns/op 0 B/op 0 allocs/op
BenchmarkSliceFunctions/AppendForLoop-10-12 13531671 87.18 ns/op 248 B/op 5 allocs/op
BenchmarkSliceFunctions/AppendSeq-10-12 8190546 145.6 ns/op 312 B/op 8 allocs/op
BenchmarkSliceFunctions/CollectForLoop-10-12 92614030 13.41 ns/op 80 B/op 1 allocs/op
BenchmarkSliceFunctions/Collect-10-12 8045440 146.8 ns/op 312 B/op 8 allocs/op
BenchmarkSliceFunctions/SortForLoop-10-12 57416722 20.59 ns/op 80 B/op 1 allocs/op
BenchmarkSliceFunctions/Sorted-10-12 7757234 153.0 ns/op 312 B/op 8 allocs/op
BenchmarkSliceFunctions/ChunkForLoop-10-12 1000000000 0.7995 ns/op 0 B/op 0 allocs/op
BenchmarkSliceFunctions/Chunk-10-12 231948748 5.167 ns/op 0 B/op 0 allocs/op
BenchmarkMapFunctions/AllForLoopMap-10-12 15668906 76.65 ns/op 0 B/op 0 allocs/op
BenchmarkMapFunctions/AllMap-10-12 15576559 76.58 ns/op 0 B/op 0 allocs/op
BenchmarkMapFunctions/KeysForLoopMap-10-12 15780648 75.70 ns/op 0 B/op 0 allocs/op
BenchmarkMapFunctions/KeysMap-10-12 15699544 76.53 ns/op 0 B/op 0 allocs/op
BenchmarkMapFunctions/ValuesForLoopMap-10-12 15928665 75.93 ns/op 0 B/op 0 allocs/op
BenchmarkMapFunctions/ValuesMap-10-12 15532413 76.55 ns/op 0 B/op 0 allocs/op
BenchmarkMapFunctions/InsertForLoopMap-10-12 1803122 668.6 ns/op 1401 B/op 2 allocs/op
BenchmarkMapFunctions/InsertMap-10-12 1655206 728.1 ns/op 1489 B/op 5 allocs/op
BenchmarkMapFunctions/CollectForLoopMap-10-12 5277978 226.0 ns/op 420 B/op 1 allocs/op
BenchmarkMapFunctions/CollectMap-10-12 3718111 321.8 ns/op 716 B/op 5 allocs/op
PASS
ok ksco/test 34.819s
The iterator versions are also somewhat slower on Linux and on x86 machines, and gotip gives similar results.
Additionally, when examining the assembly output generated by
go build -gcflags="-S" iter.go
I noticed that the iterator versions of certain functions contain additional instructions that appear to be unnecessary and may account for the observed performance differences.
What did you see happen?
Analysis of the generated assembly revealed that iterator-based implementations (e.g., slices.All, slices.Backward, slices.Chunk) introduce additional overhead compared to traditional for-loops:
- Additional function calls:
  - Iterator functions themselves
  - Closure function calls
  - Yield function calls
- Memory allocations:
  - Heap allocations for closures and iterator states (via runtime.newobject)
  - Larger stack frames
- Additional control flow:
  - Iterator state checks
  - Yield function return checks
- Indirect function calls:
  - Calls through function pointers (e.g., CALL (R4) observed in the chunk function)
- Increased register usage and stack operations:
  - More registers used for managing iterator state
  - More frequent stack operations for saving and restoring state
- Additional safety checks:
  - E.g., slice size validation in slices.Chunk
- Increased code size:
  - Iterator versions of functions are typically larger than their for-loop counterparts
Specifically, for slices.Chunk I observed:
- runtime.newobject calls for creating closure objects
- Closure setup, including function pointer and captured variable initialization
- Creation and invocation of slices.Chunk[go.shape.[]int,go.shape.int].func1
- Multiple closure calls during iteration
- Checks on yield function return values
Similar issues were observed in other iterator-related function implementations.
What did you expect to see?
According to the Go Wiki's Rangefunc Experiment page, in simple cases the optimized code is almost identical to a manually written for loop.
However, the assembly analysis suggests that the current implementations still add complexity and measurable overhead. While they are already quite effective, I hope further optimizations can bring their performance in line with traditional for loops in most simple scenarios.