What version of Go are you using (go version)?
$ go version
go version go1.19.4 linux/amd64
Does this issue reproduce with the latest release?
I haven't tried the latest release but from what I see there's not much change to affected code so far.
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19.4"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/src/go-crc32-issue/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1199147618=/tmp/go-build -gno-record-gcc-switches"
uname -sr: Linux 5.15.91-1-MANJARO
/lib64/libc.so.6: GNU C Library (GNU libc) stable release version 2.28.
What did you do?
I wrote a simple benchmark to showcase the problem.
https://go.dev/play/p/H_cHsnLhMiW
Or you can see runtime.MemStats with growing field TotalAlloc here:
https://go.dev/play/p/8geJhJ6OOqS
What did you expect to see?
The data variable in the benchmark should be allocated on the stack because Update does not retain the slice. Hence no allocations should be made per operation.
What did you see instead?
BenchmarkCrc32MoveToHeap-4 21018186 51.65 ns/op 64 B/op 1 allocs/op
We have an allocation due to data escaping to heap. I believe the reason is that updateCastagnoli which does the calculation is declared as the closure and it is initialized during runtime depending on CPU capabilities (SSE4.2 etc).
./crc32_test.go:11:6: cannot inline BenchmarkCrc32MoveToHeap: function too complex: cost 214 exceeds budget 80
./crc32_test.go:12:16: inlining call to testing.(*B).ReportAllocs
./crc32_test.go:17:7: data escapes to heap:
./crc32_test.go:17:7: flow: {heap} = &data:
./crc32_test.go:17:7: from data (address-of) at ./crc32_test.go:18:32
./crc32_test.go:17:7: from data[:] (slice) at ./crc32_test.go:18:36
./crc32_test.go:17:7: from crc32.Update(crc, tab, data[:]) (call parameter) at ./crc32_test.go:18:21
./crc32_test.go:11:31: b does not escape
./crc32_test.go:17:7: moved to heap: data
Maybe we could rework initialization of crc32 algorithm without additional level of indirection. I relocated some of Go assembly code for SSE4.2 to my private project and it shows no allocation and a significant performance boost (by a factor of 3 or 4 on some data).
What version of Go are you using (
go version)?Does this issue reproduce with the latest release?
I haven't tried the latest release but from what I see there's not much change to affected code so far.
What operating system and processor architecture are you using (
go env)?go envOutputWhat did you do?
I wrote a simple benchmark to showcase the problem.
https://go.dev/play/p/H_cHsnLhMiW
Or you can see
runtime.MemStatswith growing fieldTotalAllochere:https://go.dev/play/p/8geJhJ6OOqS
What did you expect to see?
The
datavariable in the benchmark should be allocated on the stack becauseUpdatedoes not retain the slice. Hence no allocations should be made per operation.What did you see instead?
We have an allocation due to
dataescaping to heap. I believe the reason is thatupdateCastagnoliwhich does the calculation is declared as the closure and it is initialized during runtime depending on CPU capabilities (SSE4.2 etc).Maybe we could rework initialization of crc32 algorithm without additional level of indirection. I relocated some of Go assembly code for SSE4.2 to my private project and it shows no allocation and a significant performance boost (by a factor of 3 or 4 on some data).