Description
Go version
go version go1.25.3 linux/amd64
Output of go env in your module/workspace:
AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/runner/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/runner/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1722691548=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/work/go.mod'
GOMODCACHE='/home/runner/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/runner/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/runner/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.25.3'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
I have an uninitialized-memory crash with -msan that I can only reproduce in GitHub CI and in a Docker container that mirrors the CI runner.
This issue is a request for guidance on how to obtain more information.
I'll start with the relevant parts of my program. The relevant source code is here: https://github.com/RaduBerinde/pebble/blob/e016d44c914386a5b45e7956d5d5dce1aa5090d8/internal/deletepacer/delete_pacer.go#L236
Basically I have a queue storing entries of this type:
type queueEntry struct {
	ObsoleteFile
	JobID int
}
type ObsoleteFile struct {
	FileType base.FileType
	FS       vfs.FS
	Path     string
	FileNum  base.DiskFileNum
	FileSize uint64 // approx for log files
	IsLocal  bool
}
The queue is implemented using a structure (https://github.com/RaduBerinde/pebble/blob/e016d44c914386a5b45e7956d5d5dce1aa5090d8/internal/deletepacer/queue.go) which will essentially allocate this object (with T = queueEntry):
type queueNode[T any] struct {
	buf       [queueNodeSize]T
	head, len int32
	next      *queueNode[T]
}
There is something important about this specific structure. If I replace this queue with a simple slice I can no longer reproduce the failure.
In my reproduction case, we push exactly one entry into this queue. A single background goroutine (mainLoop) pops it from the queue - first it makes a copy here: https://github.com/RaduBerinde/pebble/blob/e016d44c914386a5b45e7956d5d5dce1aa5090d8/internal/deletepacer/delete_pacer.go#L236
The problem is with the file.Path field. At some later point, when we actually try to delete the file, msan complains that this string is uninitialized. The string is generated via fmt.Sprintf() and passed through a path.Join; it's a simple heap string, no unsafe shenanigans. I made sure the string pointer is correct (same one that was enqueued).
I cannot reproduce if I turn the GC off. I used runtime/trace with GODEBUG=traceallocfree=1 and confirmed that, at some point after we pop the element from the queue, the string object gets freed.
I figured out that if I add a runtime.GC call right after the pop (this is the code version I linked), the GC finds this problem right there:
file to delete _meta/foo/standard-012/data/000003.log 0xc00048a030 38
PushBack 0xc00048a030 _meta/foo/standard-012/data/000003.log
notification wait end
popped
- [JOB 6] compacting(move) L0 [000006] (3.2KB) Score=0.00 + L6 [] (0B) Score=0.00; OverlappingRatio: Single 0.00, Multi 0.00
runtime: marked free object in span 0x7a50dc78d660, elemsize=48 freeindex=1 (bad use of unsafe.Pointer or having race conditions? try -d=checkptr or -race)
0xc00048a000 alloc marked
0xc00048a030 free marked zombie <-----------
I looked at the assembly and found where the structure is stored on the stack, and later we retrieve the string from the stack. So the pointer is on the stack for this time period and does not change. I looked at the stkobj map and from my understanding it does mark that stack slot as containing a pointer.
This is part of a larger codebase that does employ various unsafe tricks. However, I am able to reproduce with a fairly small amount of code actually running in-between the time we push the entry onto the queue and the failure, and I couldn't find any unsafe use there.
I was never able to reproduce any failure without -msan (I tried -race and -asan). I suspect an obscure bug that is specific to the msan integration.
I can provide instructions on how I reproduced, but it would be pretty tedious so I believe at this point it would be easiest if I did the work to get more information. I would appreciate some guidance on how to narrow down the problem.
What did you see happen?
Uninitialized memory crashes, and later (after adding runtime.GC() at the right place) fatal error: found pointer to free object
What did you expect to see?
No failure.