spanner: Deadlock in Spanner client library #7496

Closed
osterante opened this issue Feb 27, 2023 · 4 comments · Fixed by #7501

@osterante

Client

Spanner v1.44.0

Environment

Local environment on an M1 MacBook Pro

Go Environment
$ go version
go version go1.20 darwin/arm64

$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/UserName/Library/Caches/go-build"
GOENV="/Users/UserName/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/UserName/go/pkg/mod"
GOOS="darwin"
GOPATH="/Users/UserName/go"
GOPROXY="https://proxy.golang.org/,direct"
GOROOT="/Users/UserName/.anyenv/envs/goenv/versions/1.20.0"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/UserName/.anyenv/envs/goenv/versions/1.20.0/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/UserName/Projects/Project/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/x3/0yyktc4d5_11gvj8gsjxsck5mc3vtb/T/go-build887811120=/tmp/go-build -gno-record-gcc-switches -fno-common"

We found a deadlock in the Spanner client library while running our e2e tests.
It appears to acquire the session pool mutex twice on the same goroutine: once at https://github.com/googleapis/google-cloud-go/blob/main/spanner/session.go#L322 and again at https://github.com/googleapis/google-cloud-go/blob/main/spanner/session.go#L961.

The stack trace is as follows:

goroutine 7372 [sync.Mutex.Lock, 9 minutes]:
sync.runtime_SemacquireMutex(0x14003632758?, 0xa8?, 0xa1665279301?)
        /Users/UserName/.anyenv/envs/goenv/versions/1.20.0/src/runtime/sema.go:77 +0x28
sync.(*Mutex).lockSlow(0x14000651340)
        /Users/UserName/.anyenv/envs/goenv/versions/1.20.0/src/sync/mutex.go:171 +0x174
sync.(*Mutex).Lock(...)
        /Users/UserName/.anyenv/envs/goenv/versions/1.20.0/src/sync/mutex.go:90
cloud.google.com/go/spanner.(*sessionPool).remove(0x14000651340?, 0x37e11d600?, 0x0?)
        /Users/UserName/Projects/Project/vendor/cloud.google.com/go/spanner/session.go:961 +0x88
cloud.google.com/go/spanner.(*session).destroyWithContext(0x14000ea0000, {0x1400012e008?, 0x1083edf00?}, 0x0?)
        /Users/UserName/Projects/Project/vendor/cloud.google.com/go/spanner/session.go:342 +0x30
cloud.google.com/go/spanner.(*session).destroy(0x14000651340?, 0x0?)
        /Users/UserName/Projects/Project/vendor/cloud.google.com/go/spanner/session.go:337 +0x70
cloud.google.com/go/spanner.(*session).recycle(0x14000ea0000)
        /Users/UserName/Projects/Project/vendor/cloud.google.com/go/spanner/session.go:327 +0xd8
cloud.google.com/go/spanner.(*sessionHandle).recycle(0x1083edf40?)
        /Users/UserName/Projects/Project/vendor/cloud.google.com/go/spanner/session.go:82 +0x15c
cloud.google.com/go/spanner.(*Client).rwTransaction.func1()
        /Users/UserName/Projects/Project/vendor/cloud.google.com/go/spanner/client.go:500 +0x28
cloud.google.com/go/spanner.(*Client).rwTransaction(0x140014ae790, {0x108445818, 0x140041d2ba0}, 0x14003632c60, {{0xb0?}, {0x0?, 0xa8?}, 0x8247080?, 0x1?})
        /Users/UserName/Projects/Project/vendor/cloud.google.com/go/spanner/client.go:539 +0x174
cloud.google.com/go/spanner.(*Client).ReadWriteTransaction(0x14000fbdf10?, {0x108445818?, 0x140041d2720?}, 0x107103480?)
        /Users/UserName/Projects/Project/vendor/cloud.google.com/go/spanner/client.go:471 +0xb8

Additional context

This has been happening since #6344 was merged.

@osterante osterante added the triage me I really want to be triaged. label Feb 27, 2023
@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Feb 27, 2023
@rahul2393
Contributor

@osterante Thanks for raising the issue. Do you have a test case handy for replicating the deadlock? #6344 was merged in v1.35.0, so should I use that version to replicate it?

@osterante
Author

osterante commented Feb 27, 2023

@rahul2393 Sorry, I don't have minimal code to reproduce it, and I can't share our production code.
You can reproduce this with v1.35.0 or later.

s.destroy() ends up calling s.pool.mu.Lock() (via sessionPool.remove in the stack trace above), so if s.pool.recycleLocked(s) returns false while the lock is already held, it must deadlock.
(It may happen if client.Close() and RWTransaction() are called at the same time?)

s.pool.mu.Lock()
defer s.pool.mu.Unlock()
if !s.pool.recycleLocked(s) {
	// s is rejected by its home session pool because it expired and the
	// session pool currently has enough open sessions.
	s.destroy(false)
}
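
To make the pattern above concrete, here is a minimal, self-contained sketch. It is not the Spanner client code and not necessarily how the fix in #7501 works: the `pool` type, its `rejecting` field, and the `relockWouldBlock` helper are hypothetical names invented for illustration. It shows that Go's `sync.Mutex` is not reentrant (a second `Lock` on the same goroutine would block forever, probed here with `TryLock` so the program does not hang), and one possible way to avoid the problem by releasing the mutex before calling a method that locks it again.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical stand-in for a session pool guarded by one mutex.
type pool struct {
	mu        sync.Mutex
	sessions  map[string]bool
	rejecting bool // assumption: when true, recycleLocked refuses sessions
}

func newPool(rejecting bool) *pool {
	return &pool{sessions: map[string]bool{}, rejecting: rejecting}
}

// remove takes the pool mutex itself, like the remove frame in the trace.
func (p *pool) remove(id string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.sessions, id)
}

// recycleLocked must be called with p.mu held. It reports whether the
// pool accepts the session back.
func (p *pool) recycleLocked(id string) bool {
	return !p.rejecting
}

// relockWouldBlock holds p.mu and probes it again with TryLock (Go 1.18+).
// A false result from TryLock means a plain Lock here would never return.
func (p *pool) relockWouldBlock() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.mu.TryLock() {
		p.mu.Unlock()
		return false
	}
	return true
}

// recycle decides under the lock, releases it, and only then calls remove,
// so remove can take the mutex without waiting on its own caller.
func (p *pool) recycle(id string) {
	p.mu.Lock()
	accepted := p.recycleLocked(id)
	p.mu.Unlock()
	if !accepted {
		p.remove(id)
	}
}

func main() {
	p := newPool(true)
	p.sessions["s1"] = true
	fmt.Println("second Lock would block:", p.relockWouldBlock())
	p.recycle("s1")
	fmt.Println("sessions left:", len(p.sessions))
}
```

The trade-off in this sketch is that the pool state can change between `Unlock` and `remove`, so a real implementation would need to confirm that the decision is still valid, or pass an "already locked" flag down instead.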

@rahul2393
Contributor

@osterante Can you please check whether the patch resolves the deadlock on your end?

To verify, add the following line to your go.mod file:

replace cloud.google.com/go/spanner => cloud.google.com/go/spanner v1.44.1-0.20230301101930-6671f7ca47f3

@osterante
Author

@rahul2393 Seems fixed 👍
