Skip to content

Commit

Permalink
feat(spanner): enable long running transaction clean up (#8969)
Browse files Browse the repository at this point in the history
* feat(spanner): enable long running txn clean up with default as WARN

* feat(spanner): remove mutex that does not exist in main branch

* feat(spanner): update tests

* feat(spanner): comment update
  • Loading branch information
harshachinta committed Nov 15, 2023
1 parent 9f9b01e commit 5d181bb
Show file tree
Hide file tree
Showing 4 changed files with 107 additions and 25 deletions.
82 changes: 82 additions & 0 deletions spanner/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,4 +31,86 @@ row, err := client.Single().ReadRow(ctx, "Users",
if err != nil {
log.Fatal(err)
}
```

### Session Leak
A `Client` object of the Client Library has a limit on the number of maximum sessions. For example the
default value of `MaxOpened`, which is the maximum number of sessions allowed by the session pool in the
Golang Client Library, is 400. You can configure these values at the time of
creating a `Client` by passing custom `SessionPoolConfig` as part of `ClientConfig`. When all the sessions are checked
out of the session pool, every new transaction has to wait until a session is returned to the pool.
If a session is never returned to the pool (hence causing a session leak), the transactions will have to wait
indefinitely and your application will be blocked.

#### Common Root Causes
The most common reason for session leaks in the Golang client library are:
1. Not stopping a `RowIterator` that is returned by `Query`, `Read` and other methods. Always use `RowIterator.Stop()` to ensure that the `RowIterator` is always closed.
2. Not closing a `ReadOnlyTransaction` when you no longer need it. Always call `ReadOnlyTransaction.Close()` after use, to ensure that the `ReadOnlyTransaction` is always closed.

As shown in the example below, the `txn.Close()` statement releases the session after it is complete.
If you fail to call `txn.Close()`, the session is not released back to the pool. The recommended way is to use `defer` as shown below.
```go
client, err := spanner.NewClient(ctx, "projects/P/instances/I/databases/D")
if err != nil {
log.Fatal(err)
}
txn := client.ReadOnlyTransaction()
defer txn.Close()
```

#### Debugging and Resolving Session Leaks

##### Logging inactive transactions
This option logs warnings when you have exhausted >95% of your session pool. It is enabled by default.
This could mean two things; either you need to increase the max sessions in your session pool (as the number
of queries run using the client side database object is greater than your session pool can serve), or you may
have a session leak. To help debug which transactions may be causing this session leak, the logs will also contain stack traces of
transactions which have been running longer than expected if `TrackSessionHandles` under `SessionPoolConfig` is enabled.

```go
sessionPoolConfig := spanner.SessionPoolConfig{
TrackSessionHandles: true,
InactiveTransactionRemovalOptions: spanner.InactiveTransactionRemovalOptions{
ActionOnInactiveTransaction: spanner.Warn,
},
}
client, err := spanner.NewClientWithConfig(
ctx, database, spanner.ClientConfig{SessionPoolConfig: sessionPoolConfig},
)
if err != nil {
log.Fatal(err)
}
defer client.Close()

// Example Log message to warn presence of long running transactions
// session <session-info> checked out of pool at <session-checkout-time> is long running due to possible session leak for goroutine
// <Stack Trace of transaction>

```

##### Automatically clean inactive transactions
When the option to automatically clean inactive transactions is enabled, the client library will automatically detect
problematic transactions that are running for a very long time (thus causing session leaks) and close them.
The session will be removed from the pool and be replaced by a new session. To dig deeper into which transactions are being
closed, you can check the logs to see the stack trace of the transactions which might be causing these leaks and further
debug them.

```go
sessionPoolConfig := spanner.SessionPoolConfig{
TrackSessionHandles: true,
InactiveTransactionRemovalOptions: spanner.InactiveTransactionRemovalOptions{
ActionOnInactiveTransaction: spanner.WarnAndClose,
},
}
client, err := spanner.NewClientWithConfig(
ctx, database, spanner.ClientConfig{SessionPoolConfig: sessionPoolConfig},
)
if err != nil {
log.Fatal(err)
}
defer client.Close()

// Example Log message for when transaction is recycled
// session <session-info> checked out of pool at <session-checkout-time> is long running and will be removed due to possible session leak for goroutine
// <Stack Trace of transaction>
```
12 changes: 6 additions & 6 deletions spanner/client_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -216,7 +216,7 @@ func TestClient_Single_WhenInactiveTransactionsAndSessionIsNotFoundOnBackend_Rem
MinOpened: 1,
MaxOpened: 1,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: WarnAndClose,
ActionOnInactiveTransaction: WarnAndClose,
},
},
})
Expand Down Expand Up @@ -1044,7 +1044,7 @@ func TestClient_ReadOnlyTransaction_WhenMultipleOperations_SessionLastUseTimeSho
MinOpened: 1,
MaxOpened: 1,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: WarnAndClose,
ActionOnInactiveTransaction: WarnAndClose,
idleTimeThreshold: 300 * time.Millisecond,
},
},
Expand Down Expand Up @@ -1507,7 +1507,7 @@ func TestClient_ReadWriteTransaction_WhenLongRunningSessionCleaned_TransactionSh
MinOpened: 1,
MaxOpened: 1,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: WarnAndClose,
ActionOnInactiveTransaction: WarnAndClose,
},
},
})
Expand Down Expand Up @@ -1560,7 +1560,7 @@ func TestClient_ReadWriteTransaction_WhenMultipleOperations_SessionLastUseTimeSh
MinOpened: 1,
MaxOpened: 1,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: WarnAndClose,
ActionOnInactiveTransaction: WarnAndClose,
idleTimeThreshold: 300 * time.Millisecond,
},
},
Expand Down Expand Up @@ -1660,7 +1660,7 @@ func TestClient_ReadWriteTransaction_WhenLongRunningExecuteBatchUpdate_TakeNoAct
MinOpened: 1,
MaxOpened: 1,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: WarnAndClose,
ActionOnInactiveTransaction: WarnAndClose,
},
},
})
Expand Down Expand Up @@ -4232,7 +4232,7 @@ func TestClient_WhenLongRunningPartitionedUpdateRequest_TakeNoAction(t *testing.
MaxOpened: 1,
healthCheckSampleInterval: 10 * time.Millisecond, // maintainer runs every 10ms
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: WarnAndClose,
ActionOnInactiveTransaction: WarnAndClose,
executionFrequency: 15 * time.Millisecond, // check long-running sessions every 15ms
},
},
Expand Down
24 changes: 12 additions & 12 deletions spanner/session.go
Original file line number Diff line number Diff line change
Expand Up @@ -59,9 +59,9 @@ const (

// InactiveTransactionRemovalOptions has configurations for action on long-running transactions.
type InactiveTransactionRemovalOptions struct {
// actionOnInactiveTransaction is the configuration to choose action for inactive transactions.
// ActionOnInactiveTransaction is the configuration to choose action for inactive transactions.
// It can be one of Warn, Close, WarnAndClose.
actionOnInactiveTransaction ActionOnInactiveTransactionKind
ActionOnInactiveTransaction ActionOnInactiveTransactionKind
// long-running transactions will be cleaned up if utilisation is
// greater than the below value.
usedSessionsRatioThreshold float64
Expand Down Expand Up @@ -505,7 +505,7 @@ var DefaultSessionPoolConfig = SessionPoolConfig{
HealthCheckWorkers: 10,
HealthCheckInterval: healthCheckIntervalMins * time.Minute,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: NoAction, // Make default to Warn from NoAction later
ActionOnInactiveTransaction: Warn,
executionFrequency: 2 * time.Minute,
idleTimeThreshold: 60 * time.Minute,
usedSessionsRatioThreshold: 0.95,
Expand Down Expand Up @@ -635,8 +635,8 @@ func newSessionPool(sc *sessionClient, config SessionPoolConfig) (*sessionPool,
if config.healthCheckSampleInterval == 0 {
config.healthCheckSampleInterval = time.Minute
}
if config.actionOnInactiveTransaction == actionUnspecified {
config.actionOnInactiveTransaction = DefaultSessionPoolConfig.actionOnInactiveTransaction
if config.ActionOnInactiveTransaction == actionUnspecified {
config.ActionOnInactiveTransaction = DefaultSessionPoolConfig.ActionOnInactiveTransaction
}
if config.idleTimeThreshold == 0 {
config.idleTimeThreshold = DefaultSessionPoolConfig.idleTimeThreshold
Expand Down Expand Up @@ -726,23 +726,23 @@ func (p *sessionPool) getLongRunningSessionsLocked() []*sessionHandle {
sh.mu.Lock()
diff := time.Now().Sub(sh.lastUseTime)
if !sh.eligibleForLongRunning && diff.Seconds() >= p.idleTimeThreshold.Seconds() {
if (p.actionOnInactiveTransaction == Warn || p.actionOnInactiveTransaction == WarnAndClose) && !sh.isSessionLeakLogged {
if p.actionOnInactiveTransaction == Warn {
if (p.ActionOnInactiveTransaction == Warn || p.ActionOnInactiveTransaction == WarnAndClose) && !sh.isSessionLeakLogged {
if p.ActionOnInactiveTransaction == Warn {
if sh.stack != nil {
logf(p.sc.logger, "session %s checked out of pool at %s is long running due to possible session leak for goroutine: \n%s", sh.session.getID(), sh.checkoutTime.Format(time.RFC3339), sh.stack)
} else {
logf(p.sc.logger, "session %s checked out of pool at %s is long running due to possible session leak for goroutine: \nEnable SessionPoolConfig.TrackSessionHandles to get stack trace associated with the session", sh.session.getID(), sh.checkoutTime.Format(time.RFC3339))
}
sh.isSessionLeakLogged = true
} else if p.actionOnInactiveTransaction == WarnAndClose {
} else if p.ActionOnInactiveTransaction == WarnAndClose {
if sh.stack != nil {
logf(p.sc.logger, "session %s checked out of pool at %s is long running and will be removed due to possible session leak for goroutine: \n%s", sh.session.getID(), sh.checkoutTime.Format(time.RFC3339), sh.stack)
} else {
logf(p.sc.logger, "session %s checked out of pool at %s is long running and will be removed due to possible session leak for goroutine: \nEnable SessionPoolConfig.TrackSessionHandles to get stack trace associated with the session", sh.session.getID(), sh.checkoutTime.Format(time.RFC3339))
}
}
}
if p.actionOnInactiveTransaction == WarnAndClose || p.actionOnInactiveTransaction == Close {
if p.ActionOnInactiveTransaction == WarnAndClose || p.ActionOnInactiveTransaction == Close {
longRunningSessions = append(longRunningSessions, sh)
}
}
Expand All @@ -760,7 +760,7 @@ func (p *sessionPool) removeLongRunningSessions() {
p.mu.Unlock()

// destroy long-running sessions
if p.actionOnInactiveTransaction == WarnAndClose || p.actionOnInactiveTransaction == Close {
if p.ActionOnInactiveTransaction == WarnAndClose || p.ActionOnInactiveTransaction == Close {
var leakedSessionsRemovedCount uint64
for _, sh := range longRunningSessions {
// removes inner session out of the pool to reduce the probability of two processes trying
Expand Down Expand Up @@ -895,7 +895,7 @@ var errGetSessionTimeout = spannerErrorf(codes.Canceled, "timeout / context canc
// sessions being checked out of the pool.
func (p *sessionPool) newSessionHandle(s *session) (sh *sessionHandle) {
sh = &sessionHandle{session: s, checkoutTime: time.Now(), lastUseTime: time.Now()}
if p.TrackSessionHandles || p.actionOnInactiveTransaction == Warn || p.actionOnInactiveTransaction == WarnAndClose || p.actionOnInactiveTransaction == Close {
if p.TrackSessionHandles || p.ActionOnInactiveTransaction == Warn || p.ActionOnInactiveTransaction == WarnAndClose || p.ActionOnInactiveTransaction == Close {
p.mu.Lock()
sh.trackedSessionHandle = p.trackedSessionHandles.PushBack(sh)
p.mu.Unlock()
Expand Down Expand Up @@ -1513,7 +1513,7 @@ func (hc *healthChecker) maintainer() {

// task to remove or log sessions which are unexpectedly long-running
if now.After(hc.pool.InactiveTransactionRemovalOptions.lastExecutionTime.Add(hc.pool.executionFrequency)) {
if hc.pool.actionOnInactiveTransaction == Warn || hc.pool.actionOnInactiveTransaction == WarnAndClose || hc.pool.actionOnInactiveTransaction == Close {
if hc.pool.ActionOnInactiveTransaction == Warn || hc.pool.ActionOnInactiveTransaction == WarnAndClose || hc.pool.ActionOnInactiveTransaction == Close {
hc.pool.removeLongRunningSessions()
}
hc.pool.InactiveTransactionRemovalOptions.lastExecutionTime = now
Expand Down
14 changes: 7 additions & 7 deletions spanner/session_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -416,7 +416,7 @@ func TestSessionLeak_WhenInactiveTransactions_RemoveSessionsFromPool(t *testing.
MinOpened: 0,
MaxOpened: 1,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: WarnAndClose,
ActionOnInactiveTransaction: WarnAndClose,
},
TrackSessionHandles: true,
},
Expand Down Expand Up @@ -489,7 +489,7 @@ func TestMaintainer_LongRunningTransactionsCleanup_IfClose_VerifyInactiveSession
MaxOpened: 3,
healthCheckSampleInterval: 10 * time.Millisecond, // maintainer runs every 10ms
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: WarnAndClose,
ActionOnInactiveTransaction: WarnAndClose,
executionFrequency: 15 * time.Millisecond, // check long-running sessions every 20ms
},
},
Expand Down Expand Up @@ -560,7 +560,7 @@ func TestLongRunningTransactionsCleanup_IfClose_VerifyInactiveSessionsClosed(t *
MinOpened: 1,
MaxOpened: 3,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: WarnAndClose,
ActionOnInactiveTransaction: WarnAndClose,
},
},
})
Expand Down Expand Up @@ -628,7 +628,7 @@ func TestLongRunningTransactionsCleanup_IfLog_VerifyInactiveSessionsOpen(t *test
MinOpened: 1,
MaxOpened: 3,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: Warn,
ActionOnInactiveTransaction: Warn,
},
},
})
Expand Down Expand Up @@ -715,7 +715,7 @@ func TestLongRunningTransactionsCleanup_UtilisationBelowThreshold_VerifyInactive
MaxOpened: 3,
incStep: 1,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: WarnAndClose,
ActionOnInactiveTransaction: WarnAndClose,
},
},
})
Expand Down Expand Up @@ -774,7 +774,7 @@ func TestLongRunningTransactions_WhenAllExpectedlyLongRunning_VerifyInactiveSess
MinOpened: 1,
MaxOpened: 3,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: Warn,
ActionOnInactiveTransaction: Warn,
},
},
})
Expand Down Expand Up @@ -842,7 +842,7 @@ func TestLongRunningTransactions_WhenDurationBelowThreshold_VerifyInactiveSessio
MinOpened: 1,
MaxOpened: 3,
InactiveTransactionRemovalOptions: InactiveTransactionRemovalOptions{
actionOnInactiveTransaction: Warn,
ActionOnInactiveTransaction: Warn,
},
},
})
Expand Down

0 comments on commit 5d181bb

Please sign in to comment.