feat: base transaction retries on error codes#953
Conversation
c757ce5 to
65c61cb
Compare
65c61cb to
5c77e0c
Compare
wilhuff
left a comment
There was a problem hiding this comment.
A few questions. Also, I don't feel competent to review dev/test/transaction.ts. There's some kind of higher-order ninjitsu going on there that I don't grok.
| } | ||
|
|
||
| this._writeBatch._reset(); | ||
| await this.maybeBackoff(lastError); |
dev/src/transaction.ts
Outdated
| this._backoff.resetToMax(); | ||
| } else if (error.code === Status.ABORTED) { | ||
| // We don't backoff for ABORTED to avoid starvation | ||
| this._backoff.reset(); |
There was a problem hiding this comment.
This contradicts comments in go/transaction-retry-matrix. Do we have experimental data to support that starvation is a problem? Given multiple clients contending on a write, multiple clients failing to back off (with randomness) means they'll never make progress.
There was a problem hiding this comment.
I checked with the Firestore team and removed the special-casing on ABORTED, which simplifies the code a bit since we now always back off. This also allowed me to simplify the test suite.
| if (error.code !== undefined) { | ||
| // This list is based on https://github.com/firebase/firebase-js-sdk/blob/master/packages/firestore/src/core/transaction_runner.ts#L112 | ||
| switch (error.code) { | ||
| case Status.ABORTED: |
There was a problem hiding this comment.
Is it worth distinguishing retry-the-rpc vs retry-the-whole-transaction errors?
There was a problem hiding this comment.
The per RPC retry is actually driven by GAX. The one downside of this approach is that if say a RunQuery RPC fails with DEADLINE_EXCEEDED, the RPC will retried by GAX first. If GAX retries don't resolve the issue, then we will restart the transaction.
It's probably possible to turn off GAX retries, but it is a pretty invasive change.
Codecov Report
@@ Coverage Diff @@
## master #953 +/- ##
==========================================
+ Coverage 97.39% 97.45% +0.06%
==========================================
Files 25 25
Lines 15875 15896 +21
Branches 1258 1193 -65
==========================================
+ Hits 15461 15491 +30
+ Misses 411 402 -9
Partials 3 3
Continue to review full report at Codecov.
|
Port of googleapis/nodejs-firestore#953 Implements go/transaction-retry-matrix for Java. Transaction retries are now dependent upon error code, and will only be retried on specific error codes. This change takes care of all of the state tracking and cleanup necessary to retry transactions.
This PR implements consistent retry behavior all transaction methods:
As per go/transaction-retry-matrix, it implements the following behavior by error code and RPC:
RPC: BeginTransaction, Rollback
RPC: RunQuery, BatchGetDocuments, Commit
A restart consists of a rollback and a BeginTransaction RPC with a
retryTransaction.I also updated the test cases to allow us to verify the backoff behavior.
Fixes #889
Fixes #827