PHPLIB-1719 Exponential backoff and jitter in retry loops#1880
PHPLIB-1719 Exponential backoff and jitter in retry loops#1880GromNaN merged 10 commits intomongodb:v2.xfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the library’s retry/backoff behavior and test coverage to align with the Client Backpressure specification, and refactors shared test utilities for controlling jitter deterministically.
Changes:
- Adds Client Backpressure prose tests for exponential backoff behavior and maximum retry count on overload errors.
- Extracts a shared
Util::setFixedJitter()helper for spec tests to deterministically control backoff jitter. - Updates
WithTransactionjitter handling logic (used by retry/backoff computations) and gates relevant tests onext-mongodb >= 2.3.0dev.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/UnifiedSpecTests/Util.php | Adds shared helper to inject a fixed jitter generator into WithTransaction for deterministic timing assertions. |
| tests/SpecTests/TransactionsConvenientApi/Prose4_RetryBackoffIsEnforcedTest.php | Switches to the shared jitter helper and gates the test on ext-mongodb >= 2.3.0dev. |
| tests/SpecTests/ClientBackpressure/Prose3_OverloadErrorMaxRetryTest.php | New prose test validating overload errors stop retrying after the maximum retry count. |
| tests/SpecTests/ClientBackpressure/Prose1_OpRetryExponentialBackoffTest.php | New prose test intended to validate exponential backoff timing under overload errors, using fixed jitter. |
| src/Operation/WithTransaction.php | Modifies jitter selection logic used for backoff computations in transaction retry loops. |
Comments suppressed due to low confidence (1)
src/Operation/WithTransaction.php:216
- The null-check in getJitter() is inverted: when $this->jitterGenerator is null, the code attempts to invoke it as a callable, which will cause a TypeError the first time backoff is computed. This should call the generator only when it is non-null, otherwise fall back to generating a random jitter value.
private function getJitter(): float
{
if ($this->jitterGenerator === null) {
return ($this->jitterGenerator)();
}
// Jitter is a random float from [0, 1]
// 2 ** 53 is the largest integer that can be represented in a float without losing precision
return random_int(0, 2 ** 53) / 2 ** 53;
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## v2.x #1880 +/- ##
=========================================
Coverage 87.75% 87.75%
Complexity 3308 3308
=========================================
Files 447 447
Lines 6607 6607
=========================================
Hits 5798 5798
Misses 809 809
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
alcaeus
left a comment
There was a problem hiding this comment.
Implemented tests look good, I just added a couple of suggestions for direct links to individual prose tests.
The PR is missing an implementation of prose test 4 (testing maxAdaptiveRetries)
as it will be reused across spec tests with the introduction of client backpressure tests
As per the spec: specifications/blob/master/source/client-backpressure/tests/README.md
Assert that the backoff delay is approximately 0.3 seconds (sum of 2 backoffs with jitter=1) with a 0.3-second tolerance for variance.
Add skipIfServerVersion('<', '4.3.1', ...) guards to Prose1 and Prose3
client backpressure tests since older servers reject errorLabels in
failpoints.
| 'data' => [ | ||
| 'failCommands' => ['insert'], | ||
| 'errorCode' => 2, | ||
| 'errorLabels' => ['SystemOverloadedError', 'RetryableError'], |
There was a problem hiding this comment.
The failpoint only adds SystemOverloadedError and RetryableError labels, but WithTransaction::checkForRetryableError() only retries (and calls backoff()) for exceptions with the TransientTransactionError label. As written, execute() will rethrow immediately and jitter/backoff timing assertions won’t be exercising the intended retry path. Consider adjusting the failpoint/error labels (or the operation under test) so it triggers the retry/backoff logic you want to measure.
| 'errorLabels' => ['SystemOverloadedError', 'RetryableError'], | |
| 'errorLabels' => ['SystemOverloadedError', 'RetryableError', 'TransientTransactionError'], |
- Add Prose4: test that maxAdaptiveRetries=1 limits retries to 2 total attempts
- Remove #[RequiresPhpExtension('mongodb', '>= 2.3.0dev')] from all three
backpressure prose tests since ext-mongodb ^2.3 is now required in composer.json
Good catch, test added. |
|
Prose1 test randomly failing; even if it implements the prose test rigorously. |
|
Closing this PR after deeper analysis of Prose test 1. Core issue
As a result, both measurements ( Open questionProse tests 3 and 4 (command count assertions) are not affected by this issue and can be reopened in a separate PR once ext-mongodb 2.3 is stable. |
The overload retry and its exponential backoff are implemented inside ext-mongodb (C level). WithTransaction only retries on TransientTransactionError, not on SystemOverloadedError, so setFixedJitter() had no effect on the test timing, making the spec assertion randomly fail. Replace the two-run jitter comparison with a single run that asserts the operation completes within the maximum possible backoff window (MAX_RETRIES × MAX_BACKOFF = 20s). A comment explains why the full spec assertion cannot be implemented from PHPLIB.
| * | ||
| * As partial verification, we assert that the operation completed within the | ||
| * maximum possible backoff window: MAX_RETRIES (2) × MAX_BACKOFF (10s) = 20s. */ | ||
| self::assertLessThan(20.0, $elapsed); |
There was a problem hiding this comment.
The assertion has been updated to reflect the fact that we cannot force a specific jitter value.
Closes PHPLIB-1719
Implements exponential backoff and jitter in retry loops as specified in the Client Backpressure spec.
Changes
maxAdaptiveRetriesoption when configuredsetFixedJitter()helper toUtilso it can be shared across spec test suites (was previously duplicated inTransactionsConvenientApi)Checklist
Blocked by Require ext-mongodb ^2.3 and test against dev branch #1879(no longer since ext-mongodb ^2.3 is now required)