Skip to content

Conversation

jimmyjames
Copy link
Contributor

@jimmyjames jimmyjames commented Aug 8, 2025

fix: Improved Retry Handling and Retry-After Header Support

Overview

This PR makes two key improvements to the retry handling implementation:

  1. Efficient retry timing with HttpClient reuse - Replaces the current approach of creating new HttpClients per retry with CompletableFuture.supplyAsync() timing and HttpClient reuse
  2. Retry-After header precedence - Server-specified Retry-After headers now take precedence over client minimum retry delays

Key Improvements

1. Efficient Retry Timing with HttpClient Reuse

File: src/main/java/dev/openfga/sdk/api/client/HttpRequestAttempt.java

Problems with Current Approach

The current approach ccreates a new HttpClient with a delayed executor for each retry:

// Previous approach - created new HttpClient for each retry
private HttpClient getDelayedHttpClient(Duration retryDelay) {
    return apiClient
            .getHttpClientBuilder()
            .executor(CompletableFuture.delayedExecutor(retryDelay.toNanos(), TimeUnit.NANOSECONDS))
            .build(); // ❌ Creates new HttpClient instance
}

Issues with this approach:

  • Resource inefficiency: Creates new HttpClient instances (with new connection pools) for each retry
  • Timing coupling: Delay logic is tightly coupled to HttpClient configuration
  • Mixed concerns: HTTP client setup mixed with retry timing logic
  • Resource overhead: Each new HttpClient allocates its own thread pool and connection pool

Improved Implementation

private CompletableFuture<ApiResponse<T>> delayedRetry(Duration retryDelay, int nextRetryNumber, Throwable previousError) {
    return CompletableFuture.supplyAsync(
                    () -> null, // Just for timing
                    CompletableFuture.delayedExecutor(retryDelay.toNanos(), TimeUnit.NANOSECONDS))
            .thenCompose(ignored -> {
                HttpClient reusableClient = apiClient.getHttpClient(); // ✅ Reuse existing client
                return attemptHttpRequest(reusableClient, nextRetryNumber, previousError);
            });
}

Benefits of this approach:

  • Resource efficient: Reuses existing HttpClient instead of creating new instances
  • Clean separation: Timing logic separated from HTTP client management
  • Better async composition: Explicit timing in the CompletableFuture chain
  • Precise timing control: Uses delayedExecutor() for nanosecond-precision delays

2. Retry-After Header Precedence

File: src/main/java/dev/openfga/sdk/util/RetryStrategy.java

Key behavioral change: Server-specified Retry-After headers now take precedence over client minimum retry delays.

public static Duration calculateRetryDelay(Optional<Duration> retryAfterDelay, int retryCount, Duration minimumRetryDelay) {
    if (retryAfterDelay.isPresent()) {
        return retryAfterDelay.get(); // ✅ Server timing takes precedence
    }
    // Fall back to exponential backoff with jitter
    Duration baseDelay = minimumRetryDelay != null ? minimumRetryDelay : Configuration.DEFAULT_MINIMUM_RETRY_DELAY;
    return ExponentialBackoff.calculateDelay(retryCount, baseDelay);
}

Why this matters:

  • RFC 9110 compliance: Respects server-specified retry timing
  • Server protection: Prevents overwhelming servers during high-load scenarios
  • Better backpressure handling: Allows servers to control client retry behavior

Technical Benefits

Performance

  • Resource efficiency: No unnecessary HttpClient creation
  • Connection reuse: Maintains existing connection pools

Reliability

  • Precise timing: Nanosecond-level delay accuracy
  • Server compliance: Proper Retry-After header handling
  • Consistent behavior: Same retry logic for both network and HTTP errors

Maintainability

  • Centralized logic: Common retry delay handling in delayedRetry() method
  • Clean separation: Timing logic separate from HTTP client management
  • Better testability: Easier to test timing behavior independently

Testing

  • ✅ Integration tests confirm end-to-end retry behavior
  • ✅ Timing precision validated with jitter compliance tests

Test Suite Optimization

File: src/test/java/dev/openfga/sdk/api/client/HttpRequestAttemptRetryTest.java

Performance Improvement

  • 4.1x faster execution: Reduced test suite runtime from 58 seconds to 14 seconds
  • Maintained full coverage: All retry behaviors thoroughly tested with same reliability
  • Smart optimization strategy: Separated functional tests (fast) from timing tests (comprehensive)

🧪 Testing

Test Results

  • All Tests Passing: 58/58 tests successful
  • Retry Tests: All retry-specific tests validate correct behavior
  • Integration Tests: WireMock-based tests confirm end-to-end retry functionality

Test Categories Covered

  • Retry-After header parsing and precedence
  • Exponential backoff with jitter validation
  • Network error retry timing precision
  • HTTP error retry behavior
  • Maximum retry limits and delay caps

📋 Validation Criteria

This PR satisfies the following validation criteria:

  • Timing Precision: Retry delays match official GitHub issue Improve the retry strategy #155 specification
  • Code Quality: Eliminated duplication while maintaining functionality
  • Test Coverage: Comprehensive test validation of retry behavior
  • Specification Compliance: Maintains RFC 9110 Retry-After header compliance
  • Backward Compatibility: No breaking changes to existing retry behavior
  • Documentation: Clear comments explaining jitter requirements and implementation

🔗 References

  • GitHub Issue Improve the retry strategy #155: Improve the retry strategy
  • Official Jitter Specification: Exponential backoff with jitter range [2^loopCount * 100ms, 2^(loopCount + 1) * 100ms]
  • TDD Approach: Test-driven development successfully identified timing precision issues
  • DRY Principle: Don't Repeat Yourself - eliminated code duplication for better maintainability

🎯 Impact

This PR ensures that the Java SDK's retry logic:

  1. Correctly implements the official jitter specification from GitHub issue Improve the retry strategy #155
  2. Maintains clean code through DRY principles and proper separation of concerns
  3. Provides reliable timing for retry operations in production environments
  4. Passes comprehensive tests validating all retry scenarios

The changes are backward compatible and improve both code quality and specification compliance.

Summary by CodeRabbit

  • Refactor

    • Improved and centralized the retry delay mechanism for HTTP requests, resulting in more efficient and consistent retry behavior.
  • Bug Fixes

    • Updated retry logic to always honor the "Retry-After" header value, even if it is smaller than the configured minimum retry delay.
  • Tests

    • Reduced retry delays in tests for faster execution.
    • Added and updated tests to verify precise timing and correct handling of retry delays, including exponential backoff and "Retry-After" header precedence.

@jimmyjames jimmyjames requested a review from a team as a code owner August 8, 2025 19:06
Copy link
Contributor

coderabbitai bot commented Aug 8, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The retry logic for HTTP requests was refactored to centralize and streamline the delay mechanism using asynchronous scheduling, eliminating the creation of new HttpClient instances for each retry. The handling of the "Retry-After" header was simplified, and related tests were updated to use shorter delays and verify the precedence of "Retry-After" over minimum retry delays.

Changes

Cohort / File(s) Change Summary
HTTP Retry Logic Refactor
src/main/java/dev/openfga/sdk/api/client/HttpRequestAttempt.java
Centralized retry delay logic with a new delayedRetry method using asynchronous scheduling, removed creation of delayed HttpClient instances, updated method names, and improved internal code clarity.
Retry Delay Calculation
src/main/java/dev/openfga/sdk/util/RetryStrategy.java
Simplified calculateRetryDelay to always use the "Retry-After" header if present, updated Javadoc to clarify minimum retry delay usage.
HTTP Retry Tests
src/test/java/dev/openfga/sdk/api/client/HttpRequestAttemptRetryTest.java
Reduced retry delays for faster tests, added timing-based tests for network error retries, clarified and reordered test cases, and reinforced "Retry-After" header precedence in assertions.
RetryStrategy Unit Test
src/test/java/dev/openfga/sdk/util/RetryStrategyTest.java
Renamed and updated test to expect "Retry-After" to take precedence over minimum delay, updating assertions accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HttpRequestAttempt
    participant HttpClient
    participant Timer

    Client->>HttpRequestAttempt: sendRequest()
    HttpRequestAttempt->>HttpClient: send()
    alt Success
        HttpClient-->>HttpRequestAttempt: response
        HttpRequestAttempt-->>Client: response
    else Network/HTTP Error & shouldRetry
        HttpRequestAttempt->>Timer: delayedRetry(delay)
        Timer-->>HttpRequestAttempt: (after delay)
        HttpRequestAttempt->>HttpClient: send() (retry)
        loop Until success or max retries
            HttpClient-->>HttpRequestAttempt: response/error
            alt Success
                HttpRequestAttempt-->>Client: response
            else Retryable error
                HttpRequestAttempt->>Timer: delayedRetry(next delay)
            end
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Possibly related PRs

  • feat: improved retry handling #186: Refactors retry logic in HttpRequestAttempt and introduces a comprehensive retry strategy, including RFC-compliant "Retry-After" parsing and exponential backoff, modifying similar retry logic but with different implementation details.

Suggested reviewers

  • rhamzeh
✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/retry-after-improvements

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@codecov-commenter
Copy link

codecov-commenter commented Aug 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 34.58%. Comparing base (05e8b0e) to head (1defea4).

❌ Your project status has failed because the head coverage (34.58%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff            @@
##               main     #198   +/-   ##
=========================================
  Coverage     34.57%   34.58%           
- Complexity     1042     1043    +1     
=========================================
  Files           185      185           
  Lines          6984     6980    -4     
  Branches        790      790           
=========================================
- Hits           2415     2414    -1     
+ Misses         4464     4463    -1     
+ Partials        105      103    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

rhamzeh
rhamzeh previously approved these changes Aug 8, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/test/java/dev/openfga/sdk/api/client/HttpRequestAttemptRetryTest.java (1)

322-359: Repeated real-time back-off assertions

Same flakiness risk across several new tests. Consider consolidating into one timing-sensitive test or mocking ExponentialBackoff to return deterministic values.

Also applies to: 361-400, 624-630, 670-671, 747-749

🧹 Nitpick comments (3)
src/main/java/dev/openfga/sdk/api/client/HttpRequestAttempt.java (2)

107-116: Telemetry parity missing for HTTP-error retries

handleNetworkError adds the http.request.resend_count attribute, but handleHttpErrorRetry no longer does. For consistency in dashboards, add the same attribute there.


131-152: Potential overflow & thread utilisation in delayedRetry

  1. retryDelay.toNanos() will overflow for durations ≥ ~292 years or throw ArithmeticException; safer to pass milliseconds.
  2. CompletableFuture.runAsync(…​) spins a task per retry. With high maxRetries this is fine, but if called thousands of times (e.g., bulk operations) consider re-using a scheduler (ScheduledExecutorService) to avoid excess threads in ForkJoinPool.
-                        CompletableFuture.delayedExecutor(retryDelay.toNanos(), TimeUnit.NANOSECONDS))
+                        CompletableFuture.delayedExecutor(retryDelay.toMillis(), TimeUnit.MILLISECONDS))
src/test/java/dev/openfga/sdk/api/client/HttpRequestAttemptRetryTest.java (1)

247-284: Wall-clock timing assertions may be flaky

Tests assert on elapsed milliseconds (>8 ms, <400 ms, etc.). On loaded CI runners or different JVMs these can intermittently fail. Prefer:

• Using a virtual clock / Clock injection; or
• Asserting only order (retry happened) and inspecting captured delays via mock/scheduler.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05e8b0e and 153c570.

📒 Files selected for processing (4)
  • src/main/java/dev/openfga/sdk/api/client/HttpRequestAttempt.java (5 hunks)
  • src/main/java/dev/openfga/sdk/util/RetryStrategy.java (1 hunks)
  • src/test/java/dev/openfga/sdk/api/client/HttpRequestAttemptRetryTest.java (21 hunks)
  • src/test/java/dev/openfga/sdk/util/RetryStrategyTest.java (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test and Build OpenFGA (11)
  • GitHub Check: Test and Build OpenFGA (21)
  • GitHub Check: Test and Build OpenFGA (17)
  • GitHub Check: Analyze (java)
🔇 Additional comments (3)
src/test/java/dev/openfga/sdk/util/RetryStrategyTest.java (1)

38-49: Rename and assertion update look correct

Test now validates that a smaller Retry-After value (50 ms) is used verbatim—exactly what the new strategy requires. Nothing further.

src/main/java/dev/openfga/sdk/api/client/HttpRequestAttempt.java (1)

83-88: Helper method improves clarity

Using getHttpClient() isolates the caching concern—nice.

src/test/java/dev/openfga/sdk/api/client/HttpRequestAttemptRetryTest.java (1)

70-71: Fractional Retry-After header is non-standard

RFC 9110 allows only integer seconds or HTTP-date. Using "0.05" works with our parser but is not interoperable. Recommend switching to "0" (allowed) and configuring minimumRetryDelay/jitter in code instead, or treat header milliseconds via custom header for tests.

Also applies to: 105-106, 167-168

@jimmyjames jimmyjames added this pull request to the merge queue Aug 8, 2025
Merged via the queue into main with commit 17c34db Aug 8, 2025
21 of 24 checks passed
@jimmyjames jimmyjames deleted the fix/retry-after-improvements branch August 8, 2025 19:52
@coderabbitai coderabbitai bot mentioned this pull request Aug 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants