@sarroutbi sarroutbi commented Jul 18, 2025

This patch makes two changes to the agent's exponential backoff and
retry logic: it lengthens the default delays, and it stops retrying on
errors that are not transient.

  • Increased Default Backoff Timings (config/base.rs)
    The default values for the exponential backoff have been
    significantly increased:

    • Initial Delay: Changed from 2 seconds to 10 seconds. This makes
      the agent wait longer before the first retry, which is more suitable for
      services that might be slow to initialize.

    • Maximum Delay: Changed from 60 seconds to 300 seconds (5
      minutes). This allows the delay between retries to grow larger,
      accommodating longer-term service disruptions.

  • Smarter Retry Strategy (resilient_client.rs)
    The custom StopOnSuccessStrategy now distinguishes transient failures
    from permanent ones when deciding whether to retry.

    • Before: The strategy would retry on any non-success status code,
      including 4xx client errors (like 404 Not Found), which are typically
      not temporary.

    • After: The strategy now delegates the decision for non-success
      codes to reqwest-retry's built-in default_on_request_success function,
      which behaves as follows:

      • It will retry on 5xx server errors (e.g., 503 Service Unavailable).

      • It will NOT retry on most 4xx client errors (e.g., 404 Not
        Found, 403 Forbidden), as these indicate a problem with the request
        itself, not a temporary server issue.

    • Similarly, the handling of network errors is now delegated to
      default_on_request_failure, so status codes and transport failures
      are both classified by the library's defaults.
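
The two changes described above can be illustrated with a small, standard-library-only sketch. It is not the code from this patch and does not call reqwest-retry. The helper names `backoff_delay` and `classify_status` are hypothetical, and the local `Retryable` enum only borrows the name of the crate's type. Two assumptions are baked in and should be checked against the real configuration and the crate documentation: the delay doubles on every retry with no jitter, and 408 and 429 are the 4xx codes still treated as transient (my reading of what "most 4xx" refers to).

```rust
use std::time::Duration;

// New defaults quoted in the description above.
const INITIAL_DELAY: Duration = Duration::from_secs(10);
const MAX_DELAY: Duration = Duration::from_secs(300);

/// Hypothetical helper: delay before retry number `attempt` (0-based).
/// ASSUMPTION: the delay doubles each time and no jitter is applied.
fn backoff_delay(attempt: u32) -> Duration {
    // Saturate instead of overflowing for very large attempt numbers.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    INITIAL_DELAY.saturating_mul(factor).min(MAX_DELAY)
}

/// Local stand-in for the retry decision; not the crate's own type.
#[derive(Debug, PartialEq, Eq)]
enum Retryable {
    Transient, // worth retrying
    Fatal,     // retrying will not help
}

/// Hypothetical helper approximating the behavior described above.
/// Returns `None` for success, meaning there is nothing to retry.
fn classify_status(status: u16) -> Option<Retryable> {
    match status {
        200..=299 => None,
        // ASSUMPTION: these two 4xx codes are the transient exceptions.
        408 | 429 => Some(Retryable::Transient),
        500..=599 => Some(Retryable::Transient),
        // Remaining 4xx codes and anything else: do not retry.
        _ => Some(Retryable::Fatal),
    }
}

fn main() {
    for attempt in 0..7 {
        println!("retry {}: wait {:?}", attempt, backoff_delay(attempt));
    }
    for status in [200u16, 403, 404, 429, 503] {
        println!("{}: {:?}", status, classify_status(status));
    }
}
```

Under these assumptions the delay reaches the 300-second cap on the sixth retry (10, 20, 40, 80, 160, then 300), whereas the old 60-second cap would already have been hit on the fourth retry with the 10-second initial delay.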

@sarroutbi sarroutbi force-pushed the 202507181003-fix-exponential-backoff branch from 4a7860e to 9318b27 Compare July 18, 2025 08:14
@sarroutbi sarroutbi marked this pull request as ready for review July 18, 2025 08:28
@sarroutbi sarroutbi requested a review from ansasaki July 18, 2025 08:30
@sarroutbi sarroutbi requested a review from sergio-correia July 18, 2025 08:45
@ansasaki ansasaki left a comment

LGTM!

@ansasaki ansasaki force-pushed the 202507181003-fix-exponential-backoff branch from 9318b27 to a3eb15f Compare July 18, 2025 10:15
Signed-off-by: Sergio Arroutbi <sarroutb@redhat.com>
@sarroutbi sarroutbi force-pushed the 202507181003-fix-exponential-backoff branch from a3eb15f to 43cdac6 Compare July 18, 2025 11:26
codecov bot commented Jul 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.84%. Comparing base (7c892c8) to head (43cdac6).

Additional details and impacted files
| Flag | Coverage Δ |
|---|---|
| e2e-testsuite | 57.84% <100.00%> (ø) |
| upstream-unit-tests | 57.84% <100.00%> (ø) |

| Files with missing lines | Coverage Δ |
|---|---|
| keylime/src/config/base.rs | 86.90% <ø> (ø) |
| keylime/src/resilient_client.rs | 48.68% <100.00%> (ø) |

... and 2 files with indirect coverage changes

@ansasaki ansasaki merged commit b2e7fba into keylime:master Jul 18, 2025
11 checks passed
@sarroutbi sarroutbi deleted the 202507181003-fix-exponential-backoff branch July 18, 2025 15:49

3 participants