
@sarroutbi (Contributor) commented Jul 14, 2025

This pull request introduces a resilient HTTP client with an exponential backoff strategy to make the Keylime push-model agent more robust. The new client is then integrated into the push-model agent's attestation process, making it more tolerant of transient network failures and temporary server-side errors.

  1. Add Resilient HTTP Client with Exponential Backoff

The first commit adds a new, reusable module named resilient_client.

  • New Module: Creates keylime/src/resilient_client.rs.
  • Functionality: This module provides a ResilientClient that wraps reqwest. It uses the reqwest-retry and reqwest-middleware crates to automatically retry failed HTTP requests.
  • Custom Retry Strategy: A custom retry strategy, StopOnSuccessStrategy, is implemented. It stops retrying as soon as the client receives a status code from a configurable list of expected success codes (e.g., 200 OK, 202 Accepted), in addition to stopping on non-retryable failures.
  • Configurability: The client is configurable with parameters for maximum retries, initial delay, and maximum delay, providing flexible control over the backoff behavior.
  • Dependencies: Adds reqwest-middleware, reqwest-retry, and retry-policies to the project. All of these crates, at the versions used, are already packaged in Fedora:
    https://src.fedoraproject.org/rpms/rust-retry-policies
    https://src.fedoraproject.org/rpms/rust-reqwest-middleware
    https://src.fedoraproject.org/rpms/rust-reqwest-retry
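The stop-on-success decision described above can be illustrated with a small, dependency-free Rust sketch. `RetryDecision` and `classify_status` are hypothetical names, not the API of `resilient_client.rs` or of reqwest-retry. The sketch assumes that codes in the configured list stop the retries as a success, that 5xx, 408 and 429 are retried (loosely following reqwest-retry's default classification), and that every other status, including a 2xx code missing from the list, stops the retries as a failure.

```rust
/// Possible outcomes after inspecting one response (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum RetryDecision {
    /// Stop retrying: the status is one of the expected success codes.
    Success,
    /// Retry later: the failure is likely temporary.
    Transient,
    /// Stop retrying: repeating the request is unlikely to help.
    Fatal,
}

/// Decides whether to retry, given a response status and the configured success codes.
/// Assumption: this mirrors the idea of StopOnSuccessStrategy, not its actual code.
pub fn classify_status(status: u16, success_codes: &[u16]) -> RetryDecision {
    if success_codes.contains(&status) {
        // Expected success code: stop immediately.
        RetryDecision::Success
    } else if (500..=599).contains(&status) || status == 408 || status == 429 {
        // Server errors, 408 Request Timeout and 429 Too Many Requests are treated as temporary.
        RetryDecision::Transient
    } else {
        // Everything else (e.g. 400, 404, or an unlisted 2xx) stops the retries as a failure.
        RetryDecision::Fatal
    }
}
```

With success codes `[200, 202]`, a 503 is retried, while 202 and 404 both end the retry loop, the former as a success and the latter as a failure.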
  2. Integrate Resilient Client into Push Attestation

The second commit applies the new ResilientClient to the agent's attestation workflow.

  • Integration: The keylime-push-model-agent's AttestationClient is refactored to use the new ResilientClient instead of a standard reqwest::Client.
  • Increased Robustness: HTTP calls for attestation negotiation and evidence submission will now automatically retry on connection errors or server-side statuses like 503 Service Unavailable.
  • New Configuration: New configuration options have been added to the agent's config file (keylime.conf) to control the retry behavior:
    • expbackoff_max_retries
    • expbackoff_initial_delay
    • expbackoff_max_delay
  • Improved Reliability: The push-model agent becomes more reliable in environments where the verifier or registrar may be temporarily unavailable.
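A minimal, std-only sketch of how the three options might drive the retry loop around an attestation request. `BackoffConfig` and `send_with_retries` are hypothetical; the real client delegates scheduling to reqwest-retry and retry-policies. The sketch assumes the delay doubles from `expbackoff_initial_delay` on each retry, is capped at `expbackoff_max_delay`, uses no jitter, and that `expbackoff_max_retries` counts retries after the first attempt. The `send` closure stands in for an HTTP call (`Ok(status)` for a response, `Err` for a connection error), and `sleep` is injected so the schedule can be observed without actually waiting.

```rust
use std::time::Duration;

/// Hypothetical mirror of the three keylime.conf options (field types are assumptions).
#[derive(Debug, Clone)]
pub struct BackoffConfig {
    pub expbackoff_max_retries: u32,
    pub expbackoff_initial_delay: Duration,
    pub expbackoff_max_delay: Duration,
}

impl BackoffConfig {
    /// Wait before retry `n` (0-based): initial * 2^n, capped at the maximum, no jitter.
    pub fn delay_for_retry(&self, n: u32) -> Duration {
        // Saturate instead of overflowing for large `n`.
        let factor = 2u32.checked_pow(n).unwrap_or(u32::MAX);
        match self.expbackoff_initial_delay.checked_mul(factor) {
            Some(delay) => delay.min(self.expbackoff_max_delay),
            None => self.expbackoff_max_delay,
        }
    }
}

/// Calls `send` until it returns an expected success code, a non-retryable
/// status, or the retry budget is spent; returns the last outcome.
pub fn send_with_retries<F, S>(
    cfg: &BackoffConfig,
    success_codes: &[u16],
    mut send: F,
    mut sleep: S,
) -> Result<u16, String>
where
    F: FnMut() -> Result<u16, String>,
    S: FnMut(Duration),
{
    let mut retries = 0;
    loop {
        let outcome = send();
        let retryable = match &outcome {
            // Expected success code: stop immediately.
            Ok(status) if success_codes.contains(status) => false,
            // Server-side errors such as 503 Service Unavailable are retried.
            Ok(status) => (500..=599).contains(status),
            // Connection errors are retried.
            Err(_) => true,
        };
        if !retryable || retries >= cfg.expbackoff_max_retries {
            return outcome;
        }
        sleep(cfg.delay_for_retry(retries));
        retries += 1;
    }
}
```

For example, with an initial delay of 100 ms and a maximum delay of 300 ms, a connection error followed by two 503 responses and then 202 Accepted produces waits of 100 ms, 200 ms and 300 ms before `Ok(202)` is returned. Injecting `sleep` keeps the loop testable; a real client would wait on a timer instead.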

sarroutbi force-pushed the 202507141205-resilient-client branch 4 times, most recently from be3aa9a to 8c12871 (July 14, 2025 11:14)
@codecov bot commented Jul 14, 2025

Codecov Report

Attention: Patch coverage is 63.15789% with 28 lines in your changes missing coverage. Please review.

Project coverage is 58.86%. Comparing base (7b746b0) to head (933cd4d).
Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| keylime/src/resilient_client.rs | 47.82% | 24 Missing ⚠️ |
| keylime-push-model-agent/src/attestation.rs | 81.81% | 4 Missing ⚠️ |
Additional details and impacted files
| Flag | Coverage Δ |
|---|---|
| e2e-testsuite | 58.86% <63.15%> (+0.31%) ⬆️ |
| upstream-unit-tests | 58.86% <63.15%> (+0.31%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| keylime-push-model-agent/src/main.rs | 45.79% <100.00%> (+0.60%) ⬆️ |
| keylime/src/config/base.rs | 86.90% <100.00%> (+0.18%) ⬆️ |
| keylime/src/config/push_model.rs | 60.00% <ø> (ø) |
| keylime-push-model-agent/src/attestation.rs | 39.69% <81.81%> (+17.55%) ⬆️ |
| keylime/src/resilient_client.rs | 47.82% <47.82%> (ø) |

... and 7 files with indirect coverage changes


Signed-off-by: Sergio Arroutbi <sarroutb@redhat.com>
sarroutbi force-pushed the 202507141205-resilient-client branch from 8c12871 to 82ed57d (July 14, 2025 15:09)
@sarroutbi (Contributor, Author) commented Jul 14, 2025

Test coverage reporting is still imprecise: [screenshot not preserved]

sarroutbi marked this pull request as ready for review on July 14, 2025 17:23
@ansasaki (Contributor) left a comment
LGTM! I left just a single note.

To make the retry timing fully predictable,
jitter is disabled by default.

Signed-off-by: Sergio Arroutbi <sarroutb@redhat.com>
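With jitter disabled, the wait before each retry depends only on the configuration. Assuming the usual base-2 exponential growth capped at the maximum delay (an assumption about the retry-policies schedule, not something stated in this PR), and writing $d_{\mathrm{init}}$ for expbackoff_initial_delay, $d_{\max}$ for expbackoff_max_delay and $N$ for expbackoff_max_retries, the wait before retry $n$ is:

```math
d_n = \min\left(d_{\mathrm{init}} \cdot 2^{n},\; d_{\max}\right), \qquad n = 0, 1, \ldots, N-1
```

For illustrative values (not keylime defaults) of $d_{\mathrm{init}} = 1\,\mathrm{s}$, $d_{\max} = 8\,\mathrm{s}$ and $N = 5$, every run waits 1, 2, 4, 8 and 8 seconds, 23 seconds in total. With jitter enabled, each $d_n$ would instead be randomized, so the same configuration could produce a different schedule on every run.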
sarroutbi merged commit d1948f7 into keylime:master on Jul 15, 2025
12 checks passed
sarroutbi deleted the 202507141205-resilient-client branch on July 15, 2025 09:46

Labels: None yet
Projects: None yet
3 participants