Skip to content

PDFCLOUD-5382 Add retries and logging#2

Merged
datalogics-cgreen merged 5 commits intopdfrest:mainfrom
datalogics-kam:pdfcloud-5382-retries
Dec 3, 2025
Merged

PDFCLOUD-5382 Add retries and logging#2
datalogics-cgreen merged 5 commits intopdfrest:mainfrom
datalogics-kam:pdfcloud-5382-retries

Conversation

@datalogics-kam
Copy link
Copy Markdown
Contributor

@datalogics-kam datalogics-kam commented Nov 11, 2025

PDFCLOUD-5382

Summary

Adds configurable, robust retries with exponential backoff and jitter to the pdfRest clients, along with structured, sanitized logging for requests, responses, and retry behavior. Improves error handling and observability, and introduces comprehensive tests covering success-after-retry, limits, and edge cases.

Changes

  • Add retry logic and backoff mechanism for client requests

    • Exponential backoff with jitter; configurable max_retries.
    • Enhanced handling of transport, timeout, and server-side errors.
    • Tests for retry success, error propagation after limits, negative limits, and delay calculations.
  • Add logging to pdfRest clients

    • Structured logs for requests/responses, attempts, delays, payloads, and errors.
    • Sanitized sensitive headers in log output.
    • Exception handling augmented with detailed log context.
    • Logging can be enabled by setting PDFREST_LOG to either info or debug

@datalogics-cgreen datalogics-cgreen self-requested a review November 11, 2025 15:42
@datalogics-cgreen
Copy link
Copy Markdown
Contributor

Please rebase to eliminate the duplicate commits.

- Implemented exponential backoff with jitter for retryable errors.
- Added `max_retries` parameter to clients for configurable retry limits.
- Enhanced error handling for transport, timeout, and server-side errors.
- Updated unit tests to validate retry behavior, including success after
  retries and error propagation after retry limits.
- Covered edge cases, such as negative retry limits and proper delay calculations.

Assisted-by: Codex
- Introduced structured logging for request and response details,
  including retry attempts, delays, payloads, and errors.
- Integrated sanitized logging for sensitive header information.
- Improved debugging of request execution and retry behavior for better
  traceability.
- Updated exception handling to include detailed log outputs.

Assisted-by: Codex
@datalogics-kam
Copy link
Copy Markdown
Contributor Author

Please rebase to eliminate the duplicate commits.

Done.

Technically there aren't duplicate commits because there isn't a Y-shaped branch pattern with the same commits on each side of the Y-fork. But I did it anyway, because GitHub doesn't clean up the commit list unless something happens.

@datalogics-cgreen
Copy link
Copy Markdown
Contributor

So I had Codex review Codex, and:

- Retry delay ignores Retry-After: both sync and async paths always sleep for the exponential/
  jittered delay computed in src/pdfrest/client.py:476-784 and never look at Retry-After headers
  from 429/5xx responses. The main branch had no retry logic, so this is a regression risk unique
  to the new implementation: rate-limit responses will continue to be retried on the client’s
  schedule rather than the server’s requested window, which can prolong outages and keep the
  client throttled. Consider reading the header (seconds or HTTP-date) and using the larger of the
  exponential and server delays.

I don't believe pdfRest ever returns a 429 response, so maybe this is irrelevant.

- 408 Request Timeout responses are treated as permanent failures: _is_retryable_status only
  whitelists 429 and ≥500 (src/pdfrest/client.py:465-474). If pdfRest ever returns HTTP 408
  (server-side timeout) you now raise immediately even though the same scenario handled via network
  exceptions (httpx.TimeoutException → PdfRestTimeoutError) is retried. This asymmetry didn’t exist
  on main (no retries) but is easy to fix by including 408 (and possibly other transient 4xx such
  as 425/499 if pdfRest emits them) in the retryable set so behaviour matches server guidance.

Similar deal with 408s?

- Tests only cover server 500/503 flows: the retry/backoff helpers are exercised in tests/
  test_client.py:196-276, but there’s no coverage for 429/rate-limit handling, no assertion that
  timeouts/transport errors are retried, and no test that download_file/streaming calls also
  route through _execute_with_retry. Given how central this logic now is, that leaves gaps in
  confidence compared to main. Adding parameterized tests that simulate 429 with mocked Retry-
  After, httpx.TimeoutException, and download streaming errors would lock in the expected behaviour
  for both sync and async variants.

See item the 1st.

@datalogics-kam
Copy link
Copy Markdown
Contributor Author

In the time this PR has been waiting, the pdfRest container actually started to return a 429.

I don't think anything's sending Retry-After unless it's from a gateway put in front of the server.

Thanks for having Codex review Codex. I had Codex review its work, and it didn't find as much, but I gave Codex these concerns to chew on.

- Implemented logic to track and rewind file streams between retries.
- Enhanced retry mechanism to respect `Retry-After` headers with both
  seconds and HTTP-date formats.
- Improved handling of non-seekable streams, logging errors and aborting
  retries gracefully.
- Updated tests to cover stream behavior during retries, including
  seekable and non-seekable streams.
- Refactored retry logic to include customizable pre-attempt hooks and
  delay adjustments based on responses.
- Handle 408, 429, and 499 retries.

Assisted-by: Codex
@datalogics-kam datalogics-kam marked this pull request as draft December 1, 2025 16:37
- To prevent complexity, the file stream and its position belong to the
  caller.
- Any failure that happens after the connection starts might have read
  from the stream. After that, doing the retry is the responsibility of
  the caller.
- Removed complex stream tracking logic in favor of a simplified
  `_has_stream_uploads` attribute.
- Introduced exceptions `PdfRestConnectTimeoutError` and
  `PdfRestPoolTimeoutError` for granularity in timeout scenarios,
  including new cases for retry logic.
- Updated `translate_httpx_error` method for finer mapping between
  `httpx` exceptions and retry behavior.
- Enhanced `_contains_open_stream` method for better stream detection in
  nested structures.
- Updated tests for refined handling of retries and no-retry cases,
  including transport errors, timeouts, and server errors.

Assisted-by: Codex
@datalogics-kam datalogics-kam marked this pull request as ready for review December 1, 2025 21:54
- Introduced `send_request_once` and `run_with_retry` for consistent
  request execution and retry handling.
- Implemented retries for `create_from_paths` in both sync and async
  clients, covering 429, transport, and timeout errors.
- Added concurrency control using semaphores for async file info fetches
  after uploads.
- Updated tests to validate retry behavior, including various error
  scenarios and proper retry counts.

Assisted-by: Codex
@datalogics-kam
Copy link
Copy Markdown
Contributor Author

I took another look at this and:

  • Removed file rewinding. Uploads with streams are now only retried if they are retrying the inability to connect. That leaves the stream management up to the caller, which will know if the stream can be rewound, a file reopened, or if a retry isn't really possible with that kind of stream.
  • create_from_paths has full retry, because it can reopen the files.

@datalogics-cgreen datalogics-cgreen merged commit 185227d into pdfrest:main Dec 3, 2025
8 checks passed
@datalogics-kam datalogics-kam deleted the pdfcloud-5382-retries branch December 3, 2025 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants