Skip to content

Lack of built-in retry mechanism for handling transient network errors and rate limits #20

@Deadpool2000

Description

@Deadpool2000

Currently, the OpenAPI Python SDK is designed with low-level HTTP primitives. If an API request fails due to a transient error, the client raises an error or fails immediately:

  1. Transient Network Errors: Minor network drops, socket timeouts, or DNS hiccups fail immediately.
  2. Temporary Server Overloads: Gateway errors (HTTP 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout) trigger immediate crashes.
  3. Rate Limits (HTTP 429): If the client hits API rate limits, it fails instantly rather than waiting for the reset window.

To handle these scenarios, developers using the SDK must write custom retry loops and catch blocks in their application code, leading to unnecessary boilerplate.


Proposed Feature

Implement an optional, built-in retry mechanism for both synchronous and asynchronous clients (Client, AsyncClient, OauthClient, and AsyncOauthClient).

This retry mechanism should feature:

  1. Exponential Backoff: Gradually increase wait times between retries (e.g. wait 1s, then 2s, then 4s) to give the server time to recover.
  2. Jitter (Randomized Noise): Add random variation to the sleep durations to prevent multiple clients from hitting the server at the exact same millisecond.
  3. Smart Rate-Limit Handling: Automatically detect HTTP 429 status codes, parse the Retry-After header if available, and sleep for the requested duration before retrying.
  4. Customizable Configuration: Allow developers to configure the retries during client initialization, with safe defaults.

Proposed API Design

Add optional retry parameters to the client constructors:

client = Client(
    token="your_token",
    max_retries=3,          # Number of retry attempts (0 to disable)
    backoff_factor=1.0,     # Multiplier for exponential sleep delay
    retry_on_status=[429, 502, 503, 504]  # Status codes that trigger a retry
)

To ensure 100% backward compatibility, max_retries should default to 0 (disabled), ensuring no existing production code or tests are affected unless the developer explicitly opts in.


Benefits

  • Resilience: SDK handles transient internet drops and brief server downtime gracefully.
  • Ease of Use: Reduces boilerplate code in the consumer's application.
  • Backward-Compatible: Defaulting max_retries to 0 guarantees existing behavior is preserved.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions