Python: Add auto_retry.py sample for rate limit handling#4223
eavanvalkenburg merged 5 commits into main
Conversation
Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
@copilot this is not what I asked. The first version, a class-based one, should not be a wrapper; it should be a class decorator that can take any
Updated the sample so approach 1 is now a class decorator (
eavanvalkenburg left a comment
@copilot close, some minor comments
Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
Pull request overview
Adds a new Python sample demonstrating practical retry patterns for rate-limited model calls (HTTP 429) using tenacity, aligned with the existing samples structure and intended to be runnable via PEP 723 tooling.
Changes:
- Added `auto_retry.py` sample showing (1) an instance-based client wrapper around `get_response()` and (2) chat-middleware-based retries (class + function styles).
- Included PEP 723 inline script metadata to declare the additional dependency on `tenacity`.
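For reference, a PEP 723 inline script metadata block declaring those dependencies looks like the following (the exact version specifiers used in the sample may differ):

```python
# /// script
# dependencies = [
#     "agent-framework",
#     "tenacity",
# ]
# ///
```

Tools such as `uv run` or `pipx run` read this comment block and install the listed dependencies before executing the script.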
Updated approach 1 back to a class-decorator pattern (
Motivation and Context
Every model inference API is rate-limited. Without built-in retry support in the framework, every consumer must write similar boilerplate. This sample provides practical retry patterns using `tenacity` so developers can focus on agent logic.

Description
Adds `python/samples/02-agents/auto_retry.py` demonstrating two approaches to automatic retry on `RateLimitError` (HTTP 429), with updates from review feedback:

Approach 1 – Class decorator
Uses a class decorator (`with_rate_limit_retry`) that can be applied to any `SupportsChatGetResponse`-compatible client type and patches `get_response()` with retry behavior for non-streaming requests. Streaming calls are passed through unchanged (with a note that streaming retry requires more delicate handling).
The retry flow in this path uses `AsyncRetrying` to avoid constructing a new decorated callable on each request.

Approach 2 – Chat middleware (two styles)
- `RateLimitRetryMiddleware(ChatMiddleware)` wraps `await call_next()` in `AsyncRetrying`.
- A `@chat_middleware`-decorated function wraps `call_next()` with a tenacity `@retry`-decorated inner async function.

Also adds the required PEP 723 inline script metadata header at the top of the sample to declare the extra dependencies:

- `agent-framework`
- `tenacity`

Validation performed for this sample:

- `python -m ruff check samples/02-agents/auto_retry.py`
- `python -m py_compile python/samples/02-agents/auto_retry.py`

Contribution Checklist
Original prompt