Fix: Prevent retry of termination signals (Celery SoftTimeLimitExceeded) #2739
base: main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
    # asyncio cancellation
    if exc_module == "asyncio" and exc_name == "CancelledError":
        return True
Ensure asyncio cancellations bypass retry loop
_should_not_retry checks for CancelledError by requiring exc.__class__.__module__ == "asyncio", but asyncio.CancelledError is defined in asyncio.exceptions across supported Python versions. The predicate never matches, so cancelled tasks are still retried and eventually surface as APIConnectionError instead of propagating the cancellation. This defeats the intended behavior for async termination signals.
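For context, a quick interpreter check reproduces the mismatch the review describes (behaviour of CPython 3.8+, where `asyncio.CancelledError` stopped being an alias of `concurrent.futures.CancelledError`):

```python
import asyncio

# The class is defined in asyncio.exceptions and only re-exported from the
# asyncio package, so an exact module comparison never matches.
print(asyncio.CancelledError.__module__)                         # asyncio.exceptions
print(asyncio.CancelledError.__module__ == "asyncio")            # False
print(asyncio.CancelledError.__module__.startswith("asyncio"))   # True
```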
Thanks for the review! You're absolutely right - I've updated the check to use `startswith("asyncio")` instead of an exact match so that `asyncio.exceptions.CancelledError` is recognized. Changes in bd96a86:
When using the OpenAI client in Celery tasks with soft time limits, the client's broad exception handling was catching `SoftTimeLimitExceeded` and treating it as a retryable connection error. This prevented Celery tasks from properly handling timeouts and running cleanup logic.

This change adds a check to identify termination signals (like Celery's `SoftTimeLimitExceeded` or asyncio's `CancelledError`) and re-raises them immediately without retry. This allows task executors to properly handle these signals.

Changes:
- Added `_should_not_retry()` helper to identify termination signals
- Modified sync and async exception handlers to check before retrying
- Added test to verify termination signals are not retried

Fixes openai#2737
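For illustration, a minimal sketch of what such a helper could look like is below. The variable names follow the diff excerpt above, but the exact exception list and implementation are assumptions rather than the PR's actual code, and the sketch already incorporates the `startswith` adjustment from the review.

```python
def _should_not_retry(exc: BaseException) -> bool:
    """Return True for task-executor termination signals that should
    propagate immediately instead of being retried."""
    exc_module = exc.__class__.__module__
    exc_name = exc.__class__.__name__

    # Celery soft/hard time limits and task termination, matched by class
    # name so Celery does not need to be importable inside the client
    if exc_name in ("SoftTimeLimitExceeded", "TimeLimitExceeded", "Terminated"):
        return True

    # asyncio cancellation; the class lives in asyncio.exceptions on 3.8+
    if exc_module.startswith("asyncio") and exc_name == "CancelledError":
        return True

    return False
```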
- Fix module check to use `startswith("asyncio")` instead of `== "asyncio"` to properly match `asyncio.exceptions.CancelledError`
- Add test case for `asyncio.CancelledError` to verify it bypasses retry loop

Addresses review feedback from chatgpt-codex-connector bot
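A test along those lines might look like the following. This is hypothetical: the import path is a guess at where the client's retry logic lives, and the assertions target the helper sketch above rather than the test actually added in the PR.

```python
import asyncio

# Assumed location of the helper; the real module path may differ.
from openai._base_client import _should_not_retry


def test_asyncio_cancelled_error_bypasses_retry() -> None:
    # asyncio.exceptions.CancelledError should count as a termination signal
    assert _should_not_retry(asyncio.CancelledError())


def test_ordinary_errors_are_still_retried() -> None:
    # A plain connection-style error should keep going through the retry loop
    assert not _should_not_retry(ConnectionError("connection dropped"))
```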
Force-pushed from bd96a86 to 20e8fd0.
Rebased on latest main to bring branch up-to-date (was 10 commits behind).
Changes being requested
This PR fixes issue #2737 where the OpenAI client's broad exception handling was catching Celery's `SoftTimeLimitExceeded` exception and retrying the request instead of allowing it to propagate for graceful task shutdown.

Changes:
- `_should_not_retry()` helper function to identify termination signals that should propagate immediately

Affected exception types:
- `SoftTimeLimitExceeded`, `TimeLimitExceeded`, `Terminated` (Celery)
- `CancelledError` (asyncio)

The fix preserves all existing retry logic while ensuring task executor termination signals are properly handled.
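To illustrate where such a check could hook into the retry path, here is a rough sync-side sketch. It is not the PR's actual diff: it reuses the hypothetical `_should_not_retry()` from earlier and simplifies the client's real retry machinery.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def request_with_retries(send: Callable[[], T], max_retries: int = 2) -> T:
    for attempt in range(max_retries + 1):
        try:
            return send()
        except Exception as err:
            # Termination signals such as Celery's SoftTimeLimitExceeded must
            # propagate immediately so the task can run its cleanup logic.
            if _should_not_retry(err):
                raise
            if attempt == max_retries:
                raise
            time.sleep(2 ** attempt)  # simplified exponential backoff
    raise AssertionError("unreachable")
```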
Additional context & links
Fixes #2737
Background:
When using the OpenAI client in Celery tasks with soft time limits, if the task exceeds its limit during an API call, Celery raises `SoftTimeLimitExceeded`. The client's `except Exception` clause was catching this and treating it as a retryable connection error, preventing cleanup logic from running.

Solution approach:
Instead of narrowing the exception catch (which could miss legitimate connection errors), we check if caught exceptions are termination signals and re-raise them immediately without retry. This approach: