
Feature request: add a provider/API error hook for rate limits, 5xx, timeouts, and stream disconnects #22774

@alario-tang

Description


Summary

Codex hooks currently cover prompt, tool, permission, session, and stop lifecycle events, but there is no hook for model/provider request failures such as HTTP 429 or 503 responses, timeouts, unexpected EOFs, connection resets, or stream disconnects.

This makes it hard to build reliable recovery, notification, backoff, or automation workflows around long-running Codex tasks, because the Stop hook is not guaranteed to fire when a turn fails before it completes normally.

Proposed event

Add a hook event such as ProviderError or TurnError.

It should fire when a model/provider request fails after Codex has enough context to identify the current thread/session/turn, including cases like:

  • HTTP 429 / rate limit
  • HTTP 5xx / provider unavailable
  • timeout
  • unexpected EOF
  • connection reset
  • stream disconnected before completion
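To make the list concrete, here is a hypothetical sketch of how these failure cases could map onto the `error_kind` field of the suggested payload. Only `rate_limit` appears in this proposal; the other kind names (`provider_unavailable`, `timeout`, `unexpected_eof`, `connection_reset`, `stream_disconnected`) and the `transport_error` tags passed in are assumptions for illustration, not existing Codex values.

```python
from typing import Optional


def classify_failure(status_code: Optional[int], transport_error: Optional[str]) -> str:
    """Hypothetical mapping from a failed request to an error_kind string."""
    # HTTP-level failures: the provider answered, but with an error status.
    if status_code == 429:
        return "rate_limit"
    if status_code is not None and 500 <= status_code <= 599:
        return "provider_unavailable"
    # Transport-level failures have no usable HTTP status.
    # Both the input tags and the output kinds here are assumed names.
    transport_kinds = {
        "timeout": "timeout",
        "eof": "unexpected_eof",
        "connection_reset": "connection_reset",
        "stream_disconnected": "stream_disconnected",
    }
    if transport_error in transport_kinds:
        return transport_kinds[transport_error]
    # Anything else (e.g. a 400) is not one of the cases listed above.
    return "unknown"
```

For example, `classify_failure(503, None)` yields `"provider_unavailable"` and `classify_failure(None, "eof")` yields `"unexpected_eof"`.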

Suggested payload

{
  "hook_event_name": "ProviderError",
  "session_id": "...",
  "thread_id": "...",
  "turn_id": "...",
  "cwd": "...",
  "model": "...",
  "provider": "...",
  "status_code": 429,
  "error_kind": "rate_limit",
  "retry_after_ms": 60000,
  "message": "redacted/safe error summary",
  "will_retry": false,
  "attempt": 1
}
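A consumer of this payload could parse it into a typed record and ignore every other hook event, so one script can be registered broadly. This is a minimal sketch assuming the field names above; `ProviderErrorEvent` and `parse_event` are hypothetical names, and nothing here is a shipped Codex API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderErrorEvent:
    # Field names follow the suggested payload above (a proposal, not a real API).
    session_id: str
    thread_id: str
    turn_id: str
    error_kind: str
    status_code: Optional[int]  # None for transport failures (timeout, EOF, reset)
    retry_after_ms: Optional[int]
    will_retry: bool
    attempt: int
    message: str


def parse_event(raw: str) -> Optional[ProviderErrorEvent]:
    """Parse one hook payload; return None for any other hook event."""
    data = json.loads(raw)
    if data.get("hook_event_name") != "ProviderError":
        return None
    return ProviderErrorEvent(
        session_id=data["session_id"],
        thread_id=data["thread_id"],
        turn_id=data["turn_id"],
        error_kind=data.get("error_kind", "unknown"),
        status_code=data.get("status_code"),
        retry_after_ms=data.get("retry_after_ms"),
        will_retry=bool(data.get("will_retry", False)),
        attempt=int(data.get("attempt", 1)),
        message=data.get("message", ""),
    )
```

If, as an assumption, the hook command received the payload as JSON on stdin, it could call `parse_event(sys.stdin.read())` and act on the result.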

Why this matters

External workflow layers can then implement safe behavior without scraping transcripts:

  • notify the user or monitoring system
  • schedule backoff/retry
  • mark a long-running automation as provider-interrupted
  • safely resume the same session later
  • distinguish provider failure from user cancellation, task completion, or deterministic code/test failure
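As a rough illustration of those workflows, an external layer could turn each payload into an action. This is a hypothetical sketch: `plan_recovery`, the action names, the delay constants, and the `resumes_so_far` counter (tracked by the workflow layer itself, not by Codex) are all assumptions. Because the hook would fire only on provider failures, reaching this code already separates them from user cancellation or a failing test.

```python
from typing import Optional, Tuple

# Hypothetical policy constants for an external workflow layer (not Codex defaults).
BASE_DELAY_MS = 5_000
MAX_DELAY_MS = 15 * 60_000
MAX_RESUMES = 5

# Assumed error_kind values treated as transient provider failures.
TRANSIENT_KINDS = {
    "rate_limit", "provider_unavailable", "timeout",
    "unexpected_eof", "connection_reset", "stream_disconnected",
}


def plan_recovery(event: dict, resumes_so_far: int) -> Tuple[str, Optional[int]]:
    """Return (action, delay_ms) for one ProviderError payload."""
    # Codex says it will retry internally, so only record the failure.
    if event.get("will_retry", False):
        return ("observe", None)
    # Unrecognized failure: alert a human instead of retrying blindly.
    if event.get("error_kind", "unknown") not in TRANSIENT_KINDS:
        return ("notify", None)
    # Give up after too many automatic resumes of this session.
    if resumes_so_far >= MAX_RESUMES:
        return ("mark_interrupted", None)
    retry_after = event.get("retry_after_ms")
    if retry_after is not None:
        # Honor the provider's hint, capped so a bad value cannot stall forever.
        return ("resume_later", min(int(retry_after), MAX_DELAY_MS))
    # Otherwise: capped exponential backoff of 5 s, 10 s, 20 s, ...
    return ("resume_later", min(BASE_DELAY_MS * 2 ** resumes_so_far, MAX_DELAY_MS))
```

Checking `will_retry` first keeps the integration from racing Codex's own retries, in line with not replacing its internal retry behavior. With the example payload and no prior resumes, this returns `("resume_later", 60000)`.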

Important boundary

This hook should not expose raw credentials, full request bodies, or private provider payloads. A redacted error summary plus status/error kind is enough.
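As a rough illustration of what a "redacted/safe error summary" could mean, here is a hypothetical redactor for the `message` field. The patterns (bearer tokens, `sk-` style keys, `api_key=`/`token=`/`secret=` parameters) and the 200-character limit are assumptions, not a complete or vetted list; a real implementation would need its own review.

```python
import re

# Illustrative patterns only; not an exhaustive or audited secret list.
_SECRET_PATTERNS = [
    # "Authorization: Bearer <token>" style values.
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._\-]+"), r"\1[REDACTED]"),
    # API-key-looking strings such as "sk-..." (assumed key format).
    (re.compile(r"\bsk-[A-Za-z0-9_\-]{8,}"), "[REDACTED]"),
    # key=value pairs for common secret parameter names.
    (re.compile(r"(?i)\b(api[_-]?key|token|secret)=[^&\s]+"), r"\1=[REDACTED]"),
]
MAX_MESSAGE_CHARS = 200


def redact_message(raw: str) -> str:
    """Produce a short, credential-free summary for the hook's message field."""
    text = raw
    for pattern, replacement in _SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    # Keep only the first line and truncate, so response bodies are not forwarded.
    first_line = text.splitlines()[0] if text else ""
    return first_line[:MAX_MESSAGE_CHARS]
```

For example, `redact_message("429 Too Many Requests: Authorization: Bearer abc.def-123")` keeps the status text and replaces the token with `[REDACTED]`.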

I am not asking for this to replace Codex's internal retry behavior. The goal is to expose a reliable lifecycle surface for integrations that need to react when a turn ends because the provider request failed before normal Stop handling.

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request), hooks (Issues related to event hooks)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
