Summary
Codex hooks currently cover prompt, tool, permission, session, and stop lifecycle events, but there is no hook for model/provider request failures such as HTTP 429 or 503 responses, timeouts, unexpected EOF, connection resets, or stream disconnects.
This makes it hard to build reliable recovery, notification, backoff, or automation workflows around long-running Codex tasks, because the Stop hook is not guaranteed to fire when a turn fails before normal completion.
Proposed event
Add a hook event such as ProviderError or TurnError.
It should fire when a model/provider request fails after Codex has enough context to identify the current thread/session/turn, including cases like:
- HTTP 429 / rate limit
- HTTP 5xx / provider unavailable
- timeout
- unexpected EOF
- connection reset
- stream disconnected before completion
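For illustration, the failure cases above could map onto `error_kind` values roughly as follows. Only `rate_limit` appears in the suggested payload. The other strings and the `classify_failure` helper are hypothetical names, and the sketch is in Python for readability. It is not a claim about how Codex classifies errors internally.

```python
from typing import Optional


def classify_failure(status_code: Optional[int] = None,
                     exc: Optional[BaseException] = None,
                     stream_completed: bool = True) -> str:
    """Map one failed provider request onto a hypothetical error_kind string."""
    # HTTP status takes precedence when the provider returned a response.
    if status_code == 429:
        return "rate_limit"
    if status_code is not None and 500 <= status_code <= 599:
        return "provider_unavailable"
    # Transport-level failures carry no HTTP status.
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, ConnectionResetError):
        return "connection_reset"
    if isinstance(exc, EOFError):
        return "unexpected_eof"
    # The stream ended without its normal completion marker.
    if not stream_completed:
        return "stream_disconnected"
    return "unknown"
```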
Suggested payload
{
  "hook_event_name": "ProviderError",
  "session_id": "...",
  "thread_id": "...",
  "turn_id": "...",
  "cwd": "...",
  "model": "...",
  "provider": "...",
  "status_code": 429,
  "error_kind": "rate_limit",
  "retry_after_ms": 60000,
  "message": "redacted/safe error summary",
  "will_retry": false,
  "attempt": 1
}
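Assuming a handler receives this payload as a JSON string, a consumer might parse it into a typed record before acting on it. The `ProviderErrorEvent` class and `parse_provider_error` function below are hypothetical. Treating `status_code` and `retry_after_ms` as optional is also an assumption, since timeouts and connection resets carry no HTTP status and not every provider sends a retry hint.

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical typed view of the proposed ProviderError payload.
@dataclass(frozen=True)
class ProviderErrorEvent:
    session_id: str
    thread_id: str
    turn_id: str
    cwd: str
    model: str
    provider: str
    error_kind: str
    message: str
    will_retry: bool
    attempt: int
    # Assumed optional: no HTTP status for timeouts/EOF/resets,
    # and no retry hint from some providers.
    status_code: Optional[int] = None
    retry_after_ms: Optional[int] = None


def parse_provider_error(raw: str) -> ProviderErrorEvent:
    """Parse a JSON payload string; raise ValueError for any other hook event."""
    data = json.loads(raw)
    if data.get("hook_event_name") != "ProviderError":
        raise ValueError(f"unexpected hook event: {data.get('hook_event_name')!r}")
    return ProviderErrorEvent(
        session_id=data["session_id"],
        thread_id=data["thread_id"],
        turn_id=data["turn_id"],
        cwd=data["cwd"],
        model=data["model"],
        provider=data["provider"],
        error_kind=data["error_kind"],
        message=data["message"],
        will_retry=bool(data["will_retry"]),
        attempt=int(data["attempt"]),
        status_code=data.get("status_code"),
        retry_after_ms=data.get("retry_after_ms"),
    )
```

Rejecting an unexpected `hook_event_name` up front keeps a script that is registered for several events from misreading another event's payload.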
Why this matters
External workflow layers can then implement safe behavior without scraping transcripts:
- notify the user or monitoring system
- schedule backoff/retry
- mark a long-running automation as provider-interrupted
- safely resume the same session later
- distinguish provider failure from user cancellation, task completion, or deterministic code/test failure
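As an example of these workflows, a hypothetical external handler might turn a `ProviderError` payload into a plan. While Codex is still retrying internally it only logs the attempt. Otherwise it marks the run as provider-interrupted and schedules a resume of the same session. The `plan_action` function, the returned dictionary shape, and the exponential fallback used when `retry_after_ms` is absent are all assumptions for this sketch.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical base delay used when the provider sends no retry hint.
DEFAULT_BACKOFF_MS = 30_000


def plan_action(event: dict, now: datetime) -> dict:
    """Decide what an external workflow layer should do with one payload."""
    if event.get("hook_event_name") != "ProviderError":
        return {"action": "ignore"}
    if event.get("will_retry"):
        # Codex is still retrying internally; record the attempt only.
        return {"action": "log", "attempt": event.get("attempt")}
    delay_ms = event.get("retry_after_ms")
    if delay_ms is None:
        # Assumed exponential fallback: 30 s, 60 s, 120 s, ... by attempt.
        attempt = max(int(event.get("attempt", 1)), 1)
        delay_ms = DEFAULT_BACKOFF_MS * 2 ** (attempt - 1)
    return {
        "action": "resume_later",
        # Distinct from user cancellation, completion, or a test failure.
        "status": "provider-interrupted",
        "session_id": event.get("session_id"),
        "error_kind": event.get("error_kind"),
        "resume_at": now + timedelta(milliseconds=delay_ms),
    }


if __name__ == "__main__":
    payload = {"hook_event_name": "ProviderError", "session_id": "s1",
               "error_kind": "rate_limit", "retry_after_ms": 60000,
               "will_retry": False, "attempt": 1}
    print(plan_action(payload, datetime.now(timezone.utc)))
```

Honoring the provider's own hint first and only falling back to a local backoff keeps the handler from resuming earlier than the provider asked.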
Important boundary
This hook should not expose raw credentials, full request bodies, or private provider payloads. A redacted error summary plus status/error kind is enough.
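One way to produce the redacted summary is to scrub credential-like substrings and cap the length before the message is handed to hooks. The `redact_summary` helper, the patterns (bearer tokens, `sk-`-prefixed keys, `api_key=` / `Authorization:` assignments), and the 200-character cap are illustrative assumptions, not a complete redaction policy.

```python
import re

# Illustrative patterns only; real providers may need their own rules.
_SECRET_PATTERNS = [
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+"),
    re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"),
    re.compile(r"(?i)(api[_-]?key|authorization)\s*[:=]\s*\S+"),
]
MAX_LEN = 200  # assumed cap on the summary length


def redact_summary(message: str, max_len: int = MAX_LEN) -> str:
    """Return a short, single-line error summary with secrets masked."""
    text = message
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    # Collapse whitespace so multi-line provider bodies become one line.
    text = " ".join(text.split())
    if len(text) > max_len:
        text = text[: max_len - 3] + "..."
    return text
```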
I am not asking for this to replace Codex's internal retry behavior. The goal is to expose a reliable lifecycle surface for integrations that need to react when a turn ends because the provider request failed before normal Stop handling.