
fix(proxy): derive pool cooldown from the 429 reset window (headers + body) #50

Merged: nnemirovsky merged 2 commits into main from pool-cooldown-from-429-body on May 23, 2026

@nnemirovsky
Owner

Problem

v0.19.0's B1 derives a pool member's cooldown from the upstream 429, but only from headers (Retry-After, x-ratelimit-reset*). Verified live: the OpenAI Codex usage-limit 429 carries no such header. The reset window is in the JSON body:

{"type":"usage_limit_reached","plan_type":"team","resets_at":1779508759,"resets_in_seconds":1357}

With no header to read, B1 fell back to the 60s default. An exhausted member therefore left cooldown after 60s, the recovery monitor fired "recovered", the agent re-probed, and the upstream returned 429 again. The pool cycled between exhausted and recovered every ~60 to 75s, and the operator notices kept coming (now as exhausted/recovered pairs instead of the old failover flap). The A1/A2 flap fix held, but the practical churn did not stop because the real reset window was never read.

What this does

Derive the cooldown from the 429 reset window across the conventions that AI providers and general rate limiters actually use, keeping the existing header behavior first.

Headers (tried in this order; Retry-After takes precedence per the IETF RateLimit draft):

  • Retry-After (delta-seconds or HTTP-date)
  • RateLimit-Reset (IETF, delta)
  • X-RateLimit-Reset (GitHub/Twitter epoch, others delta, disambiguated by magnitude)
  • X-RateLimit-Reset-After (Discord, delta)
  • OpenAI x-ratelimit-reset-requests / x-ratelimit-reset-tokens (unit-suffixed)
  • Anthropic anthropic-ratelimit-requests-reset / anthropic-ratelimit-tokens-reset (ISO-8601 timestamp)

Body (when no usable header), top level and nested under an error object:

  • resets_in_seconds, resets_at (OpenAI Codex)
  • retry_after (Discord and others), reset_after

Each value is guarded against NaN, Inf, negative, and overflow, and classified as epoch or delta by magnitude. Header windows clamp to MaxCooldown (6h); body windows clamp to MaxUsageLimitCooldown (24h, since a usage-limit reset can be hours or days). Bodies that are larger than 64 KiB or not JSON are skipped.

With this, the Codex team account's real window (resets in ~22 min) governs pool recovery instead of a 60s re-probe: the pool stays quietly exhausted until a member's window lapses, then serves from that member again. That also matches the earlier observation that the agent recovered on its own once an account's window reset.

Testing

go test ./... passes (2888), as does -race on the proxy and vault packages; gofumpt, go vet, and golangci-lint are clean, and make generate shows no drift. New cases cover each header form (incl. epoch vs delta, ISO-8601, Retry-After precedence) and each body field (top-level and nested, float values floored, oversized/non-JSON ignored).

@nnemirovsky nnemirovsky merged commit 7b66bdf into main May 23, 2026
6 checks passed
@nnemirovsky nnemirovsky deleted the pool-cooldown-from-429-body branch May 23, 2026 04:06