fix(proxy): derive pool cooldown from the 429 reset window (headers + body) #50
Merged
…no header is present
…ields, IETF/Discord/Anthropic/OpenAI)
Problem
v0.19.0's B1 derives a pool member's cooldown from the upstream 429, but only from headers (`Retry-After`, `x-ratelimit-reset*`). Verified live: the OpenAI Codex usage-limit 429 carries no such header. The reset window is in the JSON body instead.

With no header, B1 fell back to the 60s default, so an exhausted member came off cooldown after 60s, the recovery monitor fired "recovered", the agent re-probed, and the upstream returned 429 again. The pool cycled from exhausted to recovered every ~60 to 75s and the operator notices kept coming (now as exhausted/recovered pairs instead of the old failover flap). The A1/A2 flap fix held, but the practical churn did not stop because the real window was never read.
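To make the failure mode concrete, here is a minimal sketch of a header-only derivation. It is not the proxy's actual B1 code: `headerOnlyCooldown`, `defaultCooldown`, and the plain `map[string]string` standing in for response headers are all hypothetical names chosen for illustration, and only the delta-seconds form of `Retry-After` is handled.

```go
package main

import (
	"strconv"
	"time"
)

// defaultCooldown is the 60s fallback described in the Problem section.
const defaultCooldown = 60 * time.Second

// headerOnlyCooldown is a hypothetical stand-in for a header-only derivation.
// It reads Retry-After as delta-seconds and never looks at the response body.
func headerOnlyCooldown(headers map[string]string) time.Duration {
	if raw, ok := headers["Retry-After"]; ok {
		if secs, err := strconv.Atoi(raw); err == nil && secs > 0 {
			return time.Duration(secs) * time.Second
		}
	}
	// No usable header: a 429 whose reset window lives only in the body
	// ends up with the default, however long the real window is.
	return defaultCooldown
}
```

Under these assumptions, a 429 with no reset header always yields 60s. Against a real window of about 22 minutes (the figure reported for the `team` account), that would mean on the order of twenty premature re-probes before the window actually lapses.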
What this does
Derive the cooldown from the 429 reset window across the conventions that AI providers and general rate limiters actually use, keeping the existing header behavior first.
Headers (tried in this order, `Retry-After` takes precedence per the IETF RateLimit draft):

- `Retry-After` (delta-seconds or HTTP-date)
- `RateLimit-Reset` (IETF, delta)
- `X-RateLimit-Reset` (GitHub/Twitter epoch, others delta, disambiguated by magnitude)
- `X-RateLimit-Reset-After` (Discord, delta)
- `x-ratelimit-reset-requests` / `x-ratelimit-reset-tokens` (unit-suffixed)
- `anthropic-ratelimit-requests-reset` / `anthropic-ratelimit-tokens-reset` (ISO-8601 timestamp)

Body (when no usable header), top level and nested under an `error` object:

- `resets_in_seconds`, `resets_at` (OpenAI Codex)
- `retry_after` (Discord and others), `reset_after`

Each value is guarded against NaN, Inf, negative, and overflow, and disambiguated epoch vs delta by magnitude. Header windows clamp to `MaxCooldown` (6h); body windows clamp to `MaxUsageLimitCooldown` (24h, since a usage-limit reset can be hours or days). Bodies over 64 KiB or non-JSON are skipped.

With this, the Codex `team` account (resets in ~22 min) governs pool recovery instead of a 60s re-probe, so the pool stays quietly exhausted until a member's real window lapses, then serves it. That also matches the earlier observation that the agent recovered on its own once an account's window reset.

Testing
`go test ./...` passes (2888), `-race` on the proxy and vault packages, gofumpt, `go vet`, `golangci-lint` clean, `make generate` no drift. New cases cover each header form (incl. epoch vs delta, ISO-8601, Retry-After precedence) and each body field (top-level and nested, float values floored, oversized/non-JSON ignored).
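The body path described under "What this does" can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: the function names, the lowercase constants, and the `1e9` epoch-vs-delta cutoff are hypothetical, the JSON shapes are built only from the field names listed above, and the header forms (HTTP-date, unit-suffixed, ISO-8601) are left out for brevity.

```go
package main

import (
	"encoding/json"
	"math"
	"time"
)

const (
	maxBodyBytes          = 64 << 10       // bodies over 64 KiB are skipped
	maxUsageLimitCooldown = 24 * time.Hour // body clamp named in the description
	// epochThreshold is an assumed cutoff: values at or above it are read as
	// Unix epoch seconds, smaller values as a delta in seconds.
	epochThreshold = 1e9
)

// bodyFields lists the body field names from the description, in lookup order.
var bodyFields = []string{"resets_in_seconds", "resets_at", "retry_after", "reset_after"}

// normalizeWindow guards a raw numeric value, resolves epoch vs delta by
// magnitude, floors fractional seconds, and clamps the result to limit.
func normalizeWindow(v float64, nowUnix int64, limit time.Duration) (time.Duration, bool) {
	if math.IsNaN(v) || math.IsInf(v, 0) || v < 0 {
		return 0, false
	}
	secs := math.Floor(v)
	if secs >= epochThreshold {
		secs -= float64(nowUnix)
		if secs < 0 {
			return 0, false // reset time is already in the past
		}
	}
	// Compare as float seconds before converting, so a huge value cannot
	// overflow time.Duration.
	if secs >= limit.Seconds() {
		return limit, true
	}
	return time.Duration(secs) * time.Second, true
}

// cooldownFromBody looks for a reset window at the top level of a JSON body,
// then inside a nested "error" object if one is present.
func cooldownFromBody(body []byte, nowUnix int64) (time.Duration, bool) {
	if len(body) > maxBodyBytes {
		return 0, false
	}
	var top map[string]any
	if err := json.Unmarshal(body, &top); err != nil {
		return 0, false // non-JSON, or JSON that is not an object
	}
	candidates := []map[string]any{top}
	if nested, ok := top["error"].(map[string]any); ok {
		candidates = append(candidates, nested)
	}
	for _, obj := range candidates {
		for _, name := range bodyFields {
			v, isNumber := obj[name].(float64)
			if !isNumber {
				continue
			}
			if d, ok := normalizeWindow(v, nowUnix, maxUsageLimitCooldown); ok {
				return d, true
			}
		}
	}
	return 0, false
}
```

Passing the current time as `nowUnix` keeps the sketch deterministic to test. Decoding into `map[string]any` rather than a typed struct is a deliberate simplification: it tolerates an `error` field that is a string in some responses, at the cost of a type assertion per lookup.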