fix: probe stale rate-limit cooldown primaries #87833
Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 8:52 PM ET / 00:52 UTC.
Summary
PR surface: Source +38, Tests +10. Total +48 across 3 files.
Reproducibility: yes, from source inspection and PR proof. Current main skips a cooled-down primary until near expiry, while the PR body reports a live Gateway/Ollama run where a seeded generic rate_limit cooldown was probed and cleared. I did not execute the repro in this read-only review.
Review metrics: 1 noteworthy metric.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Land the focused fallback-policy fix after a maintainer accepts the bounded primary-probe tradeoff and required checks pass, keeping the provider reset-window guard intact.
Do we have a high-confidence way to reproduce the issue? Yes, from source inspection and PR proof: current main skips a cooled-down primary until near expiry, while the PR body reports a live Gateway/Ollama run where a seeded generic rate_limit cooldown was probed and cleared. I did not execute the repro in this read-only review.
Is this the best way to solve the issue? Yes. The implementation is a narrow fallback-decision change with focused tests for stale generic rate_limit cooldowns and provider-recorded reset windows. The remaining question is whether maintainers accept the fallback-policy tradeoff before merge.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 61cf005437fd.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Summary
- Probe stale generic `rate_limit` cooldowns on primary model candidates even when fallbacks are configured.
- Preserve provider-recorded reset windows (`blockedUntil` with provider block metadata) so known future quota windows still keep the fallback path preferred until near expiry.
Fixes #87608
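The two bullets above describe a per-candidate decision. A minimal sketch of that policy follows; it is an illustration, not the code in this PR. The field names `cooldownUntil`, `cooldownReason`, and `blockedUntil` and the decision label `probe_cooldown_candidate` appear in the PR text. The `CooldownState` shape, the `decideCandidate` function, the other decision labels, and the `PROBE_MARGIN_MS` value are assumptions made for the example.

```typescript
// Hypothetical cooldown record; only the three field names come from the PR text.
type CooldownState = {
  cooldownUntil?: number; // epoch ms, generic cooldown
  cooldownReason?: string; // e.g. "rate_limit"
  blockedUntil?: number; // epoch ms, provider-recorded reset window
};

// "probe_cooldown_candidate" is logged in the PR proof; the other two labels are assumed.
type Decision = "use_candidate" | "probe_cooldown_candidate" | "skip_candidate";

// Assumed margin for "near expiry"; the real value is not stated in the PR.
const PROBE_MARGIN_MS = 2 * 60_000;

function decideCandidate(
  state: CooldownState,
  opts: { isPrimary: boolean; now: number },
): Decision {
  const { isPrimary, now } = opts;
  const cooldownUntil = state.cooldownUntil ?? 0;
  const blockedUntil = state.blockedUntil ?? 0;

  // Nothing is holding this candidate back.
  if (cooldownUntil <= now && blockedUntil <= now) return "use_candidate";

  // A provider-recorded reset window is a known future quota boundary:
  // keep preferring fallbacks until the window is close to expiry.
  if (blockedUntil > now) {
    return blockedUntil - now <= PROBE_MARGIN_MS
      ? "probe_cooldown_candidate"
      : "skip_candidate";
  }

  // A generic rate_limit cooldown on the primary may be stale, so probe it
  // even though fallbacks exist.
  if (isPrimary && state.cooldownReason === "rate_limit") {
    return "probe_cooldown_candidate";
  }

  // Every other cooldown waits until it is near expiry.
  return cooldownUntil - now <= PROBE_MARGIN_MS
    ? "probe_cooldown_candidate"
    : "skip_candidate";
}
```

In this sketch the provider reset-window check runs before the generic `rate_limit` branch, so a known quota window always takes precedence over the new probe path.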
Verification
- `.agents/skills/autoreview/scripts/autoreview --mode local`
- `node scripts/run-vitest.mjs src/agents/model-fallback.probe.test.ts src/agents/model-fallback.test.ts -- --reporter=verbose`
- Crabbox live repro: `aws` lease `cbx_33d131797efc`, run `run_4f42fc0b825a`
Real behavior proof
- Behavior addressed: A primary model with a stale generic `rate_limit` cooldown could remain bypassed behind configured fallbacks even after the provider recovered.
- Real environment tested: AWS Crabbox Linux runner using live Ollama Cloud credentials, Gateway, and a real `ollama-cloud/gemma3:4b` model request.
- Exact steps or command run after this patch: seeded an agent auth profile with a 30-minute generic `cooldownUntil` / `cooldownReason: rate_limit`, configured primary `ollama-cloud/gemma3:4b` plus a synthetic missing fallback, started Gateway, and sent a raw Gateway agent model-run request through Ollama Cloud.
- Evidence after fix: Crabbox run `run_4f42fc0b825a` on AWS lease `cbx_33d131797efc` logged `decision=probe_cooldown_candidate` for `ollama-cloud/gemma3:4b`, then `decision=candidate_succeeded` for the same primary model.
- Observed result after fix: The primary Ollama model was probed and succeeded; the seeded cooldown state was cleared (`hasCooldownUntil: false`, `errorCount: 0`, `failureCounts: null`), and the script ended with `live_regression_repro_passed=true`.
- What was not tested: Full repository test suite; this change was covered with focused model fallback tests plus the live Gateway/Ollama regression repro.
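The seed-and-verify shape of the repro can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the repro script itself: the `ProfileUsage` type and the three helper functions are hypothetical, and the seeded `errorCount` and `failureCounts` values are invented for the example. Only the field names and the final summary values (`hasCooldownUntil: false`, `errorCount: 0`, `failureCounts: null`) are taken from the proof above.

```typescript
// Hypothetical usage record for one auth profile.
type ProfileUsage = {
  cooldownUntil?: number; // epoch ms
  cooldownReason?: string;
  errorCount: number;
  failureCounts: Record<string, number> | null;
};

// Seed a generic rate_limit cooldown, 30 minutes by default as in the repro.
function seedGenericRateLimitCooldown(now: number, minutes = 30): ProfileUsage {
  return {
    cooldownUntil: now + minutes * 60_000,
    cooldownReason: "rate_limit",
    errorCount: 1, // assumed seed value
    failureCounts: { rate_limit: 1 }, // assumed seed value
  };
}

// Assumed behavior after a successful probe: the cooldown state is reset.
function clearCooldownAfterSuccess(usage: ProfileUsage): ProfileUsage {
  return {
    ...usage,
    cooldownUntil: undefined,
    cooldownReason: undefined,
    errorCount: 0,
    failureCounts: null,
  };
}

// Produce the same three fields the proof reports.
function summarizeUsage(usage: ProfileUsage) {
  return {
    hasCooldownUntil: usage.cooldownUntil !== undefined,
    errorCount: usage.errorCount,
    failureCounts: usage.failureCounts,
  };
}

const seeded = seedGenericRateLimitCooldown(Date.now());
const cleared = clearCooldownAfterSuccess(seeded);
console.log(summarizeUsage(seeded).hasCooldownUntil, summarizeUsage(cleared));
```

Checking the summary both before and after the probe guards against a false pass where the cooldown was never seeded in the first place.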