fix(routing): hybrid selector returns null when no accounts are available#397
…able
selectHybridAccount() previously returned the least-recently-used account as a 'fallback' when available.length === 0, even though that account was explicitly unavailable (cooling down, rate-limited, circuit-open, or otherwise blocked). The fetch loop in index.ts trusted the result without re-validating, which caused avoidable retries through known-unavailable candidates, noisy cross-account 5xx bursts, and misleading failover behavior in pool-wide stall situations.
The new contract: return null when no account is currently available. The caller is responsible for surfacing the pool-wide unavailable condition (burst cooldown, fast-fail, or user-facing error) instead of churning through blocked candidates.
Existing tests that documented the old LRU fallback behavior are updated to match the new contract with clear comments referencing AUDIT-H2. Added a regression test for the single-unavailable-account case.
Closes AUDIT-H2 / D-01 (hybrid selector returns blocked accounts, identified in master repository audit).
Evidence: 225/225 test files, 3419/3419 tests pass. typecheck + lint exit 0. No new dependencies, no new public API.
Summary
Changes the selectHybridAccount() contract: when every account is unavailable (cooling down, rate-limited, circuit-open, or otherwise blocked), the selector now returns null instead of returning the least-recently-used unavailable account as a "fallback".
Problem
The fetch loop in index.ts trusted the selector's result without re-validating account availability. The old LRU "fallback" handed back a known-unavailable account, which caused:
- avoidable retries through known-unavailable candidates
- noisy cross-account 5xx bursts
- misleading failover behavior in pool-wide stall situations
Audit (docs/audits/MASTER_AUDIT.md §5 HIGH / dim-D-routing.md D-01) flagged this as a HIGH finding; Oracle confirmed the severity and recommended: "Change the hybrid selector contract to return null when no account is available, or re-run isAccountAvailableForFamily() after hybrid selection before using the account."
Change
lib/rotation.ts, selectHybridAccount(): the all-unavailable branch (available.length === 0) now returns null instead of the least-recently-used account.
The caller is now responsible for surfacing the pool-wide unavailable condition (burst cooldown, fast-fail, or user-facing error) instead of churning through blocked candidates.
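
To make the contract change concrete, here is a minimal sketch of a selector that returns null when nothing is available. It is not the code from lib/rotation.ts: the AccountCandidate shape, the score field, the tie-break rule, and the name selectHybridAccountSketch are all assumptions made for illustration, since the real types and scoring are not shown in this PR text.

```typescript
// Hypothetical, simplified account shape (assumption, not the real type).
interface AccountCandidate {
  index: number;
  // false while cooling down, rate-limited, circuit-open, or otherwise blocked
  isAvailable: boolean;
  // epoch milliseconds of last use
  lastUsed: number;
  // stand-in for whatever hybrid score the real selector computes
  score: number;
}

// Sketch of the new contract: never hand back an unavailable account.
function selectHybridAccountSketch(
  accounts: readonly AccountCandidate[],
): AccountCandidate | null {
  const available = accounts.filter((a) => a.isAvailable);

  if (available.length === 0) {
    // New contract (AUDIT-H2 / D-01): no least-recently-used fallback here.
    // The caller decides how to surface the pool-wide unavailable condition.
    return null;
  }

  // Pick the highest score; break ties by least recently used (assumed rule).
  return available.reduce((best, candidate) => {
    if (candidate.score > best.score) return candidate;
    if (candidate.score === best.score && candidate.lastUsed < best.lastUsed) {
      return candidate;
    }
    return best;
  });
}

// Usage: a pool where every account is blocked yields null,
// including the single-unavailable-account case.
const singleBlocked: AccountCandidate[] = [
  { index: 0, isAvailable: false, lastUsed: 1, score: 90 },
];
const picked = selectHybridAccountSketch(singleBlocked);
console.log(picked === null ? "no account available" : `picked ${picked.index}`);
```

Returning null rather than a "best effort" candidate keeps the availability decision in one place, so a caller cannot use a blocked account by accident.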
Test updates
Three existing tests documented the old LRU fallback behavior. All three are updated to assert the new null contract, with comments referencing AUDIT-H2 so future readers understand why:
- test/rotation.test.ts: "returns least-recently-used account when all accounts unavailable (fallback)" → "returns null when all accounts are unavailable (AUDIT-H2 contract)". Added a regression test for the single-unavailable-account case.
- test/accounts.test.ts: "falls back to least-recently-used when all accounts are rate-limited" → "returns null when all accounts are rate-limited (AUDIT-H2 contract)"
- test/property/rotation.property.test.ts: "returns least recently used when all unavailable" → "returns null when all accounts are unavailable (AUDIT-H2 contract)"
Verification
- 225/225 test files, 3419/3419 tests pass
- npm run typecheck: exit 0
- npm run lint: exit 0
Audit reference
- docs/audits/MASTER_AUDIT.md §5 HIGH AUDIT-H2
- docs/audits/evidence/dim-D-routing.md D-01
- docs/audits/evidence/oracle-verdicts.md §1.1 (confirmed HIGH), §2 Rank 3 R4 (this bug is one of three that R4 routing mutex closes holistically; this PR is the tactical point-fix)
Caller expectations
The caller in index.ts (around lines 1149-1161) already has a null-guard (if (!account || attempted.has(account.index)) { break; }), so the new null return is handled gracefully: the fetch loop breaks out and surfaces the pool-wide unavailable condition through the existing burst-cooldown / pool-exhaustion pathways. No additional caller changes are needed in this PR.
Scope guarantees
- No new dependencies
- No new public API
Note on diff size
The commit stat shows a larger insertion/deletion count than the minimal behavioral change because the repository's lint-staged pre-commit hook (biome/prettier) applied trailing-comma and wrapping normalizations to the files touched. The meaningful functional change is:
- lib/rotation.ts: ~15 lines replaced (the all-unavailable branch)
Use GitHub's "Hide whitespace" diff option or git diff -w for a clearer read.
Follow-up
Phase 1 continues with PR-E (short-429 race fix, AUDIT-H3) and PR-F (active-pointer normalization, AUDIT-H10). Tracked in .sisyphus/plans/phase1-implementation.md.
note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.
Greptile Summary
fixes the selectHybridAccount contract: when every account is unavailable the selector now returns null instead of the LRU fallback candidate. the caller in index.ts already has a null-guard at line 1159 (if (!account || attempted.has(account.index)) { break; }) so no additional caller changes are needed. three test files are updated and one regression test is added; all 3419 tests pass.
Confidence Score: 5/5
safe to merge: the only remaining findings are P2 doc/test-description cleanups.
the behavioral change is minimal and correct, the null-guard in index.ts already handles the new return value, the full suite passes, and no concurrency or windows filesystem issues are introduced. both open findings are P2 style/docs only.
lib/rotation.ts: stale JSDoc @param/@returns still describes old LRU fallback behavior.
Important Files Changed
- available.length === 0 now returns null instead of LRU fallback. Logic and inline comment (AUDIT-H2/D-01) are correct. JSDoc @param/@returns still describe the old behavior and need updating.
- … getCurrentOrNextForFamilyHybrid; contract-change comment referencing AUDIT-H2/D-01 added. No issues found.
Sequence Diagram
sequenceDiagram
    participant FL as fetch loop (index.ts)
    participant AM as AccountManager
    participant SH as selectHybridAccount
    FL->>AM: getCurrentOrNextForFamilyHybrid(family)
    AM->>SH: selectHybridAccount(accounts, healthTracker, tokenTracker)
    alt available.length > 0
        SH-->>AM: AccountWithMetrics (best scored)
        AM-->>FL: account
        FL->>FL: proceed with request
    else available.length === 0 (all blocked), new contract
        SH-->>AM: null
        AM-->>FL: null
        FL->>FL: break loop → surface pool-wide unavailable condition
    end
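
The flow in the diagram can be sketched from the caller's side as follows. This is a rough illustration under stated assumptions, not the loop from index.ts: PoolAccount, Selector, LoopOutcome, runFetchLoopSketch, and the tryRequest callback are hypothetical names, and marking a failed account unavailable is an assumed simplification. Only the guard condition (if (!account || attempted.has(account.index)) { break; }) is taken from the PR text.

```typescript
// Hypothetical, simplified account shape (assumption).
interface PoolAccount {
  index: number;
  available: boolean;
}

// A selector honoring the new contract: an account, or null when none is available.
type Selector = (accounts: readonly PoolAccount[]) => PoolAccount | null;

// Stand-in selector for this sketch; the real one scores candidates.
const selectFirstAvailable: Selector = (accounts) =>
  accounts.find((a) => a.available) ?? null;

type LoopOutcome =
  | { kind: "success"; accountIndex: number }
  | { kind: "pool-unavailable"; attempted: number[] };

function runFetchLoopSketch(
  accounts: PoolAccount[],
  select: Selector,
  tryRequest: (account: PoolAccount) => boolean,
): LoopOutcome {
  const attempted = new Set<number>();

  while (attempted.size < accounts.length) {
    const account = select(accounts);

    // Guard shape quoted in the PR: a null selection (or a repeat) ends the loop.
    if (!account || attempted.has(account.index)) {
      break;
    }

    attempted.add(account.index);
    if (tryRequest(account)) {
      return { kind: "success", accountIndex: account.index };
    }

    // Assumed simplification: a failed account becomes unavailable,
    // so the selector will not return it again.
    account.available = false;
  }

  // Surface the pool-wide condition instead of churning through blocked candidates.
  return { kind: "pool-unavailable", attempted: Array.from(attempted) };
}

// Usage: with every account blocked, no request is attempted at all.
const stalledPool: PoolAccount[] = [
  { index: 0, available: false },
  { index: 1, available: false },
];
const outcome = runFetchLoopSketch(stalledPool, selectFirstAvailable, () => true);
console.log(outcome.kind);
```

Because the selector never returns a blocked account, the loop ends on the first null and the pool-wide condition is reported once rather than after a round of doomed retries.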
Reviews (1): Last reviewed commit: "fix(routing): hybrid selector returns nu..."