Skip to content

fix(routing): hybrid selector returns null when no accounts are available#397

Merged
ndycode merged 1 commit intomainfrom
fix/hybrid-selector-null-contract
Apr 17, 2026
Merged

fix(routing): hybrid selector returns null when no accounts are available#397
ndycode merged 1 commit intomainfrom
fix/hybrid-selector-null-contract

Conversation

@ndycode
Copy link
Copy Markdown
Owner

@ndycode ndycode commented Apr 17, 2026

Summary

Changes the selectHybridAccount() contract: when every account is unavailable (cooling down, rate-limited, circuit-open, or otherwise blocked), the selector now returns null instead of returning the least-recently-used unavailable account as a "fallback".

Problem

The fetch loop in index.ts trusted the selector's result without re-validating account availability. The old LRU "fallback" handed back a known-unavailable account, which caused:

  • Avoidable retries through blocked candidates
  • Noisy cross-account 5xx bursts when the pool was already stalling
  • Misleading failover behavior in pool-wide-unavailable situations
  • Confusing user output ("using account 3" for a candidate that was just cooled down)

Audit (docs/audits/MASTER_AUDIT.md §5 HIGH / dim-D-routing.md D-01) flagged this as a HIGH finding; Oracle confirmed the severity and recommended: "Change the hybrid selector contract to return null when no account is available, or re-run isAccountAvailableForFamily() after hybrid selection before using the account."

Change

lib/rotation.ts selectHybridAccount():

- if (available.length === 0) {
-   if (resolvedAccounts.length === 0) return null;
-   let leastRecentlyUsed: AccountWithMetrics | null = null;
-   let oldestTime = Infinity;
-   for (const account of resolvedAccounts) {
-     if (account.lastUsed < oldestTime) {
-       oldestTime = account.lastUsed;
-       leastRecentlyUsed = account;
-     }
-   }
-   return leastRecentlyUsed;
- }
+ if (available.length === 0) {
+   return null;
+ }

The caller is now responsible for surfacing the pool-wide unavailable condition (burst cooldown, fast-fail, or user-facing error) instead of churning through blocked candidates.

Test updates

Three existing tests documented the old LRU fallback behavior. All three are updated to assert the new null contract, with comments referencing AUDIT-H2 so future readers understand why:

  • test/rotation.test.tsreturns least-recently-used account when all accounts unavailable (fallback)returns null when all accounts are unavailable (AUDIT-H2 contract). Added a regression for the single-unavailable-account case.
  • test/accounts.test.tsfalls back to least-recently-used when all accounts are rate-limitedreturns null when all accounts are rate-limited (AUDIT-H2 contract)
  • test/property/rotation.property.test.tsreturns least recently used when all unavailablereturns null when all accounts are unavailable (AUDIT-H2 contract)

Verification

  • Full suite: 225/225 files, 3419/3419 tests pass (+1 new regression test)
  • npm run typecheck exit 0
  • npm run lint exit 0
  • No new dependencies
  • No new public API

Audit reference

  • docs/audits/MASTER_AUDIT.md §5 HIGH AUDIT-H2
  • docs/audits/evidence/dim-D-routing.md D-01
  • docs/audits/evidence/oracle-verdicts.md §1.1 (confirmed HIGH), §2 Rank 3 R4 (this bug is one of three that R4 routing mutex closes holistically; this PR is the tactical point-fix)

Caller expectations

The caller in index.ts around line 1149-1161 already has a null-guard (if (!account || attempted.has(account.index)) { break; }), so the new null return is handled gracefully: the fetch loop breaks out and surfaces the pool-wide unavailable condition through existing burst-cooldown / pool-exhaustion pathways. No additional caller changes needed in this PR.

Scope guarantees

  • ✅ One concern — selector null contract
  • ✅ No refactor beyond the affected branch
  • ✅ All three tests updated with AUDIT-ID references
  • ✅ New regression test for the single-unavailable case
  • ✅ Full test suite green

Note on diff size

The commit stat shows a larger insertion/deletion count than the minimal behavioral change because the repository's lint-staged pre-commit hook (biome/prettier) applied trailing-comma and wrapping normalizations to the files touched. The meaningful functional change is:

  • lib/rotation.ts: ~15 lines replaced (the all-unavailable branch)
  • Three test files: ~5-10 lines each for the contract update + one new test

Use GitHub's "Hide whitespace" diff option or git diff -w for a clearer read.

Follow-up

Phase 1 continues with PR-E (short-429 race fix — AUDIT-H3) and PR-F (active-pointer normalization — AUDIT-H10). Tracked in .sisyphus/plans/phase1-implementation.md.

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

fixes the selectHybridAccount contract: when every account is unavailable the selector now returns null instead of the LRU fallback candidate. the caller in index.ts already has a null-guard at line 1159 (if (!account || attempted.has(account.index)) { break; }) so no additional caller changes are needed. three test files are updated and one regression test is added; all 3419 tests pass.

Confidence Score: 5/5

safe to merge — only remaining findings are P2 doc/test-description cleanups.

the behavioral change is minimal and correct, the null-guard in index.ts already handles the new return value, full suite passes, and no concurrency or windows filesystem issues are introduced. both open findings are P2 style/docs only.

lib/rotation.ts — stale JSDoc @param/@returns still describes old LRU fallback behavior.

Important Files Changed

Filename Overview
lib/rotation.ts Core contract change: available.length === 0 now returns null instead of LRU fallback. Logic and inline comment (AUDIT-H2/D-01) are correct. JSDoc @param/@returns still describe the old behavior and need updating.
test/rotation.test.ts Three old LRU-fallback tests updated to assert null; new regression test added for single-unavailable-account case. AUDIT-H2 comments wired. Coverage looks complete for the changed branch.
test/accounts.test.ts Rate-limited test updated to assert null via getCurrentOrNextForFamilyHybrid; contract-change comment referencing AUDIT-H2/D-01 added. No issues found.
test/property/rotation.property.test.ts Property test contract test updated to assert null for all-unavailable case; however the existing "returns null only when accounts array is empty" description is now incorrect after the contract change.

Sequence Diagram

sequenceDiagram
    participant FL as fetch loop (index.ts)
    participant AM as AccountManager
    participant SH as selectHybridAccount

    FL->>AM: getCurrentOrNextForFamilyHybrid(family)
    AM->>SH: selectHybridAccount(accounts, healthTracker, tokenTracker)

    alt available.length > 0
        SH-->>AM: AccountWithMetrics (best scored)
        AM-->>FL: account
        FL->>FL: proceed with request
    else available.length === 0 (all blocked) — new contract
        SH-->>AM: null
        AM-->>FL: null
        FL->>FL: break loop → surface pool-wide unavailable condition
    end
Loading

Fix All in Codex

Prompt To Fix All With AI
This is a comment left during a code review.
Path: lib/rotation.ts
Line: 362-368

Comment:
**Stale JSDoc contradicts the new null contract**

The `@param accounts` description on line 362 still says *"when none are available the least-recently-used account is returned"* — the old LRU fallback behavior this PR removes. The `@returns` on line 368 only mentions null *"if no accounts exist"*, missing the new case where null is returned because all accounts are unavailable.

```suggestion
 * @param accounts - Candidate accounts with availability (`isAvailable`) and last-used timestamp (`lastUsed`); when none are available `null` is returned (AUDIT-H2 contract — the caller is responsible for surfacing pool-wide unavailability).
 * @param healthTracker - Tracker used to obtain per-account health scores (scoped by `quotaKey` when provided).
 * @param tokenTracker - Tracker used to obtain per-account token counts (scoped by `quotaKey` when provided). Logged token values are rounded for telemetry and sensitive tokens are not emitted.
 * @param quotaKey - Optional quota key to scope health and token lookups.
 * @param config - Partial selection weights that override defaults (healthWeight, tokenWeight, freshnessWeight).
 * @param options - Selection options. `pidOffsetEnabled` adds a small PID-based deterministic offset to distribute selection across processes. `scoreBoostByAccount` is an optional per-account numeric boost keyed by account index.
 * @returns The chosen `AccountWithMetrics` for the next request, or `null` if the accounts array is empty or all candidates are currently unavailable.
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: test/property/rotation.property.test.ts
Line: 222-227

Comment:
**Test description now misleading after contract change**

The word "only" makes this description incorrect — under the new AUDIT-H2 contract null is also returned when all accounts are unavailable, which is asserted by the test added at line 271. A future reader tracing the null path will be misled.

```suggestion
	it("returns null when accounts array is empty", () => {
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(routing): hybrid selector returns nu..." | Re-trigger Greptile

…able

selectHybridAccount() previously returned the least-recently-used account
as a 'fallback' when available.length === 0, even though that account was
explicitly unavailable (cooling down, rate-limited, circuit-open, or
otherwise blocked). The fetch loop in index.ts trusted the result without
re-validating, which caused avoidable retries through known-unavailable
candidates, noisy cross-account 5xx bursts, and misleading failover
behavior in pool-wide stall situations.

The new contract: return null when no account is currently available. The
caller is responsible for surfacing the pool-wide unavailable condition
(burst cooldown, fast-fail, or user-facing error) instead of churning
through blocked candidates.

Existing tests that documented the old LRU fallback behavior are updated
to match the new contract with clear comments referencing AUDIT-H2.
Added a regression test for the single-unavailable-account case.

Closes AUDIT-H2 / D-01 (hybrid selector returns blocked accounts,
identified in master repository audit).

Evidence: 225/225 test files, 3419/3419 tests pass. typecheck + lint
exit 0. No new dependencies, no new public API.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 17, 2026

Warning

Rate limit exceeded

@ndycode has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 57 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 14 minutes and 57 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 754dbce6-83b4-4a3c-87e2-d181aaeee264

📥 Commits

Reviewing files that changed from the base of the PR and between 1f6da97 and 5c953e8.

📒 Files selected for processing (4)
  • lib/rotation.ts
  • test/accounts.test.ts
  • test/property/rotation.property.test.ts
  • test/rotation.test.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/hybrid-selector-null-contract
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch fix/hybrid-selector-null-contract

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ndycode ndycode merged commit bb97f7e into main Apr 17, 2026
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant