release: rebuild open main PR wave with review fixes #387
lib/runtime/metrics.ts is no longer imported anywhere in the runtime and its dedicated test only exercised that isolated dead code. Remove the unused module and orphan test to shrink maintenance surface without changing behavior.
The forecast and report commands duplicated the same refreshed-account persistence flow and forecast-result serialization logic. That increases maintenance cost and makes future fixes easy to miss in one path. Extract the shared pieces into a single helper module and keep both commands delegating to the same implementation.
parallel-probe used forced casts to distinguish overload inputs and to normalize probe failures. Replace those with explicit type guards and error normalization so the module keeps the same behavior without leaning on unsafe assertions.
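The pattern can be sketched with a user-defined type guard plus an error normalizer. This is a hypothetical illustration, not the actual parallel-probe code; the names `ProbeRequest`, `isProbeRequest`, and `normalizeProbeError` are invented for the example.

```typescript
// Hypothetical sketch: narrow an overloaded input with a type guard instead
// of an `as` cast, and normalize unknown probe failures into Error values.
interface ProbeRequest {
  accountIndex: number;
  quotaKey: string;
}

// Type guard: the compiler narrows `unknown` to ProbeRequest on the true branch.
function isProbeRequest(input: unknown): input is ProbeRequest {
  if (typeof input !== "object" || input === null) return false;
  const candidate = input as Record<string, unknown>;
  return (
    typeof candidate.accountIndex === "number" &&
    typeof candidate.quotaKey === "string"
  );
}

// Convert any thrown value into a real Error without unsafe assertions.
function normalizeProbeError(cause: unknown): Error {
  if (cause instanceof Error) return cause;
  return new Error(typeof cause === "string" ? cause : JSON.stringify(cause));
}
```

The guard keeps the same runtime behavior as a forced cast on well-formed input, but malformed input is rejected explicitly instead of flowing through with a lie in the type system.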
Overlapping requests that share a prompt cache session can finish out of order. Older responses were allowed to overwrite newer session-affinity state, which lets stale previous_response_id and account affinity win. Version session-affinity writes per request so the newest in-flight request for a session keeps control of the stored response id and preferred account.
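The versioning idea can be sketched as a monotonically increasing counter taken at request start, checked at write time. This is an illustrative model only; `nextAffinityVersion` and `rememberWithVersion` are hypothetical names, not the runtime's actual API.

```typescript
// Hypothetical sketch: each in-flight request takes a version at start, and a
// completion may only overwrite session state owned by an older request.
interface AffinityState {
  version: number;
  lastResponseId: string;
  preferredAccount: number;
}

const affinityBySession = new Map<string, AffinityState>();
let writeVersion = 0;

// Called when a request starts; the request keeps this version until it finishes.
function nextAffinityVersion(): number {
  writeVersion += 1;
  return writeVersion;
}

// Called when a response completes; stale completions are dropped.
function rememberWithVersion(
  sessionKey: string,
  lastResponseId: string,
  preferredAccount: number,
  version: number,
): boolean {
  const current = affinityBySession.get(sessionKey);
  if (current && current.version > version) {
    return false; // a newer in-flight request already owns this session
  }
  affinityBySession.set(sessionKey, {
    version,
    lastResponseId,
    preferredAccount,
  });
  return true;
}
```

With this shape, an older request that finishes late cannot clobber the `previous_response_id` or preferred account recorded by a newer one.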
Quota cache saves and update-check cache writes could overlap within the same process. That risks write races, and the update checker also wrote directly to the final cache file instead of staging through a temp file. Serialize in-process writes for both caches and make update-check cache writes atomic via temp-file rename so concurrent runtime writers do not corrupt the on-disk state.
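The combination of an in-process write queue and temp-file staging can be sketched as follows. This is a simplified model under assumed names (`enqueueAtomicWrite`), not the actual cache module.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: serialize in-process cache writes through a promise
// chain, and stage each write through a temp file + rename so readers never
// observe a half-written cache file.
let writeQueue: Promise<void> = Promise.resolve();
let writeCounter = 0;

function enqueueAtomicWrite(cachePath: string, contents: string): Promise<void> {
  writeQueue = writeQueue.then(() => {
    writeCounter += 1;
    const tempPath = `${cachePath}.${process.pid}.${writeCounter}.tmp`;
    writeFileSync(tempPath, contents); // stage off to the side first
    renameSync(tempPath, cachePath); // atomic replace on the same volume
  });
  return writeQueue;
}
```

Chaining on a shared promise guarantees in-process ordering, and the rename guarantees each on-disk state is a complete payload even if two writers race.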
Live account sync reused the same controller once created, even when its effective debounce/poll configuration changed. That leaves stale runtime state in place and keeps watching with outdated timing parameters. Track a live-sync config key and recreate the controller when the configuration changes, matching the refresh-guardian pattern.
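The config-key pattern can be sketched like this. The names `LiveSyncConfig`, `liveSyncConfigKey`, and `ensureController` are illustrative assumptions, not the module's real symbols.

```typescript
// Hypothetical sketch: derive a stable key from the effective configuration
// and recreate the controller only when that key changes.
interface LiveSyncConfig {
  debounceMs: number;
  pollIntervalMs: number;
}

interface Controller {
  config: LiveSyncConfig;
}

let controller: Controller | null = null;
let controllerKey: string | null = null;

function liveSyncConfigKey(config: LiveSyncConfig): string {
  return `${config.debounceMs}:${config.pollIntervalMs}`;
}

function ensureController(config: LiveSyncConfig): Controller {
  const key = liveSyncConfigKey(config);
  if (controller === null || controllerKey !== key) {
    // Timing parameters changed: drop the stale controller and start fresh.
    controller = { config };
    controllerKey = key;
  }
  return controller;
}
```

Unchanged configuration reuses the existing controller; any change in the derived key forces a rebuild, so no watcher keeps running with outdated timing.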
…ation

When OpenAI returns a 404 with a usage_limit_reached body, fetch-helpers.ts remaps it to a 429 response. But the main pipeline was checking the original response.status (404) instead of errorResponse.status (429), so the rate-limit handling branch never ran. This caused exhausted accounts to appear healthy and sessions to be killed instead of rotating to available accounts. Change index.ts:1861 from `if (response.status === 429)` to `if (errorResponse.status === 429 && rateLimit)`. The added rateLimit guard also prevents entitlement-style 429s (where rateLimit is explicitly set to undefined) from incorrectly marking accounts as rate-limited. Tests: 2 new tests covering both scenarios; the full suite passes (3314 tests).
The 404 error remapper was converting ambiguous responses into 429s based on free-form body text and generic rate_limit_exceeded codes. That makes not-found errors look like cooldown signals and feeds weak rate-limit reasons downstream. Only remap structured 404 responses that carry explicit usage_limit codes, while preserving entitlement detection. This keeps quota-limit signals intact without manufacturing ambiguous 429s from plain text or generic rate-limit payloads.
The preemptive quota scheduler calculated deferrals from the nearest reset window. When a later primary or secondary reset was still active, that shortened the effective cooldown and let requests resume too early. Use the longest relevant reset window for 429 deferrals and for near-exhaustion quota deferrals so cooldowns stay aligned with the latest active quota window.
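The selection rule can be sketched as "latest future reset wins". This is a minimal model with an assumed `ResetWindow` shape, not the scheduler's actual types.

```typescript
// Hypothetical sketch: defer until the longest still-active reset window,
// not the nearest one, so an early window cannot shorten the cooldown.
interface ResetWindow {
  resetAtMs: number; // epoch ms when this quota window replenishes
}

function deferralUntil(windows: ResetWindow[], nowMs: number): number {
  // Keep only windows that are still in the future, then take the latest.
  const active = windows.map((w) => w.resetAtMs).filter((t) => t > nowMs);
  return active.length > 0 ? Math.max(...active) : nowMs;
}
```

With a primary reset at t=2s and a secondary at t=5s, the old nearest-window logic would resume at 2s while the secondary quota was still exhausted; taking the max keeps the deferral at 5s.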
The 5xx handler rotates accounts and applies failure policy but does not inform the preemptiveQuotaScheduler about upstream capacity pressure. This means the scheduler keeps sending requests into an overloaded backend while the 429 handler already calls markRateLimited(). Add scheduler notification for overload-specific status codes (502 Bad Gateway, 503 Service Unavailable, 529 Overloaded) inside the 5xx handler, mirroring the 429 handler's awareness. Generic 500 errors are excluded since they do not indicate capacity pressure.
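The status-code gate described above reduces to a small set-membership check; this sketch uses an assumed helper name, not the handler's real code.

```typescript
// Hypothetical sketch: only overload-specific 5xx codes count as capacity
// pressure worth reporting to the scheduler; generic 500s are excluded.
const OVERLOAD_STATUS_CODES = new Set([502, 503, 529]);

function shouldNotifyScheduler(status: number): boolean {
  return OVERLOAD_STATUS_CODES.has(status);
}
```

This keeps the scheduler's overload signal precise: a 500 caused by a malformed request or upstream bug does not cool the whole pool down.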
Accounts with tracked workspaces could remain eligible for family selection even after every workspace had been disabled. That leaves a workspace-exhausted account in the rotation pool and can block fallback selection unnecessarily. Treat accounts whose tracked workspaces are all disabled as unavailable for selection, while keeping legacy workspace-less accounts eligible.
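The eligibility rule can be sketched as a single predicate; the `Workspace` shape and `isSelectable` name are assumptions for illustration.

```typescript
// Hypothetical sketch: an account with tracked workspaces is selectable only
// if at least one workspace is still enabled; legacy accounts with no tracked
// workspaces remain eligible.
interface Workspace {
  disabled: boolean;
}

function isSelectable(workspaces: Workspace[] | undefined): boolean {
  if (!workspaces || workspaces.length === 0) return true; // legacy account
  return workspaces.some((w) => !w.disabled);
}
```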
When an upstream perpetually returns short Retry-After values (<= 5s), the short-retry path would loop indefinitely on the same account. Add MAX_SHORT_RETRY_ATTEMPTS (3) so that after 3 consecutive short-cooldown 429s the request falls through to the existing long-cooldown rotation path, which marks the account rate-limited and rotates to the next one.

Changes:
- lib/request/rate-limit-backoff.ts: add MAX_SHORT_RETRY_ATTEMPTS = 3 constant
- index.ts: import MAX_SHORT_RETRY_ATTEMPTS and add attempt < MAX_SHORT_RETRY_ATTEMPTS guard to the short-retry condition
- test/index.test.ts: add 2 test cases covering the bound (at-limit rotates, below-limit retries)
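The bounded decision described above can be sketched as one predicate; the threshold constant name is taken from the description, the rest is illustrative.

```typescript
// Hypothetical sketch of the bounded short-retry decision: retry in place for
// small Retry-After values, but only up to MAX_SHORT_RETRY_ATTEMPTS before
// falling through to the long-cooldown rotation path.
const MAX_SHORT_RETRY_ATTEMPTS = 3;
const SHORT_RETRY_THRESHOLD_MS = 5_000; // assumed <= 5s cutoff from the description

function shouldShortRetry(retryAfterMs: number, attempt: number): boolean {
  return (
    retryAfterMs <= SHORT_RETRY_THRESHOLD_MS &&
    attempt < MAX_SHORT_RETRY_ATTEMPTS
  );
}
```

Attempts 0, 1, and 2 retry in place; the fourth consecutive short-cooldown 429 falls through to rotation, so a perpetually "almost ready" upstream can no longer pin a request to one account.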
The runtime 429 backoff was purely exponential, so multiple clients or parallel workers could retry in lockstep and create avoidable bursts. Add bounded jitter when a new backoff window is created, while keeping duplicate 429s within the dedup window stable so the same retry window is reused instead of re-randomized.
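The jitter-plus-dedup behavior can be sketched like this. The constants and `getBackoffDelay` name are assumptions for illustration, not the real backoff module's API.

```typescript
// Hypothetical sketch: exponential backoff with bounded jitter, where a 429
// seen again within the dedup window reuses the existing randomized delay
// instead of re-rolling it.
interface BackoffWindow {
  delayMs: number;
  createdAtMs: number;
}

const DEDUP_WINDOW_MS = 10_000;
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 60_000;
const JITTER_RATIO = 0.2;

const windowByKey = new Map<string, BackoffWindow>();

function getBackoffDelay(key: string, attempt: number, nowMs: number): number {
  const existing = windowByKey.get(key);
  if (existing && nowMs - existing.createdAtMs < DEDUP_WINDOW_MS) {
    return existing.delayMs; // duplicate 429: reuse the same retry window
  }
  const base = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
  const jitter = base * JITTER_RATIO * Math.random();
  const delayMs = Math.round(base + jitter);
  windowByKey.set(key, { delayMs, createdAtMs: nowMs });
  return delayMs;
}
```

Jitter is only rolled when a new window is created, so parallel workers desynchronize without a single request's duplicate 429s bouncing between different delays.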
Several 429 retry and backoff thresholds were still hardcoded in the rate-limit backoff module, unlike other runtime retry controls that already respect plugin config and environment overrides. Expose those constants through plugin config/env getters and apply them at runtime so operators can tune dedup windows, reset windows, max backoff, and short-retry behavior without patching code.
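A getter of that shape can be sketched as follows; the env-var name and `getTunableMs` helper are hypothetical, and real config-file overrides are omitted for brevity.

```typescript
// Hypothetical sketch of an env-overridable threshold getter: a well-formed
// positive integer from the environment wins, otherwise the built-in default.
function getTunableMs(envVar: string, defaultMs: number): number {
  const raw = process.env[envVar];
  if (raw === undefined) return defaultMs;
  // Reject anything that is not a whole non-negative number, e.g. "1e3" or "2foo".
  if (!/^\d+$/.test(raw)) return defaultMs;
  return Number.parseInt(raw, 10);
}
```

Strict digit validation matters here: `Number.parseInt("2foo", 10)` silently yields 2, so a typo'd override would otherwise tune the runtime to a nonsense value instead of falling back.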
The update checker silently swallowed invalid package metadata and cached JSON parse failures. Keep the same fallback behavior, but log those cases at debug level so malformed local state is observable during diagnosis.
Persisted account rate-limit state and persisted quota-cache views were tracked separately, so the CLI menu could lose rate-limit context between live probes. Merge stored account rate-limit resets into the display-side quota view so persisted 429 pressure stays visible even when the cache is otherwise empty.
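The merge can be sketched as an overlay that keeps whichever rate-limit signal extends further. The `QuotaView` shape and `mergePersistedResets` name are assumptions for illustration.

```typescript
// Hypothetical sketch: overlay the persisted per-account rate-limit reset onto
// the display-side quota view so stored 429 pressure survives an empty cache.
interface QuotaView {
  rateLimitedUntilMs: number | null;
}

function mergePersistedResets(
  cacheView: QuotaView | null,
  persistedResetAtMs: number | null,
  nowMs: number,
): QuotaView {
  const cached = cacheView?.rateLimitedUntilMs ?? null;
  // Expired persisted resets contribute nothing.
  const persisted =
    persistedResetAtMs !== null && persistedResetAtMs > nowMs
      ? persistedResetAtMs
      : null;
  if (cached === null) return { rateLimitedUntilMs: persisted };
  if (persisted === null) return { rateLimitedUntilMs: cached };
  return { rateLimitedUntilMs: Math.max(cached, persisted) };
}
```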
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
lib/proactive-refresh.ts (1)
95-137: ⚠️ Potential issue | 🟠 Major
stop logging raw account emails from the background refresher.
`lib/proactive-refresh.ts:111-136` emits `account.email` on refresh start, success, and failure. these logs are persistent operational output for a background task, so this adds avoidable pii retention; log `account.index` only, or at least `sanitizeEmail(account.email)`. As per coding guidelines, `lib/**`: check for logging that leaks tokens or emails.

suggested fix
```diff
 const timeUntilExpiry = getTimeUntilExpiry(account);
+const safeEmail = sanitizeEmail(account.email);
 log.info("Proactively refreshing token", {
   accountIndex: account.index,
-  email: account.email,
+  email: safeEmail,
   expiresInMs: timeUntilExpiry,
   expiresInMinutes: Math.round(timeUntilExpiry / 60000),
 });
@@
 if (result.type === "success") {
   log.info("Proactive refresh succeeded", {
     accountIndex: account.index,
-    email: account.email,
+    email: safeEmail,
   });
@@
 log.warn("Proactive refresh failed", {
   accountIndex: account.index,
-  email: account.email,
+  email: safeEmail,
   failureReason: result.reason,
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/proactive-refresh.ts` around lines 95-137: in proactiveRefreshAccount, remove raw account.email from persistent logs (the log.info at the start, the log.info on success, and the log.warn on failure) and instead log only account.index or a sanitized email; update the three logging calls inside proactiveRefreshAccount to either omit the email field or replace it with sanitizeEmail(account.email) if that helper exists (import it, or add one) so no raw PII is written to logs while retaining accountIndex for observability.

lib/codex-manager/commands/report.ts (1)
354-365: ⚠️ Potential issue | 🟠 Major
email/accountId-only refresh updates never persist here.
`lib/codex-manager/commands/report.ts:375` compares `previousEmail` to `account.email` and `previousAccountId` to `account.accountId` before `applyRefreshedAccountPatch()` runs. those fields are still the old values, so a refresh that only updates email or account id skips both persistence and the in-memory patch. compare against `refreshPatch.email`/`refreshPatch.accountId` instead, and add a vitest in `test/codex-manager-report-command.test.ts` for unchanged tokens with a new account id. As per coding guidelines, `lib/**`: focus on auth rotation, windows filesystem IO, and concurrency.

suggested fix
```diff
-    if (
-      previousRefreshToken !== refreshPatch.refreshToken ||
-      previousAccessToken !== refreshPatch.accessToken ||
-      previousExpiresAt !== refreshPatch.expiresAt ||
-      previousEmail !== account.email ||
-      previousAccountId !== account.accountId
-    ) {
+    if (
+      previousRefreshToken !== refreshPatch.refreshToken ||
+      previousAccessToken !== refreshPatch.accessToken ||
+      previousExpiresAt !== refreshPatch.expiresAt ||
+      (refreshPatch.email !== undefined &&
+        previousEmail !== refreshPatch.email) ||
+      (refreshPatch.accountId !== undefined &&
+        previousAccountId !== refreshPatch.accountId)
+    ) {
```

Also applies to: 394-398
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/codex-manager/commands/report.ts` around lines 354-365: the comparisons that gate persistence use the old in-memory account values (previousEmail vs account.email and previousAccountId vs account.accountId) so email- or accountId-only refreshes never persist; change the checks around where refreshPatch is built to compare previousEmail to refreshPatch.email and previousAccountId to refreshPatch.accountId (also update the similar checks at the other occurrence around lines ~394-398) before calling applyRefreshedAccountPatch(), ensuring you still set refreshPatch fields as currently done; then add a vitest in test/codex-manager-report-command.test.ts that simulates an unchanged token but a new accountId to assert the patch persists.

index.ts (1)
2056-2076: ⚠️ Potential issue | 🟠 Major
stream-failover 429s still bypass the new identity-based backoff key.
the primary 429 path at `index.ts:2060` passes a stable account identity into `getRateLimitBackoff()`, but the stream-failover 429 branch at `index.ts:2397` still omits that fourth argument. failover probes will maintain a separate backoff bucket and can re-hit 429s the primary path would have deduped. pass `fallbackAccountId` or the resolved email there too, and cover it in `test/index-retry.test.ts`.

suggested fix
```diff
-        const { delayMs } = getRateLimitBackoff(
-          fallbackAccount.index,
-          fallbackQuotaKey,
-          retryAfterMs,
-        );
+        const { delayMs } = getRateLimitBackoff(
+          fallbackAccount.index,
+          fallbackQuotaKey,
+          retryAfterMs,
+          fallbackAccountId ??
+            fallbackResolvedEmail ??
+            null,
+        );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@index.ts` around lines 2056 - 2076, The stream-failover 429 branch calls getRateLimitBackoff without the stable account identity, so its backoff bucket is separate; update the stream-failover 429 handling (the branch near the current secondary failover logic) to pass the same fourth argument used in the primary path—account.accountId ?? account.email ?? fallbackAccountId (or the resolved email)—into getRateLimitBackoff, ensure preemptiveQuotaScheduler.markRateLimited uses the resulting cooldown consistently, and add/adjust test/index-retry.test.ts to assert that failover probes increment the same backoff bucket (i.e., that a follow-up 429 is deduped by the identity-based backoff).
♻️ Duplicate comments (3)
lib/codex-manager.ts (1)
650-683: ⚠️ Potential issue | 🟠 Major
the menu now displays persisted 429s, but the auto-probe gate still ignores them.
`lib/codex-manager.ts:1102-1108` switched the dashboard view over to `getPersistedQuotaViewForAccount()`, but `resolveMenuQuotaProbeInput()` still gates probes off `getQuotaCacheEntryForAccount()` at `lib/codex-manager.ts:783-795`. with a stale or empty cache entry and a future persisted `rateLimitResetTimes.codex`, the ui shows rate-limited while the background menu refresh keeps probing the same account and generating avoidable 429s. please reuse the persisted-reset predicate in both paths and add the stale-cache + persisted-reset vitest. As per coding guidelines, `lib/**`: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

Also applies to: 1102-1108
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/codex-manager.ts` around lines 650 - 683, The menu probe path still uses getQuotaCacheEntryForAccount to decide whether to skip probes, which ignores persisted rateLimitResetTimes.codex added by getPersistedQuotaViewForAccount; update resolveMenuQuotaProbeInput to reuse the same persisted-reset predicate (calling getRateLimitResetTimeForFamily and comparing against cached primary/secondary resetAtMs or using getPersistedQuotaViewForAccount) so probes are suppressed when a persisted 429 exists even if the in-memory cache is stale, and add a vitest that simulates a stale/empty cache with a future persisted rateLimitResetTimes.codex to assert the UI shows rate-limited and the probe is skipped (include tests covering EBUSY/429 retry behavior per guidelines).index.ts (1)
937-938: 🧹 Nitpick | 🔵 Trivial
split the session-affinity version bump out of the const binding.
`index.ts:937` still trips biome's `noAssignInExpressions`, and the side effect is harder to scan in a hot path. increment first, then bind the value.

suggested fix
```diff
-      const sessionAffinityVersion =
-        (sessionAffinityWriteVersion += 1);
+      sessionAffinityWriteVersion += 1;
+      const sessionAffinityVersion =
+        sessionAffinityWriteVersion;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@index.ts` around lines 937-938: the current expression const sessionAffinityVersion = (sessionAffinityWriteVersion += 1); performs an assignment inside a const binding and trips lint rules; change it to increment the counter first and then bind the value: perform sessionAffinityWriteVersion += 1 (or ++sessionAffinityWriteVersion) on its own line, then set const sessionAffinityVersion = sessionAffinityWriteVersion; this removes the assignment-in-expression and makes the side effect on sessionAffinityWriteVersion explicit.

lib/codex-manager/commands/report.ts (1)
326-329: ⚠️ Potential issue | 🟠 Major
probe notes still leak account emails.
`formatAccountLabel()` is still used when pushing `probeErrors` at `lib/codex-manager/commands/report.ts:328`, and the same pattern remains at `lib/codex-manager/commands/report.ts:391`, `lib/codex-manager/commands/report.ts:403`, and `lib/codex-manager/commands/report.ts:428`. those strings are emitted in text mode and in the json report, so full email addresses still leave the process. switch these paths to a redacted label helper and assert the redaction in `test/codex-manager-report-command.test.ts`. As per coding guidelines, `lib/**`: check for logging that leaks tokens or emails.
Verify each finding against the current code and only fix it if needed. In `@lib/codex-manager/commands/report.ts` around lines 326 - 329, Replace any use of formatAccountLabel when building probeErrors entries with the redacted account-label helper (e.g., formatAccountLabelRedacted or getRedactedAccountLabel) so that emitted probeErrors strings do not contain full email addresses; update the probeErrors.push sites (the ones currently calling formatAccountLabel) to call the redacting helper instead and ensure the message text remains the same aside from using the redacted label; finally, update test/codex-manager-report-command.test.ts to assert that the produced text and JSON report values contain the redacted label (no raw email) for those probe error cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@index.ts`:
- Around line 2510-2522: The current code still calls
sessionAffinityStore.remember(...) in two places (around where remember() is
used and at the blocks that currently call remember()/updateLastResponseId),
which bypasses the versioned ordering in lib/session-affinity.ts; replace those
remember(...) calls with
sessionAffinityStore.rememberWithVersion(sessionAffinityKey, ...,
sessionAffinityVersion) and ensure updateLastResponseId continues to be called
with sessionAffinityVersion (use the existing
updateLastResponseId(sessionAffinityKey, responseId, Date.now(),
sessionAffinityVersion) pattern), and add a vitest in
test/session-affinity.test.ts that races two non-continuation responses to
assert the mapping respects the newer write (i.e., the versioned write wins) to
prevent older completion from overwriting newer mappings.
- Around line 1983-2007: The server-burst recorder is never invoked for real
5xxs; add a call to recordServerBurstFailure(...) in the same branch that
handles overload-type server errors (where response.status is 502/503/529 and
policy.cooldownMs is a number) before any retry/rotation decision so the burst
circuit can open (this is the place alongside evaluateFailurePolicy(...) and
preemptiveQuotaScheduler.markRateLimited(...)); then add a Vitest in
test/index-retry.test.ts that drives three cross-account 5xx responses and
asserts that the next request fast-fails with a 503 (verifying
getServerBurstCooldownRemaining() and the burst fast-fail behavior).
In `@lib/codex-manager/commands/report.ts`:
- Around line 163-199: The flag-parsing code for --max-accounts and --max-probes
(the blocks that set options.maxAccounts and options.maxProbes) currently uses
Number.parseInt which silently truncates malformed numeric strings; change the
validation to explicitly reject non-integer strings before parsing (e.g. require
the raw token or the part after "--max-accounts=" / "--max-probes=" to match a
whole-digit regex like /^\d+$/ or otherwise verify Number(value) is an integer
and that the string does not include exponent/decimal/alpha chars), then parse
and assign only when the check passes; also add unit tests in
test/codex-manager-report-command.test.ts covering "1.9", "1e3", "2foo" and
similar cases to assert the command returns an error for those inputs.
In `@lib/request/request-resilience.ts`:
- Around line 48-60: The reset logic for serverBurstState currently clears the
entire state based only on windowStartedAt which can prematurely clear an active
cooldown; change the reset in request-resilience.ts so you only reinitialize
serverBurstState if there is no armed cooldown or the cooldown has expired
(i.e., check serverBurstState.cooldownUntil is null or <= now before resetting),
preserve cooldownUntil when it is still in the future, and continue to add
accountIndex and arm cooldown as before (symbols: serverBurstState,
SERVER_BURST_COOLDOWN_MS, SERVER_BURST_THRESHOLD, accountIndex). Also add a
vitest in test/request-resilience.test.ts that simulates failures at t=0,5,9,12s
to assert the cooldown remains armed until expiry and is not cleared by later
window resets.
- Around line 27-30: The armPoolExhaustionCooldown function currently overwrites
poolExhaustionCooldownUntil with now + bounded, which can shorten an active
cooldown; change armPoolExhaustionCooldown to compute the newExpiry = now +
bounded and set poolExhaustionCooldownUntil =
Math.max(poolExhaustionCooldownUntil, newExpiry) (refer to function
armPoolExhaustionCooldown and the variable poolExhaustionCooldownUntil), and add
a vitest in test/request-resilience.test.ts that sets an initial longer
cooldown, then calls armPoolExhaustionCooldown with a shorter wait while the
cooldown is active and asserts the expiry remains the original (monotonic)
value.
In `@README.md`:
- Around line 22-23: The README storage table is missing the new on-disk
artifact created by lib/runtime/runtime-observability.ts which persists
runtime-observability.json; update the storage table entries (including the
other referenced section near the earlier block) to list
runtime-observability.json, its approximate path, and a short description (e.g.,
"runtime observability metadata for request budget/cooldown/traffic") so
auditors and cleanup scripts know to handle this file — search for the
symbol/runtime-observability.json in lib/runtime/runtime-observability.ts and
add a corresponding row to the README storage table entries around the two
referenced sections.
---
Outside diff comments:
In `@index.ts`:
- Around line 2056-2076: The stream-failover 429 branch calls
getRateLimitBackoff without the stable account identity, so its backoff bucket
is separate; update the stream-failover 429 handling (the branch near the
current secondary failover logic) to pass the same fourth argument used in the
primary path—account.accountId ?? account.email ?? fallbackAccountId (or the
resolved email)—into getRateLimitBackoff, ensure
preemptiveQuotaScheduler.markRateLimited uses the resulting cooldown
consistently, and add/adjust test/index-retry.test.ts to assert that failover
probes increment the same backoff bucket (i.e., that a follow-up 429 is deduped
by the identity-based backoff).
In `@lib/codex-manager/commands/report.ts`:
- Around line 354-365: The comparisons that gate persistence use the old
in-memory account values (previousEmail vs account.email and previousAccountId
vs account.accountId) so email- or accountId-only refreshes never persist;
change the checks around where refreshPatch is built to compare previousEmail to
refreshPatch.email and previousAccountId to refreshPatch.accountId (also update
the similar checks at the other occurrence around lines ~394-398) before calling
applyRefreshedAccountPatch(), ensuring you still set refreshPatch fields as
currently done; then add a vitest in test/codex-manager-report-command.test.ts
that simulates an unchanged token but a new accountId to assert the patch
persists.
In `@lib/proactive-refresh.ts`:
- Around line 95-137: In proactiveRefreshAccount, remove raw account.email from
persistent logs (the log.info at the start, the log.info on success, and the
log.warn on failure) and instead log only account.index or a sanitized email;
update the three logging calls inside proactiveRefreshAccount to either omit the
email field or replace it with sanitizeEmail(account.email) if that helper
exists/import it (or add one) so no raw PII is written to logs while retaining
accountIndex for observability.
---
Duplicate comments:
In `@index.ts`:
- Around line 937-938: The current expression const sessionAffinityVersion =
(sessionAffinityWriteVersion += 1); performs an assignment inside a const
binding and trips lint rules; change it to increment the counter first and then
bind the value: perform sessionAffinityWriteVersion += 1 (or
++sessionAffinityWriteVersion) on its own line, then set const
sessionAffinityVersion = sessionAffinityWriteVersion; this removes the
assignment-in-expression and makes the side effect in
sessionAffinityWriteVersion explicit.
In `@lib/codex-manager.ts`:
- Around line 650-683: The menu probe path still uses
getQuotaCacheEntryForAccount to decide whether to skip probes, which ignores
persisted rateLimitResetTimes.codex added by getPersistedQuotaViewForAccount;
update resolveMenuQuotaProbeInput to reuse the same persisted-reset predicate
(calling getRateLimitResetTimeForFamily and comparing against cached
primary/secondary resetAtMs or using getPersistedQuotaViewForAccount) so probes
are suppressed when a persisted 429 exists even if the in-memory cache is stale,
and add a vitest that simulates a stale/empty cache with a future persisted
rateLimitResetTimes.codex to assert the UI shows rate-limited and the probe is
skipped (include tests covering EBUSY/429 retry behavior per guidelines).
In `@lib/codex-manager/commands/report.ts`:
- Around line 326-329: Replace any use of formatAccountLabel when building
probeErrors entries with the redacted account-label helper (e.g.,
formatAccountLabelRedacted or getRedactedAccountLabel) so that emitted
probeErrors strings do not contain full email addresses; update the
probeErrors.push sites (the ones currently calling formatAccountLabel) to call
the redacting helper instead and ensure the message text remains the same aside
from using the redacted label; finally, update
test/codex-manager-report-command.test.ts to assert that the produced text and
JSON report values contain the redacted label (no raw email) for those probe
error cases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 104f094b-7f8f-4957-84d9-2bbabf7a198e
📒 Files selected for processing (12)
README.md, index.ts, lib/codex-manager.ts, lib/codex-manager/commands/report.ts, lib/codex-manager/commands/status.ts, lib/proactive-refresh.ts, lib/quota-probe.ts, lib/request/request-resilience.ts, lib/runtime/runtime-observability.ts, test/index-retry.test.ts, test/request-resilience.test.ts, test/runtime-observability.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/proactive-refresh.ts, lib/quota-probe.ts, lib/codex-manager/commands/status.ts, lib/codex-manager.ts, lib/request/request-resilience.ts, lib/runtime/runtime-observability.ts, lib/codex-manager/commands/report.ts
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/runtime-observability.test.ts, test/index-retry.test.ts, test/request-resilience.test.ts
🪛 Biome (2.4.10)
index.ts
[error] 938-938: The assignment should not be in an expression.
(lint/suspicious/noAssignInExpressions)
🔇 Additional comments (2)
test/request-resilience.test.ts (1)
13-55: good deterministic coverage for the new cooldown helpers.
`test/request-resilience.test.ts:13-55` pins timestamps explicitly and resets shared state, so these cases should stay stable without real timers while exercising the pool cooldown, server burst cooldown, and candidate ordering paths. As per coding guidelines, `test/**`: tests must stay deterministic and use vitest.

test/index-retry.test.ts (1)
444-661: good regression coverage for the bounded retry budget.
`test/index-retry.test.ts:444-661` covers the no-replay default and the bounded-attempt paths with fixed inputs, and it asserts the operator-facing log context instead of only the final status code. As per coding guidelines, `test/**`: tests must stay deterministic and use vitest.
Actionable comments posted: 3
♻️ Duplicate comments (4)
test/auto-update-checker.test.ts (2)
502-526: ⚠️ Potential issue | 🟠 Major
prove serialized order and final cache integrity.
`test/auto-update-checker.test.ts:502-526` still passes if the two writes interleave or the last rename writes stale json. capture the `writeFileSync`/`renameSync` event order and parse the payload from the last renamed temp file so this test actually proves the queue in `lib/auto-update-checker.ts:33-77` serialized the writes and left `"5.0.1"` in the cache.

proposed test shape
```diff
 it("serializes concurrent cache writes through temp-file renames", async () => {
   vi.mocked(fs.writeFileSync).mockClear();
   vi.mocked(fs.renameSync).mockClear();
+  const events: string[] = [];
+  const payloadByTempPath = new Map<string, string>();
+  vi.mocked(fs.writeFileSync).mockImplementation((path, data) => {
+    const tempPath = String(path);
+    events.push(`write:${tempPath}`);
+    payloadByTempPath.set(tempPath, String(data));
+  });
+  vi.mocked(fs.renameSync).mockImplementation((from, to) => {
+    events.push(`rename:${String(from)}->${String(to)}`);
+  });
   vi.mocked(globalThis.fetch)
     .mockResolvedValueOnce({
       ok: true,
@@ -515,11 +525,17 @@
   await Promise.all([checkForUpdates(true), checkForUpdates(true)]);
   expect(fs.renameSync).toHaveBeenCalledTimes(2);
+  expect(events.map((event) => event.split(":")[0])).toEqual([
+    "write",
+    "rename",
+    "write",
+    "rename",
+  ]);
+  const finalTempPath = String(vi.mocked(fs.renameSync).mock.calls.at(-1)?.[0]);
+  expect(JSON.parse(payloadByTempPath.get(finalTempPath) ?? "null")).toMatchObject({
+    latestVersion: "5.0.1",
+  });
   const writeTargets = vi
     .mocked(fs.writeFileSync)
     .mock.calls.map((call) => String(call[0]));
   expect(
     writeTargets.every((target) => target.includes("update-check-cache.json."),
```

As per coding guidelines, `test/**`: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.
Verify each finding against the current code and only fix it if needed. In `@test/auto-update-checker.test.ts` around lines 502 - 526, The test currently only checks calls but not ordering or final cache contents; update the test around checkForUpdates to record the sequence of fs.writeFileSync and fs.renameSync calls (e.g., by inspecting vi.mocked(fs.writeFileSync).mock.calls and vi.mocked(fs.renameSync).mock.calls), determine which temp file name was the last one passed to renameSync, read/parse the JSON payload that was written to that temp filename (from the corresponding writeFileSync call) and assert it contains version "5.0.1"; ensure you still assert two renames occurred and that the last rename's payload is the authoritative final cache content to prove lib/auto-update-checker.ts (checkForUpdates) serialized writes correctly.
477-500: ⚠️ Potential issue | 🟠 Major
cover the exhausted rename path.
`test/auto-update-checker.test.ts:477-500` only proves `lib/auto-update-checker.ts:51-64` recovers from transient `EBUSY`/`EPERM`. it never drives the permanent-failure cleanup at `lib/auto-update-checker.ts:68-75`, so temp-file leaks on windows can regress unnoticed. add a vitest case that keeps `renameSync` failing and asserts `unlinkSync` and `logger.warn`.

proposed test shape

```diff
+it.each(["EBUSY", "EPERM"] as const)(
+  "cleans up the temp file when rename exhausts retries (%s)",
+  async (code) => {
+    vi.mocked(fs.renameSync).mockImplementation(() => {
+      const error = new Error("busy") as NodeJS.ErrnoException;
+      error.code = code;
+      throw error;
+    });
+    vi.mocked(globalThis.fetch).mockResolvedValue({
+      ok: true,
+      json: async () => ({ version: "5.0.0" }),
+    } as Response);
+
+    await checkForUpdates(true);
+
+    expect(fs.unlinkSync).toHaveBeenCalledWith(
+      expect.stringContaining("update-check-cache.json."),
+    );
+    expect(logger.warn).toHaveBeenCalledWith(
+      "Failed to save update cache",
+      expect.objectContaining({ error: expect.any(String) }),
+    );
+  },
+);
```

As per coding guidelines, `test/**`: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/auto-update-checker.test.ts` around lines 477 - 500, Add a deterministic Vitest case in test/auto-update-checker.test.ts that simulates persistent rename failures to exercise the permanent-failure branch in checkForUpdates: mock fs.renameSync to always throw a NodeJS.ErrnoException with code "EBUSY" or "EPERM", ensure globalThis.fetch returns an ok response, call checkForUpdates(true), and then assert that fs.unlinkSync was called to clean up the temp file and that the module logger.warn (or the logger used in lib/auto-update-checker.ts) was invoked with a warning about the failure; keep the test isolated and use vi.mocked to clear/restore mocks.lib/auto-update-checker.ts (1)
289-299: ⚠️ Potential issue | 🟠 Major

make `clearUpdateCache()` awaitable.

`lib/auto-update-checker.ts:289` returns before the queued clear runs. a caller that does `clearUpdateCache(); await checkForUpdates();` can still read the stale cache at `lib/auto-update-checker.ts:236` in the same tick, and `test/auto-update-checker.test.ts:630-647` is already compensating with manual microtask flushing. return the queued promise here and add a vitest regression for the immediate clear-then-check flow.

proposed fix:
```diff
-export function clearUpdateCache(): void {
-  void enqueueUpdateCacheWrite(() => {
+export function clearUpdateCache(): Promise<void> {
+  return enqueueUpdateCacheWrite(() => {
     try {
       if (existsSync(CACHE_FILE)) {
         writeCacheContents("{}");
       }
     } catch {
       // Ignore errors
     }
   });
 }
```

As per coding guidelines,
lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/auto-update-checker.ts` around lines 289 - 299, clearUpdateCache currently returns before the queued clear runs because it fires-and-forgets enqueueUpdateCacheWrite; change clearUpdateCache to return the Promise from enqueueUpdateCacheWrite (i.e., make it async or return Promise<void>) so callers can await the cache-clear completing, update any callers/tests accordingly, and add a Vitest regression that does await clearUpdateCache(); await checkForUpdates(); to assert the cache is empty immediately. Locate the call site using the function name clearUpdateCache and the queue function enqueueUpdateCacheWrite, ensure the enqueue implementation still handles EBUSY/429 retry semantics per existing queue behavior, and update tests referenced in test/auto-update-checker.test.ts to cover the immediate clear-then-check flow.

lib/codex-manager.ts (1)
650-683: ⚠️ Potential issue | 🟠 Major

the synthesized 429 view still treats stale state as live.

`lib/codex-manager.ts:650-683` returns cached `429` entries even after both cached reset times have passed, and when it synthesizes a persisted `429` it keeps the stale `updatedAt`. that leaks into `lib/codex-manager.ts:959` as a false `rate-limited` status and into `lib/codex-manager.ts:783-795` as repeat probes every ttl window while `rateLimitResetTimes.codex` is still in the future. gate the synthetic view on a future reset timestamp, not just `status === 429`, and add vitest regressions for expired cached `429` plus stale-cache + future-persisted reset.

As per coding guidelines,
lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
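a minimal sketch of the gating this comment asks for. the `CachedRateLimit` shape and `synthesizeRateLimitView` name are invented stand-ins for illustration; the real code in lib/codex-manager.ts differs:

```typescript
// Illustrative only: mirrors the review's complaint, not the module's real API.
interface CachedRateLimit {
  status: number;
  resetAt: number; // epoch ms when the 429 window is expected to end
  updatedAt: number;
}

function synthesizeRateLimitView(
  cached: CachedRateLimit | undefined,
  now: number = Date.now(),
): CachedRateLimit | null {
  if (!cached || cached.status !== 429) return null;
  // gate on a future reset timestamp, not just status === 429
  if (cached.resetAt <= now) return null;
  // refresh updatedAt so downstream ttl probes do not re-fire every window
  return { ...cached, updatedAt: now };
}
```

with this gate, an expired cached 429 yields no synthetic rate-limit view at all, which is the behavior the suggested vitest regressions would pin down.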
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/config-explain.test.ts`:
- Around line 131-146: Update the vitest in test/config-explain.test.ts that
currently checks only "retryAllAccountsMaxRetries": call
getPluginConfigExplainReport() (using the existing helper expectEntry) and
assert that the four new explain entries added in lib/config.ts (the same keys
you added) all exist and have the expected value and defaultValue; then
JSON.stringify/JSON.parse the whole report and assert those same keys and their
value/defaultValue survive the round-trip (use serialized.entries.find by key);
finally add a sub-test that sets the corresponding env vars in process.env,
re-imports or re-calls getPluginConfigExplainReport(), and asserts each of the
four entries shows source: "env" (clean up env after); keep using vitest APIs
(no mocking of real secrets) and deterministic setup/teardown.
In `@test/fetch-helpers.test.ts`:
- Around line 497-509: Add a new unit test that verifies mixed entitlement vs
rate-limit signals are handled with entitlement precedence: create a response
body where error.code is 'usage_not_included' and error.type is
'usage_limit_reached', wrap it in a Response with status 404, call
handleErrorResponse, and assert the mapped response status is 403 and that
rateLimit is undefined; place the test alongside the existing tests in
fetch-helpers.test.ts near the other error-mapping cases so it documents that
entitlement detection in handleErrorResponse wins over rate-limit detection.
- Around line 1030-1054: Add a regression test for handleErrorResponse that
verifies a 404 response with malformed JSON body does not get remapped or throw;
create a new test similar to the existing cases (referencing handleErrorResponse
in the test file) that constructs a Response('{ invalid json', { status: 404 }),
calls await handleErrorResponse(response), and asserts result.status is 404 and
rateLimit is undefined so the try/catch path that falls through to null is
exercised.
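a minimal sketch of the precedence and fall-through behavior these two comments describe. `mapErrorResponse` and its retry-after constant are invented stand-ins for the real `handleErrorResponse` in lib/request/fetch-helpers.ts:

```typescript
// Hypothetical sketch: entitlement signals win over rate-limit signals, and a
// malformed JSON body falls through to the original status without throwing.
interface MappedError {
  status: number;
  rateLimit?: { retryAfterMs: number };
}

async function mapErrorResponse(response: Response): Promise<MappedError> {
  let body: { error?: { code?: string; type?: string } } | null = null;
  try {
    body = await response.json();
  } catch {
    body = null; // malformed JSON: no remap, no throw
  }
  const signals = [body?.error?.code, body?.error?.type].filter(
    (s): s is string => typeof s === "string",
  );
  // entitlement detection takes precedence; rateLimit stays undefined
  if (signals.some((s) => s.includes("usage_not_included"))) {
    return { status: 403 };
  }
  // 404 + usage_limit signal remaps to 429 with rate-limit info
  if (response.status === 404 && signals.some((s) => s.includes("usage_limit"))) {
    return { status: 429, rateLimit: { retryAfterMs: 60_000 } };
  }
  return { status: response.status };
}
```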
---
Duplicate comments:
In `@lib/auto-update-checker.ts`:
- Around line 289-299: clearUpdateCache currently returns before the queued
clear runs because it fires-and-forgets enqueueUpdateCacheWrite; change
clearUpdateCache to return the Promise from enqueueUpdateCacheWrite (i.e., make
it async or return Promise<void>) so callers can await the cache-clear
completing, update any callers/tests accordingly, and add a Vitest regression
that does await clearUpdateCache(); await checkForUpdates(); to assert the cache
is empty immediately. Locate the call site using the function name
clearUpdateCache and the queue function enqueueUpdateCacheWrite, ensure the
enqueue implementation still handles EBUSY/429 retry semantics per existing
queue behavior, and update tests referenced in test/auto-update-checker.test.ts
to cover the immediate clear-then-check flow.
In `@test/auto-update-checker.test.ts`:
- Around line 502-526: The test currently only checks calls but not ordering or
final cache contents; update the test around checkForUpdates to record the
sequence of fs.writeFileSync and fs.renameSync calls (e.g., by inspecting
vi.mocked(fs.writeFileSync).mock.calls and vi.mocked(fs.renameSync).mock.calls),
determine which temp file name was the last one passed to renameSync, read/parse
the JSON payload that was written to that temp filename (from the corresponding
writeFileSync call) and assert it contains version "5.0.1"; ensure you still
assert two renames occurred and that the last rename's payload is the
authoritative final cache content to prove lib/auto-update-checker.ts
(checkForUpdates) serialized writes correctly.
- Around line 477-500: Add a deterministic Vitest case in
test/auto-update-checker.test.ts that simulates persistent rename failures to
exercise the permanent-failure branch in checkForUpdates: mock fs.renameSync to
always throw a NodeJS.ErrnoException with code "EBUSY" or "EPERM", ensure
globalThis.fetch returns an ok response, call checkForUpdates(true), and then
assert that fs.unlinkSync was called to clean up the temp file and that the
module logger.warn (or the logger used in lib/auto-update-checker.ts) was
invoked with a warning about the failure; keep the test isolated and use
vi.mocked to clear/restore mocks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 34fec1e4-2d2d-4ddc-826c-41ab1d49a025
📒 Files selected for processing (8)
lib/auto-update-checker.ts, lib/codex-manager.ts, lib/config.ts, lib/request/fetch-helpers.ts, test/auto-update-checker.test.ts, test/codex-manager-cli.test.ts, test/config-explain.test.ts, test/fetch-helpers.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/config-explain.test.ts, test/fetch-helpers.test.ts, test/auto-update-checker.test.ts, test/codex-manager-cli.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/request/fetch-helpers.ts, lib/config.ts, lib/auto-update-checker.ts, lib/codex-manager.ts
🔇 Additional comments (4)
lib/request/fetch-helpers.ts (1)
980-1024: lgtm, the signal-based 404 remapping logic is sound.

the tightening from free-text regex to structured `error.code`/`error.type` signals is a good hardening change. the `.includes("usage_limit")` at `lib/request/fetch-helpers.ts:1012` correctly matches both `"usage_limit"` and `"usage_limit_reached"` variants.

one minor observation: the function now silently drops numeric `error.code` values by coercing them via `.toString()` at line 990-991 — this is fine since the upstream api uses string codes, but worth noting if that ever changes.

test/fetch-helpers.test.ts (2)
511-526: good coverage for entitlement detection via error.type.

this test at `test/fetch-helpers.test.ts:511-526` correctly verifies that `usage_not_included` in `error.type` triggers the 403 entitlement path and suppresses rate-limit info. this is a key regression case for the signal-based parsing.
467-495: lgtm, the signal extraction tests are well-structured.

tests at `test/fetch-helpers.test.ts:467-495` properly cover:

- `error.code` carrying the usage_limit signal (line 467)
- `error.type` carrying the usage_limit signal (line 482)

both verify the 404→429 remap and `retryAfterMs > 0`. deterministic and no mocking of real secrets.

test/codex-manager-cli.test.ts (1)
7155-7200: good regression coverage for persisted-only fallback

this looks correct now. `test/codex-manager-cli.test.ts:7199` explicitly proves no live probe happens, so the fallback to persisted rate-limit state stays enforced.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
test/runtime-services.test.ts (1)
72-110: ⚠️ Potential issue | 🟡 Minor

add eperm variant test to cover windows filesystem lock edge case.

`lib/runtime/runtime-services.ts:79` explicitly handles both `EBUSY` and `EPERM` with identical retry logic, but test/runtime-services.test.ts only covers `EBUSY` exhaustion at lines 72-110. add a parallel test case with `error.code = "EPERM"` to ensure antivirus/indexing file locks on windows are handled the same way—this is a required regression case for windows behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/runtime-services.test.ts` around lines 72 - 110, Add a parallel test to the existing "warns and keeps the previous path when busy retries are exhausted" spec that uses error.code = "EPERM" instead of "EBUSY" to exercise the same retry logic in ensureLiveAccountSyncState; copy the existing test structure (vi.useFakeTimers(), create NodeJS.ErrnoException with code "EPERM", mock syncToPath to throw, currentSync, logWarn, call ensureLiveAccountSyncState with enabled:true and targetPath "/tmp/new", run timers, await the result, then assert syncToPath was called 3 times, logWarn was called with the same "[plugin] Live account sync path switch failed due to transient filesystem locks; keeping previous watcher." message, and the returned object preserves currentSync/currentPath/currentConfigKey), and ensure you restore timers in a finally block with vi.useRealTimers().

index.ts (1)
2403-2412: ⚠️ Potential issue | 🟠 Major

stream failover 429s still bypass the stable backoff key.

`index.ts:2071-2076` was updated to pass a stable identity into `getRateLimitBackoff()`, but `index.ts:2408-2412` drops back to the mutable `fallbackAccount.index` inside the stream-failover 429 branch. that means a streamed fallback that gets rate-limited can still split the same account across different backoff buckets after reindexing or account removal, which undoes the stable-account-key fix on one of the new 429 paths. please pass the resolved fallback identity here too, and add a vitest regression in `test/index-retry.test.ts` for a streamed fallback 429.

suggested fix:
```diff
 const { delayMs } = getRateLimitBackoff(
   fallbackAccount.index,
   fallbackQuotaKey,
   retryAfterMs,
+  fallbackAccountId ?? fallbackResolvedEmail ?? null,
 );
```

As per coding guidelines,
lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@index.ts` around lines 2403 - 2412, The stream-failover 429 branch uses the mutable fallbackAccount.index when calling getRateLimitBackoff, which breaks the stable backoff key fix; change the call to use the previously resolved stable fallback identity (the same key used at index.ts:2071-2076) instead of fallbackAccount.index — construct fallbackQuotaKey the same way (use modelFamily and model) and pass the resolved fallback identity into getRateLimitBackoff rather than fallbackAccount.index; then add a vitest regression in test/index-retry.test.ts that simulates a streamed fallback response returning 429 and asserts that requests for the same resolved fallback identity map to the same backoff bucket (i.e., consistent backoff key) after reindexing/removal.
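a minimal sketch of the stable-key idea behind this finding. all names here (`backoffBuckets`, the `stableId` parameter) are illustrative; the real `getRateLimitBackoff` signature in index.ts differs:

```typescript
// Hypothetical sketch: key backoff state by a stable account identity so the
// same account maps to the same bucket even after pool reindexing; fall back
// to the mutable index only when no stable identity is available.
const backoffBuckets = new Map<string, { delayMs: number }>();

function backoffKey(
  accountIndex: number,
  quotaKey: string,
  stableId: string | null,
): string {
  const identity = stableId ?? `idx:${accountIndex}`;
  return `${identity}:${quotaKey}`;
}

function getRateLimitBackoff(
  accountIndex: number,
  quotaKey: string,
  retryAfterMs: number,
  stableId: string | null = null,
): { delayMs: number } {
  const key = backoffKey(accountIndex, quotaKey, stableId);
  const existing = backoffBuckets.get(key);
  // grow the delay for repeated 429s landing in the same bucket
  const delayMs = Math.max(retryAfterMs, (existing?.delayMs ?? 0) * 2);
  const entry = { delayMs };
  backoffBuckets.set(key, entry);
  return entry;
}
```

the point of the regression test the comment asks for: two 429s for the same resolved identity must hit the same bucket even if the pool index changed between them.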
♻️ Duplicate comments (1)
test/session-affinity.test.ts (1)
161-173: ⚠️ Potential issue | 🟠 Major

this regression still passes with the original overlap bug.

`test/session-affinity.test.ts:166-169` only proves that the second sequential write wins. if `lib/session-affinity.ts:233-240` regressed back to a `Date.now()`-derived default, this test would still pass because both writes would share the same version and the later call would overwrite the earlier one anyway. please make this a true same-timestamp overlap: let the newer request write first, let the older request complete later at the same mocked timestamp, and assert that both `getPreferredAccountIndex()` and `getLastResponseId()` stay on the newer write.

As per coding guidelines,
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/session-affinity.test.ts` around lines 161 - 173, The test currently writes two versions sequentially and can miss same-timestamp overlap bugs; change it to simulate a real same-timestamp race by keeping the mocked clock fixed (vi.useFakeTimers + vi.setSystemTime) then invoke the newer request first (call SessionAffinityStore.rememberWithVersion for the newer version), schedule the older request to complete asynchronously at the same mocked timestamp (enqueue a setTimeout(fn, 0) or queueMicrotask that calls rememberWithVersion for the older version while not advancing time), then advance timers to run the older completion and assert that SessionAffinityStore.getPreferredAccountIndex and getLastResponseId still reflect the newer write; ensure you reference the SessionAffinityStore class and the rememberWithVersion, getPreferredAccountIndex, and getLastResponseId methods when updating the test.
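a minimal sketch of the version-gated write this comment wants the test to pin down. `SessionAffinitySketch` is an invented stand-in for the real store in lib/session-affinity.ts:

```typescript
// Hypothetical sketch: each in-flight request carries a monotonically assigned
// version; a stale (lower-version) response finishing late cannot overwrite
// newer session-affinity state, even at the same wall-clock timestamp.
interface AffinityState {
  version: number;
  lastResponseId: string;
  preferredAccountIndex: number;
}

class SessionAffinitySketch {
  private state = new Map<string, AffinityState>();

  rememberWithVersion(
    sessionId: string,
    version: number,
    lastResponseId: string,
    preferredAccountIndex: number,
  ): boolean {
    const current = this.state.get(sessionId);
    // reject stale writes: only a version >= the stored one may win
    if (current && version < current.version) return false;
    this.state.set(sessionId, { version, lastResponseId, preferredAccountIndex });
    return true;
  }

  getLastResponseId(sessionId: string): string | undefined {
    return this.state.get(sessionId)?.lastResponseId;
  }

  getPreferredAccountIndex(sessionId: string): number | undefined {
    return this.state.get(sessionId)?.preferredAccountIndex;
  }
}
```

this is exactly why a `Date.now()`-derived version is insufficient: two overlapping requests started in the same tick would share a version, and the later completion would win regardless of which request was newer.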
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@index.ts`:
- Around line 2726-2740: The user-facing cooldown message is built from
waitMs/waitLabel in index.ts but armPoolExhaustionCooldown (from
lib/request/request-resilience.ts) may clamp/extend the actual cooldown to a
longer minimum; update the branch that calls armPoolExhaustionCooldown so it
captures and uses the armed cooldown value (use the return value or expose the
effective cooldown from armPoolExhaustionCooldown) instead of the original
waitMs when composing message/runtimeMetrics.lastError and before calling
syncRuntimeObservability; ensure armPoolExhaustionCooldown’s API is adjusted to
return the effective cooldown if needed, update the test at
test/index-retry.test.ts:444-482 to expect the longer/clamped cooldown, and
verify no sensitive data (tokens/emails) is logged.
In `@lib/runtime/runtime-observability.ts`:
- Around line 199-202: pendingWrite chains into writeSnapshot(nextSnapshot)
without a terminal catch, so permanent write failures propagate as unhandled
rejections; add a terminal .catch(() => undefined) to the promise returned by
writeSnapshot in the pendingWrite assignment (the chain that references
pendingWrite and writeSnapshot) to swallow errors and keep pendingWrite settled,
and add a vitest that mocks fs.rename or fs.writeFile to always throw (simulate
non-retryable error / Windows contention) and asserts no unhandledRejection is
emitted during the test run.
In `@lib/runtime/runtime-services.ts`:
- Around line 49-59: The guard in runtime-services.ts uses nextConfigKey !==
null so a passed configKey of undefined does not reset liveAccountSyncConfigKey;
add a regression test in test/runtime-services.test.ts that sets
liveAccountSyncConfigKey = "25:250", invokes the update path with configKey:
undefined (so nextConfigKey becomes undefined) while liveAccountSync is
non-null, and assert the expected behavior (either that the sync is reset or
that the previous key/path is kept) to document the contract; alternatively, if
the intended behavior is to treat undefined like null, change the conditional in
the liveAccountSync reset block (referencing variables nextConfigKey,
liveAccountSync, liveAccountSyncConfigKey) to use a nullish check (e.g.,
nextConfigKey == null) so undefined triggers the reset, and update/add the test
accordingly.
In `@lib/storage.ts`:
- Around line 1272-1369: Tests are missing for the corrupt branches in
inspectStorageHealth; add regression tests that simulate (1) malformed JSON in
the primary storage file and (2) JSON that fails schema normalization so the
catch/invalid-normalize branches run, then assert inspectStorageHealth returns
"corrupt" when WAL recovery is not present and "recoverable" when
loadAccountsFromJournal (called inside inspectStorageHealth) finds entries; use
the inspectStorageHealth, loadAccountsFromJournal and loadAccountsFromPath
behavior to create test fixtures (write bad content to the storage path returned
by getStoragePath or mock it) and verify the details/error message path and
recoverySource fields match expectations.
In `@test/runtime-services.test.ts`:
- Around line 86-89: Indentation of the object literal properties currentSync,
currentPath, currentConfigKey, and createSync in the test
runtime-services.test.ts is inconsistent with surrounding lines; update their
leading whitespace to match the file's existing indentation style (use the same
tabs/spaces and alignment as surrounding object entries) so the block aligns
with neighboring lines and removes the lint warning.
In `@test/storage-health-inspection.test.ts`:
- Around line 11-69: Add new test cases alongside the existing
inspectStorageHealth spec to cover the missing regressions: (1) corrupt primary
storage — write malformed JSON to the path set via
storageModule.setStoragePathDirect(...) and ensure inspectStorageHealth()
returns summary.state === "corrupt" when the WAL also fails; (2) healthy storage
— write a valid accounts.json and assert summary.state === "healthy"; (3)
intentional-reset — create the reset marker file (same directory/name convention
used by storageModule) and assert summary.state === "intentional-reset"; (4)
empty storage — write valid JSON with accounts: [] and assert summary.state ===
"empty"; and (5) Windows EBUSY simulation — mock fs/promises (or vi.mock the fs
module used by storageModule) to throw an EBUSY error on read/open and assert
inspectStorageHealth() handles it gracefully (expect the appropriate state or
error handling per existing logic). Use the same helpers in the test
(setStoragePathDirect, inspectStorageHealth) and clean up workDir/reset marker
between tests.
---
Outside diff comments:
In `@index.ts`:
- Around line 2403-2412: The stream-failover 429 branch uses the mutable
fallbackAccount.index when calling getRateLimitBackoff, which breaks the stable
backoff key fix; change the call to use the previously resolved stable fallback
identity (the same key used at index.ts:2071-2076) instead of
fallbackAccount.index — construct fallbackQuotaKey the same way (use modelFamily
and model) and pass the resolved fallback identity into getRateLimitBackoff
rather than fallbackAccount.index; then add a vitest regression in
test/index-retry.test.ts that simulates a streamed fallback response returning
429 and asserts that requests for the same resolved fallback identity map to the
same backoff bucket (i.e., consistent backoff key) after reindexing/removal.
In `@test/runtime-services.test.ts`:
- Around line 72-110: Add a parallel test to the existing "warns and keeps the
previous path when busy retries are exhausted" spec that uses error.code =
"EPERM" instead of "EBUSY" to exercise the same retry logic in
ensureLiveAccountSyncState; copy the existing test structure
(vi.useFakeTimers(), create NodeJS.ErrnoException with code "EPERM", mock
syncToPath to throw, currentSync, logWarn, call ensureLiveAccountSyncState with
enabled:true and targetPath "/tmp/new", run timers, await the result, then
assert syncToPath was called 3 times, logWarn was called with the same "[plugin]
Live account sync path switch failed due to transient filesystem locks; keeping
previous watcher." message, and the returned object preserves
currentSync/currentPath/currentConfigKey), and ensure you restore timers in a
finally block with vi.useRealTimers().
---
Duplicate comments:
In `@test/session-affinity.test.ts`:
- Around line 161-173: The test currently writes two versions sequentially and
can miss same-timestamp overlap bugs; change it to simulate a real
same-timestamp race by keeping the mocked clock fixed (vi.useFakeTimers +
vi.setSystemTime) then invoke the newer request first (call
SessionAffinityStore.rememberWithVersion for the newer version), schedule the
older request to complete asynchronously at the same mocked timestamp (enqueue a
setTimeout(fn, 0) or queueMicrotask that calls rememberWithVersion for the older
version while not advancing time), then advance timers to run the older
completion and assert that SessionAffinityStore.getPreferredAccountIndex and
getLastResponseId still reflect the newer write; ensure you reference the
SessionAffinityStore class and the rememberWithVersion,
getPreferredAccountIndex, and getLastResponseId methods when updating the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0484a041-dea5-4bff-aefc-7ab2a0f02521
📒 Files selected for processing (12)
index.ts, lib/request/request-resilience.ts, lib/runtime/runtime-observability.ts, lib/runtime/runtime-services.ts, lib/session-affinity.ts, lib/storage.ts, test/index-retry.test.ts, test/request-resilience.test.ts, test/runtime-observability.test.ts, test/runtime-services.test.ts, test/session-affinity.test.ts, test/storage-health-inspection.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/session-affinity.test.ts, test/storage-health-inspection.test.ts, test/runtime-services.test.ts, test/runtime-observability.test.ts, test/request-resilience.test.ts, test/index-retry.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/runtime/runtime-services.ts, lib/storage.ts, lib/session-affinity.ts, lib/request/request-resilience.ts, lib/runtime/runtime-observability.ts
🪛 Biome (2.4.10)
index.ts
[error] 941-941: The assignment should not be in an expression.
(lint/suspicious/noAssignInExpressions)
🔇 Additional comments (10)
lib/storage.ts (3)
54-57: lgtm on import additions.

the new health summary imports are properly scoped and used by `inspectStorageHealth`.

134-143: export addition is correct.

`StorageHealthSummary` is properly re-exported for consumers.

1371-1419: silent option implementation is clean.

the `{ silent?: boolean }` parameter correctly guards all three logging points:

- `lib/storage.ts:1393-1395` (checksum mismatch)
- `lib/storage.ts:1405-1407` (recovery success)
- `lib/storage.ts:1411` (non-ENOENT errors)

backward-compatible default `{}` preserves existing caller behavior.

test/storage-health-inspection.test.ts (1)
28-50: temp directory cleanup is safe but consider using `vi.fn()` for stricter mock verification.

at `test/storage-health-inspection.test.ts:17-26`, the logger mock captures calls but doesn't verify `debug` and `error` aren't called with sensitive data. consider asserting `logWarn.mock.calls` contents don't contain tokens or emails if you want to guard against future regressions in log statements.

minor: the `workDir` at line 28 uses `tmpdir()` which is `/tmp` on linux but `%TEMP%` on windows. the test should work cross-platform but if ci runs windows, ensure path handling is correct (it should be, since `join()` is used).

lib/runtime/runtime-services.ts (3)
39-46: past issue addressed: disabled branch now unconditionally clears all state.

the fix at `lib/runtime/runtime-services.ts:40-45` uses optional chaining to stop the watcher and returns null for all three fields regardless of whether `currentSync` existed. this correctly handles the stale path/configKey scenario flagged previously.

test coverage confirmed at `test/runtime-services.test.ts:31-48` ("clears stale live sync path and config key even when no watcher exists").

48-59: past issue addressed: null config key now triggers recreation.

the condition at `lib/runtime/runtime-services.ts:52-53` now treats `liveAccountSyncConfigKey === null` as a recreate case when `nextConfigKey !== null`. this prevents silently reusing a watcher with unknown debounce/poll settings after upgrade.

test coverage confirmed at `test/runtime-services.test.ts:137-159` ("recreates live sync when the current config key is unknown").

71-84: ebusy/eperm retry logic looks solid for windows filesystem edge cases.

the retry loop at `lib/runtime/runtime-services.ts:71-84` correctly handles transient filesystem locks with exponential backoff (25ms, 50ms, 100ms). three attempts is reasonable for brief windows file-system contention.

minor note: the delay formula `25 * 2 ** attempt` yields 25, 50, 100ms for attempts 0, 1, 2 - confirm this timing is acceptable for your windows CI environment.

test/runtime-services.test.ts (3)
31-48: good regression test for the stale state bug.

`test/runtime-services.test.ts:31-48` covers the edge case where `currentSync: null` but `currentPath` and `currentConfigKey` have stale values. this ensures the disabled branch clears all state unconditionally, preventing the skip-`syncToPath` bug flagged in the prior review.

112-135: config key change test is well-structured.

`test/runtime-services.test.ts:112-135` verifies that when `currentConfigKey` differs from `configKey`, the old sync is stopped, a new one is created, and `syncToPath` is called. assertions are complete.

137-159: unknown config key (null → non-null) transition test is solid.

`test/runtime-services.test.ts:137-159` covers the scenario where `currentConfigKey: null` and a valid `configKey` is provided, ensuring recreation occurs. this directly addresses the past review concern about legacy/partially committed state.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
test/codex-manager-report-command.test.ts (1)
29-69: 🧹 Nitpick | 🔵 Trivial

missing test coverage for loadRuntimeObservabilitySnapshot.

`test/codex-manager-report-command.test.ts:29-69` defines `createDeps` but doesn't include `loadRuntimeObservabilitySnapshot`. no test exercises the `report.runtime` field population added in `lib/codex-manager/commands/report.ts:524`. consider adding a test case that mocks `loadRuntimeObservabilitySnapshot` to return a snapshot and asserts it appears in the JSON output.

```shell
#!/bin/bash
# verify no existing test covers loadRuntimeObservabilitySnapshot
rg -n "loadRuntimeObservabilitySnapshot" test/
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/codex-manager-report-command.test.ts` around lines 29 - 69, The tests are missing coverage for loadRuntimeObservabilitySnapshot so update the test helper createDeps to include a mock implementation for loadRuntimeObservabilitySnapshot (on the ReportCommandDeps object) that returns a representative snapshot, then add or modify a test that runs the report command and asserts the produced JSON includes the expected snapshot under the report.runtime field; reference the mock by name loadRuntimeObservabilitySnapshot and assert the output contains the mocked snapshot values.

lib/codex-manager/commands/report.ts (1)
258-285: ⚠️ Potential issue | 🟡 Minor

defaultWriteFile lacks vitest test coverage for windows EBUSY/EPERM retry scenarios.

`lib/codex-manager/commands/report.ts:264-275` retries rename 5 times with 10ms exponential backoff (total ~310ms). this matches `lib/storage.ts` backup copy logic but falls short of `lib/storage/import-export.ts` which uses 25ms base (375ms total) for similar operations. windows antivirus or file indexing can hold locks beyond 310ms, but the bigger issue is that defaultWriteFile has no direct vitest tests for EBUSY/EPERM scenarios—the test file only mocks writeFile at the dependency level. per coding guidelines for `lib/**`, add a regression test simulating EBUSY/EPERM errors and verify the retry delay is sufficient for windows file locking edge cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/codex-manager/commands/report.ts` around lines 258 - 285, Add a vitest regression test for defaultWriteFile (in lib/codex-manager/commands/report.ts) that mocks fs.rename to throw transient EBUSY/EPERM errors for the first few attempts and then succeed, asserting that defaultWriteFile eventually writes the file, cleans up the temp file on failure paths, and respects the retry/backoff behavior (simulate delays so test proves the retry loop retries and succeeds within the configured attempts); ensure the test covers both EBUSY and EPERM cases and fails if rename is not retried or temp files are left behind.
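as a reference for the retry shape under discussion, a self-contained sketch of temp-file staging with EBUSY/EPERM rename retries. the `atomicWriteFile` helper, its attempt count, and its timings are illustrative assumptions, not the module's actual implementation:

```typescript
// a minimal sketch, assuming Node's fs/promises API.
import { promises as fs } from "node:fs";

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function atomicWriteFile(
  target: string,
  contents: string,
  attempts = 5,
  baseDelayMs = 10,
): Promise<void> {
  const temp = `${target}.${process.pid}.tmp`;
  await fs.writeFile(temp, contents, "utf8");
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      await fs.rename(temp, target); // atomic replace on the same filesystem
      return;
    } catch (error) {
      const code = (error as NodeJS.ErrnoException).code;
      const transient = code === "EBUSY" || code === "EPERM";
      if (!transient || attempt === attempts - 1) {
        await fs.unlink(temp).catch(() => undefined); // never leak the temp file
        throw error;
      }
      await sleep(baseDelayMs * 2 ** attempt); // 10, 20, 40, 80ms between tries
    }
  }
}
```

a vitest regression against the real defaultWriteFile would mock rename to throw EBUSY/EPERM and assert both the retry count and the temp-file cleanup on the permanent-failure path.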
♻️ Duplicate comments (4)
test/storage.test.ts (2)
2967-2981: ⚠️ Potential issue | 🟠 Major

restore `fs.unlink` spies in `finally` to prevent test leakage.

both tests restore the spy only on the happy path, so a failed assertion can leak the mock into later cases. this impacts deterministic behavior in `test/storage.test.ts:2967-2981` and `test/storage.test.ts:4149-4158`.

proposed fix:
```diff
-    const unlinkSpy = vi
+    const unlinkSpy = vi
       .spyOn(fs, "unlink")
       .mockImplementation(async (targetPath) => {
         if (String(targetPath) === testStoragePath) {
           const error = Object.assign(new Error("locked"), { code: "EPERM" });
           throw error;
         }
         return Promise.resolve();
       });
-    await expect(clearAccounts()).rejects.toMatchObject({ code: "EPERM" });
-    expect(existsSync(testStoragePath)).toBe(true);
-    expect(existsSync(resetMarkerPath)).toBe(false);
-
-    unlinkSpy.mockRestore();
+    try {
+      await expect(clearAccounts()).rejects.toMatchObject({ code: "EPERM" });
+      expect(existsSync(testStoragePath)).toBe(true);
+      expect(existsSync(resetMarkerPath)).toBe(false);
+    } finally {
+      unlinkSpy.mockRestore();
+    }
```

as per coding guidelines,
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.Also applies to: 4149-4158
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/storage.test.ts` around lines 2967 - 2981, The fs.unlink spy in the tests around clearAccounts is only restored on the success path, which can leak mocks; wrap the spy restoration in a finally block so unlinkSpy.mockRestore() always runs (for the tests referencing unlinkSpy, clearAccounts, testStoragePath, resetMarkerPath and existsSync), and apply the same finally-based restore fix to the similar test at the other location that uses unlinkSpy (the block spanning lines testing EPERM behavior) to ensure deterministic Vitest behavior across failures.
2984-3001: ⚠️ Potential issue | 🟠 Major

add the missing branch: primary already absent + wal unlink EBUSY. this test covers wal failure, but it does not exercise the specific branch where the primary path is already gone before wal removal fails. that branch was the windows-regression target tied to `lib/storage/account-clear.ts:35-41`. the current setup in `test/storage.test.ts:2984-3001` still starts with a present primary file.

as per coding guidelines for `test/**`: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/storage.test.ts` around lines 2984 - 3001, Modify the test to exercise the branch where the primary file is already absent before the WAL unlink fails: create the WAL file (`${testStoragePath}.wal`) but ensure the primary (`testStoragePath`) is removed (e.g., await fs.unlink(testStoragePath) or do not create it) before calling clearAccounts(); keep the fs.unlink mock (vi.spyOn(fs, "unlink")) that throws an error with code "EBUSY" only when targetPath matches `${testStoragePath}.wal`; call await expect(clearAccounts()).rejects.toMatchObject({ code: "EBUSY" }) and then assert existsSync(getIntentionalResetMarkerPath(testStoragePath)) is false; finally restore the unlinkSpy—this will exercise the primary-absent + wal EBUSY branch referenced by getIntentionalResetMarkerPath and clearAccounts.

test/config-explain.test.ts (1)
162-170: ⚠️ Potential issue | 🟡 Minor

assert exact per-key values, not just "defined" and source. `test/config-explain.test.ts:162-170` and `test/config-explain.test.ts:179-181` can miss key-to-env mixups in `lib/config.ts` because every var is set and only `defined`/`source` is checked. assert exact numeric values per key (especially in the env phase).

suggested tightening:

```diff
   const envReport = envMod.getPluginConfigExplainReport();
+  const expectedEnvValues: Record<(typeof keys)[number], number> = {
+    rateLimitDedupWindowMs: 3210,
+    rateLimitStateResetMs: 654321,
+    rateLimitMaxBackoffMs: 12345,
+    rateLimitShortRetryThresholdMs: 250,
+  };
   for (const key of keys) {
     const entry = expectEntry(envReport, key);
     expect(entry?.source).toBe("env");
+    expect(entry?.value).toBe(expectedEnvValues[key]);
   }
```

Also applies to: 179-181
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/config-explain.test.ts` around lines 162 - 170, The test currently only checks that each serialized entry for keys (variable serialized, report, keys) is defined and has a defined source, which allows key-to-env mismatches to pass; update the assertions in test/config-explain.test.ts to assert exact expected values per key (compare entry.value and entry.defaultValue to the known expected numbers/strings for each key) instead of using only toBeDefined, and include explicit checks for the expected source when relevant (e.g., ensure env-derived keys have the env values), referencing the serialized.entries lookup and the keys array to map expected values to each entry and fail fast on mismatches.

lib/codex-manager/commands/report.ts (1)
340-345: ⚠️ Potential issue | 🟠 Major

email addresses still leak via formatAccountLabel in probeErrors. `lib/codex-manager/commands/report.ts:342,405,417,442` push messages containing `formatAccountLabel(account, i)` into `probeErrors`, which is serialized to JSON output and stdout. per past review, `formatAccountLabel` includes full email addresses. this violates the coding guideline to "check for logging that leaks tokens or emails". use a sanitized label or redact the email portion before including in error messages.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/codex-manager/commands/report.ts` around lines 340 - 345, formatAccountLabel(account, i) currently includes full email addresses and is being passed into probeErrors.push (e.g., the push calls around the options.cachedOnly check and other probeErrors pushes), causing emails to be serialized to JSON/stdout; fix by replacing use of formatAccountLabel(account, i) in all probeErrors.push calls with a sanitized label that redacts the email portion (e.g., mask local-part or replace with "<redacted>@domain" or use an existing sanitized helper), or implement a small helper sanitizeAccountLabel(account) and call that before pushing to probeErrors so no full email is emitted.
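a minimal redaction sketch, assuming the label only needs the domain and account index to stay useful for debugging; `sanitizeAccountLabel` is a hypothetical helper name, not an existing function in the repo:

```typescript
// hypothetical helper: redacts the local part of an email so probeErrors
// messages carry no full address. falls back to a positional label when
// the email is missing or malformed.
function sanitizeAccountLabel(email: string | undefined, index: number): string {
  const fallback = `account #${index + 1}`;
  if (!email) return fallback;
  const at = email.indexOf("@");
  if (at <= 0) return fallback; // malformed or missing local part
  return `<redacted>@${email.slice(at + 1)} (#${index + 1})`;
}
```

each `probeErrors.push` site would then call this instead of `formatAccountLabel(account, i)`, so only the domain and index reach JSON output and stdout.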
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 278-279: Add a Windows-specific troubleshooting bullet to
README.md under the "Common symptoms" section (around lines 274-283) that reads:
"Storage cleanup fails with `EBUSY` / `EPERM` (Windows): run `codex auth doctor
--fix` to retry, or manually remove `~/.codex/multi-auth/<project-key>/` and
re-login". In lib/storage/account-clear.ts, update the unlink/remove logic
(where unlink is called around line 12) to handle transient Windows file-lock
errors by implementing exponential-backoff retries for EBUSY and EPERM: catch
the unlink/fs.rm errors in the function that performs account cleanup, loop with
a small base delay doubling each attempt, stop after a bounded max attempts, and
only rethrow non-transient errors; mirror the same retry/backoff parameters and
error checks used in lib/quota-cache.ts (the pattern at lines 154-171) so
behavior is consistent with quota-cache test expectations.
- Around line 160-167: Add regression tests to verify the JSON contract emitted
by the report command: update or extend
test/codex-manager-report-command.test.ts to call the report JSON path (the code
that reads lib/runtime/runtime-observability.ts via report.ts snapshot logic)
and assert that when a runtime snapshot is present the output includes
runtime.poolExhaustionCooldownUntil, runtime.serverBurstCooldownUntil, and
runtime.runtimeMetrics fields; mock or seed the runtime-observability snapshot
as needed to make the fields present. Also update the README note to clarify
that codex auth status is text-only (no --json) to avoid claiming JSON
availability. Reference symbols: report.ts snapshot emission,
lib/runtime/runtime-observability.ts snapshot fields, and the test file
test/codex-manager-report-command.test.ts.
In `@test/codex-manager-report-command.test.ts`:
- Around line 287-289: The test currently uses a loose assertion for
jsonOutput.forecast.recommendation.selectedReason (expect.any(String)); replace
this with a deterministic assertion: determine the exact expected reason string
for the test scenario and assert equality (e.g.,
expect(jsonOutput.forecast.recommendation.selectedReason).toEqual('EXPECTED_REASON')),
or if multiple valid reasons exist, assert membership against the allowed list
or use a more specific matcher (e.g., regex or expect.arrayContaining) so the
test verifies the actual expected value for selectedReason rather than just its
type.
- Around line 460-512: Add an assertion that saveAccounts was retried the
expected number of times when EPERM persistence fails: in the test that calls
runReportCommand(["--live","--json"]) with deps.saveAccounts throwing EPERM,
assert the mock saveAccounts call count (e.g.,
expect(deps.saveAccounts).toHaveBeenCalledTimes(...)) to confirm the command
attempted the configured retry attempts before giving up; locate the mock named
saveAccounts in the test and add the call-count expectation alongside the
existing checks (after the runReportCommand and before final storage
assertions).
In `@test/config-explain.test.ts`:
- Around line 173-176: The test mutates several environment keys
(CODEX_AUTH_RATE_LIMIT_DEDUP_WINDOW_MS, CODEX_AUTH_RATE_LIMIT_STATE_RESET_MS,
CODEX_AUTH_RATE_LIMIT_MAX_BACKOFF_MS,
CODEX_AUTH_RATE_LIMIT_SHORT_RETRY_THRESHOLD_MS) but the existing cleanup does
not remove them, causing test bleed; update test/config-explain.test.ts to
either save and restore the original values or explicitly delete these four
process.env keys in the test teardown (afterEach or afterAll) so other tests
remain deterministic and order-independent.
In `@test/fetch-helpers.test.ts`:
- Around line 472-476: Normalize the indentation in the test blocks that create
the mock response so they match the project's 2-tab (8-space) style: locate the
test code around the Response creation and assertions (references: the Response
constructor usage, the const resp = new Response(...), and the await
handleErrorResponse(...) call in the test) and change the overly deep 16-space
indentation on the object literal and subsequent lines to the standard
2-tab/8-space indentation used by neighboring tests (also apply the same fix to
the similar block around the other occurrence with the same pattern).
---
Outside diff comments:
In `@lib/codex-manager/commands/report.ts`:
- Around line 258-285: Add a vitest regression test for defaultWriteFile (in
lib/codex-manager/commands/report.ts) that mocks fs.rename to throw transient
EBUSY/EPERM errors for the first few attempts and then succeed, asserting that
defaultWriteFile eventually writes the file, cleans up the temp file on failure
paths, and respects the retry/backoff behavior (simulate delays so test proves
the retry loop retries and succeeds within the configured attempts); ensure the
test covers both EBUSY and EPERM cases and fails if rename is not retried or
temp files are left behind.
In `@test/codex-manager-report-command.test.ts`:
- Around line 29-69: The tests are missing coverage for
loadRuntimeObservabilitySnapshot so update the test helper createDeps to include
a mock implementation for loadRuntimeObservabilitySnapshot (on the
ReportCommandDeps object) that returns a representative snapshot, then add or
modify a test that runs the report command and asserts the produced JSON
includes the expected snapshot under the report.runtime field; reference the
mock by name loadRuntimeObservabilitySnapshot and assert the output contains the
mocked snapshot values.
---
Duplicate comments:
In `@lib/codex-manager/commands/report.ts`:
- Around line 340-345: formatAccountLabel(account, i) currently includes full
email addresses and is being passed into probeErrors.push (e.g., the push calls
around the options.cachedOnly check and other probeErrors pushes), causing
emails to be serialized to JSON/stdout; fix by replacing use of
formatAccountLabel(account, i) in all probeErrors.push calls with a sanitized
label that redacts the email portion (e.g., mask local-part or replace with
"<redacted>@domain" or use an existing sanitized helper), or implement a small
helper sanitizeAccountLabel(account) and call that before pushing to probeErrors
so no full email is emitted.
In `@test/config-explain.test.ts`:
- Around line 162-170: The test currently only checks that each serialized entry
for keys (variable serialized, report, keys) is defined and has a defined
source, which allows key-to-env mismatches to pass; update the assertions in
test/config-explain.test.ts to assert exact expected values per key (compare
entry.value and entry.defaultValue to the known expected numbers/strings for
each key) instead of using only toBeDefined, and include explicit checks for the
expected source when relevant (e.g., ensure env-derived keys have the env
values), referencing the serialized.entries lookup and the keys array to map
expected values to each entry and fail fast on mismatches.
In `@test/storage.test.ts`:
- Around line 2967-2981: The fs.unlink spy in the tests around clearAccounts is
only restored on the success path, which can leak mocks; wrap the spy
restoration in a finally block so unlinkSpy.mockRestore() always runs (for the
tests referencing unlinkSpy, clearAccounts, testStoragePath, resetMarkerPath and
existsSync), and apply the same finally-based restore fix to the similar test at
the other location that uses unlinkSpy (the block spanning lines testing EPERM
behavior) to ensure deterministic Vitest behavior across failures.
- Around line 2984-3001: Modify the test to exercise the branch where the
primary file is already absent before the WAL unlink fails: create the WAL file
(`${testStoragePath}.wal`) but ensure the primary (`testStoragePath`) is removed
(e.g., await fs.unlink(testStoragePath) or do not create it) before calling
clearAccounts(); keep the fs.unlink mock (vi.spyOn(fs, "unlink")) that throws an
error with code "EBUSY" only when targetPath matches `${testStoragePath}.wal`;
call await expect(clearAccounts()).rejects.toMatchObject({ code: "EBUSY" }) and
then assert existsSync(getIntentionalResetMarkerPath(testStoragePath)) is false;
finally restore the unlinkSpy—this will exercise the primary-absent + wal EBUSY
branch referenced by getIntentionalResetMarkerPath and clearAccounts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b58ef145-73bb-4756-a33f-6800c3a0bf73
📒 Files selected for processing (6)
README.md, lib/codex-manager/commands/report.ts, test/codex-manager-report-command.test.ts, test/config-explain.test.ts, test/fetch-helpers.test.ts, test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/config-explain.test.ts, test/storage.test.ts, test/codex-manager-report-command.test.ts, test/fetch-helpers.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/codex-manager/commands/report.ts
🔇 Additional comments (16)
test/fetch-helpers.test.ts (5)
528-540: lgtm - this addresses the past review request for a mixed entitlement + rate-limit signal test. the test at lines 528-540 verifies that `usage_not_included` (entitlement) in `error.code` takes precedence over `usage_limit_reached` (rate-limit) in `error.type`, resulting in 403 with no `rateLimit`. this documents the priority behavior in the implementation.
1044-1077: lgtm - edge cases for free-text, structured rate-limit codes, and malformed json 404s are now covered.

- lines 1044-1051: free-text "usage limit exceeded" body stays 404 (no structured signal)
- lines 1053-1068: structured `rate_limit_exceeded` code remaps to 429
- lines 1070-1077: malformed json body safely falls through to 404 without throwing

the malformed json test at lines 1070-1077 addresses the past review request for regression coverage on the `catch` block in `lib/request/fetch-helpers.ts:992-994`.
467-480: lgtm - test covers the `error.code` signal path for usage-limit 404 → 429 remapping. the test properly verifies that a 404 with `error.code: 'usage_limit_reached'` gets remapped to 429 with a positive `retryAfterMs`. assertions are complete.
482-509: lgtm - covers the `error.type` signal and code/type disagreement scenarios. lines 482-495 test when only `error.type` carries the rate-limit signal, and lines 497-509 verify behavior when `code` and `type` disagree (both still indicate rate-limit). good coverage for the signal normalization logic in `lib/request/fetch-helpers.ts`.
511-526: lgtm - entitlement-style 404 → 403 and non-usage 404 unchanged tests are solid. lines 511-526 verify that `error.type: 'usage_not_included'` remaps to 403 with the expected payload. lines 542-550 confirm that generic `not_found` codes stay as 404.

Also applies to: 542-550
test/config-explain.test.ts (1)
131-146: good coverage for json-safe retry max-retries output. this test in `test/config-explain.test.ts:131-146` correctly locks in the numeric `0` behavior and verifies round-trip serialization.

README.md (1)
202-203: good fix: the storage table now includes the runtime observability artifact. line 202 now matches the persisted snapshot path in `lib/runtime/runtime-observability.ts:48-57`. this closes the local-state audit gap cleanly.

lib/codex-manager/commands/report.ts (5)
63-75: parsePositiveIntegerOption correctly rejects malformed inputs. this addresses the past review concern about `Number.parseInt` truncating partial numeric input. the `/^\d+$/` regex properly rejects "1.9", "1e3", "2foo" before parsing. the `Number.isSafeInteger` check and `< 1` guard are correct.
160-213: cli flag parsing for budget limits looks correct. `--cached-only`, `--max-accounts`, and `--max-probes` parsing follows the existing pattern and uses the new parsePositiveIntegerOption helper. error messages are consistent.
406-412: good: in-memory patch applied only after persistence succeeds. `lib/codex-manager/commands/report.ts:408` calls `applyRefreshedAccountPatch` only after `persistRefreshedAccountPatch` at line 393 succeeds. this ensures the in-memory snapshot isn't mutated if persistence fails, matching the new regression test in the test file.
509-515: selectedReason derivation looks correct. the fallback chain `forecastResults[...]?.reasons[0] ?? recommendation.reason` handles missing forecast entries gracefully.
592-596: runtime traffic logging looks fine. no sensitive data exposed, just request counters.
test/codex-manager-report-command.test.ts (4)
53-60: inspectStorageHealth mock is deterministic. returns a hardcoded `StorageHealthSummary` with no randomness or timing dependencies.
103-117: good regression coverage for malformed numeric flags. the parameterized tests for "1.9", "1e3", "2foo" directly address the past review concern. both --max-accounts and --max-probes variants are covered.

143-173: budget enforcement test is well-structured. verifies both the account consideration limit and the probe execution limit with appropriate assertions on the JSON output.

175-193: cached-only mode test correctly verifies refresh is skipped. assertions confirm no refresh attempt and an appropriate error message in probeErrors.
- Requests fail fast with a pool cooldown message: wait for the cooldown window or inspect `codex auth status`
- Requests fail fast after repeated upstream 5xx errors: inspect `codex auth report --json` for runtime traffic and cooldown details
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# verify whether windows-specific EBUSY/EPERM handling and messaging exists in storage clear paths/docs.
# expected: code has explicit windows-safe retries/handling and docs mention user action.
rg -n -C3 --type=ts 'EBUSY|EPERM|rename|unlink|rm|account-clear|clearAccountStorageArtifacts'
rg -n -C3 --type=md 'EBUSY|EPERM|windows|storage clear|account clear|cleanup'
```

Repository: ndycode/codex-multi-auth
Length of output: 50382

🏁 Script executed:

```shell
# Find the storage/account-clear file
fd -type f -name "*account*clear*" -o -name "*clear*account*" lib/
# Check README.md around the fail-fast bullets
sed -n '270,290p' README.md
# Search specifically for storage-related error handling docs
rg -n "windows|cleanup|storage.*clear|account.*clear" README.md -A2 -B2
```

Repository: ndycode/codex-multi-auth
Length of output: 1505

🏁 Script executed:

```shell
# Look for the actual account-clear implementation
find lib/ -type f -name "*.ts" | xargs grep -l "EBUSY\|EPERM\|clearAccountStorage" | head -10
```

Repository: ndycode/codex-multi-auth
Length of output: 372

🏁 Script executed:

```shell
# Check what the documentation test expects for storage/windows
rg -B5 -A5 "eperm.*ebusy|ebusy.*eperm" test/documentation.test.ts
```

Repository: ndycode/codex-multi-auth
Length of output: 50

🏁 Script executed:

```shell
# Search for account-clear related functions more broadly
rg -n "clearAccountStorageArtifacts|account.*clear|storage.*clear|AccountClear" lib/ --type ts -A3 -B1 | head -80
```

Repository: ndycode/codex-multi-auth
Length of output: 1781

🏁 Script executed:

```shell
# Find and view the documentation test that checks for eperm/ebusy
sed -n '370,380p' test/documentation.test.ts
```

Repository: ndycode/codex-multi-auth
Length of output: 553

🏁 Script executed:

```shell
# Search README for any windows-specific troubleshooting
rg -n -i "windows|ebusy|eperm|file.*lock|access.*denied" README.md
```

Repository: ndycode/codex-multi-auth
Length of output: 191

🏁 Script executed:

```shell
# Read the account-clear.ts file to see EBUSY/EPERM handling
cat lib/storage/account-clear.ts
```

Repository: ndycode/codex-multi-auth
Length of output: 1293

🏁 Script executed:

```shell
# Find where fieldInventory is built in documentation.test.ts
rg -B20 'expect.*fieldInventory.*eperm' test/documentation.test.ts | head -60
```

Repository: ndycode/codex-multi-auth
Length of output: 957

🏁 Script executed:

```shell
# Check if README mentions account storage or storage clearing
rg -n -i "account storage|storage.*clear|clean.*storage" README.md -B2 -A2
```

Repository: ndycode/codex-multi-auth
Length of output: 477

🏁 Script executed:

```shell
# Check docs/development/CONFIG_FIELDS.md to see what Windows notes exist
rg -B5 -A5 "eperm|ebusy|windows.*note|concurrency.*windows" docs/development/CONFIG_FIELDS.md | head -80
```

Repository: ndycode/codex-multi-auth
Length of output: 50

🏁 Script executed:

```shell
# Check if CONFIG_FIELDS.md exists and has content
cat docs/development/CONFIG_FIELDS.md | head -100
```

Repository: ndycode/codex-multi-auth
Length of output: 3139

🏁 Script executed:

```shell
# Search for where eperm and ebusy are mentioned in docs
fd --type f -path "docs/*" | xargs rg -l "eperm|ebusy" 2>/dev/null
```

Repository: ndycode/codex-multi-auth
Length of output: 311

🏁 Script executed:

```shell
# View the troubleshooting section of README more broadly
rg -n "troubleshooting|common symptoms|diagnostics" README.md -i -A 30 | head -100
```

Repository: ndycode/codex-multi-auth
Length of output: 3603

🏁 Script executed:

```shell
# View more of CONFIG_FIELDS.md to find eperm/ebusy sections
cat docs/development/CONFIG_FIELDS.md | grep -B5 -A5 -i "eperm\|ebusy\|windows"
```

Repository: ndycode/codex-multi-auth
Length of output: 709

🏁 Script executed:

```shell
# Check if there are any docs about storage cleanup or Windows issues
fd -type f docs/ | xargs rg -l "storage.*clear\|account.*clear\|ebusy\|eperm" 2>/dev/null
```

Repository: ndycode/codex-multi-auth
Length of output: 240

🏁 Script executed:

```shell
# Check docs/troubleshooting.md to see if Windows storage is documented there
cat docs/troubleshooting.md 2>/dev/null | head -150
```

Repository: ndycode/codex-multi-auth
Length of output: 3692
Add Windows file-locking symptom and recovery path to README troubleshooting.
The lib/storage/account-clear.ts:12 unlink operations can fail with EBUSY or EPERM on Windows during concurrent file access or antivirus scanning. Storage cleanup failures leave stale state and block subsequent codex auth operations. Add one explicit bullet to README.md lines 274-283 under "Common symptoms" so Windows users have a direct recovery path:
- Storage cleanup fails with `EBUSY` / `EPERM` (Windows): run `codex auth doctor --fix` to retry, or manually remove `~/.codex/multi-auth/<project-key>/` and re-login
Also add retry logic with exponential backoff to lib/storage/account-clear.ts for transient EBUSY/EPERM errors during file removal, mirroring the pattern already used in lib/quota-cache.ts:154-171 and test/quota-cache.test.ts:136-176.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 278 - 279, Add a Windows-specific troubleshooting
bullet to README.md under the "Common symptoms" section (around lines 274-283)
that reads: "Storage cleanup fails with `EBUSY` / `EPERM` (Windows): run `codex
auth doctor --fix` to retry, or manually remove
`~/.codex/multi-auth/<project-key>/` and re-login". In
lib/storage/account-clear.ts, update the unlink/remove logic (where unlink is
called around line 12) to handle transient Windows file-lock errors by
implementing exponential-backoff retries for EBUSY and EPERM: catch the
unlink/fs.rm errors in the function that performs account cleanup, loop with a
small base delay doubling each attempt, stop after a bounded max attempts, and
only rethrow non-transient errors; mirror the same retry/backoff parameters and
error checks used in lib/quota-cache.ts (the pattern at lines 154-171) so
behavior is consistent with quota-cache test expectations.
```ts
await clearPath(params.path, true);
await clearPath(params.walPath, true);
await fs.writeFile(
  params.resetMarkerPath,
  JSON.stringify({ version: 1, createdAt: Date.now() }),
  { encoding: "utf-8", mode: 0o600 },
);
```
reset marker written after file deletion
the old code wrote the reset marker first to fence off stale re-use, then deleted the files. the new order is: delete path → delete walPath → write marker. if clearPath(params.walPath, true) throws after all 5 retries — which happens on windows when EBUSY/EPERM persists — the function propagates the error and the marker is never written. on the next startup, path is gone but walPath still exists with no reset marker, so inspectStorageHealth detects a recoverable wal and restores stale data the caller intended to wipe. the original first-marker ordering was deliberate crash-safety for this exact scenario.
```suggestion
await fs.writeFile(
  params.resetMarkerPath,
  JSON.stringify({ version: 1, createdAt: Date.now() }),
  { encoding: "utf-8", mode: 0o600 },
);
await clearPath(params.path, true);
await clearPath(params.walPath, true);
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/storage/account-clear.ts
Line: 58-64
Comment:
**reset marker written after file deletion**
the old code wrote the reset marker *first* to fence off stale re-use, then deleted the files. the new order is: delete `path` → delete `walPath` → write marker. if `clearPath(params.walPath, true)` throws after all 5 retries — which happens on windows when `EBUSY`/`EPERM` persists — the function propagates the error and the marker is never written. on the next startup, `path` is gone but `walPath` still exists with no reset marker, so `inspectStorageHealth` detects a recoverable wal and restores stale data the caller intended to wipe. the original first-marker ordering was deliberate crash-safety for this exact scenario.
```suggestion
await fs.writeFile(
params.resetMarkerPath,
JSON.stringify({ version: 1, createdAt: Date.now() }),
{ encoding: "utf-8", mode: 0o600 },
);
await clearPath(params.path, true);
await clearPath(params.walPath, true);
```
How can I resolve this? If you propose a fix, please make it concise.| @@ -1,4 +1,4 @@ | |||
| const MAX_RETRY_HINT_MS = 5 * 60 * 1000; | |||
| const MAX_RETRY_HINT_MS = 24 * 60 * 60 * 1000; | |||
24 h retry-after cap can lock accounts for a full day
MAX_RETRY_HINT_MS was raised from 5 minutes to 24 hours. in getRateLimitBackoff, the final delayMs is Math.max(baseDelay, jitteredDelay). baseDelay comes directly from serverRetryAfterMs (the parsed Retry-After header value), while jitteredDelay is capped at maxBackoffMs (default 60 s). so Math.max(86_400_000, 60_000) = 86,400,000 ms — the full baseDelay wins and is passed straight to accountManager.setAccountCooldown, locking the account for 24 hours. one misbehaving upstream (or a misconfigured header) silently makes every matching account unusable for a full day with no operator warning. the 5-minute cap was an effective upper bound. if intentionally large server hints need to be honored, add an explicit opt-in config knob rather than raising the shared ceiling.
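the interaction can be shown with a small sketch, simplified from the described `getRateLimitBackoff` (names and signature here are illustrative, not the real module's):

```typescript
// simplified model of the described combination: the server Retry-After
// hint becomes baseDelay, wins via Math.max, and is only bounded by the
// retry-hint cap (MAX_RETRY_HINT_MS), not by maxBackoffMs. clamping the
// hint to a small cap restores an effective ceiling on the cooldown.
function combinedDelayMs(
  serverRetryAfterMs: number,
  jitteredDelayMs: number, // already capped at maxBackoffMs upstream
  retryHintCapMs: number, // models MAX_RETRY_HINT_MS
): number {
  const baseDelay = Math.min(serverRetryAfterMs, retryHintCapMs);
  return Math.max(baseDelay, jitteredDelayMs);
}
```

with a 24h cap, a day-long `Retry-After` header flows straight through to the account cooldown; with the original 5-minute cap, the same header is bounded to 300s.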
```suggestion
const MAX_RETRY_HINT_MS = 5 * 60 * 1000;
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/request/response-metadata.ts
Line: 1
Comment:
**24 h retry-after cap can lock accounts for a full day**
`MAX_RETRY_HINT_MS` was raised from 5 minutes to 24 hours. in `getRateLimitBackoff`, the final `delayMs` is `Math.max(baseDelay, jitteredDelay)`. `baseDelay` comes directly from `serverRetryAfterMs` (the parsed `Retry-After` header value), while `jitteredDelay` is capped at `maxBackoffMs` (default 60 s). so `Math.max(86_400_000, 60_000)` = 86,400,000 ms — the full `baseDelay` wins and is passed straight to `accountManager.setAccountCooldown`, locking the account for 24 hours. one misbehaving upstream (or a misconfigured header) silently makes every matching account unusable for a full day with no operator warning. the 5-minute cap was an effective upper bound. if intentionally large server hints need to be honored, add an explicit opt-in config knob rather than raising the shared ceiling.
```suggestion
const MAX_RETRY_HINT_MS = 5 * 60 * 1000;
```
How can I resolve this? If you propose a fix, please make it concise.

release: rebuild open main PR wave with review fixes
Summary
- targets `main`

Includes:

- #357, #358, #359, #360, #361, #362, #363, #364, #365, #366, #370, #374, #375, and #386
- #367, #368, #369, #371, #372, #373, #376, #377, #378, #379, #380, #381, #382, #383, #384, and #385

Validation:

- `npm run lint`
- `npm run typecheck`
- `npm test`
- `npm run build`

Notes:

- supersedes #357 through #386; those PRs have been closed in favor of this branch
- rebuilt onto `main` in dependency order and validated after the stacked request-reliability/runtime-observability chain was folded in
- continue on `rebuild/main-open-prs` or back out the grouped rebuild commits if the combined wave needs to be unwound

note: greptile review for oc-chatgpt-multi-auth. cite files like `lib/foo.ts:123`. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary
this is a large integration wave folding ~30 PRs into one branch. it ships: configurable rate-limit backoff with jitter, per-request finite outbound attempt budgets, pool-exhaustion and server-burst fast-fail cooldowns, adaptive stream failover candidate selection (filters cooldown + rate-limit state), runtime observability snapshots with atomic temp-rename persistence, storage health inspection with silent wal probing, config-key-driven live-sync restarts, and session-affinity write-version guards against stale async overwrites. test coverage across the new modules is substantial and the overall architecture is sound.
two issues need attention before merge:
- `lib/storage/account-clear.ts`: the reset marker is now written after the primary storage files are deleted. a crash or persistent `EBUSY`/`EPERM` on `walPath` during the clear leaves no marker, and the next startup performs stale wal recovery, silently undoing the wipe. this is a windows filesystem safety regression introduced by the new sequencing.
- `lib/request/response-metadata.ts`: `MAX_RETRY_HINT_MS` was raised from 5 minutes to 24 hours. a single misbehaving `Retry-After` header can now lock an account for a full day because `delayMs = Math.max(baseDelay, jitteredDelay)` surfaces the full `baseDelay` past the per-backoff `maxBackoffMs` cap.

other notable changes:

- `retryAllAccountsRateLimited` default flipped to `false` and `retryAllAccountsMaxRetries` default changed to `0`: a breaking behavioral change for users relying on the previous defaults; worth a changelog note
- `withStreamingFailover` now aborts replay when `emittedBytes > 0`, preventing duplicate stream content and side-effectful tool replay: correct
- the `CooldownReason` union in `migrations.ts` gains `server-error` to align with the `failure-policy.ts` rename: consistent

Confidence Score: 4/5
Safe to merge after addressing the account-clear ordering regression and the 24 h retry-after cap.

Two P1 findings remain: the account-clear reset-marker ordering creates a Windows crash-safety regression that can silently restore stale data after a failed wipe, and the 24 h MAX_RETRY_HINT_MS can lock accounts for a full day on a single bad Retry-After header. Both are in changed lines in this PR and both have straightforward fixes. The rest of the work is well structured and the test coverage is thorough.
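For the retry-after finding, the straightforward fix is to clamp the parsed hint before it competes with the backoff. The sketch below is illustrative, not the module's actual code: MAX_RETRY_HINT_MS, the Retry-After header, and the Math.max(baseDelay, jitteredDelay) shape come from the review; the resolveDelayMs helper name, its signature, and the restored 5-minute ceiling are assumptions.

```typescript
// Assumed fix: restore the previous 5-minute ceiling on server-provided
// retry hints so a single bad Retry-After header cannot dominate.
const MAX_RETRY_HINT_MS = 5 * 60 * 1000;

// Hypothetical helper combining a parsed Retry-After hint (ms) with the
// locally computed jittered backoff delay.
function resolveDelayMs(retryAfterMs: number | undefined, jitteredBackoffMs: number): number {
  if (retryAfterMs === undefined) return jitteredBackoffMs;
  // Clamp the hint first; only then let it compete with the backoff.
  const hinted = Math.min(Math.max(retryAfterMs, 0), MAX_RETRY_HINT_MS);
  return Math.max(hinted, jitteredBackoffMs);
}

// A 24-hour hint is capped at 5 minutes instead of locking the account.
console.log(resolveDelayMs(24 * 60 * 60 * 1000, 30_000)); // → 300000
```

The key ordering is clamp-then-max: clamping after the max would reintroduce the bug whenever the jittered delay exceeds the cap.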
Files: lib/storage/account-clear.ts (reset marker ordering), lib/request/response-metadata.ts (24 h retry-after cap)
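For the account-clear finding, the fix is an ordering change: persist the reset marker before deleting storage files, so an interrupted wipe leaves evidence behind and startup can skip stale WAL recovery. The in-memory "filesystem" and every name below (clearAccount, shouldRecoverFromWal, the path shapes) are hypothetical stand-ins for illustration only.

```typescript
// Toy in-memory filesystem standing in for the real storage files.
const files = new Map<string, string>();

// Assumed corrected sequencing: marker first, deletions second,
// marker removal only after a fully successful clear.
function clearAccount(paths: { marker: string; storage: string; wal: string }): void {
  // 1. Marker first: any failure past this point is detectable at startup.
  files.set(paths.marker, "reset-in-progress");
  // 2. Delete primary storage, then the WAL (the step that can fail with
  //    EBUSY/EPERM on Windows).
  files.delete(paths.storage);
  files.delete(paths.wal);
  // 3. Only a fully successful clear removes the marker.
  files.delete(paths.marker);
}

// Startup check: a surviving marker means a wipe was interrupted, and
// recovering the WAL would silently undo it, so recovery is skipped.
function shouldRecoverFromWal(paths: { marker: string; wal: string }): boolean {
  return files.has(paths.wal) && !files.has(paths.marker);
}
```

With marker-first ordering, a crash between steps 1 and 3 fails safe: the wipe stays wiped rather than being resurrected from a stale WAL.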
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
  A[Incoming Request] --> B{Pool exhaustion\ncooldown active?}
  B -- yes --> C[Return 429\nfast-fail]
  B -- no --> D{Server burst\ncooldown active?}
  D -- yes --> E[Return 503\nfast-fail]
  D -- no --> F[Compute outbound\nattempt budget]
  F --> G[Account selection\nfrom pool]
  G --> H{Budget\nremaining?}
  H -- no --> I[Return 503\nbudget exhausted]
  H -- yes --> J[Fetch to upstream]
  J --> K{Response status}
  K -- 2xx --> L[Clear pool +\nburst cooldowns]
  L --> M[Sync observability\nsnapshot]
  M --> N[Return success]
  K -- 429 --> O{Short cooldown\nAND short retry\ncount ok?}
  O -- yes --> P[Sleep + retry\nsame account]
  P --> H
  O -- no --> Q[Account cooldown\n→ rotate]
  Q --> G
  K -- 5xx --> R[recordServerBurstFailure]
  R --> S{Burst threshold\nreached?}
  S -- yes --> T[Arm burst cooldown\n→ early 503 next req]
  S -- no --> U[Account cooldown\n→ rotate]
  U --> G
  K -- stream ok --> V[withStreamingFailover]
  V --> W{emittedBytes > 0\nor failovers exhausted?}
  W -- yes --> X[Stop failover]
  W -- no --> Y[Adaptive candidate\nselection]
  Y --> H
```
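The 5xx branch of the flowchart (recordServerBurstFailure feeding a burst threshold that arms a fast-fail cooldown) can be sketched as follows. Only the recordServerBurstFailure name appears in the diagram; the ServerBurstGuard class, the threshold/window/cooldown values, and the method names are assumptions for illustration.

```typescript
// Illustrative server-burst guard: consecutive 5xx failures inside a
// sliding window arm a cooldown, and requests arriving while it is
// active are rejected early (the flowchart's early-503 path).
class ServerBurstGuard {
  private failures: number[] = [];
  private cooldownUntil = 0;

  constructor(
    private readonly threshold = 3,     // failures needed to arm
    private readonly windowMs = 10_000, // sliding window for counting
    private readonly cooldownMs = 30_000, // fast-fail duration once armed
  ) {}

  // Corresponds to the flowchart's recordServerBurstFailure node.
  recordServerBurstFailure(now = Date.now()): void {
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    this.failures.push(now);
    if (this.failures.length >= this.threshold) {
      this.cooldownUntil = now + this.cooldownMs;
    }
  }

  // Checked up front; true means "Return 503 fast-fail".
  isCoolingDown(now = Date.now()): boolean {
    return now < this.cooldownUntil;
  }

  // Corresponds to "Clear pool + burst cooldowns" after a 2xx.
  clear(): void {
    this.failures = [];
    this.cooldownUntil = 0;
  }
}
```

The sliding-window filter means isolated 5xx responses rotate accounts as before; only a genuine burst trips the process-wide fast-fail.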
Reviews: 7. Last reviewed commit: "test: normalize fetch helpers review nit"