
rollup: integrate open fix PRs into a single review branch#432

Merged
ndycode merged 48 commits into main from rollup/fix-prs on Apr 18, 2026

Conversation

ndycode (Owner) commented Apr 18, 2026

Summary

Rolls up the currently open fix PRs (#414–#429 and #431) into a single integration branch so the combined fix set can be reviewed as one unit.

This branch merges, in order:

Integration note

While building the rollup branch, one hidden integration issue surfaced:

  • PR #419 (fix(accounts): invalidate runtime tracker key on removeAccount, HI-01) updated lib/accounts.ts to call HealthScoreTracker.clearNumericKeysAtOrAbove() and TokenBucketTracker.clearNumericKeysAtOrAbove(), but those helper methods never actually landed in lib/rotation.ts on the source branch.
  • This rollup branch includes one integration-only commit (7f50455) adding those two methods so the combined branch typechecks and is self-contained.

The source PR #419 should receive the same patch before being merged independently.

Verification

Passes on the rollup branch

  • npm run typecheck
  • npm run lint
  • npm run build
  • npm run audit:ci
  • npm run pack:check
  • npm run vendor:verify
  • npm run clean:repo:check

Test status

The combined branch does not currently have a clean npm test, but the failures are already present on current main in this environment and are not introduced by the rollup merges:

  • test/codex-bin-wrapper.test.ts
  • test/benchmark-runtime-path-script.test.ts

They appear to be baseline failures under the current local Node/runtime combination (syntax/module-loading assumptions in temporary fixture scripts), not regressions from this fix set.

Why this PR exists

You asked for a single isolated PR from main so you can review the entire fix set in one place rather than merge 17 independent PRs one-by-one.

This PR is that integration view.

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

rollup of 17 fix PRs covering oauth token-leak redaction, cursor/tracker correctness after account removal, double-exponential backoff fix (REQ-HIGH-01), SSE buffer O(n²)→O(n) (REQ-HIGH-03), optimistic-lock settings writes (ESTALE), import size guard, shutdown dedup, select Q-cancel, and thinking-part id uniqueness (RPTU-001).

  • P1 — windows regression: Atomics.wait was removed from the sync retry loops in writeSettingsRecordSync and trySnapshotUnifiedSettingsBackupSync; both now busy-spin through all 4 retries in microseconds, so EBUSY/EPERM locks on windows will not recover. the async path still has await sleep(10 * 2 ** attempt) — the sync paths need the same treatment restored.

Confidence Score: 4/5

safe to merge after restoring sync backoff delay in unified-settings.ts — one P1 windows regression blocks clean approval

the rollup is high quality with strong test coverage additions and well-documented fixes throughout. the single P1 is the missing Atomics.wait in the sync retry loops — a targeted one-liner restore per loop. everything else is P2 or lower. score would be 5 after that patch.

lib/unified-settings.ts — sync retry loops in writeSettingsRecordSync and trySnapshotUnifiedSettingsBackupSync need Atomics.wait restored

Important Files Changed

  • lib/unified-settings.ts: adds mtime-based optimistic locking (ESTALE) for async save with 3 retry attempts; removes Atomics.wait from sync retry loops, breaking backoff on Windows EBUSY/EPERM errors (P1)
  • lib/auth/auth.ts: adds sanitizeOAuthResponseBodyForLog to redact tokens from error log messages; the JSON-walk plus regex-fallback scrub covers both structured and URL-encoded responses; security improvement
  • lib/auth/server.ts: stores _lastState on the server instance and adds a state-mismatch check in waitForCode; the check is redundant (the HTTP handler already validates state before setting _lastCode) but harmless
  • lib/accounts.ts: HI-02: cursorByFamily advanced after markSwitched/markSwitchedMutex; removes stale numeric tracker keys and the cached _runtimeTrackerKey after account splice+reindex; logic is correct
  • lib/rotation.ts: integration-only commit 7f50455: adds clearNumericKeysAtOrAbove on HealthScoreTracker and TokenBucketTracker; JSON-parse key extraction and numeric comparison are correct
  • lib/shutdown.ts: deduplicates concurrent runCleanup() calls via cleanupPromise; finally resets it to null so future calls after shutdown are no-ops (empty cleanupFunctions); the concurrency fix is correct
  • lib/request/rate-limit-backoff.ts: fixes REQ-HIGH-01 double-exponential backoff: calculateBackoffMs now applies only the reason multiplier to the already-progressed delay; the _attempt param is retained for API compatibility but unused
  • lib/request/response-handler.ts: REQ-HIGH-03: SSE buffer changed from string concat (O(n²)) to chunk array + join (O(n)); the size check uses Buffer.byteLength for correct UTF-8 byte counting; the pre-append guard is correct
  • lib/recovery/storage.ts: RPTU-001: prependThinkingPart now generates a unique partId per invocation; the module-level thinkingPartCounter shared across test runs may affect snapshot assertions
  • lib/storage/import-export.ts: adds a 4 MB MAX_IMPORT_BYTES guard using fs.open + handle.stat before reading; try/finally ensures handle.close() even on error; correct fix for unbounded file reads
  • lib/ui/select.ts: the Q hotkey now unconditionally calls finish(null) regardless of options.onInput, implementing cancel-without-save; const hotkey scoping in the switch default case is valid TypeScript

Sequence Diagram

sequenceDiagram
    participant Caller
    participant CodexManager
    participant AuthServer as auth/server.ts
    participant AuthFlow as auth/auth.ts
    participant UnifiedSettings as unified-settings.ts
    participant Rotation as rotation.ts
    participant Accounts as accounts.ts

    Caller->>CodexManager: codex auth login
    CodexManager->>AuthServer: startLocalOAuthServer({state})
    AuthServer-->>CodexManager: {waitForCode(expectedState)}
    Note over AuthServer: HTTP handler validates state before storing _lastCode
    CodexManager->>AuthFlow: exchangeAuthorizationCode(code)
    AuthFlow->>AuthFlow: sanitizeOAuthResponseBodyForLog(body)
    Note over AuthFlow: redacts refresh_token/access_token before logError

    Caller->>Accounts: removeAccount(idx)
    Accounts->>Rotation: HealthScoreTracker.clearNumericKeysAtOrAbove(idx)
    Accounts->>Rotation: TokenBucketTracker.clearNumericKeysAtOrAbove(idx)
    Accounts->>Accounts: reindex + invalidate _runtimeTrackerKey

    Caller->>UnifiedSettings: saveUnifiedPluginConfig(config)
    UnifiedSettings->>UnifiedSettings: getUnifiedSettingsMtimeMs()
    UnifiedSettings->>UnifiedSettings: readSettingsRecordAsyncInternal()
    loop up to 3 ESTALE retries
        UnifiedSettings->>UnifiedSettings: writeSettingsRecordAsync(record, {expectedMtimeMs})
        alt mtime changed
            UnifiedSettings->>UnifiedSettings: throw ESTALE, re-read + retry
        end
    end

Comments Outside Diff (2)

  1. lib/unified-settings.ts, line 186-194 (link)

    P2 sync retry loop has no delay after Atomics.wait removal

    trySnapshotUnifiedSettingsBackupSync and writeSettingsRecordSync both loop up to 5 times on retryable fs errors (EBUSY/EPERM/EAGAIN). the removed Atomics.wait call provided the inter-attempt backoff (10ms, 20ms, 40ms, 80ms). without it, all 5 retries fire in sub-millisecond succession, giving windows antivirus no time to release a file lock — exactly the scenario AGENTS.md flags as a windows filesystem safety concern.

    the async rename path still calls await sleep(10 * 2 ** attempt), so only the sync paths regressed. if the removal was to avoid Atomics.wait throwing in non-worker contexts, a simple for…await or a sleepSync via Atomics.waitAsync polyfill would preserve the delay without the main-thread restriction.
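The workaround proposed in this comment could be restored as a small helper. The following is a minimal sketch of a synchronous sleep via Atomics.wait (equivalent in effect to the removed inline calls), not the project's actual code; Node.js permits Atomics.wait on the main thread, unlike browsers:

```typescript
// Minimal sketch of a synchronous sleep, assuming a Node.js runtime.
function sleepSync(ms: number): void {
  const view = new Int32Array(new SharedArrayBuffer(4));
  // the value at index 0 stays 0, so the wait always runs to its full timeout
  Atomics.wait(view, 0, 0, ms);
}
```

Each sync retry loop could then call `sleepSync(10 * 2 ** attempt)` between attempts, mirroring the async path's `await sleep(10 * 2 ** attempt)`.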

  2. lib/unified-settings.ts, line 184-193 (link)

    P1 sync retry loops lost their backoff delay

    Atomics.wait was removed from both trySnapshotUnifiedSettingsBackupSync and writeSettingsRecordSync. these were the only mechanism for a synchronous sleep between retry attempts. without them, the loops spin through all 4 retries in microseconds — on windows, EBUSY/EPERM locks held by antivirus or another process won't release in that time, so every retry fails immediately. the async variant correctly retains await sleep(10 * 2 ** attempt) (line 216); the sync paths need the equivalent.

    same patch needed in writeSettingsRecordSync's inner catch block.


    Suggested fix (restores the sync backoff; the same patch is needed in `writeSettingsRecordSync`'s inner `catch` block):

```suggestion
		} catch (error) {
			if (!isRetryableFsError(error) || attempt >= 4) {
				return;
			}
			Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, 10 * 2 ** attempt);
		}
```

---

lib/auth/server.ts, line 114-122

**`_lastState` check in `waitForCode` is unreachable**

the HTTP handler at line 49 already validates `url.searchParams.get("state") !== state` and returns 400 before setting `_lastCode`. so when `_lastCode` is non-null, `_lastState` is always the same closed-over `state` value, and `waitForCode` is called with that same value — the `!== expectedState` branch can never fire. the log warning `"Discarding OAuth callback due to state mismatch in waitForCode"` will never appear. consider removing the redundant check, or document it explicitly as defense-in-depth so future readers aren't misled.

---

lib/recovery/storage.ts, line 148-178

**module-level `thinkingPartCounter` is a shared-mutable singleton**

`thinkingPartCounter` is module state, so it accumulates across calls within a process. in vitest runs where the module is not re-imported between tests (common with `vi.mock` module caching), the counter value leaks across test cases. the random suffix provides uniqueness, but snapshot tests that assert the exact `partId` prefix will see non-deterministic counter digits. consider exposing a `__resetThinkingPartCounter` test-only export (similar to the `__testOnly` barrel pattern used in `settings-hub/index.ts`) to allow deterministic counter resets in tests.

---

lib/request/rate-limit-backoff.ts, line 240-252

**`calculateBackoffMs` API now silently ignores `attempt`**

renaming the parameter to `_attempt` fixes REQ-HIGH-01 (double-exponential), but callers that explicitly pass `attempt` receive no type-level indication that the argument is ignored. the function is `export`-ed, so external callers relying on the old `2^(attempt-1)` progression will silently get a flat multiplier instead. the doc comment is good; consider whether a deprecation notice or a runtime assertion would help callers discover the behavioural change during testing.

Reviews (2). Last reviewed commit: "test(rollup): restore auth-list empty-st..."

ndycode added 30 commits April 18, 2026 06:22
Deep audit finding REQ-HIGH-01 (.sisyphus/notepads/deep-audit/reports/request.json).

getRateLimitBackoff() already applies baseDelay * 2^(attempt-1) + jitter. getRateLimitBackoffWithReason() called calculateBackoffMs() which applied 2^(attempt-1) AGAIN, causing delays to hit the 60s maxBackoffMs cap after just 2 consecutive 429s instead of following the documented exponential curve.

Fix: calculateBackoffMs now applies only the reason multiplier to the already-exponential delayMs. Cap via maxBackoffMs is preserved.

Added regression test asserting 3-attempt delay stays well under cap and that attempt=1 to attempt=3 ratio is ~4x, not >16x.
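The double-application described in this commit can be sketched as follows. Function names and the reason-multiplier shape are assumptions for illustration, not the real lib/request/rate-limit-backoff.ts API:

```typescript
// The caller (getRateLimitBackoff) already produces an exponentially progressed delay.
const maxBackoffMs = 60_000;

// Before: the helper re-applied 2^(attempt-1) to the already-exponential delay.
function buggyBackoff(delayMs: number, attempt: number, reasonMultiplier: number): number {
  return Math.min(delayMs * 2 ** (attempt - 1) * reasonMultiplier, maxBackoffMs);
}

// After: only the reason multiplier is applied; the caller owns the exponential curve.
function fixedBackoff(delayMs: number, _attempt: number, reasonMultiplier: number): number {
  return Math.min(delayMs * reasonMultiplier, maxBackoffMs);
}
```

With a 1s base, attempt 3 arrives with a caller-side delay of 4000ms; the buggy path turns that into 16000ms (a 16x attempt-1-to-3 ratio), while the fixed path keeps the documented ~4x ratio.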
Deep audit findings (2 CRITICAL) from .sisyphus/notepads/deep-audit/reports/codex-cli-manager.json.

The following modules exist with full implementations + dedicated test files
but are NEVER imported by production code. The dispatcher in codex-manager.ts
routes 'codex auth doctor/fix/verify-flagged' to runRepairDoctor/runRepairFix/
runRepairVerifyFlagged from repair-commands.ts instead. auth-commands.ts
similarly defines runAuthLogin but codex-manager.ts uses its own inline
implementation. Any behavior divergence between the parallel implementations
goes uncaught - tests validate the ghost, not the ship.

Deleted:
- lib/codex-manager/commands/doctor.ts (dead, 568 LOC)
- lib/codex-manager/commands/fix.ts (dead, 564 LOC)
- lib/codex-manager/commands/verify-flagged.ts (dead, 450 LOC)
- lib/codex-manager/auth-commands.ts (dead, 885 LOC)
- test/codex-manager-doctor-command.test.ts (ghost, 238 LOC)
- test/codex-manager-fix-command.test.ts (ghost, 258 LOC)
- test/codex-manager-verify-flagged-command.test.ts (ghost, 330 LOC)
- test/codex-manager-auth-commands.test.ts (ghost, 952 LOC)

Also removed the documentation.test.ts existence + content assertions that
pinned auth-commands.ts. The live switch.ts retains the identical
'Missing index. Usage: codex auth switch <index>' content assertion, so
user-visible copy drift is still guarded.

Production runDoctor/runFix/runVerifyFlagged in repair-commands.ts unchanged.
Production inline runAuthLogin in codex-manager.ts unchanged.
User-visible behavior: IDENTICAL (these modules never ran).

Verification (git grep on origin/main with production filter):
- runDoctorCommand: only defined in commands/doctor.ts, no production callers.
- runFixCommand: only defined in commands/fix.ts, no production callers.
- runVerifyFlaggedCommand: only defined in commands/verify-flagged.ts, no production callers.
- runAuthLogin (in auth-commands.ts): shadowed by inline live one in codex-manager.ts.
- runSwitch / runBest (in auth-commands.ts): shadowed by runSwitchCommand/runBestCommand from commands/switch.ts and commands/best.ts.
- persistAndSyncSelectedAccount (in auth-commands.ts): shadowed by inline live one in codex-manager.ts (referenced from commands/best.ts and commands/switch.ts via injected deps).
- withSerializedBestLiveRun / clampPreservedActiveIndexByFamily: only referenced internally within auth-commands.ts.

Test count change: 3529 -> 3492 (37 ghost tests removed; no real-coverage loss).
Test file count: 232 -> 228 (-4 ghost test files).
LOC removed: 4255 lines across 8 deleted files + 10 modified lines in documentation.test.ts.
…-001)

PR #395 redacted OAuth URL query params in user-facing output but opaque
refresh and access tokens can still leak via log messages when raw HTTP
token-endpoint response bodies are concatenated into logError calls.

The logger's TOKEN_PATTERNS recognises JWT (eyJ...), long hex, sk-* api
keys, and Bearer substrings, but ChatGPT refresh tokens are opaque
strings that do NOT match those patterns. Key-based redaction only
fires when a token sits under a SENSITIVE_KEYS object key, not when it
is interpolated into a free-form message string.

Identified leak sites:
- lib/auth/auth.ts exchangeAuthorizationCode: logError uses raw
  response body text after a failed /oauth/token request; the same
  text is returned verbatim as TokenResult.message which downstream
  code (codex-manager.ts logError paths) also writes to log output.
- lib/auth/auth.ts refreshAccessToken: same pattern on refresh failure.

Fix (targeted, no logger hot-path change):
- Add sanitizeOAuthResponseBodyForLog helper in lib/auth/auth.ts.
  It JSON-parses the body and masks sensitive keys (refresh_token,
  access_token, id_token, camelCase variants, token, code,
  code_verifier). On non-JSON bodies it falls back to regex scrubs
  for JSON-style key:value pairs and urlencoded key=value pairs.
- Route failed-refresh and failed-exchange bodies through the helper
  both into the logError message AND into the returned
  TokenResult.message so downstream consumers also see the sanitized
  form.

Regression tests in test/auth.test.ts assert:
- sanitizeOAuthResponseBodyForLog masks nested / camelCase /
  urlencoded / malformed-JSON token values while preserving
  non-sensitive fields and plain text.
- refreshAccessToken and exchangeAuthorizationCode never pass the
  opaque token string into logError.mock.calls, and the returned
  TokenResult.message does not contain the opaque token value.
- Non-sensitive content (error code, error_description) still
  reaches the log message.

Verification:
- npm run typecheck: clean
- npm run lint: clean
- npm test: 232 files / 3539 tests pass
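The redaction strategy above (JSON-parse with key masking, regex fallback for non-JSON bodies) can be sketched roughly as below. The helper name, key list, and replacement token are assumptions based on this commit message, not the actual lib/auth/auth.ts implementation:

```typescript
// Hypothetical sketch of key-based OAuth body redaction.
const SENSITIVE_BODY_KEYS = new Set([
  "refresh_token", "access_token", "id_token", "token", "code", "code_verifier",
]);

function sanitizeOAuthBody(body: string): string {
  try {
    // JSON path: the reviver visits every key, masking sensitive string values
    const masked = JSON.parse(body, (key, value) =>
      SENSITIVE_BODY_KEYS.has(key) && typeof value === "string" ? "[REDACTED]" : value,
    );
    return JSON.stringify(masked);
  } catch {
    // non-JSON fallback: scrub urlencoded key=value pairs, anchored at a delimiter
    return body.replace(
      /(^|[?&])(refresh_token|access_token|id_token|token|code|code_verifier)=[^&\s]+/g,
      "$1$2=[REDACTED]",
    );
  }
}
```

The anchored `(^|[?&])` group is what prevents bare `code=` from matching inside longer parameter names like `device_code` (the issue a later review-feedback commit addresses).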
Addresses REQ-HIGH-03 deep-audit finding.

Previous: fullText += decoder.decode(...) caused O(n^2) string concatenation in convertSseToJson; MAX_SSE_SIZE check ran AFTER append so memory briefly held chunk + 10MB before throwing.

Fix: accumulate chunks in string[] array; track running size; check size BEFORE append; final join() once at end. Linear time, bounded memory, size-check enforced pre-allocation.

Test asserts pre-append throw when next chunk would exceed cap.
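The chunk-array accumulation with a pre-append byte check can be sketched as follows (the constant name and error text are assumptions; the real convertSseToJson logic lives in lib/request/response-handler.ts):

```typescript
// Assumed cap, per the commit message's "10MB" figure.
const MAX_SSE_SIZE = 10 * 1024 * 1024;

function accumulateSse(chunks: string[], totalBytes: number, decoded: string): number {
  // Buffer.byteLength counts UTF-8 bytes, not UTF-16 code units
  const chunkBytes = Buffer.byteLength(decoded, "utf8");
  if (totalBytes + chunkBytes > MAX_SSE_SIZE) {
    throw new Error("SSE response exceeds size limit"); // rejected BEFORE append
  }
  chunks.push(decoded); // O(1) append; a single chunks.join("") at the end keeps it O(n)
  return totalBytes + chunkBytes;
}
```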
getRuntimeTrackerKey caches the resolved tracker key on the account via
_runtimeTrackerKey. For accounts without accountId/email, the key falls
back to the numeric account.index. When removeAccount splices the pool
and reindexes survivors (acc.index = index), the cached numeric key is
not invalidated, so later lookups keep consulting the pre-reindex
position. Rotation, health, and token-bucket state for that account all
mismatch the current pool order.

Clear _runtimeTrackerKey on each survivor whose cached key was numeric,
right after the reindex loop. Identity-based string keys remain stable
because accountId/email are not affected by array-position changes, so
they do not need invalidation.

Regression test exercises the remove-then-select path: a pool of four
accounts (mixed refresh-only and identity-bearing) primes the runtime
tracker, records a health failure under the pre-reindex key, removes
the identity-bearing account at index 1, and asserts the surviving
refresh-only account reports its new numeric index (1) instead of the
stale cached value (2). The identity account's string key is verified
to survive the reindex unchanged.

Audit reference: .sisyphus/notepads/deep-audit/reports/accounts-rotation.json HI-01
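The invalidation step described above can be sketched like this. The Account shape and field names are assumptions taken from this commit message, not the real lib/accounts.ts types:

```typescript
interface Account {
  index: number;
  accountId?: string;
  email?: string;
  _runtimeTrackerKey?: string; // cached resolved tracker key
}

function reindexAndInvalidateNumericKeys(accounts: Account[]): void {
  accounts.forEach((acc, i) => {
    acc.index = i; // reindex survivors after the splice
    // identity-based string keys stay valid; cached numeric keys point at old slots
    if (acc._runtimeTrackerKey !== undefined && /^\d+$/.test(acc._runtimeTrackerKey)) {
      delete acc._runtimeTrackerKey;
    }
  });
}
```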
markSwitched updated currentAccountIndexByFamily but left cursorByFamily
pointing at the pre-switch position. Subsequent round-robin passes
(getCurrentOrNextForFamily / getNextForFamily) started from the stale
cursor and could either re-pick the same slot or skip the just-switched
account entirely, dropping the caller's explicit switch intent after a
rate-limit-triggered rotation.

Fix both markSwitched and its mutex-serialized sibling markSwitchedLocked
to advance cursorByFamily[family] to `(account.index + 1) % count`,
matching the convention already used in getCurrentOrNextForFamilyHybrid
and the inner loop of getCurrentOrNextForFamily. No-op when the pool is
empty.

Adds a regression test that walks the cursor to a non-zero position,
marks a different account as switched, and asserts the next rotation
resumes AFTER the marked slot rather than from the stale cursor.

Not covered by PR #399 (which normalized pointers on setAccountEnabled
and getActiveIndexForFamily only).
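The cursor-advance convention the fix adopts can be sketched as below (map and parameter names are assumptions; the real markSwitched/markSwitchedLocked operate on internal rotation state):

```typescript
function markSwitched(
  cursorByFamily: Record<string, number>,
  currentAccountIndexByFamily: Record<string, number>,
  family: string,
  accountIndex: number,
  poolSize: number,
): void {
  currentAccountIndexByFamily[family] = accountIndex;
  if (poolSize === 0) return; // no-op when the pool is empty
  // advance the round-robin cursor PAST the just-switched slot, so the next
  // rotation pass resumes after it instead of re-picking or skipping it
  cursorByFamily[family] = (accountIndex + 1) % poolSize;
}
```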
… (RPTU-001)

prependThinkingPart used a fixed part id (prt_0000000000_thinking) so two
recovery passes on the same messageID would atomically overwrite the same
synthetic thinking-part file, losing any prior state. Recovery paths retry
on failure (recoverThinkingBlockOrder loops over orphan messages and the
caller retries the request), so collisions were reachable in normal
operation rather than only under hypothetical races.

Introduces generateThinkingPartId() which keeps the prt_0000000000_thinking_
prefix so the synthetic part still sorts before real generatePartId() ids
and preserves the prepend semantics relied on by findMessagesWithOrphanThinking
and findMessageByIndexNeedingThinking. The id then appends a hex millisecond
timestamp, a monotonic counter, and a short random suffix so every
invocation writes a distinct file.

Adds a regression test that invokes prependThinkingPart twice on the same
sessionID/messageID and asserts that both calls stage and rename DIFFERENT
target paths with distinct embedded ids, plus a sort-order test that guards
the prepend invariant against future id-format drift.

No behavior change for first-pass recovery. Retries and repeated passes now
accumulate synthetic thinking parts instead of silently clobbering the
earlier one, which matches the observable file-per-part model used by
readParts and stripThinkingParts.
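The id shape described above (fixed sortable prefix, hex timestamp, monotonic counter, random suffix) could look roughly like this; the exact format is an assumption based on this commit message:

```typescript
// Module-level monotonic counter (the shared-singleton caveat noted in review applies).
let thinkingPartCounter = 0;

function generateThinkingPartId(): string {
  const ts = Date.now().toString(16); // hex millisecond timestamp
  const counter = (thinkingPartCounter++).toString(36);
  const rand = Math.random().toString(36).slice(2, 8); // short random suffix
  // keep the prt_0000000000_thinking_ prefix so the synthetic part still
  // sorts before real generatePartId() ids, preserving prepend semantics
  return `prt_0000000000_thinking_${ts}_${counter}_${rand}`;
}
```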
HI-05 deep audit finding flagged two missing tests:
1. selectHybridAccount returns null when all accounts unavailable
2. concurrency test exercising the hybrid selector under parallel mutation

Validation on origin/main found (1) already covered by test/rotation.test.ts
"returns null when all accounts are unavailable (AUDIT-H2 contract)".

The concurrency gap was real: rotation-concurrency.property.test.ts simulates
the critical-section invariants but never calls selectHybridAccount directly.

Add test/property/hybrid-selector-concurrency.property.test.ts covering:
- N parallel selectHybridAccount calls under withRoutingMutex("enabled") that
  each mutate shared HealthScoreTracker / TokenBucketTracker / isAvailable
  state, asserting no two winners return the same index.
- single-slot pool under N parallel callers yields exactly one winner; every
  other caller observes the slot unavailable and receives null.
- N-slot pool saturates exactly once with overflow callers returning null.
- external concurrentCallers observer proves mutual exclusion holds.

Test-only change: no production code touched. Follows the pattern established
in test/property/rotation-concurrency.property.test.ts and uses
__resetRoutingMutexForTests for per-test isolation.
Addresses deep-audit auth finding on manual-paste state binding.

The previous check only rejected on mismatch:
  if (parsed.state && parsed.state !== state) return null;
which allowed a bare code with no state to bypass the state-binding check.

Now the manual-paste path requires state presence and equality before the
callback is accepted. Added regression test for missing state and updated
stale auth-list expectation to match current main behavior.
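The before/after of the state-binding check can be sketched as follows (function and parameter names are hypothetical; only the condition shape follows the commit message):

```typescript
function acceptManualCallback(
  parsed: { code?: string; state?: string },
  state: string,
): string | null {
  if (!parsed.code) return null;
  // before: `if (parsed.state && parsed.state !== state) return null;`
  // which let a bare code with NO state bypass the binding check entirely.
  // after: require state presence AND equality.
  if (!parsed.state || parsed.state !== state) return null;
  return parsed.code;
}
```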
…E-002)

Addresses STORAGE-002 from the deep storage audit.

readImportFile previously went straight from existsSync() to fs.readFile()
with no size guard, allowing arbitrarily large JSON imports. Add a 4 MiB
MAX_IMPORT_BYTES check via fs.stat() before readFile() and reject oversized
imports before allocating the file contents.

Added focused unit tests covering oversized rejection and under-limit success.
Addresses LIB-HIGH-002 from the deep top-level lib audit.

The sync unified-settings retry paths used Atomics.wait(new SharedArrayBuffer)
as a synchronous sleep in two places:
- trySnapshotUnifiedSettingsBackupSync
- writeSettingsRecordSync rename retry loop

This blocks the event loop under EBUSY/EPERM conditions. Per user decision,
we keep the sync API and drop the blocking sleep instead of refactoring the
sync path to async. Retries now occur immediately in the sync path, while the
async path retains await sleep(...).

Added regression tests asserting retry succeeds and Atomics.wait is never
called for sync rename/copy retry paths.
Addresses RPTU-003 from the deep recovery/UI audit.

The UI help text advertised "Q Back" globally, but select() only handled
arrow keys, enter, and escape. q/Q only worked if a caller manually wired
onInput. confirm() uses select() and therefore silently ignored q.

Fix: handle q/Q as a default cancel/back hotkey in select(). Added focused
select() and confirm() regression tests.
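The default-hotkey handling can be sketched as below; the callback names are assumptions, not the real select() internals in lib/ui/select.ts:

```typescript
function handleKey(
  key: string,
  finish: (value: string | null) => void,
  onInput?: (key: string) => void,
): void {
  switch (key) {
    case "q":
    case "Q": {
      // unconditional cancel/back, regardless of whether a caller wired onInput
      finish(null);
      return;
    }
    default:
      onInput?.(key); // other keys still flow to the caller's handler
  }
}
```

Because confirm() is built on select(), the same default makes q/Q work there too without extra wiring.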
Addresses the second auth HIGH finding from the deep audit.

OAuthServerInfo.waitForCode(state) accepted an expected state parameter in the
public contract and all callers passed it, but the implementation ignored the
argument entirely and simply returned the first captured code. The HTTP callback
handler already validated state before storing _lastCode, but waitForCode did
not re-check the stored state, so the defense-in-depth contract was violated.

Store _lastState alongside _lastCode and reject if it does not match the
expectedState passed to waitForCode. Added a unit test covering the mismatch
case and preserving current integration behavior.
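The defense-in-depth re-check can be sketched like this (the field names follow the commit message; the real waitForCode is promise-based):

```typescript
interface OAuthServerState {
  _lastCode: string | null;
  _lastState: string | null;
}

function checkCapturedCode(server: OAuthServerState, expectedState: string): string | null {
  if (server._lastCode === null) return null;
  // re-validate the stored state against the expectedState the caller passed,
  // even though the HTTP handler validated it before storing _lastCode
  if (server._lastState !== expectedState) return null;
  return server._lastCode;
}
```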
…C-01)

Addresses R-LOGIC-01 / R-TEST-01 from the deep runtime audit.

The async unified-settings save paths read the current settings record, mutate
one section, and write it back without checking whether another process wrote
a newer settings.json in the meantime. That creates a stale-overwrite race.

Fix: capture the observed settings mtime, check it immediately before rename
inside writeSettingsRecordAsync, and on ESTALE re-read the latest settings
record and re-apply the section-scoped mutation up to 3 times.

Added a regression test that injects a concurrent external write between the
read and write and verifies the unrelated sections from the external write are
preserved while the intended pluginConfig mutation still lands.
…-HIGH-003)

Addresses the actionable part of LIB-HIGH-003 from the deep top-level lib audit.

The original finding mixed three concerns: ordering, idempotence, and signal/
beforeExit semantics. The smallest safe fix is to eliminate the actual race:
concurrent runCleanup() calls can currently double-run cleanup handlers.

Fix: cache the in-flight cleanup promise and return it to overlapping callers.
Preserve current FIFO ordering and current signal/beforeExit semantics to avoid
changing the shutdown contract in this PR. Added a focused regression test that
concurrent callers receive the same in-flight promise and only one cleanup run
executes.
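The in-flight-promise dedup can be sketched as below (variable names are assumptions based on this commit message, not the real lib/shutdown.ts internals):

```typescript
type Cleanup = () => Promise<void>;
const cleanupFunctions: Cleanup[] = [];
let cleanupPromise: Promise<void> | null = null;

async function runCleanup(): Promise<void> {
  // overlapping callers receive the same in-flight promise, so handlers run once
  if (cleanupPromise) return cleanupPromise;
  cleanupPromise = (async () => {
    for (const fn of cleanupFunctions) await fn(); // preserve FIFO ordering
    cleanupFunctions.length = 0;
  })().finally(() => {
    cleanupPromise = null; // later calls become no-ops over the empty list
  });
  return cleanupPromise;
}
```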
Follow-up for PR #419 review comments.

Invalidate cached numeric tracker keys was not sufficient on its own: the
rotation health/token tracker maps still retained entries keyed by the old
numeric slots, so a later refresh-only survivor could inherit another
account's state after reindex.

Fix: clear numeric-keyed tracker entries at or above the removed index for
both HealthScoreTracker and TokenBucketTracker, then keep the regression test
fully numeric by making the shifted survivor refresh-only and asserting both
health and token state are pristine after reindex.
Responds to PR #421 review feedback.

The HI-02 regression test already covered the legacy markSwitched path.
This adds the same assertion through the mutex-serialized
markSwitchedLocked path with routingMutexMode enabled so the concurrency
variant cannot drift from the sync behavior.
Responds to PR #417 review feedback.

The urlencoded regex in sanitizeOAuthResponseBodyForLog matched bare
code= substrings inside longer parameter names like device_code,
causing non-sensitive fields to be redacted. Anchor the regex to the
start of the string or a query/field delimiter so only full parameter
names are scrubbed. Added regression coverage for _code suffix keys.
Responds to PR #418 review feedback.

The pre-append SSE size guard used decoded.length, which counts UTF-16 code
units rather than bytes. Multi-byte UTF-8 chunks (emoji, many CJK chars) could
therefore exceed MAX_SSE_SIZE without tripping the guard at the right point.

Use Buffer.byteLength(decoded, 'utf8') for both the pre-append check and the
running total, and add a regression test covering a multi-byte payload.
Responds to PR #423 review feedback.

Freeze Date.now/Math.random in the double-prepend regression so the test
catches timestamp/counter shape regressions deterministically while still
allowing the short random suffix to vary within the expected base36 shape.
#425)

Responds to PR #425 review feedback.

- eliminate the stat/readFile TOCTOU gap by opening the import file once and
  performing both stat() and readFile() on the same handle
- add exact-boundary coverage at MAX_IMPORT_BYTES
- keep the oversized import rejection guarantee and verify the handle closes
  on both reject and success paths
… oauth bodies

Responds to follow-up PR #417 review feedback.

- extend OAUTH_SENSITIVE_BODY_KEYS with codeVerifier
- recursively scrub token-like substrings inside parsed JSON string values so
  bodies like {"error_description":"... RT_ch_..."} do not leak
- add regression coverage for both leak shapes
…429)

Responds to the PR #429 review comment about the optimistic mtime check.

The previous patch captured the expected mtime in a separate fs.stat() after the
read, leaving a TOCTOU window between reading the file contents and sampling
its metadata. readSettingsRecordAsyncInternal now uses the mtime from the same
open file handle used for the async read and threads that value through the
ESTALE retry loop. Regression test remains green.
ndycode added 17 commits April 18, 2026 17:15
…429)

Responds to the PR #429 review comment.

The optimistic mtime check was taking expectedMtimeMs after the async read,
leaving a window where an external write could update the file between the read
and the mtime snapshot. Swap the order so the expected mtime is sampled first,
then the record is read, and mirror that order on ESTALE retry.

This preserves the existing settings contract and test behavior while closing
the specific stale-overwrite hole called out in review.
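The optimistic check reduces to a compare-and-swap shape; this in-memory sketch uses hypothetical names (`FileState`, `writeIfUnchanged`) rather than the actual unified-settings API:

```typescript
interface FileState {
  mtimeMs: number;
  contents: string;
}

// Write only if the file's mtime still matches what the caller sampled before
// reading; on mismatch the caller re-reads and retries (the ESTALE path).
function writeIfUnchanged(
  state: FileState,
  expectedMtimeMs: number,
  next: string,
  nowMs: number,
): boolean {
  if (state.mtimeMs !== expectedMtimeMs) {
    return false; // stale: an external write landed since the sample
  }
  state.contents = next;
  state.mtimeMs = nowMs;
  return true;
}
```

Sampling the expected mtime before the read errs on the safe side: a concurrent write after the sample makes the compare fail and forces a retry rather than a silent overwrite.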
The rollup integration exposed that PR #419's branch updates accounts.ts to call
HealthScoreTracker.clearNumericKeysAtOrAbove() and
TokenBucketTracker.clearNumericKeysAtOrAbove(), but the helper methods never
landed in lib/rotation.ts on that branch.

Add the two tiny helper methods directly on the rollup branch so the full
integration set typechecks and the rollup PR is self-contained. PR #419 itself
should receive the same patch before final merge.
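The shape of such a helper on a Map-backed tracker can be sketched as follows; `ScoreTracker` is illustrative, while the real HealthScoreTracker and TokenBucketTracker live in lib/rotation.ts:

```typescript
class ScoreTracker {
  private scores = new Map<string | number, number>();

  set(key: string | number, value: number): void {
    this.scores.set(key, value);
  }

  get(key: string | number): number | undefined {
    return this.scores.get(key);
  }

  // Drop every numeric key >= startIndex so that, after removeAccount
  // reindexes the array, surviving accounts cannot inherit stale
  // health/token state from removed slots. String keys are untouched.
  clearNumericKeysAtOrAbove(startIndex: number): void {
    for (const key of [...this.scores.keys()]) {
      if (typeof key === "number" && key >= startIndex) {
        this.scores.delete(key);
      }
    }
  }
}
```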
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository-wide code reviews.

@coderabbitai

coderabbitai bot commented Apr 18, 2026

📝 Walkthrough

Pull Request Summary: Rollup Integration of 16 Fix PRs

INTENTIONAL REFACTORING + CORE FIXES (LOW-TO-MEDIUM RISK): This rollup integrates fixes from PRs #414–#429 and #431 and consolidates scattered command implementations into a unified repair-commands.ts module. Commands are properly preserved and wired in the CLI dispatch layer; the apparent large deletion reflects file reorganization, not feature removal.

Core Fixes (High Priority)

  • REQ-HIGH-01: Removes double-exponential backoff by changing calculateBackoffMs to apply only reason multiplier (not additional 2^(attempt-1) growth); regression tests added for jitter extremes and ratio assertions.
  • STORAGE-001: Fixes WAL replay in loadAccountsInternal to honor newer WAL mtimeMs and persist via existing migration path; test coverage added for primary/no-WAL, stale-WAL, and newer-WAL scenarios.
  • HI-05: Prevents stale numeric tracker keys after removeAccount by invalidating cached _runtimeTrackerKey and purging entries via new clearNumericKeysAtOrAbove() methods in HealthScoreTracker and TokenBucketTracker; regression test included.
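The REQ-HIGH-01 fix reduces to keeping the exponential in the base-delay progression and applying only the reason multiplier on top; the multiplier values below are assumptions for illustration:

```typescript
// Illustrative reason multipliers, not the project's actual table.
const REASON_MULTIPLIER: Record<string, number> = {
  "rate-limit": 1,
  quota: 3,
};

// Post-fix shape: no extra 2^(attempt-1) factor here, because baseDelayMs is
// already exponential; applying it again was the double-exponential bug.
function calculateBackoffMs(
  baseDelayMs: number,
  _attempt: number, // retained but unused, matching the renamed parameter
  reason: string,
): number {
  return baseDelayMs * (REASON_MULTIPLIER[reason] ?? 1);
}

// The single exponential, applied exactly once upstream.
function baseDelayForAttempt(attempt: number, initialMs: number): number {
  return initialMs * 2 ** (attempt - 1);
}
```

This keeps the attempt-3/attempt-1 ratio at 4x (2^2) regardless of reason, which is the ratio the regression tests assert.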

Security Fix

  • LIB-HIGH-001: Adds sanitizeOAuthResponseBodyForLog() to redact access_token, refresh_token, codeVerifier in OAuth error logs; 13 test cases cover JSON, URL-encoded, and malformed response formats.

Additional Fixes

  • OAuth server now validates callback state parameter.
  • Recovery storage generates unique thinking part IDs to prevent overwrites.
  • SSE buffering switches to linear accumulation with pre-append size cap.
  • Import file size enforcement (4MB max) via file handle pre-stat.
  • Settings sync uses mtime-based retry with expectedMtimeMs consistency checks.
  • UI hotkey 'q' cancels select/confirm prompts.
  • runCleanup() caches in-flight promise to prevent concurrent cleanup runs.
  • Accounts synchronize per-family round-robin cursor after switching.
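The in-flight caching described for runCleanup() follows a standard promise-dedupe pattern; this is a sketch mirroring the description, not the actual lib/shutdown.ts code:

```typescript
let cleanupPromise: Promise<void> | null = null;
let cleanupRuns = 0; // instrumentation for the sketch only

async function doCleanupWork(): Promise<void> {
  cleanupRuns += 1;
  await new Promise((resolve) => setTimeout(resolve, 10));
}

// Deliberately non-async: every caller receives the identical cached promise,
// so concurrent invocations share one cleanup sequence instead of racing.
function runCleanup(): Promise<void> {
  if (!cleanupPromise) {
    cleanupPromise = doCleanupWork();
  }
  return cleanupPromise;
}
```

Making the function non-async matters: an `async` wrapper would return a fresh promise per call even when the underlying work is shared, defeating the promise-identity assertion in the tests.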

Refactoring: Command Module Consolidation (Intentional)

The four deleted command files were reorganized, not removed:

  • lib/codex-manager/auth-commands.ts (885L) → split into lib/codex-manager/commands/best.ts, lib/codex-manager/commands/switch.ts, and inline runAuthLogin in codex-manager.ts
  • lib/codex-manager/commands/doctor.ts (568L) → lib/codex-manager/repair-commands.ts (2203L, consolidated with fix + verify-flagged)
  • lib/codex-manager/commands/fix.ts (564L) → repair-commands.ts
  • lib/codex-manager/commands/verify-flagged.ts (450L) → repair-commands.ts

Test files were similarly consolidated:

  • Deleted: codex-manager-auth-commands.test.ts, doctor-command.test.ts, fix-command.test.ts, verify-flagged-command.test.ts (~2,400 test lines)
  • Added: test/repair-commands.test.ts (1,132 lines), test/codex-manager-best-command.test.ts (371 lines), test/codex-manager-switch-command.test.ts (94 lines) = 1,597 lines

CLI dispatch properly wired in lib/codex-manager.ts:

  • if (command === "verify-flagged") → runRepairVerifyFlagged(...)
  • if (command === "fix") → runRepairFix(...)
  • if (command === "doctor") → runRepairDoctor(...)
  • All auth commands (best, switch, login) remain intact

No loss of functionality; test coverage was consolidated, with an overall reduction of ~800 lines. The refactoring reduces API surface by consolidating related command logic.

Verification Status

  • Typecheck, lint, build, audit:ci, pack:check, vendor:verify, clean:repo:check: ✅ pass
  • npm test: ❌ pre-existing failures (codex-bin-wrapper.test.ts, benchmark-runtime-path-script.test.ts) unrelated to rollup
  • New regression tests: ✅ comprehensive (backoff ratio, WAL replay, numeric key invalidation, token redaction, SSE buffer, import size, settings retry, recovery uniqueness)
  • Refactoring completeness: ✅ all commands reachable and tested

Architectural Notes

  • The rollup demonstrates intentional modernization: consolidating scattered command modules into a unified repair-commands layer improves maintainability.
  • Per-family round-robin cursor synchronization fixes a subtle state-management bug in account switching.
  • The missing clearNumericKeysAtOrAbove() helpers (commit 7f50455) were correctly identified as needed and added during integration; PR #419 should receive the same patch if merged independently.

Recommendation

Safe to merge. This rollup is a well-tested consolidation of 16 independent fixes plus an intentional refactoring that improves code organization. Core security, data-loss, and behavioral fixes are properly covered by new regression tests. The apparent large deletion is file reorganization, not feature removal.

Walkthrough

This PR makes coordinated changes across account management, OAuth security, request handling, recovery, storage, and CLI scaffolding. Key changes: (1) invalidates stale numeric runtime tracker keys after account removal reindexing to prevent health/token score bleed; (2) sanitizes OAuth error responses in logs to prevent token leakage; (3) validates OAuth state parameter in local callback flow; (4) fixes double-exponential backoff in rate-limit delay calculation; (5) optimizes SSE response buffering with size cap enforcement before appending chunks; (6) generates unique synthetic thinking part IDs to prevent overwrites in recovery; (7) removes the standalone auth-commands, doctor, fix, and verify-flagged command modules (their logic is consolidated into repair-commands.ts and new command files); (8) adds file-size limits to import functionality; (9) protects concurrent unified settings writes via mtime-based retry; (10) adds 'q' hotkey quit support to UI selects.

Changes

Cohort / File(s) Summary
Account Index & Cursor Management
lib/accounts.ts, lib/rotation.ts, test/accounts.test.ts
markSwitched and markSwitchedLocked now advance cursorByFamily[family] to (account.index + 1) % accounts.length post-mutation to keep round-robin cursor aligned. removeAccount now invalidates numeric _runtimeTrackerKey on surviving accounts after reindex and clears tracker state at-or-above the removed index via new clearNumericKeysAtOrAbove(startIndex) methods on both HealthScoreTracker and TokenBucketTracker. Regression tests verify cursor sync behavior and cached key invalidation.
OAuth Token Security & State Validation
lib/auth/auth.ts, lib/auth/server.ts, lib/codex-manager.ts, test/auth.test.ts, test/server.unit.test.ts
sanitizeOAuthResponseBodyForLog exported function redacts access_token, refresh_token, codeVerifier and camelCase variants in error responses to prevent token leakage in logs. waitForCode now accepts expectedState parameter and rejects responses when stored _lastState mismatches, logging a warning. promptManualCallback tightens validation: requires state present and matches expected value. New tests verify redaction patterns and state mismatch handling.
Rate Limiting & Backoff Calculation
lib/request/rate-limit-backoff.ts, test/rate-limit-backoff.test.ts
calculateBackoffMs parameter renamed to _attempt (ignored); removes exponential scaling 2^(attempt-1) so only reason multiplier applies. getRateLimitBackoffWithReason adjusted to pass un-jittered base delay on first attempt instead of pre-computed result. Prevents double-exponential backoff. Tests updated to expect flat values per reason and single-exponential growth across attempts; regression test verifies ratio ~4x for attempts 1–3.
SSE Response Buffering & Size Cap
lib/request/response-handler.ts, test/response-handler-sse-buffer.test.ts
convertSseToJson accumulates chunks in array instead of repeated string concatenation; size-checks each UTF-8 decoded chunk before appending against MAX_SSE_SIZE. New test module verifies acceptance under limit, rejection when chunk would exceed, and byte-length (not string-length) enforcement.
Recovery & Synthetic ID Generation
lib/recovery/storage.ts, test/recovery-storage.test.ts
Added generateThinkingPartId() with module-level monotonic counter to produce unique prt_0000000000_thinking_<unique> IDs on each prependThinkingPart call, preventing overwrites on repeated recovery passes while preserving lexicographic ordering. Tests verify distinct staged/final paths and IDs across repeat calls.
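The monotonic-counter id scheme can be sketched as below; the `prt_0000000000_thinking_` prefix comes from this summary, while the counter padding width is an assumption:

```typescript
let thinkingPartCounter = 0;

// Each call yields a new id; the zero-padded monotonic suffix keeps ids
// unique across repeated recovery passes while preserving lexicographic
// ordering of the generated parts.
function generateThinkingPartId(): string {
  thinkingPartCounter += 1;
  const unique = String(thinkingPartCounter).padStart(6, "0");
  return `prt_0000000000_thinking_${unique}`;
}
```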
Import File Size Limits & Concurrent Settings Write
lib/storage/import-export.ts, lib/unified-settings.ts, test/storage-import-export.test.ts, test/unified-settings.test.ts
readImportFile now stats file, throws if exceeds MAX_IMPORT_BYTES, and uses file handle instead of direct read. writeSettingsRecordAsync extended with expectedMtimeMs?: number | null option; saveUnifiedPluginConfig and saveUnifiedDashboardSettings implement read-modify-write with mtime validation retry on ESTALE to prevent concurrent-write data loss. Removed Atomics.wait from retry loops. Tests verify size-limit enforcement and concurrent write merging.
CLI Command Module Removals
lib/codex-manager/auth-commands.ts, lib/codex-manager/commands/doctor.ts, lib/codex-manager/commands/fix.ts, lib/codex-manager/commands/verify-flagged.ts
Entire auth-commands module (including runAuthLogin, runSwitch, runBest, persistAndSyncSelectedAccount and supporting state/queues) removed; doctor command (diagnostic checks, fix actions, JSON/human output) removed; fix command (token refresh, quota cache updates, dashboard rendering) removed; verify-flagged command (flagged account restoration, retry logic, JSON reporting) removed. All associated tests deleted.
UI Hotkey & Shutdown
lib/ui/select.ts, lib/shutdown.ts, test/select.test.ts, test/shutdown.test.ts
select prompt now handles "q" hotkey (case-insensitive) to resolve to null before invoking onInput. runCleanup changed to non-async function returning cached promise; deduplicates concurrent calls so only one cleanup sequence runs. Tests verify hotkey cancellation and promise identity across concurrent calls.
Test Updates & Removals
test/codex-manager-cli.test.ts, test/codex-manager-auth-commands.test.ts, test/codex-manager-doctor-command.test.ts, test/codex-manager-fix-command.test.ts, test/codex-manager-verify-flagged-command.test.ts, test/documentation.test.ts, test/property/hybrid-selector-concurrency.property.test.ts
Deleted entire test suites for removed commands (auth-commands, doctor, fix, verify-flagged). Updated codex-manager-cli auth-list assertions for shorter/simpler health message. Removed auth-commands.ts filesystem assertions from documentation.test.ts. Added property-based concurrency test for selectHybridAccount under mutex locking, verifying no double-selection and mutual exclusion.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

high heterogeneity across independent bug fixes (backoff double-exponential, oauth state validation, runtime key invalidation), intricate patterns (mtime-based retry on concurrent writes, cursor synchronization with family-keyed tracking, mutex-protected selection), and large command module deletions spanning 15+ files. logic density is moderate-to-high in accounts/rotation (index reindex + tracker cleanup), unified-settings (mtime validation + retry), and rate-limit-backoff (parameter semantics change). separate reasoning required for each cohort; no single repeated refactor pattern.

Suggested labels

bug

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Title check: ⚠️ Warning. title does not follow conventional commits format (missing type/scope prefix); should be 'rollup(fix): integrate open fix prs' or similar. Resolution: update title to 'rollup(fix): integrate open fix prs into single review branch' or use proper 'type(scope): summary' format with lowercase imperative.
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 42.86%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Description check: ✅ Passed. description is comprehensive and well-structured with summary, what changed, verification, and context, but lacks formal validation checklist completion.
  • Linked Issues check: ✅ Passed. the rollup successfully implements all three primary linked issues: #414 (double-exponential backoff fix with tests), #415 (wal newer-mtime handling), #419 (numeric tracker key invalidation and clearNumericKeysAtOrAbove helpers).
  • Out of Scope Changes check: ✅ Passed. all significant changes align with the stated objectives from linked issues; only the integration-only commit (7f50455) adds helper methods required by #419, plus test cleanup/removals and documentation updates are in scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
lib/request/rate-limit-backoff.ts (1)

319-336: ⚠️ Potential issue | 🟡 Minor

fresh attempt=1 now returns un-jittered delay while dedup'd duplicates return jittered delay — inconsistent observable output

at lib/request/rate-limit-backoff.ts:319-336, the fresh attempt === 1 && !isDuplicate branch feeds normalizedBaseDelay (un-jittered) into calculateBackoffMs, while both the duplicate-window path and subsequent attempts feed result.delayMs (which is the jittered value stored in rateLimitStateByAccountQuota). concretely:

  • caller A hits a 429, gets back normalizedBaseDelay * multiplier (no jitter).
  • caller B hits the same account/quota within dedupWindowMs, gets back jitteredBaseDelay * multiplier.

with RATE_LIMIT_BACKOFF_JITTER_FACTOR = 0.2 that is up to ±20% divergence between the seeding call and its dedup'd siblings, which defeats the whole point of dedup returning "the same" delay. the tests don't catch this because Math.random is mocked to 0.5 so jitter collapses to zero (see test/rate-limit-backoff.test.ts:18).

either apply jitter to the fresh path too, or strip jitter from the duplicate path — whichever you prefer, but the two branches should agree. suggestion: keep it simple and use result.delayMs for both, since dedup semantics demand consistency and getRateLimitBackoff already stored the "deterministic portion" via the Math.max(baseDelay, jitteredDelay) floor.

🔧 proposed fix
-	const normalizedBaseDelay = normalizeDelayMs(
-		resolvedServerRetryAfterMs,
-		1000,
-	);
-	// For the first fresh attempt, pass the un-jittered normalized base so the
-	// deterministic portion of the reason multiplier is visible to callers.
-	// For subsequent attempts (or duplicate-window hits), `result.delayMs`
-	// already encodes the exponential progression + jitter from
-	// `getRateLimitBackoff`, and `calculateBackoffMs` intentionally applies
-	// only the reason multiplier so the exponential is NOT double-applied
-	// (see audit finding REQ-HIGH-01).
-	const adjustedDelay = calculateBackoffMs(
-		result.attempt === 1 && !result.isDuplicate
-			? normalizedBaseDelay
-			: result.delayMs,
-		result.attempt,
-		resolvedReason,
-	);
+	// `result.delayMs` already encodes the exponential + jitter produced by
+	// `getRateLimitBackoff`; `calculateBackoffMs` applies only the reason
+	// multiplier so the exponential is not double-applied (REQ-HIGH-01).
+	// Using `result.delayMs` uniformly keeps dedup'd siblings in agreement
+	// with the seeding call.
+	const adjustedDelay = calculateBackoffMs(
+		result.delayMs,
+		result.attempt,
+		resolvedReason,
+	);

and add a regression test in test/rate-limit-backoff.test.ts with a non-0.5 Math.random to assert the fresh call and a dedup'd call return the same delayMs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/request/rate-limit-backoff.ts` around lines 319 - 336, The branch in
calculateBackoffMs currently feeds an un-jittered normalizedBaseDelay when
result.attempt === 1 && !result.isDuplicate but uses result.delayMs (a jittered
value) for duplicates/subsequent attempts, causing inconsistent outputs; change
the condition in lib/request/rate-limit-backoff.ts so both fresh and duplicate
paths pass result.delayMs into calculateBackoffMs (i.e., always use the stored
jittered seed from rateLimitStateByAccountQuota/getRateLimitBackoff), and add a
regression test in test/rate-limit-backoff.test.ts that sets Math.random to a
non-0.5 value and asserts the fresh call and its dedup'd sibling return
identical delayMs.
lib/unified-settings.ts (1)

184-192: ⚠️ Potential issue | 🟠 Major

make sync EBUSY/EPERM retries real, or fail fast.

after removing the wait, lib/unified-settings.ts:184 and lib/unified-settings.ts:342 retry in a tight loop. a windows lock that clears a few ms later will still exhaust all attempts immediately, while test/unified-settings.test.ts:498 and test/unified-settings.test.ts:533 only cover a one-shot mock failure. use a real bounded sync backoff strategy or stop treating the sync path as retrying transient locks. As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency.

Also applies to: 342-351

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/unified-settings.ts` around lines 184 - 192, The tight retry loop around
the synchronous copy (copyFileSync using UNIFIED_SETTINGS_PATH ->
UNIFIED_SETTINGS_BACKUP_PATH guarded by isRetryableFsError) currently spins
without delay and should be fixed: either remove retries and fail fast for the
synchronous path, or convert the operation to an async retry with a bounded
exponential/backoff (e.g., use fs.promises.copyFile, await small increasing
delays between attempts) so transient Windows EBUSY/EPERM locks are actually
waited for; apply the same change to the similar retry block at the second
location (the retry loop around
UNIFIED_SETTINGS_PATH/UNIFIED_SETTINGS_BACKUP_PATH at the later block referenced
in the comment).
test/codex-manager-cli.test.ts (1)

932-945: ⚠️ Potential issue | 🟠 Major

make this status test control the storage-health state.

test/codex-manager-cli.test.ts:933 still returns null, but lib/codex-manager/commands/status.ts:43-65 treats null as the intentional-reset path when health is empty. the new healthy assertion at test/codex-manager-cli.test.ts:944 depends on the real inspectStorageHealth() filesystem probe because the storage mock at test/codex-manager-cli.test.ts:156-187 does not override it. use an empty storage object and stub health explicitly, or keep null and assert the reset output.

proposed test fix
 it("prints empty account status for auth list", async () => {
-	loadAccountsMock.mockResolvedValueOnce(null);
+	loadAccountsMock.mockResolvedValueOnce({
+		version: 3,
+		activeIndex: 0,
+		activeIndexByFamily: { codex: 0 },
+		accounts: [],
+	});
 	const logSpy = vi.spyOn(console, "log").mockImplementation(() => {});
+	const storageModule = await import("../lib/storage.js");
+	vi.spyOn(storageModule, "inspectStorageHealth").mockResolvedValueOnce({
+		state: "healthy",
+		path: "/mock/openai-codex-accounts.json",
+		resetMarkerPath: "/mock/openai-codex-accounts.json.reset",
+		walPath: "/mock/openai-codex-accounts.json.wal",
+		hasResetMarker: false,
+		hasWal: false,
+	});
 	const { runCodexMultiAuthCli } = await import("../lib/codex-manager.js");

As per coding guidelines, test/**: tests must stay deterministic and use vitest.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-manager-cli.test.ts` around lines 932 - 945, The test currently
returns null from loadAccountsMock which triggers the "intentional-reset" branch
in inspectStorageHealth; make the test deterministic by returning an empty
storage object and stubbing the storage health probe: change
loadAccountsMock.mockResolvedValueOnce(null) to
loadAccountsMock.mockResolvedValueOnce({}) and mock inspectStorageHealth to
return "healthy" (using vi.spyOn or the existing mock for inspectStorageHealth)
so runCodexMultiAuthCli(["auth","list"]) yields the expected "Storage health:
healthy" log; alternatively, keep null but update assertions to expect the reset
output path instead of "healthy". Ensure you reference runCodexMultiAuthCli,
loadAccountsMock, and inspectStorageHealth when applying the fix.
test/server.unit.test.ts (1)

46-55: 🛠️ Refactor suggestion | 🟠 Major

mockserver type is missing _lastState; test relies on untyped writes.

test/server.unit.test.ts:46-49 types mockServer with _lastCode only, but test/server.unit.test.ts:315 writes mockServer._lastState = 'other-state'. under strict typechecking this should be a property-access error — it only passes today because the inline vi.mock factory at test/server.unit.test.ts:11-17 returns an object that flows through as unknown as ... casts. please align the typed shape with the real server contract in lib/auth/server.ts:69-72 so a future rename (e.g. _lastState → _storedState) breaks this test loudly instead of silently no-opping the regression.

proposed fix
 	let mockServer: ReturnType<typeof http.createServer> & {
 		_handler?: (req: IncomingMessage, res: ServerResponse) => void;
 		_lastCode?: string;
+		_lastState?: string;
 	};

and mirror the same field in the vi.mock('node:http', ...) factory so the mock surface matches lib/auth/server.ts.

As per coding guidelines: "tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior."

Also applies to: 303-322

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/server.unit.test.ts` around lines 46 - 55, The test's mockServer type
only declares _lastCode but tests assign mockServer._lastState; update the mock
type and the vi.mock('node:http', ...) factory to include the same stored-state
field(s) used by the real server (match lib/auth/server.ts's fields, e.g. add
_lastState or _storedState alongside _lastCode) so the mock surface exactly
mirrors the real server contract (adjust the type declaration for mockServer and
the returned object from the inline vi.mock factory to include the same property
name(s) found in lib/auth/server.ts:69-72).
lib/ui/select.ts (1)

500-523: ⚠️ Potential issue | 🟠 Major

move global q check after custom onInput handler so auth-menu can override cancel behavior.

lib/ui/select.ts:502-505 checks for q and finishes before calling options.onInput at line 506. this blocks lib/ui/auth-menu.ts:548-575 from handling its own q -> { type: "cancel" } flow. the code should let custom handlers see q first, then fall back to global cancel for prompts without onInput.

test/select.test.ts:102-119 covers q without onInput (correctly expects null). test/auth-menu-hotkeys.test.ts:65-95 mocks select entirely, so it doesn't exercise the real select.ts code path. add a regression test in test/select.test.ts that provides an onInput handler that ignores q, then confirms the global fallback still triggers.

also wrap the default: case in braces to fix biome noSwitchDeclarations warning at lib/ui/select.ts:500.

proposed fix
 				case "escape-start":
 					if (options.allowEscape !== false) {
 						escapeTimeout = setTimeout(() => finish(null), ESCAPE_TIMEOUT_MS);
 					}
 					return;
-				default:
+				default: {
 					const hotkey = decodeHotkeyInput(data);
-					if (hotkey && hotkey.toLowerCase() === "q") {
-						finish(null);
-						return;
-					}
 					if (options.onInput) {
 						if (hotkey) {
 							rerenderRequested = false;
 							const result = options.onInput(hotkey, {
 								cursor,
 								items,
 								requestRerender,
 							});
 							if (result !== undefined) {
 								finish(result);
 								return;
 							}
 							if (rerenderRequested) {
 								render();
 							}
 						}
 					}
+					if (hotkey && hotkey.toLowerCase() === "q") {
+						finish(null);
+						return;
+					}
 					return;
+				}
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/ui/select.ts` around lines 500 - 523, Move the global "q" cancel check so
custom handlers get the hotkey first: inside the select.ts default case (around
decodeHotkeyInput and options.onInput), call options.onInput(hotkey, ...) before
performing the global if (hotkey.toLowerCase() === "q") finish(null) fallback;
if onInput returns a defined result or handles rerenderRequested, act on that;
otherwise fall through to the global cancel for prompts without onInput. Also
wrap the default: case body in braces to satisfy the noSwitchDeclarations lint.
Add a regression test in test/select.test.ts that supplies an onInput which
ignores "q" (returns undefined) and asserts that the global fallback still
triggers, and another test that supplies an onInput which handles "q" to ensure
custom handlers can override cancel.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/auth/auth.ts`:
- Around line 69-159: The fallback text-scrub in sanitizeOAuthResponseBodyForLog
misses token-like substrings when rawBody isn't JSON/URL-encoded; call
scrubTokenLikeSubstrings on the final scrubbed string before returning (so
sanitizeOAuthResponseBodyForLog runs scrubTokenLikeSubstrings(scrubbed) as a
last pass), referencing scrubTokenLikeSubstrings,
sanitizeOAuthResponseBodyForLog, and
OAUTH_SENSITIVE_BODY_KEYS/redactSensitiveFields to locate the JSON-path handling
and final regex loop; also add a unit test in test/auth.test.ts that verifies an
upstream plain-text body with an RT_ch_... substring is redacted.

In `@lib/auth/server.ts`:
- Around line 69-80: Add a short comment in waitForCode (around the compare at
the expectedState check) explaining the invariant that trackedServer._lastState
is always equal to the closure-captured state written earlier in
startLocalOAuthServer, and that the expectedState compare is a defense‑in‑depth
guard against caller misuse (not an attacker-controlled path); additionally, to
eliminate a potential TOCTOU race if someone later inserts async work,
read/snapshot both trackedServer._lastCode and trackedServer._lastState into
local variables in a single synchronous read before doing the state compare and
code handling inside waitForCode (referencing trackedServer._lastCode,
trackedServer._lastState, startLocalOAuthServer, and waitForCode).

In `@lib/request/rate-limit-backoff.ts`:
- Around line 246-256: This change to calculateBackoffMs (renamed parameter
_attempt and removal of exponential factor) is a public API behavior change;
confirm PR `#414` includes an explicit semver-major bump or an explicit migration
note in the changelog/PR description documenting the new semantics and the
retained-but-unused attempt parameter; if the PR lacks such documentation, add a
clear migration entry describing the old exponential behavior vs new
multiplier-only behavior, update lib/index.ts and lib/request/index.ts export
notes and release notes, and ensure tests (test/rate-limit-backoff.test.ts) and
the internal caller getRateLimitBackoff are cross-referenced to show the
intentional fix.

In `@lib/storage/import-export.ts`:
- Around line 96-102: The import path currently uses handle.stat() and then
unbounded handle.readFile(), which allows files to grow or non-regular files to
stream unlimited bytes; update lib/storage/import-export.ts to first verify the
entry is a regular file via stats.isFile() (and throw if not), keep the existing
stats.size check against MAX_IMPORT_BYTES, and replace handle.readFile() with a
bounded read loop that reads at most MAX_IMPORT_BYTES + 1 bytes from the handle
and stops early; if the loop observes more than MAX_IMPORT_BYTES bytes, throw an
error referencing params.resolvedPath and MAX_IMPORT_BYTES. Also add a vitest
regression that stubs stat() to return a small size but makes the read path
produce > MAX_IMPORT_BYTES bytes to assert the new early-abort behavior.

In `@lib/unified-settings.ts`:
- Around line 398-418: The optimistic mtime check around UNIFIED_SETTINGS_PATH
using fs.stat and options.expectedMtimeMs is still vulnerable to a lost-update
race because the code re-checks mtime before fs.rename(tempPath,
UNIFIED_SETTINGS_PATH) but does not hold a lock across the read/modify/write
window; fix by acquiring a save lock around the entire read/modify/write (the
code path that calls the mtime check and the rename in this module), or
implement an interprocess file lock that is held from the moment you read stat
through the rename, ensuring the check and fs.rename(tempPath,
UNIFIED_SETTINGS_PATH) are atomic relative to other processes; alternatively, if
you choose best-effort behavior, update the function comment for this module and
add a regression test that injects a concurrent writer between the stat
(options.expectedMtimeMs check) and the fs.rename to assert ESTALE or document
the limitation—focus changes around the mtime check (options.expectedMtimeMs),
the rename call (fs.rename(tempPath, UNIFIED_SETTINGS_PATH)), and the retry loop
that wraps them.

In `@test/accounts.test.ts`:
- Around line 1969-1971: The test passes a third argument to the AccountManager
constructor which is ignored because AccountManager currently defines only a
two-parameter constructor; change the test to construct the manager with two
args and then call the instance method setRoutingMutexMode("enabled") on the
created AccountManager (reference symbols: AccountManager constructor and
setRoutingMutexMode) so the routing mutex is actually enabled at runtime and the
test exercises the enabled mutex path.

In `@test/auth.test.ts`:
- Around line 749-959: Add two regression tests: (1) mock fetch for
refreshAccessToken/exchangeAuthorizationCode to return a urlencoded error body
like "error=invalid_grant&refresh_token=RT_ch_..." and assert the returned
result.message (and any logged error via loggerModule.logError spy) does not
contain the raw token while preserving other fields; (2) mock fetch to return a
plain-text non-JSON/no-urlencoded body that contains "refresh_token=RT_ch_..."
(or just the token substring) and assert sanitizeOAuthResponseBodyForLog is
applied so result.message and any loggedArgs do not contain the raw token but
may contain non-sensitive text. Use existing helpers/names
sanitizeOAuthResponseBodyForLog, refreshAccessToken, exchangeAuthorizationCode
and the logError spy pattern already in the file to locate where to add these
tests.

In `@test/property/hybrid-selector-concurrency.property.test.ts`:
- Around line 91-135: The test records an observation after releasing the mutex,
causing a race; move the history append into the critical section so predecessor
visibility is observed deterministically. Inside the withRoutingMutex callback
(the async block passed to withRoutingMutex), after computing
sawPredecessorWrite and before the callback returns, push the observation into
pool.history (use the same SelectionObservation shape) and then return it; keep
the existing concurrent-call accounting in finally but ensure the push happens
before leaving the critical section. Apply the same change to the other test
blocks in this file that append pool.history outside the mutex (the other
occurrences of appending pool.history after withRoutingMutex) so all history
writes occur while the mutex is held.

In `@test/rate-limit-backoff.test.ts`:
- Around line 349-405: Add two deterministic regression tests around
getRateLimitBackoffWithReason: (1) a jitter-parity case that mocks Math.random
to a non-0.5 value, calls getRateLimitBackoffWithReason once (fresh attempt=1)
then again within the dedupWindowMs to assert isDuplicate===true and delayMs
equality (verifies calculateBackoffMs(seed, ...) vs
calculateBackoffMs(result.delayMs, ...) parity), and (2) a max-cap saturation
case that walks consecutive attempts with reason="quota" (3.0× multiplier) and
baseDelayMs small enough to force exponential growth, asserting delayMs is
monotonic non-decreasing and clamps at maxBackoffMs; ensure each test uses
vi.useFakeTimers()/vi.setSystemTime and mocks Math.random to keep determinism.

In `@test/recovery-storage.test.ts`:
- Around line 604-654: Pin Date.now() and Math.random() in the two failing tests
so the same-millisecond/same-random-path is reproduced: inside the "should
generate unique ids on repeat calls..." and the "should generate ids that sort
before real generatePartId ids..." tests, add vitest spies like vi.spyOn(Date,
"now").mockReturnValue(<fixedEpochMs>) and vi.spyOn(Math,
"random").mockReturnValue(<fixedRandom>) before calling
storage.prependThinkingPart / storage.generateThinkingPartId /
storage.generatePartId, and restore them after (mockRestore or in an afterEach)
so thinkingPartCounter behavior is deterministic and the assertions on IDs
always reproduce the regression.

In `@test/select.test.ts`:
- Around line 102-129: Add a deterministic vitest regression that verifies
onInput precedence for the "q" hotkey: in test/select.test.ts create a test that
calls select(...) with an options.onInput handler which detects input "q" and
returns a sentinel cancel object (e.g., a typed { cancelled: true } value), emit
"q" on stdin after the same timer delay, then assert that the resolved value
from select() is that sentinel (not null). Reference the existing select()
import and the onInput option to replicate the lib/ui/auth-menu.ts caller
behavior so the test fails if onInput is ignored.

In `@test/storage-import-export.test.ts`:
- Around line 3-20: The vi.mock factory is hoisted before the local const mocks
(existsSyncMock, statMock, readFileMock, closeMock, openMock) are initialized,
causing a "cannot access before initialization" error; move creation of those
mocks into vi.hoisted() so they are initialized before the hoisted vi.mock
factory, e.g., wrap each mock initializer with vi.hoisted(() => vi.fn()) and
then call vi.mock(...) as shown, and also add regression tests exercising
Windows filesystem behavior (path separators and case sensitivity) and
concurrency race conditions (simultaneous reads/writes) to the storage
import/export test suite.

In `@test/unified-settings.test.ts`:
- Around line 813-898: The test's injected concurrent write can keep the same
mtime as the temp write on coarse-mtime platforms, making ESTALE
nondeterministic; inside the writeSpy.mockImplementation that performs the
injected write (the block that calls originalWriteFile when
String(filePath).includes(".tmp")), after awaiting originalWriteFile for the
injected content call fs.utimes (or fs.promises.utimes) on
getUnifiedSettingsPath() to advance its mtime (e.g. Date.now()+1000) before
returning, so saveUnifiedPluginConfig's mtime check reliably sees a newer
on-disk record.
- Around line 533-570: The test should assert the backup copy was actually
retried by counting copyFileSync attempts and verifying the backup file exists
after saveUnifiedPluginConfigSync; update the mock in the test to increment a
counter (or set a flag) inside the doMock for node:fs's copyFileSync and after
calling saveUnifiedPluginConfigSync({ codexMode: true, retries: 1 }) assert that
the mock was called >1 time and that a backup snapshot file (derived from
getUnifiedSettingsPath()/settingsPath) was created; keep the existing
Atomics.wait spy and cleanup (vi.doUnmock, vi.resetModules,
atomicsWaitSpy.mockRestore) intact.

---

Outside diff comments:
In `@lib/request/rate-limit-backoff.ts`:
- Around line 319-336: The branch in calculateBackoffMs currently feeds an
un-jittered normalizedBaseDelay when result.attempt === 1 && !result.isDuplicate
but uses result.delayMs (a jittered value) for duplicates/subsequent attempts,
causing inconsistent outputs; change the condition in
lib/request/rate-limit-backoff.ts so both fresh and duplicate paths pass
result.delayMs into calculateBackoffMs (i.e., always use the stored jittered
seed from rateLimitStateByAccountQuota/getRateLimitBackoff), and add a
regression test in test/rate-limit-backoff.test.ts that sets Math.random to a
non-0.5 value and asserts the fresh call and its dedup'd sibling return
identical delayMs.

In `@lib/ui/select.ts`:
- Around line 500-523: Move the global "q" cancel check so custom handlers get
the hotkey first: inside the select.ts default case (around decodeHotkeyInput
and options.onInput), call options.onInput(hotkey, ...) before performing the
global if (hotkey.toLowerCase() === "q") finish(null) fallback; if onInput
returns a defined result or handles rerenderRequested, act on that; otherwise
fall through to the global cancel for prompts without onInput. Also wrap the
default: case body in braces to satisfy the noSwitchDeclarations lint. Add a
regression test in test/select.test.ts that supplies an onInput which ignores
"q" (returns undefined) and asserts that the global fallback still triggers, and
another test that supplies an onInput which handles "q" to ensure custom
handlers can override cancel.

In `@lib/unified-settings.ts`:
- Around line 184-192: The tight retry loop around the synchronous copy
(copyFileSync using UNIFIED_SETTINGS_PATH -> UNIFIED_SETTINGS_BACKUP_PATH
guarded by isRetryableFsError) currently spins without delay and should be
fixed: either remove retries and fail fast for the synchronous path, or convert
the operation to an async retry with a bounded exponential/backoff (e.g., use
fs.promises.copyFile, await small increasing delays between attempts) so
transient Windows EBUSY/EPERM locks are actually waited for; apply the same
change to the similar retry block at the second location (the retry loop around
UNIFIED_SETTINGS_PATH/UNIFIED_SETTINGS_BACKUP_PATH at the later block referenced
in the comment).

In `@test/codex-manager-cli.test.ts`:
- Around line 932-945: The test currently returns null from loadAccountsMock
which triggers the "intentional-reset" branch in inspectStorageHealth; make the
test deterministic by returning an empty storage object and stubbing the storage
health probe: change loadAccountsMock.mockResolvedValueOnce(null) to
loadAccountsMock.mockResolvedValueOnce({}) and mock inspectStorageHealth to
return "healthy" (using vi.spyOn or the existing mock for inspectStorageHealth)
so runCodexMultiAuthCli(["auth","list"]) yields the expected "Storage health:
healthy" log; alternatively, keep null but update assertions to expect the reset
output path instead of "healthy". Ensure you reference runCodexMultiAuthCli,
loadAccountsMock, and inspectStorageHealth when applying the fix.

In `@test/server.unit.test.ts`:
- Around line 46-55: The test's mockServer type only declares _lastCode but
tests assign mockServer._lastState; update the mock type and the
vi.mock('node:http', ...) factory to include the same stored-state field(s) used
by the real server (match lib/auth/server.ts's fields, e.g. add _lastState or
_storedState alongside _lastCode) so the mock surface exactly mirrors the real
server contract (adjust the type declaration for mockServer and the returned
object from the inline vi.mock factory to include the same property name(s)
found in lib/auth/server.ts:69-72).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d4bf5ace-6df9-4975-a4cc-434743b98d23

📥 Commits

Reviewing files that changed from the base of the PR and between 3f1c1fe and 7f50455.

📒 Files selected for processing (33)
  • lib/accounts.ts
  • lib/auth/auth.ts
  • lib/auth/server.ts
  • lib/codex-manager.ts
  • lib/codex-manager/auth-commands.ts
  • lib/codex-manager/commands/doctor.ts
  • lib/codex-manager/commands/fix.ts
  • lib/codex-manager/commands/verify-flagged.ts
  • lib/recovery/storage.ts
  • lib/request/rate-limit-backoff.ts
  • lib/request/response-handler.ts
  • lib/rotation.ts
  • lib/shutdown.ts
  • lib/storage/import-export.ts
  • lib/ui/select.ts
  • lib/unified-settings.ts
  • test/accounts.test.ts
  • test/auth.test.ts
  • test/codex-manager-auth-commands.test.ts
  • test/codex-manager-cli.test.ts
  • test/codex-manager-doctor-command.test.ts
  • test/codex-manager-fix-command.test.ts
  • test/codex-manager-verify-flagged-command.test.ts
  • test/documentation.test.ts
  • test/property/hybrid-selector-concurrency.property.test.ts
  • test/rate-limit-backoff.test.ts
  • test/recovery-storage.test.ts
  • test/response-handler-sse-buffer.test.ts
  • test/select.test.ts
  • test/server.unit.test.ts
  • test/shutdown.test.ts
  • test/storage-import-export.test.ts
  • test/unified-settings.test.ts
💤 Files with no reviewable changes (9)
  • test/documentation.test.ts
  • test/codex-manager-doctor-command.test.ts
  • lib/codex-manager/commands/fix.ts
  • test/codex-manager-auth-commands.test.ts
  • test/codex-manager-fix-command.test.ts
  • test/codex-manager-verify-flagged-command.test.ts
  • lib/codex-manager/commands/doctor.ts
  • lib/codex-manager/commands/verify-flagged.ts
  • lib/codex-manager/auth-commands.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/storage/import-export.ts
  • lib/codex-manager.ts
  • lib/shutdown.ts
  • lib/ui/select.ts
  • lib/recovery/storage.ts
  • lib/rotation.ts
  • lib/auth/server.ts
  • lib/accounts.ts
  • lib/auth/auth.ts
  • lib/request/rate-limit-backoff.ts
  • lib/unified-settings.ts
  • lib/request/response-handler.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/server.unit.test.ts
  • test/shutdown.test.ts
  • test/recovery-storage.test.ts
  • test/rate-limit-backoff.test.ts
  • test/accounts.test.ts
  • test/unified-settings.test.ts
  • test/codex-manager-cli.test.ts
  • test/response-handler-sse-buffer.test.ts
  • test/property/hybrid-selector-concurrency.property.test.ts
  • test/storage-import-export.test.ts
  • test/auth.test.ts
  • test/select.test.ts
🪛 ast-grep (0.42.1)
test/recovery-storage.test.ts

[warning] 586-588: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${partDir.replace(/\\/g, "\\\\")}[\\\\/]prt_0000000000_thinking_[0-9a-f]+_[0-9a-z]+_[0-9a-z]+\\.json$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

lib/auth/auth.ts

[warning] 151-151: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(("${key}"\\s*:\\s*)"[^"]*", "g")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 154-154: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp((^|[?&\\s])(${key}=)[^&\\s]+, "g")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🪛 Biome (2.4.11)
lib/ui/select.ts

[error] 501-501: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

(lint/correctness/noSwitchDeclarations)

🔇 Additional comments (13)
lib/shutdown.ts (1)

19-40: lgtm — concurrent runCleanup coalescing looks correct.

snapshotting cleanupFunctions into fns and truncating before the first await at lib/shutdown.ts:25-26 avoids re-running the same fns if a second caller arrives mid-flight, and the .finally() reset at lib/shutdown.ts:35-37 keeps the module usable for subsequent shutdown cycles (e.g. the test suite's beforeEach). errors are swallowed per-fn so one bad cleanup can't abort the rest — appropriate for shutdown.

one thing worth being aware of (not a blocker): any registerCleanup call made while a cleanup is in flight lands in the now-empty array and will only run on the next runCleanup() — fine for the signal path since process.exit(0) follows, but callers that register from within a cleanup fn won't see those fns executed in the same cycle. worth a one-liner in the jsdoc if you care.

test/shutdown.test.ts (1)

68-86: good concurrency regression — covers the exact bug.

test/shutdown.test.ts:78-82 pins down both invariants that matter: first === second proves promise coalescing, and toHaveBeenCalledTimes(1) proves the captured-snapshot path doesn't double-invoke. the blocker pattern keeps it deterministic without fake timers, which is the right call here.

minor nit, not blocking: consider also asserting that a third runCleanup() issued after await first returns a fresh promise (i.e. cleanupPromise was reset by .finally()). the beforeEach already exercises the reset path implicitly, but an explicit assertion would lock in the full lifecycle against future refactors.

lib/request/response-handler.ts (1)

769-803: lgtm — pre-append cap is correctly ordered and cancel-on-throw still holds.

the throw at lib/request/response-handler.ts:796-798 fires before chunks.push(decoded), so peak memory is bounded to MAX_SSE_SIZE rather than MAX_SSE_SIZE + chunk. the throw propagates through the outer try at lib/request/response-handler.ts:783 into the catch at lib/request/response-handler.ts:838, which still invokes reader.cancel(...) and then releaseLock() in finally — no reader-leak regression. using Buffer.byteLength(decoded, "utf8") (not value.byteLength) is a deliberate choice and is what the new utf-8 byte-counting test at test/response-handler-sse-buffer.test.ts:116-141 pins down.

nit, non-blocking: decoder.decode(value, { stream: true }) will hold back trailing partial multi-byte bytes until the next chunk (or a final decoder.decode() flush, which this loop still doesn’t issue). that means totalSize tracks decoded utf-8 bytes rather than wire bytes and can lag value.byteLength by up to 3 bytes across a chunk boundary. pre-existing behavior, not a regression from this PR — flagging only so it’s on the record.

test/response-handler-sse-buffer.test.ts (1)

1-141: solid regression coverage for REQ-HIGH-03 — deterministic, vitest-native, no real secrets.

the four cases (test/response-handler-sse-buffer.test.ts:49-79, 81-101, 103-114, 116-141) cover exactly the shapes the fix needs to hold: under-cap accumulation, overflow on nth chunk, overflow on 1st chunk, and utf-8 vs utf-16 byte accounting. MAX_SSE_SIZE_FOR_TEST drift is self-detected by the "just under cap" case since any drop in the real cap would push the 9MB filler over. reader.read.toHaveBeenCalledTimes(...) + reader.cancel assertions correctly nail down that no reads happen past the pre-append rejection.

two small nits (non-blocking):

  1. test/response-handler-sse-buffer.test.ts:130-132 uses a manual throw new Error(...) for the test-setup sanity check. that bypasses vitest’s failure reporter — prefer expect(byteLength).toBeGreaterThan(MAX_SSE_SIZE_FOR_TEST) + expect(decodedLengthEstimate).toBeLessThan(byteLength) so the failure surfaces as a normal assertion with diff.
  2. the "just under cap" case at test/response-handler-sse-buffer.test.ts:49-79 allocates ~9MB + json wrapping. fine on CI but consider tagging this file with a slightly higher per-test timeout if CI is ever run under memory pressure — not required today.

scope note per coding guidelines: windows FS and auth-rotation concurrency aren’t expected in this file since REQ-HIGH-03 targets SSE buffering only; the relevant regression surface is byte-accounting + cancel, which is covered.

lib/recovery/storage.ts (2)

151-176: looks good: synthetic thinking ids now preserve ordering and avoid same-process clobbering.

lib/recovery/storage.ts:171 keeps the prt_0000000000_thinking_ prefix, so findMessagesWithOrphanThinking still sees synthetic thinking first after sorting, while lib/recovery/storage.ts:172-175 makes repeat calls produce distinct ids. covered by test/recovery-storage.test.ts:604-654. As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.


410-416: looks good: prependThinkingPart no longer targets a fixed filename.

lib/recovery/storage.ts:416 removes the overwrite path for repeated recovery passes on the same messageID; each write now gets its own prt_0000000000_thinking_<...>.json target. covered by test/recovery-storage.test.ts:558-641. As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

lib/unified-settings.ts (1)

439-449: mtime helper fails closed.

lib/unified-settings.ts:439 returns null only for ENOENT and rethrows other filesystem errors, so permission and lock failures stay visible to callers.

lib/rotation.ts (1)

169-180: numeric tracker cleanup looks scoped correctly.

lib/rotation.ts:169 and lib/rotation.ts:324 only remove json-normalized numeric account slots at or above the removed index, while prefixed identity keys remain untouched. the vitest coverage at test/accounts.test.ts:2965 exercises numeric reindex cleanup, and test/accounts.test.ts:3743 keeps identity-keyed tracker state stable.

Also applies to: 324-335

lib/accounts.ts (2)

869-878: cursor sync matches the rotation invariant.

lib/accounts.ts:869 and lib/accounts.ts:913 now advance cursorByFamily to the successor slot after updating the active family index, so the next round-robin pass resumes after the just-switched account.

Also applies to: 913-919


1349-1367: numeric fallback keys are invalidated at the splice root.

lib/accounts.ts:1353 clears numeric tracker entries before reindexing, and lib/accounts.ts:1365 drops cached numeric _runtimeTrackerKey values. this prevents shifted refresh-only accounts from consulting stale slot health/token state.

test/accounts.test.ts (1)

2965-3078: numeric reindex regression is covered.

test/accounts.test.ts:2965 primes numeric _runtimeTrackerKey values, removes an earlier account, then verifies both health and token state no longer bleed into shifted numeric slots. this directly covers lib/accounts.ts:1353 and lib/accounts.ts:1365.

lib/codex-manager.ts (1)

1363-1364: strict state check lgtm.

lib/codex-manager.ts:1363-1364 correctly rejects manual-callback input that is missing state or whose state does not match the one bound to this flow, which mirrors the local server's waitForCode(expectedState) tightening in lib/auth/server.ts:102. no concerns.

lib/storage/import-export.ts (1)

8-8: the 4 mib import cap is a good boundary.

lib/storage/import-export.ts:8 matches the new vitest boundary coverage in test/storage-import-export.test.ts:37 and test/storage-import-export.test.ts:54.

Comment thread lib/auth/auth.ts
Comment on lines +69 to +159
const OAUTH_SENSITIVE_BODY_KEYS = [
	"refresh_token",
	"refreshToken",
	"access_token",
	"accessToken",
	"id_token",
	"idToken",
	"codeVerifier",
	"token",
	"code",
	"code_verifier",
] as const;

function scrubTokenLikeSubstrings(value: string): string {
	let scrubbed = value.replace(
		/(\b(?:refresh|access|id)[_-]?token\s*[:=]\s*)([^\s,;"'}]{8,})/gi,
		(_match, prefix) => `${prefix}***REDACTED***`,
	);
	scrubbed = scrubbed.replace(
		/\b(?:RT|AT)_ch_[A-Za-z0-9_-]{20,}\b/g,
		"***REDACTED***",
	);
	return scrubbed;
}

function redactSensitiveFields(value: unknown): unknown {
	if (Array.isArray(value)) {
		return value.map((item) => redactSensitiveFields(item));
	}
	if (value !== null && typeof value === "object") {
		const out: Record<string, unknown> = {};
		const sensitiveSet = new Set<string>(OAUTH_SENSITIVE_BODY_KEYS);
		for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
			if (sensitiveSet.has(k)) {
				out[k] = "***REDACTED***";
			} else {
				out[k] = redactSensitiveFields(v);
			}
		}
		return out;
	}
	if (typeof value === "string") {
		return scrubTokenLikeSubstrings(value);
	}
	return value;
}

/**
 * Scrub opaque tokens from a raw OAuth token-endpoint response body before
 * interpolating it into a log message.
 *
 * Error and success responses from `/oauth/token` may contain `refresh_token`,
 * `access_token`, or `id_token` values. ChatGPT refresh tokens are opaque
 * high-entropy strings that do NOT match the logger's `TOKEN_PATTERNS`
 * (JWT, long hex, `sk-*`, `Bearer <x>`), so they would be written verbatim
 * to disk log files when a status-body string is concatenated into a
 * `logError` message.
 *
 * Strategy:
 * 1. If the body parses as JSON, walk it and mask sensitive keys.
 * 2. Otherwise fall back to a targeted regex scrub of `"key":"value"` and
 *    `key=value` patterns for the known sensitive keys.
 *
 * The returned string is safe to interpolate into log messages.
 */
export function sanitizeOAuthResponseBodyForLog(rawBody: string): string {
	if (!rawBody) return rawBody;
	const trimmed = rawBody.trim();
	if (trimmed.length === 0) return rawBody;

	if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
		try {
			const parsed = JSON.parse(trimmed) as unknown;
			const redacted = redactSensitiveFields(parsed);
			return JSON.stringify(redacted);
		} catch {
			// Fall through to regex scrub for malformed JSON.
		}
	}

	let scrubbed = rawBody;
	for (const key of OAUTH_SENSITIVE_BODY_KEYS) {
		// "key":"value" style (JSON-like text)
		const jsonPattern = new RegExp(`("${key}"\\s*:\\s*)"[^"]*"`, "g");
		scrubbed = scrubbed.replace(jsonPattern, `$1"***REDACTED***"`);
		// key=value style (urlencoded / query-string)
		const urlPattern = new RegExp(`(^|[?&\\s])(${key}=)[^&\\s]+`, "g");
		scrubbed = scrubbed.replace(urlPattern, `$1$2***REDACTED***`);
	}
	return scrubbed;
}

⚠️ Potential issue | 🟡 Minor

sanitizer looks solid; one subtle gap worth covering.

lib/auth/auth.ts:134-159 does the right thing: structured json → recursive key redaction, otherwise a targeted regex scrub. two notes:

  1. ast-grep's redos hit on lib/auth/auth.ts:152 and lib/auth/auth.ts:155 is a false positive — OAUTH_SENSITIVE_BODY_KEYS is a frozen const array of ascii identifiers, not user input, so no regex injection surface here. no action.
  2. real gap: when the body is neither json nor urlencoded but is a loose text blob (e.g. a reverse proxy returns "upstream error: refresh_token=RT_ch_...; trace=..." with leading plain text), the json path is skipped (doesn't start with {/[) and the urlencoded regex at lib/auth/auth.ts:155 requires a ^|[?&\s] boundary before key=. a body like upstream error refresh_token=RT_ch_xxxx... with a space before refresh_token is covered by \s, but upstream:refresh_token=... (colon boundary) is not. the scrubTokenLikeSubstrings helper at lib/auth/auth.ts:82-92 would save you here for the RT_ch_/AT_ch_ shape, but it is only invoked from redactSensitiveFields on parsed-json strings, not on the raw-text fallback path.

please either (a) pipe the fallback output through scrubTokenLikeSubstrings as a final pass, or (b) loosen the boundary to also accept [:]. recommend (a) since it also catches opaque tokens that appear without a key= prefix.

proposed fix
 	let scrubbed = rawBody;
 	for (const key of OAUTH_SENSITIVE_BODY_KEYS) {
 		const jsonPattern = new RegExp(`("${key}"\\s*:\\s*)"[^"]*"`, "g");
 		scrubbed = scrubbed.replace(jsonPattern, `$1"***REDACTED***"`);
 		const urlPattern = new RegExp(`(^|[?&\\s])(${key}=)[^&\\s]+`, "g");
 		scrubbed = scrubbed.replace(urlPattern, `$1$2***REDACTED***`);
 	}
-	return scrubbed;
+	return scrubTokenLikeSubstrings(scrubbed);
 }

and add a test in test/auth.test.ts alongside the LIB-HIGH-001 regression suite covering a non-json/non-urlencoded body carrying an opaque RT_ch_... substring.

As per coding guidelines: "check for logging that leaks tokens or emails."


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/auth/auth.ts` around lines 69 - 159, The fallback text-scrub in
sanitizeOAuthResponseBodyForLog misses token-like substrings when rawBody isn't
JSON/URL-encoded; call scrubTokenLikeSubstrings on the final scrubbed string
before returning (so sanitizeOAuthResponseBodyForLog runs
scrubTokenLikeSubstrings(scrubbed) as a last pass), referencing
scrubTokenLikeSubstrings, sanitizeOAuthResponseBodyForLog, and
OAUTH_SENSITIVE_BODY_KEYS/redactSensitiveFields to locate the JSON-path handling
and final regex loop; also add a unit test in test/auth.test.ts that verifies an
upstream plain-text body with an RT_ch_... substring is redacted.

Comment thread lib/auth/server.ts
Comment on lines +69 to +80
const trackedServer = server as http.Server & {
_lastCode?: string;
_lastState?: string;
};
if (trackedServer._lastCode) {
logWarn(
"Duplicate OAuth callback received; preserving first authorization code",
);
return;
}
trackedServer._lastCode = code;
trackedServer._lastState = state;

🧹 Nitpick | 🔵 Trivial

state pinning is correct but partially redundant; worth a comment.

lib/auth/server.ts:49 already rejects callbacks whose state doesn't match the closure-captured state, so by the time _lastState is written at lib/auth/server.ts:80, it is provably equal to state. that means the expectedState check inside waitForCode at lib/auth/server.ts:116-121 only fires when the caller in lib/codex-manager.ts:1894 or the runtime flows passes a value different from what it handed to startLocalOAuthServer({ state }) — i.e. a caller-side bug, not an attacker-controlled path.

that's still a reasonable defense-in-depth check for the rollup of #418, but please add a short comment at lib/auth/server.ts:116 spelling out the invariant (_lastState is always equal to the server-closure state; the compare guards against caller misuse) so a future refactor doesn't conclude the check is dead code and delete it.

also, one concurrency nit worth calling out for windows/EBUSY-prone environments per the repo guidelines: waitForCode polls _lastCode every 100ms on a single-threaded event loop, so there's no TOCTOU here, but if anyone later adds an async step between the lastCode read at lib/auth/server.ts:114 and the state compare at :116, this becomes racy. leave a comment or snapshot both fields in one read:

suggested tightening
-							const lastCode = trackedServer._lastCode;
-							if (lastCode) {
-								if (trackedServer._lastState !== expectedState) {
+							const { _lastCode: lastCode, _lastState: lastState } =
+								trackedServer;
+							if (lastCode) {
+								if (lastState !== expectedState) {
 									logWarn(
 										"Discarding OAuth callback due to state mismatch in waitForCode",
 									);
 									return null;
 								}
 								return { code: lastCode };
 							}

As per coding guidelines: "focus on auth rotation, windows filesystem IO, and concurrency."

Also applies to: 102-128, 148-148

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/auth/server.ts` around lines 69 - 80, Add a short comment in waitForCode
(around the compare at the expectedState check) explaining the invariant that
trackedServer._lastState is always equal to the closure-captured state written
earlier in startLocalOAuthServer, and that the expectedState compare is a
defense‑in‑depth guard against caller misuse (not an attacker-controlled path);
additionally, to eliminate a potential TOCTOU race if someone later inserts
async work, read/snapshot both trackedServer._lastCode and
trackedServer._lastState into local variables in a single synchronous read
before doing the state compare and code handling inside waitForCode (referencing
trackedServer._lastCode, trackedServer._lastState, startLocalOAuthServer, and
waitForCode).

Comment on lines 246 to 256
export function calculateBackoffMs(
baseDelayMs: number,
_attempt: number,
reason: RateLimitReason = "unknown",
): number {
const multiplier = BACKOFF_MULTIPLIERS[reason] ?? 1.0;
return Math.min(
Math.max(0, Math.floor(baseDelayMs * multiplier)),
rateLimitBackoffConfig.maxBackoffMs,
);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# find all callers of calculateBackoffMs across the repo and the arguments they pass
rg -nP --type=ts -C3 '\bcalculateBackoffMs\s*\('

Repository: ndycode/codex-multi-auth

Length of output: 5757


🏁 Script executed:

# Check if calculateBackoffMs is re-exported in any barrel files or index files
rg -nP --type=ts '\bcalculateBackoffMs\b' --glob='**/index.ts' --glob='**/index.js' --glob='**/*.d.ts'

Repository: ndycode/codex-multi-auth

Length of output: 104


🏁 Script executed:

# Check package.json for exports field that might expose this function
cat package.json | jq '.exports // "no exports field"' 2>/dev/null || echo "checking manually..."

Repository: ndycode/codex-multi-auth

Length of output: 962


🏁 Script executed:

# Look for any comments or documentation about this being a public API
rg -nP --type=ts 'calculateBackoffMs' lib/request/rate-limit-backoff.ts -A5 -B5 | head -40

Repository: ndycode/codex-multi-auth

Length of output: 1212


🏁 Script executed:

# Check if calculateBackoffMs is exported from lib/request/index.ts
rg -nP 'calculateBackoffMs' lib/request/index.ts

Repository: ndycode/codex-multi-auth

Length of output: 50


🏁 Script executed:

# Check what IS exported from lib/request/index.ts
cat lib/request/index.ts

Repository: ndycode/codex-multi-auth

Length of output: 223


public api surface confirmed, but behavior change is documented as intentional fix

calculateBackoffMs is re-exported in the public API (lib/index.ts:13, lib/request/index.ts). the parameter rename to _attempt and behavior change from baseDelayMs * 2^(attempt-1) * multiplier to baseDelayMs * multiplier do constitute a breaking change for external consumers.

however, the JSDoc at lib/request/rate-limit-backoff.ts:243-245 explicitly documents that attempt is "retained for API compatibility" and the comment at lines 327-329 references audit finding REQ-HIGH-01, indicating this was a deliberate fix for a double-exponential bug that "saturated maxBackoffMs after ~2 retries."

only one internal call exists (lib/request/rate-limit-backoff.ts:330), which already receives a pre-exponential value from getRateLimitBackoff, so it correctly applies only the reason multiplier. all test cases in test/rate-limit-backoff.test.ts:152-210 explicitly validate the new behavior and the fix intent.

confirm whether this breaking change is covered by a semver bump or migration guide in the PR notes for #414, since external consumers relying on the old exponential progression would break silently.
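to make the semantic delta concrete for external consumers, here is a standalone sketch of the old vs new formulas. the constant values (the BACKOFF_MULTIPLIERS entries and maxBackoffMs) are illustrative assumptions, not the repo's real config:

```typescript
// Illustrative only: these constants are assumptions, not the repo's real config.
const BACKOFF_MULTIPLIERS: Record<string, number> = { "rate-limit": 2.0, unknown: 1.0 };
const maxBackoffMs = 60_000;

// old semantics: exponential growth applied on top of the caller's value,
// which double-compounded once the caller pre-scaled baseDelayMs
function oldBackoff(baseDelayMs: number, attempt: number, reason = "unknown"): number {
	const multiplier = BACKOFF_MULTIPLIERS[reason] ?? 1.0;
	const exponentialDelay = baseDelayMs * Math.pow(2, attempt - 1);
	return Math.min(Math.floor(exponentialDelay * multiplier), maxBackoffMs);
}

// new semantics: reason multiplier only; attempt is retained but unused
function newBackoff(baseDelayMs: number, _attempt: number, reason = "unknown"): number {
	const multiplier = BACKOFF_MULTIPLIERS[reason] ?? 1.0;
	return Math.min(Math.max(0, Math.floor(baseDelayMs * multiplier)), maxBackoffMs);
}
```

for baseDelayMs = 1000 and attempt = 3 with the "rate-limit" multiplier above, the old form returns 8000 while the new form returns 2000 regardless of attempt — exactly the silent divergence an external caller relying on the exponential progression would see.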

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/request/rate-limit-backoff.ts` around lines 246 - 256, This change to
calculateBackoffMs (renamed parameter _attempt and removal of exponential
factor) is a public API behavior change; confirm PR `#414` includes an explicit
semver-major bump or an explicit migration note in the changelog/PR description
documenting the new semantics and the retained-but-unused attempt parameter; if
the PR lacks such documentation, add a clear migration entry describing the old
exponential behavior vs new multiplier-only behavior, update lib/index.ts and
lib/request/index.ts export notes and release notes, and ensure tests
(test/rate-limit-backoff.test.ts) and the internal caller getRateLimitBackoff
are cross-referenced to show the intentional fix.

Comment on lines +96 to +102
const stats = await handle.stat();
if (stats.size > MAX_IMPORT_BYTES) {
throw new Error(
`Import file exceeds maximum size of ${MAX_IMPORT_BYTES} bytes: ${params.resolvedPath}`,
);
}
content = await handle.readFile({ encoding: "utf-8" });

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp/repo 2>/dev/null || cd . && git ls-files lib/storage/ | head -20

Repository: ndycode/codex-multi-auth

Length of output: 698


🏁 Script executed:

cat -n lib/storage/import-export.ts | sed -n '1,120p'

Repository: ndycode/codex-multi-auth

Length of output: 4717


🏁 Script executed:

cat -n test/storage-import-export.test.ts | sed -n '37,108p'

Repository: ndycode/codex-multi-auth

Length of output: 2594


🏁 Script executed:

rg "MAX_IMPORT_BYTES" -n

Repository: ndycode/codex-multi-auth

Length of output: 336


🏁 Script executed:

rg "handle\.(readFile|read)" -A 3 lib/storage/import-export.ts

Repository: ndycode/codex-multi-auth

Length of output: 226


🏁 Script executed:

cat -n test/storage-import-export.test.ts | sed -n '1,50p' | grep -E "(vitest|import|describe|it\()"

Repository: ndycode/codex-multi-auth

Length of output: 396


🏁 Script executed:

rg "isFile|isDirectory" lib/storage/import-export.ts

Repository: ndycode/codex-multi-auth

Length of output: 50


🏁 Script executed:

cat -n test/storage-import-export.test.ts | head -30

Repository: ndycode/codex-multi-auth

Length of output: 969


🏁 Script executed:

cat -n test/storage-import-export.test.ts | sed -n '30,52p'

Repository: ndycode/codex-multi-auth

Length of output: 850


🏁 Script executed:

rg "stat.*isFile\|isDirectory" lib/ -i

Repository: ndycode/codex-multi-auth

Length of output: 50


🏁 Script executed:

cat -n test/storage-import-export.test.ts | wc -l

Repository: ndycode/codex-multi-auth

Length of output: 71


🏁 Script executed:

rg "stat\(\)" lib/storage/import-export.ts -B 2 -A 10

Repository: ndycode/codex-multi-auth

Length of output: 460


enforce the cap while reading, not only before reading.

lib/storage/import-export.ts:96 checks stat.size, but lib/storage/import-export.ts:102 does unbounded readFile. a file can grow between stat() and readFile(), and non-regular files (device files, pipes, sockets) can report a small size while producing unbounded data. this leaves a concurrency and windows filesystem edge case outside test coverage at test/storage-import-export.test.ts:37-108.

add a regular-file check with !stats.isFile() before the size check. replace the unbounded readFile() with a bounded read loop that reads at most MAX_IMPORT_BYTES + 1 bytes and aborts as soon as the total exceeds MAX_IMPORT_BYTES.

add a vitest regression where stat() returns small size but the read path produces more than MAX_IMPORT_BYTES bytes. per coding guidelines, lib/** paths require focus on windows filesystem io and concurrency edge cases.

proposed bounded-read direction
 		const stats = await handle.stat();
+		if (!stats.isFile()) {
+			throw new Error(`Import path must be a regular file: ${params.resolvedPath}`);
+		}
 		if (stats.size > MAX_IMPORT_BYTES) {
 			throw new Error(
 				`Import file exceeds maximum size of ${MAX_IMPORT_BYTES} bytes: ${params.resolvedPath}`,
 			);
 		}
-		content = await handle.readFile({ encoding: "utf-8" });
+		const chunks: Buffer[] = [];
+		const buffer = Buffer.allocUnsafe(64 * 1024);
+		let totalBytes = 0;
+		let position = 0;
+		for (;;) {
+			const remaining = MAX_IMPORT_BYTES + 1 - totalBytes;
+			const { bytesRead } = await handle.read(
+				buffer,
+				0,
+				Math.min(buffer.length, remaining),
+				position,
+			);
+			if (bytesRead === 0) {
+				break;
+			}
+			totalBytes += bytesRead;
+			if (totalBytes > MAX_IMPORT_BYTES) {
+				throw new Error(
+					`Import file exceeds maximum size of ${MAX_IMPORT_BYTES} bytes: ${params.resolvedPath}`,
+				);
+			}
+			chunks.push(Buffer.from(buffer.subarray(0, bytesRead)));
+			position += bytesRead;
+		}
+		content = Buffer.concat(chunks, totalBytes).toString("utf-8");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage/import-export.ts` around lines 96 - 102, The import path
currently uses handle.stat() and then unbounded handle.readFile(), which allows
files to grow or non-regular files to stream unlimited bytes; update
lib/storage/import-export.ts to first verify the entry is a regular file via
stats.isFile() (and throw if not), keep the existing stats.size check against
MAX_IMPORT_BYTES, and replace handle.readFile() with a bounded read loop that
reads at most MAX_IMPORT_BYTES + 1 bytes from the handle and stops early; if the
loop observes more than MAX_IMPORT_BYTES bytes, throw an error referencing
params.resolvedPath and MAX_IMPORT_BYTES. Also add a vitest regression that
stubs stat() to return a small size but makes the read path produce >
MAX_IMPORT_BYTES bytes to assert the new early-abort behavior.

Comment thread lib/unified-settings.ts
Comment on lines +398 to 418
if (options && "expectedMtimeMs" in options) {
let currentMtimeMs: number | null = null;
try {
currentMtimeMs = (await fs.stat(UNIFIED_SETTINGS_PATH)).mtimeMs;
} catch (error) {
const code = (error as NodeJS.ErrnoException).code;
if (code !== "ENOENT") {
throw error;
}
}
if (currentMtimeMs !== options.expectedMtimeMs) {
const staleError = new Error(
"Unified settings changed on disk during save; retrying with latest state.",
) as NodeJS.ErrnoException;
staleError.code = "ESTALE";
throw staleError;
}
}
for (let attempt = 0; attempt < 5; attempt += 1) {
try {
await fs.rename(tempPath, UNIFIED_SETTINGS_PATH);

⚠️ Potential issue | 🟠 Major

the optimistic check is still a lost-update race.

lib/unified-settings.ts:398 validates the mtime, but lib/unified-settings.ts:418 renames later. any external writer that commits between those operations is overwritten without an ESTALE; the current regression at test/unified-settings.test.ts:843 only injects before this check. use the config save lock across read/modify/write, add an interprocess lock, or document this as best-effort and add a regression for the post-stat/pre-rename race. As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency.
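for the lock option, a minimal in-process promise-chain mutex sketch. withSaveLock and writeSettings are hypothetical names, not this module's real API, and cross-process writers would still need an interprocess lock (e.g. a lock file):

```typescript
// Hypothetical sketch: one in-process mutex held across the whole
// read/modify/write window, so the mtime check and the rename cannot
// interleave with another in-process saver.
let saveLock: Promise<void> = Promise.resolve();

function withSaveLock<T>(fn: () => Promise<T>): Promise<T> {
	const run = saveLock.then(fn);
	// keep the chain alive even when fn rejects
	saveLock = run.then(
		() => undefined,
		() => undefined,
	);
	return run;
}

async function writeSettings(mutate: (raw: string) => string): Promise<void> {
	await withSaveLock(async () => {
		// read + stat, mutate, write temp file, re-stat, rename all happen
		// inside the lock, so no other in-process saver can slip in between
		// the expectedMtimeMs check and the rename
	});
}
```

this serializes savers in one process; the best-effort mtime check would then only have to defend against external processes, which is what the interprocess-lock or documentation option covers.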

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/unified-settings.ts` around lines 398 - 418, The optimistic mtime check
around UNIFIED_SETTINGS_PATH using fs.stat and options.expectedMtimeMs is still
vulnerable to a lost-update race because the code re-checks mtime before
fs.rename(tempPath, UNIFIED_SETTINGS_PATH) but does not hold a lock across the
read/modify/write window; fix by acquiring a save lock around the entire
read/modify/write (the code path that calls the mtime check and the rename in
this module), or implement an interprocess file lock that is held from the
moment you read stat through the rename, ensuring the check and
fs.rename(tempPath, UNIFIED_SETTINGS_PATH) are atomic relative to other
processes; alternatively, if you choose best-effort behavior, update the
function comment for this module and add a regression test that injects a
concurrent writer between the stat (options.expectedMtimeMs check) and the
fs.rename to assert ESTALE or document the limitation—focus changes around the
mtime check (options.expectedMtimeMs), the rename call (fs.rename(tempPath,
UNIFIED_SETTINGS_PATH)), and the retry loop that wraps them.

Comment on lines +604 to 654
it("should generate unique ids on repeat calls so retries do not overwrite (RPTU-001)", () => {
// Simulate two recovery passes on the same messageID: each invocation
// must stage and rename a DISTINCT target file, proving the synthetic
// thinking part from the first pass is preserved.
const sessionID = "s";
const messageID = "m";
const partDir = join(PART_STORAGE, messageID);
fsMock.existsSync.mockReturnValue(true);

expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);
expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);

expect(fsMock.writeFileSync).toHaveBeenCalledTimes(2);
expect(fsMock.renameSync).toHaveBeenCalledTimes(2);

const [firstTemp, firstPayload] =
fsMock.writeFileSync.mock.calls[0] ?? [];
const [secondTemp, secondPayload] =
fsMock.writeFileSync.mock.calls[1] ?? [];
const [, firstTarget] = fsMock.renameSync.mock.calls[0] ?? [];
const [, secondTarget] = fsMock.renameSync.mock.calls[1] ?? [];

// Temp staging paths must differ (otherwise two writers would race).
expect(firstTemp).not.toBe(secondTemp);
// Final target paths MUST differ so the second pass does not
// overwrite the first synthetic thinking part.
expect(firstTarget).not.toBe(secondTarget);

// Both payloads must carry their own unique id matching their final
// target filename, so readers see two distinct parts on disk.
const firstParsed = JSON.parse(firstPayload);
const secondParsed = JSON.parse(secondPayload);
expect(firstParsed.id).toMatch(/^prt_0000000000_thinking_[0-9a-f]+_[0-9a-z]+_[0-9a-z]+$/);
expect(secondParsed.id).toMatch(/^prt_0000000000_thinking_[0-9a-f]+_[0-9a-z]+_[0-9a-z]+$/);
expect(firstParsed.id).not.toBe(secondParsed.id);
expect(firstTarget).toContain(`${firstParsed.id}.json`);
expect(secondTarget).toContain(`${secondParsed.id}.json`);
});

it("should generate ids that sort before real generatePartId ids so orphan detection still sees thinking first", () => {
// findMessagesWithOrphanThinking sorts parts by id and checks the
// first element. Synthetic thinking ids MUST sort lexicographically
// before any id from generatePartId(), which starts with
// `prt_<hex_timestamp>` where the hex timestamp's leading digit is
// non-zero for any real-world time.
const thinkingId = storage.generateThinkingPartId();
const realId = storage.generatePartId();

expect(thinkingId.startsWith("prt_0000000000_thinking_")).toBe(true);
expect([thinkingId, realId].sort()).toEqual([thinkingId, realId]);
});

⚠️ Potential issue | 🟡 Minor

make the id regression deterministic by freezing time and randomness.

test/recovery-storage.test.ts:604-654 should pin Date.now() and Math.random() so the test reproduces the same-millisecond/same-random case that thinkingPartCounter is meant to handle. otherwise, a broken counter could be masked by real time or random output. As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

proposed deterministic vitest shape
 it("should generate unique ids on repeat calls so retries do not overwrite (RPTU-001)", () => {
+	const nowSpy = vi.spyOn(Date, "now").mockReturnValue(1700000000000);
+	const randomSpy = vi.spyOn(Math, "random").mockReturnValue(0.123456789);
 	// Simulate two recovery passes on the same messageID: each invocation
 	// must stage and rename a DISTINCT target file, proving the synthetic
 	// thinking part from the first pass is preserved.
-	const sessionID = "s";
-	const messageID = "m";
-	const partDir = join(PART_STORAGE, messageID);
-	fsMock.existsSync.mockReturnValue(true);
+	try {
+		const sessionID = "s";
+		const messageID = "m";
+		const partDir = join(PART_STORAGE, messageID);
+		fsMock.existsSync.mockReturnValue(true);
 
-	expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);
-	expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);
+		expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);
+		expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);
 
-	expect(fsMock.writeFileSync).toHaveBeenCalledTimes(2);
-	expect(fsMock.renameSync).toHaveBeenCalledTimes(2);
+		expect(fsMock.writeFileSync).toHaveBeenCalledTimes(2);
+		expect(fsMock.renameSync).toHaveBeenCalledTimes(2);
 
-	const [firstTemp, firstPayload] =
-		fsMock.writeFileSync.mock.calls[0] ?? [];
-	const [secondTemp, secondPayload] =
-		fsMock.writeFileSync.mock.calls[1] ?? [];
-	const [, firstTarget] = fsMock.renameSync.mock.calls[0] ?? [];
-	const [, secondTarget] = fsMock.renameSync.mock.calls[1] ?? [];
+		const [firstTemp, firstPayload] =
+			fsMock.writeFileSync.mock.calls[0] ?? [];
+		const [secondTemp, secondPayload] =
+			fsMock.writeFileSync.mock.calls[1] ?? [];
+		const [, firstTarget] = fsMock.renameSync.mock.calls[0] ?? [];
+		const [, secondTarget] = fsMock.renameSync.mock.calls[1] ?? [];
 
-	// Temp staging paths must differ (otherwise two writers would race).
-	expect(firstTemp).not.toBe(secondTemp);
-	// Final target paths MUST differ so the second pass does not
-	// overwrite the first synthetic thinking part.
-	expect(firstTarget).not.toBe(secondTarget);
+		expect(firstTemp).not.toBe(secondTemp);
+		expect(firstTarget).not.toBe(secondTarget);
 
-	// Both payloads must carry their own unique id matching their final
-	// target filename, so readers see two distinct parts on disk.
-	const firstParsed = JSON.parse(firstPayload);
-	const secondParsed = JSON.parse(secondPayload);
-	expect(firstParsed.id).toMatch(/^prt_0000000000_thinking_[0-9a-f]+_[0-9a-z]+_[0-9a-z]+$/);
-	expect(secondParsed.id).toMatch(/^prt_0000000000_thinking_[0-9a-f]+_[0-9a-z]+_[0-9a-z]+$/);
-	expect(firstParsed.id).not.toBe(secondParsed.id);
-	expect(firstTarget).toContain(`${firstParsed.id}.json`);
-	expect(secondTarget).toContain(`${secondParsed.id}.json`);
+		const firstParsed = JSON.parse(firstPayload);
+		const secondParsed = JSON.parse(secondPayload);
+		expect(firstParsed.id).not.toBe(secondParsed.id);
+		expect(firstTarget).toContain(`${firstParsed.id}.json`);
+		expect(secondTarget).toContain(`${secondParsed.id}.json`);
+	} finally {
+		nowSpy.mockRestore();
+		randomSpy.mockRestore();
+	}
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/recovery-storage.test.ts` around lines 604 - 654, Pin Date.now() and
Math.random() in the two failing tests so the same-millisecond/same-random-path
is reproduced: inside the "should generate unique ids on repeat calls..." and
the "should generate ids that sort before real generatePartId ids..." tests, add
vitest spies like vi.spyOn(Date, "now").mockReturnValue(<fixedEpochMs>) and
vi.spyOn(Math, "random").mockReturnValue(<fixedRandom>) before calling
storage.prependThinkingPart / storage.generateThinkingPartId /
storage.generatePartId, and restore them after (mockRestore or in an afterEach)
so thinkingPartCounter behavior is deterministic and the assertions on IDs
always reproduce the regression.

Comment thread test/select.test.ts
Comment on lines +102 to +129
it("treats q hotkey as back/cancel", async () => {
const { select } = await import("../lib/ui/select.js");
const selectPromise = select(
[
{ label: "A", value: "a" },
{ label: "B", value: "b" },
],
{
message: "Pick",
},
);

await vi.advanceTimersByTimeAsync(130);
stdin.emit("data", Buffer.from("q", "utf8"));

const result = await selectPromise;
expect(result).toBeNull();
});

it("confirm returns false when q hotkey cancels", async () => {
const { confirm } = await import("../lib/ui/confirm.js");
const confirmPromise = confirm("Are you sure?");

await vi.advanceTimersByTimeAsync(130);
stdin.emit("data", Buffer.from("q", "utf8"));

await expect(confirmPromise).resolves.toBe(false);
});

⚠️ Potential issue | 🟡 Minor

add the onInput precedence regression for q.

test/select.test.ts:102 and test/select.test.ts:121 cover prompts with no custom onInput, but they do not reproduce the lib/ui/auth-menu.ts:548 caller that returns a typed cancel object for q. add a vitest case where onInput("q") returns a sentinel value and assert select() resolves to that value, not null.

proposed regression test
 	it("treats q hotkey as back/cancel", async () => {
 		const { select } = await import("../lib/ui/select.js");
 		const selectPromise = select(
 			[
@@
 		const result = await selectPromise;
 		expect(result).toBeNull();
 	});
+
+	it("lets onInput override the q hotkey", async () => {
+		const { select } = await import("../lib/ui/select.js");
+		const selectPromise = select(
+			[
+				{ label: "A", value: "a" },
+				{ label: "B", value: "b" },
+			],
+			{
+				message: "Pick",
+				onInput: (input) => {
+					if (input === "q") return "custom-cancel";
+					return undefined;
+				},
+			},
+		);
+
+		await vi.advanceTimersByTimeAsync(130);
+		stdin.emit("data", Buffer.from("q", "utf8"));
+
+		await expect(selectPromise).resolves.toBe("custom-cancel");
+	});
 
 	it("confirm returns false when q hotkey cancels", async () => {
 		const { confirm } = await import("../lib/ui/confirm.js");
 		const confirmPromise = confirm("Are you sure?");

As per coding guidelines, test/**: “tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/select.test.ts` around lines 102 - 129, Add a deterministic vitest
regression that verifies onInput precedence for the "q" hotkey: in
test/select.test.ts create a test that calls select(...) with an options.onInput
handler which detects input "q" and returns a sentinel cancel object (e.g., a
typed { cancelled: true } value), emit "q" on stdin after the same timer delay,
then assert that the resolved value from select() is that sentinel (not null).
Reference the existing select() import and the onInput option to replicate the
lib/ui/auth-menu.ts caller behavior so the test fails if onInput is ignored.

Comment on lines +3 to +20
const existsSyncMock = vi.fn();
const statMock = vi.fn();
const readFileMock = vi.fn();
const closeMock = vi.fn();
const openMock = vi.fn();

vi.mock("node:fs", () => ({
existsSync: existsSyncMock,
promises: {
stat: statMock,
readFile: readFileMock,
open: openMock,
mkdir: vi.fn(),
writeFile: vi.fn(),
rename: vi.fn(),
unlink: vi.fn(),
},
}));

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

vitest vi.mock hoisted factory vi.hoisted variables before initialization

💡 Result:

In Vitest, vi.mock calls are hoisted to the top of the test file and executed before any imports or variable initializations. This causes a "Cannot access 'variable' before initialization" error if the mock factory references variables defined outside vi.hoisted. To fix this, define mock functions or variables inside vi.hoisted before the vi.mock call. vi.hoisted(factory) executes the factory early (before imports and mocks) and returns its values for later use. Example from official docs and Stack Overflow: const { setExMock } = vi.hoisted(() => { return { setExMock: vi.fn() } }); vi.mock('redis', () => { return { createClient: () => ({ connect: vi.fn(), get: vi.fn(), setEx: setExMock /* now accessible */ }) } }). This pattern allows referencing hoisted mocks in the factory and reusing them in tests (e.g., expect(setExMock).toHaveBeenCalledWith(...)). Use vi.doMock instead if you need non-hoisted behavior, but it is not hoisted and only affects modules dynamically imported after the call. vi.hoisted is designed for side effects like mocks that must run before module evaluation.


🏁 Script executed:

# First, check if the file exists and read the relevant lines
head -30 test/storage-import-export.test.ts

Repository: ndycode/codex-multi-auth

Length of output: 759


🏁 Script executed:

# Read the full test file to check for regression tests and test coverage
wc -l test/storage-import-export.test.ts && echo "---" && cat -n test/storage-import-export.test.ts

Repository: ndycode/codex-multi-auth

Length of output: 3683


Move the mock functions into vi.hoisted.

test/storage-import-export.test.ts:9 uses a hoisted vi.mock factory that references const mocks declared at test/storage-import-export.test.ts:3-7. Vitest hoists the factory above those initializers, so the suite fails during module setup with a "Cannot access before initialization" ReferenceError instead of running its assertions.

The fix is to wrap mock creation in vi.hoisted() before the vi.mock() call:

vitest-safe mock setup
-const existsSyncMock = vi.fn();
-const statMock = vi.fn();
-const readFileMock = vi.fn();
-const closeMock = vi.fn();
-const openMock = vi.fn();
+const {
+	existsSyncMock,
+	statMock,
+	readFileMock,
+	closeMock,
+	openMock,
+} = vi.hoisted(() => ({
+	existsSyncMock: vi.fn(),
+	statMock: vi.fn(),
+	readFileMock: vi.fn(),
+	closeMock: vi.fn(),
+	openMock: vi.fn(),
+}));

Additionally, this test suite lacks regression cases for Windows filesystem behavior (path separators, case sensitivity) and concurrency races (simultaneous reads/writes). Add tests that exercise these edge cases per the coding guidelines.
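The failure mode above is ordinary temporal-dead-zone behavior and can be reproduced without vitest. A minimal sketch, where the hypothetical hoistedFactory stands in for a vi.mock factory that vitest has moved above the const initializers:

```typescript
// Simulates vi.mock hoisting: the function declaration is hoisted and
// runs before the const below it has been initialized.
let caught = "";
try {
	hoistedFactory(); // what vitest effectively does with a vi.mock factory
} catch (e) {
	caught = (e as Error).constructor.name;
}

function hoistedFactory() {
	return existsSyncMock; // TDZ access: const not yet initialized
}
const existsSyncMock = () => true;

console.log(caught); // "ReferenceError"
```

vi.hoisted avoids this by running its own factory even earlier, so the values it returns already exist when the mock factory executes.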

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/storage-import-export.test.ts` around lines 3 - 20, The vi.mock factory
is hoisted before the local const mocks (existsSyncMock, statMock, readFileMock,
closeMock, openMock) are initialized, causing a "cannot access before
initialization" error; move creation of those mocks into vi.hoisted() so they
are initialized before the hoisted vi.mock factory, e.g., wrap each mock
initializer with vi.hoisted(() => vi.fn()) and then call vi.mock(...) as shown,
and also add regression tests exercising Windows filesystem behavior (path
separators and case sensitivity) and concurrency race conditions (simultaneous
reads/writes) to the storage import/export test suite.

Comment on lines +533 to +570
it("retries sync backup snapshot copy on retryable fs errors without Atomics.wait", async () => {
	const { getUnifiedSettingsPath } = await import("../lib/unified-settings.js");
	const settingsPath = getUnifiedSettingsPath();
	await fs.mkdir(dirname(settingsPath), { recursive: true });
	await fs.writeFile(
		settingsPath,
		JSON.stringify({ version: 1, pluginConfig: { codexMode: true } }, null, 2),
		"utf8",
	);

	const atomicsWaitSpy = vi.spyOn(Atomics, "wait");
	vi.resetModules();
	vi.doMock("node:fs", async () => {
		const actual = await vi.importActual<typeof import("node:fs")>("node:fs");
		let first = true;
		return {
			...actual,
			copyFileSync: (...args: Parameters<typeof actual.copyFileSync>) => {
				if (first) {
					first = false;
					const error = new Error("perm") as NodeJS.ErrnoException;
					error.code = "EPERM";
					throw error;
				}
				return actual.copyFileSync(...args);
			},
		};
	});
	try {
		const { saveUnifiedPluginConfigSync } = await import("../lib/unified-settings.js");
		saveUnifiedPluginConfigSync({ codexMode: true, retries: 1 });
		expect(atomicsWaitSpy).not.toHaveBeenCalled();
	} finally {
		vi.doUnmock("node:fs");
		vi.resetModules();
		atomicsWaitSpy.mockRestore();
	}
});

⚠️ Potential issue | 🟡 Minor

Assert the backup copy was actually retried.

test/unified-settings.test.ts:533 can pass even if copyFileSync is never retried, because lib/unified-settings.ts:186 treats snapshot failure as best-effort and the save still succeeds. Count the attempts and verify the backup file was created. As per coding guidelines, test/**: tests must stay deterministic and use vitest.

proposed test tightening
 	it("retries sync backup snapshot copy on retryable fs errors without Atomics.wait", async () => {
 		const { getUnifiedSettingsPath } = await import("../lib/unified-settings.js");
 		const settingsPath = getUnifiedSettingsPath();
@@
 
 		const atomicsWaitSpy = vi.spyOn(Atomics, "wait");
+		let copyAttempts = 0;
 		vi.resetModules();
 		vi.doMock("node:fs", async () => {
 			const actual = await vi.importActual<typeof import("node:fs")>("node:fs");
-			let first = true;
 			return {
 				...actual,
 				copyFileSync: (...args: Parameters<typeof actual.copyFileSync>) => {
-					if (first) {
-						first = false;
+					copyAttempts += 1;
+					if (copyAttempts === 1) {
 						const error = new Error("perm") as NodeJS.ErrnoException;
 						error.code = "EPERM";
 						throw error;
 					}
@@
 		try {
 			const { saveUnifiedPluginConfigSync } = await import("../lib/unified-settings.js");
 			saveUnifiedPluginConfigSync({ codexMode: true, retries: 1 });
 			expect(atomicsWaitSpy).not.toHaveBeenCalled();
+			expect(copyAttempts).toBe(2);
+			const backupRecord = JSON.parse(
+				await fs.readFile(`${settingsPath}.bak`, "utf8"),
+			) as { pluginConfig?: Record<string, unknown> };
+			expect(backupRecord.pluginConfig).toEqual({ codexMode: true });
 		} finally {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/unified-settings.test.ts` around lines 533 - 570, The test should assert
the backup copy was actually retried by counting copyFileSync attempts and
verifying the backup file exists after saveUnifiedPluginConfigSync; update the
mock in the test to increment a counter (or set a flag) inside the doMock for
node:fs's copyFileSync and after calling saveUnifiedPluginConfigSync({
codexMode: true, retries: 1 }) assert that the mock was called >1 time and that
a backup snapshot file (derived from getUnifiedSettingsPath()/settingsPath) was
created; keep the existing Atomics.wait spy and cleanup (vi.doUnmock,
vi.resetModules, atomicsWaitSpy.mockRestore) intact.
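The attempt-counting pattern the suggestion relies on can be sketched independently of the real save path. Here copyWithRetry is a hypothetical stand-in for the snapshot copy loop, assuming one extra attempt is allowed:

```typescript
// Hypothetical retry loop: retries a copy operation up to `retries`
// extra times and returns how many attempts were made.
function copyWithRetry(copy: () => void, retries: number): number {
	let attempts = 0;
	for (;;) {
		attempts += 1;
		try {
			copy();
			return attempts;
		} catch (error) {
			if (attempts > retries) throw error;
		}
	}
}

// First call fails with an EPERM-style error, the second succeeds.
let first = true;
const attempts = copyWithRetry(() => {
	if (first) {
		first = false;
		throw Object.assign(new Error("perm"), { code: "EPERM" });
	}
}, 1);

console.log(attempts); // 2
```

Asserting on the counter (here, that attempts equals 2) is what distinguishes "the retry ran" from "the save merely tolerated the failure".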

Comment on lines +813 to +898
it("retries plugin section write against newer on-disk record to avoid stale overwrite", async () => {
	const {
		getUnifiedSettingsPath,
		saveUnifiedPluginConfig,
	} = await import("../lib/unified-settings.js");

	await fs.writeFile(
		getUnifiedSettingsPath(),
		JSON.stringify(
			{
				version: 1,
				pluginConfig: { codexMode: false, retries: 9 },
				dashboardDisplaySettings: {
					menuShowLastUsed: true,
					uiThemePreset: "green",
				},
				experimentalDraft: {
					enabled: false,
					panels: ["future"],
				},
			},
			null,
			2,
		),
		"utf8",
	);

	const originalWriteFile = fs.writeFile;
	const writeSpy = vi.spyOn(fs, "writeFile");
	let injectedConcurrentWrite = false;
	writeSpy.mockImplementation(async (...args: Parameters<typeof fs.writeFile>) => {
		const [filePath, data] = args;
		const result = await originalWriteFile(...args);
		if (!injectedConcurrentWrite && String(filePath).includes(".tmp")) {
			injectedConcurrentWrite = true;
			await originalWriteFile(
				getUnifiedSettingsPath(),
				`${JSON.stringify(
					{
						version: 1,
						pluginConfig: { codexMode: false, retries: 9 },
						dashboardDisplaySettings: {
							menuShowLastUsed: false,
							uiThemePreset: "purple",
						},
						experimentalDraft: {
							enabled: true,
							panels: ["fresh"],
						},
					},
					null,
					2,
				)}\n`,
				"utf8",
			);
		}
		return result;
	});

	try {
		await saveUnifiedPluginConfig({ codexMode: true, fetchTimeoutMs: 45_000 });
	} finally {
		writeSpy.mockRestore();
	}

	expect(injectedConcurrentWrite).toBe(true);
	const parsed = JSON.parse(
		await fs.readFile(getUnifiedSettingsPath(), "utf8"),
	) as {
		pluginConfig?: Record<string, unknown>;
		dashboardDisplaySettings?: Record<string, unknown>;
		experimentalDraft?: Record<string, unknown>;
	};
	expect(parsed.pluginConfig).toEqual({
		codexMode: true,
		fetchTimeoutMs: 45_000,
	});
	expect(parsed.dashboardDisplaySettings).toEqual({
		menuShowLastUsed: false,
		uiThemePreset: "purple",
	});
	expect(parsed.experimentalDraft).toEqual({
		enabled: true,
		panels: ["fresh"],
	});
});

⚠️ Potential issue | 🟡 Minor

Force a distinct mtime in the stale-write regression.

test/unified-settings.test.ts:843 injects the concurrent write immediately after the temp write. On Windows or coarse-mtime filesystems, the primary file's mtime can match the earlier value, so lib/unified-settings.ts:408 may not raise ESTALE deterministically. Force the injected record's mtime forward before returning from the mock. As per coding guidelines, test/**: tests must stay deterministic and use vitest.

proposed deterministic mtime update
 				await originalWriteFile(
 					getUnifiedSettingsPath(),
 					`${JSON.stringify(
@@
 					)}\n`,
 					"utf8",
 				);
+				const forcedMtime = new Date(Date.now() + 5_000);
+				await fs.utimes(getUnifiedSettingsPath(), forcedMtime, forcedMtime);
 			}
 			return result;
 		});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/unified-settings.test.ts` around lines 813 - 898, The test's injected
concurrent write can keep the same mtime as the temp write on coarse-mtime
platforms, making ESTALE nondeterministic; inside the
writeSpy.mockImplementation that performs the injected write (the block that
calls originalWriteFile when String(filePath).includes(".tmp")), after awaiting
originalWriteFile for the injected content call fs.utimes (or
fs.promises.utimes) on getUnifiedSettingsPath() to advance its mtime (e.g.
Date.now()+1000) before returning, so saveUnifiedPluginConfig's mtime check
reliably sees a newer on-disk record.
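The utimes trick can be checked in isolation. A minimal sketch, using a throwaway temp file rather than the real settings path:

```typescript
import { mkdtempSync, writeFileSync, statSync, utimesSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write a file, then force its mtime 5 seconds into the future, as the
// suggested fix does, so a staleness check always sees a newer record.
const file = join(mkdtempSync(join(tmpdir(), "mtime-")), "settings.json");
writeFileSync(file, "{}\n", "utf8");
const before = statSync(file).mtimeMs;

const forced = new Date(Date.now() + 5_000);
utimesSync(file, forced, forced);
const after = statSync(file).mtimeMs;

console.log(after > before); // true even on coarse-mtime filesystems
```

Because the mtime is set explicitly rather than left to the filesystem clock, the comparison no longer depends on timestamp granularity, which is the nondeterminism the review flags.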

@ndycode ndycode merged commit ad486c4 into main Apr 18, 2026
2 checks passed