rollup: integrate open fix PRs into a single review branch #432
Conversation
Deep audit finding REQ-HIGH-01 (.sisyphus/notepads/deep-audit/reports/request.json). getRateLimitBackoff() already applies baseDelay * 2^(attempt-1) + jitter, but getRateLimitBackoffWithReason() called calculateBackoffMs(), which applied 2^(attempt-1) AGAIN, causing delays to hit the 60s maxBackoffMs cap after just 2 consecutive 429s instead of following the documented exponential curve.

Fix: calculateBackoffMs now applies only the reason multiplier to the already-exponential delayMs. The cap via maxBackoffMs is preserved. Added a regression test asserting the 3-attempt delay stays well under the cap and that the attempt=1 to attempt=3 ratio is ~4x, not >16x.
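The corrected relationship can be sketched numerically (a minimal TypeScript sketch; the function names mirror the description above, but the bodies and the multiplier parameter are assumptions, and jitter is omitted for clarity):

```typescript
// Assumption: names mirror the commit message; real signatures may differ.
const MAX_BACKOFF_MS = 60_000; // documented 60s cap

// getRateLimitBackoff already applies the exponential curve exactly once.
function getBaseBackoffMs(baseDelayMs: number, attempt: number): number {
  return Math.min(baseDelayMs * 2 ** (attempt - 1), MAX_BACKOFF_MS);
}

// Fixed calculateBackoffMs: only the reason multiplier is applied to the
// already-exponential delayMs; the exponential is NOT applied a second time.
function calculateBackoffMs(delayMs: number, reasonMultiplier: number): number {
  return Math.min(delayMs * reasonMultiplier, MAX_BACKOFF_MS);
}

const attempt1 = calculateBackoffMs(getBaseBackoffMs(1000, 1), 1); // 1000
const attempt3 = calculateBackoffMs(getBaseBackoffMs(1000, 3), 1); // 4000
```

With the old double application, attempt 3 would have been 1000 * 2^2 * 2^2 = 16000, and the 60s cap would have been reached after only a couple more 429s; with the fix the attempt-1 to attempt-3 ratio is the documented 4x.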
Deep audit findings (2 CRITICAL) from .sisyphus/notepads/deep-audit/reports/codex-cli-manager.json. The following modules exist with full implementations plus dedicated test files but are NEVER imported by production code. The dispatcher in codex-manager.ts routes 'codex auth doctor/fix/verify-flagged' to runRepairDoctor/runRepairFix/runRepairVerifyFlagged from repair-commands.ts instead. auth-commands.ts similarly defines runAuthLogin, but codex-manager.ts uses its own inline implementation. Any behavior divergence between the parallel implementations goes uncaught: tests validate the ghost, not the ship.

Deleted:
- lib/codex-manager/commands/doctor.ts (dead, 568 LOC)
- lib/codex-manager/commands/fix.ts (dead, 564 LOC)
- lib/codex-manager/commands/verify-flagged.ts (dead, 450 LOC)
- lib/codex-manager/auth-commands.ts (dead, 885 LOC)
- test/codex-manager-doctor-command.test.ts (ghost, 238 LOC)
- test/codex-manager-fix-command.test.ts (ghost, 258 LOC)
- test/codex-manager-verify-flagged-command.test.ts (ghost, 330 LOC)
- test/codex-manager-auth-commands.test.ts (ghost, 952 LOC)

Also removed the documentation.test.ts existence and content assertions that pinned auth-commands.ts. The live switch.ts retains the identical 'Missing index. Usage: codex auth switch <index>' content assertion, so user-visible copy drift is still guarded. Production runDoctor/runFix/runVerifyFlagged in repair-commands.ts unchanged. Production inline runAuthLogin in codex-manager.ts unchanged. User-visible behavior: IDENTICAL (these modules never ran).

Verification (git grep on origin/main with production filter):
- runDoctorCommand: only defined in commands/doctor.ts, no production callers.
- runFixCommand: only defined in commands/fix.ts, no production callers.
- runVerifyFlaggedCommand: only defined in commands/verify-flagged.ts, no production callers.
- runAuthLogin (in auth-commands.ts): shadowed by the inline live one in codex-manager.ts.
- runSwitch / runBest (in auth-commands.ts): shadowed by runSwitchCommand/runBestCommand from commands/switch.ts and commands/best.ts.
- persistAndSyncSelectedAccount (in auth-commands.ts): shadowed by the inline live one in codex-manager.ts (referenced from commands/best.ts and commands/switch.ts via injected deps).
- withSerializedBestLiveRun / clampPreservedActiveIndexByFamily: only referenced internally within auth-commands.ts.

Test count change: 3529 -> 3492 (37 ghost tests removed; no real-coverage loss). Test file count: 232 -> 228 (-4 ghost test files). LOC removed: 4255 lines across 8 deleted files plus 10 modified lines in documentation.test.ts.
…-001) PR #395 redacted OAuth URL query params in user-facing output, but opaque refresh and access tokens can still leak via log messages when raw HTTP token-endpoint response bodies are concatenated into logError calls. The logger's TOKEN_PATTERNS recognises JWTs (eyJ...), long hex, sk-* API keys, and Bearer substrings, but ChatGPT refresh tokens are opaque strings that do NOT match those patterns. Key-based redaction only fires when a token sits under a SENSITIVE_KEYS object key, not when it is interpolated into a free-form message string.

Identified leak sites:
- lib/auth/auth.ts exchangeAuthorizationCode: logError uses the raw response body text after a failed /oauth/token request; the same text is returned verbatim as TokenResult.message, which downstream code (codex-manager.ts logError paths) also writes to log output.
- lib/auth/auth.ts refreshAccessToken: same pattern on refresh failure.

Fix (targeted, no logger hot-path change):
- Add a sanitizeOAuthResponseBodyForLog helper in lib/auth/auth.ts. It JSON-parses the body and masks sensitive keys (refresh_token, access_token, id_token, camelCase variants, token, code, code_verifier). On non-JSON bodies it falls back to regex scrubs for JSON-style key:value pairs and urlencoded key=value pairs.
- Route failed-refresh and failed-exchange bodies through the helper both into the logError message AND into the returned TokenResult.message so downstream consumers also see the sanitized form.

Regression tests in test/auth.test.ts assert:
- sanitizeOAuthResponseBodyForLog masks nested / camelCase / urlencoded / malformed-JSON token values while preserving non-sensitive fields and plain text.
- refreshAccessToken and exchangeAuthorizationCode never pass the opaque token string into logError.mock.calls, and the returned TokenResult.message does not contain the opaque token value.
- Non-sensitive content (error code, error_description) still reaches the log message.
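A simplified model of the helper's two paths (the helper name and key list come from the description above; this body is an illustrative assumption, not the shipped code — the real helper also handles camelCase variants and nested objects):

```typescript
// Assumed key list; trimmed for the sketch.
const SENSITIVE_BODY_KEYS = [
  "refresh_token",
  "access_token",
  "id_token",
  "token",
  "code",
  "code_verifier",
];

function sanitizeOAuthResponseBodyForLog(rawBody: string): string {
  try {
    // JSON path: mask values sitting under sensitive keys.
    const parsed = JSON.parse(rawBody) as Record<string, unknown>;
    for (const key of Object.keys(parsed)) {
      if (SENSITIVE_BODY_KEYS.includes(key)) parsed[key] = "[REDACTED]";
    }
    return JSON.stringify(parsed);
  } catch {
    // Non-JSON fallback: regex scrubs for JSON-style and urlencoded pairs.
    let scrubbed = rawBody;
    for (const key of SENSITIVE_BODY_KEYS) {
      scrubbed = scrubbed
        .replace(new RegExp(`("${key}"\\s*:\\s*")[^"]*(")`, "g"), "$1[REDACTED]$2")
        .replace(new RegExp(`(^|[?&])${key}=[^&\\s]*`, "g"), `$1${key}=[REDACTED]`);
    }
    return scrubbed;
  }
}
```

Anchoring the urlencoded pattern to the string start or a `?`/`&` delimiter is the tightening the later PR #417 follow-up describes: a bare `code=` inside a longer name like `device_code` is left alone.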
Verification:
- npm run typecheck: clean
- npm run lint: clean
- npm test: 232 files / 3539 tests pass
Addresses REQ-HIGH-03 deep-audit finding. Previous: fullText += decoder.decode(...) caused O(n^2) string concatenation in convertSseToJson; the MAX_SSE_SIZE check ran AFTER the append, so memory briefly held chunk + 10MB before throwing. Fix: accumulate chunks in a string[] array, track the running size, check the size BEFORE appending, and join() once at the end. Linear time, bounded memory, size check enforced pre-allocation. Test asserts the pre-append throw when the next chunk would exceed the cap.
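The shape of that fix, as a hedged standalone sketch (names follow the description; the real convertSseToJson decodes a stream, which this version skips):

```typescript
const MAX_SSE_SIZE = 10 * 1024 * 1024; // assumed 10 MiB cap, per "10MB" above

function joinSseChunks(chunks: string[]): string {
  const parts: string[] = [];
  let totalBytes = 0;
  for (const decoded of chunks) {
    const chunkBytes = Buffer.byteLength(decoded, "utf8");
    // Size check BEFORE append: never hold chunk + cap simultaneously.
    if (totalBytes + chunkBytes > MAX_SSE_SIZE) {
      throw new Error("SSE payload exceeds MAX_SSE_SIZE");
    }
    parts.push(decoded);
    totalBytes += chunkBytes;
  }
  return parts.join(""); // single O(n) concatenation at the end
}
```

Counting with Buffer.byteLength rather than String#length is the refinement the later PR #418 follow-up adds, so multi-byte UTF-8 chunks are measured in bytes.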
getRuntimeTrackerKey caches the resolved tracker key on the account via _runtimeTrackerKey. For accounts without accountId/email, the key falls back to the numeric account.index. When removeAccount splices the pool and reindexes survivors (acc.index = index), the cached numeric key is not invalidated, so later lookups keep consulting the pre-reindex position. Rotation, health, and token-bucket state for that account all mismatch the current pool order.

Fix: clear _runtimeTrackerKey on each survivor whose cached key was numeric, right after the reindex loop. Identity-based string keys remain stable because accountId/email are not affected by array-position changes, so they do not need invalidation.

Regression test exercises the remove-then-select path: a pool of four accounts (mixed refresh-only and identity-bearing) primes the runtime tracker, records a health failure under the pre-reindex key, removes the identity-bearing account at index 1, and asserts the surviving refresh-only account reports its new numeric index (1) instead of the stale cached value (2). The identity account's string key is verified to survive the reindex unchanged.

Audit reference: .sisyphus/notepads/deep-audit/reports/accounts-rotation.json HI-01
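The invalidation rule reads cleanly in isolation (the account shape and field names here are assumptions based on the description above):

```typescript
interface Account {
  index: number;
  accountId?: string;
  email?: string;
  _runtimeTrackerKey?: string | number; // cached tracker key
}

// After splicing the pool, reindex survivors and drop only the cached keys
// that were positional (numeric); identity-based string keys stay valid.
function reindexSurvivors(pool: Account[]): void {
  pool.forEach((acc, index) => {
    acc.index = index;
    if (typeof acc._runtimeTrackerKey === "number") {
      delete acc._runtimeTrackerKey; // stale pre-reindex position
    }
  });
}

// Simulated state after removing the account that held index 0.
const pool: Account[] = [
  { index: 1, _runtimeTrackerKey: 1 }, // refresh-only: numeric key
  { index: 2, accountId: "acct-a", _runtimeTrackerKey: "acct-a" },
];
reindexSurvivors(pool);
```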
markSwitched updated currentAccountIndexByFamily but left cursorByFamily pointing at the pre-switch position. Subsequent round-robin passes (getCurrentOrNextForFamily / getNextForFamily) started from the stale cursor and could either re-pick the same slot or skip the just-switched account entirely, dropping the caller's explicit switch intent after a rate-limit-triggered rotation.

Fix: both markSwitched and its mutex-serialized sibling markSwitchedLocked now advance cursorByFamily[family] to `(account.index + 1) % count`, matching the convention already used in getCurrentOrNextForFamilyHybrid and the inner loop of getCurrentOrNextForFamily. No-op when the pool is empty.

Adds a regression test that walks the cursor to a non-zero position, marks a different account as switched, and asserts the next rotation resumes AFTER the marked slot rather than from the stale cursor. Not covered by PR #399 (which normalized pointers on setAccountEnabled and getActiveIndexForFamily only).
… (RPTU-001) prependThinkingPart used a fixed part id (prt_0000000000_thinking), so two recovery passes on the same messageID would atomically overwrite the same synthetic thinking-part file, losing any prior state. Recovery paths retry on failure (recoverThinkingBlockOrder loops over orphan messages and the caller retries the request), so collisions were reachable in normal operation rather than only under hypothetical races.

Introduces generateThinkingPartId(), which keeps the prt_0000000000_thinking_ prefix so the synthetic part still sorts before real generatePartId() ids and preserves the prepend semantics relied on by findMessagesWithOrphanThinking and findMessageByIndexNeedingThinking. The id then appends a hex millisecond timestamp, a monotonic counter, and a short random suffix so every invocation writes a distinct file.

Adds a regression test that invokes prependThinkingPart twice on the same sessionID/messageID and asserts that both calls stage and rename DIFFERENT target paths with distinct embedded ids, plus a sort-order test that guards the prepend invariant against future id-format drift. No behavior change for first-pass recovery. Retries and repeated passes now accumulate synthetic thinking parts instead of silently clobbering the earlier one, which matches the observable file-per-part model used by readParts and stripThinkingParts.
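An illustrative generator with that shape (the prefix is quoted from the description; the exact encoding of timestamp, counter, and random tail here is an assumption, not the shipped format):

```typescript
let thinkingPartCounter = 0;

function generateThinkingPartId(): string {
  const ts = Date.now().toString(16); // hex millisecond timestamp
  const count = (thinkingPartCounter++).toString(36); // monotonic counter
  const rand = Math.random().toString(36).slice(2, 6); // short random suffix
  // Fixed prefix keeps the synthetic part sorting before real part ids.
  return `prt_0000000000_thinking_${ts}${count}${rand}`;
}

const first = generateThinkingPartId();
const second = generateThinkingPartId(); // same ms is fine: counter differs
```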
HI-05 deep audit finding flagged two missing tests:
1. selectHybridAccount returns null when all accounts unavailable
2. concurrency test exercising the hybrid selector under parallel mutation
Validation on origin/main found (1) already covered by test/rotation.test.ts
"returns null when all accounts are unavailable (AUDIT-H2 contract)".
The concurrency gap was real: rotation-concurrency.property.test.ts simulates
the critical-section invariants but never calls selectHybridAccount directly.
Add test/property/hybrid-selector-concurrency.property.test.ts covering:
- N parallel selectHybridAccount calls under withRoutingMutex("enabled") that
each mutate shared HealthScoreTracker / TokenBucketTracker / isAvailable
state, asserting no two winners return the same index.
- single-slot pool under N parallel callers yields exactly one winner; every
other caller observes the slot unavailable and receives null.
- N-slot pool saturates exactly once with overflow callers returning null.
- external concurrentCallers observer proves mutual exclusion holds.
Test-only change: no production code touched. Follows the pattern established
in test/property/rotation-concurrency.property.test.ts and uses
__resetRoutingMutexForTests for per-test isolation.
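The single-winner invariant those property tests assert can be demonstrated with a toy serializing mutex (a stand-in for withRoutingMutex; every name below is illustrative):

```typescript
// Promise-chain mutex: callers enter the critical section one at a time.
let chain: Promise<unknown> = Promise.resolve();

function withMutex<T>(fn: () => T): Promise<T> {
  const run = chain.then(fn);
  chain = run.catch(() => undefined); // keep the chain alive on errors
  return run;
}

// One-slot "pool": the first caller inside the critical section wins.
let slotAvailable = true;

function selectSlot(): Promise<number | null> {
  return withMutex(() => {
    if (!slotAvailable) return null; // losers observe the slot taken
    slotAvailable = false;
    return 0; // winner receives the only index
  });
}

const results = await Promise.all(
  Array.from({ length: 8 }, () => selectSlot()),
);
const winners = results.filter((r) => r !== null);
```

Eight parallel callers race, exactly one wins, and the other seven get null — the same shape as the single-slot case in the new property test.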
Addresses deep-audit auth finding on manual-paste state binding. The previous check only rejected on mismatch: if (parsed.state && parsed.state !== state) return null; which allowed a bare code with no state to bypass the state-binding check. Now the manual-paste path requires state presence and equality before the callback is accepted. Added regression test for missing state and updated stale auth-list expectation to match current main behavior.
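The before/after condition shapes, isolated (the guard expressions are from the description above; the surrounding callback type is an assumption):

```typescript
interface ParsedCallback {
  code: string;
  state?: string;
}

// Old guard: only rejected on MISMATCH, so an absent state passed.
function oldStateCheckPasses(parsed: ParsedCallback, expected: string): boolean {
  if (parsed.state && parsed.state !== expected) return false;
  return true; // bug: a bare code with no state slips through
}

// New guard: state must be present AND equal before the callback is accepted.
function newStateCheckPasses(parsed: ParsedCallback, expected: string): boolean {
  return parsed.state !== undefined && parsed.state === expected;
}

const bareCode: ParsedCallback = { code: "auth-code" }; // no state at all
```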
…E-002) Addresses STORAGE-002 from the deep storage audit. readImportFile previously went straight from existsSync() to fs.readFile() with no size guard, allowing arbitrarily large JSON imports. Add a 4 MiB MAX_IMPORT_BYTES check via fs.stat() before readFile() and reject oversized imports before allocating the file contents. Added focused unit tests covering oversized rejection and under-limit success.
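A sketch of the guard (the constant name is from the text; the function shape and demo file are assumptions): stat first, reject before the contents are ever read into memory.

```typescript
import { promises as fs } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const MAX_IMPORT_BYTES = 4 * 1024 * 1024; // 4 MiB

async function readImportFile(path: string): Promise<string> {
  const { size } = await fs.stat(path); // size guard before readFile
  if (size > MAX_IMPORT_BYTES) {
    throw new Error(`import exceeds ${MAX_IMPORT_BYTES} bytes`);
  }
  return fs.readFile(path, "utf8");
}

// Demo: an under-limit file succeeds, an oversized one is rejected.
const demo = join(tmpdir(), `import-guard-${process.pid}.json`);
await fs.writeFile(demo, "{}");
const smallOk = (await readImportFile(demo)) === "{}";
await fs.writeFile(demo, "x".repeat(MAX_IMPORT_BYTES + 1));
let oversizedRejected = false;
try {
  await readImportFile(demo);
} catch {
  oversizedRejected = true;
}
await fs.unlink(demo);
```

Note this stat-then-read sequence still has the TOCTOU window the later PR #425 review closes by moving both calls onto one file handle.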
Addresses LIB-HIGH-002 from the deep top-level lib audit. The sync unified-settings retry paths used Atomics.wait(new SharedArrayBuffer) as a synchronous sleep in two places:
- trySnapshotUnifiedSettingsBackupSync
- the writeSettingsRecordSync rename retry loop

This blocks the event loop under EBUSY/EPERM conditions. Per user decision, we keep the sync API and drop the blocking sleep instead of refactoring the sync path to async. Retries now occur immediately in the sync path, while the async path retains await sleep(...). Added regression tests asserting retry succeeds and Atomics.wait is never called for sync rename/copy retry paths.
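For context, the removed sleep looked like this pattern (a standalone reproduction, not the project's code): Atomics.wait on a throwaway SharedArrayBuffer is a spin-free synchronous sleep, but on the main thread it stalls the entire event loop for the full timeout.

```typescript
function blockingSleepMs(ms: number): void {
  const view = new Int32Array(new SharedArrayBuffer(4));
  // The value at index 0 stays 0, so this always times out after `ms`.
  Atomics.wait(view, 0, 0, ms);
}

const start = Date.now();
blockingSleepMs(25); // nothing else on this thread can run meanwhile
const elapsedMs = Date.now() - start;
```

Node (unlike browsers) permits Atomics.wait on the main thread, which is exactly why this compiles and quietly blocks timers, I/O callbacks, and concurrent requests.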
Addresses RPTU-003 from the deep recovery/UI audit. The UI help text advertised "Q Back" globally, but select() only handled arrow keys, enter, and escape. q/Q only worked if a caller manually wired onInput. confirm() uses select() and therefore silently ignored q. Fix: handle q/Q as a default cancel/back hotkey in select(). Added focused select() and confirm() regression tests.
Addresses the second auth HIGH finding from the deep audit. OAuthServerInfo.waitForCode(state) accepted an expected state parameter in the public contract and all callers passed it, but the implementation ignored the argument entirely and simply returned the first captured code. The HTTP callback handler already validated state before storing _lastCode, but waitForCode did not re-check the stored state, so the defense-in-depth contract was violated. Store _lastState alongside _lastCode and reject if it does not match the expectedState passed to waitForCode. Added a unit test covering the mismatch case and preserving current integration behavior.
…C-01) Addresses R-LOGIC-01 / R-TEST-01 from the deep runtime audit. The async unified-settings save paths read the current settings record, mutate one section, and write it back without checking whether another process wrote a newer settings.json in the meantime. That creates a stale-overwrite race. Fix: capture the observed settings mtime, check it immediately before rename inside writeSettingsRecordAsync, and on ESTALE re-read the latest settings record and re-apply the section-scoped mutation up to 3 times. Added a regression test that injects a concurrent external write between the read and write and verifies the unrelated sections from the external write are preserved while the intended pluginConfig mutation still lands.
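Abstractly, the optimistic-concurrency loop looks like this (the 3-attempt retry and stale-detection semantics are from the text; the in-memory store is a stand-in for settings.json and its mtime):

```typescript
type SettingsRecord = Record<string, unknown>;

interface SettingsStore {
  read(): { record: SettingsRecord; mtimeMs: number };
  // Writes only if the observed mtime still matches; false means stale.
  writeIfUnchanged(record: SettingsRecord, expectedMtimeMs: number): boolean;
}

function saveSection(store: SettingsStore, section: string, value: unknown): void {
  for (let attempt = 0; attempt < 3; attempt++) {
    const { record, mtimeMs } = store.read(); // capture observed mtime
    record[section] = value; // section-scoped mutation only
    if (store.writeIfUnchanged(record, mtimeMs)) return;
    // stale: an external write landed in between; re-read and re-apply
  }
  throw new Error("settings write lost the race 3 times");
}

// Tiny in-memory store for the demo below.
function makeMemoryStore(initial: SettingsRecord) {
  let state = { ...initial };
  let mtime = 0;
  return {
    read: () => ({ record: { ...state }, mtimeMs: mtime }),
    writeIfUnchanged(record: SettingsRecord, expected: number): boolean {
      if (expected !== mtime) return false; // simulated ESTALE
      state = record;
      mtime += 1;
      return true;
    },
    snapshot: () => ({ ...state }),
  };
}

const store = makeMemoryStore({ theme: "dark" });
saveSection(store, "pluginConfig", { enabled: true });
```

Because the mutation is re-applied to a freshly read record on each retry, unrelated sections written by the other process survive — the property the regression test checks.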
…-HIGH-003) Addresses the actionable part of LIB-HIGH-003 from the deep top-level lib audit. The original finding mixed three concerns: ordering, idempotence, and signal/ beforeExit semantics. The smallest safe fix is to eliminate the actual race: concurrent runCleanup() calls can currently double-run cleanup handlers. Fix: cache the in-flight cleanup promise and return it to overlapping callers. Preserve current FIFO ordering and current signal/beforeExit semantics to avoid changing the shutdown contract in this PR. Added a focused regression test that concurrent callers receive the same in-flight promise and only one cleanup run executes.
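The in-flight promise cache is a classic pattern and can be sketched minimally (names are assumptions; the real runCleanup runs FIFO handlers):

```typescript
let cleanupRuns = 0;
let inFlight: Promise<void> | null = null;

async function runHandlers(): Promise<void> {
  cleanupRuns += 1;
  await new Promise<void>((resolve) => setTimeout(resolve, 10)); // async work
}

function runCleanup(): Promise<void> {
  if (inFlight) return inFlight; // concurrent caller joins the current run
  inFlight = runHandlers().finally(() => {
    inFlight = null; // a later shutdown pass may legitimately run again
  });
  return inFlight;
}

const p1 = runCleanup();
const p2 = runCleanup(); // overlapping call: must be the SAME promise
const sharedRun = p1 === p2;
await Promise.all([p1, p2]);
```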
Follow-up for PR #419 review comments. Invalidate cached numeric tracker keys was not sufficient on its own: the rotation health/token tracker maps still retained entries keyed by the old numeric slots, so a later refresh-only survivor could inherit another account's state after reindex. Fix: clear numeric-keyed tracker entries at or above the removed index for both HealthScoreTracker and TokenBucketTracker, then keep the regression test fully numeric by making the shifted survivor refresh-only and asserting both health and token state are pristine after reindex.
Responds to PR #421 review feedback. The HI-02 regression test already covered the legacy markSwitched path. This adds the same assertion through the mutex-serialized markSwitchedLocked path with routingMutexMode enabled so the concurrency variant cannot drift from the sync behavior.
Responds to PR #417 review feedback. The urlencoded regex in sanitizeOAuthResponseBodyForLog matched bare code= substrings inside longer parameter names like device_code, causing non-sensitive fields to be redacted. Anchor the regex to the start of the string or a query/field delimiter so only full parameter names are scrubbed. Added regression coverage for _code suffix keys.
Responds to PR #418 review feedback. The pre-append SSE size guard used decoded.length, which counts UTF-16 code units rather than bytes. Multi-byte UTF-8 chunks (emoji, many CJK chars) could therefore exceed MAX_SSE_SIZE without tripping the guard at the right point. Use Buffer.byteLength(decoded, 'utf8') for both the pre-append check and the running total, and add a regression test covering a multi-byte payload.
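The unit mismatch is easy to see in isolation:

```typescript
// String#length counts UTF-16 code units; Buffer.byteLength counts bytes.
const sample = "🎉"; // U+1F389, a surrogate pair in UTF-16
const codeUnits = sample.length; // 2
const utf8Bytes = Buffer.byteLength(sample, "utf8"); // 4
```

A 10 MiB cap measured in code units would admit up to twice that many bytes for payloads dominated by characters like this one.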
Responds to PR #423 review feedback. Freeze Date.now/Math.random in the double-prepend regression so the test catches timestamp/counter shape regressions deterministically while still allowing the short random suffix to vary within the expected base36 shape.
#425) Responds to PR #425 review feedback.
- eliminate the stat/readFile TOCTOU gap by opening the import file once and performing both stat() and readFile() on the same handle
- add exact-boundary coverage at MAX_IMPORT_BYTES
- keep the oversized import rejection guarantee and verify the handle closes on both reject and success paths
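The single-handle pattern above can be sketched like this (the function name and demo are assumptions; only the open-once/stat/read/close structure is from the feedback):

```typescript
import { promises as fs } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const MAX_IMPORT_BYTES = 4 * 1024 * 1024;

async function readImportFileViaHandle(path: string): Promise<string> {
  const handle = await fs.open(path, "r");
  try {
    const { size } = await handle.stat(); // same handle as the read below
    if (size > MAX_IMPORT_BYTES) {
      throw new Error("import file too large");
    }
    return await handle.readFile({ encoding: "utf8" });
  } finally {
    await handle.close(); // runs on both the reject and success paths
  }
}

const demoPath = join(tmpdir(), `import-handle-${process.pid}.json`);
await fs.writeFile(demoPath, '{"ok":true}');
const roundTrip = await readImportFileViaHandle(demoPath);
await fs.unlink(demoPath);
```

Since both calls go through one FileHandle, a rename or replacement of the path between the size check and the read can no longer swap the contents being measured for different contents being read.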
… oauth bodies Responds to follow-up PR #417 review feedback.
- extend OAUTH_SENSITIVE_BODY_KEYS with codeVerifier
- recursively scrub token-like substrings inside parsed JSON string values so bodies like {"error_description":"... RT_ch_..."} do not leak
- add regression coverage for both leak shapes
…429) Responds to the PR #429 review comment about the optimistic mtime check. The previous patch captured the expected mtime in a separate fs.stat() after the read, leaving a TOCTOU window between reading the file contents and sampling its metadata. readSettingsRecordAsyncInternal now uses the mtime from the same open file handle used for the async read and threads that value through the ESTALE retry loop. Regression test remains green.
…429) Responds to the PR #429 review comment. The optimistic mtime check was taking expectedMtimeMs after the async read, leaving a window where an external write could update the file between the read and the mtime snapshot. Swap the order so the expected mtime is sampled first, then the record is read, and mirror that order on ESTALE retry. This preserves the existing settings contract and test behavior while closing the specific stale-overwrite hole called out in review.
The rollup integration exposed that PR #419's branch updates accounts.ts to call HealthScoreTracker.clearNumericKeysAtOrAbove() and TokenBucketTracker.clearNumericKeysAtOrAbove(), but the helper methods never landed in lib/rotation.ts on that branch. Add the two tiny helper methods directly on the rollup branch so the full integration set typechecks and the rollup PR is self-contained. PR #419 itself should receive the same patch before final merge.
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
📝 Walkthrough

Pull Request Summary: Rollup Integration of 16 Fix PRs

INTENTIONAL REFACTORING + CORE FIXES (LOW-TO-MEDIUM RISK): This rollup integrates fixes from PRs

Core Fixes (High Priority)
Security Fix
Additional Fixes
Refactoring: Command Module Consolidation (Intentional)

The four deleted command files were reorganized, not removed:
Test files were similarly consolidated:
CLI dispatch properly wired in
No loss of functionality; test coverage consolidated but ~800 lines reduced overall. The refactoring reduces API surface by consolidating related command logic.

Verification Status
Architectural Notes
Recommendation

Safe to merge. This rollup is a well-tested consolidation of 16 independent fixes plus an intentional refactoring that improves code organization. Core security, data-loss, and behavioral fixes are properly covered by new regression tests. The apparent large deletion is file reorganization, not feature removal.

Walkthrough

This PR makes coordinated changes across account management, OAuth security, request handling, recovery, storage, and CLI scaffolding. Key changes: (1) invalidates stale numeric runtime tracker keys after account removal reindexing to prevent health/token score bleed; (2) sanitizes OAuth error responses in logs to prevent token leakage; (3) validates OAuth state parameter in local callback flow; (4) fixes double-exponential backoff in rate-limit delay calculation; (5) optimizes SSE response buffering with size cap enforcement before appending chunks; (6) generates unique synthetic thinking part IDs to prevent overwrites in recovery; (7) removes entire auth-commands, doctor, fix, and verify-flagged command modules; (8) adds file-size limits to import functionality; (9) ensures concurrent unified settings writes via mtime-based retry; (10) adds 'q' hotkey quit support to UI selects.

Changes
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

high heterogeneity across independent bug fixes (backoff double-exponential, oauth state validation, runtime key invalidation), intricate patterns (mtime-based retry on concurrent writes, cursor synchronization with family-keyed tracking, mutex-protected selection), and large command module deletions spanning 15+ files. logic density is moderate-to-high in accounts/rotation (index reindex + tracker cleanup), unified-settings (mtime validation + retry), and rate-limit-backoff (parameter semantics change). separate reasoning required for each cohort; no single repeated refactor pattern.

Suggested labels
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
lib/request/rate-limit-backoff.ts (1)
319-336: ⚠️ Potential issue | 🟡 Minor

fresh attempt=1 now returns un-jittered delay while dedup'd duplicates return jittered delay — inconsistent observable output

at lib/request/rate-limit-backoff.ts:319-336, the fresh `attempt === 1 && !isDuplicate` branch feeds `normalizedBaseDelay` (un-jittered) into `calculateBackoffMs`, while both the duplicate-window path and subsequent attempts feed `result.delayMs` (which is the jittered value stored in `rateLimitStateByAccountQuota`). concretely:
- caller A hits a 429, gets back `normalizedBaseDelay * multiplier` (no jitter).
- caller B hits the same account/quota within `dedupWindowMs`, gets back `jitteredBaseDelay * multiplier`.

with `RATE_LIMIT_BACKOFF_JITTER_FACTOR = 0.2` that is up to ±20% divergence between the seeding call and its dedup'd siblings, which defeats the whole point of dedup returning "the same" delay. the tests don't catch this because `Math.random` is mocked to `0.5` so jitter collapses to zero (see `test/rate-limit-backoff.test.ts:18`).

either apply jitter to the fresh path too, or strip jitter from the duplicate path — whichever you prefer, but the two branches should agree. suggestion: keep it simple and use `result.delayMs` for both, since dedup semantics demand consistency and `getRateLimitBackoff` already stored the "deterministic portion" via the `Math.max(baseDelay, jitteredDelay)` floor.

🔧 proposed fix
```diff
-  const normalizedBaseDelay = normalizeDelayMs(
-    resolvedServerRetryAfterMs,
-    1000,
-  );
-  // For the first fresh attempt, pass the un-jittered normalized base so the
-  // deterministic portion of the reason multiplier is visible to callers.
-  // For subsequent attempts (or duplicate-window hits), `result.delayMs`
-  // already encodes the exponential progression + jitter from
-  // `getRateLimitBackoff`, and `calculateBackoffMs` intentionally applies
-  // only the reason multiplier so the exponential is NOT double-applied
-  // (see audit finding REQ-HIGH-01).
-  const adjustedDelay = calculateBackoffMs(
-    result.attempt === 1 && !result.isDuplicate
-      ? normalizedBaseDelay
-      : result.delayMs,
-    result.attempt,
-    resolvedReason,
-  );
+  // `result.delayMs` already encodes the exponential + jitter produced by
+  // `getRateLimitBackoff`; `calculateBackoffMs` applies only the reason
+  // multiplier so the exponential is not double-applied (REQ-HIGH-01).
+  // Using `result.delayMs` uniformly keeps dedup'd siblings in agreement
+  // with the seeding call.
+  const adjustedDelay = calculateBackoffMs(
+    result.delayMs,
+    result.attempt,
+    resolvedReason,
+  );
```
test/rate-limit-backoff.test.tswith a non-0.5Math.randomto assert the fresh call and a dedup'd call return the samedelayMs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/request/rate-limit-backoff.ts` around lines 319 - 336, The branch in calculateBackoffMs currently feeds an un-jittered normalizedBaseDelay when result.attempt === 1 && !result.isDuplicate but uses result.delayMs (a jittered value) for duplicates/subsequent attempts, causing inconsistent outputs; change the condition in lib/request/rate-limit-backoff.ts so both fresh and duplicate paths pass result.delayMs into calculateBackoffMs (i.e., always use the stored jittered seed from rateLimitStateByAccountQuota/getRateLimitBackoff), and add a regression test in test/rate-limit-backoff.test.ts that sets Math.random to a non-0.5 value and asserts the fresh call and its dedup'd sibling return identical delayMs.

lib/unified-settings.ts (1)
184-192: ⚠️ Potential issue | 🟠 Major

make sync EBUSY/EPERM retries real, or fail fast.
after removing the wait, lib/unified-settings.ts:184 and lib/unified-settings.ts:342 retry in a tight loop. a windows lock that clears a few ms later will still exhaust all attempts immediately, while test/unified-settings.test.ts:498 and test/unified-settings.test.ts:533 only cover a one-shot mock failure. use a real bounded sync backoff strategy or stop treating the sync path as retrying transient locks. As per coding guidelines, `lib/**`: focus on auth rotation, windows filesystem IO, and concurrency.

Also applies to: 342-351
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/unified-settings.ts` around lines 184 - 192, The tight retry loop around the synchronous copy (copyFileSync using UNIFIED_SETTINGS_PATH -> UNIFIED_SETTINGS_BACKUP_PATH guarded by isRetryableFsError) currently spins without delay and should be fixed: either remove retries and fail fast for the synchronous path, or convert the operation to an async retry with a bounded exponential/backoff (e.g., use fs.promises.copyFile, await small increasing delays between attempts) so transient Windows EBUSY/EPERM locks are actually waited for; apply the same change to the similar retry block at the second location (the retry loop around UNIFIED_SETTINGS_PATH/UNIFIED_SETTINGS_BACKUP_PATH at the later block referenced in the comment).

test/codex-manager-cli.test.ts (1)
932-945: ⚠️ Potential issue | 🟠 Major

make this status test control the storage-health state.
test/codex-manager-cli.test.ts:933 still returns `null`, but lib/codex-manager/commands/status.ts:43-65 treats `null` as the intentional-reset path when health is empty. the new `healthy` assertion at test/codex-manager-cli.test.ts:944 depends on the real `inspectStorageHealth()` filesystem probe because the storage mock at test/codex-manager-cli.test.ts:156-187 does not override it. use an empty storage object and stub health explicitly, or keep `null` and assert the reset output.

proposed test fix
```diff
 it("prints empty account status for auth list", async () => {
-  loadAccountsMock.mockResolvedValueOnce(null);
+  loadAccountsMock.mockResolvedValueOnce({
+    version: 3,
+    activeIndex: 0,
+    activeIndexByFamily: { codex: 0 },
+    accounts: [],
+  });
   const logSpy = vi.spyOn(console, "log").mockImplementation(() => {});
+  const storageModule = await import("../lib/storage.js");
+  vi.spyOn(storageModule, "inspectStorageHealth").mockResolvedValueOnce({
+    state: "healthy",
+    path: "/mock/openai-codex-accounts.json",
+    resetMarkerPath: "/mock/openai-codex-accounts.json.reset",
+    walPath: "/mock/openai-codex-accounts.json.wal",
+    hasResetMarker: false,
+    hasWal: false,
+  });
   const { runCodexMultiAuthCli } = await import("../lib/codex-manager.js");
```
test/**: tests must stay deterministic and use vitest.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/codex-manager-cli.test.ts` around lines 932 - 945, The test currently returns null from loadAccountsMock which triggers the "intentional-reset" branch in inspectStorageHealth; make the test deterministic by returning an empty storage object and stubbing the storage health probe: change loadAccountsMock.mockResolvedValueOnce(null) to loadAccountsMock.mockResolvedValueOnce({}) and mock inspectStorageHealth to return "healthy" (using vi.spyOn or the existing mock for inspectStorageHealth) so runCodexMultiAuthCli(["auth","list"]) yields the expected "Storage health: healthy" log; alternatively, keep null but update assertions to expect the reset output path instead of "healthy". Ensure you reference runCodexMultiAuthCli, loadAccountsMock, and inspectStorageHealth when applying the fix.

test/server.unit.test.ts (1)
46-55: 🛠️ Refactor suggestion | 🟠 Major

mockServer type is missing `_lastState`; test relies on untyped writes.
test/server.unit.test.ts:46-49 types `mockServer` with `_lastCode` only, but test/server.unit.test.ts:315 writes `mockServer._lastState = 'other-state'`. under strict typechecking this should be a property-access error — it only passes today because the inline `vi.mock` factory at test/server.unit.test.ts:11-17 returns an object that flows through `as unknown as ...` casts. please align the typed shape with the real server contract in lib/auth/server.ts:69-72 so a future rename (e.g. `_lastState` → `_storedState`) breaks this test loudly instead of silently no-opping the regression.

proposed fix
```diff
 let mockServer: ReturnType<typeof http.createServer> & {
   _handler?: (req: IncomingMessage, res: ServerResponse) => void;
   _lastCode?: string;
+  _lastState?: string;
 };
```
vi.mock('node:http', ...)factory so the mock surface matcheslib/auth/server.ts.As per coding guidelines: "tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior."
Also applies to: 303-322
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/server.unit.test.ts` around lines 46 - 55, The test's mockServer type only declares _lastCode but tests assign mockServer._lastState; update the mock type and the vi.mock('node:http', ...) factory to include the same stored-state field(s) used by the real server (match lib/auth/server.ts's fields, e.g. add _lastState or _storedState alongside _lastCode) so the mock surface exactly mirrors the real server contract (adjust the type declaration for mockServer and the returned object from the inline vi.mock factory to include the same property name(s) found in lib/auth/server.ts:69-72).

lib/ui/select.ts (1)
500-523: ⚠️ Potential issue | 🟠 Major

move global
`q` check after custom `onInput` handler so auth-menu can override cancel behavior.
qand finishes before calling options.onInput at line 506. this blocks lib/ui/auth-menu.ts:548-575 from handling its ownq->{ type: "cancel" }flow. the code should let custom handlers seeqfirst, then fall back to global cancel for prompts without onInput.test/select.test.ts:102-119 covers
qwithout onInput (correctly expects null). test/auth-menu-hotkeys.test.ts:65-95 mocks select entirely, so it doesn't exercise the real select.ts code path. add a regression test in test/select.test.ts that provides an onInput handler that ignoresq, then confirms the global fallback still triggers.also wrap the
default:case in braces to fix biome noSwitchDeclarations warning at lib/ui/select.ts:500.proposed fix
```diff
     case "escape-start":
       if (options.allowEscape !== false) {
         escapeTimeout = setTimeout(() => finish(null), ESCAPE_TIMEOUT_MS);
       }
       return;
-    default:
+    default: {
       const hotkey = decodeHotkeyInput(data);
-      if (hotkey && hotkey.toLowerCase() === "q") {
-        finish(null);
-        return;
-      }
       if (options.onInput) {
         if (hotkey) {
           rerenderRequested = false;
           const result = options.onInput(hotkey, {
             cursor,
             items,
             requestRerender,
           });
           if (result !== undefined) {
             finish(result);
             return;
           }
           if (rerenderRequested) {
             render();
           }
         }
       }
+      if (hotkey && hotkey.toLowerCase() === "q") {
+        finish(null);
+        return;
+      }
       return;
+    }
   }
```
Verify each finding against the current code and only fix it if needed. In `@lib/ui/select.ts` around lines 500 - 523, Move the global "q" cancel check so custom handlers get the hotkey first: inside the select.ts default case (around decodeHotkeyInput and options.onInput), call options.onInput(hotkey, ...) before performing the global if (hotkey.toLowerCase() === "q") finish(null) fallback; if onInput returns a defined result or handles rerenderRequested, act on that; otherwise fall through to the global cancel for prompts without onInput. Also wrap the default: case body in braces to satisfy the noSwitchDeclarations lint. Add a regression test in test/select.test.ts that supplies an onInput which ignores "q" (returns undefined) and asserts that the global fallback still triggers, and another test that supplies an onInput which handles "q" to ensure custom handlers can override cancel.
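The dispatch order the fix establishes can be illustrated with a stand-alone sketch (names like onInput and the finish-as-null convention mirror the review; the shapes here are simplified assumptions, not the real select.ts signatures):

```typescript
// Sketch: a custom onInput handler sees the hotkey first; the global "q"
// cancel only runs as a fallback for prompts without a handler (or whose
// handler returns undefined, i.e. "not consumed").
type HotkeyResult<T> = T | null | undefined;

function dispatchHotkey<T>(
  hotkey: string,
  onInput?: (key: string) => HotkeyResult<T>,
): T | null | undefined {
  if (onInput) {
    const result = onInput(hotkey);
    if (result !== undefined) return result; // handler consumed the key
  }
  if (hotkey.toLowerCase() === "q") return null; // global cancel fallback
  return undefined; // unhandled, keep prompting
}
```

With this ordering, an auth-menu-style handler can map `q` to its own sentinel while plain prompts still cancel on `q`.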
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/auth/auth.ts`:
- Around line 69-159: The fallback text-scrub in sanitizeOAuthResponseBodyForLog
misses token-like substrings when rawBody isn't JSON/URL-encoded; call
scrubTokenLikeSubstrings on the final scrubbed string before returning (so
sanitizeOAuthResponseBodyForLog runs scrubTokenLikeSubstrings(scrubbed) as a
last pass), referencing scrubTokenLikeSubstrings,
sanitizeOAuthResponseBodyForLog, and
OAUTH_SENSITIVE_BODY_KEYS/redactSensitiveFields to locate the JSON-path handling
and final regex loop; also add a unit test in test/auth.test.ts that verifies an
upstream plain-text body with an RT_ch_... substring is redacted.
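The "final pass" idea above can be sketched in isolation (the function names mirror the report, but the key list and regexes here are illustrative assumptions, not the exact ones in lib/auth/auth.ts):

```typescript
// Hypothetical sketch of the suggested fix: the keyed key=value scrub alone
// misses tokens behind unusual boundaries (e.g. "upstream:refresh_token=..."),
// so the fallback output is piped through a token-shape scrub as a last step.
const SENSITIVE_KEYS = ["refresh_token", "access_token", "id_token"];

function scrubTokenLikeSubstrings(value: string): string {
  // Opaque ChatGPT-style tokens (assumed RT_ch_/AT_ch_ prefix shape).
  return value.replace(/\b(?:RT|AT)_ch_[A-Za-z0-9_-]{20,}\b/g, "***REDACTED***");
}

function sanitizeFallback(rawBody: string): string {
  let scrubbed = rawBody;
  for (const key of SENSITIVE_KEYS) {
    const urlPattern = new RegExp(`(^|[?&\\s])(${key}=)[^&\\s]+`, "g");
    scrubbed = scrubbed.replace(urlPattern, `$1$2***REDACTED***`);
  }
  // Final pass: catches token-shaped substrings the keyed patterns missed.
  return scrubTokenLikeSubstrings(scrubbed);
}
```

A body like `upstream:refresh_token=RT_ch_…` slips past the `(^|[?&\s])` boundary but is still caught by the final pass.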
In `@lib/auth/server.ts`:
- Around line 69-80: Add a short comment in waitForCode (around the compare at
the expectedState check) explaining the invariant that trackedServer._lastState
is always equal to the closure-captured state written earlier in
startLocalOAuthServer, and that the expectedState compare is a defense‑in‑depth
guard against caller misuse (not an attacker-controlled path); additionally, to
eliminate a potential TOCTOU race if someone later inserts async work,
read/snapshot both trackedServer._lastCode and trackedServer._lastState into
local variables in a single synchronous read before doing the state compare and
code handling inside waitForCode (referencing trackedServer._lastCode,
trackedServer._lastState, startLocalOAuthServer, and waitForCode).
In `@lib/request/rate-limit-backoff.ts`:
- Around line 246-256: This change to calculateBackoffMs (renamed parameter
_attempt and removal of exponential factor) is a public API behavior change;
confirm PR `#414` includes an explicit semver-major bump or an explicit migration
note in the changelog/PR description documenting the new semantics and the
retained-but-unused attempt parameter; if the PR lacks such documentation, add a
clear migration entry describing the old exponential behavior vs new
multiplier-only behavior, update lib/index.ts and lib/request/index.ts export
notes and release notes, and ensure tests (test/rate-limit-backoff.test.ts) and
the internal caller getRateLimitBackoff are cross-referenced to show the
intentional fix.
In `@lib/storage/import-export.ts`:
- Around line 96-102: The import path currently uses handle.stat() and then
unbounded handle.readFile(), which allows files to grow or non-regular files to
stream unlimited bytes; update lib/storage/import-export.ts to first verify the
entry is a regular file via stats.isFile() (and throw if not), keep the existing
stats.size check against MAX_IMPORT_BYTES, and replace handle.readFile() with a
bounded read loop that reads at most MAX_IMPORT_BYTES + 1 bytes from the handle
and stops early; if the loop observes more than MAX_IMPORT_BYTES bytes, throw an
error referencing params.resolvedPath and MAX_IMPORT_BYTES. Also add a vitest
regression that stubs stat() to return a small size but makes the read path
produce > MAX_IMPORT_BYTES bytes to assert the new early-abort behavior.
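A minimal sketch of that bounded-read shape (the 4 MiB value and error wording are assumptions; the real import-export module's API is not reproduced here):

```typescript
// Sketch: verify isFile(), check stat().size, then read at most
// MAX_IMPORT_BYTES + 1 bytes so a file that grows after stat() cannot
// stream unbounded data through the handle.
import { open } from "node:fs/promises";

const MAX_IMPORT_BYTES = 4 * 1024 * 1024; // assumed cap

export async function readImportBounded(resolvedPath: string): Promise<Buffer> {
  const handle = await open(resolvedPath, "r");
  try {
    const stats = await handle.stat();
    if (!stats.isFile()) throw new Error(`not a regular file: ${resolvedPath}`);
    if (stats.size > MAX_IMPORT_BYTES) throw new Error(`too large: ${resolvedPath}`);
    const buf = Buffer.alloc(MAX_IMPORT_BYTES + 1);
    let total = 0;
    while (total < buf.length) {
      const { bytesRead } = await handle.read(buf, total, buf.length - total, total);
      if (bytesRead === 0) break; // EOF
      total += bytesRead;
      if (total > MAX_IMPORT_BYTES) {
        throw new Error(`grew past ${MAX_IMPORT_BYTES} bytes: ${resolvedPath}`);
      }
    }
    return buf.subarray(0, total);
  } finally {
    await handle.close();
  }
}
```

The early-abort check fires inside the loop, so at most one extra byte past the cap is ever buffered.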
In `@lib/unified-settings.ts`:
- Around line 398-418: The optimistic mtime check around UNIFIED_SETTINGS_PATH
using fs.stat and options.expectedMtimeMs is still vulnerable to a lost-update
race because the code re-checks mtime before fs.rename(tempPath,
UNIFIED_SETTINGS_PATH) but does not hold a lock across the read/modify/write
window; fix by acquiring a save lock around the entire read/modify/write (the
code path that calls the mtime check and the rename in this module), or
implement an interprocess file lock that is held from the moment you read stat
through the rename, ensuring the check and fs.rename(tempPath,
UNIFIED_SETTINGS_PATH) are atomic relative to other processes; alternatively, if
you choose best-effort behavior, update the function comment for this module and
add a regression test that injects a concurrent writer between the stat
(options.expectedMtimeMs check) and the fs.rename to assert ESTALE or document
the limitation—focus changes around the mtime check (options.expectedMtimeMs),
the rename call (fs.rename(tempPath, UNIFIED_SETTINGS_PATH)), and the retry loop
that wraps them.
In `@test/accounts.test.ts`:
- Around line 1969-1971: The test passes a third argument to the AccountManager
constructor which is ignored because AccountManager currently defines only a
two-parameter constructor; change the test to construct the manager with two
args and then call the instance method setRoutingMutexMode("enabled") on the
created AccountManager (reference symbols: AccountManager constructor and
setRoutingMutexMode) so the routing mutex is actually enabled at runtime and the
test exercises the enabled mutex path.
In `@test/auth.test.ts`:
- Around line 749-959: Add two regression tests: (1) mock fetch for
refreshAccessToken/exchangeAuthorizationCode to return a urlencoded error body
like "error=invalid_grant&refresh_token=RT_ch_..." and assert the returned
result.message (and any logged error via loggerModule.logError spy) does not
contain the raw token while preserving other fields; (2) mock fetch to return a
plain-text non-JSON/no-urlencoded body that contains "refresh_token=RT_ch_..."
(or just the token substring) and assert sanitizeOAuthResponseBodyForLog is
applied so result.message and any loggedArgs do not contain the raw token but
may contain non-sensitive text. Use existing helpers/names
sanitizeOAuthResponseBodyForLog, refreshAccessToken, exchangeAuthorizationCode
and the logError spy pattern already in the file to locate where to add these
tests.
In `@test/property/hybrid-selector-concurrency.property.test.ts`:
- Around line 91-135: The test records an observation after releasing the mutex,
causing a race; move the history append into the critical section so predecessor
visibility is observed deterministically. Inside the withRoutingMutex callback
(the async block passed to withRoutingMutex), after computing
sawPredecessorWrite and before the callback returns, push the observation into
pool.history (use the same SelectionObservation shape) and then return it; keep
the existing concurrent-call accounting in finally but ensure the push happens
before leaving the critical section. Apply the same change to the other test
blocks in this file that append pool.history outside the mutex (the other
occurrences of appending pool.history after withRoutingMutex) so all history
writes occur while the mutex is held.
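The invariant being requested — history appends happen while the mutex is held — can be shown with a minimal promise-chain mutex (withRoutingMutex here is a stand-in for the project's real helper, whose signature is assumed):

```typescript
// Sketch: push the observation inside the critical section, before the
// callback returns, so the next mutex holder deterministically sees every
// predecessor write instead of racing with a post-release append.
type Observation = { id: number; sawPredecessorWrite: boolean };

let chain: Promise<void> = Promise.resolve();
function withRoutingMutex<T>(fn: () => Promise<T>): Promise<T> {
  const run = chain.then(fn);
  chain = run.then(() => undefined, () => undefined); // serialize, swallow errors
  return run;
}

const history: Observation[] = [];

async function observe(id: number): Promise<Observation> {
  return withRoutingMutex(async () => {
    const sawPredecessorWrite = history.length > 0;
    const obs = { id, sawPredecessorWrite };
    history.push(obs); // still inside the critical section
    return obs;
  });
}
```

With the append outside the mutex, a successor could enter the critical section before the predecessor's push lands; inside it, visibility is deterministic.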
In `@test/rate-limit-backoff.test.ts`:
- Around line 349-405: Add two deterministic regression tests around
getRateLimitBackoffWithReason: (1) a jitter-parity case that mocks Math.random
to a non-0.5 value, calls getRateLimitBackoffWithReason once (fresh attempt=1)
then again within the dedupWindowMs to assert isDuplicate===true and delayMs
equality (verifies calculateBackoffMs(seed, ...) vs
calculateBackoffMs(result.delayMs, ...) parity), and (2) a max-cap saturation
case that walks consecutive attempts with reason="quota" (3.0× multiplier) and
baseDelayMs small enough to force exponential growth, asserting delayMs is
monotonic non-decreasing and clamps at maxBackoffMs; ensure each test uses
vi.useFakeTimers()/vi.setSystemTime and mocks Math.random to keep determinism.
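The two properties those tests pin down — jitter parity under a pinned random value, and monotonic growth that clamps at the cap — can be sketched with a stand-in delay function (the multiplier table, jitter shape, and constants are illustrative assumptions, not the real rate-limit-backoff internals):

```typescript
// Sketch of the backoff pipeline: exponential base, reason multiplier, cap.
// `random` is injected instead of Math.random so results are deterministic.
const REASON_MULTIPLIER: Record<string, number> = { quota: 3.0, generic: 1.0 };

function calculateDelay(
  attempt: number,
  baseDelayMs: number,
  maxBackoffMs: number,
  reason: string,
  random: number,
): number {
  const jitter = 1 + 0.25 * (random - 0.5); // ±12.5% jitter, assumed shape
  const exponential = baseDelayMs * 2 ** (attempt - 1) * jitter;
  const multiplied = exponential * (REASON_MULTIPLIER[reason] ?? 1.0);
  return Math.min(multiplied, maxBackoffMs);
}
```

Pinning `random` to a non-0.5 value makes the dedup'd sibling call reproduce the same seed delay, and walking attempts with reason "quota" demonstrates the monotonic, capped curve.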
In `@test/recovery-storage.test.ts`:
- Around line 604-654: Pin Date.now() and Math.random() in the two failing tests
so the same-millisecond/same-random-path is reproduced: inside the "should
generate unique ids on repeat calls..." and the "should generate ids that sort
before real generatePartId ids..." tests, add vitest spies like vi.spyOn(Date,
"now").mockReturnValue(<fixedEpochMs>) and vi.spyOn(Math,
"random").mockReturnValue(<fixedRandom>) before calling
storage.prependThinkingPart / storage.generateThinkingPartId /
storage.generatePartId, and restore them after (mockRestore or in an afterEach)
so thinkingPartCounter behavior is deterministic and the assertions on IDs
always reproduce the regression.
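The pin-the-clock idea translates to a small pattern that can be shown without vitest (generateThinkingPartId here is a simplified stand-in for the storage helper; the counter + timestamp + random-suffix shape is an assumption):

```typescript
// Sketch: replace Date.now/Math.random for the duration of a callback so
// same-millisecond, same-random id generation is reproduced deterministically,
// then restore the originals (the plain-JS equivalent of vi.spyOn + mockRestore).
let thinkingPartCounter = 0;

function generateThinkingPartId(): string {
  thinkingPartCounter += 1;
  const ts = Date.now().toString(36);
  const rand = Math.floor(Math.random() * 36 ** 4).toString(36);
  return `prt_0000000000_thinking_${ts}_${rand}_${thinkingPartCounter.toString(36)}`;
}

function withPinnedClock<T>(fixedEpochMs: number, fixedRandom: number, fn: () => T): T {
  const realNow = Date.now;
  const realRandom = Math.random;
  Date.now = () => fixedEpochMs;   // equivalent of vi.spyOn(Date, "now")
  Math.random = () => fixedRandom; // equivalent of vi.spyOn(Math, "random")
  try {
    return fn();
  } finally {
    Date.now = realNow;            // equivalent of mockRestore()
    Math.random = realRandom;
  }
}
```

Under the pinned clock, only the counter differentiates repeat calls — exactly the uniqueness property the regression needs to assert.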
In `@test/select.test.ts`:
- Around line 102-129: Add a deterministic vitest regression that verifies
onInput precedence for the "q" hotkey: in test/select.test.ts create a test that
calls select(...) with an options.onInput handler which detects input "q" and
returns a sentinel cancel object (e.g., a typed { cancelled: true } value), emit
"q" on stdin after the same timer delay, then assert that the resolved value
from select() is that sentinel (not null). Reference the existing select()
import and the onInput option to replicate the lib/ui/auth-menu.ts caller
behavior so the test fails if onInput is ignored.
In `@test/storage-import-export.test.ts`:
- Around line 3-20: The vi.mock factory is hoisted before the local const mocks
(existsSyncMock, statMock, readFileMock, closeMock, openMock) are initialized,
causing a "cannot access before initialization" error; move creation of those
mocks into vi.hoisted() so they are initialized before the hoisted vi.mock
factory, e.g., wrap each mock initializer with vi.hoisted(() => vi.fn()) and
then call vi.mock(...) as shown, and also add regression tests exercising
Windows filesystem behavior (path separators and case sensitivity) and
concurrency race conditions (simultaneous reads/writes) to the storage
import/export test suite.
In `@test/unified-settings.test.ts`:
- Around line 813-898: The test's injected concurrent write can keep the same
mtime as the temp write on coarse-mtime platforms, making ESTALE
nondeterministic; inside the writeSpy.mockImplementation that performs the
injected write (the block that calls originalWriteFile when
String(filePath).includes(".tmp")), after awaiting originalWriteFile for the
injected content call fs.utimes (or fs.promises.utimes) on
getUnifiedSettingsPath() to advance its mtime (e.g. Date.now()+1000) before
returning, so saveUnifiedPluginConfig's mtime check reliably sees a newer
on-disk record.
- Around line 533-570: The test should assert the backup copy was actually
retried by counting copyFileSync attempts and verifying the backup file exists
after saveUnifiedPluginConfigSync; update the mock in the test to increment a
counter (or set a flag) inside the doMock for node:fs's copyFileSync and after
calling saveUnifiedPluginConfigSync({ codexMode: true, retries: 1 }) assert that
the mock was called >1 time and that a backup snapshot file (derived from
getUnifiedSettingsPath()/settingsPath) was created; keep the existing
Atomics.wait spy and cleanup (vi.doUnmock, vi.resetModules,
atomicsWaitSpy.mockRestore) intact.
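The mtime-advance suggestion in the first comment above can be sketched without vitest (the path handling and the +1000ms margin are illustrative assumptions):

```typescript
// Sketch: after the injected concurrent write, bump the file's mtime with
// utimes so even a coarse-granularity filesystem observes a strictly newer
// on-disk record than the writer's expectedMtimeMs snapshot.
import { promises as fs } from "node:fs";

export async function writeAndAdvanceMtime(
  settingsPath: string,
  content: string,
): Promise<void> {
  await fs.writeFile(settingsPath, content, "utf8");
  // Advance mtime ~1s past "now" to defeat 1s mtime granularity.
  const future = new Date(Date.now() + 1000);
  await fs.utimes(settingsPath, future, future);
}
```

This makes the optimistic mtime check fail deterministically in the test instead of depending on platform timestamp resolution.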
---
Outside diff comments:
In `@lib/request/rate-limit-backoff.ts`:
- Around line 319-336: The branch in calculateBackoffMs currently feeds an
un-jittered normalizedBaseDelay when result.attempt === 1 && !result.isDuplicate
but uses result.delayMs (a jittered value) for duplicates/subsequent attempts,
causing inconsistent outputs; change the condition in
lib/request/rate-limit-backoff.ts so both fresh and duplicate paths pass
result.delayMs into calculateBackoffMs (i.e., always use the stored jittered
seed from rateLimitStateByAccountQuota/getRateLimitBackoff), and add a
regression test in test/rate-limit-backoff.test.ts that sets Math.random to a
non-0.5 value and asserts the fresh call and its dedup'd sibling return
identical delayMs.
In `@lib/ui/select.ts`:
- Around line 500-523: Move the global "q" cancel check so custom handlers get
the hotkey first: inside the select.ts default case (around decodeHotkeyInput
and options.onInput), call options.onInput(hotkey, ...) before performing the
global if (hotkey.toLowerCase() === "q") finish(null) fallback; if onInput
returns a defined result or handles rerenderRequested, act on that; otherwise
fall through to the global cancel for prompts without onInput. Also wrap the
default: case body in braces to satisfy the noSwitchDeclarations lint. Add a
regression test in test/select.test.ts that supplies an onInput which ignores
"q" (returns undefined) and asserts that the global fallback still triggers, and
another test that supplies an onInput which handles "q" to ensure custom
handlers can override cancel.
In `@lib/unified-settings.ts`:
- Around line 184-192: The tight retry loop around the synchronous copy
(copyFileSync using UNIFIED_SETTINGS_PATH -> UNIFIED_SETTINGS_BACKUP_PATH
guarded by isRetryableFsError) currently spins without delay and should be
fixed: either remove retries and fail fast for the synchronous path, or convert
the operation to an async retry with a bounded exponential/backoff (e.g., use
fs.promises.copyFile, await small increasing delays between attempts) so
transient Windows EBUSY/EPERM locks are actually waited for; apply the same
change to the similar retry block at the second location (the retry loop around
UNIFIED_SETTINGS_PATH/UNIFIED_SETTINGS_BACKUP_PATH at the later block referenced
in the comment).
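A minimal sketch of the async-retry-with-backoff option (attempt count and delays are assumptions; isRetryable stands in for the module's isRetryableFsError):

```typescript
// Sketch: actually wait between attempts with bounded exponential delays
// instead of spinning synchronously, so transient Windows EBUSY/EPERM locks
// have time to clear before the next copy attempt.
async function copyWithRetry(
  copyOnce: () => Promise<void>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 4,
  baseDelayMs = 20,
): Promise<void> {
  for (let attempt = 1; ; attempt++) {
    try {
      await copyOnce();
      return;
    } catch (err) {
      if (!isRetryable(err) || attempt >= maxAttempts) throw err;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Usage would look like `copyWithRetry(() => fs.promises.copyFile(src, dst), isRetryableFsError)`, replacing the synchronous copyFileSync loop.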
In `@test/codex-manager-cli.test.ts`:
- Around line 932-945: The test currently returns null from loadAccountsMock
which triggers the "intentional-reset" branch in inspectStorageHealth; make the
test deterministic by returning an empty storage object and stubbing the storage
health probe: change loadAccountsMock.mockResolvedValueOnce(null) to
loadAccountsMock.mockResolvedValueOnce({}) and mock inspectStorageHealth to
return "healthy" (using vi.spyOn or the existing mock for inspectStorageHealth)
so runCodexMultiAuthCli(["auth","list"]) yields the expected "Storage health:
healthy" log; alternatively, keep null but update assertions to expect the reset
output path instead of "healthy". Ensure you reference runCodexMultiAuthCli,
loadAccountsMock, and inspectStorageHealth when applying the fix.
In `@test/server.unit.test.ts`:
- Around line 46-55: The test's mockServer type only declares _lastCode but
tests assign mockServer._lastState; update the mock type and the
vi.mock('node:http', ...) factory to include the same stored-state field(s) used
by the real server (match lib/auth/server.ts's fields, e.g. add _lastState or
_storedState alongside _lastCode) so the mock surface exactly mirrors the real
server contract (adjust the type declaration for mockServer and the returned
object from the inline vi.mock factory to include the same property name(s)
found in lib/auth/server.ts:69-72).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d4bf5ace-6df9-4975-a4cc-434743b98d23
📒 Files selected for processing (33)
- lib/accounts.ts
- lib/auth/auth.ts
- lib/auth/server.ts
- lib/codex-manager.ts
- lib/codex-manager/auth-commands.ts
- lib/codex-manager/commands/doctor.ts
- lib/codex-manager/commands/fix.ts
- lib/codex-manager/commands/verify-flagged.ts
- lib/recovery/storage.ts
- lib/request/rate-limit-backoff.ts
- lib/request/response-handler.ts
- lib/rotation.ts
- lib/shutdown.ts
- lib/storage/import-export.ts
- lib/ui/select.ts
- lib/unified-settings.ts
- test/accounts.test.ts
- test/auth.test.ts
- test/codex-manager-auth-commands.test.ts
- test/codex-manager-cli.test.ts
- test/codex-manager-doctor-command.test.ts
- test/codex-manager-fix-command.test.ts
- test/codex-manager-verify-flagged-command.test.ts
- test/documentation.test.ts
- test/property/hybrid-selector-concurrency.property.test.ts
- test/rate-limit-backoff.test.ts
- test/recovery-storage.test.ts
- test/response-handler-sse-buffer.test.ts
- test/select.test.ts
- test/server.unit.test.ts
- test/shutdown.test.ts
- test/storage-import-export.test.ts
- test/unified-settings.test.ts
💤 Files with no reviewable changes (9)
- test/documentation.test.ts
- test/codex-manager-doctor-command.test.ts
- lib/codex-manager/commands/fix.ts
- test/codex-manager-auth-commands.test.ts
- test/codex-manager-fix-command.test.ts
- test/codex-manager-verify-flagged-command.test.ts
- lib/codex-manager/commands/doctor.ts
- lib/codex-manager/commands/verify-flagged.ts
- lib/codex-manager/auth-commands.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
- lib/storage/import-export.ts
- lib/codex-manager.ts
- lib/shutdown.ts
- lib/ui/select.ts
- lib/recovery/storage.ts
- lib/rotation.ts
- lib/auth/server.ts
- lib/accounts.ts
- lib/auth/auth.ts
- lib/request/rate-limit-backoff.ts
- lib/unified-settings.ts
- lib/request/response-handler.ts
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
- test/server.unit.test.ts
- test/shutdown.test.ts
- test/recovery-storage.test.ts
- test/rate-limit-backoff.test.ts
- test/accounts.test.ts
- test/unified-settings.test.ts
- test/codex-manager-cli.test.ts
- test/response-handler-sse-buffer.test.ts
- test/property/hybrid-selector-concurrency.property.test.ts
- test/storage-import-export.test.ts
- test/auth.test.ts
- test/select.test.ts
🪛 ast-grep (0.42.1)
test/recovery-storage.test.ts
[warning] 586-588: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
`^${partDir.replace(/\\/g, "\\\\")}[\\\\/]prt_0000000000_thinking_[0-9a-f]+_[0-9a-z]+_[0-9a-z]+\\.json$`,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
lib/auth/auth.ts
[warning] 151-151: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(`("${key}"\\s*:\\s*)"[^"]*"`, "g")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
[warning] 154-154: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(`(^|[?&\\s])(${key}=)[^&\\s]+`, "g")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🪛 Biome (2.4.11)
lib/ui/select.ts
[error] 501-501: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
(lint/correctness/noSwitchDeclarations)
🔇 Additional comments (13)
lib/shutdown.ts (1)
19-40: lgtm — concurrent `runCleanup` coalescing looks correct.

Snapshotting `cleanupFunctions` into `fns` and truncating before the first `await` at lib/shutdown.ts:25-26 avoids re-running the same fns if a second caller arrives mid-flight, and the `.finally()` reset at lib/shutdown.ts:35-37 keeps the module usable for subsequent shutdown cycles (e.g. the test suite's `beforeEach`). Errors are swallowed per-fn so one bad cleanup can't abort the rest — appropriate for shutdown.

One thing worth being aware of (not a blocker): any `registerCleanup` call made while a cleanup is in flight lands in the now-empty array and will only run on the next `runCleanup()` — fine for the signal path since `process.exit(0)` follows, but callers that register from within a cleanup fn won't see those fns executed in the same cycle. Worth a one-liner in the jsdoc if you care.

test/shutdown.test.ts (1)
68-86: good concurrency regression — covers the exact bug.
test/shutdown.test.ts:78-82 pins down both invariants that matter: `first === second` proves promise coalescing, and `toHaveBeenCalledTimes(1)` proves the captured-snapshot path doesn't double-invoke. The blocker pattern keeps it deterministic without fake timers, which is the right call here.

Minor nit, not blocking: consider also asserting that a third `runCleanup()` issued after `await first` returns a fresh promise (i.e. `cleanupPromise` was reset by `.finally()`). The `beforeEach` already exercises the reset path implicitly, but an explicit assertion would lock in the full lifecycle against future refactors.

lib/request/response-handler.ts (1)
769-803: lgtm — pre-append cap is correctly ordered and cancel-on-throw still holds.

The throw at lib/request/response-handler.ts:796-798 fires before `chunks.push(decoded)`, so peak memory is bounded to `MAX_SSE_SIZE` rather than `MAX_SSE_SIZE + chunk`. The throw propagates through the outer `try` at lib/request/response-handler.ts:783 into the `catch` at lib/request/response-handler.ts:838, which still invokes `reader.cancel(...)` and then `releaseLock()` in `finally` — no reader-leak regression. Using `Buffer.byteLength(decoded, "utf8")` (not `value.byteLength`) is a deliberate choice and is what the new utf-8 byte-counting test at test/response-handler-sse-buffer.test.ts:116-141 pins down.

Nit, non-blocking: `decoder.decode(value, { stream: true })` will hold back trailing partial multi-byte sequences until the next chunk (or a final `decoder.decode()` flush, which this loop still doesn't issue). That means `totalSize` tracks decoded utf-8 bytes rather than wire bytes and can lag `value.byteLength` by up to 3 bytes across a chunk boundary. Pre-existing behavior, not a regression from this PR — flagging only so it's on the record.

test/response-handler-sse-buffer.test.ts (1)
1-141: solid regression coverage for REQ-HIGH-03 — deterministic, vitest-native, no real secrets.

The four cases (test/response-handler-sse-buffer.test.ts:49-79, 81-101, 103-114, 116-141) cover exactly the shapes the fix needs to hold: under-cap accumulation, overflow on the nth chunk, overflow on the 1st chunk, and utf-8 vs utf-16 byte accounting. `MAX_SSE_SIZE_FOR_TEST` drift is self-detected by the "just under cap" case since any drop in the real cap would push the 9MB filler over. The `reader.read.toHaveBeenCalledTimes(...)` + `reader.cancel` assertions correctly nail down that no reads happen past the pre-append rejection.

Two small nits (non-blocking):

- test/response-handler-sse-buffer.test.ts:130-132 uses a manual `throw new Error(...)` for the test-setup sanity check. That bypasses vitest's failure reporter — prefer `expect(byteLength).toBeGreaterThan(MAX_SSE_SIZE_FOR_TEST)` + `expect(decodedLengthEstimate).toBeLessThan(byteLength)` so the failure surfaces as a normal assertion with a diff.
- The "just under cap" case at test/response-handler-sse-buffer.test.ts:49-79 allocates ~9MB plus json wrapping. Fine on CI, but consider tagging this file with a slightly higher per-test timeout if CI is ever run under memory pressure — not required today.

Scope note per coding guidelines: windows FS and auth-rotation concurrency aren't expected in this file since REQ-HIGH-03 targets SSE buffering only; the relevant regression surface is byte-accounting + cancel, which is covered.
lib/recovery/storage.ts (2)
151-176: looks good: synthetic thinking ids now preserve ordering and avoid same-process clobbering.
lib/recovery/storage.ts:171 keeps the `prt_0000000000_thinking_` prefix, so `findMessagesWithOrphanThinking` still sees synthetic thinking first after sorting, while lib/recovery/storage.ts:172-175 makes repeat calls produce distinct ids. Covered by test/recovery-storage.test.ts:604-654. As per coding guidelines, `lib/**`: focus on auth rotation, windows filesystem IO, and concurrency; verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios; check for logging that leaks tokens or emails.
410-416: looks good: `prependThinkingPart` no longer targets a fixed filename.

lib/recovery/storage.ts:416 removes the overwrite path for repeated recovery passes on the same `messageID`; each write now gets its own `prt_0000000000_thinking_<...>.json` target. Covered by test/recovery-storage.test.ts:558-641. As per coding guidelines, `lib/**`: focus on auth rotation, windows filesystem IO, and concurrency; verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios; check for logging that leaks tokens or emails.

lib/unified-settings.ts (1)
439-449: mtime helper fails closed.
lib/unified-settings.ts:439 returns `null` only for `ENOENT` and rethrows other filesystem errors, so permission and lock failures stay visible to callers.

lib/rotation.ts (1)
169-180: numeric tracker cleanup looks scoped correctly.
lib/rotation.ts:169 and lib/rotation.ts:324 only remove json-normalized numeric account slots at or above the removed index, while prefixed identity keys remain untouched. The vitest coverage at test/accounts.test.ts:2965 exercises numeric reindex cleanup, and test/accounts.test.ts:3743 keeps identity-keyed tracker state stable.

Also applies to: 324-335
lib/accounts.ts (2)
869-878: cursor sync matches the rotation invariant.
lib/accounts.ts:869 and lib/accounts.ts:913 now advance `cursorByFamily` to the successor slot after updating the active family index, so the next round-robin pass resumes after the just-switched account.

Also applies to: 913-919
1349-1367: numeric fallback keys are invalidated at the splice root.
lib/accounts.ts:1353 clears numeric tracker entries before reindexing, and lib/accounts.ts:1365 drops cached numeric `_runtimeTrackerKey` values. This prevents shifted refresh-only accounts from consulting stale slot health/token state.

test/accounts.test.ts (1)
2965-3078: numeric reindex regression is covered.
test/accounts.test.ts:2965 primes numeric `_runtimeTrackerKey` values, removes an earlier account, then verifies both health and token state no longer bleed into shifted numeric slots. This directly covers lib/accounts.ts:1353 and lib/accounts.ts:1365.

lib/codex-manager.ts (1)
1363-1364: strict state check lgtm.
lib/codex-manager.ts:1363-1364 correctly rejects manual-callback input that is missing `state` or whose `state` does not match the one bound to this flow, which mirrors the local server's `waitForCode(expectedState)` tightening in lib/auth/server.ts:102. No concerns.

lib/storage/import-export.ts (1)
8-8: the 4 MiB import cap is a good boundary.

lib/storage/import-export.ts:8 matches the new vitest boundary coverage in test/storage-import-export.test.ts:37 and test/storage-import-export.test.ts:54.
```ts
const OAUTH_SENSITIVE_BODY_KEYS = [
  "refresh_token",
  "refreshToken",
  "access_token",
  "accessToken",
  "id_token",
  "idToken",
  "codeVerifier",
  "token",
  "code",
  "code_verifier",
] as const;

function scrubTokenLikeSubstrings(value: string): string {
  let scrubbed = value.replace(
    /(\b(?:refresh|access|id)[_-]?token\s*[:=]\s*)([^\s,;"'}]{8,})/gi,
    (_match, prefix) => `${prefix}***REDACTED***`,
  );
  scrubbed = scrubbed.replace(
    /\b(?:RT|AT)_ch_[A-Za-z0-9_-]{20,}\b/g,
    "***REDACTED***",
  );
  return scrubbed;
}

function redactSensitiveFields(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactSensitiveFields(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    const sensitiveSet = new Set<string>(OAUTH_SENSITIVE_BODY_KEYS);
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      if (sensitiveSet.has(k)) {
        out[k] = "***REDACTED***";
      } else {
        out[k] = redactSensitiveFields(v);
      }
    }
    return out;
  }
  if (typeof value === "string") {
    return scrubTokenLikeSubstrings(value);
  }
  return value;
}

/**
 * Scrub opaque tokens from a raw OAuth token-endpoint response body before
 * interpolating it into a log message.
 *
 * Error and success responses from `/oauth/token` may contain `refresh_token`,
 * `access_token`, or `id_token` values. ChatGPT refresh tokens are opaque
 * high-entropy strings that do NOT match the logger's `TOKEN_PATTERNS`
 * (JWT, long hex, `sk-*`, `Bearer <x>`), so they would be written verbatim
 * to disk log files when a status-body string is concatenated into a
 * `logError` message.
 *
 * Strategy:
 * 1. If the body parses as JSON, walk it and mask sensitive keys.
 * 2. Otherwise fall back to a targeted regex scrub of `"key":"value"` and
 *    `key=value` patterns for the known sensitive keys.
 *
 * The returned string is safe to interpolate into log messages.
 */
export function sanitizeOAuthResponseBodyForLog(rawBody: string): string {
  if (!rawBody) return rawBody;
  const trimmed = rawBody.trim();
  if (trimmed.length === 0) return rawBody;

  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try {
      const parsed = JSON.parse(trimmed) as unknown;
      const redacted = redactSensitiveFields(parsed);
      return JSON.stringify(redacted);
    } catch {
      // Fall through to regex scrub for malformed JSON.
    }
  }

  let scrubbed = rawBody;
  for (const key of OAUTH_SENSITIVE_BODY_KEYS) {
    // "key":"value" style (JSON-like text)
    const jsonPattern = new RegExp(`("${key}"\\s*:\\s*)"[^"]*"`, "g");
    scrubbed = scrubbed.replace(jsonPattern, `$1"***REDACTED***"`);
    // key=value style (urlencoded / query-string)
    const urlPattern = new RegExp(`(^|[?&\\s])(${key}=)[^&\\s]+`, "g");
    scrubbed = scrubbed.replace(urlPattern, `$1$2***REDACTED***`);
  }
  return scrubbed;
}
```
sanitizer looks solid; one subtle gap worth covering.

lib/auth/auth.ts:134-159 does the right thing: structured json → recursive key redaction, otherwise a targeted regex scrub. Two notes:

- ast-grep's ReDoS hit on lib/auth/auth.ts:152 and lib/auth/auth.ts:155 is a false positive — `OAUTH_SENSITIVE_BODY_KEYS` is a frozen const array of ascii identifiers, not user input, so there is no regex-injection surface here. No action.
- Real gap: when the body is neither json nor urlencoded but a loose text blob (e.g. a reverse proxy returns `upstream error: refresh_token=RT_ch_...; trace=...` with leading plain text), the json path is skipped (the body doesn't start with `{`/`[`) and the urlencoded regex at lib/auth/auth.ts:155 requires a `^|[?&\s]` boundary before `key=`. A body like `upstream error refresh_token=RT_ch_xxxx...` with a space before `refresh_token` is covered by `\s`, but `upstream:refresh_token=...` (colon boundary) is not. The `scrubTokenLikeSubstrings` helper at lib/auth/auth.ts:82-92 would save you here for the `RT_ch_`/`AT_ch_` shape, but it is only invoked from `redactSensitiveFields` on parsed-json strings, not on the raw-text fallback path.

Please either (a) pipe the fallback output through `scrubTokenLikeSubstrings` as a final pass, or (b) loosen the boundary to also accept `:`. Recommend (a) since it also catches opaque tokens that appear without a `key=` prefix.
proposed fix
let scrubbed = rawBody;
for (const key of OAUTH_SENSITIVE_BODY_KEYS) {
const jsonPattern = new RegExp(`("${key}"\\s*:\\s*)"[^"]*"`, "g");
scrubbed = scrubbed.replace(jsonPattern, `$1"***REDACTED***"`);
const urlPattern = new RegExp(`(^|[?&\\s])(${key}=)[^&\\s]+`, "g");
scrubbed = scrubbed.replace(urlPattern, `$1$2***REDACTED***`);
}
- return scrubbed;
+ return scrubTokenLikeSubstrings(scrubbed);
}

and add a test in test/auth.test.ts alongside the LIB-HIGH-001 regression suite covering a non-json/non-urlencoded body carrying an opaque RT_ch_... substring.
As per coding guidelines: "check for logging that leaks tokens or emails."
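to make option (a) concrete, here is a self-contained sketch of the fallback path with a token-shape final pass. the token regex below is an illustrative stand-in, not the repo's scrubTokenLikeSubstrings, and the key list is assumed:

```typescript
// Illustrative fallback scrub: key-anchored patterns first, then a
// token-shape pass that catches leaks the boundaries miss.
// NOTE: this regex is a stand-in for the repo's scrubTokenLikeSubstrings.
function scrubTokenLikeSubstrings(text: string): string {
  return text.replace(/\b(?:RT|AT)_ch_[A-Za-z0-9_-]+/g, "***REDACTED***");
}

function scrubFallback(rawBody: string, keys: readonly string[]): string {
  let scrubbed = rawBody;
  for (const key of keys) {
    // "key":"value" style (JSON-like text)
    const jsonPattern = new RegExp(`("${key}"\\s*:\\s*)"[^"]*"`, "g");
    scrubbed = scrubbed.replace(jsonPattern, `$1"***REDACTED***"`);
    // key=value style (urlencoded / query-string)
    const urlPattern = new RegExp(`(^|[?&\\s])(${key}=)[^&\\s]+`, "g");
    scrubbed = scrubbed.replace(urlPattern, `$1$2***REDACTED***`);
  }
  // Final pass: catches colon-boundary and bare-token leaks, e.g.
  // "upstream:refresh_token=RT_ch_x" where neither pattern above fires.
  return scrubTokenLikeSubstrings(scrubbed);
}

const out = scrubFallback(
  "upstream:refresh_token=RT_ch_abc123; trace=t1",
  ["refresh_token"],
);
```

the colon-boundary body above slips past both key-anchored patterns and is only caught by the final pass.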
🧰 Tools
🪛 ast-grep (0.42.1)
[warning] 151-151: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(("${key}"\\s*:\\s*)"[^"]*", "g")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
[warning] 154-154: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp((^|[?&\\s])(${key}=)[^&\\s]+, "g")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/auth/auth.ts` around lines 69 - 159, The fallback text-scrub in
sanitizeOAuthResponseBodyForLog misses token-like substrings when rawBody isn't
JSON/URL-encoded; call scrubTokenLikeSubstrings on the final scrubbed string
before returning (so sanitizeOAuthResponseBodyForLog runs
scrubTokenLikeSubstrings(scrubbed) as a last pass), referencing
scrubTokenLikeSubstrings, sanitizeOAuthResponseBodyForLog, and
OAUTH_SENSITIVE_BODY_KEYS/redactSensitiveFields to locate the JSON-path handling
and final regex loop; also add a unit test in test/auth.test.ts that verifies an
upstream plain-text body with an RT_ch_... substring is redacted.
const trackedServer = server as http.Server & {
  _lastCode?: string;
  _lastState?: string;
};
if (trackedServer._lastCode) {
  logWarn(
    "Duplicate OAuth callback received; preserving first authorization code",
  );
  return;
}
trackedServer._lastCode = code;
trackedServer._lastState = state;
🧹 Nitpick | 🔵 Trivial
state pinning is correct but partially redundant; worth a comment.
lib/auth/server.ts:49 already rejects callbacks whose state doesn't match the closure-captured state, so by the time _lastState is written at lib/auth/server.ts:80, it is provably equal to state. that means the expectedState check inside waitForCode at lib/auth/server.ts:116-121 only fires when the caller in lib/codex-manager.ts:1894 or the runtime flows passes a value different from what it handed to startLocalOAuthServer({ state }) — i.e. a caller-side bug, not an attacker-controlled path.
that's still a reasonable defense-in-depth check for the rollup of #418, but please add a short comment at lib/auth/server.ts:116 spelling out the invariant (_lastState is always equal to the server-closure state; the compare guards against caller misuse) so a future refactor doesn't conclude the check is dead code and delete it.
also, one concurrency nit worth calling out for windows/EBUSY-prone environments per the repo guidelines: waitForCode polls _lastCode every 100ms on a single-threaded event loop, so there's no TOCTOU here, but if anyone later adds an async step between the lastCode read at lib/auth/server.ts:114 and the state compare at :116, this becomes racy. leave a comment or snapshot both fields in one read:
suggested tightening
- const lastCode = trackedServer._lastCode;
- if (lastCode) {
- if (trackedServer._lastState !== expectedState) {
+ const { _lastCode: lastCode, _lastState: lastState } =
+ trackedServer;
+ if (lastCode) {
+ if (lastState !== expectedState) {
logWarn(
"Discarding OAuth callback due to state mismatch in waitForCode",
);
return null;
}
return { code: lastCode };
}

As per coding guidelines: "focus on auth rotation, windows filesystem IO, and concurrency."
Also applies to: 102-128, 148-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/auth/server.ts` around lines 69 - 80, Add a short comment in waitForCode
(around the compare at the expectedState check) explaining the invariant that
trackedServer._lastState is always equal to the closure-captured state written
earlier in startLocalOAuthServer, and that the expectedState compare is a
defense‑in‑depth guard against caller misuse (not an attacker-controlled path);
additionally, to eliminate a potential TOCTOU race if someone later inserts
async work, read/snapshot both trackedServer._lastCode and
trackedServer._lastState into local variables in a single synchronous read
before doing the state compare and code handling inside waitForCode (referencing
trackedServer._lastCode, trackedServer._lastState, startLocalOAuthServer, and
waitForCode).
 export function calculateBackoffMs(
   baseDelayMs: number,
-  attempt: number,
+  _attempt: number,
   reason: RateLimitReason = "unknown",
 ): number {
   const multiplier = BACKOFF_MULTIPLIERS[reason] ?? 1.0;
-  const exponentialDelay = baseDelayMs * Math.pow(2, attempt - 1);
   return Math.min(
-    Math.floor(exponentialDelay * multiplier),
+    Math.max(0, Math.floor(baseDelayMs * multiplier)),
     rateLimitBackoffConfig.maxBackoffMs,
   );
 }
🧩 Analysis chain
🏁 Script executed:

#!/bin/bash
# find all callers of calculateBackoffMs across the repo and the arguments they pass
rg -nP --type=ts -C3 '\bcalculateBackoffMs\s*\('

Repository: ndycode/codex-multi-auth
Length of output: 5757

🏁 Script executed:

# Check if calculateBackoffMs is re-exported in any barrel files or index files
rg -nP --type=ts '\bcalculateBackoffMs\b' --glob='**/index.ts' --glob='**/index.js' --glob='**/*.d.ts'

Length of output: 104

🏁 Script executed:

# Check package.json for exports field that might expose this function
cat package.json | jq '.exports // "no exports field"' 2>/dev/null || echo "checking manually..."

Length of output: 962

🏁 Script executed:

# Look for any comments or documentation about this being a public API
rg -nP --type=ts 'calculateBackoffMs' lib/request/rate-limit-backoff.ts -A5 -B5 | head -40

Length of output: 1212

🏁 Script executed:

# Check if calculateBackoffMs is exported from lib/request/index.ts
rg -nP 'calculateBackoffMs' lib/request/index.ts

Length of output: 50

🏁 Script executed:

# Check what IS exported from lib/request/index.ts
cat lib/request/index.ts

Length of output: 223
public api surface confirmed, but behavior change is documented as intentional fix
calculateBackoffMs is re-exported in the public API (lib/index.ts:13, lib/request/index.ts). the parameter rename to _attempt and behavior change from baseDelayMs * 2^(attempt-1) * multiplier to baseDelayMs * multiplier do constitute a breaking change for external consumers.
however, the JSDoc at lib/request/rate-limit-backoff.ts:243-245 explicitly documents that attempt is "retained for API compatibility" and the comment at lines 327-329 references audit finding REQ-HIGH-01, indicating this was a deliberate fix for a double-exponential bug that "saturated maxBackoffMs after ~2 retries."
only one internal call exists (lib/request/rate-limit-backoff.ts:330), which already receives a pre-exponential value from getRateLimitBackoff, so it correctly applies only the reason multiplier. all test cases in test/rate-limit-backoff.test.ts:152-210 explicitly validate the new behavior and the fix intent.
confirm whether this breaking change is covered by a semver bump or migration guide in the PR notes for #414, since external consumers relying on the old exponential progression would break silently.
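for anyone auditing the numbers, a self-contained sketch of the fixed composition — the BACKOFF_MULTIPLIERS values and 60s cap below are assumed for illustration, not copied from the repo:

```typescript
// Sketch of the REQ-HIGH-01 fix: the exponential step happens once in
// the caller (getRateLimitBackoff), and calculateBackoffMs applies only
// the reason multiplier plus the cap. Values here are assumed.
type RateLimitReason = "unknown" | "quota";

const BACKOFF_MULTIPLIERS: Record<RateLimitReason, number> = {
  unknown: 1.0,
  quota: 1.5, // assumed multiplier for illustration
};
const maxBackoffMs = 60_000;

// Caller side: the only place 2^(attempt-1) is applied.
function exponentialDelayMs(baseDelayMs: number, attempt: number): number {
  return baseDelayMs * Math.pow(2, attempt - 1);
}

// Fixed semantics: multiplier-only, no second exponential.
function calculateBackoffMs(
  delayMs: number,
  reason: RateLimitReason = "unknown",
): number {
  const multiplier = BACKOFF_MULTIPLIERS[reason] ?? 1.0;
  return Math.min(Math.max(0, Math.floor(delayMs * multiplier)), maxBackoffMs);
}

// base 1000ms: attempt 1 → 1000ms, attempt 3 → 4000ms (the documented
// ~4x ratio); the old double exponential gave 16x and hit the cap fast.
const d1 = calculateBackoffMs(exponentialDelayMs(1000, 1));
const d3 = calculateBackoffMs(exponentialDelayMs(1000, 3));
```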
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/request/rate-limit-backoff.ts` around lines 246 - 256, This change to
calculateBackoffMs (renamed parameter _attempt and removal of exponential
factor) is a public API behavior change; confirm PR `#414` includes an explicit
semver-major bump or an explicit migration note in the changelog/PR description
documenting the new semantics and the retained-but-unused attempt parameter; if
the PR lacks such documentation, add a clear migration entry describing the old
exponential behavior vs new multiplier-only behavior, update lib/index.ts and
lib/request/index.ts export notes and release notes, and ensure tests
(test/rate-limit-backoff.test.ts) and the internal caller getRateLimitBackoff
are cross-referenced to show the intentional fix.
const stats = await handle.stat();
if (stats.size > MAX_IMPORT_BYTES) {
  throw new Error(
    `Import file exceeds maximum size of ${MAX_IMPORT_BYTES} bytes: ${params.resolvedPath}`,
  );
}
content = await handle.readFile({ encoding: "utf-8" });
🧩 Analysis chain
🏁 Script executed:

cd /tmp/repo 2>/dev/null || cd . && git ls-files lib/storage/ | head -20

Repository: ndycode/codex-multi-auth
Length of output: 698

🏁 Script executed:

cat -n lib/storage/import-export.ts | sed -n '1,120p'

Length of output: 4717

🏁 Script executed:

cat -n test/storage-import-export.test.ts | sed -n '37,108p'

Length of output: 2594

🏁 Script executed:

rg "MAX_IMPORT_BYTES" -n

Length of output: 336

🏁 Script executed:

rg "handle\.(readFile|read)" -A 3 lib/storage/import-export.ts

Length of output: 226

🏁 Script executed:

cat -n test/storage-import-export.test.ts | sed -n '1,50p' | grep -E "(vitest|import|describe|it\()"

Length of output: 396

🏁 Script executed:

rg "isFile|isDirectory" lib/storage/import-export.ts

Length of output: 50

🏁 Script executed:

cat -n test/storage-import-export.test.ts | head -30

Length of output: 969

🏁 Script executed:

cat -n test/storage-import-export.test.ts | sed -n '30,52p'

Length of output: 850

🏁 Script executed:

rg "stat.*isFile\|isDirectory" lib/ -i

Length of output: 50

🏁 Script executed:

cat -n test/storage-import-export.test.ts | wc -l

Length of output: 71

🏁 Script executed:

rg "stat\(\)" lib/storage/import-export.ts -B 2 -A 10

Length of output: 460
enforce the cap while reading, not only before reading.
lib/storage/import-export.ts:96 checks stat.size, but lib/storage/import-export.ts:102 does unbounded readFile. a file can grow between stat() and readFile(), and non-regular files (device files, pipes, sockets) can report a small size while producing unbounded data. this leaves a concurrency and windows filesystem edge case outside test coverage at test/storage-import-export.test.ts:37-108.
add a regular-file check with !stats.isFile() before the size check. replace the unbounded readFile() with a bounded read loop that enforces MAX_IMPORT_BYTES + 1 limit, stopping early if exceeded.
add a vitest regression where stat() returns small size but the read path produces more than MAX_IMPORT_BYTES bytes. per coding guidelines, lib/** paths require focus on windows filesystem io and concurrency edge cases.
proposed bounded-read direction
const stats = await handle.stat();
+ if (!stats.isFile()) {
+ throw new Error(`Import path must be a regular file: ${params.resolvedPath}`);
+ }
if (stats.size > MAX_IMPORT_BYTES) {
throw new Error(
`Import file exceeds maximum size of ${MAX_IMPORT_BYTES} bytes: ${params.resolvedPath}`,
);
}
- content = await handle.readFile({ encoding: "utf-8" });
+ const chunks: Buffer[] = [];
+ const buffer = Buffer.allocUnsafe(64 * 1024);
+ let totalBytes = 0;
+ let position = 0;
+ for (;;) {
+ const remaining = MAX_IMPORT_BYTES + 1 - totalBytes;
+ const { bytesRead } = await handle.read(
+ buffer,
+ 0,
+ Math.min(buffer.length, remaining),
+ position,
+ );
+ if (bytesRead === 0) {
+ break;
+ }
+ totalBytes += bytesRead;
+ if (totalBytes > MAX_IMPORT_BYTES) {
+ throw new Error(
+ `Import file exceeds maximum size of ${MAX_IMPORT_BYTES} bytes: ${params.resolvedPath}`,
+ );
+ }
+ chunks.push(Buffer.from(buffer.subarray(0, bytesRead)));
+ position += bytesRead;
+ }
+ content = Buffer.concat(chunks, totalBytes).toString("utf-8");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/storage/import-export.ts` around lines 96 - 102, The import path
currently uses handle.stat() and then unbounded handle.readFile(), which allows
files to grow or non-regular files to stream unlimited bytes; update
lib/storage/import-export.ts to first verify the entry is a regular file via
stats.isFile() (and throw if not), keep the existing stats.size check against
MAX_IMPORT_BYTES, and replace handle.readFile() with a bounded read loop that
reads at most MAX_IMPORT_BYTES + 1 bytes from the handle and stops early; if the
loop observes more than MAX_IMPORT_BYTES bytes, throw an error referencing
params.resolvedPath and MAX_IMPORT_BYTES. Also add a vitest regression that
stubs stat() to return a small size but makes the read path produce >
MAX_IMPORT_BYTES bytes to assert the new early-abort behavior.
if (options && "expectedMtimeMs" in options) {
  let currentMtimeMs: number | null = null;
  try {
    currentMtimeMs = (await fs.stat(UNIFIED_SETTINGS_PATH)).mtimeMs;
  } catch (error) {
    const code = (error as NodeJS.ErrnoException).code;
    if (code !== "ENOENT") {
      throw error;
    }
  }
  if (currentMtimeMs !== options.expectedMtimeMs) {
    const staleError = new Error(
      "Unified settings changed on disk during save; retrying with latest state.",
    ) as NodeJS.ErrnoException;
    staleError.code = "ESTALE";
    throw staleError;
  }
}
for (let attempt = 0; attempt < 5; attempt += 1) {
  try {
    await fs.rename(tempPath, UNIFIED_SETTINGS_PATH);
the optimistic check is still a lost-update race.
lib/unified-settings.ts:398 validates the mtime, but lib/unified-settings.ts:418 renames later. any external writer that commits between those operations is overwritten without an ESTALE; the current regression at test/unified-settings.test.ts:843 only injects before this check. use the config save lock across read/modify/write, add an interprocess lock, or document this as best-effort and add a regression for the post-stat/pre-rename race. As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency.
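a minimal in-process shape for the "hold a lock across read/modify/write" option — this sketch does not address the interprocess writer case, and the save flow it wraps is a hypothetical stand-in, not the repo's code:

```typescript
// In-process async mutex: queues each saver behind the previous one so
// the stat/modify/rename window cannot interleave within this process.
// An external process can still race; that needs an interprocess lock.
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve();

  run<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Keep the chain alive even when fn rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

const settingsLock = new AsyncMutex();
let settingsOnDisk: Record<string, string> = { theme: "light" };

// Read, modify, and "write" all happen under one lock acquisition.
function save(patch: Record<string, string>): Promise<void> {
  return settingsLock.run(async () => {
    const current = { ...settingsOnDisk }; // read (stand-in for stat+read)
    settingsOnDisk = { ...current, ...patch }; // write (stand-in for rename)
  });
}
```

two concurrent save() calls serialize instead of clobbering each other, which is the property the optimistic mtime check alone cannot give.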
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/unified-settings.ts` around lines 398 - 418, The optimistic mtime check
around UNIFIED_SETTINGS_PATH using fs.stat and options.expectedMtimeMs is still
vulnerable to a lost-update race because the code re-checks mtime before
fs.rename(tempPath, UNIFIED_SETTINGS_PATH) but does not hold a lock across the
read/modify/write window; fix by acquiring a save lock around the entire
read/modify/write (the code path that calls the mtime check and the rename in
this module), or implement an interprocess file lock that is held from the
moment you read stat through the rename, ensuring the check and
fs.rename(tempPath, UNIFIED_SETTINGS_PATH) are atomic relative to other
processes; alternatively, if you choose best-effort behavior, update the
function comment for this module and add a regression test that injects a
concurrent writer between the stat (options.expectedMtimeMs check) and the
fs.rename to assert ESTALE or document the limitation—focus changes around the
mtime check (options.expectedMtimeMs), the rename call (fs.rename(tempPath,
UNIFIED_SETTINGS_PATH)), and the retry loop that wraps them.
it("should generate unique ids on repeat calls so retries do not overwrite (RPTU-001)", () => {
  // Simulate two recovery passes on the same messageID: each invocation
  // must stage and rename a DISTINCT target file, proving the synthetic
  // thinking part from the first pass is preserved.
  const sessionID = "s";
  const messageID = "m";
  const partDir = join(PART_STORAGE, messageID);
  fsMock.existsSync.mockReturnValue(true);

  expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);
  expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);

  expect(fsMock.writeFileSync).toHaveBeenCalledTimes(2);
  expect(fsMock.renameSync).toHaveBeenCalledTimes(2);

  const [firstTemp, firstPayload] =
    fsMock.writeFileSync.mock.calls[0] ?? [];
  const [secondTemp, secondPayload] =
    fsMock.writeFileSync.mock.calls[1] ?? [];
  const [, firstTarget] = fsMock.renameSync.mock.calls[0] ?? [];
  const [, secondTarget] = fsMock.renameSync.mock.calls[1] ?? [];

  // Temp staging paths must differ (otherwise two writers would race).
  expect(firstTemp).not.toBe(secondTemp);
  // Final target paths MUST differ so the second pass does not
  // overwrite the first synthetic thinking part.
  expect(firstTarget).not.toBe(secondTarget);

  // Both payloads must carry their own unique id matching their final
  // target filename, so readers see two distinct parts on disk.
  const firstParsed = JSON.parse(firstPayload);
  const secondParsed = JSON.parse(secondPayload);
  expect(firstParsed.id).toMatch(/^prt_0000000000_thinking_[0-9a-f]+_[0-9a-z]+_[0-9a-z]+$/);
  expect(secondParsed.id).toMatch(/^prt_0000000000_thinking_[0-9a-f]+_[0-9a-z]+_[0-9a-z]+$/);
  expect(firstParsed.id).not.toBe(secondParsed.id);
  expect(firstTarget).toContain(`${firstParsed.id}.json`);
  expect(secondTarget).toContain(`${secondParsed.id}.json`);
});

it("should generate ids that sort before real generatePartId ids so orphan detection still sees thinking first", () => {
  // findMessagesWithOrphanThinking sorts parts by id and checks the
  // first element. Synthetic thinking ids MUST sort lexicographically
  // before any id from generatePartId(), which starts with
  // `prt_<hex_timestamp>` where the hex timestamp's leading digit is
  // non-zero for any real-world time.
  const thinkingId = storage.generateThinkingPartId();
  const realId = storage.generatePartId();

  expect(thinkingId.startsWith("prt_0000000000_thinking_")).toBe(true);
  expect([thinkingId, realId].sort()).toEqual([thinkingId, realId]);
});
make the id regression deterministic by freezing time and randomness.
test/recovery-storage.test.ts:604-654 should pin Date.now() and Math.random() so the test reproduces the same-millisecond/same-random case that thinkingPartCounter is meant to handle. otherwise, a broken counter could be masked by real time or random output. As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
proposed deterministic vitest shape
it("should generate unique ids on repeat calls so retries do not overwrite (RPTU-001)", () => {
+ const nowSpy = vi.spyOn(Date, "now").mockReturnValue(1700000000000);
+ const randomSpy = vi.spyOn(Math, "random").mockReturnValue(0.123456789);
// Simulate two recovery passes on the same messageID: each invocation
// must stage and rename a DISTINCT target file, proving the synthetic
// thinking part from the first pass is preserved.
- const sessionID = "s";
- const messageID = "m";
- const partDir = join(PART_STORAGE, messageID);
- fsMock.existsSync.mockReturnValue(true);
+ try {
+ const sessionID = "s";
+ const messageID = "m";
+ const partDir = join(PART_STORAGE, messageID);
+ fsMock.existsSync.mockReturnValue(true);
- expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);
- expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);
+ expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);
+ expect(storage.prependThinkingPart(sessionID, messageID)).toBe(true);
- expect(fsMock.writeFileSync).toHaveBeenCalledTimes(2);
- expect(fsMock.renameSync).toHaveBeenCalledTimes(2);
+ expect(fsMock.writeFileSync).toHaveBeenCalledTimes(2);
+ expect(fsMock.renameSync).toHaveBeenCalledTimes(2);
- const [firstTemp, firstPayload] =
- fsMock.writeFileSync.mock.calls[0] ?? [];
- const [secondTemp, secondPayload] =
- fsMock.writeFileSync.mock.calls[1] ?? [];
- const [, firstTarget] = fsMock.renameSync.mock.calls[0] ?? [];
- const [, secondTarget] = fsMock.renameSync.mock.calls[1] ?? [];
+ const [firstTemp, firstPayload] =
+ fsMock.writeFileSync.mock.calls[0] ?? [];
+ const [secondTemp, secondPayload] =
+ fsMock.writeFileSync.mock.calls[1] ?? [];
+ const [, firstTarget] = fsMock.renameSync.mock.calls[0] ?? [];
+ const [, secondTarget] = fsMock.renameSync.mock.calls[1] ?? [];
- // Temp staging paths must differ (otherwise two writers would race).
- expect(firstTemp).not.toBe(secondTemp);
- // Final target paths MUST differ so the second pass does not
- // overwrite the first synthetic thinking part.
- expect(firstTarget).not.toBe(secondTarget);
+ expect(firstTemp).not.toBe(secondTemp);
+ expect(firstTarget).not.toBe(secondTarget);
- // Both payloads must carry their own unique id matching their final
- // target filename, so readers see two distinct parts on disk.
- const firstParsed = JSON.parse(firstPayload);
- const secondParsed = JSON.parse(secondPayload);
- expect(firstParsed.id).toMatch(/^prt_0000000000_thinking_[0-9a-f]+_[0-9a-z]+_[0-9a-z]+$/);
- expect(secondParsed.id).toMatch(/^prt_0000000000_thinking_[0-9a-f]+_[0-9a-z]+_[0-9a-z]+$/);
- expect(firstParsed.id).not.toBe(secondParsed.id);
- expect(firstTarget).toContain(`${firstParsed.id}.json`);
- expect(secondTarget).toContain(`${secondParsed.id}.json`);
+ const firstParsed = JSON.parse(firstPayload);
+ const secondParsed = JSON.parse(secondPayload);
+ expect(firstParsed.id).not.toBe(secondParsed.id);
+ expect(firstTarget).toContain(`${firstParsed.id}.json`);
+ expect(secondTarget).toContain(`${secondParsed.id}.json`);
+ } finally {
+ nowSpy.mockRestore();
+ randomSpy.mockRestore();
+ }
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/recovery-storage.test.ts` around lines 604 - 654, Pin Date.now() and
Math.random() in the two failing tests so the same-millisecond/same-random-path
is reproduced: inside the "should generate unique ids on repeat calls..." and
the "should generate ids that sort before real generatePartId ids..." tests, add
vitest spies like vi.spyOn(Date, "now").mockReturnValue(<fixedEpochMs>) and
vi.spyOn(Math, "random").mockReturnValue(<fixedRandom>) before calling
storage.prependThinkingPart / storage.generateThinkingPartId /
storage.generatePartId, and restore them after (mockRestore or in an afterEach)
so thinkingPartCounter behavior is deterministic and the assertions on IDs
always reproduce the regression.
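to illustrate what the frozen-clock regression protects, a self-contained stand-in (not the repo's generateThinkingPartId) showing that with both the clock and randomness pinned, only the counter keeps consecutive ids distinct:

```typescript
// Stand-in id generator: fixed zero prefix (sorts before any real
// prt_<hex_timestamp> id) + hex time + random part + per-process counter.
let thinkingPartCounter = 0;

function generateThinkingPartId(): string {
  const hex = Date.now().toString(16);
  const rand = Math.random().toString(36).slice(2, 8);
  const counter = (thinkingPartCounter++).toString(36);
  return `prt_0000000000_thinking_${hex}_${rand}_${counter}`;
}

// Pin both clock and randomness, as the proposed deterministic test
// does: without the counter suffix these two ids would collide.
const realNow = Date.now;
const realRandom = Math.random;
Date.now = () => 1700000000000;
Math.random = () => 0.123456789;
const first = generateThinkingPartId();
const second = generateThinkingPartId();
Date.now = realNow;
Math.random = realRandom;
```

with real time and randomness the same-millisecond/same-random path almost never occurs naturally, which is exactly why an unpinned test can mask a broken counter.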
it("treats q hotkey as back/cancel", async () => {
  const { select } = await import("../lib/ui/select.js");
  const selectPromise = select(
    [
      { label: "A", value: "a" },
      { label: "B", value: "b" },
    ],
    {
      message: "Pick",
    },
  );

  await vi.advanceTimersByTimeAsync(130);
  stdin.emit("data", Buffer.from("q", "utf8"));

  const result = await selectPromise;
  expect(result).toBeNull();
});

it("confirm returns false when q hotkey cancels", async () => {
  const { confirm } = await import("../lib/ui/confirm.js");
  const confirmPromise = confirm("Are you sure?");

  await vi.advanceTimersByTimeAsync(130);
  stdin.emit("data", Buffer.from("q", "utf8"));

  await expect(confirmPromise).resolves.toBe(false);
});
add the onInput precedence regression for q.
test/select.test.ts:102 and test/select.test.ts:121 cover prompts with no custom onInput, but they do not reproduce the lib/ui/auth-menu.ts:548 caller that returns a typed cancel object for q. add a vitest case where onInput("q") returns a sentinel value and assert select() resolves to that value, not null.
proposed regression test
```diff
 it("treats q hotkey as back/cancel", async () => {
   const { select } = await import("../lib/ui/select.js");
   const selectPromise = select(
     [
 @@
   const result = await selectPromise;
   expect(result).toBeNull();
 });
+
+it("lets onInput override the q hotkey", async () => {
+  const { select } = await import("../lib/ui/select.js");
+  const selectPromise = select(
+    [
+      { label: "A", value: "a" },
+      { label: "B", value: "b" },
+    ],
+    {
+      message: "Pick",
+      onInput: (input) => {
+        if (input === "q") return "custom-cancel";
+        return undefined;
+      },
+    },
+  );
+
+  await vi.advanceTimersByTimeAsync(130);
+  stdin.emit("data", Buffer.from("q", "utf8"));
+
+  await expect(selectPromise).resolves.toBe("custom-cancel");
+});
 it("confirm returns false when q hotkey cancels", async () => {
   const { confirm } = await import("../lib/ui/confirm.js");
   const confirmPromise = confirm("Are you sure?");
```

As per coding guidelines, test/**: "tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
  it("treats q hotkey as back/cancel", async () => {
    const { select } = await import("../lib/ui/select.js");
    const selectPromise = select(
      [
        { label: "A", value: "a" },
        { label: "B", value: "b" },
      ],
      {
        message: "Pick",
      },
    );
    await vi.advanceTimersByTimeAsync(130);
    stdin.emit("data", Buffer.from("q", "utf8"));
    const result = await selectPromise;
    expect(result).toBeNull();
  });

  it("lets onInput override the q hotkey", async () => {
    const { select } = await import("../lib/ui/select.js");
    const selectPromise = select(
      [
        { label: "A", value: "a" },
        { label: "B", value: "b" },
      ],
      {
        message: "Pick",
        onInput: (input) => {
          if (input === "q") return "custom-cancel";
          return undefined;
        },
      },
    );
    await vi.advanceTimersByTimeAsync(130);
    stdin.emit("data", Buffer.from("q", "utf8"));
    await expect(selectPromise).resolves.toBe("custom-cancel");
  });

  it("confirm returns false when q hotkey cancels", async () => {
    const { confirm } = await import("../lib/ui/confirm.js");
    const confirmPromise = confirm("Are you sure?");
    await vi.advanceTimersByTimeAsync(130);
    stdin.emit("data", Buffer.from("q", "utf8"));
    await expect(confirmPromise).resolves.toBe(false);
  });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/select.test.ts` around lines 102 - 129, Add a deterministic vitest
regression that verifies onInput precedence for the "q" hotkey: in
test/select.test.ts create a test that calls select(...) with an options.onInput
handler which detects input "q" and returns a sentinel cancel object (e.g., a
typed { cancelled: true } value), emit "q" on stdin after the same timer delay,
then assert that the resolved value from select() is that sentinel (not null).
Reference the existing select() import and the onInput option to replicate the
lib/ui/auth-menu.ts caller behavior so the test fails if onInput is ignored.
```typescript
const existsSyncMock = vi.fn();
const statMock = vi.fn();
const readFileMock = vi.fn();
const closeMock = vi.fn();
const openMock = vi.fn();

vi.mock("node:fs", () => ({
  existsSync: existsSyncMock,
  promises: {
    stat: statMock,
    readFile: readFileMock,
    open: openMock,
    mkdir: vi.fn(),
    writeFile: vi.fn(),
    rename: vi.fn(),
    unlink: vi.fn(),
  },
}));
```
🧩 Analysis chain
🌐 Web query:
vitest vi.mock hoisted factory vi.hoisted variables before initialization
💡 Result:
In Vitest, `vi.mock` calls are hoisted to the top of the test file and executed before any imports or variable initializations. This causes a "Cannot access 'variable' before initialization" error if the mock factory references variables defined outside `vi.hoisted`. To fix this, define mock functions or variables inside `vi.hoisted` before the `vi.mock` call. `vi.hoisted(factory)` executes the factory early (before imports and mocks) and returns its values for later use. Example from the official docs and Stack Overflow:

```typescript
const { setExMock } = vi.hoisted(() => {
  return { setExMock: vi.fn() };
});

vi.mock("redis", () => {
  return {
    createClient: () => ({
      connect: vi.fn(),
      get: vi.fn(),
      setEx: setExMock, // now accessible
    }),
  };
});
```

This pattern allows referencing hoisted mocks in the factory and reusing them in tests (e.g., `expect(setExMock).toHaveBeenCalledWith(...)`). Use `vi.doMock` instead if you need non-hoisted behavior, but it requires calling before imports. `vi.hoisted` is designed for side effects like mocks that must run before module evaluation.
Citations:
- 1: https://stackoverflow.com/questions/76836909/referenceerror-cannot-access-mock-before-initialization-when-using-vitest
- 2: https://vitest.dev/api/vi.html
- 3: https://vitest.dev/api/vi
- 4: https://vitest.dev/guide/mocking
- 5: How to use different mocks for each test with vi.mock? vitest-dev/vitest#3589
🏁 Script executed:
# First, check if the file exists and read the relevant lines
head -30 test/storage-import-export.test.tsRepository: ndycode/codex-multi-auth
Length of output: 759
🏁 Script executed:
# Read the full test file to check for regression tests and test coverage
wc -l test/storage-import-export.test.ts && echo "---" && cat -n test/storage-import-export.test.tsRepository: ndycode/codex-multi-auth
Length of output: 3683
move the mock functions into vi.hoisted.
test/storage-import-export.test.ts:9 uses a hoisted vi.mock factory that references const mocks from test/storage-import-export.test.ts:3-7. vitest hoists the factory before those initializers, so this suite will fail during module setup with a "cannot access before initialization" error instead of running the assertions.
the fix is to wrap mock creation in vi.hoisted() before the vi.mock() call:
vitest-safe mock setup

```diff
-const existsSyncMock = vi.fn();
-const statMock = vi.fn();
-const readFileMock = vi.fn();
-const closeMock = vi.fn();
-const openMock = vi.fn();
+const {
+  existsSyncMock,
+  statMock,
+  readFileMock,
+  closeMock,
+  openMock,
+} = vi.hoisted(() => ({
+  existsSyncMock: vi.fn(),
+  statMock: vi.fn(),
+  readFileMock: vi.fn(),
+  closeMock: vi.fn(),
+  openMock: vi.fn(),
+}));
```

additionally, this test suite lacks regression cases for windows filesystem behavior (path separators, case sensitivity) and concurrency races (simultaneous reads/writes). add tests that exercise these edge cases per the coding guidelines.
📝 Committable suggestion
```typescript
const {
  existsSyncMock,
  statMock,
  readFileMock,
  closeMock,
  openMock,
} = vi.hoisted(() => ({
  existsSyncMock: vi.fn(),
  statMock: vi.fn(),
  readFileMock: vi.fn(),
  closeMock: vi.fn(),
  openMock: vi.fn(),
}));

vi.mock("node:fs", () => ({
  existsSync: existsSyncMock,
  promises: {
    stat: statMock,
    readFile: readFileMock,
    open: openMock,
    mkdir: vi.fn(),
    writeFile: vi.fn(),
    rename: vi.fn(),
    unlink: vi.fn(),
  },
}));
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/storage-import-export.test.ts` around lines 3 - 20, The vi.mock factory
is hoisted before the local const mocks (existsSyncMock, statMock, readFileMock,
closeMock, openMock) are initialized, causing a "cannot access before
initialization" error; move creation of those mocks into vi.hoisted() so they
are initialized before the hoisted vi.mock factory, e.g., wrap each mock
initializer with vi.hoisted(() => vi.fn()) and then call vi.mock(...) as shown,
and also add regression tests exercising Windows filesystem behavior (path
separators and case sensitivity) and concurrency race conditions (simultaneous
reads/writes) to the storage import/export test suite.
```typescript
  it("retries sync backup snapshot copy on retryable fs errors without Atomics.wait", async () => {
    const { getUnifiedSettingsPath } = await import("../lib/unified-settings.js");
    const settingsPath = getUnifiedSettingsPath();
    await fs.mkdir(dirname(settingsPath), { recursive: true });
    await fs.writeFile(
      settingsPath,
      JSON.stringify({ version: 1, pluginConfig: { codexMode: true } }, null, 2),
      "utf8",
    );

    const atomicsWaitSpy = vi.spyOn(Atomics, "wait");
    vi.resetModules();
    vi.doMock("node:fs", async () => {
      const actual = await vi.importActual<typeof import("node:fs")>("node:fs");
      let first = true;
      return {
        ...actual,
        copyFileSync: (...args: Parameters<typeof actual.copyFileSync>) => {
          if (first) {
            first = false;
            const error = new Error("perm") as NodeJS.ErrnoException;
            error.code = "EPERM";
            throw error;
          }
          return actual.copyFileSync(...args);
        },
      };
    });
    try {
      const { saveUnifiedPluginConfigSync } = await import("../lib/unified-settings.js");
      saveUnifiedPluginConfigSync({ codexMode: true, retries: 1 });
      expect(atomicsWaitSpy).not.toHaveBeenCalled();
    } finally {
      vi.doUnmock("node:fs");
      vi.resetModules();
      atomicsWaitSpy.mockRestore();
    }
  });
```
assert the backup copy was actually retried.
test/unified-settings.test.ts:533 can pass even if copyFileSync is never retried, because lib/unified-settings.ts:186 treats snapshot failure as best-effort and the save still succeeds. count attempts and verify the backup was created. As per coding guidelines, test/**: tests must stay deterministic and use vitest.
proposed test tightening
```diff
 it("retries sync backup snapshot copy on retryable fs errors without Atomics.wait", async () => {
   const { getUnifiedSettingsPath } = await import("../lib/unified-settings.js");
   const settingsPath = getUnifiedSettingsPath();
 @@
   const atomicsWaitSpy = vi.spyOn(Atomics, "wait");
+  let copyAttempts = 0;
   vi.resetModules();
   vi.doMock("node:fs", async () => {
     const actual = await vi.importActual<typeof import("node:fs")>("node:fs");
-    let first = true;
     return {
       ...actual,
       copyFileSync: (...args: Parameters<typeof actual.copyFileSync>) => {
-        if (first) {
-          first = false;
+        copyAttempts += 1;
+        if (copyAttempts === 1) {
           const error = new Error("perm") as NodeJS.ErrnoException;
           error.code = "EPERM";
           throw error;
         }
 @@
   try {
     const { saveUnifiedPluginConfigSync } = await import("../lib/unified-settings.js");
     saveUnifiedPluginConfigSync({ codexMode: true, retries: 1 });
     expect(atomicsWaitSpy).not.toHaveBeenCalled();
+    expect(copyAttempts).toBe(2);
+    const backupRecord = JSON.parse(
+      await fs.readFile(`${settingsPath}.bak`, "utf8"),
+    ) as { pluginConfig?: Record<string, unknown> };
+    expect(backupRecord.pluginConfig).toEqual({ codexMode: true });
   } finally {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/unified-settings.test.ts` around lines 533 - 570, The test should assert
the backup copy was actually retried by counting copyFileSync attempts and
verifying the backup file exists after saveUnifiedPluginConfigSync; update the
mock in the test to increment a counter (or set a flag) inside the doMock for
node:fs's copyFileSync and after calling saveUnifiedPluginConfigSync({
codexMode: true, retries: 1 }) assert that the mock was called >1 time and that
a backup snapshot file (derived from getUnifiedSettingsPath()/settingsPath) was
created; keep the existing Atomics.wait spy and cleanup (vi.doUnmock,
vi.resetModules, atomicsWaitSpy.mockRestore) intact.
```typescript
  it("retries plugin section write against newer on-disk record to avoid stale overwrite", async () => {
    const {
      getUnifiedSettingsPath,
      saveUnifiedPluginConfig,
    } = await import("../lib/unified-settings.js");

    await fs.writeFile(
      getUnifiedSettingsPath(),
      JSON.stringify(
        {
          version: 1,
          pluginConfig: { codexMode: false, retries: 9 },
          dashboardDisplaySettings: {
            menuShowLastUsed: true,
            uiThemePreset: "green",
          },
          experimentalDraft: {
            enabled: false,
            panels: ["future"],
          },
        },
        null,
        2,
      ),
      "utf8",
    );

    const originalWriteFile = fs.writeFile;
    const writeSpy = vi.spyOn(fs, "writeFile");
    let injectedConcurrentWrite = false;
    writeSpy.mockImplementation(async (...args: Parameters<typeof fs.writeFile>) => {
      const [filePath, data] = args;
      const result = await originalWriteFile(...args);
      if (!injectedConcurrentWrite && String(filePath).includes(".tmp")) {
        injectedConcurrentWrite = true;
        await originalWriteFile(
          getUnifiedSettingsPath(),
          `${JSON.stringify(
            {
              version: 1,
              pluginConfig: { codexMode: false, retries: 9 },
              dashboardDisplaySettings: {
                menuShowLastUsed: false,
                uiThemePreset: "purple",
              },
              experimentalDraft: {
                enabled: true,
                panels: ["fresh"],
              },
            },
            null,
            2,
          )}\n`,
          "utf8",
        );
      }
      return result;
    });

    try {
      await saveUnifiedPluginConfig({ codexMode: true, fetchTimeoutMs: 45_000 });
    } finally {
      writeSpy.mockRestore();
    }

    expect(injectedConcurrentWrite).toBe(true);
    const parsed = JSON.parse(
      await fs.readFile(getUnifiedSettingsPath(), "utf8"),
    ) as {
      pluginConfig?: Record<string, unknown>;
      dashboardDisplaySettings?: Record<string, unknown>;
      experimentalDraft?: Record<string, unknown>;
    };
    expect(parsed.pluginConfig).toEqual({
      codexMode: true,
      fetchTimeoutMs: 45_000,
    });
    expect(parsed.dashboardDisplaySettings).toEqual({
      menuShowLastUsed: false,
      uiThemePreset: "purple",
    });
    expect(parsed.experimentalDraft).toEqual({
      enabled: true,
      panels: ["fresh"],
    });
  });
```
force a distinct mtime in the stale-write regression.
test/unified-settings.test.ts:843 injects the concurrent write immediately after the temp write. on windows or coarse-mtime filesystems, the primary mtime can match the earlier value, so lib/unified-settings.ts:408 may not raise ESTALE deterministically. force the injected record’s mtime forward before returning from the mock. As per coding guidelines, test/**: tests must stay deterministic and use vitest.
proposed deterministic mtime update
```diff
 await originalWriteFile(
   getUnifiedSettingsPath(),
   `${JSON.stringify(
 @@
   )}\n`,
   "utf8",
 );
+const forcedMtime = new Date(Date.now() + 5_000);
+await fs.utimes(getUnifiedSettingsPath(), forcedMtime, forcedMtime);
 }
 return result;
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/unified-settings.test.ts` around lines 813 - 898, The test's injected
concurrent write can keep the same mtime as the temp write on coarse-mtime
platforms, making ESTALE nondeterministic; inside the
writeSpy.mockImplementation that performs the injected write (the block that
calls originalWriteFile when String(filePath).includes(".tmp")), after awaiting
originalWriteFile for the injected content call fs.utimes (or
fs.promises.utimes) on getUnifiedSettingsPath() to advance its mtime (e.g.
Date.now()+1000) before returning, so saveUnifiedPluginConfig's mtime check
reliably sees a newer on-disk record.
Summary
Rolls up the currently open fix PRs (#414–#429, #431) into a single integration branch so the combined fix set can be reviewed as one unit.
This branch merges, in order:
Integration note
While building the rollup branch, one hidden integration issue surfaced:
PR #419 changed lib/accounts.ts to call HealthScoreTracker.clearNumericKeysAtOrAbove() and TokenBucketTracker.clearNumericKeysAtOrAbove(), but those helper methods never actually landed in lib/rotation.ts on the source branch. The rollup includes a follow-up commit (7f50455) adding those two methods so the combined branch typechecks and is self-contained. The source PR #419 should receive the same patch before being merged independently.
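The expected behavior of the missing helpers is simple to state: every tracker key whose name is a numeric index at or above the removed account index must be dropped, so no per-index state outlives a reindex. A hedged Map-based sketch; the real HealthScoreTracker and TokenBucketTracker internals in lib/rotation.ts may store state differently, and only the method name comes from the PR description:

```typescript
// Illustrative stand-in for the tracker classes; only the method name
// clearNumericKeysAtOrAbove comes from the source, the storage is assumed.
class TrackerSketch {
  private readonly state = new Map<string, number>();

  set(key: string, value: number): void {
    this.state.set(key, value);
  }

  get(key: string): number | undefined {
    return this.state.get(key);
  }

  // Drop every key that parses as an integer index >= the removed index,
  // leaving non-numeric keys (named buckets, metadata) untouched.
  clearNumericKeysAtOrAbove(index: number): void {
    for (const key of [...this.state.keys()]) {
      const numeric = Number(key);
      if (Number.isInteger(numeric) && numeric >= index) {
        this.state.delete(key);
      }
    }
  }
}
```

After removeAccount(idx) shifts the remaining accounts down, any surviving per-index entry at or above idx would describe the wrong account, which is why the keys are cleared rather than remapped in this sketch.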
Verification
Passes on the rollup branch
- npm run typecheck
- npm run lint
- npm run build
- npm run audit:ci
- npm run pack:check
- npm run vendor:verify
- npm run clean:repo:check

Test status
The combined branch does not currently have a clean `npm test`, but the failures are already present on current main in this environment and are not introduced by the rollup merges:

- test/codex-bin-wrapper.test.ts
- test/benchmark-runtime-path-script.test.ts

They appear to be baseline failures under the current local Node/runtime combination (syntax/module-loading assumptions in temporary fixture scripts), not regressions from this fix set.
Why this PR exists
You asked for a single isolated PR from main so you can review the entire fix set in one place rather than merge 17 independent PRs one-by-one. This PR is that integration view.
note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary
rollup of 17 fix PRs covering oauth token-leak redaction, cursor/tracker correctness after account removal, double-exponential backoff fix (REQ-HIGH-01), SSE buffer O(n²)→O(n) (REQ-HIGH-03), optimistic-lock settings writes (ESTALE), import size guard, shutdown dedup, select Q-cancel, and thinking-part id uniqueness (RPTU-001).
Atomics.wait was removed from the sync retry loops in writeSettingsRecordSync and trySnapshotUnifiedSettingsBackupSync; both now busy-spin through all 4 retries in microseconds, so EBUSY/EPERM locks on windows will not recover. the async path still has await sleep(10 * 2 ** attempt) — the sync paths need the same treatment restored.

Confidence Score: 4/5
safe to merge after restoring sync backoff delay in unified-settings.ts — one P1 windows regression blocks clean approval
the rollup is high quality with strong test coverage additions and well-documented fixes throughout. the single P1 is the missing Atomics.wait in the sync retry loops — a targeted one-liner restore per loop. everything else is P2 or lower. score would be 5 after that patch.
lib/unified-settings.ts — sync retry loops in writeSettingsRecordSync and trySnapshotUnifiedSettingsBackupSync need Atomics.wait restored
Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller
    participant CodexManager
    participant AuthServer as auth/server.ts
    participant AuthFlow as auth/auth.ts
    participant UnifiedSettings as unified-settings.ts
    participant Rotation as rotation.ts
    participant Accounts as accounts.ts
    Caller->>CodexManager: codex auth login
    CodexManager->>AuthServer: startLocalOAuthServer({state})
    AuthServer-->>CodexManager: {waitForCode(expectedState)}
    Note over AuthServer: HTTP handler validates state before storing _lastCode
    CodexManager->>AuthFlow: exchangeAuthorizationCode(code)
    AuthFlow->>AuthFlow: sanitizeOAuthResponseBodyForLog(body)
    Note over AuthFlow: redacts refresh_token/access_token before logError
    Caller->>Accounts: removeAccount(idx)
    Accounts->>Rotation: HealthScoreTracker.clearNumericKeysAtOrAbove(idx)
    Accounts->>Rotation: TokenBucketTracker.clearNumericKeysAtOrAbove(idx)
    Accounts->>Accounts: reindex + invalidate _runtimeTrackerKey
    Caller->>UnifiedSettings: saveUnifiedPluginConfig(config)
    UnifiedSettings->>UnifiedSettings: getUnifiedSettingsMtimeMs()
    UnifiedSettings->>UnifiedSettings: readSettingsRecordAsyncInternal()
    loop up to 3 ESTALE retries
        UnifiedSettings->>UnifiedSettings: writeSettingsRecordAsync(record, {expectedMtimeMs})
        alt mtime changed
            UnifiedSettings->>UnifiedSettings: throw ESTALE, re-read + retry
        end
    end
```

Comments Outside Diff (2)
lib/unified-settings.ts, line 186-194 (link)

Atomics.wait removal: trySnapshotUnifiedSettingsBackupSync and writeSettingsRecordSync both loop up to 5 times on retryable fs errors (EBUSY/EPERM/EAGAIN). the removed Atomics.wait call provided the inter-attempt backoff (10ms, 20ms, 40ms, 80ms). without it, all 5 retries fire in sub-millisecond succession, giving windows antivirus no time to release a file lock — exactly the scenario AGENTS.md flags as a windows filesystem safety concern.

the async rename path still calls await sleep(10 * 2 ** attempt), so only the sync paths regressed. if the removal was to avoid Atomics.wait throwing in non-worker contexts, a simple for…await or a sleepSync via an Atomics.waitAsync polyfill would preserve the delay without the main-thread restriction.

Prompt To Fix With AI
lib/unified-settings.ts, line 184-193 (link)

Atomics.wait was removed from both trySnapshotUnifiedSettingsBackupSync and writeSettingsRecordSync. these were the only mechanism for a synchronous sleep between retry attempts. without them, the loops spin through all 4 retries in microseconds — on windows, EBUSY/EPERM locks held by antivirus or another process won't release in that time, so every retry fails immediately. the async variant correctly retains await sleep(10 * 2 ** attempt) (line 216); the sync paths need the equivalent.

same patch needed in writeSettingsRecordSync's inner catch block.

Prompt To Fix With AI
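For the sync-path fix both reviewers describe, one option is a blocking sleep built on Atomics.wait, which Node permits on the main thread (browsers do not). This is a hedged sketch under that assumption, not the project's actual implementation; sleepSync and retrySyncDemo are illustrative names:

```typescript
// sleepSync blocks the current thread for ~ms by waiting on a shared int
// that is never notified, so Atomics.wait always returns "timed-out".
// Node allows Atomics.wait on the main thread; browsers would throw here.
function sleepSync(ms: number): void {
  const lock = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(lock, 0, 0, ms);
}

// Shape of a sync retry loop with exponential inter-attempt backoff,
// mirroring the async path's `await sleep(10 * 2 ** attempt)`.
function retrySyncDemo(op: () => void, maxAttempts: number): void {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      op();
      return;
    } catch (error) {
      if (attempt === maxAttempts - 1) throw error;
      sleepSync(10 * 2 ** attempt); // 10ms, 20ms, 40ms, ...
    }
  }
}
```

The backoff gives a transient EBUSY/EPERM holder (e.g. windows antivirus) real wall-clock time to release the file before the next attempt, instead of burning all retries in microseconds.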
Prompt To Fix All With AI
Reviews (2): Last reviewed commit: "test(rollup): restore auth-list empty-st..." | Re-trigger Greptile