Skip to content

feat: complete enterprise hardening and storage lock reliability#34

Closed
ndycode wants to merge 8 commits intofeat/enterprise-hardeningfrom
fix/pr32-feedback
Closed

feat: complete enterprise hardening and storage lock reliability#34
ndycode wants to merge 8 commits intofeat/enterprise-hardeningfrom
fix/pr32-feedback

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 3, 2026

Summary

  • add enterprise hardening baseline (audit, ABAC, idempotency, retention, secret scan hooks, CI gates)
  • harden secret key derivation and rotation command handling
  • add file-based lock stale cleanup to prevent storage deadlocks
  • add runbooks and operational docs

Validation

  • npm test
  • npm run lint
  • npm run typecheck
  • npm run build

Thread Resolution Update (2026-03-05)

Addressed all open review threads in this stacked PR with focused reliability/security fixes:

  • Sanitized audit resource fallback paths in index.ts to strip query/fragment on URL parse failure.
  • Hardened background job retry flow (lib/background-jobs.ts): full token redaction charset and jittered backoff.
  • Reworked retention traversal (lib/data-retention.ts) to avoid recursive retry amplification and added ENOTEMPTY retryability.
  • Made unified settings lock-release failures (EPERM/EBUSY) non-fatal post-write in lib/unified-settings.ts.
  • Hardened file lock stale eviction and cleanup precedence (lib/file-lock.ts): PID liveness check and original error preservation.
  • Expanded CLI redaction patterns in lib/codex-manager.ts.
  • Normalized SPDX plus-forms in scripts/license-policy-check.js and added regression coverage.

Added/updated regressions:

  • test/background-jobs.test.ts
  • test/codex-manager-cli.test.ts
  • test/data-retention.test.ts
  • test/file-lock.test.ts
  • test/index-retry.test.ts
  • test/unified-settings.test.ts
  • test/license-policy-check.test.ts (new)

Validation:

  • npm run typecheck
  • npm run lint
  • npx vitest run test/background-jobs.test.ts test/codex-manager-cli.test.ts test/data-retention.test.ts test/file-lock.test.ts test/index-retry.test.ts test/unified-settings.test.ts test/license-policy-check.test.ts

Thread Resolution Update (2026-03-05, final)

What changed

  • Deduplicated plugin audit resource sanitization by extracting oSafeAuditResource(url) and reusing both request-failure audit call sites in index.ts.
  • Hardened retention error visibility in lib/data-retention.ts: only ENOENT is treated as expected race; other retry-exhausted entry errors are logged with path/code context and rethrown.
  • Added/updated regressions in est/data-retention.test.ts for persistent retry exhaustion (EBUSY) and retained transient retry coverage.
  • Added shared cleanup helper est/helpers/remove-with-retry.ts and switched duplicated callers ( est/unified-settings.test.ts, est/file-lock.test.ts, plus touched tests in this PR).
  • Expanded est/license-policy-check.test.ts with deterministic Windows retry-branch coverage and non-string license-shape denylist regressions (license.type, licenses[]).

How to test

px vitest run test/data-retention.test.ts test/license-policy-check.test.ts test/unified-settings.test.ts test/file-lock.test.ts

pm run lint

pm run typecheck

Risk / rollout notes

  • Behavior change is limited to retention error surfacing: persistent retry-exhausted prune errors now fail retention pass (with warning context) instead of being silently swallowed.
  • Runtime request behavior is unchanged except internal dedupe of duplicated audit URL sanitization logic.

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this pr lands a focused reliability and token-safety hardening pass across the storage, lock, retention, and cli layers. the core changes are: deterministic lock ordering in lib/storage.ts (file-queue mutex → file lock → storage mutex), per-leaf-op retry in lib/data-retention.ts to avoid subtree restarts, pid-liveness gating before stale lock eviction in lib/file-lock.ts, read-modify-write atomicity for lib/unified-settings.ts, toSafeAuditResource stripping query/fragment from audit log urls in index.ts, and expanded redaction coverage in both lib/background-jobs.ts and lib/codex-manager.ts. ci is also hardened with a dependency gate on the test job before coverage runs.

key changes

  • lib/storage.ts: introduces withStorageSerializedFileLock with explicit documented acquisition order; adds STORAGE_DECRYPT_ERROR_CODE fast-fail path; rotateStoredSecretEncryption is now transactional
  • lib/file-lock.ts: pid liveness check before stale eviction; atomic released = true placement; cleanup on partial write failures
  • lib/unified-settings.ts: lock scope now covers full read-modify-write cycle (not just the write); EPERM/EBUSY release failures are non-fatal post-write
  • lib/data-retention.ts: retry is scoped to individual leaf i/o ops; misleading "Skipping..." log message logged immediately before a throw that actually aborts the prune cycle — needs correction
  • lib/background-jobs.ts: jittered backoff, 429 retryability, accurate attempt counter; redaction regex has an optional separator that can over-redact plain status phrases like "token expired"
  • test/background-jobs.test.ts: inlines a removeWithRetry copy with fewer retryable codes than the shared helper added in the same pr — should import from test/helpers/remove-with-retry.ts instead

Confidence Score: 3/5

  • mergeable after fixing the misleading log message in data-retention and the duplicate test helper; no critical security regressions
  • the lock-ordering, pid-liveness, audit-url sanitisation, and unified-settings atomicity changes are all well-reasoned and tested. the main concern blocking a 4 is the misleading "Skipping" log in lib/data-retention.ts that fires immediately before a throw — operators will read that log and expect a skip, not a crash of the full prune cycle, which is a real ops incident risk on windows with av-locked files. the inlined removeWithRetry in test/background-jobs.test.ts with an incomplete retryable-codes set is a minor but tangible risk on windows ci runners. the background-jobs redaction regex is over-aggressive but not a security risk. no new token leakage vectors introduced.
  • lib/data-retention.ts (misleading log + abort behaviour) and test/background-jobs.test.ts (incomplete inline helper)

Important Files Changed

Filename Overview
lib/data-retention.ts adds per-leaf-op retry with withRetentionIoRetry to handle transient windows lock errors; however throw error after logging "Skipping" contradicts the stated best-effort skip intent and will abort the entire prune cycle on persistent lock failures
lib/file-lock.ts hardened lock acquisition: write/close errors now clean up incomplete lock files, released flag moved post-unlink to prevent premature reuse, and stale eviction now checks pid liveness before unlinking — solid concurrency improvements for windows
lib/storage.ts introduces withStorageSerializedFileLock with explicit acquisition order (queue mutex → file lock → storage mutex) to prevent deadlocks; adds STORAGE_DECRYPT_ERROR_CODE for fast-fail on decryption errors; rotateStoredSecretEncryption now correctly uses a transaction
lib/unified-settings.ts lock is lifted out of writeSettingsRecordSync/Async and moved to callers, so read-modify-write is now atomic under the file lock; EPERM/EBUSY release failures are non-fatal post-write; SETTINGS_LOCK_OPTIONS constant avoids copy-paste
lib/background-jobs.ts adds jittered backoff, 429 retryability, accurate attemptsMade counter, and DLQ error sanitisation; redaction regex has an optional separator that can over-redact plain status phrases, reducing DLQ diagnostic value
lib/codex-manager.ts hardens idempotency-key parsing with looksLikeOptionToken, base64 cursor validation, page-size digit-only guard, and sanitizeCliErrorMessage for rotate-secrets audit; changes look correct and well-tested
index.ts extracts toSafeAuditResource helper and applies it to all three audit-log call sites; fallback strips query/fragment on URL parse failure; error audit now records errorType (class name) instead of the full message to avoid token leakage
scripts/license-policy-check.js adds SPDX plus-form normalisation (normalizeSpdxToken) and structured extractLicenseTokens splitting; switches from substring includes to set membership check, eliminating false positives from substring matches like MIT matching MIT-CMU
test/background-jobs.test.ts good new coverage for 429 retry, jitter, non-retryable attempt counting, and DLQ sanitisation; inlines its own removeWithRetry with an incomplete retryable-codes set instead of importing the shared helper added in the same PR
test/helpers/remove-with-retry.ts new shared helper with full windows-friendly retryable codes; used consistently by data-retention, unified-settings, and codex-manager test suites — but not imported by background-jobs tests
.github/workflows/ci.yml coverage-gate now depends on the test job via needs: test, uploads/downloads a dist artifact, and uses the correct test:coverage script; eliminates the previous race where coverage could run before a successful build
test/data-retention.test.ts comprehensive new retention tests covering transient EBUSY/EPERM/EAGAIN retry, persistent-lock abort, ENOTEMPTY race, and single-file unlink retry; test expectations align with the throw-on-exhaustion behaviour in the implementation

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant M as accountFileMutex
    participant F as acquireFileLock
    participant S as storageMutex
    participant D as Disk

    C->>M: withAccountFileMutex()
    activate M
    M->>F: withAccountFileLock(path)
    activate F
    F->>D: open(.lock, wx)
    D-->>F: fd
    F->>D: writeFile(pid+timestamp)
    F->>D: close(fd)
    F->>S: withStorageLock(fn)
    activate S
    S->>D: loadAccountsInternal()
    D-->>S: accounts
    S->>D: saveAccountsUnlocked()
    D-->>S: ok
    S-->>F: result
    deactivate S
    F->>D: unlink(.lock)
    F-->>M: result
    deactivate F
    M-->>C: result
    deactivate M
Loading

Comments Outside Diff (2)

  1. General comment

    orphaned lock file on write failure — windows filesystem deadlock risk

    when handle.writeFile throws (common on windows due to av-scanner holding the fd briefly — EPERM), the file was already created by fs.open("wx"). the error path closes the handle and throws, but never unlinks the file. subsequent acquireFileLock calls hit EEXIST and fall into the stale-detection path, which tries to JSON.parse the lock file. an empty or partially-written file causes the parse to throw, and the stale cleanup skips the unlink — leaving the lock stuck for the full staleAfterMs (120 s).

    same problem exists in acquireFileLockSync at the equivalent block.

    no vitest coverage for this branch — add a test that injects EPERM on handle.writeFile and asserts the lock file is removed before the next attempt.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: lib/file-lock.ts
    Line: 311-332
    
    Comment:
    **orphaned lock file on write failure — windows filesystem deadlock risk**
    
    when `handle.writeFile` throws (common on windows due to av-scanner holding the fd briefly — `EPERM`), the file was already created by `fs.open("wx")`. the error path closes the handle and throws, but never unlinks the file. subsequent `acquireFileLock` calls hit `EEXIST` and fall into the stale-detection path, which tries to `JSON.parse` the lock file. an empty or partially-written file causes the parse to throw, and the stale cleanup skips the unlink — leaving the lock stuck for the full `staleAfterMs` (120 s).
    
    same problem exists in `acquireFileLockSync` at the equivalent block.
    
    
    
    no vitest coverage for this branch — add a test that injects `EPERM` on `handle.writeFile` and asserts the lock file is removed before the next attempt.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Codex

  2. General comment

    write happens outside the file lock in the refactored writeSettingsRecordSync

    the old code held the file lock for the entire temp-write → rename sequence. the refactored writeSettingsRecordSync writes the temp file and renames before any lock is acquired — the lock is now only acquired in the callers (saveUnifiedPluginConfigSync, etc.). since writeSettingsRecordSync is called inside the locked block in the callers, the write is actually protected today.

    however, the rename loop in writeSettingsRecordSync happens while the lock is held, which is correct. but the temp path for the sync variant is ${path}.${process.pid}.${Date.now()}.tmp — no random suffix. two same-process sync writes occurring at the same millisecond timestamp would overwrite each other's temp file. the file lock serialises same-process callers, but a cross-process scenario (multiple terminals starting at the same ms) could clobber the temp file before rename. the async variant already uses a random suffix — apply the same pattern here for consistency and safety.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: lib/unified-settings.ts
    Line: 133-162
    
    Comment:
    **write happens outside the file lock in the refactored `writeSettingsRecordSync`**
    
    the old code held the file lock for the entire temp-write → rename sequence. the refactored `writeSettingsRecordSync` writes the temp file and renames before any lock is acquired — the lock is now only acquired in the callers (`saveUnifiedPluginConfigSync`, etc.). since `writeSettingsRecordSync` is called *inside* the locked block in the callers, the write is actually protected today.
    
    however, the rename loop in `writeSettingsRecordSync` happens while the lock is held, which is correct. but the temp path for the sync variant is `${path}.${process.pid}.${Date.now()}.tmp` — no random suffix. two same-process sync writes occurring at the same millisecond timestamp would overwrite each other's temp file. the file lock serialises same-process callers, but a cross-process scenario (multiple terminals starting at the same ms) could clobber the temp file before rename. the async variant already uses a random suffix — apply the same pattern here for consistency and safety.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Codex

  3. General comment

    writeSettingsRecordSync/Async now have implicit caller contract for lock safety

    the old design held the file lock inside writeSettingsRecordSync and writeSettingsRecordAsync, so any call was automatically safe. the refactor moved the lock up to saveUnifiedPluginConfigSync, saveUnifiedPluginConfig, and saveUnifiedDashboardSettings, which correctly protects the full read-modify-write cycle — but turns the two write helpers into functions that silently require the caller to hold UNIFIED_SETTINGS_LOCK_PATH.

    on windows, a future internal caller that doesn't acquire the lock first could race concurrent renames on settings.json without any guard.

    consider adding a // REQUIRES: caller holds UNIFIED_SETTINGS_LOCK_PATH file lock doc comment on both private write helpers (lines 131 and 182) to make the contract explicit, or keeping the internal lock AND the outer lock (nested acquires of the same file-lock path are already guarded by the staleAfterMs check).

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: lib/unified-settings.ts
    Line: 131-182
    
    Comment:
    `writeSettingsRecordSync`/`Async` now have implicit caller contract for lock safety
    
    the old design held the file lock inside `writeSettingsRecordSync` and `writeSettingsRecordAsync`, so any call was automatically safe. the refactor moved the lock up to `saveUnifiedPluginConfigSync`, `saveUnifiedPluginConfig`, and `saveUnifiedDashboardSettings`, which correctly protects the full read-modify-write cycle — but turns the two write helpers into functions that silently require the caller to hold `UNIFIED_SETTINGS_LOCK_PATH`.
    
    on windows, a future internal caller that doesn't acquire the lock first could race concurrent renames on `settings.json` without any guard.
    
    consider adding a `// REQUIRES: caller holds UNIFIED_SETTINGS_LOCK_PATH file lock` doc comment on both private write helpers (lines 131 and 182) to make the contract explicit, or keeping the internal lock AND the outer lock (nested acquires of the same file-lock path are already guarded by the `staleAfterMs` check).
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Codex

  4. lib/data-retention.ts, line 486-491 (link)

    misleading log message before abort

    the logger emits "Skipping retention entry after retry exhaustion" but then immediately throw error — which aborts the entire prune cycle, not just that one entry. the test at test/data-retention.test.ts confirms the throw is intentional, but the log message contradicts what actually happens, making ops triage confusing. on windows, an antivirus-locked file hitting this path will bubble up through every caller and kill the full retention run while the audit log says "Skipping".

    fix the message to match the behaviour, e.g.:

Fix All in Codex

Last reviewed commit: 65112ff

Context used:

  • Rule from dashboard - What: Every code change must explain how it defends against Windows filesystem concurrency bugs and ... (source)

- cleanup stale/dead process lock artifacts before acquiring account lock

- ensure lock release always attempts fallback cleanup

- keep clearAccounts/saveTransactions serialized across file and memory locks

Co-authored-by: Codex <noreply@openai.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

walkthrough

introduces serialized, platform-aware storage locking and explicit decrypt-error propagation. adds StorageError and STORAGE_DECRYPT_ERROR_CODE, a file-queue + in-process mutex ordering for storage mutations, atomic write retries, and retryable IO helpers. key behavior changes live in lib/storage.ts:1, lib/file-lock.ts:1, and lib/unified-settings.ts:1.

changes

Cohort / File(s) Summary
storage locking & decrypt errors
lib/storage.ts
adds StorageError, STORAGE_DECRYPT_ERROR_CODE, toStorageDecryptError, accountFileMutex, withStorageSerializedFileLock, withAccountFileMutex, and releaseStorageLockFallback. refactors normalization/decrypt paths and storage mutations to propagate decrypt-specific errors and to use serialized lock ordering. (lib/storage.ts:1)
file-lock primitives
lib/file-lock.ts
splits write/close error handling, adds pid-based stale-lock removal, introduces cleanup helpers and ENOENT-tolerant release semantics. acquireFileLock accepts options and returns a release method in practice. (lib/file-lock.ts:1)
unified settings atomic writes & lock release
lib/unified-settings.ts
replaces per-call lock writes with temp-file atomic write + rename retries, introduces SETTINGS_LOCK_OPTIONS, and adds async/sync lock release helpers with best-effort cleanup/logging. (lib/unified-settings.ts:1)
io retry & retention
lib/data-retention.ts
adds retry wrapper withRetentionIoRetry and retryable error set for transient IO codes; wraps readdir/stat/unlink/rmdir with exponential backoff. (lib/data-retention.ts:1)
background jobs & redaction
lib/background-jobs.ts, lib/codex-manager.ts
adds sanitized error messages, jittered backoff, attemptsMade tracking, CLI token/option hardening, and CLI error redaction. (lib/background-jobs.ts:1, lib/codex-manager.ts:1)
tests: storage, file-lock, retention, unified-settings
test/storage.test.ts, test/file-lock.test.ts, test/data-retention.test.ts, test/unified-settings.test.ts
adds tests for decrypt-fail-fast, lock.release fallback on EBUSY/EPERM, deterministic concurrent saves, worker dynamic import of file-lock, release retry behavior, and retryable IO scenarios. (test/storage.test.ts:1, test/file-lock.test.ts:1)
ci, sbom, license checks, docs
.github/.../ci.yml, .github/.../supply-chain.yml, package.json, scripts/license-policy-check.js, docs/*
ci artifact sharing, sbom script + devDependency, SPDX token normalization and token-based denylist checking, documentation updates for testing and operations. (.github/...:1, package.json:1, scripts/license-policy-check.js:1)

sequence diagram(s)

sequenceDiagram
    autonumber
    participant caller as "caller"
    participant inproc as "in-process mutex\n(rgba(70,130,180,0.5))"
    participant filequeue as "file queue / file-lock\n(rgba(34,139,34,0.5))"
    participant storagemtx as "storage mutex\n(rgba(218,165,32,0.5))"
    participant fs as "filesystem\n(rgba(128,0,128,0.5))"

    caller->>inproc: request serialized lock
    inproc->>filequeue: enqueue / acquire file lock
    filequeue->>fs: create/acquire lock file
    filequeue-->>inproc: lock acquired
    inproc->>storagemtx: acquire storage mutex
    storagemtx-->>caller: perform storage mutation
    caller->>storagemtx: release storage mutex
    storagemtx-->>inproc: signal release
    inproc->>filequeue: release file lock
    filequeue->>fs: remove lock file (or leave for fallback)
    filequeue-->>inproc: fully released
Loading

estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

notes for review

  • concurrency risk: verify deterministic acquisition order and absence of deadlocks across withStorageSerializedFileLock and accountFileMutex. review lib/storage.ts:1 and lib/file-lock.ts:1 for all lock acquisition/release and error branches.
  • missing regression tests: add focused tests for withStorageSerializedFileLock and releaseStorageLockFallback under concurrent abort/crash scenarios. current tests exercise many cases but do not cover all interleavings (test/storage.test.ts:1, test/file-lock.test.ts:1).
  • windows edge cases: validate rename/unlink and sharing-violation semantics on windows. inspect lib/file-lock.ts:1 and lib/unified-settings.ts:1 for platform-specific handling and ensure tests simulate EBUSY/EPERM behaviors on windows.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 13.24% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed the title follows conventional commits format with type and lowercase imperative summary, accurately reflecting the main change (enterprise hardening and storage lock reliability).
Description check ✅ Passed PR description is largely complete with clear summary, validation steps, and multiple detailed technical updates, but is missing sections from the template (README, docs, SECURITY/CONTRIBUTING review, Risk/Rollback).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/pr32-feedback

Comment @coderabbitai help to get the list of available commands and usage tips.

Ensure account-storage mutations keep deterministic ordering while preserving the historical file-lock before in-process mutex acquisition sequence.

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/storage.ts`:
- Around line 382-386: withAccountFileLock currently calls
cleanupDeadProcessStorageLock(lockPath) before calling
acquireFileLock(lockPath,...), which creates a race where two processes can both
remove a stale lock then race to create it; instead remove the external cleanup
call and rely on acquireFileLock to handle stale-lock removal (or move the
cleanup logic into acquireFileLock). Update withAccountFileLock to stop calling
cleanupDeadProcessStorageLock and ensure acquireFileLock (in lib/file-lock.ts)
invokes removeIfStale/remove-or-cleanup atomically so stale locks are handled
inside acquireFileLock rather than externally.
- Around line 373-379: The catch block around the storage read should not call
releaseStorageLockFallback when the error is an EBUSY from
fs.readFile/JSON.parse; update the catch in the function that reads the lock
(the block that currently inspects (error as NodeJS.ErrnoException).code) to
explicitly handle error.code === "ENOENT" (return) and error.code === "EBUSY"
(return or rethrow as appropriate) and only call
releaseStorageLockFallback(lockPath) for other error types (e.g., parse errors).
Reference the existing catch scope that uses (error as
NodeJS.ErrnoException).code and the releaseStorageLockFallback(lockPath) call to
implement this conditional behavior.
- Around line 123-139: The comment claiming "file lock -> in-process mutex"
contradicts the implementation in withStorageSerializedFileLock (which currently
calls withAccountFileMutex → withAccountFileLock → withStorageLock), so either
update the comment to state the actual acquisition order (in-process mutex ->
file lock -> storage lock) or reorder the calls to match the comment;
specifically, either change the comment near withStorageSerializedFileLock to
reflect the real sequence or change the implementation to call
withAccountFileLock(path, () => withAccountFileMutex(() => withStorageLock(fn)))
so the acquisition becomes file lock -> in-process mutex -> storage lock (and
ensure variable/closure usage still compiles).
- Around line 340-346: The catch block in releaseStorageLockFallback currently
swallows all errors from fs.rm; update it to log a debug-level message including
the lockPath and the caught error so failed cleanup is observable (e.g., use the
existing logger or processLogger if available), while preserving the
"best-effort" behavior by not rethrowing; ensure you reference
releaseStorageLockFallback and the fs.rm call so the log includes both the path
and error details.
- Around line 355-362: When detecting and removing a stale lock (the branch that
computes lockPid and isDeadProcess using process.kill) and the separate
age-based cleanup branch, add observability logs using the existing logger
(e.g., processLogger or storage logger used elsewhere in this module) that
record the action and context: include the lock path/name, lockPid, whether
isDeadProcess was true, the lock age (timestamp or computed age), and a concise
reason ("stale: dead PID" or "stale: age threshold exceeded"). Place one log
right before or immediately after the dead-process cleanup path (where
isDeadProcess is true) and another log in the age-based cleanup path to make
both events visible to operators. Ensure the log messages are structured and
include these fields so they match ops runbook expectations.
- Around line 390-401: The current finally always calls
releaseStorageLockFallback(lockPath) even when lock.release() succeeds; change
the flow so the fallback is only invoked when lock.release() throws or otherwise
fails: try calling await lock.release() and on catch (error) check the error
code like in the existing block, log the warning, then call await
releaseStorageLockFallback(lockPath) from inside that catch (or when a boolean
“failedToRelease” flag is set) so the fallback only runs on failure; update
references to lock.release, releaseStorageLockFallback, lockPath and preserve
the ENOENT handling/logging behavior.
- Around line 348-380: Add explicit debug/info/error logs inside
cleanupDeadProcessStorageLock to record: when the lock file is read and parsed
(include pid and acquiredAt), when process.kill(pid, 0) indicates the process is
alive (skipping cleanup) or throws ESRCH (releasing lock), when EPERM or other
errors occur (and whether lock is stale), when
releaseStorageLockFallback(lockPath) is invoked, and when ENOENT or JSON parse
errors are encountered; use the existing processLogger (or the module logger)
and include lockPath and relevant metadata in each message. Also add vitest
regression tests for cleanupDeadProcessStorageLock covering: (1) process exists
and lock is fresh — assert no releaseStorageLockFallback call, (2) process dead
(mock process.kill to throw ESRCH) — assert releaseStorageLockFallback called,
(3) process exists but process.kill throws EPERM — assert no release unless lock
is stale (simulate acquiredAt older than
ACCOUNT_STORAGE_LOCK_OPTIONS.staleAfterMs), and (4) missing or corrupted lock
file — assert fallback release or ENOENT path; mock fs.readFile, process.kill,
and releaseStorageLockFallback to verify log calls and behavior.

ℹ️ Review info
Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0d15b434-50c2-4cf6-8218-c272deab5dd3

📥 Commits

Reviewing files that changed from the base of the PR and between d1c603e and d267377.

📒 Files selected for processing (1)
  • lib/storage.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/storage.ts
🧬 Code graph analysis (1)
lib/storage.ts (1)
lib/file-lock.ts (1)
  • acquireFileLock (57-107)
🔇 Additional comments (2)
lib/storage.ts (2)

112-112: lgtm - mutex declaration is straightforward.

the accountFileMutex follows the same pattern as storageMutex at line 111.


1318-1321: regression tests do exist for the locking pattern—they just weren't cited in the pr.

test/storage.test.ts:175 directly exercises withAccountStorageTransaction with concurrent promise.all calls and verifies serialization without data loss. test/storage-async.test.ts:295 covers the mutex behavior with rapid sequential saves and concurrent writes. test/storage.test.ts:1407+ tests ebusy/eperm retry logic, covering windows filesystem edge cases. these tests validate the lock-order changes in withAccountStorageTransaction, saveAccounts, and clearAccounts at lib/storage.ts:1318, 1334, 1346.

			> Likely an incorrect or invalid review comment.

Drop pre-acquire dead-process lock cleanup and only run fallback lock-file deletion when lock.release() fails. Also align lock-order comment and add debug observability for fallback cleanup failures.

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/storage.ts`:
- Around line 123-138: The new file-lock serialization helpers
withAccountFileMutex and withStorageSerializedFileLock and their use in
saveAccounts lack tests covering lock.release() error paths; add unit tests in
test/storage.test.ts that mock the lock.release() implementation to throw EBUSY
and EPERM and assert that releaseStorageLockFallback is invoked and saveAccounts
still completes (no uncaught rejection), add a test that simulates concurrent
saveAccounts calls to verify ordering under lock contention (use mocked locks
that delay release) and add a Windows-specific test case where lock.release()
throws EPERM to validate the fallback cleanup path; reference the functions
withAccountFileMutex, withStorageSerializedFileLock, saveAccounts,
releaseStorageLockFallback and the lock.release() call when adding the mocks and
assertions.

ℹ️ Review info
Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3d9bb546-70b4-4fee-b0a7-5413a4e724b5

📥 Commits

Reviewing files that changed from the base of the PR and between d267377 and 1f516d1.

📒 Files selected for processing (1)
  • lib/storage.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/storage.ts
🧬 Code graph analysis (1)
lib/storage.ts (1)
lib/file-lock.ts (1)
  • acquireFileLock (57-107)
🔇 Additional comments (7)
lib/storage.ts (7)

112-112: lgtm: in-process mutex matches existing storageMutex pattern.

the accountFileMutex at lib/storage.ts:112 and withAccountFileMutex at lib/storage.ts:123-130 mirror the existing storageMutex/withStorageLock implementation. pattern is correct for serializing async operations within a single process.

Also applies to: 123-130


132-138: lock order comment now matches implementation.

lib/storage.ts:133-134 correctly states acquisition order: file-queue mutex → file lock → storage mutex. code at lines 135-137 does exactly that: withAccountFileMutexwithAccountFileLockwithStorageLock. past review comment resolved.


339-348: fallback now has debug observability as requested.

lib/storage.ts:343-346 logs cleanup failures at debug level with path and error. addresses past review comment about silent swallowing.


350-370: release fallback only fires on failure—double cleanup fixed.

lib/storage.ts:357-367 now only calls releaseStorageLockFallback when lock.release() throws (and it's not ENOENT). this addresses the past review about unconditional double cleanup.

one edge case worth noting: if lock.release() throws EBUSY (windows antivirus holding the file), releaseStorageLockFallback uses fs.rm({ force: true }) which will also likely fail with EBUSY and log at debug level. lock file may remain until next acquire cleans it via removeIfStale in lib/file-lock.ts:91-93. this is probably acceptable for "best-effort" but could leave stale locks on heavily-contested windows systems.


1285-1288: withAccountStorageTransaction now uses serialized lock path.

lib/storage.ts:1285-1288 routes through withStorageSerializedFileLock, ensuring consistent acquisition order with saveAccounts and clearAccounts. this eliminates the possibility of lock-order deadlocks between these paths.


1299-1304: saveAccounts uses serialized lock—consistent with transaction path.

lib/storage.ts:1301-1303 wraps saveAccountsUnlocked in withStorageSerializedFileLock. matches the acquisition order used by withAccountStorageTransaction.


1311-1336: clearAccounts uses serialized lock—consistent with other mutation paths.

lib/storage.ts:1313-1335 wraps the entire clear operation in withStorageSerializedFileLock. all three mutation paths (withAccountStorageTransaction, saveAccounts, clearAccounts) now share the same lock acquisition sequence.

Add regression tests for lock.release() EBUSY/EPERM fallback cleanup and concurrent save lock serialization to close outstanding PR feedback.

Co-authored-by: Codex <noreply@openai.com>
@ndycode
Copy link
Owner Author

ndycode commented Mar 4, 2026

@greptile pls review

Address unresolved review threads by hardening audit/log redaction, lock scope, cursor validation, retention retries, and storage decrypt behavior; align workflows/docs/scripts; and add regression tests for all actionable items.

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/storage.ts (1)

930-947: ⚠️ Potential issue | 🟠 Major

v1 migration output is bypassed when rebuilding normalized accounts.

lib/storage.ts:930 computes baseStorage via migrateV1ToV3, but lib/storage.ts:936 iterates rawAccounts instead of baseStorage.accounts. this can skip account-level migration normalization before decrypt/dedupe.

proposed fix
-  const rawActiveIndex = clampIndex(activeIndexValue, rawAccounts.length);
-  const activeKey = extractActiveKey(rawAccounts, rawActiveIndex);
+  const sourceAccounts = Array.isArray(baseStorage.accounts) ? baseStorage.accounts : [];
+  const rawActiveIndex = clampIndex(activeIndexValue, sourceAccounts.length);
+  const activeKey = extractActiveKey(sourceAccounts as unknown[], rawActiveIndex);
@@
-	const validAccounts: AccountMetadataV3[] = [];
-	for (const account of rawAccounts) {
+	const validAccounts: AccountMetadataV3[] = [];
+	for (const account of sourceAccounts) {
 		if (!isStoredAccountCandidate(account)) {
 			continue;
 		}
@@
-    const clampedRawIndex = clampIndex(rawIndex, rawAccounts.length);
-    const familyKey = extractActiveKey(rawAccounts, clampedRawIndex);
+    const clampedRawIndex = clampIndex(rawIndex, sourceAccounts.length);
+    const familyKey = extractActiveKey(sourceAccounts as unknown[], clampedRawIndex);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 930 - 947, The code builds baseStorage using
fromVersion and migrateV1ToV3 but then iterates rawAccounts, bypassing any
migration/normalization; change the loop to iterate baseStorage.accounts (or the
normalized accounts array on AccountStorageV3) when populating validAccounts so
decryptAccountSensitiveFields runs on migrated data, keeping the existing
try/catch that converts errors via toStorageDecryptError and preserving
validAccounts accumulation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@index.ts`:
- Around line 1725-1732: The fallback for safeAuditResource currently returns
the raw url when new URL(url) throws, which can leak query/hash into auditLog
(see auditLog in lib/audit.ts); update the fallback in the safeAuditResource
IIFE (and the analogous spot around line 1739) to strip query and hash from the
raw string before returning (e.g., remove everything from the first '?' or '#'
onward) so only origin+pathname or sanitized path is logged; ensure the
sanitized value is what gets passed to auditLog/resource.

In `@lib/background-jobs.ts`:
- Around line 54-57: getDelayMs currently uses a deterministic exponential
backoff which causes worker synchronization; modify getDelayMs in
lib/background-jobs.ts to compute the capped exponential base delay (as it does
now) then apply a randomized jitter of about ±20% (e.g., multiply by 1 +
randomBetween(-0.2, 0.2)) and return an integer ms to avoid synchronized retries
on 429/EBUSY. Keep the existing cap logic and ensure jitter is applied after
capping; update tests in test/background-jobs.test.ts to include a
concurrent-retry assertion that verifies returned delays for the same attempt
can differ (i.e., show jitter is present) while preserving existing behavior for
EPERM/other cases.
- Around line 39-47: The token-redaction regex in sanitizeErrorMessage currently
uses [A-Z0-9._-]+ which misses base64 chars like /, +, =; update the
token-capturing patterns in sanitizeErrorMessage (the
/\b(?:access|refresh|id)?_?token(?:=|:)?\s*([A-Z0-9._-]+)/gi and the Bearer
pattern) to use a non-whitespace matcher such as \S+ so tokens are consumed
until whitespace (e.g., /\b(?:access|refresh|id)?_?token(?:=|:)?\s*\S+/gi and
/\b(Bearer)\s+\S+/gi), and add a unit test in the "redacts sensitive error text"
test to assert that tokens with base64-like suffixes (containing /, +, =) are
fully redacted.

In `@lib/data-retention.ts`:
- Line 21: The RETRYABLE_RETENTION_CODES set is missing "ENOTEMPTY", causing
production code to skip retries while tests expect it; update the
RETRYABLE_RETENTION_CODES constant to include "ENOTEMPTY" (matching the test's
RETRYABLE_REMOVE_CODES) so the retry logic that checks this set (used around the
fs.rmdir call) will treat ENOTEMPTY as transient and retry accordingly.

In `@lib/file-lock.ts`:
- Around line 69-89: When write or close fails while acquiring the lock in
lib/file-lock.ts (the block using handle.writeFile and handle.close that sets
writeError/closeError), perform a best-effort unlink of the lock file before
rethrowing the error to avoid leaving a stale lock on disk; do the same for the
analogous block around lines 154-175, and add a vitest regression that simulates
both a write failure and a close failure during acquisition to assert the lock
file is removed (no orphan lock) even when errors are thrown.

In `@lib/unified-settings.ts`:
- Around line 265-272: The current pattern around
acquireFileLockSync/lock.release in unified-settings (used with
UNIFIED_SETTINGS_LOCK_PATH and SETTINGS_LOCK_OPTIONS and wrapping
readSettingsRecordSync/writeSettingsRecordSync) can propagate lock.release
errors (EPERM/EBUSY) and fail after a successful write; update all call sites
(the blocks around the lock.release calls) to catch errors from lock.release(),
treat EPERM/EBUSY as non-fatal by logging a warning and swallowing them while
rethrowing unexpected errors, ensuring the writeSettingsRecordSync result is not
rolled back; also add vitest tests in test/unified-settings.test.ts that
stub/mock lock.release to throw EPERM and EBUSY and assert the write succeeds
and the lock file does not leave the process in a failing state.

In `@scripts/license-policy-check.js`:
- Around line 48-54: The exact-token denylist check is missing common SPDX
variants (e.g., "gpl-2.0-only", "gpl-2.0-or-later") causing blocked licenses to
slip through; update the matching logic that uses extractLicenseTokens so that
for each token you also derive and check a normalized base SPDX id (strip
suffixes like "-only" and "-or-later", and any trailing "+") against the
denylist (i.e., when evaluating in the function that performs the denylist
lookup near the token match), ensuring you test both the original token and the
stripped/base form before allowing it.

In `@test/background-jobs.test.ts`:
- Around line 141-160: The test currently asserts that the thrown error contains
PII ("person@example.com"); change the assertion to avoid checking for secrets
by expecting a generic failure instead—e.g., assert that
runBackgroundJobWithRetry (from runBackgroundJobWithRetry /
getBackgroundJobDlqPath) rejects (using .rejects.toThrow() with no PII-specific
message, or .rejects.toThrowError()/toBeRejected) and keep any checks about
redaction focused on logs/DLQ contents rather than the thrown error text; update
the assertion in the failing test block so it no longer requires the email/token
to appear.

In `@test/data-retention.test.ts`:
- Around line 101-156: Add deterministic Vitest cases exercising the remaining
retry branches in enforceDataRetention: simulate transient EPERM/EACCES/EAGAIN
errors on directory operations (fs.stat/fs.rmdir) and a transient
EBUSY/EACCES-like error on single-file removal (fs.unlink) to verify the retry
logic in the functions implemented in lib/data-retention.ts (refer to
enforceDataRetention and its internal directory-entry pruning and single-file
unlink paths). For each test, spy/mock the specific fs method (stat, rmdir,
unlink) to throw the transient error once for the target path then succeed,
assert that the retry flag was hit, assert the file/dir is ultimately removed
(expect ENOENT) and restore spies in finally blocks to keep tests deterministic
and isolated.

---

Outside diff comments:
In `@lib/storage.ts`:
- Around line 930-947: The code builds baseStorage using fromVersion and
migrateV1ToV3 but then iterates rawAccounts, bypassing any
migration/normalization; change the loop to iterate baseStorage.accounts (or the
normalized accounts array on AccountStorageV3) when populating validAccounts so
decryptAccountSensitiveFields runs on migrated data, keeping the existing
try/catch that converts errors via toStorageDecryptError and preserving
validAccounts accumulation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d85dbde0-bd54-490e-b59f-2b0c10f91215

📥 Commits

Reviewing files that changed from the base of the PR and between 1f516d1 and 0c48823.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (20)
  • .github/workflows/ci.yml
  • .github/workflows/supply-chain.yml
  • docs/development/TESTING.md
  • docs/runbooks/operations.md
  • index.ts
  • lib/background-jobs.ts
  • lib/codex-manager.ts
  • lib/data-retention.ts
  • lib/file-lock.ts
  • lib/storage.ts
  • lib/unified-settings.ts
  • package.json
  • scripts/license-policy-check.js
  • test/authorization.test.ts
  • test/background-jobs.test.ts
  • test/codex-manager-cli.test.ts
  • test/data-retention.test.ts
  • test/file-lock.test.ts
  • test/storage.test.ts
  • test/unified-settings.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
docs/**

⚙️ CodeRabbit configuration file

keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

Files:

  • docs/runbooks/operations.md
  • docs/development/TESTING.md
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/authorization.test.ts
  • test/codex-manager-cli.test.ts
  • test/data-retention.test.ts
  • test/background-jobs.test.ts
  • test/storage.test.ts
  • test/file-lock.test.ts
  • test/unified-settings.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/background-jobs.ts
  • lib/storage.ts
  • lib/unified-settings.ts
  • lib/file-lock.ts
  • lib/codex-manager.ts
  • lib/data-retention.ts
🧬 Code graph analysis (8)
index.ts (2)
lib/logger.ts (1)
  • logWarn (341-346)
lib/audit.ts (1)
  • auditLog (123-153)
test/codex-manager-cli.test.ts (2)
lib/codex-manager.ts (1)
  • runCodexMultiAuthCli (4529-4610)
scripts/codex.js (1)
  • runCodexMultiAuthCli (501-501)
test/data-retention.test.ts (1)
lib/data-retention.ts (2)
  • RetentionPolicy (6-12)
  • enforceDataRetention (119-146)
lib/storage.ts (3)
lib/file-lock.ts (1)
  • acquireFileLock (57-123)
lib/storage/migrations.ts (1)
  • AccountMetadataV3 (40-57)
lib/utils.ts (1)
  • isRecord (11-13)
test/background-jobs.test.ts (1)
lib/background-jobs.ts (2)
  • runBackgroundJobWithRetry (85-129)
  • getBackgroundJobDlqPath (81-83)
test/storage.test.ts (1)
lib/storage.ts (2)
  • saveAccounts (1326-1331)
  • loadAccounts (1006-1008)
lib/data-retention.ts (1)
lib/utils.ts (1)
  • sleep (65-67)
test/unified-settings.test.ts (1)
lib/unified-settings.ts (2)
  • getUnifiedSettingsPath (229-231)
  • saveUnifiedPluginConfig (285-296)
🪛 LanguageTool
docs/runbooks/operations.md

[uncategorized] ~28-~28: The official name of this software platform is spelled with a capital “H”.
Context: ...ce (for example weekly-YYYYMMDD or CI ${{ github.run_id }}) - remove previous key af...

(GITHUB)

🔇 Additional comments (17)
docs/runbooks/operations.md (1)

27-28: good update for rotation idempotency workflow.

this command and stable run-id guidance are clear and consistent with the weekly ops flow in docs/runbooks/operations.md (line 27 and line 28).

as per coding guidelines, docs/**: keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

lib/data-retention.ts (1)

67-69: good retry coverage across retention io callsites

wrapping readdir/stat/unlink/rmdir and single-file pruning paths is consistent in lib/data-retention.ts:67 (line 67), lib/data-retention.ts:81 (line 81), and lib/data-retention.ts:106 (line 106). this meaningfully hardens transient io failures.

Also applies to: 81-92, 106-110

test/data-retention.test.ts (1)

7-27: cleanup retry helper looks solid and deterministic

test/data-retention.test.ts:7 (line 7) and test/data-retention.test.ts:45 (line 45) improve teardown stability under transient windows fs lock errors while preserving deterministic assertions.

Also applies to: 45-45

lib/background-jobs.ts (1)

104-104: good hardening on error sanitization wiring.

lib/background-jobs.ts:104 and lib/background-jobs.ts:119 consistently sanitize error strings before dlq/log output.

Also applies to: 119-119

package.json (1)

69-69: no issue: workflow correctly installs dev deps before sbom script

the sbom workflow in .github/workflows/supply-chain.yml:72 runs npm ci without --omit=dev, so devDependencies (including cyclonedx-npm from package.json:107) are installed before the sbom script executes at line 75. the setup is correct.

lib/codex-manager.ts (1)

1521-1521: the guard at lib/codex-manager.ts:1521 is solid and already has regression coverage.

test/codex-manager-cli.test.ts:2123 creates a cursor by encoding "12junk" as base64, which decodes to non-digits and correctly triggers the "Invalid --cursor value" error at test/codex-manager-cli.test.ts:2132. this covers the exact scenario the regex guard protects against.

docs/development/TESTING.md (1)

53-60: docs update is aligned with the new local gate

this section clearly documents the ordering change and the added npm run audit:ci / npm run license:check checks.

As per coding guidelines, docs/**: "whenever behavior changes, require updated upgrade notes and mention new npm scripts."

.github/workflows/supply-chain.yml (1)

11-13: nice cleanup on supply-chain workflow config

centralizing the denylist and using npm run sbom improves maintainability and keeps policy settings in one place.

Also applies to: 32-32, 75-75

.github/workflows/ci.yml (1)

55-60: ci flow update looks good

artifact handoff plus needs: test makes the coverage gate ordering explicit, and npm run test:coverage is clearer for coverage intent.

Also applies to: 68-68, 74-78, 90-90

test/authorization.test.ts (1)

59-68: good test isolation hardening

using full auth-env snapshot/restore makes these vitest cases deterministic and prevents cross-test env bleed in test/authorization.test.ts:59.

Also applies to: 72-84, 88-100, 127-127

test/codex-manager-cli.test.ts (2)

2053-2064: good pagination contract assertions.

test/codex-manager-cli.test.ts:2053 and test/codex-manager-cli.test.ts:2082 now pin schemaVersion, pagination.cursor, and pagination.pageSize, which tightens the json api contract for cursor paging.

Also applies to: 2082-2093


2103-2147: good malformed cursor regression coverage.

test/codex-manager-cli.test.ts:2103 validates both malformed decoded payloads and invalid base64 input with explicit exit-code and error assertions.

test/file-lock.test.ts (1)

88-122: worker contention path now exercises the real lock api cleanly.

test/file-lock.test.ts:88test/file-lock.test.ts:122 and test/file-lock.test.ts:134 validate lock acquisition/release through the transpiled module in separate processes, then assert no lost/duplicate writes under contention.

Also applies to: 134-166

test/unified-settings.test.ts (2)

6-25: cleanup retry helper is a good stability addition.

test/unified-settings.test.ts:6 and test/unified-settings.test.ts:44 improve teardown reliability for transient windows remove errors (ebusy/eperm/enotempty).

Also applies to: 44-44


249-302: good lock-before-read regression test.

test/unified-settings.test.ts:249 verifies saveUnifiedPluginConfig reads settings while the file lock is held, which directly covers the new lock ordering in lib/unified-settings.ts.

lib/storage.ts (1)

350-359: lock release fallback handling is cleaner now.

lib/storage.ts:350 and lib/storage.ts:368 now keep best-effort cleanup observable and only trigger forced fallback on non-enoent release failures.

Also applies to: 368-378

test/storage.test.ts (1)

680-714: strong regression coverage for decrypt failures and lock edge cases.

test/storage.test.ts:680 plus test/storage.test.ts:1579test/storage.test.ts:1676 cover fail-fast decrypt behavior, lock.release() fallback on windows-style errors, and concurrent save lock serialization.

Also applies to: 1579-1676

Address unresolved PR #32 feedback across audit logging, CLI parsing, lock handling,
retention races, storage rotation, and runbook/license guidance.

Includes targeted regressions for background jobs, codex-manager CLI,
data retention, and file lock behavior.

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/storage.ts (1)

113-139: 🧹 Nitpick | 🔵 Trivial

scope account-file mutex by lock path to avoid cross-project head-of-line blocking.

lib/storage.ts:113-139 serializes all account file ops through one global queue. this can unnecessarily block independent storage paths in the same process (project-scoped vs global), reducing concurrency for no correctness gain.

proposed refactor
-let accountFileMutex: Promise<void> = Promise.resolve();
+const accountFileMutexByPath = new Map<string, Promise<void>>();

-function withAccountFileMutex<T>(fn: () => Promise<T>): Promise<T> {
-	const previousMutex = accountFileMutex;
+function withAccountFileMutex<T>(path: string, fn: () => Promise<T>): Promise<T> {
+	const previousMutex = accountFileMutexByPath.get(path) ?? Promise.resolve();
 	let releaseLock: () => void;
-	accountFileMutex = new Promise<void>((resolve) => {
+	const nextMutex = new Promise<void>((resolve) => {
 		releaseLock = resolve;
 	});
-	return previousMutex.then(fn).finally(() => releaseLock());
+	accountFileMutexByPath.set(path, nextMutex);
+	return previousMutex.then(fn).finally(() => {
+		releaseLock();
+		if (accountFileMutexByPath.get(path) === nextMutex) {
+			accountFileMutexByPath.delete(path);
+		}
+	});
 }

 async function withStorageSerializedFileLock<T>(path: string, fn: () => Promise<T>): Promise<T> {
-	return withAccountFileMutex(() =>
+	return withAccountFileMutex(path, () =>
 		withAccountFileLock(path, () => withStorageLock(fn)),
 	);
 }

As per coding guidelines, lib/**: “focus on auth rotation, windows filesystem IO, and concurrency.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 113 - 139, The global accountFileMutex
serializes all account file operations and causes head-of-line blocking; change
withAccountFileMutex and accountFileMutex to use a per-path mutex map (keyed by
the file path used in withStorageSerializedFileLock) so independent paths don't
block each other: replace the single accountFileMutex with a Map<string,
Promise<void>> (or similar) and modify withAccountFileMutex to accept a path (or
create a new helper like withAccountFileMutexForPath(path, fn)) that
grabs/releases the Promise for that path, chains on
previousPromise.then(fn).finally(() => release and cleanup the map entry if no
waiter), and update withStorageSerializedFileLock to pass the path into the
per-path account-file mutex while still sequencing with withAccountFileLock and
withStorageLock; keep existing release semantics and ensure map entry is removed
when resolved to avoid leaks.
♻️ Duplicate comments (6)
index.ts (1)

1687-1693: ⚠️ Potential issue | 🟠 Major

fallback audit resource still logs raw url with query/hash.

Line 1692 and Line 2468 in index.ts:1692 and index.ts:2468 return raw url when parsing fails. lib/audit.ts:130 persists resource verbatim, so query/hash secrets can leak.

proposed fix
 								const safeAuditResource = (() => {
 									try {
 										const parsed = new URL(url);
 										return `${parsed.origin}${parsed.pathname}`;
 									} catch {
-										return url;
+										const raw = String(url);
+										return raw.split("?")[0]?.split("#")[0] ?? "unknown";
 									}
 								})();
@@
 								const safePluginAuditResource = (() => {
 									try {
 										const parsed = new URL(url);
 										return `${parsed.origin}${parsed.pathname}`;
 									} catch {
-										return url;
+										const raw = String(url);
+										return raw.split("?")[0]?.split("#")[0] ?? "unknown";
 									}
 								})();

Also applies to: 2463-2469

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@index.ts` around lines 1687 - 1693, The fallback for safeAuditResource
currently returns the raw url when new URL(url) throws, which can leak
query/hash; update the catch branch in the safeAuditResource IIFE to return a
sanitized version (strip anything after the first '?' or '#' — e.g. use
url.split(/[?#]/)[0]) instead of the raw url, and apply the same change to the
other identical occurrence (the second safeAuditResource implementation) so both
fallbacks consistently remove query and fragment before persisting.
test/data-retention.test.ts (1)

101-197: ⚠️ Potential issue | 🟠 Major

retry-branch coverage is still incomplete for windows error codes and single-file unlink.

test/data-retention.test.ts:101-197 covers EBUSY and ENOTEMPTY, but leaves EPERM/EACCES/EAGAIN from lib/data-retention.ts:21 and the pruneSingleFile retry path at lib/data-retention.ts:104-111 untested.

proposed test additions
+it.each(["EPERM", "EACCES", "EAGAIN"] as const)(
+	"retries transient %s during directory entry pruning",
+	async (code) => {
+		// inject one-time error on fs.stat or fs.rmdir for stale log path
+		// assert retry flag hit and stale entry removed
+	},
+);
+
+it.each(["EBUSY", "EPERM", "EACCES", "EAGAIN"] as const)(
+	"retries transient %s during single-file retention pruning",
+	async (code) => {
+		// create stale background-job-dlq.jsonl (or quota-cache.json)
+		// inject one-time error on fs.unlink for that file
+		// assert enforceDataRetention succeeds and file is removed
+	},
+);

As per coding guidelines, test/**: "tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/data-retention.test.ts` around lines 101 - 197, Tests are missing
coverage for Windows-like transient error handling (EPERM/EACCES/EAGAIN) and the
single-file retry path in pruneSingleFile; add deterministic vitest cases that
import enforceDataRetention/pruneSingleFile from lib/data-retention.js, create a
stale file, then spyOn fs.unlink (for pruneSingleFile) and fs.stat/fs.rmdir (for
enforceDataRetention) to first throw a NodeJS.ErrnoException with code set to
"EPERM"/"EACCES"/"EAGAIN" (or simulate one transient unlink error) and then
succeed on retry, assert removedLogs increments and that the file/directory is
eventually gone, and ensure all spies are restored in finally blocks to keep
tests deterministic.
test/background-jobs.test.ts (2)

181-195: ⚠️ Potential issue | 🟡 Minor

add a regression case for token values containing /, +, and =.

Line 181 in test/background-jobs.test.ts:181 only exercises sk_test_123/rt_456, so this suite will not catch the current regex gap in lib/background-jobs.ts:43-46.

proposed fix
 						throw new Error(
-							"network failed for person@example.com Bearer sk_test_123 refresh_token=rt_456",
+							"network failed for person@example.com Bearer sk_test/123+abc== refresh_token=rt/456+xyz==",
 						);
 ...
-			expect(entry.error).not.toContain("sk_test_123");
-			expect(entry.error).not.toContain("rt_456");
+			expect(entry.error).not.toContain("sk_test/123+abc==");
+			expect(entry.error).not.toContain("rt/456+xyz==");
 ...
-			expect(serializedWarnings).not.toContain("sk_test_123");
-			expect(serializedWarnings).not.toContain("rt_456");
+			expect(serializedWarnings).not.toContain("sk_test/123+abc==");
+			expect(serializedWarnings).not.toContain("rt/456+xyz==");

As per coding guidelines, test/**: "tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/background-jobs.test.ts` around lines 181 - 195, Add a regression test
to cover token values containing "/", "+", and "=" to exercise the current
redaction regex gap referenced in lib/background-jobs.ts (lines around the
redaction logic) — update the test in background-jobs.test.ts that asserts DLQ
redaction (the block that currently uses "sk_test_123" and "rt_456") to also
enqueue/fail a job using secrets like "sk/te+st=123" and "rt/ra/sh+=token=" and
then read the DLQ entry to assert entry.error contains "***REDACTED***" and does
not contain the raw token substrings; this ensures the redaction logic in the
background job handler (the function that writes DLQ entries) correctly
sanitizes tokens with "/", "+", and "=".

186-186: ⚠️ Potential issue | 🟠 Major

do not assert pii leakage in thrown errors.

Line 186 in test/background-jobs.test.ts:186 requires an email in the thrown error message, which hard-codes a sensitive-data leak contract.

proposed fix
-			).rejects.toThrow("person@example.com");
+			).rejects.toThrow("network failed");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/background-jobs.test.ts` at line 186, The test currently asserts a
specific email string in the thrown error (the
.rejects.toThrow("person@example.com") assertion), which hard-codes PII; remove
the exact-email assertion and instead assert the rejection without PII (e.g.,
use .rejects.toThrow() or assert the error type/class) and if you need to check
content assert it is generic/non-PII (for example, assert the error.message does
not contain 'person@example.com' or check a validation error class) so update
the assertion in test/background-jobs.test.ts to avoid referencing
"person@example.com".
lib/background-jobs.ts (2)

49-50: ⚠️ Potential issue | 🟠 Major

add jitter to retry delay to avoid synchronized retries under 429/ebusy.

Line 49 in lib/background-jobs.ts:49 is deterministic, so concurrent workers can re-hit at the same time and amplify contention.

proposed fix
 function getDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
-	return Math.min(maxDelayMs, baseDelayMs * 2 ** Math.max(0, attempt - 1));
+	const capped = Math.min(maxDelayMs, baseDelayMs * 2 ** Math.max(0, attempt - 1));
+	const jitter = Math.floor(capped * 0.2 * (Math.random() * 2 - 1));
+	return Math.max(1, capped + jitter);
 }

As per coding guidelines, lib/**: "verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/background-jobs.ts` around lines 49 - 50, The retry delay function
getDelayMs(attempt, baseDelayMs, maxDelayMs) is deterministic and needs
randomized jitter to prevent synchronized retries; modify getDelayMs to compute
the exponential backoff then apply a bounded random jitter (e.g., ±X% or add a
random 0..jitterMs) while capping at maxDelayMs and never returning negative
values, keeping parameters baseDelayMs, maxDelayMs and attempt semantics intact;
update or add vitest cases to assert jitter stays within the expected bounds and
add tests/notes for EBUSY/429 retry behavior for the affected queues.

39-46: ⚠️ Potential issue | 🟠 Major

redaction regex still leaks base64-like token suffixes.

Line 43 and Line 46 in lib/background-jobs.ts:43 and lib/background-jobs.ts:46 stop matching at /, +, and =, so token tails can leak.

proposed fix
 function sanitizeErrorMessage(message: string): string {
 	return message
 		.replace(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, "***REDACTED***")
 		.replace(
-			/\b(?:access|refresh|id)?_?token(?:=|:)?\s*([A-Z0-9._-]+)/gi,
+			/\b(?:access|refresh|id)?_?token(?:=|:)?\s*([^\s,;]+)/gi,
 			"token=***REDACTED***",
 		)
-		.replace(/\b(Bearer)\s+[A-Z0-9._-]+\b/gi, "$1 ***REDACTED***");
+		.replace(/\b(Bearer)\s+[^\s,;]+\b/gi, "$1 ***REDACTED***");
 }

As per coding guidelines, lib/**: "check for logging that leaks tokens or emails."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/background-jobs.ts` around lines 39 - 46, sanitizeErrorMessage is leaking
base64-like token suffixes because the character classes used in the two
token-redaction replace calls (the capture for access/refresh/id tokens and the
Bearer token pattern) omit "+" "/" and "="; update both regexes in
sanitizeErrorMessage (the pattern used in the replace that matches
\b(?:access|refresh|id)?_?token... and the one matching \b(Bearer)\s+...) to
include plus, slash and equals (e.g. expand the character class from [A-Z0-9._-]
to include +/=: [A-Z0-9._+\-/=]) while keeping the case-insensitive flag and
retaining the replacement strings, so tokens containing base64 characters are
fully redacted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/codex-manager.ts`:
- Around line 262-270: The current redactFreeformSecretText function uses
character classes [A-Z0-9._-] in the token and Bearer regexes which omit "+",
"/", and "=" and can allow parts of JWT/URL-safe tokens to leak; update the two
regexes inside redactFreeformSecretText (the
/\b(?:access|refresh|id)?_?token(?:=|:)?\s*([A-Z0-9._-]+)/gi and the
/\b(Bearer)\s+[A-Z0-9._-]+\b/gi patterns) to include +, /, and = characters
(e.g. expand the class to [A-Z0-9._+\-/=]) while keeping the i flag so case is
handled, ensuring all token-like values are fully replaced with the redaction
string.

In `@lib/file-lock.ts`:
- Around line 83-85: Stale-lock eviction currently only checks age but the lock
file written by handle.writeFile contains a pid and acquiredAt; update the
eviction logic (the stale-check branches that read the lock JSON) to also verify
pid liveness before breaking the lock: parse the lock JSON (pid, acquiredAt), if
age > staleAfterMs then attempt a cross-platform liveness probe (e.g. try
process.kill(pid, 0) in a try/catch and treat ESRCH as dead) and only evict if
the PID is not alive; additionally ensure the writer that uses handle.writeFile
refreshes acquiredAt periodically while holding the lock so long-running
processes aren't considered stale.
- Around line 57-65: The cleanupIncompleteLockFile currently discards the
original write/close error when fs.unlink fails with a non-ENOENT, masking the
real acquisition failure; change cleanupIncompleteLockFile(path: string) to
accept an optional originalErr?: Error and in the catch block, if code !==
"ENOENT" rethrow originalErr (if provided) else rethrow the unlink error; then
update all callers (the call sites around the previous lines that invoke
cleanupIncompleteLockFile) to pass the original write/close/acquisition error
into cleanupIncompleteLockFile so the original failure is preserved when
unlinking fails.

In `@scripts/license-policy-check.js`:
- Around line 7-9: The denylist bypass happens because normalizeSpdxToken
currently only strips "-ONLY" and "-OR-LATER" but not a trailing "+", so tokens
like "GPL-2.0+" don't normalize to the denylist form; update normalizeSpdxToken
to also remove a trailing "+" and ensure every place you parse/split SPDX
expressions (the token-splitting logic referenced around the token extraction
code) calls normalizeSpdxToken for both the denylist entries and the extracted
tokens before comparison; then add a regression test that asserts denylist
entries like "GPL-2.0+" and "LGPL-2.1+" are correctly matched and blocked by the
policy check.

In `@test/background-jobs.test.ts`:
- Around line 126-129: The test currently overrides the retryable predicate (the
retryable property in the test setup) which bypasses the default 429 retry logic
in lib/background-jobs.ts; remove that override so the test exercises the
module's default logic, or replace the inline predicate with the real predicate
exported by lib/background-jobs.ts (import the retry predicate or isRetryable
function from lib/background-jobs.ts and use it in place of the inline
retryable) so the test validates the default 429 retry path instead of
short-circuiting it.

In `@test/codex-manager-cli.test.ts`:
- Around line 2233-2253: Extend the existing redaction test for
runCodexMultiAuthCli (used in test/codex-manager-cli.test.ts) to include a
regression case where rotateStoredSecretEncryptionMock rejects with an Error
containing bearer/refresh tokens that include '+' and '/' characters (e.g.,
"Bearer sk+test/123" and "refresh_token=rt+456/789"), then assert the non-JSON
console error output still contains the "***REDACTED***" placeholder and does
not contain the raw email or those token substrings; you can either add a new
it(...) near the existing "redacts rotate-secrets failure details in non-json
output" test or extend that test to set a second mockRejectedValue case and
repeat the same expectations against console.error captured via errorSpy,
referencing rotateStoredSecretEncryptionMock and runCodexMultiAuthCli to locate
the code.

In `@test/file-lock.test.ts`:
- Around line 168-198: The test currently injects only an EBUSY error via the
fs.unlink mock (see unlinkSpy.mockImplementation, injectedBusy, and the
lock.release() calls); add coverage for Windows-style transient unlink
contention by making the first injected error use code "EPERM" on Windows
(process.platform === "win32") or keep "EBUSY" otherwise, so the mock throws an
error with code "EPERM" when path === lockPath for the first attempt on Windows;
ensure the rest of the test flow (second release resolves, fs.stat asserts
ENOENT, and unlinkSpy.mockRestore()) remains unchanged and references to
acquireFileLock and lock.release stay intact.
- Around line 101-119: The test uses a fixed sleep (setTimeout 2ms) to force
overlap which is flaky; replace that with a deterministic rendezvous barrier so
workers synchronize precisely before writing. Modify the worker script (the code
that imports acquireFileLock and uses sharedFilePath, workerId, iterations) to:
after reading existing, signal readiness by appending a short token to a
dedicated rendezvous file (e.g., `${lockPath}.barrier`), then loop reading that
barrier file until its token count equals the number of workers (pass the total
worker count into the worker argv from the test harness), and only then proceed
to write the shared file and continue iterations; keep acquireFileLock usage the
same. This removes timing coupling and makes the contention deterministic
without any setTimeout-based sleeps.

---

Outside diff comments:
In `@lib/storage.ts`:
- Around line 113-139: The global accountFileMutex serializes all account file
operations and causes head-of-line blocking; change withAccountFileMutex and
accountFileMutex to use a per-path mutex map (keyed by the file path used in
withStorageSerializedFileLock) so independent paths don't block each other:
replace the single accountFileMutex with a Map<string, Promise<void>> (or
similar) and modify withAccountFileMutex to accept a path (or create a new
helper like withAccountFileMutexForPath(path, fn)) that grabs/releases the
Promise for that path, chains on previousPromise.then(fn).finally(() => release
and cleanup the map entry if no waiter), and update
withStorageSerializedFileLock to pass the path into the per-path account-file
mutex while still sequencing with withAccountFileLock and withStorageLock; keep
existing release semantics and ensure map entry is removed when resolved to
avoid leaks.

---

Duplicate comments:
In `@index.ts`:
- Around line 1687-1693: The fallback for safeAuditResource currently returns
the raw url when new URL(url) throws, which can leak query/hash; update the
catch branch in the safeAuditResource IIFE to return a sanitized version (strip
anything after the first '?' or '#' — e.g. use url.split(/[?#]/)[0]) instead of
the raw url, and apply the same change to the other identical occurrence (the
second safeAuditResource implementation) so both fallbacks consistently remove
query and fragment before persisting.

In `@lib/background-jobs.ts`:
- Around line 49-50: The retry delay function getDelayMs(attempt, baseDelayMs,
maxDelayMs) is deterministic and needs randomized jitter to prevent synchronized
retries; modify getDelayMs to compute the exponential backoff then apply a
bounded random jitter (e.g., ±X% or add a random 0..jitterMs) while capping at
maxDelayMs and never returning negative values, keeping parameters baseDelayMs,
maxDelayMs and attempt semantics intact; update or add vitest cases to assert
jitter stays within the expected bounds and add tests/notes for EBUSY/429 retry
behavior for the affected queues.
- Around line 39-46: sanitizeErrorMessage is leaking base64-like token suffixes
because the character classes used in the two token-redaction replace calls (the
capture for access/refresh/id tokens and the Bearer token pattern) omit "+" "/"
and "="; update both regexes in sanitizeErrorMessage (the pattern used in the
replace that matches \b(?:access|refresh|id)?_?token... and the one matching
\b(Bearer)\s+...) to include plus, slash and equals (e.g. expand the character
class from [A-Z0-9._-] to include +/=: [A-Z0-9._+\-/=]) while keeping the
case-insensitive flag and retaining the replacement strings, so tokens
containing base64 characters are fully redacted.

In `@test/background-jobs.test.ts`:
- Around line 181-195: Add a regression test to cover token values containing
"/", "+", and "=" to exercise the current redaction regex gap referenced in
lib/background-jobs.ts (lines around the redaction logic) — update the test in
background-jobs.test.ts that asserts DLQ redaction (the block that currently
uses "sk_test_123" and "rt_456") to also enqueue/fail a job using secrets like
"sk/te+st=123" and "rt/ra/sh+=token=" and then read the DLQ entry to assert
entry.error contains "***REDACTED***" and does not contain the raw token
substrings; this ensures the redaction logic in the background job handler (the
function that writes DLQ entries) correctly sanitizes tokens with "/", "+", and
"=".
- Line 186: The test currently asserts a specific email string in the thrown
error (the .rejects.toThrow("person@example.com") assertion), which hard-codes
PII; remove the exact-email assertion and instead assert the rejection without
PII (e.g., use .rejects.toThrow() or assert the error type/class) and if you
need to check content assert it is generic/non-PII (for example, assert the
error.message does not contain 'person@example.com' or check a validation error
class) so update the assertion in test/background-jobs.test.ts to avoid
referencing "person@example.com".

In `@test/data-retention.test.ts`:
- Around line 101-197: Tests are missing coverage for Windows-like transient
error handling (EPERM/EACCES/EAGAIN) and the single-file retry path in
pruneSingleFile; add deterministic vitest cases that import
enforceDataRetention/pruneSingleFile from lib/data-retention.js, create a stale
file, then spyOn fs.unlink (for pruneSingleFile) and fs.stat/fs.rmdir (for
enforceDataRetention) to first throw a NodeJS.ErrnoException with code set to
"EPERM"/"EACCES"/"EAGAIN" (or simulate one transient unlink error) and then
succeed on retry, assert removedLogs increments and that the file/directory is
eventually gone, and ensure all spies are restored in finally blocks to keep
tests deterministic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4e936d85-e0e6-4a85-a341-873db0f57108

📥 Commits

Reviewing files that changed from the base of the PR and between 0c48823 and 82e5046.

📒 Files selected for processing (12)
  • docs/runbooks/operations.md
  • index.ts
  • lib/background-jobs.ts
  • lib/codex-manager.ts
  • lib/data-retention.ts
  • lib/file-lock.ts
  • lib/storage.ts
  • scripts/license-policy-check.js
  • test/background-jobs.test.ts
  • test/codex-manager-cli.test.ts
  • test/data-retention.test.ts
  • test/file-lock.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/background-jobs.test.ts
  • test/data-retention.test.ts
  • test/file-lock.test.ts
  • test/codex-manager-cli.test.ts
docs/**

⚙️ CodeRabbit configuration file

keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

Files:

  • docs/runbooks/operations.md
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/data-retention.ts
  • lib/storage.ts
  • lib/background-jobs.ts
  • lib/codex-manager.ts
  • lib/file-lock.ts
🧬 Code graph analysis (8)
test/background-jobs.test.ts (2)
lib/logger.ts (1)
  • error (389-393)
lib/background-jobs.ts (2)
  • runBackgroundJobWithRetry (85-131)
  • getBackgroundJobDlqPath (81-83)
lib/data-retention.ts (1)
lib/utils.ts (1)
  • sleep (65-67)
test/data-retention.test.ts (1)
lib/data-retention.ts (2)
  • RetentionPolicy (6-12)
  • enforceDataRetention (119-146)
lib/storage.ts (3)
lib/file-lock.ts (1)
  • acquireFileLock (68-137)
lib/storage/migrations.ts (1)
  • AccountMetadataV3 (40-57)
lib/utils.ts (1)
  • isRecord (11-13)
test/file-lock.test.ts (1)
lib/file-lock.ts (1)
  • acquireFileLock (68-137)
index.ts (2)
lib/audit.ts (1)
  • auditLog (123-153)
lib/logger.ts (1)
  • logWarn (341-346)
lib/codex-manager.ts (1)
lib/data-redaction.ts (1)
  • redactForExternalOutput (31-50)
test/codex-manager-cli.test.ts (2)
lib/codex-manager.ts (1)
  • runCodexMultiAuthCli (4571-4652)
scripts/codex.js (3)
  • runCodexMultiAuthCli (501-501)
  • exitCode (505-505)
  • exitCode (532-532)
🪛 LanguageTool
docs/runbooks/operations.md

[uncategorized] ~31-~31: The official name of this software platform is spelled with a capital “H”.
Context: ...ce (for example weekly-YYYYMMDD or CI ${{ github.run_id }}) - remove previous key af...

(GITHUB)

🔇 Additional comments (6)
scripts/license-policy-check.js (1)

23-50: good hardening on license-shape normalization and extraction.

lib/scripts/license-policy-check.js:23 through lib/scripts/license-policy-check.js:50 correctly handles license and licenses shapes and reduces false unknowns. no new windows edge case or concurrency risk is introduced in this segment.

Also applies to: 66-67

lib/codex-manager.ts (1)

1556-1557: good hardening on cli parsing and rotate-secrets error sanitization.

lib/codex-manager.ts:1556-1557, lib/codex-manager.ts:2003-2019, lib/codex-manager.ts:2223-2233, and lib/codex-manager.ts:4126-4129 close option-smuggling and malformed numeric/cursor paths cleanly. this is covered by test/codex-manager-cli.test.ts:2103-2166 and test/codex-manager-cli.test.ts:2215-2231.

Also applies to: 2003-2007, 2017-2019, 2223-2233, 4126-4129

test/codex-manager-cli.test.ts (1)

2103-2147: nice regression coverage for malformed cursor, page-size parsing, and option-smuggling.

test/codex-manager-cli.test.ts:2103-2166 and test/codex-manager-cli.test.ts:2215-2231 are precise, deterministic checks for the new parser hardening.

Also applies to: 2149-2166, 2215-2231

lib/storage.ts (2)

350-359: lock-release fallback behavior is now safer and observable.

lib/storage.ts:350-359 and lib/storage.ts:368-378 now keep best-effort cleanup while avoiding unconditional fallback deletes, and they add debug visibility for fallback failures.

Also applies to: 368-378


196-204: edecrypt fast-fail propagation is solid.

lib/storage.ts:196-204, lib/storage.ts:936-946, and lib/storage.ts:1126-1128 correctly preserve decrypt-failure semantics instead of silently normalizing corrupted encrypted payloads.

Also applies to: 936-946, 1126-1128

test/file-lock.test.ts (1)

224-233: good regression for orphan-lock cleanup on metadata write failure.

Lines 224-233 in test/file-lock.test.ts:224 verify ENOENT cleanup after a forced write failure, which directly exercises the cleanup branch in lib/file-lock.ts:84-96.

Address remaining PR34 review threads across audit URL sanitization, background-job retry behavior, data-retention retry scoping, unified-settings lock release handling, file-lock stale owner protection, secret redaction hardening, and SPDX plus-form denylist matching.

Add regression coverage for concurrent lock contention, EPERM/EBUSY release retries, retry-limit/jitter counters, audit fallback sanitization, retention transient IO branches, unified-settings release failures, and license-policy plus-form matches.

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/background-jobs.ts (1)

105-129: ⚠️ Potential issue | 🟠 Major

stop rethrowing raw task errors after sanitization.

in lib/background-jobs.ts line 105-129, you sanitize the message for dlq/logs, but line 129 still rethrows lastError verbatim when it is an Error. this leaks emails/tokens back to callers and upstream logs. downstream impact: test/background-jobs.test.ts line 217 currently locks that leak in.

please also add a vitest regression that asserts thrown error text is sanitized for token/email patterns.

proposed fix
-	throw (lastError instanceof Error ? lastError : new Error(errorMessage));
+	const sanitizedError = new Error(errorMessage);
+	if (lastError instanceof Error) {
+		sanitizedError.name = lastError.name;
+	}
+	throw sanitizedError;

As per coding guidelines, lib/**: "check for logging that leaks tokens or emails."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/background-jobs.ts` around lines 105 - 129, The code currently rethrows
lastError verbatim (throw (lastError instanceof Error ? lastError : new
Error(errorMessage))) which can leak sensitive data; change the rethrow to throw
a new Error containing the already-sanitized errorMessage (or otherwise ensure
lastError.message is replaced with
sanitizeErrorMessage(toErrorMessage(lastError))) so callers only receive
sanitized text; update the DeadLetter flow that uses
sanitizeErrorMessage/toErrorMessage and ensure appendDeadLetter/logWarn remain
unchanged; add a vitest regression in test/background-jobs.test.ts that triggers
the failing path and asserts the thrown error.message does not contain raw
token/email patterns but matches the sanitized output.
♻️ Duplicate comments (1)
lib/unified-settings.ts (1)

36-68: ⚠️ Potential issue | 🟠 Major

release helper still strands lockfiles on windows transient errors.

in lib/unified-settings.ts line 42-47 and line 59-65, ebusy/eperm from release() are swallowed after logging, but no fallback cleanup runs. this can strand settings.json.lock and block subsequent writers until stale expiry, which is a concurrency/reliability risk on windows.

please add a vitest regression that forces release() to throw ebusy/eperm and verifies the next save does not wait for stale lock expiry.

proposed fix
 function releaseUnifiedSettingsLockSync(release: () => void): void {
 	try {
 		release();
 	} catch (error) {
 		const code = (error as NodeJS.ErrnoException | undefined)?.code;
 		if (code === "ENOENT") return;
 		if (code === "EBUSY" || code === "EPERM") {
+			try {
+				unlinkSync(UNIFIED_SETTINGS_LOCK_PATH);
+			} catch (cleanupError) {
+				const cleanupCode = (cleanupError as NodeJS.ErrnoException | undefined)?.code;
+				if (cleanupCode !== "ENOENT") {
+					logWarn("Unified settings lock fallback cleanup failed", {
+						lockPath: UNIFIED_SETTINGS_LOCK_PATH,
+						code: cleanupCode ?? "UNKNOWN",
+					});
+				}
+			}
 			logWarn("Unified settings lock release failed after write", {
 				lockPath: UNIFIED_SETTINGS_LOCK_PATH,
 				code,
 			});
 			return;
 		}
 		throw error;
 	}
 }

 async function releaseUnifiedSettingsLockAsync(release: () => Promise<void>): Promise<void> {
 	try {
 		await release();
 	} catch (error) {
 		const code = (error as NodeJS.ErrnoException | undefined)?.code;
 		if (code === "ENOENT") return;
 		if (code === "EBUSY" || code === "EPERM") {
+			try {
+				await fs.rm(UNIFIED_SETTINGS_LOCK_PATH, { force: true });
+			} catch (cleanupError) {
+				const cleanupCode = (cleanupError as NodeJS.ErrnoException | undefined)?.code;
+				if (cleanupCode !== "ENOENT") {
+					logWarn("Unified settings lock fallback cleanup failed", {
+						lockPath: UNIFIED_SETTINGS_LOCK_PATH,
+						code: cleanupCode ?? "UNKNOWN",
+					});
+				}
+			}
 			logWarn("Unified settings lock release failed after write", {
 				lockPath: UNIFIED_SETTINGS_LOCK_PATH,
 				code,
 			});
 			return;
 		}
 		throw error;
 	}
 }

As per coding guidelines, lib/**: "focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/unified-settings.ts` around lines 36 - 68, The release helpers
releaseUnifiedSettingsLockSync and releaseUnifiedSettingsLockAsync currently
swallow EBUSY/EPERM after logging and can leave a stranded lock; modify both so
that when (error as NodeJS.ErrnoException)?.code is "EBUSY" or "EPERM" you log
as before then perform a fallback cleanup (attempt to forcibly remove the lock
at UNIFIED_SETTINGS_LOCK_PATH: use fs.unlinkSync in
releaseUnifiedSettingsLockSync and fs.promises.unlink in
releaseUnifiedSettingsLockAsync, handling ENOENT silently), and only return
after the fallback cleanup completes or its error is handled; also add a vitest
regression that mocks release() to throw EBUSY and EPERM and asserts a
subsequent save does not block waiting for stale lock expiry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@index.ts`:
- Around line 1687-1696: Duplicate URL sanitization logic used to build
safeAuditResource should be consolidated into a single helper; create a function
named toSafeAuditResource(url: unknown): string that implements the try-new
URL(origin+pathname) fallback to sanitized rawUrl.split(/[?#]/,1)[0] returning
"unknown" when empty, then replace the inline IIFEs that currently compute
safeAuditResource (the blocks around safeAuditResource at the two call sites) to
simply call toSafeAuditResource(url); ensure both occurrences (previously at the
two duplicated blocks) use this helper so behavior stays identical.

In `@lib/data-retention.ts`:
- Around line 93-95: The bare catch in the try/catch in lib/data-retention.ts
should stop swallowing all errors; change it to catch the error (e.g., catch
(err)) and explicitly handle expected transient races (continue only when
err.code === 'ENOENT' or other known benign codes) while logging and surfacing
unexpected failures—use the module's logger (e.g., processLogger or existing
logger) to record err and context and rethrow or return a failed status for
non-retryable errors so they are visible in production.

In `@test/data-retention.test.ts`:
- Around line 100-278: Add a test that verifies retry exhaustion by forcing
persistent EBUSY when pruning directory entries: mock fs.stat (or fs.rmdir if
that path is used) so calls for the stale file always throw a
NodeJS.ErrnoException with code "EBUSY", call enforceDataRetention and assert it
rejects with that error code (or throws) after the max retries exercised in
lib/data-retention.ts (see retry logic around lines 33-34), and ensure you
restore the spy in a finally block; reference enforceDataRetention and the
pruneDirectoryByAge/prune logic to locate where retries occur.
- Line 7: The test helper's RETRYABLE_REMOVE_CODES set is missing EACCES and
EAGAIN and should match the retry logic in lib/data-retention.ts; update the
RETRYABLE_REMOVE_CODES constant in test/data-retention.test.ts (the Set assigned
to RETRYABLE_REMOVE_CODES) to include "EACCES" and "EAGAIN" so the test cleanup
retries use the same error codes as the data-retention retry logic.

In `@test/license-policy-check.test.ts`:
- Around line 39-42: The test matrix in test/license-policy-check.test.ts only
covers string license values; add regression cases that exercise non-string
shapes parsed by scripts/license-policy-check.js (the { license: { type: "..." }
} shape and the licenses[] array shape) so the test for denylisted SPDX
plus-form also includes entries like { denyList: "GPL-2.0+", license: { type:
"GPL-2.0+" } } and { denyList: "LGPL-2.1+", license: [{ type: "MIT OR LGPL-2.1+"
}] } (and mirror similar additions for the other table at the 54-55 region) to
ensure the parsing branches in scripts/license-policy-check.js:27-54 are covered
and prevent policy-bypass regressions.
- Around line 9-26: Add a deterministic regression test for removeWithRetry that
forces the Windows retry branch by stubbing fs.rm to throw a retriable error
code (e.g., "EBUSY" / "EPERM" / "ENOTEMPTY") for the first few invocations then
succeed, using vitest (vi.spyOn or vi.stub) and vi.useFakeTimers to control
backoff timing; call removeWithRetry and assert it eventually returns and that
fs.rm was called multiple times (and restore the spy afterward). Reference the
function removeWithRetry and the constant RETRYABLE_REMOVE_CODES in your test so
it explicitly targets the retry logic, ensure the test does not rely on real
filesystem state or real timers, and make the assertions deterministic (e.g.,
advance timers to trigger retries and verify call count and final success).

In `@test/unified-settings.test.ts`:
- Around line 6-25: Extract the duplicated removeWithRetry helper (and its
constant RETRYABLE_REMOVE_CODES) into a shared test utility module (e.g.,
test/helpers/fs-utils.ts), export removeWithRetry with the same signature async
function removeWithRetry(targetPath: string, options: { recursive?: boolean;
force?: boolean }): Promise<void>, and update callers (e.g., in
unified-settings.test.ts and file-lock.test.ts) to import it instead of defining
it inline; preserve behavior (retry loop, ENOENT handling, retryable codes,
backoff timing) and ensure proper NodeJS.ErrnoException typing is retained in
the shared module.

---

Outside diff comments:
In `@lib/background-jobs.ts`:
- Around line 105-129: The code currently rethrows lastError verbatim (throw
(lastError instanceof Error ? lastError : new Error(errorMessage))) which can
leak sensitive data; change the rethrow to throw a new Error containing the
already-sanitized errorMessage (or otherwise ensure lastError.message is
replaced with sanitizeErrorMessage(toErrorMessage(lastError))) so callers only
receive sanitized text; update the DeadLetter flow that uses
sanitizeErrorMessage/toErrorMessage and ensure appendDeadLetter/logWarn remain
unchanged; add a vitest regression in test/background-jobs.test.ts that triggers
the failing path and asserts the thrown error.message does not contain raw
token/email patterns but matches the sanitized output.

---

Duplicate comments:
In `@lib/unified-settings.ts`:
- Around line 36-68: The release helpers releaseUnifiedSettingsLockSync and
releaseUnifiedSettingsLockAsync currently swallow EBUSY/EPERM after logging and
can leave a stranded lock; modify both so that when (error as
NodeJS.ErrnoException)?.code is "EBUSY" or "EPERM" you log as before then
perform a fallback cleanup (attempt to forcibly remove the lock at
UNIFIED_SETTINGS_LOCK_PATH: use fs.unlinkSync in releaseUnifiedSettingsLockSync
and fs.promises.unlink in releaseUnifiedSettingsLockAsync, handling ENOENT
silently), and only return after the fallback cleanup completes or its error is
handled; also add a vitest regression that mocks release() to throw EBUSY and
EPERM and asserts a subsequent save does not block waiting for stale lock
expiry.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 83624405-4f0f-42b6-9952-8c14401abda2

📥 Commits

Reviewing files that changed from the base of the PR and between 82e5046 and 1749d31.

📒 Files selected for processing (14)
  • index.ts
  • lib/background-jobs.ts
  • lib/codex-manager.ts
  • lib/data-retention.ts
  • lib/file-lock.ts
  • lib/unified-settings.ts
  • scripts/license-policy-check.js
  • test/background-jobs.test.ts
  • test/codex-manager-cli.test.ts
  • test/data-retention.test.ts
  • test/file-lock.test.ts
  • test/index-retry.test.ts
  • test/license-policy-check.test.ts
  • test/unified-settings.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/license-policy-check.test.ts
  • test/unified-settings.test.ts
  • test/data-retention.test.ts
  • test/index-retry.test.ts
  • test/file-lock.test.ts
  • test/background-jobs.test.ts
  • test/codex-manager-cli.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/background-jobs.ts
  • lib/codex-manager.ts
  • lib/data-retention.ts
  • lib/unified-settings.ts
  • lib/file-lock.ts
🧬 Code graph analysis (9)
test/license-policy-check.test.ts (1)
scripts/license-policy-check.js (1)
  • denyList (15-18)
lib/background-jobs.ts (1)
lib/logger.ts (2)
  • error (389-393)
  • logWarn (341-346)
test/unified-settings.test.ts (1)
lib/unified-settings.ts (4)
  • getUnifiedSettingsPath (264-266)
  • saveUnifiedPluginConfig (320-331)
  • loadUnifiedPluginConfigSync (277-285)
  • saveUnifiedPluginConfigSync (299-308)
lib/data-retention.ts (1)
lib/utils.ts (1)
  • sleep (65-67)
test/file-lock.test.ts (1)
lib/file-lock.ts (1)
  • acquireFileLock (114-181)
index.ts (2)
lib/audit.ts (1)
  • auditLog (123-153)
lib/logger.ts (1)
  • logWarn (341-346)
test/background-jobs.test.ts (1)
lib/background-jobs.ts (2)
  • runBackgroundJobWithRetry (84-130)
  • getBackgroundJobDlqPath (80-82)
test/codex-manager-cli.test.ts (2)
lib/codex-manager.ts (1)
  • runCodexMultiAuthCli (4571-4652)
scripts/codex.js (3)
  • runCodexMultiAuthCli (501-501)
  • exitCode (505-505)
  • exitCode (532-532)
lib/unified-settings.ts (3)
lib/logger.ts (2)
  • error (389-393)
  • logWarn (341-346)
lib/utils.ts (1)
  • sleep (65-67)
lib/file-lock.ts (2)
  • acquireFileLockSync (222-294)
  • acquireFileLock (114-181)
🔇 Additional comments (40)
lib/data-retention.ts (4)

21-21: enotempty now included in retryable codes — good.

past review flagged this gap; lib/data-retention.ts:21 now includes ENOTEMPTY alongside the other transient codes. this aligns with test/data-retention.test.ts:7 and handles windows concurrent-writer races on fs.rmdir.


28-40: retry logic looks solid.

withRetentionIoRetry at lib/data-retention.ts:28-40 uses exponential backoff (10ms base, doubling) with 5 attempts. the unreachable guard at line 39 is good defensive coding. no concerns here.


67-91: fs operations properly wrapped with retry — lgtm.

lib/data-retention.ts:67-91 wraps all relevant calls (readdir, stat, rmdir, unlink) with withRetentionIoRetry. this handles EBUSY/EPERM/EACCES/EAGAIN/ENOTEMPTY scenarios on windows and under concurrent access. corresponding tests in test/data-retention.test.ts:101-278 exercise these paths.


100-113: single-file pruning retry path is covered.

lib/data-retention.ts:102 and lib/data-retention.ts:106 wrap stat and unlink with retry. the ENOENT fallback at line 110 is correct. test coverage exists at test/data-retention.test.ts:202-237 for transient unlink failures.

test/data-retention.test.ts (4)

101-156: ebusy retry test covers stat and rmdir paths — good.

test/data-retention.test.ts:101-156 injects one-time EBUSY on both fs.stat and fs.rmdir, asserts retry flags, and confirms file/dir removal. spies restored in finally block keeps it deterministic. this addresses the windows edge case for directory entry pruning.


158-200: parameterized test for EPERM/EACCES/EAGAIN addresses past feedback.

test/data-retention.test.ts:158-200 exercises all three codes on the fs.stat path during directory entry pruning. good use of it.each for coverage without duplication.


202-237: single-file unlink retry path now covered.

test/data-retention.test.ts:202-237 injects transient EACCES on fs.unlink for the flagged-accounts file and asserts retry + removal. addresses past review comment about missing single-file retry coverage.


239-278: enotempty race test is correct.

test/data-retention.test.ts:239-278 simulates a concurrent writer recreating content during rmdir. retry handles it and eventually succeeds. good regression case for windows concurrent-writer behavior.

scripts/license-policy-check.js (2)

7-13: spdx normalization and token-set denylist matching look solid.

the updates in scripts/license-policy-check.js:7-13 and scripts/license-policy-check.js:76-79 correctly cover +, -only, and -or-later forms while avoiding substring false positives. no concurrency risk introduced here.

Also applies to: 17-18, 56-62, 76-79


27-54: license source extraction is a good hardening step.

extractRawLicense in scripts/license-policy-check.js:27-54 now handles multiple lockfile shapes (license, licenses string, and licenses[]) and keeps unknown handling behavior clear.

Also applies to: 70-71

test/license-policy-check.test.ts (1)

70-74: assertion contract is clean and deterministic.

the checks in test/license-policy-check.test.ts:70-74 validate both process failure and emitted violation text, which is exactly what this policy gate should lock down.

lib/background-jobs.ts (1)

53-56: good hardening on default retry and attempts accounting.

lib/background-jobs.ts line 53-56 and line 91-127 correctly wire default 429 retryability and persist actual attempts made into dlq/log metadata.

Also applies to: 91-127

test/codex-manager-cli.test.ts (1)

2103-2166: good deterministic coverage for input guards and redaction edge cases.

test/codex-manager-cli.test.ts line 2103-2166 and line 2215-2253 add clean vitest regressions for malformed cursor/page-size inputs, option-smuggling, and +// token redaction paths.

Also applies to: 2215-2253

lib/unified-settings.ts (1)

300-307: good lock scope expansion for full read-modify-write sections.

lib/unified-settings.ts line 300-307, line 322-329, and line 370-377 now hold the file lock across the whole read-modify-write path, which closes the lost-update race.

Also applies to: 322-329, 370-377

lib/file-lock.ts (8)

26-35: looks good: pid liveness check correctly treats EPERM as "alive".

the isPidAlive helper at lib/file-lock.ts:26-35 properly returns true when process.kill(pid, 0) throws EPERM, which means the process exists but the current user lacks permission to signal it. this prevents incorrect eviction of locks held by other users on shared filesystems.


68-75: stale-lock pid liveness check now prevents evicting live owners.

the pid liveness check at lib/file-lock.ts:68-75 addresses the prior critical finding about long-running writers losing their locks. if the lock file age exceeds staleAfterMs but the owning pid is still alive, eviction is skipped. this is the correct behavior for windows and cross-platform safety.


100-112: cleanup helper correctly preserves original acquisition error.

cleanupIncompleteLockFile at lib/file-lock.ts:100-112 now passes originalError through to throwOriginalOrCleanupError when cleanup fails with non-ENOENT, addressing the prior major finding about error masking. the error precedence logic is sound.


141-145: write/close failure cleanup is now best-effort and rethrows original error.

at lib/file-lock.ts:141-145, if writeError or closeError occur, the code calls cleanupIncompleteLockFile(path, originalFailure) then rethrows via throwOriginalOrCleanupError. this addresses the prior critical finding about orphan lock files after write/close failures.


152-161: release marks released=true after successful unlink, treats ENOENT as success.

at lib/file-lock.ts:152-161, setting released = true after unlink succeeds (or on ENOENT) allows safe idempotent calls. the placement post-unlink is correct per the pr objectives about retry-on-transient-EBUSY behavior.


189-196: sync pid liveness check mirrors async behavior.

removeIfStaleSync at lib/file-lock.ts:189-196 uses the same parseLockPid and isPidAlive logic as the async version, maintaining consistency.


208-220: sync cleanup helper mirrors async error preservation.

cleanupIncompleteLockFileSync at lib/file-lock.ts:208-220 correctly mirrors the async version's error preservation logic.


250-254: sync write/close failure cleanup follows same pattern as async.

at lib/file-lock.ts:250-254, the sync path mirrors the async cleanup-and-rethrow pattern.

test/index-retry.test.ts (1)

217-261: solid regression test for audit url sanitization fallback path.

at test/index-retry.test.ts:217-261, this test exercises the audit URL parsing fallback by passing a relative URL with query params and fragment. the assertions at lines 252-255 verify no sensitive data leaks through. the try/finally cleanup with vi.doUnmock and vi.resetModules is correct for module-level mocking isolation.

test/unified-settings.test.ts (3)

249-302: lock-under-read test correctly validates read-modify-write scope.

at test/unified-settings.test.ts:249-302, the test verifies that saveUnifiedPluginConfig holds the lock while reading existing settings via the lockState.held flag and fs.readFile spy. this aligns with the pr objective about expanding lock scope to cover full read-modify-write.


304-333: EPERM release regression for async writes looks good.

at test/unified-settings.test.ts:304-333, the test mocks lock.release() to throw EPERM and verifies that saveUnifiedPluginConfig still succeeds. this covers the windows-style transient unlink contention path per pr objectives.


335-364: EBUSY release regression for sync writes looks good.

at test/unified-settings.test.ts:335-364, the test mocks sync lock.release() to throw EBUSY and verifies that saveUnifiedPluginConfigSync succeeds. this complements the async EPERM test.

lib/codex-manager.ts (7)

140-144: option-smuggling guard is a good hardening addition.

looksLikeOptionToken at lib/codex-manager.ts:140-144 prevents values like --foo from being accepted as option arguments. this mitigates option injection via malicious input.


262-270: token redaction now covers JWT/URL-safe characters.

at lib/codex-manager.ts:262-270, the character class [A-Z0-9._~+/=-] in the token and bearer regexes now includes +, /, and = per the prior review feedback. this prevents partial redaction of base64-encoded tokens.


272-285: sanitizeCliErrorMessage provides defense-in-depth for CLI error output.

at lib/codex-manager.ts:272-285, the function chains redactForExternalOutput and redactFreeformSecretText to sanitize error messages before logging. this addresses the pr objective about safe CLI error output.


1553-1562: cursor validation tightened to reject non-numeric payloads.

at lib/codex-manager.ts:1556, the /^\d+$/ test on the decoded base64 value ensures only strictly numeric cursors are accepted. this prevents potential injection via malformed cursor values.


1998-2013: --page-size input validation now rejects non-numeric values.

at lib/codex-manager.ts:2003-2006, the /^\d+$/ regex validates that --page-size values contain only digits before parsing. this prevents parseInt coercion issues like "10abc"10.


2221-2237: idempotency-key validation prevents option smuggling.

at lib/codex-manager.ts:2223 and 2232, looksLikeOptionToken(value) rejects idempotency-key values that look like CLI options. combined with the empty-value check, this hardens against malformed input.


4125-4127: rotate-secrets error path uses sanitizeCliErrorMessage.

at lib/codex-manager.ts:4126, error messages are sanitized before logging and auditing. this prevents token/email leakage in error output per the pr objectives.

test/file-lock.test.ts (7)

88-99: runtime transpilation enables proper module isolation for workers.

at test/file-lock.test.ts:88-99, transpiling lib/file-lock.ts at runtime and passing the module URL to workers ensures workers use the actual lock implementation rather than a stub. this is a solid approach for cross-process lock testing.


101-101: increased iterations improves contention coverage.

at test/file-lock.test.ts:101, iterationsPerWorker = 50 (up from prior 10) increases the probability of detecting race conditions. combined with 6 workers, this yields 300 total lock acquisitions.


106-121: worker script now uses actual acquireFileLock from transpiled module.

at test/file-lock.test.ts:106-121, the worker script imports acquireFileLock dynamically from the passed module URL and uses it with proper options. the removal of the internal retry loop and use of the real lock implementation is correct.


167-200: EBUSY and EPERM release retry regression tests address prior feedback.

at test/file-lock.test.ts:167-200, the parameterized it.each(["EBUSY", "EPERM"]) test covers both transient unlink failure codes as requested in the prior review. the test verifies that after the first release throws, a retry succeeds and the lock file is cleaned up.


202-217: pid liveness prevents stale eviction of live lock owners.

at test/file-lock.test.ts:202-217, this test creates a lock file with process.pid (guaranteed alive) and a stale timestamp, then verifies acquisition fails with EEXIST rather than evicting the live owner's lock. this validates the pid liveness check in removeIfStale.


219-255: cleanup on metadata write failure regression test looks good.

at test/file-lock.test.ts:219-255, the test mocks fs.open to return a handle whose writeFile throws EIO, then verifies the lock file is cleaned up (ENOENT). this validates the cleanup path in acquireFileLock when write fails.


257-302: error precedence test verifies original error is preserved.

at test/file-lock.test.ts:257-302, the test mocks both writeFile (throws EIO) and unlink (throws EPERM for cleanup), then verifies the original "metadata write failed" error is thrown rather than the cleanup error. this validates the throwOriginalOrCleanupError logic.

- dedupe safe audit resource sanitization in plugin fetch flow
- surface retention retry exhaustion and cover persistent EBUSY regression
- share test removeWithRetry helper and add license policy parsing regressions

Co-authored-by: Codex <noreply@openai.com>
ndycode added a commit that referenced this pull request Mar 6, 2026
Conflict resolution: applied the stacked hardening follow-up on top of the merged PR #32 runtime and test surfaces while preserving the earlier DX workflow choices.

Co-authored-by: Codex <noreply@openai.com>
@ndycode
Copy link
Owner Author

ndycode commented Mar 6, 2026

Merged via integration PR to safely combine multiple approved changes.

@ndycode ndycode closed this Mar 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant