feat: complete enterprise hardening and storage lock reliability #34
ndycode wants to merge 8 commits into feat/enterprise-hardening
Conversation
- cleanup stale/dead process lock artifacts before acquiring account lock - ensure lock release always attempts fallback cleanup - keep clearAccounts/saveTransactions serialized across file and memory locks Co-authored-by: Codex <noreply@openai.com>
Note: Reviews paused. This branch appears to be under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with comments from the influx of new commits. You can configure this behavior in the CodeRabbit settings. Use the following commands to manage reviews:
📝 Walkthrough
Introduces serialized, platform-aware storage locking and explicit decrypt-error propagation. Adds `StorageError` and `STORAGE_DECRYPT_ERROR_CODE`, a file-queue plus in-process mutex ordering for storage mutations, atomic write retries, and retryable IO helpers. Key behavior changes are summarized in the changes below.
Sequence diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant caller as "caller"
    participant inproc as "in-process mutex"
    participant filequeue as "file queue / file-lock"
    participant storagemtx as "storage mutex"
    participant fs as "filesystem"
    caller->>inproc: request serialized lock
    inproc->>filequeue: enqueue / acquire file lock
    filequeue->>fs: create/acquire lock file
    filequeue-->>inproc: lock acquired
    inproc->>storagemtx: acquire storage mutex
    storagemtx-->>caller: perform storage mutation
    caller->>storagemtx: release storage mutex
    storagemtx-->>inproc: signal release
    inproc->>filequeue: release file lock
    filequeue->>fs: remove lock file (or leave for fallback)
    filequeue-->>inproc: fully released
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Notes for review
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Ensure account-storage mutations keep deterministic ordering while preserving the historical file-lock before in-process mutex acquisition sequence. Co-authored-by: Codex <noreply@openai.com>
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/storage.ts`:
- Around line 382-386: withAccountFileLock currently calls
cleanupDeadProcessStorageLock(lockPath) before calling
acquireFileLock(lockPath,...), which creates a race where two processes can both
remove a stale lock then race to create it; instead remove the external cleanup
call and rely on acquireFileLock to handle stale-lock removal (or move the
cleanup logic into acquireFileLock). Update withAccountFileLock to stop calling
cleanupDeadProcessStorageLock and ensure acquireFileLock (in lib/file-lock.ts)
invokes removeIfStale/remove-or-cleanup atomically so stale locks are handled
inside acquireFileLock rather than externally.
- Around line 373-379: The catch block around the storage read should not call
releaseStorageLockFallback when the error is an EBUSY from
fs.readFile/JSON.parse; update the catch in the function that reads the lock
(the block that currently inspects (error as NodeJS.ErrnoException).code) to
explicitly handle error.code === "ENOENT" (return) and error.code === "EBUSY"
(return or rethrow as appropriate) and only call
releaseStorageLockFallback(lockPath) for other error types (e.g., parse errors).
Reference the existing catch scope that uses (error as
NodeJS.ErrnoException).code and the releaseStorageLockFallback(lockPath) call to
implement this conditional behavior.
- Around line 123-139: The comment claiming "file lock -> in-process mutex"
contradicts the implementation in withStorageSerializedFileLock (which currently
calls withAccountFileMutex → withAccountFileLock → withStorageLock), so either
update the comment to state the actual acquisition order (in-process mutex ->
file lock -> storage lock) or reorder the calls to match the comment;
specifically, either change the comment near withStorageSerializedFileLock to
reflect the real sequence or change the implementation to call
withAccountFileLock(path, () => withAccountFileMutex(() => withStorageLock(fn)))
so the acquisition becomes file lock -> in-process mutex -> storage lock (and
ensure variable/closure usage still compiles).
- Around line 340-346: The catch block in releaseStorageLockFallback currently
swallows all errors from fs.rm; update it to log a debug-level message including
the lockPath and the caught error so failed cleanup is observable (e.g., use the
existing logger or processLogger if available), while preserving the
"best-effort" behavior by not rethrowing; ensure you reference
releaseStorageLockFallback and the fs.rm call so the log includes both the path
and error details.
- Around line 355-362: When detecting and removing a stale lock (the branch that
computes lockPid and isDeadProcess using process.kill) and the separate
age-based cleanup branch, add observability logs using the existing logger
(e.g., processLogger or storage logger used elsewhere in this module) that
record the action and context: include the lock path/name, lockPid, whether
isDeadProcess was true, the lock age (timestamp or computed age), and a concise
reason ("stale: dead PID" or "stale: age threshold exceeded"). Place one log
right before or immediately after the dead-process cleanup path (where
isDeadProcess is true) and another log in the age-based cleanup path to make
both events visible to operators. Ensure the log messages are structured and
include these fields so they match ops runbook expectations.
- Around line 390-401: The current finally always calls
releaseStorageLockFallback(lockPath) even when lock.release() succeeds; change
the flow so the fallback is only invoked when lock.release() throws or otherwise
fails: try calling await lock.release() and on catch (error) check the error
code like in the existing block, log the warning, then call await
releaseStorageLockFallback(lockPath) from inside that catch (or when a boolean
“failedToRelease” flag is set) so the fallback only runs on failure; update
references to lock.release, releaseStorageLockFallback, lockPath and preserve
the ENOENT handling/logging behavior.
- Around line 348-380: Add explicit debug/info/error logs inside
cleanupDeadProcessStorageLock to record: when the lock file is read and parsed
(include pid and acquiredAt), when process.kill(pid, 0) indicates the process is
alive (skipping cleanup) or throws ESRCH (releasing lock), when EPERM or other
errors occur (and whether lock is stale), when
releaseStorageLockFallback(lockPath) is invoked, and when ENOENT or JSON parse
errors are encountered; use the existing processLogger (or the module logger)
and include lockPath and relevant metadata in each message. Also add vitest
regression tests for cleanupDeadProcessStorageLock covering: (1) process exists
and lock is fresh — assert no releaseStorageLockFallback call, (2) process dead
(mock process.kill to throw ESRCH) — assert releaseStorageLockFallback called,
(3) process exists but process.kill throws EPERM — assert no release unless lock
is stale (simulate acquiredAt older than
ACCOUNT_STORAGE_LOCK_OPTIONS.staleAfterMs), and (4) missing or corrupted lock
file — assert fallback release or ENOENT path; mock fs.readFile, process.kill,
and releaseStorageLockFallback to verify log calls and behavior.
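The stale-lock handling requested above comes down to one decision: is the on-disk lock dead (owner PID gone) or expired (older than the stale threshold)? A minimal sketch of that decision follows, with the process-liveness probe injected so it stays testable. All names here are illustrative stand-ins, not the actual `lib/file-lock.ts` API.

```typescript
// Illustrative sketch only: a stale-lock classifier that an acquire path could
// run atomically before deciding to remove and re-create the lock file.
type LockPayload = { pid: number; acquiredAt: number };
type StaleReason = "dead-pid" | "age-exceeded" | null;

function classifyStaleLock(
  payload: LockPayload,
  staleAfterMs: number,
  nowMs: number,
  isProcessAlive: (pid: number) => boolean, // e.g. process.kill(pid, 0) wrapped in try/catch
): StaleReason {
  if (!isProcessAlive(payload.pid)) {
    return "dead-pid"; // owner exited without releasing the lock
  }
  if (nowMs - payload.acquiredAt > staleAfterMs) {
    return "age-exceeded"; // owner alive but has held the lock too long
  }
  return null; // lock is still valid; do not touch it
}
```

Returning a reason string rather than a boolean also gives the requested observability logs a concise `reason` field ("stale: dead PID" vs. "stale: age threshold exceeded") for free.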
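The ENOENT/EBUSY branching asked for in the catch-block comment can be isolated into a tiny classifier. This is a sketch under the assumption that ENOENT means "lock already gone" and EBUSY means "transiently held, do not force-remove"; the function name is hypothetical, not part of `lib/storage.ts`.

```typescript
// Hypothetical helper: decide whether the best-effort fallback removal of the
// lock file should run for a given error code from the lock-read path.
function shouldRunFallbackCleanup(code: string | undefined): boolean {
  if (code === "ENOENT") return false; // lock file already gone: nothing to clean
  if (code === "EBUSY") return false;  // transiently held (e.g. Windows AV scan): retry later
  return true; // parse errors / unknown codes: corrupt lock, best-effort removal
}
```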
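The "fallback only when release fails" flow can be sketched synchronously (the real code is async, and these callback names are stand-ins for `lock.release`/`releaseStorageLockFallback`, not the actual API).

```typescript
// Sketch: run the fallback cleanup only when the primary release fails,
// and skip it entirely for ENOENT (the lock file is already gone).
function releaseLockWithFallback(
  release: () => void,
  fallbackCleanup: () => void,
  warn: (message: string) => void,
): void {
  try {
    release(); // happy path: no fallback, so no double cleanup
  } catch (error) {
    const code = (error as { code?: string }).code;
    if (code === "ENOENT") {
      return; // already released on disk; nothing to do
    }
    warn(`lock release failed (${code ?? "unknown"}); running fallback cleanup`);
    fallbackCleanup();
  }
}
```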
ℹ️ Review info
Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0d15b434-50c2-4cf6-8218-c272deab5dd3
📒 Files selected for processing (1)
lib/storage.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/storage.ts
🧬 Code graph analysis (1)
lib/storage.ts (1)
lib/file-lock.ts (1)
acquireFileLock(57-107)
🔇 Additional comments (2)
lib/storage.ts (2)
112-112: LGTM, mutex declaration is straightforward. The `accountFileMutex` follows the same pattern as `storageMutex` at line 111.
1318-1321: Regression tests do exist for the locking pattern; they just weren't cited in the PR. `test/storage.test.ts:175` directly exercises `withAccountStorageTransaction` with concurrent `Promise.all` calls and verifies serialization without data loss. `test/storage-async.test.ts:295` covers the mutex behavior with rapid sequential saves and concurrent writes. `test/storage.test.ts:1407+` tests EBUSY/EPERM retry logic, covering Windows filesystem edge cases. These tests validate the lock-order changes in `withAccountStorageTransaction`, `saveAccounts`, and `clearAccounts` at `lib/storage.ts:1318, 1334, 1346`.
> Likely an incorrect or invalid review comment.
Drop pre-acquire dead-process lock cleanup and only run fallback lock-file deletion when lock.release() fails. Also align lock-order comment and add debug observability for fallback cleanup failures. Co-authored-by: Codex <noreply@openai.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/storage.ts`:
- Around line 123-138: The new file-lock serialization helpers
withAccountFileMutex and withStorageSerializedFileLock and their use in
saveAccounts lack tests covering lock.release() error paths; add unit tests in
test/storage.test.ts that mock the lock.release() implementation to throw EBUSY
and EPERM and assert that releaseStorageLockFallback is invoked and saveAccounts
still completes (no uncaught rejection), add a test that simulates concurrent
saveAccounts calls to verify ordering under lock contention (use mocked locks
that delay release) and add a Windows-specific test case where lock.release()
throws EPERM to validate the fallback cleanup path; reference the functions
withAccountFileMutex, withStorageSerializedFileLock, saveAccounts,
releaseStorageLockFallback and the lock.release() call when adding the mocks and
assertions.
ℹ️ Review info
Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3d9bb546-70b4-4fee-b0a7-5413a4e724b5
📒 Files selected for processing (1)
lib/storage.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/storage.ts
🧬 Code graph analysis (1)
lib/storage.ts (1)
lib/file-lock.ts (1)
acquireFileLock(57-107)
🔇 Additional comments (7)
lib/storage.ts (7)
112-112: LGTM: in-process mutex matches the existing `storageMutex` pattern. The `accountFileMutex` at `lib/storage.ts:112` and `withAccountFileMutex` at `lib/storage.ts:123-130` mirror the existing `storageMutex`/`withStorageLock` implementation. The pattern is correct for serializing async operations within a single process.
Also applies to: 123-130
132-138: Lock order comment now matches the implementation. `lib/storage.ts:133-134` correctly states the acquisition order: file-queue mutex → file lock → storage mutex. The code at lines 135-137 does exactly that: `withAccountFileMutex` → `withAccountFileLock` → `withStorageLock`. Past review comment resolved.
339-348: Fallback now has debug observability as requested. `lib/storage.ts:343-346` logs cleanup failures at debug level with path and error. This addresses the past review comment about silent swallowing.
350-370: Release fallback only fires on failure; double cleanup fixed. `lib/storage.ts:357-367` now only calls `releaseStorageLockFallback` when `lock.release()` throws (and it's not ENOENT). This addresses the past review about unconditional double cleanup. One edge case worth noting: if `lock.release()` throws EBUSY (Windows antivirus holding the file), `releaseStorageLockFallback` uses `fs.rm({ force: true })`, which will also likely fail with EBUSY and log at debug level. The lock file may remain until the next acquire cleans it via `removeIfStale` in `lib/file-lock.ts:91-93`. This is probably acceptable for "best-effort" but could leave stale locks on heavily contested Windows systems.
1285-1288: `withAccountStorageTransaction` now uses the serialized lock path. `lib/storage.ts:1285-1288` routes through `withStorageSerializedFileLock`, ensuring a consistent acquisition order with `saveAccounts` and `clearAccounts`. This eliminates the possibility of lock-order deadlocks between these paths.
1299-1304: `saveAccounts` uses the serialized lock, consistent with the transaction path. `lib/storage.ts:1301-1303` wraps `saveAccountsUnlocked` in `withStorageSerializedFileLock`, matching the acquisition order used by `withAccountStorageTransaction`.
1311-1336: `clearAccounts` uses the serialized lock, consistent with the other mutation paths. `lib/storage.ts:1313-1335` wraps the entire clear operation in `withStorageSerializedFileLock`. All three mutation paths (`withAccountStorageTransaction`, `saveAccounts`, `clearAccounts`) now share the same lock acquisition sequence.
Add regression tests for lock.release() EBUSY/EPERM fallback cleanup and concurrent save lock serialization to close outstanding PR feedback. Co-authored-by: Codex <noreply@openai.com>
@greptile pls review
Address unresolved review threads by hardening audit/log redaction, lock scope, cursor validation, retention retries, and storage decrypt behavior; align workflows/docs/scripts; and add regression tests for all actionable items. Co-authored-by: Codex <noreply@openai.com>
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/storage.ts (1)
930-947: ⚠️ Potential issue | 🟠 Major
v1 migration output is bypassed when rebuilding normalized accounts. `lib/storage.ts:930` computes `baseStorage` via `migrateV1ToV3`, but `lib/storage.ts:936` iterates `rawAccounts` instead of `baseStorage.accounts`. This can skip account-level migration normalization before decrypt/dedupe.
Proposed fix:

```diff
- const rawActiveIndex = clampIndex(activeIndexValue, rawAccounts.length);
- const activeKey = extractActiveKey(rawAccounts, rawActiveIndex);
+ const sourceAccounts = Array.isArray(baseStorage.accounts) ? baseStorage.accounts : [];
+ const rawActiveIndex = clampIndex(activeIndexValue, sourceAccounts.length);
+ const activeKey = extractActiveKey(sourceAccounts as unknown[], rawActiveIndex);
@@
- const validAccounts: AccountMetadataV3[] = [];
- for (const account of rawAccounts) {
+ const validAccounts: AccountMetadataV3[] = [];
+ for (const account of sourceAccounts) {
    if (!isStoredAccountCandidate(account)) {
      continue;
    }
@@
- const clampedRawIndex = clampIndex(rawIndex, rawAccounts.length);
- const familyKey = extractActiveKey(rawAccounts, clampedRawIndex);
+ const clampedRawIndex = clampIndex(rawIndex, sourceAccounts.length);
+ const familyKey = extractActiveKey(sourceAccounts as unknown[], clampedRawIndex);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/storage.ts` around lines 930 - 947, The code builds baseStorage using fromVersion and migrateV1ToV3 but then iterates rawAccounts, bypassing any migration/normalization; change the loop to iterate baseStorage.accounts (or the normalized accounts array on AccountStorageV3) when populating validAccounts so decryptAccountSensitiveFields runs on migrated data, keeping the existing try/catch that converts errors via toStorageDecryptError and preserving validAccounts accumulation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@index.ts`:
- Around line 1725-1732: The fallback for safeAuditResource currently returns
the raw url when new URL(url) throws, which can leak query/hash into auditLog
(see auditLog in lib/audit.ts); update the fallback in the safeAuditResource
IIFE (and the analogous spot around line 1739) to strip query and hash from the
raw string before returning (e.g., remove everything from the first '?' or '#'
onward) so only origin+pathname or sanitized path is logged; ensure the
sanitized value is what gets passed to auditLog/resource.
In `@lib/background-jobs.ts`:
- Around line 54-57: getDelayMs currently uses a deterministic exponential
backoff which causes worker synchronization; modify getDelayMs in
lib/background-jobs.ts to compute the capped exponential base delay (as it does
now) then apply a randomized jitter of about ±20% (e.g., multiply by 1 +
randomBetween(-0.2, 0.2)) and return an integer ms to avoid synchronized retries
on 429/EBUSY. Keep the existing cap logic and ensure jitter is applied after
capping; update tests in test/background-jobs.test.ts to include a
concurrent-retry assertion that verifies returned delays for the same attempt
can differ (i.e., show jitter is present) while preserving existing behavior for
EPERM/other cases.
- Around line 39-47: The token-redaction regex in sanitizeErrorMessage currently
uses [A-Z0-9._-]+ which misses base64 chars like /, +, =; update the
token-capturing patterns in sanitizeErrorMessage (the
/\b(?:access|refresh|id)?_?token(?:=|:)?\s*([A-Z0-9._-]+)/gi and the Bearer
pattern) to use a non-whitespace matcher such as \S+ so tokens are consumed
until whitespace (e.g., /\b(?:access|refresh|id)?_?token(?:=|:)?\s*\S+/gi and
/\b(Bearer)\s+\S+/gi), and add a unit test in the "redacts sensitive error text"
test to assert that tokens with base64-like suffixes (containing /, +, =) are
fully redacted.
In `@lib/data-retention.ts`:
- Line 21: The RETRYABLE_RETENTION_CODES set is missing "ENOTEMPTY", causing
production code to skip retries while tests expect it; update the
RETRYABLE_RETENTION_CODES constant to include "ENOTEMPTY" (matching the test's
RETRYABLE_REMOVE_CODES) so the retry logic that checks this set (used around the
fs.rmdir call) will treat ENOTEMPTY as transient and retry accordingly.
In `@lib/file-lock.ts`:
- Around line 69-89: When write or close fails while acquiring the lock in
lib/file-lock.ts (the block using handle.writeFile and handle.close that sets
writeError/closeError), perform a best-effort unlink of the lock file before
rethrowing the error to avoid leaving a stale lock on disk; do the same for the
analogous block around lines 154-175, and add a vitest regression that simulates
both a write failure and a close failure during acquisition to assert the lock
file is removed (no orphan lock) even when errors are thrown.
In `@lib/unified-settings.ts`:
- Around line 265-272: The current pattern around
acquireFileLockSync/lock.release in unified-settings (used with
UNIFIED_SETTINGS_LOCK_PATH and SETTINGS_LOCK_OPTIONS and wrapping
readSettingsRecordSync/writeSettingsRecordSync) can propagate lock.release
errors (EPERM/EBUSY) and fail after a successful write; update all call sites
(the blocks around the lock.release calls) to catch errors from lock.release(),
treat EPERM/EBUSY as non-fatal by logging a warning and swallowing them while
rethrowing unexpected errors, ensuring the writeSettingsRecordSync result is not
rolled back; also add vitest tests in test/unified-settings.test.ts that
stub/mock lock.release to throw EPERM and EBUSY and assert the write succeeds
and the lock file does not leave the process in a failing state.
In `@scripts/license-policy-check.js`:
- Around line 48-54: The exact-token denylist check is missing common SPDX
variants (e.g., "gpl-2.0-only", "gpl-2.0-or-later") causing blocked licenses to
slip through; update the matching logic that uses extractLicenseTokens so that
for each token you also derive and check a normalized base SPDX id (strip
suffixes like "-only" and "-or-later", and any trailing "+") against the
denylist (i.e., when evaluating in the function that performs the denylist
lookup near the token match), ensuring you test both the original token and the
stripped/base form before allowing it.
In `@test/background-jobs.test.ts`:
- Around line 141-160: The test currently asserts that the thrown error contains
PII ("person@example.com"); change the assertion to avoid checking for secrets
by expecting a generic failure instead—e.g., assert that
runBackgroundJobWithRetry (from runBackgroundJobWithRetry /
getBackgroundJobDlqPath) rejects (using .rejects.toThrow() with no PII-specific
message, or .rejects.toThrowError()/toBeRejected) and keep any checks about
redaction focused on logs/DLQ contents rather than the thrown error text; update
the assertion in the failing test block so it no longer requires the email/token
to appear.
In `@test/data-retention.test.ts`:
- Around line 101-156: Add deterministic Vitest cases exercising the remaining
retry branches in enforceDataRetention: simulate transient EPERM/EACCES/EAGAIN
errors on directory operations (fs.stat/fs.rmdir) and a transient
EBUSY/EACCES-like error on single-file removal (fs.unlink) to verify the retry
logic in the functions implemented in lib/data-retention.ts (refer to
enforceDataRetention and its internal directory-entry pruning and single-file
unlink paths). For each test, spy/mock the specific fs method (stat, rmdir,
unlink) to throw the transient error once for the target path then succeed,
assert that the retry flag was hit, assert the file/dir is ultimately removed
(expect ENOENT) and restore spies in finally blocks to keep tests deterministic
and isolated.
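The jittered backoff suggested for `getDelayMs` above can be sketched with the random source injected so tests stay deterministic. The constants are illustrative, not the real `lib/background-jobs.ts` values.

```typescript
// Sketch: capped exponential backoff with +/-20% jitter applied after the cap,
// so concurrent workers retrying the same attempt do not synchronize.
const BASE_DELAY_MS = 1_000;   // illustrative base
const MAX_DELAY_MS = 30_000;   // illustrative cap

function getDelayMs(attempt: number, random: () => number = Math.random): number {
  const capped = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
  const jitter = 1 + (random() * 0.4 - 0.2); // maps [0,1) to [0.8, 1.2)
  return Math.round(capped * jitter);
}
```

Applying jitter after the cap keeps the worst-case delay bounded (1.2× the cap) while still de-synchronizing retries on 429/EBUSY storms.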
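The widened token-redaction patterns described above can be sketched as follows; the function is a stand-in for the real `sanitizeErrorMessage` and the replacement strings are illustrative.

```typescript
// Sketch: consume the token value up to the next whitespace (\S+) so
// base64-style characters (/, +, =) are fully covered by the redaction.
function sanitizeErrorMessage(message: string): string {
  return message
    .replace(/\b(?:access|refresh|id)?_?token(?:=|:)?\s*\S+/gi, "token=[REDACTED]")
    .replace(/\bBearer\s+\S+/gi, "Bearer [REDACTED]");
}
```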
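The "unlink before rethrow" cleanup requested for failed lock acquisition can be sketched with injected operations standing in for the real file-handle calls (`handle.writeFile`/`handle.close`) and the unlink; the interface below is hypothetical, not the `lib/file-lock.ts` API.

```typescript
// Hypothetical ops bundle standing in for the real fs handle + unlink calls.
interface LockFileOps {
  write: () => void;
  close: () => void;
  unlink: () => void; // best-effort removal of the just-created lock file
}

function writeLockOrCleanUp(ops: LockFileOps): void {
  try {
    ops.write();
    ops.close();
  } catch (error) {
    try {
      ops.unlink(); // avoid leaving a stale lock on disk after a failed acquire
    } catch {
      // best-effort: the original write/close error is the one worth surfacing
    }
    throw error;
  }
}
```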
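The SPDX denylist normalization described above can be sketched as a base-id derivation checked alongside the raw token; the helper names are illustrative, not the actual `scripts/license-policy-check.js` functions.

```typescript
// Sketch: strip SPDX variant suffixes so "gpl-2.0-only", "gpl-2.0-or-later",
// and "gpl-2.0+" all reduce to the base id "gpl-2.0" before the denylist check.
function normalizeSpdxBase(token: string): string {
  return token
    .toLowerCase()
    .replace(/-only$/, "")
    .replace(/-or-later$/, "")
    .replace(/\+$/, "");
}

function isDenied(token: string, denylist: Set<string>): boolean {
  const lowered = token.toLowerCase();
  // test both the original token and the stripped base form
  return denylist.has(lowered) || denylist.has(normalizeSpdxBase(lowered));
}
```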
---
Outside diff comments:
In `@lib/storage.ts`:
- Around line 930-947: The code builds baseStorage using fromVersion and
migrateV1ToV3 but then iterates rawAccounts, bypassing any
migration/normalization; change the loop to iterate baseStorage.accounts (or the
normalized accounts array on AccountStorageV3) when populating validAccounts so
decryptAccountSensitiveFields runs on migrated data, keeping the existing
try/catch that converts errors via toStorageDecryptError and preserving
validAccounts accumulation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d85dbde0-bd54-490e-b59f-2b0c10f91215
⛔ Files ignored due to path filters (1)
`package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (20)
.github/workflows/ci.yml
.github/workflows/supply-chain.yml
docs/development/TESTING.md
docs/runbooks/operations.md
index.ts
lib/background-jobs.ts
lib/codex-manager.ts
lib/data-retention.ts
lib/file-lock.ts
lib/storage.ts
lib/unified-settings.ts
package.json
scripts/license-policy-check.js
test/authorization.test.ts
test/background-jobs.test.ts
test/codex-manager-cli.test.ts
test/data-retention.test.ts
test/file-lock.test.ts
test/storage.test.ts
test/unified-settings.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
docs/**
⚙️ CodeRabbit configuration file
keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
Files:
docs/runbooks/operations.md
docs/development/TESTING.md
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/authorization.test.ts
test/codex-manager-cli.test.ts
test/data-retention.test.ts
test/background-jobs.test.ts
test/storage.test.ts
test/file-lock.test.ts
test/unified-settings.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/background-jobs.ts
lib/storage.ts
lib/unified-settings.ts
lib/file-lock.ts
lib/codex-manager.ts
lib/data-retention.ts
🧬 Code graph analysis (8)
index.ts (2)
lib/logger.ts (1)
logWarn(341-346)
lib/audit.ts (1)
auditLog(123-153)
test/codex-manager-cli.test.ts (2)
lib/codex-manager.ts (1)
runCodexMultiAuthCli(4529-4610)
scripts/codex.js (1)
runCodexMultiAuthCli(501-501)
test/data-retention.test.ts (1)
lib/data-retention.ts (2)
RetentionPolicy(6-12)
enforceDataRetention(119-146)
lib/storage.ts (3)
lib/file-lock.ts (1)
acquireFileLock(57-123)
lib/storage/migrations.ts (1)
AccountMetadataV3(40-57)
lib/utils.ts (1)
isRecord(11-13)
test/background-jobs.test.ts (1)
lib/background-jobs.ts (2)
runBackgroundJobWithRetry(85-129)
getBackgroundJobDlqPath(81-83)
test/storage.test.ts (1)
lib/storage.ts (2)
saveAccounts(1326-1331)
loadAccounts(1006-1008)
lib/data-retention.ts (1)
lib/utils.ts (1)
sleep(65-67)
test/unified-settings.test.ts (1)
lib/unified-settings.ts (2)
getUnifiedSettingsPath(229-231)
saveUnifiedPluginConfig(285-296)
🪛 LanguageTool
docs/runbooks/operations.md
[uncategorized] ~28-~28: The official name of this software platform is spelled with a capital “H”.
Context: ...ce (for example weekly-YYYYMMDD or CI ${{ github.run_id }}) - remove previous key af...
(GITHUB)
🔇 Additional comments (17)
docs/runbooks/operations.md (1)
27-28: Good update for the rotation idempotency workflow. This command and the stable run-id guidance are clear and consistent with the weekly ops flow in `docs/runbooks/operations.md` (lines 27-28). As per coding guidelines, `docs/**`: keep README, SECURITY, and docs consistent with actual CLI flags and workflows; whenever behavior changes, require updated upgrade notes and mention new npm scripts.
lib/data-retention.ts (1)
67-69: Good retry coverage across retention IO call sites. Wrapping the `readdir`/`stat`/`unlink`/`rmdir` and single-file pruning paths is consistent in `lib/data-retention.ts:67`, `lib/data-retention.ts:81`, and `lib/data-retention.ts:106`. This meaningfully hardens transient IO failures.
Also applies to: 81-92, 106-110
test/data-retention.test.ts (1)
7-27: Cleanup retry helper looks solid and deterministic. `test/data-retention.test.ts:7` and `test/data-retention.test.ts:45` improve teardown stability under transient Windows fs lock errors while preserving deterministic assertions.
Also applies to: 45-45
lib/background-jobs.ts (1)
104-104: Good hardening on error sanitization wiring. `lib/background-jobs.ts:104` and `lib/background-jobs.ts:119` consistently sanitize error strings before DLQ/log output.
Also applies to: 119-119
package.json (1)
69-69: No issue: the workflow correctly installs dev deps before the SBOM script. The SBOM workflow in `.github/workflows/supply-chain.yml:72` runs `npm ci` without `--omit=dev`, so devDependencies (including cyclonedx-npm from `package.json:107`) are installed before the SBOM script executes at line 75. The setup is correct.
lib/codex-manager.ts (1)
1521-1521: The guard at `lib/codex-manager.ts:1521` is solid and already has regression coverage. `test/codex-manager-cli.test.ts:2123` creates a cursor by encoding `"12junk"` as base64, which decodes to non-digits and correctly triggers the "Invalid --cursor value" error at `test/codex-manager-cli.test.ts:2132`. This covers the exact scenario the regex guard protects against.
docs/development/TESTING.md (1)
53-60: The docs update is aligned with the new local gate. This section clearly documents the ordering change and the added `npm run audit:ci`/`npm run license:check` checks.
docs/**: "whenever behavior changes, require updated upgrade notes and mention new npm scripts.".github/workflows/supply-chain.yml (1)
11-13: Nice cleanup on the supply-chain workflow config. Centralizing the denylist and using `npm run sbom` improves maintainability and keeps policy settings in one place.
Also applies to: 32-32, 75-75
.github/workflows/ci.yml (1)
55-60: CI flow update looks good. The artifact handoff plus `needs: test` makes the coverage gate ordering explicit, and `npm run test:coverage` is clearer for coverage intent.
Also applies to: 68-68, 74-78, 90-90
test/authorization.test.ts (1)
59-68: Good test isolation hardening. Using a full auth-env snapshot/restore makes these vitest cases deterministic and prevents cross-test env bleed in `test/authorization.test.ts:59`.
Also applies to: 72-84, 88-100, 127-127
test/codex-manager-cli.test.ts (2)
2053-2064: Good pagination contract assertions. `test/codex-manager-cli.test.ts:2053` and `test/codex-manager-cli.test.ts:2082` now pin `schemaVersion`, `pagination.cursor`, and `pagination.pageSize`, which tightens the JSON API contract for cursor paging.
Also applies to: 2082-2093
2103-2147: Good malformed-cursor regression coverage. `test/codex-manager-cli.test.ts:2103` validates both malformed decoded payloads and invalid base64 input with explicit exit-code and error assertions.
test/file-lock.test.ts (1)
88-122: The worker contention path now exercises the real lock API cleanly. `test/file-lock.test.ts:88-122` and `test/file-lock.test.ts:134` validate lock acquisition/release through the transpiled module in separate processes, then assert no lost/duplicate writes under contention.
Also applies to: 134-166
test/unified-settings.test.ts (2)
6-25: The cleanup retry helper is a good stability addition. `test/unified-settings.test.ts:6` and `test/unified-settings.test.ts:44` improve teardown reliability for transient Windows remove errors (EBUSY/EPERM/ENOTEMPTY).
Also applies to: 44-44
249-302: Good lock-before-read regression test. `test/unified-settings.test.ts:249` verifies that `saveUnifiedPluginConfig` reads settings while the file lock is held, which directly covers the new lock ordering in `lib/unified-settings.ts`.
lib/storage.ts (1)
350-359: Lock release fallback handling is cleaner now. `lib/storage.ts:350` and `lib/storage.ts:368` now keep best-effort cleanup observable and only trigger the forced fallback on non-ENOENT release failures.
Also applies to: 368-378
test/storage.test.ts (1)
680-714: Strong regression coverage for decrypt failures and lock edge cases. `test/storage.test.ts:680` plus `test/storage.test.ts:1579-1676` cover fail-fast decrypt behavior, `lock.release()` fallback on Windows-style errors, and concurrent save lock serialization.
Also applies to: 1579-1676
Address unresolved PR #32 feedback across audit logging, CLI parsing, lock handling, retention races, storage rotation, and runbook/license guidance. Includes targeted regressions for background jobs, codex-manager CLI, data retention, and file lock behavior. Co-authored-by: Codex <noreply@openai.com>
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/storage.ts (1)
113-139: 🧹 Nitpick | 🔵 Trivial
Scope the account-file mutex by lock path to avoid cross-project head-of-line blocking. `lib/storage.ts:113-139` serializes all account file ops through one global queue. This can unnecessarily block independent storage paths in the same process (project-scoped vs. global), reducing concurrency for no correctness gain.
Proposed refactor:

```diff
-let accountFileMutex: Promise<void> = Promise.resolve();
+const accountFileMutexByPath = new Map<string, Promise<void>>();

-function withAccountFileMutex<T>(fn: () => Promise<T>): Promise<T> {
-  const previousMutex = accountFileMutex;
+function withAccountFileMutex<T>(path: string, fn: () => Promise<T>): Promise<T> {
+  const previousMutex = accountFileMutexByPath.get(path) ?? Promise.resolve();
   let releaseLock: () => void;
-  accountFileMutex = new Promise<void>((resolve) => {
+  const nextMutex = new Promise<void>((resolve) => {
     releaseLock = resolve;
   });
-  return previousMutex.then(fn).finally(() => releaseLock());
+  accountFileMutexByPath.set(path, nextMutex);
+  return previousMutex.then(fn).finally(() => {
+    releaseLock();
+    if (accountFileMutexByPath.get(path) === nextMutex) {
+      accountFileMutexByPath.delete(path);
+    }
+  });
 }

 async function withStorageSerializedFileLock<T>(path: string, fn: () => Promise<T>): Promise<T> {
-  return withAccountFileMutex(() =>
+  return withAccountFileMutex(path, () =>
     withAccountFileLock(path, () => withStorageLock(fn)),
   );
 }
```

As per coding guidelines, `lib/**`: "focus on auth rotation, windows filesystem IO, and concurrency."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/storage.ts` around lines 113 - 139, The global accountFileMutex serializes all account file operations and causes head-of-line blocking; change withAccountFileMutex and accountFileMutex to use a per-path mutex map (keyed by the file path used in withStorageSerializedFileLock) so independent paths don't block each other: replace the single accountFileMutex with a Map<string, Promise<void>> (or similar) and modify withAccountFileMutex to accept a path (or create a new helper like withAccountFileMutexForPath(path, fn)) that grabs/releases the Promise for that path, chains on previousPromise.then(fn).finally(() => release and cleanup the map entry if no waiter), and update withStorageSerializedFileLock to pass the path into the per-path account-file mutex while still sequencing with withAccountFileLock and withStorageLock; keep existing release semantics and ensure map entry is removed when resolved to avoid leaks.
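The per-path serialization the prompt describes can also be sketched in isolation. This is a hedged, standalone sketch that assumes nothing about the real `lib/storage.ts` helpers; `withKeyMutex` and `mutexByKey` are illustrative names:

```typescript
// Hypothetical per-key mutex: calls sharing a key run strictly in order,
// while calls with different keys proceed concurrently.
const mutexByKey = new Map<string, Promise<void>>();

function withKeyMutex<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const previous = mutexByKey.get(key) ?? Promise.resolve();
  let release!: () => void;
  const next = new Promise<void>((resolve) => {
    release = resolve;
  });
  mutexByKey.set(key, next);
  return previous
    .catch(() => undefined) // a failed predecessor must not poison the queue
    .then(fn)
    .finally(() => {
      release();
      // drop the entry only if no newer waiter replaced it, so the map cannot leak
      if (mutexByKey.get(key) === next) {
        mutexByKey.delete(key);
      }
    });
}
```

The map-entry cleanup in `finally` is the piece the review calls out: without it, every distinct path would pin a resolved promise in the map for the process lifetime.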
♻️ Duplicate comments (6)
index.ts (1)
1687-1693: ⚠️ Potential issue | 🟠 Major
fallback audit resource still logs raw url with query/hash.
Line 1692 and Line 2468 in `index.ts` return raw `url` when parsing fails. `lib/audit.ts:130` persists `resource` verbatim, so query/hash secrets can leak.

proposed fix
```diff
 const safeAuditResource = (() => {
   try {
     const parsed = new URL(url);
     return `${parsed.origin}${parsed.pathname}`;
   } catch {
-    return url;
+    const raw = String(url);
+    return raw.split("?")[0]?.split("#")[0] ?? "unknown";
   }
 })();
@@
 const safePluginAuditResource = (() => {
   try {
     const parsed = new URL(url);
     return `${parsed.origin}${parsed.pathname}`;
   } catch {
-    return url;
+    const raw = String(url);
+    return raw.split("?")[0]?.split("#")[0] ?? "unknown";
   }
 })();
```

Also applies to: 2463-2469
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@index.ts` around lines 1687 - 1693, The fallback for safeAuditResource currently returns the raw url when new URL(url) throws, which can leak query/hash; update the catch branch in the safeAuditResource IIFE to return a sanitized version (strip anything after the first '?' or '#' — e.g. use url.split(/[?#]/)[0]) instead of the raw url, and apply the same change to the other identical occurrence (the second safeAuditResource implementation) so both fallbacks consistently remove query and fragment before persisting.

test/data-retention.test.ts (1)
101-197: ⚠️ Potential issue | 🟠 Major
retry-branch coverage is still incomplete for windows error codes and single-file unlink.
`test/data-retention.test.ts:101-197` covers `EBUSY` and `ENOTEMPTY`, but leaves `EPERM`/`EACCES`/`EAGAIN` from `lib/data-retention.ts:21` and the `pruneSingleFile` retry path at `lib/data-retention.ts:104-111` untested.

proposed test additions
```diff
+it.each(["EPERM", "EACCES", "EAGAIN"] as const)(
+  "retries transient %s during directory entry pruning",
+  async (code) => {
+    // inject one-time error on fs.stat or fs.rmdir for stale log path
+    // assert retry flag hit and stale entry removed
+  },
+);
+
+it.each(["EBUSY", "EPERM", "EACCES", "EAGAIN"] as const)(
+  "retries transient %s during single-file retention pruning",
+  async (code) => {
+    // create stale background-job-dlq.jsonl (or quota-cache.json)
+    // inject one-time error on fs.unlink for that file
+    // assert enforceDataRetention succeeds and file is removed
+  },
+);
```

As per coding guidelines, `test/**`: "tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/data-retention.test.ts` around lines 101 - 197, Tests are missing coverage for Windows-like transient error handling (EPERM/EACCES/EAGAIN) and the single-file retry path in pruneSingleFile; add deterministic vitest cases that import enforceDataRetention/pruneSingleFile from lib/data-retention.js, create a stale file, then spyOn fs.unlink (for pruneSingleFile) and fs.stat/fs.rmdir (for enforceDataRetention) to first throw a NodeJS.ErrnoException with code set to "EPERM"/"EACCES"/"EAGAIN" (or simulate one transient unlink error) and then succeed on retry, assert removedLogs increments and that the file/directory is eventually gone, and ensure all spies are restored in finally blocks to keep tests deterministic.

test/background-jobs.test.ts (2)
181-195: ⚠️ Potential issue | 🟡 Minor
add a regression case for token values containing `/`, `+`, and `=`.

Line 181 in `test/background-jobs.test.ts` only exercises `sk_test_123`/`rt_456`, so this suite will not catch the current regex gap in `lib/background-jobs.ts:43-46`.

proposed fix
```diff
 throw new Error(
-  "network failed for person@example.com Bearer sk_test_123 refresh_token=rt_456",
+  "network failed for person@example.com Bearer sk_test/123+abc== refresh_token=rt/456+xyz==",
 );
 ...
-expect(entry.error).not.toContain("sk_test_123");
-expect(entry.error).not.toContain("rt_456");
+expect(entry.error).not.toContain("sk_test/123+abc==");
+expect(entry.error).not.toContain("rt/456+xyz==");
 ...
-expect(serializedWarnings).not.toContain("sk_test_123");
-expect(serializedWarnings).not.toContain("rt_456");
+expect(serializedWarnings).not.toContain("sk_test/123+abc==");
+expect(serializedWarnings).not.toContain("rt/456+xyz==");
```

As per coding guidelines, `test/**`: "tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/background-jobs.test.ts` around lines 181 - 195, Add a regression test to cover token values containing "/", "+", and "=" to exercise the current redaction regex gap referenced in lib/background-jobs.ts (lines around the redaction logic) — update the test in background-jobs.test.ts that asserts DLQ redaction (the block that currently uses "sk_test_123" and "rt_456") to also enqueue/fail a job using secrets like "sk/te+st=123" and "rt/ra/sh+=token=" and then read the DLQ entry to assert entry.error contains "***REDACTED***" and does not contain the raw token substrings; this ensures the redaction logic in the background job handler (the function that writes DLQ entries) correctly sanitizes tokens with "/", "+", and "=".
186-186: ⚠️ Potential issue | 🟠 Major
do not assert pii leakage in thrown errors.
Line 186 in `test/background-jobs.test.ts` requires an email in the thrown error message, which hard-codes a sensitive-data leak contract.

proposed fix
```diff
-).rejects.toThrow("person@example.com");
+).rejects.toThrow("network failed");
```
Verify each finding against the current code and only fix it if needed. In `@test/background-jobs.test.ts` at line 186, The test currently asserts a specific email string in the thrown error (the .rejects.toThrow("person@example.com") assertion), which hard-codes PII; remove the exact-email assertion and instead assert the rejection without PII (e.g., use .rejects.toThrow() or assert the error type/class) and if you need to check content assert it is generic/non-PII (for example, assert the error.message does not contain 'person@example.com' or check a validation error class) so update the assertion in test/background-jobs.test.ts to avoid referencing "person@example.com".

lib/background-jobs.ts (2)
49-50: ⚠️ Potential issue | 🟠 Major
add jitter to retry delay to avoid synchronized retries under 429/ebusy.
Line 49 in `lib/background-jobs.ts` is deterministic, so concurrent workers can re-hit at the same time and amplify contention.

proposed fix
```diff
 function getDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
-  return Math.min(maxDelayMs, baseDelayMs * 2 ** Math.max(0, attempt - 1));
+  const capped = Math.min(maxDelayMs, baseDelayMs * 2 ** Math.max(0, attempt - 1));
+  const jitter = Math.floor(capped * 0.2 * (Math.random() * 2 - 1));
+  return Math.max(1, capped + jitter);
 }
```

As per coding guidelines, `lib/**`: "verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/background-jobs.ts` around lines 49 - 50, The retry delay function getDelayMs(attempt, baseDelayMs, maxDelayMs) is deterministic and needs randomized jitter to prevent synchronized retries; modify getDelayMs to compute the exponential backoff then apply a bounded random jitter (e.g., ±X% or add a random 0..jitterMs) while capping at maxDelayMs and never returning negative values, keeping parameters baseDelayMs, maxDelayMs and attempt semantics intact; update or add vitest cases to assert jitter stays within the expected bounds and add tests/notes for EBUSY/429 retry behavior for the affected queues.
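For reference, the jittered variant the comment asks for can be sketched standalone. The ±20% factor is an assumption for illustration, not a project constant:

```typescript
// Sketch of jittered exponential backoff: exponential growth capped at
// maxDelayMs, plus bounded random jitter so concurrent workers retrying
// after EBUSY/429 do not re-collide on the same tick.
function getDelayMsWithJitter(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  const capped = Math.min(maxDelayMs, baseDelayMs * 2 ** Math.max(0, attempt - 1));
  const jitter = Math.floor(capped * 0.2 * (Math.random() * 2 - 1)); // in roughly [-20%, +20%)
  return Math.max(1, capped + jitter); // never zero or negative
}
```

A test can assert the bounds deterministically by sampling many delays per attempt, which is how the requested vitest coverage could avoid flakiness despite the randomness.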
39-46: ⚠️ Potential issue | 🟠 Major
redaction regex still leaks base64-like token suffixes.
Line 43 and Line 46 in `lib/background-jobs.ts` stop matching at `/`, `+`, and `=`, so token tails can leak.

proposed fix
```diff
 function sanitizeErrorMessage(message: string): string {
   return message
     .replace(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, "***REDACTED***")
     .replace(
-      /\b(?:access|refresh|id)?_?token(?:=|:)?\s*([A-Z0-9._-]+)/gi,
+      /\b(?:access|refresh|id)?_?token(?:=|:)?\s*([^\s,;]+)/gi,
       "token=***REDACTED***",
     )
-    .replace(/\b(Bearer)\s+[A-Z0-9._-]+\b/gi, "$1 ***REDACTED***");
+    .replace(/\b(Bearer)\s+[^\s,;]+\b/gi, "$1 ***REDACTED***");
 }
```

As per coding guidelines, `lib/**`: "check for logging that leaks tokens or emails."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/background-jobs.ts` around lines 39 - 46, sanitizeErrorMessage is leaking base64-like token suffixes because the character classes used in the two token-redaction replace calls (the capture for access/refresh/id tokens and the Bearer token pattern) omit "+" "/" and "="; update both regexes in sanitizeErrorMessage (the pattern used in the replace that matches \b(?:access|refresh|id)?_?token... and the one matching \b(Bearer)\s+...) to include plus, slash and equals (e.g. expand the character class from [A-Z0-9._-] to include +/=: [A-Z0-9._+\-/=]) while keeping the case-insensitive flag and retaining the replacement strings, so tokens containing base64 characters are fully redacted.
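The widened matching can be exercised standalone. A sketch under one extra assumption: it uses "stop at whitespace/comma/semicolon" token bodies and drops the trailing `\b` from the Bearer pattern, since a `\b` after a `=`-padded tail would stop the match before the padding:

```typescript
// Sketch of the broadened redaction: token bodies match any run of
// non-delimiter characters, so base64-style tails ("/", "+", "=") are
// redacted instead of leaking past the old [A-Z0-9._-] class.
function sanitizeErrorMessageSketch(message: string): string {
  return message
    .replace(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, "***REDACTED***")
    .replace(/\b(?:access|refresh|id)?_?token(?:=|:)?\s*([^\s,;]+)/gi, "token=***REDACTED***")
    .replace(/\b(Bearer)\s+[^\s,;]+/gi, "$1 ***REDACTED***");
}
```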
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/codex-manager.ts`:
- Around line 262-270: The current redactFreeformSecretText function uses
character classes [A-Z0-9._-] in the token and Bearer regexes which omit "+",
"/", and "=" and can allow parts of JWT/URL-safe tokens to leak; update the two
regexes inside redactFreeformSecretText (the
/\b(?:access|refresh|id)?_?token(?:=|:)?\s*([A-Z0-9._-]+)/gi and the
/\b(Bearer)\s+[A-Z0-9._-]+\b/gi patterns) to include +, /, and = characters
(e.g. expand the class to [A-Z0-9._+\-/=]) while keeping the i flag so case is
handled, ensuring all token-like values are fully replaced with the redaction
string.
In `@lib/file-lock.ts`:
- Around line 83-85: Stale-lock eviction currently only checks age but the lock
file written by handle.writeFile contains a pid and acquiredAt; update the
eviction logic (the stale-check branches that read the lock JSON) to also verify
pid liveness before breaking the lock: parse the lock JSON (pid, acquiredAt), if
age > staleAfterMs then attempt a cross-platform liveness probe (e.g. try
process.kill(pid, 0) in a try/catch and treat ESRCH as dead) and only evict if
the PID is not alive; additionally ensure the writer that uses handle.writeFile
refreshes acquiredAt periodically while holding the lock so long-running
processes aren't considered stale.
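The liveness probe described above can be sketched with Node's signal-0 convention. This is a minimal sketch, not the module's real eviction code; `isPidAlive` is an illustrative name:

```typescript
// Sketch of a pid liveness probe: signal 0 performs existence/permission
// checks without delivering a signal. ESRCH means the pid is gone; EPERM
// means it exists but belongs to another user, so it counts as alive.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (error) {
    const code = (error as NodeJS.ErrnoException | undefined)?.code;
    return code === "EPERM"; // ESRCH (and anything else) => treat as dead
  }
}
```

In the eviction branch the review describes, a lock would only be broken when it is both older than `staleAfterMs` and `isPidAlive(lock.pid)` is false.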
- Around line 57-65: The cleanupIncompleteLockFile currently discards the
original write/close error when fs.unlink fails with a non-ENOENT, masking the
real acquisition failure; change cleanupIncompleteLockFile(path: string) to
accept an optional originalErr?: Error and in the catch block, if code !==
"ENOENT" rethrow originalErr (if provided) else rethrow the unlink error; then
update all callers (the call sites around the previous lines that invoke
cleanupIncompleteLockFile) to pass the original write/close/acquisition error
into cleanupIncompleteLockFile so the original failure is preserved when
unlinking fails.
In `@scripts/license-policy-check.js`:
- Around line 7-9: The denylist bypass happens because normalizeSpdxToken
currently only strips "-ONLY" and "-OR-LATER" but not a trailing "+", so tokens
like "GPL-2.0+" don't normalize to the denylist form; update normalizeSpdxToken
to also remove a trailing "+" and ensure every place you parse/split SPDX
expressions (the token-splitting logic referenced around the token extraction
code) calls normalizeSpdxToken for both the denylist entries and the extracted
tokens before comparison; then add a regression test that asserts denylist
entries like "GPL-2.0+" and "LGPL-2.1+" are correctly matched and blocked by the
policy check.
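The normalization step can be sketched on its own; the function name comes from the comment, but the ordering of the suffix strips here is an assumption:

```typescript
// Sketch of SPDX token normalization: strip the deprecated trailing "+"
// ("GPL-2.0+") as well as "-only"/"-or-later" suffixes, then uppercase so
// denylist entries and extracted tokens compare on the same canonical form.
function normalizeSpdxToken(token: string): string {
  return token
    .trim()
    .replace(/\+$/, "")
    .replace(/-only$/i, "")
    .replace(/-or-later$/i, "")
    .toUpperCase();
}
```

Applying this to both the denylist entries and every token split out of an SPDX expression is what closes the `GPL-2.0+` bypass the comment describes.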
In `@test/background-jobs.test.ts`:
- Around line 126-129: The test currently overrides the retryable predicate (the
retryable property in the test setup) which bypasses the default 429 retry logic
in lib/background-jobs.ts; remove that override so the test exercises the
module's default logic, or replace the inline predicate with the real predicate
exported by lib/background-jobs.ts (import the retry predicate or isRetryable
function from lib/background-jobs.ts and use it in place of the inline
retryable) so the test validates the default 429 retry path instead of
short-circuiting it.
In `@test/codex-manager-cli.test.ts`:
- Around line 2233-2253: Extend the existing redaction test for
runCodexMultiAuthCli (used in test/codex-manager-cli.test.ts) to include a
regression case where rotateStoredSecretEncryptionMock rejects with an Error
containing bearer/refresh tokens that include '+' and '/' characters (e.g.,
"Bearer sk+test/123" and "refresh_token=rt+456/789"), then assert the non-JSON
console error output still contains the "***REDACTED***" placeholder and does
not contain the raw email or those token substrings; you can either add a new
it(...) near the existing "redacts rotate-secrets failure details in non-json
output" test or extend that test to set a second mockRejectedValue case and
repeat the same expectations against console.error captured via errorSpy,
referencing rotateStoredSecretEncryptionMock and runCodexMultiAuthCli to locate
the code.
In `@test/file-lock.test.ts`:
- Around line 168-198: The test currently injects only an EBUSY error via the
fs.unlink mock (see unlinkSpy.mockImplementation, injectedBusy, and the
lock.release() calls); add coverage for Windows-style transient unlink
contention by making the first injected error use code "EPERM" on Windows
(process.platform === "win32") or keep "EBUSY" otherwise, so the mock throws an
error with code "EPERM" when path === lockPath for the first attempt on Windows;
ensure the rest of the test flow (second release resolves, fs.stat asserts
ENOENT, and unlinkSpy.mockRestore()) remains unchanged and references to
acquireFileLock and lock.release stay intact.
- Around line 101-119: The test uses a fixed sleep (setTimeout 2ms) to force
overlap which is flaky; replace that with a deterministic rendezvous barrier so
workers synchronize precisely before writing. Modify the worker script (the code
that imports acquireFileLock and uses sharedFilePath, workerId, iterations) to:
after reading existing, signal readiness by appending a short token to a
dedicated rendezvous file (e.g., `${lockPath}.barrier`), then loop reading that
barrier file until its token count equals the number of workers (pass the total
worker count into the worker argv from the test harness), and only then proceed
to write the shared file and continue iterations; keep acquireFileLock usage the
same. This removes timing coupling and makes the contention deterministic
without any setTimeout-based sleeps.
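The rendezvous described above can be sketched as a small file-based barrier. A hedged sketch, not the test harness's actual helper; the extra `fs`/`os`/`path` imports only serve the cleanup in the usage check:

```typescript
import { appendFileSync, readFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Sketch of a file-based barrier: each worker appends one byte to a shared
// barrier file, then polls until every worker has checked in. The exit
// condition is a count, not a timer, so overlap is deterministic.
async function awaitBarrier(barrierPath: string, totalWorkers: number): Promise<void> {
  appendFileSync(barrierPath, "x"); // signal this worker's readiness
  for (;;) {
    let arrived = 0;
    try {
      arrived = readFileSync(barrierPath, "utf8").length;
    } catch {
      // another worker may not have created the file yet
    }
    if (arrived >= totalWorkers) return;
    await new Promise((resolve) => setTimeout(resolve, 1)); // yield and re-check
  }
}
```

In the worker script from the comment, each worker would call this with a `${lockPath}.barrier` path and the worker count passed via argv, after reading `existing` and before writing the shared file.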
---
Outside diff comments:
In `@lib/storage.ts`:
- Around line 113-139: The global accountFileMutex serializes all account file
operations and causes head-of-line blocking; change withAccountFileMutex and
accountFileMutex to use a per-path mutex map (keyed by the file path used in
withStorageSerializedFileLock) so independent paths don't block each other:
replace the single accountFileMutex with a Map<string, Promise<void>> (or
similar) and modify withAccountFileMutex to accept a path (or create a new
helper like withAccountFileMutexForPath(path, fn)) that grabs/releases the
Promise for that path, chains on previousPromise.then(fn).finally(() => release
and cleanup the map entry if no waiter), and update
withStorageSerializedFileLock to pass the path into the per-path account-file
mutex while still sequencing with withAccountFileLock and withStorageLock; keep
existing release semantics and ensure map entry is removed when resolved to
avoid leaks.
---
Duplicate comments:
In `@index.ts`:
- Around line 1687-1693: The fallback for safeAuditResource currently returns
the raw url when new URL(url) throws, which can leak query/hash; update the
catch branch in the safeAuditResource IIFE to return a sanitized version (strip
anything after the first '?' or '#' — e.g. use url.split(/[?#]/)[0]) instead of
the raw url, and apply the same change to the other identical occurrence (the
second safeAuditResource implementation) so both fallbacks consistently remove
query and fragment before persisting.
In `@lib/background-jobs.ts`:
- Around line 49-50: The retry delay function getDelayMs(attempt, baseDelayMs,
maxDelayMs) is deterministic and needs randomized jitter to prevent synchronized
retries; modify getDelayMs to compute the exponential backoff then apply a
bounded random jitter (e.g., ±X% or add a random 0..jitterMs) while capping at
maxDelayMs and never returning negative values, keeping parameters baseDelayMs,
maxDelayMs and attempt semantics intact; update or add vitest cases to assert
jitter stays within the expected bounds and add tests/notes for EBUSY/429 retry
behavior for the affected queues.
- Around line 39-46: sanitizeErrorMessage is leaking base64-like token suffixes
because the character classes used in the two token-redaction replace calls (the
capture for access/refresh/id tokens and the Bearer token pattern) omit "+" "/"
and "="; update both regexes in sanitizeErrorMessage (the pattern used in the
replace that matches \b(?:access|refresh|id)?_?token... and the one matching
\b(Bearer)\s+...) to include plus, slash and equals (e.g. expand the character
class from [A-Z0-9._-] to include +/=: [A-Z0-9._+\-/=]) while keeping the
case-insensitive flag and retaining the replacement strings, so tokens
containing base64 characters are fully redacted.
In `@test/background-jobs.test.ts`:
- Around line 181-195: Add a regression test to cover token values containing
"/", "+", and "=" to exercise the current redaction regex gap referenced in
lib/background-jobs.ts (lines around the redaction logic) — update the test in
background-jobs.test.ts that asserts DLQ redaction (the block that currently
uses "sk_test_123" and "rt_456") to also enqueue/fail a job using secrets like
"sk/te+st=123" and "rt/ra/sh+=token=" and then read the DLQ entry to assert
entry.error contains "***REDACTED***" and does not contain the raw token
substrings; this ensures the redaction logic in the background job handler (the
function that writes DLQ entries) correctly sanitizes tokens with "/", "+", and
"=".
- Line 186: The test currently asserts a specific email string in the thrown
error (the .rejects.toThrow("person@example.com") assertion), which hard-codes
PII; remove the exact-email assertion and instead assert the rejection without
PII (e.g., use .rejects.toThrow() or assert the error type/class) and if you
need to check content assert it is generic/non-PII (for example, assert the
error.message does not contain 'person@example.com' or check a validation error
class) so update the assertion in test/background-jobs.test.ts to avoid
referencing "person@example.com".
In `@test/data-retention.test.ts`:
- Around line 101-197: Tests are missing coverage for Windows-like transient
error handling (EPERM/EACCES/EAGAIN) and the single-file retry path in
pruneSingleFile; add deterministic vitest cases that import
enforceDataRetention/pruneSingleFile from lib/data-retention.js, create a stale
file, then spyOn fs.unlink (for pruneSingleFile) and fs.stat/fs.rmdir (for
enforceDataRetention) to first throw a NodeJS.ErrnoException with code set to
"EPERM"/"EACCES"/"EAGAIN" (or simulate one transient unlink error) and then
succeed on retry, assert removedLogs increments and that the file/directory is
eventually gone, and ensure all spies are restored in finally blocks to keep
tests deterministic.
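The transient-IO retry pattern these data-retention comments keep referencing can be sketched generically. The code set and attempt/delay values below are illustrative assumptions, not the values in `lib/data-retention.ts`:

```typescript
// Sketch of a transient-IO retry wrapper for windows-style contention codes:
// retryable failures back off briefly and re-run; anything else fails fast.
const RETRYABLE_CODES = new Set(["EBUSY", "EPERM", "EACCES", "EAGAIN", "ENOTEMPTY"]);

async function withTransientRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      const code = (error as NodeJS.ErrnoException | undefined)?.code;
      if (!code || !RETRYABLE_CODES.has(code) || attempt === attempts) throw error;
      await new Promise((resolve) => setTimeout(resolve, 5 * attempt)); // brief backoff
    }
  }
  throw lastError;
}
```

The requested regressions amount to driving a wrapper like this with a one-shot injected error (retry succeeds) and a persistent one (retries exhaust), per code.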
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4e936d85-e0e6-4a85-a341-873db0f57108
📒 Files selected for processing (12)
docs/runbooks/operations.md
index.ts
lib/background-jobs.ts
lib/codex-manager.ts
lib/data-retention.ts
lib/file-lock.ts
lib/storage.ts
scripts/license-policy-check.js
test/background-jobs.test.ts
test/codex-manager-cli.test.ts
test/data-retention.test.ts
test/file-lock.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/background-jobs.test.ts
test/data-retention.test.ts
test/file-lock.test.ts
test/codex-manager-cli.test.ts
docs/**
⚙️ CodeRabbit configuration file
keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
Files:
docs/runbooks/operations.md
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/data-retention.ts
lib/storage.ts
lib/background-jobs.ts
lib/codex-manager.ts
lib/file-lock.ts
🧬 Code graph analysis (8)
test/background-jobs.test.ts (2)
lib/logger.ts (1)
`error` (389-393)
lib/background-jobs.ts (2)
`runBackgroundJobWithRetry` (85-131)
`getBackgroundJobDlqPath` (81-83)
lib/data-retention.ts (1)
lib/utils.ts (1)
sleep(65-67)
test/data-retention.test.ts (1)
lib/data-retention.ts (2)
`RetentionPolicy` (6-12)
`enforceDataRetention` (119-146)
lib/storage.ts (3)
lib/file-lock.ts (1)
`acquireFileLock` (68-137)
lib/storage/migrations.ts (1)
`AccountMetadataV3` (40-57)
lib/utils.ts (1)
isRecord(11-13)
test/file-lock.test.ts (1)
lib/file-lock.ts (1)
acquireFileLock(68-137)
index.ts (2)
lib/audit.ts (1)
`auditLog` (123-153)
lib/logger.ts (1)
logWarn(341-346)
lib/codex-manager.ts (1)
lib/data-redaction.ts (1)
redactForExternalOutput(31-50)
test/codex-manager-cli.test.ts (2)
lib/codex-manager.ts (1)
`runCodexMultiAuthCli` (4571-4652)
scripts/codex.js (3)
`runCodexMultiAuthCli` (501-501)
`exitCode` (505-505)
`exitCode` (532-532)
🪛 LanguageTool
docs/runbooks/operations.md
[uncategorized] ~31-~31: The official name of this software platform is spelled with a capital “H”.
Context: ...ce (for example weekly-YYYYMMDD or CI ${{ github.run_id }}) - remove previous key af...
(GITHUB)
🔇 Additional comments (6)
scripts/license-policy-check.js (1)
23-50: good hardening on license-shape normalization and extraction.
`lib/scripts/license-policy-check.js:23` through `lib/scripts/license-policy-check.js:50` correctly handles `license` and `licenses` shapes and reduces false unknowns. no new windows edge case or concurrency risk is introduced in this segment. Also applies to: 66-67
lib/codex-manager.ts (1)
1556-1557: good hardening on cli parsing and rotate-secrets error sanitization.
`lib/codex-manager.ts:1556-1557`, `lib/codex-manager.ts:2003-2019`, `lib/codex-manager.ts:2223-2233`, and `lib/codex-manager.ts:4126-4129` close option-smuggling and malformed numeric/cursor paths cleanly. this is covered by `test/codex-manager-cli.test.ts:2103-2166` and `test/codex-manager-cli.test.ts:2215-2231`. Also applies to: 2003-2007, 2017-2019, 2223-2233, 4126-4129
test/codex-manager-cli.test.ts (1)
2103-2147: nice regression coverage for malformed cursor, page-size parsing, and option-smuggling.
`test/codex-manager-cli.test.ts:2103-2166` and `test/codex-manager-cli.test.ts:2215-2231` are precise, deterministic checks for the new parser hardening. Also applies to: 2149-2166, 2215-2231
lib/storage.ts (2)
350-359: lock-release fallback behavior is now safer and observable.
`lib/storage.ts:350-359` and `lib/storage.ts:368-378` now keep best-effort cleanup while avoiding unconditional fallback deletes, and they add debug visibility for fallback failures. Also applies to: 368-378
196-204: decrypt fast-fail propagation is solid.
`lib/storage.ts:196-204`, `lib/storage.ts:936-946`, and `lib/storage.ts:1126-1128` correctly preserve decrypt-failure semantics instead of silently normalizing corrupted encrypted payloads. Also applies to: 936-946, 1126-1128
test/file-lock.test.ts (1)
224-233: good regression for orphan-lock cleanup on metadata write failure.

Lines 224-233 in `test/file-lock.test.ts` verify ENOENT cleanup after a forced write failure, which directly exercises the cleanup branch in `lib/file-lock.ts:84-96`.
Address remaining PR34 review threads across audit URL sanitization, background-job retry behavior, data-retention retry scoping, unified-settings lock release handling, file-lock stale owner protection, secret redaction hardening, and SPDX plus-form denylist matching. Add regression coverage for concurrent lock contention, EPERM/EBUSY release retries, retry-limit/jitter counters, audit fallback sanitization, retention transient IO branches, unified-settings release failures, and license-policy plus-form matches. Co-authored-by: Codex <noreply@openai.com>
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/background-jobs.ts (1)
105-129: ⚠️ Potential issue | 🟠 Major
stop rethrowing raw task errors after sanitization.
in `lib/background-jobs.ts` line 105-129, you sanitize the message for dlq/logs, but line 129 still rethrows `lastError` verbatim when it is an `Error`. this leaks emails/tokens back to callers and upstream logs. downstream impact: `test/background-jobs.test.ts` line 217 currently locks that leak in.

please also add a vitest regression that asserts thrown error text is sanitized for token/email patterns.
proposed fix
```diff
-      throw (lastError instanceof Error ? lastError : new Error(errorMessage));
+      const sanitizedError = new Error(errorMessage);
+      if (lastError instanceof Error) {
+        sanitizedError.name = lastError.name;
+      }
+      throw sanitizedError;
```

As per coding guidelines, `lib/**`: "check for logging that leaks tokens or emails."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/background-jobs.ts` around lines 105 - 129, The code currently rethrows lastError verbatim (throw (lastError instanceof Error ? lastError : new Error(errorMessage))) which can leak sensitive data; change the rethrow to throw a new Error containing the already-sanitized errorMessage (or otherwise ensure lastError.message is replaced with sanitizeErrorMessage(toErrorMessage(lastError))) so callers only receive sanitized text; update the DeadLetter flow that uses sanitizeErrorMessage/toErrorMessage and ensure appendDeadLetter/logWarn remain unchanged; add a vitest regression in test/background-jobs.test.ts that triggers the failing path and asserts the thrown error.message does not contain raw token/email patterns but matches the sanitized output.
♻️ Duplicate comments (1)
lib/unified-settings.ts (1)
36-68: ⚠️ Potential issue | 🟠 Major
release helper still strands lockfiles on windows transient errors.
in `lib/unified-settings.ts` line 42-47 and line 59-65, `ebusy`/`eperm` from `release()` are swallowed after logging, but no fallback cleanup runs. this can strand `settings.json.lock` and block subsequent writers until stale expiry, which is a concurrency/reliability risk on windows.

please add a vitest regression that forces `release()` to throw `ebusy`/`eperm` and verifies the next save does not wait for stale lock expiry.

proposed fix
```diff
 function releaseUnifiedSettingsLockSync(release: () => void): void {
   try {
     release();
   } catch (error) {
     const code = (error as NodeJS.ErrnoException | undefined)?.code;
     if (code === "ENOENT") return;
     if (code === "EBUSY" || code === "EPERM") {
+      try {
+        unlinkSync(UNIFIED_SETTINGS_LOCK_PATH);
+      } catch (cleanupError) {
+        const cleanupCode = (cleanupError as NodeJS.ErrnoException | undefined)?.code;
+        if (cleanupCode !== "ENOENT") {
+          logWarn("Unified settings lock fallback cleanup failed", {
+            lockPath: UNIFIED_SETTINGS_LOCK_PATH,
+            code: cleanupCode ?? "UNKNOWN",
+          });
+        }
+      }
       logWarn("Unified settings lock release failed after write", {
         lockPath: UNIFIED_SETTINGS_LOCK_PATH,
         code,
       });
       return;
     }
     throw error;
   }
 }

 async function releaseUnifiedSettingsLockAsync(release: () => Promise<void>): Promise<void> {
   try {
     await release();
   } catch (error) {
     const code = (error as NodeJS.ErrnoException | undefined)?.code;
     if (code === "ENOENT") return;
     if (code === "EBUSY" || code === "EPERM") {
+      try {
+        await fs.rm(UNIFIED_SETTINGS_LOCK_PATH, { force: true });
+      } catch (cleanupError) {
+        const cleanupCode = (cleanupError as NodeJS.ErrnoException | undefined)?.code;
+        if (cleanupCode !== "ENOENT") {
+          logWarn("Unified settings lock fallback cleanup failed", {
+            lockPath: UNIFIED_SETTINGS_LOCK_PATH,
+            code: cleanupCode ?? "UNKNOWN",
+          });
+        }
+      }
       logWarn("Unified settings lock release failed after write", {
         lockPath: UNIFIED_SETTINGS_LOCK_PATH,
         code,
       });
       return;
     }
     throw error;
   }
 }
```

As per coding guidelines, `lib/**`: "focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/unified-settings.ts` around lines 36 - 68, The release helpers releaseUnifiedSettingsLockSync and releaseUnifiedSettingsLockAsync currently swallow EBUSY/EPERM after logging and can leave a stranded lock; modify both so that when (error as NodeJS.ErrnoException)?.code is "EBUSY" or "EPERM" you log as before then perform a fallback cleanup (attempt to forcibly remove the lock at UNIFIED_SETTINGS_LOCK_PATH: use fs.unlinkSync in releaseUnifiedSettingsLockSync and fs.promises.unlink in releaseUnifiedSettingsLockAsync, handling ENOENT silently), and only return after the fallback cleanup completes or its error is handled; also add a vitest regression that mocks release() to throw EBUSY and EPERM and asserts a subsequent save does not block waiting for stale lock expiry.
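The release-with-fallback shape the comment asks for can be sketched generically. Names here are illustrative, not the module's real helpers, and the extra `fs` imports only serve the usage check:

```typescript
import { existsSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Sketch of fallback cleanup on windows-style transient release failures:
// force-remove the lockfile so later writers do not wait out stale expiry.
function releaseLockWithFallback(lockPath: string, release: () => void): void {
  try {
    release();
  } catch (error) {
    const code = (error as NodeJS.ErrnoException | undefined)?.code;
    if (code === "ENOENT") return; // lockfile already gone
    if (code === "EBUSY" || code === "EPERM") {
      try {
        rmSync(lockPath, { force: true }); // best-effort cleanup
      } catch {
        // leave removal to stale-lock expiry; a real helper would log here
      }
      return;
    }
    throw error; // unexpected codes still surface to the caller
  }
}
```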
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@index.ts`:
- Around line 1687-1696: Duplicate URL sanitization logic used to build
safeAuditResource should be consolidated into a single helper; create a function
named toSafeAuditResource(url: unknown): string that implements the try-new
URL(origin+pathname) fallback to sanitized rawUrl.split(/[?#]/,1)[0] returning
"unknown" when empty, then replace the inline IIFEs that currently compute
safeAuditResource (the blocks around safeAuditResource at the two call sites) to
simply call toSafeAuditResource(url); ensure both occurrences (previously at the
two duplicated blocks) use this helper so behavior stays identical.
In `@lib/data-retention.ts`:
- Around line 93-95: The bare catch in the try/catch in lib/data-retention.ts
should stop swallowing all errors; change it to catch the error (e.g., catch
(err)) and explicitly handle expected transient races (continue only when
err.code === 'ENOENT' or other known benign codes) while logging and surfacing
unexpected failures—use the module's logger (e.g., processLogger or existing
logger) to record err and context and rethrow or return a failed status for
non-retryable errors so they are visible in production.
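the selective catch suggested above can be sketched with a small predicate; the helper name, benign-code set, and commented logger call are illustrative assumptions:

```typescript
// hypothetical helper illustrating the selective catch suggested above;
// the benign-code set is an assumption for this sketch.
const BENIGN_RETENTION_CODES = new Set(["ENOENT"]);

export function isBenignRetentionError(err: unknown): boolean {
  const code = (err as NodeJS.ErrnoException | undefined)?.code;
  return code !== undefined && BENIGN_RETENTION_CODES.has(code);
}

// usage inside the prune loop (illustrative):
//   try {
//     await fs.unlink(entryPath);
//   } catch (err) {
//     if (isBenignRetentionError(err)) continue; // expected transient race
//     logger.warn("retention prune failed", { entryPath, err });
//     throw err; // surface non-retryable failures instead of swallowing
//   }
```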
In `@test/data-retention.test.ts`:
- Around line 100-278: Add a test that verifies retry exhaustion by forcing
persistent EBUSY when pruning directory entries: mock fs.stat (or fs.rmdir if
that path is used) so calls for the stale file always throw a
NodeJS.ErrnoException with code "EBUSY", call enforceDataRetention and assert it
rejects with that error code (or throws) after the max retries exercised in
lib/data-retention.ts (see retry logic around lines 33-34), and ensure you
restore the spy in a finally block; reference enforceDataRetention and the
pruneDirectoryByAge/prune logic to locate where retries occur.
- Line 7: The test helper's RETRYABLE_REMOVE_CODES set is missing EACCES and
EAGAIN and should match the retry logic in lib/data-retention.ts; update the
RETRYABLE_REMOVE_CODES constant in test/data-retention.test.ts (the Set assigned
to RETRYABLE_REMOVE_CODES) to include "EACCES" and "EAGAIN" so the test cleanup
retries use the same error codes as the data-retention retry logic.
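the exhaustion behaviour the first test above should pin down can be sketched with an injected operation, so a persistent EBUSY is simulated without touching the filesystem; the attempt count, backoff base, and code list here mirror the review's description of `lib/data-retention.ts` but are assumptions, not the repo's code:

```typescript
// minimal sketch of retry exhaustion: the io operation is injected so a
// persistent EBUSY can be forced deterministically. names are illustrative.
async function withRetentionIoRetrySketch<T>(
  op: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = () => Promise.resolve(),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (error) {
      lastError = error;
      const code = (error as NodeJS.ErrnoException).code ?? "";
      if (!["EBUSY", "EPERM", "EACCES", "EAGAIN", "ENOTEMPTY"].includes(code)) {
        throw error; // non-retryable: fail fast
      }
      await sleep(10 * 2 ** attempt); // exponential backoff, 10ms base
    }
  }
  throw lastError; // retries exhausted: surface the original error
}
```

a vitest regression would stub `fs.stat` to always throw EBUSY and assert that `enforceDataRetention` rejects after the final attempt, just as the injected `op` does here.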
In `@test/license-policy-check.test.ts`:
- Around line 39-42: The test matrix in test/license-policy-check.test.ts only
covers string license values; add regression cases that exercise non-string
shapes parsed by scripts/license-policy-check.js (the { license: { type: "..." }
} shape and the licenses[] array shape) so the test for denylisted SPDX
plus-form also includes entries like { denyList: "GPL-2.0+", license: { type:
"GPL-2.0+" } } and { denyList: "LGPL-2.1+", license: [{ type: "MIT OR LGPL-2.1+"
}] } (and mirror similar additions for the other table at the 54-55 region) to
ensure the parsing branches in scripts/license-policy-check.js:27-54 are covered
and prevent policy-bypass regressions.
- Around line 9-26: Add a deterministic regression test for removeWithRetry that
forces the Windows retry branch by stubbing fs.rm to throw a retriable error
code (e.g., "EBUSY" / "EPERM" / "ENOTEMPTY") for the first few invocations then
succeed, using vitest (vi.spyOn or vi.stub) and vi.useFakeTimers to control
backoff timing; call removeWithRetry and assert it eventually returns and that
fs.rm was called multiple times (and restore the spy afterward). Reference the
function removeWithRetry and the constant RETRYABLE_REMOVE_CODES in your test so
it explicitly targets the retry logic, ensure the test does not rely on real
filesystem state or real timers, and make the assertions deterministic (e.g.,
advance timers to trigger retries and verify call count and final success).
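the non-string license shapes called out above can be exercised against a helper like this re-derivation; the real `extractRawLicense` lives in `scripts/license-policy-check.js` and may differ in detail:

```typescript
// re-derivation of the lockfile license-shape handling described above;
// illustrative only, not the script's exact code.
type LicenseEntry = { type?: string };
type PackageMeta = {
  license?: string | LicenseEntry;
  licenses?: string | LicenseEntry[];
};

export function extractRawLicense(meta: PackageMeta): string {
  if (typeof meta.license === "string") return meta.license;
  if (meta.license && typeof meta.license.type === "string") return meta.license.type;
  if (typeof meta.licenses === "string") return meta.licenses;
  if (Array.isArray(meta.licenses)) {
    // join array entries so SPDX expressions like "MIT OR LGPL-2.1+" survive
    return meta.licenses.map((entry) => entry.type ?? "").filter(Boolean).join(" OR ");
  }
  return "UNKNOWN";
}
```

test-matrix entries like `{ denyList: "GPL-2.0+", license: { type: "GPL-2.0+" } }` then exercise the object and array branches, not just the string branch.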
In `@test/unified-settings.test.ts`:
- Around line 6-25: Extract the duplicated removeWithRetry helper (and its
constant RETRYABLE_REMOVE_CODES) into a shared test utility module (e.g.,
test/helpers/fs-utils.ts), export removeWithRetry with the same signature async
function removeWithRetry(targetPath: string, options: { recursive?: boolean;
force?: boolean }): Promise<void>, and update callers (e.g., in
unified-settings.test.ts and file-lock.test.ts) to import it instead of defining
it inline; preserve behavior (retry loop, ENOENT handling, retryable codes,
backoff timing) and ensure proper NodeJS.ErrnoException typing is retained in
the shared module.
---
Outside diff comments:
In `@lib/background-jobs.ts`:
- Around line 105-129: The code currently rethrows lastError verbatim (throw
(lastError instanceof Error ? lastError : new Error(errorMessage))) which can
leak sensitive data; change the rethrow to throw a new Error containing the
already-sanitized errorMessage (or otherwise ensure lastError.message is
replaced with sanitizeErrorMessage(toErrorMessage(lastError))) so callers only
receive sanitized text; update the DeadLetter flow that uses
sanitizeErrorMessage/toErrorMessage and ensure appendDeadLetter/logWarn remain
unchanged; add a vitest regression in test/background-jobs.test.ts that triggers
the failing path and asserts the thrown error.message does not contain raw
token/email patterns but matches the sanitized output.
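the sanitize-before-rethrow flow described above can be sketched as follows; these regexes are illustrative stand-ins, not the repo's actual `sanitizeErrorMessage` implementation:

```typescript
// illustrative sketch of rethrowing only sanitized text; the redaction
// patterns here are assumptions, not the repo's real implementation.
function sanitizeErrorMessageSketch(message: string): string {
  return message
    .replace(/[^\s@]+@[^\s@]+\.[^\s@]+/g, "[REDACTED_EMAIL]")
    .replace(/[A-Za-z0-9._~+/=-]{20,}/g, "[REDACTED_TOKEN]");
}

export function toSanitizedError(lastError: unknown): Error {
  const raw = lastError instanceof Error ? lastError.message : String(lastError);
  // callers only ever see sanitized text, never the raw message
  return new Error(sanitizeErrorMessageSketch(raw));
}
```

the key point is that the rethrow path constructs a fresh `Error` from sanitized text instead of forwarding `lastError` verbatim.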
---
Duplicate comments:
In `@lib/unified-settings.ts`:
- Around line 36-68: The release helpers releaseUnifiedSettingsLockSync and
releaseUnifiedSettingsLockAsync currently swallow EBUSY/EPERM after logging and
can leave a stranded lock; modify both so that when (error as
NodeJS.ErrnoException)?.code is "EBUSY" or "EPERM" you log as before then
perform a fallback cleanup (attempt to forcibly remove the lock at
UNIFIED_SETTINGS_LOCK_PATH: use fs.unlinkSync in releaseUnifiedSettingsLockSync
and fs.promises.unlink in releaseUnifiedSettingsLockAsync, handling ENOENT
silently), and only return after the fallback cleanup completes or its error is
handled; also add a vitest regression that mocks release() to throw EBUSY and
EPERM and asserts a subsequent save does not block waiting for stale lock
expiry.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 83624405-4f0f-42b6-9952-8c14401abda2
📒 Files selected for processing (14)
index.ts, lib/background-jobs.ts, lib/codex-manager.ts, lib/data-retention.ts, lib/file-lock.ts, lib/unified-settings.ts, scripts/license-policy-check.js, test/background-jobs.test.ts, test/codex-manager-cli.test.ts, test/data-retention.test.ts, test/file-lock.test.ts, test/index-retry.test.ts, test/license-policy-check.test.ts, test/unified-settings.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/license-policy-check.test.ts, test/unified-settings.test.ts, test/data-retention.test.ts, test/index-retry.test.ts, test/file-lock.test.ts, test/background-jobs.test.ts, test/codex-manager-cli.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/background-jobs.ts, lib/codex-manager.ts, lib/data-retention.ts, lib/unified-settings.ts, lib/file-lock.ts
🧬 Code graph analysis (9)
test/license-policy-check.test.ts (1)
- scripts/license-policy-check.js (1): `denyList` (15-18)

lib/background-jobs.ts (1)
- lib/logger.ts (2): `error` (389-393), `logWarn` (341-346)

test/unified-settings.test.ts (1)
- lib/unified-settings.ts (4): `getUnifiedSettingsPath` (264-266), `saveUnifiedPluginConfig` (320-331), `loadUnifiedPluginConfigSync` (277-285), `saveUnifiedPluginConfigSync` (299-308)

lib/data-retention.ts (1)
- lib/utils.ts (1): `sleep` (65-67)

test/file-lock.test.ts (1)
- lib/file-lock.ts (1): `acquireFileLock` (114-181)

index.ts (2)
- lib/audit.ts (1): `auditLog` (123-153)
- lib/logger.ts (1): `logWarn` (341-346)

test/background-jobs.test.ts (1)
- lib/background-jobs.ts (2): `runBackgroundJobWithRetry` (84-130), `getBackgroundJobDlqPath` (80-82)

test/codex-manager-cli.test.ts (2)
- lib/codex-manager.ts (1): `runCodexMultiAuthCli` (4571-4652)
- scripts/codex.js (3): `runCodexMultiAuthCli` (501-501), `exitCode` (505-505), `exitCode` (532-532)

lib/unified-settings.ts (3)
- lib/logger.ts (2): `error` (389-393), `logWarn` (341-346)
- lib/utils.ts (1): `sleep` (65-67)
- lib/file-lock.ts (2): `acquireFileLockSync` (222-294), `acquireFileLock` (114-181)
🔇 Additional comments (40)
lib/data-retention.ts (4)
21-21: ENOTEMPTY now included in retryable codes — good. past review flagged this gap; `lib/data-retention.ts:21` now includes `ENOTEMPTY` alongside the other transient codes. this aligns with `test/data-retention.test.ts:7` and handles windows concurrent-writer races on `fs.rmdir`.

28-40: retry logic looks solid. `withRetentionIoRetry` at `lib/data-retention.ts:28-40` uses exponential backoff (10ms base, doubling) with 5 attempts. the unreachable guard at line 39 is good defensive coding. no concerns here.

67-91: fs operations properly wrapped with retry — lgtm. `lib/data-retention.ts:67-91` wraps all relevant calls (`readdir`, `stat`, `rmdir`, `unlink`) with `withRetentionIoRetry`. this handles EBUSY/EPERM/EACCES/EAGAIN/ENOTEMPTY scenarios on windows and under concurrent access. corresponding tests in `test/data-retention.test.ts:101-278` exercise these paths.

100-113: single-file pruning retry path is covered. `lib/data-retention.ts:102` and `lib/data-retention.ts:106` wrap `stat` and `unlink` with retry. the ENOENT fallback at line 110 is correct. test coverage exists at `test/data-retention.test.ts:202-237` for transient unlink failures.

test/data-retention.test.ts (4)
101-156: ebusy retry test covers stat and rmdir paths — good. `test/data-retention.test.ts:101-156` injects one-time EBUSY on both `fs.stat` and `fs.rmdir`, asserts retry flags, and confirms file/dir removal. restoring the spies in a finally block keeps it deterministic. this addresses the windows edge case for directory entry pruning.

158-200: parameterized test for EPERM/EACCES/EAGAIN addresses past feedback. `test/data-retention.test.ts:158-200` exercises all three codes on the `fs.stat` path during directory entry pruning. good use of `it.each` for coverage without duplication.

202-237: single-file unlink retry path now covered. `test/data-retention.test.ts:202-237` injects transient EACCES on `fs.unlink` for the flagged-accounts file and asserts retry + removal. addresses past review comment about missing single-file retry coverage.

239-278: enotempty race test is correct. `test/data-retention.test.ts:239-278` simulates a concurrent writer recreating content during rmdir. retry handles it and eventually succeeds. good regression case for windows concurrent-writer behavior.

scripts/license-policy-check.js (2)
7-13: spdx normalization and token-set denylist matching look solid. the updates in `scripts/license-policy-check.js:7-13` and `scripts/license-policy-check.js:76-79` correctly cover `+`, `-only`, and `-or-later` forms while avoiding substring false positives. no concurrency risk introduced here.

Also applies to: 17-18, 56-62, 76-79
27-54: license source extraction is a good hardening step.
`extractRawLicense` in `scripts/license-policy-check.js:27-54` now handles multiple lockfile shapes (`license`, `licenses` string, and `licenses[]`) and keeps unknown handling behavior clear.

Also applies to: 70-71
test/license-policy-check.test.ts (1)
70-74: assertion contract is clean and deterministic. the checks in `test/license-policy-check.test.ts:70-74` validate both process failure and emitted violation text, which is exactly what this policy gate should lock down.

lib/background-jobs.ts (1)
53-56: good hardening on default retry and attempts accounting.
`lib/background-jobs.ts` line 53-56 and line 91-127 correctly wire default 429 retryability and persist actual attempts made into dlq/log metadata.

Also applies to: 91-127
test/codex-manager-cli.test.ts (1)
2103-2166: good deterministic coverage for input guards and redaction edge cases.
`test/codex-manager-cli.test.ts` line 2103-2166 and line 2215-2253 add clean vitest regressions for malformed cursor/page-size inputs, option-smuggling, and `+`/`/` token redaction paths.

Also applies to: 2215-2253
lib/unified-settings.ts (1)
300-307: good lock scope expansion for full read-modify-write sections.
`lib/unified-settings.ts` line 300-307, line 322-329, and line 370-377 now hold the file lock across the whole read-modify-write path, which closes the lost-update race.

Also applies to: 322-329, 370-377
lib/file-lock.ts (8)
26-35: looks good: pid liveness check correctly treats EPERM as "alive". the `isPidAlive` helper at `lib/file-lock.ts:26-35` properly returns `true` when `process.kill(pid, 0)` throws `EPERM`, which means the process exists but the current user lacks permission to signal it. this prevents incorrect eviction of locks held by other users on shared filesystems.

68-75: stale-lock pid liveness check now prevents evicting live owners. the pid liveness check at `lib/file-lock.ts:68-75` addresses the prior critical finding about long-running writers losing their locks. if the lock file age exceeds `staleAfterMs` but the owning pid is still alive, eviction is skipped. this is the correct behavior for windows and cross-platform safety.

100-112: cleanup helper correctly preserves original acquisition error. `cleanupIncompleteLockFile` at `lib/file-lock.ts:100-112` now passes `originalError` through to `throwOriginalOrCleanupError` when cleanup fails with non-ENOENT, addressing the prior major finding about error masking. the error precedence logic is sound.

141-145: write/close failure cleanup is now best-effort and rethrows original error. at `lib/file-lock.ts:141-145`, if `writeError` or `closeError` occur, the code calls `cleanupIncompleteLockFile(path, originalFailure)` then rethrows via `throwOriginalOrCleanupError`. this addresses the prior critical finding about orphan lock files after write/close failures.

152-161: release marks `released = true` after successful unlink, treats ENOENT as success. at `lib/file-lock.ts:152-161`, setting `released = true` after unlink succeeds (or on ENOENT) allows safe idempotent calls. the placement post-unlink is correct per the pr objectives about retry-on-transient-EBUSY behavior.

189-196: sync pid liveness check mirrors async behavior. `removeIfStaleSync` at `lib/file-lock.ts:189-196` uses the same `parseLockPid` and `isPidAlive` logic as the async version, maintaining consistency.

208-220: sync cleanup helper mirrors async error preservation. `cleanupIncompleteLockFileSync` at `lib/file-lock.ts:208-220` correctly mirrors the async version's error preservation logic.

250-254: sync write/close failure cleanup follows same pattern as async. at `lib/file-lock.ts:250-254`, the sync path mirrors the async cleanup-and-rethrow pattern.

test/index-retry.test.ts (1)
217-261: solid regression test for audit url sanitization fallback path. at `test/index-retry.test.ts:217-261`, this test exercises the audit URL parsing fallback by passing a relative URL with query params and fragment. the assertions at lines 252-255 verify no sensitive data leaks through. the try/finally cleanup with `vi.doUnmock` and `vi.resetModules` is correct for module-level mocking isolation.

test/unified-settings.test.ts (3)
249-302: lock-under-read test correctly validates read-modify-write scope. at `test/unified-settings.test.ts:249-302`, the test verifies that `saveUnifiedPluginConfig` holds the lock while reading existing settings via the `lockState.held` flag and `fs.readFile` spy. this aligns with the pr objective about expanding lock scope to cover full read-modify-write.

304-333: EPERM release regression for async writes looks good. at `test/unified-settings.test.ts:304-333`, the test mocks lock.release() to throw EPERM and verifies that `saveUnifiedPluginConfig` still succeeds. this covers the windows-style transient unlink contention path per pr objectives.

335-364: EBUSY release regression for sync writes looks good. at `test/unified-settings.test.ts:335-364`, the test mocks sync lock.release() to throw EBUSY and verifies that `saveUnifiedPluginConfigSync` succeeds. this complements the async EPERM test.

lib/codex-manager.ts (7)
140-144: option-smuggling guard is a good hardening addition. `looksLikeOptionToken` at `lib/codex-manager.ts:140-144` prevents values like `--foo` from being accepted as option arguments. this mitigates option injection via malicious input.

262-270: token redaction now covers JWT/URL-safe characters. at `lib/codex-manager.ts:262-270`, the character class `[A-Z0-9._~+/=-]` in the token and bearer regexes now includes `+`, `/`, and `=` per the prior review feedback. this prevents partial redaction of base64-encoded tokens.

272-285: sanitizeCliErrorMessage provides defense-in-depth for CLI error output. at `lib/codex-manager.ts:272-285`, the function chains `redactForExternalOutput` and `redactFreeformSecretText` to sanitize error messages before logging. this addresses the pr objective about safe CLI error output.

1553-1562: cursor validation tightened to reject non-numeric payloads. at `lib/codex-manager.ts:1556`, the `/^\d+$/` test on the decoded base64 value ensures only strictly numeric cursors are accepted. this prevents potential injection via malformed cursor values.

1998-2013: --page-size input validation now rejects non-numeric values. at `lib/codex-manager.ts:2003-2006`, the `/^\d+$/` regex validates that --page-size values contain only digits before parsing. this prevents parseInt coercion issues like `"10abc"` → `10`.

2221-2237: idempotency-key validation prevents option smuggling. at `lib/codex-manager.ts:2223` and `2232`, `looksLikeOptionToken(value)` rejects idempotency-key values that look like CLI options. combined with the empty-value check, this hardens against malformed input.

4125-4127: rotate-secrets error path uses sanitizeCliErrorMessage. at `lib/codex-manager.ts:4126`, error messages are sanitized before logging and auditing. this prevents token/email leakage in error output per the pr objectives.

test/file-lock.test.ts (7)
88-99: runtime transpilation enables proper module isolation for workers. at `test/file-lock.test.ts:88-99`, transpiling `lib/file-lock.ts` at runtime and passing the module URL to workers ensures workers use the actual lock implementation rather than a stub. this is a solid approach for cross-process lock testing.

101-101: increased iterations improves contention coverage. at `test/file-lock.test.ts:101`, `iterationsPerWorker = 50` (up from prior 10) increases the probability of detecting race conditions. combined with 6 workers, this yields 300 total lock acquisitions.

106-121: worker script now uses actual acquireFileLock from transpiled module. at `test/file-lock.test.ts:106-121`, the worker script imports `acquireFileLock` dynamically from the passed module URL and uses it with proper options. the removal of the internal retry loop and use of the real lock implementation is correct.

167-200: EBUSY and EPERM release retry regression tests address prior feedback. at `test/file-lock.test.ts:167-200`, the parameterized `it.each(["EBUSY", "EPERM"])` test covers both transient unlink failure codes as requested in the prior review. the test verifies that after the first release throws, a retry succeeds and the lock file is cleaned up.

202-217: pid liveness prevents stale eviction of live lock owners. at `test/file-lock.test.ts:202-217`, this test creates a lock file with `process.pid` (guaranteed alive) and a stale timestamp, then verifies acquisition fails with EEXIST rather than evicting the live owner's lock. this validates the pid liveness check in `removeIfStale`.

219-255: cleanup on metadata write failure regression test looks good. at `test/file-lock.test.ts:219-255`, the test mocks `fs.open` to return a handle whose `writeFile` throws EIO, then verifies the lock file is cleaned up (ENOENT). this validates the cleanup path in `acquireFileLock` when write fails.

257-302: error precedence test verifies original error is preserved. at `test/file-lock.test.ts:257-302`, the test mocks both writeFile (throws EIO) and unlink (throws EPERM for cleanup), then verifies the original "metadata write failed" error is thrown rather than the cleanup error. this validates the `throwOriginalOrCleanupError` logic.
- dedupe safe audit resource sanitization in plugin fetch flow - surface retention retry exhaustion and cover persistent EBUSY regression - share test removeWithRetry helper and add license policy parsing regressions Co-authored-by: Codex <noreply@openai.com>
Conflict resolution: applied the stacked hardening follow-up on top of the merged PR #32 runtime and test surfaces while preserving the earlier DX workflow choices. Co-authored-by: Codex <noreply@openai.com>
Merged via integration PR to safely combine multiple approved changes. |
Summary
Validation
Thread Resolution Update (2026-03-05)
Addressed all open review threads in this stacked PR with focused reliability/security fixes:
- `index.ts` to strip query/fragment on URL parse failure.
- (`lib/background-jobs.ts`): full token redaction charset and jittered backoff.
- (`lib/data-retention.ts`) to avoid recursive retry amplification and added `ENOTEMPTY` retryability.
- (`EPERM`/`EBUSY`) release failures non-fatal post-write in `lib/unified-settings.ts`.
- (`lib/file-lock.ts`): PID liveness check and original error preservation.
- `lib/codex-manager.ts`.
- `scripts/license-policy-check.js` and added regression coverage.

Added/updated regressions:
test/background-jobs.test.ts, test/codex-manager-cli.test.ts, test/data-retention.test.ts, test/file-lock.test.ts, test/index-retry.test.ts, test/unified-settings.test.ts, test/license-policy-check.test.ts (new)

Validation:
npm run typecheck
npm run lint
npx vitest run test/background-jobs.test.ts test/codex-manager-cli.test.ts test/data-retention.test.ts test/file-lock.test.ts test/index-retry.test.ts test/unified-settings.test.ts test/license-policy-check.test.ts

Thread Resolution Update (2026-03-05, final)
What changed
How to test
npx vitest run test/data-retention.test.ts test/license-policy-check.test.ts test/unified-settings.test.ts test/file-lock.test.ts
npm run lint
npm run typecheck
Risk / rollout notes
note: greptile review for oc-chatgpt-multi-auth. cite files like
`lib/foo.ts:123`. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary
this pr lands a focused reliability and token-safety hardening pass across the storage, lock, retention, and cli layers. the core changes are: deterministic lock ordering in `lib/storage.ts` (file-queue mutex → file lock → storage mutex), per-leaf-op retry in `lib/data-retention.ts` to avoid subtree restarts, pid-liveness gating before stale lock eviction in `lib/file-lock.ts`, read-modify-write atomicity for `lib/unified-settings.ts`, `toSafeAuditResource` stripping query/fragment from audit log urls in `index.ts`, and expanded redaction coverage in both `lib/background-jobs.ts` and `lib/codex-manager.ts`. ci is also hardened with a dependency gate on the test job before coverage runs.

key changes
- `lib/storage.ts`: introduces `withStorageSerializedFileLock` with explicit documented acquisition order; adds `STORAGE_DECRYPT_ERROR_CODE` fast-fail path; `rotateStoredSecretEncryption` is now transactional
- `lib/file-lock.ts`: pid liveness check before stale eviction; atomic `released = true` placement; cleanup on partial write failures
- `lib/unified-settings.ts`: lock scope now covers full read-modify-write cycle (not just the write); `EPERM`/`EBUSY` release failures are non-fatal post-write
- `lib/data-retention.ts`: retry is scoped to individual leaf i/o ops; misleading `"Skipping..."` log message logged immediately before a `throw` that actually aborts the prune cycle — needs correction
- `lib/background-jobs.ts`: jittered backoff, 429 retryability, accurate attempt counter; redaction regex has an optional separator that can over-redact plain status phrases like `"token expired"`
- `test/background-jobs.test.ts`: inlines a `removeWithRetry` copy with fewer retryable codes than the shared helper added in the same pr — should import from `test/helpers/remove-with-retry.ts` instead

Confidence Score: 3/5
`lib/data-retention.ts` that fires immediately before a `throw` — operators will read that log and expect a skip, not a crash of the full prune cycle, which is a real ops incident risk on windows with av-locked files. the inlined `removeWithRetry` in `test/background-jobs.test.ts` with an incomplete retryable-codes set is a minor but tangible risk on windows ci runners. the background-jobs redaction regex is over-aggressive but not a security risk. no new token leakage vectors introduced.

`lib/data-retention.ts` (misleading log + abort behaviour) and `test/background-jobs.test.ts` (incomplete inline helper)
- `lib/data-retention.ts`: `withRetentionIoRetry` to handle transient windows lock errors; however `throw error` after logging "Skipping" contradicts the stated best-effort skip intent and will abort the entire prune cycle on persistent lock failures
- `lib/file-lock.ts`: `released` flag moved post-unlink to prevent premature reuse, and stale eviction now checks pid liveness before unlinking — solid concurrency improvements for windows
- `lib/storage.ts`: `withStorageSerializedFileLock` with explicit acquisition order (queue mutex → file lock → storage mutex) to prevent deadlocks; adds `STORAGE_DECRYPT_ERROR_CODE` for fast-fail on decryption errors; `rotateStoredSecretEncryption` now correctly uses a transaction
- `lib/unified-settings.ts`: locking removed from `writeSettingsRecordSync/Async` and moved to callers, so read-modify-write is now atomic under the file lock; `EPERM`/`EBUSY` release failures are non-fatal post-write; `SETTINGS_LOCK_OPTIONS` constant avoids copy-paste
- `lib/background-jobs.ts`: jittered backoff, `attemptsMade` counter, and DLQ error sanitisation; redaction regex has an optional separator that can over-redact plain status phrases, reducing DLQ diagnostic value
- `lib/codex-manager.ts`: `looksLikeOptionToken`, base64 cursor validation, page-size digit-only guard, and `sanitizeCliErrorMessage` for rotate-secrets audit; changes look correct and well-tested
- `index.ts`: adds `toSafeAuditResource` helper and applies it to all three audit-log call sites; fallback strips query/fragment on URL parse failure; error audit now records `errorType` (class name) instead of the full message to avoid token leakage
- `scripts/license-policy-check.js`: (`normalizeSpdxToken`) and structured `extractLicenseTokens` splitting; switches from substring `includes` to set membership check, eliminating false positives from substring matches like `MIT` matching `MIT-CMU`
- `test/background-jobs.test.ts`: inlines a copy of `removeWithRetry` with an incomplete retryable-codes set instead of importing the shared helper added in the same PR
- ci workflow: coverage job now `needs: test`, uploads/downloads a dist artifact, and uses the correct `test:coverage` script; eliminates the previous race where coverage could run before a successful build

Sequence Diagram
```mermaid
sequenceDiagram
    participant C as Caller
    participant M as accountFileMutex
    participant F as acquireFileLock
    participant S as storageMutex
    participant D as Disk
    C->>M: withAccountFileMutex()
    activate M
    M->>F: withAccountFileLock(path)
    activate F
    F->>D: open(.lock, wx)
    D-->>F: fd
    F->>D: writeFile(pid+timestamp)
    F->>D: close(fd)
    F->>S: withStorageLock(fn)
    activate S
    S->>D: loadAccountsInternal()
    D-->>S: accounts
    S->>D: saveAccountsUnlocked()
    D-->>S: ok
    S-->>F: result
    deactivate S
    F->>D: unlink(.lock)
    F-->>M: result
    deactivate F
    M-->>C: result
    deactivate M
```

Comments Outside Diff (2)
General comment
orphaned lock file on write failure — windows filesystem deadlock risk
when `handle.writeFile` throws (common on windows due to av-scanner holding the fd briefly — `EPERM`), the file was already created by `fs.open("wx")`. the error path closes the handle and throws, but never unlinks the file. subsequent `acquireFileLock` calls hit `EEXIST` and fall into the stale-detection path, which tries to `JSON.parse` the lock file. an empty or partially-written file causes the parse to throw, and the stale cleanup skips the unlink — leaving the lock stuck for the full `staleAfterMs` (120 s).

same problem exists in `acquireFileLockSync` at the equivalent block.

no vitest coverage for this branch — add a test that injects `EPERM` on `handle.writeFile` and asserts the lock file is removed before the next attempt.

Prompt To Fix With AI
General comment
write happens outside the file lock in the refactored `writeSettingsRecordSync`

the old code held the file lock for the entire temp-write → rename sequence. the refactored `writeSettingsRecordSync` writes the temp file and renames before any lock is acquired — the lock is now only acquired in the callers (`saveUnifiedPluginConfigSync`, etc.). since `writeSettingsRecordSync` is called inside the locked block in the callers, the write is actually protected today.

however, the rename loop in `writeSettingsRecordSync` happens while the lock is held, which is correct. but the temp path for the sync variant is `${path}.${process.pid}.${Date.now()}.tmp` — no random suffix. two same-process sync writes occurring at the same millisecond timestamp would overwrite each other's temp file. the file lock serialises same-process callers, but a cross-process scenario (multiple terminals starting at the same ms) could clobber the temp file before rename. the async variant already uses a random suffix — apply the same pattern here for consistency and safety.

Prompt To Fix With AI
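the collision-resistant temp naming suggested above could look like this; the helper name is hypothetical and the async variant's exact format in the repo may differ:

```typescript
// hypothetical helper showing the suggested collision-resistant temp path.
import { randomBytes } from "node:crypto";

export function makeTempPath(settingsPath: string): string {
  // pid + timestamp alone can collide across processes in the same ms;
  // a random suffix makes each writer's temp file unique before rename.
  const suffix = randomBytes(4).toString("hex");
  return `${settingsPath}.${process.pid}.${Date.now()}.${suffix}.tmp`;
}
```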
General comment
`writeSettingsRecordSync/Async` now have implicit caller contract for lock safety

the old design held the file lock inside `writeSettingsRecordSync` and `writeSettingsRecordAsync`, so any call was automatically safe. the refactor moved the lock up to `saveUnifiedPluginConfigSync`, `saveUnifiedPluginConfig`, and `saveUnifiedDashboardSettings`, which correctly protects the full read-modify-write cycle — but turns the two write helpers into functions that silently require the caller to hold `UNIFIED_SETTINGS_LOCK_PATH`.

on windows, a future internal caller that doesn't acquire the lock first could race concurrent renames on `settings.json` without any guard.

consider adding a `// REQUIRES: caller holds UNIFIED_SETTINGS_LOCK_PATH file lock` doc comment on both private write helpers (lines 131 and 182) to make the contract explicit, or keeping the internal lock AND the outer lock (nested acquires of the same file-lock path are already guarded by the `staleAfterMs` check).

Prompt To Fix With AI
`lib/data-retention.ts`, line 486-491 (link)

misleading log message before abort

the logger emits `"Skipping retention entry after retry exhaustion"` but then immediately `throw error` — which aborts the entire prune cycle, not just that one entry. the test at `test/data-retention.test.ts` confirms the throw is intentional, but the log message contradicts what actually happens, making ops triage confusing. on windows, an antivirus-locked file hitting this path will bubble up through every caller and kill the full retention run while the audit log says "Skipping".

fix the message to match the behaviour, e.g. something like `"Retention entry retry exhausted; aborting prune cycle"`.
Last reviewed commit: 65112ff
Context used:
dashboard - What: Every code change must explain how it defends against Windows filesystem concurrency bugs and ... (source)