Skip to content

Enterprise readiness remediation baseline#43

Open
ndycode wants to merge 9 commits intomainfrom
audit/enterprise-readiness-cert-2026-03-04
Open

Enterprise readiness remediation baseline#43
ndycode wants to merge 9 commits intomainfrom
audit/enterprise-readiness-cert-2026-03-04

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 4, 2026

Summary

  • implement keychain-backed token-at-rest storage with V4 account schema and migration compatibility
  • harden local sensitive file permissions for settings/config/audit artifacts
  • add CI secret scanning (gitleaks) and release provenance workflow
  • add operational health, retention cleanup, and performance budget checks
  • add incident response and release runbooks with updated security/privacy/config docs
  • extend test coverage for keychain mode, V4 schema, and secure config writes

Validation

  • npm run typecheck
  • npm run lint
  • npm run build
  • npm test
  • npm run coverage
  • npm run ops:health-check
  • npm run perf:budget-check
  • npm run ops:retention-cleanup -- --days=90

Post-review updates (2026-03-05)

What changed

  • Fixed markdown table formatting in docs/configuration.md for CODEX_SECRET_STORAGE_MODE so rendered tables no longer split on pipe characters.
  • Added test/audit-retry.test.ts to validate retry behavior for transient EBUSY contention during stale audit log purge.
  • Extended test/enterprise-health-check.test.ts with fallback-audit regression coverage to ensure stale audit checks target fallback multi-auth/logs locations and remain Windows path-safe.
  • Documented in scripts/enterprise-health-check.js that audit directory derivation intentionally mirrors runtime log-dir semantics from resolved multi-auth root.

How to test

  • npm run test -- test/audit-retry.test.ts test/audit.test.ts test/schemas.test.ts test/token-store.test.ts test/enterprise-health-check.test.ts
  • npm run typecheck
  • npm run lint:ts
  • npm run lint:scripts

Risk / rollout notes

  • Risk level: low.
  • Scope is constrained to docs + regression tests + inline script clarification comment; no user-facing behavior change to auth/storage runtime.
  • Rollout is standard merge with normal CI gates.

PR Resolution Update (2026-03-05)

What Changed

  • Removed blocking Atomics.wait() from audit purge retry handling in lib/audit.ts to prevent event-loop freezes during transient filesystem contention.
  • Kept retry behavior for transient audit filesystem errors (EBUSY/EPERM/EAGAIN) while eliminating synchronous sleep delays.
  • Added regression coverage in est/audit-retry.test.ts asserting Atomics.wait is not called during purge retry flows.

How To Test

pm run typecheckn- pm run lintn-
pm run buildn- pm test -- test/audit-retry.test.ts test/audit.test.tsn-
pm test`n

Risk / Rollout Notes

  • No functional API behavior changes; this is a reliability/performance hardening fix for audit internals under transient filesystem lock pressure.

PR Resolution Update (2026-03-05, final pass)

What changed

  • Addressed all remaining actionable review threads across CI workflows, docs, runtime hardening, and script reliability.
  • Hardened operational scripts:
    • scripts/audit-log-forwarder.js: rotation-safe checkpoint resume, timeout+retry backoff, checkpoint locking and atomic writes.
    • scripts/retention-cleanup.js: directory churn guards and non-zero exit on partial failures.
    • scripts/compliance-evidence-bundle.js and scripts/generate-sbom.js: explicit large maxBuffer support.
    • scripts/slo-budget-report.js: stale finding aggregation now includes both stale-wal and stale-audit-log.
  • Hardened core runtime paths:
    • lib/audit.ts: purge throttle timing corrected and secure/retry-safe append behavior.
    • lib/codex-manager.ts: sanitized audit error metadata.
    • lib/secrets/token-store.ts: independent refresh/access secret delete attempts.
    • lib/storage.ts: v4 hydration family-index remap when accounts are skipped.
  • Updated docs/workflows:
    • Windows+Linux CI matrix for enterprise checks.
    • Recovery drill workflow uses packaged ops:recovery-drill command.
    • Retention workflow runner context fix.
    • Secret scan workflow and regression tests aligned to gitleaks v8.25.0.
    • Upgrade/incident docs updated with migration and drill instructions.

Added/updated regression tests

  • test/audit-log-forwarder.test.ts
  • test/retention-cleanup.test.ts
  • test/compliance-evidence-bundle.test.ts
  • test/generate-sbom.test.ts
  • test/slo-budget-report.test.ts
  • plus targeted updates in:
    • test/audit.test.ts
    • test/token-store.test.ts
    • test/storage-v4-keychain.test.ts
    • test/unified-settings.test.ts
    • test/enterprise-health-check.test.ts
    • test/schemas.test.ts
    • test/security/secret-scan-regression.test.sh

How to test

  • npm run lint
  • npm run typecheck
  • npm test

Risk / rollout notes

  • Risk level: low-to-medium (broad coverage touches CI/workflows/scripts and test harnesses).
  • No intended external API behavior changes; changes are operational hardening and correctness fixes aligned with review intent.
  • Rollout: standard merge with CI gating.

Thread Resolution Update (2026-03-05, final unblock)

What changed

  • Recovery drill workflow now uploads artifacts with if: always() to preserve diagnostics on failed runs.
  • Incident and upgrade docs updated for markdownlint-safe ordered lists/fenced blocks and explicit PowerShell drill commands.
  • Removed blocking Atomics.wait from lib/audit.ts retry path and updated regression assertions accordingly.
  • Hardened script race handling and permissions:
    • scripts/audit-log-forwarder.js: readdir/stat churn guards, safe final mtime lookup, checkpoint tmp cleanup inside lock.
    • scripts/slo-budget-report.js: log-dir churn guards (ENOENT/ENOTDIR) for directory and file reads.
    • scripts/secret-scan-regression.sh: Windows path separator normalization fixed.
    • scripts/seed-health-fixture.js: explicit 0o700/0o600 modes with best-effort chmod hardening.
  • Added/updated regressions:
    • test/audit-log-forwarder.test.ts
    • test/slo-budget-report.test.ts
    • test/codex-manager-cli.test.ts
    • test/audit-retry.test.ts
    • test/audit.test.ts
    • test/security/secret-scan-regression.test.ts

How to test

  • npx vitest run test/audit-retry.test.ts test/audit.test.ts test/audit-log-forwarder.test.ts test/slo-budget-report.test.ts test/security/secret-scan-regression.test.ts test/codex-manager-cli.test.ts
  • npm run lint
  • npm run typecheck

Risk / rollout notes

  • Risk level: low.
  • Changes are scoped to review-requested reliability/security hardening and deterministic test coverage; no intended external API behavior change.

Thread Resolution Update (2026-03-05, follow-up pass)

What changed

  • Added imeout-minutes: 30 to the monthly
    ecovery-drill job and added an if: failure() notifier step with optional webhook delivery (secrets.RECOVERY_DRILL_WEBHOOK_URL) plus run/artifact context.
  • Clarified docs/upgrade.md runbook notes that
    pm run ops:keychain-assert and
    pm run ops:health-check must be run from repository root (or CI/job wrapper).
  • Updated scripts/secret-scan-regression.sh fail-case allowlisted fixture to a secret-like OPENAI_API_KEY=sk-... token so exclusion logic is exercised against matching detector input.
  • Switched est/audit-log-forwarder.test.ts to shared
    emoveWithRetry helper import from est/helpers/remove-with-retry.ts.
  • Split Windows allowlist regex checks into dedicated test in est/security/secret-scan-regression.test.ts and updated fail-case fixture content to secret-like value.

How to test

px vitest run test/audit-log-forwarder.test.ts test/security/secret-scan-regression.test.ts

pm run lint

pm run typecheck

Risk / rollout notes

  • Workflow notifier is non-breaking: it warns when webhook secret is missing and only posts outbound notification if explicitly configured.
  • Secret scan regression now validates true detector-positive allowlisted content, reducing false confidence in future rule changes.

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this pr lands keychain-backed token-at-rest storage (v4 schema), hardens local file permissions across audit/settings/config, adds ci secret scanning (gitleaks), and ships operational runbooks + health/retention/slo scripts. it's a broad enterprise readiness baseline touching 63 files.

key changes and findings:

  • keychain integration (lib/secrets/token-store.ts, lib/storage.ts): v4 schema stores only opaque refs on disk; tokens live in the os keychain via keytar. save/load/clear are all wrapped in withStorageLock. rollback on partial write failure is implemented and tested. solid.
  • audit purge retry (lib/audit.ts): withRetryableAuditFsOperation spins 5 times with zero delay between attempts. since it's fully synchronous, all retries fire within microseconds — real windows AV-held EBUSY locks won't release in that window. the test validates retry count but not timing, so it passes without exercising the actual contention scenario.
  • audit-log forwarder (scripts/audit-log-forwarder.js): the collectBatch loop reads each file with no ENOENT guard. if retention cleanup deletes a file between discoverAuditFiles and the readFile, the forwarder crashes. all other file reads in this script are guarded.
  • audit actor attribution (lib/codex-manager.ts): runWithAudit hardcodes "cli-user" as the actor for every command. enterprise audit trails lose per-account attribution.
  • file permission hardening: 0o700 dirs and 0o600 files with best-effort chmodSync after rename on non-windows; correct and consistent across lib/config.ts, lib/unified-settings.ts, and lib/audit.ts.
  • ci workflows: gitleaks action sha-pinned, release provenance with npm --provenance, retention drill, monthly recovery drill all look correct. pull-requests: write on secret scan is intentional for gitleaks pr annotations.
  • test coverage: test/storage-v4-keychain.test.ts and test/token-store.test.ts cover round-trips, concurrent writes, EPERM retry, rollback, and activeIndex adjustment well.

Confidence Score: 3/5

  • safe to merge with low-risk fixes needed — no data-loss path, but the sync retry loop and forwarder crash risk are real reliability gaps on windows
  • core keychain storage logic is sound and well-tested. the main concerns are: (1) the sync retry in lib/audit.ts provides no real protection against windows AV locks since all retries fire in microseconds, (2) the audit-log forwarder can crash on a file-disappearance race that the rest of the script guards against. neither is a data-loss or security bug, but both are reliability issues specifically in the windows desktop scenarios this pr targets.
  • lib/audit.ts (sync retry effectiveness), scripts/audit-log-forwarder.js (unguarded readFile in collectBatch), lib/storage.ts (hydrateV4Storage error path)

Important Files Changed

Filename Overview
lib/audit.ts adds retention purge, retry wrapper, secure file modes, and COMMAND_RUN action. retry loop is synchronous with no inter-attempt delay, making it ineffective against real windows AV/EBUSY locks despite the concurrency hardening intent.
lib/secrets/token-store.ts new keychain-backed token store with keytar interop, rollback on partial write failure, and retry on delete. logic is sound; coverage for concurrent write and rollback paths is present in test/token-store.test.ts.
lib/storage.ts v4 keychain storage with serialization, hydration, and cleanup. hydrateV4Storage silently swallows unexpected keytar errors, treating them identically to missing secrets. clearAccounts is correctly locked via withStorageLock.
lib/codex-manager.ts adds runWithAudit wrapper and sanitizeAuditError redaction for all CLI commands. actor is hardcoded as "cli-user" which loses per-account attribution needed for enterprise audit trails.
scripts/audit-log-forwarder.js new SIEM forwarder with checkpoint locking, stale-lock detection, retry/backoff for send, and rotation-safe resume. readFile inside collectBatch has no ENOENT guard — will crash if a file is deleted between discovery and read.
test/audit-retry.test.ts validates retry loop and Atomics.wait absence, but mock resolves synchronously so it does not reproduce real windows AV timing. the test passes regardless of delay presence.
test/storage-v4-keychain.test.ts comprehensive v4 keychain round-trip, concurrent-save, EPERM rename retry, rollback-on-failure, activeIndex adjustment, and WAL cleanup tests. solid coverage.
test/token-store.test.ts covers plaintext mode, CJS interop, concurrent writes, rollback on access-write failure, independent delete attempts, stable ref derivation. thorough.

Sequence Diagram

sequenceDiagram
    participant CLI as codex-manager CLI
    participant SA as saveAccounts (locked)
    participant SSFS as serializeStorageForPersist
    participant KT as keytar (keychain)
    participant FS as filesystem
    participant LA as loadAccounts
    participant HV4 as hydrateV4Storage

    CLI->>SA: saveAccounts(v3Storage)
    SA->>SA: withStorageLock acquired
    SA->>SSFS: serializeStorageForPersist(v3)
    SSFS->>KT: setPassword(service, ref:refresh, token)
    KT-->>SSFS: ok
    SSFS->>KT: setPassword(service, ref:access, token)
    alt access write fails
        KT-->>SSFS: error
        SSFS->>KT: deletePassword(service, ref:refresh)
        SSFS-->>SA: throw error
        SA->>KT: deleteAccountSecrets(persistedRefs, force=true)
    else all writes ok
        KT-->>SSFS: ok
        SSFS-->>SA: {content: v4JSON, persistedSecretRefs}
    end
    SA->>FS: writeFile(tempPath, v4JSON, mode=0o600)
    SA->>FS: rename(tempPath, storagePath)
    alt rename succeeds
        SA->>SA: persistedSecretRefs = [] (no rollback)
        SA->>FS: unlink(walPath)
    else rename fails (EPERM on Windows, retried)
        SA->>FS: retry rename up to 5x
        SA->>KT: deleteAccountSecrets(persistedSecretRefs) rollback
    end
    SA->>SA: withStorageLock released

    CLI->>LA: loadAccounts()
    LA->>FS: readFile(storagePath)
    FS-->>LA: v4 JSON
    LA->>HV4: hydrateV4Storage(v4data)
    loop for each account
        HV4->>KT: getPassword(service, ref:refresh)
        KT-->>HV4: refreshToken (or null → skip)
        HV4->>KT: getPassword(service, ref:access)
        KT-->>HV4: accessToken (or null)
    end
    HV4->>HV4: adjustActiveIndex (skip-aware)
    HV4-->>LA: AccountStorageV3 (hydrated)
    LA-->>CLI: accounts
Loading

Comments Outside Diff (3)

  1. lib/secrets/token-store.ts, line 40-44 (link)

    default storage mode silently forces keychain, breaking installs without keytar

    parseSecretStorageMode returns "keychain" for any unrecognized value — including an empty string, which is what you get when CODEX_SECRET_STORAGE_MODE is unset ((undefined ?? "").trim().toLowerCase()""). that means every user who hasn't explicitly configured this env var gets hard-bound to keychain. then getEffectiveSecretStorageMode short-circuits on "keychain" (line 102) without checking availability, so getKeytarOrThrow() throws on any saveAccounts call if keytar isn't installed.

    the safe default is "auto": try keychain, fall back to plaintext. vitest.config.ts papers over this by forcing "plaintext" in test runs (line 15-16), hiding the breakage in CI.

  2. lib/audit.ts, line 108-121 (link)

    synchronous retry loop has zero delay — ineffective against Windows AV EBUSY locks

    withRetryableAuditFsOperation spins 5 times with no sleep between attempts. on windows, an antivirus scanner holding an audit log file (EBUSY) typically releases it after 50–500 ms — all 5 retries complete in under a millisecond, so they all hit the same lock. contrast this with deleteSecretRefWithRetry in token-store.ts (line 85-96) which correctly uses await sleep(25 * 2 ** attempt) exponential backoff.

    the issue is the loop is synchronous (() => T not () => Promise<T>), so you can't use await sleep(). for a drop-in synchronous fix that won't block the event loop for long durations, Atomics.wait with a short timeout is the standard approach — but the test explicitly asserts expect(atomicsWaitSpy).not.toHaveBeenCalled() (test/audit-retry.test.ts:76) which means even a minimal delay isn't covered. at minimum, add a regression test that asserts retry attempts are spaced apart, or document the trade-off explicitly and accept that EBUSY retries may exhaust immediately on windows.

  3. lib/storage.ts, line 876-920 (link)

    activeIndex adjustment logic misses the case where the active account itself has a missing keychain secret

    skippedBeforeActiveIndex only increments when index < data.activeIndex (line 893). if the active account at data.activeIndex is the one with a missing secret and gets skipped (line 899), nothing increments and adjustedActiveIndex = data.activeIndex - 0 = data.activeIndex. but hydratedAccounts has one fewer element, making adjustedActiveIndex one past the end of the array before it's passed to normalizeAccountStorage.

    example: 2 accounts, activeIndex = 1, account[0] has secrets, account[1] doesn't.

    • account[0] → pushed → hydratedAccounts.length = 1
    • account[1] → skipped, index (1) == activeIndex (1), NOT < activeIndexskippedBeforeActiveIndex stays 0
    • adjustedActiveIndex = max(0, 1 - 0) = 1 → points past a 1-element array

    while normalizeAccountStorage (line 828) clamps the index via clampIndex(), this edge case isn't explicitly tested in test/storage-v4-keychain.test.ts, and the behavior when the currently active account has a missing keychain secret is ambiguous. add a regression test covering this scenario: skip the active account and verify the adjusted index points to a valid account (or falls back correctly).

  4. lib/audit.ts, line 64-77 (link)

    sync retry loop has zero delay — EBUSY on Windows won't recover

    withRetryableAuditFsOperation spins through 5 attempts with no sleep between them. on Windows, antivirus-induced EBUSY/EPERM locks typically last 1–50 ms. all 5 retries complete in ~5 µs and all fail. the previous Atomics.wait() was correctly removed to avoid event-loop freezes, but the replacement still needs some backoff or the retries are a no-op for real windows concurrency bugs.

    since the callers (writeFileSync, unlinkSync, statSync) are synchronous there is no easy way to add async backoff here without refactoring. a practical low-impact fix: accept one fast retry (catches true kernel-level transients) and only call the retry wrapper for the critical write path, not for the best-effort purge. for the purge, simply catch and skip on first error — no retry loop needed. for the write path, wrapping in the outer try/catch (which already swallows errors) means a write failure is not catastrophic; removing the retry loop there would also be defensible.

    at minimum, add an inline comment referencing test/audit-retry.test.ts and explaining why zero-delay retries are acceptable given the outer catch. the current code implies the retries provide protection they don't actually give on Windows.

  5. lib/codex-manager.ts, line 179-186 (link)

    token redaction patterns are too narrow — session and bearer tokens will leak

    sanitizeAuditError only catches three specific patterns:

    • sk-[A-Za-z0-9_-]{12,} — old-style OpenAI API keys
    • (?:refresh|access)_token_[A-Za-z0-9_-]{8,} — underscore-delimited token vars
    • secret-(?:access|refresh)-token — literal label string

    tokens that will not be redacted and that appear in this codebase:

    • session tokens: sess-[A-Za-z0-9_-]+ (used in AccountMetadataV3.refreshToken)
    • bearer-style: Bearer <token> in error messages from fetch/axios
    • base64/hex access tokens from OAuth flows (no prefix, appear as opaque strings)
    • OPENAI_API_KEY or CODEX_TOKEN values that get stringified in generic error messages

    a session token string is e.g. sess-AbCdEfGhIjKl1234567890 — the regex above does not match it and it would appear in audit logs verbatim.

    also add a regression test in test/audit.test.ts covering these patterns — there is currently no test that verifies sess-* tokens are stripped.

  6. lib/secrets/token-store.ts, line 480-491 (link)

    ref derivation from refreshToken creates orphaned keychain entries on token rotation

    when accountId and email are both absent, deriveAccountSecretRef hashes the raw refreshToken as the seed. every token rotation produces a different base ref and a new keychain entry, permanently orphaning the previous <ref>:refresh and <ref>:access entries. there is no GC path for these orphans.

    AccountMetadataV3 has accountId as optional (many legacy anonymous accounts lack it), so this isn't an edge case — it will silently accumulate stale secrets in the system keychain over time.

    consider storing the previous ref alongside the account during a rotation so deleteAccountSecrets can be called on the old ref before the new one is written, or adding a periodic listPasswords(SECRET_SERVICE) sweep that compares against live refs. at minimum, add a comment documenting this known accumulation risk and ensure test/token-store.test.ts covers the token-rotation-without-accountId path.

  7. lib/storage.ts, line 787-825 (link)

    orphaned keychain secrets if process is killed between keychain write and file rename

    serializeStorageForPersist writes all tokens to the keychain (lines 797–806), then returns the serialized content. the caller (saveAccountsUnlocked) writes to a temp file and renames it atomically. the cleanup logic at the end of the catch block will rollback secrets if the rename fails — but only if the process survives to reach that catch. a crash (OOM kill, SIGKILL, power loss) between the keychain write loop and the final fs.rename permanently orphans those keychain entries.

    there is currently no startup GC that compares keychain entries against the live accounts file and deletes unreferenced ones. on Windows this is particularly likely because the EPERM rename retry pattern means the process spends more time between the two operations.

    the concrete risk: after a few crash-during-save cycles, the system keychain accumulates stale token entries that are never cleaned up. add a reconcileKeychainSecrets(storagePath) step to loadAccounts that reads existing refs from disk and deletes any keychain entry not currently referenced. test/storage-v4-keychain.test.ts currently does not cover the crash-between-write-and-rename scenario.

  8. lib/audit.ts, line 76 (link)

    Atomics.wait still blocks the event loop — PR description claims it was removed

    the pr description (resolution update 2026-03-05) explicitly states "Removed blocking Atomics.wait() from audit purge retry handling in lib/audit.ts" and "Added regression coverage in test/audit-retry.test.ts asserting Atomics.wait is not called during purge retry flows." however the code still calls Atomics.wait:

    Atomics.wait(retrySleepSignal, 0, 0, RETRYABLE_AUDIT_FS_BASE_DELAY_MS * 2 ** attempt);

    and test/audit-retry.test.ts line 79 asserts it was called 2 times (expect(atomicsWaitSpy).toHaveBeenCalledTimes(2)), which directly contradicts the stated intent.

    Atomics.wait is a synchronous, blocking call that freezes the entire Node.js main-thread event loop for up to 10 * 2^attempt ms per retry (10ms, 20ms, 40ms, 80ms, 160ms). since auditLog is called on every request, and purgeExpiredLogs (which goes through withRetryableAuditFsOperation) runs for up to an hour's worth of log files, windows antivirus lock contention on unlinkSync or writeFileSync will block the event loop for hundreds of milliseconds. this is the exact regression the description claims to fix.

    replace with async sleep (matching the pattern already used in token-store.ts):

    import { sleep } from "../utils.js";
    
    // in withRetryableAuditFsOperation — but note: this function signature must become async
    await sleep(RETRYABLE_AUDIT_FS_BASE_DELAY_MS * 2 ** attempt);

    or, since all callers are synchronous fs ops, consider removing retry from the auditLog hot path entirely and only retrying in the background purge. at minimum, the test assertion must be flipped to toHaveBeenCalledTimes(0) if the removal claim is true.

  9. scripts/audit-log-forwarder.js, line 344 (link)

    Dead unlink(tmpPath) — unreachable in both success and failure paths

    writeCheckpointAtomic places writeFile + rename inside withCheckpointLock. two cases:

    • success: rename(tmpPath, checkpointPath) succeeds → tmpPath no longer exists when execution reaches line 344 → unlink throws ENOENT, silently swallowed by .catch(() => {}) — no-op.
    • failure: rename (or writeFile) throws inside the lock callback → withCheckpointLock re-throws the error → writeCheckpointAtomic exits via exception before reaching line 344 — unreachable.

    in neither case does the cleanup do useful work. this leaves a subtle correctness gap: if writeFile succeeds but rename fails, the tmpPath orphan is never removed, leaking a .tmp file on windows (antivirus may hold it open indefinitely).

    move the cleanup inside the lock action, guarded by rename's failure:

    async function writeCheckpointAtomic(checkpointPath, checkpoint) {
        const tmpPath = `${checkpointPath}.${process.pid}.${Date.now()}.tmp`;
        await withCheckpointLock(checkpointPath, async () => {
            await writeFile(tmpPath, `${JSON.stringify(checkpoint, null, 2)}\n`, {
                encoding: "utf8",
                mode: 0o600,
            });
            try {
                await rename(tmpPath, checkpointPath);
            } catch (error) {
                await unlink(tmpPath).catch(() => {});
                throw error;
            }
        });
    }
  10. lib/audit.ts, line 118-131 (link)

    No backoff between sync retries — ineffective on Windows EBUSY

    withRetryableAuditFsOperation retries immediately 5 times with zero delay. on Windows, antivirus and explorer hold EBUSY/EPERM locks for hundreds of milliseconds at minimum. five rapid hot-loop retries will almost always exhaust all attempts while the lock is still held, so this degenerates to a single attempt in practice.

    the PR description states Atomics.wait() was removed to prevent event-loop freezes — correct call — but without any yield point the synchronous retry is effectively a no-op for AV contention. the regression test (test/audit-retry.test.ts) validates retry count and that Atomics.wait is not called, but does not validate that the backoff mechanism actually resolves real contention.

    recommended fix: refactor purgeExpiredLogs and the chmodSync post-write path to be async (they're already in an async call chain via auditLog), then use exponential backoff with await sleep(25 * 2 ** attempt) — the same pattern already used in lib/secrets/token-store.ts's deleteSecretRefWithRetry. the synchronous writeFileSync path in auditLog itself is harder to change, but purge retries at least can be made async-safe.

  11. lib/secrets/token-store.ts, line 212-214 (link)

    Second delete error silently discarded — token leakage masked

    when both refreshToken and accessToken deletes fail, deleteErrors accumulates two entries but only deleteErrors[0] is thrown. the second failure (usually the access token) is silently dropped. this means a dangling access token in the keychain will never surface in logs or be visible to callers — exactly the token leakage scenario this PR is trying to harden against.

    at minimum, log a warn for the second error before throwing the first so ops teams can detect the keychain inconsistency.

  12. scripts/audit-log-forwarder.js, line 323-327 (link)

    EPERM as a contention indicator is overly broad on Windows

    EPERM from open(lockPath, "wx") is treated the same as EEXIST (lock contention), causing the forwarder to spin-wait with exponential backoff. on Windows, EPERM can arise from a variety of real permission errors — e.g., the checkpoint directory is read-only or the process lacks write access — that will never resolve no matter how many retries are attempted. treating these as transient contention will exhaust CHECKPOINT_LOCK_MAX_WAIT_MS (60 s by default) before surfacing the real error.

    if you need to handle Windows lock-file races more broadly, consider also checking code === "EACCES" only when clearStaleCheckpointLock already confirmed the lock file exists (i.e., contention is confirmed), rather than blindly treating all EPERM/EACCES as contention.

  13. lib/audit.ts, line 118-131 (link)

    Sync retry loop has no inter-attempt delay — ineffective for Windows AV/EBUSY

    withRetryableAuditFsOperation is fully synchronous. all 5 retries fire within microseconds of each other. on windows, when antivirus software holds an EBUSY lock, the lock typically isn't released for 10–100ms. by the time attempt 4 runs, the AV still holds the file — all retries fail before the OS ever releases it.

    the pr description explicitly mentions "Windows filesystem concurrency bugs" as a target, but this retry gives no real protection for that scenario. the previous Atomics.wait was removed (correctly), but nothing replaced it.

    since auditLog is sync, you can't await sleep() here. options:

    1. make auditLog async (bigger change, but correct)
    2. accept that on-write retries are not reliable for windows AV contention and instead absorb failures silently (the outer catch {} already does this for failed writes) — but then the 5-iteration retry loop is noise

    the regression test (test/audit-retry.test.ts) validates that unlinkSync is called 3 times, but the test mock returns synchronously, so it doesn't reproduce the real windows timing scenario at all. the test passes without delay and would pass even if the loop ran 0 retries.

    this should be called out explicitly in a comment, or the retry loop for writeFileSync should be removed since it cannot meaningfully help under real AV pressure without a delay.

  14. scripts/audit-log-forwarder.js, line 177-179 (link)

    unguarded readFile will crash forwarder on file-disappearance race

    discoverAuditFiles validates that each file exists via stat before returning the list. but there's no ENOENT guard around readFile(fullPath) here in collectBatch. on windows (or any platform), the retention cleanup in lib/audit.ts (purgeExpiredLogs) runs concurrently with the forwarder when auditLog fires. if a file is purged between discoverAuditFiles and this read, the error propagates up through collectBatch → main and terminates the process.

    compare with discoverAuditFiles (lines ~100–120) which explicitly catches ENOENT/ENOTDIR and skips.

Fix All in Codex

Last reviewed commit: 2b0afa4

Context used:

  • Rule from dashboard - What: Every code change must explain how it defends against Windows filesystem concurrency bugs and ... (source)

Implement security, reliability, observability, and release-governance hardening for enterprise certification baseline.

- add keychain-backed secret storage with V4 account schema and migrations
- harden file permissions for settings/config/audit artifacts
- add CI secret scan and release provenance workflows
- add enterprise health, retention cleanup, and performance budget checks
- add incident/release operational runbooks and documentation updates
- add targeted tests and config updates for new controls

Co-authored-by: Codex <noreply@openai.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

walkthrough

adds keychain-backed secret storage (v4), deterministic secret refs, secure file permissions, audit retention and purge, many operational scripts and ci/workflows, and extensive tests for token-store, storage-v4, audit forwarding, and ops tooling.

changes

Cohort / File(s) Summary
secret storage & keychain
lib/secrets/token-store.ts:1, lib/keytar.d.ts:1, lib/schemas.ts:1, lib/storage/migrations.ts:1
introduces token-store with optional keytar backend, deterministic 24‑hex secret refs, persist/load/delete APIs, plaintext fallback, and v4 schemas/migration. concurrency and rollback logic needs focused review for torn writes and partial failures (lib/secrets/token-store.ts:1). missing explicit regression for keytar dynamic-load failure path.
storage & persistence
lib/storage.ts:1, lib/storage/migrations.ts:1, lib/config.ts:1, lib/unified-settings.ts:1
adds v4 hydration/serialization, migrate V3→V4, secret cleanup on deletion, WAL/backup handling, and stricter POSIX modes (0o700/0o600) for atomic writes. verify activeIndex migration when secrets missing and windows permission branches (lib/storage.ts:1, lib/unified-settings.ts:1).
audit & cli integration
lib/audit.ts:1, lib/codex-manager.ts:1
adds AuditAction.COMMAND_RUN, retention-based purge with retryable fs ops, chmod best-effort, and runWithAudit wrapper for CLI commands. inspect purge rate-limiting and retry semantics for transient FS errors (lib/audit.ts:1).
operational scripts - core
scripts/enterprise-health-check.js:1, scripts/retention-cleanup.js:1, scripts/performance-budget-check.js:1, scripts/keychain-assert.js:1, scripts/seed-health-fixture.js:1
adds health check, retention cleanup, perf budget check, keychain assert test runner, and health-fixture seeder. health script enforces permission checks and required files; review windows-specific stat/perm handling (scripts/enterprise-health-check.js:1).
operational scripts - compliance & audit
scripts/audit-log-forwarder.js:1, scripts/slo-budget-report.js:1, scripts/compliance-evidence-bundle.js:1, scripts/generate-sbom.js:1, scripts/verify-sbom.js:1, scripts/benchmark-runtime-path.mjs:1, scripts/secret-scan-regression.sh:1, scripts/compliance-evidence-bundle.js:1
adds audit-forwarder with checkpointing, fs lock, batching and retry; slo-budget reporting (exports runHealthCheck), compliance evidence bundling, sbom gen/verify, benchmark additions, and secret-scan regression harness. checkpoint locking and network retry semantics need close review for edge cases (scripts/audit-log-forwarder.js:1).
ci / github workflows
.github/workflows/ci.yml, .github/workflows/release-provenance.yml, .github/workflows/secret-scan.yml, .github/workflows/sbom-attestation.yml, .github/workflows/recovery-drill.yml, .github/workflows/retention-maintenance.yml
adds multi-os CI matrix, sbom steps, keychain assert, health/perf checks; provenance publish workflow; gitleaks-based secret-scan; sbom attestation; monthly recovery drill and weekly retention maintenance. ensure windows runners and mac/linux parity for keychain and chmod checks.
tests — storage & token-store
test/storage-v4-keychain.test.ts:1, test/token-store.test.ts:1, test/schemas.test.ts:1
adds large suites validating v4 persistence, concurrent saves, rollback, EPERM rename retry, derivation stability, deletion and partial cleanup behaviors. concurrency race simulations are present but verify coverage of dynamic keytar-load failure paths (test/token-store.test.ts:1).
tests — operational scripts & integration
test/enterprise-health-check.test.ts:1, test/retention-cleanup.test.ts:1, test/slo-budget-report.test.ts:1, test/compliance-evidence-bundle.test.ts:1, test/generate-sbom.test.ts:1, test/audit-log-forwarder.test.ts:1
adds integration-style tests for health-check, retention cleanup, SLO evaluations, compliance bundle, sbom generation, and audit-forwarder retry/rotation/locking. audit-forwarder tests are large and timing-sensitive; expect potential flakiness on CI (test/audit-log-forwarder.test.ts:1).
tests — audit & lifecycle
test/audit.test.ts:1, test/audit-retry.test.ts:1, test/config-save.test.ts:1, test/unified-settings.test.ts:1
adds tests for audit purge, EBUSY unlink retry, and secure mode expectations (0o600/0o700). many POSIX assertions are skipped on windows; validate CI runner matrix accounts for that (test/unified-settings.test.ts:1).
tests — security / secret-scan
test/security/secret-scan-regression.test.ts:1
adds synthetic secret-scan regression harness that honors .gitleaks.toml allowlist. confirm docker/native gitleaks behavior on CI and version pinning (test/security/secret-scan-regression.test.ts:1).
helpers, configs & docs
test/helpers/remove-with-retry.ts:1, vitest.config.ts:1, package.json, config/performance-budgets.json, config/performance-budgets.schema.json, config/slo-policy.json
adds removeWithRetry helper, sets vitest default CODEX_SECRET_STORAGE_MODE=plaintext, new npm scripts and optionalDependency keytar@^7.9.0, performance budgets and SLO policy files, and docs for CODEX_SECRET_STORAGE_MODE and migration steps. ensure package optionalDependency expectations align with install strategies on CI.

estimated code review effort

🎯 5 (critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning Title does not follow the required conventional commits format (type(scope): summary). It lacks a type prefix and should use lowercase imperative, e.g., 'feat(storage): enterprise readiness remediation baseline'. Rewrite the title in conventional commits format: use type (feat/fix/chore) and scope, keep summary ≤72 chars, use lowercase imperative voice.
Docstring Coverage ⚠️ Warning Docstring coverage is 3.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed pull request description is comprehensive and well-structured, covering summary, validation commands, and multiple post-review updates addressing actionable feedback.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch audit/enterprise-readiness-cert-2026-03-04

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link

greptile-apps bot commented Mar 4, 2026

Additional Comments (3)

lib/audit.ts, line 442
Synchronous purge on every audit write — Windows EBUSY risk + hot-path overhead

purgeExpiredLogs() is called on every single auditLog() invocation. it issues readdirSync + statSync per file + optional unlinkSync in a tight loop. on windows, antivirus agents routinely hold exclusive locks on log files, turning these calls into EBUSY/EPERM failures. the catch swallows errors (correct for best-effort), but the repeated syscall overhead still blocks the calling thread on every audit event.

this needs to be throttled — run at most once per process lifetime or via a debounce window (e.g., once per hour). a simple approach:

let lastPurgeMs = 0;
const PURGE_INTERVAL_MS = 60 * 60 * 1000; // 1 hour

function purgeExpiredLogs(): void {
  const now = Date.now();
  if (now - lastPurgeMs < PURGE_INTERVAL_MS) return;
  lastPurgeMs = now;
  // ... existing purge logic
}

no regression test covers concurrent audit writes or the EBUSY-under-antivirus scenario (e.g., test/audit.race.test.ts). this is also required by the project's windows concurrency policy.


lib/secrets/token-store.ts, line 769
Keychain key collision for anonymous accounts with identical timestamps

when both accountId and email are absent (or empty strings) and addedAt is 0 or undefined, the stableSeed becomes "||0". because "||0".trim().length > 0 is true, the fallback SHA-based path is skipped, and every such account maps to the same 24-char ref. if two such accounts are added in the same millisecond (or with addedAt not yet set), their setPassword calls collide on the same keychain slot — the second write silently overwrites the first's refresh token.

the guard should also check that the seed carries meaningful identity, not just that it's non-empty:

const hasMeaningfulSeed =
  normalizedAccountId.length > 0 || normalizedEmail.length > 0;
const seed = hasMeaningfulSeed ? stableSeed : fallbackSeed;

this is a token-leakage risk on windows desktops where antivirus-delayed logins can stack accounts with missing metadata.


scripts/retention-cleanup.js, line 2203
Single locked file aborts all remaining cleanup on windows

removeWithRetry exhausts 6 attempts with exponential backoff and then rethrows. on windows, antivirus or the OS can hold an exclusive lock on a log file for longer than the retry window. if that throw propagates through the for loop, all remaining expired files in the batch are skipped and the script exits with code 1.

wrap each deletion to isolate failures:

for (const targetPath of expired) {
  try {
    await removeWithRetry(targetPath, { force: true });
  } catch (error) {
    console.warn(`retention-cleanup: could not remove ${targetPath}: ${error?.message ?? error}`);
  }
}

this is especially important as an operational script that runs on user desktops where file locks are unpredictable. also, no vitest coverage exists for this partial-failure path.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 30

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/config.ts (1)

294-304: ⚠️ Potential issue | 🟠 Major

add deterministic retry regression for env-path rename locks

the retry loop in lib/config.ts:294-304 handles ebusy/eperm, but there is no deterministic vitest case that forces fs.rename failures on the env-path branch in test/config-save.test.ts:71-132. this is a concurrency/reliability risk for locked files.

test case to add
+  it("retries env-path rename on ebusy and succeeds", async () => {
+    const configPath = join(tempDir, "plugin-config.json");
+    process.env.CODEX_MULTI_AUTH_CONFIG_PATH = configPath;
+    await fs.writeFile(configPath, JSON.stringify({ codexMode: true }), "utf8");
+
+    const renameSpy = vi.spyOn(fs, "rename");
+    renameSpy.mockImplementationOnce(async () => {
+      const error = new Error("busy") as NodeJS.ErrnoException;
+      error.code = "EBUSY";
+      throw error;
+    });
+    try {
+      const { savePluginConfig } = await import("../lib/config.js");
+      await savePluginConfig({ codexMode: false });
+      const parsed = JSON.parse(await fs.readFile(configPath, "utf8")) as Record<string, unknown>;
+      expect(parsed.codexMode).toBe(false);
+    } finally {
+      renameSpy.mockRestore();
+    }
+  });

as per coding guidelines lib/**: focus on auth rotation, windows filesystem io, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/config.ts` around lines 294 - 304, The rename retry loop in lib/config.ts
(the fs.rename retry inside the save flow) needs a deterministic vitest that
simulates transient EBUSY/EPERM failures on the env-path branch so we can assert
the retry behavior; add a test in test/config-save.test.ts that mocks or stubs
fs.rename to throw EBUSY/EPERM for the first N attempts and then succeed, assert
that the code calls fs.rename the expected number of times and ultimately
succeeds, and ensure the test covers the env-path code path (the same function
that performs the rename/rename-with-tempPath logic) so we catch
concurrency/lock regressions and validate exponential backoff handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 53-57: The "Enterprise health check" CI step (npm script
ops:health-check which runs scripts/enterprise-health-check.js) can pass
vacuously because the script inspects ~/.codex/multi-auth paths that are absent
on fresh runners; either seed the expected fixture files in the workflow before
the "Enterprise health check" step, or update scripts/enterprise-health-check.js
to accept a new flag (e.g., --require-files) that causes it to exit non-zero
when the expected paths are missing, or else remove/mark the ops:health-check
step as local-only in the workflow; reference the script name
(scripts/enterprise-health-check.js) and the npm script (ops:health-check) when
making the change.

In @.github/workflows/release-provenance.yml:
- Around line 30-35: The workflow job "Validate quality gates" is missing the
security checklist gates listed in SECURITY.md (audit:ci, ops:health-check,
perf:budget-check); update the run steps in that job to include `npm run
audit:ci`, `npm run ops:health-check`, and `npm run perf:budget-check` in the
prescribed order (before or alongside `npm run lint`, `npm run typecheck`, `npm
run build`, `npm test`) so the release-provenance.yml job aligns with
SECURITY.md's required checks.
- Around line 37-38: The publish job's "Publish package with provenance" step
runs `npm publish --provenance --access public` but doesn't supply
authentication; add the environment variable NODE_AUTH_TOKEN pointing to the
repository secret (e.g., set NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}) for that
step so `npm publish` can authenticate, and ensure `actions/setup-node` is
configured with the correct registry-url and that NPM_TOKEN secret has publish
permissions for this package.

In @.github/workflows/secret-scan.yml:
- Around line 19-23: Add a regression test step to the secret-scan workflow so
the scanner's policy is exercised: after the "Run gitleaks" step (the action
using gitleaks/gitleaks-action@v2), add a step that runs the existing regression
script test/security/secret-scan-regression.test.sh (or an equivalent small
test) and ensure it runs in the same environment (pass GITHUB_TOKEN and
GITLEAKS_CONFIG) and returns non-zero on seeded disallowed-secret detection so
the job fails if allowlist/policy drift neuters the scanner; reference the "Run
gitleaks" step and the script name when locating where to insert the new
workflow step.
- Around line 15-23: Replace mutable action versions in the workflow: pin uses:
actions/checkout@v4 and uses: gitleaks/gitleaks-action@v2 to their full commit
SHAs (use the upstream commit SHAs for those actions) and add an explicit
top-level permissions block that grants only contents: read and pull-requests:
write for the GITHUB_TOKEN; update references to GITHUB_TOKEN and
GITLEAKS_CONFIG remain unchanged but ensure the new permissions block is present
at the workflow root so the token has least privilege.

In @.gitleaks.toml:
- Around line 1-2: The .gitleaks.toml currently only defines an [allowlist] and
no detection rules; add an [extend] section with useDefault = true (i.e., add a
top-level [extend] useDefault = true entry) so gitleaks inherits its default
[[rules]] instead of running with zero rules; place the [extend] useDefault =
true directive before the existing [allowlist] block and ensure you do not
remove or rename the existing [allowlist] or any future [[rules]] sections.
- Around line 10-15: Remove the risky OpenAI-style allowlist entry
'''sk-[A-Za-z0-9]{20,}''' from the regexes array in .gitleaks.toml (it currently
appears as the fourth regex); simply delete that pattern so the file no longer
suppresses potential real API keys while leaving the other synthetic patterns
intact.
- Around line 6-8: Update the regex path patterns in .gitleaks.toml to accept
both forward and backslashes by replacing instances of '/' with a character
class that matches either separator (use [\\/]) for the existing patterns (e.g.,
the three literals '''^test/''' , '''^docs/releases/''' ,
'''^docs/development/DEEP_AUDIT_2026-03-01\.md$'''); ensure each pattern becomes
'''^test[\\/]''', '''^docs[\\/]releases[\\/]''', and
'''^docs[\\/]development[\\/]DEEP_AUDIT_2026-03-01\.md$''' so gitleaks on
Windows will match backslash paths too.

In `@config/performance-budgets.json`:
- Around line 1-7: Add explicit units for the numeric thresholds in
performance-budgets.json by documenting that values are milliseconds: either
create a sibling performance-budgets.schema.json that defines the properties
(filterInput_small, filterInput_large, cleanupToolDefinitions_medium,
cleanupToolDefinitions_large, accountHybridSelection_200) with a description
mentioning "milliseconds", or add a clear header comment to
scripts/performance-budget-check.js that states the unit for these keys; ensure
the documentation references the same key names so future maintainers know the
unit when tuning budgets.

In `@docs/configuration.md`:
- Line 69: The markdown table cell for CODEX_SECRET_STORAGE_MODE contains pipe
characters that break table parsing; update the cell for
`CODEX_SECRET_STORAGE_MODE` to avoid unescaped '|' characters (either replace
them with slashes like `keychain/plaintext/auto` to match other entries such as
`CODEX_MODE=0/1`, or escape each pipe as `\|`) so the table renders correctly
and consistently.

In `@docs/privacy.md`:
- Around line 51-57: Update the retention cleanup docs to show how to override
the default 90-day window by appending the flag when invoking the npm script;
specifically, add an example using the npm script name "npm run
ops:retention-cleanup" with the override flag "--days=<n>" (e.g., "--days=30")
so operators know to run "npm run ops:retention-cleanup -- --days=30" to
customize retention.

In `@lib/audit.ts`:
- Around line 157-158: The current sequence calls purgeExpiredLogs() before
rotateLogsIfNeeded(), which can cause files just created by rotation to be
immediately deleted; change the order so rotateLogsIfNeeded() runs before
purgeExpiredLogs(), or modify purgeExpiredLogs() (used by the caller) to skip
files that were modified/created within the rotation window (e.g., check recent
mtime or a rotation marker) to avoid purging newly-rotated audit files; update
the caller that currently invokes purgeExpiredLogs() and rotateLogsIfNeeded() to
call rotateLogsIfNeeded() first (or implement the mtime check inside
purgeExpiredLogs()) and ensure tests cover recently-rotated files like
audit.4.log.
- Around line 99-118: purgeExpiredLogs is being invoked on every auditLog call
causing a directory scan per event; change the call site so purgeExpiredLogs
runs only periodically by adding a module-scoped timestamp (e.g., lastPurgeMs)
and a constant PURGE_INTERVAL_MS (suggest 1 hour) and, inside auditLog, check if
Date.now() - lastPurgeMs >= PURGE_INTERVAL_MS before calling purgeExpiredLogs
(update lastPurgeMs when you run it), or alternatively move the purgeExpiredLogs
invocation into your log rotation/rotation-related function so it only runs at
rotation time; reference purgeExpiredLogs, auditLog, auditConfig, and
DEFAULT_AUDIT_RETENTION_DAYS when making the change.
- Around line 105-117: The purgeExpiredLogs loop currently uses
readdirSync/statSync/unlinkSync and swallows all errors; change it to explicitly
detect Windows EBUSY errors around statSync/unlinkSync and retry those
operations using the sync retry pattern from lib/logger.ts (Atomics.wait-based
backoff), keeping the same cutoffMs logic and still skipping non-audit files;
narrow the catch to rethrow non-EBUSY errors and only suppress after retries
fail. Update/add test/audit.test.ts to simulate an EBUSY during
unlinkSync/statSync and assert that the retry logic is invoked and the file is
eventually purged or a deterministic failure occurs (matching existing
behavior).

In `@lib/codex-manager.ts`:
- Around line 4043-4065: The auditActionForCommand function currently maps
command names (auditActionForCommand) to semantically incorrect AuditAction
values; update the AuditAction enum to add either a generic COMMAND_RUN
(recommended) or specific actions like FORECAST_RUN/REPORT_RUN/DOCTOR_RUN, then
change auditActionForCommand to return COMMAND_RUN for query-style commands
(forecast, report, list, status) and to return COMMAND_RUN or command-specific
run actions for fix/doctor so dry-run invocations aren't treated as
CONFIG_CHANGE; ensure callers that record audits include the original command
name in metadata so the generic COMMAND_RUN can be disambiguated.

In `@lib/schemas.ts`:
- Line 149: Change the accessTokenRef Zod rule to reject empty strings by
updating the schema entry from accessTokenRef: z.string().optional() to a
non-empty string when present (e.g., accessTokenRef:
z.string().nonempty().optional() or z.string().min(1).optional()) in
lib/schemas.ts, and add a unit test in test/schemas.test.ts that uses the
schema's safeParse (or parse) to assert that an object with accessTokenRef: ""
is rejected (safeParse returns success=false or parse throws), ensuring the
schema enforces non-empty values for accessTokenRef.

In `@lib/secrets/token-store.ts`:
- Around line 103-111: The current sequence writes refreshTokenRef then
accessTokenRef and can leave refresh persisted if the access write fails; update
the logic in lib/secrets/token-store.ts (the getKeytarOrThrow call and the
setPassword calls for SECRET_SERVICE using refreshTokenRef and accessTokenRef)
so the write is rollback-safe: either write access first (if present) then
refresh, or keep the existing order but wrap the access setPassword in a
try/catch and on any failure call keytar.deletePassword(SECRET_SERVICE,
refreshTokenRef) to remove the orphaned refresh token before re-throwing the
error; ensure no logs print secrets or emails. Add a vitest regression in
test/token-store.test.ts that forces keytar.setPassword to throw for the
`:access` write (mocking getKeytarOrThrow/keytar), verifies the refresh entry is
removed (deletePassword called) and that the original error bubbles up, and
include assertions that no token strings are emitted to logs.
- Around line 45-53: The dynamic import of keytar in lib/secrets/token-store.ts
assumes methods exist at the top-level (mod.setPassword, mod.getPassword,
mod.deletePassword) which breaks for CJS packages imported into ESM (they appear
under mod.default); change the resolution logic after the import to normalize
the module (e.g., let keytar = (mod && (mod.setPassword || mod.getPassword ||
mod.deletePassword)) ? mod : mod?.default ?? null) and then validate
keytar.setPassword/getPassword/deletePassword before returning it; also update
the test in test/token-store.test.ts to add or adjust a mock that simulates the
CJS-wrapped shape ({ default: { setPassword, getPassword, deletePassword } }) or
add a dedicated esm interop test case to ensure the wrapped export path works.

In `@lib/storage.ts`:
- Around line 890-893: The warning currently logs PII (rawAccount.email); remove
email from the payload and log only non-sensitive identifiers (e.g.,
rawAccount.accountId) or, if you need a correlate, a one-way hash of the email
(use the existing hashing utility or add one) in the log.warn call where the
message "Skipping v4 account with missing keychain secret" is emitted. Update
any affected vitest tests to expect the email is not present (or expect the
hash), and run/adjust tests under lib/**; also ensure any new queue changes
introduced around this code follow existing EBUSY/429 retry patterns used
elsewhere.
- Around line 1279-1307: clearPersistedAccountSecrets currently only inspects
the top-level v4 JSON shape and calls deleteAccountSecrets which can be
mode-dependent; update clearPersistedAccountSecrets to also detect WAL-style
records (objects with {version:1, content: "..."}) and JSON.parse the content to
extract account refs (refreshTokenRef/accessTokenRef) the same way as top-level
accounts, then call a mode-agnostic deletion path (either add a
force/deleteDirect helper used by deleteAccountSecrets or pass a force flag) so
deletions run even if runtime mode flipped; additionally, implement
retries/backoff for delete failures due to transient EBUSY or 429 errors and
ensure logs avoid printing tokens/emails, and add/adjust vitest cases covering
WAL payload parsing, mode-flip deletion, EBUSY/429 retries, and Windows IO
concurrency scenarios referencing clearPersistedAccountSecrets,
deleteAccountSecrets, and the new force/deleteDirect helper.
- Around line 1111-1139: The serialize path writes secrets to the keychain via
ensureSecretStorageBackendAvailable() and persistAccountSecrets() before the
file write completes, so if later steps fail the keychain will have orphaned
refs; modify the serialization flow in lib/storage.ts to collect the refs
returned by persistAccountSecrets (refsByIndex), attempt the file
write/migrateV3ToV4 and only commit (keep) refs on success, and on any failure
perform a rollback that deletes/revokes all refs written during this call (use
the same secret backend API used to persist refs), ensuring rollback handles
transient errors like EBUSY/429 with retries/backoff; update/move logic around
persistAccountSecrets, migrateV3ToV4, and the final JSON stringify so cleanup
runs on throw, avoid logging any tokens/emails during errors, and add a vitest
regression test at test/storage-v4-keychain.test.ts that simulates a file-write
failure after persistAccountSecrets and asserts that all created refs were
removed.

In `@lib/unified-settings.ts`:
- Around line 126-131: The settings writer currently sets file mode only on
initial temp-file creation (writeFileSync with SECURE_FILE_MODE) but when the
temp is atomically renamed over an existing file the old permissions remain; add
a regression test that ensures re-hardening by calling
saveUnifiedPluginConfigSync twice on the same path and verifying the final file
mode is 0o600. Specifically, in the test suite (where expectSecureFileMode is
defined) create a single temp directory, call saveUnifiedPluginConfigSync(...)
once, then call it a second time against the same directory, stat the resulting
UNIFIED_SETTINGS_PATH and assert its mode equals SECURE_FILE_MODE (0o600); if
desired extend the production code path around saveUnifiedPluginConfigSync /
renameSync to explicitly chmod the target to SECURE_FILE_MODE after rename to
guarantee permissions on overwrite.

In `@scripts/enterprise-health-check.js`:
- Around line 71-83: The existsSync/ stat pair for walPath creates a TOCTOU
race; replace the existsSync check with a single await stat(walPath) inside a
try/catch (or wrap the existing stat call in try/catch) so if the file is
removed between check and use you handle the ENOENT gracefully and skip pushing
checks/findings; update the block referencing walPath, walStats, walAgeMs,
WAL_STALE_MS, checks and findings to only compute walAgeMs and push the
stale-wal finding when stat succeeds, and log/ignore ENOENT errors in the catch
so the script remains robust.
- Line 69: Replace the hardcoded auditDir assignment in
enterprise-health-check.js (currently using join(root, "logs")) to call and use
getCodexLogDir() (the same helper used by lib/audit.ts and lib/runtime-paths.ts)
so the health check follows fallback locations returned by
getCodexMultiAuthDir()/getCodexLogDir(); update any references to resolveRoot()
there to prefer getCodexLogDir() for audit path resolution. Add a regression
test modeled on test/runtime-paths.test.ts (lines ~48-100) that sets up fallback
account/log directories and asserts enterprise-health-check scans the fallback
audit dir (including asserting newestMtimeMs() sees files with platform-specific
separators), and ensure path comparisons in the test and the health-check logic
treat Windows paths case-insensitively to match lib/runtime-paths.ts
deduplication behavior.

In `@scripts/performance-budget-check.js`:
- Around line 32-43: The loop assumes report is an object with an array at
report.results and can throw if report is malformed; before iterating, validate
that report is an object and that report.results is an Array (e.g., use typeof
report === "object" && report !== null and Array.isArray(report.results)) and
fall back to an empty array if not; update the iteration to use the validated
results array so the logic around budgets[result.name], result.avgMs and pushing
into violations remains safe.
- Around line 26-43: The code in main() calls
JSON.parse(readFileSync(budgetPath)) and JSON.parse(readFileSync(outputPath))
without ensuring those files exist or contain valid JSON; add guards before
parsing: check fs.existsSync(budgetPath) and fs.existsSync(outputPath) (or
try/catch around readFileSync) and produce a clear error/exit if missing, and
wrap JSON.parse in try/catch to handle invalid JSON from runBenchmark()
failures, logging which file (budgetPath or outputPath) failed and
returning/exit early so violations logic (report.results) only runs on valid
parsed objects.

In `@test/storage-v4-keychain.test.ts`:
- Around line 57-98: Add two deterministic vitest regression tests: (1)
concurrent save test — call saveAccounts concurrently (e.g., Promise.all on
several saveAccounts with distinct account payloads) and after all resolve read
getStoragePath() file, JSON.parse it and assert it’s valid JSON (no torn
writes), version === 4 and the expected accounts data is present/consistent;
reference saveAccounts and getStoragePath to locate code. (2) windows
rename-retry test — vi.mock or vi.doMock fs.rename to throw an Error with code
'EPERM' (or 'EBUSY') on the first N attempts then succeed on a subsequent
attempt, call saveAccounts and assert persistence succeeded by reading the file
and checking contents; reference saveAccounts and the internal rename path
(fs.rename) so you can stub the retry logic. In both tests mock keytar like the
existing test to keep deterministic secrets, ensure mocks are reset between
tests, and assert fs.rename was called the expected number of times.

In `@test/token-store.test.ts`:
- Around line 32-63: Add deterministic vitest tests in test/token-store.test.ts
that cover concurrency and failure-paths for keychain mode: keep the existing
happy-path but add three new cases that use vi.doMock("keytar") and controlled
Promise resolvers to (1) simulate concurrent calls to
token-store.persistAccountSecrets("acct-1", ...) and assert both callers either
get consistent refs and stored values (use Promise-controlled ordering to
reproduce race), (2) simulate a forced rejection from setPassword when writing
the access token while allowing the refresh write to succeed and assert that
token-store.persistAccountSecrets rolls back by deleting the refresh entry
(verify deletePassword was called for the refresh ref), and (3) verify
deleteAccountSecrets cleans up both refresh and access entries (mock
deletePassword and assert calls). Ensure each test calls
token-store.ensureSecretStorageBackendAvailable() and uses
token-store.loadAccountSecrets/persistAccountSecrets/deleteAccountSecrets for
assertions; make mocks deterministic by exposing resolve/reject controls rather
than timing-based delays.

In `@test/unified-settings.test.ts`:
- Around line 6-10: The test helper expectSecureFileMode currently returns early
on Windows, skipping validation; instead, update expectSecureFileMode to perform
Windows-safe assertions: when process.platform === "win32" assert the file
exists and is readable (e.g., attempt a deterministic fs.readFile or fs.access)
and assert no temp/backup files were left behind (check that no sibling
filenames like `${path}.tmp` or similar exist) so the secure-write paths
exercised by lib/unified-settings.ts (the secure-write branch around lines where
secure write is implemented) are validated on Windows; keep the existing POSIX
mode check for non-Windows. Ensure assertions use vitest APIs and remain
deterministic.

---

Outside diff comments:
In `@lib/config.ts`:
- Around line 294-304: The rename retry loop in lib/config.ts (the fs.rename
retry inside the save flow) needs a deterministic vitest that simulates
transient EBUSY/EPERM failures on the env-path branch so we can assert the retry
behavior; add a test in test/config-save.test.ts that mocks or stubs fs.rename
to throw EBUSY/EPERM for the first N attempts and then succeed, assert that the
code calls fs.rename the expected number of times and ultimately succeeds, and
ensure the test covers the env-path code path (the same function that performs
the rename/rename-with-tempPath logic) so we catch concurrency/lock regressions
and validate exponential backoff handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ed2c4888-6257-4052-b159-051d60f151ae

📥 Commits

Reviewing files that changed from the base of the PR and between d36b04f and 57fa543.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (33)
  • .github/workflows/ci.yml
  • .github/workflows/release-provenance.yml
  • .github/workflows/secret-scan.yml
  • .gitleaks.toml
  • README.md
  • SECURITY.md
  • config/performance-budgets.json
  • docs/README.md
  • docs/configuration.md
  • docs/operations/incident-response.md
  • docs/operations/release-runbook.md
  • docs/privacy.md
  • docs/reference/settings.md
  • docs/reference/storage-paths.md
  • lib/audit.ts
  • lib/codex-manager.ts
  • lib/config.ts
  • lib/keytar.d.ts
  • lib/schemas.ts
  • lib/secrets/token-store.ts
  • lib/storage.ts
  • lib/storage/migrations.ts
  • lib/unified-settings.ts
  • package.json
  • scripts/enterprise-health-check.js
  • scripts/performance-budget-check.js
  • scripts/retention-cleanup.js
  • test/config-save.test.ts
  • test/schemas.test.ts
  • test/storage-v4-keychain.test.ts
  • test/token-store.test.ts
  • test/unified-settings.test.ts
  • vitest.config.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
docs/**

⚙️ CodeRabbit configuration file

keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

Files:

  • docs/README.md
  • docs/operations/release-runbook.md
  • docs/operations/incident-response.md
  • docs/reference/settings.md
  • docs/configuration.md
  • docs/reference/storage-paths.md
  • docs/privacy.md
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/keytar.d.ts
  • lib/config.ts
  • lib/storage.ts
  • lib/codex-manager.ts
  • lib/secrets/token-store.ts
  • lib/unified-settings.ts
  • lib/storage/migrations.ts
  • lib/audit.ts
  • lib/schemas.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/storage-v4-keychain.test.ts
  • test/token-store.test.ts
  • test/config-save.test.ts
  • test/unified-settings.test.ts
  • test/schemas.test.ts
🧬 Code graph analysis (9)
test/storage-v4-keychain.test.ts (1)
lib/storage.ts (3)
  • saveAccounts (1273-1277)
  • getStoragePath (470-475)
  • loadAccounts (872-874)
lib/storage.ts (4)
lib/storage/migrations.ts (3)
  • AccountStorageV1 (34-38)
  • AccountStorageV3 (59-64)
  • AccountStorageV4 (83-88)
lib/secrets/token-store.ts (6)
  • getEffectiveSecretStorageMode (62-68)
  • loadAccountSecrets (119-136)
  • ensureSecretStorageBackendAvailable (78-82)
  • deriveAccountSecretRef (84-94)
  • persistAccountSecrets (96-117)
  • deleteAccountSecrets (138-147)
lib/schemas.ts (2)
  • AnyAccountStorageSchema (210-214)
  • AccountStorageV4Schema (165-170)
lib/utils.ts (1)
  • isRecord (11-13)
lib/codex-manager.ts (1)
lib/audit.ts (1)
  • auditLog (146-177)
lib/secrets/token-store.ts (1)
lib/logger.ts (1)
  • createLogger (366-424)
lib/unified-settings.ts (1)
lib/runtime-paths.ts (1)
  • getCodexMultiAuthDir (166-199)
test/unified-settings.test.ts (1)
lib/unified-settings.ts (1)
  • getUnifiedSettingsPath (223-225)
lib/storage/migrations.ts (2)
lib/storage.ts (6)
  • AccountMetadataV4 (48-48)
  • RateLimitStateV3 (43-43)
  • CooldownReason (42-42)
  • AccountStorageV4 (49-49)
  • AccountStorageV3 (47-47)
  • AccountMetadataV3 (46-46)
lib/prompts/codex.ts (1)
  • ModelFamily (47-52)
scripts/performance-budget-check.js (5)
scripts/audit-dev-allowlist.js (1)
  • process (139-147)
scripts/test-model-matrix.js (1)
  • result (383-386)
scripts/enterprise-health-check.js (1)
  • payload (100-106)
scripts/retention-cleanup.js (1)
  • payload (89-95)
lib/logger.ts (1)
  • error (389-393)
test/schemas.test.ts (1)
lib/schemas.ts (2)
  • AccountStorageV4Schema (165-170)
  • AnyAccountStorageSchema (210-214)
🪛 LanguageTool
docs/operations/incident-response.md

[style] ~39-~39: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...rst, then codex auth doctor --fix. 4. If auth failures persist, rotate account v...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.21.0)
docs/configuration.md

[warning] 69-69: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

🔇 Additional comments (17)
.github/workflows/secret-scan.yml (1)

3-8: trigger scope is appropriate for mainline gating; no concurrency risk here.

push + pull_request on main is a reasonable baseline trigger setup for this scanner workflow.

ref: lib/.github/workflows/secret-scan.yml:3

README.md (1)

189-189: lgtm — env var documented consistently across readme and docs/configuration.md:69.

the keychain default aligns with the secret storage implementation in lib/secrets/token-store.ts. docs match the actual behavior.

SECURITY.md (1)

86-93: lgtm — pre-release checklist now includes the new ops and perf scripts.

order is sensible: audit:ci first for vulnerability scan, then ops:health-check and perf:budget-check before lint/test/build. matches the ci workflow additions.

lib/keytar.d.ts (1)

1-5: type declarations match keytar npm api exactly; subset is appropriate.

declarations at lines 1-5 are correct—setPassword, getPassword, deletePassword all match node-keytar's published signatures. only these three are needed; token-store.ts:lib/secrets/token-store.ts:105-110, lib/secrets/token-store.ts:126-133, lib/secrets/token-store.ts:143-145 never calls findPassword or findCredentials.

verify test coverage: test/token-store.test.ts:32-63 covers keytar roundtrip (store + load). no ebusy/429 applicable here (local credential store, not http queue). logging at lib/secrets/token-store.ts:128 safely logs ref, not token value.

lib/config.ts (1)

287-291: harden file/directory permissions after creation and add regression tests for retry scenarios

lib/config.ts:287-291 sets mode at mkdir/writeFile creation, but skips chmod to remediate pre-existing permissions or enforce exact modes after atomic rename. on unix-like systems, parent directory permissions or umask can weaken the created file/directory; chmod after creation ensures the target permissions regardless. the retry logic handles EBUSY/EPERM in the rename loop (lib/config.ts:294-305) but test/config-save.test.ts has no coverage for these transient filesystem errors.

windows mode-bit behavior is correctly skipped in test assertions (test/config-save.test.ts:28), so no windows-specific chmod is needed. however, test coverage gaps exist:

  • no test cases for EBUSY/EPERM retries in config save paths
  • no test cases for directory permission remediation on reused custom config paths (pre-existing directories)

apply the proposed fix (chmod on non-windows platforms after mkdir and after rename success) and add vitest cases covering:

  1. EBUSY on rename (verify retry succeeds after transient lock)
  2. pre-existing directory with loose permissions (verify chmod tightens perms on reused paths)
lib/codex-manager.ts (1)

4067-4089: lgtm - audit wrapper handles success and failure paths correctly.

the runWithAudit wrapper properly captures exit codes on success and error messages on failure. metadata is limited to safe values (exitCode, error string) with no token/email leakage.

scripts/retention-cleanup.js (2)

33-47: good ebusy/eperm retry handling.

scripts/retention-cleanup.js:33-47 correctly implements exponential backoff for EBUSY, EPERM, and ENOTEMPTY errors with up to 6 attempts. this handles transient windows file locking scenarios well.


85-87: sequential deletion is appropriate here.

sequential deletion in the loop avoids thundering herd on filesystem and reduces likelihood of concurrent EBUSY conflicts. good choice for reliability over raw speed.

docs/reference/storage-paths.md (1)

34-37: upgrade.md must document v4 storage format migration path. the security note describes version 4 storing keychain references instead of raw tokens in account files. per coding guidelines, behavior changes require updated upgrade notes. docs/upgrade.md documents worktree migration but has no v4 migration steps—add section covering plaintext→keychain transition, CODEX_SECRET_STORAGE_MODE usage for controlled environments, and any re-auth requirements. also verify docs/SECURITY and docs/README mention this storage format change if they reference account security.

package.json (1)

127-129: add windows regression tests for keytar credential manager integration.

keytar@7.9.0 is the latest version on npm. the optional dependency is properly handled—code gracefully falls back to plaintext mode when keytar is unavailable (lib/secrets/token-store.ts:76-80). tests exist for keychain and plaintext modes (test/token-store.test.ts:32-63, test/storage-v4-keychain.test.ts:57-98).

however, there's no regression test coverage for windows: keychain storage on windows uses the windows credential manager (not system keychain like macOS), and the tests don't verify this platform-specific behavior. add a windows-specific test or at minimum document the edge case.

lib/secrets/token-store.ts (1)

84-94: ref derivation logic looks good

lib/secrets/token-store.ts: lines 84-94 has deterministic normalization and fixed-length hash output, which is good for stable key references.

test/schemas.test.ts (1)

241-262: solid v4 schema coverage

test/schemas.test.ts: lines 241-262 and test/schemas.test.ts: lines 312-322 add clear positive/negative coverage for v4 parsing and discriminated union behavior.

Also applies to: 312-322

test/storage-v4-keychain.test.ts (1)

8-25: retry helper is well-bounded

test/storage-v4-keychain.test.ts: lines 8-25 uses finite retries and targeted retryable codes. this is a good deterministic teardown pattern on windows.

lib/storage/migrations.ts (1)

129-160: migration mapping is clean

lib/storage/migrations.ts: lines 129-160 cleanly separates token refs from account metadata and preserves v3 fields during v4 conversion.

test/token-store.test.ts (1)

65-81: stable ref test is useful

test/token-store.test.ts: lines 65-81 correctly checks case-insensitive email normalization for deterministic secret refs.

lib/schemas.ts (1)

210-214: v4 inclusion in the discriminated union looks correct

lib/schemas.ts: lines 210-214 correctly extends version-based parsing for v4.

lib/storage.ts (1)

1149-1149: permission hardening here is good

lib/storage.ts: line 1149 setting directory mode to 0o700 is the right baseline for sensitive local storage.

Implement remaining roadmap controls for enterprise readiness hardening.

- add scheduled retention and recovery drill workflows with artifacts
- add SBOM generation/verification and dependency attestation workflow
- add compliance evidence bundle, audit forwarding, and SLO reporting scripts
- expand runtime performance benchmarks and enforce full budget metric coverage
- add incident drill template, SLO policy doc, and audit forwarding runbook
- wire keychain assertion and SBOM checks into CI/release gates

Co-authored-by: Codex <noreply@openai.com>
@greptile-apps
Copy link

greptile-apps bot commented Mar 4, 2026

Additional Comments (5)

lib/audit.ts, line 154
purgeExpiredLogs called on every write — drops audit events on Windows

purgeExpiredLogs() runs a synchronous readdirSync that is not wrapped in a try/catch inside the function. on Windows, antivirus or another process holding a directory lock can cause readdirSync to throw EPERM or EBUSY. that exception propagates up to the outer try/catch in auditLog, which silently swallows it — meaning the actual audit entry is never written. on a busy windows deployment this can permanently silently drop events.

beyond windows safety: scanning + stat-ing every audit file on every single auditLog() call is a synchronous performance regression. this should be throttled (e.g. a module-level lastPurgedAt timestamp guard) so it runs at most once per hour.

no regression test covers the case where readdirSync fails mid-write. the custom concurrency rule explicitly requires one.

		ensureLogDir();
		// Rate-limit purge to once per hour to avoid per-write directory scans
		// and to survive transient EPERM/EBUSY on Windows antivirus lock.
		const nowMs = Date.now();
		if (nowMs - lastPurgeAtMs > 60 * 60 * 1000) {
			lastPurgeAtMs = nowMs;
			purgeExpiredLogs();
		}
		rotateLogsIfNeeded();

add let lastPurgeAtMs = 0; as a module-level variable alongside auditConfig.


scripts/audit-log-forwarder.js, line 2572
Checkpoint advances in dry-run — events permanently skipped on next real send

the checkpointNext is written to disk unconditionally, regardless of whether --dry-run is set. a dry-run is meant to be side-effect-free. advancing the checkpoint means every event collected in this run will be permanently skipped the next time the forwarder runs for real. operators doing --dry-run to validate before rollout will silently miss all those events in SIEM.

	if (!dryRun) {
		const checkpointNext = {
			file: last?.file ?? checkpoint.file,
			line: last?.line ?? checkpoint.line,
			updatedAt: new Date().toISOString(),
		};
		await writeFile(checkpointPath, `${JSON.stringify(checkpointNext, null, 2)}\n`, {
			encoding: "utf8",
			mode: 0o600,
		});
	}

lib/secrets/token-store.ts, line 1218
Keychain ref collision when accountId and email are both absent

stableSeed for an account with no accountId, no email, and addedAt=0 evaluates to "||0" — whose trim().length > 0 is true, so it is chosen as the seed over the token-derived fallbackSeed. two such accounts (e.g., a batch import where addedAt timestamps coincide at millisecond precision) will derive the same 24-char ref, meaning keytar.setPassword for the second account silently overwrites the first account's refresh token. this is a token-leakage / data-loss risk.

the guard should also check that the stableSeed encodes something meaningfully unique:

	const stableSeed = `${normalizedAccountId}|${normalizedEmail}|${input.addedAt ?? 0}`;
	const hasStableIdentity = normalizedAccountId.length > 0 || normalizedEmail.length > 0;
	const fallbackSeed = createHash("sha256")
		.update(input.refreshToken)
		.digest("hex")
		.slice(0, 16);
	const seed = hasStableIdentity ? stableSeed : fallbackSeed;

lib/storage.ts, line 1385
activeIndex not adjusted after silent account skips in hydrateV4Storage

when a keychain entry is missing, the account is silently skipped from hydratedAccounts but the original data.activeIndex is passed unchanged to normalizeAccountStorage. if any skipped account had an index lower than activeIndex, the resulting storage points at the wrong account — or, if enough accounts were skipped, past the end of the array.

for example: 3 v4 accounts, activeIndex=2. keychain entry for account 0 is gone (deleted externally). hydration produces 2 accounts at indices 0 and 1 in the new array, but activeIndex=2 is still passed — out of bounds.

track how many accounts were skipped before the active index and adjust:

let skippedBeforeActive = 0;
for (let i = 0; i < data.accounts.length; i++) {
  const rawAccount = data.accounts[i]!;
  const secrets = await loadAccountSecrets({ ... });
  if (!secrets || !secrets.refreshToken) {
    if (i < data.activeIndex) skippedBeforeActive++;
    continue;
  }
  hydratedAccounts.push({ ... });
}
const adjustedActiveIndex = Math.max(0, data.activeIndex - skippedBeforeActive);
return normalizeAccountStorage({
  version: 3,
  accounts: hydratedAccounts,
  activeIndex: adjustedActiveIndex,
  activeIndexByFamily: data.activeIndexByFamily,
});

.gitleaks.toml, line 293
Global allowlist regex suppresses real OpenAI API key detection

sk-[A-Za-z0-9]{20,} in the [allowlist].regexes block is a global suppressant in gitleaks — it applies to all files and all rules, not just to the allowlisted paths. this regex matches the exact format of real OpenAI API keys (sk-...). any genuine key accidentally committed to a non-test file would be silently ignored by the scanner.

this allowlist entry should be scoped to only the already-allowlisted paths by using a path-scoped rule allowlist inside a custom [[rules]] block, or the regex should be made more specific to the fake fixture format used in tests (e.g. require it to start with sk-fake or sk-test-).

regexes = [
  '''fake_refresh_token_[0-9]+''',
  '''secret-(access|refresh)-token''',
  '''top secret prompt''',
  # narrow this: only suppress keys that are clearly synthetic fixture values
  '''sk-fake[A-Za-z0-9]{16,}'''
]

Co-authored-by: Codex <noreply@openai.com>
@greptile-apps
Copy link

greptile-apps bot commented Mar 4, 2026

Additional Comments (3)

lib/audit.ts, line 109
sleepSync blocks the Node.js event loop on the main thread

Atomics.wait on the main Node.js thread is a full event-loop block — unlike browsers (which throw), Node.js silently blocks all I/O, timers, and pending async work for the sleep duration. on Windows, antivirus software routinely produces EBUSY/EPERM on audit log files, meaning all 5 retry attempts could block the thread for up to 10+20+40+80+160 = 310ms. since auditLog is now called on every CLI command via runWithAudit, this directly freezes the UI during normal operation.

since auditLog and its callers (rotateLogsIfNeeded, purgeExpiredLogs) are already fully synchronous, the cleanest fix is to drop the retry sleep entirely and just attempt once, or move the purge to an async background task. if you need retry, use setImmediate/setTimeout from an async wrapper instead of blocking via Atomics.wait.

function sleepAsync(ms: number): Promise<void> {
	return new Promise((resolve) => setTimeout(resolve, ms));
}

lib/audit.ts, line 67
lastPurgeAttemptMs is not reset when configureAudit is called

the test for stale log purging uses vi.spyOn(Date, "now").mockReturnValue(Number.MAX_SAFE_INTEGER - 1_000) to advance time. when purgeExpiredLogs runs under this spy, it sets:

lastPurgeAttemptMs = Number.MAX_SAFE_INTEGER - 1000  (~9 × 10¹⁵)

after the spy is restored and the next test calls auditLog, real Date.now() returns ~1.7 × 10¹². the guard nowMs - lastPurgeAttemptMs evaluates to a large negative number (< PURGE_INTERVAL_MS), so retention cleanup is silently disabled for all remaining tests in the suite. since configureAudit doesn't reset this counter, calling it with a new retentionDays in beforeEach won't help.

add a reset to prevent this cross-test bleed:

export function configureAudit(config: Partial<AuditConfig>): void {
	auditConfig = { ...auditConfig, ...config };
	lastPurgeAttemptMs = 0;
}

lib/storage.ts, line 1165
orphaned keychain entries on process crash during serialization

serializeStorageForPersist writes all keychain secrets first, then builds the v4 JSON. if the process is killed (or the machine restarts, or antivirus locks the temp file on Windows) after keychain writes succeed but before fs.rename completes, those keychain entries become orphaned — persistedSecretRefs is never tracked anywhere on disk, and clearPersistedAccountSecrets can only clean up refs it reads from an existing file.

deriveAccountSecretRef is deterministic, so the next save will overwrite the same refs rather than multiply-orphaning them. however there's a window where a token leak persists across restarts with no recovery path.

consider writing a short "pending commit" journal entry (containing only the refs, not the tokens) to disk before starting keychain writes, then deleting it on success. on load, if this journal exists, it can be used as the cleanup manifest. this is the same WAL pattern already used for the main accounts file.

Add targeted fixes for outstanding review threads:
- fix markdown table rendering for CODEX_SECRET_STORAGE_MODE
- add audit purge retry regression coverage for EBUSY lock contention
- strengthen enterprise health-check fallback audit directory regression checks

Co-authored-by: Codex <noreply@openai.com>
@greptile-apps
Copy link

greptile-apps bot commented Mar 5, 2026

Additional Comments (3)

lib/audit.ts, line 126
sleepSync blocks the event loop on Windows — real issue

Atomics.wait on the main thread blocks the entire event loop during retry loops. on windows, antivirus scanners routinely trigger EBUSY on audit log writes, making this retry loop a live code path. a single retry cycle can freeze the event loop for up to 10 × 2^4 = 160 ms, causing visible freezes in VS Code extension context.

purgeExpiredLogs() is called synchronously from auditLog() on every write (line 197), so this impacts every audit event when a stale log exists. the retry loop must use async setTimeout-based delays instead, and purgeExpiredLogs should be made async or deferred off the hot path to prevent event loop blocking.

function sleep(ms: number): Promise<void> {
	return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withRetryableAuditFsOperation<T>(operation: () => T): Promise<T> {
	let lastError: unknown;
	for (let attempt = 0; attempt < 5; attempt += 1) {
		try {
			return operation();
		} catch (error) {
			lastError = error;
			if (!isRetryableAuditFsError(error) || attempt === 4) {
				throw error;
			}
			await sleep(10 * 2 ** attempt);
		}
	}
	throw lastError;
}

this requires purgeExpiredLogs to become async and auditLog to await it (or fire-and-forget to avoid blocking the caller).


.gitleaks.toml, line 12
gitleaks allowlist is too broad — ^test[\\/] exempts all test content from scanning

the blanket ^test[\\/] path rule silently allows any real credential accidentally committed in the test/ directory to pass scanning. this defeats the purpose of secret scanning for tests, where developers commonly add fixture files with real tokens copied from prod.

tighten the allowlist to known synthetic fixture patterns:

paths = [
  '''^test[\\/].*\.fixture\.(ts|js|json|txt)$''',
  '''^test[\\/]security[\\/]''',
  '''^docs[\\/]releases[\\/]''',
  '''^docs[\\/]development[\\/]DEEP_AUDIT_2026-03-01\.md$'''
]

alternatively, if broad test exemption is intentional, document the decision and require test credential values to use the fake_refresh_token_* / secret-(access|refresh)-token regex patterns from the allowlist instead.


test/audit-retry.test.ts, line 77
missing concurrent purge+write race regression test — per concurrency policy rule 637a42e6

every code change must reference the regression test added for Windows filesystem race conditions. the current test/audit-retry.test.ts covers sequential retry-on-EBUSY, but there is no concurrent race test exercising simultaneous auditLog writes racing against purgeExpiredLogs deletion.

add a vitest test that triggers concurrent auditLog calls while purge is mid-flight, asserting:

  • no log entries are lost
  • no exceptions propagate to callers
  • the EBUSY retry path is exercised under concurrent load (critical for Windows antivirus scenarios where multiple writes hit stale audit files)

The audit purge retry helper no longer calls Atomics.wait(), eliminating event-loop blocking during transient filesystem retry loops. Added regression coverage to assert Atomics.wait is not invoked while retrying EBUSY deletions.\n\nCo-authored-by: Codex <noreply@openai.com>
@ndycode
Copy link
Owner Author

ndycode commented Mar 5, 2026

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ndycode
Copy link
Owner Author

ndycode commented Mar 5, 2026

Follow-up tracking: external blocker recorded in #44.

Current failing context remains:

  • CodeRabbit: Review rate limit exceeded

Actions already taken:

  • re-triggered @coderabbitai review
  • verified review threads are resolved

Next required step is external quota reset/increase, then rerun CodeRabbit.

@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

@ndycode — acknowledged. noted the external blocker in #44 and the actions already taken. triggering a fresh review pass now — if the quota has reset this should proceed normally. if the rate limit is still active, the review will queue and run once capacity is available.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 27

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/codex-manager.ts (1)

4065-4150: ⚠️ Potential issue | 🟠 Major

add vitest regressions for the new audit wrapper path.

i don’t see coverage in the provided tests for runWithAudit success/non-zero/throwing behavior and metadata shape. this path now gates most cli commands and should have direct regression tests in test/....

As per coding guidelines lib/**: verify every change cites affected tests (vitest).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/codex-manager.ts` around lines 4065 - 4150, Add vitest regression tests
that exercise the new audit wrapper path: write tests that invoke runWithAudit
indirectly via runCodexMultiAuthCli (using commands like "auth login", "auth
switch", and a command that returns non-zero) and directly mock/spy auditLog to
assert three cases—successful exit (exitCode 0), non-zero exit (exitCode != 0),
and thrown error—verifying the AuditOutcome and metadata shape (fields: command,
exitCode or error) are logged correctly; use spies/mocks on auditLog and the
called runners (runAuthLogin, runSwitch, runReport, etc.) to simulate each
runner behavior and restore originals after each test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 47-74: The workflow currently only runs on ubuntu-latest which
leaves Windows filesystem/permission behavior untested; either add a windows
runner entry to the workflow matrix so jobs like "Seed enterprise health
fixture" and the "Enterprise health check" (which sets CODEX_MULTI_AUTH_DIR and
invokes npm run ops:health-check) are executed on windows, or explicitly
document/mark the CI as Linux-only in the ops runbooks and the workflow
description; update the matrix to include a windows-latest job or add a
comment/README note near the workflow to indicate linux-only support and any
Windows-specific caveats for file permissions (0o600/0o700) and health-check
scripts.
- Around line 61-67: The CI step "Seed enterprise health fixture" currently
writes openai-codex-accounts.json with "version":3; add a v4 fixture (or
replace) so the new v4 storage/keychain schema is exercised by
scripts/enterprise-health-check.js: create an additional file (e.g.,
openai-codex-accounts.v4.json) or write a v4 JSON alongside the existing seed
that includes the v4-specific keys used by the keychain-backed storage, and
update the workflow step to seed that file (or run a second seed) so the
enterprise health check runs against v4 schema paths.

In @.github/workflows/recovery-drill.yml:
- Around line 31-35: Replace the inline vitest invocation in the GitHub Actions
step (the multi-line run block that creates .tmp and calls npm run test with
explicit test files) with the packaged npm script ops:recovery-drill; update the
run command to call npm run ops:recovery-drill and forward any needed reporter
flags (e.g., --reporter and --outputFile) so the workflow uses the single source
of truth (npm script) instead of duplicating the test target list.

In @.github/workflows/retention-maintenance.yml:
- Line 16: The job-level env uses runner.temp which is unavailable at job scope;
move setting CODEX_MULTI_AUTH_DIR out of the job-level env and instead set it in
the first workflow step by writing CODEX_MULTI_AUTH_DIR=${{ runner.temp
}}/codex-retention-root to $GITHUB_ENV (e.g., in the first step's run block echo
"CODEX_MULTI_AUTH_DIR=${{ runner.temp }}/codex-retention-root" >> $GITHUB_ENV)
so subsequent steps can access the variable; update/remove the job-level
CODEX_MULTI_AUTH_DIR entry accordingly.

In @.gitleaks.toml:
- Around line 8-17: The current .gitleaks.toml allowlist block uses OR semantics
(path OR regex) which lets secrets under test/ escape; change to using per-entry
[[allowlists]] with condition = "AND" so both path and regex must match (create
entries that pair path '^test/security/fixtures[\\/]' with the appropriate
regexes), narrow the path patterns from '^test[\\/]'' to
'^test/security/fixtures[\\/]' to limit scope, pin the gitleaks action to
v8.25.0+ (or update CI test expectations) so AND is supported, and add/update a
regression test in test/security/secret-scan-regression.test.sh to cover
Windows-style paths using [\\/] to ensure cross-platform behavior is correct.

In `@docs/configuration.md`:
- Line 69: Add a migration subsection to upgrade.md that documents migrating
existing deployments from plaintext to keychain mode: explain the
CODEX_SECRET_STORAGE_MODE options (`plaintext`, `keychain`, `auto`), provide
step-by-step migration commands/configuration changes to switch the env var and
safely migrate secrets (use the plaintext→keychain flow exercised in
test/storage-v4-keychain.test.ts as the authoritative sequence), note any
required downtime or backup steps, and include Windows-specific guidance about
filesystem locks and retry/backoff behavior (EBUSY, EPERM) referencing the error
handling described in releases/v0.1.2.md so operators know to retry or stop
interfering processes during migration.

In `@docs/operations/incident-response.md`:
- Around line 81-85: The Drill Cadence docs are missing the new npm script;
update docs/operations/incident-response.md (Drill Cadence section) to include a
step that explicitly instructs operators to run the new CLI script by name: npm
run ops:recovery-drill, and mention that this script is part of the drill
workflow (referencing incident-drill-template.md if needed) so README/docs
reflect the actual CLI surface and operators know to run the recovery drill
command during monthly tabletop drills.

In `@lib/audit.ts`:
- Around line 124-139: The purge throttle timestamp (lastPurgeAttemptMs) is
being updated before the directory read, so a transient read failure (e.g.,
Windows EBUSY) blocks further purges for PURGE_INTERVAL_MS; move the assignment
of lastPurgeAttemptMs to after the successful withRetryableAuditFsOperation(()
=> readdirSync(auditConfig.logDir)) call so it only advances when the read
succeeds, keep the existing retentionDays/cutoffMs calculation logic intact, and
ensure the change touches the symbols lastPurgeAttemptMs,
withRetryableAuditFsOperation, readdirSync, PURGE_INTERVAL_MS in lib/audit.ts;
update/add vitest coverage to simulate EBUSY/429 read failures and verify
throttling is not applied on failed reads per the lib/** concurrency/Windows IO
guidelines.
- Line 207: The write to audit log should be wrapped with the existing retry
helper and the chmod should be guarded on Windows: replace the direct
writeFileSync(logPath, line, ...) with a call inside
withRetryableAuditFsOperation(() => writeFileSync(logPath, line, { encoding:
"utf8", flag: "a", mode: 0o600 })), and after successful write call
fs.chmodSync(logPath, 0o600) only if process.platform !== "win32"; also add a
Vitest in test/audit.test.ts that calls auditLog and asserts the file mode is
0o600 (or skips the check on win32) to cover the regression.

In `@lib/codex-manager.ts`:
- Around line 4082-4085: The audit call is currently passing String(error) into
metadata which may leak tokens or sensitive data; replace that with a sanitized
error string instead of raw exception text: create or use a sanitizer (e.g.,
sanitizeError) and pass a safe summary such as sanitizedError =
sanitizeError(error) || error?.message?.slice(0,200) || "error"; then update the
auditLog call (auditLog(action, "cli-user", resource, AuditOutcome.FAILURE, {
command, error: sanitizedError })) so only non-sensitive, truncated error
summaries or generic error codes are stored rather than the full
exception/stack.

In `@lib/secrets/token-store.ts`:
- Around line 202-205: The current cleanup short-circuits when
deleteSecretRefWithRetry throws for refs.refreshTokenRef, skipping the access
token deletion; change the logic in token-store (function that performs these
deletions where refs.refreshTokenRef and refs.accessTokenRef are used) so both
deleteSecretRefWithRetry calls are invoked regardless of the other's failure
(e.g., run sequentially with independent try/catch blocks and aggregate/log
errors without leaking token/email values), and add a deterministic vitest
regression in test/token-store.test.ts that stubs/fakes deleteSecretRefWithRetry
to throw on the refreshTokenRef call and asserts that deleteSecretRefWithRetry
was still called for accessTokenRef (and that errors are handled/recorded as
expected). Ensure the test targets the same function/flow that uses
refs.refreshTokenRef and refs.accessTokenRef and cite the new test in the change
description.

In `@lib/storage.ts`:
- Around line 914-921: The activeIndexByFamily map is not adjusted when accounts
are skipped during v4 hydration; update the object before calling
normalizeAccountStorage by remapping each family index
(data.activeIndexByFamily) with the same offset used for activeIndex
(skippedBeforeActiveIndex): for every (family, idx) if idx is a number compute
newIdx = Math.max(0, idx - skippedBeforeActiveIndex) and also clamp newIdx to
hydratedAccounts.length - 1, preserving non-numeric/undefined values as-is, then
pass this remapped object to normalizeAccountStorage alongside
adjustedActiveIndex; apply this transform where activeIndexByFamily is currently
forwarded (near adjustedActiveIndex and normalizeAccountStorage).

In `@scripts/audit-log-forwarder.js`:
- Around line 72-93: The resume logic can drop events when a rotating filename
(e.g., "audit.log" -> "audit.1.log") sorts before the checkpoint filename:
update collectBatch/ fileComesBefore logic so checkpoints are not compared
purely by lexicographic filename. Specifically, modify
fileComesBefore/collectBatch to detect rotated aliases (e.g., match base name
patterns like "audit.log" vs "audit.1.log") or resolve stable identities before
ordering (use inode/device or canonical file identifier when available) and only
skip files when their stable identity is earlier than checkpoint.file; also
ensure the checkpoint comparison inside collectBatch (checkpoint.file === file
&& lineNumber <= checkpoint.line) remains based on the same stable identity.
This will prevent skipping unforwarded tail lines after rotation.
- Around line 125-133: The POST in sendBatch currently does a blocking fetch
without timeout or retries; wrap the fetch call in an AbortController-based
timeout and implement retry with exponential backoff + jitter for transient
429/5xx and network errors (retry limit e.g., 3) before throwing; ensure
failures/timeouts from sendBatch propagate so the checkpoint handling
(checkpointMode flag logic) advances or fails deterministically rather than
hanging. Add unit/integration tests: one that simulates a hanging endpoint to
assert sendBatch times out, one for the Windows edge case around the
checkpointMode flag behavior at the checkpoint logic, and a concurrency test
that launches multiple instances to exercise checkpoint file race handling and
confirm recovery/locking behavior. Use the sendBatch function name and the
checkpointMode/checkpoint file handling code paths as the touch points.

In `@scripts/compliance-evidence-bundle.js`:
- Around line 51-59: The runNpm function uses execFileSync for both Windows
(cmd.exe) and non-Windows paths but doesn't pass a maxBuffer option, causing
crashes on verbose npm output; update both execFileSync calls in runNpm to
include a sufficiently large maxBuffer (e.g., 10*1024*1024 or 30*1024*1024) in
the options passed (merge with the existing options object) so stdout/stderr
won't overflow, and add regression tests invoking verbose npm commands (e.g.,
long test/build/lint output) against scripts/compliance-evidence-bundle.js to
ensure the buffer bump prevents CI failures.

In `@scripts/enterprise-health-check.js`:
- Around line 159-161: The newestMtimeMs routine has a TOCTOU race: check of
existsSync(dir) can pass but readdir(dir) can throw if the directory is
removed/renamed concurrently; wrap the readdir call in a try/catch inside
newestMtimeMs (or the function that contains the shown snippet), treat errors
with code ENOENT or ENOTDIR as “no audit data” (return null) rather than letting
the exception bubble, and only rethrow unexpected errors; also add a regression
test in test/enterprise-health-check.test.ts (near the existing health-check
test around line ~114) that simulates audit-dir churn and asserts newestMtimeMs
(or the health-check behavior) returns null / marks no-audit-data instead of
throwing.

In `@scripts/generate-sbom.js`:
- Around line 14-23: The execFileSync calls in generate-sbom.js (the branches
that use process.execPath + npmExecPath and the fallback execFileSync("npm",
...)) must include a larger stdout/stderr buffer to avoid
ERR_CHILD_PROCESS_STDIO_MAXBUFFER: add maxBuffer: 20 * 1024 * 1024 to the
options object passed to both execFileSync calls (the ones called with
npmExecPath and with "npm"), and change the fallback logic to only call
execFileSync("npm", ...) when npmExecPath is falsy (i.e., prefer using the
resolved npmExecPath variable); update references to npmExecPath, sbomArgs and
the two execFileSync invocations accordingly. Also add a regression test that
runs scripts/generate-sbom.js against a realistic dependency graph (or mocks a
large stdout from the child process) to assert the script completes without
hitting the maxBuffer error.

In `@scripts/keychain-assert.js`:
- Around line 26-30: The execFileSync call using literal "npm" can fail on
Windows; update the fallback used in the block that calls execFileSync("npm",
args, { cwd: process.cwd(), stdio: "inherit", env }) to choose the correct
executable: prefer process.env.npm_execpath when present, otherwise use a
platform-aware name (e.g., const npmBin = process.platform === "win32" ?
"npm.cmd" : "npm") and call execFileSync(npmBin, args, { cwd: process.cwd(),
stdio: "inherit", env }). Keep the same args and env variables and ensure you
reference the existing variables (args, env) and the execFileSync invocation so
only the executable string is changed.

In `@scripts/performance-budget-check.js`:
- Around line 27-29: Duplicate helper isRecord in
scripts/performance-budget-check.js currently mirrors lib/utils.ts:10-12; for
now you can leave the local function to keep the script standalone, but if this
script is later incorporated into the build graph, remove the local isRecord and
import the canonical implementation (e.g., import { isRecord } from 'lib/utils')
instead; locate the local function named isRecord in performance-budget-check.js
and either retain it as-is for standalone execution or replace it with an import
and delete the duplicate implementation when you move the script into the
bundled codebase.

In `@scripts/retention-cleanup.js`:
- Around line 99-110: The script currently builds payload with failedFiles and
status but always exits 0; change the end of the routine that constructs payload
(the block that creates payload, including properties command, root,
retentionDays, cutoffIso, deletedFiles, failedFiles, failures, status) so that
if failed.length > 0 (i.e., payload.failedFiles > 0) the process exits with a
non-zero code (e.g., process.exit(1)) or throw an error; keep the existing
"partial" status but ensure the failure causes a failing exit code, and add a
unit test in test/retention-cleanup.test.ts that simulates a deletion failure
and asserts the script exits non‑zero when failedFiles > 0.
- Around line 50-52: The current check-then-use in collectExpiredFiles around
existsSync(rootPath) + readdir(rootPath, { withFileTypes: true }) has a race
that can throw if the directory is removed concurrently; wrap the readdir call
in a try/catch inside collectExpiredFiles (or the surrounding async block) and
treat ENOENT (and optionally EPERM on Windows) as a transient condition by
skipping/returning gracefully instead of letting the exception abort the run;
keep the existsSync check if desired but ensure the catch handles these errors
and only rethrows other unexpected errors.

In `@scripts/slo-budget-report.js`:
- Around line 106-108: staleWalFindings is only counting findings with code
"stale-wal" but tests/assertions expect "stale-audit-log" as the stale audit
regression code; update the filter on health.findings used to compute
staleWalFindings to include both codes (check finding.code === "stale-wal" ||
finding.code === "stale-audit-log") or normalize/rename the code to match the
test, ensuring the variable staleWalFindings, the health.findings array and the
finding.code checks are updated consistently so the SLO evaluation counts the
intended stale-audit-log findings.

In `@test/audit.test.ts`:
- Around line 237-242: The test mocks Date.now with an out-of-range value
(Number.MAX_SAFE_INTEGER - 1_000) which can produce invalid timestamps for
utimesSync; replace that with a deterministic in-range epoch (e.g. a constant
like 1600000000000 or Date.parse("2020-09-13T12:26:40Z")) and use vi.spyOn(Date,
"now").mockReturnValue(IN_RANGE_EPOCH) so staleMs = Date.now() - 3*24*60*60*1000
and utimesSync(staleLogPath, staleDate, staleDate) receive valid dates; update
the Date.now spy in the block around staleLogPath/writeFileSync/utimesSync
accordingly and keep the deterministic vitest spy usage.

In `@test/schemas.test.ts`:
- Around line 320-330: The test suite currently exercises
AnyAccountStorageSchema for a v4 shape but does not add a regression test for
the helper safeParseAccountStorage; add a deterministic Vitest regression that
calls safeParseAccountStorage with an explicit v4 payload (version: 4, accounts
with refreshTokenRef/addedAt/lastUsed, activeIndex) and assert the parse
succeeds and result.version === 4 so the helper wiring cannot regress
independently of the union schema (reference symbols: AnyAccountStorageSchema
and safeParseAccountStorage; place next to the existing schemas tests).

In `@test/security/secret-scan-regression.test.sh`:
- Around line 58-69: The Docker fallback in the secret-scan test pins the
gitleaks image to zricethezav/gitleaks:v8.24.2; add a matching explicit version
note for the native binary usage referenced earlier (the native gitleaks path
used around the native binary invocation) so both paths declare the expected
gitleaks version; update the test script (e.g., near the native binary variable
or the native invocation) to include a comment or variable like
EXPECTED_GITLEAKS_VERSION="v8.24.2" and reference it in the native path/usage to
ensure parity with the docker image reference used in the docker run command.
- Around line 96-98: The pass-case invocation of run_gitleaks_detect (using
PASS_CASE_DIR and PASS_REPORT) prints "secret-scan-regression: passed"
unconditionally; capture and check its exit status like the fail-case does so
failures are detected and logged. Modify the block that calls
run_gitleaks_detect to save its exit code ($?), test it (e.g., if [ $rc -ne 0 ]
...), and print a clear failure message and non-zero exit/return on error,
otherwise print the existing success message; reference the run_gitleaks_detect
call and the PASS_CASE_DIR/PASS_REPORT variables when implementing the check.

---

Outside diff comments:
In `@lib/codex-manager.ts`:
- Around line 4065-4150: Add vitest regression tests that exercise the new audit
wrapper path: write tests that invoke runWithAudit indirectly via
runCodexMultiAuthCli (using commands like "auth login", "auth switch", and a
command that returns non-zero) and directly mock/spy auditLog to assert three
cases—successful exit (exitCode 0), non-zero exit (exitCode != 0), and thrown
error—verifying the AuditOutcome and metadata shape (fields: command, exitCode
or error) are logged correctly; use spies/mocks on auditLog and the called
runners (runAuthLogin, runSwitch, runReport, etc.) to simulate each runner
behavior and restore originals after each test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52e0294c-22c1-458a-8b40-ed47482b82ce

📥 Commits

Reviewing files that changed from the base of the PR and between 57fa543 and d0fd29f.

📒 Files selected for processing (46)
  • .github/workflows/ci.yml
  • .github/workflows/recovery-drill.yml
  • .github/workflows/release-provenance.yml
  • .github/workflows/retention-maintenance.yml
  • .github/workflows/sbom-attestation.yml
  • .github/workflows/secret-scan.yml
  • .gitleaks.toml
  • README.md
  • SECURITY.md
  • config/performance-budgets.json
  • config/performance-budgets.schema.json
  • config/slo-policy.json
  • docs/README.md
  • docs/configuration.md
  • docs/operations/audit-forwarding.md
  • docs/operations/incident-drill-template.md
  • docs/operations/incident-response.md
  • docs/operations/release-runbook.md
  • docs/operations/slo-error-budget.md
  • docs/privacy.md
  • docs/reference/settings.md
  • lib/audit.ts
  • lib/codex-manager.ts
  • lib/schemas.ts
  • lib/secrets/token-store.ts
  • lib/storage.ts
  • lib/unified-settings.ts
  • package.json
  • scripts/audit-log-forwarder.js
  • scripts/benchmark-runtime-path.mjs
  • scripts/compliance-evidence-bundle.js
  • scripts/enterprise-health-check.js
  • scripts/generate-sbom.js
  • scripts/keychain-assert.js
  • scripts/performance-budget-check.js
  • scripts/retention-cleanup.js
  • scripts/slo-budget-report.js
  • scripts/verify-sbom.js
  • test/audit-retry.test.ts
  • test/audit.test.ts
  • test/enterprise-health-check.test.ts
  • test/schemas.test.ts
  • test/security/secret-scan-regression.test.sh
  • test/storage-v4-keychain.test.ts
  • test/token-store.test.ts
  • test/unified-settings.test.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
docs/**

⚙️ CodeRabbit configuration file

keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

Files:

  • docs/operations/slo-error-budget.md
  • docs/README.md
  • docs/operations/release-runbook.md
  • docs/operations/incident-drill-template.md
  • docs/configuration.md
  • docs/privacy.md
  • docs/operations/incident-response.md
  • docs/operations/audit-forwarding.md
  • docs/reference/settings.md
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/unified-settings.test.ts
  • test/audit-retry.test.ts
  • test/storage-v4-keychain.test.ts
  • test/enterprise-health-check.test.ts
  • test/token-store.test.ts
  • test/security/secret-scan-regression.test.sh
  • test/schemas.test.ts
  • test/audit.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/unified-settings.ts
  • lib/codex-manager.ts
  • lib/schemas.ts
  • lib/audit.ts
  • lib/secrets/token-store.ts
  • lib/storage.ts
🧬 Code graph analysis (13)
scripts/benchmark-runtime-path.mjs (1)
lib/storage.ts (1)
  • normalizeAccountStorage (776-865)
test/unified-settings.test.ts (1)
lib/unified-settings.ts (3)
  • getUnifiedSettingsPath (254-256)
  • saveUnifiedPluginConfigSync (289-293)
  • loadUnifiedPluginConfigSync (267-275)
lib/unified-settings.ts (1)
lib/runtime-paths.ts (1)
  • getCodexMultiAuthDir (166-199)
test/audit-retry.test.ts (2)
lib/logger.ts (1)
  • error (389-393)
scripts/audit-dev-allowlist.js (1)
  • audit (150-161)
lib/codex-manager.ts (1)
lib/audit.ts (1)
  • auditLog (180-211)
test/storage-v4-keychain.test.ts (1)
lib/storage.ts (4)
  • saveAccounts (1310-1314)
  • getStoragePath (470-475)
  • loadAccounts (872-874)
  • clearAccounts (1389-1419)
scripts/generate-sbom.js (2)
scripts/audit-dev-allowlist.js (1)
  • process (139-147)
lib/logger.ts (1)
  • error (389-393)
lib/audit.ts (2)
lib/runtime-paths.ts (1)
  • getCodexLogDir (226-228)
scripts/retention-cleanup.js (3)
  • attempt (34-34)
  • retentionDays (71-71)
  • cutoffMs (73-73)
scripts/performance-budget-check.js (1)
lib/utils.ts (1)
  • isRecord (11-13)
scripts/compliance-evidence-bundle.js (3)
scripts/audit-log-forwarder.js (3)
  • prefix (13-13)
  • hit (14-14)
  • dryRun (137-137)
lib/logger.ts (1)
  • error (389-393)
scripts/performance-budget-check.js (1)
  • message (35-35)
lib/secrets/token-store.ts (2)
lib/logger.ts (2)
  • createLogger (366-424)
  • error (389-393)
lib/utils.ts (1)
  • sleep (65-67)
test/schemas.test.ts (1)
lib/schemas.ts (2)
  • AccountStorageV4Schema (165-170)
  • AnyAccountStorageSchema (210-214)
test/audit.test.ts (1)
lib/audit.ts (2)
  • configureAudit (65-67)
  • auditLog (180-211)
🪛 actionlint (1.7.11)
.github/workflows/retention-maintenance.yml

[error] 16-16: context "runner" is not allowed here. available contexts are "github", "inputs", "matrix", "needs", "secrets", "strategy", "vars". see https://docs.github.com/en/actions/learn-github-actions/contexts#context-availability for more details

(expression)

🪛 LanguageTool
docs/operations/incident-response.md

[style] ~39-~39: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...rst, then codex auth doctor --fix. 4. If auth failures persist, rotate account v...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~40-~40: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...dex>and re-runcodex auth check`. 5. If all accounts are exhausted/disabled, es...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🔇 Additional comments (18)
scripts/benchmark-runtime-path.mjs (1)

109-134: good benchmark coverage for storage normalization and account-id resolution.

this adds realistic storage pressure cases and a correctness guard (version === 3) around normalizeAccountStorage in scripts/benchmark-runtime-path.mjs:177-180. this is a solid addition for perf regression tracking.

Also applies to: 170-180

test/unified-settings.test.ts (1)

6-18: strong regression coverage for windows-safe secure writes and overwrite hardening.

test/unified-settings.test.ts:6-18 and test/unified-settings.test.ts:103-115 close the windows/temp-leak and repeated-write permission regressions with deterministic vitest assertions.

Also applies to: 68-69, 100-115

test/audit-retry.test.ts (1)

20-77: good concurrency regression test for retry behavior.

test/audit-retry.test.ts:20-77 deterministically reproduces ebusy retries and verifies the non-blocking path (atomics.wait not used). this is the right regression guard for the incident fix.

scripts/verify-sbom.js (1)

12-47: sbom validation flow is clean and ci-friendly.

scripts/verify-sbom.js:12-47 has clear structural checks and machine-readable output for pipeline gating.

package.json (1)

62-74: script surface expansion looks consistent with the enterprise ops goals.

the new ops:*, sbom:*, and perf scripts in package.json:62-74 are coherent and the optional keytar wiring in package.json:134-136 fits the keychain storage path.

Also applies to: 134-136

.github/workflows/sbom-attestation.yml (1)

10-48: workflow wiring looks good.

permissions and push-only attestation gate are correctly scoped for sbom generation + provenance attestation.

.github/workflows/release-provenance.yml (1)

30-59: release gate sequence and publish auth are in place.

the quality gates, sbom checks, and NODE_AUTH_TOKEN wiring are consistent with release provenance expectations.

lib/schemas.ts (1)

143-173: v4 schema integration looks solid.

lib/schemas.ts:148-149 correctly enforces non-empty token refs, and lib/schemas.ts:210-214 cleanly extends version discrimination to v4 for migration parsing.

Also applies to: 208-214

.github/workflows/ci.yml (1)

58-59: keychain assertion step placement looks correct.

runs after build, uses dedicated npm script. the step will fail fast if keychain storage contract is broken, which is the right behavior for enterprise readiness gating.

scripts/keychain-assert.js (1)

33-37: error handling is reasonable.

the try-catch propagates the child process exit status correctly. execFileSync throws an error with status property on non-zero exit, so error?.status will capture test failures.

scripts/performance-budget-check.js (1)

40-96: validation and error handling look solid.

past concerns about missing existence checks (lines 42-47) and defensive json parsing (lines 48-53) are addressed. the violation detection loop and missing-benchmark flagging logic at lines 57-83 is correct.

test/security/secret-scan-regression.test.sh (1)

81-94: node assertion block is well-structured.

the inline node script at lines 81-94 validates three conditions: non-empty findings, presence of src/leak.txt, and absence of allowlisted test/fixture.txt. this provides strong regression coverage for scanner policy drift. as per coding guidelines for test/**, the assertions are explicit and not skipped.

docs/operations/slo-error-budget.md (1)

40-50: slo documentation references are accurate; scripts verified.

both npm run ops:slo-report in package.json and the underlying scripts/slo-budget-report.js file exist and match the documented workflows at docs/operations/slo-error-budget.md:40-50. the two command variants shown—npm script for report generation and direct invocation with --enforce for gate enforcement—are consistent with actual implementation.

config/performance-budgets.json (1)

1-9: approve - units are well documented.

the seven budgets use consistent naming. unit documentation is comprehensive: config/performance-budgets.schema.json explicitly states "all values are milliseconds" in the root description and each property, while scripts/performance-budget-check.js:7 reiterates the same. thresholds look reasonable.

note: verify regression tests exist for the performance-budget-check script itself to prevent silent failures during ci runs.

.github/workflows/secret-scan.yml (1)

1-33: action shas verified; regression test comprehensive.

both actions pinned to valid, recent commits: actions/checkout v4.3.1 (nov 2025) and gitleaks-action v2.3.9 (apr 2025). permissions appropriately minimal.

regression test at test/security/secret-scan-regression.test.sh:1-98 is solid: creates fail-case with openai api key leak pattern and pass-case with allowlisted fixture, verifies detection works and allowlist rules hold via cli and docker fallback. test uses node assertions on json output.

no windows edge cases apply here. no concurrency risks: single job, sequential steps. workflow structure is correct.

test/enterprise-health-check.test.ts (1)

56-155: good deterministic coverage for fallback and stale-audit paths.

test/enterprise-health-check.test.ts:56 (Line 56), test/enterprise-health-check.test.ts:75 (Line 75), and test/enterprise-health-check.test.ts:114 (Line 114) give strong regression coverage for missing artifacts, fallback root resolution, and stale audit detection.

test/token-store.test.ts (1)

125-241: keychain race, rollback, and cleanup coverage is in place.

test/token-store.test.ts:125 (Line 125), test/token-store.test.ts:177 (Line 177), and test/token-store.test.ts:211 (Line 211) add deterministic regression coverage for concurrent writes, access-write rollback, and secret deletion cleanup.

test/storage-v4-keychain.test.ts (1)

112-323: strong regression coverage for windows io and keychain rollback paths.

test/storage-v4-keychain.test.ts:112 (Line 112), test/storage-v4-keychain.test.ts:167 (Line 167), and test/storage-v4-keychain.test.ts:206 (Line 206) cover torn-write risk, windows-style rename retry, and rollback cleanup in deterministic vitest scenarios.

Address remaining PR #43 review feedback across workflows, docs, runtime hardening, and script reliability. Adds deterministic regressions for rotation/checkpoint edge cases, retention failure exits, storage/token cleanup behavior, unified settings permissions, and schema/parser coverage.

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 24

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/ci.yml (1)

20-27: 🧹 Nitpick | 🔵 Trivial

consider pinning actions to commit shas for consistency with secret-scan.yml.

.github/workflows/secret-scan.yml:19,24 pins actions to full commit shas but this workflow uses mutable version tags (actions/checkout@v4, actions/setup-node@v4). for supply chain consistency, consider pinning these as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 20 - 27, The workflow uses mutable
tags actions/checkout@v4 and actions/setup-node@v4; replace those with the
corresponding full commit SHAs (the same pattern used in secret-scan.yml) to pin
versions for supply-chain consistency: update the uses lines for
actions/checkout and actions/setup-node to their full commit SHA references
instead of `@v4` and keep the existing inputs (node-version, cache) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 62-65: Extract the inline one-liner into a dedicated Node script
(e.g., scripts/seed-health-fixture.js) that uses fs.mkdirSync, path.join and
fs.writeFileSync to create the .tmp/health-fixture directory, logs subdir, and
the three JSON files; then update the workflow step (formerly running node -e
"...") to run node scripts/seed-health-fixture.js. In the new script, build
paths with path.join and use recursive mkdir and explicit JSON.stringify calls
(matching the existing object shapes and filenames openai-codex-accounts.json,
settings.json, and logs/audit.log) so the logic is maintainable and
Windows-safe.

In @.github/workflows/recovery-drill.yml:
- Around line 36-37: The health-check step currently redirects only stdout using
the command "node scripts/enterprise-health-check.js >
.tmp/recovery-drill-health.json" so stderr (warnings/errors) are lost; update
the step to capture both streams by redirecting stderr into stdout (e.g., append
"2>&1" to the command) or use a tool like tee to persist combined output,
ensuring the "Run health check snapshot" command writes both stdout and stderr
into .tmp/recovery-drill-health.json for later analysis.

In `@docs/operations/audit-forwarding.md`:
- Around line 45-60: The delivery contract needs an added section describing
retry behavior on delivery failures: inspect scripts/audit-log-forwarder.js
(look for the functions that perform sending/forwarding and checkpointing) and
document whether the forwarder retries on network/SIEM errors, the max retry
count, backoff strategy (fixed/exponential and intervals), behavior on permanent
failures (when it stops retrying and requires manual re-run), and how
checkpoints (`file`, `line`, `updatedAt`) are updated/rolled back during
retries; add this as a short "Failure & Retry Behavior" subsection that
references the actual behavior and any configurable parameters found in that
script.
- Around line 17-19: Rephrase the `CODEX_SIEM_API_KEY` bullet to remove the
circular wording: change "optional bearer token, if required by SIEM" to a clear
condition such as "bearer token required when the SIEM endpoint enforces
authentication (provide API key here)" so it states when `CODEX_SIEM_API_KEY`
must be set and when it can be omitted.

In `@docs/operations/incident-drill-template.md`:
- Around line 61-66: Update the Exit Criteria checklist item that reads "health
check returned `pass`" to explicitly reference the JSON field and expected
value—e.g., change it to something like `health check returned status: "pass"
(check JSON "status" field)` so operators know to inspect the health check JSON
`status` field; modify the string in the document where the checklist item is
defined (the line containing "health check returned `pass`") to include this
clearer text and/or a short reference to the JSON structure.

In `@docs/upgrade.md`:
- Around line 75-80: Move the keychain availability validation before
instructing operators to set CODEX_SECRET_STORAGE_MODE; specifically, place the
`npm run ops:keychain-assert` step prior to the line that tells users to set
CODEX_SECRET_STORAGE_MODE=keychain (or auto), or alternatively add an explicit
note next to CODEX_SECRET_STORAGE_MODE advising to run `npm run
ops:keychain-assert` first to confirm keychain availability; reference the env
var name CODEX_SECRET_STORAGE_MODE and the npm script ops:keychain-assert when
updating the docs so readers clearly validate before choosing "auto".

In `@lib/audit.ts`:
- Around line 118-131: withRetryableAuditFsOperation currently retries up to 5
times with no delay; update it to perform exponential backoff between attempts
so transient EBUSY/contention errors can clear. Inside
withRetryableAuditFsOperation, after catching a retryable error (use
isRetryableAuditFsError) and before the next loop iteration, add a delay that
grows with attempt (e.g., baseSleepMs * 2**attempt) or mirror the exponential
backoff used in unified-settings.ts, and ensure the final
non-retryable/last-attempt behavior still throws the original error (or
lastError) as before.

In `@lib/codex-manager.ts`:
- Around line 4065-4072: sanitizeAuditError currently redacts token patterns but
can leak email addresses; update sanitizeAuditError to also mask emails by
applying the existing maskEmail(email: string) utility (used elsewhere) or an
equivalent regex replacement before truncation, e.g., call maskEmail on the raw
string or run a regex like /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g
to replace matches with a masked value, keep the existing token redactions and
slice(0, 200) behavior, and ensure references to sanitizeAuditError and
maskEmail are used to locate the change.

In `@lib/secrets/token-store.ts`:
- Around line 69-71: Remove the duplicated local sleep function in
token-store.ts and instead import and use the shared sleep utility from
lib/utils (the same helper currently defined at utils.ts lines ~64-66);
specifically delete the local function named sleep and add an import for the
sleep function, then update any local references to call the imported sleep to
eliminate duplication (refer to the sleep symbol in token-store.ts and the sleep
export in utils).

In `@README.md`:
- Line 189: The markdown table row for CODEX_SECRET_STORAGE_MODE contains
unescaped pipe characters in the description that break the table; update the
README entry for CODEX_SECRET_STORAGE_MODE to escape the pipes or wrap the
entire value in code spans/backticks (e.g., escape "|" as "\|" or use
"`keychain|plaintext|auto`") so the table renders as the correct number of
columns and the description no longer splits into extra cells.

In `@scripts/audit-log-forwarder.js`:
- Around line 231-250: withCheckpointLock currently acquires a lock via
open(lockPath, "wx") and retries, but lacks Windows EPERM handling, stale-lock
detection, and an overall timeout; update withCheckpointLock to (1) write the
current process pid into the lock file after successful open so we can validate
authorship, (2) on failed open, if error.code === "EPERM" treat it like
contention on Windows (same backoff path), (3) when seeing an existing lock
file, stat it and check its age (mtime) against a configurable STALE_LOCK_MS and
if older then attempt to read pid and validate process existence (where
possible) and remove the stale lock before retrying, and (4) enforce a total
timeout/attempt cap (or compute total waited time) to throw a clear timeout
error instead of waiting indefinitely; reference the withCheckpointLock function
and CHECKPOINT_LOCK_MAX_ATTEMPTS/ sleep/backoff logic when implementing these
changes.

In `@scripts/enterprise-health-check.js`:
- Around line 186-208: Remove the redundant existsSync call in function
checkSecureMode and rely solely on await stat(path); update the catch block to
explicitly handle ENOENT by skipping/returning when error.code === "ENOENT" and
only push a "stat-failed" finding for other errors; keep the rest of the
permission-check logic (details.mode & 0o777 comparison and pushing the
"insecure-file-permissions" finding) unchanged so that TOCTOU is avoided and
deleted/missing files are treated as non-errors.

In `@scripts/slo-budget-report.js`:
- Around line 55-76: The runHealthCheck function calls execFileSync without a
timeout and builds the error message only from stdout/stderr, which can hang CI
and produce empty findings for spawn errors; change runHealthCheck to invoke
execFileSync with a timeout and appropriate kill signal/options (e.g., timeout
and killSignal/maxBuffer) when running "scripts/enterprise-health-check.js" and,
in the catch block, fall back to error.message (or error.toString()) when
error.stdout/error.stderr are empty so the finding contains actionable text;
also add unit tests for runHealthCheck to cover a stalled child (timeout), spawn
ENOENT/permission errors, and Windows behavior to prevent regressions.

In `@test/audit-retry.test.ts`:
- Around line 40-48: The test's fs mock in vi.doMock is missing chmodSync
causing lib/audit.ts (which calls chmodSync(logPath, 0o600)) to hit the real fs
or throw on POSIX; update the mock returned by vi.doMock in
test/audit-retry.test.ts to include chmodSync (a no-op or stub) alongside
writeFileSync, mkdirSync, existsSync, statSync, renameSync, readdirSync, and
unlinkSync so the chmodSync call in the audit code is handled by the mock during
tests.

In `@test/compliance-evidence-bundle.test.ts`:
- Around line 48-49: The test uses a hardcoded "node" call in the fake npm shims
(variables npmShellSource and npmCmdSource) which can pick up different node
executables from PATH and cause flakes; change both shims to invoke the exact
node binary via process.execPath (use process.execPath.replace(/\\/g, "/") for
the POSIX shim and process.execPath for the Windows shim) so the launched
fake-npm runs the same Node used for the test runner, keeping tests
deterministic.
- Around line 71-88: Add a dedicated Windows regression test in
test/compliance-evidence-bundle.test.ts that explicitly exercises the Windows
cmd.exe npm invocation path used by runNpm(): create a fake npm shim named
npm.cmd (or a cmd.exe-wrapper) in the fake bin (reusing createFakeNpmBin or a
small variant) that emits very verbose output, set PATH to point to that bin and
ensure process.platform is treated as 'win32' for the test (or use the test
harness to simulate Windows behavior), then spawn the scriptPath as in the
existing test and assert the run completes without a maxBuffer-related failure;
ensure you restore any mutated globals/env (process.platform override and PATH)
after the test.

In `@test/retention-cleanup.test.ts`:
- Around line 11-26: The removeWithRetry function is duplicated; extract the
logic into a single shared test utility module and export a function named
removeWithRetry, then replace the duplicated implementations in the tests with
an import of that exported removeWithRetry; ensure the exported function
preserves the same signature (async function removeWithRetry(targetPath:
string): Promise<void>) and behavior (same retryableCodes, ENOENT handling,
exponential backoff) and update any test imports to use the new shared utility.
- Around line 80-86: The test declares `let result;` with an implicit any —
explicitly type it to match the return of `runRetentionCleanup` (the `spawnSync`
result) to satisfy `noImplicitAnyLet`: e.g. import and use
`SpawnSyncReturns<string>` from 'child_process' and change the declaration to
`let result: SpawnSyncReturns<string>;` (or use `let result: ReturnType<typeof
runRetentionCleanup>;`) so the variable has the correct typed return shape.
- Around line 65-69: Replace the inline Windows skip in the test block named
"exits non-zero when deletions fail" by using Vitest's conditional skip helper:
remove the dummy assertion and early return inside the it(...) and instead
change the test declaration to use it.skipIf(process.platform === "win32", ...)
so the test is skipped on Windows; update the test identifier (the string "exits
non-zero when deletions fail") and preserve the existing async callback body and
assertions unchanged.

In `@test/schemas.test.ts`:
- Around line 241-270: Add a unit test in the AccountStorageV4Schema describe
block that verifies v4 accepts the optional activeIndexByFamily field: create a
storage object based on validStorage, add an activeIndexByFamily object (e.g., {
codex: 0, legacy: 0 }), call AccountStorageV4Schema.safeParse(storage) and
assert result.success is true; place the test near the other V4 tests so it
exercises the optional activeIndexByFamily handling.

In `@test/security/secret-scan-regression.test.sh`:
- Around line 27-35: Replace the realistic-looking OPENAI_API_KEY values in the
test fixtures (files written under "${FAIL_CASE_DIR}/src/leak.txt" and
"${FAIL_CASE_DIR}/test/security/fixtures/real-secret.txt") with clearly
synthetic placeholders that cannot be mistaken for production secrets (e.g.
"OPENAI_API_KEY=sk-test-placeholder-0000" or
"OPENAI_API_KEY=PLACEHOLDER_API_KEY") while preserving the fixture structure and
intent of the regression (so tests still exercise detection of secret-like
patterns); ensure the other fixture
("${FAIL_CASE_DIR}/test/security/fixtures/fixture.txt") remains deterministic
and do not remove assertions or change test semantics, only change token strings
to non-production-looking synthetic values.
- Around line 1-4: The shell harness file secret-scan-regression.test.sh should
not live under test/**; move that script to scripts/ (e.g.,
scripts/secret-scan-regression.sh) and create a Vitest test (e.g.,
secret-scan-regression.test.ts) that either invokes the moved script via a
controlled runner (execa/spawn) or, better, ports the shell checks into Vitest
assertions using expect; ensure the Vitest test enforces determinism (no real
secrets, no skipped assertions), reproduces required regression conditions
(concurrency/token-refresh/windows cases as needed), and returns proper
pass/fail status through Vitest reporting rather than executing a standalone
shell harness under test/**.

In `@test/token-store.test.ts`:
- Around line 148-158: The test's race setup is flaky because awaiting
Promise.resolve() does not guarantee the first call to
tokenStore.persistAccountSecrets entered the mock's critical branch; add a
deferred signal (e.g., firstRefreshEntered) that is resolved inside the mock
when refreshCallCount === 1, then await that signal before resolving refreshGate
so the first write is deterministically inside the blocking region; update the
mock and test to create and await firstRefreshEntered around the
refreshCallCount === 1 check and only then call refreshGate.resolve().

---

Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 20-27: The workflow uses mutable tags actions/checkout@v4 and
actions/setup-node@v4; replace those with the corresponding full commit SHAs
(the same pattern used in secret-scan.yml) to pin versions for supply-chain
consistency: update the uses lines for actions/checkout and actions/setup-node
to their full commit SHA references instead of `@v4` and keep the existing inputs
(node-version, cache) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: da374995-84ca-45c4-9649-3bac38e425d3

📥 Commits

Reviewing files that changed from the base of the PR and between 57fa543 and fdfdb2c.

📒 Files selected for processing (52)
  • .github/workflows/ci.yml
  • .github/workflows/recovery-drill.yml
  • .github/workflows/release-provenance.yml
  • .github/workflows/retention-maintenance.yml
  • .github/workflows/sbom-attestation.yml
  • .github/workflows/secret-scan.yml
  • .gitleaks.toml
  • README.md
  • SECURITY.md
  • config/performance-budgets.json
  • config/performance-budgets.schema.json
  • config/slo-policy.json
  • docs/README.md
  • docs/configuration.md
  • docs/operations/audit-forwarding.md
  • docs/operations/incident-drill-template.md
  • docs/operations/incident-response.md
  • docs/operations/release-runbook.md
  • docs/operations/slo-error-budget.md
  • docs/privacy.md
  • docs/reference/settings.md
  • docs/upgrade.md
  • lib/audit.ts
  • lib/codex-manager.ts
  • lib/schemas.ts
  • lib/secrets/token-store.ts
  • lib/storage.ts
  • lib/unified-settings.ts
  • package.json
  • scripts/audit-log-forwarder.js
  • scripts/benchmark-runtime-path.mjs
  • scripts/compliance-evidence-bundle.js
  • scripts/enterprise-health-check.js
  • scripts/generate-sbom.js
  • scripts/keychain-assert.js
  • scripts/performance-budget-check.js
  • scripts/retention-cleanup.js
  • scripts/slo-budget-report.js
  • scripts/verify-sbom.js
  • test/audit-log-forwarder.test.ts
  • test/audit-retry.test.ts
  • test/audit.test.ts
  • test/compliance-evidence-bundle.test.ts
  • test/enterprise-health-check.test.ts
  • test/generate-sbom.test.ts
  • test/retention-cleanup.test.ts
  • test/schemas.test.ts
  • test/security/secret-scan-regression.test.sh
  • test/slo-budget-report.test.ts
  • test/storage-v4-keychain.test.ts
  • test/token-store.test.ts
  • test/unified-settings.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
docs/**

⚙️ CodeRabbit configuration file

keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

Files:

  • docs/operations/release-runbook.md
  • docs/privacy.md
  • docs/operations/incident-drill-template.md
  • docs/upgrade.md
  • docs/operations/audit-forwarding.md
  • docs/operations/slo-error-budget.md
  • docs/configuration.md
  • docs/README.md
  • docs/reference/settings.md
  • docs/operations/incident-response.md
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/schemas.test.ts
  • test/token-store.test.ts
  • test/compliance-evidence-bundle.test.ts
  • test/retention-cleanup.test.ts
  • test/enterprise-health-check.test.ts
  • test/unified-settings.test.ts
  • test/storage-v4-keychain.test.ts
  • test/audit-log-forwarder.test.ts
  • test/audit-retry.test.ts
  • test/slo-budget-report.test.ts
  • test/audit.test.ts
  • test/security/secret-scan-regression.test.sh
  • test/generate-sbom.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/codex-manager.ts
  • lib/secrets/token-store.ts
  • lib/schemas.ts
  • lib/unified-settings.ts
  • lib/audit.ts
  • lib/storage.ts
🧬 Code graph analysis (14)
scripts/benchmark-runtime-path.mjs (1)
lib/storage.ts (1)
  • normalizeAccountStorage (776-865)
test/schemas.test.ts (1)
lib/schemas.ts (3)
  • AccountStorageV4Schema (165-170)
  • AnyAccountStorageSchema (210-214)
  • safeParseAccountStorage (309-315)
test/compliance-evidence-bundle.test.ts (1)
scripts/compliance-evidence-bundle.js (3)
  • outDir (141-141)
  • result (152-152)
  • payload (157-168)
test/retention-cleanup.test.ts (1)
scripts/retention-cleanup.js (5)
  • attempt (34-34)
  • code (39-39)
  • code (55-55)
  • root (81-81)
  • payload (108-117)
test/enterprise-health-check.test.ts (1)
scripts/enterprise-health-check.js (5)
  • code (164-164)
  • root (213-213)
  • payload (279-286)
  • findings (214-214)
  • checks (215-215)
test/unified-settings.test.ts (1)
lib/unified-settings.ts (3)
  • getUnifiedSettingsPath (254-256)
  • saveUnifiedPluginConfigSync (289-293)
  • loadUnifiedPluginConfigSync (267-275)
test/audit-log-forwarder.test.ts (1)
scripts/audit-log-forwarder.js (5)
  • endpoint (266-266)
  • root (269-269)
  • logDir (270-270)
  • checkpointPath (271-271)
  • checkpoint (274-274)
scripts/performance-budget-check.js (3)
scripts/slo-budget-report.js (1)
  • payload (152-179)
lib/utils.ts (1)
  • isRecord (11-13)
scripts/enterprise-health-check.js (1)
  • payload (279-286)
lib/secrets/token-store.ts (2)
lib/logger.ts (2)
  • createLogger (366-424)
  • error (389-393)
lib/utils.ts (1)
  • sleep (65-67)
scripts/enterprise-health-check.js (1)
lib/runtime-paths.ts (1)
  • getCodexHomeDir (43-46)
test/slo-budget-report.test.ts (1)
scripts/slo-budget-report.js (4)
  • root (88-88)
  • logDir (89-89)
  • policyPath (85-85)
  • payload (152-179)
test/audit.test.ts (1)
lib/audit.ts (3)
  • getAuditLogPath (232-234)
  • auditLog (190-230)
  • configureAudit (74-77)
scripts/retention-cleanup.js (1)
scripts/enterprise-health-check.js (1)
  • code (164-164)
lib/storage.ts (5)
lib/storage/migrations.ts (5)
  • AccountStorageV1 (34-38)
  • AccountStorageV3 (59-64)
  • AccountStorageV4 (83-88)
  • AccountMetadataV3 (40-57)
  • migrateV3ToV4 (129-160)
lib/secrets/token-store.ts (6)
  • getEffectiveSecretStorageMode (99-105)
  • loadAccountSecrets (169-186)
  • ensureSecretStorageBackendAvailable (115-119)
  • deriveAccountSecretRef (121-132)
  • persistAccountSecrets (134-167)
  • deleteAccountSecrets (188-218)
lib/prompts/codex.ts (2)
  • ModelFamily (47-52)
  • MODEL_FAMILIES (58-64)
lib/schemas.ts (2)
  • AnyAccountStorageSchema (210-214)
  • AccountStorageV4Schema (165-170)
lib/utils.ts (1)
  • isRecord (11-13)
🪛 Biome (2.4.4)
test/retention-cleanup.test.ts

[error] 81-81: This variable implicitly has the any type.

(lint/suspicious/noImplicitAnyLet)

🪛 LanguageTool
docs/operations/incident-response.md

[style] ~39-~39: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...rst, then codex auth doctor --fix. 4. If auth failures persist, rotate account v...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~40-~40: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...dex>and re-runcodex auth check`. 5. If all accounts are exhausted/disabled, es...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.21.0)
README.md

[warning] 189-189: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

🔇 Additional comments (72)
scripts/compliance-evidence-bundle.js (1)

52-65: maxbuffer hardening is correctly applied on both npm execution paths.

looks good. scripts/compliance-evidence-bundle.js:52-65 now sets maxBuffer for both cmd.exe and direct npm, and this is covered by the verbose-output regression in test/compliance-evidence-bundle.test.ts:71-106. no additional concurrency risk is introduced here.

docs/reference/settings.md (2)

129-136: lgtm — enterprise keychain pin documented correctly.

the new CODEX_SECRET_STORAGE_MODE entry and enterprise recommendation align with docs/configuration.md guidance. the pin to keychain for production is the correct default per the referenced storage-mode docs.


183-183: lgtm — cross-reference added.

linking to ../configuration.md improves discoverability for operators looking up storage mode semantics.

config/slo-policy.json (1)

1-8: lgtm — slo policy structure is well-defined.

the 99.5% request success rate and zero-tolerance for stale wal findings align with enterprise reliability expectations. scripts/slo-budget-report.js consumes this correctly per test coverage in test/slo-budget-report.test.ts.

SECURITY.md (1)

87-91: lgtm — pre-release script sequence documented.

the addition of ops:health-check, perf:budget-check, sbom:generate, and sbom:verify before lint aligns with CI workflow steps and docs/operations/release-runbook.md preconditions.

config/performance-budgets.json (1)

1-9: lgtm — performance budgets defined with schema validation.

the thresholds are reasonable and config/performance-budgets.schema.json documents the units per prior feedback. scripts/performance-budget-check.js consumes these correctly.

docs/operations/slo-error-budget.md (1)

1-59: lgtm — slo policy documentation is comprehensive.

the error budget thresholds (50%/100% burn triggers) and governance steps align with enterprise reliability practices. the --enforce flag for CI gating is well-documented.

docs/privacy.md (1)

51-64: lgtm — retention and audit forwarding documented with override examples.

the --days=30 override example addresses prior feedback. the --dry-run flag for audit forwarding aligns with docs/operations/audit-forwarding.md guidance for testing before production.

.github/workflows/recovery-drill.yml (1)

31-34: lgtm — recovery drill uses packaged npm script.

the workflow correctly uses npm run ops:recovery-drill instead of inline vitest targets, avoiding drift between workflow and package.json:72 as flagged in prior review.

.gitleaks.toml (1)

1-19: lgtm — gitleaks config addresses prior feedback.

the config now includes:

  • [extend] useDefault = true to inherit detection rules
  • condition = "AND" requiring both path and regex match
  • [\\/] for windows path compatibility
  • sk-test- prefix instead of sk- to avoid suppressing real api keys

the narrowed path to test[\\/]security[\\/]fixtures[\\/] limits allowlist scope appropriately.

test/audit.test.ts (1)

250-267: good deterministic stale-log regression.

test/audit.test.ts:252 uses an in-range fixed epoch and restores the spy in finally, which keeps the purge test stable and reproducible.

test/unified-settings.test.ts (2)

6-28: windows-safe secure-write coverage is solid.

test/unified-settings.test.ts:7-14 now asserts readability and no leaked temp files on win32, while preserving strict mode checks on posix.


115-127: good regression for repeated sync writes.

the new test at test/unified-settings.test.ts:115 validates state correctness and secure file mode persistence across consecutive sync writes.

scripts/verify-sbom.js (1)

1-49: lgtm, clean sbom verification script.

the validation flow is straightforward. fail() exits on error, payload construction is safe. no token/secret handling here.

config/performance-budgets.schema.json (1)

1-53: schema looks correct.

strict additionalProperties: false with required keys and minimum: 0 constraints. properly typed as numbers.

lib/codex-manager.ts (2)

4074-4097: audit wrapping implementation looks solid.

lib/codex-manager.ts:4074-4097 correctly logs success/failure outcomes with sanitized errors and exit codes. the rethrow preserves original exception behavior.


4124-4159: cli commands properly wrapped with audit.

all auditable commands (login, list, status, switch, check, verify-flagged, forecast, report, fix, doctor) now route through runWithAudit. features intentionally excluded since it's a static report.

lib/unified-settings.ts (2)

127-151: sync write path correctly hardens permissions after rename.

lib/unified-settings.ts:127-151 now applies chmodSync to both directory and file after atomic rename on non-windows. this addresses the prior overwrite-permission gap. retry logic with Atomics.wait is acceptable for synchronous context.


193-217: async write path mirrors sync hardening correctly.

lib/unified-settings.ts:193-217 uses fs.chmod with platform guard and best-effort error handling. consistent with sync path behavior.

lib/audit.ts (2)

133-162: purge throttle correctly positioned after successful read.

lib/audit.ts:149 updates lastPurgeAttemptMs only after the readdir succeeds. this prevents a transient read failure from blocking purge attempts for the full throttle interval. addresses prior review feedback.


200-226: audit write path properly hardened.

lib/audit.ts:200-226 now:

  • calls rotateLogsIfNeeded() before purgeExpiredLogs() (correct order)
  • wraps writeFileSync in retry helper
  • applies chmod 0o600 on non-windows with best-effort handling

all prior review concerns addressed.

test/audit-retry.test.ts (1)

74-76: good regression coverage for non-blocking retry.

test/audit-retry.test.ts:76 asserts Atomics.wait is not called, which validates the pr description claim that blocking audit retry sleep was removed. deterministic mocking with controlled EBUSY failures.

scripts/enterprise-health-check.js (2)

158-169: newestMtimeMs correctly handles directory churn.

scripts/enterprise-health-check.js:161-169 wraps readdir in try/catch for ENOENT/ENOTDIR. the existsSync at line 159 is redundant since the catch handles missing dirs, but it's harmless as an early exit optimization.


222-236: wal check toctou race fixed.

scripts/enterprise-health-check.js:222-236 removed the existsSync check and now wraps the entire stat call in try/catch. the transient unavailability comment is clear.

scripts/retention-cleanup.js (3)

33-47: removeWithRetry has correct exponential backoff.

scripts/retention-cleanup.js:33-47 retries up to 6 times with 25 * 2 ** attempt ms delay for EBUSY/EPERM/ENOTEMPTY. ENOENT treated as success (file already gone). good pattern.


49-60: collectExpiredFiles handles directory churn.

scripts/retention-cleanup.js:52-60 wraps readdir in try/catch and skips on ENOENT/ENOTDIR/EPERM. the existsSync at line 50 is redundant but harmless. addresses prior review feedback.


108-122: partial cleanup now fails with exit code 1.

scripts/retention-cleanup.js:119-121 exits non-zero when failed.length > 0. this ensures automation detects retention drift. addresses prior review feedback.

.github/workflows/retention-maintenance.yml (1)

1-48: workflow structure looks correct after runner.temp fix.

the fix at line 20 properly moves CODEX_MULTI_AUTH_DIR to step-level where runner.temp is available. workflow now follows github actions context availability rules.

one minor observation: the inline node scripts at lines 33 and 42 are dense one-liners. consider extracting to a small helper script under scripts/ for maintainability, but not blocking.

test/enterprise-health-check.test.ts (1)

1-178: test coverage for enterprise-health-check looks comprehensive.

tests cover missing artifacts, fallback root resolution, stale audit detection, and ENOTDIR edge cases. the pathsEqual helper at test/enterprise-health-check.test.ts:39-43 properly handles windows path normalization.

lib/storage.ts (5)

876-939: v4 hydration correctly remaps activeIndexByFamily when accounts are skipped.

lib/storage.ts:917-931 implements the fix from past review: remapFamilyIndex adjusts family indices based on skippedIndices, and line 928 properly clamps to hydratedAccounts.length - 1. the logging at line 898-900 now uses accountId only, avoiding PII leakage.


1135-1185: serialize path now has rollback-safe secret writes.

lib/storage.ts:1147-1184 collects persistedSecretRefs during the loop and performs cleanup via Promise.allSettled at line 1180-1182 if any step fails. this addresses the prior transactionality concern.


1354-1373: collectPersistedSecretRefs handles both v4 payloads and wal journal content.

lib/storage.ts:1354-1373 first checks for direct v4 shape, then falls back to parsing wal content field. this addresses the prior concern about missing wal payload refs during cleanup.


1375-1400: clearPersistedAccountSecrets uses force mode for mode-agnostic deletion.

lib/storage.ts:1382 calls deleteAccountSecrets(ref, { force: true }) which bypasses mode checks and attempts keychain deletion regardless of current mode. this addresses the prior mode-flip concern.

one consideration: if keytar is unavailable, deleteAccountSecrets with force: true will silently return (line 200 in token-store.ts). this is acceptable for cleanup but worth noting.


1281-1285: rollback on save failure cleans up persisted secrets.

lib/storage.ts:1281-1285 performs cleanup of persistedSecretRefs when save fails after secrets were written. this completes the transactional save flow.

scripts/audit-log-forwarder.js (2)

187-229: sendBatch now implements timeout and retry with backoff.

scripts/audit-log-forwarder.js:197-228 properly uses AbortController with configurable timeout, retries on 429/5xx and network errors, and applies exponential backoff with jitter. this addresses the prior blocking fetch concern.


110-143: checkpoint rotation is under-tested; file-based locking lacks windows guarantees.

rotation handling at scripts/audit-log-forwarder.js:110-143 uses filename heuristics with line-count fallback, but test coverage is minimal. test/audit-log-forwarder.test.ts:141 only covers the checkpoint.line > activeLineCount case; no tests for rapid rotation during collection, concurrent checkpoint writes, or sequential multi-rotation scenarios.

more critically, checkpoint locking at scripts/audit-log-forwarder.js:235 relies on open(lockPath, "wx", 0o600) which has weaker semantics on windows (not exclusive like unix). this introduces a real race condition on windows when concurrent instances try to write the checkpoint—two processes could both acquire the lock and corrupt the file.

add regression tests for:

  • rapid rotation (audit.log → audit.1.log → audit.2.log in succession)
  • checkpoint lock contention
  • windows-specific concurrency (if windows support is required, consider cross-platform atomic rename or explicit mutex)
lib/secrets/token-store.ts (3)

47-67: esm/cjs interop for keytar is now correct.

lib/secrets/token-store.ts:51-52 properly handles cjs default export: (imported.default ?? imported) as KeytarModule. this fixes the prior issue where keytar methods were undefined.


148-160: rollback on access token write failure is now implemented.

lib/secrets/token-store.ts:148-160 wraps the access token write in try/catch and calls deleteSecretRefWithRetry to clean up the refresh token on failure. the warning at line 154-157 logs only refreshTokenRef, not the actual token - safe for PII.


202-217: delete cleanup continues after first failure.

lib/secrets/token-store.ts:202-217 collects errors from both delete attempts and throws the first error only after both are attempted. this prevents orphaned access secrets when refresh deletion fails.

docs/README.md (1)

62-66: operations documentation links added consistently.

docs/README.md:62-66 adds links to the new operational runbooks (incident-response, incident-drill-template, release-runbook, slo-error-budget, audit-forwarding). these align with the pr objectives for enterprise readiness.

docs/configuration.md (1)

69-69: lgtm - documentation aligns with enterprise guidance.

the CODEX_SECRET_STORAGE_MODE entry now renders correctly and matches the format used elsewhere in the table (e.g., CODEX_MODE=0/1 at line 65). enterprise pin recommendation is consistent with docs/reference/settings.md and docs/reference/storage-paths.md.

docs/upgrade.md (1)

65-99: migration section addresses past review feedback.

the plaintext→keychain migration flow is documented with backup, env var, validation, rewrite, and health check steps. windows-specific EBUSY/EPERM guidance at lines 97-98 aligns with releases/v0.1.2.md error handling patterns.

.github/workflows/secret-scan.yml (2)

1-35: workflow addresses prior review feedback on sha pinning and permissions.

actions are pinned to full commit shas (lib/.github/workflows/secret-scan.yml:19,24), explicit permissions block added with least privilege, and regression test step included.


27-35: the version prefix difference between GITLEAKS_VERSION (line 27) and EXPECTED_GITLEAKS_VERSION (line 34) doesn't affect the regression test. test/security/secret-scan-regression.test.sh:7 only uses EXPECTED_GITLEAKS_VERSION with the v prefix for the docker image tag and never references GITLEAKS_VERSION. the mismatch either serves different purposes (action config vs test environment) or GITLEAKS_VERSION is unused, but it won't cause test failures or false positives.

			> Likely an incorrect or invalid review comment.
.github/workflows/ci.yml (1)

11-17: matrix now includes windows-latest - addresses past review.

ci will now exercise file permission hardening (0o600/0o700) and health check scripts on both ubuntu and windows runners. v4 fixture with refreshTokenRef/accessTokenRef at line 64 also addresses the v4 schema coverage gap.

test/slo-budget-report.test.ts (3)

11-26: removeWithRetry handles windows filesystem locks correctly.

the retry logic at test/slo-budget-report.test.ts:11-26 handles EBUSY, EPERM, ENOTEMPTY with exponential backoff - aligns with windows filesystem behavior mentioned in docs/upgrade.md:97-98.


39-76: test validates stale-wal-findings evaluation path.

deterministic test: creates fixture, sets mtime 10 days back, runs script synchronously, and asserts health.staleWalFindings === 1 and evaluation status fail. the test is isolated and uses vitest per guidelines.


28-37: slo-budget-report.js doesn't write checkpoint files—no concurrency regression test needed here.

this script reads audit logs and writes a single output report (line 182 of scripts/slo-budget-report.js). it has no stateful checkpoint, resume-point, or locking logic like audit-log-forwarder.js does. concurrent runs would safely read the same source logs and write to separate output paths without races.

per coding guidelines, regression tests for concurrency should target actual bugs, not hypothetical ones. the current test exercises the core functionality; adding a concurrent-run test would verify something that's already correct by design (readonly log dir + independent output paths).

docs/operations/incident-drill-template.md (1)

36-42: command evidence section aligns with incident-response.md.

the commands at lines 39-42 (npm run ops:health-check, codex auth report --live --json, codex auth doctor --json) match the evidence collection requirements in docs/operations/incident-response.md.

test/schemas.test.ts (3)

241-270: v4 schema tests cover core paths.

test/schemas.test.ts:241-270 validates v4 storage acceptance, missing refreshTokenRef rejection, and empty accessTokenRef rejection. aligns with lib/schemas.ts:164-169 schema definition.


320-330: v4 union test addresses past review feedback.

this test at test/schemas.test.ts:320-330 exercises AnyAccountStorageSchema with a v4 payload, confirming the discriminated union handles version 4 correctly.


475-483: safeParseAccountStorage v4 regression test addresses past review.

test/schemas.test.ts:475-483 adds explicit v4 success case for safeParseAccountStorage, ensuring helper wiring cannot regress independently of the union schema.

.github/workflows/sbom-attestation.yml (1)

1-48: workflow looks good overall.

permissions are correctly scoped for attestation (id-token write, attestations write). the conditional if: github.event_name == 'push' && github.ref == 'refs/heads/main' at line 45 correctly limits provenance attestation to main branch pushes.

.github/workflows/release-provenance.yml (2)

30-48: quality gate fixture setup and checks look complete.

the inline fixture creation at lines 34-37 ensures ops:health-check --require-files passes in CI. this addresses the earlier feedback about missing quality gates.

one consideration: the fixture at line 35 uses version:3 for accounts but keychain mode would expect v4. since CODEX_SECRET_STORAGE_MODE isn't set here, the plaintext path should work, but verify this doesn't cause health-check warnings if the script expects v4 in keychain-enabled environments.

[approve_code_changes, request_verification]

#!/bin/bash
# check if enterprise-health-check.js validates storage version or storage mode
rg -n "version.*[34]|SECRET_STORAGE_MODE" scripts/enterprise-health-check.js | head -20

56-59: npm publish auth is now correctly configured.

NODE_AUTH_TOKEN at line 59 addresses the earlier critical issue. ensure the NPM_TOKEN secret is configured in repository settings with publish permissions.

scripts/generate-sbom.js (1)

1-45: past feedback addressed: maxBuffer and windows npm.cmd handling.

MAX_BUFFER_BYTES at line 8 (20mb) and its usage at lines 20, 26 address the buffer overflow concern. the npm.cmd fallback at line 22 handles windows path resolution. the test at test/generate-sbom.test.ts covers the large-buffer scenario.

package.json (2)

134-136: keytar as optionalDependency is correct for graceful degradation.

this allows keychain mode to work when keytar is available while falling back to plaintext on systems without native keychain support.

[approve_code_changes, request_verification]

#!/bin/bash
# verify keytar 7.9.0 exists on npm
curl -s "https://registry.npmjs.org/keytar" | jq '.versions | keys | map(select(startswith("7.9")))' | head -10

132-133: zod 4.3.6 is compatible with the codebase.

lib/schemas.ts uses modern zod 4.x patterns throughout: z.discriminatedUnion() at line 210 and 264, safeParse() for all validations, and proper error handling via result.error.issues. comprehensive test coverage validates schema behavior across all edge cases. no v3-specific patterns detected and no compatibility issues exist.

			> Likely an incorrect or invalid review comment.
test/generate-sbom.test.ts (2)

11-26: removeWithRetry utility handles windows filesystem behavior correctly.

retry logic with exponential backoff at line 23 (25 * 2 ** attempt) and retryable codes (EBUSY, EPERM, ENOTEMPTY) at line 12 match expected windows transient errors. ENOENT early return at line 19 is correct.


58-84: good regression coverage for maxBuffer scenario.

the fake npm script outputs ~1.5mb (line 35: "x".repeat(1_500_000)), and assertions at lines 82-83 verify the output is >1mb and valid json. this exercises the MAX_BUFFER_BYTES = 20 * 1024 * 1024 path in scripts/generate-sbom.js:8.

consider adding a negative test case where npm sbom fails (exit non-zero) to verify the script's error handling at scripts/generate-sbom.js:42-44.

[approve_code_changes, suggest_optional_refactor]

lib/schemas.ts (2)

140-172: v4 schema with keychain refs looks correct.

refreshTokenRef: z.string().min(1) at line 148 and accessTokenRef: z.string().min(1).optional() at line 149 correctly enforce non-empty refs when present (addresses past feedback).

verify test/schemas.test.ts includes validation cases for:

  • v4 with valid refs
  • v4 with empty refreshTokenRef (should reject)
  • v4 with empty accessTokenRef (should reject when provided)

[approve_code_changes, request_verification]

#!/bin/bash
# check for v4 schema tests
rg -n "V4|refreshTokenRef|accessTokenRef" test/schemas.test.ts | head -30

207-214: AnyAccountStorageSchema union updated for v4.

discriminated union at line 210-214 now includes V1, V3, V4. migration logic in lib/storage/migrations.ts should handle v3→v4 conversion.

test/storage-v4-keychain.test.ts (6)

27-45: keytar mock correctly isolates tests from real keychain.

the mock at lines 31-44 uses an in-memory Map<string, string> and exposes setPassword, getPassword, deletePassword. this avoids touching real keychain while testing v4 storage semantics. as per coding guidelines, tests reject mocking real secrets.


112-165: concurrent saveAccounts test covers serialization.

Promise.all at lines 116-147 exercises the save queue in lib/storage.ts. assertion at lines 162-164 confirms the final loaded state is one of the two valid outcomes. this addresses the past feedback about concurrency regression coverage.

one gap: the test verifies no torn json but doesn't assert the queue serialized (i.e., both saves completed, not interleaved). consider adding a comment clarifying expected behavior: last writer wins.


167-204: EPERM rename retry test covers windows behavior.

mockImplementationOnce at lines 172-176 throws EPERM on first attempt, then falls through to real rename. assertion at line 201 (renameCallCount >= 2) verifies retry occurred. this addresses past feedback about windows rename retry regression.

consider also testing EBUSY (common on windows when file handles are held), since lib/storage.ts likely handles both. if EBUSY is already covered by the retry logic, a separate test case would strengthen coverage.

[approve_code_changes, suggest_optional_refactor]


206-239: rollback test verifies keychain cleanup on save failure.

EACCES at line 213 causes rename to fail permanently (not retryable). assertion at line 238 confirms secrets map is empty after failure. this ensures no orphaned keychain entries when persistence fails.


241-288: hydration test verifies activeIndex adjustment when secrets are missing.

the test sets up v4 storage with two accounts but only seeds keychain secrets for acct_2. assertions at lines 284-287 confirm:

  • only one account remains after hydration
  • activeIndex adjusted from 1 to 0
  • activeIndexByFamily.codex adjusted from 1 to 0

this is important for graceful degradation when keychain entries are cleared externally.


290-326: WAL cleanup test verifies keychain refs cleared on mode switch.

the test writes a WAL file with v4 refs at lines 298-319, switches to plaintext mode at line 321, then calls clearAccounts(). assertions at lines 323-325 confirm:

  • keychain secrets deleted
  • WAL file removed

this ensures no secret leakage when transitioning storage modes.

test/audit-log-forwarder.test.ts (4)

57-77: withServer helper properly manages server lifecycle.

the try/finally at lines 72-76 ensures server cleanup even on test failure. binding to 127.0.0.1 at line 64 avoids network prompts on some OSes.


90-139: 429 retry test verifies transient error handling.

the server responds with 429 on first request (lines 105-108), then 200 on retry. assertions verify:

  • exit status 0 (line 126)
  • 2 messages sent (line 129)
  • 2 requests made (line 130)
  • checkpoint written with correct line number (lines 137-138)

this exercises the retry path in scripts/audit-log-forwarder.js.


208-253: checkpoint lock test has a potential race condition.

the releaseTimer at lines 226-228 fires after 120ms, but test timeout behavior depends on CODEX_AUDIT_FORWARDER_TIMEOUT_MS: "300". if the forwarder's lock-wait logic has a shorter timeout or doesn't poll frequently, the test may flake.

consider either:

  • adding a comment explaining expected timing
  • using a more deterministic signal (e.g., file watch or polling confirmation)

this is minor since the test does pass with current implementation.

[approve_code_changes, suggest_optional_refactor]


175-206: timeout test verifies non-blocking behavior on hanging endpoint.

the server never responds (line 188), forwarder times out after 50ms × 2 attempts. assertions verify:

  • exit status 1 (line 201)
  • error message present (line 202)
  • no checkpoint created (line 205)

this ensures the forwarder doesn't block indefinitely on unresponsive endpoints.

Resolve remaining PR43 threads across enterprise workflows, audit/lock reliability, and deterministic regression coverage.\n\nHighlights:\n- Extract CI health fixture seeding into a dedicated script and capture stderr in recovery drill artifacts.\n- Clarify ops docs/runbook text and retry semantics, and fix README table rendering for secret-storage mode.\n- Add retry backoff in audit fs operations, strengthen audit error redaction, and remove duplicated token-store sleep helper.\n- Harden checkpoint lock handling with stale-lock cleanup, EPERM contention handling, and bounded timeout errors.\n- Harden health-check execution in slo-budget-report with timeout/kill options, script override support, and fallback error messaging.\n- Convert secret scan regression coverage to deterministic Vitest flow; move shell harness out of test/** and sanitize fixture tokens.\n- Improve Windows and concurrency regression tests across compliance, retention, audit, schemas, and token-store suites.\n\nValidation:\n- npm run typecheck\n- npm run lint\n- npm run test -- test/audit-log-forwarder.test.ts test/audit-retry.test.ts test/audit.test.ts test/compliance-evidence-bundle.test.ts test/enterprise-health-check.test.ts test/retention-cleanup.test.ts test/schemas.test.ts test/slo-budget-report.test.ts test/token-store.test.ts test/security/secret-scan-regression.test.ts\n\nCo-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

♻️ Duplicate comments (1)
test/token-store.test.ts (1)

245-272: ⚠️ Potential issue | 🟠 Major

add a deterministic retry regression for transient delete failures.

lib/secrets/token-store.ts:70-93 retries delete on EBUSY/EPERM/EAGAIN/429, but test/token-store.test.ts:245-272 only covers non-retryable EACCES. add a vitest case that throws EBUSY (or status=429) for the first refresh-delete attempts, then succeeds, and assert retry count plus access-delete attempt.

proposed test addition
 it("attempts access-token cleanup even when refresh-token cleanup fails", async () => {
   const deletedRefs: string[] = [];
   mockKeytar({
@@
   expect(deletedRefs).toEqual(["acct-partial:refresh", "acct-partial:access"]);
 });
+
+it("retries transient refresh delete failures and still processes access cleanup", async () => {
+  const deletedRefs: string[] = [];
+  let refreshAttempts = 0;
+  mockKeytar({
+    setPassword: async () => {},
+    getPassword: async () => null,
+    deletePassword: async (_service: string, account: string) => {
+      if (account === "acct-retry:refresh") {
+        refreshAttempts += 1;
+        if (refreshAttempts < 3) {
+          const error = new Error("resource busy") as NodeJS.ErrnoException;
+          error.code = "EBUSY";
+          throw error;
+        }
+      }
+      deletedRefs.push(account);
+      return true;
+    },
+  });
+
+  process.env.CODEX_SECRET_STORAGE_MODE = "keychain";
+  const tokenStore = await import("../lib/secrets/token-store.js");
+  await tokenStore.ensureSecretStorageBackendAvailable();
+
+  await tokenStore.deleteAccountSecrets({
+    refreshTokenRef: "acct-retry:refresh",
+    accessTokenRef: "acct-retry:access",
+  });
+
+  expect(refreshAttempts).toBe(3);
+  expect(deletedRefs).toEqual(expect.arrayContaining(["acct-retry:refresh", "acct-retry:access"]));
+});

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/token-store.test.ts` around lines 245 - 272, Add a deterministic vitest
regression that exercises the retry logic in lib/secrets/token-store.ts (the
block around deleteAccountSecrets and the retry loop at lines ~70-93): mock
keytar.deletePassword to throw a transient error (e.g., code "EBUSY" or an
object with status 429) on the first refresh-token delete attempt and succeed on
subsequent calls, then call deleteAccountSecrets with both refreshTokenRef and
accessTokenRef and assert that the refresh delete was retried (expected retry
count) and that the access-token delete was still attempted and recorded; ensure
the test is deterministic (use vitest timers or counters) and placed alongside
existing token-store.test.ts cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/recovery-drill.yml:
- Around line 39-45: The artifact upload step ("Upload recovery drill artifacts"
using actions/upload-artifact@v4) only runs on success; modify that step to
always run by adding an execution condition such as if: ${{ always() }} so the
two diagnostic files (.tmp/recovery-drill-vitest.json and
.tmp/recovery-drill-health.json) are uploaded even when the drill fails.

In `@docs/operations/incident-drill-template.md`:
- Around line 38-42: Add a short PowerShell example and note alongside the
existing bash-fenced evidence block showing the equivalent commands (npm run
ops:health-check, codex auth report --live --json, codex auth doctor --json) so
Windows responders have an explicit example; include a powershell-fenced code
block labeled "powershell" with those three commands and a one-line sentence
noting "Run these in an elevated PowerShell session" (or similar) to prevent
improvisation during incidents and keep docs consistent with CLI
flags/workflows.

In `@docs/upgrade.md`:
- Around line 69-94: The markdown list in docs/upgrade.md has inconsistent
ordered-list formatting and unindented fenced code blocks; fix by normalizing
the numbering style for the four related steps ("Validate keychain backend
availability", "Set `CODEX_SECRET_STORAGE_MODE=keychain`", "Trigger a controlled
account rewrite", "Verify health and storage state") and indenting each fenced
```bash``` block by at least three spaces so they are nested under their
respective list items (e.g., ensure the npm run ops:keychain-assert, codex auth
check/report, and npm run ops:health-check blocks are indented under their list
entries).

In `@lib/audit.ts`:
- Around line 120-131: withRetryableAuditFsOperation currently uses Atomics.wait
(retrySleepSignal and RETRYABLE_AUDIT_FS_BASE_DELAY_MS) which blocks the main
thread; remove the blocking Atomics.wait calls inside
withRetryableAuditFsOperation (and any uses in logAuditEntry and
purgeOldLogFiles) and replace with non-blocking retry semantics: for synchronous
call sites keep bounded immediate retries (loop without sleeping) and for async
contexts refactor to async variants that await a non-blocking delay
(setTimeout/promisified) before retrying; update isRetryableAuditFsError usage
to preserve retry/throw logic and remove the atomicsWaitSpy assertion from
test/audit-retry.test.ts; also apply the same replacement for Atomics.wait
occurrences in unified-settings (lib/unified-settings.ts), logger
(lib/logger.ts), and auto-update-checker (lib/auto-update-checker.ts).

In `@lib/codex-manager.ts`:
- Around line 4044-4099: Add vitest regression tests for
auditActionForCommand/sanitizeAuditError/runWithAudit: mock auditLog and write
deterministic tests that (1) call runWithAudit for representative commands
("login","switch","check","verify-flagged","forecast" and an unknown/default)
and assert auditLog is invoked with the correct AuditAction (via
auditActionForCommand) and AuditOutcome.SUCCESS when runner resolves 0 and
AuditOutcome.FAILURE when runner resolves nonzero; (2) simulate runner throwing
errors and assert auditLog was called with error metadata produced by
sanitizeAuditError — include inputs containing sk- keys,
refresh/access_token_..., secret-access/refresh-token, email addresses and long
strings to verify redaction patterns and the 200-char slice; and (3) include
thrown errors containing "EBUSY" and HTTP "429" text to confirm
redaction/handling; use spies/mocks on auditLog and deterministic error objects
to make the tests repeatable.

In `@scripts/audit-log-forwarder.js`:
- Around line 92-95: The discovery and final-mtime reporting still have
exists/stat races (see discoverAuditFiles and the final mtime stat block around
lines 404-406); update discoverAuditFiles to not rely on existsSync — use
readdir withFileTypes and for each Dirent perform a safe stat inside a try/catch
(skip entries that throw ENOENT), and change the final mtime reporting code to
wrap fs.stat/fs.promises.stat in a try/catch that treats
missing/replace-on-rotate files as non-fatal (log and continue, do not
process.exit(1)). Add a deterministic vitest regression in
test/audit-log-forwarder.test.ts that simulates deletion/replacement between
discovery and stat (e.g., create file list, remove/replace a file before stat)
and asserts the script/process exits 0 and advances checkpoints; keep the test
deterministic and avoid mocking secrets.

In `@scripts/secret-scan-regression.sh`:
- Line 89: The regex used to normalize Windows path separators in the fallback
scanner is wrong: change the replace call that builds rel (currently using
path.relative(sourceDir, full).replace(/\\\\/g, "/")) to use a pattern that
matches single backslashes (use /\\/g) so single "\" characters are replaced
with "/", ensuring Windows paths are normalized; update the replace invocation
on the rel variable accordingly and run the secret scan tests (and add a
Windows-path test case for the fallback scanner) to confirm behavior.

In `@scripts/seed-health-fixture.js`:
- Around line 10-48: The fixture writes and creates sensitive files/directories
with default permissions; update the mkdirSync calls to set a restrictive
directory mode (e.g., mode: 0o700) and update each writeFileSync call to pass an
options object that sets a secure file mode (e.g., mode: 0o600) along with
encoding so files like openai-codex-accounts.json, settings.json and audit.log
are created with explicit, non-world-readable permissions; locate usages of
mkdirSync, writeFileSync, logsDir and root to apply these option changes
consistently.

In `@scripts/slo-budget-report.js`:
- Around line 31-34: The loadAuditEntries function can race between existsSync
and readdir causing a hard failure when the directory vanishes or is rotated;
update loadAuditEntries to wrap the readdir call in a try/catch that treats
ENOENT and ENOTDIR (and other transient fs errors) as "no files" and returns []
instead of throwing, preserving current behavior for other unexpected errors;
then add a deterministic vitest regression in test/slo-budget-report.test.ts
that simulates logDir churn by mocking fs.readdir (or readdirSync wrapper used
by loadAuditEntries) to throw ENOENT and ENOTDIR in separate cases and assert
the overall script still produces a valid payload (no crash) — reference the
loadAuditEntries function and the test file name when locating code to change.

In `@test/audit-log-forwarder.test.ts`:
- Around line 255-323: Add a deterministic Vitest regression that forces the
"newest log disappears between collection and final stat" race by creating a log
file, ensuring the forwarder will discover it, then deleting that logfile before
the forwarder calls fs.stat (so the stat will raise ENOENT/ENOTDIR) and
asserting the process exits successfully with newestLogMtimeMs: null; implement
this as a new it(...) alongside the existing tests using the same runForwarder
and withServer helpers, control timing via small env waits (e.g.,
CODEX_AUDIT_FORWARDER_STALE_LOCK_MS/MAX_WAIT_MS) or explicit sync deletion right
after starting the forwarder, and verify result.status === 0 and result.stdout
or parsed output contains newestLogMtimeMs: null to ensure the forwarder
swallows the ENOENT and continues.

In `@test/audit.test.ts`:
- Around line 271-300: The Date.now spy created via vi.spyOn(Date,
"now").mockReturnValue(...) (assigned to nowSpy) is only restored inside a late
finally block and can leak if earlier setup throws; move the
nowSpy.mockRestore() into a finally that encloses the entire test body (or add
an outer try/finally that wraps all setup, configureAudit(...), file ops,
auditLog(...) calls) so nowSpy is always restored, referencing the nowSpy
variable and the vi.spyOn(Date, "now") call in this test (also ensure auditLog
and configureAudit calls remain inside that try block).

---

Duplicate comments:
In `@test/token-store.test.ts`:
- Around line 245-272: Add a deterministic vitest regression that exercises the
retry logic in lib/secrets/token-store.ts (the block around deleteAccountSecrets
and the retry loop at lines ~70-93): mock keytar.deletePassword to throw a
transient error (e.g., code "EBUSY" or an object with status 429) on the first
refresh-token delete attempt and succeed on subsequent calls, then call
deleteAccountSecrets with both refreshTokenRef and accessTokenRef and assert
that the refresh delete was retried (expected retry count) and that the
access-token delete was still attempted and recorded; ensure the test is
deterministic (use vitest timers or counters) and placed alongside existing
token-store.test.ts cases.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fbdeb2d0-5835-4055-b62f-367835666b58

📥 Commits

Reviewing files that changed from the base of the PR and between fdfdb2c and 02f74c8.

📒 Files selected for processing (25)
  • .github/workflows/ci.yml
  • .github/workflows/recovery-drill.yml
  • README.md
  • docs/operations/audit-forwarding.md
  • docs/operations/incident-drill-template.md
  • docs/upgrade.md
  • lib/audit.ts
  • lib/codex-manager.ts
  • lib/secrets/token-store.ts
  • scripts/audit-log-forwarder.js
  • scripts/enterprise-health-check.js
  • scripts/secret-scan-regression.sh
  • scripts/seed-health-fixture.js
  • scripts/slo-budget-report.js
  • test/audit-log-forwarder.test.ts
  • test/audit-retry.test.ts
  • test/audit.test.ts
  • test/compliance-evidence-bundle.test.ts
  • test/enterprise-health-check.test.ts
  • test/helpers/remove-with-retry.ts
  • test/retention-cleanup.test.ts
  • test/schemas.test.ts
  • test/security/secret-scan-regression.test.ts
  • test/slo-budget-report.test.ts
  • test/token-store.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/codex-manager.ts
  • lib/secrets/token-store.ts
  • lib/audit.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/audit-retry.test.ts
  • test/slo-budget-report.test.ts
  • test/audit.test.ts
  • test/schemas.test.ts
  • test/enterprise-health-check.test.ts
  • test/helpers/remove-with-retry.ts
  • test/audit-log-forwarder.test.ts
  • test/security/secret-scan-regression.test.ts
  • test/retention-cleanup.test.ts
  • test/compliance-evidence-bundle.test.ts
  • test/token-store.test.ts
docs/**

⚙️ CodeRabbit configuration file

keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

Files:

  • docs/operations/audit-forwarding.md
  • docs/upgrade.md
  • docs/operations/incident-drill-template.md
🧬 Code graph analysis (13)
test/audit-retry.test.ts (2)
lib/logger.ts (1)
  • error (389-393)
scripts/audit-dev-allowlist.js (1)
  • audit (150-161)
test/audit.test.ts (1)
lib/audit.ts (3)
  • auditLog (193-233)
  • getAuditLogPath (235-237)
  • configureAudit (76-79)
test/schemas.test.ts (1)
lib/schemas.ts (3)
  • AccountStorageV4Schema (165-170)
  • AnyAccountStorageSchema (210-214)
  • safeParseAccountStorage (309-315)
test/enterprise-health-check.test.ts (2)
test/helpers/remove-with-retry.ts (1)
  • removeWithRetry (3-18)
scripts/enterprise-health-check.js (4)
  • root (213-213)
  • payload (279-286)
  • findings (214-214)
  • checks (215-215)
test/helpers/remove-with-retry.ts (3)
scripts/audit-log-forwarder.js (3)
  • attempt (205-205)
  • attempt (292-292)
  • code (306-306)
lib/logger.ts (1)
  • error (389-393)
scripts/enterprise-health-check.js (1)
  • code (164-164)
scripts/slo-budget-report.js (2)
scripts/audit-log-forwarder.js (5)
  • logDir (347-347)
  • entries (94-94)
  • entries (153-153)
  • root (346-346)
  • payload (375-381)
scripts/enterprise-health-check.js (4)
  • entries (97-97)
  • entries (160-160)
  • root (213-213)
  • payload (279-286)
test/security/secret-scan-regression.test.ts (1)
test/helpers/remove-with-retry.ts (1)
  • removeWithRetry (3-18)
scripts/audit-log-forwarder.js (1)
lib/utils.ts (1)
  • sleep (65-67)
lib/secrets/token-store.ts (2)
lib/logger.ts (2)
  • createLogger (366-424)
  • error (389-393)
lib/utils.ts (1)
  • sleep (65-67)
scripts/enterprise-health-check.js (1)
lib/runtime-paths.ts (1)
  • getCodexHomeDir (43-46)
test/retention-cleanup.test.ts (2)
scripts/retention-cleanup.js (2)
  • root (81-81)
  • payload (108-117)
test/helpers/remove-with-retry.ts (1)
  • removeWithRetry (3-18)
test/compliance-evidence-bundle.test.ts (1)
scripts/compliance-evidence-bundle.js (3)
  • outDir (141-141)
  • result (152-152)
  • payload (157-168)
lib/audit.ts (1)
lib/runtime-paths.ts (1)
  • getCodexLogDir (226-228)
🪛 markdownlint-cli2 (0.21.0)
docs/upgrade.md

[warning] 75-75: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/1/1

(MD029, ol-prefix)


[warning] 81-81: Ordered list item prefix
Expected: 1; Actual: 3; Style: 1/2/3

(MD029, ol-prefix)


[warning] 83-83: Ordered list item prefix
Expected: 2; Actual: 4; Style: 1/2/3

(MD029, ol-prefix)


[warning] 90-90: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1

(MD029, ol-prefix)

🔇 Additional comments (24)
README.md (1)

189-189: table row is now clear and render-safe.

line 189 now avoids pipe-delimiter breakage and keeps enterprise guidance explicit for keychain pinning. looks good. see lib/README.md:189.

docs/operations/incident-drill-template.md (1)

63-63: exit criteria wording is now precise.

line 63 clearly points operators to the json status field and expected value. good fix. see lib/docs/operations/incident-drill-template.md:63.

test/compliance-evidence-bundle.test.ts (3)

53-57: deterministic launcher setup is correct.

both fake npm shims invoke process.execPath, which removes path drift across runners. test/compliance-evidence-bundle.test.ts:53-57.

As per coding guidelines, test/**: tests must stay deterministic and use vitest.


78-113: maxbuffer regression coverage is strong.

this test checks process success, json payload status, manifest integrity, and a >1mb log artifact in one flow. test/compliance-evidence-bundle.test.ts:78-113.

As per coding guidelines, test/**: tests must stay deterministic and use vitest.


115-139: windows npm.cmd path is explicitly validated.

good explicit windows regression: wrapper marker confirms the cmd path is used, not just implicitly covered. test/compliance-evidence-bundle.test.ts:115-139.

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.

docs/operations/audit-forwarding.md (1)

23-41: looks good.

the command examples and failure/retry semantics are clear and actionable in docs/operations/audit-forwarding.md:23 and docs/operations/audit-forwarding.md:61.

Also applies to: 61-70

test/helpers/remove-with-retry.ts (1)

3-17: good extraction and retry behavior.

the helper in test/helpers/remove-with-retry.ts:3 is bounded, deterministic, and handles expected transient windows fs errors.

test/retention-cleanup.test.ts (1)

49-71: nice deterministic windows gating.

test/retention-cleanup.test.ts:49 correctly uses it.skipIf(process.platform === "win32") and keeps real assertions in the active path.

test/audit.test.ts (1)

211-226: good windows + posix regression coverage for secure audit log mode.

Line 212-Line 216 in test/audit.test.ts:212 now validates the windows path with a real content assertion, and Line 225 in test/audit.test.ts:225 enforces 0600 on posix. this is the right cross-platform guard for lib/audit.ts permission hardening.

As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

scripts/enterprise-health-check.js (1)

158-169: good churn handling and secure-mode error behavior.

Line 161-Line 167 in scripts/enterprise-health-check.js:161 correctly treats ENOENT/ENOTDIR as no-audit-data, and Line 200 in scripts/enterprise-health-check.js:200 avoids noisy findings for missing files during permission checks. this keeps health checks stable under filesystem churn.

Also applies to: 186-207

test/slo-budget-report.test.ts (1)

83-140: good deterministic coverage for timeout and windows override behavior.

Line 90-Line 109 in test/slo-budget-report.test.ts:90 validates timeout/kill options, and Line 129-Line 139 in test/slo-budget-report.test.ts:129 covers windows-style health script paths without skipping assertions.

As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

test/enterprise-health-check.test.ts (1)

59-161: strong fallback and audit-dir churn regression coverage.

Line 89-Line 95 in test/enterprise-health-check.test.ts:89 keeps path assertions windows-safe, and Line 141-Line 161 in test/enterprise-health-check.test.ts:141 locks in the ENOTDIR churn behavior expected by scripts/enterprise-health-check.js.

As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

scripts/audit-log-forwarder.js (1)

289-327: checkpoint lock acquisition and stale-lock handling are solid.

withCheckpointLock in scripts/audit-log-forwarder.js:289 now has pid ownership, stale lock cleanup, and bounded wait semantics. this is a strong improvement for multi-process safety.

.github/workflows/ci.yml (1)

16-72: ci hardening updates look good.

the matrix in .github/workflows/ci.yml:16 now exercises windows + linux, and Line 66-Line 69 runs enterprise health in required-artifact mode against the seeded fixture. this meaningfully reduces vacuous passes.

test/security/secret-scan-regression.test.ts (2)

12-34: solid deterministic scanner helper for regression fixtures.

collectSyntheticFindings is clear and stable, and path normalization in test/security/secret-scan-regression.test.ts:26 keeps findings consistent across os path separators.


47-81: good regression coverage for allowlist vs non-allowlisted findings.

the assertions at test/security/secret-scan-regression.test.ts:75-80 are specific and enforce both fail-case and pass-case behavior without skipped checks.

scripts/secret-scan-regression.sh (1)

116-151: explicit fail-case and pass-case status checks are well handled.

the guarded set +e / status capture flow in scripts/secret-scan-regression.sh:116-151 gives clear diagnostics and deterministic exits.

test/schemas.test.ts (2)

241-281: v4 schema coverage is solid here.

test/schemas.test.ts:241-281 now validates valid v4 payloads, missing refreshTokenRef, empty accessTokenRef, and optional activeIndexByFamily handling with deterministic assertions.


331-341: v4 union + helper wiring is now covered end-to-end.

test/schemas.test.ts:331-341 and test/schemas.test.ts:486-494 correctly pin both discriminated-union parsing and safeParseAccountStorage helper behavior for version 4.

Also applies to: 486-494

lib/secrets/token-store.ts (3)

52-58: keytar interop normalization looks correct.

lib/secrets/token-store.ts:52-58 properly normalizes cjs/esm shapes and validates required keytar methods before enabling keychain operations.


142-157: partial-write rollback is implemented correctly.

lib/secrets/token-store.ts:142-157 rolls back the refresh secret if access secret persistence fails, then rethrows the original failure.


199-214: independent delete attempts are correctly preserved.

lib/secrets/token-store.ts:199-214 attempts refresh and access deletions independently and only throws after both attempts, which avoids access-secret orphaning.

test/token-store.test.ts (2)

125-160: the concurrency race setup is now deterministic.

test/token-store.test.ts:125-160 uses firstRefreshEntered + refreshGate correctly, so the interleaving is explicit and no longer microtask-flush dependent.


179-243: rollback and full cleanup regressions are well covered.

test/token-store.test.ts:179-243 deterministically verifies refresh rollback on access-write failure and validates dual-ref delete cleanup behavior.

Address remaining PR43 unresolved threads with targeted hardening and regressions across workflows, docs, audit/runtime scripts, and CLI audit coverage.

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

♻️ Duplicate comments (3)
docs/operations/incident-drill-template.md (1)

44-49: ⚠️ Potential issue | 🟡 Minor

add an explicit windows execution note next to the powershell block.

the powershell commands are good, but add one short line like “run in an elevated powershell session” to prevent responder improvisation during incidents. see lib/docs/operations/incident-drill-template.md:44-49.

As per coding guidelines, "docs/**: keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/operations/incident-drill-template.md` around lines 44 - 49, Add a short
explicit Windows execution note immediately adjacent to the PowerShell code
block containing "npm run ops:health-check" and the "codex auth ..."
commands—e.g., a single-line instruction such as "Run in an elevated PowerShell
session (Run as Administrator)"—so responders know to execute those commands in
an elevated PowerShell environment; update the nearby PowerShell fence in the
incident-drill template to include that one-line note.
scripts/slo-budget-report.js (1)

31-42: 🧹 Nitpick | 🔵 Trivial

redundant existsSync can be removed.

scripts/slo-budget-report.js:32 checks existsSync(logDir) but lines 36-42 already catch ENOENT/ENOTDIR from readdir. the existsSync is redundant and introduces a minor toctou window. consider removing line 32.

proposed simplification
 async function loadAuditEntries(logDir, cutoffMs) {
-	if (!existsSync(logDir)) return [];
 	let entries;
 	try {
 		entries = await readdir(logDir, { withFileTypes: true });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/slo-budget-report.js` around lines 31 - 42, Remove the redundant
existsSync check in loadAuditEntries and rely on the try/catch around readdir to
handle missing or non-directory paths; specifically, delete the preliminary if
(!existsSync(logDir)) return []; in function loadAuditEntries and keep the
existing try { entries = await readdir(logDir, { withFileTypes: true }); } catch
(error) { const code = error?.code; if (code === "ENOENT" || code === "ENOTDIR")
return []; throw error; } so you avoid the TOCTOU window and still handle
ENOENT/ENOTDIR from readdir.
scripts/seed-health-fixture.js (1)

10-10: ⚠️ Potential issue | 🟠 Major

harden pre-existing directory modes to prevent permission drift.

line 10 sets mode only on creation. if root or logsDir already exists with broader perms, this call does not tighten them. add best-effort chmodSync for both directories after mkdirSync. evidence: scripts/seed-health-fixture.js:7-10.

proposed fix
 mkdirSync(logsDir, { recursive: true, mode: 0o700 });
+try {
+	chmodSync(root, 0o700);
+} catch {
+	// best-effort hardening for non-posix environments.
+}
+try {
+	chmodSync(logsDir, 0o700);
+} catch {
+	// best-effort hardening for non-posix environments.
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/seed-health-fixture.js` at line 10, The mkdirSync call using mode
0o700 only sets permissions when creating logsDir; to harden pre-existing
directories, follow up the existing mkdirSync(logsDir, { recursive: true, mode:
0o700 }) with best-effort chmodSync calls to tighten permissions on both logsDir
and its parent/root directory (use the same 0o700 or appropriate restrictive
mode), wrapping each chmodSync in a try/catch to ignore non-fatal errors;
reference the logsDir identifier and the existing mkdirSync invocation when
locating where to add the chmodSync calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/recovery-drill.yml:
- Around line 1-46: The scheduled "Recovery Drill" workflow's job
"recovery-drill" lacks a failure notification; add a final step (after "Upload
recovery drill artifacts") that runs only on failure (use GitHub Actions
condition if: failure()) and sends a notification (Slack webhook, email, or
other alert) with context and links to the artifact files
(.tmp/recovery-drill-vitest.json and .tmp/recovery-drill-health.json); reference
the job name "recovery-drill" and the artifacts uploaded by "Upload recovery
drill artifacts" when implementing the notifier so failures trigger an immediate
alert to the on-call channel or distribution list.
- Around line 11-29: Add a job-level timeout to the recovery-drill GitHub
Actions job to prevent runaway runs: update the job block named "recovery-drill"
to include a timeout-minutes value (e.g., 30) so the job will be aborted if it
exceeds that time. Locate the "recovery-drill" job definition and add the
timeout-minutes field alongside runs-on and steps to enforce a maximum runtime
for the monthly drill.

In `@docs/upgrade.md`:
- Around line 75-79: The documentation uses npm script examples (e.g., npm run
ops:keychain-assert and npm run ops:health-check) but doesn't state they must be
executed from the project repository checkout where package.json defines these
scripts; update the upgrade notes to add a short clarifying sentence after each
npm command (lines referencing ops:keychain-assert and ops:health-check) stating
"Run this from the project repository root where package.json defines the
enterprise ops scripts, or run your CI/job wrapper that exposes these scripts,"
and optionally provide the equivalent standalone commands or CI wrapper note to
avoid missing-script errors for operators.

In `@scripts/secret-scan-regression.sh`:
- Around line 30-32: The test fixture written by
scripts/secret-scan-regression.sh uses a non-secret literal
("fake_refresh_token_12345") so the scanner assertion passes trivially; update
the here-doc content written to
"${FAIL_CASE_DIR}/test/security/fixtures/fixture.txt" to use a secret-like value
that matches the secret regex used in tests (e.g., a long base64/hex-looking
token or pattern-consistent string) so it mirrors the change in
test/security/secret-scan-regression.test.ts and ensures the assertion exercises
the scanner rather than succeeding trivially.

In `@test/audit-log-forwarder.test.ts`:
- Around line 12-27: The test defines a duplicate removeWithRetry helper; remove
the local async function removeWithRetry from test/audit-log-forwarder.test.ts
and replace it with an import from the shared helper (exported from
test/helpers/remove-with-retry.ts). Locate the local function named
removeWithRetry, delete its declaration, add an import for removeWithRetry from
the shared helper, and ensure any TypeScript types/usage in
audit-log-forwarder.test.ts match the shared helper's signature.

In `@test/security/secret-scan-regression.test.ts`:
- Around line 47-52: The inline regex sanity checks in the test "keeps fixture
allowlist behavior and flags only non-allowlisted secrets" should be extracted
into a separate test to improve isolation and diagnostics: create a new it block
named something like it("fixture allowlist regex handles windows paths") that
contains the two assertions using the regex
/^test[\\/]security[\\/]fixtures[\\/]/i.test("test\\security\\fixtures\\fixture.txt")
and the path normalization assertion
"test\\security\\fixtures\\fixture.txt".replace(/\\/g, "/") ===
"test/security/fixtures/fixture.txt", leaving the original test to only assert
gitleaksConfig contains the allowlist; reference the existing test name and
these regex/assertion expressions when locating and moving the checks.
- Around line 67-78: The allowlist test currently writes non-matching content to
"fixture.txt" so the detector is never triggered; update the writeFile call that
creates fixture.txt in the failCase (created via failCase variable) to include a
secret-like string that matches the secretPattern (e.g. an OPENAI_API_KEY value
such as "OPENAI_API_KEY=sk-...") so the detector would flag it without the
allowlist; keep the other assertions and use collectSyntheticFindings to verify
that the allowlist path actually excludes the file by ensuring the failFindings
includes "test/security/fixtures/real-secret.txt" but does not include
"test/security/fixtures/fixture.txt".

In `@test/slo-budget-report.test.ts`:
- Around line 11-26: The test defines a duplicate removeWithRetry function;
instead delete the local async function removeWithRetry in
test/slo-budget-report.test.ts and import the shared helper from
test/helpers/remove-with-retry.ts (use the exported symbol removeWithRetry) so
all retries use the central implementation; ensure the import matches the
helper's exported name and types and leave existing calls in the test unchanged.

---

Duplicate comments:
In `@docs/operations/incident-drill-template.md`:
- Around line 44-49: Add a short explicit Windows execution note immediately
adjacent to the PowerShell code block containing "npm run ops:health-check" and
the "codex auth ..." commands—e.g., a single-line instruction such as "Run in an
elevated PowerShell session (Run as Administrator)"—so responders know to
execute those commands in an elevated PowerShell environment; update the nearby
PowerShell fence in the incident-drill template to include that one-line note.

In `@scripts/seed-health-fixture.js`:
- Line 10: The mkdirSync call using mode 0o700 only sets permissions when
creating logsDir; to harden pre-existing directories, follow up the existing
mkdirSync(logsDir, { recursive: true, mode: 0o700 }) with best-effort chmodSync
calls to tighten permissions on both logsDir and its parent/root directory (use
the same 0o700 or appropriate restrictive mode), wrapping each chmodSync in a
try/catch to ignore non-fatal errors; reference the logsDir identifier and the
existing mkdirSync invocation when locating where to add the chmodSync calls.

In `@scripts/slo-budget-report.js`:
- Around line 31-42: Remove the redundant existsSync check in loadAuditEntries
and rely on the try/catch around readdir to handle missing or non-directory
paths; specifically, delete the preliminary if (!existsSync(logDir)) return [];
in function loadAuditEntries and keep the existing try { entries = await
readdir(logDir, { withFileTypes: true }); } catch (error) { const code =
error?.code; if (code === "ENOENT" || code === "ENOTDIR") return []; throw
error; } so you avoid the TOCTOU window and still handle ENOENT/ENOTDIR from
readdir.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6f068912-b8ae-4c53-a3bd-56349f881730

📥 Commits

Reviewing files that changed from the base of the PR and between 02f74c8 and e59c731.

📒 Files selected for processing (14)
  • .github/workflows/recovery-drill.yml
  • docs/operations/incident-drill-template.md
  • docs/upgrade.md
  • lib/audit.ts
  • scripts/audit-log-forwarder.js
  • scripts/secret-scan-regression.sh
  • scripts/seed-health-fixture.js
  • scripts/slo-budget-report.js
  • test/audit-log-forwarder.test.ts
  • test/audit-retry.test.ts
  • test/audit.test.ts
  • test/codex-manager-cli.test.ts
  • test/security/secret-scan-regression.test.ts
  • test/slo-budget-report.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/slo-budget-report.test.ts
  • test/audit-retry.test.ts
  • test/codex-manager-cli.test.ts
  • test/audit.test.ts
  • test/security/secret-scan-regression.test.ts
  • test/audit-log-forwarder.test.ts
docs/**

⚙️ CodeRabbit configuration file

keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

Files:

  • docs/operations/incident-drill-template.md
  • docs/upgrade.md
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/audit.ts
🧬 Code graph analysis (7)
test/slo-budget-report.test.ts (3)
test/helpers/remove-with-retry.ts (1)
  • removeWithRetry (3-18)
scripts/slo-budget-report.js (6)
  • code (37-37)
  • code (55-55)
  • root (128-128)
  • logDir (129-129)
  • policyPath (125-125)
  • payload (192-219)
scripts/enterprise-health-check.js (3)
  • code (164-164)
  • root (213-213)
  • payload (279-286)
test/audit-retry.test.ts (2)
lib/logger.ts (1)
  • error (389-393)
scripts/audit-dev-allowlist.js (1)
  • audit (150-161)
test/audit.test.ts (1)
lib/audit.ts (3)
  • auditLog (190-230)
  • getAuditLogPath (232-234)
  • configureAudit (74-77)
test/security/secret-scan-regression.test.ts (1)
test/helpers/remove-with-retry.ts (1)
  • removeWithRetry (3-18)
scripts/audit-log-forwarder.js (1)
lib/utils.ts (1)
  • sleep (65-67)
scripts/seed-health-fixture.js (3)
scripts/audit-log-forwarder.js (1)
  • root (370-370)
scripts/slo-budget-report.js (1)
  • root (128-128)
scripts/enterprise-health-check.js (2)
  • root (213-213)
  • settingsPath (218-218)
test/audit-log-forwarder.test.ts (1)
test/helpers/remove-with-retry.ts (1)
  • removeWithRetry (3-18)
🔇 Additional comments (31)
docs/upgrade.md (1)

69-94: nice improvement: migration flow order and list structure are now clear.

the keychain validation now precedes auto, and the ordered-list/code-block structure looks consistent. this resolves the earlier readability/lint concerns.

.github/workflows/recovery-drill.yml (4)

1-10: workflow metadata and permissions look solid.

lib/...:recovery-drill.yml:3-6 — schedule + manual dispatch is the right pattern for monthly operational drills. permissions: contents: read at line 8-9 follows least-privilege; no write access needed here.


31-34: delegating to ops:recovery-drill is the right call.

lib/...:recovery-drill.yml:31-34 — this addresses the prior review feedback to avoid duplicating the test target list inline. the workflow now calls the packaged npm script and forwards reporter flags. clean.


36-37: health check captures stderr now — good.

lib/...:recovery-drill.yml:372>&1 redirect ensures warnings and errors land in the artifact. addresses prior feedback.

one minor note: if enterprise-health-check.js emits non-json to stderr, the combined output won't be valid json. downstream consumers should handle that gracefully or the script should emit structured output only.


39-46: artifact upload with if: always() addresses prior feedback.

lib/...:recovery-drill.yml:40-46 — ensures diagnostic files are uploaded even when drill steps fail. this is critical for post-incident analysis of scheduled runs.

test/security/secret-scan-regression.test.ts (3)

1-7: lgtm on imports.

vitest imports are correct, removeWithRetry helper is appropriate for windows filesystem cleanup edge cases. no issues here.


12-34: collectSyntheticFindings looks solid.

recursive directory walk with async fs operations, proper path normalization at test/security/secret-scan-regression.test.ts:26, and sequential checks for secret pattern then allowlist. logic is correct.


36-45: cleanup is correct and windows-safe.

afterEach uses removeWithRetry which handles EBUSY/EPERM retries - this is the right call for windows filesystem behavior per test/helpers/remove-with-retry.ts:3-4.

scripts/secret-scan-regression.sh (4)

1-15: setup and config validation looks good.

strict mode with set -euo pipefail at scripts/secret-scan-regression.sh:3, proper config path resolution, and fail-fast on missing config. clean.


57-114: run_gitleaks_detect fallback chain is well-structured.

native binary → node fallback → docker fallback. the node fallback at lines 74-100 correctly handles path normalization with /\\/g at line 89 (previous review flagged /\\\\/g). exit codes propagate correctly.

one minor note: the node fallback uses synchronous fs operations which could block on large directories, but for a regression test with small synthetic fixtures this is acceptable.


116-151: fail-case and pass-case status handling is correct.

scripts/secret-scan-regression.sh:116-119 and scripts/secret-scan-regression.sh:144-147 both use explicit status capture with set +e/set -e and targeted error messages. this addresses the previous review feedback about implicit status assumptions.


126-142: findings validation is thorough.

checks array non-empty, validates specific file paths, and confirms allowlisted fixture is excluded. the optional chaining f?.File is defensive against malformed output. good.

test/audit-retry.test.ts (1)

1-79: lgtm - retry test covers ebusy path and confirms atomics.wait removal.

the test correctly exercises the retry loop in lib/audit.ts:118-131 by failing unlinkSync twice with EBUSY before succeeding. the atomicsWaitSpy assertion at line 78 locks in that blocking atomics.wait is no longer used.

one optional addition: consider adding test cases for EPERM and EAGAIN since lib/audit.ts:61 includes them in RETRYABLE_AUDIT_FS_CODES. windows can throw EPERM during file contention.

lib/audit.ts (3)

113-131: retry helper uses immediate bounded retries - acceptable for sync audit.

lib/audit.ts:118-131 retries up to 5 times with no delay between attempts. past reviews discussed adding backoff but the final decision was to keep sync context simple with immediate retries since audit is best-effort. the test at test/audit-retry.test.ts:78 confirms atomics.wait is not used.


133-162: purge throttle correctly updates after successful read.

lib/audit.ts:149 sets lastPurgeAttemptMs only after a successful readdirSync, so transient EBUSY failures won't block purges for the full interval. per-file errors at lines 158-160 are silently swallowed as best-effort, which is appropriate.


216-226: write and chmod hardening looks correct.

lib/audit.ts:217-219 wraps writeFileSync in retry helper. lib/audit.ts:220-226 guards chmodSync behind process.platform !== "win32" with a best-effort catch. this matches the pattern in lib/unified-settings.ts.

test/codex-manager-cli.test.ts (2)

1984-2060: audit command mapping test covers key cli flows.

test/codex-manager-cli.test.ts:1984-2060 validates that login/switch/check/verify-flagged/forecast/status commands emit correct AuditAction and AuditOutcome values. the failure paths (lines 1993-2000, 2003-2011) and success paths are both covered.


2062-2098: sanitization test verifies token and email redaction in error metadata.

test/codex-manager-cli.test.ts:2062-2098 throws an error containing tokens, emails, and a long string, then asserts the audited error is truncated to 200 chars and contains redaction placeholders. this locks in compliance with the lib/** guideline against logging tokens or emails.

test/audit.test.ts (2)

211-226: windows branch now has real assertion.

test/audit.test.ts:212-216 on win32 now calls auditLog and asserts content.toContain("\"action\":\"request.start\"") instead of the previous no-op expect(true).toBe(true). this satisfies the guideline to not skip assertions.


271-299: date spy properly restored across full test body.

test/audit.test.ts:274-298 wraps the entire test logic in try/finally so nowSpy.mockRestore() runs even if setup throws. this addresses the past review concern about spy leakage.

scripts/audit-log-forwarder.js (3)

68-90: loadCheckpoint handles race between existsSync and readFile.

scripts/audit-log-forwarder.js:69-70 uses existsSync but the subsequent readFile at line 73 is wrapped in try/catch (lines 86-88) that returns a safe default checkpoint on any error. this handles the toctou race gracefully.


212-254: sendBatch implements timeout and retry with backoff.

scripts/audit-log-forwarder.js:222-253 uses AbortController for timeout, retries 429/5xx and network errors with exponential backoff plus jitter. clearTimeout in finally prevents timer leaks. this addresses the past review concerns about hanging fetches and fragile error handling.


306-344: checkpoint lock handles windows eperm and stale lock detection.

scripts/audit-log-forwarder.js:324 treats both EEXIST and EPERM as contention (windows throws EPERM). clearStaleCheckpointLock at line 328 checks mtime and validates pid existence before removing stale locks. the max wait guard at lines 333-338 prevents indefinite blocking.

scripts/slo-budget-report.js (2)

89-117: runHealthCheck has timeout, killSignal, and fallback message.

scripts/slo-budget-report.js:98-100 uses configurable timeout, SIGTERM kill signal, and maxBuffer. line 105 falls back to error.message when stdout/stderr are empty. this addresses the past review concerns about hanging child processes and empty findings.


146-152: stale findings count includes both stale-wal and stale-audit-log.

scripts/slo-budget-report.js:148-150 filters for both finding codes, fixing the mismatch flagged in past reviews where only "stale-wal" was counted but tests expected "stale-audit-log".

test/slo-budget-report.test.ts (1)

129-140: windows path test verifies backslash handling.

test/slo-budget-report.test.ts:132 sets CODEX_SLO_HEALTH_CHECK_SCRIPT with a windows-style backslash path and line 138 asserts it's passed through correctly. this covers the windows edge case.

test/audit-log-forwarder.test.ts (2)

324-359: newest log race test addresses past review request.

test/audit-log-forwarder.test.ts:324-359 creates a log file, deletes it during the server handler (simulating race between collection and final stat), and asserts newestLogMtimeMs: null with exit status 0. this covers the churn regression flagged in past reviews.


208-253: lock contention test validates concurrency behavior.

test/audit-log-forwarder.test.ts:208-253 creates a lock file, releases it after 120ms via setTimeout, and verifies the forwarder waits and completes successfully. this is a good regression for concurrent process scenarios per the test/** guideline.

scripts/seed-health-fixture.js (3)

40-44: windows edge case handling for chmod is good.

the try/catch blocks correctly keep the script portable on non-posix systems while still attempting hardening on posix. evidence: scripts/seed-health-fixture.js:40-44, scripts/seed-health-fixture.js:51-55, scripts/seed-health-fixture.js:62-66.

Also applies to: 51-55, 62-66


6-8: no actual concurrency risk in current ci workflow.

the script runs once sequentially in .github/workflows/ci.yml:64 with no matrix strategy or parallel job invocation. each github actions job has isolated workspace by default, so GITHUB_WORKSPACE points to unique per-job storage. a concurrency issue only exists if: (1) multiple jobs in the same workflow run this script simultaneously, or (2) developers manually invoke it in parallel locally—neither is evident. remove this flag unless the workflow will run this script from multiple concurrent jobs.


1-66: add regression tests for fixture permissions and matrix execution.

this security-sensitive fixture path lacks dedicated regression test coverage. seed-health-fixture.js:1-66 sets 0o600/0o700 permissions and runs in ci.yml:64 across a windows-latest matrix, where chmodSync will fail (though gracefully caught). while the try-catch handles posix-incompatibility, there's no test verifying permissions enforcement or confirming the fixture handles repeated invocation safely across parallel matrix runs. add direct coverage for permission bits and concurrent seeding behavior.

Comment on lines +11 to +26
async function removeWithRetry(targetPath: string): Promise<void> {
const retryableCodes = new Set(["EBUSY", "EPERM", "ENOTEMPTY"]);
for (let attempt = 0; attempt < 6; attempt += 1) {
try {
await fs.rm(targetPath, { recursive: true, force: true });
return;
} catch (error) {
const code = (error as NodeJS.ErrnoException).code;
if (code === "ENOENT") return;
if (!code || !retryableCodes.has(code) || attempt === 5) {
throw error;
}
await new Promise((resolve) => setTimeout(resolve, 25 * 2 ** attempt));
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

duplicate removeWithRetry helper - import from shared location.

test/slo-budget-report.test.ts:11-26 duplicates the helper from test/helpers/remove-with-retry.ts. import the shared version to keep test utilities dry.

proposed fix
+import { removeWithRetry } from "./helpers/remove-with-retry.js";
 
-async function removeWithRetry(targetPath: string): Promise<void> {
-	const retryableCodes = new Set(["EBUSY", "EPERM", "ENOTEMPTY"]);
-	for (let attempt = 0; attempt < 6; attempt += 1) {
-		try {
-			await fs.rm(targetPath, { recursive: true, force: true });
-			return;
-		} catch (error) {
-			const code = (error as NodeJS.ErrnoException).code;
-			if (code === "ENOENT") return;
-			if (!code || !retryableCodes.has(code) || attempt === 5) {
-				throw error;
-			}
-			await new Promise((resolve) => setTimeout(resolve, 25 * 2 ** attempt));
-		}
-	}
-}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async function removeWithRetry(targetPath: string): Promise<void> {
const retryableCodes = new Set(["EBUSY", "EPERM", "ENOTEMPTY"]);
for (let attempt = 0; attempt < 6; attempt += 1) {
try {
await fs.rm(targetPath, { recursive: true, force: true });
return;
} catch (error) {
const code = (error as NodeJS.ErrnoException).code;
if (code === "ENOENT") return;
if (!code || !retryableCodes.has(code) || attempt === 5) {
throw error;
}
await new Promise((resolve) => setTimeout(resolve, 25 * 2 ** attempt));
}
}
}
import { removeWithRetry } from "./helpers/remove-with-retry.js";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/slo-budget-report.test.ts` around lines 11 - 26, The test defines a
duplicate removeWithRetry function; instead delete the local async function
removeWithRetry in test/slo-budget-report.test.ts and import the shared helper
from test/helpers/remove-with-retry.ts (use the exported symbol removeWithRetry)
so all retries use the central implementation; ensure the import matches the
helper's exported name and types and leave existing calls in the test unchanged.

- add recovery drill timeout and failure notifier hook
- clarify upgrade runbook npm script execution context
- harden secret-scan allowlist regression fixtures and test isolation
- reuse shared removeWithRetry helper in audit forwarder tests

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/recovery-drill.yml:
- Around line 11-15: The workflow lacks a concurrency key so the recovery-drill
job can run in parallel; add a top-level concurrency stanza to the workflow
(e.g., concurrency: { group: "recovery-drill", cancel-in-progress: true } or a
stable group name using github.workflow and github.ref) to ensure the Monthly
Storage Recovery Drill job (recovery-drill) cannot start a new run while an
existing run is in progress and to optionally cancel in-progress runs when a new
one is triggered.
- Around line 54-62: Replace the unsafe printf-based JSON construction and add a
curl timeout: instead of building payload with printf '{"text":"%s"}'
"${message}", use jq to produce properly escaped JSON by running jq -n --arg msg
"${message}" '{"text": $msg}' and assign that to payload (or stream it directly
to curl), and add --max-time 30 to the curl invocation that posts to
"${RECOVERY_DRILL_WEBHOOK_URL}" while keeping the existing flags (--fail
--silent --show-error) and the RUN_URL/message variables unchanged.

In `@docs/upgrade.md`:
- Around line 79-80: In docs/upgrade.md there are fenced code blocks that close
(the ``` after the bash examples such as the block around "npm run
ops:keychain-assert" and the block around "npm run ops:health-check") and are
immediately followed by paragraph text; add a single blank line after each
closing fence so the paragraph text is separated from the fenced code block
(i.e., insert a blank line after the closing ``` for the ops:keychain-assert
block and after the closing ``` for the ops:health-check block) to satisfy
markdownlint MD031.

In `@test/audit-log-forwarder.test.ts`:
- Around line 13-35: The runForwarder harness can hang forever; add a hard
timeout that kills the spawned child and resolves the promise when exceeded.
Inside runForwarder (the child variable returned by spawn), start a timer (e.g.,
with setTimeout) that calls child.kill() and then resolves with a timeout result
(status null and an explanatory stderr fragment or message) if the process
hasn't exited; clear the timer on child 'close' and on 'error' to avoid leaks,
and ensure the same stdout/stderr aggregation is preserved before resolving.
- Around line 210-223: The test's lock-release timer (releaseTimer using
setTimeout to fs.unlink(checkpointLockPath)) races with the forwarder's timeout
(CODEX_AUDIT_FORWARDER_TIMEOUT_MS), causing flakes; replace the fixed 120ms
wakeup with a deterministic approach—either increase the delay to exceed the
timeout window (e.g., 500ms) or, preferably, replace the timer with a
polling/wait helper that repeatedly checks for the forwarder hitting the lock
wait state before calling fs.unlink(checkpointLockPath); update the test around
runForwarder and the env var CODEX_AUDIT_FORWARDER_TIMEOUT_MS accordingly and
remove the fragile 120ms timing.
- Around line 239-272: The test currently only asserts exit status; update the
"clears stale checkpoint locks and proceeds" test to also assert that the
forwarder actually processed the log and advanced the checkpoint: when calling
withServer and runForwarder, capture and assert the HTTP request payload/body
was received (via the withServer request handler) contains the expected audit
event, and after runForwarder finishes read checkpointPath to assert it exists
and contains an updated checkpoint (e.g., a timestamp or offset indicating
progress) and that checkpointLockPath was removed or replaced; use the existing
helpers runForwarder, withServer, checkpointPath, checkpointLockPath, and logDir
to locate files and the server handler to capture payload, keeping assertions
deterministic under vitest.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dd50ee38-1b3a-42fa-b76c-ee46f056052e

📥 Commits

Reviewing files that changed from the base of the PR and between e59c731 and 2b0afa4.

📒 Files selected for processing (5)
  • .github/workflows/recovery-drill.yml
  • docs/upgrade.md
  • scripts/secret-scan-regression.sh
  • test/audit-log-forwarder.test.ts
  • test/security/secret-scan-regression.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
docs/**

⚙️ CodeRabbit configuration file

keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

Files:

  • docs/upgrade.md
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/audit-log-forwarder.test.ts
  • test/security/secret-scan-regression.test.ts
🧬 Code graph analysis (1)
test/security/secret-scan-regression.test.ts (1)
test/helpers/remove-with-retry.ts (1)
  • removeWithRetry (3-18)
🪛 markdownlint-cli2 (0.21.0)
docs/upgrade.md

[warning] 79-79: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 95-95: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🔇 Additional comments (8)
docs/upgrade.md (1)

98-102: good windows migration guardrails here.

the EBUSY/EPERM note and handle-lock guidance are clear and practical for operators on windows in docs/upgrade.md line 98-102.

test/security/secret-scan-regression.test.ts (2)

12-34: solid synthetic scanner helper for deterministic assertions.

test/security/secret-scan-regression.test.ts:12 keeps path normalization and allowlist filtering explicit, which makes the regression deterministic across os variants.


47-54: good windows and allowlist regression coverage.

test/security/secret-scan-regression.test.ts:47 and test/security/secret-scan-regression.test.ts:56 now validate windows path handling and prove the allowlisted fixture is excluded while real findings are still reported.

As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Also applies to: 56-93

scripts/secret-scan-regression.sh (1)

57-114: execution paths and status handling look correct.

scripts/secret-scan-regression.sh:57 and scripts/secret-scan-regression.sh:116 correctly enforce fail-case/pass-case expectations across native, docker, and fallback scanner paths.

Also applies to: 116-153

.github/workflows/recovery-drill.yml (3)

1-10: workflow header looks solid.

minimal permissions: contents: read, monthly cron trigger, and manual dispatch support are all appropriate for an enterprise recovery drill. no issues here.


32-38: drill test and health check steps look good.

lib/...:recovery-drill.yml:32-38 — delegates to npm run ops:recovery-drill as requested, and the health check now captures stderr with 2>&1. both past concerns addressed.


40-47: artifact upload unconditionally — good.

lib/...:recovery-drill.yml:40-47if: always() ensures drill artifacts are preserved for post-mortem even when earlier steps fail. past concern addressed.

test/audit-log-forwarder.test.ts (1)

308-343: good regression coverage for newest-log disappearance race.

test/audit-log-forwarder.test.ts:308-343 (Line 308) is solid: it validates success exit, newestLogMtimeMs: null, and checkpoint advancement after file disappearance.

Comment on lines +11 to +15
jobs:
recovery-drill:
name: Monthly Storage Recovery Drill
runs-on: ubuntu-latest
timeout-minutes: 30
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

consider adding a concurrency group to prevent parallel drill runs.

lib/...:recovery-drill.yml:11-15 — no concurrency key means a manual workflow_dispatch during an in-progress scheduled run would run both in parallel. for a drill that touches storage/secrets fixtures, parallel runs could produce confusing or conflicting results.

proposed fix
 jobs:
   recovery-drill:
     name: Monthly Storage Recovery Drill
     runs-on: ubuntu-latest
     timeout-minutes: 30
+    concurrency:
+      group: recovery-drill
+      cancel-in-progress: false
     steps:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
jobs:
recovery-drill:
name: Monthly Storage Recovery Drill
runs-on: ubuntu-latest
timeout-minutes: 30
jobs:
recovery-drill:
name: Monthly Storage Recovery Drill
runs-on: ubuntu-latest
timeout-minutes: 30
concurrency:
group: recovery-drill
cancel-in-progress: false
steps:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/recovery-drill.yml around lines 11 - 15, The workflow
lacks a concurrency key so the recovery-drill job can run in parallel; add a
top-level concurrency stanza to the workflow (e.g., concurrency: { group:
"recovery-drill", cancel-in-progress: true } or a stable group name using
github.workflow and github.ref) to ensure the Monthly Storage Recovery Drill job
(recovery-drill) cannot start a new run while an existing run is in progress and
to optionally cancel in-progress runs when a new one is triggered.

Comment on lines +54 to +62
run: |
message="Recovery drill failed. Run: ${RUN_URL}. Artifacts: .tmp/recovery-drill-vitest.json and .tmp/recovery-drill-health.json."
if [[ -n "${RECOVERY_DRILL_WEBHOOK_URL:-}" ]]; then
payload=$(printf '{"text":"%s"}' "${message}")
curl --fail --silent --show-error \
-X POST \
-H "Content-Type: application/json" \
--data "${payload}" \
"${RECOVERY_DRILL_WEBHOOK_URL}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

git ls-files | grep -i "recovery-drill"

Repository: ndycode/codex-multi-auth

Length of output: 104


🏁 Script executed:

cat -n .github/workflows/recovery-drill.yml | sed -n '50,65p'

Repository: ndycode/codex-multi-auth

Length of output: 1017


🌐 Web query:

jq availability ubuntu-latest GitHub Actions runner 2024 2025

💡 Result:

  • jq is preinstalled on GitHub-hosted Ubuntu runners. On ubuntu-22.04 the image includes jq 1.6. (github.com)
  • On ubuntu-24.04 the image includes jq 1.7 (APT package shown as 1.7.1). (github.com)
  • ubuntu-latest switched from Ubuntu 22.04 to Ubuntu 24.04 during Dec 2024–Jan 2025 (gradual migration), so the default jq on ubuntu-latest effectively moved from 1.6 → 1.7 across that window. To avoid surprises, pin runs-on: ubuntu-22.04 or ubuntu-24.04. (github.com)

Citations:


use jq for json escaping instead of printf.

.github/workflows/recovery-drill.yml:57printf '{"text":"%s"}' "${message}" doesn't escape special chars in the payload. the current message is safe since it's built from github context variables, but future changes could break this silently. jq is preinstalled on ubuntu-latest (1.7+), so replace the printf call with jq -n --arg msg "${message}" '{"text": $msg}' for proper escaping.

also add --max-time 30 to curl to prevent indefinite hangs if the webhook endpoint doesn't respond.

proposed fix
       run: |
         message="Recovery drill failed. Run: ${RUN_URL}. Artifacts: .tmp/recovery-drill-vitest.json and .tmp/recovery-drill-health.json."
         if [[ -n "${RECOVERY_DRILL_WEBHOOK_URL:-}" ]]; then
-          payload=$(printf '{"text":"%s"}' "${message}")
+          payload=$(jq -n --arg msg "${message}" '{"text": $msg}')
           curl --fail --silent --show-error \
+            --max-time 30 \
             -X POST \
             -H "Content-Type: application/json" \
             --data "${payload}" \
             "${RECOVERY_DRILL_WEBHOOK_URL}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/recovery-drill.yml around lines 54 - 62, Replace the
unsafe printf-based JSON construction and add a curl timeout: instead of
building payload with printf '{"text":"%s"}' "${message}", use jq to produce
properly escaped JSON by running jq -n --arg msg "${message}" '{"text": $msg}'
and assign that to payload (or stream it directly to curl), and add --max-time
30 to the curl invocation that posts to "${RECOVERY_DRILL_WEBHOOK_URL}" while
keeping the existing flags (--fail --silent --show-error) and the
RUN_URL/message variables unchanged.

Comment on lines +79 to +80
```
Run this from the project repository root where `package.json` defines enterprise ops scripts, or run your CI/job wrapper that exposes these scripts.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

fix md031 fence spacing around list-item code blocks.

line 79 and line 95 close fenced blocks and immediately continue with paragraph text. add a blank line after each closing fence so markdownlint stays clean in docs/upgrade.md line 79 and docs/upgrade.md line 95.

proposed patch
 2. Validate keychain backend availability:
 
    ```bash
    npm run ops:keychain-assert
    ```
+
    Run this from the project repository root where `package.json` defines enterprise ops scripts, or run your CI/job wrapper that exposes these scripts.
@@
 5. Verify health and storage state:
 
    ```bash
    npm run ops:health-check -- --require-files
    ```
+
    Run this from the same repository checkout (or your standard CI/job wrapper).

Also applies to: 95-96

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 79-79: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/upgrade.md` around lines 79 - 80, In docs/upgrade.md there are fenced
code blocks that close (the ``` after the bash examples such as the block around
"npm run ops:keychain-assert" and the block around "npm run ops:health-check")
and are immediately followed by paragraph text; add a single blank line after
each closing fence so the paragraph text is separated from the fenced code block
(i.e., insert a blank line after the closing ``` for the ops:keychain-assert
block and after the closing ``` for the ops:health-check block) to satisfy
markdownlint MD031.

Comment on lines +13 to +35
function runForwarder(
args: string[],
env: NodeJS.ProcessEnv = {},
): Promise<{ status: number | null; stdout: string; stderr: string }> {
return new Promise((resolve, reject) => {
const child = spawn(process.execPath, [scriptPath, ...args], {
env: { ...process.env, ...env },
stdio: ["ignore", "pipe", "pipe"],
});
let stdout = "";
let stderr = "";
child.stdout.on("data", (chunk) => {
stdout += chunk.toString();
});
child.stderr.on("data", (chunk) => {
stderr += chunk.toString();
});
child.on("error", reject);
child.on("close", (status) => {
resolve({ status, stdout, stderr });
});
});
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

add a hard timeout to the child-process harness.

in test/audit-log-forwarder.test.ts:13-35 (Line 13), runForwarder can hang forever if the script never exits, which makes this suite non-deterministic and can deadlock ci.

proposed fix
 function runForwarder(
 	args: string[],
 	env: NodeJS.ProcessEnv = {},
+	timeoutMs = 10_000,
 ): Promise<{ status: number | null; stdout: string; stderr: string }> {
 	return new Promise((resolve, reject) => {
 		const child = spawn(process.execPath, [scriptPath, ...args], {
 			env: { ...process.env, ...env },
 			stdio: ["ignore", "pipe", "pipe"],
 		});
+		const timeout = setTimeout(() => {
+			child.kill();
+			reject(new Error(`runForwarder timed out after ${timeoutMs}ms`));
+		}, timeoutMs);
 		let stdout = "";
 		let stderr = "";
 		child.stdout.on("data", (chunk) => {
 			stdout += chunk.toString();
 		});
 		child.stderr.on("data", (chunk) => {
 			stderr += chunk.toString();
 		});
-		child.on("error", reject);
+		child.on("error", (error) => {
+			clearTimeout(timeout);
+			reject(error);
+		});
 		child.on("close", (status) => {
+			clearTimeout(timeout);
 			resolve({ status, stdout, stderr });
 		});
 	});
 }

As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function runForwarder(
args: string[],
env: NodeJS.ProcessEnv = {},
): Promise<{ status: number | null; stdout: string; stderr: string }> {
return new Promise((resolve, reject) => {
const child = spawn(process.execPath, [scriptPath, ...args], {
env: { ...process.env, ...env },
stdio: ["ignore", "pipe", "pipe"],
});
let stdout = "";
let stderr = "";
child.stdout.on("data", (chunk) => {
stdout += chunk.toString();
});
child.stderr.on("data", (chunk) => {
stderr += chunk.toString();
});
child.on("error", reject);
child.on("close", (status) => {
resolve({ status, stdout, stderr });
});
});
}
function runForwarder(
args: string[],
env: NodeJS.ProcessEnv = {},
timeoutMs = 10_000,
): Promise<{ status: number | null; stdout: string; stderr: string }> {
return new Promise((resolve, reject) => {
const child = spawn(process.execPath, [scriptPath, ...args], {
env: { ...process.env, ...env },
stdio: ["ignore", "pipe", "pipe"],
});
const timeout = setTimeout(() => {
child.kill();
reject(new Error(`runForwarder timed out after ${timeoutMs}ms`));
}, timeoutMs);
let stdout = "";
let stderr = "";
child.stdout.on("data", (chunk) => {
stdout += chunk.toString();
});
child.stderr.on("data", (chunk) => {
stderr += chunk.toString();
});
child.on("error", (error) => {
clearTimeout(timeout);
reject(error);
});
child.on("close", (status) => {
clearTimeout(timeout);
resolve({ status, stdout, stderr });
});
});
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/audit-log-forwarder.test.ts` around lines 13 - 35, The runForwarder
harness can hang forever; add a hard timeout that kills the spawned child and
resolves the promise when exceeded. Inside runForwarder (the child variable
returned by spawn), start a timer (e.g., with setTimeout) that calls
child.kill() and then resolves with a timeout result (status null and an
explanatory stderr fragment or message) if the process hasn't exited; clear the
timer on child 'close' and on 'error' to avoid leaks, and ensure the same
stdout/stderr aggregation is preserved before resolving.

Comment on lines +210 to +223
const releaseTimer = setTimeout(async () => {
await fs.unlink(checkpointLockPath).catch(() => {});
}, 120);
try {
const result = await runForwarder(
[
`--endpoint=${endpoint}`,
`--log-dir=${logDir}`,
`--checkpoint=${checkpointPath}`,
],
{
CODEX_AUDIT_FORWARDER_MAX_ATTEMPTS: "2",
CODEX_AUDIT_FORWARDER_TIMEOUT_MS: "300",
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

remove the timer race in the lock-release test.

in test/audit-log-forwarder.test.ts:210-223 (Line 210), releasing the lock at 120ms with a 300ms timeout budget is tight and can flake on slower windows runners under load.

proposed fix
-			const releaseTimer = setTimeout(async () => {
+			const releaseTimer = setTimeout(async () => {
 				await fs.unlink(checkpointLockPath).catch(() => {});
-			}, 120);
+			}, 50);
 			try {
 				const result = await runForwarder(
 					[
 						`--endpoint=${endpoint}`,
 						`--log-dir=${logDir}`,
 						`--checkpoint=${checkpointPath}`,
 					],
 					{
 						CODEX_AUDIT_FORWARDER_MAX_ATTEMPTS: "2",
 						CODEX_AUDIT_FORWARDER_TIMEOUT_MS: "300",
+						CODEX_AUDIT_FORWARDER_MAX_WAIT_MS: "2000",
+						CODEX_AUDIT_FORWARDER_LOCK_MAX_ATTEMPTS: "200",
 					},
 				);

As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/audit-log-forwarder.test.ts` around lines 210 - 223, The test's
lock-release timer (releaseTimer using setTimeout to
fs.unlink(checkpointLockPath)) races with the forwarder's timeout
(CODEX_AUDIT_FORWARDER_TIMEOUT_MS), causing flakes; replace the fixed 120ms
wakeup with a deterministic approach—either increase the delay to exceed the
timeout window (e.g., 500ms) or, preferably, replace the timer with a
polling/wait helper that repeatedly checks for the forwarder hitting the lock
wait state before calling fs.unlink(checkpointLockPath); update the test around
runForwarder and the env var CODEX_AUDIT_FORWARDER_TIMEOUT_MS accordingly and
remove the fragile 120ms timing.

Comment on lines +239 to +272
it("clears stale checkpoint locks and proceeds", async () => {
const root = mkdtempSync(path.join(tmpdir(), "audit-forwarder-stale-lock-"));
fixtures.push(root);
const logDir = path.join(root, "logs");
const checkpointPath = path.join(root, "checkpoint.json");
const checkpointLockPath = `${checkpointPath}.lock`;
await fs.mkdir(logDir, { recursive: true });
await fs.writeFile(
path.join(logDir, "audit.log"),
'{"timestamp":"2026-03-01T00:00:00Z","action":"request.start"}\n',
"utf8",
);
await fs.writeFile(checkpointLockPath, "999999\n", "utf8");
const staleDate = new Date(Date.now() - 60 * 1000);
utimesSync(checkpointLockPath, staleDate, staleDate);

await withServer((_req, res) => {
res.statusCode = 200;
res.end("ok");
}, async (endpoint) => {
const result = await runForwarder(
[
`--endpoint=${endpoint}`,
`--log-dir=${logDir}`,
`--checkpoint=${checkpointPath}`,
],
{
CODEX_AUDIT_FORWARDER_STALE_LOCK_MS: "50",
CODEX_AUDIT_FORWARDER_MAX_WAIT_MS: "500",
},
);
expect(result.status).toBe(0);
});
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

strengthen stale-lock assertions to prevent false positives.

in test/audit-log-forwarder.test.ts:239-272 (Line 239), this regression only checks exit status. add payload and checkpoint assertions so it proves the stale lock was actually cleared and processing advanced.

proposed fix
 		}, async (endpoint) => {
 			const result = await runForwarder(
 				[
 					`--endpoint=${endpoint}`,
 					`--log-dir=${logDir}`,
 					`--checkpoint=${checkpointPath}`,
 				],
 				{
 					CODEX_AUDIT_FORWARDER_STALE_LOCK_MS: "50",
 					CODEX_AUDIT_FORWARDER_MAX_WAIT_MS: "500",
 				},
 			);
 			expect(result.status).toBe(0);
+			const payload = parseJsonStdout(result.stdout);
+			expect(payload.status).toBe("sent");
+			expect(payload.sent).toBe(1);
 		});
+
+		const checkpoint = JSON.parse(await fs.readFile(checkpointPath, "utf8")) as {
+			file?: string;
+			line?: number;
+		};
+		expect(checkpoint.file).toBe("audit.log");
+		expect(checkpoint.line).toBe(1);
 	});

As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it("clears stale checkpoint locks and proceeds", async () => {
const root = mkdtempSync(path.join(tmpdir(), "audit-forwarder-stale-lock-"));
fixtures.push(root);
const logDir = path.join(root, "logs");
const checkpointPath = path.join(root, "checkpoint.json");
const checkpointLockPath = `${checkpointPath}.lock`;
await fs.mkdir(logDir, { recursive: true });
await fs.writeFile(
path.join(logDir, "audit.log"),
'{"timestamp":"2026-03-01T00:00:00Z","action":"request.start"}\n',
"utf8",
);
await fs.writeFile(checkpointLockPath, "999999\n", "utf8");
const staleDate = new Date(Date.now() - 60 * 1000);
utimesSync(checkpointLockPath, staleDate, staleDate);
await withServer((_req, res) => {
res.statusCode = 200;
res.end("ok");
}, async (endpoint) => {
const result = await runForwarder(
[
`--endpoint=${endpoint}`,
`--log-dir=${logDir}`,
`--checkpoint=${checkpointPath}`,
],
{
CODEX_AUDIT_FORWARDER_STALE_LOCK_MS: "50",
CODEX_AUDIT_FORWARDER_MAX_WAIT_MS: "500",
},
);
expect(result.status).toBe(0);
});
});
it("clears stale checkpoint locks and proceeds", async () => {
const root = mkdtempSync(path.join(tmpdir(), "audit-forwarder-stale-lock-"));
fixtures.push(root);
const logDir = path.join(root, "logs");
const checkpointPath = path.join(root, "checkpoint.json");
const checkpointLockPath = `${checkpointPath}.lock`;
await fs.mkdir(logDir, { recursive: true });
await fs.writeFile(
path.join(logDir, "audit.log"),
'{"timestamp":"2026-03-01T00:00:00Z","action":"request.start"}\n',
"utf8",
);
await fs.writeFile(checkpointLockPath, "999999\n", "utf8");
const staleDate = new Date(Date.now() - 60 * 1000);
utimesSync(checkpointLockPath, staleDate, staleDate);
await withServer((_req, res) => {
res.statusCode = 200;
res.end("ok");
}, async (endpoint) => {
const result = await runForwarder(
[
`--endpoint=${endpoint}`,
`--log-dir=${logDir}`,
`--checkpoint=${checkpointPath}`,
],
{
CODEX_AUDIT_FORWARDER_STALE_LOCK_MS: "50",
CODEX_AUDIT_FORWARDER_MAX_WAIT_MS: "500",
},
);
expect(result.status).toBe(0);
const payload = parseJsonStdout(result.stdout);
expect(payload.status).toBe("sent");
expect(payload.sent).toBe(1);
});
const checkpoint = JSON.parse(await fs.readFile(checkpointPath, "utf8")) as {
file?: string;
line?: number;
};
expect(checkpoint.file).toBe("audit.log");
expect(checkpoint.line).toBe(1);
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/audit-log-forwarder.test.ts` around lines 239 - 272, The test currently
only asserts exit status; update the "clears stale checkpoint locks and
proceeds" test to also assert that the forwarder actually processed the log and
advanced the checkpoint: when calling withServer and runForwarder, capture and
assert the HTTP request payload/body was received (via the withServer request
handler) contains the expected audit event, and after runForwarder finishes read
checkpointPath to assert it exists and contains an updated checkpoint (e.g., a
timestamp or offset indicating progress) and that checkpointLockPath was removed
or replaced; use the existing helpers runForwarder, withServer, checkpointPath,
checkpointLockPath, and logDir to locate files and the server handler to capture
payload, keeping assertions deterministic under vitest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant