test: add deep stress suite for audited subsystems by ndycode · Pull Request #166 · ndycode/oc-codex-multi-auth

ndycode · 2026-06-06T00:36:32Z

Summary

The deepest stress layer for the subsystems hardened by the deep audit (#165). Adds property-based and concurrency tests that hammer the invariants those fixes depend on. Every test was mutation-verified — disabling the corresponding fix makes the test fail — so they are real guards, not tautologies.

Suites

test/property/tracker-remap.property.test.ts — health-score and token-bucket state follows the right account through random removal sequences; remapIndexedKeys drops only the removed index and is a bijection on survivors (incl. index:quotaKey keys). ~1300 generated cases. Guards the index-keyed-tracker fix.
test/chaos/concurrent-storage.test.ts — drives the REAL on-disk path (atomic temp+rename, mutex, withAccountStorageTransaction) against a temp file: 60 concurrent read-modify-write transactions lose zero updates; per-account interleaving all persists; mixed transaction/overwrite storms never tear the file. Guards the runAccountCheck/hydrateEmails lost-update fix.
test/property/redaction.property.test.ts — for any generated email / opaque secret / JWT, the raw value never survives maskEmailForDisplay, resolveDisplayEmail, sanitizeValue (incl. nested + cookie keys), or maskString. ~2500 generated cases. Guards the #163 masking ("Add privacy-safe account labels across account switcher and command displays") + the redaction fixes.
test/property/refresh-rotation.property.test.ts — random sibling groups sharing refresh tokens: a rotation converges every sibling holding the old token (other groups untouched), and workspace-scoped removal never drops refresh-token siblings while preserving the index === position invariant.
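The tracker-remap invariant (drop only the removed index, keep a one-to-one mapping for survivors, preserve any `index:quotaKey` suffix) can be sketched without the repo's code. Everything below is an assumption for illustration: `remapKeysAfterRemoval` is a hypothetical stand-in for `remapIndexedKeys`, whose real signature may differ, and the small seeded generator only approximates what fast-check does in the actual suite.

```typescript
// Hypothetical stand-in for remapIndexedKeys; the repo's real signature may differ.
// Keys are either "<index>" or "<index>:<quotaKey>".
function remapKeysAfterRemoval<V>(entries: Map<string, V>, removed: number): Map<string, V> {
  const out = new Map<string, V>();
  for (const [key, value] of entries) {
    const sep = key.indexOf(":");
    const index = Number(sep === -1 ? key : key.slice(0, sep));
    const suffix = sep === -1 ? "" : key.slice(sep); // keeps the leading ":"
    if (index === removed) continue; // drop only the removed index
    const shifted = index > removed ? index - 1 : index;
    out.set(`${shifted}${suffix}`, value);
  }
  return out;
}

// Tiny seeded generator so a failing run can be replayed (an assumption, not fast-check).
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Runs the property `runs` times; throws on the first violation.
function checkRemapProperty(runs: number, seed: number): number {
  const rand = seededRandom(seed);
  for (let run = 0; run < runs; run++) {
    const count = 2 + Math.floor(rand() * 8);
    const entries = new Map<string, string>();
    for (let i = 0; i < count; i++) {
      entries.set(`${i}`, `account-${i}`);
      if (rand() < 0.5) entries.set(`${i}:quota`, `account-${i}`);
    }
    const removed = Math.floor(rand() * count);
    const remapped = remapKeysAfterRemoval(entries, removed);

    let survivors = 0;
    for (const [key, owner] of entries) {
      const [rawIndex, ...rest] = key.split(":");
      const index = Number(rawIndex);
      if (index === removed) continue;
      survivors++;
      const shifted = index > removed ? index - 1 : index;
      const expectedKey = [String(shifted), ...rest].join(":");
      if (remapped.get(expectedKey) !== owner) {
        throw new Error(`run ${run}: ${key} did not follow its account to ${expectedKey}`);
      }
    }
    // Equal sizes plus the lookups above mean no two survivors collided on one key.
    if (remapped.size !== survivors) {
      throw new Error(`run ${run}: expected ${survivors} entries, got ${remapped.size}`);
    }
  }
  return runs;
}
```

Tagging each value with its owning account is what lets the check distinguish "state followed the right account" from "some state ended up at that index".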

Verification

npm test: 96 files, 2487 passed, 1 skipped
npm run lint + npm run typecheck: clean
Mutation-verified: disabling propagateRotatedRefreshTokenToSiblings and HealthScoreTracker.remapAfterRemoval each makes the corresponding suite fail.
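As a rough illustration of what mutation-verified means for the rotation suite, the sketch below models sibling propagation with a `propagate` flag that emulates disabling the fix. The `Account` shape, `rotateRefreshToken`, and `rotationConverges` are all hypothetical names for this example and are not the repo's `propagateRotatedRefreshTokenToSiblings` implementation.

```typescript
interface Account {
  id: number;
  refreshToken: string;
}

// Hypothetical rotation; `propagate = false` emulates the mutant with the sibling fix disabled.
function rotateRefreshToken(
  accounts: Account[],
  targetId: number,
  newToken: string,
  propagate: boolean,
): Account[] {
  const target = accounts.find((account) => account.id === targetId);
  if (!target) return accounts;
  const oldToken = target.refreshToken;
  return accounts.map((account) => {
    const isSibling = propagate && account.refreshToken === oldToken;
    return account.id === targetId || isSibling
      ? { ...account, refreshToken: newToken }
      : account;
  });
}

// Invariant: every holder of the old token converges; every other group is untouched.
function rotationConverges(
  before: Account[],
  after: Account[],
  oldToken: string,
  newToken: string,
): boolean {
  const afterById = new Map(after.map((account) => [account.id, account.refreshToken] as const));
  return before.every((account) => {
    const expected = account.refreshToken === oldToken ? newToken : account.refreshToken;
    return afterById.get(account.id) === expected;
  });
}

const group: Account[] = [
  { id: 0, refreshToken: "rt-shared" },
  { id: 1, refreshToken: "rt-shared" },
  { id: 2, refreshToken: "rt-other" },
];
const fixed = rotateRefreshToken(group, 0, "rt-new", true);
const mutant = rotateRefreshToken(group, 0, "rt-new", false);
```

With these inputs the invariant holds for `fixed` and fails for `mutant`, because account 1 still carries `rt-shared`. A guard that passes against both variants would be the tautology the summary warns about.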

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Added comprehensive stress test suites covering account storage reliability under concurrent operations, privacy and redaction safeguards, token rotation mechanics across multiple accounts, and state tracking consistency after account removal.

note: greptile review for oc-chatgpt-multi-auth. cite files like `lib/foo.ts:123`. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

adds a deep stress layer on top of the audit-hardened subsystems from #165: four new test files covering property-based and concurrency invariants across storage transactions, email/token redaction, refresh-token rotation propagation, and tracker index remapping. all suites use real code paths (no mocks for the storage mutex or on-disk atomic rename), and each suite is documented as mutation-verified.

test/chaos/concurrent-storage.test.ts — 60 concurrent withAccountStorageTransaction calls through the real withStorageLock mutex + atomic temp/rename pipeline; verifies zero lost updates and structural integrity under a mixed transaction/overwrite storm.
test/property/redaction.property.test.ts — ~2500 fast-check cases asserting that maskEmailForDisplay, resolveDisplayEmail, sanitizeValue, and maskString never leak raw emails, opaque secrets, or jwt-shaped substrings.
test/property/refresh-rotation.property.test.ts — 300-run random sibling-group suite asserting rotation converges all siblings and workspace-scoped removal never drops refresh-token siblings while preserving the index === position invariant.
test/property/tracker-remap.property.test.ts — 400-run suite for HealthScoreTracker.remapAfterRemoval, TokenBucketTracker.remapAfterRemoval, and remapIndexedKeys bijectivity across random removal sequences.
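The lost-update failure mode that the concurrent-storage suite guards against can be reproduced in miniature. This is a sketch under stated assumptions: `FakeDisk` is an in-memory stand-in (the real suite writes a temp file through the atomic temp+rename path), and `Mutex` is a hypothetical promise-chain lock, not the repo's `withStorageLock`.

```typescript
// In-memory stand-in for the accounts file; the real suite uses a temp file on disk.
class FakeDisk {
  private contents = JSON.stringify({ counter: 0 });

  async read(): Promise<string> {
    await Promise.resolve(); // yield to other tasks, as real I/O would
    return this.contents;
  }

  async write(next: string): Promise<void> {
    await Promise.resolve();
    this.contents = next;
  }
}

// Hypothetical promise-chain mutex: each caller starts only after the previous one settles.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  run<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

// One read-modify-write cycle against the fake disk.
async function increment(disk: FakeDisk): Promise<void> {
  const state = JSON.parse(await disk.read()) as { counter: number };
  state.counter += 1;
  await disk.write(JSON.stringify(state));
}

// Fires `n` increments at once and returns the final counter.
async function stress(n: number, locked: boolean): Promise<number> {
  const disk = new FakeDisk();
  const mutex = new Mutex();
  const jobs = Array.from({ length: n }, () =>
    locked ? mutex.run(() => increment(disk)) : increment(disk),
  );
  await Promise.all(jobs);
  return (JSON.parse(await disk.read()) as { counter: number }).counter;
}
```

`stress(60, true)` resolves to 60, while `stress(60, false)` resolves to fewer because overlapping cycles read the same stale value before any write lands. That difference is the signal a zero-lost-updates assertion relies on.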

Confidence Score: 4/5

test-only changes; no production code touched, all suites are mutation-verified, and the real storage path is exercised against a temp directory with proper cleanup.

the dead lastUsedCount field in makeStorage is misleading and could confuse future maintainers, the buildManager fixture silently relies on normalization to supply a missing activeIndexByFamily, and the health-score property test's toBeCloseTo tolerance depends on wall-clock speed rather than pinned timers. none of these break correctness today, but they are rough edges worth tidying before the suite grows.

test/chaos/concurrent-storage.test.ts (dead field + windows rename coverage gap) and test/property/tracker-remap.property.test.ts (real-clock dependency in health-score assertions)

Important Files Changed

Filename	Overview
test/chaos/concurrent-storage.test.ts	adds 3 real-disk concurrent-storage tests using the actual mutex/atomic-rename path; dead `lastUsedCount` field in makeStorage is confusing, and windows rename-retry coverage is absent
test/property/redaction.property.test.ts	~2500 property-based cases covering maskEmailForDisplay, resolveDisplayEmail, sanitizeValue, and maskString; key normalization correctly aligns with SENSITIVE_KEYS in logger.ts; no issues found
test/property/refresh-rotation.property.test.ts	300-run property suite for sibling token propagation and workspace-scoped removal; buildManager fixture missing activeIndexByFamily field, relying silently on normalizeAccountStorage fallback
test/property/tracker-remap.property.test.ts	400-run property suite for HealthScoreTracker/TokenBucketTracker remap after removal; toBeCloseTo(health, 3) tolerance relies on wall-clock speed rather than fake timers, which could be flaky under cpu pressure

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[withAccountStorageTransaction] --> B[withStorageLock]
    B --> C[loadAccountsInternal]
    C --> D[normalizeAccountStorage]
    D --> E[handler: read-modify-write]
    E --> F[persist: saveAccountsUnlocked]
    F --> G[writeAccountsToPathUnlocked]
    G --> H[atomic temp+rename renameWithWindowsRetry]

    subgraph concurrent-storage.test.ts
        I[60x Promise.all] --> A
        J[saveAccounts plain] --> B
    end

    subgraph tracker-remap.property.test.ts
        K[HealthScoreTracker] --> L[remapAfterRemoval]
        M[TokenBucketTracker] --> L
        N[remapIndexedKeys] --> L
    end

    subgraph refresh-rotation.property.test.ts
        O[buildManager random seed] --> P[updateFromAuth]
        P --> Q[propagateRotatedRefreshTokenToSiblings]
        O --> R[removeAccountsByWorkspaceIdentity]
        R --> S[index === position invariant]
    end

    subgraph redaction.property.test.ts
        T[arbEmail / arbOpaqueSecret] --> U[maskEmailForDisplay]
        T --> V[sanitizeValue]
        T --> W[maskString JWT]
    end

Prompt To Fix All With AI

Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 4
test/chaos/concurrent-storage.test.ts:33-40
**dead `lastUsedCount` field — misleading comment**

`makeStorage` adds `lastUsedCount: 0 as number` with the comment "a per-account counter we will concurrently increment," but none of the three test bodies ever read or increment `lastUsedCount`. the actual counter used in every assertion is `lastUsed`. the dead field and its comment will mislead anyone trying to trace what the test is measuring, and the `as never` cast on the accounts array silently suppresses typescript from flagging the unknown field. remove `lastUsedCount` and its comment to keep the fixture honest.

### Issue 2 of 4
test/chaos/concurrent-storage.test.ts:47-57
**windows filesystem risk: `renameWithWindowsRetry` path not exercised**

the concurrent-storage test drives the real `atomic temp+rename` path, which uses `renameWithWindowsRetry` to handle `EPERM`/`EBUSY` on windows (where renaming over an existing file can fail). the 60-concurrent-transaction and the storm test (test 3) both pass on linux/mac, but neither explicitly verifies the retry behaviour or the `EBUSY` scenario. on windows ci the rename can transiently fail even with the retry logic, and a test breakage there would not be caught by this suite. consider adding a note or a platform-gated assertion that the rename path is exercised under contention.

### Issue 3 of 4
test/property/refresh-rotation.property.test.ts:42-58
**`buildManager` stored object missing `activeIndexByFamily`**

the stored fixture passed to `AccountManager` does not include `activeIndexByFamily`. `normalizeAccountStorage` handles the missing field gracefully (it defaults every family to `rawActiveIndex`), so the tests work at runtime, but the fixture silently relies on that fallback rather than supplying a valid `AccountStorageV3` shape. if `AccountState.initializeFromStorage` is ever changed to skip normalization, or if a family-specific rotation method is added to a test, the fixture will produce undefined behaviour without any typescript warning (the `as never` cast hides it). adding `activeIndexByFamily: {}` makes the fixture self-contained and matches the shape that `makeStorage` in the concurrent-storage suite already provides.

### Issue 4 of 4
test/property/tracker-remap.property.test.ts:103-110
**passive-recovery drift: no time control in health-score property test**

`HealthScoreTracker.getScore` calls `Date.now()` to apply passive recovery (`2 pts/hr`). the snapshot `acc.health` is recorded immediately after the rate-limit hits, and the invariant check runs after the removal loop — all synchronous, so drift is negligible in practice. however, `toBeCloseTo(acc.health, 3)` (±0.0005) gives no headroom for a slow ci runner that pauses between the seed phase and the assertion (e.g. garbage collection, cpu throttle). `vi.useFakeTimers()` at the start of the suite and `vi.useRealTimers()` in an `afterAll` would pin the clock and make the tolerance unconditionally correct rather than empirically fast-enough.

Greptile also left 4 inline comments on this PR.

Property-based and concurrency stress tests that hammer the invariants the deep audit fixes depend on. Each was mutation-verified (disabling the fix makes it fail):

- test/property/tracker-remap.property.test.ts — health/token-bucket index remap follows the right account through random removal sequences; remapIndexedKeys is a bijection on survivors that drops only the removed index. (~1300 cases)
- test/chaos/concurrent-storage.test.ts — 60 concurrent withAccountStorageTransaction RMW ops on the REAL on-disk path lose zero updates; mixed transaction/overwrite storms never tear the file. Validates the runAccountCheck/hydrateEmails fix.
- test/property/redaction.property.test.ts — for any generated email/opaque secret/JWT, the raw value never survives maskEmailForDisplay / sanitizeValue / maskString. (~2500 cases)
- test/property/refresh-rotation.property.test.ts — random sibling groups: a rotation converges all siblings sharing the old token (others untouched), and workspace-scoped removal never drops refresh-token siblings.

Full suite: 96 files, 2487 passed, 1 skipped. Typecheck + lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector · 2026-06-06T00:36:39Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

coderabbitai · 2026-06-06T00:36:43Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ad686bc6-8c95-4dbb-8743-2b287076ffa9

📥 Commits

Reviewing files that changed from the base of the PR and between b8cdf6b and b7cf8fd.

📒 Files selected for processing (4)

test/chaos/concurrent-storage.test.ts
test/property/redaction.property.test.ts
test/property/refresh-rotation.property.test.ts
test/property/tracker-remap.property.test.ts

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

Walkthrough

This PR adds four new test files that introduce deep stress and property-based testing coverage across the account storage, privacy/redaction, account management, and rate-limiting systems. The tests use Vitest and fast-check to verify concurrent transaction correctness, invariant preservation, and state consistency under randomized conditions.

Changes

Deep Stress and Property-Based Testing Suite

Layer / File(s)	Summary
Concurrent storage transaction stress tests `test/chaos/concurrent-storage.test.ts`	Introduces three concurrency-focused test cases that spawn many concurrent `withAccountStorageTransaction` operations incrementing the same or different accounts, verify no lost updates occur, and assert that mixing transactional increments with non-transactional `saveAccounts` overwrites does not produce torn or invalid JSON.
Email masking and logger redaction property tests `test/property/redaction.property.test.ts`	Adds property-based tests verifying that `maskEmailForDisplay` and `resolveDisplayEmail` never leak full email local parts, that `sanitizeValue` recursively masks nested objects and arrays keyed by sensitive identifiers, and that `maskString` removes JWT-shaped token substrings from free text.
Account manager refresh token rotation property tests `test/property/refresh-rotation.property.test.ts`	Adds property tests asserting that rotating one account's refresh token causes all sibling accounts sharing the old token to converge to the new token, while accounts in other refresh-token groups remain unchanged, and that workspace-scoped removal removes only the targeted workspace without deleting other accounts sharing the same token.
Tracker remapping property tests `test/property/tracker-remap.property.test.ts`	Adds property tests verifying that `HealthScoreTracker` and `TokenBucketTracker` state remains correctly associated with surviving accounts at their new indices after randomized removals, and that `remapIndexedKeys` drops entries for removed indices while reindexing surviving entries and preserving quota suffixes.
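The redaction row above describes a "raw value never survives" property. A minimal sketch of that shape follows, assuming a hypothetical `maskEmail` (not the repo's `maskEmailForDisplay`) and a hand-rolled seeded generator in place of fast-check arbitraries.

```typescript
// Hypothetical masker for illustration; not the repo's maskEmailForDisplay.
function maskEmail(email: string): string {
  const at = email.lastIndexOf("@");
  if (at <= 0) return "***";
  return `${email.charAt(0)}***@${email.slice(at + 1)}`;
}

// Tiny seeded generator so a failing case can be replayed (an assumption, not fast-check).
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

function randomWord(rand: () => number, minLength: number, maxLength: number): string {
  const length = minLength + Math.floor(rand() * (maxLength - minLength + 1));
  let word = "";
  for (let i = 0; i < length; i++) {
    word += ALPHABET.charAt(Math.floor(rand() * ALPHABET.length));
  }
  return word;
}

// Property: neither the full address nor the full local part followed by "@" survives masking.
function checkEmailNeverLeaks(runs: number, seed: number): number {
  const rand = seededRandom(seed);
  for (let run = 0; run < runs; run++) {
    // Length >= 2 so the single kept character is never the whole local part.
    const local = randomWord(rand, 2, 12);
    const email = `${local}@${randomWord(rand, 3, 8)}.com`;
    const masked = maskEmail(email);
    if (masked.includes(email) || masked.includes(`${local}@`)) {
      throw new Error(`run ${run}: ${masked} leaks ${email}`);
    }
  }
  return runs;
}
```

The assertion is phrased as a leak check against the generated input rather than an exact expected output, so it stays valid if the masking format changes.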

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

ndycode/oc-codex-multi-auth#67: Introduces withAccountStorageTransaction and transaction-based storage locking that these concurrent stress tests directly exercise.
ndycode/oc-codex-multi-auth#128: Adds chaos tests for storage fault injection; overlaps on the same storage write/read paths tested here.
ndycode/oc-codex-multi-auth#72: Changes refresh-token rotation and transaction-based persistence logic exercised by the new deep-stress property tests.

Poem

🐰 A rabbit hops through tests galore,
Checking tokens, storage, and more,
Concurrent writes and masking too—
Fast-check proves what systems do.
With property tests, no race can hide! ✨

greptile-apps · 2026-06-06T00:45:10Z

+      addedAt: 1,
+      lastUsed: 1,
+      rateLimitResetTimes: {},
+      // A per-account counter we will concurrently increment.
+      lastUsedCount: 0 as number,
+    })) as never,
+  };
+}


dead lastUsedCount field — misleading comment

makeStorage adds lastUsedCount: 0 as number with the comment "a per-account counter we will concurrently increment," but none of the three test bodies ever read or increment lastUsedCount. the actual counter used in every assertion is lastUsed. the dead field and its comment will mislead anyone trying to trace what the test is measuring, and the as never cast on the accounts array silently suppresses typescript from flagging the unknown field. remove lastUsedCount and its comment to keep the fixture honest.

greptile-apps · 2026-06-06T00:45:10Z

+    dir = await fs.mkdtemp(join(tmpdir(), "codex-stress-"));
+    storePath = join(dir, "accounts.json");
+    setStoragePathDirect(storePath);
+  });
+
+  afterEach(async () => {
+    setStoragePathDirect(null);
+    await fs.rm(dir, { recursive: true, force: true });
+  });
+
+  it("serializes N concurrent transactions with no lost updates", async () => {


windows filesystem risk: renameWithWindowsRetry path not exercised

the concurrent-storage test drives the real atomic temp+rename path, which uses renameWithWindowsRetry to handle EPERM/EBUSY on windows (where renaming over an existing file can fail). the 60-concurrent-transaction and the storm test (test 3) both pass on linux/mac, but neither explicitly verifies the retry behaviour or the EBUSY scenario. on windows ci the rename can transiently fail even with the retry logic, and a test breakage there would not be caught by this suite. consider adding a note or a platform-gated assertion that the rename path is exercised under contention.
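One way to exercise retry behaviour on any platform is to inject the rename function and make it fail on demand. The sketch below assumes a hypothetical `renameWithRetry` helper and `flakyRename` fault injector; it is not the repo's `renameWithWindowsRetry`, whose retry policy is not shown in this review.

```typescript
type RenameFn = (from: string, to: string) => Promise<void>;

// Hypothetical retry helper; resolves to the number of attempts it needed.
async function renameWithRetry(
  rename: RenameFn,
  from: string,
  to: string,
  maxAttempts = 5,
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    try {
      await rename(from, to);
      return attempt;
    } catch (error) {
      const code = (error as { code?: string }).code;
      const retryable = code === "EPERM" || code === "EBUSY";
      if (!retryable || attempt >= maxAttempts) throw error;
      await Promise.resolve(); // placeholder yield; real code would back off with a delay
    }
  }
}

// Fault injector: the first `failures` calls reject with the given error code.
function flakyRename(failures: number, code: string): { rename: RenameFn; calls: () => number } {
  let calls = 0;
  return {
    rename: async () => {
      calls++;
      if (calls <= failures) throw Object.assign(new Error(code), { code });
    },
    calls: () => calls,
  };
}
```

With `flakyRename(2, "EBUSY")` the helper succeeds on the third attempt, and with a non-retryable code such as `ENOENT` it rejects immediately. Injecting the failure keeps the check deterministic on linux and mac runners, where a real `EBUSY` is hard to provoke.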

greptile-apps · 2026-06-06T00:45:12Z

+      expiresAt: now + 3_600_000,
+      addedAt: now,
+      lastUsed: now,
+      rateLimitResetTimes: {},
+    })),
+  };
+  return new AccountManager(undefined, stored as never);
+}
+
+describe("DEEP STRESS: refresh-token rotation propagation", () => {
+  it("rotating one account's token converges all siblings sharing the old token", () => {
+    fc.assert(
+      fc.property(arbSeed, fc.integer({ min: 0 }), (seed, pickRaw) => {
+        const manager = buildManager(seed);
+        const snapshot = manager.getAccountsSnapshot();
+        const n = snapshot.length;
+        const pick = pickRaw % n;


buildManager stored object missing activeIndexByFamily

the stored fixture passed to AccountManager does not include activeIndexByFamily. normalizeAccountStorage handles the missing field gracefully (it defaults every family to rawActiveIndex), so the tests work at runtime, but the fixture silently relies on that fallback rather than supplying a valid AccountStorageV3 shape. if AccountState.initializeFromStorage is ever changed to skip normalization, or if a family-specific rotation method is added to a test, the fixture will produce undefined behaviour without any typescript warning (the as never cast hides it). adding activeIndexByFamily: {} makes the fixture self-contained and matches the shape that makeStorage in the concurrent-storage suite already provides.
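A typed fixture builder is one way to make the missing field a compile-time problem instead of a silent runtime fallback. The interfaces below are a hypothetical, trimmed-down shape: only `addedAt`, `lastUsed`, `rateLimitResetTimes`, and `activeIndexByFamily` come from this review, and the remaining fields are assumptions rather than the real `AccountStorageV3` definition.

```typescript
// Hypothetical, trimmed-down shapes; the repo's AccountStorageV3 has more fields.
interface StoredAccount {
  refreshToken: string;
  addedAt: number;
  lastUsed: number;
  rateLimitResetTimes: Record<string, number>;
}

interface StoredAccounts {
  version: 3;
  activeIndex: number;
  activeIndexByFamily: Record<string, number>;
  accounts: StoredAccount[];
}

// The declared return type means omitting activeIndexByFamily is a compile error,
// which an `as never` cast at the call site would otherwise hide.
function makeStoredFixture(tokens: string[], now: number): StoredAccounts {
  return {
    version: 3,
    activeIndex: 0,
    activeIndexByFamily: {},
    accounts: tokens.map((refreshToken) => ({
      refreshToken,
      addedAt: now,
      lastUsed: now,
      rateLimitResetTimes: {},
    })),
  };
}
```

The cast can then be limited to the single boundary where the test hands the fixture to the manager, so the fixture itself is still type-checked.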

greptile-apps · 2026-06-06T00:45:13Z

+            index: i,
+            health: 0,
+            tokensDrained: tracker.getTokens(i),
+          });
+        }
+
+        for (const frac of removals) {
+          if (accounts.length <= 1) break;


passive-recovery drift: no time control in health-score property test

HealthScoreTracker.getScore calls Date.now() to apply passive recovery (2 pts/hr). the snapshot acc.health is recorded immediately after the rate-limit hits, and the invariant check runs after the removal loop — all synchronous, so drift is negligible in practice. however, toBeCloseTo(acc.health, 3) (±0.0005) gives no headroom for a slow ci runner that pauses between the seed phase and the assertion (e.g. garbage collection, cpu throttle). vi.useFakeTimers() at the start of the suite and vi.useRealTimers() in an afterAll would pin the clock and make the tolerance unconditionally correct rather than empirically fast-enough.
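The headroom can be worked out from the two numbers in the comment: at 2 pts/hr, a tolerance of 0.0005 points is used up after 0.0005 / 2 × 3,600,000 ms, which is about 900 ms of wall-clock pause. The sketch below reproduces that arithmetic with an explicit clock argument. `scoreAt`, `maxPauseMs`, and `withinCloseTo` are hypothetical helpers, and the model ignores any clamping the real `HealthScoreTracker` may apply.

```typescript
const RECOVERY_POINTS_PER_HOUR = 2; // rate quoted in the review comment
const MS_PER_HOUR = 3_600_000;

// Hypothetical linear recovery model with the clock passed in rather than read from Date.now().
function scoreAt(baseScore: number, recordedAtMs: number, nowMs: number): number {
  return baseScore + ((nowMs - recordedAtMs) / MS_PER_HOUR) * RECOVERY_POINTS_PER_HOUR;
}

// How long the clock may advance before drift alone reaches the tolerance.
function maxPauseMs(tolerance: number): number {
  return (tolerance / RECOVERY_POINTS_PER_HOUR) * MS_PER_HOUR;
}

// Same pass condition as toBeCloseTo(expected, digits): |received - expected| < 10^-digits / 2.
function withinCloseTo(received: number, expected: number, digits: number): boolean {
  return Math.abs(received - expected) < Math.pow(10, -digits) / 2;
}
```

Under this model a 500 ms pause still passes the three-digit check and a 1000 ms pause does not, while a pinned clock (zero elapsed time) passes regardless of runner speed.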

Minor release on top of 6.2.0: email masking across all display surfaces (#164), 16 deep-audit bug fixes (#165), and the deep stress suite (#166). Bumps package.json, .release-please-manifest.json, and .codex-plugin/plugin.json to 6.3.0. 2487 tests pass; build/lint/typecheck clean; publish dry-run verified.

ndycode merged commit da02fc1 into main Jun 6, 2026
1 of 2 checks passed

greptile-apps Bot reviewed Jun 6, 2026

ndycode mentioned this pull request Jun 6, 2026

chore(release): 6.3.0 #167

Merged

ndycode merged 1 commit into main from test/deep-stress
