Skip to content

test: add deep stress suite for audited subsystems#166

Merged
ndycode merged 1 commit into
mainfrom
test/deep-stress
Jun 6, 2026
Merged

test: add deep stress suite for audited subsystems#166
ndycode merged 1 commit into
mainfrom
test/deep-stress

Conversation

@ndycode
Copy link
Copy Markdown
Owner

@ndycode ndycode commented Jun 6, 2026

Summary

The deepest stress layer for the subsystems hardened by the deep audit (#165). Adds property-based and concurrency tests that hammer the invariants those fixes depend on. Every test was mutation-verified — disabling the corresponding fix makes the test fail — so they are real guards, not tautologies.

Suites

  • test/property/tracker-remap.property.test.ts — health-score and token-bucket state follows the right account through random removal sequences; remapIndexedKeys drops only the removed index and is a bijection on survivors (incl. index:quotaKey keys). ~1300 generated cases. Guards the index-keyed-tracker fix.
  • test/chaos/concurrent-storage.test.ts — drives the REAL on-disk path (atomic temp+rename, mutex, withAccountStorageTransaction) against a temp file: 60 concurrent read-modify-write transactions lose zero updates; per-account interleaving all persists; mixed transaction/overwrite storms never tear the file. Guards the runAccountCheck/hydrateEmails lost-update fix.
  • test/property/redaction.property.test.ts — for any generated email / opaque secret / JWT, the raw value never survives maskEmailForDisplay, resolveDisplayEmail, sanitizeValue (incl. nested + cookie keys), or maskString. ~2500 generated cases. Guards Add privacy-safe account labels across account switcher and command displays #163 masking + the redaction fixes.
  • test/property/refresh-rotation.property.test.ts — random sibling groups sharing refresh tokens: a rotation converges every sibling holding the old token (other groups untouched), and workspace-scoped removal never drops refresh-token siblings while preserving the index === position invariant.

Verification

  • npm test: 96 files, 2487 passed, 1 skipped
  • npm run lint + npm run typecheck: clean
  • Mutation-verified: disabling propagateRotatedRefreshTokenToSiblings and HealthScoreTracker.remapAfterRemoval each makes the corresponding suite fail.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Added comprehensive stress test suites covering account storage reliability under concurrent operations, privacy and redaction safeguards, token rotation mechanics across multiple accounts, and state tracking consistency after account removal.

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

adds a deep stress layer on top of the audit-hardened subsystems from #165: four new test files covering property-based and concurrency invariants across storage transactions, email/token redaction, refresh-token rotation propagation, and tracker index remapping. all suites use real code paths (no mocks for the storage mutex or on-disk atomic rename), and each suite is documented as mutation-verified.

  • test/chaos/concurrent-storage.test.ts — 60 concurrent withAccountStorageTransaction calls through the real withStorageLock mutex + atomic temp/rename pipeline; verifies zero lost updates and structural integrity under a mixed transaction/overwrite storm.
  • test/property/redaction.property.test.ts — ~2500 fast-check cases asserting that maskEmailForDisplay, resolveDisplayEmail, sanitizeValue, and maskString never leak raw emails, opaque secrets, or jwt-shaped substrings.
  • test/property/refresh-rotation.property.test.ts — 300-run random sibling-group suite asserting rotation converges all siblings and workspace-scoped removal never drops refresh-token siblings while preserving the index === position invariant.
  • test/property/tracker-remap.property.test.ts — 400-run suite for HealthScoreTracker.remapAfterRemoval, TokenBucketTracker.remapAfterRemoval, and remapIndexedKeys bijectivity across random removal sequences.

Confidence Score: 4/5

test-only changes; no production code touched, all suites are mutation-verified, and the real storage path is exercised against a temp directory with proper cleanup.

the dead lastUsedCount field in makeStorage is misleading and could confuse future maintainers, the buildManager fixture silently relies on normalization to supply a missing activeIndexByFamily, and the health-score property test's toBeCloseTo tolerance depends on wall-clock speed rather than pinned timers. none of these break correctness today, but they are rough edges worth tidying before the suite grows.

test/chaos/concurrent-storage.test.ts (dead field + windows rename coverage gap) and test/property/tracker-remap.property.test.ts (real-clock dependency in health-score assertions)

Important Files Changed

Filename Overview
test/chaos/concurrent-storage.test.ts adds 3 real-disk concurrent-storage tests using the actual mutex/atomic-rename path; dead lastUsedCount field in makeStorage is confusing, and windows rename-retry coverage is absent
test/property/redaction.property.test.ts ~2500 property-based cases covering maskEmailForDisplay, resolveDisplayEmail, sanitizeValue, and maskString; key normalization correctly aligns with SENSITIVE_KEYS in logger.ts; no issues found
test/property/refresh-rotation.property.test.ts 300-run property suite for sibling token propagation and workspace-scoped removal; buildManager fixture missing activeIndexByFamily field, relying silently on normalizeAccountStorage fallback
test/property/tracker-remap.property.test.ts 400-run property suite for HealthScoreTracker/TokenBucketTracker remap after removal; toBeCloseTo(health, 3) tolerance relies on wall-clock speed rather than fake timers, which could be flaky under cpu pressure

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[withAccountStorageTransaction] --> B[withStorageLock]
    B --> C[loadAccountsInternal]
    C --> D[normalizeAccountStorage]
    D --> E[handler: read-modify-write]
    E --> F[persist: saveAccountsUnlocked]
    F --> G[writeAccountsToPathUnlocked]
    G --> H[atomic temp+rename renameWithWindowsRetry]

    subgraph concurrent-storage.test.ts
        I[60x Promise.all] --> A
        J[saveAccounts plain] --> B
    end

    subgraph tracker-remap.property.test.ts
        K[HealthScoreTracker] --> L[remapAfterRemoval]
        M[TokenBucketTracker] --> L
        N[remapIndexedKeys] --> L
    end

    subgraph refresh-rotation.property.test.ts
        O[buildManager random seed] --> P[updateFromAuth]
        P --> Q[propagateRotatedRefreshTokenToSiblings]
        O --> R[removeAccountsByWorkspaceIdentity]
        R --> S[index === position invariant]
    end

    subgraph redaction.property.test.ts
        T[arbEmail / arbOpaqueSecret] --> U[maskEmailForDisplay]
        T --> V[sanitizeValue]
        T --> W[maskString JWT]
    end
Loading

Fix All in Codex

Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 4
test/chaos/concurrent-storage.test.ts:33-40
**dead `lastUsedCount` field — misleading comment**

`makeStorage` adds `lastUsedCount: 0 as number` with the comment "a per-account counter we will concurrently increment," but none of the three test bodies ever read or increment `lastUsedCount`. the actual counter used in every assertion is `lastUsed`. the dead field and its comment will mislead anyone trying to trace what the test is measuring, and the `as never` cast on the accounts array silently suppresses typescript from flagging the unknown field. remove `lastUsedCount` and its comment to keep the fixture honest.

### Issue 2 of 4
test/chaos/concurrent-storage.test.ts:47-57
**windows filesystem risk: `renameWithWindowsRetry` path not exercised**

the concurrent-storage test drives the real `atomic temp+rename` path, which uses `renameWithWindowsRetry` to handle `EPERM`/`EBUSY` on windows (where renaming over an existing file can fail). the 60-concurrent-transaction and the storm test (test 3) both pass on linux/mac, but neither explicitly verifies the retry behaviour or the `EBUSY` scenario. on windows ci the rename can transiently fail even with the retry logic, and a test breakage there would not be caught by this suite. consider adding a note or a platform-gated assertion that the rename path is exercised under contention.

### Issue 3 of 4
test/property/refresh-rotation.property.test.ts:42-58
**`buildManager` stored object missing `activeIndexByFamily`**

the stored fixture passed to `AccountManager` does not include `activeIndexByFamily`. `normalizeAccountStorage` handles the missing field gracefully (it defaults every family to `rawActiveIndex`), so the tests work at runtime, but the fixture silently relies on that fallback rather than supplying a valid `AccountStorageV3` shape. if `AccountState.initializeFromStorage` is ever changed to skip normalization, or if a family-specific rotation method is added to a test, the fixture will produce undefined behaviour without any typescript warning (the `as never` cast hides it). adding `activeIndexByFamily: {}` makes the fixture self-contained and matches the shape that `makeStorage` in the concurrent-storage suite already provides.

### Issue 4 of 4
test/property/tracker-remap.property.test.ts:103-110
**passive-recovery drift: no time control in health-score property test**

`HealthScoreTracker.getScore` calls `Date.now()` to apply passive recovery (`2 pts/hr`). the snapshot `acc.health` is recorded immediately after the rate-limit hits, and the invariant check runs after the removal loop — all synchronous, so drift is negligible in practice. however, `toBeCloseTo(acc.health, 3)` (±0.0005) gives no headroom for a slow ci runner that pauses between the seed phase and the assertion (e.g. garbage collection, cpu throttle). `vi.useFakeTimers()` at the start of the suite and `vi.useRealTimers()` in an `afterAll` would pin the clock and make the tolerance unconditionally correct rather than empirically fast-enough.

Reviews (1): Last reviewed commit: "test: add deep stress suite for audited ..." | Re-trigger Greptile

Greptile also left 4 inline comments on this PR.

Property-based and concurrency stress tests that hammer the invariants the
deep audit fixes depend on. Each was mutation-verified (disabling the fix
makes it fail):

- test/property/tracker-remap.property.test.ts — health/token-bucket index
  remap follows the right account through random removal sequences; remapIndexedKeys
  is a bijection on survivors that drops only the removed index. (~1300 cases)
- test/chaos/concurrent-storage.test.ts — 60 concurrent withAccountStorageTransaction
  RMW ops on the REAL on-disk path lose zero updates; mixed transaction/overwrite
  storms never tear the file. Validates the runAccountCheck/hydrateEmails fix.
- test/property/redaction.property.test.ts — for any generated email/opaque
  secret/JWT, the raw value never survives maskEmailForDisplay / sanitizeValue /
  maskString. (~2500 cases)
- test/property/refresh-rotation.property.test.ts — random sibling groups: a
  rotation converges all siblings sharing the old token (others untouched), and
  workspace-scoped removal never drops refresh-token siblings.

Full suite: 96 files, 2487 passed, 1 skipped. Typecheck + lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ad686bc6-8c95-4dbb-8743-2b287076ffa9

📥 Commits

Reviewing files that changed from the base of the PR and between b8cdf6b and b7cf8fd.

📒 Files selected for processing (4)
  • test/chaos/concurrent-storage.test.ts
  • test/property/redaction.property.test.ts
  • test/property/refresh-rotation.property.test.ts
  • test/property/tracker-remap.property.test.ts

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting


📝 Walkthrough

Walkthrough

This PR adds four new test files that introduce deep stress and property-based testing coverage across the account storage, privacy/redaction, account management, and rate-limiting systems. The tests use Vitest and fast-check to verify concurrent transaction correctness, invariant preservation, and state consistency under randomized conditions.

Changes

Deep Stress and Property-Based Testing Suite

Layer / File(s) Summary
Concurrent storage transaction stress tests
test/chaos/concurrent-storage.test.ts
Introduces three concurrency-focused test cases that spawn many concurrent withAccountStorageTransaction operations incrementing the same or different accounts, verify no lost updates occur, and assert that mixing transactional increments with non-transactional saveAccounts overwrites does not produce torn or invalid JSON.
Email masking and logger redaction property tests
test/property/redaction.property.test.ts
Adds property-based tests verifying that maskEmailForDisplay and resolveDisplayEmail never leak full email local parts, that sanitizeValue recursively masks nested objects and arrays keyed by sensitive identifiers, and that maskString removes JWT-shaped token substrings from free text.
Account manager refresh token rotation property tests
test/property/refresh-rotation.property.test.ts
Adds property tests asserting that rotating one account's refresh token causes all sibling accounts sharing the old token to converge to the new token, while accounts in other refresh-token groups remain unchanged, and that workspace-scoped removal removes only the targeted workspace without deleting other accounts sharing the same token.
Tracker remapping property tests
test/property/tracker-remap.property.test.ts
Adds property tests verifying that HealthScoreTracker and TokenBucketTracker state remains correctly associated with surviving accounts at their new indices after randomized removals, and that remapIndexedKeys drops entries for removed indices while reindexing surviving entries and preserving quota suffixes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • ndycode/oc-codex-multi-auth#67: Introduces withAccountStorageTransaction and transaction-based storage locking that these concurrent stress tests directly exercise.
  • ndycode/oc-codex-multi-auth#128: Adds chaos tests for storage fault injection; overlaps on the same storage write/read paths tested here.
  • ndycode/oc-codex-multi-auth#72: Changes refresh-token rotation and transaction-based persistence logic exercised by the new deep-stress property tests.

Poem

🐰 A rabbit hops through tests galore,
Checking tokens, storage, and more,
Concurrent writes and masking too—
Fast-check proves what systems do.
With property tests, no race can hide!

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/deep-stress

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ndycode ndycode merged commit da02fc1 into main Jun 6, 2026
1 of 2 checks passed
Comment on lines +33 to +40
addedAt: 1,
lastUsed: 1,
rateLimitResetTimes: {},
// A per-account counter we will concurrently increment.
lastUsedCount: 0 as number,
})) as never,
};
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 dead lastUsedCount field — misleading comment

makeStorage adds lastUsedCount: 0 as number with the comment "a per-account counter we will concurrently increment," but none of the three test bodies ever read or increment lastUsedCount. the actual counter used in every assertion is lastUsed. the dead field and its comment will mislead anyone trying to trace what the test is measuring, and the as never cast on the accounts array silently suppresses typescript from flagging the unknown field. remove lastUsedCount and its comment to keep the fixture honest.

Prompt To Fix With AI
This is a comment left during a code review.
Path: test/chaos/concurrent-storage.test.ts
Line: 33-40

Comment:
**dead `lastUsedCount` field — misleading comment**

`makeStorage` adds `lastUsedCount: 0 as number` with the comment "a per-account counter we will concurrently increment," but none of the three test bodies ever read or increment `lastUsedCount`. the actual counter used in every assertion is `lastUsed`. the dead field and its comment will mislead anyone trying to trace what the test is measuring, and the `as never` cast on the accounts array silently suppresses typescript from flagging the unknown field. remove `lastUsedCount` and its comment to keep the fixture honest.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Codex

Comment on lines +47 to +57
dir = await fs.mkdtemp(join(tmpdir(), "codex-stress-"));
storePath = join(dir, "accounts.json");
setStoragePathDirect(storePath);
});

afterEach(async () => {
setStoragePathDirect(null);
await fs.rm(dir, { recursive: true, force: true });
});

it("serializes N concurrent transactions with no lost updates", async () => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 windows filesystem risk: renameWithWindowsRetry path not exercised

the concurrent-storage test drives the real atomic temp+rename path, which uses renameWithWindowsRetry to handle EPERM/EBUSY on windows (where renaming over an existing file can fail). the 60-concurrent-transaction and the storm test (test 3) both pass on linux/mac, but neither explicitly verifies the retry behaviour or the EBUSY scenario. on windows ci the rename can transiently fail even with the retry logic, and a test breakage there would not be caught by this suite. consider adding a note or a platform-gated assertion that the rename path is exercised under contention.

Prompt To Fix With AI
This is a comment left during a code review.
Path: test/chaos/concurrent-storage.test.ts
Line: 47-57

Comment:
**windows filesystem risk: `renameWithWindowsRetry` path not exercised**

the concurrent-storage test drives the real `atomic temp+rename` path, which uses `renameWithWindowsRetry` to handle `EPERM`/`EBUSY` on windows (where renaming over an existing file can fail). the 60-concurrent-transaction and the storm test (test 3) both pass on linux/mac, but neither explicitly verifies the retry behaviour or the `EBUSY` scenario. on windows ci the rename can transiently fail even with the retry logic, and a test breakage there would not be caught by this suite. consider adding a note or a platform-gated assertion that the rename path is exercised under contention.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

Comment on lines +42 to +58
expiresAt: now + 3_600_000,
addedAt: now,
lastUsed: now,
rateLimitResetTimes: {},
})),
};
return new AccountManager(undefined, stored as never);
}

describe("DEEP STRESS: refresh-token rotation propagation", () => {
it("rotating one account's token converges all siblings sharing the old token", () => {
fc.assert(
fc.property(arbSeed, fc.integer({ min: 0 }), (seed, pickRaw) => {
const manager = buildManager(seed);
const snapshot = manager.getAccountsSnapshot();
const n = snapshot.length;
const pick = pickRaw % n;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 buildManager stored object missing activeIndexByFamily

the stored fixture passed to AccountManager does not include activeIndexByFamily. normalizeAccountStorage handles the missing field gracefully (it defaults every family to rawActiveIndex), so the tests work at runtime, but the fixture silently relies on that fallback rather than supplying a valid AccountStorageV3 shape. if AccountState.initializeFromStorage is ever changed to skip normalization, or if a family-specific rotation method is added to a test, the fixture will produce undefined behaviour without any typescript warning (the as never cast hides it). adding activeIndexByFamily: {} makes the fixture self-contained and matches the shape that makeStorage in the concurrent-storage suite already provides.

Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/refresh-rotation.property.test.ts
Line: 42-58

Comment:
**`buildManager` stored object missing `activeIndexByFamily`**

the stored fixture passed to `AccountManager` does not include `activeIndexByFamily`. `normalizeAccountStorage` handles the missing field gracefully (it defaults every family to `rawActiveIndex`), so the tests work at runtime, but the fixture silently relies on that fallback rather than supplying a valid `AccountStorageV3` shape. if `AccountState.initializeFromStorage` is ever changed to skip normalization, or if a family-specific rotation method is added to a test, the fixture will produce undefined behaviour without any typescript warning (the `as never` cast hides it). adding `activeIndexByFamily: {}` makes the fixture self-contained and matches the shape that `makeStorage` in the concurrent-storage suite already provides.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

Comment on lines +103 to +110
index: i,
health: 0,
tokensDrained: tracker.getTokens(i),
});
}

for (const frac of removals) {
if (accounts.length <= 1) break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 passive-recovery drift: no time control in health-score property test

HealthScoreTracker.getScore calls Date.now() to apply passive recovery (2 pts/hr). the snapshot acc.health is recorded immediately after the rate-limit hits, and the invariant check runs after the removal loop — all synchronous, so drift is negligible in practice. however, toBeCloseTo(acc.health, 3) (±0.0005) gives no headroom for a slow ci runner that pauses between the seed phase and the assertion (e.g. garbage collection, cpu throttle). vi.useFakeTimers() at the start of the suite and vi.useRealTimers() in an afterAll would pin the clock and make the tolerance unconditionally correct rather than empirically fast-enough.

Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/tracker-remap.property.test.ts
Line: 103-110

Comment:
**passive-recovery drift: no time control in health-score property test**

`HealthScoreTracker.getScore` calls `Date.now()` to apply passive recovery (`2 pts/hr`). the snapshot `acc.health` is recorded immediately after the rate-limit hits, and the invariant check runs after the removal loop — all synchronous, so drift is negligible in practice. however, `toBeCloseTo(acc.health, 3)` (±0.0005) gives no headroom for a slow ci runner that pauses between the seed phase and the assertion (e.g. garbage collection, cpu throttle). `vi.useFakeTimers()` at the start of the suite and `vi.useRealTimers()` in an `afterAll` would pin the clock and make the tolerance unconditionally correct rather than empirically fast-enough.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

@ndycode ndycode mentioned this pull request Jun 6, 2026
ndycode added a commit that referenced this pull request Jun 6, 2026
Minor release on top of 6.2.0: email masking across all display surfaces (#164), 16 deep-audit bug fixes (#165), and the deep stress suite (#166). Bumps package.json, .release-please-manifest.json, and .codex-plugin/plugin.json to 6.3.0. 2487 tests pass; build/lint/typecheck clean; publish dry-run verified.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant