test: add deep stress suite for audited subsystems#166
Conversation
Property-based and concurrency stress tests that hammer the invariants the deep audit fixes depend on. Each was mutation-verified (disabling the fix makes it fail): - test/property/tracker-remap.property.test.ts — health/token-bucket index remap follows the right account through random removal sequences; remapIndexedKeys is a bijection on survivors that drops only the removed index. (~1300 cases) - test/chaos/concurrent-storage.test.ts — 60 concurrent withAccountStorageTransaction RMW ops on the REAL on-disk path lose zero updates; mixed transaction/overwrite storms never tear the file. Validates the runAccountCheck/hydrateEmails fix. - test/property/redaction.property.test.ts — for any generated email/opaque secret/JWT, the raw value never survives maskEmailForDisplay / sanitizeValue / maskString. (~2500 cases) - test/property/refresh-rotation.property.test.ts — random sibling groups: a rotation converges all siblings sharing the old token (others untouched), and workspace-scoped removal never drops refresh-token siblings. Full suite: 96 files, 2487 passed, 1 skipped. Typecheck + lint clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
Cache: Disabled due to data retention organization setting Knowledge base: Disabled due to data retention organization setting 📝 WalkthroughWalkthroughThis PR adds four new test files that introduce deep stress and property-based testing coverage across the account storage, privacy/redaction, account management, and rate-limiting systems. The tests use Vitest and fast-check to verify concurrent transaction correctness, invariant preservation, and state consistency under randomized conditions. ChangesDeep Stress and Property-Based Testing Suite
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| addedAt: 1, | ||
| lastUsed: 1, | ||
| rateLimitResetTimes: {}, | ||
| // A per-account counter we will concurrently increment. | ||
| lastUsedCount: 0 as number, | ||
| })) as never, | ||
| }; | ||
| } |
There was a problem hiding this comment.
dead
lastUsedCount field — misleading comment
makeStorage adds lastUsedCount: 0 as number with the comment "a per-account counter we will concurrently increment," but none of the three test bodies ever read or increment lastUsedCount. the actual counter used in every assertion is lastUsed. the dead field and its comment will mislead anyone trying to trace what the test is measuring, and the as never cast on the accounts array silently suppresses typescript from flagging the unknown field. remove lastUsedCount and its comment to keep the fixture honest.
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/chaos/concurrent-storage.test.ts
Line: 33-40
Comment:
**dead `lastUsedCount` field — misleading comment**
`makeStorage` adds `lastUsedCount: 0 as number` with the comment "a per-account counter we will concurrently increment," but none of the three test bodies ever read or increment `lastUsedCount`. the actual counter used in every assertion is `lastUsed`. the dead field and its comment will mislead anyone trying to trace what the test is measuring, and the `as never` cast on the accounts array silently suppresses typescript from flagging the unknown field. remove `lastUsedCount` and its comment to keep the fixture honest.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| dir = await fs.mkdtemp(join(tmpdir(), "codex-stress-")); | ||
| storePath = join(dir, "accounts.json"); | ||
| setStoragePathDirect(storePath); | ||
| }); | ||
|
|
||
| afterEach(async () => { | ||
| setStoragePathDirect(null); | ||
| await fs.rm(dir, { recursive: true, force: true }); | ||
| }); | ||
|
|
||
| it("serializes N concurrent transactions with no lost updates", async () => { |
There was a problem hiding this comment.
windows filesystem risk:
renameWithWindowsRetry path not exercised
the concurrent-storage test drives the real atomic temp+rename path, which uses renameWithWindowsRetry to handle EPERM/EBUSY on windows (where renaming over an existing file can fail). the 60-concurrent-transaction and the storm test (test 3) both pass on linux/mac, but neither explicitly verifies the retry behaviour or the EBUSY scenario. on windows ci the rename can transiently fail even with the retry logic, and a test breakage there would not be caught by this suite. consider adding a note or a platform-gated assertion that the rename path is exercised under contention.
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/chaos/concurrent-storage.test.ts
Line: 47-57
Comment:
**windows filesystem risk: `renameWithWindowsRetry` path not exercised**
the concurrent-storage test drives the real `atomic temp+rename` path, which uses `renameWithWindowsRetry` to handle `EPERM`/`EBUSY` on windows (where renaming over an existing file can fail). the 60-concurrent-transaction and the storm test (test 3) both pass on linux/mac, but neither explicitly verifies the retry behaviour or the `EBUSY` scenario. on windows ci the rename can transiently fail even with the retry logic, and a test breakage there would not be caught by this suite. consider adding a note or a platform-gated assertion that the rename path is exercised under contention.
How can I resolve this? If you propose a fix, please make it concise.| expiresAt: now + 3_600_000, | ||
| addedAt: now, | ||
| lastUsed: now, | ||
| rateLimitResetTimes: {}, | ||
| })), | ||
| }; | ||
| return new AccountManager(undefined, stored as never); | ||
| } | ||
|
|
||
| describe("DEEP STRESS: refresh-token rotation propagation", () => { | ||
| it("rotating one account's token converges all siblings sharing the old token", () => { | ||
| fc.assert( | ||
| fc.property(arbSeed, fc.integer({ min: 0 }), (seed, pickRaw) => { | ||
| const manager = buildManager(seed); | ||
| const snapshot = manager.getAccountsSnapshot(); | ||
| const n = snapshot.length; | ||
| const pick = pickRaw % n; |
There was a problem hiding this comment.
buildManager stored object missing activeIndexByFamily
the stored fixture passed to AccountManager does not include activeIndexByFamily. normalizeAccountStorage handles the missing field gracefully (it defaults every family to rawActiveIndex), so the tests work at runtime, but the fixture silently relies on that fallback rather than supplying a valid AccountStorageV3 shape. if AccountState.initializeFromStorage is ever changed to skip normalization, or if a family-specific rotation method is added to a test, the fixture will produce undefined behaviour without any typescript warning (the as never cast hides it). adding activeIndexByFamily: {} makes the fixture self-contained and matches the shape that makeStorage in the concurrent-storage suite already provides.
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/refresh-rotation.property.test.ts
Line: 42-58
Comment:
**`buildManager` stored object missing `activeIndexByFamily`**
the stored fixture passed to `AccountManager` does not include `activeIndexByFamily`. `normalizeAccountStorage` handles the missing field gracefully (it defaults every family to `rawActiveIndex`), so the tests work at runtime, but the fixture silently relies on that fallback rather than supplying a valid `AccountStorageV3` shape. if `AccountState.initializeFromStorage` is ever changed to skip normalization, or if a family-specific rotation method is added to a test, the fixture will produce undefined behaviour without any typescript warning (the `as never` cast hides it). adding `activeIndexByFamily: {}` makes the fixture self-contained and matches the shape that `makeStorage` in the concurrent-storage suite already provides.
How can I resolve this? If you propose a fix, please make it concise.| index: i, | ||
| health: 0, | ||
| tokensDrained: tracker.getTokens(i), | ||
| }); | ||
| } | ||
|
|
||
| for (const frac of removals) { | ||
| if (accounts.length <= 1) break; |
There was a problem hiding this comment.
passive-recovery drift: no time control in health-score property test
HealthScoreTracker.getScore calls Date.now() to apply passive recovery (2 pts/hr). the snapshot acc.health is recorded immediately after the rate-limit hits, and the invariant check runs after the removal loop — all synchronous, so drift is negligible in practice. however, toBeCloseTo(acc.health, 3) (±0.0005) gives no headroom for a slow ci runner that pauses between the seed phase and the assertion (e.g. garbage collection, cpu throttle). vi.useFakeTimers() at the start of the suite and vi.useRealTimers() in an afterAll would pin the clock and make the tolerance unconditionally correct rather than empirically fast-enough.
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/tracker-remap.property.test.ts
Line: 103-110
Comment:
**passive-recovery drift: no time control in health-score property test**
`HealthScoreTracker.getScore` calls `Date.now()` to apply passive recovery (`2 pts/hr`). the snapshot `acc.health` is recorded immediately after the rate-limit hits, and the invariant check runs after the removal loop — all synchronous, so drift is negligible in practice. however, `toBeCloseTo(acc.health, 3)` (±0.0005) gives no headroom for a slow ci runner that pauses between the seed phase and the assertion (e.g. garbage collection, cpu throttle). `vi.useFakeTimers()` at the start of the suite and `vi.useRealTimers()` in an `afterAll` would pin the clock and make the tolerance unconditionally correct rather than empirically fast-enough.
How can I resolve this? If you propose a fix, please make it concise.Minor release on top of 6.2.0: email masking across all display surfaces (#164), 16 deep-audit bug fixes (#165), and the deep stress suite (#166). Bumps package.json, .release-please-manifest.json, and .codex-plugin/plugin.json to 6.3.0. 2487 tests pass; build/lint/typecheck clean; publish dry-run verified.
Summary
The deepest stress layer for the subsystems hardened by the deep audit (#165). Adds property-based and concurrency tests that hammer the invariants those fixes depend on. Every test was mutation-verified — disabling the corresponding fix makes the test fail — so they are real guards, not tautologies.
Suites
test/property/tracker-remap.property.test.ts— health-score and token-bucket state follows the right account through random removal sequences;remapIndexedKeysdrops only the removed index and is a bijection on survivors (incl.index:quotaKeykeys). ~1300 generated cases. Guards the index-keyed-tracker fix.test/chaos/concurrent-storage.test.ts— drives the REAL on-disk path (atomic temp+rename, mutex,withAccountStorageTransaction) against a temp file: 60 concurrent read-modify-write transactions lose zero updates; per-account interleaving all persists; mixed transaction/overwrite storms never tear the file. Guards therunAccountCheck/hydrateEmailslost-update fix.test/property/redaction.property.test.ts— for any generated email / opaque secret / JWT, the raw value never survivesmaskEmailForDisplay,resolveDisplayEmail,sanitizeValue(incl. nested + cookie keys), ormaskString. ~2500 generated cases. Guards Add privacy-safe account labels across account switcher and command displays #163 masking + the redaction fixes.test/property/refresh-rotation.property.test.ts— random sibling groups sharing refresh tokens: a rotation converges every sibling holding the old token (other groups untouched), and workspace-scoped removal never drops refresh-token siblings while preserving theindex === positioninvariant.Verification
npm test: 96 files, 2487 passed, 1 skippednpm run lint+npm run typecheck: cleanpropagateRotatedRefreshTokenToSiblingsandHealthScoreTracker.remapAfterRemovaleach makes the corresponding suite fail.🤖 Generated with Claude Code
Summary by CodeRabbit
note: greptile review for oc-chatgpt-multi-auth. cite files like
lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.Greptile Summary
adds a deep stress layer on top of the audit-hardened subsystems from #165: four new test files covering property-based and concurrency invariants across storage transactions, email/token redaction, refresh-token rotation propagation, and tracker index remapping. all suites use real code paths (no mocks for the storage mutex or on-disk atomic rename), and each suite is documented as mutation-verified.
test/chaos/concurrent-storage.test.ts— 60 concurrentwithAccountStorageTransactioncalls through the realwithStorageLockmutex + atomic temp/rename pipeline; verifies zero lost updates and structural integrity under a mixed transaction/overwrite storm.test/property/redaction.property.test.ts— ~2500 fast-check cases asserting thatmaskEmailForDisplay,resolveDisplayEmail,sanitizeValue, andmaskStringnever leak raw emails, opaque secrets, or jwt-shaped substrings.test/property/refresh-rotation.property.test.ts— 300-run random sibling-group suite asserting rotation converges all siblings and workspace-scoped removal never drops refresh-token siblings while preserving theindex === positioninvariant.test/property/tracker-remap.property.test.ts— 400-run suite forHealthScoreTracker.remapAfterRemoval,TokenBucketTracker.remapAfterRemoval, andremapIndexedKeysbijectivity across random removal sequences.Confidence Score: 4/5
test-only changes; no production code touched, all suites are mutation-verified, and the real storage path is exercised against a temp directory with proper cleanup.
the dead
lastUsedCountfield inmakeStorageis misleading and could confuse future maintainers, thebuildManagerfixture silently relies on normalization to supply a missingactiveIndexByFamily, and the health-score property test'stoBeCloseTotolerance depends on wall-clock speed rather than pinned timers. none of these break correctness today, but they are rough edges worth tidying before the suite grows.test/chaos/concurrent-storage.test.ts (dead field + windows rename coverage gap) and test/property/tracker-remap.property.test.ts (real-clock dependency in health-score assertions)
Important Files Changed
lastUsedCountfield in makeStorage is confusing, and windows rename-retry coverage is absentFlowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[withAccountStorageTransaction] --> B[withStorageLock] B --> C[loadAccountsInternal] C --> D[normalizeAccountStorage] D --> E[handler: read-modify-write] E --> F[persist: saveAccountsUnlocked] F --> G[writeAccountsToPathUnlocked] G --> H[atomic temp+rename renameWithWindowsRetry] subgraph concurrent-storage.test.ts I[60x Promise.all] --> A J[saveAccounts plain] --> B end subgraph tracker-remap.property.test.ts K[HealthScoreTracker] --> L[remapAfterRemoval] M[TokenBucketTracker] --> L N[remapIndexedKeys] --> L end subgraph refresh-rotation.property.test.ts O[buildManager random seed] --> P[updateFromAuth] P --> Q[propagateRotatedRefreshTokenToSiblings] O --> R[removeAccountsByWorkspaceIdentity] R --> S[index === position invariant] end subgraph redaction.property.test.ts T[arbEmail / arbOpaqueSecret] --> U[maskEmailForDisplay] T --> V[sanitizeValue] T --> W[maskString JWT] endPrompt To Fix All With AI
Reviews (1): Last reviewed commit: "test: add deep stress suite for audited ..." | Re-trigger Greptile