feat: version runtime observability snapshots by ndycode · Pull Request #379 · ndycode/codex-multi-auth

ndycode · 2026-04-06T02:28:04Z

Summary

add a version field to persisted runtime observability snapshots and normalize legacy unversioned snapshots onto the current schema
safely ignore unknown future snapshot versions instead of trying to load incompatible runtime telemetry files
add focused runtime observability tests for both backward-compatible normalization and unknown-version rejection
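a minimal sketch of the load-side versioning rule described above. it assumes a hypothetical `normalizeSnapshot` helper and a simplified snapshot shape (`version`, `updatedAt`, `counters`); the real types and function names in `lib/runtime/runtime-observability.ts` may differ. unversioned input is spread over defaults and stamped with the current version, version 1 is spread over defaults as-is, and any other version returns null.

```typescript
// hypothetical names and shape: illustrative only, not the PR's actual code
const CURRENT_SNAPSHOT_VERSION = 1;

type RuntimeObservabilitySnapshot = {
  version: number;
  updatedAt: number;
  counters: Record<string, number>;
};

function defaultSnapshot(): RuntimeObservabilitySnapshot {
  return { version: CURRENT_SNAPSHOT_VERSION, updatedAt: 0, counters: {} };
}

function normalizeSnapshot(
  parsed: Record<string, unknown>,
): RuntimeObservabilitySnapshot | null {
  const version = parsed.version;
  // legacy files predate the version field: merge onto defaults and stamp v1
  if (version === undefined) {
    return {
      ...defaultSnapshot(),
      ...parsed,
      version: CURRENT_SNAPSHOT_VERSION,
    } as RuntimeObservabilitySnapshot;
  }
  // current schema: merge onto defaults so missing fields get default values
  if (version === CURRENT_SNAPSHOT_VERSION) {
    return { ...defaultSnapshot(), ...parsed } as RuntimeObservabilitySnapshot;
  }
  // unknown (e.g. future) versions are ignored instead of loaded
  return null;
}
```

note that a spread-merge fills in missing fields but does not validate the types of fields that are present.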

Validation

node_modules/.bin/vitest.cmd run test/runtime-observability.test.ts test/runtime-metrics.test.ts
node_modules/.bin/tsc.cmd --noEmit

note: greptile review for oc-chatgpt-multi-auth. cite files like `lib/foo.ts:123`. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this pr adds a version field to persisted runtime observability snapshots and implements two migration behaviors: legacy unversioned snapshots are normalized onto the current schema via a safe spread-merge, and unknown future versions are silently rejected. the approach is consistent with how storage/migrations.ts handles v1/v2→v3 upgrades elsewhere in the codebase.

key concerns:

windows rename safety: writeSnapshot calls fs.rename without EBUSY/EPERM retry logic, which the project explicitly flags as an anti-pattern. the failure is then silently swallowed by the pendingWrite chain's .catch(() => undefined), so the data loss is invisible on windows.
test module isolation: both tests call await import(...) without vi.resetModules() in a beforeEach, so they share the module-level snapshotState singleton. this is harmless today but fragile as the suite grows.
missing branch coverage: the existsSync-returns-false, JSON-parse-error, and non-object guard branches in loadPersistedRuntimeObservabilitySnapshot are untested; the 80% branch-coverage threshold may be at risk.

Confidence Score: 3/5

safe to merge on linux/mac; windows rename path silently loses snapshot data without retry logic

core versioning logic is correct and well-scoped; the windows filesystem safety gap is a real concern per project conventions but non-blocking for non-windows environments; missing test isolation is fragile but not currently failing

lib/runtime/runtime-observability.ts (writeSnapshot rename retry), test/runtime-observability.test.ts (vi.resetModules + branch coverage)

Important Files Changed

| Filename | Overview |
| --- | --- |
| lib/runtime/runtime-observability.ts | adds versioning + legacy normalization to snapshot persistence; windows rename lacks EBUSY/EPERM retry and write failures are silently swallowed in the pendingWrite chain |
| test/runtime-observability.test.ts | two focused snapshot-versioning tests; missing vi.resetModules() before dynamic imports and missing branch coverage for file-not-found, JSON-error, and non-object-guard paths |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant mutate as mutateRuntimeObservabilitySnapshot
    participant chain as pendingWrite chain
    participant write as writeSnapshot
    participant fs as node:fs

    Caller->>mutate: mutator fn
    mutate->>mutate: apply mutator, stamp updatedAt
    mutate->>chain: .then(writeSnapshot(clone))
    chain->>write: writeSnapshot(snapshot)
    write->>fs: mkdir(dir, recursive)
    write->>fs: writeFile(tempPath)
    write->>fs: rename(tempPath → path)
    note over fs: ⚠ no EBUSY/EPERM retry on Windows
    fs-->>write: ok / EBUSY throws
    write-->>chain: rejects
    note over chain: next .catch(() => undefined) silently drops error

    Caller->>+write: loadPersistedRuntimeObservabilitySnapshot()
    write->>fs: existsSync(path)
    alt file missing
        write-->>Caller: null
    else file present
        write->>fs: readFile(path)
        write->>write: parse + check version
        alt version undefined — legacy
            write-->>Caller: spread-merged snapshot (version=1)
        else version === 1 — current
            write-->>Caller: spread-merged snapshot
        else version > 1 — unknown future
            write-->>Caller: null
        end
    end
    deactivate write
```

Comments Outside Diff (1)

lib/runtime/runtime-observability.ts, lines 79-81

missing EBUSY/EPERM retry on windows rename

fs.rename on windows can fail with EBUSY or EPERM when antivirus has the destination file open. AGENTS.md lists bare filesystem operations without retry logic as an explicit anti-pattern ("do not use bare fs.rm without retry logic — Windows antivirus locks"), and rename is exposed to the same locks. the failure propagates out of writeSnapshot: the moved flag stays false and the temp file is cleaned up, but the pendingWrite chain's next .catch(() => undefined) silently swallows the rejection. windows users will lose snapshot data with zero diagnostic signal.
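a sketch of keeping the write chain alive without hiding failures. it assumes a hypothetical `createWriteQueue` wrapper around the real `writeSnapshot`; the `onError` callback stands in for whatever logging the project actually uses.

```typescript
// hypothetical wrapper: names are illustrative, not the PR's actual code
type WriteFn<T> = (value: T) => Promise<void>;

function createWriteQueue<T>(
  write: WriteFn<T>,
  onError: (err: unknown) => void,
) {
  let pending: Promise<void> = Promise.resolve();
  return {
    enqueue(value: T): Promise<void> {
      // chain writes so they run one at a time, in call order
      pending = pending
        .then(() => write(value))
        .catch((err) => {
          // keep the chain usable for later writes, but report the failure
          // instead of discarding it with `.catch(() => undefined)`
          onError(err);
        });
      return pending;
    },
  };
}
```

a failed write still does not block later writes, which preserves the current chain semantics; the only change is that the rejection reaches a diagnostic hook.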


Prompt To Fix All With AI

This is a comment left during a code review.
Path: lib/runtime/runtime-observability.ts
Line: 79-81

Comment:
**missing EBUSY/EPERM retry on windows rename**

`fs.rename` on windows can fail with `EBUSY` or `EPERM` when antivirus has the destination file open. per `AGENTS.md`, bare filesystem writes without retry logic are an explicit anti-pattern (`"do not use bare fs.rm without retry logic — Windows antivirus locks"`). the failure propagates into `writeSnapshot` which throws, the `moved` flag stays false, the temp file is cleaned up — but the `pendingWrite` chain's next `.catch(() => undefined)` silently swallows the rejection. windows users will lose snapshot data with zero diagnostic signal.

```suggestion
		await fs.writeFile(tempPath, JSON.stringify(snapshot, null, 2), "utf-8");
		for (let attempt = 0; attempt < 4; attempt++) {
			try {
				await fs.rename(tempPath, path);
				moved = true;
				break;
			} catch (err) {
				const code = (err as NodeJS.ErrnoException).code;
				if ((code === "EBUSY" || code === "EPERM" || code === "ENOTEMPTY") && attempt < 3) {
					await new Promise((r) => setTimeout(r, 50 * 2 ** attempt));
					continue;
				}
				throw err;
			}
		}
```

How can I resolve this? If you propose a fix, please make it concise.
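the retry loop from the suggestion can also be pulled into a hypothetical `renameWithRetry` helper that takes the rename and sleep functions as parameters, so the backoff is unit-testable without a real antivirus lock. the retryable codes, 4 attempts, and 50 ms base delay are copied from the suggestion; they are not values documented by node or windows.

```typescript
// hypothetical helper: names are illustrative, not the PR's actual code
type RenameFn = (from: string, to: string) => Promise<void>;

const RETRYABLE_CODES = new Set(["EBUSY", "EPERM", "ENOTEMPTY"]);

async function renameWithRetry(
  rename: RenameFn,
  from: string,
  to: string,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await rename(from, to);
      return;
    } catch (err) {
      const code = (err as { code?: string }).code;
      const retryable = code !== undefined && RETRYABLE_CODES.has(code);
      // give up on non-retryable errors or once attempts are exhausted
      if (!retryable || attempt >= maxAttempts - 1) throw err;
      // exponential backoff: 50 ms, 100 ms, 200 ms, ...
      await sleep(50 * 2 ** attempt);
    }
  }
}
```

assuming `fs` is `node:fs/promises`, `writeSnapshot` would call it as `await renameWithRetry(fs.rename, tempPath, path)` and set `moved = true` afterwards.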

---

This is a comment left during a code review.
Path: test/runtime-observability.test.ts
Line: 19-22

Comment:
**module cache shared across tests — missing `vi.resetModules()`**

both tests use dynamic `await import(...)` inside their `it()` bodies but there is no `vi.resetModules()` call between them. in vitest's ESM mode the second test reuses the already-cached module instance, including the module-level `snapshotState` and `pendingWrite` singletons. the current tests pass only because they both call `loadPersistedRuntimeObservabilitySnapshot`, which is stateless with respect to those singletons. any future test that calls `mutateRuntimeObservabilitySnapshot` will see leftover state and produce intermittent failures depending on test order.

add a `beforeEach` reset:

```suggestion
describe("runtime observability snapshot versioning", () => {
	beforeEach(() => {
		vi.resetModules();
	});

	afterEach(() => {
		readFileMock.mockReset();
	});
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: test/runtime-observability.test.ts
Line: 3-13

Comment:
**missing vitest coverage: several branches in `loadPersistedRuntimeObservabilitySnapshot` untested**

`existsSync` is globally mocked to return `true` for every test, so the early `return null` (file does not exist, line 115 of `runtime-observability.ts`) is never exercised. likewise, the `!parsed || typeof parsed !== 'object'` guard and the JSON parse-error catch block are untested. given the 80% branch-coverage threshold, these gaps may trip the coverage gate.

add at minimum:
```ts
it("returns null when snapshot file does not exist", async () => {
  const { existsSync } = await import("node:fs");
  vi.mocked(existsSync).mockReturnValueOnce(false);
  const { loadPersistedRuntimeObservabilitySnapshot } = await import(
    "../lib/runtime/runtime-observability.js"
  );
  expect(await loadPersistedRuntimeObservabilitySnapshot()).toBeNull();
});

it("returns null on malformed JSON", async () => {
  readFileMock.mockResolvedValueOnce("not-json");
  const { loadPersistedRuntimeObservabilitySnapshot } = await import(
    "../lib/runtime/runtime-observability.js"
  );
  expect(await loadPersistedRuntimeObservabilitySnapshot()).toBeNull();
});
```
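the three untested branches can also be isolated in a pure function so each becomes a one-line test. a sketch, assuming a hypothetical `parseSnapshotObject` that receives `null` when `existsSync` reports no file and the raw file text otherwise. the guard mirrors the `!parsed || typeof parsed !== 'object'` check quoted above, so arrays still pass it.

```typescript
// hypothetical helper: illustrative only, not the PR's actual code
function parseSnapshotObject(text: string | null): Record<string, unknown> | null {
  // branch 1: snapshot file does not exist
  if (text === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    // branch 2: malformed JSON
    return null;
  }
  // branch 3: valid JSON that is not an object (null, number, string, boolean)
  if (!parsed || typeof parsed !== "object") return null;
  return parsed as Record<string, unknown>;
}
```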

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): last reviewed commit: "feat: version runtime observability snap..."

Greptile also left 1 inline comment on this PR.

Context used: speak in lowercase, concise sentences. act like th...

chatgpt-codex-connector · 2026-04-06T02:28:09Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository-wide code reviews.

coderabbitai · 2026-04-06T02:28:11Z

Warning

Rate limit exceeded

@ndycode has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 28 minutes and 21 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 28 minutes and 21 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c788519b-7a83-49ef-b884-5b44b155b3a8

📥 Commits

Reviewing files that changed from the base of the PR and between eb9a037 and c7bf640.

📒 Files selected for processing (2)

lib/runtime/runtime-observability.ts
test/runtime-observability.test.ts


greptile-apps · 2026-04-06T02:31:17Z

```diff
+describe("runtime observability snapshot versioning", () => {
+	afterEach(() => {
+		readFileMock.mockReset();
+	});
```


module cache shared across tests — missing vi.resetModules()

both tests use dynamic await import(...) inside their it() bodies but there is no vi.resetModules() call between them. in vitest's ESM mode the second test reuses the already-cached module instance, including the module-level snapshotState and pendingWrite singletons. the current tests pass only because they both call loadPersistedRuntimeObservabilitySnapshot, which is stateless with respect to those singletons. any future test that calls mutateRuntimeObservabilitySnapshot will see leftover state and produce intermittent failures depending on test order.

add a beforeEach reset:

Suggested change

```diff
 describe("runtime observability snapshot versioning", () => {
+	beforeEach(() => {
+		vi.resetModules();
+	});
+
 	afterEach(() => {
 		readFileMock.mockReset();
 	});
```


ndycode · 2026-04-06T04:21:33Z

Superseded by #387, which rebuilds the full open PR stack onto one reviewed integration branch.

ndycode · 2026-04-06T04:21:35Z

Closing in favor of #387.

feat: version runtime observability snapshots

c7bf640

greptile-apps Bot reviewed Apr 6, 2026


ndycode closed this Apr 6, 2026

ndycode deleted the feat/runtime-snapshot-versioning branch April 12, 2026 06:00

ndycode wants to merge 1 commit into feat/harden-request-reliability from feat/runtime-snapshot-versioning
