Skip to content

feat: version runtime observability snapshots#379

Closed
ndycode wants to merge 1 commit intofeat/harden-request-reliabilityfrom
feat/runtime-snapshot-versioning
Closed

feat: version runtime observability snapshots#379
ndycode wants to merge 1 commit intofeat/harden-request-reliabilityfrom
feat/runtime-snapshot-versioning

Conversation

@ndycode
Copy link
Copy Markdown
Owner

@ndycode ndycode commented Apr 6, 2026

Summary

  • add a version field to persisted runtime observability snapshots and normalize legacy unversioned snapshots onto the current schema
  • safely ignore unknown future snapshot versions instead of trying to load incompatible runtime telemetry files
  • add focused runtime observability tests for both backward-compatible normalization and unknown-version rejection

Validation

  • node_modules/.bin/vitest.cmd run test/runtime-observability.test.ts test/runtime-metrics.test.ts
  • node_modules/.bin/tsc.cmd --noEmit

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this pr adds a version field to persisted runtime observability snapshots and implements two migration behaviors: legacy unversioned snapshots are normalized onto the current schema via a safe spread-merge, and unknown future versions are silently rejected. the approach is consistent with how storage/migrations.ts handles v1/v2→v3 upgrades elsewhere in the codebase.

key concerns:

  • windows rename safety: writeSnapshot calls fs.rename without EBUSY/EPERM retry logic, violating the project's explicit anti-pattern. the failure is then silently swallowed by the pendingWrite chain's .catch(() => undefined), making the data loss invisible on windows.
  • test module isolation: both tests call await import(...) without vi.resetModules() in a beforeEach, sharing the module-level snapshotState singleton across tests — currently harmless but fragile as the suite grows.
  • missing branch coverage: the existsSync=false, JSON-parse-error, and non-object guard branches in loadPersistedRuntimeObservabilitySnapshot are untested; the 80% threshold may be at risk.

Confidence Score: 3/5

safe to merge on linux/mac; windows rename path silently loses snapshot data without retry logic

core versioning logic is correct and well-scoped; the windows filesystem safety gap is a real concern per project conventions but non-blocking for non-windows environments; missing test isolation is fragile but not currently failing

lib/runtime/runtime-observability.ts (writeSnapshot rename retry), test/runtime-observability.test.ts (vi.resetModules + branch coverage)

Important Files Changed

Filename Overview
lib/runtime/runtime-observability.ts adds versioning + legacy normalization to snapshot persistence; windows rename lacks EBUSY/EPERM retry and write failures are silently swallowed in the pendingWrite chain
test/runtime-observability.test.ts two focused snapshot-versioning tests; missing vi.resetModules() before dynamic imports and missing branch coverage for file-not-found, JSON-error, and non-object-guard paths

Sequence Diagram

sequenceDiagram
    participant Caller
    participant mutate as mutateRuntimeObservabilitySnapshot
    participant chain as pendingWrite chain
    participant write as writeSnapshot
    participant fs as node:fs

    Caller->>mutate: mutator fn
    mutate->>mutate: apply mutator, stamp updatedAt
    mutate->>chain: .then(writeSnapshot(clone))
    chain->>write: writeSnapshot(snapshot)
    write->>fs: mkdir(dir, recursive)
    write->>fs: writeFile(tempPath)
    write->>fs: rename(tempPath → path)
    note over fs: ⚠ no EBUSY/EPERM retry on Windows
    fs-->>write: ok / EBUSY throws
    write-->>chain: rejects
    note over chain: next .catch(() => undefined) silently drops error

    Caller->>+write: loadPersistedRuntimeObservabilitySnapshot()
    write->>fs: existsSync(path)
    alt file missing
        write-->>Caller: null
    else file present
        write->>fs: readFile(path)
        write->>write: parse + check version
        alt version undefined — legacy
            write-->>Caller: spread-merged snapshot (version=1)
        else version === 1 — current
            write-->>Caller: spread-merged snapshot
        else version > 1 — unknown future
            write-->>Caller: null
        end
    end
    deactivate write
Loading

Comments Outside Diff (1)

  1. lib/runtime/runtime-observability.ts, line 79-81 (link)

    P1 missing EBUSY/EPERM retry on windows rename

    fs.rename on windows can fail with EBUSY or EPERM when antivirus has the destination file open. per AGENTS.md, bare filesystem writes without retry logic are an explicit anti-pattern ("do not use bare fs.rm without retry logic — Windows antivirus locks"). the failure propagates into writeSnapshot which throws, the moved flag stays false, the temp file is cleaned up — but the pendingWrite chain's next .catch(() => undefined) silently swallows the rejection. windows users will lose snapshot data with zero diagnostic signal.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: lib/runtime/runtime-observability.ts
    Line: 79-81
    
    Comment:
    **missing EBUSY/EPERM retry on windows rename**
    
    `fs.rename` on windows can fail with `EBUSY` or `EPERM` when antivirus has the destination file open. per `AGENTS.md`, bare filesystem writes without retry logic are an explicit anti-pattern (`"do not use bare fs.rm without retry logic — Windows antivirus locks"`). the failure propagates into `writeSnapshot` which throws, the `moved` flag stays false, the temp file is cleaned up — but the `pendingWrite` chain's next `.catch(() => undefined)` silently swallows the rejection. windows users will lose snapshot data with zero diagnostic signal.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Codex

Fix All in Codex

Prompt To Fix All With AI
This is a comment left during a code review.
Path: lib/runtime/runtime-observability.ts
Line: 79-81

Comment:
**missing EBUSY/EPERM retry on windows rename**

`fs.rename` on windows can fail with `EBUSY` or `EPERM` when antivirus has the destination file open. per `AGENTS.md`, bare filesystem writes without retry logic are an explicit anti-pattern (`"do not use bare fs.rm without retry logic — Windows antivirus locks"`). the failure propagates into `writeSnapshot` which throws, the `moved` flag stays false, the temp file is cleaned up — but the `pendingWrite` chain's next `.catch(() => undefined)` silently swallows the rejection. windows users will lose snapshot data with zero diagnostic signal.

```suggestion
		await fs.writeFile(tempPath, JSON.stringify(snapshot, null, 2), "utf-8");
		for (let attempt = 0; attempt < 4; attempt++) {
			try {
				await fs.rename(tempPath, path);
				moved = true;
				break;
			} catch (err) {
				const code = (err as NodeJS.ErrnoException).code;
				if ((code === "EBUSY" || code === "EPERM" || code === "ENOTEMPTY") && attempt < 3) {
					await new Promise((r) => setTimeout(r, 50 * 2 ** attempt));
					continue;
				}
				throw err;
			}
		}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: test/runtime-observability.test.ts
Line: 19-22

Comment:
**module cache shared across tests — missing `vi.resetModules()`**

both tests use dynamic `await import(...)` inside their `it()` bodies but there is no `vi.resetModules()` call between them. in vitest's ESM mode the second test reuses the already-cached module instance, including the module-level `snapshotState` and `pendingWrite` singletons. the current tests pass only because they both call `loadPersistedRuntimeObservabilitySnapshot`, which is stateless with respect to those singletons. any future test that calls `mutateRuntimeObservabilitySnapshot` will see leftover state and produce intermittent failures depending on test order.

add a `beforeEach` reset:

```suggestion
describe("runtime observability snapshot versioning", () => {
	beforeEach(() => {
		vi.resetModules();
	});

	afterEach(() => {
		readFileMock.mockReset();
	});
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: test/runtime-observability.test.ts
Line: 3-13

Comment:
**missing vitest coverage: several branches in `loadPersistedRuntimeObservabilitySnapshot` untested**

`existsSync` is globally mocked to return `true` for every test, so the early `return null` (file does not exist, line 115 of `runtime-observability.ts`) is never exercised. likewise, the `!parsed || typeof parsed !== 'object'` guard and the JSON parse-error catch block are untested. given the 80% branch-coverage threshold, these gaps may trip the coverage gate.

add at minimum:
```ts
it("returns null when snapshot file does not exist", async () => {
  const { existsSync } = await import("node:fs");
  vi.mocked(existsSync).mockReturnValueOnce(false);
  const { loadPersistedRuntimeObservabilitySnapshot } = await import(
    "../lib/runtime/runtime-observability.js"
  );
  expect(await loadPersistedRuntimeObservabilitySnapshot()).toBeNull();
});

it("returns null on malformed JSON", async () => {
  readFileMock.mockResolvedValueOnce("not-json");
  const { loadPersistedRuntimeObservabilitySnapshot } = await import(
    "../lib/runtime/runtime-observability.js"
  );
  expect(await loadPersistedRuntimeObservabilitySnapshot()).toBeNull();
});
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "feat: version runtime observability snap..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

Context used:

  • Context used - speak in lowercase, concise sentences. act like th... (source)

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 6, 2026

Warning

Rate limit exceeded

@ndycode has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 28 minutes and 21 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 28 minutes and 21 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c788519b-7a83-49ef-b884-5b44b155b3a8

📥 Commits

Reviewing files that changed from the base of the PR and between eb9a037 and c7bf640.

📒 Files selected for processing (2)
  • lib/runtime/runtime-observability.ts
  • test/runtime-observability.test.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/runtime-snapshot-versioning
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/runtime-snapshot-versioning

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment on lines +19 to +22
describe("runtime observability snapshot versioning", () => {
afterEach(() => {
readFileMock.mockReset();
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 module cache shared across tests — missing vi.resetModules()

both tests use dynamic await import(...) inside their it() bodies but there is no vi.resetModules() call between them. in vitest's ESM mode the second test reuses the already-cached module instance, including the module-level snapshotState and pendingWrite singletons. the current tests pass only because they both call loadPersistedRuntimeObservabilitySnapshot, which is stateless with respect to those singletons. any future test that calls mutateRuntimeObservabilitySnapshot will see leftover state and produce intermittent failures depending on test order.

add a beforeEach reset:

Suggested change
describe("runtime observability snapshot versioning", () => {
afterEach(() => {
readFileMock.mockReset();
});
describe("runtime observability snapshot versioning", () => {
beforeEach(() => {
vi.resetModules();
});
afterEach(() => {
readFileMock.mockReset();
});
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/runtime-observability.test.ts
Line: 19-22

Comment:
**module cache shared across tests — missing `vi.resetModules()`**

both tests use dynamic `await import(...)` inside their `it()` bodies but there is no `vi.resetModules()` call between them. in vitest's ESM mode the second test reuses the already-cached module instance, including the module-level `snapshotState` and `pendingWrite` singletons. the current tests pass only because they both call `loadPersistedRuntimeObservabilitySnapshot`, which is stateless with respect to those singletons. any future test that calls `mutateRuntimeObservabilitySnapshot` will see leftover state and produce intermittent failures depending on test order.

add a `beforeEach` reset:

```suggestion
describe("runtime observability snapshot versioning", () => {
	beforeEach(() => {
		vi.resetModules();
	});

	afterEach(() => {
		readFileMock.mockReset();
	});
```

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

@ndycode
Copy link
Copy Markdown
Owner Author

ndycode commented Apr 6, 2026

Superseded by #387, which rebuilds the full open PR stack onto one reviewed integration branch.

@ndycode
Copy link
Copy Markdown
Owner Author

ndycode commented Apr 6, 2026

Closing in favor of #387.

@ndycode ndycode closed this Apr 6, 2026
@ndycode ndycode deleted the feat/runtime-snapshot-versioning branch April 12, 2026 06:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant