refactor(ci): let tracer-validation reuse benchmark tracer output by carlos-alm · Pull Request #1173 · optave/ops-codegraph-tool

carlos-alm · 2026-05-19T23:40:00Z

Summary

Extend `scripts/resolution-benchmark.ts` per-language result with a `tracer` artifact (`status: 'ok' | 'skipped'` + raw edges) so the tracer-validation gate test can reuse the subprocess output instead of respawning `run-tracer.mjs` per fixture.
Refactor `tests/benchmarks/resolution/tracer/tracer-validation.test.ts` to consume the artifact when `RESOLUTION_RESULT_JSON` is set, preserving skip-on-toolchain-missing semantics. Falls back to running the tracer directly when unset so standalone `vitest` runs still work.
Wire `RESOLUTION_RESULT_JSON` through the `Run tracer validation` step in `.github/workflows/publish.yml`.

Mirrors the resolution-gate dedup approach in #1052 (PR d697929).

Why

The pre-publish benchmark job runs each language's tracer subprocess once for telemetry (writing `dynamicEdges` / `dynamicConfirmed` into `resolution-result.json`). The tracer-validation step that follows ran the same subprocess again per fixture, doubling tracer cost in the publish pipeline.

The status field distinguishes "tracer ran, found nothing" (`ok` with empty edges) from "toolchain missing" (`skipped`), mirroring `runTracer`'s null-return semantics so the test still skips gracefully on environments without rustc/ghc/ruby/etc.

Test plan

`npx vitest run tests/benchmarks/resolution/tracer/tracer-validation.test.ts` (standalone) — passes (1/1 with the rest skipped on this machine since most language toolchains aren't installed)
`RESOLUTION_RESULT_JSON=/tmp/resolution-result.json npx vitest run tests/benchmarks/resolution/tracer/tracer-validation.test.ts` — passes 35/35 in ~160ms against an artifact produced by `scripts/resolution-benchmark.ts`
`RESOLUTION_RESULT_JSON=/tmp/resolution-result.json npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts` — passes 170/170 (confirms the new `tracer` field doesn't disturb the existing resolution gate)
CI publish workflow's pre-publish-benchmark job

Closes #1166

The pre-publish-benchmark job's `Run resolution benchmark` step already spawns `run-tracer.mjs` once per language fixture for telemetry (writing `dynamicEdges` / `dynamicConfirmed` counts into resolution-result.json). The `Run tracer validation` step that follows then ran the same per- language tracer subprocess again to compute same-file recall — doubling the tracer cost in the pre-publish job. Extend the script's per-language LangResult with a `tracer` artifact containing status ('ok' | 'skipped') and the raw captured edges. The status distinction mirrors the gate test's `runTracer` semantics (null-return on toolchain-missing → 'skipped'). Refactor the gate test to consume that artifact when RESOLUTION_RESULT_JSON is set, falling back to running run-tracer.mjs directly when unset so devs can still execute `npx vitest run tests/benchmarks/resolution/tracer/...` standalone. Wire the env var through the workflow's tracer-validation step. Verified locally: gate test in artifact mode passes 35/35 in ~160ms against an artifact produced by scripts/resolution-benchmark.ts (vs ~320ms for the standalone path that respawns the JS tracer subprocess). The resolution-benchmark gate test still passes 170/170 with the new `tracer` field present in the artifact. Closes #1166

claude · 2026-05-19T23:40:16Z

Claude finished @carlos-alm's task —— View job

PR Review: Tracer Validation Deduplication

Final recommendation completed.

Todo List:

Read and analyze the 3 changed files
Phase 1: Root Cause Analysis
Phase 2: Configuration Change Detection
Phase 3: Test Integrity Assessment
Phase 4: Code Quality Review
Phase 5: Backlog Compliance Rating
Provide final recommendation and rating

Root Cause Analysis

Problem Being Solved: The pre-publish benchmark job runs each language's tracer subprocess once for telemetry, but the tracer-validation step that follows runs the same subprocess again per fixture, doubling tracer cost in the publish pipeline.

Why This Approach: The solution mirrors the resolution-gate dedup approach from PR #1052. It extends the resolution benchmark script to include a tracer artifact with status and raw edges, allowing the validation test to reuse this output instead of respawning processes.

Risk Assessment: Low risk. This is a CI optimization that preserves existing behavior while reducing duplicate work. Fallback to direct tracer execution ensures standalone tests still work.

Configuration Change Detection

Assessment: ✅ NO HIGH-RISK CHANGES DETECTED

No linting rules disabled or relaxed
No test timeouts increased or coverage thresholds lowered
No build configuration or TypeScript strict mode changes
Only adds environment variable to CI workflow
Changes are purely additive optimization

Test Integrity Assessment

Assessment: ✅ NO RED FLAGS DETECTED

Test assertions remain unchanged - same recall thresholds and expectations
Preserves skip-on-toolchain-missing semantics (status: 'skipped' mirrors null return)
Fallback behavior ensures standalone vitest runs still work
No weakened assertions or conditional validation logic
Logic correctly distinguishes "tracer ran, found nothing" (ok + empty edges) from "toolchain missing" (skipped)

Code Quality Assessment

Assessment: ✅ GOOD IMPLEMENTATION QUALITY

Strengths:

Clean interface design: TracerArtifact with discriminated union status: 'ok' | 'skipped'
Proper error handling: tracer crashes → skipped, subprocess errors → skipped
Type safety: Well-defined interfaces for TracerEdge and ArtifactTracerEntry
Graceful fallbacks: artifact missing/malformed → use direct execution path
Performance optimization: ~50% CI time reduction for tracer validation step

Minor observations:

Consistent with existing codebase patterns (mirrors resolution-gate approach)
Appropriate timeout handling (60s matches existing)
Good separation of concerns: benchmark script emits, test consumes

No significant issues identified.

Backlog Compliance

Zero-dep: ✅ — No new runtime dependencies added
Foundation-aligned: ✅ — Pure CI optimization, no violation of core principles
Problem-fit: 3/5 — Useful CI efficiency improvement but doesn't directly address core "lost AI" problem
Breaking: No — Purely additive, preserves all existing behavior
Tier: 1 (Zero-dep ✓ + Foundation-aligned ✓)

Critical Concerns

None identified. This is a well-implemented CI optimization.

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: Clean implementation of a valuable CI optimization that reduces duplicate work without changing behavior. Well-designed interfaces, proper fallbacks, and follows established patterns. The problem-fit is moderate (CI efficiency vs. core AI agent value), but the implementation quality is high and risk is minimal.

greptile-apps · 2026-05-19T23:42:39Z

Greptile Summary

This PR eliminates duplicate tracer subprocess invocations in the publish CI pipeline by having scripts/resolution-benchmark.ts embed per-language tracer output (edges + a status discriminator) into resolution-result.json, which tracer-validation.test.ts then reads via RESOLUTION_RESULT_JSON instead of re-spawning run-tracer.mjs per fixture.

scripts/resolution-benchmark.ts: runDynamicTracer now returns a TracerArtifact ({ status: 'ok' | 'skipped', edges }) with semantics that map cleanly to the test's null-return convention for missing toolchains; the artifact is written to every LangResult entry in the output JSON.
tests/benchmarks/resolution/tracer/tracer-validation.test.ts: Adds an IIFE that parses the artifact at module-load time (with a helpful error on missing/malformed JSON from the previous review), then short-circuits runTracer to consume pre-computed edges when the artifact is present — falling back to the existing subprocess path when RESOLUTION_RESULT_JSON is unset.
.github/workflows/publish.yml: Wires RESOLUTION_RESULT_JSON into the "Run tracer validation" step, mirroring the same pattern already in place for the "Gate on resolution thresholds" step.

Confidence Score: 5/5

Safe to merge — the refactor correctly deduplicates tracer subprocess calls in CI without altering any observable test semantics or benchmark metrics.

All three changed files make narrowly scoped, well-tested changes. The TracerArtifact type is structurally identical to what the test's TracerEdge interface expects, so no type mismatch occurs at runtime. The ok/skipped status mapping faithfully mirrors the existing null-return convention in runTracer. Error handling for missing or malformed JSON is in place. The workflow path for RESOLUTION_RESULT_JSON is consistent with the file written by the preceding benchmark step. No correctness or data-loss risks were identified.

No files require special attention.

Important Files Changed

Filename	Overview
scripts/resolution-benchmark.ts	Introduces `TracerArtifact` type and refactors `runDynamicTracer` to return it; emits `metrics.tracer` into every language result. Logic for `ok`/`skipped` status is correct and backward-compatible with existing `dynamicEdges`/`dynamicConfirmed` metrics.
tests/benchmarks/resolution/tracer/tracer-validation.test.ts	Adds artifact-mode path for `runTracer` guarded by `RESOLUTION_RESULT_JSON`; JSON parse is now wrapped in try/catch with a clear regeneration message. Skip semantics (`null` return) are faithfully preserved for `status === 'skipped'` entries.
.github/workflows/publish.yml	Adds `RESOLUTION_RESULT_JSON` env var to the "Run tracer validation" step, pointing at the file written by the preceding resolution benchmark step. Path is consistent with the existing gate-test step.

Sequence Diagram

sequenceDiagram
    participant WF as GitHub Actions
    participant BM as resolution-benchmark.ts
    participant RJ as resolution-result.json
    participant TV as tracer-validation.test.ts
    participant RT as run-tracer.mjs

    WF->>BM: node scripts/resolution-benchmark.ts
    loop per language fixture
        BM->>RT: execFileSync (run-tracer.mjs)
        RT-->>BM: "{ edges, error? }"
        BM->>BM: "Build TracerArtifact {status, edges}"
        BM->>RJ: write LangResult incl. tracer field
    end
    BM-->>WF: resolution-result.json written

    WF->>TV: vitest run (RESOLUTION_RESULT_JSON set)
    TV->>RJ: readFileSync + JSON.parse (module load)
    loop per language
        TV->>TV: runTracer(lang)
        alt "artifact has status=ok"
            TV->>TV: return entry.edges (no subprocess)
        else "artifact missing or status=skipped"
            TV->>TV: "return null -> skip test"
        end
    end
    TV-->>WF: recall assertions complete

_{Reviews (7): Last reviewed commit: "fix: surface helpful error when RESOLUTI..." | Re-trigger Greptile}

greptile-apps · 2026-05-19T23:42:42Z

+  return JSON.parse(fs.readFileSync(RESOLUTION_RESULT_JSON, 'utf-8'));
+})();


If RESOLUTION_RESULT_JSON points to a file that exists but contains invalid or truncated JSON (e.g., the benchmark step was killed mid-write and the redirect left a partial file), JSON.parse throws a raw SyntaxError during module load. Every test in the suite then fails with an opaque parse error instead of a useful message explaining which file to regenerate.

Suggested change

return JSON.parse(fs.readFileSync(RESOLUTION_RESULT_JSON, 'utf-8'));

})();

try {

return JSON.parse(fs.readFileSync(RESOLUTION_RESULT_JSON, 'utf-8'));

} catch (err) {

throw new Error(

`RESOLUTION_RESULT_JSON=${RESOLUTION_RESULT_JSON} is not valid JSON — regenerate it with scripts/resolution-benchmark.ts. (${err})`,

);

}

})();

Fixed in c2c5889 — wrapped the JSON.parse in a try/catch that throws a helpful error pointing the developer at scripts/resolution-benchmark.ts to regenerate the artifact, instead of letting the raw SyntaxError surface during module load. Verified manually with a malformed JSON file.

github-actions · 2026-05-19T23:47:18Z

Codegraph Impact Analysis

4 functions changed → 1 callers affected across 1 files

TracerArtifact.status in scripts/resolution-benchmark.ts:68 (0 transitive callers)
TracerArtifact.edges in scripts/resolution-benchmark.ts:69 (0 transitive callers)
LangResult.tracer in scripts/resolution-benchmark.ts:87 (0 transitive callers)
runDynamicTracer in scripts/resolution-benchmark.ts:189 (1 transitive callers)

…1173)

carlos-alm · 2026-05-20T07:25:18Z

@greptileai

greptile-apps Bot reviewed May 19, 2026

View reviewed changes

carlos-alm and others added 5 commits May 19, 2026 18:30

Merge branch 'main' into refactor/1166-dedup-tracer-validation

81fcd73

Merge branch 'main' into refactor/1166-dedup-tracer-validation

b9e45ec

Merge branch 'main' into refactor/1166-dedup-tracer-validation

797fb45

Merge branch 'main' into refactor/1166-dedup-tracer-validation

4357471

fix: surface helpful error when RESOLUTION_RESULT_JSON is malformed (#…

c2c5889

…1173)

carlos-alm closed this May 20, 2026

github-actions Bot locked and limited conversation to collaborators May 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(ci): let tracer-validation reuse benchmark tracer output#1173

refactor(ci): let tracer-validation reuse benchmark tracer output#1173
carlos-alm wants to merge 6 commits into
mainfrom
refactor/1166-dedup-tracer-validation

carlos-alm commented May 19, 2026

Uh oh!

claude Bot commented May 19, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 19, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot May 19, 2026

Uh oh!

carlos-alm May 20, 2026

Uh oh!

github-actions Bot commented May 19, 2026 •

edited

Loading

Uh oh!

carlos-alm commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		return JSON.parse(fs.readFileSync(RESOLUTION_RESULT_JSON, 'utf-8'));
		})();

-  return JSON.parse(fs.readFileSync(RESOLUTION_RESULT_JSON, 'utf-8'));
-})();
+  try {
+    return JSON.parse(fs.readFileSync(RESOLUTION_RESULT_JSON, 'utf-8'));
+  } catch (err) {
+    throw new Error(
+      `RESOLUTION_RESULT_JSON=${RESOLUTION_RESULT_JSON} is not valid JSON — regenerate it with scripts/resolution-benchmark.ts. (${err})`,
+    );
+  }
+})();

Conversation

carlos-alm commented May 19, 2026

Summary

Why

Test plan

Uh oh!

claude Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Tracer Validation Deduplication

Todo List:

Root Cause Analysis

Configuration Change Detection

Test Integrity Assessment

Code Quality Assessment

Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

greptile-apps Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm May 20, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codegraph Impact Analysis

Uh oh!

carlos-alm commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented May 19, 2026 •

edited

Loading

greptile-apps Bot commented May 19, 2026 •

edited

Loading

github-actions Bot commented May 19, 2026 •

edited

Loading