Spun off from #1052.
The resolution dedup landing in #1052 (this PR) covers the resolution gate but leaves a similar — albeit more involved — duplication on the tracer side.
Where the duplication is:
scripts/resolution-benchmark.ts calls runDynamicTracer(lang) for every fixture (for telemetry: writes `dynamicEdges` / `dynamicConfirmed` counts into `resolution-result.json`).
tests/benchmarks/resolution/tracer/tracer-validation.test.ts independently calls `runTracer(lang)` for every fixture to compute same-file recall.
Both end up running the per-language tracer subprocess once each, doubling the tracer cost in the pre-publish job.
Why it's not as trivial as the resolution dedup:
The shapes and toolchain-missing semantics differ. The script's `runDynamicTracer` returns `[]` for both "ran, found nothing" and "toolchain missing". The test's `runTracer` returns `null` for "toolchain missing" so it can skip gracefully and treats empty edges as 0% recall. Deduplicating cleanly means:
- Extending the resolution script to emit raw tracer edges per language (not just counts), with a status field (`ok` | `skipped`).
- Updating `tracer-validation.test.ts` to optionally consume that artifact via an env var (e.g. `RESOLUTION_RESULT_JSON`, same as the resolution gate), preserving the existing skip-on-toolchain-missing behavior.
- Wiring the env var into the "Run tracer validation" step in `.github/workflows/publish.yml`.
That's a more invasive change to the script's output format, so it's worth doing in its own PR to keep #1052 reviewable.
Spun off from #1052.
The resolution dedup landing in #1052 (this PR) covers the resolution gate but leaves a similar — albeit more involved — duplication on the tracer side.
Where the duplication is:
scripts/resolution-benchmark.tscallsrunDynamicTracer(lang)for every fixture (for telemetry: writes `dynamicEdges` / `dynamicConfirmed` counts into `resolution-result.json`).tests/benchmarks/resolution/tracer/tracer-validation.test.tsindependently calls `runTracer(lang)` for every fixture to compute same-file recall.Both end up running the per-language tracer subprocess once each, doubling the tracer cost in the pre-publish job.
Why it's not as trivial as the resolution dedup:
The shapes and toolchain-missing semantics differ. The script's `runDynamicTracer` returns `[]` for both "ran, found nothing" and "toolchain missing". The test's `runTracer` returns `null` for "toolchain missing" so it can skip gracefully and treats empty edges as 0% recall. Deduplicating cleanly means:
That's a more invasive change to the script's output format, so it's worth doing in its own PR to keep #1052 reviewable.