Context
Phase 8.6 (#1299) deferred the external validation step from the roadmap:
"Benchmark against Jelly and ACG on shared fixture projects for external validation."
What's needed
- Select shared fixture languages: Jelly targets JavaScript/TypeScript, and ACG targets Java. Pick 2–3 fixture projects from tests/benchmarks/resolution/fixtures/ whose languages overlap with these tools (likely javascript, typescript, and java).
- Run Jelly on the JavaScript/TypeScript fixtures and collect its call graph output. Compare its edge set against both codegraph's resolved edges and the hand-annotated expected-edges.json manifests, and compute Jelly's precision/recall on the same corpus.
- Run ACG (or a compatible tool) on the Java fixture in the same way.
- Produce a comparison table of precision, recall, and TP/FP/FN counts for codegraph vs Jelly vs ACG on the shared fixture set, and document it in docs/benchmarks/RESOLUTION-COMPARISON.md.
- Wire into CI (optional): if Jelly/ACG can be installed in CI without excessive overhead, add a comparison job to the resolution benchmark workflow; otherwise keep it as a manually run script.
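Scoring each tool and building the comparison table reduces to set arithmetic over edges. Below is a minimal sketch, assuming each tool's output and the expected-edges.json manifest have already been converted into a shared, hypothetical `Edge` shape (`caller`/`callee` strings normalized to the same symbol keys). The real manifest schema and the Jelly/ACG output formats are not shown here and would each need their own adapter; `Edge`, `scoreTool`, and `renderTable` are illustrative names, not existing codegraph functions.

```typescript
// Hypothetical normalized edge: caller and callee are symbol keys that every
// tool's output has already been mapped onto (e.g. "src/app.ts::main").
// This is NOT the actual expected-edges.json schema.
interface Edge {
  caller: string;
  callee: string;
}

interface Scores {
  tool: string;
  tp: number;
  fp: number;
  fn: number;
  precision: number;
  recall: number;
}

// Collapse an edge into a string so edges can be compared by value in a Set.
function edgeKey(e: Edge): string {
  return e.caller + " -> " + e.callee;
}

// Score one tool's edges against the hand-annotated ground truth.
// Duplicate edges reported by a tool are counted once.
function scoreTool(tool: string, expected: Edge[], actual: Edge[]): Scores {
  const expectedKeys = new Set(expected.map(edgeKey));
  const actualKeys = Array.from(new Set(actual.map(edgeKey)));
  const tp = actualKeys.filter((k) => expectedKeys.has(k)).length;
  const fp = actualKeys.length - tp; // reported but not annotated
  const fn = expectedKeys.size - tp; // annotated but not reported
  // Report 0 instead of NaN when a denominator is empty.
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  return { tool, tp, fp, fn, precision, recall };
}

// Render one fixture's scores as a Markdown table.
function renderTable(fixture: string, rows: Scores[]): string {
  const lines = [
    "| Fixture | Tool | Precision | Recall | TP | FP | FN |",
    "|---|---|---|---|---|---|---|",
  ];
  for (const r of rows) {
    lines.push(
      `| ${fixture} | ${r.tool} | ${r.precision.toFixed(2)} | ${r.recall.toFixed(2)} | ${r.tp} | ${r.fp} | ${r.fn} |`
    );
  }
  return lines.join("\n");
}

// Toy data for illustration only; these are not real fixture results.
const demoExpected: Edge[] = [
  { caller: "main", callee: "parse" },
  { caller: "main", callee: "render" },
  { caller: "parse", callee: "tokenize" },
];
const demoJelly: Edge[] = [
  { caller: "main", callee: "parse" },
  { caller: "main", callee: "render" },
  { caller: "main", callee: "log" },
];
const demoScores = scoreTool("jelly", demoExpected, demoJelly); // tp = 2, fp = 1, fn = 1
console.log(renderTable("typescript", [demoScores]));
```

Deduplicating through sets keeps a tool that reports the same edge twice from inflating its TP or FP counts. Reporting 0 on an empty denominator is a choice made in this sketch so the table never shows NaN; the real document may prefer to mark such cells as "n/a".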
References
- tests/benchmarks/resolution/fixtures/typescript/expected-edges.json
- tests/benchmarks/resolution/resolution-benchmark.test.ts
Notes
- The goal is external validation of codegraph's precision/recall claims, not necessarily matching Jelly's or ACG's numbers: the fixture set is intentionally small and hand-annotated.
- Jelly supports whole-program TypeScript analysis; the comparison should use the same fixture source files that codegraph builds against.