fix(benchmark): apply sink-edge exclusion filter to scripts/resolution-benchmark.ts#1699
Conversation
… path `insert_call_edge_rows` in pipeline.rs mapped ComputedEdge→EdgeRow but silently dropped dynamic_kind, so native sink edges landed in the DB with dynamic_kind IS NULL. The benchmark filter added in 39b9a00 (`OR e.dynamic_kind IS NULL`) then matched those NULL-kind rows and counted them as false positives, breaking precision for dynamic-javascript, dynamic-typescript, and dynamic-scala. Two-layer fix: 1. edges.rs: add dynamic_kind: Option<String> to EdgeRow and update do_insert_edges to a 6-column INSERT (CHUNK 199→165 to stay under SQLite's 999 bind-parameter limit). None binds as SQL NULL for all non-sink edges; Some(dk) writes the kind string for sink edges. 2. pipeline.rs: pass e.dynamic_kind.clone() in insert_call_edge_rows so ComputedEdge.dynamic_kind reaches the DB insert. 3. resolution-benchmark.test.ts: add AND tgt.kind != 'file' to extractResolvedEdges — the correct semantic filter since sink edges are the only calls that ever target the file node, and no legitimate resolution does. This filter is correct regardless of dynamic_kind being set, removing the dependency on the back-fill path.
…xtractResolvedEdges (#1698) The tgt.kind != 'file' guard is the authoritative semantic exclusion for sink edges; the confidence/dynamic_kind clause is kept as belt-and-suspenders for databases built before the native fix where dynamic_kind was NULL.
…n-benchmark.ts The tgt.kind != 'file' and confidence/dynamic_kind guards were added to extractResolvedEdges in the vitest test file (9530468) but not to the matching SQL query in scripts/resolution-benchmark.ts. The CI gate reads precomputed JSON from that script (via RESOLUTION_RESULT_JSON), so sink edges (confidence=0, targeting file nodes) were counted as false positives, breaking precision for dynamic-java, dynamic-javascript, dynamic-scala, and dynamic-typescript.
Greptile SummaryThis PR backports the sink-edge exclusion SQL filter (added to the vitest test in commit
Confidence Score: 4/5Safe to merge — the benchmark script now uses the same SQL filter as the test, and the Rust bulk-insert path correctly writes dynamic_kind for sink edges. All changed logic is a targeted one-to-one alignment of the benchmark script's SQL query with the already-correct test implementation. The only loose end is a comment in the test file that cites #1698 for the Rust dynamic_kind write fix, which actually lands in this PR (#1699). The inline comment on line 267 of tests/benchmarks/resolution/resolution-benchmark.test.ts references an incorrect PR number for the native engine fix. Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant CI as CI Gate
participant Script as resolution-benchmark.ts
participant DB as SQLite (graph.db)
participant Test as resolution-benchmark.test.ts
CI->>Script: run (reads RESOLUTION_RESULT_JSON)
Script->>DB: "SELECT WHERE kind=calls AND tgt.kind != file NEW"
DB-->>Script: resolved edges (sink edges excluded)
Script-->>CI: metrics JSON (precision/recall)
CI->>CI: Gate on resolution thresholds
Note over Test: extractResolvedEdges() already had both guards since commit 95304680
Test->>DB: same SQL (reference implementation)
DB-->>Test: resolved edges (sink edges excluded)
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant CI as CI Gate
participant Script as resolution-benchmark.ts
participant DB as SQLite (graph.db)
participant Test as resolution-benchmark.test.ts
CI->>Script: run (reads RESOLUTION_RESULT_JSON)
Script->>DB: "SELECT WHERE kind=calls AND tgt.kind != file NEW"
DB-->>Script: resolved edges (sink edges excluded)
Script-->>CI: metrics JSON (precision/recall)
CI->>CI: Gate on resolution thresholds
Note over Test: extractResolvedEdges() already had both guards since commit 95304680
Test->>DB: same SQL (reference implementation)
DB-->>Test: resolved edges (sink edges excluded)
Reviews (1): Last reviewed commit: "fix(benchmark): apply sink-edge exclusio..." | Re-trigger Greptile |
| -- databases built before the Rust fix in #1698 where dynamic_kind was NULL for | ||
| -- sink edges, making tgt.kind != 'file' the authoritative semantic guard. |
There was a problem hiding this comment.
Comment references wrong PR number for the Rust fix
The comment attributes the native dynamic_kind write fix to #1698, but the EdgeRow struct change that actually makes the native engine write dynamic_kind for sink edges is part of this PR (#1699). Anyone reading this comment in the future will look up #1698 and not find the corresponding edges.rs change. If #1698 was intended as a separate squash/predecessor, the match is still off-by-one here.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Summary
scripts/resolution-benchmark.tswas missing the sink-edge exclusion filters that commit95304680added to the vitest test fileRESOLUTION_RESULT_JSON, so sink edges (confidence=0, targeting file nodes) were counted as false positivesAND tgt.kind != 'file'and theconfidence/dynamic_kindbelt-and-suspenders guard to the script's SQL, matchingextractResolvedEdgesin the test fileAffected fixtures (before fix):
dynamic-java: precision 33.3% (runInvoke, runForName sink edges leaking through)dynamic-javascript: precision 57.1% (runComputedKey, runEval, runNewFunction leaking)dynamic-scala: precision 50.0% (runInvoke leaking)dynamic-typescript: precision 60.0% (runReflectGetVariable, runReflectApplyExternal leaking)Test plan
Gate on resolution thresholdsshould pass for all four fixtures at 100% precision