Skip to content

fix(extract): eliminate WASM/native node divergence#1266

Merged
carlos-alm merged 3 commits into
mainfrom
fix/engine-node-divergence
Jun 1, 2026
Merged

fix(extract): eliminate WASM/native node divergence#1266
carlos-alm merged 3 commits into
mainfrom
fix/engine-node-divergence

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

Eliminates the persistent full-build node divergence between the WASM and native engines (tracked in #1263). Two independent root causes:

  • Native Ruby extractor never handled top-level assignment nodes, so program-level constants (e.g. SKIP_CLASSES = %w[...].freeze) were dropped while the WASM extractor extracted them. Added an assignment arm mirroring handleRubyAssignment, gated to program-level constant targets so class/module-body constants stay attributed to their owner.
  • C++ benchmark fixture service.h was parsed as C (.h maps to C), where tree-sitter-c 0.23 (native) error-recovered class Service {} into a bogus function node while 0.24 (WASM) dropped it. The file is genuinely C++, so it's renamed to service.hpp — both engines now parse it as C++ and agree (class/method/property nodes). Updated the #include lines and taught the native tracer to copy .hpp headers.

After both fixes, the full-build node sets are byte-identical across engines.

Out of scope (tracked separately)

The residual ~30–46 call-edge divergence stems from grammar-version drift between the Rust crates and the npm WASM packages (e.g. go 0.23↔0.25, python 0.23↔0.25, c 0.23↔0.24). That is a large, multi-language alignment effort and is tracked in #1265.

Test plan

  • cargo test --lib — 327 pass, including the grammar-ABI regression test and 3 new Ruby extractor tests
  • cpp resolution benchmark passes (precision/recall thresholds unchanged)
  • Full-repo dual-engine build: node sets verified byte-identical (WASM vs native)

Closes #1263

Two independent root causes produced the persistent full-build node
divergence between the WASM and native engines (tracked in #1263):

- The native Ruby extractor never handled top-level `assignment` nodes,
  so program-level constants (e.g. `SKIP_CLASSES = %w[...].freeze`) were
  dropped while the WASM extractor extracted them. Added an `assignment`
  arm mirroring `handleRubyAssignment`, gated to `program`-level constant
  targets so class/module-body constants stay attributed to their owner.

- The C++ benchmark fixture `service.h` was parsed as C (`.h` maps to C),
  where tree-sitter-c 0.23 (native) error-recovered `class Service {}`
  into a bogus `function` node while 0.24 (WASM) dropped it. The file is
  genuinely C++, so renamed it to `service.hpp`; both engines now parse it
  as C++ and agree (class/method/property nodes). Updated the `#include`
  lines and taught the native tracer to copy `.hpp` headers.

After both fixes the full-build node sets are byte-identical across
engines. The residual call-edge divergence stems from grammar-version
drift between the Rust crates and npm WASM packages and is tracked
separately in #1265.

Closes #1263

docs check acknowledged: bug fix + test fixture rename, no language/
architecture/feature changes warranting README/CLAUDE/ROADMAP edits.
@claude
Copy link
Copy Markdown

claude Bot commented May 31, 2026

Claude finished @carlos-alm's task —— View job


PR Review: fix(extract): eliminate WASM/native node divergence

Tasks

  • Gather context and understand the PR
  • Read and analyze Ruby extractor changes
  • Read and analyze C++ fixture changes
  • Read and analyze native tracer changes
  • Perform backlog compliance check
  • Write final review

Root Cause Analysis

Problem Being Solved: Two independent bugs caused the native engine to produce different node sets than the WASM engine for a full-repo build. Bug 1: the native Ruby extractor never handled assignment nodes, so top-level constants like SKIP_CLASSES = ... were silently dropped. Bug 2: the C++ benchmark fixture service.h was parsed as C (not C++), causing grammar-version-specific behavior differences (tree-sitter-c 0.23 error-recovered class Service {} into a bogus function node, while 0.24 dropped it).

Why This Approach: Both fixes are correct and targeted. The Ruby fix ports the existing WASM handleRubyAssignment logic 1:1 into Rust, with the identical guard (parent.kind() == "program") and identical constant-only filter. The C++ fix corrects a misclassification at the file level — .h files are parsed as C by default, but this particular header is genuinely C++, so .hpp is the right extension. This avoids any special-casing in parser dispatch logic.

Risk Assessment: Low. Both fixes are purely additive on the extraction side — they add nodes that were previously missing, rather than removing or transforming existing ones. The .hpp rename is consistent (both .cpp files that #include "service.h" are updated, and the native tracer's cp is updated). expected-edges.json was not changed, which is correct since the rename doesn't affect call-edge resolution semantics.


Phase 1 — Root Cause Analysis: PASS

Both fixes address genuine bugs at the extraction layer, not symptoms. No warnings are being silenced. No "parity gap" is being documented as acceptable.


Phase 2 — Configuration Change Detection: PASS

No configuration, lint rules, TypeScript flags, or test thresholds were modified. Zero config changes in this PR.


Phase 3 — Test Integrity: PASS

Three new unit tests were added directly to crates/codegraph-core/src/extractors/ruby.rs (lines 353–378):

  • extracts_top_level_constant — positive case: SKIP_CLASSES = %w[...].freeze produces a constant def.
  • ignores_local_variable_assignment — negative case: edges = [] (lowercase) is correctly ignored.
  • does_not_duplicate_class_body_constant_at_top_level — boundary case: class Foo; BAR = 1; endBAR must not appear as a top-level def (it belongs to the class children).

All three tests target exactly the new handle_assignment function and the critical guard on node.parent().kind() == "program". The tests are strict (no skipped assertions, no conditional assert).

The WASM-side tests in tests/parsers/ruby.test.ts were not updated — this is a minor gap worth noting (see Concerns below).


Phase 4 — Code Quality

Ruby extractor (crates/codegraph-core/src/extractors/ruby.rs:41-59)

The implementation is a faithful, idiomatic Rust port of the TypeScript handleRubyAssignment. Structural parity with the WASM extractor:

Aspect WASM (TS) Native (Rust) Match
Scope guard parent.type === 'program' parent.kind() == "program"
LHS field childForFieldName('left') child_by_field_name("left")
LHS kind filter left.type === 'constant' left.kind() != "constant" (early return)
Output kind 'constant' "constant"
Line numbers startPosition.row + 1 start_line(node)
End line nodeEndLine(node) Some(end_line(node))

One minor observation: complexity and cfg are set to None on the constant definition (line 55–56). This is correct — constants don't have control-flow graphs. Consistent with how other non-function kinds are handled in this file.

C++ fixture rename

service.hservice.hpp is correct. The file contains class Service {} and class UserService : public Service {} — it is definitively C++, not C. Both consuming files (main.cpp:1, service.cpp:1) were updated from #include "service.h" to #include "service.hpp". The expected-edges.json references only .cpp files, so no manifest update was needed.

Native tracer (tests/benchmarks/resolution/tracer/native-tracer.sh:55)

The added line:

cp "$FIXTURE_DIR"/*.hpp "$TMP_DIR/" 2>/dev/null || true

is inserted after the existing cp *.h line (line 54). This is the minimal correct fix — the C/C++ tracer already copies .h headers; .hpp headers were previously not copied, which would have silently caused a missing-include failure when compiling the fixture. The || true fallback is consistent with the pattern used on line 54.

Minor concern — handle_assignment in match_ruby_node is called for all assignments (crates/codegraph-core/src/extractors/ruby.rs:32), including deeply nested ones (e.g., inside methods). The early-return guard at line 42 (parent.kind() != "program") correctly handles this at runtime, but the walk itself still visits every assignment node in the file before the guard fires. This is a minor performance consideration, not a correctness issue, and is identical behavior to the WASM extractor.


Phase 5 — Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies. Uses existing tree-sitter-ruby grammar already in Cargo.toml.
  • Foundation-aligned: ✓ — Purely local, offline extraction. No cloud calls, no agent behavior, no pipeline changes.
  • Problem-fit: 5 — Engine divergence directly undermines the tool's core reliability guarantee. When native and WASM produce different node sets, all downstream queries (impact, blast radius, context) are unreliable for users running the native engine. This fix restores parity.
  • Breaking: No — purely additive. New nodes are extracted (Ruby top-level constants, C++ class/method nodes). No API, schema, or CLI output format changes.
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

Minor gap — no WASM-side test for top-level constant extraction (tests/parsers/ruby.test.ts)

The three new unit tests live entirely in the Rust crate. The WASM extractor (src/extractors/ruby.ts) already had handleRubyAssignment before this PR — but tests/parsers/ruby.test.ts doesn't appear to assert that a top-level constant like SKIP_CLASSES = %w[...].freeze is extracted. The underlying WASM logic is correct (it predates this PR), but having a corresponding TS-side test would complete the parity coverage. This is low priority since the WASM path was not changed and is the reference implementation, but worth tracking.

Suggestion: Consider adding extracts_top_level_constant to tests/parsers/ruby.test.ts in a follow-up to prevent future WASM regressions on this code path.


Final Assessment

## Root Cause Analysis
Problem Being Solved: Two real extraction bugs causing native/WASM node-set divergence —
  missing top-level constant extraction in the Ruby native extractor, and
  a C header misclassified as C causing grammar-version-sensitive parse differences.
Why This Approach: Direct, targeted fixes at the extraction layer. Ruby fix ports
  the reference WASM logic faithfully. C++ fix corrects the file extension so both
  engines use the same (C++) grammar, eliminating the mismatch root cause.
Risk Assessment: Low — additive changes only, both fixes backed by unit tests,
  no config/schema/API changes.

## Backlog Compliance
- Zero-dep: ✓ — no new runtime deps
- Foundation-aligned: ✓ — fully local, no cloud, no agent behavior
- Problem-fit: 5 — restores engine reliability, which is foundational to accurate
  impact analysis and blast-radius queries for AI agents
- Breaking: No
- Tier: 1

## Critical Concerns
None blocking. One minor follow-up suggestion: add a WASM-side test for
Ruby top-level constant extraction in tests/parsers/ruby.test.ts.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE
- Reasoning: Clean, well-reasoned fix with faithful engine parity, three targeted
  unit tests covering positive/negative/boundary cases, and no collateral changes.
  The only gap is a missing WASM-side test for the already-correct WASM path —
  a minor improvement for a follow-up, not a blocker.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 31, 2026

Greptile Summary

This PR eliminates the WASM/native engine node divergence (#1263) by fixing two independent root causes: adding top-level constant extraction to the native Ruby extractor and renaming a misclassified C header to .hpp so both engines parse it as C++.

  • Ruby extractor (ruby.rs): Adds handle_assignment gated on parent().kind() == "program", mirroring the WASM handleRubyAssignment. Class/module-body constants are already handled by extract_ruby_class_children and remain attributed to their owner. Three Rust unit tests and three parallel TypeScript tests cover the boundary cases.
  • C++ fixture rename: service.hservice.hpp with #include updates in main.cpp and service.cpp, and a one-line addition to native-tracer.sh to copy .hpp headers into the build sandbox alongside .h files.

Confidence Score: 5/5

Safe to merge — both fixes are narrow and well-tested, with no changes to core extraction logic for existing node kinds.

The Ruby extractor change is a guarded additive path: it only fires when the assignment node is a direct child of the program root and the left-hand side is a constant, leaving class- and module-body constants untouched. The C++ fixture rename is purely mechanical — service.hpp content is byte-for-byte identical to the old service.h — and the one-line tracer addition follows the existing copy pattern exactly. Three Rust unit tests and three TypeScript tests explicitly cover the boundary conditions.

No files require special attention.

Important Files Changed

Filename Overview
crates/codegraph-core/src/extractors/ruby.rs Adds handle_assignment for program-level constant extraction with correct parent-kind guard and three unit tests covering the key boundary cases.
tests/parsers/ruby.test.ts Three new TypeScript tests mirror the Rust unit tests, verifying the WASM extractor agrees with the native extractor on top-level constants and boundary cases.
tests/benchmarks/resolution/tracer/native-tracer.sh Single-line addition to copy .hpp headers into the build sandbox, matching the existing .h copy pattern with the same silent-failure guard.
tests/benchmarks/resolution/fixtures/cpp/service.hpp Pure rename from service.h; content is unchanged. The file genuinely contains C++ class definitions, making .hpp the correct extension.
tests/benchmarks/resolution/fixtures/cpp/main.cpp Include path updated from service.h to service.hpp to reflect the rename.
tests/benchmarks/resolution/fixtures/cpp/service.cpp Include path updated from service.h to service.hpp to reflect the rename.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[walk_tree visits node] --> B{node.kind?}
    B -->|assignment| C[handle_assignment]
    B -->|class| D[handle_class]
    B -->|module| E[handle_module]
    B -->|method/call/etc| F[other handlers]
    C --> G{parent.kind == program?}
    G -->|No| H[return early]
    G -->|Yes| I{left.kind == constant?}
    I -->|No| J[return early]
    I -->|Yes| K[push to symbols.definitions]
    D --> L[extract_ruby_class_children]
Loading

Reviews (3): Last reviewed commit: "Merge branch 'main' into fix/engine-node..." | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 31, 2026

Codegraph Impact Analysis

7 functions changed5 callers affected across 2 files

  • match_ruby_node in crates/codegraph-core/src/extractors/ruby.rs:25 (0 transitive callers)
  • handle_assignment in crates/codegraph-core/src/extractors/ruby.rs:41 (1 transitive callers)
  • parse_ruby in crates/codegraph-core/src/extractors/ruby.rs:344 (3 transitive callers)
  • extracts_top_level_constant in crates/codegraph-core/src/extractors/ruby.rs:354 (0 transitive callers)
  • ignores_local_variable_assignment in crates/codegraph-core/src/extractors/ruby.rs:365 (0 transitive callers)
  • does_not_duplicate_class_body_constant_at_top_level in crates/codegraph-core/src/extractors/ruby.rs:371 (0 transitive callers)
  • trace_c_cpp in tests/benchmarks/resolution/tracer/native-tracer.sh:45 (1 transitive callers)

)

Mirror the three Rust unit tests for handleRubyAssignment in the TS
test suite: positive case for top-level constant, negative case for
local variable assignment, and boundary case for class-body constant.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed review feedback:

  • Added three WASM-side tests to tests/parsers/ruby.test.ts mirroring the Rust unit tests: extracts top-level constant assignment, does not extract local variable assignment as a constant, and does not extract class-body constant as a top-level definition. These cover the same three cases as the Rust tests and guard against future WASM regressions on the handleRubyAssignment code path.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@claude

@claude
Copy link
Copy Markdown

claude Bot commented May 31, 2026

Claude finished @carlos-alm's task —— View job


Follow-up Review: WASM-side Ruby tests

Tasks

  • Gather context
  • Read updated tests/parsers/ruby.test.ts
  • Verify parity with Rust unit tests
  • Write final assessment

Assessment

The three new WASM-side tests in tests/parsers/ruby.test.ts:117-138 directly mirror the Rust unit tests. Parity check:

Case Rust (ruby.rs) TypeScript (ruby.test.ts) Match
Top-level constant SKIP_CLASSES = %w[...].freezekind == "constant" SKIP_CLASSES = %w[...].freezekind: 'constant', line: 1
Local variable excluded edges = [] → no "edges" def edges = [] → zero kind === 'constant' defs
Class-body constant not top-level class Foo\n BAR = 1\nend → count of "BAR" == 0 class Foo\n BAR = 1\nendtopLevelConstant is undefined

The TS tests are slightly stricter on the positive case (also assert line: 1), which is good. The negative-case filter (d.kind === 'constant') in the TS test is more precise than the Rust test (d.name == "edges"), which is also fine — both correctly verify the same invariant.

The minor gap from the original review is fully addressed. No new concerns.

Updated rating: ⭐⭐⭐⭐⭐ (5/5) — APPROVE

View job

@carlos-alm carlos-alm merged commit 26d0488 into main Jun 1, 2026
27 checks passed
@carlos-alm carlos-alm deleted the fix/engine-node-divergence branch June 1, 2026 00:05
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 1, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

follow-up: fix pre-existing 1-node/1-edge WASM vs native full-build divergence

1 participant