feat(native): port Solidity extractor to Rust#1100

Merged
carlos-alm merged 5 commits into main from feat/1071-solidity-rust-extractor
May 13, 2026

Conversation

@carlos-alm
Contributor

Summary

  • Port the Solidity extractor to the native Rust engine in crates/codegraph-core/src/extractors/solidity.rs, mirroring WASM-side behavior for symbols, imports, inheritance, calls, modifiers, events, errors, and structs/enums.
  • Wire tree-sitter-solidity 1.2 into parser_registry, file_collector (.sol), the extractor dispatcher, and the AST-rules registry (new/throw/string kinds).
  • Mark .sol native-supported in NATIVE_SUPPORTED_EXTENSIONS and update the native-drop classification test (Solidity no longer drops).

Part of #1071

Test plan

  • cargo build --release -p codegraph-core (clean compile)
  • cargo test -p codegraph-core --lib — 193/193 pass, including 9 new Solidity extractor tests
  • npx vitest run tests/parsers/solidity.test.ts — 5/5 pass (WASM parity sanity)
  • npx vitest run tests/parsers/native-drop-classification.test.ts — 13/13 pass
  • npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts -t solidity — 5/5 pass

Adds tree-sitter-solidity dependency and native extractor matching
the WASM-side behavior for Solidity symbol, import, and call extraction.

Part of #1071
@claude

claude Bot commented May 11, 2026

Claude finished @carlos-alm's task


PR Review: Solidity Native Extractor Port

Reviewing PR #1100 following structured analysis phases:

Todo List:

  • Phase 1: Root Cause Analysis - understand problem and approach
  • Phase 2: Configuration Change Detection - check for rule relaxations
  • Phase 3: Test Integrity - verify no weakened assertions
  • Phase 4: Code Quality - review implementation details
  • Phase 5: Backlog Compliance - evaluate against BACKLOG.md criteria
  • Final Assessment - provide rating and recommendation

Phase 1: Root Cause Analysis ✓

Problem Being Solved: Adding native Rust support for Solidity parsing to achieve dual-engine parity. Currently Solidity only works in WASM mode - this PR ports the extractor to the native engine so both engines can parse .sol files identically.

Why This Approach: Follows the established dual-engine architecture pattern used by all other supported languages. The implementation mirrors WASM-side behavior and explicitly notes where it matches JS extractor quirks (e.g., struct/enum members extraction).

Risk Assessment: Low risk - purely additive feature following established patterns.

Phase 2: Configuration Change Detection ✅ CLEAN

No concerning configuration relaxations found:

  • Only additive changes: new dependency tree-sitter-solidity = "1.2"
  • Registry updates: properly adds Solidity to all required enums and configs
  • Test update: legitimate - removes .sol from "unsupported" test since it's now natively supported
  • No rules disabled, thresholds lowered, or timeouts increased

Phase 3: Test Integrity ✅ CLEAN

No weakened assertions or test integrity issues:

  • Solidity WASM tests remain unchanged (no regression risk)
  • Resolution benchmark has Solidity entry with appropriate 0.0 thresholds for new language
  • Native classification test correctly updated: removed src/g.sol from unsupported list, decremented count from 11→10
  • Comprehensive test plan executed per PR description

Phase 4: Code Quality ✅ HIGH QUALITY

Excellent implementation following all established patterns:

Extractor Completeness (crates/codegraph-core/src/extractors/solidity.rs):

  • Full symbol extraction: contracts, interfaces, libraries, functions, modifiers, events, errors, structs, enums
  • Proper inheritance tracking for contract X is Y patterns
  • Import resolution for multiple Solidity import syntaxes
  • Call expression handling with member access support
  • AST node collection for new/throw/string patterns

Architectural Consistency:

  • Mirrors WASM behavior exactly - explicitly preserves quirks (lines 193-194, 232-234) where JS extractor produces no struct/enum members due to grammar traversal patterns
  • Proper parent name resolution using enclosing contract context
  • Appropriate symbol kinds: contracts→class, interfaces→interface, libraries→module, structs→struct, enums→enum
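
The kind mapping in the last bullet can be sketched as a small lookup. This is an illustration only, not the PR's code: the function name `symbol_kind_for` is hypothetical, and the node-kind strings on the left are assumed from the tree-sitter-solidity grammar rather than quoted from the diff.

```rust
/// Hypothetical helper (not from the PR): maps a grammar node kind to the
/// symbol kind the review says the extractor emits.
/// The node-kind names are assumptions about tree-sitter-solidity.
fn symbol_kind_for(node_kind: &str) -> Option<&'static str> {
    match node_kind {
        "contract_declaration" => Some("class"),
        "interface_declaration" => Some("interface"),
        "library_declaration" => Some("module"),
        "struct_declaration" => Some("struct"),
        "enum_declaration" => Some("enum"),
        // Anything else is not a container-like definition in this sketch.
        _ => None,
    }
}
```

Under these assumptions, `symbol_kind_for("library_declaration")` yields `Some("module")`, which is the one mapping in the list that does not reuse the Solidity keyword.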

Registry Integration (all required updates present):

  • parser_registry.rs: Added LanguageKind::Solidity with .sol extension mapping
  • file_collector.rs: Added "sol" to SUPPORTED_EXTENSIONS
  • AST_TYPE_MAPS: Added Solidity config with new_expression → new, revert_statement → throw, string types
  • helpers.rs: Added SOLIDITY_AST_CONFIG constant matching JS side
  • parser.ts: Added .sol to NATIVE_SUPPORTED_EXTENSIONS
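
As a rough sketch of what an AST-kind config of this shape might look like, the block below classifies a node kind as `new`, `throw`, or `string`. The struct `AstKindConfig`, the constant `SOLIDITY_SKETCH`, and the function `classify` are hypothetical names for illustration. The real `SOLIDITY_AST_CONFIG` in helpers.rs may be laid out differently.

```rust
/// Hypothetical, simplified stand-in for a per-language AST-kind config.
struct AstKindConfig {
    new_kinds: &'static [&'static str],
    throw_kinds: &'static [&'static str],
    string_kinds: &'static [&'static str],
}

/// Only the two mappings named in the review are filled in.
const SOLIDITY_SKETCH: AstKindConfig = AstKindConfig {
    new_kinds: &["new_expression"],
    throw_kinds: &["revert_statement"],
    // The string literal node kinds are not named in this conversation,
    // so they are deliberately left empty here rather than guessed.
    string_kinds: &[],
};

/// Returns the generic category for a grammar node kind, if any.
fn classify(cfg: &AstKindConfig, node_kind: &str) -> Option<&'static str> {
    let matches = |kinds: &[&str]| kinds.iter().any(|k| *k == node_kind);
    if matches(cfg.new_kinds) {
        Some("new")
    } else if matches(cfg.throw_kinds) {
        Some("throw")
    } else if matches(cfg.string_kinds) {
        Some("string")
    } else {
        None
    }
}
```

Keeping the mapping as data rather than as branches in the walker is one way to let the Rust and JS sides be compared entry by entry, which is what the parity claim in this list relies on.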

Test Coverage: 9 comprehensive unit tests covering all symbol types and edge cases.

Phase 5: Backlog Compliance

Zero-dep: ✓ — Single new tree-sitter grammar dependency (standard pattern for new language support)

Foundation-aligned: ✓ — Fully compliant with all principles:

  • P1 (always-current): Extends existing parsing capability
  • P4 (zero-cost core): Pure AST extraction, no external dependencies
  • P8 (not an agent): Analysis tool only, no code generation

Problem-fit: 4/5 — Directly improves agent accuracy by eliminating a significant language gap. Solidity is critical for web3/DeFi codebases, and dual-engine parity prevents hallucinated dependencies from engine inconsistencies.

Breaking: No — Purely additive, no API changes, maintains backward compatibility

Tier: Tier 1 (zero-dep ✓ + foundation-aligned ✓)

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Native engine was missing Solidity support, forcing fallback to WASM and creating dual-engine inconsistency
**Why This Approach**: Follows established dual-engine parity architecture used by all 24 other supported languages  
**Risk Assessment**: Minimal - purely additive with comprehensive test coverage

## Backlog Compliance
- Zero-dep: ✓ — standard tree-sitter grammar addition
- Foundation-aligned: ✓ — pure analysis capability extension
- Problem-fit: 4/5 — directly addresses agent accuracy in web3 codebases
- Breaking: No — purely additive
- Tier: 1

## Critical Concerns
None. Implementation is exemplary.

## Final Recommendation
- Rating: ⭐⭐⭐⭐⭐ (5/5)
- Action: APPROVE
- Reasoning: Flawless implementation following all architectural patterns, comprehensive test coverage, zero configuration concerns, and directly addresses core use case of preventing AI hallucination in web3 codebases.

@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR ports the Solidity extractor from the WASM-side TypeScript implementation to the native Rust engine, wires tree-sitter-solidity 1.2 into all relevant registries, and simultaneously fixes the multi-parent inheritance bug (previously reported) in both the Rust and TS extractors.

  • Native Solidity extractor (crates/codegraph-core/src/extractors/solidity.rs): full port covering contracts/interfaces/libraries, structs, enums, functions, modifiers, events, custom errors, state variables, imports, and call expressions — faithfully mirrors the TS extractor in structure and output.
  • Registry wiring: parser_registry.rs, file_collector.rs, extractors/mod.rs, src/domain/parser.ts, and src/ast-analysis/rules/index.ts are all consistently updated; NATIVE_SUPPORTED_EXTENSIONS now includes .sol and the drop-classification test is updated to match.
  • Multi-parent inheritance fix: both the Rust and TS extractors now walk every inheritance_specifier sibling instead of stopping at the first, with new unit tests on both sides locking the behaviour in.

Confidence Score: 5/5

Safe to merge — the Rust extractor faithfully mirrors the TS extractor, all registry wiring is complete and exhaustive, and both sides of the multi-parent inheritance fix are covered by new tests.

The port is a faithful translation of the TS extractor into Rust with no divergence in logic. The multi-parent inheritance fix is correctly applied to both the Rust and TS implementations. All match arms, registry entries, file-extension lists, and AST-type maps are consistent across the codebase. Nine new Rust unit tests plus the updated TS suite give good coverage.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/codegraph-core/src/extractors/solidity.rs | New Rust Solidity extractor; faithfully mirrors TS extractor with 9 unit tests. Multi-parent inheritance handled correctly by walking all inheritance_specifier siblings. |
| crates/codegraph-core/src/parser_registry.rs | Adds Solidity variant to LanguageKind enum and all match arms; count assertion updated to 28; extension mapping for .sol added. |
| crates/codegraph-core/src/extractors/mod.rs | Adds solidity module and dispatch arm for LanguageKind::Solidity; exhaustive match satisfied. |
| src/extractors/solidity.ts | extractInheritance fixed to walk all inheritance_specifier siblings; new multi-parent test added in solidity.test.ts. |
| src/domain/parser.ts | Adds .sol to NATIVE_SUPPORTED_EXTENSIONS; consistent with Rust-side registration. |
| src/ast-analysis/rules/index.ts | Adds SOLIDITY_AST_TYPES and SOLIDITY_STRING_CONFIG to the per-language maps; consistent with Rust SOLIDITY_AST_CONFIG. |
| crates/codegraph-core/src/extractors/helpers.rs | Adds SOLIDITY_AST_CONFIG constant; correctly maps new_expression, revert_statement, and three string literal node types. |
| crates/codegraph-core/src/file_collector.rs | Adds sol to SUPPORTED_EXTENSIONS; straightforward and consistent with registry change. |
| tests/parsers/native-drop-classification.test.ts | Removes .sol from the unsupported-by-native list and decrements the expected count from 8 to 7; correct. |
| tests/parsers/solidity.test.ts | Adds multi-parent inheritance test on the WASM side to match new Rust test; good parity coverage. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[".sol file input"] --> B[file_collector\nchecks SUPPORTED_EXTENSIONS]
    B --> C[parser_registry\nLanguageKind::Solidity\ntree-sitter-solidity 1.2]
    C --> D[parse → Tree]
    D --> E[extract_symbols_with_opts\nextractors/mod.rs dispatch]
    E --> F[SolidityExtractor.extract]
    F --> G[walk_tree\nmatch_solidity_node]
    F --> H[walk_ast_nodes_with_config\nSOLIDITY_AST_CONFIG]
    G --> I1[FileSymbols.definitions]
    G --> I2[FileSymbols.classes]
    G --> I3[FileSymbols.imports]
    G --> I4[FileSymbols.calls]
    H --> I5[FileSymbols.ast_nodes]
    I1 & I2 & I3 & I4 & I5 --> J[FileSymbols output]

Reviews (5): Last reviewed commit: "fix: resolve merge conflicts with main"

Comment on lines +253 to +270

    symbols.definitions.push(Definition {
        name: full_name,
        kind: "enum".to_string(),
        line: start_line(node),
        end_line: Some(end_line(node)),
        decorators: None,
        complexity: None,
        cfg: None,
        children: opt_children(members),
    });
}

// ── Functions / modifiers / events / errors / state vars ─────────────────────

fn handle_function_def(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
    let Some(name_node) = node.child_by_field_name("name") else {
        return;

P2 Multiple inheritance silently drops all but the first parent

find_child returns only the first matching sibling, so for contract A is B, C {} only the B → A relationship is pushed to symbols.classes; the relationship C → A is silently dropped. The TS extractor has the same gap (both call findChild(node, 'inheritance_specifier')), so native and WASM are in parity — but neither handles the multiple-parent case. Neither the Rust unit tests nor the TS WASM suite includes a multi-parent contract, so this edge case has never been exercised. Consider iterating all inheritance_specifier siblings instead of stopping at the first, and adding a test like contract A is B, C {}.

Contributor Author

Fixed in 5513d49 — confirmed via tree-sitter-solidity's grammar.js (_class_heritage: "is" commaSep1($.inheritance_specifier)) that each parent in contract A is B, C, D {} is its own inheritance_specifier sibling under the contract node, so find_child/findChild returning only the first one was indeed dropping the rest. Both the native and WASM extractors now walk every direct child of the contract node and emit a ClassRelation for every parent, and there are new multi-parent unit tests on both sides (extracts_multi_parent_inheritance in Rust, extracts multi-parent inheritance in TS) to lock the behaviour in.
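
The difference between the first-match lookup and the full walk can be illustrated with a toy tree. This is a minimal sketch, not the PR's code: `ToyNode`, `all_parents`, and `sample_contract` are hypothetical, the `find_child` here is a simplified stand-in for the helper of the same name, and the real extractor operates on tree-sitter nodes and emits ClassRelation values rather than tuples.

```rust
/// Toy stand-in for a syntax-tree node (not the tree-sitter API).
struct ToyNode {
    kind: &'static str,
    text: &'static str,
    children: Vec<ToyNode>,
}

/// First-match lookup: returns only the first direct child of the given kind.
fn find_child<'a>(node: &'a ToyNode, kind: &str) -> Option<&'a ToyNode> {
    node.children.iter().find(|c| c.kind == kind)
}

/// Full walk: one (contract, parent) pair per `inheritance_specifier` child.
fn all_parents<'a>(contract: &'a str, node: &'a ToyNode) -> Vec<(&'a str, &'a str)> {
    node.children
        .iter()
        .filter(|c| c.kind == "inheritance_specifier")
        .map(|c| (contract, c.text))
        .collect()
}

/// Builds a toy tree shaped like `contract A is B, C {}`.
fn sample_contract() -> ToyNode {
    let leaf = |kind, text| ToyNode { kind, text, children: Vec::new() };
    ToyNode {
        kind: "contract_declaration",
        text: "A",
        children: vec![
            leaf("identifier", "A"),
            leaf("inheritance_specifier", "B"),
            leaf("inheritance_specifier", "C"),
            leaf("contract_body", "{}"),
        ],
    }
}
```

On the sample tree, `find_child(&node, "inheritance_specifier")` sees only `B`, while `all_parents("A", &node)` yields both `("A", "B")` and `("A", "C")`.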

@github-actions
Contributor

github-actions Bot commented May 11, 2026

Codegraph Impact Analysis

37 functions changed → 25 callers affected across 3 files

  • extract_symbols_with_opts in crates/codegraph-core/src/extractors/mod.rs:62 (1 transitive callers)
  • SolidityExtractor.extract in crates/codegraph-core/src/extractors/solidity.rs:9 (0 transitive callers)
  • match_solidity_node in crates/codegraph-core/src/extractors/solidity.rs:34 (0 transitive callers)
  • handle_contract_decl in crates/codegraph-core/src/extractors/solidity.rs:54 (1 transitive callers)
  • extract_contract_members in crates/codegraph-core/src/extractors/solidity.rs:88 (2 transitive callers)
  • extract_contract_member in crates/codegraph-core/src/extractors/solidity.rs:101 (3 transitive callers)
  • extract_inheritance in crates/codegraph-core/src/extractors/solidity.rs:169 (2 transitive callers)
  • handle_struct_decl in crates/codegraph-core/src/extractors/solidity.rs:195 (1 transitive callers)
  • handle_enum_decl in crates/codegraph-core/src/extractors/solidity.rs:237 (1 transitive callers)
  • handle_function_def in crates/codegraph-core/src/extractors/solidity.rs:278 (1 transitive callers)
  • handle_modifier_def in crates/codegraph-core/src/extractors/solidity.rs:302 (1 transitive callers)
  • handle_event_def in crates/codegraph-core/src/extractors/solidity.rs:324 (1 transitive callers)
  • handle_error_decl in crates/codegraph-core/src/extractors/solidity.rs:346 (1 transitive callers)
  • handle_state_var_decl in crates/codegraph-core/src/extractors/solidity.rs:368 (1 transitive callers)
  • handle_import_directive in crates/codegraph-core/src/extractors/solidity.rs:392 (1 transitive callers)
  • handle_call_expression in crates/codegraph-core/src/extractors/solidity.rs:438 (1 transitive callers)
  • extract_sol_params in crates/codegraph-core/src/extractors/solidity.rs:474 (2 transitive callers)
  • find_parent_name in crates/codegraph-core/src/extractors/solidity.rs:499 (8 transitive callers)
  • strip_quotes in crates/codegraph-core/src/extractors/solidity.rs:504 (2 transitive callers)
  • parse_sol in crates/codegraph-core/src/extractors/solidity.rs:516 (10 transitive callers)

`extract_inheritance` / `extractInheritance` previously called
`find_child(node, 'inheritance_specifier')` and stopped at the first
match, but the tree-sitter-solidity grammar models each parent in
`contract A is B, C, D { }` as a separate `inheritance_specifier`
sibling under the contract node (`_class_heritage: "is"
commaSep1($.inheritance_specifier)`). All parents past the first were
silently dropped on both the native and WASM paths.

Walk every direct child of the contract node and emit a ClassRelation
for each parent on both engines, and add multi-parent unit tests in
Rust and TypeScript to lock the behaviour in.
@carlos-alm
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit edd60a8 into main May 13, 2026
32 of 33 checks passed
@carlos-alm carlos-alm deleted the feat/1071-solidity-rust-extractor branch May 13, 2026 10:40
@github-actions github-actions Bot locked and limited conversation to collaborators May 13, 2026