
Add Solidity language support #563

Merged
buger merged 1 commit into main from add-solidity-language-support on May 20, 2026

Conversation

@buger
Collaborator

@buger buger commented May 20, 2026

Summary

  • add first-class Solidity parsing via tree-sitter-solidity and wire .sol through search, extract, symbols, query, parser pools, docs, and LSP/indexing helpers
  • add Solidity symbol extraction for contracts, interfaces, libraries, functions, constructors, modifiers, fallback/receive, structs, enums, events, errors, state variables, and user-defined value types
  • add realistic Solidity fixtures/tests and fix generic JS test-call detection so Solidity identifiers like latest() are not marked as tests

Verification

  • cargo fmt --all -- --check
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test --lib
  • cargo test --test integration_tests
  • cargo check
  • cargo test --test solidity_language_tests
  • cargo test query::tests::test_solidity_query_support

Dogfood

  • Sparse cloned OpenZeppelin/openzeppelin-contracts master at cd05883078060e0cd8a7bd36636944570dbe1722
  • Ran probe symbols across contracts/governance: 23 .sol files, 869 lines of outline output
  • Ran probe search quorum ... --language solidity --format json
  • Ran probe extract .../Governor.sol#castVote --format plain and got the expected castVote function block
  • Ran probe query with a Solidity function pattern against Governor.sol and matched public virtual functions

@buger buger merged commit 124a819 into main May 20, 2026
15 checks passed
@probelabs
Contributor

probelabs Bot commented May 20, 2026

PR Overview: Add Solidity Language Support

Summary

This PR adds first-class Solidity language support to Probe, enabling full parsing, symbol extraction, search, query, and LSP indexing capabilities for .sol files. The implementation follows Probe's established language integration pattern and includes comprehensive test coverage.

Files Changed Analysis

Core Changes (30 files, +165/-15 lines):

Dependencies & Build

  • Cargo.toml (+1): Added tree-sitter-solidity = "=1.2.10" dependency
  • lsp-daemon/Cargo.toml (+1): Added tree-sitter-solidity = "=1.2.10" dependency

Documentation Updates (9 files)

  • README.md: Added Solidity to multi-language list and supported languages table
  • docs/probe-cli/extraction-reference.md: Added Solidity extraction capabilities
  • docs/probe-cli/query.md: Added Solidity to language parameter table
  • docs/probe-cli/search.md: Added "solidity" to supported languages list
  • docs/reference/adding-languages.md: Updated npm language list example
  • docs/reference/architecture.md: Added Solidity to supported languages
  • docs/reference/faq.md: Updated language support answer
  • docs/reference/language-support.md: Added Solidity section with features
  • docs/reference/supported-languages.md: Added Solidity row with extraction/test detection support

CLI & Language Detection

  • src/cli.rs (+2): Added "solidity"/"sol" to language parameter completions (2 locations)
  • src/debug_tree_sitter.rs (+1): Added Solidity language name detection
  • src/extract/formatter.rs (+1): Added .sol extension mapping to "solidity"

LSP Daemon Integration (11 files)

  • lsp-daemon/src/analyzer/tree_sitter_analyzer.rs (+2): Added .sol extension and language mapping
  • lsp-daemon/src/daemon.rs (+6): Added Solidity parser initialization
  • lsp-daemon/src/fqn.rs (+17/-1): Added Solidity FQN extraction with method/namespace detection
  • lsp-daemon/src/indexing/ast_extractor.rs (+28/-1): Added Solidity symbol kind mapping (contracts, interfaces, libraries, functions, modifiers, events, errors, state variables, user-defined types)
  • lsp-daemon/src/indexing/config.rs (+8): Added Solidity to language configs with feature flags (extract_contracts, extract_events, extract_modifiers)
  • lsp-daemon/src/indexing/file_detector.rs (+4/-3): Added "sol" to supported extensions and languages array
  • lsp-daemon/src/language_detector.rs (+6): Added Solidity enum variant with "solidity"/"sol" string mapping
  • lsp-daemon/src/lsp_database_adapter.rs (+20/-1): Added Solidity tree-sitter integration, separator, and method/namespace detection
  • lsp-daemon/src/lsp_registry.rs (+1): Added Solidity language mapping
  • lsp-daemon/src/lsp_server.rs (+1): Added .sol extension to language mapping
  • lsp-daemon/src/relationship/tree_sitter_extractor.rs (+1): Added Solidity to parser pool

Symbol & Language Support

  • lsp-daemon/src/symbol/language_support.rs (+39): Added comprehensive LanguageRules::solidity() with scope separator ("."), signature keywords (contract, interface, library, function, constructor, modifier, event, error, visibility modifiers, state mutability), and type aliases (uint→uint256, int→int256)
  • lsp-daemon/src/symbol/uid_generator.rs (+2): Added Solidity language rules initialization
  • lsp-daemon/src/workspace/config.rs (+1): Added "solidity" to default indexed languages
  • lsp-daemon/src/workspace/project.rs (+1): Added .sol extension to Solidity language mapping
  • lsp-daemon/src/workspace_resolver.rs (+1): Added Solidity project markers (foundry.toml, hardhat.config.js/ts)

Architecture & Impact Assessment

What This PR Accomplishes

  1. Complete Language Integration: Adds Solidity as a first-class supported language across all Probe features
  2. Symbol Extraction: Supports contracts, interfaces, libraries, functions, constructors, modifiers, fallback/receive functions, structs, enums, events, errors, state variables, and user-defined value types
  3. Test Detection: Identifies Foundry-style .t.sol files, *Test.sol contracts, setUp, test*, and invariant_* functions
  4. LSP/Indexing: Full integration with language server, symbol database, and relationship extraction
  5. Query Support: Enables ast-grep pattern matching for Solidity code structures

Key Technical Changes

1. Tree-Sitter Integration

  • Uses tree-sitter-solidity grammar (pinned to 1.2.10) for parsing
  • Integrated into parser pools across CLI and LSP daemon
  • Supports all Solidity node types for symbol extraction

2. Symbol Kind Mapping

"function_definition" | "fallback_receive_definition" => SymbolKind::Function
"constructor_definition" | "modifier_definition" => SymbolKind::Method
"contract_declaration" | "library_declaration" => SymbolKind::Class
"interface_declaration" => SymbolKind::Interface
"struct_declaration" => SymbolKind::Struct
"enum_declaration" => SymbolKind::Enum
"event_definition" | "error_definition" => SymbolKind::Type
"state_variable_declaration" => SymbolKind::Variable
"user_defined_type_definition" => SymbolKind::Type

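The mapping above can be sketched as a standalone function. This is a minimal illustration, not the code in ast_extractor.rs: the `SymbolKind` enum here is a local stand-in and `solidity_symbol_kind` is a hypothetical name. Only the node kinds and their targets are taken from the list above.

```rust
/// Hypothetical stand-in for the daemon's symbol kind enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SymbolKind {
    Function,
    Method,
    Class,
    Interface,
    Struct,
    Enum,
    Type,
    Variable,
}

/// Maps a tree-sitter-solidity node kind to a symbol kind, mirroring the
/// mapping listed in the review. Returns `None` for any other node kind.
pub fn solidity_symbol_kind(node_kind: &str) -> Option<SymbolKind> {
    let kind = match node_kind {
        "function_definition" | "fallback_receive_definition" => SymbolKind::Function,
        "constructor_definition" | "modifier_definition" => SymbolKind::Method,
        "contract_declaration" | "library_declaration" => SymbolKind::Class,
        "interface_declaration" => SymbolKind::Interface,
        "struct_declaration" => SymbolKind::Struct,
        "enum_declaration" => SymbolKind::Enum,
        "event_definition" | "error_definition" | "user_defined_type_definition" => {
            SymbolKind::Type
        }
        "state_variable_declaration" => SymbolKind::Variable,
        _ => return None,
    };
    Some(kind)
}
```

Returning `Option` keeps non-symbol nodes (expressions, statements) out of the index without a catch-all kind.
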
3. Language Rules

  • Scope separator: . (e.g., Contract.function)
  • Signature normalization: Remove parameter names
  • Default visibility: internal
  • Type aliases: uint→uint256, int→int256
  • Keywords: contract, interface, library, function, constructor, modifier, event, error, public, external, internal, private, view, pure, payable, virtual, override
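A rough sketch of how these rules could combine into a normalized, dotted signature. The function names (`canonical_type`, `normalize_params`, `qualified_signature`) are hypothetical and not taken from language_support.rs. The parameter parsing is deliberately naive: it keeps only the first token of each parameter as the type.

```rust
/// Expands the Solidity shorthand types named in the rules above.
fn canonical_type(ty: &str) -> &str {
    match ty {
        "uint" => "uint256",
        "int" => "int256",
        other => other,
    }
}

/// Normalizes a parameter list such as "address to, uint amount" into
/// "address,uint256" by keeping only the first token (the type) of each
/// parameter. Simplification: parameter names and data-location keywords
/// are dropped, array suffixes are left untouched, and types containing
/// commas (for example function types) are not handled.
pub fn normalize_params(params: &str) -> String {
    params
        .split(',')
        .filter_map(|p| p.split_whitespace().next())
        .map(canonical_type)
        .collect::<Vec<_>>()
        .join(",")
}

/// Builds a dotted name like "Token.transfer(address,uint256)" using "."
/// as the scope separator.
pub fn qualified_signature(scope: &[&str], name: &str, params: &str) -> String {
    let mut parts: Vec<&str> = scope.to_vec();
    parts.push(name);
    format!("{}({})", parts.join("."), normalize_params(params))
}
```

For example, `qualified_signature(&["Token"], "transfer", "address to, uint amount")` yields `Token.transfer(address,uint256)`. Keeping the parameter types in the key means two functions that share a name but differ in parameter types produce different strings.
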

4. Test Detection

  • File patterns: *.t.sol, *Test.sol
  • Function patterns: setUp, test*, invariant_*
  • Follows Foundry testing conventions
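The patterns above reduce to a few string checks. The sketch below assumes plain suffix and prefix matching. The function names are hypothetical, and the real detection in Probe may inspect the AST rather than raw names.

```rust
/// True when the file name matches `*.t.sol` or `*Test.sol`.
pub fn is_solidity_test_file(file_name: &str) -> bool {
    file_name.ends_with(".t.sol") || file_name.ends_with("Test.sol")
}

/// True for `setUp`, `test*`, and `invariant_*` function names.
/// Matching is by prefix, so an identifier such as `latest` is not a test
/// even though it contains the substring "test".
pub fn is_solidity_test_function(name: &str) -> bool {
    name == "setUp" || name.starts_with("test") || name.starts_with("invariant_")
}
```

Prefix matching is what keeps identifiers like `latest()` from the PR summary out of the test set. A substring check would wrongly flag them.
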

5. Project Detection

  • Marker files: foundry.toml, hardhat.config.js, hardhat.config.ts
  • Extension: .sol
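Project and file detection from these markers might look roughly like the following. The helper names are hypothetical. The marker check takes a list of entry names rather than reading the filesystem, which keeps the sketch self-contained.

```rust
use std::path::Path;

/// Marker files named in the review as indicating a Solidity project.
const SOLIDITY_MARKERS: [&str; 3] = ["foundry.toml", "hardhat.config.js", "hardhat.config.ts"];

/// True when the path has a `.sol` extension.
pub fn is_solidity_file(path: &Path) -> bool {
    path.extension().and_then(|ext| ext.to_str()) == Some("sol")
}

/// True when any entry name in a project root matches a known marker file.
pub fn has_solidity_marker<'a, I>(root_entries: I) -> bool
where
    I: IntoIterator<Item = &'a str>,
{
    root_entries
        .into_iter()
        .any(|name| SOLIDITY_MARKERS.iter().any(|marker| *marker == name))
}
```

A caller would typically collect the file names of a candidate root directory and pass them to `has_solidity_marker`.
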

Affected System Components

graph TD
    A[CLI Commands] --> B[Language Factory]
    B --> C[Solidity Language Impl]
    C --> D[Tree-Sitter Parser]
    D --> E[Symbol Extraction]
    E --> F[Query/Search]
    
    G[LSP Daemon] --> H[Language Detector]
    H --> I[Parser Pool]
    I --> D
    D --> J[AST Extractor]
    J --> K[Symbol Database]
    K --> L[FQN Generator]
    L --> M[Language Rules]
    M --> N[UID Generator]
    
    O[Workspace Resolver] --> P[Project Detection]
    P --> Q[Marker Files]
    Q --> R[foundry.toml]
    Q --> S[hardhat.config.*]

Component Impact:

  • CLI: search, extract, query, symbols commands now support .sol files
  • LSP Daemon: Full indexing, symbol resolution, and relationship extraction
  • Language Detection: Auto-detects Solidity from .sol extension and project markers
  • Query System: Supports ast-grep patterns for Solidity code structures
  • Documentation: All language lists and examples updated

Scope Discovery & Context Expansion

Immediate Impact

  • Core Language Module: New solidity.rs implementation file (not shown in diff but required)
  • Test Fixtures: New Solidity test fixtures in tests/fixtures/solidity/ (not shown in diff but required)
  • Integration Tests: New test cases for Solidity parsing and extraction

Related Files to Verify

  1. src/language/solidity.rs (likely added): Implements LanguageImpl trait for Solidity
  2. tests/fixtures/solidity/ (likely added): Test .sol files for validation
  3. tests/solidity_language_tests.rs (likely added): Solidity-specific test cases, matching the cargo test --test solidity_language_tests target
  4. src/language/mod.rs (modified): Added pub mod solidity; (not shown in diff)
  5. src/language/factory.rs (modified): Added "sol" => Some(Box::new(SolidityLanguage::new())) (not shown in diff)

Cross-Module Integration

  • Search System: Language filters now include Solidity
  • Query System: get_language() and get_file_extension() functions updated (not shown in diff)
  • Extract System: Formatter handles .sol extension
  • Indexing System: Language config includes Solidity feature flags
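One way the Solidity feature flags could gate extraction is sketched below. The flag names (extract_contracts, extract_events, extract_modifiers) come from this overview. The `SolidityFeatures` struct, the `should_extract` function, and the choice to treat interfaces and libraries as covered by extract_contracts are assumptions made for illustration.

```rust
/// Hypothetical container for the three Solidity flags named in config.rs.
#[derive(Debug, Clone, Copy)]
pub struct SolidityFeatures {
    pub extract_contracts: bool,
    pub extract_events: bool,
    pub extract_modifiers: bool,
}

impl Default for SolidityFeatures {
    /// Assumed default: everything enabled.
    fn default() -> Self {
        SolidityFeatures {
            extract_contracts: true,
            extract_events: true,
            extract_modifiers: true,
        }
    }
}

/// Decides whether a node kind should be extracted under the given flags.
/// Node kinds without a dedicated flag are always extracted.
pub fn should_extract(node_kind: &str, features: &SolidityFeatures) -> bool {
    match node_kind {
        // Assumption: interfaces and libraries follow the contracts flag.
        "contract_declaration" | "interface_declaration" | "library_declaration" => {
            features.extract_contracts
        }
        "event_definition" => features.extract_events,
        "modifier_definition" => features.extract_modifiers,
        _ => true,
    }
}
```
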

Testing Strategy

Based on the PR description, the author verified:

  • Cargo formatting and linting checks
  • Unit tests (cargo test --lib)
  • Integration tests (cargo test --test integration_tests)
  • Solidity-specific tests (cargo test --test solidity_language_tests)
  • Query support tests (cargo test query::tests::test_solidity_query_support)
  • Real-world testing with OpenZeppelin contracts (23 .sol files, 869 lines of symbols)

Potential Edge Cases

  1. NatSpec Comments: Solidity uses /// and /** */ for documentation - comment handling should be verified
  2. Fallback/Receive Functions: Special functions without names - signature extraction needs testing
  3. Inheritance: Solidity supports multiple inheritance - FQN generation should handle contracts that list several bases after the is keyword
  4. Overloading: Solidity supports function overloading (same name, different parameter types) - UID generation should include parameter types so overloads stay distinct
  5. State Variables: Public state variables auto-generate getter functions - symbol extraction should distinguish them
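For the unnamed special functions in item 2, a name can be derived from the node kind and the leading keyword of the node text. The sketch below is an assumption about one workable approach, not the logic in get_symbol_name(). `special_function_name` is a hypothetical helper and only the two node kinds are taken from this review.

```rust
/// Returns a synthetic name for Solidity definitions that carry no
/// identifier of their own, or `None` when the node is an ordinary
/// definition (or the leading keyword is not recognized).
pub fn special_function_name(node_kind: &str, node_text: &str) -> Option<&'static str> {
    match node_kind {
        "constructor_definition" => Some("constructor"),
        "fallback_receive_definition" => {
            // Both forms share one node kind, so look at the leading keyword.
            let text = node_text.trim_start();
            if text.starts_with("receive") {
                Some("receive")
            } else if text.starts_with("fallback") {
                Some("fallback")
            } else {
                None
            }
        }
        _ => None,
    }
}
```

A test fixture containing both `receive() external payable {}` and `fallback() external payable {}` in one contract would exercise the two branches.
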

Verification Steps

The author performed comprehensive verification:

  1. Code Quality: cargo fmt, cargo clippy, cargo check all pass
  2. Unit Tests: cargo test --lib passes
  3. Integration Tests: cargo test --test integration_tests passes
  4. Language Tests: cargo test --test solidity_language_tests passes
  5. Query Tests: cargo test query::tests::test_solidity_query_support passes
  6. Real-World Testing: Tested against OpenZeppelin contracts with successful symbol extraction, search, and query operations

Recommendations for Reviewers

  1. Focus Areas:

    • Symbol kind mapping in ast_extractor.rs (lines 840-856)
    • Language rules in language_support.rs (lines 297-330)
    • Test detection logic (verify Foundry conventions)
    • FQN generation for contract inheritance
  2. Testing:

    • Verify test fixtures cover all Solidity constructs
    • Check that public state variable getters are handled correctly
    • Test fallback/receive function signature extraction
    • Validate NatSpec comment association
  3. Documentation:

    • Ensure all language lists are consistent
    • Verify examples use correct Solidity syntax
  4. Performance:

    • Parser pool integration should be efficient
    • Large contract files (like OpenZeppelin) should parse quickly
Metadata
  • Review Effort: 3 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-05-20T07:07:07.584Z | Triggered by: pr_opened | Commit: 671818a

💡 TIP: You can chat with Visor using /visor ask <your question>

@probelabs
Contributor

probelabs Bot commented May 20, 2026

✅ Security Check Passed

No security issues found – changes LGTM.


Architecture Issues (6)

Severity Location Issue
🟠 Error lsp-daemon/src/symbol/language_support.rs:520
LanguageRulesFactory::supported_languages() at lines 536-543 lists 'solidity', but LanguageRulesFactory::create_rules() at lines 503-516 does NOT include a case for 'solidity' or 'sol'. Solidity is therefore declared as supported while create_rules() returns None for it, causing a fallback to generic rules instead of proper Solidity-specific semantics.
💡 Suggestion: Add a 'solidity' | 'sol' => Some(LanguageRules::solidity()) case to the match statement in create_rules() after the cpp case, matching the pattern used for other languages.
🟢 Info lsp-daemon/src/indexing/config.rs:1502
Solidity feature flags (extract_contracts, extract_events, extract_modifiers) are added at lines 1502-1507 but are not integrated with the actual symbol extraction logic in ast_extractor.rs. The extraction happens unconditionally based on node kinds regardless of these flags, creating a disconnect between configuration and implementation.
💡 Suggestion: Either implement feature flag checking in ast_extractor.rs or remove these unused flags. Following the Rust pattern (extract_macros, extract_traits), these flags should control whether specific symbol types are extracted.
🟡 Warning lsp-daemon/src/indexing/ast_extractor.rs:837
The Solidity symbol mapping is added inline within the existing match statement (lines 837-850) rather than as a separate match arm. While this works functionally, it creates an inconsistent pattern where Solidity is embedded in the existing structure rather than being a first-class case like Rust, Python, Go, Java, and C++ which have clear separate match arms.
💡 Suggestion: Refactor to add Solidity as a separate match arm: crate::language_detector::Language::Solidity => match node_kind { ... }, following the established pattern for other languages.
🟡 Warning lsp-daemon/src/fqn.rs:361
Solidity language handling is scattered across multiple functions: get_language_separator() adds 'sol' (line 364), is_method_node() adds 4 Solidity node kinds (lines 382-389), is_namespace_node() adds 5 Solidity node kinds (lines 411-418). This scatter-gun approach requires updating multiple locations for any Solidity changes, violating DRY and increasing maintenance burden compared to centralized language-specific modules.
💡 Suggestion: Consider centralizing Solidity-specific logic in a dedicated module or trait implementation that these utility functions can delegate to, similar to how other languages have specialized analyzers.
🟡 Warning lsp-daemon/src/lsp_database_adapter.rs:2404
The same Solidity special-case pattern is duplicated in lsp_database_adapter.rs: get_language_separator() adds 'sol' (line 2407), is_method_node() adds 4 Solidity node kinds (lines 2423-2430), is_namespace_node() adds 5 Solidity node kinds (lines 2453-2460). This duplicates the logic already added to fqn.rs, creating two separate maintenance points for Solidity language behavior.
💡 Suggestion: Extract language-specific node kind mappings into a shared registry or trait that both fqn.rs and lsp_database_adapter.rs can use, eliminating duplication.
🟡 Warning lsp-daemon/src/indexing/ast_extractor.rs:950
The get_symbol_name() method adds Solidity-specific handling for constructor_definition and fallback_receive_definition (lines 950-962) directly in a generic utility method. This violates single responsibility - the method now handles both generic symbol name extraction and Solidity-specific node name resolution. The method should delegate to language-specific implementations.
💡 Suggestion: Move Solidity-specific node name handling to a language-specific helper or trait method, then call it conditionally only for Solidity language from the generic get_symbol_name() method.
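The first issue in this table (supported_languages() lists Solidity while create_rules() has no arm for it) is the kind of mismatch a small consistency check can catch. The sketch below uses simplified stand-ins: the `LanguageRules` fields, the single Rust arm, and its values are assumptions for illustration, and only the 'solidity' | 'sol' arm follows the suggestion above.

```rust
/// Simplified stand-in for the daemon's language rules.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LanguageRules {
    pub scope_separator: &'static str,
    pub default_visibility: &'static str,
}

impl LanguageRules {
    pub fn solidity() -> Self {
        LanguageRules { scope_separator: ".", default_visibility: "internal" }
    }

    /// Illustrative second language so the match has more than one arm.
    pub fn rust() -> Self {
        LanguageRules { scope_separator: "::", default_visibility: "private" }
    }
}

pub struct LanguageRulesFactory;

impl LanguageRulesFactory {
    /// Returns rules for a language name or file extension, if known.
    pub fn create_rules(language: &str) -> Option<LanguageRules> {
        match language.to_lowercase().as_str() {
            "rust" | "rs" => Some(LanguageRules::rust()),
            // The arm the review asks for.
            "solidity" | "sol" => Some(LanguageRules::solidity()),
            _ => None,
        }
    }

    /// Languages advertised as supported; each must resolve in `create_rules`.
    pub fn supported_languages() -> Vec<&'static str> {
        vec!["rust", "solidity"]
    }
}
```

A unit test that iterates over `supported_languages()` and asserts `create_rules(lang).is_some()` for each entry would fail as soon as the two lists drift apart.
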

Quality Issues (1)

Severity Location Issue
🟠 Error system:0
ProbeAgent execution failed: Error: Failed to get response from AI model. No output generated. Check the stream for errors.

Powered by Visor from Probelabs

Last updated: 2026-05-20T06:47:44.307Z | Triggered by: pr_opened | Commit: 671818a

💡 TIP: You can chat with Visor using /visor ask <your question>
