Add Crystal language support #571

Merged: buger merged 1 commit into main from add-crystal-language-support on May 31, 2026
Conversation

buger (Collaborator) commented May 30, 2026

Summary

  • Adds first-class Crystal language support for parser/block extraction, symbol extraction, query, search filters, source context, path/language detection, and LSP daemon language registration.
  • Adds realistic Crystal fixtures covering modules, aliases, annotations, enums, structs, classes, methods, macros, specs, and HTTP::Server namespaced references.
  • Documents the Crystal support plan and supported-language status, including real-project dogfood requirements and current verification results.
  • Fixes search parsing for Crystal-style namespaced identifiers like HTTP::Server so they are treated as content terms rather than empty field filters.
  • Adds Crystal-specific LSP tree-sitter analyzer and database-adapter symbol mappings so parser-pool extraction and find_symbol_at_position() resolve Crystal modules, classes, and methods without falling back to keyword/regex paths.
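
The search-parsing fix for namespaced identifiers can be pictured with a minimal sketch. This is not Probe's actual query parser: the `Term` enum and `classify_token` function are hypothetical names, and the rule used here (any token containing `::` is a content term, and a field filter needs a non-empty name and value) is an assumption about how such a fix could behave.

```rust
/// Hypothetical classification of a single search token.
#[derive(Debug, PartialEq)]
enum Term {
    /// A `field:value` filter such as `lang:crystal`.
    Field { name: String, value: String },
    /// A plain content term, including namespaced identifiers.
    Content(String),
}

/// Classify a token, treating `::` as part of an identifier rather than
/// as a field separator followed by an empty value.
fn classify_token(token: &str) -> Term {
    // Namespaced identifiers like `HTTP::Server` contain `::`; keep them whole.
    if token.contains("::") {
        return Term::Content(token.to_string());
    }
    match token.split_once(':') {
        // Only a non-empty name and a non-empty value form a field filter.
        Some((name, value)) if !name.is_empty() && !value.is_empty() => Term::Field {
            name: name.to_string(),
            value: value.to_string(),
        },
        // Everything else is searched as content.
        _ => Term::Content(token.to_string()),
    }
}
```

Under these assumptions, `classify_token("HTTP::Server")` yields a content term, while `classify_token("lang:crystal")` still yields a field filter.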

Validation

  • cargo fmt --all -- --check
  • cargo check -p probe-code
  • cargo check -p lsp-daemon
  • cargo test -p lsp-daemon crystal
  • cargo test -p lsp-daemon language_detector
  • cargo test -p lsp-daemon lsp_registry
  • cargo test --test crystal_language_tests
  • cargo test query::tests::test_crystal_query_support
  • cargo test search::filters::tests::test_normalize_language_names
  • cargo test search::elastic_query::tests::test_quoted_strings
  • cargo test --test search_hints_tests
  • Real-project edge flags: --language cr, --with-context, --strict-elastic-syntax, --allow-tests, --max-bytes, --max-tokens, --dry-run
  • Pre-commit hook: cargo clippy --all-targets --all-features -- -D warnings, cargo test --lib, cargo test --test integration_tests

Real Crystal Dogfood

Validated against an up-to-date local checkout of https://github.com/crystal-lang/crystal:

  • probe symbols src/compiler/crystal/compiler.cr extracted module Crystal, class Compiler, enums, nested CompilationUnit, and compile methods.
  • probe query 'def compile' src/compiler/crystal --language crystal --format json returned 4 Crystal method matches across command.cr, compiler.cr, and interpreter/compiler.cr.
  • probe query 'def compile' src/compiler/crystal --language cr --max-results 3 --with-context --format json accepted the cr alias and returned Crystal query context metadata.
  • probe query 'class Compiler' src/compiler/crystal/compiler.cr --format json auto-detected .cr and returned the full class Compiler block.
  • probe search 'Crystal::System::Dir AND lang:crystal' . --no-gitignore --max-results 5 parsed the namespaced identifier and returned only Crystal files.
  • probe search '"Crystal::System::Dir" AND lang:crystal' . --strict-elastic-syntax --max-results 3 --no-gitignore verified strict syntax works for quoted Crystal namespaced constants.
  • probe search 'describe AND lang:crystal' spec --max-results 5 --no-gitignore --format json returned zero results by default, while adding --allow-tests --max-bytes 700 --max-tokens 250 returned Crystal spec blocks within the requested limits.
  • probe extract src/compiler/crystal/compiler.cr#compile --format plain found both compile method definitions.
  • probe extract src/compiler/crystal/compiler.cr:228 --format plain extracted the enclosing method from a line target.
  • probe extract src/compiler/crystal/compiler.cr#compile --dry-run --format plain reported both matching method ranges without returning code.

Notes

  • The parser-selection review comments were checked against current HEAD: ParserPool::create_parser() and find_symbol_at_position() already include crystal and cr match arms. This update adds regression coverage and fixes the adjacent symbol-node/name-resolution gap those paths exposed.
  • Crystal LSP is registered with the default crystalline command, but the README marks it as install-required because no Crystal LSP server binary is available in this local environment for a live smoke test.
  • LSP tool version checks fail in this environment: crystalline --version, crystal-language-server --version, and crystal --version all return command not found.

probelabs (Bot, Contributor) commented May 30, 2026

PR Overview: Add Crystal Language Support

Summary

This PR adds first-class Crystal language support to Probe, covering parsing, symbol extraction, query/search, language filtering, source context, path detection, and LSP daemon integration. The implementation follows the established pattern used for other languages like Solidity and Ruby.

Files Changed Analysis

30 files modified with 893 additions and 39 deletions across core Probe CLI and LSP daemon:

Core Dependencies:

  • Cargo.toml - Added tree-sitter-crystal git dependency (pinned to commit f71f4ca62ac0 for ABI 14 compatibility)
  • lsp-daemon/Cargo.toml - Same Crystal dependency for LSP daemon

Documentation:

  • README.md - Added Crystal to supported languages list and feature table
  • docs/reference/supported-languages.md - Added Crystal support section with module/class/method/type extraction details
  • docs/reference/adding-languages.md - Updated language addition guide to reflect Crystal integration pattern
  • docs/reference/crystal-language-support-plan.md - NEW 426-line comprehensive implementation plan with validation checklist

CLI & Core Language Support:

  • src/cli.rs - Added crystal/cr to language argument enums for query and search commands
  • src/debug_tree_sitter.rs - Added Crystal symbol detection test cases and language name mapping by extension
  • src/extract/formatter.rs - Added .cr extension mapping to language names

LSP Daemon Integration (18 files listed):

  • lsp-daemon/src/language_detector.rs - Added Language::Crystal enum variant with extension mapping
  • lsp-daemon/src/lsp_registry.rs - Registered Crystal with crystalline LSP server configuration
  • lsp-daemon/src/lsp_server.rs - Added .cr → crystal language ID mapping for LSP protocol
  • lsp-daemon/src/workspace_resolver.rs - Added shard.yml/shard.lock workspace markers for Crystal projects
  • lsp-daemon/src/analyzer/tree_sitter_analyzer.rs - Added Crystal parser pool support, node-to-symbol mapping, and keyword filtering
  • lsp-daemon/src/fqn.rs - Added Crystal FQN extraction with :: separator (like Rust/C++)
  • lsp-daemon/src/indexing/ast_extractor.rs - Added Crystal symbol kind mapping (Function, Class, Module, Struct, Enum, Interface, Type)
  • lsp-daemon/src/indexing/config.rs - Added Crystal to language configs with macro/module extraction features
  • lsp-daemon/src/indexing/file_detector.rs - Added .cr to supported extensions
  • lsp-daemon/src/indexing/lsp_enrichment_worker.rs - Added Crystal language mapping
  • lsp-daemon/src/indexing/pipelines.rs - Added Crystal indexing pipeline configuration
  • lsp-daemon/src/lsp_database_adapter.rs - Added Crystal tree-sitter integration for symbol extraction
  • lsp-daemon/src/relationship/tree_sitter_extractor.rs - Added Crystal parser support for relationship extraction
  • lsp-daemon/src/symbol/language_support.rs - Added LanguageRules::crystal() with :: scope separator, signature normalization, and Crystal-specific keywords
  • lsp-daemon/src/symbol/uid_generator.rs - Added Crystal language rules for unique identifier generation
  • lsp-daemon/src/workspace/config.rs - Added Crystal to default indexed languages
  • lsp-daemon/src/workspace/project.rs - Added .cr → crystal mapping
  • lsp-daemon/README.md - Documented Crystal LSP support (marked as install-required since no binary available for testing)
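
As a rough illustration of marker-based workspace detection, the sketch below walks up from a starting directory until it finds `shard.yml` or `shard.lock`. It is not the code in `workspace_resolver.rs`: `find_crystal_root` and the injected `exists` predicate are hypothetical, chosen so the example runs without touching the real filesystem.

```rust
use std::path::{Path, PathBuf};

/// Marker files that identify a Crystal project root (named in the PR).
const CRYSTAL_MARKERS: [&str; 2] = ["shard.yml", "shard.lock"];

/// Walk up from `start` and return the nearest directory for which
/// `exists` reports a Crystal marker file. `exists` is a hypothetical
/// injection point; real code would typically pass `|p| p.exists()`.
fn find_crystal_root(start: &Path, exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| {
            CRYSTAL_MARKERS
                .iter()
                .any(|marker| exists(dir.join(marker).as_path()))
        })
        .map(Path::to_path_buf)
}
```

Passing the predicate as a parameter keeps the lookup logic testable; a caller working on disk could use `find_crystal_root(dir, |p| p.exists())`.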

NPM Package:

  • npm/src/agent/acp/tools.js - Updated tool description to include Crystal in supported languages list

CI Configuration:

  • .github/workflows/lsp-tests.yml - Updated PHP version from 8.1 to 8.2

Architecture & Impact Assessment

What This PR Accomplishes:

  • Full Crystal language support across all Probe subsystems (CLI, search, extract, query, LSP daemon)
  • Tree-sitter-based parsing for .cr files using the official tree-sitter-crystal grammar
  • Symbol extraction for modules, classes, structs, enums, methods, macros, aliases, and type definitions
  • LSP daemon integration with crystalline server registration and workspace detection
  • Search filtering with lang:crystal and --language crystal support
  • Query support through ast-grep integration
  • Fixed search parsing for Crystal-style namespaced identifiers like HTTP::Server so they are treated as content terms rather than empty field filters

Key Technical Changes:

  1. Dependency Management: Pinned tree-sitter-crystal to commit f71f4ca62ac0 to ensure ABI 14 compatibility with Probe's tree-sitter runtime (avoiding ABI 15 mismatch from newer commits)

  2. Language Detection: Added Crystal to 15+ language maps across CLI and LSP daemon, handling both crystal and cr aliases consistently

  3. Symbol Extraction: Implemented Crystal-specific AST node handling for:

    • Containers: class_def, module_def, struct_def, enum_def, lib_def, union_def
    • Functions: method_def, abstract_method_def, macro_def, fun_def
    • Types: alias, annotation_def, type_def
  4. LSP Integration: Registered crystalline as default LSP server with workspace markers (shard.yml, shard.lock) and appropriate capabilities

  5. Namespace Handling: Crystal uses :: separator (like Rust/C++) for fully qualified names and module/class hierarchies

  6. Keyword Filtering: Added is_keyword_or_invalid() method to prevent extracting keywords like module, class, def as symbol names
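
Items 5 and 6 can be sketched together. The PR names `is_keyword_or_invalid()` but does not show its body, so the version below is an assumption: the keyword list beyond `module`, `class`, and `def` is illustrative, the identifier rule (letters, digits, underscores, one optional trailing `?` or `!`) is a simplification, and `crystal_fqn` is a hypothetical helper for joining scopes with `::`.

```rust
/// Crystal keywords that must never be reported as symbol names.
/// `module`, `class`, and `def` come from the PR text; the rest of the
/// list is an illustrative assumption, not the daemon's actual table.
const CRYSTAL_KEYWORDS: &[&str] = &[
    "module", "class", "def", "struct", "enum", "macro", "alias", "end",
];

/// Return true when `name` should be rejected as a symbol name.
fn is_keyword_or_invalid(name: &str) -> bool {
    if name.is_empty() || CRYSTAL_KEYWORDS.contains(&name) {
        return true;
    }
    // Allow one trailing `?` or `!`, as in Crystal method names.
    let body = name
        .strip_suffix(|c: char| c == '?' || c == '!')
        .unwrap_or(name);
    let mut chars = body.chars();
    // First character must be a letter or underscore.
    let starts_ok = matches!(chars.next(), Some(c) if c.is_alphabetic() || c == '_');
    // Remaining characters must be letters, digits, or underscores.
    !(starts_ok && chars.all(|c| c.is_alphanumeric() || c == '_'))
}

/// Join enclosing scopes and a symbol name with Crystal's `::` separator.
fn crystal_fqn(scopes: &[&str], name: &str) -> String {
    let mut parts = scopes.to_vec();
    parts.push(name);
    parts.join("::")
}
```

With these definitions, `def` is rejected while `empty?` is accepted, and `crystal_fqn(&["Crystal", "Compiler"], "compile")` produces `Crystal::Compiler::compile`.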

Affected System Components:

graph TD
    A[Crystal .cr Files] --> B[Parser Pool]
    B --> C[tree-sitter-crystal Grammar]
    C --> D[Symbol Extraction]
    D --> E[CLI Commands]
    E --> F[symbols/extract/query/search]
    
    B --> G[LSP Daemon]
    G --> H[Language Detector]
    H --> I[LSP Registry]
    I --> J[crystalline Server]
    J --> K[Workspace Resolver]
    K --> L[shard.yml/shard.lock]
    
    C --> M[AST Indexing]
    M --> N[Symbol Database]
    N --> O[FQN Generator]
    O --> P[:: Separator]
    
    style A fill:#e1f5ff
    style C fill:#fff4e1
    style J fill:#ffe1f5

Scope Discovery & Context Expansion

Inferred Scope:

  • Core CLI: All search/extract/query commands now support Crystal via --language crystal or --language cr
  • LSP Daemon: Full indexing pipeline with macro and module extraction features enabled
  • NPM Package: Agent tools can filter searches by Crystal language
  • Testing: Added unit tests for Crystal symbol detection in debug_tree_sitter.rs and LSP daemon tests

Validation Coverage (from PR description):

  • ✅ Pre-commit hooks passed (formatting, tests)
  • ✅ Real Crystal repository dogfood on crystal-lang/crystal:
    • symbols extracted modules, classes, enums, methods
    • query returned method matches across files
    • search parsed namespaced identifiers correctly
    • extract worked by symbol name and line target

Notable Patterns:

  • Consistent language mapping pattern across 20+ files (extension → language → parser)
  • Crystal uses :: namespace separator (like Rust/C++)
  • Test detection handles *_spec.cr files via existing test-file detection
  • LSP marked as "install required" since no Crystal LSP binary was available in test environment
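
The extension → language mapping and spec-file detection noted above might look roughly like this. All three helpers are hypothetical and only illustrate the pattern; Probe spreads the real mappings across many files, and the non-Crystal extensions shown are just examples.

```rust
use std::path::Path;

/// Map a file path to a language name by extension (hypothetical helper).
fn language_for_path(path: &Path) -> Option<&'static str> {
    match path.extension()?.to_str()? {
        "cr" => Some("crystal"),
        "rb" => Some("ruby"),
        "rs" => Some("rust"),
        _ => None,
    }
}

/// Normalize a user-supplied language value; `cr` is an alias for `crystal`.
fn normalize_language(name: &str) -> String {
    match name.to_ascii_lowercase().as_str() {
        "cr" => "crystal".to_string(),
        other => other.to_string(),
    }
}

/// Crystal spec files conventionally end in `_spec.cr`.
fn is_crystal_spec(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.ends_with("_spec.cr"))
}
```

A filter like `is_crystal_spec` would be consistent with the dogfood result where spec blocks only appeared once `--allow-tests` was passed.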

Recent Updates (from comment history):

  • Fixed Crystal symbol-defining node kinds in LSP symbol resolution
  • Added keyword filtering to prevent extracting module, class, def as symbol names
  • Added regression tests for Crystal parser-pool creation and tree-sitter resolution
  • Verified parser pool and find_symbol_at_position() already include Crystal support

References

Core Dependencies:

  • Cargo.toml:47 - tree-sitter-crystal git dependency
  • lsp-daemon/Cargo.toml:85 - tree-sitter-crystal for LSP daemon

CLI Integration:

  • src/cli.rs:172,355 - Crystal in language enums
  • src/debug_tree_sitter.rs:140-154 - Crystal language name mapping
  • src/debug_tree_sitter.rs:236-252 - Crystal symbol extraction
  • src/extract/formatter.rs:1056 - .cr extension mapping

LSP Daemon Integration:

  • lsp-daemon/src/language_detector.rs:24,48,78,158 - Language enum and detection
  • lsp-daemon/src/lsp_registry.rs:298-313 - Crystalline server registration
  • lsp-daemon/src/workspace_resolver.rs:253 - Workspace markers
  • lsp-daemon/src/symbol/language_support.rs:334-361 - Crystal language rules
  • lsp-daemon/src/indexing/pipelines.rs:84-90 - Crystal indexing pipeline
  • lsp-daemon/src/analyzer/tree_sitter_analyzer.rs:104,580-595,725-770 - Parser pool and node mapping

Documentation:

  • docs/reference/crystal-language-support-plan.md:1-426 - Complete implementation plan
  • docs/reference/supported-languages.md:22,141-148 - Crystal in language table
  • README.md:146,472 - Crystal in feature list
Metadata
  • Review Effort: 3 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-05-31T10:57:08.464Z | Triggered by: pr_updated | Commit: 502c085

💡 TIP: You can chat with Visor using /visor ask <your question>

probelabs (Bot, Contributor) commented May 30, 2026

Security Issues (7)

| Severity | Location | Issue | Suggestion |
| --- | --- | --- | --- |
| 🟢 Info | lsp-daemon/src/workspace_resolver.rs:253 | Crystal workspace markers (shard.yml, shard.lock) are added without path validation or content verification. Malicious shard.yml files could be used to redirect workspace resolution or inject malicious configuration. | Add validation that shard.yml/shard.lock files are within expected project boundaries and contain valid Crystal project structure before using them for workspace detection. |
| 🟢 Info | lsp-daemon/src/analyzer/tree_sitter_analyzer.rs:722-752 | Identifier validation allows ? and ! characters in symbol names without proper sanitization. While valid in Crystal method names, these characters could cause issues in downstream systems that expect alphanumeric identifiers. | Add escaping or normalization for special characters in symbol names when used in external systems (databases, APIs, file operations). |
| 🟢 Info | src/debug_tree_sitter.rs:236-252 | Crystal symbol detection tests only cover basic cases (module, class, method). Missing security test cases for edge cases like malicious UTF-8 sequences, extremely long identifiers, or keyword collision attempts. | Add comprehensive security test cases covering: invalid UTF-8 handling, identifier length limits, keyword filtering, special character sanitization, and namespace injection attempts. |
| 🟢 Info | .github/workflows/lsp-tests.yml:69 | PHP version updated from 8.1 to 8.2 without security justification. While PHP 8.2 is more recent, the change should be documented with security rationale, especially if this is a test environment dependency. | Document the security reasoning for PHP version updates in commit messages or PR description. Ensure compatibility with existing PHP test fixtures. |
| 🟡 Warning | Cargo.toml:47 | Git dependency uses commit hash pinning without integrity verification. The tree-sitter-crystal dependency is pinned to commit f71f4ca62ac0 but lacks checksum verification or provenance attestation, making the supply chain vulnerable to commit replacement attacks. | Add cargo-lock integrity verification or consider publishing to crates.io with signed releases. Document the security audit process for the pinned commit. |
| 🟡 Warning | lsp-daemon/Cargo.toml:85 | Duplicate git dependency vulnerability in LSP daemon. Same pinned commit hash as main Cargo.toml, doubling the supply chain risk surface. | Consolidate dependency verification across both Cargo.toml files. Use workspace dependencies to ensure both use the exact same verified source. |
| 🟡 Warning | lsp-daemon/src/lsp_registry.rs:298-313 | LSP server registration uses hardcoded crystalline command without path validation. If crystalline is not installed or a malicious binary is placed in PATH before the legitimate one, this could lead to command injection or unauthorized code execution. | Add absolute path validation for crystalline binary or implement allowlist of safe installation paths. Document the security requirement for verified crystalline installation. |

✅ Architecture Check Passed

No architecture issues found – changes LGTM.

✅ Performance Check Passed

No performance issues found – changes LGTM.

✅ Quality Check Passed

No quality issues found – changes LGTM.


Last updated: 2026-05-31T10:49:17.821Z | Triggered by: pr_updated | Commit: 502c085

buger force-pushed the add-crystal-language-support branch 2 times, most recently from 3c9b766 to d0371ef on May 31, 2026 07:00
buger (Collaborator, Author) commented May 31, 2026

Addressed the Crystal LSP review follow-up:

  • Rechecked the two parser-selection comments against current HEAD. ParserPool::create_parser() already has crystal/cr, and find_symbol_at_position() already selects tree_sitter_crystal for crystal/cr.
  • Fixed the adjacent real issue: Crystal symbol-defining node kinds were missing from LSP symbol resolution, and the analyzer could extract keyword tokens like module, class, and def instead of Crystal names.
  • Added regression tests for Crystal parser-pool creation, analyzer symbol extraction, and find_symbol_at_position() tree-sitter resolution.
  • Added the LSP tool-version evidence: crystalline --version, crystal-language-server --version, and crystal --version all fail locally with command not found, so live Crystal LSP server smoke testing remains install-required.

Validated with:

  • cargo fmt --all -- --check
  • cargo check -p lsp-daemon
  • cargo test -p lsp-daemon crystal
  • cargo test -p lsp-daemon language_detector
  • cargo test -p lsp-daemon lsp_registry
  • cargo test --test crystal_language_tests
  • cargo test --test search_hints_tests

buger force-pushed the add-crystal-language-support branch from d0371ef to 502c085 on May 31, 2026 10:28
buger merged commit 53ed586 into main on May 31, 2026
18 of 19 checks passed
buger deleted the add-crystal-language-support branch on May 31, 2026 11:30