feat(native): port Groovy extractor to Rust #1101
Adds tree-sitter-groovy dependency and native extractor matching the WASM-side behavior for Groovy symbol, import, and call extraction. Part of #1071
Claude finished @carlos-alm's task
PR Review: Groovy Native Extractor Implementation
Reviewing this PR that ports Groovy extraction from WASM to a native Rust implementation.
Root Cause Analysis
Problem Being Solved: This PR implements native Rust support for Groovy files (.groovy, .gvy), eliminating a coverage gap where the native engine would drop Groovy files and fall back to WASM. This addresses issue #1071 and maintains engine parity.
Why This Approach: The dual-engine architecture requires both native (Rust) and WASM engines to produce identical results. Adding native Groovy support prevents engine parity warnings and allows the faster native engine to handle Groovy files end-to-end.
Risk Assessment: Low risk. This is a straightforward language addition following established patterns. The implementation mirrors existing extractors and includes comprehensive tests.
Configuration Changes (HIGH SCRUTINY)
✅ No concerning configuration changes detected
Test Integrity
✅ Test changes are appropriate and strengthen coverage
Code Quality Assessment
✅ Excellent Implementation Quality
Architecture Consistency: The Groovy extractor perfectly follows the established patterns:
Language Support Completeness:
Engine Parity: The implementation meticulously mirrors the WASM
Error Handling: Robust with early returns and safe node access patterns.
✅ Registry Integration
All required integration points are properly updated:
✅ Test Coverage
Comprehensive unit tests cover:
Backlog Compliance
Minor Observations
Final Assessment
Greptile Summary
This PR ports the Groovy symbol extractor from JavaScript/WASM to Rust, wiring Groovy into the full native pipeline.
Confidence Score: 5/5
Safe to merge: all previously identified divergences from the JS source-of-truth were addressed in follow-up commits, and the full test suite (191 Rust + 18 TS) passes. The extractor faithfully mirrors the JS source-of-truth across all node kinds, the wiring changes are mechanical and consistent across the Rust and TypeScript sides, and prior review findings have been resolved. No new logic gaps were found. No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[".groovy / .gvy file"] --> B[file_collector.rs\nSUPPORTED_EXTENSIONS]
B --> C[parser_registry.rs\nLanguageKind::Groovy]
C --> D[tree_sitter_groovy::LANGUAGE\nparse → Tree]
D --> E[extractors/mod.rs\nextract_symbols_with_opts]
E --> F[groovy::GroovyExtractor]
F --> G1[walk_tree → match_groovy_node]
F --> G2[walk_ast_nodes_with_config\nGROOVY_AST_CONFIG]
G1 --> H1[class / interface / enum\ndefinitions]
G1 --> H2[method / constructor /\nfunction definitions]
G1 --> H3[imports\njava_import=true]
G1 --> H4[calls &\nobject_creation]
G2 --> H5[AST nodes\nstring / throw / new]
H1 & H2 & H3 & H4 & H5 --> I[FileSymbols]
J["NATIVE_SUPPORTED_EXTENSIONS\n(parser.ts)"] -.->|JS engine awareness| A
K["GROOVY_AST_TYPES + GROOVY_STRING_CONFIG\n(rules/index.ts)"] -.->|TS-side config parity| G2
Reviews (4): Last reviewed commit: "fix: resolve merge conflicts with main (..."
    if let Some(body) = body {
        for i in 0..body.child_count() {
            let Some(child) = body.child(i) else { continue };
            if child.kind() == "enum_constant" {
The Rust port diverges from the JS source-of-truth here. In `handleGroovyEnumDecl` (groovy.ts line 163) the JS extractor accepts both `enum_constant` and `identifier` children as enum members. The Rust port only matches `enum_constant`, so Groovy enums whose body produces bare `identifier` nodes (which can occur with some grammar variants or simple Groovy `enum Color { RED, GREEN }` forms) will have no extracted children in the Rust engine, while the JS engine would capture them correctly.
Suggested change:
    - if child.kind() == "enum_constant" {
    + if child.kind() == "enum_constant" || child.kind() == "identifier" {
Fixed in 74cd047. Added identifier to the enum body match — now matches the JS source-of-truth in handleGroovyEnumDecl (groovy.ts:163).
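For readers without the repository at hand, the fixed matching rule can be sketched in isolation. This is a minimal illustration, not the extractor's actual code: `MockNode` and `enum_member_names` are hypothetical names standing in for tree-sitter nodes and the real enum handler.

```rust
/// Hypothetical stand-in for a tree-sitter node: only the node kind and its
/// source text are modelled here.
#[derive(Debug, Clone)]
pub struct MockNode {
    pub kind: &'static str,
    pub text: &'static str,
}

/// Collects enum member names from the children of an enum body.
/// Both `enum_constant` and bare `identifier` children count as members,
/// which is the behaviour the review thread attributes to the JS handler.
pub fn enum_member_names(body_children: &[MockNode]) -> Vec<String> {
    body_children
        .iter()
        // Punctuation such as `{`, `,` and `}` is skipped by the kind check.
        .filter(|child| matches!(child.kind, "enum_constant" | "identifier"))
        .map(|child| child.text.to_string())
        .collect()
}
```

With the original `enum_constant`-only check, a body made up of bare `identifier` children would yield an empty list from this helper.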
    "method_invocation" | "method_call" | "call_expression" | "function_call" => {
        handle_call_expr(node, source, symbols)
    }
The docstring says the extractor mirrors `extractGroovySymbols` and mentions `juxt_function_call` as a node kind present in the grammar, but it does not appear in `match_groovy_node`. The JS extractor also omits it, so this isn't a regression, but the discrepancy between the doc and the implementation means Groovy command-style calls like `foo bar(x)` are silently dropped from the call graph. At minimum the doc claim should be removed or an arm added.
Suggested change:
    - "method_invocation" | "method_call" | "call_expression" | "function_call" => {
    -     handle_call_expr(node, source, symbols)
    - }
    + "method_invocation" | "method_call" | "call_expression" | "function_call"
    + | "juxt_function_call" => {
    +     handle_call_expr(node, source, symbols)
    + }
Clarified the doc in 74cd047. The doc previously listed juxt_function_call alongside features the extractor handles, which was misleading. The JS source-of-truth also omits it, so I haven't added the arm here to preserve parity. Tracked separately in #1108 — fix needs to land in both engines to keep them in lockstep.
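The current behaviour, with `juxt_function_call` left unhandled for parity, can be sketched as a standalone predicate. The node kinds are taken from the match arm quoted above; the function names `is_handled_call_kind` and `count_handled_calls` are hypothetical and exist only for this illustration.

```rust
/// Returns true for the node kinds that the quoted match arm routes to the
/// call handler. `juxt_function_call` is deliberately absent, mirroring the
/// decision in this thread to keep both engines in lockstep.
pub fn is_handled_call_kind(kind: &str) -> bool {
    matches!(
        kind,
        "method_invocation" | "method_call" | "call_expression" | "function_call"
    )
}

/// Hypothetical helper: counts how many node kinds in a sequence would reach
/// the call handler.
pub fn count_handled_calls(kinds: &[&str]) -> usize {
    kinds.iter().filter(|&&k| is_handled_call_kind(k)).count()
}
```

Adding the arm later (as tracked in #1108) would amount to one extra alternative in the `matches!` pattern, provided the JS side changes in the same way.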
    pub const GROOVY_AST_CONFIG: LangAstConfig = LangAstConfig {
        new_types: &["object_creation_expression"],
        throw_types: &["throw_statement"],
        await_types: &[],
        string_types: &["string_literal"],
        regex_types: &[],
        quote_chars: &['\'', '"'],
        string_prefixes: &[],
    };
Groovy's double-quoted strings are GStrings, which the tree-sitter-groovy grammar represents with a `gstring` node (not `string_literal`). `string_literal` covers only single-quoted strings. The same gap exists in `GROOVY_AST_TYPES` on the TypeScript side. Both configs are consistent with each other, but any AST-node analysis that looks for string literals in Groovy code will silently miss all GString interpolations.
Suggested change:
    - pub const GROOVY_AST_CONFIG: LangAstConfig = LangAstConfig {
    -     new_types: &["object_creation_expression"],
    -     throw_types: &["throw_statement"],
    -     await_types: &[],
    -     string_types: &["string_literal"],
    -     regex_types: &[],
    -     quote_chars: &['\'', '"'],
    -     string_prefixes: &[],
    - };
    + pub const GROOVY_AST_CONFIG: LangAstConfig = LangAstConfig {
    +     new_types: &["object_creation_expression"],
    +     throw_types: &["throw_statement"],
    +     await_types: &[],
    +     string_types: &["string_literal", "gstring"],
    +     regex_types: &[],
    +     quote_chars: &['\'', '"'],
    +     string_prefixes: &[],
    + };
Added gstring defensively in 74cd047. Verified against node_modules/tree-sitter-groovy/grammar.js and src/node-types.json: the 0.1.x grammar uses string_literal for both single- and double-quoted strings (it extends tree-sitter-java) and does not emit a separate gstring node, so this is a no-op for the current grammar — but it costs nothing and guards against future grammar variants. Matched the TS config.
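To make the "defensive no-op" point concrete, here is a reduced sketch of how a config like this might be consulted. `AstConfigSketch`, `GROOVY_SKETCH`, and `classify` are hypothetical names; only the node kinds and the `new` / `throw` / `string` labels come from the configs quoted in this thread, and the real `LangAstConfig` has more fields.

```rust
/// Hypothetical, reduced version of the quoted config: only the three type
/// lists needed for this illustration are kept.
pub struct AstConfigSketch {
    pub new_types: &'static [&'static str],
    pub throw_types: &'static [&'static str],
    pub string_types: &'static [&'static str],
}

/// Mirrors the post-fix values from the thread, including the defensive
/// `gstring` entry.
pub const GROOVY_SKETCH: AstConfigSketch = AstConfigSketch {
    new_types: &["object_creation_expression"],
    throw_types: &["throw_statement"],
    string_types: &["string_literal", "gstring"],
};

/// Maps a node kind to the AST category label, or `None` when the kind is
/// not tracked by the config.
pub fn classify(config: &AstConfigSketch, kind: &str) -> Option<&'static str> {
    if config.new_types.iter().any(|t| *t == kind) {
        Some("new")
    } else if config.throw_types.iter().any(|t| *t == kind) {
        Some("throw")
    } else if config.string_types.iter().any(|t| *t == kind) {
        Some("string")
    } else {
        None
    }
}
```

If the grammar never emits `gstring`, the extra entry is simply never matched, which is why the reply describes it as costing nothing.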
    const GROOVY_AST_TYPES: Record<string, string> = {
      object_creation_expression: 'new',
      throw_statement: 'throw',
      string_literal: 'string',
    };
Mirrors the gap in the Rust config: `string_literal` covers single-quoted strings only; `gstring` should also be listed for double-quoted / interpolated Groovy strings. `GROOVY_STRING_CONFIG` already includes `'` and `"` as quote chars, so only the type key is missing.
Suggested change:
    - const GROOVY_AST_TYPES: Record<string, string> = {
    -   object_creation_expression: 'new',
    -   throw_statement: 'throw',
    -   string_literal: 'string',
    - };
    + const GROOVY_AST_TYPES: Record<string, string> = {
    +   object_creation_expression: 'new',
    +   throw_statement: 'throw',
    +   string_literal: 'string',
    +   gstring: 'string',
    + };
Added gstring to GROOVY_AST_TYPES in 74cd047, matching the Rust config change. As noted in the Rust-side reply, the current tree-sitter-groovy 0.1.x grammar emits string_literal for both quote styles, so this is defensive against future grammar variants rather than a current bug fix.
Codegraph Impact Analysis
30 functions changed → 13 callers affected across 2 files
…tries
Address Greptile review on PR #1101:
- Rust enum handler now accepts both `enum_constant` and `identifier` children, matching the JS source-of-truth in `handleGroovyEnumDecl` (groovy.ts:163). Without this, Groovy enums whose grammar emits bare identifier nodes had no extracted members in the native engine.
- Add `gstring` defensively to GROOVY_AST_CONFIG (Rust) and GROOVY_AST_TYPES (TS). tree-sitter-groovy 0.1.x emits `string_literal` for both quote styles, but this keeps both engines resilient to grammar variants.
- Clarify module doc: `juxt_function_call` was previously listed alongside features the extractor handles. It is intentionally unhandled (matches JS). Tracked in #1108 for adding support to both engines.
Summary
- Adds `tree-sitter-groovy` (v0.1) dependency and a native Groovy extractor under `crates/codegraph-core/src/extractors/groovy.rs`, mirroring the WASM-side `extractGroovySymbols` behaviour.
- Wires Groovy into `LanguageKind`, the parser registry, file collector extensions, AST type/string maps, and `NATIVE_SUPPORTED_EXTENSIONS` so the native engine handles `.groovy`/`.gvy` files end-to-end.
Part of #1071
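The extension wiring described in this summary can be illustrated with a small sketch. It assumes only what the summary states, namely that `.groovy` and `.gvy` map to Groovy; `LanguageKindSketch` and `language_for_path` are hypothetical names and do not reflect the actual registry API in `parser_registry.rs`.

```rust
use std::path::Path;

/// Hypothetical, single-variant stand-in for the real `LanguageKind` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LanguageKindSketch {
    Groovy,
}

/// Resolves a file path to a language by extension. Only the two Groovy
/// extensions named in this PR are recognised in this sketch; anything else
/// returns `None`, which in the dual-engine setup would mean the native
/// engine does not claim the file.
pub fn language_for_path(path: &str) -> Option<LanguageKindSketch> {
    let ext = Path::new(path).extension()?.to_str()?;
    match ext {
        "groovy" | "gvy" => Some(LanguageKindSketch::Groovy),
        _ => None,
    }
}
```

Keeping the recognised extensions in one place is what lets the file collector, the parser registry, and the JS-side `NATIVE_SUPPORTED_EXTENSIONS` list agree on which files the native engine owns.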
Test plan
- `cargo build --release -p codegraph-core`
- `cargo test -p codegraph-core --lib` (191 passed)
- `npx vitest run tests/parsers/groovy.test.ts tests/parsers/native-drop-classification.test.ts` (18 passed)
- `npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts -t groovy` (5 passed)