feat(native): port F# extractor to Rust #1104
Adds tree-sitter-fsharp dependency and a native F# extractor in crates/codegraph-core/src/extractors/fsharp.rs. Registers .fs/.fsx/.fsi with LanguageKind::FSharp and the Rust file_collector, adds FSharp to NATIVE_SUPPORTED_EXTENSIONS on the JS side, and wires FSHARP_AST_TYPES / FSHARP_STRING_CONFIG so the native and JS engines extract identical ast_nodes for F# source. Mirrors extractFSharpSymbols: named/anonymous modules as module, function declarations (with parameter children) as function, type definitions as type / class / record / enum / interface (mapped from the F# node kind), type-member function bindings as method, value bindings as variable, and import declarations + dot-expression / application call extraction.
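The extension registration described above can be pictured with a small sketch. This is illustrative only: the `LanguageKind` enum here is a one-variant stand-in, and `language_for_path` is a hypothetical helper name, not the function the crate actually exposes.

```rust
use std::path::Path;

/// Hypothetical stand-in for the crate's `LanguageKind`; only F# is shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LanguageKind {
    FSharp,
}

/// Maps a file path to a language by its extension (assumed helper name).
/// `.fs`, `.fsx`, and `.fsi` all resolve to F#, as the description states.
pub fn language_for_path(path: &Path) -> Option<LanguageKind> {
    // `extension()` returns None for files without one, e.g. `Makefile`.
    match path.extension()?.to_str()? {
        "fs" | "fsx" | "fsi" => Some(LanguageKind::FSharp),
        _ => None,
    }
}
```

The match is case-sensitive in this sketch; whether the real collector normalises case is not stated in the PR.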
Claude finished @carlos-alm's task. Reviewing this F# extractor PR...
Phase 1: Root Cause Analysis ✅
Problem Being Solved: This PR ports the F# extractor from JavaScript/WASM to native Rust to achieve performance parity with other supported languages. F# was previously supported only via tree-sitter WASM, which is slower than the native Rust extractors.
Why This Approach: The dual-engine architecture requires both engines to produce identical results. F# support existed in WASM-only form, and this PR adds the missing native Rust implementation to complete the dual-engine support.
Risk Assessment: Low risk. This is additive functionality following established patterns: no configuration changes, no relaxed tests, purely an extension of the existing architecture.

Phase 2: Configuration Change Detection ✅
✅ No high-risk changes detected.

Phase 3: Test Integrity ✅
✅ Strong test integrity.

Phase 4: Code Quality Review ✅
✅ Excellent implementation quality across the Rust extractor, infrastructure integration, and dual-engine parity.

Phase 5: Backlog Compliance ✅
Problem Being Solved: Completing F# native support to achieve dual-engine performance parity.

Critical Concerns
None. This is a high-quality implementation following established patterns.
Greptile Summary
This PR ports the F# symbol extractor from the JS WASM engine to native Rust, using

Confidence Score: 5/5
Safe to merge: the Rust extractor is a faithful port of the JS engine with no functional divergence. All handler functions were carefully compared against the JS extractor and produce equivalent output. The previously flagged handle_application divergence is correctly resolved. Plumbing changes are consistent and complete. No files require special attention.

Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[".fs / .fsx / .fsi file"] --> B{Engine routing\nNATIVE_SUPPORTED_EXTENSIONS}
B -->|native| C[Rust file_collector]
C --> D[parser_registry\nLanguageKind::FSharp]
D --> E[tree-sitter-fsharp\nparse tree]
E --> F[FSharpExtractor::extract]
F --> G[walk_tree\nmatch_fsharp_node]
F --> H[walk_ast_nodes_with_config\nFSHARP_AST_CONFIG]
G --> I{node.kind}
I -->|named_module| J[Definition: module]
I -->|function_declaration_left| K[Definition: function]
I -->|type_definition| L[Definition: type/class/record/enum/interface]
I -->|import_decl| M[Import]
I -->|application_expression| N[Call]
I -->|dot_expression| O[Call with receiver]
H --> P[AstNode: string literals]
J & K & L & M & N & O & P --> Q[FileSymbols]
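The node-kind dispatch in the flowchart can be sketched as a single `match`. The kind strings are taken from the flowchart above; the `Extracted` enum and `classify_node` function are hypothetical simplifications, since the real extractor pushes into `FileSymbols` while walking tree-sitter nodes.

```rust
/// Hypothetical, simplified outcome of visiting one node.
#[derive(Debug, PartialEq, Eq)]
pub enum Extracted {
    /// A definition with its symbol kind, e.g. "module" or "function".
    Definition(&'static str),
    Import,
    /// A call; `has_receiver` is true for dot expressions.
    Call { has_receiver: bool },
    /// Any node kind the extractor does not handle directly.
    Skipped,
}

/// Classifies a node by its kind string (assumed helper, not the crate's API).
pub fn classify_node(kind: &str) -> Extracted {
    match kind {
        "named_module" => Extracted::Definition("module"),
        "function_declaration_left" => Extracted::Definition("function"),
        // The PR refines this into type/class/record/enum/interface based on
        // the F# node kind; this sketch returns the generic "type" only.
        "type_definition" => Extracted::Definition("type"),
        "import_decl" => Extracted::Import,
        "application_expression" => Extracted::Call { has_receiver: false },
        "dot_expression" => Extracted::Call { has_receiver: true },
        _ => Extracted::Skipped,
    }
}
```

Keeping the dispatch in one place makes it easier to compare arm by arm against the JS extractor, which is what dual-engine parity depends on.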
Reviews (2): Last reviewed commit: "fix: resolve merge conflicts with main (..."
"long_identifier_or_op" => {
    // Inner child is either `long_identifier` (qualified, e.g.
    // `Repository.save`) or `identifier` (bare, e.g. `validateUser`).
    // Fall back to the wrapper text if neither exists (e.g.
    // operator forms like `( + )`).
    let inner = find_child(&func_node, "long_identifier")
        .or_else(|| find_child(&func_node, "identifier"));
    let name = match inner {
        Some(n) => node_text(&n, source).to_string(),
        None => node_text(&func_node, source).to_string(),
    };
    symbols.calls.push(Call {
        name,
        line: start_line(node),
        dynamic: None,
        receiver: None,
    });
}
Divergence from JS extractor in handle_application
Two behavioural differences exist versus the JS handleApplication that the PR claims to mirror:
- Search order flipped: The JS extractor tries identifier first, then long_identifier inside a long_identifier_or_op wrapper (findChild(funcNode, 'identifier') || findChild(funcNode, 'long_identifier')). The Rust version tries long_identifier first. For a node containing both kinds, the preferred result will differ.
- Extra fallback emits operator calls: When neither child is found (e.g., an operator expression like ( + )), JS emits nothing. Rust falls back to the raw text of func_node and still pushes a Call. This means every operator application in an F# file produces a spurious call entry in the native engine that the WASM engine never produces, diverging the two outputs.
Fixed in the merge resolution commit. The handle_application branch for long_identifier_or_op now matches the JS extractor exactly:
- Search order is now identifier first, then long_identifier (matches findChild(funcNode, 'identifier') || findChild(funcNode, 'long_identifier') in the JS extractor).
- When neither child is present (operator forms like ( + )), the Rust extractor emits nothing, mirroring the JS extractor's silent skip. The previous fallback that pushed a Call with the raw func_node text has been removed.
See crates/codegraph-core/src/extractors/fsharp.rs:242-260.
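A minimal sketch of the corrected behaviour described in this reply, written against a mock node type so it runs without tree-sitter. `MockNode`, the simplified `Call` (the `dynamic` field is omitted), and `call_from_long_identifier_or_op` are hypothetical names for illustration; the actual code at the path above may be structured differently.

```rust
/// Hypothetical minimal node: a kind, its source text, and its children.
/// The real extractor works on tree-sitter nodes plus a source buffer.
pub struct MockNode {
    pub kind: &'static str,
    pub text: &'static str,
    pub children: Vec<MockNode>,
}

/// Simplified call record (the real `Call` also carries a `dynamic` field).
#[derive(Debug, PartialEq, Eq)]
pub struct Call {
    pub name: String,
    pub line: usize,
    pub receiver: Option<String>,
}

/// Returns the first direct child with the given kind, if any.
fn find_child<'a>(node: &'a MockNode, kind: &str) -> Option<&'a MockNode> {
    node.children.iter().find(|c| c.kind == kind)
}

/// Mirrors the JS order: `identifier` first, then `long_identifier`.
/// Returns None (no call emitted) when neither child exists, which is the
/// case for operator forms such as `( + )`.
pub fn call_from_long_identifier_or_op(func_node: &MockNode, line: usize) -> Option<Call> {
    let inner = find_child(func_node, "identifier")
        .or_else(|| find_child(func_node, "long_identifier"))?;
    Some(Call {
        name: inner.text.to_string(),
        line,
        receiver: None,
    })
}
```

Returning `Option<Call>` rather than pushing unconditionally makes the silent skip explicit: the caller only records a call when a name was actually found.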
Codegraph Impact Analysis
21 functions changed → 5 callers affected across 2 files
…-extractor # Conflicts: # crates/codegraph-core/Cargo.toml # crates/codegraph-core/src/change_detection.rs # crates/codegraph-core/src/extractors/helpers.rs # crates/codegraph-core/src/extractors/mod.rs # crates/codegraph-core/src/file_collector.rs # crates/codegraph-core/src/parser_registry.rs # src/ast-analysis/rules/index.ts # src/domain/parser.ts
Summary
- Adds tree-sitter-fsharp dependency and a native F# extractor in crates/codegraph-core/src/extractors/fsharp.rs.
- Registers .fs/.fsx/.fsi with LanguageKind::FSharp and the Rust file_collector, adds F# to NATIVE_SUPPORTED_EXTENSIONS on the JS side, and wires FSHARP_AST_TYPES / FSHARP_STRING_CONFIG on both the native and JS sides so the two engines extract identical ast_nodes for F# source.
- Mirrors extractFSharpSymbols: named/anonymous modules as module, function declarations (with parameter children) as function, type definitions as type / class / record / enum / interface (mapped from the F# node kind), type-member function bindings as method, value bindings as variable, plus import declarations and dot expression / application call extraction.

Closes #1071
Test plan
- cargo build --release -p codegraph-core (clean build)
- cargo test -p codegraph-core --lib (184/184)
- npx tree-sitter build --wasm regenerates tree-sitter-fsharp.wasm
- npx vitest run tests/parsers/fsharp.test.ts (5/5)
- npx vitest run tests/parsers/native-drop-classification.test.ts (13/13)