perf: compute CFG in Rust native engine for all languages#342
perf: compute CFG in Rust native engine for all languages#342carlos-alm wants to merge 6 commits intomainfrom
Conversation
|
Claude finished @carlos-alm's task —— View job Reviewing PR: perf: compute CFG in Rust native engine for all languagesConducted comprehensive PR review following the 4-phase analysis process:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Performance optimization in CFG (Control Flow Graph) building phase. Current JS implementation does WASM re-parsing (~169ms) even when native AST data is already available from the Rust engine. This creates unnecessary double-parsing overhead. Approach Justification: This is a legitimate performance optimization that eliminates redundant WASM tree-walking by computing CFG data during the initial native extraction phase. The approach maintains backward compatibility with WASM fallback when native data is unavailable. This addresses a real bottleneck, not masking symptoms. Phase 2: Configuration Change Detection ✅No High-Risk Configuration Changes Detected:
Phase 3: Test Integrity ✅Test Coverage Assessment:
Phase 4: Code Quality ✅Rust Implementation Analysis:
Critical Issue Identified:
Minor Issues:
Root Cause AnalysisProblem Being Solved: Legitimate performance bottleneck - double WASM parsing when native data available Critical ConcernsLanguage ID Inconsistency: C# language ID mismatch will cause silent CFG extraction failures in production Final Recommendation
Required Changes:
|
Greptile SummaryThis PR ports the However, 6 critical language-specific CFG correctness bugs remain unfixed:
These bugs produce silently incorrect CFGs that downstream consumers (complexity analysis, dead-code detection) will act on without error signals. The performance goal is also partially undermined by the unfixed Confidence Score: 2/5
Last reviewed commit: e020420 |
| end_line: Some(end_line(&member)), | ||
| decorators: None, | ||
| complexity: None, | ||
| cfg: build_function_cfg(&member, "go"), |
There was a problem hiding this comment.
Interface methods (method_elem in interface_type) don't have bodies - they're just signatures. build_function_cfg will return a trivial entry->exit CFG. Consider using cfg: None for interface methods to match how complexity: None is set.
There was a problem hiding this comment.
Fixed in d5aab61 — changed to cfg: None for interface methods (method_elem), matching the existing complexity: None pattern.
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
|
Addressed both review comments in d5aab61:
|
crates/codegraph-core/src/cfg.rs
Outdated
| let label_node = stmt.child_by_field_name("label"); | ||
| let body = stmt.child_by_field_name("body"); | ||
| if let (Some(label_node), Some(body)) = (label_node, body) { | ||
| let label_name = label_node.utf8_text(&[]).unwrap_or("").to_string(); |
There was a problem hiding this comment.
utf8_text(&[]) will panic for any labeled loop
Node::utf8_text(source) slices source[start_byte..end_byte]. Passing &[] (empty slice) is safe only if start_byte == end_byte == 0, which is never true for a real identifier node inside a function body. In practice this panics with index out of bounds: the len is 0 but the index is N.
The consequence is two-fold:
- Panic: Any file processed by the native engine that contains a labeled statement (valid in JS/TS, Go, Java, C#) will panic at parse time.
- Silent wrong CFG (if somehow unwrap_or catches it): Every label name is stored as
""inlabel_map, so a nested labeled break likebreak outersilently resolves to the innermost label instead of the correct one.
The root cause is that CfgBuilder has no access to the source bytes. build_function_cfg needs to accept a source: &[u8] parameter and thread it through to the builder.
The same bug applies on lines 521 and 542 for break and continue label extraction.
// Suggested fix: add source to CfgBuilder and build_function_cfg
pub struct CfgBuilder<'a> {
rules: &'a CfgRules,
source: &'a [u8], // <-- add this
...
}
pub fn build_function_cfg(function_node: &Node, lang_id: &str, source: &[u8]) -> Option<CfgData> {
let rules = get_cfg_rules(lang_id)?;
let mut builder = CfgBuilder::new(rules, source);
...
}Then replace all three utf8_text(&[]) calls with utf8_text(self.source).
There was a problem hiding this comment.
Critical bug, agreed. utf8_text(&[]) will panic with index-out-of-bounds for any labeled loop/break/continue since the label node has a non-zero byte offset into source. Will fix by adding source: &[u8] to build_function_cfg() and CfgBuilder, then updating all three call sites (lines 449, 521, 542) to use utf8_text(self.source). All extractor call sites will need to pass source bytes through.
There was a problem hiding this comment.
Fixed in 32a9463 — added source: &[u8] to CfgBuilder and build_function_cfg, replaced all 3 utf8_text(&[]) calls with utf8_text(self.source), and updated all 24 extractor call sites to pass source bytes.
crates/codegraph-core/src/cfg.rs
Outdated
| let body_block = self.make_block("loop_body", None, None, None); | ||
| self.add_edge(header, body_block, "branch_true"); |
There was a problem hiding this comment.
"branch_true" edge kind misleading for unconditional infinite loop
Unlike process_for_loop and process_while_loop, a Rust loop {} has no condition — the body is always entered. Using "branch_true" implies a condition check exists, which is incorrect and inconsistent with how the WASM JS side distinguishes loop types. A "fallthrough" edge (matching the entry edge from the loop header used everywhere else for unconditional control flow) would be semantically accurate here.
This is also why Go/Rust/Ruby are intentionally absent from the parity tests — if this were compared against the WASM implementation it would likely diverge on edge kinds for infinite loops.
| let body_block = self.make_block("loop_body", None, None, None); | |
| self.add_edge(header, body_block, "branch_true"); | |
| self.add_edge(header, body_block, "fallthrough"); |
There was a problem hiding this comment.
Correct — Rust loop {} is unconditional, there is no condition check. Using "branch_true" is semantically wrong and inconsistent with the WASM JS side. Will change to "fallthrough" for the entry edge into the loop body in process_infinite_loop.
There was a problem hiding this comment.
Fixed in 32a9463 — changed to "fallthrough" in process_infinite_loop.
Additional Comments (1)
The However, the scan loop only enters the Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! |
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
|
Claude finished @carlos-alm's task —— View job Reviewing PR: perf: compute CFG in Rust native engine for all languagesConducting comprehensive PR review following the 4-phase analysis process:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Performance optimization in CFG (Control Flow Graph) building phase. Current JS implementation does WASM re-parsing (~169ms) even when native AST data is already available from the Rust engine. This creates unnecessary double-parsing overhead. Approach Justification: This is a legitimate performance optimization that eliminates redundant WASM tree-walking by computing CFG data during the initial native extraction phase. The approach maintains backward compatibility with WASM fallback when native data is unavailable. This addresses a real bottleneck, not masking symptoms. Phase 2: Configuration Change Detection ✅No High-Risk Configuration Changes Detected:
Phase 3: Test Integrity ✅Test Coverage Assessment:
Phase 4: Code Quality
|
| end_line: Some(end_line(&child)), | ||
| decorators: None, | ||
| complexity: None, | ||
| cfg: build_function_cfg(&child, "javascript"), |
There was a problem hiding this comment.
TypeScript interface method signatures receive non-null CFG
method_signature and property_signature nodes have no body field. When build_function_cfg is called on them, it finds no body, short-circuits to builder.add_edge(entry, exit, "fallthrough"), and returns a non-null CfgData with 2 blocks and 1 edge.
This contrasts with every other language extractor in this PR — Go, Java, C#, Ruby, PHP, and Rust all explicitly set cfg: None for interface/abstract members (matching complexity: None). For TS interface methods, the def.cfg?.blocks?.length check in cfg.js evaluates to 2, so these trivial 2-block CFGs get written to cfg_blocks/cfg_edges in the DB for every method signature in every TypeScript interface. This inflates the DB unnecessarily for zero analytical value.
| cfg: build_function_cfg(&child, "javascript"), | |
| cfg: None, |
There was a problem hiding this comment.
Fixed in e5bc19e — set cfg: None for method_signature and property_signature in the JS/TS extractor, matching Go/Java/C#/Ruby/PHP pattern for bodyless interface members.
crates/codegraph-core/src/cfg.rs
Outdated
| let join_block = self.make_block("body", None, None, None); | ||
|
|
||
| // Switch acts like a break target | ||
| self.loop_stack.push(LoopCtx { header_idx: switch_header, exit_idx: join_block }); |
There was a problem hiding this comment.
continue inside a switch resolves to switch header, not enclosing loop
The switch is pushed onto loop_stack so that break statements inside cases correctly target join_block (the switch exit). However, since loop_stack.last() is also what an unlabeled continue resolves to, any continue statement inside a switch case will incorrectly produce an edge to switch_header instead of the nearest enclosing loop's header.
Consider Java:
for (int i = 0; i < 10; i++) {
switch (x) {
case 1: continue; // should continue the *for*, not re-enter the switch
}
}With the current stack: [for_loop_ctx, switch_ctx], loop_stack.last() returns switch_ctx, so continue emits current -> switch_header instead of current -> for_header.
The standard fix is to track switches separately from loops (e.g., a switch_exit_stack) so that unlabeled continue skips the switch and targets the enclosing loop. The JS WASM implementation should be cross-checked to confirm whether this inconsistency pre-exists or is newly introduced here.
There was a problem hiding this comment.
Fixed in e5bc19e — added is_loop: bool to LoopCtx. Loop pushes set is_loop: true, switch push sets is_loop: false. Continue handler now walks back through loop_stack with .iter().rev().find(|ctx| ctx.is_loop) to skip switch entries.
| const complexityRules = COMPLEXITY_RULES.get(langId); | ||
| if (!complexityRules) continue; | ||
| // complexityRules only needed for WASM fallback path |
There was a problem hiding this comment.
Safety guard for complexityRules removed without replacement
The previous guard if (!complexityRules) continue; ensured that if a language has CFG rules but no matching entry in COMPLEXITY_RULES, the WASM fallback path would not call findFunctionNode(..., undefined). After this change, complexityRules can be undefined when COMPLEXITY_RULES.get(langId) returns nothing, but it is still passed directly to findFunctionNode on line 1165 whenever !tree is false and the native path wasn't taken.
All current CFG languages do have complexity rules, so this is not immediately broken. But removing the guard silently removes a safety net: if a language is ever added to CFG rules but not to COMPLEXITY_RULES, the WASM fallback will silently malfunction rather than safely skipping.
Consider preserving the guard, or at a minimum adding a comment that explicitly acknowledges complexityRules may be undefined and that findFunctionNode must tolerate it.
There was a problem hiding this comment.
Fixed in e5bc19e — re-added the guard: if (!tree || !complexityRules) continue. This prevents undefined from being passed to findFunctionNode when COMPLEXITY_RULES.get(langId) returns nothing.
| // Check if all defs already have native CFG — skip WASM parse if so | ||
| const allNative = symbols.definitions | ||
| .filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line) | ||
| .every((d) => d.cfg?.blocks?.length); | ||
|
|
||
| // WASM fallback if no cached tree and not all native | ||
| if (!tree && !allNative) { |
There was a problem hiding this comment.
needsFallback false-positive for files with bodyless methods
Array.prototype.every returns true for an empty array (vacuous truth), so files with no functions are handled correctly. However, methods that intentionally have cfg: null — interface methods in Go, Java, and C#, or abstract methods — also fail the .every((d) => d.cfg?.blocks?.length) check because null?.blocks?.length is undefined (falsy).
As a result, any file containing even one bodyless method (e.g., a Go file with an interface type) will set needsFallback = true and trigger WASM parser initialization for the whole pipeline, even though the concrete methods in that file already have complete native CFG data and the bodyless methods are correctly handled by the if (!tree) continue guard later.
This doesn't cause correctness issues (the per-definition WASM guard is still in place), but it defeats the performance goal for any file that mixes concrete implementations with interface/abstract declarations.
There was a problem hiding this comment.
Fixed in e5bc19e — changed .every((d) => d.cfg?.blocks?.length) to .every((d) => d.cfg === null || d.cfg?.blocks?.length). Bodyless methods with cfg: null are now considered "handled by native" and no longer trigger WASM fallback.
|
Claude finished @carlos-alm's task —— View job Reviewing PR: perf: compute CFG in Rust native engine for all languagesConducting comprehensive PR review following the 4-phase analysis process:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Performance optimization in CFG (Control Flow Graph) building phase. Current JS implementation does WASM re-parsing (~169ms) even when native AST data is already available from the Rust engine. This creates unnecessary double-parsing overhead. Approach Justification: This is a legitimate performance optimization that eliminates redundant WASM tree-walking by computing CFG data during the initial native extraction phase. The approach maintains backward compatibility with WASM fallback when native data is unavailable. This addresses a real bottleneck, not masking symptoms. Phase 2: Configuration Change Detection ✅No High-Risk Configuration Changes Detected:
Phase 3: Test Integrity ✅Test Coverage Assessment:
Phase 4: Code Quality
|
| let case_body_node = case_clause.child_by_field_name("body") | ||
| .or_else(|| case_clause.child_by_field_name("consequence")); | ||
|
|
||
| let case_stmts: Vec<Node> = if let Some(body_node) = case_body_node { | ||
| self.get_statements(&body_node) | ||
| } else { | ||
| let value_node = case_clause.child_by_field_name("value"); | ||
| let pattern_node = case_clause.child_by_field_name("pattern"); | ||
| let cursor2 = &mut case_clause.walk(); | ||
| case_clause.named_children(cursor2) | ||
| .filter(|child| { | ||
| if let Some(ref v) = value_node { if child.id() == v.id() { return false; } } | ||
| if let Some(ref p) = pattern_node { if child.id() == p.id() { return false; } } | ||
| child.kind() != "switch_label" | ||
| }) | ||
| .collect() | ||
| }; |
There was a problem hiding this comment.
Rust match_arm bodies are discarded from CFG
In tree-sitter-rust, a match_arm node has pattern and value fields — but no body field. The current code checks for body and consequence (lines 867-868), finds neither, then adds value_node to an exclude filter (line 878) and collects the remaining named children into case_stmts.
This means the actual arm expression/body is filtered out, so case_stmts is empty for every match arm. Any CFG-relevant constructs inside a Rust match arm (nested if, return, loop, etc.) are completely skipped.
Example that produces wrong CFG:
match state {
State::Ready => {
if self.queue.is_empty() { return None; }
self.process()
}
_ => None,
}The nested if and return would not appear in the CFG.
Fix: Fall back to using the value field as the case body for Rust match arms:
let case_body_node = case_clause.child_by_field_name("body")
.or_else(|| case_clause.child_by_field_name("consequence"))
.or_else(|| case_clause.child_by_field_name("value")); // Rust match_armRemove value_node from the exclude filter, or update the filter condition since value_node now matches case_body_node.
There was a problem hiding this comment.
Fixed in e5bc19e — added else-if branch for child_by_field_name("value") before the generic named-children fallback. Rust match arm expressions are now included as case_stmts instead of being filtered out.
| fn process_infinite_loop(&mut self, loop_stmt: &Node, current: u32) -> Option<u32> { | ||
| let header = self.make_block("loop_header", Some(node_line(loop_stmt)), Some(node_line(loop_stmt)), Some("loop")); | ||
| self.add_edge(current, header, "fallthrough"); | ||
|
|
||
| let exit = self.make_block("body", None, None, None); | ||
|
|
||
| self.loop_stack.push(LoopCtx { header_idx: header, exit_idx: exit }); | ||
| self.update_label_map(header, exit); | ||
|
|
||
| let body = loop_stmt.child_by_field_name("body"); | ||
| let body_block = self.make_block("loop_body", None, None, None); | ||
| self.add_edge(header, body_block, "fallthrough"); | ||
|
|
||
| if let Some(body) = body { | ||
| let stmts = self.get_statements(&body); | ||
| let body_end = self.process_statements(&stmts, body_block); | ||
| if let Some(be) = body_end { | ||
| self.add_edge(be, header, "loop_back"); | ||
| } | ||
| } | ||
|
|
||
| // No loop_exit from header — only exit via break | ||
| self.loop_stack.pop(); | ||
| Some(exit) | ||
| } |
There was a problem hiding this comment.
Infinite loop without break makes subsequent code appear reachable
process_infinite_loop always returns Some(exit) (line 826), causing the caller to emit a fallthrough edge from exit to the next block. However, the exit block only receives edges from break statements inside the loop body. If no break exists, exit has no predecessors — it is an unreachable block — yet code after the loop is still connected to it, making dead code appear live in the CFG.
Example:
loop { do_something(); }
// This line is dead, but the CFG shows it as reachable
unreachable_code();A more accurate result would return None when no break targeted exit. One way to detect this is to check after processing the body whether any edge targeting exit was added:
let had_break = self.edges.iter().any(|e| e.target_index == exit);
if had_break { Some(exit) } else { None }This matches the semantics of return None from process_return/process_throw (which also unconditionally terminate control flow).
There was a problem hiding this comment.
Fixed in e5bc19e — after processing the loop body, we now check if the exit block has any incoming edges. If no break targeted exit, return None instead of Some(exit) to indicate subsequent code is unreachable.
| if matches_opt(kind, self.rules.continue_node) { | ||
| let label_name = stmt.child_by_field_name("label") | ||
| .map(|n| n.utf8_text(self.source).unwrap_or("").to_string()); | ||
|
|
||
| let target = if let Some(ref name) = label_name { | ||
| self.label_map.iter().rev() | ||
| .find(|(n, _)| n == name) | ||
| .and_then(|(_, ctx)| ctx.header_idx) | ||
| } else { | ||
| self.loop_stack.last().map(|ctx| ctx.header_idx) | ||
| }; | ||
|
|
||
| if let Some(target) = target { | ||
| self.set_end_line(current, node_line(stmt)); | ||
| self.add_edge(current, target, "continue"); | ||
| return None; | ||
| } | ||
| return Some(current); | ||
| } |
There was a problem hiding this comment.
continue inside a switch resolves to switch header, not enclosing loop
The switch is pushed onto loop_stack (line 838 in process_switch) so that break statements correctly target the switch exit. However, unlabeled continue statements also resolve to loop_stack.last() (line 551), which may now be a switch context instead of an enclosing loop.
Example in Java:
for (int i = 0; i < 10; i++) {
switch (x) {
case 1: continue; // should continue the *for*, not re-enter the switch
}
}With loop_stack = [for_loop_ctx, switch_ctx], continue would emit an edge to switch_header instead of for_header.
Standard fix: Track switches separately from loops (e.g., a switch_exit_stack) so that unlabeled continue skips the switch and targets the enclosing loop. Labeled continue statements already work correctly by searching label_map first (lines 546-549).
There was a problem hiding this comment.
Fixed in e5bc19e — same fix as the other continue/switch comment. LoopCtx now has is_loop flag, continue skips non-loop entries.
| const allNative = symbols.definitions | ||
| .filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line) | ||
| .every((d) => d.cfg?.blocks?.length); | ||
|
|
||
| // WASM fallback if no cached tree and not all native | ||
| if (!tree && !allNative) { |
There was a problem hiding this comment.
allNative check has false-positive for files with bodyless methods
The check on line 1114 uses .every((d) => d.cfg?.blocks?.length). When a definition has cfg: null (interface methods in Go, Java, C#, or abstract methods), the optional chaining produces undefined (falsy), causing every() to return false.
As a result, any file containing even one bodyless method will set allNative = false and trigger WASM parser initialization for the entire pipeline on line 1117, even though the concrete methods in that file already have complete native CFG data and the bodyless methods are correctly skipped later (line 1160-1161).
This doesn't cause correctness issues (the per-definition native-CFG check on line 1160 is still in place), but it defeats the performance goal for any file mixing concrete implementations with interface/abstract declarations — exactly what the PR is optimizing for.
Fix: Adjust the check to exclude definitions with cfg: null:
const allNative = symbols.definitions
.filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line && d.cfg !== null)
.every((d) => d.cfg?.blocks?.length);There was a problem hiding this comment.
Fixed in e5bc19e — same fix as the other allNative comment. cfg === null is now treated as native-handled.
- Set cfg: None for TS interface method_signature/property_signature (no body) - Tag loop_stack entries with is_loop flag; continue now skips switch entries - Use match_arm value field as case body for Rust instead of filtering it out - Return None from process_infinite_loop when no break targets the exit block - Treat cfg: null as native-handled in allNative check to avoid false WASM fallback - Re-add complexityRules guard before passing to findFunctionNode
|
Claude finished @carlos-alm's task —— View job Reviewing PR: perf: compute CFG in Rust native engine for all languagesConducted comprehensive PR review following the 4-phase analysis process:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Performance optimization in CFG (Control Flow Graph) building phase. Current JS implementation does WASM re-parsing (~169ms) even when native AST data is already available from the Rust engine. This creates unnecessary double-parsing overhead. Approach Justification: This is a legitimate performance optimization that eliminates redundant WASM tree-walking by computing CFG data during the initial native extraction phase. The approach maintains backward compatibility with WASM fallback when native data is unavailable. This addresses a real bottleneck, not masking symptoms. Phase 2: Configuration Change Detection ✅No High-Risk Configuration Changes Detected:
Phase 3: Test Integrity ✅Test Coverage Assessment:
Phase 4: Code Quality
|
|
Addressed all 7 new review comments in e5bc19e:
|
| if_consequent_field: None, | ||
| for_nodes: &["for_statement", "enhanced_for_statement"], | ||
| while_node: Some("while_statement"), | ||
| while_nodes: &[], | ||
| do_node: Some("do_statement"), | ||
| infinite_loop_node: None, | ||
| unless_node: None, | ||
| until_node: None, | ||
| switch_node: Some("switch_expression"), | ||
| switch_nodes: &[], | ||
| case_node: Some("switch_block_statement_group"), | ||
| case_nodes: &["switch_rule"], | ||
| default_node: None, | ||
| try_node: Some("try_statement"), | ||
| catch_node: Some("catch_clause"), | ||
| finally_node: Some("finally_clause"), | ||
| return_node: Some("return_statement"), | ||
| throw_node: Some("throw_statement"), | ||
| break_node: Some("break_statement"), | ||
| continue_node: Some("continue_statement"), | ||
| block_node: Some("block"), | ||
| block_nodes: &[], | ||
| labeled_node: Some("labeled_statement"), | ||
| }; | ||
|
|
||
| pub static CSHARP_CFG: CfgRules = CfgRules { | ||
| if_node: Some("if_statement"), | ||
| if_nodes: &[], | ||
| elif_node: None, | ||
| else_clause: None, |
There was a problem hiding this comment.
Java switch_statement absent from CFG rules
JAVA_CFG maps only switch_expression as a switch node:
switch_node: Some("switch_expression"),
switch_nodes: &[],In tree-sitter-java, the traditional switch statement compiles to switch_statement, and the newer (Java 14+) switch-as-value form compiles to switch_expression. The vast majority of Java code—all code targeting pre-Java 14—uses switch_statement. As written, any Java function with a traditional switch statement will have that switch silently skipped by process_switch, producing a CFG where the entire switch appears as a linear "fall-through" block with no case branching. The switch block's internal break statements will also be unresolved, causing silent incorrect flow.
Fix: add switch_statement to either switch_node or switch_nodes:
| if_consequent_field: None, | |
| for_nodes: &["for_statement", "enhanced_for_statement"], | |
| while_node: Some("while_statement"), | |
| while_nodes: &[], | |
| do_node: Some("do_statement"), | |
| infinite_loop_node: None, | |
| unless_node: None, | |
| until_node: None, | |
| switch_node: Some("switch_expression"), | |
| switch_nodes: &[], | |
| case_node: Some("switch_block_statement_group"), | |
| case_nodes: &["switch_rule"], | |
| default_node: None, | |
| try_node: Some("try_statement"), | |
| catch_node: Some("catch_clause"), | |
| finally_node: Some("finally_clause"), | |
| return_node: Some("return_statement"), | |
| throw_node: Some("throw_statement"), | |
| break_node: Some("break_statement"), | |
| continue_node: Some("continue_statement"), | |
| block_node: Some("block"), | |
| block_nodes: &[], | |
| labeled_node: Some("labeled_statement"), | |
| }; | |
| pub static CSHARP_CFG: CfgRules = CfgRules { | |
| if_node: Some("if_statement"), | |
| if_nodes: &[], | |
| elif_node: None, | |
| else_clause: None, | |
| switch_node: Some("switch_statement"), | |
| switch_nodes: &["switch_expression"], |
Both node types share the same switch_block body structure in the Java grammar, so the existing process_switch logic handles them identically once they are recognized.
There was a problem hiding this comment.
Fixed in 6312ce7 — changed JAVA_CFG to switch_node: Some("switch_statement"), switch_nodes: &["switch_expression"]. Both traditional switch statements and Java 14+ switch expressions are now recognized.
| // Create DB | ||
| const dbPath = path.join(tmpDir, '.codegraph', 'graph.db'); | ||
| db = new Database(dbPath); | ||
| db.pragma('journal_mode = WAL'); | ||
| initSchema(db); | ||
|
|
||
| // Insert definition nodes for parent resolution | ||
| const insertNode = db.prepare( | ||
| 'INSERT INTO nodes (name, kind, file, line, end_line) VALUES (?, ?, ?, ?, ?)', | ||
| ); | ||
| for (const [relPath, symbols] of allSymbols) { | ||
| for (const def of symbols.definitions || []) { | ||
| insertNode.run(def.name, def.kind, relPath, def.line, def.endLine); | ||
| } |
There was a problem hiding this comment.
C# test asserts new kind for a node that is emitted as throw
The fixture only contains a single new expression:
throw new ArgumentNullException("result");walk_ast_nodes_with_config processes throw_statement (or throw_expression) first because "throw_statement" and "throw_expression" are both in CSHARP_AST_CONFIG.throw_types. It emits the node as kind: "throw" with name: "ArgumentNullException" (via extract_throw_target) and returns early, so the inner object_creation_expression is never visited independently. No kind: "new" row is ever inserted into the DB.
This means both assertions in the test will fail when the native binary is available and the describe.skipIf guard is lifted in CI:
expect(news.length).toBeGreaterThanOrEqual(1); // fails — no 'new' rows
expect(news.some((n) => n.name.includes('ArgumentNullException'))).toBe(true); // failsEither add a standalone new expression to the fixture (e.g. var ex = new ArgumentNullException("x");), or fix the assertion to check for the throw kind instead:
const throwNodes = db
.prepare("SELECT * FROM ast_nodes WHERE kind = 'throw' AND file LIKE '%fixture.cs'")
.all();
expect(throwNodes.some((n) => n.name.includes('ArgumentNullException'))).toBe(true);There was a problem hiding this comment.
Fixed in 6312ce7 — added a standalone new expression (var ex = new ArgumentNullException("x")) to the C# fixture so the kind:new assertion has a dedicated node independent of the throw path.
|
Claude finished @carlos-alm's task —— View job Reviewing PR: perf: compute CFG in Rust native engine for all languagesConducting comprehensive PR review following the 4-phase analysis process:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Performance optimization in CFG (Control Flow Graph) building phase. Current JS implementation does WASM re-parsing (~169ms) even when native AST data is already available from the Rust engine. This creates unnecessary double-parsing overhead. Approach Justification: This is a legitimate performance optimization that eliminates redundant WASM tree-walking by computing CFG data during the initial native extraction phase. The approach maintains backward compatibility with WASM fallback when native data is unavailable. This addresses a real bottleneck, not masking symptoms. Phase 2: Configuration Change Detection ✅No High-Risk Configuration Changes Detected:
Phase 3: Test Integrity ✅Test Coverage Assessment:
Phase 4: Code Quality ✅Rust Implementation Analysis:
Minor Issues Noted but Not Blocking:
Root Cause AnalysisProblem Being Solved: Legitimate performance bottleneck - double WASM parsing when native data available Critical ConcernsAll previous critical issues have been addressed: The major bugs identified in earlier review rounds (language ID mismatch, utf8_text panics, continue-in-switch semantics, infinite loop reachability) have been properly fixed in recent commits. Final Recommendation
Recommended Follow-up (non-blocking):
|
|
Addressed both new comments in 6312ce7:
|
crates/codegraph-core/src/cfg.rs
Outdated
| catch_handler = Some(child); | ||
| } | ||
| if matches_opt(child.kind(), self.rules.finally_node) { | ||
| finally_handler = Some(child); | ||
| } | ||
| } | ||
|
|
||
| if let Some(catch_node) = catch_handler { | ||
| let catch_block = self.make_block("catch", Some(node_line(&catch_node)), None, Some("catch")); | ||
| self.add_edge(try_block, catch_block, "exception"); | ||
|
|
||
| let catch_body_node = catch_node.child_by_field_name("body"); |
There was a problem hiding this comment.
process_try_catch silently drops all catch clauses except the last
catch_handler is an Option<Node> that gets unconditionally overwritten on each iteration. For a try { } catch(A) { } catch(B) { } block:
- Iteration 1:
catch_handler = Some(catch_A) - Iteration 2:
catch_handler = Some(catch_B)← overwrites, A is lost
The resulting CFG completely omits every catch clause except the last one. This affects all languages that allow multiple catches: Java (catch_clause), C# (catch_clause), Python (except_clause), and PHP (catch_clause). None of the parity-test fixtures exercise try/catch, so this won't be caught by the existing tests.
Fix: collect all catch clauses into a Vec and wire each one independently to the try block via an "exception" edge:
let mut catch_handlers: Vec<Node> = Vec::new();
let mut finally_handler: Option<Node> = None;
let cursor = &mut try_stmt.walk();
for child in try_stmt.named_children(cursor) {
if matches_opt(child.kind(), self.rules.catch_node) {
catch_handlers.push(child);
}
if matches_opt(child.kind(), self.rules.finally_node) {
finally_handler = Some(child);
}
}
// Then process each catch_handler in catch_handlersThere was a problem hiding this comment.
Fixed in 73d4160 — changed catch_handler from Option to Vec. All catch clauses are now collected and each is processed independently with its own exception edge from the try block.
There was a problem hiding this comment.
Fixed in commit 73d4160 — changed catch_handler: Option<Node> to catch_handlers: Vec<Node> and wired each catch clause independently via "exception" edges.
There was a problem hiding this comment.
Already implemented this way — process_try_catch collects all catch clauses into a Vec<Node> (line 937: let mut catch_handlers: Vec<Node> = Vec::new()) and iterates over each one independently, wiring separate "exception" edges from the try block to each catch block. The Option<Node> approach described in the comment was never in the merged code.
| } | ||
| } | ||
|
|
||
| fn process_for_loop(&mut self, for_stmt: &Node, current: u32) -> Option<u32> { | ||
| let header = self.make_block("loop_header", Some(node_line(for_stmt)), Some(node_line(for_stmt)), Some("for")); | ||
| self.add_edge(current, header, "fallthrough"); | ||
|
|
||
| let exit = self.make_block("body", None, None, None); |
There was a problem hiding this comment.
update_label_map fills ALL pending label placeholders, not just the innermost one
When a loop is created, update_label_map fills every label in label_map whose header_idx is still None. This is correct when nested labels each wrap exactly one loop, but breaks down when an outer label wraps a non-loop block that contains an inner loop:
outer: { // label_map: [("outer", {None, None})]
inner: for (...) { // label_map: [("outer", {None,None}), ("inner", {None,None})]
break outer; // should jump to the block after `outer:`, not the for-loop exit
}
}When process_for_loop is entered for the inner loop, update_label_map(inner_header, inner_exit) fills both "outer" and "inner" with the inner loop's exit. break outer then incorrectly targets the inner loop exit.
The fix is to only fill the most recently pushed unresolved label (i.e., the one that syntactically wraps the current loop):
fn update_label_map(&mut self, header_idx: u32, exit_idx: u32) {
if let Some((_, ctx)) = self.label_map.iter_mut().rev()
.find(|(_, ctx)| ctx.header_idx.is_none())
{
ctx.header_idx = Some(header_idx);
ctx.exit_idx = Some(exit_idx);
}
}This is valid in Java, JavaScript/TypeScript, and Go where labels can prefix any statement (not only loops).
There was a problem hiding this comment.
Fixed in 73d4160 — changed update_label_map to use .iter_mut().rev().find(|ctx| ctx.header_idx.is_none()) so only the most recently pushed unresolved label is filled, preventing inner loops from incorrectly resolving outer labels.
There was a problem hiding this comment.
Fixed in commit 73d4160 — changed update_label_map to only fill the most recently pushed unresolved label using .iter_mut().rev().find(|(_, ctx)| ctx.header_idx.is_none()).
There was a problem hiding this comment.
Already implemented exactly this way — update_label_map uses .iter_mut().rev().find(|(_, ctx)| ctx.header_idx.is_none()) (line 721) to fill only the most recently pushed unresolved label, not all pending labels. The suggested fix matches the current code verbatim.
| const hasNativeCfg = symbols.definitions | ||
| .filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line) | ||
| .every((d) => d.cfg?.blocks?.length); | ||
| if (!hasNativeCfg) { | ||
| needsFallback = true; | ||
| break; | ||
| } |
There was a problem hiding this comment.
hasNativeCfg check still missing d.cfg === null guard at the needsFallback gate
The per-definition allNative check (added ~20 lines lower) was correctly fixed with d.cfg === null || d.cfg?.blocks?.length, but this earlier hasNativeCfg check that guards the global needsFallback flag still uses only d.cfg?.blocks?.length.
For a Go file with an interface type, the interface method signatures are emitted with cfg: null. null?.blocks?.length evaluates to undefined (falsy), so .every() returns false, hasNativeCfg is false, needsFallback = true, and the WASM parser is initialized for every file in the run — exactly the performance regression the PR aims to eliminate.
| const hasNativeCfg = symbols.definitions | |
| .filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line) | |
| .every((d) => d.cfg?.blocks?.length); | |
| if (!hasNativeCfg) { | |
| needsFallback = true; | |
| break; | |
| } | |
| const hasNativeCfg = symbols.definitions | |
| .filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line) | |
| .every((d) => d.cfg === null || d.cfg?.blocks?.length); |
|
Claude finished @carlos-alm's task —— View job Reviewing PR: perf: compute CFG in Rust native engine for all languagesConducted comprehensive PR review following the 4-phase analysis process:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Performance optimization in CFG (Control Flow Graph) building phase. Current JS implementation does WASM re-parsing (~169ms) even when native AST data is already available from the Rust engine. This creates unnecessary double-parsing overhead. Approach Justification: This is a legitimate performance optimization that eliminates redundant WASM tree-walking by computing CFG data during the initial native extraction phase. The approach maintains backward compatibility with WASM fallback when native data is unavailable. This addresses a real bottleneck, not masking symptoms. Phase 2: Configuration Change Detection ✅No High-Risk Configuration Changes Detected:
Phase 3: Test Integrity ✅Test Coverage Assessment:
Phase 4: Code Quality
|
|
Addressed both new comments in 73d4160:
|
| self.loop_stack.push(LoopCtx { header_idx: header, exit_idx: exit, is_loop: true }); | ||
| self.update_label_map(header, exit); | ||
|
|
||
| let body = for_stmt.child_by_field_name("body"); | ||
| let body_block = self.make_block("loop_body", None, None, None); | ||
| self.add_edge(header, body_block, "branch_true"); | ||
|
|
||
| if let Some(body) = body { | ||
| let stmts = self.get_statements(&body); | ||
| let body_end = self.process_statements(&stmts, body_block); | ||
| if let Some(be) = body_end { | ||
| self.add_edge(be, header, "loop_back"); | ||
| } | ||
| } | ||
|
|
||
| self.add_edge(header, exit, "loop_exit"); | ||
| self.loop_stack.pop(); | ||
| Some(exit) | ||
| } | ||
|
|
||
| fn process_while_loop(&mut self, while_stmt: &Node, current: u32) -> Option<u32> { | ||
| let header = self.make_block("loop_header", Some(node_line(while_stmt)), Some(node_line(while_stmt)), Some("while")); | ||
| self.add_edge(current, header, "fallthrough"); | ||
|
|
There was a problem hiding this comment.
Go for {} produces spurious loop_exit edge
process_for_loop unconditionally emits header → exit with "loop_exit", which is correct for conditional loops (for i := 0; i < n; i++) but wrong for Go's infinite loops (for {}). In tree-sitter-go there is no infinite_loop_node — all Go loops, including for {}, are for_statement nodes, so they all arrive here.
For a Go for {} body with no break, the exit block will have exactly one predecessor (the loop_exit edge) but no actual runtime path reaches it. Any code written after for {} will appear reachable in the CFG even though it is dead code — the same semantic bug that was already fixed for Rust's loop {} via process_infinite_loop.
The condition field is absent on for {} nodes. A targeted fix is to check whether the for_statement has a condition field; if not, treat it like an infinite loop and only return Some(exit) when a break targeted it:
fn process_for_loop(&mut self, for_stmt: &Node, current: u32) -> Option<u32> {
let is_infinite = for_stmt.child_by_field_name("condition").is_none()
&& for_stmt.child_by_field_name("left").is_none(); // Go range uses "left"
let header = self.make_block("loop_header", ...);
self.add_edge(current, header, "fallthrough");
let exit = self.make_block("body", None, None, None);
self.loop_stack.push(LoopCtx { header_idx: header, exit_idx: exit, is_loop: true });
self.update_label_map(header, exit);
// ... process body ...
if is_infinite {
self.loop_stack.pop();
let has_break = self.edges.iter().any(|e| e.target_index == exit);
if has_break { Some(exit) } else { None }
} else {
self.add_edge(header, exit, "loop_exit");
self.loop_stack.pop();
Some(exit)
}
}| } | ||
| if matches_opt(child.kind(), self.rules.finally_node) { | ||
| finally_handler = Some(child); | ||
| } | ||
| } | ||
|
|
||
| if !catch_handlers.is_empty() { | ||
| let mut catch_ends: Vec<Option<u32>> = Vec::new(); | ||
|
|
||
| for catch_node in &catch_handlers { | ||
| let catch_block = self.make_block("catch", Some(node_line(catch_node)), None, Some("catch")); |
There was a problem hiding this comment.
Python try…except…else block silently dropped from CFG
The loop here only collects children whose kind matches rules.catch_node (Python: "except_clause") or rules.finally_node (Python: "finally_clause"). In tree-sitter-python a try_statement can also contain an else_clause child — the else block that runs only when the try body completed without raising an exception.
Because else_clause matches neither catch_node nor finally_node, it is silently ignored. The resulting CFG omits the else block entirely: execution flows from try_block → join_block directly, bypassing what should be a distinct "success" path. For Python code like:
try:
x = risky()
except ValueError:
handle()
else:
success() # ← this block is invisible in the CFGThe fix is to also look for rules.else_clause in this loop (Python reuses "else_clause" for both if…else and try…else), wire it between try_end and join_block, and exclude it from the try_body collection. Since else_clause is already a field on CfgRules, no struct change is required — only process_try_catch needs updating.
| catch_node: Some("catch_clause"), | ||
| finally_node: Some("finally_clause"), | ||
| return_node: Some("return_statement"), | ||
| throw_node: Some("throw_statement"), |
There was a problem hiding this comment.
C# switch_expression silently skipped in CFG
CSHARP_CFG only lists switch_node: Some("switch_statement") with switch_nodes: &[]. In tree-sitter-csharp, C# 8+ switch expressions compile to a switch_expression node (distinct from switch_statement), and each arm is a switch_expression_arm. Because neither appears in switch_nodes or case_nodes, any C# function containing a switch expression will have the entire construct treated as a regular sequential statement — no branching, no case edges, and any return/throw inside the arms will not appear in the CFG.
This is the same structural gap that was already identified and fixed for Java (where both switch_statement and switch_expression are now listed). C# should receive the same treatment:
switch_node: Some("switch_statement"),
switch_nodes: &["switch_expression"],
case_node: Some("switch_section"),
case_nodes: &["switch_expression_arm"],The tree-sitter-csharp switch_expression_arm node has a value field (the arm's expression), which is already handled by the existing child_by_field_name("value") fallback in process_switch (the same path used for Rust match arms). No additional logic changes should be needed — only the grammar rule additions above.
| continue_node: Some("next"), | ||
| block_node: None, | ||
| block_nodes: &["then", "do", "body_statement"], |
There was a problem hiding this comment.
Ruby method-level rescue not captured — most common Ruby rescue pattern produces incorrect CFG
RUBY_CFG sets try_node: Some("begin"), so only explicit begin ... rescue ... end blocks are recognized as try/catch. However, the idiomatic Ruby pattern for error handling is a method-level rescue that appears directly inside the method body — without an explicit begin block:
def process(data)
risky_operation(data)
rescue ArgumentError => e # ← direct child of body_statement, never inside begin
handle_error(e)
endIn tree-sitter-ruby, a method node's body is a body_statement. Since "body_statement" is listed in RUBY_CFG.block_nodes, its children (including any rescue nodes) are returned as flat statements by get_statements. When process_statement encounters a rescue node, it doesn't match try_node ("begin"), so it falls through to the "regular statement" handler — the rescue block is silently folded into the preceding sequential block with no exception edge.
The impact is significant: any Ruby method using the canonical rescue pattern will have an incorrect CFG — the error-handling path is invisible, and the rescue body is included in the normal flow block.
A targeted fix is to also detect "rescue" in process_statement when it appears as a top-level child of the function body, and route it through process_try_catch logic, or add a dedicated handler that creates an exception edge from the preceding block to the rescue block.
Port buildFunctionCFG algorithm from JS (cfg.js) to Rust (cfg.rs) with per-language CfgRules for all 8 supported languages. Each extractor now calls build_function_cfg() on function/method AST nodes during extraction, storing the CFG directly on the Definition struct. JS pipeline updated to use native CFG data when available (def.cfg), falling back to WASM tree walk only when native data is absent. This eliminates the need for WASM re-parsing in the CFG phase for native engine builds. Rust changes: - New cfg.rs module with CfgRules struct and 8 language configs - CfgBlock, CfgEdge, CfgData types in types.rs - All extractors call build_function_cfg for function/method defs JS changes: - parser.js normalizeNativeSymbols maps def.cfg through - cfg.js buildCFGData checks def.cfg before WASM fallback - Skips WASM parser init when all defs have native CFG Tests: 1437 pass, new cfg-all-langs.test.js with JS-side mock tests and native parity tests (block/edge count + type matching). Impact: 38 functions changed, 56 affected
…interface methods - Change "c_sharp" to "csharp" in cfg.rs, complexity.rs, and csharp.rs to match the canonical ID from parser_registry::lang_id_str() - Set cfg: None for Go interface method_elem signatures since they have no bodies, matching the existing complexity: None pattern Impact: 6 functions changed, 6 affected
…infinite loops
Fix two issues in the Rust native CFG construction:
1. CfgBuilder called utf8_text(&[]) to extract label names for labeled
loops, break, and continue statements. Passing an empty slice panics
when the label node has a non-zero byte offset. Add a `source` field
to CfgBuilder, thread it through from build_function_cfg, and use
utf8_text(self.source) for all 3 call sites. Updated all 8 extractor
files to pass source bytes.
2. process_infinite_loop used "branch_true" for the edge from loop header
to body, but Rust `loop {}` is unconditional — there is no condition
to branch on. Changed to "fallthrough" to match the semantics.
Impact: 14 functions changed, 18 affected
- Set cfg: None for TS interface method_signature/property_signature (no body) - Tag loop_stack entries with is_loop flag; continue now skips switch entries - Use match_arm value field as case body for Rust instead of filtering it out - Return None from process_infinite_loop when no break targets the exit block - Treat cfg: null as native-handled in allNative check to avoid false WASM fallback - Re-add complexityRules guard before passing to findFunctionNode
Impact: 2 functions changed, 0 affected
73d4160 to
e020420
Compare
|
Claude finished @carlos-alm's task —— View job Reviewing PR: perf: compute CFG in Rust native engine for all languagesConducted comprehensive PR review following the 4-phase analysis process:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Performance optimization in CFG (Control Flow Graph) building phase. Current JS implementation does WASM re-parsing (~169ms) even when native AST data is already available from the Rust engine. This creates unnecessary double-parsing overhead. Approach Justification: This is a legitimate performance optimization that eliminates redundant WASM tree-walking by computing CFG data during the initial native extraction phase. The approach maintains backward compatibility with WASM fallback when native data is unavailable. This addresses a real bottleneck, not masking symptoms. Phase 2: Configuration Change Detection ✅No High-Risk Configuration Changes Detected:
Phase 3: Test Integrity ✅Test Coverage Assessment:
Phase 4: Code Quality ✅Rust Implementation Analysis:
REMAINING CRITICAL ISSUES:
Minor Issues Noted but Not Blocking:
Root Cause AnalysisProblem Being Solved: Legitimate performance bottleneck - double WASM parsing when native data available Critical ConcernsPerformance Goal Defeat: The missing Final Recommendation
Recommended Follow-up (non-blocking):
|
| if !has_default { | ||
| self.add_edge(switch_header, join_block, "branch_false"); | ||
| } |
There was a problem hiding this comment.
Python match with case _: wildcard emits spurious "branch_false" exit edge
PYTHON_CFG.default_node is None (line 97), so the is_default flag on line 860 is always false for Python — even when a case_clause is the wildcard arm (case _:). After all clauses are processed, has_default remains false, and this unconditional fallthrough edge is always added:
if !has_default {
self.add_edge(switch_header, join_block, "branch_false");
}For an exhaustive Python match with a wildcard catch-all, this edge makes the CFG appear as if the match can fall through without matching any case, even though case _: is exhaustive.
Fix: Inspect the pattern child of the case clause to detect wildcard patterns in tree-sitter-python:
// In process_switch, when processing a case_clause:
let is_default_pattern = case_clause
.child_by_field_name("pattern")
.map(|p| p.kind() == "wildcard_pattern")
.unwrap_or(false);
let is_default = matches_opt(cc_kind, self.rules.default_node) || is_default_pattern;Alternatively, add a default_pattern_node field to CfgRules for languages (like Python) whose wildcard defaults are identified by the pattern child's kind.
| const hasNativeCfg = symbols.definitions | ||
| .filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line) | ||
| .every((d) => d.cfg?.blocks?.length); |
There was a problem hiding this comment.
hasNativeCfg pre-scan missing d.cfg === null guard, defeating performance optimization
This check at line 1057 determines whether to initialize the WASM parser for the entire run. However, it uses only .every((d) => d.cfg?.blocks?.length) without accounting for cfg: null (intentionally-null CFG for interface/abstract methods).
When a definition has cfg: null, the optional chaining produces undefined (falsy), causing .every() to return false. Any file containing even one interface method (valid in Go, Java, C#) will set needsFallback = true and initialize the WASM parser for the whole file and pipeline, even though the concrete methods already have complete native CFG data.
This defeats the performance goal of the PR for any file mixing concrete implementations with interface/abstract declarations — exactly the common case.
Compare with the corrected allNative check at line 1114, which already has the fix:
.every((d) => d.cfg === null || d.cfg?.blocks?.length)Fix: Apply the same guard to line 1059:
const hasNativeCfg = symbols.definitions
.filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line)
.every((d) => d.cfg === null || d.cfg?.blocks?.length);| switch_node: Some("switch_statement"), | ||
| switch_nodes: &[], |
There was a problem hiding this comment.
C# switch_expression silently skipped — C# 8+ expressions treated as sequential statements
CSHARP_CFG lists only:
switch_node: Some("switch_statement"),
switch_nodes: &[],In tree-sitter-csharp, C# 8+ switch expressions compile to switch_expression nodes with switch_expression_arm children. Since neither switch_expression nor its arms appear in any of these lists, any C# function containing a modern switch expression is treated as a single sequential statement — no branching, no case edges, and any control flow inside the arms (return, throw, break) is completely omitted from the CFG.
This mirrors the gap that was fixed for Java, which now lists both switch_statement and switch_expression. C# should receive the same treatment.
Fix: Add switch_expression to the configuration:
switch_node: Some("switch_statement"),
switch_nodes: &["switch_expression"],
case_node: Some("switch_section"),
case_nodes: &["switch_expression_arm"],| try_node: Some("begin"), | ||
| catch_node: Some("rescue"), |
There was a problem hiding this comment.
Ruby method-level rescue not recognized — idiomatic Ruby error-handling pattern silently omitted
RUBY_CFG.try_node is Some("begin"), so only explicit begin ... rescue ... end blocks are recognized as try/catch constructs. However, idiomatic Ruby exception handling uses method-level rescue directly inside the method body without an explicit begin block:
def process(data)
risky_operation(data)
rescue ArgumentError => e # ← direct child of body_statement, never inside begin
handle_error(e)
endIn tree-sitter-ruby, the method body is a body_statement, and rescue nodes are direct children of it (listed in block_nodes on line 261). When process_statement encounters a rescue node at the top level of the method body, it doesn't match try_node ("begin"), so it falls through to the generic statement handler — the rescue block is silently folded into the preceding sequential code instead of being recognized as an exception handler.
This affects the majority of Ruby code, which uses method-level rescue rather than explicit begin blocks.
Fix: Extend the try detection to recognize rescue at the top level of method bodies, or treat standalone rescue nodes specially in process_statement when they appear as children of block_nodes.
| fn process_for_loop(&mut self, for_stmt: &Node, current: u32) -> Option<u32> { | ||
| let header = self.make_block("loop_header", Some(node_line(for_stmt)), Some(node_line(for_stmt)), Some("for")); | ||
| self.add_edge(current, header, "fallthrough"); | ||
|
|
||
| let exit = self.make_block("body", None, None, None); | ||
|
|
||
| self.loop_stack.push(LoopCtx { header_idx: header, exit_idx: exit, is_loop: true }); | ||
| self.update_label_map(header, exit); | ||
|
|
||
| let body = for_stmt.child_by_field_name("body"); | ||
| let body_block = self.make_block("loop_body", None, None, None); | ||
| self.add_edge(header, body_block, "branch_true"); | ||
|
|
||
| if let Some(body) = body { | ||
| let stmts = self.get_statements(&body); | ||
| let body_end = self.process_statements(&stmts, body_block); | ||
| if let Some(be) = body_end { | ||
| self.add_edge(be, header, "loop_back"); | ||
| } | ||
| } | ||
|
|
||
| self.add_edge(header, exit, "loop_exit"); |
There was a problem hiding this comment.
Go for {} infinite loop produces spurious "loop_exit" edge, making dead code appear reachable
GO_CFG has infinite_loop_node: None, so all Go loops — including for {} infinite loops — are processed by process_for_loop. This function unconditionally emits:
self.add_edge(header, exit, "loop_exit"); // line 750and returns Some(exit) (line 752), indicating that the exit block is reachable.
For a Go for {} loop with no break statement, the exit block has no incoming edges except from this spurious "loop_exit" edge. Any code after for {} is consequently marked as reachable in the CFG, even though it is dead code:
for {} // infinite loop, never terminates
unreachable_code() // marked as reachable in CFG, but is dead(Rust's loop {} handles this correctly via process_infinite_loop, which returns None when no break targeted the exit block.)
Fix: In process_for_loop, check whether the loop has a condition field. If absent (indicating an infinite loop like Go's for {}), use the same logic as process_infinite_loop: return None if no break targeted the exit block, making subsequent code correctly unreachable.
| let stmts: Vec<Node> = try_stmt.named_children(cursor) | ||
| .filter(|child| { | ||
| let ck = child.kind(); | ||
| !matches_opt(ck, self.rules.catch_node) | ||
| && !matches_opt(ck, self.rules.finally_node) | ||
| }) | ||
| .collect(); |
There was a problem hiding this comment.
Python try…except…else block silently dropped — else_clause incorrectly included in try body
The fallback path for extracting the try body (when there's no explicit body field) collects all children except those matching catch_node and finally_node:
let stmts: Vec<Node> = try_stmt.named_children(cursor)
.filter(|child| {
let ck = child.kind();
!matches_opt(ck, self.rules.catch_node) // except_clause
&& !matches_opt(ck, self.rules.finally_node)
})
.collect();In tree-sitter-python, a try_statement can contain an else_clause child — the block that runs only when the try body completed without raising an exception. Since else_clause matches neither filter, it passes through and is included in stmts, getting processed as part of the try body itself.
This is semantically wrong. The else block should execute only after the try succeeds (no exception), not as an integral part of the try body. The CFG omits the else block as a distinct "success path" and instead merges it into the try block.
Fix: Also filter out else_clause children:
.filter(|child| {
let ck = child.kind();
!matches_opt(ck, self.rules.catch_node)
&& !matches_opt(ck, self.rules.finally_node)
&& ck != "else_clause" // Add this
})Then process the else clause as a distinct path between try_end and join_block, after the try succeeds but before any finally block.
Summary
buildFunctionCFGalgorithm from JS (src/cfg.js) to Rust (crates/codegraph-core/src/cfg.rs) with per-languageCfgRulesfor all 8 supported languages (JS/TS/TSX, Python, Go, Rust, Java, C#, Ruby, PHP)build_function_cfg()on function/method AST nodes during extraction, storing CFG data directly on theDefinitionstructcfg.js,parser.js) updated to use native CFG data when available (def.cfg), falling back to WASM tree walk only when native data is absentChanges
Rust (~750 lines):
cfg.rs:CfgRulesstruct, 8 language configs,CfgBuilderstate machine porting the full algorithm (if/elif/else 3 patterns, for/while/do-while/infinite loops, switch/match, try/catch/finally, break/continue with labels, loop stack)types.rs:CfgBlock,CfgEdge,CfgDatastructs with napi bindingsbuild_function_cfg(node, lang_id)for function/method definitionsJS (~30 lines):
parser.js:normalizeNativeSymbolsmapsdef.cfgthrough to JScfg.js:buildCFGDatachecksdef.cfgbefore WASM fallback, skips WASM parser init when all defs have native CFGTest plan
cfg-all-langs.test.js— JS-side mock tests (2 pass locally)buildFunctionCFGPR 2 of 4 in the WASM double-parse elimination plan.