feat: intraprocedural control flow graph (CFG) by carlos-alm · Pull Request #274 · optave/codegraph

carlos-alm · 2026-03-03T03:24:28Z

Summary

Add opt-in CFG analysis (--cfg flag on build) that constructs basic-block control flow graphs from tree-sitter AST for individual functions
New cfg <name> CLI command with --format text|dot|mermaid, JSON, and NDJSON output
MCP cfg tool for AI agent access with pagination support
DB migration v12: cfg_blocks + cfg_edges tables (separate from interprocedural edges)
Phase 1: JS/TS/TSX — handles if/else, for/while/do-while, switch, try/catch/finally, break/continue (with labels), return/throw
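
For readers unfamiliar with basic-block CFGs, here is a minimal sketch of how an if/else with an early return might be split into blocks and edges. It walks a hand-written statement list rather than a tree-sitter AST, and every name in it (`buildToyCFG`, `newBlock`, `connect`, the edge kinds) is hypothetical and not taken from src/cfg.js.

```javascript
// Toy CFG builder over a simplified statement list (not a tree-sitter AST).
// All names are illustrative assumptions, not the API of src/cfg.js.
function buildToyCFG(statements) {
  const blocks = [];
  const edges = [];
  const newBlock = (label) => {
    const block = { id: blocks.length, label, statements: [] };
    blocks.push(block);
    return block;
  };
  const connect = (from, to, kind) => edges.push({ from: from.id, to: to.id, kind });

  const entry = newBlock('entry');
  const exit = newBlock('exit');

  // Returns the block where control continues, or null if control never falls through.
  function walk(stmts, start) {
    let current = start;
    for (const stmt of stmts) {
      if (current === null) break; // statements after a return are unreachable
      if (stmt.type === 'return') {
        current.statements.push('return');
        connect(current, exit, 'return');
        current = null;
      } else if (stmt.type === 'if') {
        current.statements.push(`if (${stmt.test})`);
        const thenBlock = newBlock('then');
        connect(current, thenBlock, 'branch_true');
        const thenEnd = walk(stmt.consequent, thenBlock);

        // Without an else, the false branch goes straight from the condition to the join.
        let elseEnd = current;
        let elseKind = 'branch_false';
        if (stmt.alternate) {
          const elseBlock = newBlock('else');
          connect(current, elseBlock, 'branch_false');
          elseEnd = walk(stmt.alternate, elseBlock);
          elseKind = 'fallthrough';
        }

        if (thenEnd === null && elseEnd === null) {
          current = null; // both branches returned, nothing to join
          continue;
        }
        const join = newBlock('join');
        if (thenEnd) connect(thenEnd, join, 'fallthrough');
        if (elseEnd) connect(elseEnd, join, elseKind);
        current = join;
      } else {
        current.statements.push(stmt.text); // plain statement stays in the current block
      }
    }
    return current;
  }

  const last = walk(statements, entry);
  if (last) connect(last, exit, 'fallthrough');
  return { blocks, edges };
}

// a(); if (x) { return; } else { b(); } c();
const example = [
  { type: 'expr', text: 'a()' },
  {
    type: 'if',
    test: 'x',
    consequent: [{ type: 'return' }],
    alternate: [{ type: 'expr', text: 'b()' }],
  },
  { type: 'expr', text: 'c()' },
];
const cfg = buildToyCFG(example);
console.log(cfg.blocks.length, cfg.edges.length); // 5 5
```

Loops, switch, and try/catch/finally would need extra context (loop headers, exit targets, handler blocks) that this sketch leaves out.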

Test plan

24 unit tests covering all CFG construction patterns (empty, if/else, loops, break/continue, switch, try/catch/finally, nested structures, arrow functions)
11 integration tests (query, DOT export, Mermaid export, error cases)
Full test suite passes (1218 tests, 0 failures)
Lint clean (0 errors in changed files)
Dogfood: build --cfg analyzed 447 functions; cfg buildGraph shows 181 blocks / 238 edges
All 3 output formats verified (text, DOT, Mermaid)
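
As a side note on the dogfood numbers, block and edge counts give a quick complexity estimate through the standard cyclomatic formula M = E - N + 2P. The sketch below assumes the `buildGraph` CFG is a single connected graph with one entry and one exit (P = 1); the PR does not say whether codegraph's own complexity metric is derived this way, and `cyclomaticComplexity` is a hypothetical helper.

```javascript
// Cyclomatic complexity from CFG counts: M = E - N + 2P.
// Assumes P connected components, each with a single entry and exit.
function cyclomaticComplexity(blockCount, edgeCount, components = 1) {
  return edgeCount - blockCount + 2 * components;
}

// Dogfood figures from the test plan: 181 blocks, 238 edges.
console.log(cyclomaticComplexity(181, 238)); // 59
```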

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* docs: add Contributor License Agreement (CLA)
* ci: add CLA Assistant workflow and fix CLA.md issues
  - Add .github/workflows/cla.yml using contributor-assistant/github-action@v2.6.1 with dedicated cla-signatures branch to avoid polluting main
  - Fix CLA.md: section 7→6 reference, capitalization consistency, control definition formatting (Roman numerals → lettered list)
  - Add Acceptance section documenting the CLA bot signing process
  - Add Governing Law clause (Province of Alberta, Canada)
  - Update CONTRIBUTING.md with CLA signing instructions
* docs: document CLA recheck command in CONTRIBUTING.md
  Address Greptile review feedback on #244: add note that contributors can comment `recheck` on a PR to re-trigger the CLA signature check.

* feat: add dataflow analysis (flows_to, returns, mutates edges)
  Track how data moves through functions with three new edge types:
  - flows_to: parameter/variable flows into another function as argument
  - returns: call return value is captured by the caller
  - mutates: parameter-derived value is mutated in-place
  Opt-in via `build --dataflow` (JS/TS only for MVP). Adds schema migration v10 (dataflow table), extractDataflow() AST walker with scope tracking and confidence scoring, query functions (dataflowData, dataflowPathData, dataflowImpactData), CLI command with --path and --impact modes, MCP tool, batch support, and programmatic API exports.
  Impact: 29 functions changed, 33 affected
* fix: handle spread args, optional chaining, and reassignment in dataflow
  Address review feedback from Greptile:
  - Track spread arguments (foo(...args)) by unwrapping spread_element
  - Handle optional chaining (foo?.bar()) in callee name resolution
  - Track non-declaration assignments (x = foo() without const/let/var) as returns edges
  - Add 3 tests covering these cases
  Impact: 3 functions changed, 3 affected

Insert Phase 4 (TypeScript Migration) between the architectural refactoring phase and the intelligent embeddings phase. Renumber all subsequent phases (old 4-8 → new 5-9) including sub-section headings, cross-references, dependency graph, and verification table. The migration is planned after Phase 3 because the architectural refactoring establishes clean module boundaries that serve as natural type boundaries for incremental TS adoption.

* feat(watcher): add structured NDJSON change journal for watch mode
  Write symbol-level change events to .codegraph/change-events.ndjson during watch mode. Each line records added/removed/modified symbols with node counts and edge data, enabling external tools to detect rule staleness without polling. File is size-capped at 1 MB with keep-last-half rotation.
  Impact: 8 functions changed, 4 affected
* style: use template literals per biome lint
  Impact: 1 functions changed, 1 affected
* feat(queries): expose file content hash in where and query JSON output
  Add fileHash field to queryNameData, whereSymbolImpl, and whereFileImpl return objects by looking up the file_hashes table. This lets consumers (e.g. code-praxis) detect when a rule's target file has changed since the rule was created, enabling staleness detection.
  Impact: 4 functions changed, 16 affected
* style: use template literal in test fixture
* fix(change-journal): add debug/warn logging for observability
  Address review feedback: debug log on successful append and rotation, warn when a single oversized line prevents rotation.
  Impact: 2 functions changed, 2 affected
* style: fix biome formatting in change-journal warn call
  Impact: 1 functions changed, 1 affected
* fix(change-journal): use Buffer for byte-accurate rotation midpoint
  stat.size returns bytes but String.length counts UTF-16 characters. Read as Buffer and use buf.indexOf(0x0a) to find the newline at the byte-level midpoint, ensuring consistent behavior with multi-byte UTF-8.
  Impact: 1 functions changed, 1 affected
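
The byte-accurate rotation described in the last change-journal fix can be sketched as follows. This is an assumption-laden illustration, not the code from the change-journal module: `keepLastHalf` is a hypothetical name, and the choice to leave the buffer untouched when no usable newline exists is a guess at the "single oversized line" case mentioned in the log.

```javascript
// Keep-last-half rotation sketch for an NDJSON file held in a Buffer.
// Works on bytes, so multi-byte UTF-8 characters cannot shift the midpoint.
function keepLastHalf(buf) {
  const mid = Math.floor(buf.length / 2);
  // First newline (0x0a) at or after the byte midpoint.
  const newline = buf.indexOf(0x0a, mid);
  // No newline past the midpoint, or only the final one: nothing can be cut
  // without splitting a line, so return the buffer unchanged.
  if (newline === -1 || newline === buf.length - 1) return buf;
  return buf.subarray(newline + 1);
}

// String.length counts UTF-16 code units, while Buffer length counts bytes.
const journalLine = '{"a":"é"}';
console.log(journalLine.length, Buffer.byteLength(journalLine, 'utf8')); // 9 10
```

Cutting at a newline boundary keeps every surviving line a complete JSON document, which is what NDJSON consumers rely on.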

* feat: add batch-query command and multi-command batch mode
  Add splitTargets() for comma-separated target expansion, multiBatchData() for mixed-command orchestration, and a new batch-query CLI command that defaults to 'where'. The existing batch command also gains comma splitting and multi-command detection via --from-file/--stdin.
  Impact: 5 functions changed, 3 affected
* fix: add try/catch around JSON.parse in batch and batch-query actions
  Wrap --from-file and --stdin JSON parsing with error handling so malformed input produces a clear error message instead of an unhandled exception.
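
A rough sketch of the two behaviours named in these commits, comma-separated target expansion and guarded JSON parsing. The real splitTargets() signature is not shown in the PR, so `splitTargetsSketch` and `parseBatchInput` are hypothetical stand-ins with assumed inputs and return shapes.

```javascript
// Expand comma-separated targets: ['a,b', 'c'] -> ['a', 'b', 'c'].
// Hypothetical stand-in for splitTargets(); the real signature may differ.
function splitTargetsSketch(targets) {
  return targets
    .flatMap((target) => target.split(','))
    .map((name) => name.trim())
    .filter(Boolean); // drop empty entries such as a trailing comma
}

// Parse --from-file / --stdin content without letting a SyntaxError escape.
function parseBatchInput(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: `Invalid JSON input: ${err.message}` };
  }
}
```

Returning a result object rather than throwing lets the CLI action decide how to print the message and which exit code to use.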

* docs: add competitive deep-dive for Joern and reorganize competitive folder Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a comprehensive feature-by-feature comparison against joernio/joern (our #1-ranked competitor). Covers parsing, graph model, query language, performance, installation, AI/MCP integration, security analysis, developer productivity, and ecosystem across 100+ individual features. Update FOUNDATION.md reference to the new path. * fix: update broken links to moved COMPETITIVE_ANALYSIS.md README.md and docs/roadmap/BACKLOG.md still referenced the old path at generated/COMPETITIVE_ANALYSIS.md after the file was moved to generated/competitive/COMPETITIVE_ANALYSIS.md in #260.

Reduce MCP tool surface from 32 to 29 by merging overlapping tools:
- Rename query_function → query with deps/path modes (absorbs fn_deps + symbol_path)
- Add list mode to execution_flow (absorbs list_entry_points)
- Remove path mode from dataflow tool (now edges + impact only)
- Merge fn and path CLI commands into query (--path flag for path mode)
- Remove --path option from dataflow CLI command
- Update batch commands: remove fn, add dataflow, query uses fnDepsData
- Update MCP_DEFAULTS pagination keys

BREAKING CHANGE: MCP tools fn_deps, symbol_path, list_entry_points removed. CLI commands fn and path removed. Use query instead.

Impact: 1 functions changed, 1 affected

* docs: add competitive deep-dive for narsil-mcp Comprehensive feature-by-feature analysis of narsil-mcp (postrv/narsil-mcp), the closest head-to-head competitor to codegraph. Covers all 8 FOUNDATION.md principles, 9 feature comparison sections with 130+ features, gap analysis, and competitive positioning. * fix: address Greptile review — scoring math and relative path - Fix principle scoring from 6-0-2 to 7-0-1 (correct count from table) - Fix relative link to COMPETITIVE_ANALYSIS.md (../ not ./)

* docs: add competitive deep-dive for Joern and reorganize competitive folder Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a comprehensive feature-by-feature comparison against joernio/joern (our #1-ranked competitor). Covers parsing, graph model, query language, performance, installation, AI/MCP integration, security analysis, developer productivity, and ecosystem across 100+ individual features. Update FOUNDATION.md reference to the new path. * fix: update broken links to moved COMPETITIVE_ANALYSIS.md README.md and docs/roadmap/BACKLOG.md still referenced the old path at generated/COMPETITIVE_ANALYSIS.md after the file was moved to generated/competitive/COMPETITIVE_ANALYSIS.md in #260. * docs: add Joern-inspired feature candidates with BACKLOG-style grading Append a new "Joern-Inspired Feature Candidates" section to the Joern competitive deep-dive. Lists 11 actionable features extracted from Parsing & Language Support, Graph Model & Analysis Depth, and Query Language & Interface sections — assessed with the same tier/grading system used in BACKLOG.md (zero-dep, foundation-aligned, problem-fit, breaking). Tier 1 non-breaking: call-chain slicing, type-informed resolution, error-tolerant parsing, regex filtering, Kotlin, Swift, script execution. Tier 1 breaking: expanded node/edge types, intraprocedural CFG, stored AST. Not adopted: 9 features with FOUNDATION.md reasoning. Cross-references BACKLOG IDs 14 and 7.

Add normalizeSymbol(row, db, hashCache) that returns a consistent 7-field symbol shape (name, kind, file, line, endLine, role, fileHash) across all query and search commands. Update queryNameData, fnDepsData, fnImpactData, explainFunctionImpl, listFunctionsData, rolesData, whereSymbolImpl in queries.js and searchData, multiSearchData, ftsSearchData, hybridSearchData in embedder.js to use normalizeSymbol. Update SQL in listFunctionsData, rolesData, iterListFunctions, iterRoles, _prepareSearch, and ftsSearchData to include end_line and role columns. Export normalizeSymbol from index.js. Add docs/json-schema.md documenting the stable schema. Add 8 unit tests and 7 integration schema conformance tests. Impact: 13 functions changed, 33 affected Impact: 14 functions changed, 42 affected

Add normalizeSymbol(row, db, hashCache) that returns a consistent 7-field symbol shape (name, kind, file, line, endLine, role, fileHash) across all query and search commands. Update queryNameData, fnDepsData, fnImpactData, explainFunctionImpl, listFunctionsData, rolesData, whereSymbolImpl in queries.js and searchData, multiSearchData, ftsSearchData, hybridSearchData in embedder.js to use normalizeSymbol. Update SQL in listFunctionsData, rolesData, iterListFunctions, iterRoles, _prepareSearch, and ftsSearchData to include end_line and role columns. Export normalizeSymbol from index.js. Add docs/json-schema.md documenting the stable schema. Add 8 unit tests and 7 integration schema conformance tests. Impact: 13 functions changed, 33 affected Impact: 14 functions changed, 42 affected Impact: 13 functions changed, 21 affected

Re-evaluate all architectural recommendations against the actual codebase as it grew from v1.4.0 (5K lines, 12 modules) to v2.6.0 (17,830 lines, 35 modules). Architecture audit: - Reprioritize: dual-function anti-pattern across 15 modules is now #1 (was analysis/formatting split at #3) - Downgrade parser plugin system from #1 to #20 (parser.js shrank to 404 lines after native engine took over) - Add 3 new recommendations: decompose complexity.js (2,163 lines), unified graph model for structure/cochange/communities, pagination standardization - Update all metrics and line counts to current state Roadmap: - Add Phase 2.5 (Analysis Expansion) documenting 18 modules shipped across v2.0.0-v2.6.0 (complexity, communities, structure, flow, cochange, manifesto, boundaries, check, audit, batch, triage, hybrid search, owners, snapshot, etc.) - Mark Phase 5.3 (Hybrid Search) as completed early in Phase 2.5 - Update Phase 3 priorities based on revised architecture analysis - Update version to 2.6.0, language count to 11, phase count to 10 - Add Phase 8 note referencing check command foundation from 2.5

) * docs: add competitive deep-dive for Joern and reorganize competitive folder Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a comprehensive feature-by-feature comparison against joernio/joern (our #1-ranked competitor). Covers parsing, graph model, query language, performance, installation, AI/MCP integration, security analysis, developer productivity, and ecosystem across 100+ individual features. Update FOUNDATION.md reference to the new path. * fix: update broken links to moved COMPETITIVE_ANALYSIS.md README.md and docs/roadmap/BACKLOG.md still referenced the old path at generated/COMPETITIVE_ANALYSIS.md after the file was moved to generated/competitive/COMPETITIVE_ANALYSIS.md in #260. * docs: add Joern-inspired feature candidates with BACKLOG-style grading Append a new "Joern-Inspired Feature Candidates" section to the Joern competitive deep-dive. Lists 11 actionable features extracted from Parsing & Language Support, Graph Model & Analysis Depth, and Query Language & Interface sections — assessed with the same tier/grading system used in BACKLOG.md (zero-dep, foundation-aligned, problem-fit, breaking). Tier 1 non-breaking: call-chain slicing, type-informed resolution, error-tolerant parsing, regex filtering, Kotlin, Swift, script execution. Tier 1 breaking: expanded node/edge types, intraprocedural CFG, stored AST. Not adopted: 9 features with FOUNDATION.md reasoning. Cross-references BACKLOG IDs 14 and 7. * docs: add competitive deep-dive for Narsil-MCP with feature candidates Comprehensive comparison across 10 dimensions: parsing (32 vs 11 languages), graph model (CFG/DFG/type inference vs complexity/roles/ communities), search (similarity/chunking vs RRF hybrid), security (147 rules vs none), queries (90 tools vs 21 + compound commands), performance (cold start vs incremental), install, MCP integration, developer productivity, and ecosystem. 
Feature candidates section covers all comparison sections: - Tier 1 non-breaking (10): MCP presets, AST chunking, code similarity, git blame/symbol history, remote repo indexing, config wizard, Kotlin, Swift, Bash, Scala language support - Tier 1 breaking (1): export map per module - Tier 2 (2): interactive HTML viz, multiple embedding backends - Tier 3 (2): OWASP patterns, SBOM generation - Not adopted (10): taint, type inference, SPARQL/RDF, CCG, in-memory arch, 90-tool surface, browser WASM, Forgemax, LSP, license scanning - Cross-references to BACKLOG IDs 7, 8, 10, 14 and Joern candidates J4, J5, J8, J9

…se 2) Build file→definition and parent→child contains edges, parameter_of inverse edges, and receiver edges for method-call dispatch. Add CORE_EDGE_KINDS, STRUCTURAL_EDGE_KINDS, EVERY_EDGE_KIND constants. Exclude structural edges from moduleMapData coupling counts. Scope directory contains-edge cleanup to preserve symbol-level edges. Impact: 3 functions changed, 22 affected

Add show-diff-impact.sh that automatically runs `codegraph diff-impact --staged -T` before git commit commands. The hook injects blast radius info as additionalContext — informational only, never blocks commits.

…#268) * feat(export): add GraphML, GraphSON, Neo4j CSV formats and interactive HTML viewer Add three new export formats for graph database interoperability: - GraphML (XML standard) with file-level and function-level modes - GraphSON (TinkerPop v3) for Gremlin/JanusGraph compatibility - Neo4j CSV (bulk import) with separate nodes/relationships files Add interactive HTML viewer (`codegraph plot`) powered by vis-network: - Hierarchical, force, and radial layouts with physics toggle - Node coloring by kind or role, search/filter, legend panel - Configurable via .plotDotCfg JSON file Update CLI export command, MCP export_graph tool, and programmatic API to support all six formats. Impact: 12 functions changed, 6 affected * feat(plot): add drill-down, clustering, complexity overlays, and detail panel Evolve the plot command from a static viewer into an interactive exploration tool with rich data overlays and navigation. Data preparation: - Extract prepareGraphData() with complexity, fan-in/fan-out, Louvain community detection, directory derivation, and risk flag computation - Seed strategies: all (default), top-fanin, entry Interactive features: - Detail sidebar: metrics, callers/callees lists, risk badges - Drill-down: click-to-expand / double-click-to-collapse neighbors - Clustering: community and directory grouping via vis-network API - Color by: kind, role, community, complexity (MI-based borders) - Size by: uniform, fan-in, fan-out, complexity - Risk overlay: dead-code (dashed), high-blast-radius (shadow), low-MI CLI options: - --cluster, --overlay, --seed, --seed-count, --size-by, --color-by Tests expanded from 7 to 21 covering all new data enrichment, seed strategies, risk flags, UI elements, and config backward compatibility. 
Impact: 5 functions changed, 3 affected * fix(test): update MCP export_graph enum to include new formats The previous commit added graphml, graphson, and neo4j export formats to the MCP tool definition but did not update the test assertion. * style: format mcp test after enum update * fix(security): escape config values in HTML template to prevent XSS Use JSON.stringify() for cfg.layout.direction, effectiveColorBy, and cfg.clusterBy when interpolated into inline JavaScript. Replace shell exec() with execFile() for browser-open to avoid path injection. Impact: 1 functions changed, 1 affected
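
To illustrate the XSS fix above: JSON.stringify() turns a config value into a quoted, escaped JavaScript literal, so quotes in the value cannot end the string early. The sketch below is not the template code from the PR. `toInlineJs` is a hypothetical helper, and the extra `<` escaping is an added precaution assumed here (JSON.stringify() alone leaves `</script>` intact); the commit only mentions JSON.stringify().

```javascript
// Serialize a value for embedding inside an inline <script> block.
// JSON.stringify escapes quotes and backslashes; escaping "<" as well stops a
// value containing "</script>" from closing the tag (assumption, not from the PR).
function toInlineJs(value) {
  return JSON.stringify(value ?? null).replace(/</g, '\\u003c');
}

const direction = '"; alert(1); //</script>';
const html = `<script>const direction = ${toInlineJs(direction)};</script>`;
```

The browser still sees a single string literal, and the JavaScript parser decodes `\u003c` back to `<` at runtime.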

* feat(export): add GraphML, GraphSON, Neo4j CSV formats and interactive HTML viewer Add three new export formats for graph database interoperability: - GraphML (XML standard) with file-level and function-level modes - GraphSON (TinkerPop v3) for Gremlin/JanusGraph compatibility - Neo4j CSV (bulk import) with separate nodes/relationships files Add interactive HTML viewer (`codegraph plot`) powered by vis-network: - Hierarchical, force, and radial layouts with physics toggle - Node coloring by kind or role, search/filter, legend panel - Configurable via .plotDotCfg JSON file Update CLI export command, MCP export_graph tool, and programmatic API to support all six formats. Impact: 12 functions changed, 6 affected * feat(plot): add drill-down, clustering, complexity overlays, and detail panel Evolve the plot command from a static viewer into an interactive exploration tool with rich data overlays and navigation. Data preparation: - Extract prepareGraphData() with complexity, fan-in/fan-out, Louvain community detection, directory derivation, and risk flag computation - Seed strategies: all (default), top-fanin, entry Interactive features: - Detail sidebar: metrics, callers/callees lists, risk badges - Drill-down: click-to-expand / double-click-to-collapse neighbors - Clustering: community and directory grouping via vis-network API - Color by: kind, role, community, complexity (MI-based borders) - Size by: uniform, fan-in, fan-out, complexity - Risk overlay: dead-code (dashed), high-blast-radius (shadow), low-MI CLI options: - --cluster, --overlay, --seed, --seed-count, --size-by, --color-by Tests expanded from 7 to 21 covering all new data enrichment, seed strategies, risk flags, UI elements, and config backward compatibility. Impact: 5 functions changed, 3 affected * fix(test): update MCP export_graph enum to include new formats The previous commit added graphml, graphson, and neo4j export formats to the MCP tool definition but did not update the test assertion. 
* style: format mcp test after enum update * fix(security): escape config values in HTML template to prevent XSS Use JSON.stringify() for cfg.layout.direction, effectiveColorBy, and cfg.clusterBy when interpolated into inline JavaScript. Replace shell exec() with execFile() for browser-open to avoid path injection. Impact: 1 functions changed, 1 affected * docs: add check-readme hook to recommended practices and guides Document the new check-readme.sh hook across all three doc locations: recommended-practices.md, ai-agent-guide.md, and the hooks example README. Adds settings.json examples, hook behavior descriptions, and customization entries.

* docs: add competitive deep-dive for Joern and reorganize competitive folder Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a comprehensive feature-by-feature comparison against joernio/joern (our #1-ranked competitor). Covers parsing, graph model, query language, performance, installation, AI/MCP integration, security analysis, developer productivity, and ecosystem across 100+ individual features. Update FOUNDATION.md reference to the new path. * fix: update broken links to moved COMPETITIVE_ANALYSIS.md README.md and docs/roadmap/BACKLOG.md still referenced the old path at generated/COMPETITIVE_ANALYSIS.md after the file was moved to generated/competitive/COMPETITIVE_ANALYSIS.md in #260. * docs: add Joern-inspired feature candidates with BACKLOG-style grading Append a new "Joern-Inspired Feature Candidates" section to the Joern competitive deep-dive. Lists 11 actionable features extracted from Parsing & Language Support, Graph Model & Analysis Depth, and Query Language & Interface sections — assessed with the same tier/grading system used in BACKLOG.md (zero-dep, foundation-aligned, problem-fit, breaking). Tier 1 non-breaking: call-chain slicing, type-informed resolution, error-tolerant parsing, regex filtering, Kotlin, Swift, script execution. Tier 1 breaking: expanded node/edge types, intraprocedural CFG, stored AST. Not adopted: 9 features with FOUNDATION.md reasoning. Cross-references BACKLOG IDs 14 and 7. * docs: add competitive deep-dive for Narsil-MCP with feature candidates Comprehensive comparison across 10 dimensions: parsing (32 vs 11 languages), graph model (CFG/DFG/type inference vs complexity/roles/ communities), search (similarity/chunking vs RRF hybrid), security (147 rules vs none), queries (90 tools vs 21 + compound commands), performance (cold start vs incremental), install, MCP integration, developer productivity, and ecosystem. 
Feature candidates section covers all comparison sections: - Tier 1 non-breaking (10): MCP presets, AST chunking, code similarity, git blame/symbol history, remote repo indexing, config wizard, Kotlin, Swift, Bash, Scala language support - Tier 1 breaking (1): export map per module - Tier 2 (2): interactive HTML viz, multiple embedding backends - Tier 3 (2): OWASP patterns, SBOM generation - Not adopted (10): taint, type inference, SPARQL/RDF, CCG, in-memory arch, 90-tool surface, browser WASM, Forgemax, LSP, license scanning - Cross-references to BACKLOG IDs 7, 8, 10, 14 and Joern candidates J4, J5, J8, J9 * feat: add dedicated `exports <file>` command with per-symbol consumers Implements feature N11 from the Narsil competitive analysis. The new command provides a focused export map showing which symbols a file exports and who calls each one, filling the gap between `explain` (public/internal split without consumers) and `where --file` (just export names). Adds exportsData/fileExports to queries.js, CLI command, MCP tool, batch support, programmatic API, and integration tests. Impact: 7 functions changed, 15 affected * feat: add scoped rebuild for parallel agent rollback Extract purgeFilesFromGraph() from the inline deletion cascade in buildGraph() for reuse. Add opts.scope and opts.noReverseDeps to buildGraph() so agents can surgically rebuild only their changed files without nuking other agents' graph state. 
- `--scope <files...>` on `build` skips collectFiles/getChangedFiles - `--no-reverse-deps` skips reverse-dep cascade (safe when exports unchanged) - New `scoped_rebuild` MCP tool for multi-agent orchestration - purgeFilesFromGraph exported from programmatic API - Unit tests for purge function, integration tests for scoped rebuild - Documented agent-level rollback workflow in titan-paradigm.md Impact: 3 functions changed, 20 affected * fix: remove leaked scoped_rebuild changes from another session Reverts purgeFilesFromGraph export, --scope/--no-reverse-deps CLI options, scoped_rebuild MCP tool+handler, and test list entry that were accidentally included from a concurrent session's dirty worktree. Impact: 2 functions changed, 1 affected * fix: remove stale scoped-rebuild docs from titan-paradigm The scoped_rebuild feature (--scope, --no-reverse-deps CLI options and scoped_rebuild MCP tool) was removed in 651ddb2 but the documentation in titan-paradigm.md still referenced it. Addresses Greptile review feedback on PR #269.

Add opt-in CFG analysis that builds basic-block control flow graphs from tree-sitter AST for individual functions. Enables complexity-aware impact analysis and opens the path to dataflow (def-use chains).
- DB migration v12: cfg_blocks + cfg_edges tables
- New src/cfg.js module: CFG_RULES, buildFunctionCFG, buildCFGData, cfgData, cfgToDOT, cfgToMermaid, cfg CLI printer
- Builder integration: --cfg flag triggers CFG after complexity pass
- CLI: `cfg <name>` command with --format text/dot/mermaid, -j, --ndjson
- MCP: cfg tool with name, format, file, kind, pagination props
- Exports findFunctionNode from complexity.js for reuse
- 24 unit tests + 11 integration tests (35 total)

Phase 1: JS/TS/TSX only. Handles if/else, for/while/do-while, switch, try/catch/finally, break/continue (with labels), return/throw.

Impact: 27 functions changed, 36 affected

greptile-apps · 2026-03-03T03:31:15Z

Greptile Summary

Added intraprocedural control flow graph (CFG) analysis as an opt-in feature for JavaScript, TypeScript, and TSX files. The implementation constructs basic-block CFGs from tree-sitter AST for individual functions, handling all major control structures including if/else, loops (for/while/do-while), switch statements, try/catch/finally, and break/continue with label support.

New modules: cfg.js (1035 lines) for CFG construction and querying, ast.js (392 lines) for AST node extraction
Database: Migration v12 adds cfg_blocks and cfg_edges tables with proper foreign keys and indexes
CLI: New cfg <name> command with text/dot/mermaid output formats and pagination
MCP: New cfg and ast_query tools for AI agent access
Extractors: Enhanced all language extractors to capture function parameters and class properties as children nodes
Test coverage: 35 new tests (24 unit + 11 integration) covering all CFG patterns

The implementation follows the existing codebase patterns, uses parameterized SQL queries (no injection risk), and includes comprehensive error handling. All tests pass (1218 total, 0 failures) and the feature was successfully dogfooded on the repository itself (447 functions analyzed).

Confidence Score: 5/5

This PR is safe to merge with no blocking issues found
Comprehensive implementation with excellent test coverage (35 new tests, 1218 total passing). Code follows existing patterns, uses safe SQL practices, includes proper error handling, and was successfully dogfooded. Phase 1 scope is well-defined and properly implemented.
No files require special attention

Important Files Changed

Filename	Overview
src/cfg.js	New CFG construction module with comprehensive control flow analysis for JS/TS/TSX functions
src/db.js	Added migration v12 for cfg_blocks and cfg_edges tables with proper indexes and foreign keys
src/builder.js	Integrated CFG analysis as opt-in feature with --cfg flag, added parent_id support for nodes hierarchy
src/cli.js	Added cfg command with text/dot/mermaid output formats and pagination support
src/mcp.js	Added cfg and ast_query MCP tools for AI agent access to CFG and AST data
src/ast.js	New module for extracting and querying stored AST nodes (calls, new, string, regex, throw, await)
src/extractors/javascript.js	Enhanced to extract function parameters, class properties, and enum members as children
tests/unit/cfg.test.js	Comprehensive unit tests covering all CFG patterns (24 tests for control structures)
tests/integration/cfg.test.js	Integration tests for CFG query, DOT/Mermaid export, and error handling
src/queries.js	Extended symbol kinds to include all language-specific types (struct, trait, module, enum)

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User runs build --cfg] --> B[builder.js: buildGraph]
    B --> C[Parse files with tree-sitter]
    C --> D[Extract symbols via extractors]
    D --> E[Store nodes in DB]
    E --> F[ast.js: buildAstNodes]
    F --> G[cfg.js: buildCFGData]
    G --> H{For each function}
    H --> I[Get function AST node]
    I --> J[buildFunctionCFG]
    J --> K[Process statements recursively]
    K --> L[Create basic blocks]
    L --> M[Create edges: fallthrough, branch, loop, exception]
    M --> N[Store cfg_blocks and cfg_edges in DB]
    N --> H
    
    O[User runs cfg command] --> P[cfg.js: cfgData]
    P --> Q[Query cfg_blocks and cfg_edges]
    Q --> R{Output format?}
    R -->|text| S[Print blocks and edges]
    R -->|dot| T[cfgToDOT: Generate Graphviz]
    R -->|mermaid| U[cfgToMermaid: Generate diagram]
    R -->|json| V[Return JSON data]
    
    W[AI agent via MCP] --> X[mcp.js: cfg tool]
    X --> P

Last reviewed commit: d04a96a

greptile-apps

31 files reviewed, 1 comment


greptile-apps · 2026-03-03T03:31:18Z

src/cfg.js

+      // break outside loop (switch case) — just continue
+      return currentBlock;


Misleading comment: this code path handles `break` statements that have no enclosing loop or switch (which would be invalid syntax), not breaks inside switch cases.

Suggested change

-      // break outside loop (switch case) — just continue
-      return currentBlock;
+      // break with no enclosing loop/switch — treat as no-op
+      return currentBlock;

Fixed in d04a96a: updated the comment to `// break with no enclosing loop/switch — treat as no-op` as suggested. Good catch, the original wording was misleading.
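
For context on the code path under discussion, here is a minimal sketch of how a `break` target might be resolved against a stack of enclosing loops and switches, with the no-op fallback when nothing encloses it. `resolveBreakTarget`, the stack shape, and the block names are assumptions for illustration; the actual bookkeeping in src/cfg.js is not shown in this thread.

```javascript
// Resolve where a `break` jumps. `breakableStack` lists the enclosing
// loops/switches, innermost last; each entry has an optional label and the
// block that follows the construct. Hypothetical shape, not src/cfg.js.
function resolveBreakTarget(breakableStack, label) {
  for (let i = breakableStack.length - 1; i >= 0; i--) {
    const ctx = breakableStack[i];
    // Unlabeled break: innermost construct. Labeled break: matching label.
    if (label === undefined || ctx.label === label) return ctx.exitBlock;
  }
  // No enclosing loop/switch (or unknown label): the caller treats the
  // break as a no-op and keeps building in the current block.
  return null;
}

const stack = [
  { label: 'outer', exitBlock: 'afterOuterLoop' },
  { label: undefined, exitBlock: 'afterInnerSwitch' },
];
console.log(resolveBreakTarget(stack));          // afterInnerSwitch
console.log(resolveBreakTarget(stack, 'outer')); // afterOuterLoop
console.log(resolveBreakTarget([]));             // null
```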

…w, await)
Persist selected AST nodes in a dedicated ast_nodes SQLite table during build, queryable via CLI (codegraph ast), MCP (ast_query), and programmatic API.
- DB migration v13: ast_nodes table with indexes on kind, name, file, parent, and (kind,name)
- New src/ast.js module: buildAstNodes (extraction), astQueryData/astQuery (query), AST_NODE_KINDS constant
- Builder integration: full-rebuild deletion, incremental cleanup, always-on post-parse extraction (before complexity to preserve _tree)
- CLI: codegraph ast [pattern] with -k, -f, -T, -j, --ndjson, --limit, --offset options
- MCP: ast_query tool with pattern, kind, file, no_tests, pagination
- JS/TS/TSX Phase 1: full AST walk for new/throw/await/string/regex; all languages get call nodes from symbols.calls
- Pattern matching uses SQL GLOB with auto-wrapping for substring search
- Parent node resolution via narrowest enclosing definition

Impact: 12 functions changed, 26 affected
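
The GLOB auto-wrapping mentioned in this commit could work roughly like the sketch below: a pattern with no GLOB metacharacters is wrapped in `*` so it matches as a substring, while an explicit pattern is passed through. The exact rule used in src/ast.js is not shown here, so `toGlobPattern` and its metacharacter check are assumptions.

```javascript
// Wrap a plain search term for SQLite GLOB so it matches as a substring.
// SQLite GLOB treats *, ?, and [...] as wildcards; if the caller already
// used one, the pattern is left as written. Hypothetical helper.
function toGlobPattern(pattern) {
  return /[*?[]/.test(pattern) ? pattern : `*${pattern}*`;
}

console.log(toGlobPattern('fetch')); // *fetch*
console.log(toGlobPattern('get*'));  // get*
```

The resulting string would then be bound as a query parameter (for example `WHERE name GLOB ?`) rather than concatenated into the SQL text.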

The comment incorrectly suggested this code path handled break inside switch cases. It actually handles break with no enclosing loop/switch context (invalid syntax) as a no-op. Impact: 2 functions changed, 9 affected

carlos-alm · 2026-03-03T04:18:11Z

@greptileai

Update tool names and counts to match actual MCP server output:
- query_function → query, fn_deps/symbol_path removed (merged into query)
- list_entry_points removed (merged into execution_flow)
- Add missing tools: ast_query, cfg, dataflow, symbol_children
- Fix count: 31 tools (32 in multi-repo mode)

…eat/cfg-intraprocedural Impact: 2 functions changed, 16 affected

* feat: expand node types with parameter, property, constant kinds (Phase 1) Add sub-declaration node extraction to all 9 WASM language extractors, enabling structural queries like "which functions take a Request param?" or "which classes have a userId field?" without reading source code. Schema: migration v11 adds nullable parent_id column with indexes. Builder: insertNode links children to parent via parent_id FK. Extractors: JS/TS, Python, Go, Rust, Java, C#, Ruby, PHP, HCL now emit children arrays for parameters, properties, and constants. Queries: new childrenData() function, children in contextData output. CLI: new `children` command, EVERY_SYMBOL_KIND validation on --kind. MCP: new `symbol_children` tool, extended kind enum on all kind fields. Constants: CORE_SYMBOL_KINDS (10), EXTENDED_SYMBOL_KINDS (3), EVERY_SYMBOL_KIND (13). ALL_SYMBOL_KINDS preserved for backward compat. Native Rust engine: Definition struct gains children field but actual extraction is deferred to Phase 2 — WASM fallback handles new kinds. Impact: 63 functions changed, 62 affected * feat: add expanded edge types — contains, parameter_of, receiver (Phase 2) Build file→definition and parent→child contains edges, parameter_of inverse edges, and receiver edges for method-call dispatch. Add CORE_EDGE_KINDS, STRUCTURAL_EDGE_KINDS, EVERY_EDGE_KIND constants. Exclude structural edges from moduleMapData coupling counts. Scope directory contains-edge cleanup to preserve symbol-level edges. Impact: 3 functions changed, 22 affected * fix(native): add missing children field to all Rust extractors The Definition struct gained a children field but no extractor was updated to include it, causing 50 compilation errors. Add children: None to every Definition initializer across all 9 language extractors. Also fix unused variable warnings in parser_registry.rs and parallel.rs. Impact: 13 functions changed, 10 affected * ci: trigger workflow re-run

Impact: 37 functions changed, 50 affected

claude · 2026-03-03T04:29:52Z

Claude finished @carlos-alm's task

PR Review: Intraprocedural Control Flow Graph (CFG)

Todo List:

Phase 1: Root Cause Analysis - Understand problem and approach
Phase 2: Configuration Change Detection - Check for rule relaxations
Phase 3: Test Integrity - Verify test quality and coverage
Phase 4: Code Quality Review - Analyze implementation details
Final Assessment - Provide rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Adding intraprocedural control flow graph (CFG) analysis to codegraph for enhanced static analysis capabilities. This enables basic-block level analysis within individual functions, supporting advanced use cases like dataflow analysis, vulnerability detection, and more sophisticated code understanding.

Why This Approach: The approach is well-justified:

Opt-in design (--cfg flag) prevents performance impact for existing users
Phase 1 scope (JS/TS/TSX only) is reasonable for initial implementation
Separate tables (cfg_blocks, cfg_edges) keep CFG data isolated from interprocedural graph
tree-sitter integration leverages existing AST parsing infrastructure
Standard CFG concepts using basic blocks and standard edge types (fallthrough, branch, loop, exception)
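To make the basic-block vocabulary above concrete, here is a minimal sketch of what a CFG for a single if/else could look like in memory. The helpers `makeCfg`, `addBlock`, and `addEdge`, and the field names on blocks and edges, are hypothetical and chosen for illustration; they are not taken from the PR's implementation. Only the edge kind names (fallthrough, branch) come from the list above.

```javascript
// Hypothetical in-memory CFG; not the PR's actual data model.
function makeCfg() {
  return { blocks: [], edges: [] };
}

// Append a basic block and return it; `index` is its position in the list.
function addBlock(cfg, type, label) {
  const block = { index: cfg.blocks.length, type, label };
  cfg.blocks.push(block);
  return block;
}

// Record a directed edge between two blocks with an edge kind.
function addEdge(cfg, from, to, kind) {
  cfg.edges.push({ from: from.index, to: to.index, kind });
}

// CFG for: function f(x) { if (x) { a(); } else { b(); } c(); }
function buildIfElseExample() {
  const cfg = makeCfg();
  const entry = addBlock(cfg, 'entry', 'entry');
  const cond = addBlock(cfg, 'condition', 'if (x)');
  const thenBlock = addBlock(cfg, 'body', 'a()');
  const elseBlock = addBlock(cfg, 'body', 'b()');
  const join = addBlock(cfg, 'body', 'c()');
  const exit = addBlock(cfg, 'exit', 'exit');

  addEdge(cfg, entry, cond, 'fallthrough');
  addEdge(cfg, cond, thenBlock, 'branch'); // condition is truthy
  addEdge(cfg, cond, elseBlock, 'branch'); // condition is falsy
  addEdge(cfg, thenBlock, join, 'fallthrough');
  addEdge(cfg, elseBlock, join, 'fallthrough');
  addEdge(cfg, join, exit, 'fallthrough');
  return cfg;
}

const exampleCfg = buildIfElseExample();
console.log(`${exampleCfg.blocks.length} blocks / ${exampleCfg.edges.length} edges`);
```

Keeping blocks and edges as two flat lists mirrors the two-table split (cfg_blocks, cfg_edges) described in the PR summary, though the column layout of those tables is not shown in this thread.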

Risk Assessment: LOW RISK - This is an additive feature with proper isolation and comprehensive testing.

Phase 2: Configuration Change Detection ✅

No configuration rule relaxations detected:

✅ biome.json - No changes to linting or formatting rules
✅ Package dependencies - Only minor reordering of devDependencies, no version changes
✅ No test timeouts increased or coverage thresholds lowered
✅ No TypeScript strict mode flags disabled
✅ All existing skipped tests are legitimate (missing optional dependencies, platform-specific features)

Phase 3: Test Integrity ✅

Excellent test coverage and quality:

✅ 35 new tests (24 unit + 11 integration) with comprehensive coverage
✅ Rigorous assertions - Tests verify specific CFG structures, not just "something exists"
✅ No weakened assertions - All test expectations are appropriately strict
✅ Proper test isolation - Integration tests use dedicated temp databases
✅ Edge case coverage - Tests include complex nested structures, error cases, unsupported languages
✅ Multiple output formats tested - DOT, Mermaid, JSON all verified
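As a rough illustration of what the DOT and Mermaid outputs involve, the sketch below renders a list of blocks and edges into both formats. The function names `toMermaid` and `toDot` and the `{ index, label }` / `{ from, to, kind }` shapes are assumptions made for this example; the PR's exporter may be structured differently.

```javascript
// Hypothetical renderers; block and edge shapes are assumed, not the PR's.
function toMermaid(cfg) {
  const lines = ['flowchart TD'];
  for (const b of cfg.blocks) {
    // Quote labels so parentheses and spaces are safe; escape inner quotes
    // with Mermaid's entity code.
    const label = b.label.replace(/"/g, '#quot;');
    lines.push(`  B${b.index}["${label}"]`);
  }
  for (const e of cfg.edges) {
    lines.push(`  B${e.from} -->|${e.kind}| B${e.to}`);
  }
  return lines.join('\n');
}

function toDot(cfg) {
  const lines = ['digraph CFG {'];
  for (const b of cfg.blocks) {
    // Escape backslashes first, then double quotes, for DOT string literals.
    const label = b.label.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
    lines.push(`  B${b.index} [label="${label}"];`);
  }
  for (const e of cfg.edges) {
    lines.push(`  B${e.from} -> B${e.to} [label="${e.kind}"];`);
  }
  lines.push('}');
  return lines.join('\n');
}

// Tiny sample graph: a condition block branching to one body block.
const sampleCfg = {
  blocks: [
    { index: 0, label: 'if (x)' },
    { index: 1, label: 'a()' },
  ],
  edges: [{ from: 0, to: 1, kind: 'branch' }],
};

console.log(toMermaid(sampleCfg));
console.log(toDot(sampleCfg));
```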

Phase 4: Code Quality Review ✅

Strong implementation with security-conscious design:

Security & SQL Safety

✅ Parameterized queries throughout (db.prepare() with ? placeholders)
✅ No string concatenation in SQL - all dynamic values safely bound
✅ Proper foreign keys and constraints in database schema
✅ Input validation with file extension checks and language ID filtering

Architecture & Design

✅ Proper separation - CFG logic isolated in dedicated module
✅ Database migration v12/13 - Clean schema evolution with indexes
✅ Graceful fallbacks - Missing tables return helpful warnings
✅ MCP integration - New tools properly expose CFG functionality to AI agents

Code Quality

✅ Follows codebase patterns - Consistent with existing query/export structure
✅ Error handling - Try/catch blocks with appropriate fallbacks
✅ Performance - Indexes on all query paths, prepared statements reused
✅ Documentation - Clear JSDoc comments and inline explanations

Minor Observations

One misleading comment was already fixed per review feedback (d04a96a)
Complex CFG algorithm is well-structured with clear block/edge creation logic
Pagination support properly implemented for large result sets

Root Cause Analysis

Problem Being Solved: Adding intraprocedural CFG analysis for enhanced static analysis capabilities within individual functions
Why This Approach: Opt-in design with proper isolation, leveraging existing tree-sitter infrastructure, following standard CFG concepts
Risk Assessment: LOW - Additive feature with comprehensive testing and security-conscious implementation

Critical Concerns

None identified. This is a well-implemented feature addition.

Final Recommendation

Rating: ⭐⭐⭐⭐⭐ (5/5)
Action: APPROVE
Reasoning: Exceptional implementation with comprehensive testing (35 new tests), security-conscious SQL practices, proper database migrations, excellent code quality following established patterns, and thorough documentation. This represents exactly how a major feature should be added - opt-in, well-tested, and properly integrated.

claude · 2026-03-03T04:46:28Z

Claude finished @carlos-alm's task



github-actions bot and others added 25 commits March 2, 2026 04:48

docs: update incremental benchmarks (2.6.0) (#251)

03b11a5

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

docs: update build performance benchmarks (2.6.0) (#249)

93de652

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

chore: release v2.6.0 (#245)

e5f6e62

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

docs: update query benchmarks (2.6.0) (#252)

1f78058

ci: allow benchmark/ branch name prefix (#258)

1c16a29

chore: resolve conflicts with main (narsil-mcp.md + queries.js)

dc46c1d

chore: add pre-commit diff-impact hook (#271)

f8f045c

Add show-diff-impact.sh that automatically runs `codegraph diff-impact --staged -T` before git commit commands. The hook injects blast radius info as additionalContext — informational only, never blocks commits.

greptile-apps bot reviewed Mar 3, 2026


carlos-alm added 2 commits March 2, 2026 21:16

carlos-alm added 4 commits March 2, 2026 21:24

Merge remote-tracking branch 'origin/feat/cfg-intraprocedural' into f…

fa35789

…eat/cfg-intraprocedural Impact: 2 functions changed, 16 affected

Merge remote-tracking branch 'origin/main' into feat/cfg-intraprocedural

b2eeceb

Impact: 37 functions changed, 50 affected

carlos-alm force-pushed the feat/cfg-intraprocedural branch from 2519bb7 to b2eeceb Compare March 3, 2026 04:41

Merge remote-tracking branch 'origin/main' into feat/cfg-intraprocedural

949e1ee

carlos-alm merged commit 1441019 into main Mar 3, 2026
16 checks passed

carlos-alm deleted the feat/cfg-intraprocedural branch March 3, 2026 05:04

github-actions bot locked and limited conversation to collaborators Mar 3, 2026

carlos-alm added 2 commits March 3, 2026 01:35

chore: resolve conflicts with main in src/queries.js

dad5b4f

Keep normalizeSymbol spread usage from feature branch; main had inlined the same fields that normalizeSymbol produces. Impact: 43 functions changed, 42 affected


feat: intraprocedural control flow graph (CFG) #274

carlos-alm merged 34 commits into main from feat/cfg-intraprocedural


1 participant

		// break outside loop (switch case) — just continue
		return currentBlock;
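The fragment above shows a `break` with no enclosing loop being treated as a no-op that keeps the current block. For comparison, here is a sketch of how JavaScript's break/continue targets can be resolved in general, using a stack of enclosing contexts. This is not the PR's code: `resolveJump`, the context shape, and the block identifiers are all hypothetical, and the PR may track loops only.

```javascript
// Hypothetical sketch, not the PR's code: find the block a jump should target.
// Each context is { type: 'loop' | 'switch' | 'labeled', label, headerBlock, exitBlock }.
// `contexts` is ordered from outermost to innermost.
function resolveJump(kind, label, contexts) {
  for (let i = contexts.length - 1; i >= 0; i--) {
    const ctx = contexts[i];
    if (label != null) {
      if (ctx.label !== label) continue;
      if (kind === 'break') return ctx.exitBlock;
      // A labeled `continue` is only valid when the label names a loop.
      return ctx.type === 'loop' ? ctx.headerBlock : null;
    }
    // Unlabeled `break` exits the innermost loop or switch.
    if (kind === 'break' && (ctx.type === 'loop' || ctx.type === 'switch')) {
      return ctx.exitBlock;
    }
    // Unlabeled `continue` skips switches and targets the innermost loop.
    if (kind === 'continue' && ctx.type === 'loop') {
      return ctx.headerBlock;
    }
  }
  return null; // no enclosing target: the caller decides how to recover
}

// Example: a switch nested inside a loop labeled `outer`.
const jumpContexts = [
  { type: 'loop', label: 'outer', headerBlock: 1, exitBlock: 9 },
  { type: 'switch', label: null, headerBlock: null, exitBlock: 5 },
];

console.log(resolveJump('break', null, jumpContexts));    // switch exit
console.log(resolveJump('continue', null, jumpContexts)); // loop header
console.log(resolveJump('break', 'outer', jumpContexts)); // loop exit
```

Returning `null` when nothing matches leaves the recovery policy to the caller, which could fall back to continuing in the current block as the fragment above does.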


Greptile Summary

Confidence Score: 5/5
