Skip to content

feat: registry hardening, single-repo MCP, and strategic repositioning#20

Merged
carlos-alm merged 1 commit intomainfrom
feat/registry-hardening
Feb 22, 2026
Merged

feat: registry hardening, single-repo MCP, and strategic repositioning#20
carlos-alm merged 1 commit intomainfrom
feat/registry-hardening

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Registry hardening: input validation, path traversal prevention, and structural analysis (src/structure.js) with hotspot detection, coupling metrics, and module-level overview
  • Single-repo MCP default: MCP server now defaults to single-repo mode — tools have no repo property and list_repos is not exposed unless --multi-repo is explicitly enabled
  • Security fixes: eliminate SQL interpolation in hotspotsData, add NULLS LAST to hotspots ORDER BY, break parser.jsconstants.js circular dependency
  • Publish workflow hardening: pin checkout to main, sync native package versions, skip version push on release events, harden version resolution
  • Strategic repositioning: reframe project messaging from "zero cloud / fully local" to "always-fresh incremental graph at zero cost, optionally enhanced with your LLM provider" across README, COMPETITIVE_ANALYSIS.md, and FOUNDATION.md
  • New docs: competitive analysis (21 tools), foundation principles, dogfooding guide, v1.5.0 changelog

Test plan

  • npm test passes (new tests for structure, MCP single-repo, registry validation, CLI)
  • codegraph build . works in single-repo mode by default
  • codegraph mcp exposes no repo property without --multi-repo
  • codegraph map and structural analysis commands return valid output

Shift core messaging from "zero cloud / fully local" to "always-fresh
incremental graph at zero cost, optionally enhanced with your LLM."

- README: new hero tagline, add "Why most tools can't keep up" section,
  reframe feature comparison around rebuild speed and LLM-optional mode
- COMPETITIVE_ANALYSIS: lead with incremental builds and dual-mode as
  top differentiators, add LLM provider integration to Tier 2 roadmap
- FOUNDATION: principle 1 becomes "graph is always current", principle 4
  becomes "zero-cost core, LLM-enhanced when you choose", update
  competitive position around the three questions no competitor answers
@carlos-alm carlos-alm merged commit 9bd9ce1 into main Feb 22, 2026
12 checks passed
@carlos-alm carlos-alm deleted the feat/registry-hardening branch February 22, 2026 10:37
@greptile-apps
Copy link

greptile-apps bot commented Feb 22, 2026

Greptile Summary

Strategic repositioning of project messaging from "zero cloud / fully local" to "always-fresh incremental graph at zero cost, optionally enhanced with your LLM provider."

Key changes:

  • README.md: Added detailed comparison table explaining why competitors can't keep up with commits (full re-indexing, cloud API dependencies). Shifted tagline and feature list to emphasize incremental rebuilds and optional LLM enhancement
  • FOUNDATION.md: Reordered principles — "The graph is always current" is now Principle 1 (previously "Zero-infrastructure deployment"). Added Principle 4 on zero-cost core with optional LLM layer
  • COMPETITIVE_ANALYSIS.md: Updated project summary in ranking table and strengths section to highlight incremental rebuilds and dual-mode (local + optional cloud enhancement)

Gap identified:
The messaging describes LLM provider integration (OpenAI, Anthropic APIs) as a current feature, but the codebase only implements local embeddings via @huggingface/transformers. The llm.provider config infrastructure exists but isn't consumed. ROADMAP.md Phase 3 lists "Intelligent Embeddings" with LLM-generated descriptions as a planned feature.

Files not updated in this PR that contain conflicting messaging:

  • SECURITY.md: still says "zero network calls — no data ever leaves your machine"
  • ROADMAP.md: still says "zero cloud dependency by default"

Confidence Score: 4/5

  • Documentation-only PR with clear strategic intent, well-executed messaging shift, but creates expectation gap for unimplemented feature
  • This is a documentation-only PR with no code changes, so there's no risk of introducing bugs or breaking functionality. The writing is clear and the repositioning strategy is coherent. However, the messaging positions LLM provider integration as a current capability when it's actually planned (ROADMAP Phase 3). This creates a gap between marketing and reality that should be addressed before merge.
  • README.md needs the most attention due to the LLM enhancement claims that aren't yet implemented. Also consider updating SECURITY.md and ROADMAP.md to align with the new messaging.

Important Files Changed

Filename Overview
COMPETITIVE_ANALYSIS.md Repositioned project description from "zero-cloud" to "sub-second incremental rebuilds + zero-cost core + optional LLM enhancement"
FOUNDATION.md Updated Principle 1 from "Zero-infrastructure deployment" to "The graph is always current", reframed Principle 4 to focus on zero-cost core with optional LLM layer
README.md Major messaging shift: added incremental rebuild comparison table, updated tagline, feature list, and competitive positioning to emphasize always-fresh graph and optional LLM enhancement

Last reviewed commit: a403acc

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

> **The code graph that keeps up with your commits.**
>
> Codegraph uses [tree-sitter](https://tree-sitter.github.io/) (via WASM — no native compilation required) to parse your codebase into an AST, extracts functions, classes, imports, and call sites, resolves dependencies, and stores everything in a local SQLite database. Query it instantly from the command line.
> Codegraph parses your codebase with [tree-sitter](https://tree-sitter.github.io/) (native Rust or WASM), builds a function-level dependency graph in SQLite, and keeps it current with sub-second incremental rebuilds. Every query runs locally — no API keys, no Docker, no setup. When you want deeper intelligence, bring your own LLM provider and codegraph enhances search and analysis through the same API you already use. Your code only goes where you choose to send it.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claims "bring your own LLM provider and codegraph enhances search" but LLM provider integration (OpenAI/Anthropic APIs) isn't implemented — only local embeddings exist via @huggingface/transformers. This positioning describes a planned feature (listed in ROADMAP Phase 3) as if it's current functionality.


<p align="center">
<sub>Built with <a href="https://tree-sitter.github.io/">tree-sitter</a> and <a href="https://github.com/WiseLibs/better-sqlite3">better-sqlite3</a>. No data leaves your machine. Ever.</sub>
<sub>Built with <a href="https://tree-sitter.github.io/">tree-sitter</a> and <a href="https://github.com/WiseLibs/better-sqlite3">better-sqlite3</a>. Your code only goes where you choose to send it.</sub>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated from "No data leaves your machine. Ever." to "Your code only goes where you choose to send it." — but SECURITY.md still says "zero network calls — no data ever leaves your machine" and ROADMAP.md says "zero cloud dependency by default". These files should be updated to align with the new messaging that acknowledges optional cloud LLM features.

carlos-alm added a commit that referenced this pull request Mar 3, 2026
Re-evaluate all architectural recommendations against the actual
codebase as it grew from v1.4.0 (5K lines, 12 modules) to v2.6.0
(17,830 lines, 35 modules).

Architecture audit:
- Reprioritize: dual-function anti-pattern across 15 modules is now #1
  (was analysis/formatting split at #3)
- Downgrade parser plugin system from #1 to #20 (parser.js shrank to
  404 lines after native engine took over)
- Add 3 new recommendations: decompose complexity.js (2,163 lines),
  unified graph model for structure/cochange/communities, pagination
  standardization
- Update all metrics and line counts to current state

Roadmap:
- Add Phase 2.5 (Analysis Expansion) documenting 18 modules shipped
  across v2.0.0-v2.6.0 (complexity, communities, structure, flow,
  cochange, manifesto, boundaries, check, audit, batch, triage,
  hybrid search, owners, snapshot, etc.)
- Mark Phase 5.3 (Hybrid Search) as completed early in Phase 2.5
- Update Phase 3 priorities based on revised architecture analysis
- Update version to 2.6.0, language count to 11, phase count to 10
- Add Phase 8 note referencing check command foundation from 2.5
carlos-alm added a commit that referenced this pull request Mar 3, 2026
Re-evaluate all architectural recommendations against the actual
codebase as it grew from v1.4.0 (5K lines, 12 modules) to v2.6.0
(17,830 lines, 35 modules).

Architecture audit:
- Reprioritize: dual-function anti-pattern across 15 modules is now #1
  (was analysis/formatting split at #3)
- Downgrade parser plugin system from #1 to #20 (parser.js shrank to
  404 lines after native engine took over)
- Add 3 new recommendations: decompose complexity.js (2,163 lines),
  unified graph model for structure/cochange/communities, pagination
  standardization
- Update all metrics and line counts to current state

Roadmap:
- Add Phase 2.5 (Analysis Expansion) documenting 18 modules shipped
  across v2.0.0-v2.6.0 (complexity, communities, structure, flow,
  cochange, manifesto, boundaries, check, audit, batch, triage,
  hybrid search, owners, snapshot, etc.)
- Mark Phase 5.3 (Hybrid Search) as completed early in Phase 2.5
- Update Phase 3 priorities based on revised architecture analysis
- Update version to 2.6.0, language count to 11, phase count to 10
- Add Phase 8 note referencing check command foundation from 2.5
carlos-alm added a commit that referenced this pull request Mar 3, 2026
Re-evaluate all architectural recommendations against the actual
codebase as it grew from v1.4.0 (5K lines, 12 modules) to v2.6.0
(17,830 lines, 35 modules).

Architecture audit:
- Reprioritize: dual-function anti-pattern across 15 modules is now #1
  (was analysis/formatting split at #3)
- Downgrade parser plugin system from #1 to #20 (parser.js shrank to
  404 lines after native engine took over)
- Add 3 new recommendations: decompose complexity.js (2,163 lines),
  unified graph model for structure/cochange/communities, pagination
  standardization
- Update all metrics and line counts to current state

Roadmap:
- Add Phase 2.5 (Analysis Expansion) documenting 18 modules shipped
  across v2.0.0-v2.6.0 (complexity, communities, structure, flow,
  cochange, manifesto, boundaries, check, audit, batch, triage,
  hybrid search, owners, snapshot, etc.)
- Mark Phase 5.3 (Hybrid Search) as completed early in Phase 2.5
- Update Phase 3 priorities based on revised architecture analysis
- Update version to 2.6.0, language count to 11, phase count to 10
- Add Phase 8 note referencing check command foundation from 2.5
carlos-alm added a commit that referenced this pull request Mar 3, 2026
Re-evaluate all architectural recommendations against the actual
codebase as it grew from v1.4.0 (5K lines, 12 modules) to v2.6.0
(17,830 lines, 35 modules).

Architecture audit:
- Reprioritize: dual-function anti-pattern across 15 modules is now #1
  (was analysis/formatting split at #3)
- Downgrade parser plugin system from #1 to #20 (parser.js shrank to
  404 lines after native engine took over)
- Add 3 new recommendations: decompose complexity.js (2,163 lines),
  unified graph model for structure/cochange/communities, pagination
  standardization
- Update all metrics and line counts to current state

Roadmap:
- Add Phase 2.5 (Analysis Expansion) documenting 18 modules shipped
  across v2.0.0-v2.6.0 (complexity, communities, structure, flow,
  cochange, manifesto, boundaries, check, audit, batch, triage,
  hybrid search, owners, snapshot, etc.)
- Mark Phase 5.3 (Hybrid Search) as completed early in Phase 2.5
- Update Phase 3 priorities based on revised architecture analysis
- Update version to 2.6.0, language count to 11, phase count to 10
- Add Phase 8 note referencing check command foundation from 2.5
carlos-alm added a commit that referenced this pull request Mar 3, 2026
Re-evaluate all architectural recommendations against the actual
codebase as it grew from v1.4.0 (5K lines, 12 modules) to v2.6.0
(17,830 lines, 35 modules).

Architecture audit:
- Reprioritize: dual-function anti-pattern across 15 modules is now #1
  (was analysis/formatting split at #3)
- Downgrade parser plugin system from #1 to #20 (parser.js shrank to
  404 lines after native engine took over)
- Add 3 new recommendations: decompose complexity.js (2,163 lines),
  unified graph model for structure/cochange/communities, pagination
  standardization
- Update all metrics and line counts to current state

Roadmap:
- Add Phase 2.5 (Analysis Expansion) documenting 18 modules shipped
  across v2.0.0-v2.6.0 (complexity, communities, structure, flow,
  cochange, manifesto, boundaries, check, audit, batch, triage,
  hybrid search, owners, snapshot, etc.)
- Mark Phase 5.3 (Hybrid Search) as completed early in Phase 2.5
- Update Phase 3 priorities based on revised architecture analysis
- Update version to 2.6.0, language count to 11, phase count to 10
- Add Phase 8 note referencing check command foundation from 2.5
carlos-alm added a commit that referenced this pull request Mar 3, 2026
Re-evaluate all architectural recommendations against the actual
codebase as it grew from v1.4.0 (5K lines, 12 modules) to v2.6.0
(17,830 lines, 35 modules).

Architecture audit:
- Reprioritize: dual-function anti-pattern across 15 modules is now #1
  (was analysis/formatting split at #3)
- Downgrade parser plugin system from #1 to #20 (parser.js shrank to
  404 lines after native engine took over)
- Add 3 new recommendations: decompose complexity.js (2,163 lines),
  unified graph model for structure/cochange/communities, pagination
  standardization
- Update all metrics and line counts to current state

Roadmap:
- Add Phase 2.5 (Analysis Expansion) documenting 18 modules shipped
  across v2.0.0-v2.6.0 (complexity, communities, structure, flow,
  cochange, manifesto, boundaries, check, audit, batch, triage,
  hybrid search, owners, snapshot, etc.)
- Mark Phase 5.3 (Hybrid Search) as completed early in Phase 2.5
- Update Phase 3 priorities based on revised architecture analysis
- Update version to 2.6.0, language count to 11, phase count to 10
- Add Phase 8 note referencing check command foundation from 2.5
carlos-alm added a commit that referenced this pull request Mar 3, 2026
* docs: update incremental benchmarks (2.6.0) (#251)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* docs: update build performance benchmarks (2.6.0) (#249)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* chore: release v2.6.0 (#245)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* docs: update query benchmarks (2.6.0) (#252)

* ci: add CLA Assistant workflow and fix CLA.md (#244)

* docs: add Contributor License Agreement (CLA)

* ci: add CLA Assistant workflow and fix CLA.md issues

- Add .github/workflows/cla.yml using contributor-assistant/github-action@v2.6.1
  with dedicated cla-signatures branch to avoid polluting main
- Fix CLA.md: section 7→6 reference, capitalization consistency,
  control definition formatting (Roman numerals → lettered list)
- Add Acceptance section documenting the CLA bot signing process
- Add Governing Law clause (Province of Alberta, Canada)
- Update CONTRIBUTING.md with CLA signing instructions

* docs: document CLA recheck command in CONTRIBUTING.md

Address Greptile review feedback on #244 — add note that contributors
can comment `recheck` on a PR to re-trigger the CLA signature check.

* feat: add dataflow analysis (flows_to, returns, mutates) (#254)

* feat: add dataflow analysis (flows_to, returns, mutates edges)

Track how data moves through functions with three new edge types:
- flows_to: parameter/variable flows into another function as argument
- returns: call return value is captured by the caller
- mutates: parameter-derived value is mutated in-place

Opt-in via `build --dataflow` (JS/TS only for MVP). Adds schema
migration v10 (dataflow table), extractDataflow() AST walker with
scope tracking and confidence scoring, query functions (dataflowData,
dataflowPathData, dataflowImpactData), CLI command with --path and
--impact modes, MCP tool, batch support, and programmatic API exports.

Impact: 29 functions changed, 33 affected

* fix: handle spread args, optional chaining, and reassignment in dataflow

Address review feedback from Greptile:
- Track spread arguments (foo(...args)) by unwrapping spread_element
- Handle optional chaining (foo?.bar()) in callee name resolution
- Track non-declaration assignments (x = foo() without const/let/var)
  as returns edges
- Add 3 tests covering these cases

Impact: 3 functions changed, 3 affected

* docs: add TypeScript migration as Phase 4 in roadmap (#255)

Insert Phase 4 (TypeScript Migration) between the architectural
refactoring phase and the intelligent embeddings phase. Renumber
all subsequent phases (old 4-8 → new 5-9) including sub-section
headings, cross-references, dependency graph, and verification table.

The migration is planned after Phase 3 because the architectural
refactoring establishes clean module boundaries that serve as
natural type boundaries for incremental TS adoption.

* feat(queries): expose fileHash in where and query JSON output (#257)

* feat(watcher): add structured NDJSON change journal for watch mode

Write symbol-level change events to .codegraph/change-events.ndjson
during watch mode. Each line records added/removed/modified symbols
with node counts and edge data, enabling external tools to detect
rule staleness without polling. File is size-capped at 1 MB with
keep-last-half rotation.

Impact: 8 functions changed, 4 affected

* style: use template literals per biome lint

Impact: 1 functions changed, 1 affected

* feat(queries): expose file content hash in where and query JSON output

Add fileHash field to queryNameData, whereSymbolImpl, and whereFileImpl
return objects by looking up the file_hashes table. This lets consumers
(e.g. code-praxis) detect when a rule's target file has changed since
the rule was created, enabling staleness detection.

Impact: 4 functions changed, 16 affected

* style: use template literal in test fixture

* fix(change-journal): add debug/warn logging for observability

Address review feedback: debug log on successful append and rotation,
warn when a single oversized line prevents rotation.

Impact: 2 functions changed, 2 affected

* style: fix biome formatting in change-journal warn call

Impact: 1 functions changed, 1 affected

* fix(change-journal): use Buffer for byte-accurate rotation midpoint

stat.size returns bytes but String.length counts UTF-16 characters.
Read as Buffer and use buf.indexOf(0x0a) to find the newline at the
byte-level midpoint, ensuring consistent behavior with multi-byte UTF-8.

Impact: 1 functions changed, 1 affected

* feat: add batch-query command and multi-command batch mode (#256)

* feat: add batch-query command and multi-command batch mode

Add splitTargets() for comma-separated target expansion, multiBatchData()
for mixed-command orchestration, and a new batch-query CLI command that
defaults to 'where'. The existing batch command also gains comma splitting
and multi-command detection via --from-file/--stdin.

Impact: 5 functions changed, 3 affected

* fix: add try/catch around JSON.parse in batch and batch-query actions

Wrap --from-file and --stdin JSON parsing with error handling so malformed
input produces a clear error message instead of an unhandled exception.

* ci: allow benchmark/ branch name prefix (#258)

* docs: competitive deep-dive vs Joern (#260)

* docs: add competitive deep-dive for Joern and reorganize competitive folder

Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a
comprehensive feature-by-feature comparison against joernio/joern
(our #1-ranked competitor). Covers parsing, graph model, query language,
performance, installation, AI/MCP integration, security analysis,
developer productivity, and ecosystem across 100+ individual features.
Update FOUNDATION.md reference to the new path.

* fix: update broken links to moved COMPETITIVE_ANALYSIS.md

README.md and docs/roadmap/BACKLOG.md still referenced the old path
at generated/COMPETITIVE_ANALYSIS.md after the file was moved to
generated/competitive/COMPETITIVE_ANALYSIS.md in #260.

* refactor!: consolidate MCP tools and CLI commands (#263)

Reduce MCP tool surface from 32 to 29 by merging overlapping tools:

- Rename query_function → query with deps/path modes (absorbs fn_deps + symbol_path)
- Add list mode to execution_flow (absorbs list_entry_points)
- Remove path mode from dataflow tool (now edges + impact only)
- Merge fn and path CLI commands into query (--path flag for path mode)
- Remove --path option from dataflow CLI command
- Update batch commands: remove fn, add dataflow, query uses fnDepsData
- Update MCP_DEFAULTS pagination keys

BREAKING CHANGE: MCP tools fn_deps, symbol_path, list_entry_points removed.
CLI commands fn and path removed. Use query instead.

Impact: 1 functions changed, 1 affected

* docs: competitive deep-dive — codegraph vs narsil-mcp (#262)

* docs: add competitive deep-dive for narsil-mcp

Comprehensive feature-by-feature analysis of narsil-mcp (postrv/narsil-mcp),
the closest head-to-head competitor to codegraph. Covers all 8 FOUNDATION.md
principles, 9 feature comparison sections with 130+ features, gap analysis,
and competitive positioning.

* fix: address Greptile review — scoring math and relative path

- Fix principle scoring from 6-0-2 to 7-0-1 (correct count from table)
- Fix relative link to COMPETITIVE_ANALYSIS.md (../ not ./)

* docs: add Joern competitive deep-dive with feature candidates (#264)

* docs: add competitive deep-dive for Joern and reorganize competitive folder

Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a
comprehensive feature-by-feature comparison against joernio/joern
(our #1-ranked competitor). Covers parsing, graph model, query language,
performance, installation, AI/MCP integration, security analysis,
developer productivity, and ecosystem across 100+ individual features.
Update FOUNDATION.md reference to the new path.

* fix: update broken links to moved COMPETITIVE_ANALYSIS.md

README.md and docs/roadmap/BACKLOG.md still referenced the old path
at generated/COMPETITIVE_ANALYSIS.md after the file was moved to
generated/competitive/COMPETITIVE_ANALYSIS.md in #260.

* docs: add Joern-inspired feature candidates with BACKLOG-style grading

Append a new "Joern-Inspired Feature Candidates" section to the Joern
competitive deep-dive. Lists 11 actionable features extracted from
Parsing & Language Support, Graph Model & Analysis Depth, and Query
Language & Interface sections — assessed with the same tier/grading
system used in BACKLOG.md (zero-dep, foundation-aligned, problem-fit,
breaking).

Tier 1 non-breaking: call-chain slicing, type-informed resolution,
error-tolerant parsing, regex filtering, Kotlin, Swift, script execution.
Tier 1 breaking: expanded node/edge types, intraprocedural CFG, stored AST.
Not adopted: 9 features with FOUNDATION.md reasoning.
Cross-references BACKLOG IDs 14 and 7.

* feat: add normalizeSymbol utility for stable JSON schema

Add normalizeSymbol(row, db, hashCache) that returns a consistent
7-field symbol shape (name, kind, file, line, endLine, role, fileHash)
across all query and search commands.

Update queryNameData, fnDepsData, fnImpactData, explainFunctionImpl,
listFunctionsData, rolesData, whereSymbolImpl in queries.js and
searchData, multiSearchData, ftsSearchData, hybridSearchData in
embedder.js to use normalizeSymbol. Update SQL in listFunctionsData,
rolesData, iterListFunctions, iterRoles, _prepareSearch, and
ftsSearchData to include end_line and role columns.

Export normalizeSymbol from index.js. Add docs/json-schema.md
documenting the stable schema. Add 8 unit tests and 7 integration
schema conformance tests.

Impact: 13 functions changed, 33 affected

Impact: 14 functions changed, 42 affected

* feat: add normalizeSymbol utility for stable JSON schema (#267)

Add normalizeSymbol(row, db, hashCache) that returns a consistent
7-field symbol shape (name, kind, file, line, endLine, role, fileHash)
across all query and search commands.

Update queryNameData, fnDepsData, fnImpactData, explainFunctionImpl,
listFunctionsData, rolesData, whereSymbolImpl in queries.js and
searchData, multiSearchData, ftsSearchData, hybridSearchData in
embedder.js to use normalizeSymbol. Update SQL in listFunctionsData,
rolesData, iterListFunctions, iterRoles, _prepareSearch, and
ftsSearchData to include end_line and role columns.

Export normalizeSymbol from index.js. Add docs/json-schema.md
documenting the stable schema. Add 8 unit tests and 7 integration
schema conformance tests.

Impact: 13 functions changed, 33 affected

Impact: 14 functions changed, 42 affected

Impact: 13 functions changed, 21 affected

* docs: revise architecture audit and roadmap for v2.6.0 (#266)

Re-evaluate all architectural recommendations against the actual
codebase as it grew from v1.4.0 (5K lines, 12 modules) to v2.6.0
(17,830 lines, 35 modules).

Architecture audit:
- Reprioritize: dual-function anti-pattern across 15 modules is now #1
  (was analysis/formatting split at #3)
- Downgrade parser plugin system from #1 to #20 (parser.js shrank to
  404 lines after native engine took over)
- Add 3 new recommendations: decompose complexity.js (2,163 lines),
  unified graph model for structure/cochange/communities, pagination
  standardization
- Update all metrics and line counts to current state

Roadmap:
- Add Phase 2.5 (Analysis Expansion) documenting 18 modules shipped
  across v2.0.0-v2.6.0 (complexity, communities, structure, flow,
  cochange, manifesto, boundaries, check, audit, batch, triage,
  hybrid search, owners, snapshot, etc.)
- Mark Phase 5.3 (Hybrid Search) as completed early in Phase 2.5
- Update Phase 3 priorities based on revised architecture analysis
- Update version to 2.6.0, language count to 11, phase count to 10
- Add Phase 8 note referencing check command foundation from 2.5

* docs: add Narsil-MCP competitive deep-dive with feature candidates (#265)

* docs: add competitive deep-dive for Joern and reorganize competitive folder

Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a
comprehensive feature-by-feature comparison against joernio/joern
(our #1-ranked competitor). Covers parsing, graph model, query language,
performance, installation, AI/MCP integration, security analysis,
developer productivity, and ecosystem across 100+ individual features.
Update FOUNDATION.md reference to the new path.

* fix: update broken links to moved COMPETITIVE_ANALYSIS.md

README.md and docs/roadmap/BACKLOG.md still referenced the old path
at generated/COMPETITIVE_ANALYSIS.md after the file was moved to
generated/competitive/COMPETITIVE_ANALYSIS.md in #260.

* docs: add Joern-inspired feature candidates with BACKLOG-style grading

Append a new "Joern-Inspired Feature Candidates" section to the Joern
competitive deep-dive. Lists 11 actionable features extracted from
Parsing & Language Support, Graph Model & Analysis Depth, and Query
Language & Interface sections — assessed with the same tier/grading
system used in BACKLOG.md (zero-dep, foundation-aligned, problem-fit,
breaking).

Tier 1 non-breaking: call-chain slicing, type-informed resolution,
error-tolerant parsing, regex filtering, Kotlin, Swift, script execution.
Tier 1 breaking: expanded node/edge types, intraprocedural CFG, stored AST.
Not adopted: 9 features with FOUNDATION.md reasoning.
Cross-references BACKLOG IDs 14 and 7.

* docs: add competitive deep-dive for Narsil-MCP with feature candidates

Comprehensive comparison across 10 dimensions: parsing (32 vs 11
languages), graph model (CFG/DFG/type inference vs complexity/roles/
communities), search (similarity/chunking vs RRF hybrid), security
(147 rules vs none), queries (90 tools vs 21 + compound commands),
performance (cold start vs incremental), install, MCP integration,
developer productivity, and ecosystem.

Feature candidates section covers all comparison sections:
- Tier 1 non-breaking (10): MCP presets, AST chunking, code similarity,
  git blame/symbol history, remote repo indexing, config wizard, Kotlin,
  Swift, Bash, Scala language support
- Tier 1 breaking (1): export map per module
- Tier 2 (2): interactive HTML viz, multiple embedding backends
- Tier 3 (2): OWASP patterns, SBOM generation
- Not adopted (10): taint, type inference, SPARQL/RDF, CCG, in-memory
  arch, 90-tool surface, browser WASM, Forgemax, LSP, license scanning
- Cross-references to BACKLOG IDs 7, 8, 10, 14 and Joern candidates
  J4, J5, J8, J9

* feat: add expanded edge types — contains, parameter_of, receiver (Phase 2)

Build file→definition and parent→child contains edges, parameter_of
inverse edges, and receiver edges for method-call dispatch. Add
CORE_EDGE_KINDS, STRUCTURAL_EDGE_KINDS, EVERY_EDGE_KIND constants.
Exclude structural edges from moduleMapData coupling counts. Scope
directory contains-edge cleanup to preserve symbol-level edges.

Impact: 3 functions changed, 22 affected

* chore: add pre-commit diff-impact hook (#271)

Add show-diff-impact.sh that automatically runs
`codegraph diff-impact --staged -T` before git commit commands.
The hook injects blast radius info as additionalContext —
informational only, never blocks commits.

* feat(export): add GraphML, GraphSON, Neo4j CSV and interactive viewer (#268)

* feat(export): add GraphML, GraphSON, Neo4j CSV formats and interactive HTML viewer

Add three new export formats for graph database interoperability:
- GraphML (XML standard) with file-level and function-level modes
- GraphSON (TinkerPop v3) for Gremlin/JanusGraph compatibility
- Neo4j CSV (bulk import) with separate nodes/relationships files

Add interactive HTML viewer (`codegraph plot`) powered by vis-network:
- Hierarchical, force, and radial layouts with physics toggle
- Node coloring by kind or role, search/filter, legend panel
- Configurable via .plotDotCfg JSON file

Update CLI export command, MCP export_graph tool, and programmatic API
to support all six formats.

Impact: 12 functions changed, 6 affected

* feat(plot): add drill-down, clustering, complexity overlays, and detail panel

Evolve the plot command from a static viewer into an interactive
exploration tool with rich data overlays and navigation.

Data preparation:
- Extract prepareGraphData() with complexity, fan-in/fan-out, Louvain
  community detection, directory derivation, and risk flag computation
- Seed strategies: all (default), top-fanin, entry

Interactive features:
- Detail sidebar: metrics, callers/callees lists, risk badges
- Drill-down: click-to-expand / double-click-to-collapse neighbors
- Clustering: community and directory grouping via vis-network API
- Color by: kind, role, community, complexity (MI-based borders)
- Size by: uniform, fan-in, fan-out, complexity
- Risk overlay: dead-code (dashed), high-blast-radius (shadow), low-MI

CLI options:
- --cluster, --overlay, --seed, --seed-count, --size-by, --color-by

Tests expanded from 7 to 21 covering all new data enrichment, seed
strategies, risk flags, UI elements, and config backward compatibility.

Impact: 5 functions changed, 3 affected

* fix(test): update MCP export_graph enum to include new formats

The previous commit added graphml, graphson, and neo4j export formats
to the MCP tool definition but did not update the test assertion.

* style: format mcp test after enum update

* fix(security): escape config values in HTML template to prevent XSS

Use JSON.stringify() for cfg.layout.direction, effectiveColorBy, and
cfg.clusterBy when interpolated into inline JavaScript. Replace shell
exec() with execFile() for browser-open to avoid path injection.

Impact: 1 functions changed, 1 affected

* docs: add check-readme hook to guides (#272)

* feat(export): add GraphML, GraphSON, Neo4j CSV formats and interactive HTML viewer

Add three new export formats for graph database interoperability:
- GraphML (XML standard) with file-level and function-level modes
- GraphSON (TinkerPop v3) for Gremlin/JanusGraph compatibility
- Neo4j CSV (bulk import) with separate nodes/relationships files

Add interactive HTML viewer (`codegraph plot`) powered by vis-network:
- Hierarchical, force, and radial layouts with physics toggle
- Node coloring by kind or role, search/filter, legend panel
- Configurable via .plotDotCfg JSON file

Update CLI export command, MCP export_graph tool, and programmatic API
to support all six formats.

Impact: 12 functions changed, 6 affected

* feat(plot): add drill-down, clustering, complexity overlays, and detail panel

Evolve the plot command from a static viewer into an interactive
exploration tool with rich data overlays and navigation.

Data preparation:
- Extract prepareGraphData() with complexity, fan-in/fan-out, Louvain
  community detection, directory derivation, and risk flag computation
- Seed strategies: all (default), top-fanin, entry

Interactive features:
- Detail sidebar: metrics, callers/callees lists, risk badges
- Drill-down: click-to-expand / double-click-to-collapse neighbors
- Clustering: community and directory grouping via vis-network API
- Color by: kind, role, community, complexity (MI-based borders)
- Size by: uniform, fan-in, fan-out, complexity
- Risk overlay: dead-code (dashed), high-blast-radius (shadow), low-MI

CLI options:
- --cluster, --overlay, --seed, --seed-count, --size-by, --color-by

Tests expanded from 7 to 21 covering all new data enrichment, seed
strategies, risk flags, UI elements, and config backward compatibility.

Impact: 5 functions changed, 3 affected

* fix(test): update MCP export_graph enum to include new formats

The previous commit added graphml, graphson, and neo4j export formats
to the MCP tool definition but did not update the test assertion.

* style: format mcp test after enum update

* fix(security): escape config values in HTML template to prevent XSS

Use JSON.stringify() for cfg.layout.direction, effectiveColorBy, and
cfg.clusterBy when interpolated into inline JavaScript. Replace shell
exec() with execFile() for browser-open to avoid path injection.

Impact: 1 functions changed, 1 affected

* docs: add check-readme hook to recommended practices and guides

Document the new check-readme.sh hook across all three doc locations:
recommended-practices.md, ai-agent-guide.md, and the hooks example
README. Adds settings.json examples, hook behavior descriptions, and
customization entries.

* feat: exports command + scoped rebuild for parallel agents (#269)

* docs: add competitive deep-dive for Joern and reorganize competitive folder

Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a
comprehensive feature-by-feature comparison against joernio/joern
(our #1-ranked competitor). Covers parsing, graph model, query language,
performance, installation, AI/MCP integration, security analysis,
developer productivity, and ecosystem across 100+ individual features.
Update FOUNDATION.md reference to the new path.

* fix: update broken links to moved COMPETITIVE_ANALYSIS.md

README.md and docs/roadmap/BACKLOG.md still referenced the old path
at generated/COMPETITIVE_ANALYSIS.md after the file was moved to
generated/competitive/COMPETITIVE_ANALYSIS.md in #260.

* docs: add Joern-inspired feature candidates with BACKLOG-style grading

Append a new "Joern-Inspired Feature Candidates" section to the Joern
competitive deep-dive. Lists 11 actionable features extracted from
Parsing & Language Support, Graph Model & Analysis Depth, and Query
Language & Interface sections — assessed with the same tier/grading
system used in BACKLOG.md (zero-dep, foundation-aligned, problem-fit,
breaking).

Tier 1 non-breaking: call-chain slicing, type-informed resolution,
error-tolerant parsing, regex filtering, Kotlin, Swift, script execution.
Tier 1 breaking: expanded node/edge types, intraprocedural CFG, stored AST.
Not adopted: 9 features with FOUNDATION.md reasoning.
Cross-references BACKLOG IDs 14 and 7.

* docs: add competitive deep-dive for Narsil-MCP with feature candidates

Comprehensive comparison across 10 dimensions: parsing (32 vs 11
languages), graph model (CFG/DFG/type inference vs complexity/roles/
communities), search (similarity/chunking vs RRF hybrid), security
(147 rules vs none), queries (90 tools vs 21 + compound commands),
performance (cold start vs incremental), install, MCP integration,
developer productivity, and ecosystem.

Feature candidates section covers all comparison sections:
- Tier 1 non-breaking (10): MCP presets, AST chunking, code similarity,
  git blame/symbol history, remote repo indexing, config wizard, Kotlin,
  Swift, Bash, Scala language support
- Tier 1 breaking (1): export map per module
- Tier 2 (2): interactive HTML viz, multiple embedding backends
- Tier 3 (2): OWASP patterns, SBOM generation
- Not adopted (10): taint, type inference, SPARQL/RDF, CCG, in-memory
  arch, 90-tool surface, browser WASM, Forgemax, LSP, license scanning
- Cross-references to BACKLOG IDs 7, 8, 10, 14 and Joern candidates
  J4, J5, J8, J9

* feat: add dedicated `exports <file>` command with per-symbol consumers

Implements feature N11 from the Narsil competitive analysis. The new
command provides a focused export map showing which symbols a file
exports and who calls each one, filling the gap between `explain`
(public/internal split without consumers) and `where --file` (just
export names).

Adds exportsData/fileExports to queries.js, CLI command, MCP tool,
batch support, programmatic API, and integration tests.

Impact: 7 functions changed, 15 affected

* feat: add scoped rebuild for parallel agent rollback

Extract purgeFilesFromGraph() from the inline deletion cascade in
buildGraph() for reuse. Add opts.scope and opts.noReverseDeps to
buildGraph() so agents can surgically rebuild only their changed files
without nuking other agents' graph state.

- `--scope <files...>` on `build` skips collectFiles/getChangedFiles
- `--no-reverse-deps` skips reverse-dep cascade (safe when exports unchanged)
- New `scoped_rebuild` MCP tool for multi-agent orchestration
- purgeFilesFromGraph exported from programmatic API
- Unit tests for purge function, integration tests for scoped rebuild
- Documented agent-level rollback workflow in titan-paradigm.md

Impact: 3 functions changed, 20 affected

* fix: remove leaked scoped_rebuild changes from another session

Reverts purgeFilesFromGraph export, --scope/--no-reverse-deps CLI
options, scoped_rebuild MCP tool+handler, and test list entry that
were accidentally included from a concurrent session's dirty worktree.

Impact: 2 functions changed, 1 affected

* fix: remove stale scoped-rebuild docs from titan-paradigm

The scoped_rebuild feature (--scope, --no-reverse-deps CLI options and
scoped_rebuild MCP tool) was removed in 651ddb2 but the documentation
in titan-paradigm.md still referenced it. Addresses Greptile review
feedback on PR #269.

* feat: add intraprocedural control flow graph (CFG) construction

Add opt-in CFG analysis that builds basic-block control flow graphs
from tree-sitter AST for individual functions. Enables complexity-aware
impact analysis and opens the path to dataflow (def-use chains).

- DB migration v12: cfg_blocks + cfg_edges tables
- New src/cfg.js module: CFG_RULES, buildFunctionCFG, buildCFGData,
  cfgData, cfgToDOT, cfgToMermaid, cfg CLI printer
- Builder integration: --cfg flag triggers CFG after complexity pass
- CLI: `cfg <name>` command with --format text/dot/mermaid, -j, --ndjson
- MCP: cfg tool with name, format, file, kind, pagination props
- Exports findFunctionNode from complexity.js for reuse
- 24 unit tests + 11 integration tests (35 total)

Phase 1: JS/TS/TSX only. Handles if/else, for/while/do-while, switch,
try/catch/finally, break/continue (with labels), return/throw.

Impact: 27 functions changed, 36 affected

* feat: add stored queryable AST nodes (calls, new, string, regex, throw, await)

Persist selected AST nodes in a dedicated ast_nodes SQLite table during
build, queryable via CLI (codegraph ast), MCP (ast_query), and
programmatic API.

- DB migration v13: ast_nodes table with indexes on kind, name, file,
  parent, and (kind,name)
- New src/ast.js module: buildAstNodes (extraction), astQueryData/
  astQuery (query), AST_NODE_KINDS constant
- Builder integration: full-rebuild deletion, incremental cleanup,
  always-on post-parse extraction (before complexity to preserve _tree)
- CLI: codegraph ast [pattern] with -k, -f, -T, -j, --ndjson,
  --limit, --offset options
- MCP: ast_query tool with pattern, kind, file, no_tests, pagination
- JS/TS/TSX Phase 1: full AST walk for new/throw/await/string/regex;
  all languages get call nodes from symbols.calls
- Pattern matching uses SQL GLOB with auto-wrapping for substring search
- Parent node resolution via narrowest enclosing definition

Impact: 12 functions changed, 26 affected

* fix: correct misleading comment for break without enclosing loop/switch

The comment incorrectly suggested this code path handled break inside
switch cases. It actually handles break with no enclosing loop/switch
context (invalid syntax) as a no-op.

Impact: 2 functions changed, 9 affected

* docs: fix stale MCP tool references in guides

Update tool names and counts to match actual MCP server output:
- query_function → query, fn_deps/symbol_path removed (merged into query)
- list_entry_points removed (merged into execution_flow)
- Add missing tools: ast_query, cfg, dataflow, symbol_children
- Fix count: 31 tools (32 in multi-repo mode)

* feat: expand node types with parameter, property, constant kinds (#270)

* feat: expand node types with parameter, property, constant kinds (Phase 1)

Add sub-declaration node extraction to all 9 WASM language extractors,
enabling structural queries like "which functions take a Request param?"
or "which classes have a userId field?" without reading source code.

Schema: migration v11 adds nullable parent_id column with indexes.
Builder: insertNode links children to parent via parent_id FK.
Extractors: JS/TS, Python, Go, Rust, Java, C#, Ruby, PHP, HCL now
emit children arrays for parameters, properties, and constants.
Queries: new childrenData() function, children in contextData output.
CLI: new `children` command, EVERY_SYMBOL_KIND validation on --kind.
MCP: new `symbol_children` tool, extended kind enum on all kind fields.
Constants: CORE_SYMBOL_KINDS (10), EXTENDED_SYMBOL_KINDS (3),
EVERY_SYMBOL_KIND (13). ALL_SYMBOL_KINDS preserved for backward compat.

Native Rust engine: Definition struct gains children field but actual
extraction is deferred to Phase 2 — WASM fallback handles new kinds.

Impact: 63 functions changed, 62 affected

* feat: add expanded edge types — contains, parameter_of, receiver (Phase 2)

Build file→definition and parent→child contains edges, parameter_of
inverse edges, and receiver edges for method-call dispatch. Add
CORE_EDGE_KINDS, STRUCTURAL_EDGE_KINDS, EVERY_EDGE_KIND constants.
Exclude structural edges from moduleMapData coupling counts. Scope
directory contains-edge cleanup to preserve symbol-level edges.

Impact: 3 functions changed, 22 affected

* fix(native): add missing children field to all Rust extractors

The Definition struct gained a children field but no extractor was
updated to include it, causing 50 compilation errors. Add children: None
to every Definition initializer across all 9 language extractors.
Also fix unused variable warnings in parser_registry.rs and parallel.rs.

Impact: 13 functions changed, 10 affected

* ci: trigger workflow re-run

* feat: expand node types with parameter, property, constant kinds (Phase 1)

Add sub-declaration node extraction to all 9 WASM language extractors,
enabling structural queries like "which functions take a Request param?"
or "which classes have a userId field?" without reading source code.

Schema: migration v11 adds nullable parent_id column with indexes.
Builder: insertNode links children to parent via parent_id FK.
Extractors: JS/TS, Python, Go, Rust, Java, C#, Ruby, PHP, HCL now
emit children arrays for parameters, properties, and constants.
Queries: new childrenData() function, children in contextData output.
CLI: new `children` command, EVERY_SYMBOL_KIND validation on --kind.
MCP: new `symbol_children` tool, extended kind enum on all kind fields.
Constants: CORE_SYMBOL_KINDS (10), EXTENDED_SYMBOL_KINDS (3),
EVERY_SYMBOL_KIND (13). ALL_SYMBOL_KINDS preserved for backward compat.

Native Rust engine: Definition struct gains children field but actual
extraction is deferred to Phase 2 — WASM fallback handles new kinds.

Impact: 63 functions changed, 62 affected

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
carlos-alm added a commit that referenced this pull request Mar 3, 2026
…ge types (#279)

* docs: update incremental benchmarks (2.6.0) (#251)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* docs: update build performance benchmarks (2.6.0) (#249)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* chore: release v2.6.0 (#245)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* docs: update query benchmarks (2.6.0) (#252)

* ci: add CLA Assistant workflow and fix CLA.md (#244)

* docs: add Contributor License Agreement (CLA)

* ci: add CLA Assistant workflow and fix CLA.md issues

- Add .github/workflows/cla.yml using contributor-assistant/github-action@v2.6.1
  with dedicated cla-signatures branch to avoid polluting main
- Fix CLA.md: section 7→6 reference, capitalization consistency,
  control definition formatting (Roman numerals → lettered list)
- Add Acceptance section documenting the CLA bot signing process
- Add Governing Law clause (Province of Alberta, Canada)
- Update CONTRIBUTING.md with CLA signing instructions

* docs: document CLA recheck command in CONTRIBUTING.md

Address Greptile review feedback on #244 — add note that contributors
can comment `recheck` on a PR to re-trigger the CLA signature check.

* feat: add dataflow analysis (flows_to, returns, mutates) (#254)

* feat: add dataflow analysis (flows_to, returns, mutates edges)

Track how data moves through functions with three new edge types:
- flows_to: parameter/variable flows into another function as argument
- returns: call return value is captured by the caller
- mutates: parameter-derived value is mutated in-place

Opt-in via `build --dataflow` (JS/TS only for MVP). Adds schema
migration v10 (dataflow table), extractDataflow() AST walker with
scope tracking and confidence scoring, query functions (dataflowData,
dataflowPathData, dataflowImpactData), CLI command with --path and
--impact modes, MCP tool, batch support, and programmatic API exports.

Impact: 29 functions changed, 33 affected

* fix: handle spread args, optional chaining, and reassignment in dataflow

Address review feedback from Greptile:
- Track spread arguments (foo(...args)) by unwrapping spread_element
- Handle optional chaining (foo?.bar()) in callee name resolution
- Track non-declaration assignments (x = foo() without const/let/var)
  as returns edges
- Add 3 tests covering these cases

Impact: 3 functions changed, 3 affected

* docs: add TypeScript migration as Phase 4 in roadmap (#255)

Insert Phase 4 (TypeScript Migration) between the architectural
refactoring phase and the intelligent embeddings phase. Renumber
all subsequent phases (old 4-8 → new 5-9) including sub-section
headings, cross-references, dependency graph, and verification table.

The migration is planned after Phase 3 because the architectural
refactoring establishes clean module boundaries that serve as
natural type boundaries for incremental TS adoption.

* feat(queries): expose fileHash in where and query JSON output (#257)

* feat(watcher): add structured NDJSON change journal for watch mode

Write symbol-level change events to .codegraph/change-events.ndjson
during watch mode. Each line records added/removed/modified symbols
with node counts and edge data, enabling external tools to detect
rule staleness without polling. File is size-capped at 1 MB with
keep-last-half rotation.

Impact: 8 functions changed, 4 affected

* style: use template literals per biome lint

Impact: 1 functions changed, 1 affected

* feat(queries): expose file content hash in where and query JSON output

Add fileHash field to queryNameData, whereSymbolImpl, and whereFileImpl
return objects by looking up the file_hashes table. This lets consumers
(e.g. code-praxis) detect when a rule's target file has changed since
the rule was created, enabling staleness detection.

Impact: 4 functions changed, 16 affected

* style: use template literal in test fixture

* fix(change-journal): add debug/warn logging for observability

Address review feedback: debug log on successful append and rotation,
warn when a single oversized line prevents rotation.

Impact: 2 functions changed, 2 affected

* style: fix biome formatting in change-journal warn call

Impact: 1 functions changed, 1 affected

* fix(change-journal): use Buffer for byte-accurate rotation midpoint

stat.size returns bytes but String.length counts UTF-16 characters.
Read as Buffer and use buf.indexOf(0x0a) to find the newline at the
byte-level midpoint, ensuring consistent behavior with multi-byte UTF-8.

Impact: 1 functions changed, 1 affected

* feat: add batch-query command and multi-command batch mode (#256)

* feat: add batch-query command and multi-command batch mode

Add splitTargets() for comma-separated target expansion, multiBatchData()
for mixed-command orchestration, and a new batch-query CLI command that
defaults to 'where'. The existing batch command also gains comma splitting
and multi-command detection via --from-file/--stdin.

Impact: 5 functions changed, 3 affected

* fix: add try/catch around JSON.parse in batch and batch-query actions

Wrap --from-file and --stdin JSON parsing with error handling so malformed
input produces a clear error message instead of an unhandled exception.

* ci: allow benchmark/ branch name prefix (#258)

* docs: competitive deep-dive vs Joern (#260)

* docs: add competitive deep-dive for Joern and reorganize competitive folder

Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a
comprehensive feature-by-feature comparison against joernio/joern
(our #1-ranked competitor). Covers parsing, graph model, query language,
performance, installation, AI/MCP integration, security analysis,
developer productivity, and ecosystem across 100+ individual features.
Update FOUNDATION.md reference to the new path.

* fix: update broken links to moved COMPETITIVE_ANALYSIS.md

README.md and docs/roadmap/BACKLOG.md still referenced the old path
at generated/COMPETITIVE_ANALYSIS.md after the file was moved to
generated/competitive/COMPETITIVE_ANALYSIS.md in #260.

* refactor!: consolidate MCP tools and CLI commands (#263)

Reduce MCP tool surface from 32 to 29 by merging overlapping tools:

- Rename query_function → query with deps/path modes (absorbs fn_deps + symbol_path)
- Add list mode to execution_flow (absorbs list_entry_points)
- Remove path mode from dataflow tool (now edges + impact only)
- Merge fn and path CLI commands into query (--path flag for path mode)
- Remove --path option from dataflow CLI command
- Update batch commands: remove fn, add dataflow, query uses fnDepsData
- Update MCP_DEFAULTS pagination keys

BREAKING CHANGE: MCP tools fn_deps, symbol_path, list_entry_points removed.
CLI commands fn and path removed. Use query instead.

Impact: 1 functions changed, 1 affected

* docs: competitive deep-dive — codegraph vs narsil-mcp (#262)

* docs: add competitive deep-dive for narsil-mcp

Comprehensive feature-by-feature analysis of narsil-mcp (postrv/narsil-mcp),
the closest head-to-head competitor to codegraph. Covers all 8 FOUNDATION.md
principles, 9 feature comparison sections with 130+ features, gap analysis,
and competitive positioning.

* fix: address Greptile review — scoring math and relative path

- Fix principle scoring from 6-0-2 to 7-0-1 (correct count from table)
- Fix relative link to COMPETITIVE_ANALYSIS.md (../ not ./)

* docs: add Joern competitive deep-dive with feature candidates (#264)

* docs: add competitive deep-dive for Joern and reorganize competitive folder

Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a
comprehensive feature-by-feature comparison against joernio/joern
(our #1-ranked competitor). Covers parsing, graph model, query language,
performance, installation, AI/MCP integration, security analysis,
developer productivity, and ecosystem across 100+ individual features.
Update FOUNDATION.md reference to the new path.

* fix: update broken links to moved COMPETITIVE_ANALYSIS.md

README.md and docs/roadmap/BACKLOG.md still referenced the old path
at generated/COMPETITIVE_ANALYSIS.md after the file was moved to
generated/competitive/COMPETITIVE_ANALYSIS.md in #260.

* docs: add Joern-inspired feature candidates with BACKLOG-style grading

Append a new "Joern-Inspired Feature Candidates" section to the Joern
competitive deep-dive. Lists 11 actionable features extracted from
Parsing & Language Support, Graph Model & Analysis Depth, and Query
Language & Interface sections — assessed with the same tier/grading
system used in BACKLOG.md (zero-dep, foundation-aligned, problem-fit,
breaking).

Tier 1 non-breaking: call-chain slicing, type-informed resolution,
error-tolerant parsing, regex filtering, Kotlin, Swift, script execution.
Tier 1 breaking: expanded node/edge types, intraprocedural CFG, stored AST.
Not adopted: 9 features with FOUNDATION.md reasoning.
Cross-references BACKLOG IDs 14 and 7.

* feat: add normalizeSymbol utility for stable JSON schema

Add normalizeSymbol(row, db, hashCache) that returns a consistent
7-field symbol shape (name, kind, file, line, endLine, role, fileHash)
across all query and search commands.

Update queryNameData, fnDepsData, fnImpactData, explainFunctionImpl,
listFunctionsData, rolesData, whereSymbolImpl in queries.js and
searchData, multiSearchData, ftsSearchData, hybridSearchData in
embedder.js to use normalizeSymbol. Update SQL in listFunctionsData,
rolesData, iterListFunctions, iterRoles, _prepareSearch, and
ftsSearchData to include end_line and role columns.

Export normalizeSymbol from index.js. Add docs/json-schema.md
documenting the stable schema. Add 8 unit tests and 7 integration
schema conformance tests.

Impact: 13 functions changed, 33 affected

Impact: 14 functions changed, 42 affected

* feat: add normalizeSymbol utility for stable JSON schema (#267)

Add normalizeSymbol(row, db, hashCache) that returns a consistent
7-field symbol shape (name, kind, file, line, endLine, role, fileHash)
across all query and search commands.

Update queryNameData, fnDepsData, fnImpactData, explainFunctionImpl,
listFunctionsData, rolesData, whereSymbolImpl in queries.js and
searchData, multiSearchData, ftsSearchData, hybridSearchData in
embedder.js to use normalizeSymbol. Update SQL in listFunctionsData,
rolesData, iterListFunctions, iterRoles, _prepareSearch, and
ftsSearchData to include end_line and role columns.

Export normalizeSymbol from index.js. Add docs/json-schema.md
documenting the stable schema. Add 8 unit tests and 7 integration
schema conformance tests.

Impact: 13 functions changed, 33 affected

Impact: 14 functions changed, 42 affected

Impact: 13 functions changed, 21 affected

* docs: revise architecture audit and roadmap for v2.6.0 (#266)

Re-evaluate all architectural recommendations against the actual
codebase as it grew from v1.4.0 (5K lines, 12 modules) to v2.6.0
(17,830 lines, 35 modules).

Architecture audit:
- Reprioritize: dual-function anti-pattern across 15 modules is now #1
  (was analysis/formatting split at #3)
- Downgrade parser plugin system from #1 to #20 (parser.js shrank to
  404 lines after native engine took over)
- Add 3 new recommendations: decompose complexity.js (2,163 lines),
  unified graph model for structure/cochange/communities, pagination
  standardization
- Update all metrics and line counts to current state

Roadmap:
- Add Phase 2.5 (Analysis Expansion) documenting 18 modules shipped
  across v2.0.0-v2.6.0 (complexity, communities, structure, flow,
  cochange, manifesto, boundaries, check, audit, batch, triage,
  hybrid search, owners, snapshot, etc.)
- Mark Phase 5.3 (Hybrid Search) as completed early in Phase 2.5
- Update Phase 3 priorities based on revised architecture analysis
- Update version to 2.6.0, language count to 11, phase count to 10
- Add Phase 8 note referencing check command foundation from 2.5

* docs: add Narsil-MCP competitive deep-dive with feature candidates (#265)

* docs: add competitive deep-dive for Joern and reorganize competitive folder

Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a
comprehensive feature-by-feature comparison against joernio/joern
(our #1-ranked competitor). Covers parsing, graph model, query language,
performance, installation, AI/MCP integration, security analysis,
developer productivity, and ecosystem across 100+ individual features.
Update FOUNDATION.md reference to the new path.

* fix: update broken links to moved COMPETITIVE_ANALYSIS.md

README.md and docs/roadmap/BACKLOG.md still referenced the old path
at generated/COMPETITIVE_ANALYSIS.md after the file was moved to
generated/competitive/COMPETITIVE_ANALYSIS.md in #260.

* docs: add Joern-inspired feature candidates with BACKLOG-style grading

Append a new "Joern-Inspired Feature Candidates" section to the Joern
competitive deep-dive. Lists 11 actionable features extracted from
Parsing & Language Support, Graph Model & Analysis Depth, and Query
Language & Interface sections — assessed with the same tier/grading
system used in BACKLOG.md (zero-dep, foundation-aligned, problem-fit,
breaking).

Tier 1 non-breaking: call-chain slicing, type-informed resolution,
error-tolerant parsing, regex filtering, Kotlin, Swift, script execution.
Tier 1 breaking: expanded node/edge types, intraprocedural CFG, stored AST.
Not adopted: 9 features with FOUNDATION.md reasoning.
Cross-references BACKLOG IDs 14 and 7.

* docs: add competitive deep-dive for Narsil-MCP with feature candidates

Comprehensive comparison across 10 dimensions: parsing (32 vs 11
languages), graph model (CFG/DFG/type inference vs complexity/roles/
communities), search (similarity/chunking vs RRF hybrid), security
(147 rules vs none), queries (90 tools vs 21 + compound commands),
performance (cold start vs incremental), install, MCP integration,
developer productivity, and ecosystem.

Feature candidates section covers all comparison sections:
- Tier 1 non-breaking (10): MCP presets, AST chunking, code similarity,
  git blame/symbol history, remote repo indexing, config wizard, Kotlin,
  Swift, Bash, Scala language support
- Tier 1 breaking (1): export map per module
- Tier 2 (2): interactive HTML viz, multiple embedding backends
- Tier 3 (2): OWASP patterns, SBOM generation
- Not adopted (10): taint, type inference, SPARQL/RDF, CCG, in-memory
  arch, 90-tool surface, browser WASM, Forgemax, LSP, license scanning
- Cross-references to BACKLOG IDs 7, 8, 10, 14 and Joern candidates
  J4, J5, J8, J9

* feat: add expanded edge types — contains, parameter_of, receiver (Phase 2)

Build file→definition and parent→child contains edges, parameter_of
inverse edges, and receiver edges for method-call dispatch. Add
CORE_EDGE_KINDS, STRUCTURAL_EDGE_KINDS, EVERY_EDGE_KIND constants.
Exclude structural edges from moduleMapData coupling counts. Scope
directory contains-edge cleanup to preserve symbol-level edges.

Impact: 3 functions changed, 22 affected

* chore: add pre-commit diff-impact hook (#271)

Add show-diff-impact.sh that automatically runs
`codegraph diff-impact --staged -T` before git commit commands.
The hook injects blast radius info as additionalContext —
informational only, never blocks commits.

* feat(export): add GraphML, GraphSON, Neo4j CSV and interactive viewer (#268)

* feat(export): add GraphML, GraphSON, Neo4j CSV formats and interactive HTML viewer

Add three new export formats for graph database interoperability:
- GraphML (XML standard) with file-level and function-level modes
- GraphSON (TinkerPop v3) for Gremlin/JanusGraph compatibility
- Neo4j CSV (bulk import) with separate nodes/relationships files

Add interactive HTML viewer (`codegraph plot`) powered by vis-network:
- Hierarchical, force, and radial layouts with physics toggle
- Node coloring by kind or role, search/filter, legend panel
- Configurable via .plotDotCfg JSON file

Update CLI export command, MCP export_graph tool, and programmatic API
to support all six formats.

Impact: 12 functions changed, 6 affected

* feat(plot): add drill-down, clustering, complexity overlays, and detail panel

Evolve the plot command from a static viewer into an interactive
exploration tool with rich data overlays and navigation.

Data preparation:
- Extract prepareGraphData() with complexity, fan-in/fan-out, Louvain
  community detection, directory derivation, and risk flag computation
- Seed strategies: all (default), top-fanin, entry

Interactive features:
- Detail sidebar: metrics, callers/callees lists, risk badges
- Drill-down: click-to-expand / double-click-to-collapse neighbors
- Clustering: community and directory grouping via vis-network API
- Color by: kind, role, community, complexity (MI-based borders)
- Size by: uniform, fan-in, fan-out, complexity
- Risk overlay: dead-code (dashed), high-blast-radius (shadow), low-MI

CLI options:
- --cluster, --overlay, --seed, --seed-count, --size-by, --color-by

Tests expanded from 7 to 21 covering all new data enrichment, seed
strategies, risk flags, UI elements, and config backward compatibility.

Impact: 5 functions changed, 3 affected

* fix(test): update MCP export_graph enum to include new formats

The previous commit added graphml, graphson, and neo4j export formats
to the MCP tool definition but did not update the test assertion.

* style: format mcp test after enum update

* fix(security): escape config values in HTML template to prevent XSS

Use JSON.stringify() for cfg.layout.direction, effectiveColorBy, and
cfg.clusterBy when interpolated into inline JavaScript. Replace shell
exec() with execFile() for browser-open to avoid path injection.

Impact: 1 functions changed, 1 affected

* docs: add check-readme hook to guides (#272)

* feat(export): add GraphML, GraphSON, Neo4j CSV formats and interactive HTML viewer

Add three new export formats for graph database interoperability:
- GraphML (XML standard) with file-level and function-level modes
- GraphSON (TinkerPop v3) for Gremlin/JanusGraph compatibility
- Neo4j CSV (bulk import) with separate nodes/relationships files

Add interactive HTML viewer (`codegraph plot`) powered by vis-network:
- Hierarchical, force, and radial layouts with physics toggle
- Node coloring by kind or role, search/filter, legend panel
- Configurable via .plotDotCfg JSON file

Update CLI export command, MCP export_graph tool, and programmatic API
to support all six formats.

Impact: 12 functions changed, 6 affected

* feat(plot): add drill-down, clustering, complexity overlays, and detail panel

Evolve the plot command from a static viewer into an interactive
exploration tool with rich data overlays and navigation.

Data preparation:
- Extract prepareGraphData() with complexity, fan-in/fan-out, Louvain
  community detection, directory derivation, and risk flag computation
- Seed strategies: all (default), top-fanin, entry

Interactive features:
- Detail sidebar: metrics, callers/callees lists, risk badges
- Drill-down: click-to-expand / double-click-to-collapse neighbors
- Clustering: community and directory grouping via vis-network API
- Color by: kind, role, community, complexity (MI-based borders)
- Size by: uniform, fan-in, fan-out, complexity
- Risk overlay: dead-code (dashed), high-blast-radius (shadow), low-MI

CLI options:
- --cluster, --overlay, --seed, --seed-count, --size-by, --color-by

Tests expanded from 7 to 21 covering all new data enrichment, seed
strategies, risk flags, UI elements, and config backward compatibility.

Impact: 5 functions changed, 3 affected

* fix(test): update MCP export_graph enum to include new formats

The previous commit added graphml, graphson, and neo4j export formats
to the MCP tool definition but did not update the test assertion.

* style: format mcp test after enum update

* fix(security): escape config values in HTML template to prevent XSS

Use JSON.stringify() for cfg.layout.direction, effectiveColorBy, and
cfg.clusterBy when interpolated into inline JavaScript. Replace shell
exec() with execFile() for browser-open to avoid path injection.

Impact: 1 functions changed, 1 affected

* docs: add check-readme hook to recommended practices and guides

Document the new check-readme.sh hook across all three doc locations:
recommended-practices.md, ai-agent-guide.md, and the hooks example
README. Adds settings.json examples, hook behavior descriptions, and
customization entries.

* feat: exports command + scoped rebuild for parallel agents (#269)

* docs: add competitive deep-dive for Joern and reorganize competitive folder

Move COMPETITIVE_ANALYSIS.md into generated/competitive/ and add a
comprehensive feature-by-feature comparison against joernio/joern
(our #1-ranked competitor). Covers parsing, graph model, query language,
performance, installation, AI/MCP integration, security analysis,
developer productivity, and ecosystem across 100+ individual features.
Update FOUNDATION.md reference to the new path.

* fix: update broken links to moved COMPETITIVE_ANALYSIS.md

README.md and docs/roadmap/BACKLOG.md still referenced the old path
at generated/COMPETITIVE_ANALYSIS.md after the file was moved to
generated/competitive/COMPETITIVE_ANALYSIS.md in #260.

* docs: add Joern-inspired feature candidates with BACKLOG-style grading

Append a new "Joern-Inspired Feature Candidates" section to the Joern
competitive deep-dive. Lists 11 actionable features extracted from
Parsing & Language Support, Graph Model & Analysis Depth, and Query
Language & Interface sections — assessed with the same tier/grading
system used in BACKLOG.md (zero-dep, foundation-aligned, problem-fit,
breaking).

Tier 1 non-breaking: call-chain slicing, type-informed resolution,
error-tolerant parsing, regex filtering, Kotlin, Swift, script execution.
Tier 1 breaking: expanded node/edge types, intraprocedural CFG, stored AST.
Not adopted: 9 features with FOUNDATION.md reasoning.
Cross-references BACKLOG IDs 14 and 7.

* docs: add competitive deep-dive for Narsil-MCP with feature candidates

Comprehensive comparison across 10 dimensions: parsing (32 vs 11
languages), graph model (CFG/DFG/type inference vs complexity/roles/
communities), search (similarity/chunking vs RRF hybrid), security
(147 rules vs none), queries (90 tools vs 21 + compound commands),
performance (cold start vs incremental), install, MCP integration,
developer productivity, and ecosystem.

Feature candidates section covers all comparison sections:
- Tier 1 non-breaking (10): MCP presets, AST chunking, code similarity,
  git blame/symbol history, remote repo indexing, config wizard, Kotlin,
  Swift, Bash, Scala language support
- Tier 1 breaking (1): export map per module
- Tier 2 (2): interactive HTML viz, multiple embedding backends
- Tier 3 (2): OWASP patterns, SBOM generation
- Not adopted (10): taint, type inference, SPARQL/RDF, CCG, in-memory
  arch, 90-tool surface, browser WASM, Forgemax, LSP, license scanning
- Cross-references to BACKLOG IDs 7, 8, 10, 14 and Joern candidates
  J4, J5, J8, J9

* feat: add dedicated `exports <file>` command with per-symbol consumers

Implements feature N11 from the Narsil competitive analysis. The new
command provides a focused export map showing which symbols a file
exports and who calls each one, filling the gap between `explain`
(public/internal split without consumers) and `where --file` (just
export names).

Adds exportsData/fileExports to queries.js, CLI command, MCP tool,
batch support, programmatic API, and integration tests.

Impact: 7 functions changed, 15 affected

* feat: add scoped rebuild for parallel agent rollback

Extract purgeFilesFromGraph() from the inline deletion cascade in
buildGraph() for reuse. Add opts.scope and opts.noReverseDeps to
buildGraph() so agents can surgically rebuild only their changed files
without nuking other agents' graph state.

- `--scope <files...>` on `build` skips collectFiles/getChangedFiles
- `--no-reverse-deps` skips reverse-dep cascade (safe when exports unchanged)
- New `scoped_rebuild` MCP tool for multi-agent orchestration
- purgeFilesFromGraph exported from programmatic API
- Unit tests for purge function, integration tests for scoped rebuild
- Documented agent-level rollback workflow in titan-paradigm.md

Impact: 3 functions changed, 20 affected

* fix: remove leaked scoped_rebuild changes from another session

Reverts purgeFilesFromGraph export, --scope/--no-reverse-deps CLI
options, scoped_rebuild MCP tool+handler, and test list entry that
were accidentally included from a concurrent session's dirty worktree.

Impact: 2 functions changed, 1 affected

* fix: remove stale scoped-rebuild docs from titan-paradigm

The scoped_rebuild feature (--scope, --no-reverse-deps CLI options and
scoped_rebuild MCP tool) was removed in 651ddb2 but the documentation
in titan-paradigm.md still referenced it. Addresses Greptile review
feedback on PR #269.

* feat: add intraprocedural control flow graph (CFG) construction

Add opt-in CFG analysis that builds basic-block control flow graphs
from tree-sitter AST for individual functions. Enables complexity-aware
impact analysis and opens the path to dataflow (def-use chains).

- DB migration v12: cfg_blocks + cfg_edges tables
- New src/cfg.js module: CFG_RULES, buildFunctionCFG, buildCFGData,
  cfgData, cfgToDOT, cfgToMermaid, cfg CLI printer
- Builder integration: --cfg flag triggers CFG after complexity pass
- CLI: `cfg <name>` command with --format text/dot/mermaid, -j, --ndjson
- MCP: cfg tool with name, format, file, kind, pagination props
- Exports findFunctionNode from complexity.js for reuse
- 24 unit tests + 11 integration tests (35 total)

Phase 1: JS/TS/TSX only. Handles if/else, for/while/do-while, switch,
try/catch/finally, break/continue (with labels), return/throw.

Impact: 27 functions changed, 36 affected

* feat: add stored queryable AST nodes (calls, new, string, regex, throw, await)

Persist selected AST nodes in a dedicated ast_nodes SQLite table during
build, queryable via CLI (codegraph ast), MCP (ast_query), and
programmatic API.

- DB migration v13: ast_nodes table with indexes on kind, name, file,
  parent, and (kind,name)
- New src/ast.js module: buildAstNodes (extraction), astQueryData/
  astQuery (query), AST_NODE_KINDS constant
- Builder integration: full-rebuild deletion, incremental cleanup,
  always-on post-parse extraction (before complexity to preserve _tree)
- CLI: codegraph ast [pattern] with -k, -f, -T, -j, --ndjson,
  --limit, --offset options
- MCP: ast_query tool with pattern, kind, file, no_tests, pagination
- JS/TS/TSX Phase 1: full AST walk for new/throw/await/string/regex;
  all languages get call nodes from symbols.calls
- Pattern matching uses SQL GLOB with auto-wrapping for substring search
- Parent node resolution via narrowest enclosing definition

Impact: 12 functions changed, 26 affected

* fix: correct misleading comment for break without enclosing loop/switch

The comment incorrectly suggested this code path handled break inside
switch cases. It actually handles break with no enclosing loop/switch
context (invalid syntax) as a no-op.

Impact: 2 functions changed, 9 affected

* docs: fix stale MCP tool references in guides

Update tool names and counts to match actual MCP server output:
- query_function → query, fn_deps/symbol_path removed (merged into query)
- list_entry_points removed (merged into execution_flow)
- Add missing tools: ast_query, cfg, dataflow, symbol_children
- Fix count: 31 tools (32 in multi-repo mode)

* feat: expand node types with parameter, property, constant kinds (#270)

* feat: expand node types with parameter, property, constant kinds (Phase 1)

Add sub-declaration node extraction to all 9 WASM language extractors,
enabling structural queries like "which functions take a Request param?"
or "which classes have a userId field?" without reading source code.

Schema: migration v11 adds nullable parent_id column with indexes.
Builder: insertNode links children to parent via parent_id FK.
Extractors: JS/TS, Python, Go, Rust, Java, C#, Ruby, PHP, HCL now
emit children arrays for parameters, properties, and constants.
Queries: new childrenData() function, children in contextData output.
CLI: new `children` command, EVERY_SYMBOL_KIND validation on --kind.
MCP: new `symbol_children` tool, extended kind enum on all kind fields.
Constants: CORE_SYMBOL_KINDS (10), EXTENDED_SYMBOL_KINDS (3),
EVERY_SYMBOL_KIND (13). ALL_SYMBOL_KINDS preserved for backward compat.

Native Rust engine: Definition struct gains children field but actual
extraction is deferred to Phase 2 — WASM fallback handles new kinds.

Impact: 63 functions changed, 62 affected

* feat: add expanded edge types — contains, parameter_of, receiver (Phase 2)

Build file→definition and parent→child contains edges, parameter_of
inverse edges, and receiver edges for method-call dispatch. Add
CORE_EDGE_KINDS, STRUCTURAL_EDGE_KINDS, EVERY_EDGE_KIND constants.
Exclude structural edges from moduleMapData coupling counts. Scope
directory contains-edge cleanup to preserve symbol-level edges.

Impact: 3 functions changed, 22 affected

* fix(native): add missing children field to all Rust extractors

The Definition struct gained a children field but no extractor was
updated to include it, causing 50 compilation errors. Add children: None
to every Definition initializer across all 9 language extractors.
Also fix unused variable warnings in parser_registry.rs and parallel.rs.

Impact: 13 functions changed, 10 affected

* ci: trigger workflow re-run

* fix: include cfg_edges and cfg_blocks in full rebuild cleanup

The full rebuild DELETE chain was missing the two CFG tables,
which would leave orphaned CFG data after a fresh build.

Impact: 1 functions changed, 0 affected

* feat: expand node types with parameter, property, constant kinds (Phase 1)

Add sub-declaration node extraction to all 9 WASM language extractors,
enabling structural queries like "which functions take a Request param?"
or "which classes have a userId field?" without reading source code.

Schema: migration v11 adds nullable parent_id column with indexes.
Builder: insertNode links children to parent via parent_id FK.
Extractors: JS/TS, Python, Go, Rust, Java, C#, Ruby, PHP, HCL now
emit children arrays for parameters, properties, and constants.
Queries: new childrenData() function, children in contextData output.
CLI: new `children` command, EVERY_SYMBOL_KIND validation on --kind.
MCP: new `symbol_children` tool, extended kind enum on all kind fields.
Constants: CORE_SYMBOL_KINDS (10), EXTENDED_SYMBOL_KINDS (3),
EVERY_SYMBOL_KIND (13). ALL_SYMBOL_KINDS preserved for backward compat.

Native Rust engine: Definition struct gains children field but actual
extraction is deferred to Phase 2 — WASM fallback handles new kinds.

Impact: 63 functions changed, 62 affected

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant