Architecture and How It Works

special-place-administrator edited this page Jun 10, 2026 · 3 revisions

SymForge is a local-first MCP server for code intelligence. It serves an agent from a live repository index instead of making the agent assemble context by reading broad chunks of files.

Use this page for: understanding what SymForge owns, how the index works, and where the boundary sits between SymForge, the shell, and the client.

Runtime Diagram

flowchart LR
    Client["MCP client<br/>Codex, Claude, Gemini, Kilo, etc."] --> Server["symforge stdio MCP server"]

    Server --> Startup["startup planner"]
    Startup -->|local session| Local["in-process LiveIndex"]
    Startup -->|shared sessions| Daemon["optional local daemon"]
    Daemon --> Local

    Workspace["workspace files"] --> Parser["tree-sitter parsers<br/>config extractors"]
    Parser --> Local
    Watcher["filesystem watcher"] --> Local
    Git["git status, diffs, history"] --> Signals["frecency, co-change,<br/>temporal hotspots"]
    Signals --> Local

    Local --> Snapshot[".symforge/index.bin"]
    Snapshot --> Local

    Local --> Tools["MCP tools<br/>resources<br/>prompts"]
    Tools --> Client

    Tools --> Edits["structural edit engine"]
    Edits --> Workspace
    Edits --> Impact["analyze_file_impact"]
    Impact --> Local

    Tools --> Analytics["optional analytics queue"]
    Analytics --> AnalyticsDb[".symforge/analytics.db"]

What SymForge Owns

SymForge should be the first stop for:

  • repository orientation
  • source-code file outlines
  • symbol lookup and symbol source reads
  • text search with enclosing symbol context
  • structural AST search
  • reference and dependent tracing
  • changed-file and symbol-diff inspection
  • syntax diagnostics for supported code/config files
  • edit planning and symbol-scoped source edits
  • post-edit reindexing and impact analysis

SymForge does not replace:

  • cargo, npm, test runners, or package managers
  • Docker and process control
  • runtime debugging
  • OS diagnostics
  • literal document reads where exact prose is the target

Core Data Flow

  1. Startup discovers the workspace unless auto-indexing is disabled.
  2. Files are admitted according to project boundaries, ignore rules, and noise policy.
  3. Source files are parsed with tree-sitter language extractors.
  4. Config/document files use dedicated extractors where available.
  5. Symbols, references, file text, parse diagnostics, and metadata are published into LiveIndex.
  6. Query tools read from the in-process index.
  7. The watcher and analyze_file_impact keep changed files fresh.
  8. Snapshots under .symforge/index.bin warm future startup.
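
The steps above can be sketched as a small admit, extract, publish, query pipeline. This is an illustrative sketch only: `FileEntry`, `LiveIndex`, `admit`, `extract`, and `index_workspace` are hypothetical names, and the line-based extractor is a toy stand-in for the tree-sitter extractors, not SymForge's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical per-file record; a real index would also hold references and metadata.
#[derive(Debug, Clone, PartialEq)]
struct FileEntry {
    text: String,
    symbols: Vec<String>,
    parse_ok: bool,
}

/// Hypothetical in-memory index keyed by workspace-relative path.
#[derive(Default)]
struct LiveIndex {
    files: HashMap<String, FileEntry>,
}

/// Step 2: admission. A stand-in for project boundaries, ignore rules, and noise policy.
fn admit(path: &str, ignored_dirs: &[&str]) -> bool {
    !path.split('/').any(|segment| ignored_dirs.contains(&segment))
}

/// Steps 3-4: a toy extractor standing in for tree-sitter. It treats every
/// line starting with "fn " as a symbol and flags unbalanced braces.
fn extract(text: &str) -> FileEntry {
    let symbols: Vec<String> = text
        .lines()
        .filter_map(|line| line.trim_start().strip_prefix("fn "))
        .map(|rest| rest.split('(').next().unwrap_or("").trim().to_string())
        .filter(|name| !name.is_empty())
        .collect();
    let parse_ok = text.matches('{').count() == text.matches('}').count();
    FileEntry { text: text.to_string(), symbols, parse_ok }
}

impl LiveIndex {
    /// Step 5 (and step 7 on change): publishing replaces the whole entry, so
    /// re-indexing a changed file never leaves stale symbols behind.
    fn publish(&mut self, path: &str, text: &str) {
        self.files.insert(path.to_string(), extract(text));
    }

    /// Step 6: queries read only from the in-memory index.
    fn find_symbol(&self, name: &str) -> Vec<&str> {
        let mut hits: Vec<&str> = self
            .files
            .iter()
            .filter(|(_, entry)| entry.symbols.iter().any(|s| s == name))
            .map(|(path, _)| path.as_str())
            .collect();
        hits.sort();
        hits
    }
}

/// Steps 1-5 in order for a set of (path, text) pairs.
fn index_workspace(files: &[(&str, &str)], ignored_dirs: &[&str]) -> LiveIndex {
    let mut index = LiveIndex::default();
    for (path, text) in files {
        if admit(path, ignored_dirs) {
            index.publish(path, text);
        }
    }
    index
}
```

Replacing the entry wholesale on publish is the simplest way to keep a changed file fresh; an incremental update would be faster but has to prove it removed every stale symbol.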

Main Code Areas

| Area | Role |
| --- | --- |
| src/protocol/ | MCP protocol surface, tool handlers, prompts, resources, result metadata, formatting |
| src/live_index/ | In-memory file/symbol/reference store, queries, search, snapshots, rank signals |
| src/parsing/ | Tree-sitter integration, language extractors, config extractors, diagnostics |
| src/analytics/ | Local SQLite analytics store and bounded background writer |
| src/cli/ | init, hook, trust, and analytics command handling |
| src/daemon.rs | Shared local daemon and session routing |
| src/sidecar/ | Local sidecar state, token stats, and HTTP handler surfaces |
| src/watcher/ | Filesystem watching and reconciliation |
| src/git.rs | Git status, diffs, retry helpers, and temporal input |
| npm/ | JavaScript launcher, installer, and npm packaging tests |

Design Decisions

  • Local-first index: source spans depend on exact local bytes, so query serving should stay in process whenever possible.
  • Snapshots are acceleration, not authority: .symforge/index.bin is used for warm startup, but current files and reindexing remain the source of truth.
  • Explicit recovery: bad parses and stale state are surfaced in health. Recovery paths are tools such as validate_file_syntax, analyze_file_impact, and index_folder.
  • Symbol-scoped edits: edit tools resolve targets server-side, write atomically, and reindex after successful changes.
  • Capability evidence: optional ranking and routing features report whether they were applied, unavailable, disabled, stale, or falling back.
  • No fake success: result metadata distinguishes found, empty, ambiguous, invalid, and failure states separately from the human-readable text.
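
The "no fake success" decision can be illustrated by keeping a machine-readable outcome next to, but separate from, the display text. The sketch below is an assumption about shape, not SymForge's actual result schema: `Outcome`, `ToolResult`, and `lookup` are hypothetical names.

```rust
/// Hypothetical machine-readable outcome, kept apart from the display text.
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Found { count: usize },
    Empty,
    Ambiguous { candidates: Vec<String> },
    Invalid { reason: String },
    /// For I/O or internal errors; this toy lookup never produces it.
    Failure { error: String },
}

/// A result carries both: agents branch on `outcome`, humans read `text`.
struct ToolResult {
    outcome: Outcome,
    text: String,
}

/// Look up a symbol by exact name in a list of (name, path) pairs.
fn lookup(symbols: &[(&str, &str)], query: &str) -> ToolResult {
    if query.trim().is_empty() {
        return ToolResult {
            outcome: Outcome::Invalid { reason: "empty query".to_string() },
            text: "Query must not be empty.".to_string(),
        };
    }
    let hits: Vec<String> = symbols
        .iter()
        .filter(|(name, _)| *name == query)
        .map(|(_, path)| path.to_string())
        .collect();
    match hits.len() {
        0 => ToolResult {
            outcome: Outcome::Empty,
            text: format!("No symbol named {} in the index.", query),
        },
        1 => ToolResult {
            outcome: Outcome::Found { count: 1 },
            text: format!("{} is defined in {}.", query, hits[0]),
        },
        n => ToolResult {
            text: format!("{} matches {} definitions; narrow by path.", query, n),
            outcome: Outcome::Ambiguous { candidates: hits },
        },
    }
}
```

Because an empty result and an invalid query are different variants, a caller cannot mistake "nothing matched" for "the request was malformed" by parsing prose.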

Trust Envelopes

Query responses open with a machine-readable header so the agent knows how much to believe the result instead of guessing:

  • Match type: exact, constrained, or heuristic.
  • Source authority: current index, disk-refreshed, or worktree target (rebased) for routed edits.
  • Parse state: parsed, partial, or degraded for the files involved.
  • Completeness: full, budget-limited, or truncated by a result cap, always with the actual numbers (for example "output is ~707 tokens; budget is 600").
  • Scope: what was searched and which noise classes (vendor, generated, tests, personal tooling) were filtered.
  • Evidence anchors: file:line references the agent can jump to.

Truncation is never silent: every bounded response says it was bounded and by how much. ask extends the same idea to routing — it reports route confidence (exact vs inferred), its rationale, and a suggested next step, and it downgrades its own confidence on compound questions rather than returning a confident false negative.
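
The completeness field is the easiest part of the envelope to illustrate. The sketch below assumes a hypothetical `Completeness` enum plus `cap_results` and `describe` helpers. The exact header wording is invented for illustration, except the budget phrase, which follows the example quoted above.

```rust
/// Hypothetical completeness states; every bounded state carries its numbers.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Completeness {
    Full,
    BudgetLimited { output_tokens: usize, budget_tokens: usize },
    Truncated { shown: usize, total: usize },
}

/// Apply a result cap and report exactly how much was dropped.
fn cap_results(mut items: Vec<String>, cap: usize) -> (Vec<String>, Completeness) {
    let total = items.len();
    if total <= cap {
        return (items, Completeness::Full);
    }
    items.truncate(cap);
    (items, Completeness::Truncated { shown: cap, total })
}

/// Render the completeness line of a response header.
fn describe(completeness: Completeness) -> String {
    match completeness {
        Completeness::Full => "full".to_string(),
        Completeness::BudgetLimited { output_tokens, budget_tokens } => format!(
            "budget-limited: output is ~{} tokens; budget is {}",
            output_tokens, budget_tokens
        ),
        Completeness::Truncated { shown, total } => {
            format!("truncated: showing {} of {} results", shown, total)
        }
    }
}
```

Putting the numbers inside the enum variants means a bounded result cannot be constructed without stating its bounds.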

Expected-Partial Parse Quarantine

Some valid code trips upstream tree-sitter grammar limitations (for example, TypeScript import('rxjs').Subscription[], or Angular @if (a > b) control flow inside .html templates). SymForge separates these expected partials from genuine repo defects in health — but only after a proof, never a heuristic:

  1. Neutralize only the suspected construct while preserving token boundaries (a space replaces the offending characters, so adjacent tokens can never fuse).
  2. Re-parse the whole file.
  3. Excuse the file iff the re-parse is completely clean.

A genuinely broken file that merely contains the known construct stays classified as an unexpected partial, so real defects cannot hide behind grammar limitations. Verdicts are memoized by content hash, so repeated health calls and render paths do not re-pay the parse cost.
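
The proof and its memoization can be sketched as follows. Everything here is an assumption made for illustration: `parses_cleanly` is a toy stand-in for a tree-sitter parse (it rejects `@` as the "known limitation" and unbalanced braces as the genuine defect), and `Verdict`, `neutralize`, `classify`, and `VerdictCache` are hypothetical names. The cache keys on a 64-bit hash from the standard library, so a production version would want a stronger content hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Clean,
    ExpectedPartial,
    UnexpectedPartial,
}

/// Toy stand-in for a parser: '@' trips the "grammar limitation",
/// unbalanced braces are a genuine defect.
fn parses_cleanly(text: &str) -> bool {
    let mut depth: i64 = 0;
    for ch in text.chars() {
        match ch {
            '@' => return false,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth < 0 {
                    return false;
                }
            }
            _ => {}
        }
    }
    depth == 0
}

/// Step 1: overwrite only the suspected construct, one space per character,
/// so the tokens on either side cannot fuse.
fn neutralize(text: &str, construct: &str) -> String {
    text.replace(construct, &" ".repeat(construct.chars().count()))
}

/// Steps 2-3: re-parse the whole file and excuse it only if that is clean.
fn classify(text: &str, construct: &str) -> Verdict {
    if parses_cleanly(text) {
        return Verdict::Clean;
    }
    if text.contains(construct) && parses_cleanly(&neutralize(text, construct)) {
        Verdict::ExpectedPartial
    } else {
        Verdict::UnexpectedPartial
    }
}

/// Memoizes verdicts by content hash; `misses` counts actual classifications.
#[derive(Default)]
struct VerdictCache {
    by_hash: HashMap<u64, Verdict>,
    misses: usize,
}

impl VerdictCache {
    fn verdict(&mut self, text: &str, construct: &str) -> Verdict {
        let mut hasher = DefaultHasher::new();
        text.hash(&mut hasher);
        construct.hash(&mut hasher);
        let key = hasher.finish();
        if let Some(cached) = self.by_hash.get(&key) {
            return *cached;
        }
        self.misses += 1;
        let verdict = classify(text, construct);
        self.by_hash.insert(key, verdict);
        verdict
    }
}
```

In this sketch a file with an unclosed brace stays `UnexpectedPartial` even though it contains the known construct, because the re-parse after neutralizing is still not clean.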

Typical Agent Workflow

  1. health to confirm index/runtime state.
  2. get_repo_map, explore, or ask to orient.
  3. search_symbols, search_text, or search_files to narrow.
  4. get_file_context, get_symbol, or get_symbol_context to inspect.
  5. edit_plan before non-trivial edits.
  6. A structural edit tool for source changes.
  7. analyze_file_impact for touched files or index_folder after broad work.
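
On the wire, each step above is an MCP `tools/call` request. The sketch below builds that JSON-RPC envelope by hand and lists one representative tool per step. The envelope shape (`method`, `params.name`, `params.arguments`) is standard MCP, but `tools_call` and `workflow_plan` are hypothetical helpers, and the argument objects each SymForge tool expects are not documented here, so only the argument-free `health` call is shown in full. Step 6 is omitted because the text does not name a specific edit tool.

```rust
/// Build an MCP `tools/call` JSON-RPC request.
/// `arguments_json` must already be a valid JSON object; `tool` is assumed to
/// be a plain identifier, so no string escaping is performed.
fn tools_call(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{}}}}}"#,
        id, tool, arguments_json
    )
}

/// One representative tool per workflow step, in order.
fn workflow_plan() -> Vec<&'static str> {
    vec![
        "health",              // 1. confirm index/runtime state
        "get_repo_map",        // 2. orient
        "search_symbols",      // 3. narrow
        "get_symbol",          // 4. inspect
        "edit_plan",           // 5. plan non-trivial edits
        "analyze_file_impact", // 7. refresh touched files
    ]
}
```

A client would send `tools_call(1, "health", "{}")` first and continue down the plan only if the reported index state is usable.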

Boundary With The Shell

The simplest model:

  • SymForge answers "where is the code, what does it reference, and how do I edit this symbol?"
  • The shell answers "does the project build, do tests pass, what process is running, and what did the OS do?"