Architecture and How It Works

SymForge is a local-first MCP server for code intelligence. It serves an agent from a live repository index instead of making the agent assemble context by reading broad chunks of files.

Use this page for understanding what SymForge owns, how the index works, and where the boundary sits between SymForge, the shell, and the client.

Runtime Diagram

flowchart LR
    Client["MCP client<br/>Codex, Claude, Gemini, Kilo, etc."] --> Server["symforge stdio MCP server"]

    Server --> Startup["startup planner"]
    Startup -->|local session| Local["in-process LiveIndex"]
    Startup -->|shared sessions| Daemon["optional local daemon"]
    Daemon --> Local

    Workspace["workspace files"] --> Parser["tree-sitter parsers<br/>config extractors"]
    Parser --> Local
    Watcher["filesystem watcher"] --> Local
    Git["git status, diffs, history"] --> Signals["frecency, co-change,<br/>temporal hotspots"]
    Signals --> Local

    Local --> Snapshot[".symforge/index.bin"]
    Snapshot --> Local

    Local --> Tools["MCP tools<br/>resources<br/>prompts"]
    Tools --> Client

    Tools --> Edits["structural edit engine"]
    Edits --> Workspace
    Edits --> Impact["analyze_file_impact"]
    Impact --> Local

    Tools --> Analytics["optional analytics queue"]
    Analytics --> AnalyticsDb[".symforge/analytics.db"]

What SymForge Owns

SymForge should be the first stop for:

repository orientation
source-code file outlines
symbol lookup and symbol source reads
text search with enclosing symbol context
structural AST search
reference and dependent tracing
changed-file and symbol-diff inspection
syntax diagnostics for supported code/config files
edit planning and symbol-scoped source edits
post-edit reindexing and impact analysis

SymForge does not replace:

cargo, npm, test runners, or package managers
Docker and process control
runtime debugging
OS diagnostics
literal document reads where exact prose is the target

Core Data Flow

1. Startup discovers the workspace unless auto-indexing is disabled.
2. Files are admitted according to project boundaries, ignore rules, and noise policy.
3. Source files are parsed with tree-sitter language extractors.
4. Config/document files use dedicated extractors where available.
5. Symbols, references, file text, parse diagnostics, and metadata are published into `LiveIndex`.
6. Query tools read from the in-process index.
7. The watcher and `analyze_file_impact` keep changed files fresh.
8. Snapshots under `.symforge/index.bin` warm future startups.
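
The admission and publish steps can be pictured with a small sketch. Everything in it is hypothetical: `Policy`, `Admission`, `SketchIndex`, and the prefix-based rules are illustrative stand-ins, not SymForge's actual types, and this page does not say whether noise files are skipped entirely or only filtered at query time.

```rust
use std::collections::BTreeMap;

/// Hypothetical admission outcome for one workspace path.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Admission {
    Admitted,
    OutsideProject,
    Ignored,
    Noise,
}

/// Hypothetical policy expressed as plain path prefixes (an assumption;
/// real ignore rules are usually glob-based).
pub struct Policy<'a> {
    pub project_root: &'a str,
    pub ignored_prefixes: &'a [&'a str],
    pub noise_prefixes: &'a [&'a str],
}

/// Classify a path by project boundary first, then ignore rules, then noise.
pub fn admit(path: &str, policy: &Policy<'_>) -> Admission {
    let relative = match path.strip_prefix(policy.project_root) {
        Some(rest) => rest,
        None => return Admission::OutsideProject,
    };
    if policy.ignored_prefixes.iter().any(|p| relative.starts_with(*p)) {
        Admission::Ignored
    } else if policy.noise_prefixes.iter().any(|p| relative.starts_with(*p)) {
        Admission::Noise
    } else {
        Admission::Admitted
    }
}

/// Hypothetical stand-in for the in-memory index: path -> current file text.
#[derive(Default)]
pub struct SketchIndex {
    files: BTreeMap<String, String>,
}

impl SketchIndex {
    /// Publish or refresh one file. Returns true only when the stored text changed,
    /// which is the case a watcher event or an impact analysis would care about.
    pub fn publish(&mut self, path: &str, text: &str) -> bool {
        let changed = self.files.get(path).map(String::as_str) != Some(text);
        if changed {
            self.files.insert(path.to_string(), text.to_string());
        }
        changed
    }

    /// Read the current text for a path, as a query tool would.
    pub fn text(&self, path: &str) -> Option<&str> {
        self.files.get(path).map(String::as_str)
    }
}
```

In this sketch, republishing identical text is a no-op, which mirrors the idea that freshness work is only paid for files that actually changed.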

Main Code Areas

| Area | Role |
| --- | --- |
| `src/protocol/` | MCP protocol surface, tool handlers, prompts, resources, result metadata, formatting |
| `src/live_index/` | In-memory file/symbol/reference store, queries, search, snapshots, rank signals |
| `src/parsing/` | Tree-sitter integration, language extractors, config extractors, diagnostics |
| `src/analytics/` | Local SQLite analytics store and bounded background writer |
| `src/cli/` | `init`, `hook`, `trust`, and `analytics` command handling |
| `src/daemon.rs` | Shared local daemon and session routing |
| `src/sidecar/` | Local sidecar state, token stats, and HTTP handler surfaces |
| `src/watcher/` | Filesystem watching and reconciliation |
| `src/git.rs` | Git status, diffs, retry helpers, and temporal input |
| `npm/` | JavaScript launcher, installer, and npm packaging tests |

Design Decisions

Local-first index: source spans depend on exact local bytes, so query serving should stay in process whenever possible.
Snapshots are acceleration, not authority: `.symforge/index.bin` is used for warm startup, but current files and reindexing remain the source of truth.
Explicit recovery: bad parses and stale state are surfaced in `health`. Recovery paths are tools such as `validate_file_syntax`, `analyze_file_impact`, and `index_folder`.
Symbol-scoped edits: edit tools resolve targets server-side, write atomically, and reindex after successful changes.
Capability evidence: optional ranking and routing features report whether they were applied, unavailable, disabled, stale, or falling back.
No fake success: result metadata distinguishes found, empty, ambiguous, invalid, and failure states separately from the human-readable text.
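
The "no fake success" decision is easiest to see as a type. The sketch below is an assumption about shape only: `Outcome`, `classify`, and `status` are hypothetical names, and the real result metadata may carry more fields. The point is that an empty result, an ambiguous one, an invalid request, and a failure each get their own state instead of collapsing into "nothing found".

```rust
/// Hypothetical result states, kept separate from any human-readable text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    Found(String),
    Empty,
    Ambiguous(Vec<String>),
    Invalid(String),
    Failure(String),
}

/// Classify a symbol lookup. `lookup` is whatever the index returned:
/// candidate names on success, or an error message when the query could not run.
pub fn classify(query: &str, lookup: Result<Vec<String>, String>) -> Outcome {
    if query.trim().is_empty() {
        // A malformed request is not the same thing as "no matches".
        return Outcome::Invalid("query is empty".to_string());
    }
    match lookup {
        Err(reason) => Outcome::Failure(reason),
        Ok(mut hits) => match hits.len() {
            0 => Outcome::Empty,
            1 => Outcome::Found(hits.remove(0)),
            _ => Outcome::Ambiguous(hits),
        },
    }
}

impl Outcome {
    /// Machine-readable status label an agent could branch on.
    pub fn status(&self) -> &'static str {
        match self {
            Outcome::Found(_) => "found",
            Outcome::Empty => "empty",
            Outcome::Ambiguous(_) => "ambiguous",
            Outcome::Invalid(_) => "invalid",
            Outcome::Failure(_) => "failure",
        }
    }
}
```

An agent that branches on `status()` can retry on `failure`, disambiguate on `ambiguous`, and treat only `empty` as a genuine negative.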

Trust Envelopes

Query responses open with a machine-readable header so the agent knows how much to believe the result instead of guessing:

Match type — exact, constrained, or heuristic.
Source authority — current index, disk-refreshed, or worktree target (rebased) for routed edits.
Parse state — parsed, partial, or degraded for the files involved.
Completeness — full, budget-limited, or truncated by a result cap, always with the actual numbers (for example "output is ~707 tokens; budget is 600").
Scope — what was searched and which noise classes (vendor, generated, tests, personal tooling) were filtered.
Evidence anchors — file:line references the agent can jump to.

Truncation is never silent: every bounded response says that it was bounded and by how much. The `ask` tool extends the same idea to routing: it reports route confidence (exact or inferred), its rationale, and a suggested next step. On compound questions it downgrades its own confidence rather than returning a confident false negative.
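
A minimal sketch of such an envelope follows, assuming a plain `key: value` header. The type names, field names, and header layout are hypothetical (the actual wire format is not specified on this page); only the six dimensions and the "always report the numbers" rule come from the text above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchType { Exact, Constrained, Heuristic }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceAuthority { CurrentIndex, DiskRefreshed, WorktreeTarget }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ParseState { Parsed, Partial, Degraded }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Completeness {
    Full,
    BudgetLimited { output_tokens: usize, budget_tokens: usize },
    Truncated { shown: usize, total: usize },
}

impl Completeness {
    /// Compare an output size against a token budget.
    pub fn from_budget(output_tokens: usize, budget_tokens: usize) -> Self {
        if output_tokens > budget_tokens {
            Completeness::BudgetLimited { output_tokens, budget_tokens }
        } else {
            Completeness::Full
        }
    }

    /// Bounded states always carry their numbers, so truncation is never silent.
    pub fn describe(&self) -> String {
        match self {
            Completeness::Full => "full".to_string(),
            Completeness::BudgetLimited { output_tokens, budget_tokens } => format!(
                "budget-limited: output is ~{} tokens; budget is {}",
                output_tokens, budget_tokens
            ),
            Completeness::Truncated { shown, total } => {
                format!("truncated: showing {} of {} results", shown, total)
            }
        }
    }
}

/// Hypothetical envelope combining the six dimensions listed above.
pub struct TrustEnvelope {
    pub match_type: MatchType,
    pub authority: SourceAuthority,
    pub parse_state: ParseState,
    pub completeness: Completeness,
    pub scope: String,
    pub anchors: Vec<String>,
}

impl TrustEnvelope {
    /// Render the header that would open a query response (layout is an assumption).
    pub fn header(&self) -> String {
        format!(
            "match: {:?}\nauthority: {:?}\nparse: {:?}\ncompleteness: {}\nscope: {}\nanchors: {}",
            self.match_type,
            self.authority,
            self.parse_state,
            self.completeness.describe(),
            self.scope,
            self.anchors.join(", ")
        )
    }
}
```

Modelling completeness as an enum with payloads means a bounded state cannot be constructed without the numbers that explain it.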

Expected-Partial Parse Quarantine

Some valid code trips upstream tree-sitter grammar limitations (for example, TypeScript `import('rxjs').Subscription[]`, or Angular `@if (a > b)` control flow inside `.html` templates). In `health`, SymForge reports these expected partials separately from genuine repo defects, but only after a proof, never on a heuristic:

1. Neutralize only the suspected construct in a token-preserving way (a space replaces the offending characters, so adjacent tokens can never fuse).
2. Re-parse the whole file.
3. Excuse the file if and only if the re-parse is completely clean.

A genuinely broken file that merely contains the known construct stays classified as an unexpected partial, so real defects cannot hide behind grammar limitations. Verdicts are memoized by content hash, so repeated health calls and render paths do not re-pay the parse cost.
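
The proof-then-excuse rule can be sketched without a real parser. In the code below the parser is abstracted as a caller-supplied `is_clean` predicate, and `Verdict`, `Quarantine`, `neutralize`, and the use of `DefaultHasher` for the memo key are all assumptions made for illustration; the sketch also assumes one space per replaced character.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Clean,
    ExpectedPartial,
    UnexpectedPartial,
}

/// Blank out every occurrence of `construct`, one space per character,
/// so the tokens on either side can never fuse into a new token.
pub fn neutralize(source: &str, construct: &str) -> String {
    let blank = " ".repeat(construct.chars().count());
    source.replace(construct, &blank)
}

/// Memo key derived from file content and the suspected construct.
/// A 64-bit hash is enough for a sketch; collisions are ignored here.
fn content_key(source: &str, construct: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    (source, construct).hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
pub struct Quarantine {
    memo: HashMap<u64, Verdict>,
    /// Number of parser invocations, exposed to show the memo working.
    pub parses: usize,
}

impl Quarantine {
    /// `is_clean` stands in for "the parser produced no error nodes".
    pub fn classify<F: Fn(&str) -> bool>(
        &mut self,
        source: &str,
        construct: &str,
        is_clean: F,
    ) -> Verdict {
        let key = content_key(source, construct);
        if let Some(verdict) = self.memo.get(&key) {
            return *verdict;
        }
        self.parses += 1;
        let verdict = if is_clean(source) {
            Verdict::Clean
        } else if source.contains(construct) {
            // Proof step: re-parse the whole file with only the construct blanked.
            self.parses += 1;
            if is_clean(&neutralize(source, construct)) {
                Verdict::ExpectedPartial
            } else {
                Verdict::UnexpectedPartial
            }
        } else {
            Verdict::UnexpectedPartial
        };
        self.memo.insert(key, verdict);
        verdict
    }
}
```

With a toy predicate such as "no `@` characters and balanced braces", a template containing only `@if (a > b)` is excused, while one that also has an unbalanced brace stays an unexpected partial, and a repeated call on identical content adds no parser invocations.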

Typical Agent Workflow

1. `health` to confirm index/runtime state.
2. `get_repo_map`, `explore`, or `ask` to orient.
3. `search_symbols`, `search_text`, or `search_files` to narrow.
4. `get_file_context`, `get_symbol`, or `get_symbol_context` to inspect.
5. `edit_plan` before non-trivial edits.
6. A structural edit tool for source changes.
7. `analyze_file_impact` for touched files or `index_folder` after broad work.
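
From the client side, each step of this workflow is an MCP `tools/call` request over stdio. The sketch below builds such a request by hand and records the workflow as data. The stage labels and helper names are hypothetical, and tool argument schemas are not covered on this page, so the example passes arguments through as raw JSON rather than guessing at parameters.

```rust
/// Workflow stages paired with the tools named above.
/// Stage labels are illustrative; the edit stage names no specific tool here.
static WORKFLOW: [(&str, &[&str]); 7] = [
    ("confirm", &["health"]),
    ("orient", &["get_repo_map", "explore", "ask"]),
    ("narrow", &["search_symbols", "search_text", "search_files"]),
    ("inspect", &["get_file_context", "get_symbol", "get_symbol_context"]),
    ("plan", &["edit_plan"]),
    ("edit", &[]),
    ("refresh", &["analyze_file_impact", "index_folder"]),
];

/// Find which stage a tool belongs to, if any.
pub fn stage_of(tool: &str) -> Option<&'static str> {
    WORKFLOW
        .iter()
        .find(|(_, tools)| tools.iter().any(|t| *t == tool))
        .map(|(stage, _)| *stage)
}

/// Build a JSON-RPC 2.0 `tools/call` request as used by MCP.
/// `arguments_json` must already be a valid JSON object; `tool` is assumed
/// to be a plain identifier that needs no escaping.
pub fn tool_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{}}}}}"#,
        id, tool, arguments_json
    )
}
```

For instance, `tool_call_request(1, "health", "{}")` yields a single-line message suitable for a newline-delimited stdio transport, assuming `health` accepts an empty argument object.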

Boundary With The Shell

The simplest model:

SymForge answers "where is the code, what does it reference, and how do I edit this symbol?"
The shell answers "does the project build, do tests pass, what process is running, and what did the OS do?"
