Local-first semantic memory engine for AI agents.
FireMemory stores everything in a single .fbrain file — no server, no cloud, no configuration.
Agents read and write memory through MCP via `fquery mcp`.
ML models (~325 MB) are downloaded automatically on first use.
**macOS / Linux**

```bash
curl -fsSL https://raw.githubusercontent.com/phmotad/firememory/main/scripts/install.sh | bash
```

**Windows (PowerShell)** — installs fmem; fquery requires WSL2 or Docker

```powershell
irm https://raw.githubusercontent.com/phmotad/firememory/main/scripts/install.ps1 | iex
```

**Homebrew**

```bash
brew tap phmotad/firememory
brew install firememory
```

**Scoop**

```bash
scoop bucket add phmotad https://github.com/phmotad/scoop-firememory
scoop install firememory
```

Then register the MCP server with your editor:

```bash
fquery init-mcp claude-code   # Claude Code
fquery init-mcp cursor        # Cursor
fquery init-mcp windsurf      # Windsurf
fquery init-mcp zed           # Zed
```

This writes the MCP server entry into the editor's config file and prints the path it modified.
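`init-mcp` also accepts `--print` and `--config` (see the fquery reference below); the Zed settings path here is illustrative:

```bash
fquery init-mcp cursor --print                             # dry run: print the entry, write nothing
fquery init-mcp zed --config ~/.config/zed/settings.json   # write to an explicit config file
```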
```bash
fmem init ~/my.fbrain
```

Or skip this — `fmem stats` and any `fquery` tool call will auto-create `~/.firememory/default.fbrain` if it doesn't exist.
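A minimal smoke test against an explicit brainfile (the stored text is just an example):

```bash
fmem remember ~/my.fbrain "Ana prefers dark mode and tabs over spaces"
fmem recall   ~/my.fbrain "what editor settings does Ana like?"
fmem stats    ~/my.fbrain
```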
The MCP server starts on demand. On the first call, `fquery mcp` downloads the three ML models (~325 MB; this happens once). Subsequent starts are instant.
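Models can also be fetched ahead of time, so the first agent call doesn't block on the download:

```bash
fquery models pull    # download any missing models now
fquery models list    # verify all three are cached
```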
FireMemory is not a vector database, not a RAG layer, and not SQL.
It is a cognitive memory engine: it understands what is being stored, deduplicates semantically, builds a knowledge graph, and assembles context windows tailored to a query.
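A quick sketch of that from the CLI, using the `fmem` commands documented below (the texts are illustrative): the second `remember` should be absorbed by semantic deduplication, and `sync` runs the graph-building slow path.

```bash
fmem remember ~/my.fbrain "The API rate limit is 100 requests per minute"
fmem remember ~/my.fbrain "Our API allows at most 100 req/min"   # near-duplicate, deduplicated
fmem sync     ~/my.fbrain                                        # extract entities, build relations
fmem context  ~/my.fbrain "what are the API limits?"             # ranked context window
```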
| Concept | FireMemory |
|---|---|
| Storage format | Single .fbrain file (bbolt) |
| Embeddings | multilingual-e5-small INT8 (local ONNX) |
| Entity extraction | GLiNER-small-v2.1 INT8 (local ONNX) |
| Intent / classification | DeBERTa-v3-small INT8 (local ONNX) |
| Model size | ~325 MB total, downloaded once |
| Transport | MCP over stdio (fquery mcp) |
| Privacy | 100% local — nothing leaves your machine |
Agents talk to FireQuery (the MCP layer), not directly to FireMemory.
```
Your editor agent
      │ MCP (stdio)
      ▼
 fquery mcp       ← FireQuery: validates, classifies, enriches
      │
      ▼
.fbrain file      ← FireMemory: stores, recalls, syncs
```
| Tool | Description |
|---|---|
| `remember` | Store a memory (deduplication is automatic) |
| `recall` | Semantic search over stored memories |
| `get_context` | Retrieve a ranked context window for a query |
| `sync` | Run slow-path enrichment (entities, relations, graph) |
| `explain` | Explain a stored memory |
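If you want to exercise these tools by hand rather than through an editor, one option is the MCP Inspector, a separate dev tool from the Model Context Protocol project (not part of FireMemory; requires Node.js):

```bash
# Launches an interactive UI that speaks MCP to the server over stdio
npx @modelcontextprotocol/inspector fquery mcp
```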
```
fmem init <file.fbrain>               create a new brainfile
fmem remember <file.fbrain> <text>    store a memory
fmem recall <file.fbrain> <query>     semantic search
fmem sync <file.fbrain>               entity/relation enrichment
fmem context <file.fbrain> <query>    build a context window
fmem inspect <file.fbrain>            show manifest
fmem snapshot <file.fbrain>           full data dump (JSON)
fmem backup <file.fbrain> <dest>      copy to backup path
fmem restore <backup> <file.fbrain>   restore from backup
fmem compact <file.fbrain>            reclaim space (bbolt vacuum)
fmem stats [<file.fbrain>]            memory counts
fmem default                          print/create default brainfile path
fmem version                          print version
```
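A typical maintenance pass, using only the commands above (paths are illustrative):

```bash
fmem backup  ~/my.fbrain ~/backups/my-$(date +%F).fbrain   # dated copy
fmem compact ~/my.fbrain                                   # reclaim space
fmem restore ~/backups/my-2025-06-01.fbrain ~/my.fbrain    # roll back if needed
```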
```
fquery mcp                     start MCP server (stdio)
fquery init-mcp <client>       configure editor MCP entry
  clients: claude-code, cursor, windsurf, zed
  --print                      dry-run: show config that would be written
  --config <path>              override config file path
fquery models list             show downloaded model status
fquery models pull             download missing models
fquery models pull --force     re-download all models
fquery models gc               remove cached models
fquery devices                 list compute devices (CPU/GPU)
fquery doctor                  run diagnostics
fquery version                 print version
```
FireQuery uses three local ONNX INT8 models, downloaded automatically:
| Model | Use | Size |
|---|---|---|
| `multilingual-e5-small` | Embeddings, semantic recall | ~120 MB |
| `deberta-v3-small` | Intent & trigger classification | ~72 MB |
| `gliner-small-v2.1` | Named entity extraction | ~121 MB |
Models are stored in:

- macOS — `~/Library/Caches/firememory/models`
- Linux — `~/.cache/firememory/models`
- Windows — `%LOCALAPPDATA%\firememory\models`

Override the location with `FIREMEMORY_MODELS_DIR`. To remove cached models: `fquery models gc`.
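For example, to keep the cache on a larger disk (the path is illustrative):

```bash
export FIREMEMORY_MODELS_DIR=/data/firememory-models
fquery models pull    # downloads into the overridden directory
```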
```bash
docker run --rm -i \
  -v "$HOME/.firememory/models:/models" \
  ghcr.io/phmotad/firequery mcp
```

Models are cached in the mounted volume and downloaded on first run.
Requires Go 1.24 and a C compiler (for CGO).
```bash
git clone https://github.com/phmotad/firememory
cd firememory
make build   # produces bin/fmem and bin/fquery (with -tags onnx)
make test    # runs all tests (offline-safe, no models needed)
```

Release binaries are built with goreleaser, and the ONNX Runtime shared library is bundled in each archive (no separate install needed).
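If you'd rather not use the Makefile, the direct builds look roughly like this (inferred from the `make build` comment above; the exact Makefile flags may differ):

```bash
# CGO is needed for the ONNX Runtime bindings behind the onnx build tag
CGO_ENABLED=1 go build -tags onnx -o bin/fquery ./cmd/fquery
CGO_ENABLED=1 go build -tags onnx -o bin/fmem   ./cmd/fmem
```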
```
cmd/fmem          — FireMemory CLI
cmd/fquery        — FireQuery CLI + MCP server
internal/
  engine/         — remember / recall / sync / context / explain
  storage/        — bbolt store behind the Store interface
  brainfile/      — .fbrain format, validation, migration
  dedup/          — semantic deduplication (hash + embedding)
  embedder/       — Embedder interface (E5, deterministic, external)
  graph/          — knowledge graph (entities + relations)
  firequery/      — cognitive interface layer (pipeline, MCP, contracts)
  firequery/onnx  — ONNX inference backend (build tag: onnx)
  modelcache/     — auto-download, verify, extract ML models
  initcfg/        — write MCP entries into editor config files
  defaultbrain/   — default brainfile path + auto-init
  version/        — version string injected at build time
```
- Fast path (`remember`): hash → embed → dedup → persist
- Slow path (`sync`): extract entities → build relations → update graph
See CONTRIBUTING.md. All tests must pass (`go test ./...`) before submitting a PR.
The ONNX backend is behind `//go:build onnx` — tests run offline without models by design.