FireMemory

Local-first semantic memory engine for AI agents.

Read in Portuguese (PT-BR)

FireMemory stores everything in a single .fbrain file — no server, no cloud, no configuration. Agents read and write memory through MCP via fquery mcp. ML models (~325 MB) are downloaded automatically on first use.


60-second quickstart

1. Install

macOS / Linux

curl -fsSL https://raw.githubusercontent.com/phmotad/firememory/main/scripts/install.sh | bash

Windows (PowerShell) — installs fmem; fquery requires WSL2 or Docker

irm https://raw.githubusercontent.com/phmotad/firememory/main/scripts/install.ps1 | iex

Homebrew

brew tap phmotad/firememory
brew install firememory

Scoop

scoop bucket add phmotad https://github.com/phmotad/scoop-firememory
scoop install firememory
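Whichever route you choose, you can confirm the install from a terminal; all three commands are documented in the CLI reference below:

fmem version         # print the installed FireMemory version
fquery version       # print the installed FireQuery version
fquery doctor        # run diagnostics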

2. Wire your editor

fquery init-mcp claude-code   # Claude Code
fquery init-mcp cursor        # Cursor
fquery init-mcp windsurf      # Windsurf
fquery init-mcp zed           # Zed

This writes the MCP server entry into the editor's config file and prints the path it modified.
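If you want to review the change before anything is written, init-mcp supports a dry run; the config path below is only a placeholder for wherever you keep your editor's MCP config:

fquery init-mcp cursor --print                       # dry run: show the entry without writing it
fquery init-mcp cursor --config /path/to/mcp.json    # write to an explicit config file instead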

3. Create a brainfile

fmem init ~/my.fbrain

Or skip this — fmem stats and any fquery tool call will auto-create ~/.firememory/default.fbrain if it doesn't exist.
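You can also exercise the brainfile directly from the shell before any agent touches it; the memory text here is just an illustration:

fmem remember ~/my.fbrain "we picked bbolt as the storage backend"
fmem recall   ~/my.fbrain "which storage backend did we choose?"
fmem stats    ~/my.fbrain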

4. Restart your editor

The MCP server starts on demand. On the first call, fquery mcp downloads the three ML models (~325 MB, a one-time download). Subsequent starts are instant.


What it is

FireMemory is not a vector database, not a RAG layer, and not SQL.

It is a cognitive memory engine: it understands what is being stored, deduplicates semantically, builds a knowledge graph, and assembles context windows tailored to a query.

Concept                   FireMemory
Storage format            Single .fbrain file (bbolt)
Embeddings                multilingual-e5-small INT8 (local ONNX)
Entity extraction         GLiNER-small-v2.1 INT8 (local ONNX)
Intent / classification   DeBERTa-v3-small INT8 (local ONNX)
Model size                ~325 MB total, downloaded once
Transport                 MCP over stdio (fquery mcp)
Privacy                   100% local — nothing leaves your machine

Agent connectivity

Agents talk to FireQuery (the MCP layer), not directly to FireMemory.

Your editor agent
      │  MCP (stdio)
      ▼
  fquery mcp          ← FireQuery: validates, classifies, enriches
      │
      ▼
  .fbrain file        ← FireMemory: stores, recalls, syncs

Supported MCP tools

Tool          Description
remember      Store a memory (deduplication is automatic)
recall        Semantic search over stored memories
get_context   Retrieve a ranked context window for a query
sync          Run slow-path enrichment (entities, relations, graph)
explain       Explain a stored memory
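For manual testing outside an editor, most of these tools roughly correspond to a CLI command from the reference below; explain is listed only as an MCP tool:

remember      →  fmem remember <file.fbrain> <text>
recall        →  fmem recall <file.fbrain> <query>
get_context   →  fmem context <file.fbrain> <query>
sync          →  fmem sync <file.fbrain>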

CLI reference

fmem

fmem init <file.fbrain>                 create a new brainfile
fmem remember <file.fbrain> <text>      store a memory
fmem recall <file.fbrain> <query>       semantic search
fmem sync <file.fbrain>                 entity/relation enrichment
fmem context <file.fbrain> <query>      build a context window
fmem inspect <file.fbrain>              show manifest
fmem snapshot <file.fbrain>             full data dump (JSON)
fmem backup <file.fbrain> <dest>        copy to backup path
fmem restore <backup> <file.fbrain>     restore from backup
fmem compact <file.fbrain>              reclaim space (bbolt vacuum)
fmem stats [<file.fbrain>]              memory counts
fmem default                            print/create default brainfile path
fmem version                            print version
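A typical maintenance pass, composed only from the commands above (paths and filenames are illustrative):

fmem backup  ~/my.fbrain ~/backups/my-$(date +%F).fbrain   # copy before risky changes
fmem compact ~/my.fbrain                                    # reclaim space (bbolt vacuum)
fmem inspect ~/my.fbrain                                    # sanity-check the manifest
# if something goes wrong:
fmem restore ~/backups/my-2024-01-01.fbrain ~/my.fbrain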

fquery

fquery mcp                              start MCP server (stdio)
fquery init-mcp <client>               configure editor MCP entry
  clients: claude-code, cursor, windsurf, zed
  --print                               dry-run: show config that would be written
  --config <path>                       override config file path
fquery models list                      show downloaded model status
fquery models pull                      download missing models
fquery models pull --force              re-download all models
fquery models gc                        remove cached models
fquery devices                          list compute devices (CPU/GPU)
fquery doctor                           run diagnostics
fquery version                          print version
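To keep the one-time download from stalling the first MCP call in your editor, you can pre-fetch the models up front:

fquery models pull       # download anything that's missing (~325 MB on a fresh install)
fquery models list       # confirm all three models are present
fquery devices           # see which compute devices (CPU/GPU) are available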

Models

FireQuery uses three local ONNX INT8 models, downloaded automatically:

Model                   Use                                Size
multilingual-e5-small   Embeddings, semantic recall        ~120 MB
deberta-v3-small        Intent & trigger classification    ~72 MB
gliner-small-v2.1       Named entity extraction            ~121 MB

Models are stored in:

  • macOS     ~/Library/Caches/firememory/models
  • Linux     ~/.cache/firememory/models
  • Windows   %LOCALAPPDATA%\firememory\models

Override with FIREMEMORY_MODELS_DIR.

To remove: fquery models gc
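The override is an environment variable, so pointing the cache at a custom location looks like this (the directory is just an example):

export FIREMEMORY_MODELS_DIR="$HOME/ml/firememory-models"
fquery models pull       # models land in the overridden directory
fquery models list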


Docker

docker run --rm -i \
  -v "$HOME/.firememory/models:/models" \
  ghcr.io/phmotad/firequery mcp

Models are cached in the mounted volume and downloaded on first run.


Build from source

Requires Go 1.24 and a C compiler (for CGO).

git clone https://github.com/phmotad/firememory
cd firememory
make build          # produces bin/fmem and bin/fquery (with -tags onnx)
make test           # runs all tests (offline-safe, no models needed)

Release binaries are built with goreleaser, and the ONNX Runtime shared library is bundled in each archive (no separate install needed).
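After make build, the binaries can be exercised in place; these are the same subcommands listed in the CLI reference:

./bin/fmem version
./bin/fquery doctor      # run diagnostics against the fresh build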


Architecture

cmd/fmem       — FireMemory CLI
cmd/fquery     — FireQuery CLI + MCP server

internal/
  engine/        — remember / recall / sync / context / explain
  storage/       — bbolt store behind the Store interface
  brainfile/     — .fbrain format, validation, migration
  dedup/         — semantic deduplication (hash + embedding)
  embedder/      — Embedder interface (E5, deterministic, external)
  graph/         — knowledge graph (entities + relations)
  firequery/     — cognitive interface layer (pipeline, MCP, contracts)
  firequery/onnx — ONNX inference backend (build tag: onnx)
  modelcache/    — auto-download, verify, extract ML models
  initcfg/       — write MCP entries into editor config files
  defaultbrain/  — default brainfile path + auto-init
  version/       — version string injected at build time

Fast path (remember): hash → embed → dedup → persist
Slow path (sync): extract entities → build relations → update graph
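In CLI terms, the two paths map to separate commands, so enrichment can be deferred and run in batches (the file path is illustrative):

fmem remember ~/my.fbrain "deploys run from the release branch"   # fast path: hash → embed → dedup → persist
fmem sync     ~/my.fbrain                                         # slow path: entities → relations → graph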


Contributing

See CONTRIBUTING.md. All tests must pass (go test ./...) before submitting a PR.

The ONNX backend is behind //go:build onnx — tests run offline without models by design.
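A minimal sketch of the two test modes, assuming the onnx-tagged packages follow the usual go test conventions:

go test ./...                 # default: ONNX backend excluded, runs fully offline
go test -tags onnx ./...      # include the ONNX backend (may require the models and a C toolchain)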
