Engram now runs fully offline against a local Ollama daemon: semantic search, the LLM judge, HyDE, and consolidation all work with no API keys.
New features
- Local Ollama embedding provider. Set `[embedding] provider = "ollama"` (e.g. `model = "qwen3-embedding:0.6b"`, 1024-dimensional) to produce real dense vectors against a local Ollama daemon with no API key. Switching an existing database is a one-time `engram reembed`; the `[6020]` guard enforces that stored vectors match the active provider.
- Local Ollama text generator. Set `[llm] provider = "ollama"` to run the LLM judge, HyDE, and consolidation on a local chat model (e.g. `qwen3:4b`) with no API key. An unreachable daemon falls back to the heuristic judge.
- Ollama endpoint configuration: new `[embedding].host` / `[llm].host` config fields and the `ENGRAM_OLLAMA_HOST` environment override select the Ollama host (default `http://localhost:11434`).
- Ollama in the `engram init` TUI wizard: selectable as both an embedding and an LLM provider, with no API-key prompt.
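
As a rough illustration of the settings above, a fully local configuration might look like the fragment below. The `provider` and `host` keys and the embedding `model` key come from these notes; the `model` key under `[llm]` is an assumption made by analogy with `[embedding]`, so check the bundled config template for the exact field names.

```toml
# Sketch of a fully offline setup, assuming the key names described in these notes.

[embedding]
provider = "ollama"
model = "qwen3-embedding:0.6b"      # 1024-dimensional vectors
host = "http://localhost:11434"     # optional; this is the stated default

[llm]
provider = "ollama"
model = "qwen3:4b"                  # assumed key name, mirroring [embedding]
host = "http://localhost:11434"     # optional; this is the stated default
```

After switching the embedding provider on an existing database, run `engram reembed` once so the stored vectors match the active provider. Setting `ENGRAM_OLLAMA_HOST` overrides the host without editing the file.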
Improvements
- Search degrades to FTS5 when embeddings are unavailable instead of failing the request. `memory_search` returns a `{ results, degraded }` envelope on both the healthy and degraded paths; `degraded: true` signals the vector branch was skipped (e.g. the embedding provider is unreachable).
- Localhost-tuned retry backoff for the Ollama HTTP clients (short, few retries) instead of reusing the cloud-latency budget. Provider documentation is pinned against the bundled config template and README by a docs-invariant test.
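
Callers that consume `memory_search` may want to surface the degraded state rather than silently treating keyword-only hits as full results. The sketch below is a hypothetical client-side helper, not part of Engram: it assumes the envelope has already been decoded into a Python dict with the `results` and `degraded` keys described above, and it treats individual result items as opaque because their shape is not specified here.

```python
from typing import Any


def describe_search(envelope: dict[str, Any]) -> str:
    """Summarize a decoded memory_search envelope (hypothetical helper)."""
    # Per the notes, `results` is present on both the healthy and degraded paths.
    results = envelope["results"]
    # `degraded: true` means the vector branch was skipped and only FTS5 ran.
    # A missing key is treated as "not degraded" here, which is an assumption.
    if envelope.get("degraded", False):
        return f"{len(results)} result(s), keyword-only (FTS5 fallback)"
    return f"{len(results)} result(s), vector branch included"
```

A caller could log or display this summary so users know when the embedding provider was unreachable and ranking quality may differ.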
Breaking changes
- None.
Install: `cargo install engram-memory` · `npm i -g @engramm/engram-mcp-server` · `pip install engram-trainer`
Full changelog: v0.4.0...v0.5.0