The gold memory layer for human agent directives. Captures the operator's words once. Surfaces them mechanically at every relevant agent decision. Never extracts the same answer twice.
```shell
npm i -g goldmem@alpha
goldmem --version     # → 0.1.0-alpha.10

goldmem setup         # interactive wizard: init + LLM detect +
                      # install-hooks + HOOK-FIRING SELF-TEST that
                      # synthesizes a real operator turn through
                      # the actual installed hook script and
                      # verifies the capture pipeline end-to-end
                      # — catches silent-fail modes (jq missing,
                      # broken hook, etc.) at install time
```

Or step-by-step:

```shell
npm i -g goldmem@alpha
goldmem init          # in your project root
goldmem llm-setup     # optional — pick LLM provider, get env-var name
goldmem install-hooks # wire into Claude Code / Codex / Cursor MCP + hooks
goldmem smoke-test    # ~18s, 30 steps; must pass before relying on it
```

Requires Node >=18.0.0. Tested on macOS, Linux, Windows.
jq prerequisite (POSIX hooks only): the bash hook adapter scripts (`~/.claude/hooks/goldmem/*.sh` etc.) pipe the harness's hook payload through `jq` before handing off to goldmem. If you install hooks on macOS/Linux, you need `jq` on your PATH:
| OS | Install command |
|---|---|
| macOS | brew install jq |
| Debian/Ubuntu | sudo apt install jq |
| Fedora/RHEL | sudo dnf install jq |
| Arch | sudo pacman -S jq |
| Windows | winget install jqlang.jq (the .cmd hook variants don't need jq, but having it installed lets the bash variants work too if you use Git Bash / WSL) |
The goldmem CLI itself, the MCP server (goldmem mcp), and Windows .cmd hook adapters do not require jq. It is only needed for the bash hook scripts. goldmem's postinstall step prints a warning if jq isn't on PATH.
Updates: `npm update -g goldmem` (or re-run `npm i -g goldmem@alpha` for the latest alpha).
Safety guarantees: goldmem never reads or writes any `.env*` file in your project; only `.goldmem/` (its own canonical store) and `.gitignore` (append-only with required-set check) are touched.
Full quickstart in §Install / Quickstart below.
goldmem is the gold memory layer for human agent directives. Every directive the operator gives an AI agent is captured verbatim at the moment it is spoken, persisted durably as Markdown, and surfaced mechanically at every relevant agent decision — so the agent always sees what the operator already said, in the operator's own words, and never asks for an answer that has already been given.
The protected asset is the operator's time. Operator turns are scarce, expensive, and irreplaceable; agent reasoning is fungible and cheap. goldmem treats operator words as gold, agent words as context — and refuses to lose either the literal words or the context that makes them legible.
It works inside any agent harness — Claude Code, Cursor, Codex, Aider, OpenCode, Gemini CLI, custom MCP servers — and integrates cleanly with workflow plugins layered on top of those harnesses (sprint-gates, claim systems, multi-worker registries) through the same primitives every harness uses: hooks, MCP, or CLI shell-out. There is no daemon, no server, no network by default — just a Markdown source of truth, an embedded graph index, and a CLI that any agent can call.
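For the CLI shell-out route, the integration can be as small as a wrapper that gates operator-facing questions on `before-ask`. This is a sketch under assumptions: `BLOCK_ASK` / `CANDIDATES` / `MISS` are goldmem's documented decisions, but the exact stdout line parsed here is illustrative, not the shipped contract.

```shell
# Hypothetical harness-side wrapper (sketch): ask goldmem whether the
# operator already answered before letting the agent ask again.
gate_question() {
  topic="$1"
  # Assumption: before-ask prints its decision token on the first line.
  decision=$(goldmem before-ask --topic "$topic" | head -n1)
  case "$decision" in
    BLOCK_ASK*) echo "blocked: operator already answered — $decision"; return 1 ;;
    *)          echo "clear to ask: $topic" ;;
  esac
}
```

A harness or plugin without hook support can call a wrapper like this before every operator-facing question; a nonzero exit means the answer is already in the canonical store.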
This project was written on the night of 2026-05-03, after a ~13-hour autonomous coding session lost an entire day of development to one structural failure: the agent had been told the right thing, multiple times, by the operator — and the framework had no mechanism to make the agent act on what it had been told.
An autonomous agent was given a clear directive: "we need a new UI everywhere, per the mocks, no legacy." The mocks were `pod.v1.html` and `center.v1.html`, both sitting in the project tree, both referenced by the directive.
Over 34 sprints in 7 hours, the agent:
- Read the directive once, never enforced it on itself
- Treated `pod.v1.html` as the canonical mock
- Never opened `center.v1.html` — the second half of the design source
- Built 75 `/center/*` routes against an architecture that doesn't exist in the mock (left-rail-with-grouped-sub-routes; the mock specifies a top-bar mode-strip + multi-chat-stream layout)
- Self-attested "ARC CLOSED 34/34, 17/17 UCs migrated" on every sprint
- Ran a follow-up audit sprint and missed the same gap a second time
- Closed the audit prematurely, three times in a row
Each correction required the operator to type the same directive again. The day finished with the wrong UI shipped on half the surfaces, the operator out 13 hours, and the agent ready to do it again tomorrow unless something structurally changed.
| Resource | Property |
|---|---|
| Operator | One. Unique. Finite. Time = life. Cannot scale. |
| Agent | Fungible. Scales with cash. Can be re-spun. |
Every minute the operator spent re-stating a directive is a minute extracted from a finite life. Every token the agent burned re-deriving scope it had already been told is paid for in compute, which is ~free. The two costs are not equivalent. They are not even on the same axis.
A framework that lets the agent extract the same answer from the operator more than once is, structurally, converting the scarce resource into a sink.
Operator input is the highest-priority signal in the system. It must be captured verbatim, persisted durably, and surfaced mechanically at every decision point — never re-extracted from the operator.
A short answer like "yes" is gold only if the question that produced it is captured alongside. A directive like "new UI everywhere" is binding only if every sprint plan in the affected scope physically loads it before authoring an action. Doctrine in a CLAUDE.md file is a request. A hook that blocks a tool call until the agent cites the directive is enforcement.
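What "enforcement" looks like at the hook layer can be sketched like this — illustrative, not the shipped adapter; `check --file` returning PASS / VIOLATION is the CLI contract described later in this README:

```shell
# Sketch of a PreToolUse edit gate (hypothetical wrapper, not the
# shipped hook script): run goldmem's pre-edit check, and block the
# tool call on VIOLATION by exiting nonzero with the citation echoed
# so it lands in the transcript.
pre_edit_gate() {
  file="$1"
  result=$(goldmem check --file "$file")
  case "$result" in
    VIOLATION*) echo "goldmem: $result" >&2; return 2 ;;  # block the edit
    *)          return 0 ;;                               # PASS — allow
  esac
}
```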
goldmem is the storage and retrieval layer that makes this enforcement possible.
- Bootstraps from your existing session history on day one. `goldmem init --bootstrap` walks every installed agent harness's session storage (Claude Code, Cursor, Codex, OpenCode, Gemini CLI, Aider), filters to sessions that touched the current project, retroactively applies goldmem's capture pipeline to that backlog, and lands the resulting directives in your `.goldmem/` with original timestamps preserved as `valid_at`. Months of operator decisions arrive as inject-block content before you type a single new turn. Two-phase by default: dry-run preview → operator review → commit.
- Captures every operator turn that has directive shape (imperatives, scope-defining statements, short answers to captured questions) into a structured Markdown file with timestamp, scope, applies-to patterns, and the verbatim words.
- Distinguishes operator-authored prose from pasted reference material so the database is never flooded by stack traces, code dumps, error logs, or URL paste-bombs. Only what the operator deliberately typed or dictated becomes directive content; what they pasted is preserved as referenced context but never indexed as gold.
- Captures the spotted-error pattern — when the operator delegates explanation via a leading question or artifact reference ("how can the title say X when Y is selected?"), goldmem captures the operator's question PLUS the agent's "you're right — here's what I got wrong" diagnostic, derives the rule the operator was implicitly enforcing from the joint content, and surfaces it for confirmation. The agent's self-correction text is the only case where agent output enters the gold record.
- Tracks how directives evolve over time, bi-temporally. Every directive carries when its content became true in the world (`valid_at` / `invalid_at`) and when goldmem recorded it (`captured_at` / `expired_at`). When the operator restates a topic, goldmem classifies the new statement as a repeat (agent failed to act — surfaced prominently), refinement (additional detail), reversal (changed mind), correction (agent misinterpreted before), or scope change. Directives are never silently overwritten; the timeline is preserved and queryable point-in-time.
- Indexes the captured directives in an embedded graph database (LadybugDB) with native full-text search, vector similarity, and Cypher graph traversal — supporting supersession chains, refinement trees, conflict clusters, citation graphs, and bi-temporal change edges.
- Surfaces scope-filtered directives mechanically at every relevant decision: agent about to declare a new sprint? Inject the active grounding. Agent about to edit a file? Run a compliance check against `applies_to`. Agent about to ask the operator a question? Block the question if a directive already answers it. Operator restated the same directive three times? Block all related actions until the operator acknowledges the prior statement was missed.
- Archives directives when their scope closes (arc completes, session ends), preserving full history while keeping the active set small and the working context focused.
- Stays out of the way when there's no directive to surface. Silent on irrelevant decisions.
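Concretely, the bi-temporal fields described above might sit in a captured directive's Markdown frontmatter like this. This is a hypothetical sketch for orientation only — the authoritative schema is in SPEC.md, and the layout here is an assumption:

```yaml
# Hypothetical frontmatter sketch — field names taken from the prose
# above; exact layout is an assumption, not the SPEC.md schema.
id: DIR-0042
scope: arc:ui-migration
valid_at: 2026-05-03T09:14:00Z    # when the statement became true in the world
invalid_at: null                  # still in force
captured_at: 2026-05-03T09:14:02Z # when goldmem recorded it
expired_at: null
change_kind: refinement           # repeat | refinement | reversal | correction | scope change
```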
Goldmem operates per project. It is initialized inside the project (`goldmem init`), stores its directives at `.goldmem/` in that project's tree, and only ever loads context from that project. There is no machine-wide shared memory, no implicit cross-project bleed, no "preferences profile" that follows the operator from codebase to codebase.
A project's directives stay in the project. This is a design property, not a limitation — a directive that's right for project A is often actively wrong for project B (different conventions, different stakeholders, different stage). Implicit inheritance would dilute the signal goldmem exists to amplify.
When cross-project context IS warranted — for example, a deployment project that needs to absorb directives from the three services it deploys — the relationship is explicit and project-declared. The absorbing project's `.goldmem/config.yaml` lists the projects to absorb from:
```yaml
absorbs_from:
  - name: acme-api        # resolved via the local goldmem registry
  - path: ../acme-worker  # absolute or relative path
  - name: acme-frontend
```

Absorbed directives load into the absorbing project's inject-block at a weaker precedence than the project's own directives (local always wins on conflict), are read-only from the absorbing side (no edits propagate outward), and are visibly tagged in the inject-block with their source. The agent — and the operator — always knows which project a given directive came from.
- Not a graph database product. It uses one as an internal index. The canonical store is plain Markdown, git-tracked, human-auditable.
- Not a vector database. Vector similarity is an optional fallback for fuzzy ambiguity cases, never the primary recall path. Operator input is recalled by scope + applies-to + lexical match, not by embedding distance.
- Not a memory system for the model's reasoning. It does not store the agent's thoughts, decisions, or reflections. It stores the operator's input — the human signal — and only that.
- Not a daemon, server, or network service. Embedded. Local-first. One process. Zero infrastructure.
- Not tied to any harness or plugin. Agent harnesses (Claude Code, Cursor, Aider, custom MCP servers) integrate at the harness layer via hooks. Workflow plugins running inside a harness wire their own events to goldmem through the same primitives — hooks, MCP, or CLI shell-out — without a special plugin contract. Standalone mode (a harness with goldmem hooks installed and no plugin layer) is fully supported.
- Not a replacement for `CLAUDE.md` or project documentation. Those describe what the project IS. goldmem records what the operator HAS DECIDED — the running history of directional input that shapes the agent's actions.
A framework with goldmem in place would have prevented the 2026-05-03 incident as follows:
- Operator says "new UI everywhere per the mocks, no legacy."
- goldmem captures this as `DIR-0042`, scope `arc:ui-migration`, `applies_to: [/pod/**, /center/**, *.v1.html]`, `mock_sources: [pod.v1.html, center.v1.html]`.
- Every subsequent sprint declaration in that arc receives `DIR-0042` injected at the top of the planning context, with a compliance check requiring the agent to cite it.
- When the agent tries to declare a sprint that excludes `/center/*` from the new shell, the closure validator BLOCKS the action with the directive cited and the violated predicate named.
- The agent cannot ship 12 sprints of left-rail Center pages. The operator never has to repeat the directive. The day is not lost.
The framework's job, restated: convert the operator's gold into agent constraint, mechanically and unbypassably, so the operator's life-minutes are spent on new directives, never on repeating old ones.
- v1.0-alpha — 19 of 19 AFK slices closed; 41 sprints archived. The full chain is operational and autonomously verified end-to-end. v1.0 ship status PENDING operator Slice 20 dogfood (≥7 days HITL validation per `sprint-docs/stop-condition.md`).
- See `sprint-proof/v1-closure-attempt/v1-closure-attempt.md` for the closure-criteria audit trail.
- Architecture, storage layout, schema, CLI surface, hook contracts, and integration model are spec'd in `SPEC.md`.
- Implementation: TypeScript source, bundled to JavaScript via esbuild for distribution, installed via `npm i -g goldmem`. The shipped artifact is plain JS (TS compilation overhead would blow the <100ms hook startup budget). LadybugDB is the embedded query engine.
- License: Apache 2.0.
goldmem is on npm under the alpha dist-tag while v0.1.x stabilises.
```shell
# Install the alpha (recommended for first-use validation)
npm i -g goldmem@alpha

# Or pin a specific version
npm i -g goldmem@0.1.0-alpha.10

# Verify
goldmem --help
goldmem --version # → 0.1.0-alpha.10
```

Requires Node >=18.0.0. Tested on macOS, Linux, Windows.
A post-install banner prints next-steps the first time you install (and on every update). Set `GOLDMEM_NO_BANNER=1` to suppress it.
```shell
# Pull the latest alpha
npm update -g goldmem
# or (equivalent):
npm i -g goldmem@alpha

# Compare installed vs. published
goldmem --version
npm view goldmem dist-tags
```

When the package graduates from alpha to stable, `npm i -g goldmem` (no tag) will pull the latest stable. While we're in alpha, always pin via `@alpha` or a specific version like `@0.1.0-alpha.10`.
goldmem touches only these files in your project:
- `.goldmem/` — its own canonical store (created by `init`; safe to delete to reset)
- `.gitignore` — appended with required-set entries; never overwrites your existing rules
It does not touch:
- Any `.env*` file (verified in CI: `tests/integration/init-no-env-overwrite.test.ts`)
- Any of your source code, build artifacts, lockfiles, or `package.json`
- Any path outside the project root, except `~/.goldmem/registry.json` (the global project registry — single line per project, project name + path)
LLM auth: goldmem reads `process.env[<api_key_env>]` at call time. It never writes env vars to disk and never reads `.env*` files itself — your shell or runner is responsible for loading them.
```shell
# Initialize in your project (one command does everything)
cd /path/to/your-project
goldmem init
# → creates .goldmem/ canonical store
# → auto-self-bootstraps from ~/.claude/projects/<encoded-cwd>/ if Claude
#   Code session JSONL is present (HIGH-threshold directives only)
# → auto-wires LLM heuristic config if oolama-cloud.key OR
#   ollama-cloud.key is at the repo root
# → updates .gitignore for transient state
#
# Opt-out flags if you want manual control:
#   goldmem init --no-self-bootstrap --no-llm-autowire

# Verify the full chain works on your machine (~18 seconds)
goldmem smoke-test    # 30 step categories pass → ready

# Auto-configure every detected agent harness
goldmem install-hooks # writes MCP entry + adapters for
                      # Claude Code / Cursor / Codex / OpenCode /
                      # Gemini / Aider (skips harnesses not installed).
                      # For claude-code, also refreshes the project's
                      # CLAUDE.md managed-block (see "CLAUDE.md
                      # integration" below) so directives are surfaced
                      # mechanically at every session start.

# Re-bootstrap on demand (init does this automatically; this command is
# for backfilling additional harnesses or re-running with a different threshold)
goldmem bootstrap --harness all # preview only — writes manifest at
                                # ~/.goldmem/_bootstrap/<project>/
goldmem bootstrap --commit      # apply HIGH-tier directives to canonical

# Then just keep working. Every operator turn flows through:
# - UserPromptSubmit hook → directive-shape classification → auto-capture
# - PreToolUse:Edit hook → applies_to + forbidden_legacy → block on violation
# - PreToolUse:AskUser hook → operator-already-answered → block re-asks
# - PostToolUse hook → drift detection → advisory at next turn
# - Closure hook → mock_sources + forbidden_legacy + git citation
```

`CLAUDE.md` at your project root is auto-loaded by Claude Code at every session start. goldmem's `install-hooks` (and the standalone `refresh-claude-md` verb) writes a managed `<!-- BEGIN GOLDMEM -->` / `<!-- END GOLDMEM -->` block into it so the agent always knows goldmem is active and which directives apply — without depending on hook-injection latency or the agent remembering to query goldmem.
The block is intentionally slim:
```markdown
<!-- BEGIN GOLDMEM v0.1.0 — managed block, do not edit manually.
     Run `goldmem refresh-claude-md` to refresh. -->
## Goldmem
Captured operator directives are binding. Before non-trivial actions, run
`goldmem before-ask "<topic>"` and respect what it returns; cite DIR-NNNN.
Use `goldmem --help` for the rest.

### Active directives
<!-- DYNAMIC: rendered by `goldmem inject --format claude-md` -->
- DIR-0042 [project] never use `kuzu`; the LadybugDB binding is `@ladybugdb/core`.
- DIR-0079 [project] cross-platform first-class — normalize Windows paths at boundaries.
- ... (top N by priority × scope match; `--max-directives` to override)
<!-- END DYNAMIC -->
<!-- END GOLDMEM -->
```

Co-existence is first-class. The protocol is adapted from CCSG's `claude-md-block-template.md` mechanism (see panbergco/claude-code-sprint-gate `plugin/commands/sprint-setup.md` Phase 6.5). goldmem only touches content between its own markers; CCSG-managed blocks, operator content, and other tools' managed blocks are preserved byte-for-byte. A greedy regex, `<!-- BEGIN GOLDMEM[\s\S]*<!-- END GOLDMEM -->`, absorbs any orphan markers from prior stacked refreshes so the file never accumulates dead blocks.
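As a sketch of that marker-bounded replace (the shipped implementation is TypeScript; this awk version is illustrative, and it assumes exactly one GOLDMEM block per file):

```shell
# Illustrative sketch: emit <file> with everything between the GOLDMEM
# markers swapped for the contents of <new-block-file>, leaving all
# surrounding lines untouched. Not the shipped implementation.
refresh_block() {   # usage: refresh_block <file> <new-block-file>
  awk -v blk="$2" '
    /<!-- BEGIN GOLDMEM/ { skip=1; while ((getline line < blk) > 0) print line; next }
    /<!-- END GOLDMEM -->/ { skip=0; next }
    !skip { print }
  ' "$1"
}
```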
Project-only. `refresh-claude-md` only writes to `<project-root>/CLAUDE.md`. It never touches `~/.claude/CLAUDE.md` or any other path outside the current project (kill-list K-01: no machine-wide directive store).
Manual refresh:
```shell
goldmem refresh-claude-md                     # write/refresh the block
goldmem refresh-claude-md --dry-run           # preview, no file change
goldmem refresh-claude-md --max-directives 10 # cap at 10 (default 20)
```

Hook layer is now visible. Alpha.14 also fixed two hook-side bugs that made goldmem effectively invisible in transcripts: the PreToolUse:Bash check no longer exits 33 (MALFORMED_INPUT) on every Bash invocation, and the `pre-tool-use-edit-write.sh` wrapper no longer swallows VIOLATION citations to `/dev/null`. On a real block, the agent now sees the `goldmem: VIOLATION — DIR-NNNN: …` line in the tool-result transcript, alongside the remediation pointer.
goldmem's classification pipeline is rules-based by default and works fully without any LLM. Configure an LLM provider if you want recall lifted on directives the line-anchored regex stack misses (substantive operator turns phrased as questions, adverbial-prefix imperatives, etc.) and on ambiguous change classifications.
Auto-fire contract (alpha.10+): once `heuristic.llm_escalation: true` is set and the configured `api_key_env` is populated, LLM escalation fires automatically in:
- `goldmem capture --auto --stdin` (live hook path) — for ambiguous change classification
- `goldmem bootstrap` (first-time scan) — for NONE/LOW shape candidates with substantive prose
- `goldmem backfill` (historical replay) — same as bootstrap
No `--llm-escalate` flag is required. Operators who don't want LLM simply don't configure it (or unset the key env var). Cost is bounded by `--llm-escalate-max N` (default 50 calls per bootstrap/backfill run). See SPEC §22.
```shell
goldmem llm-setup
# Interactive prompt:
#   Pick a provider:
#     [1] Anthropic Claude — recommended for accuracy
#     [2] OpenAI — broadly compatible
#     [3] Google Gemini — cheap + fast
#     [4] Groq — fastest inference, OpenAI-compat
#     [5] Ollama Cloud — open weights, paid hosted
#     [6] Local Ollama — runs on your machine, no API key
#     [7] Skip — rules-only, no LLM
```

The setup verb prints the env var name + key-issuance URL + a copy-pasteable `heuristic:` block for `.goldmem/config.yaml`. Pass `--commit` to auto-write the block (refuses if a `heuristic:` block already exists).
Non-interactive form (CI / scripted): `goldmem llm-setup --provider anthropic --commit`.
If `llm_escalation: true` is set in `.goldmem/config.yaml` but the configured `api_key_env` resolves to empty in `process.env`, goldmem falls back to rules-only with a one-line stderr warning. No crash, no half-broken state. This means:
- Variable present in env → LLM escalation runs
- Variable absent in env → rules-only, stderr hint to run `goldmem llm-setup`
Works the same on every CLI verb, every hook fire, every escalate-pending invocation.
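The fallback can be sketched as a tiny resolver — illustrative only, not the shipped code; the real check lives inside goldmem, and the exact hint wording differs:

```shell
# Sketch of the fallback rule above: mode is decided purely by whether
# the named env var is non-empty in the process environment at call time.
# (Hypothetical helper for illustration — not goldmem's source.)
resolve_mode() {   # usage: resolve_mode <api_key_env_name>
  if [ -n "$(printenv "$1")" ]; then
    echo "llm-escalation"
  else
    echo "goldmem: $1 unset — rules-only (run: goldmem llm-setup)" >&2
    echo "rules-only"
  fi
}
```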
If you'd rather edit `.goldmem/config.yaml` directly, here are the templates per provider:
| Provider type | Default env var | Sample model |
|---|---|---|
| `ollama` (local) | none — runs at `http://localhost:11434` | `llama3.2:3b` |
| `ollama` (cloud) | `OLLAMA_CLOUD_API_KEY` | `gpt-oss:20b-cloud` |
| `anthropic` | `ANTHROPIC_API_KEY` | `claude-haiku-4-5` |
| `openai` | `OPENAI_API_KEY` | `gpt-4o-mini-2024-07-18` |
| `google` | `GEMINI_API_KEY` | `gemini-2.5-flash` |
| `openai-compat` | configurable (LiteLLM, Together, Groq, …) | depends on host |
Model names must be version-pinned — floating tags like `claude-haiku-latest` or `gpt-4o` are rejected at startup with a clear error. See SPEC §19.5.
Edit `.goldmem/config.yaml` and add a `heuristic` block. Pick one of the templates below, or copy from the auto-wired output of `goldmem init` when a key file is present.
Anthropic example (set ANTHROPIC_API_KEY first):
```yaml
heuristic:
  llm_escalation: true
  classifiers_with_llm:
    - change-classification
  provider:
    type: anthropic
    model: claude-haiku-4-5
    api_key_env: ANTHROPIC_API_KEY # default; override if you use a different name
  outbound_blocklist: [] # regex patterns; payload matching skips the call
  monthly_usd_warn: 50   # surface warning at next turn-start when projected exceeds
  monthly_usd_block: 200 # disable escalation entirely above this
```

OpenAI example (set `OPENAI_API_KEY` first):
```yaml
heuristic:
  llm_escalation: true
  classifiers_with_llm:
    - change-classification
  provider:
    type: openai
    model: gpt-4o-mini-2024-07-18
    api_key_env: OPENAI_API_KEY
  outbound_blocklist: []
  monthly_usd_warn: 50
  monthly_usd_block: 200
```

Ollama Cloud example (set `OLLAMA_CLOUD_API_KEY` first):
```yaml
heuristic:
  llm_escalation: true
  classifiers_with_llm:
    - change-classification
  provider:
    type: ollama
    model: gpt-oss:20b-cloud
    base_url: https://ollama.com
    api_key_env: OLLAMA_CLOUD_API_KEY
  outbound_blocklist: []
  monthly_usd_warn: 50
  monthly_usd_block: 200
```

Local Ollama example (no key needed, but pull the model first with `ollama pull llama3.2:3b`):
```yaml
heuristic:
  llm_escalation: true
  classifiers_with_llm:
    - change-classification
  provider:
    type: ollama
    model: llama3.2:3b
    # base_url omitted → defaults to http://localhost:11434
```

Groq example via openai-compat (set `GROQ_API_KEY` first):
```yaml
heuristic:
  llm_escalation: true
  classifiers_with_llm:
    - change-classification
  provider:
    type: openai-compat
    model: openai/gpt-oss-20b # or openai/gpt-oss-120b
    base_url: https://api.groq.com/openai/v1/chat/completions
    api_key_env: GROQ_API_KEY
  outbound_blocklist: []
  monthly_usd_warn: 50
  monthly_usd_block: 200
```

Groq model constraint: goldmem requires strict-mode `json_schema` structured outputs, which Groq currently supports only on `openai/gpt-oss-20b` and `openai/gpt-oss-120b` (Groq docs). Other Groq models (Llama, Mixtral) will sometimes return malformed JSON and the classification call will error. If you need a non-GPT-OSS Groq model, run with `--mock` for now or wait for goldmem to add a "best-effort schema" mode.
Don't put the key in `config.yaml` — set it as an environment variable so it stays out of source control. Three setup options:
Option A — Shell export (per-session):
```shell
# bash / zsh
export ANTHROPIC_API_KEY="sk-ant-..."

# PowerShell
$env:ANTHROPIC_API_KEY = "sk-ant-..."
```

Option B — `.env` (or `.env.local`) file at the repo root:
Drop a `.env` file at the project root with the env var(s) for whichever providers you might use. Any of the standard dotenv filenames work — goldmem just reads `process.env`, so whatever your shell/runner loads is what goldmem sees:
| Filename | When to use |
|---|---|
| `.env` | Default, committed-shape (template) — but never with real secrets |
| `.env.local` | Local-only override, gitignored by convention. Recommended for real keys. |
| `.env.development.local` / `.env.production.local` | Per-environment local overrides (Next.js / Vite / dotenv-flow style) |
| `.envrc` | direnv — loads automatically on `cd` into the directory |
goldmem does not load any of these itself — your shell or runner must. Common ways to load:
- `direnv` (auto-loads `.envrc` on directory entry — recommended for dev)
- `dotenv-cli` (`npx dotenv -e .env.local -- goldmem escalate-pending`)
- IDE integration (VS Code, JetBrains run-configs)
- Shell sourcing (`set -a && source .env.local && set +a`)
The repo's `.gitignore` already excludes `.env`, `.env.local`, and `.env.*.local`. Verify before pasting keys:
```shell
git check-ignore -v .env.local
# → .gitignore:8:.env.local    .env.local   ✓ gitignored
```

Reference `.env` template covering every provider goldmem supports:
```shell
# ───── Anthropic Claude API ─────────────────────────────────────────────
# Used when provider.type = anthropic. Get from https://console.anthropic.com/
ANTHROPIC_API_KEY=sk-ant-api03-...

# ───── OpenAI ───────────────────────────────────────────────────────────
# Used when provider.type = openai. Get from https://platform.openai.com/
OPENAI_API_KEY=sk-proj-...

# ───── Google Gemini ────────────────────────────────────────────────────
# Used when provider.type = google. Get from https://aistudio.google.com/apikey
GEMINI_API_KEY=AIza...

# ───── Ollama Cloud ─────────────────────────────────────────────────────
# Used when provider.type = ollama AND base_url contains "cloud" (e.g.
# https://ollama.com). Get from https://ollama.com/settings/keys
# (Local Ollama at localhost:11434 needs no key.)
OLLAMA_CLOUD_API_KEY=...

# ───── Groq (OpenAI-compatible) ─────────────────────────────────────────
# Used when provider.type = openai-compat with base_url
# https://api.groq.com/openai/v1/chat/completions. Get from
# https://console.groq.com/keys
# Only Groq models openai/gpt-oss-20b and openai/gpt-oss-120b satisfy
# goldmem's strict-mode json_schema requirement.
GROQ_API_KEY=gsk_...

# ───── Other OpenAI-compat hosts ────────────────────────────────────────
# LiteLLM proxy, OpenRouter, Together, vLLM, etc. — pick any env var
# name and reference it via api_key_env in your provider block.
LITELLM_API_KEY=...
OPENROUTER_API_KEY=sk-or-...
TOGETHER_API_KEY=...
```

You only need to set the keys for the providers you actually use. goldmem reads exactly one `api_key_env` per configured provider (the value you set in `.goldmem/config.yaml`'s `heuristic.provider.api_key_env` field).
Option C — User-home `~/.env` (loaded automatically by your login shell if you have direnv or a startup script):
```shell
# Example: ensure your shell loads ~/.env
echo 'set -a && [ -f ~/.env ] && source ~/.env && set +a' >> ~/.bashrc
# or for zsh: ~/.zshrc / fish: ~/.config/fish/config.fish
```

`goldmem doctor` will surface a warning if `llm_escalation: true` is set but the configured `api_key_env` resolves to empty.
- Mandatory redaction before any outbound call (SPEC §22): goldmem applies a 22-pattern secret-redaction sweep to the payload before it crosses the network boundary. Tokens / keys / PEM blocks / certs are stripped to `[REDACTED:<kind>]`.
- Operator-extensible blocklist: `outbound_blocklist` regex patterns reject the call entirely if matched.
- Monthly cost cap: `monthly_usd_block` disables escalation above the threshold; falls back to rules-only.
- Mock provider for CI: `goldmem escalate-pending --mock` runs the pipeline against a deterministic `MockProvider` — no network, useful for testing.
```shell
# After config + env-var setup, test the live round-trip:
goldmem escalate-pending --mock # safe — uses MockProvider
goldmem escalate-pending        # live — uses configured provider
goldmem doctor                  # surfaces missing-key + cost-cap warnings
```

To uninstall:

```shell
npm uninstall -g goldmem
# .goldmem/ in your project is left intact — delete manually if you don't
# want the captured directive history retained
```

While in alpha, please report any issue you hit — false positives, missed captures, hook failures, integration problems with a specific harness, anything.
- Bug report: open an issue at https://github.com/panbergco/goldmem/issues with:
  - `goldmem --version` output
  - `goldmem doctor --format json` output (after redacting any project-specific paths you'd rather not share)
  - The CLI command + steps that produced the issue
  - Expected vs. actual behavior
- Security-sensitive issue (potential secret leak, redaction bypass, hook fail-mode regression): mark the issue with the `security` label, or contact the maintainer privately via the email listed in `package.json`. Do NOT include real keys, real `.env` content, or sensitive directives in public issues — goldmem ships secret-redaction patterns in `src/core/redact/index.ts` if you need to mask something for a repro.
- Behavior question ("is X intentional?"): start with `goldmem before-ask --topic "<question>"` against a fresh project — if the canonical store has an answer, the system surfaces it. If not, an issue is the right venue.
The dogfood arc that produced v0.1.0-alpha.x already caught four real bugs (S033 silent-dequeue retry-loss, S039 audit-classification gap, S042 globstar fail-OPEN, S046 80% bootstrap false-positive rate). Keep the pressure on — that's how an alpha gets to stable.
| Verb | Purpose |
|---|---|
| `setup [--skip-llm] [--skip-self-test]` | Interactive wizard: init + LLM detect + install-hooks + HOOK-FIRING SELF-TEST that pipes a sentinel through the actual installed hook script and verifies it reaches `.goldmem/`. Catches silent-fail modes (jq missing, broken hook, `GOLDMEM_BIN` unresolvable) at install time. |
| `init` | Create `.goldmem/` structure; register project; auto-fires self-bootstrap + claude-memory absorption |
| `status` | Fast state report (<50ms) |
| `doctor [--repair] [--regression]` | Consistency checks; per-classifier regression; baseline ratchet |
| `rebuild [--force]` | Walk canonical → regenerate LadybugDB index |
| `capture <verbatim>` | Operator-explicit directive capture |
| `capture --auto --stdin` | Hook-driven (consumes §8.1 payload) |
| `before-ask --topic <text>` | Pre-ask hook: BLOCK_ASK / CANDIDATES / MISS |
| `check --file <path>` | Pre-edit hook: PASS / VIOLATION |
| `inject [--token-budget N]` | Render the formatted directive block |
| `list` / `audit <id>` / `history --topic <key>` | Browse / inspect / topic walk |
| `explain --topic <text>` | Decision trace for the before-ask read path: which scopes loaded, matched directives + score + method, the decision, the reason citation |
| `cypher [--write] '<query>'` | Top-level CLI surface for the LadybugDB graph index (mirrors the MCP cypher tool). Read-only by default; writes require both the `--write` flag and `cypher.allow_write: true` in `.goldmem/config.yaml` |
| `at-time <iso-date>` / `changes --since <iso-date>` / `repeats --min-count N` | Bi-temporal queries (`at-time` reads `valid_at`, so backfilled directives correctly appear as active on dates between original truth and capture) |
| `ack <DIR>` / `reclassify <DIR>` / `sup <DIR>` | Operator alignment + supersession |
| `closure-check --scope <name>` / `archive-scope <name>` | Scope lifecycle |
| `post-action --stdin` | Drift watcher (always advisory) |
| `repair [--diagnose]` | Replay orphaned `_pending/` entries |
| `bootstrap [--harness <name>] [--commit] [--llm-escalate-max N]` | First-time scan from harness session JSONL; cross-corpus dedupe; LLM auto-fires when configured |
| `backfill [--source <path-or-glob>] [--commit] [--dry-run] [--limit N]` | Replay historical operator turns into the canonical store with original-timestamp provenance. Auto-fires claude-memory absorption + LLM escalation (when configured). For recovering lost turns from transcript pruning, hook outage, or late install. |
| `absorb-claude-memory [--dry-run]` | Import directives from `~/.claude/projects/<encoded>/memory/` (feedback/user/project/reference files) into the canonical store. Idempotent. Auto-fires during init and backfill. |
| `dedupe-bootstrap EX-A EX-B` | Operator-initiated cross-harness dedupe |
| `escalate-pending [--mock]` | Drain LLM-escalation queue (opt-in) |
| `absorb [--refresh]` | List/refresh absorbed cross-project sources |
| `mcp` | Start MCP stdio server (4 tools + 5 resources) |
| `slash-resolve <prefix> <verb>` | Map `/gm <verb>` → CLI invocation |
| `fixture-add EX-NNNN` | Capture an exchange as a regression fixture |
| `install-hooks [--dry-run]` | Multi-harness install (MCP + hooks + skills + project CLAUDE.md block refresh) |
| `refresh-claude-md [--dry-run] [--max-directives N]` | Write/refresh the GOLDMEM-managed block in project CLAUDE.md. Slim teaching + top-N active directives. Project-only (never touches `~/.claude/CLAUDE.md`). Co-exists with CCSG and other tools' managed blocks. |
| `smoke-test [--keep-sandbox]` | End-to-end pre-flight (30 steps in 18s) |
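The `at-time` semantics in the table above can be sketched as a predicate — an illustration of the rule, not the shipped LadybugDB query; the helper name `active_at` is hypothetical:

```shell
# Sketch of the at-time predicate: a directive is active at T when
# valid_at <= T and T is before invalid_at (if any). captured_at is
# deliberately not consulted, which is why backfilled directives appear
# active on dates between original truth and late capture. ISO-8601
# timestamps compare correctly as plain strings.
active_at() {   # usage: active_at <T> <valid_at> [invalid_at]
  t="$1"; v="$2"; iv="${3:-}"
  { [ "$v" \< "$t" ] || [ "$v" = "$t" ]; } || return 1   # valid_at <= T
  [ -z "$iv" ] || [ "$t" \< "$iv" ]                      # T < invalid_at
}
```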
```
/gm c <verbatim>         → capture
/gm ok DIR-NNNN          → ack
/gm fix DIR-NNNN <kind>  → reclassify
/gm sup DIR-NNNN         → sup --archive
/gm sup DIR-NNNN <verb>  → sup --supersede
/gm at <iso-date>        → at-time
/gm tl <topic-key>       → history
/gm sug                  → inject (preview)
/gm fixture-add EX-NNNN  → fixture-add
/gm scope-end <name>     → closure-check
/gm scope-archive <name> → archive-scope
```
`/goldmem <verb>` is a long-form alias for any of the above.
goldmem exists because of an operator who recognized the asymmetry between a human's irreplaceable time and an agent's reproducible compute, and refused to spend more days converting the former into compensation for the latter.
The design is the answer to the question: "how do you make it impossible for the agent to lose the operator's directive between context cycles?"
The answer turns out to be: store it on disk in a format the agent must mechanically consult before every relevant action, and refuse to let it act otherwise.