panbergco/goldmem

goldmem


The gold memory layer for human agent directives. Captures the operator's words once. Surfaces them mechanically at every relevant agent decision. Never extracts the same answer twice.

Install

npm i -g goldmem@alpha
goldmem --version             # → 0.1.0-alpha.10
goldmem setup                 # interactive wizard: init + LLM detect +
                              # install-hooks + HOOK-FIRING SELF-TEST that
                              # synthesizes a real operator turn through
                              # the actual installed hook script and
                              # verifies the capture pipeline end-to-end
                              # — catches silent-fail modes (jq missing,
                              # broken hook, etc.) at install time

Or step-by-step:

npm i -g goldmem@alpha
goldmem init                  # in your project root
goldmem llm-setup             # optional — pick LLM provider, get env-var name
goldmem install-hooks         # wire into Claude Code / Codex / Cursor MCP + hooks
goldmem smoke-test            # ~18s, 30 steps; must pass before relying on it

Requires Node >=18.0.0. Tested on macOS, Linux, Windows.

jq prerequisite (POSIX hooks only): the bash hook adapter scripts (~/.claude/hooks/goldmem/*.sh etc.) pipe the harness's hook payload through jq before handing off to goldmem. If you install hooks on macOS/Linux, you need jq on your PATH:

OS              Install command
macOS           brew install jq
Debian/Ubuntu   sudo apt install jq
Fedora/RHEL     sudo dnf install jq
Arch            sudo pacman -S jq
Windows         winget install jqlang.jq

(The Windows .cmd hook variants don't need jq, but having it installed lets the bash variants work too if you use Git Bash / WSL.)

The goldmem CLI itself, the MCP server (goldmem mcp), and Windows .cmd hook adapters do not require jq. It is only needed for the bash hook scripts. goldmem's postinstall step prints a warning if jq isn't on PATH.
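Before wiring hooks on macOS/Linux, you can run the same check the postinstall step performs. A minimal POSIX-shell sketch (the function name is ours, not a goldmem verb):

```shell
# Preflight: confirm jq is resolvable on PATH before installing POSIX hooks.
# Mirrors the warning goldmem's postinstall step prints; illustrative only.
check_jq() {
  if command -v jq >/dev/null 2>&1; then
    echo "jq: found at $(command -v jq)"
  else
    echo "jq: MISSING (install it before 'goldmem install-hooks')"
  fi
}
check_jq
```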

Updates: npm update -g goldmem (or re-run npm i -g goldmem@alpha for the latest alpha).

Safety guarantees: goldmem never reads or writes any .env* file in your project; only .goldmem/ (its own canonical store) and .gitignore (append-only, with a required-set check) are touched. Full quickstart in §Install / Quickstart below.


goldmem is the gold memory layer for human agent directives. Every directive the operator gives an AI agent is captured verbatim at the moment it is spoken, persisted durably as Markdown, and surfaced mechanically at every relevant agent decision — so the agent always sees what the operator already said, in the operator's own words, and never asks for an answer that has already been given.

The protected asset is the operator's time. Operator turns are scarce, expensive, and irreplaceable; agent reasoning is fungible and cheap. goldmem treats operator words as gold, agent words as context — and refuses to lose either the literal words or the context that makes them legible.

It works inside any agent harness — Claude Code, Cursor, Codex, Aider, OpenCode, Gemini CLI, custom MCP servers — and integrates cleanly with workflow plugins layered on top of those harnesses (sprint-gates, claim systems, multi-worker registries) through the same primitives every harness uses: hooks, MCP, or CLI shell-out. There is no daemon, no server, no network by default — just a Markdown source of truth, an embedded graph index, and a CLI that any agent can call.


Why goldmem exists

This project was written on the night of 2026-05-03, after a ~13-hour autonomous coding session lost an entire day of development to one structural failure: the agent had been told the right thing, multiple times, by the operator — and the framework had no mechanism to make the agent act on what it had been told.

The empirical incident

An autonomous agent was given a clear directive: "we need a new UI everywhere, per the mocks, no legacy." The mocks were pod.v1.html and center.v1.html, both sitting in the project tree, both referenced by the directive.

Over 34 sprints in 7 hours, the agent:

  • Read the directive once, never enforced it on itself
  • Treated pod.v1.html as the canonical mock
  • Never opened center.v1.html — the second half of the design source
  • Built 75 /center/* routes against an architecture that doesn't exist in the mock (left-rail-with-grouped-sub-routes; the mock specifies a top-bar mode-strip + multi-chat-stream layout)
  • Self-attested "ARC CLOSED 34/34, 17/17 UCs migrated" on every sprint
  • Ran a follow-up audit sprint and missed the same gap a second time
  • Closed the audit prematurely, three times in a row

Each correction required the operator to type the same directive again. The day finished with the wrong UI shipped on half the surfaces, the operator out 13 hours, and the agent ready to do it again tomorrow unless something structurally changed.

The asymmetry that made it expensive

Resource   Property
Operator   One. Unique. Finite. Time = life. Cannot scale.
Agent      Fungible. Scales with cash. Can be re-spun.

Every minute the operator spent re-stating a directive is a minute extracted from a finite life. Every token the agent burned re-deriving scope it had already been told is paid for in compute, which is ~free. The two costs are not equivalent. They are not even on the same axis.

A framework that lets the agent extract the same answer from the operator more than once is, structurally, converting the scarce resource into a sink.

The principle goldmem encodes

Operator input is the highest-priority signal in the system. It must be captured verbatim, persisted durably, and surfaced mechanically at every decision point — never re-extracted from the operator.

A short answer like "yes" is gold only if the question that produced it is captured alongside. A directive like "new UI everywhere" is binding only if every sprint plan in the affected scope physically loads it before authoring an action. Doctrine in a CLAUDE.md file is a request. A hook that blocks a tool call until the agent cites the directive is enforcement.

goldmem is the storage and retrieval layer that makes this enforcement possible.


What goldmem does

  • Bootstraps from your existing session history on day one. goldmem init --bootstrap walks every installed agent harness's session storage (Claude Code, Cursor, Codex, OpenCode, Gemini CLI, Aider), filters to sessions that touched the current project, retroactively applies goldmem's capture pipeline to that backlog, and lands the resulting directives in your .goldmem/ with original timestamps preserved as valid_at. Months of operator decisions arrive as inject-block content before you type a single new turn. Two-phase by default: dry-run preview → operator review → commit.
  • Captures every operator turn that has directive shape (imperatives, scope-defining statements, short answers to captured questions) into a structured Markdown file with timestamp, scope, applies-to patterns, and the verbatim words.
  • Distinguishes operator-authored prose from pasted reference material so the database is never flooded by stack traces, code dumps, error logs, or URL paste-bombs. Only what the operator deliberately typed or dictated becomes directive content; what they pasted is preserved as referenced context but never indexed as gold.
  • Captures the spotted-error pattern — when the operator delegates explanation via a leading question or artifact reference ("how can the title say X when Y is selected?"), goldmem captures the operator's question PLUS the agent's "you're right — here's what I got wrong" diagnostic, derives the rule the operator was implicitly enforcing from the joint content, and surfaces it for confirmation. The agent's self-correction text is the only case where agent output enters the gold record.
  • Tracks how directives evolve over time, bi-temporally. Every directive carries when its content became true in the world (valid_at / invalid_at) and when goldmem recorded it (captured_at / expired_at). When the operator restates a topic, goldmem classifies the new statement as a repeat (agent failed to act — surfaced prominently), refinement (additional detail), reversal (changed mind), correction (agent misinterpreted before), or scope change. Directives are never silently overwritten; the timeline is preserved and queryable point-in-time.
  • Indexes the captured directives in an embedded graph database (LadybugDB) with native full-text search, vector similarity, and Cypher graph traversal — supporting supersession chains, refinement trees, conflict clusters, citation graphs, and bi-temporal change edges.
  • Surfaces scope-filtered directives mechanically at every relevant decision: agent about to declare a new sprint? Inject the active grounding. Agent about to edit a file? Run a compliance check against applies_to. Agent about to ask the operator a question? Block the question if a directive already answers it. Operator restated the same directive three times? Block all related actions until the operator acknowledges the prior statement was missed.
  • Archives directives when their scope closes (arc completes, session ends), preserving full history while keeping the active set small and the working context focused.
  • Stays out of the way when there's no directive to surface. Silent on irrelevant decisions.
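
Concretely, a captured directive might land on disk shaped like this. This is an illustrative sketch using the fields named above (id, scope, applies_to, valid_at, captured_at, and the verbatim words); the authoritative schema lives in SPEC.md:

```markdown
---
id: DIR-0042
scope: arc:ui-migration
applies_to: ["/pod/**", "/center/**", "*.v1.html"]
valid_at: 2026-05-03T09:14:00Z      # when it became true in the world
captured_at: 2026-05-03T09:14:02Z   # when goldmem recorded it
classification: directive
---

> we need a new UI everywhere, per the mocks, no legacy
```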

Scope — project-only by default

goldmem operates per project. It is initialized inside the project (goldmem init), stores its directives at .goldmem/ in that project's tree, and only ever loads context from that project. There is no machine-wide shared memory, no implicit cross-project bleed, no "preferences profile" that follows the operator from codebase to codebase.

A project's directives stay in the project. This is a design property, not a limitation — a directive that's right for project A is often actively wrong for project B (different conventions, different stakeholders, different stage). Implicit inheritance would dilute the signal goldmem exists to amplify.

When cross-project context IS warranted — for example, a deployment project that needs to absorb directives from the three services it deploys — the relationship is explicit and project-declared. The absorbing project's .goldmem/config.yaml lists the projects to absorb from:

absorbs_from:
  - name: acme-api          # resolved via the local goldmem registry
  - path: ../acme-worker         # absolute or relative path
  - name: acme-frontend

Absorbed directives load into the absorbing project's inject-block at a weaker precedence than the project's own directives (local always wins on conflict), are read-only from the absorbing side (no edits propagate outward), and are visibly tagged in the inject-block with their source. The agent — and the operator — always knows which project a given directive came from.


What goldmem is NOT

  • Not a graph database product. It uses one as an internal index. The canonical store is plain Markdown, git-tracked, human-auditable.
  • Not a vector database. Vector similarity is an optional fallback for fuzzy ambiguity cases, never the primary recall path. Operator input is recalled by scope + applies-to + lexical match, not by embedding distance.
  • Not a memory system for the model's reasoning. It does not store the agent's thoughts, decisions, or reflections. It stores the operator's input — the human signal — and only that.
  • Not a daemon, server, or network service. Embedded. Local-first. One process. Zero infrastructure.
  • Not tied to any harness or plugin. Agent harnesses (Claude Code, Cursor, Aider, custom MCP servers) integrate at the harness layer via hooks. Workflow plugins running inside a harness wire their own events to goldmem through the same primitives — hooks, MCP, or CLI shell-out — without a special plugin contract. Standalone mode (a harness with goldmem hooks installed and no plugin layer) is fully supported.
  • Not a replacement for CLAUDE.md or project documentation. Those describe what the project IS. goldmem records what the operator HAS DECIDED — the running history of directional input that shapes the agent's actions.

The shape of the win

A framework with goldmem in place would have prevented the 2026-05-03 incident as follows:

  1. Operator says "new UI everywhere per the mocks, no legacy."
  2. goldmem captures this as DIR-0042, scope arc:ui-migration, applies_to: [/pod/**, /center/**, *.v1.html], mock_sources: [pod.v1.html, center.v1.html].
  3. Every subsequent sprint declaration in that arc receives DIR-0042 injected at the top of the planning context, with a compliance check requiring the agent to cite it.
  4. When the agent tries to declare a sprint that excludes /center/* from the new shell, the closure validator BLOCKS the action with the directive cited and the violated predicate named.
  5. The agent cannot ship 12 sprints of left-rail Center pages. The operator never has to repeat the directive. The day is not lost.

The framework's job, restated: convert the operator's gold into agent constraint, mechanically and unbypassably, so the operator's life-minutes are spent on new directives, never on repeating old ones.


Status

  • v1.0-alpha — 19 of 19 AFK slices closed; 41 sprints archived. The full chain is operational and autonomously verified end-to-end. v1.0 ship status PENDING operator Slice 20 dogfood (≥7 days HITL validation per sprint-docs/stop-condition.md).
  • See sprint-proof/v1-closure-attempt/v1-closure-attempt.md for the closure-criteria audit trail.
  • Architecture, storage layout, schema, CLI surface, hook contracts, and integration model are spec'd in SPEC.md.
  • Implementation: TypeScript source, bundled to JavaScript via esbuild for distribution, installed via npm i -g goldmem. The shipped artifact is plain JS (TS compilation overhead would blow the <100ms hook startup budget). LadybugDB is the embedded query engine.
  • License: Apache 2.0.

Install

goldmem is on npm under the alpha dist-tag while v0.1.x stabilises.

# Install the alpha (recommended for first-use validation)
npm i -g goldmem@alpha

# Or pin a specific version
npm i -g goldmem@0.1.0-alpha.10

# Verify
goldmem --help
goldmem --version    # → 0.1.0-alpha.10

Requires Node >=18.0.0. Tested on macOS, Linux, Windows.

A post-install banner prints next-steps the first time you install (and on every update). Set GOLDMEM_NO_BANNER=1 to suppress it.

Updating

# Pull the latest alpha
npm update -g goldmem
# or (equivalent):
npm i -g goldmem@alpha

# Compare installed vs. published
goldmem --version
npm view goldmem dist-tags

When the package graduates from alpha to stable, npm i -g goldmem (no tag) will pull the latest stable. While we're in alpha, always pin via @alpha or a specific version like @0.1.0-alpha.10.

Safety guarantees

goldmem touches only these files in your project:

  • .goldmem/ — its own canonical store (created by init; safe to delete to reset)
  • .gitignore — appended with required-set entries; never overwrites your existing rules

It does not touch:

  • Any .env* file (verified in CI: tests/integration/init-no-env-overwrite.test.ts)
  • Any of your source code, build artifacts, lockfiles, or package.json
  • Any path outside the project root, except ~/.goldmem/registry.json (the global project registry — single line per project, project name + path)

LLM auth: goldmem reads process.env[<api_key_env>] at call time. It never writes env vars to disk and never reads .env* files itself — your shell or runner is responsible for loading them.

Quickstart

# Initialize in your project (one command does everything)
cd /path/to/your-project
goldmem init
# → creates .goldmem/ canonical store
# → auto-self-bootstraps from ~/.claude/projects/<encoded-cwd>/ if Claude
#   Code session JSONL is present (HIGH-threshold directives only)
# → auto-wires LLM heuristic config if oolama-cloud.key OR
#   ollama-cloud.key is at the repo root
# → updates .gitignore for transient state
#
# Opt-out flags if you want manual control:
#   goldmem init --no-self-bootstrap --no-llm-autowire

# Verify the full chain works on your machine (~18 seconds)
goldmem smoke-test           # 30 step categories pass → ready

# Auto-configure every detected agent harness
goldmem install-hooks        # writes MCP entry + adapters for
                             # Claude Code / Cursor / Codex / OpenCode /
                             # Gemini / Aider (skips harnesses not installed).
                             # For claude-code, also refreshes the project's
                             # CLAUDE.md managed-block (see "CLAUDE.md
                             # integration" below) so directives are surfaced
                             # mechanically at every session start.

# Re-bootstrap on demand (init does this automatically; this command is
# for backfilling additional harnesses or re-running with a different threshold)
goldmem bootstrap --harness all   # preview only — writes manifest at
                                  # ~/.goldmem/_bootstrap/<project>/
goldmem bootstrap --commit        # apply HIGH-tier directives to canonical

# Then just keep working. Every operator turn flows through:
#   - UserPromptSubmit hook   → directive-shape classification → auto-capture
#   - PreToolUse:Edit hook    → applies_to + forbidden_legacy → block on violation
#   - PreToolUse:AskUser hook → operator-already-answered → block re-asks
#   - PostToolUse hook        → drift detection → advisory at next turn
#   - Closure hook            → mock_sources + forbidden_legacy + git citation

CLAUDE.md integration (alpha.14+)

CLAUDE.md at your project root is auto-loaded by Claude Code at every session start. Goldmem's install-hooks (and the standalone refresh-claude-md verb) writes a managed <!-- BEGIN GOLDMEM --> / <!-- END GOLDMEM --> block into it so the agent always knows goldmem is active and which directives apply — without depending on hook-injection latency or the agent remembering to query goldmem.

The block is intentionally slim:

<!-- BEGIN GOLDMEM v0.1.0 — managed block, do not edit manually.
     Run `goldmem refresh-claude-md` to refresh. -->

## Goldmem

Captured operator directives are binding. Before non-trivial actions, run
`goldmem before-ask "<topic>"` and respect what it returns; cite DIR-NNNN.
Use `goldmem --help` for the rest.

### Active directives

<!-- DYNAMIC: rendered by `goldmem inject --format claude-md` -->
- DIR-0042 [project] never use `kuzu`; the LadybugDB binding is `@ladybugdb/core`.
- DIR-0079 [project] cross-platform first-class — normalize Windows paths at boundaries.
- ... (top N by priority × scope match; `--max-directives` to override)
<!-- END DYNAMIC -->

<!-- END GOLDMEM -->

Co-existence is first-class. The protocol is adapted from CCSG's claude-md-block-template.md mechanism (see panbergco/claude-code-sprint-gate plugin/commands/sprint-setup.md Phase 6.5). Goldmem only touches content between its own markers; CCSG-managed blocks, operator content, and other tools' managed blocks are preserved byte-for-byte. Greedy regex <!-- BEGIN GOLDMEM[\s\S]*<!-- END GOLDMEM --> absorbs any orphan markers from prior stacked refreshes so the file never accumulates dead blocks.
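The greedy-replacement semantics can be sketched with GNU sed (the -z flag is GNU-specific; this illustrates the marker contract only, it is not goldmem's implementation, and the sample file content is invented):

```shell
# Build a sample CLAUDE.md with one stale managed block between other content.
printf '%s\n' \
  'operator notes (preserved)' \
  '<!-- BEGIN GOLDMEM v0.0.9 -->' \
  'stale directive list' \
  '<!-- END GOLDMEM -->' \
  'CCSG block (preserved)' > CLAUDE.sample.md

# Greedy span: FIRST BEGIN marker through LAST END marker, so orphan markers
# from prior stacked refreshes are absorbed. GNU sed -z treats the file as
# one record, letting .* cross newlines.
sed -z 's|<!-- BEGIN GOLDMEM.*<!-- END GOLDMEM -->|<!-- BEGIN GOLDMEM v0.1.0 -->\nfresh directive list\n<!-- END GOLDMEM -->|' \
  CLAUDE.sample.md > CLAUDE.refreshed.md

grep -c 'preserved' CLAUDE.refreshed.md   # → 2  (surrounding content intact)
```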

Project-only. refresh-claude-md only writes to <project-root>/CLAUDE.md. It never touches ~/.claude/CLAUDE.md or any other path outside the current project (kill-list K-01: no machine-wide directive store).

Manual refresh:

goldmem refresh-claude-md                       # write/refresh the block
goldmem refresh-claude-md --dry-run             # preview, no file change
goldmem refresh-claude-md --max-directives 10   # cap at 10 (default 20)

Hook layer is now visible. Alpha.14 also fixed two hook-side bugs that made goldmem effectively invisible in transcripts: the PreToolUse:Bash check no longer exits 33 (MALFORMED_INPUT) on every Bash invocation, and the pre-tool-use-edit-write.sh wrapper no longer swallows VIOLATION citations to /dev/null. On a real block, the agent now sees the goldmem: VIOLATION — DIR-NNNN: … line in the tool-result transcript, alongside the remediation pointer.

Configuring an LLM provider (optional)

goldmem's classification pipeline is rules-based by default and works fully without any LLM. Configure an LLM provider if you want recall lifted on directives the line-anchored regex stack misses (substantive operator turns phrased as questions, adverbial-prefix imperatives, etc.) and on ambiguous change classifications.

Auto-fire contract (alpha.10+): once heuristic.llm_escalation: true and the configured api_key_env is populated, LLM escalation fires automatically in:

  • goldmem capture --auto --stdin (live hook path) — for ambiguous change classification
  • goldmem bootstrap (first-time scan) — for NONE/LOW shape candidates with substantive prose
  • goldmem backfill (historical replay) — same as bootstrap

No --llm-escalate flag is required. Operators who don't want LLM simply don't configure it (or unset the key env var). Cost is bounded by --llm-escalate-max N (default 50 calls per bootstrap/backfill run). See SPEC §22.

Quickest path: goldmem llm-setup

goldmem llm-setup
# Interactive prompt:
# Pick a provider:
#   [1] Anthropic Claude     — recommended for accuracy
#   [2] OpenAI               — broadly compatible
#   [3] Google Gemini        — cheap + fast
#   [4] Groq                 — fastest inference, OpenAI-compat
#   [5] Ollama Cloud         — open weights, paid hosted
#   [6] Local Ollama         — runs on your machine, no API key
#   [7] Skip                 — rules-only, no LLM

The setup verb prints the env var name + key-issuance URL + a copy-pasteable heuristic: block for .goldmem/config.yaml. Pass --commit to auto-write the block (refuses if a heuristic: block already exists).

Non-interactive form (CI / scripted): goldmem llm-setup --provider anthropic --commit.

Graceful degrade

If llm_escalation: true is set in .goldmem/config.yaml but the configured api_key_env resolves to empty in process.env, goldmem falls back to rules-only with a one-line stderr warning. No crash, no half-broken state. This means:

  • Variable present in env → LLM escalation runs
  • Variable absent in env → rules-only, stderr hint to run goldmem llm-setup

Works the same on every CLI verb, every hook fire, every escalate-pending invocation.
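
That two-branch table reduces to an env-var presence check. A plain-shell sketch (escalation_mode and GOLDMEM_DEMO_KEY are illustrative names, not goldmem internals):

```shell
# Graceful-degrade decision: configured api_key_env non-empty in the process
# environment → "llm"; empty or absent → "rules-only".
escalation_mode() {
  if [ -n "$(printenv "$1" 2>/dev/null)" ]; then
    echo "llm"
  else
    echo "rules-only"   # goldmem also emits a one-line stderr hint here
  fi
}

unset GOLDMEM_DEMO_KEY
escalation_mode GOLDMEM_DEMO_KEY   # → rules-only
export GOLDMEM_DEMO_KEY=sk-demo-123
escalation_mode GOLDMEM_DEMO_KEY   # → llm
```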

Manual configuration

If you'd rather edit .goldmem/config.yaml directly, here are the templates per provider:

Five provider types are shipped

Provider type    Default env var                             Sample model
ollama (local)   none — runs at http://localhost:11434       llama3.2:3b
ollama (cloud)   OLLAMA_CLOUD_API_KEY                        gpt-oss:20b-cloud
anthropic        ANTHROPIC_API_KEY                           claude-haiku-4-5
openai           OPENAI_API_KEY                              gpt-4o-mini-2024-07-18
google           GEMINI_API_KEY                              gemini-2.5-flash
openai-compat    configurable (LiteLLM, Together, Groq, …)   depends on host

Model names must be version-pinned — floating tags like claude-haiku-latest or gpt-4o are rejected at startup with a clear error. See SPEC §19.5.

Configuration

Edit .goldmem/config.yaml and add a heuristic block. Pick one of the templates below, or copy from the auto-wired output of goldmem init when a key file is present.

Anthropic example (set ANTHROPIC_API_KEY first):

heuristic:
  llm_escalation: true
  classifiers_with_llm:
    - change-classification
  provider:
    type: anthropic
    model: claude-haiku-4-5
    api_key_env: ANTHROPIC_API_KEY  # default; override if you use a different name
  outbound_blocklist: []        # regex patterns; payload matching skips the call
  monthly_usd_warn: 50          # surface warning at next turn-start when projected exceeds
  monthly_usd_block: 200        # disable escalation entirely above this

OpenAI example (set OPENAI_API_KEY first):

heuristic:
  llm_escalation: true
  classifiers_with_llm:
    - change-classification
  provider:
    type: openai
    model: gpt-4o-mini-2024-07-18
    api_key_env: OPENAI_API_KEY
  outbound_blocklist: []
  monthly_usd_warn: 50
  monthly_usd_block: 200

Ollama Cloud example (set OLLAMA_CLOUD_API_KEY first):

heuristic:
  llm_escalation: true
  classifiers_with_llm:
    - change-classification
  provider:
    type: ollama
    model: gpt-oss:20b-cloud
    base_url: https://ollama.com
    api_key_env: OLLAMA_CLOUD_API_KEY
  outbound_blocklist: []
  monthly_usd_warn: 50
  monthly_usd_block: 200

Local Ollama example (no key needed, but pull the model first with ollama pull llama3.2:3b):

heuristic:
  llm_escalation: true
  classifiers_with_llm:
    - change-classification
  provider:
    type: ollama
    model: llama3.2:3b
    # base_url omitted → defaults to http://localhost:11434

Groq example via openai-compat (set GROQ_API_KEY first):

heuristic:
  llm_escalation: true
  classifiers_with_llm:
    - change-classification
  provider:
    type: openai-compat
    model: openai/gpt-oss-20b              # or openai/gpt-oss-120b
    base_url: https://api.groq.com/openai/v1/chat/completions
    api_key_env: GROQ_API_KEY
  outbound_blocklist: []
  monthly_usd_warn: 50
  monthly_usd_block: 200

Groq model constraint: goldmem requires strict-mode json_schema structured outputs, which Groq currently supports only on openai/gpt-oss-20b and openai/gpt-oss-120b (Groq docs). Other Groq models (Llama, Mixtral) will sometimes return malformed JSON and the classification call will error. If you need a non-GPT-OSS Groq model, run with --mock for now or wait for goldmem to add a "best-effort schema" mode.

Setting the API key

Don't put the key in config.yaml — set it as an environment variable so it stays out of source control. Three setup options:

Option A — Shell export (per-session):

# bash / zsh
export ANTHROPIC_API_KEY="sk-ant-..."

# PowerShell
$env:ANTHROPIC_API_KEY = "sk-ant-..."

Option B — .env (or .env.local) file at the repo root:

Drop a .env file at the project root with the env var(s) for whichever providers you might use. Any of the standard dotenv filenames work — goldmem just reads process.env, so whatever your shell/runner loads is what goldmem sees:

Filename                                        When to use
.env                                            Default, committed-shape (template) — but never with real secrets
.env.local                                      Local-only override, gitignored by convention. Recommended for real keys.
.env.development.local / .env.production.local  Per-environment local overrides (Next.js / Vite / dotenv-flow style)
.envrc                                          direnv — loads automatically on cd into the directory

goldmem does not load any of these itself — your shell or runner must. Common ways to load:

  • direnv (auto-loads .envrc on directory entry — recommended for dev)
  • dotenv-cli (npx dotenv -e .env.local -- goldmem escalate-pending)
  • IDE integration (VS Code, JetBrains run-configs)
  • Shell sourcing (set -a && source .env.local && set +a)
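
The set -a sourcing pattern in action (throwaway filename and variable, demo-only; goldmem only ever sees what lands in process.env):

```shell
# Write a throwaway env file, then source it with auto-export enabled so
# the variable reaches child processes. Filename and variable name are
# demo-only, not a goldmem convention.
printf 'DEMO_PROVIDER_KEY=abc123\n' > .env.demo
set -a
. ./.env.demo
set +a
printenv DEMO_PROVIDER_KEY   # → abc123  (now visible to any child process)
```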

The repo's .gitignore already excludes .env, .env.local, and .env.*.local. Verify before pasting keys:

git check-ignore -v .env.local
# → .gitignore:8:.env.local   .env.local   ✓ gitignored

Reference .env template covering every provider goldmem supports:

# ───── Anthropic Claude API ─────────────────────────────────────────────
# Used when provider.type = anthropic. Get from https://console.anthropic.com/
ANTHROPIC_API_KEY=sk-ant-api03-...

# ───── OpenAI ───────────────────────────────────────────────────────────
# Used when provider.type = openai. Get from https://platform.openai.com/
OPENAI_API_KEY=sk-proj-...

# ───── Google Gemini ────────────────────────────────────────────────────
# Used when provider.type = google. Get from https://aistudio.google.com/apikey
GEMINI_API_KEY=AIza...

# ───── Ollama Cloud ─────────────────────────────────────────────────────
# Used when provider.type = ollama AND base_url contains "cloud" (e.g.
# https://ollama.com). Get from https://ollama.com/settings/keys
# (Local Ollama at localhost:11434 needs no key.)
OLLAMA_CLOUD_API_KEY=...

# ───── Groq (OpenAI-compatible) ─────────────────────────────────────────
# Used when provider.type = openai-compat with base_url
# https://api.groq.com/openai/v1/chat/completions. Get from
# https://console.groq.com/keys
# Only Groq models openai/gpt-oss-20b and openai/gpt-oss-120b satisfy
# goldmem's strict-mode json_schema requirement.
GROQ_API_KEY=gsk_...

# ───── Other OpenAI-compat hosts ────────────────────────────────────────
# LiteLLM proxy, OpenRouter, Together, vLLM, etc. — pick any env var
# name and reference it via api_key_env in your provider block.
LITELLM_API_KEY=...
OPENROUTER_API_KEY=sk-or-...
TOGETHER_API_KEY=...

You only need to set the keys for the providers you actually use. goldmem reads exactly one api_key_env per configured provider (the value you set in .goldmem/config.yaml's heuristic.provider.api_key_env field).

Option C — User-home ~/.env, loaded by your login shell via a startup-script line (example below) or an equivalent direnv setup:

# Example: ensure your shell loads ~/.env
echo 'set -a && [ -f ~/.env ] && source ~/.env && set +a' >> ~/.bashrc
# or for zsh: ~/.zshrc / fish: ~/.config/fish/config.fish

goldmem doctor will surface a warning if llm_escalation: true is set but the configured api_key_env resolves to empty.

Privacy posture

  • Mandatory redaction before any outbound call (SPEC §22): goldmem applies a 22-pattern secret-redaction sweep to the payload before it crosses the network boundary. Tokens / keys / PEM blocks / certs are stripped to [REDACTED:<kind>].
  • Operator-extensible blocklist: outbound_blocklist regex patterns reject the call entirely if matched.
  • Monthly cost cap: monthly_usd_block disables escalation above the threshold; falls back to rules-only.
  • Mock provider for CI: goldmem escalate-pending --mock runs the pipeline against a deterministic MockProvider — no network, useful for testing.
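
The redaction step can be pictured as a stream filter. The three patterns below are illustrative stand-ins for the 22 shipped patterns; only the [REDACTED:<kind>] output shape is taken from the spec:

```shell
# Illustrative redaction filter, NOT goldmem's actual 22-pattern sweep.
# Each rule strips a secret shape to [REDACTED:<kind>] before any payload
# would cross the network boundary.
redact() {
  sed -E \
    -e 's/sk-ant-[A-Za-z0-9_-]+/[REDACTED:api-key]/g' \
    -e 's/gsk_[A-Za-z0-9]+/[REDACTED:api-key]/g' \
    -e 's/-----BEGIN [A-Z ]*PRIVATE KEY-----/[REDACTED:pem]/g'
}

echo 'auth with sk-ant-api03-abc123 please' | redact
# → auth with [REDACTED:api-key] please
```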

Verify it works

# After config + env-var setup, test the live round-trip:
goldmem escalate-pending --mock     # safe — uses MockProvider
goldmem escalate-pending             # live — uses configured provider
goldmem doctor                       # surfaces missing-key + cost-cap warnings

Uninstall

npm uninstall -g goldmem
# .goldmem/ in your project is left intact — delete manually if you don't
# want the captured directive history retained

Reporting issues

While in alpha, please report any issue you hit — false positives, missed captures, hook failures, integration problems with a specific harness, anything.

  • Bug report: open an issue at https://github.com/panbergco/goldmem/issues with:
    • goldmem --version output
    • goldmem doctor --format json output (after redacting any project-specific paths you'd rather not share)
    • The CLI command + steps that produced the issue
    • Expected vs. actual behavior
  • Security-sensitive issue (potential secret leak, redaction bypass, hook fail-mode regression): mark the issue with the security label, or contact the maintainer privately via the email listed in package.json. Do NOT include real keys, real .env content, or sensitive directives in public issues — goldmem ships secret-redaction patterns in src/core/redact/index.ts if you need to mask something for a repro.
  • Behavior question ("is X intentional?"): start with goldmem before-ask --topic "<question>" against a fresh project — if the canonical store has an answer, the system surfaces it. If not, an issue is the right venue.

The dogfood arc that produced v0.1.0-alpha.x already caught four real bugs (S033 silent-dequeue retry-loss, S039 audit-classification gap, S042 globstar fail-OPEN, S046 80% bootstrap false-positive rate). Keep the pressure on — that's how an alpha gets to stable.

CLI verb cheat-sheet

setup [--skip-llm] [--skip-self-test]
    Interactive wizard: init + LLM detect + install-hooks + HOOK-FIRING SELF-TEST that pipes a sentinel through the actual installed hook script and verifies it reaches .goldmem/. Catches silent-fail modes (jq missing, broken hook, GOLDMEM_BIN unresolvable) at install time.
init
    Create .goldmem/ structure; register project; auto-fires self-bootstrap + claude-memory absorption
status
    Fast state report (<50ms)
doctor [--repair] [--regression]
    Consistency checks; per-classifier regression; baseline ratchet
rebuild [--force]
    Walk canonical → regenerate LadybugDB index
capture <verbatim>
    Operator-explicit directive capture
capture --auto --stdin
    Hook-driven (consumes §8.1 payload)
before-ask --topic <text>
    Pre-ask hook: BLOCK_ASK / CANDIDATES / MISS
check --file <path>
    Pre-edit hook: PASS / VIOLATION
inject [--token-budget N]
    Render the formatted directive block
list / audit <id> / history --topic <key>
    Browse / inspect / topic walk
explain --topic <text>
    Decision trace for the before-ask read path: which scopes loaded, matched directives + score + method, the decision, the reason citation
cypher [--write] '<query>'
    Top-level CLI surface for the LadybugDB graph index (mirrors the MCP cypher tool). Read-only by default; writes require both the --write flag and cypher.allow_write: true in .goldmem/config.yaml
at-time <iso-date> / changes --since <iso-date> / repeats --min-count N
    Bi-temporal queries (at-time reads valid_at, so backfilled directives correctly appear as active on dates between original truth and capture)
ack <DIR> / reclassify <DIR> / sup <DIR>
    Operator alignment + supersession
closure-check --scope <name> / archive-scope <name>
    Scope lifecycle
post-action --stdin
    Drift watcher (always advisory)
repair [--diagnose]
    Replay orphaned _pending/ entries
bootstrap [--harness <name>] [--commit] [--llm-escalate-max N]
    First-time scan from harness session JSONL; cross-corpus dedupe; LLM auto-fires when configured
backfill [--source <path-or-glob>] [--commit] [--dry-run] [--limit N]
    Replay historical operator turns into canonical store with original-timestamp provenance. Auto-fires claude-memory absorption + LLM escalation (when configured). For recovering lost turns from transcript pruning, hook outage, or late install.
absorb-claude-memory [--dry-run]
    Import directives from ~/.claude/projects/<encoded>/memory/ (feedback/user/project/reference files) into canonical store. Idempotent. Auto-fires during init and backfill.
dedupe-bootstrap EX-A EX-B Operator-initiated cross-harness dedupe
escalate-pending [--mock] Drain LLM-escalation queue (opt-in)
absorb [--refresh] List/refresh absorbed cross-project sources
mcp Start MCP stdio server (4 tools + 5 resources)
slash-resolve <prefix> <verb> Map /gm <verb> → CLI invocation
fixture-add EX-NNNN Capture an exchange as a regression fixture
install-hooks [--dry-run] Multi-harness install (MCP + hooks + skills + project CLAUDE.md block refresh)
refresh-claude-md [--dry-run] [--max-directives N] Write/refresh the GOLDMEM-managed block in project CLAUDE.md. Slim teaching + top-N active directives. Project-only (never touches ~/.claude/CLAUDE.md). Co-exists with CCSG and other tools' managed blocks.
smoke-test [--keep-sandbox] End-to-end pre-flight (30 steps in 18s)
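Per the `cypher` row above, graph writes are double-gated: the `--write` flag per invocation plus a config opt-in. A minimal config fragment, assuming a top-level `cypher` map holding the `allow_write` key named in the table:

```yaml
# .goldmem/config.yaml (fragment; key layout assumed from the cheat-sheet row)
cypher:
  allow_write: true   # writes still require the explicit --write flag per call
```

Leaving `allow_write` unset (or false) keeps `goldmem cypher` read-only regardless of flags, which is the safe default for a shared project.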

Slash commands (after goldmem install-hooks)

/gm c <verbatim>          → capture
/gm ok DIR-NNNN           → ack
/gm fix DIR-NNNN <kind>   → reclassify
/gm sup DIR-NNNN          → sup --archive
/gm sup DIR-NNNN <verb>   → sup --supersede
/gm at <iso-date>         → at-time
/gm tl <topic-key>        → history
/gm sug                   → inject (preview)
/gm fixture-add EX-NNNN   → fixture-add
/gm scope-end <name>      → closure-check
/gm scope-archive <name>  → archive-scope

/goldmem <verb> is a long-form alias for any of the above.


Acknowledgments

goldmem exists because of an operator who recognized the asymmetry between a human's irreplaceable time and an agent's reproducible compute, and refused to spend more days converting the former into compensation for the latter.

The design is the answer to the question: "how do you make it impossible for the agent to lose the operator's directive between context cycles?"

The answer turns out to be: store it on disk in a format the agent must mechanically consult before every relevant action, and refuse to let it act otherwise.
