
fix: stop burning Claude tokens on every PostToolUse hook (#138) #140

Merged
rohitg00 merged 2 commits into `main` from `fix/opt-in-auto-compress-138`
Apr 14, 2026

Conversation

@rohitg00
Owner

@rohitg00 rohitg00 commented Apr 14, 2026

Closes #138.

The bug

@olcor1 reported that installing agentmemory on a fresh Docker stack busted his Claude API token allocation within 20 minutes — the exact opposite of what the tool is supposed to do. Reproduced.

Root cause: `src/functions/observe.ts:129` fired `mem::compress` unconditionally on every PostToolUse hook:

```ts
sdk.triggerVoid("mem::compress", { observationId: obsId, sessionId: payload.sessionId, raw });
```

`mem::compress` in turn calls `compressWithRetry(provider, COMPRESSION_SYSTEM, prompt, validator, 1)` which goes to Claude via the user's `ANTHROPIC_API_KEY`. No rate limit, no batching, no opt-in. An active coding session with 50-200 tool calls per hour burned hundreds of thousands of tokens silently in the background.

The fix

Per-observation LLM compression is now opt-in.

  • New env var `AGENTMEMORY_AUTO_COMPRESS`, default `false`
  • When disabled (default), `observe.ts` builds a zero-LLM synthetic CompressedObservation from raw tool I/O:
    • `title` from tool name
    • `narrative` from truncated `tool_input` + `tool_output` (capped at 400 chars)
    • `files` extracted from `tool_input.file_path` / `path` / `pattern` / etc.
    • `type` inferred from camelCase-normalized tool name (`Read` → `file_read`, `Bash` → `command_run`, `WebFetch` → `web_fetch`, etc.)
    • `confidence: 0.3` (synthetic, lower than LLM-generated so ranking can prefer real summaries when both exist)
  • The synthetic observation is stored in KV and added to the BM25 search index, so `mem::search`, `mem::smart-search`, `mem::context` and the matching MCP tools still work
  • Startup banner now prints either `Auto-compress: OFF (default, #138)` or a loud warning when opt-in is enabled, so the mode is never silent
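
The heuristics in the list above can be sketched roughly as follows. This is a minimal illustration, not the contents of `compress-synthetic.ts`: the interface field names, the `post_tool_use` hook name used in the usage note, and the helper names `splitWords`, `toText`, and `extractFiles` are assumptions, and only the tool-name mappings spelled out in this description are included.

```typescript
// Field names below are assumptions for illustration, not the project's real types.
interface RawObservation {
  hookType: string;
  toolName?: string;
  toolInput?: Record<string, unknown>;
  toolOutput?: unknown;
}

interface SyntheticObservation {
  type: string;
  title: string;
  narrative: string;
  files: string[];
  confidence: number;
}

const NARRATIVE_CAP = 400; // cap stated in the PR description
const FILE_KEYS = ["file_path", "path", "pattern"]; // the PR lists these plus "etc."

// Split a camelCase tool name into lowercase words: "WebFetch" -> ["web", "fetch"].
function splitWords(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
}

function inferType(hookType: string, toolName: string): string {
  if (hookType === "post_tool_failure") return "error";
  const words = splitWords(toolName);
  const hasWord = (w: string): boolean => words.indexOf(w) !== -1;
  if (hasWord("read")) return "file_read";
  if (hasWord("bash")) return "command_run";
  if (hasWord("web") && hasWord("fetch")) return "web_fetch";
  return "other"; // remaining mappings omitted; the PR does not spell them out
}

function toText(value: unknown): string {
  if (value === undefined || value === null) return "";
  return typeof value === "string" ? value : (JSON.stringify(value) ?? "");
}

// Collect distinct string values found under the known path-like keys.
function extractFiles(input?: Record<string, unknown>): string[] {
  if (!input) return [];
  const files: string[] = [];
  for (const key of FILE_KEYS) {
    const value = input[key];
    if (typeof value === "string" && value.length > 0 && files.indexOf(value) === -1) {
      files.push(value);
    }
  }
  return files;
}

function buildSyntheticCompression(raw: RawObservation): SyntheticObservation {
  const toolName = raw.toolName ?? raw.hookType;
  const narrative = `${toText(raw.toolInput)} ${toText(raw.toolOutput)}`
    .trim()
    .slice(0, NARRATIVE_CAP);
  return {
    type: inferType(raw.hookType, toolName),
    title: toolName,
    narrative,
    files: extractFiles(raw.toolInput),
    confidence: 0.3, // synthetic marker from the PR description
  };
}
```

Under these assumptions, `buildSyntheticCompression({ hookType: "post_tool_use", toolName: "Read", toolInput: { file_path: "src/a.ts" } })` yields a `file_read` observation whose `files` array contains `src/a.ts`, without any network call.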

Migration

If you were relying on LLM-generated summaries, add to `~/.agentmemory/.env`:

```env
AGENTMEMORY_AUTO_COMPRESS=true
```

and restart. Existing compressed observations on disk are untouched.
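
For readers wondering how the flag is likely interpreted, here is a minimal sketch of an opt-in check and the matching banner text. It assumes that only the string `true` enables the feature (the walkthrough's sequence diagram shows `AGENTMEMORY_AUTO_COMPRESS = "true"`); the trimming, case folding, the `Env` type, the `autoCompressBanner` helper, and the warning wording are all assumptions, not the code in `src/config.ts` or `src/index.ts`.

```typescript
// The environment is passed in explicitly so the sketch stays self-contained.
type Env = Record<string, string | undefined>;

// Opt-in check: unset, empty, or anything other than "true" leaves compression off.
function isAutoCompressEnabled(env: Env): boolean {
  const value = env["AGENTMEMORY_AUTO_COMPRESS"] ?? "";
  return value.trim().toLowerCase() === "true";
}

// Banner text for startup logging. The OFF text is quoted from the commit message;
// the ON wording is hypothetical.
function autoCompressBanner(env: Env): string {
  return isAutoCompressEnabled(env)
    ? "WARNING: Auto-compress: ON (every observation triggers an LLM call)"
    : "Auto-compress: OFF (default, #138)";
}
```

In a real process one would pass `process.env`; keeping the environment as a parameter also makes the default, opt-in, and explicit-off cases easy to test side by side.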

Files changed

  • `src/config.ts` — `isAutoCompressEnabled()` helper
  • `src/functions/compress-synthetic.ts` (new) — `buildSyntheticCompression()`
  • `src/functions/observe.ts` — gate the compress trigger, write synthetic observation when disabled
  • `src/index.ts` — startup banner
  • `README.md` — new env var in the `.env` section
  • `test/auto-compress.test.ts` (new) — 8 cases

Test plan

  • `npm run build` clean
  • `npm test` — 707 passing (was 699 + 8 new)
  • Default path: no `mem::compress` is triggered
  • Default path: synthetic `CompressedObservation` is stored with correct `type`, `title`, `files`
  • `AGENTMEMORY_AUTO_COMPRESS=true`: `mem::compress` is triggered exactly once
  • `AGENTMEMORY_AUTO_COMPRESS=false` explicitly: no trigger
  • Tool-name-to-type mapping covers `Read`/`Write`/`Edit`/`Bash`/`Grep`/`WebFetch`/`Task`/unknown
  • Long narratives truncated to 400 chars
  • `post_tool_failure` hook → `error` type even without a tool name
  • After release: reproduce @olcor1's scenario on a fresh install with `ANTHROPIC_API_KEY` set and confirm zero Claude calls fire from observing tool use

Release

Bumps to 0.8.8 (main package + `@agentmemory/mcp` shim). CHANGELOG includes a prominent Behavior change banner at the top of the [0.8.8] entry so upgraders see the migration note.

Summary by CodeRabbit

  • New Features

    • Default per-observation compression now uses a zero-LLM "synthetic" summarization; opt-in LLM compression available via AGENTMEMORY_AUTO_COMPRESS (disabled by default).
  • Chores

    • Bumped package versions to 0.8.8.
    • Startup banner now reports whether auto-compress is enabled.
  • Documentation

    • Updated config and migration notes describing token/search impacts and how to opt in.
  • Tests

    • Added tests covering auto-compress behavior and synthetic compression mapping.

mem::observe was firing mem::compress unconditionally on every
PostToolUse hook, which called Claude via the user's ANTHROPIC_API_KEY
to compress each raw observation into a structured summary. On an
active coding session (50-200 tool calls/hour) this silently burned
hundreds of thousands of Claude API tokens within minutes. Reported
by @olcor1 on a fresh Docker install — his token allocation was gone
in 20 minutes.

Make per-observation LLM compression opt-in.

- New env var AGENTMEMORY_AUTO_COMPRESS, default `false`. When false,
  observe.ts builds a zero-LLM synthetic CompressedObservation from
  raw tool I/O (title from toolName, narrative from truncated
  input/output, files extracted from file_path/pattern/etc., type
  inferred from camelCase-normalized tool name) and stores + indexes
  it. Recall and BM25 search still work.
- New src/functions/compress-synthetic.ts with buildSyntheticCompression().
- src/functions/observe.ts gates the compress trigger on the flag and
  writes the synthetic observation + stream events when disabled.
- Startup banner now logs "Auto-compress: OFF (default, #138)" or a
  loud warning when opt-in is on, so the mode is never silent.
- Regression test: test/auto-compress.test.ts covers the default path
  (no mem::compress fired), explicit opt-in, tool-name-to-type
  mapping, file-path extraction, narrative truncation, and the
  post_tool_failure→error mapping.

Bumps to 0.8.8 (main package + shim). 707 tests pass.

**Migration**: users who were relying on LLM-generated summaries set
AGENTMEMORY_AUTO_COMPRESS=true in ~/.agentmemory/.env and restart.
Existing compressed observations on disk are untouched.
@coderabbitai

coderabbitai Bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

Per-observation LLM compression is now opt-in via AGENTMEMORY_AUTO_COMPRESS (default false). By default observations undergo a zero-LLM "synthetic compression" derived from raw tool metadata; an env var restores the prior LLM-on-every-observation behavior. Startup logging and tests added.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Release metadata**: `package.json`, `packages/mcp/package.json`, `plugin/.claude-plugin/plugin.json`, `src/version.ts` | Bumped package and module version(s) to 0.8.8 and updated inter-package dependency ranges. |
| **Changelog & README**: `CHANGELOG.md`, `README.md` | Added 0.8.8 changelog entry and documented new `AGENTMEMORY_AUTO_COMPRESS` env var with default `false`. |
| **Config accessor**: `src/config.ts` | Added `isAutoCompressEnabled()` to read `AGENTMEMORY_AUTO_COMPRESS` as boolean. |
| **Synthetic compression logic**: `src/functions/compress-synthetic.ts` | New `buildSyntheticCompression(raw)` converts `RawObservation` → `CompressedObservation` via deterministic heuristics (type inference, title/subtitle truncation, narrative assembly/truncation, file extraction, fixed importance/confidence). |
| **Observe flow & indexing**: `src/functions/observe.ts`, `src/index.ts` | Replaced unconditional `mem::compress` with opt-in branch: when enabled call `mem::compress`, otherwise build synthetic compression, persist to KV, add to search index, and emit `stream::set` events. Added startup banner logging compression mode. |
| **Export/import compatibility**: `src/types.ts`, `src/functions/export-import.ts` | Added `"0.8.8"` to `ExportData.version` union and to `mem::import` supported versions. |
| **Tests**: `test/auto-compress.test.ts`, `test/export-import.test.ts` | Added tests for auto-compress gating and synthetic compression behaviors; updated export version expectation to 0.8.8. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Observer
    participant Config
    participant Synthetic as Synthetic<br/>Compressor
    participant LLM as LLM<br/>Compressor
    participant KV as KV Store
    participant Search as Search Index
    participant Stream as Stream Events

    Client->>Observer: mem::observe (PostToolUse raw)
    Observer->>Config: isAutoCompressEnabled()?
    alt AGENTMEMORY_AUTO_COMPRESS = "true"
        Config-->>Observer: true
        Observer->>LLM: trigger mem::compress(observation)
        LLM->>LLM: perform LLM compression
        LLM-->>Observer: CompressedObservation
        Observer->>KV: persist compressed observation
        Observer->>Search: add(compressed)
        Observer->>Stream: emit stream::set (compressed)
    else default (synthetic)
        Config-->>Observer: false
        Observer->>Synthetic: buildSyntheticCompression(raw)
        Synthetic-->>Observer: synthetic CompressedObservation
        Observer->>KV: overwrite observation with synthetic
        Observer->>Search: add(synthetic)
        Observer->>Stream: emit stream::set (compressed)
    end
    Observer-->>Client: return observationId
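
The two branches of the diagram can be sketched in code as below. This is a simplified model with injected dependencies, not the body of `observe.ts`: the `ObserveDeps` shape, the string-typed `raw` payload, and every name in it are assumptions chosen so the branching is easy to follow and to test.

```typescript
interface Compressed {
  narrative: string;
  confidence: number;
}

// All members are hypothetical stand-ins for the real SDK, KV store, and index.
interface ObserveDeps {
  autoCompress: boolean; // result of the config check
  triggerCompress: (obsId: string) => void; // stand-in for the mem::compress trigger
  buildSynthetic: (raw: string) => Compressed; // stand-in for buildSyntheticCompression
  kv: Map<string, unknown>;
  searchIndex: Map<string, Compressed>;
  emit: (event: string, obsId: string) => void;
}

function observe(obsId: string, raw: string, deps: ObserveDeps): string {
  // The raw observation is always stored first.
  deps.kv.set(obsId, { raw });

  if (deps.autoCompress) {
    // Opt-in path: hand off to the LLM compressor (fire and forget).
    deps.triggerCompress(obsId);
    return obsId;
  }

  // Default path: no LLM call. Overwrite the stored record with the synthetic
  // summary, index it for search, and notify stream listeners.
  const synthetic = deps.buildSynthetic(raw);
  deps.kv.set(obsId, synthetic);
  deps.searchIndex.set(obsId, synthetic);
  deps.emit("stream::set", obsId);
  return obsId;
}
```

Passing the flag and collaborators in as arguments is a design choice of this sketch; it lets a test count how often `triggerCompress` runs without touching environment variables.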

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰
Hopping through records, light on my feet,
I stitch short stories from tools we meet.
No costly LLMs nibbling the score,
Synthetic crumbs keep tokens galore.
Tiny compressed hops — hooray, save more!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: making per-observation LLM compression opt-in to prevent token consumption on every PostToolUse hook. |
| Linked Issues check | ✅ Passed | The pull request fully addresses issue #138 by making automatic LLM compression opt-in via AGENTMEMORY_AUTO_COMPRESS, implementing zero-LLM synthetic compression by default, and surfacing startup state to users. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to addressing the token-burning issue: config management, synthetic compression logic, startup logging, version bumps, tests, and documentation updates. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
CHANGELOG.md (1)

13-13: Tighten phrasing in Line 13.

Consider replacing “the exact opposite” with “the opposite” for cleaner wording.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` at line 13, The phrasing in the changelog sentence describing
mem::observe→mem::compress behavior is a bit wordy; update the sentence that
reads “which is the exact opposite of what a memory tool should do” to use
“which is the opposite of what a memory tool should do” instead. Edit the line
describing the old mem::observe path (mentions mem::compress and PostToolUse
hook and Claude/ANTHROPIC_API_KEY) to swap the phrase while keeping the
surrounding explanation about synthetic compression and BM25 intact.
src/functions/export-import.ts (1)

178-178: Centralize supported export versions to reduce release drift risk.

The inline literal works, but this list is easy to miss during future bumps. Consider defining a shared SUPPORTED_EXPORT_VERSIONS constant (and deriving VERSION/type unions from it) to keep one source of truth.

Based on learnings: "When bumping version, update: package.json version field, src/version.ts VERSION constant and type union, src/types.ts ExportData version union, src/functions/export-import.ts supportedVersions set, test/export-import.test.ts version assertion, and plugin.json version field".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/export-import.ts` at line 178, Replace the inline Set literal
used in supportedVersions with a single shared constant array/Set named
SUPPORTED_EXPORT_VERSIONS and import/use it in the export-import code (replace
supportedVersions with SUPPORTED_EXPORT_VERSIONS), then derive the exported
VERSION constant and any version type unions (e.g., the ExportData version
union) from that single source so there’s one source of truth; update related
references/tests (e.g., the version assertion in export-import.test.ts) to
import/compare against SUPPORTED_EXPORT_VERSIONS (or a helper like
LATEST_EXPORT_VERSION) instead of hard‑coding literals to avoid release drift.
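
One way the single-source-of-truth idea could look in TypeScript is sketched below. Only `"0.8.8"` comes from this PR; the `"0.8.7"` entry is a placeholder for whichever older versions the project actually accepts, and `ExportVersion` and `isSupportedExportVersion` are hypothetical names.

```typescript
// Current version, declared once. "0.8.8" is the version this PR bumps to.
const VERSION = "0.8.8" as const;

// Placeholder list: "0.8.7" stands in for the real set of older supported versions.
const SUPPORTED_EXPORT_VERSIONS = ["0.8.7", VERSION] as const;

// The union type is derived from the array, so adding a version updates both.
type ExportVersion = (typeof SUPPORTED_EXPORT_VERSIONS)[number];

interface ExportData {
  version: ExportVersion;
  // remaining fields omitted in this sketch
}

// Runtime guard used on import; narrows an arbitrary string to ExportVersion.
function isSupportedExportVersion(value: string): value is ExportVersion {
  return (SUPPORTED_EXPORT_VERSIONS as readonly string[]).indexOf(value) !== -1;
}
```

With this layout a version bump touches one constant and one array entry, and the type union and the runtime check follow automatically.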
src/functions/compress-synthetic.ts (1)

18-20: Map task signals to task to preserve type fidelity.

Line 40 currently maps "task" to "subagent", which collapses two distinct ObservationType values and can weaken type-based filtering/analytics.

Proposed adjustment
-  if (hookType === "subagent_stop" || hookType === "task_completed")
-    return "subagent";
+  if (hookType === "subagent_stop") return "subagent";
+  if (hookType === "task_completed") return "task";
@@
-  if (["task", "agent"].some(hasWord)) return "subagent";
+  if (["task"].some(hasWord)) return "task";
+  if (["agent", "subagent"].some(hasWord)) return "subagent";

Also applies to: 40-40

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/compress-synthetic.ts` around lines 18 - 20, The current
mapping collapses distinct ObservationType values by returning "subagent" for
task-related signals; update the mapping logic so that hookType === "task"
returns "task" (preserving ObservationType fidelity) while keeping hookType ===
"subagent_stop" and hookType === "task_completed" mapped to "subagent", and
"notification" mapped to "notification"; locate the conditional that checks
hookType in compress-synthetic.ts and add an explicit branch for "task"
(referencing hookType and ObservationType) so analytics/type-based filtering
remain accurate.
test/auto-compress.test.ts (1)

172-179: Remove redundant dynamic import inside the loop.

The buildSyntheticCompression function is already imported at line 152-154. The import at line 174 inside the loop is unnecessary and adds overhead.

♻️ Suggested simplification
     for (const [name, expectedType] of cases) {
-      const synthetic = (
-        await import("../src/functions/compress-synthetic.js")
-      ).buildSyntheticCompression({ ...base, toolName: name });
+      const synthetic = buildSyntheticCompression({ ...base, toolName: name });
       expect(synthetic.type, `${name} -> ${expectedType}`).toBe(expectedType);
     }
-    // silence unused warning — buildSyntheticCompression is used above
-    expect(typeof buildSyntheticCompression).toBe("function");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/auto-compress.test.ts` around lines 172 - 179, The loop currently
performs a dynamic import of "../src/functions/compress-synthetic.js" each
iteration; remove that import and call the already-imported
buildSyntheticCompression (the symbol imported earlier at lines ~152-154)
directly inside the loop when creating synthetic via buildSyntheticCompression({
...base, toolName: name }); also remove or adjust the trailing expect that
exists solely to silence an "unused" warning (since buildSyntheticCompression
will now be used), i.e., delete the final expect(typeof
buildSyntheticCompression).toBe("function") line so there is no redundant import
or unused-symbol workaround.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/functions/compress-synthetic.ts`:
- Line 79: The computed fallback variable toolName (const toolName =
raw.toolName ?? raw.hookType) is not being used for type inference; replace
usages of raw.toolName in the type-inference call with the normalized toolName
so the fallback is applied (preventing inference falling back to "other").
Locate the inference call near where raw.toolName is passed (around the
subsequent call at ~line 92) and pass toolName instead; ensure any other places
in this function that use raw.toolName for inference also use the normalized
toolName.

In `@test/auto-compress.test.ts`:
- Around line 65-72: Tests for mem::observe rely on dynamic imports that cache
module state; add vi.resetModules() in the test setup to clear the module cache
between runs so environment-driven config reads (e.g.,
AGENTMEMORY_AUTO_COMPRESS) are re-evaluated. Update the beforeEach (and/or
afterEach) block surrounding the describe("mem::observe auto-compress gate
(`#138`)") to call vi.resetModules() before deleting/setting process.env, ensuring
observe.js and any config modules are re-imported fresh for each test.
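
To see why a cached module can hide an environment change, consider the toy model below. It is not Vitest's implementation and it assumes a config module that snapshots the flag when first evaluated, which this thread does not confirm for agentmemory; `ModuleRegistry`, `fakeEnv`, and `importConfig` are hypothetical names. The `reset()` method plays the role that `vi.resetModules()` plays for Vitest's module registry.

```typescript
type Env = Record<string, string | undefined>;

// Toy module registry: evaluates a module factory once, then serves the cached result.
class ModuleRegistry {
  private cache = new Map<string, unknown>();

  load<T>(id: string, factory: () => T): T {
    if (!this.cache.has(id)) this.cache.set(id, factory());
    return this.cache.get(id) as T;
  }

  // Clears every cached module so the next load re-evaluates its factory.
  reset(): void {
    this.cache.clear();
  }
}

const fakeEnv: Env = {};
const registry = new ModuleRegistry();

// Hypothetical config module that reads the flag only when first evaluated.
function importConfig(): { autoCompress: boolean } {
  return registry.load("config", () => ({
    autoCompress: fakeEnv["AGENTMEMORY_AUTO_COMPRESS"] === "true",
  }));
}
```

In this model, setting `fakeEnv["AGENTMEMORY_AUTO_COMPRESS"] = "true"` after the first `importConfig()` has no visible effect until `registry.reset()` runs, which mirrors the stale-state risk the review comment describes for consecutive test cases.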


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 575b85ba-658c-4648-b6ca-3dd61eb76ecf

📥 Commits

Reviewing files that changed from the base of the PR and between b864a7d and 38591a8.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (14)
  • CHANGELOG.md
  • README.md
  • package.json
  • packages/mcp/package.json
  • plugin/.claude-plugin/plugin.json
  • src/config.ts
  • src/functions/compress-synthetic.ts
  • src/functions/export-import.ts
  • src/functions/observe.ts
  • src/index.ts
  • src/types.ts
  • src/version.ts
  • test/auto-compress.test.ts
  • test/export-import.test.ts

- compress-synthetic.ts:92 — use the fallback toolName for inferType
  so both title and type derive from the same resolved value
- test/auto-compress.test.ts — vi.resetModules() in beforeEach to
  prevent cached observe.js from seeing stale env var state between
  the default, opt-in, and explicit-off test cases

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
src/functions/compress-synthetic.ts (2)

18-19: Map task_completed to task instead of subagent.

Line 18 currently collapses "task_completed" into "subagent", which can reduce precision for task-specific retrieval/analytics.

Suggested change
-  if (hookType === "subagent_stop" || hookType === "task_completed")
-    return "subagent";
+  if (hookType === "subagent_stop") return "subagent";
+  if (hookType === "task_completed") return "task";
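
As a standalone illustration of the suggested split (the reviewer's proposal, not necessarily the merged code), the sketch below keeps `task` and `subagent` as separate results, and also applies the word-matching split proposed in the earlier review round. The `ObservationType` subset and both function names are assumptions.

```typescript
// Subset of observation types relevant to this suggestion (assumed).
type ObservationType = "subagent" | "task" | "other";

// Hook-based mapping with the two hook types split apart, as the review suggests.
function typeFromHook(hookType: string): ObservationType | undefined {
  if (hookType === "subagent_stop") return "subagent";
  if (hookType === "task_completed") return "task";
  return undefined; // not decided by the hook; fall through to tool-name matching
}

// Word-based mapping over an already normalized tool name, e.g. ["task"].
function typeFromToolWords(words: string[]): ObservationType {
  const hasWord = (w: string): boolean => words.indexOf(w) !== -1;
  if (hasWord("task")) return "task";
  if (hasWord("agent") || hasWord("subagent")) return "subagent";
  return "other";
}
```

Keeping the two values distinct means a filter on `type === "task"` no longer misses task completions that would otherwise be recorded as `subagent`.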
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/compress-synthetic.ts` around lines 18 - 19, The current
conditional in compress-synthetic.ts maps hookType === "task_completed" to
"subagent" which reduces task-level precision; update the logic in the function
handling hookType (the branch that currently reads if (hookType ===
"subagent_stop" || hookType === "task_completed") return "subagent";) so that
"task_completed" returns "task" while "subagent_stop" still returns "subagent"
(i.e., split the OR into two checks or add an explicit branch for
"task_completed" returning "task").

7-10: Prefer self-documenting code over WHAT-style banner comments.

This header explains implementation behavior that is already clear from function names and call sites; consider trimming or moving rationale to changelog/docs.

As per coding guidelines, "Use clear, self-documenting variable and function names instead of code comments explaining WHAT".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/compress-synthetic.ts` around lines 7 - 10, Remove the long
WHAT-style banner comment describing the "Zero-LLM compression path" in
src/functions/compress-synthetic.ts and replace it with a concise one-line
summary above the compressSynthetic function (or the function that converts
RawObservation to CompressedObservation) such as "Convert RawObservation to
CompressedObservation using heuristic (no LLM)"; keep variable and function
names (RawObservation, CompressedObservation, compressSynthetic)
self-descriptive, and move the rationale about default behavior and
AGENTMEMORY_AUTO_COMPRESS to the changelog/docs instead of in-file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 74c92794-915e-49ca-9ee9-86f17e5d1e08

📥 Commits

Reviewing files that changed from the base of the PR and between 38591a8 and c591751.

📒 Files selected for processing (2)
  • src/functions/compress-synthetic.ts
  • test/auto-compress.test.ts
✅ Files skipped from review due to trivial changes (1)
  • test/auto-compress.test.ts

@rohitg00 rohitg00 merged commit 281e10c into main Apr 14, 2026
3 checks passed
@rohitg00 rohitg00 deleted the fix/opt-in-auto-compress-138 branch April 14, 2026 14:42

Development

Successfully merging this pull request may close these issues.

Busting token allocation

1 participant