research: measure token cost impact of agentmemory-first tool usage pattern #28

@robotrocketscience

Description

Question

Does the locked belief "use agentmemory mem: tools first, then use your other tools to fill in the gaps" reduce token costs and improve task completion efficiency?

Context

Belief bd26a7791c68 was locked on 2026-04-21. Before that date, sessions used Explore agents and raw SQLite queries to investigate issues. Since then, sessions are expected to call mcp__agentmemory__search first, then use targeted file reads and greps to fill gaps.

What to measure

Pre-belief (sessions before 2026-04-21)

  • Agent tool calls per issue resolution (Explore agents, Grep, Read, Bash)
  • Estimated tokens consumed per issue (from session metrics)
  • Time to diagnosis per issue
  • Number of "dead end" searches (tools called but results not used)
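The per-issue metrics above could be tallied from conversation logs roughly as follows. This is a minimal sketch: the `ToolCall` record and its fields (`issue`, `tool`, `tokens`, `result_used`) are hypothetical and would have to be derived from whatever the logs actually contain. Time to diagnosis is left out because it needs timestamps.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical record for one tool invocation parsed from a conversation log."""
    issue: str         # e.g. "GH-16"
    tool: str          # e.g. "Explore", "Grep", "Read", "Bash"
    tokens: int        # estimated tokens consumed by this call
    result_used: bool  # False marks a "dead end" (tool called, result not used)


def summarize_by_issue(calls):
    """Return tool-call count, token total, and dead-end count per issue."""
    summary = defaultdict(lambda: {"tool_calls": 0, "tokens": 0, "dead_ends": 0})
    for call in calls:
        row = summary[call.issue]
        row["tool_calls"] += 1
        row["tokens"] += call.tokens
        if not call.result_used:
            row["dead_ends"] += 1
    return dict(summary)
```

Deciding whether a result was "used" is the hard part and probably needs manual labelling or a heuristic; the sketch only assumes that flag already exists.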

Post-belief (sessions after 2026-04-21)

  • Same metrics as above
  • How often mcp__agentmemory__search was the FIRST tool called
  • How often search results were sufficient on their own versus how often gap-filling was needed
  • Gap-filling tool count (Read, Grep, Bash after search)
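The post-belief metrics could be computed from an ordered list of tool names per issue. The sketch below assumes tool-call sequences are available as plain lists of strings, and it treats "sufficient" as "a search happened and no Read, Grep, or Bash call followed it"; both are assumptions, not something the data sources guarantee.

```python
from dataclasses import dataclass

SEARCH_TOOL = "mcp__agentmemory__search"
GAP_FILL_TOOLS = {"Read", "Grep", "Bash"}


@dataclass
class PatternStats:
    search_first: bool       # was the search tool the FIRST tool called?
    gap_fill_calls: int      # Read/Grep/Bash calls made after the first search
    search_sufficient: bool  # search ran and no gap-filling followed


def analyze_tool_sequence(tool_calls: list[str]) -> PatternStats:
    """Classify one ordered tool-call sequence against the agentmemory-first pattern."""
    search_first = bool(tool_calls) and tool_calls[0] == SEARCH_TOOL
    if SEARCH_TOOL not in tool_calls:
        # No search at all: nothing to gap-fill, and the pattern was not followed.
        return PatternStats(search_first=False, gap_fill_calls=0, search_sufficient=False)
    first_search = tool_calls.index(SEARCH_TOOL)
    gap_fill = sum(1 for name in tool_calls[first_search + 1:] if name in GAP_FILL_TOOLS)
    return PatternStats(search_first, gap_fill, gap_fill == 0)
```

Averaging `search_first` and `search_sufficient` over all post-belief sequences gives the two "how often" rates.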

Preliminary data from this session

This session resolved 6 issues and filed 6 new ones. The agentmemory-first pattern was enforced mid-session. Compare:

Before enforcement (GH-16/18/19/20 triage):

  • Spawned 3 Explore agents (~60K tokens each)
  • Raw SQLite queries for telemetry investigation
  • Multiple Grep/Glob searches

After enforcement (GH-17/21 diagnosis):

  • mcp__agentmemory__search returned relevant beliefs with context
  • Targeted file reads to fill specific gaps (correction_detection.py, hook_search.py)
  • Fewer total tool calls for equivalent diagnostic depth
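For a rough sense of scale, the Explore-agent cost before enforcement can be worked out from the figures above. The after-enforcement token count has not been measured yet, so the sketch only provides a helper for the comparison once that number exists; the function names are illustrative.

```python
# Figures from the "before enforcement" notes: 3 Explore agents at roughly 60K tokens each.
EXPLORE_AGENTS = 3
TOKENS_PER_EXPLORE_AGENT = 60_000  # approximate, per the session notes


def explore_token_estimate(agents: int = EXPLORE_AGENTS,
                           tokens_each: int = TOKENS_PER_EXPLORE_AGENT) -> int:
    """Estimated tokens spent on Explore agents alone (excludes SQLite and Grep/Glob calls)."""
    return agents * tokens_each


def percent_reduction(before_tokens: float, after_tokens: float) -> float:
    """Percentage by which token use dropped from before to after (negative means it rose)."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return (before_tokens - after_tokens) / before_tokens * 100


print(explore_token_estimate())  # 180000
```

The estimate is a lower bound for the before-enforcement cost, since the raw SQLite queries and Grep/Glob searches are not included.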

Data sources

  • sessions table: beliefs_created, searches_performed, retrieval_tokens
  • hook_injections table: chars_injected, beliefs_injected per prompt
  • tests table: used/ignored outcomes per belief
  • Conversation logs: tool call sequences and token estimates
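A pre/post split over the sessions table might look like the sketch below. The `searches_performed` and `retrieval_tokens` columns come from the list above; the `started_at` column (an ISO-8601 text timestamp) is an assumption, so substitute whatever timestamp column the table actually has.

```python
import sqlite3

CUTOFF = "2026-04-21"  # date the belief was locked

# ISO-8601 text timestamps compare correctly as strings, so a session that
# started on the cutoff day itself lands in 'post'. `started_at` is assumed.
QUERY = """
SELECT
    CASE WHEN started_at < ? THEN 'pre' ELSE 'post' END AS period,
    COUNT(*)                AS sessions,
    AVG(searches_performed) AS avg_searches,
    AVG(retrieval_tokens)   AS avg_retrieval_tokens
FROM sessions
GROUP BY period
"""


def compare_periods(conn: sqlite3.Connection, cutoff: str = CUTOFF) -> dict:
    """Return per-period session counts and averages keyed by 'pre' / 'post'."""
    rows = conn.execute(QUERY, (cutoff,)).fetchall()
    return {
        period: {
            "sessions": count,
            "avg_searches": avg_searches,
            "avg_retrieval_tokens": avg_tokens,
        }
        for period, count, avg_searches, avg_tokens in rows
    }
```

With few sessions on either side of the cutoff, the session counts should be reported next to the averages so readers can judge how much weight they carry.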

Deliverable

Structured comparison report with before/after metrics. Even with limited data (one session), a directional signal is still useful.
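One possible shape for the report is a plain-text table with one row per metric. The sketch assumes the before and after metrics have already been collected into two dictionaries with the same keys; the metric names are up to whoever builds the report.

```python
def format_report(before: dict, after: dict) -> str:
    """Render before/after metrics as an aligned text table with percent change."""
    lines = [f"{'metric':<24}{'before':>12}{'after':>12}{'change':>10}"]
    for metric, b in before.items():
        a = after[metric]
        # Percent change is undefined when the baseline is zero.
        change = f"{(a - b) / b * 100:+.1f}%" if b else "n/a"
        lines.append(f"{metric:<24}{b:>12}{a:>12}{change:>10}")
    return "\n".join(lines)
```

Keeping raw values beside the percent change matters with a single session of data, since a large percentage can come from a very small count.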
