research: measure token cost impact of agentmemory-first tool usage pattern #28

## Question

Does the locked belief "use agentmemory mem:<tool> tools first, then use your other tools to fill in the gaps" reduce token costs and improve task completion efficiency?

## Context

Belief `bd26a7791c68` was locked on 2026-04-21. Before the lock, sessions used Explore agents and raw SQLite queries to investigate issues. After the lock, sessions are expected to call `mcp__agentmemory__search` first, then use targeted file reads and greps to fill gaps.

## What to measure

### Pre-belief (sessions before 2026-04-21)
- Agent tool calls per issue resolution (Explore agents, Grep, Read, Bash)
- Estimated tokens consumed per issue (from session metrics)
- Time to diagnosis per issue
- Number of "dead end" searches (tools called but results not used)

### Post-belief (sessions after 2026-04-21)
- Same metrics as above
- How often `mcp__agentmemory__search` was the FIRST tool called
- How often search results were sufficient on their own vs. how often gap-filling was needed
- Gap-filling tool count (Read, Grep, Bash after search)
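
The per-issue metrics above could be derived from an ordered tool-call log roughly as follows. This is a minimal sketch, not the project's actual tooling: the `ToolCall` record and its `result_used` flag are hypothetical, and it assumes each conversation log can be reduced to an ordered list of tool names with a manual or heuristic "was this result used" annotation.

```python
from dataclasses import dataclass

SEARCH_TOOL = "mcp__agentmemory__search"
GAP_FILL_TOOLS = {"Read", "Grep", "Bash"}


@dataclass
class ToolCall:
    name: str          # tool name as it appears in the conversation log
    result_used: bool  # hypothetical annotation: was the result used later?


def summarize(calls: list[ToolCall]) -> dict:
    """Compute per-issue metrics from an ordered list of tool calls."""
    # True only when the very first call in the sequence is the memory search.
    search_first = bool(calls) and calls[0].name == SEARCH_TOOL

    # Position of the first memory search, if any.
    first_search = next(
        (i for i, c in enumerate(calls) if c.name == SEARCH_TOOL), None
    )

    # Gap-filling calls are Read/Grep/Bash calls made after that search.
    gap_fill = 0
    if first_search is not None:
        gap_fill = sum(
            1 for c in calls[first_search + 1:] if c.name in GAP_FILL_TOOLS
        )

    return {
        "total_calls": len(calls),
        "search_first": search_first,
        "gap_fill_calls": gap_fill,
        # "Dead end" = a tool was called but its result was not used.
        "dead_ends": sum(1 for c in calls if not c.result_used),
    }
```

Running `summarize` once per issue and averaging within the pre-belief and post-belief groups would give the before/after numbers; a search that needed zero gap-filling calls counts as "sufficient".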

### Preliminary data from this session

This session resolved 6 issues and filed 6 new ones. The agentmemory-first pattern was enforced mid-session. Compare:

**Before enforcement (GH-16/18/19/20 triage):**
- Spawned 3 Explore agents (~60K tokens each)
- Raw SQLite queries for telemetry investigation
- Multiple Grep/Glob searches

**After enforcement (GH-17/21 diagnosis):**
- `mcp__agentmemory__search` returned relevant beliefs with context
- Targeted file reads to fill specific gaps (`correction_detection.py`, `hook_search.py`)
- Fewer total tool calls for equivalent diagnostic depth
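
For scale, the Explore agents alone put a floor on the pre-enforcement cost. The sketch below works through that arithmetic using only the figures above (3 agents at roughly 60K tokens each); the post-enforcement total has not been measured yet, so `relative_saving` is a hypothetical helper to apply once that number exists.

```python
EXPLORE_AGENTS = 3
TOKENS_PER_EXPLORE_AGENT = 60_000  # approximate figure from this session


def explore_token_floor(agents: int, tokens_each: int) -> int:
    """Lower bound on tokens spent: counts Explore agents only,
    ignoring the SQLite queries and Grep/Glob searches."""
    return agents * tokens_each


def relative_saving(before_tokens: int, after_tokens: int) -> float:
    """Fraction of tokens saved relative to the 'before' cost."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return (before_tokens - after_tokens) / before_tokens


floor = explore_token_floor(EXPLORE_AGENTS, TOKENS_PER_EXPLORE_AGENT)
print(floor)  # 180000
```

So the GH-16/18/19/20 triage cost at least ~180K tokens before counting any other tool calls.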

## Data sources

- `sessions` table: beliefs_created, searches_performed, retrieval_tokens
- `hook_injections` table: chars_injected, beliefs_injected per prompt
- `tests` table: used/ignored outcomes per belief
- Conversation logs: tool call sequences and token estimates
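
A pre/post split of the `sessions` table might look like the sketch below. It rests on assumptions: the `started_at` column is hypothetical (the issue does not say how sessions are timestamped), it is assumed to hold ISO-8601 text, and sessions on 2026-04-21 itself are counted as post-belief. Only `searches_performed` and `retrieval_tokens` come from the list above.

```python
import sqlite3

CUTOFF = "2026-04-21"  # date belief bd26a7791c68 was locked

# `started_at` is an assumed column name; adjust to the real schema.
QUERY = """
SELECT
    CASE WHEN started_at < ? THEN 'pre' ELSE 'post' END AS period,
    COUNT(*)                AS sessions,
    AVG(searches_performed) AS avg_searches,
    AVG(retrieval_tokens)   AS avg_retrieval_tokens
FROM sessions
GROUP BY period
"""


def compare_periods(conn: sqlite3.Connection, cutoff: str = CUTOFF) -> dict:
    """Return per-period averages keyed by 'pre' / 'post'."""
    rows = conn.execute(QUERY, (cutoff,)).fetchall()
    return {
        period: {
            "sessions": n,
            "avg_searches": avg_searches,
            "avg_retrieval_tokens": avg_tokens,
        }
        for period, n, avg_searches, avg_tokens in rows
    }
```

A period with no sessions is simply absent from the result, which makes thin data visible in the report instead of hiding it behind a zero.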

## Deliverable

A structured comparison report with before/after metrics. Even with limited data (one session), a directional signal is useful.
