Pointer-Based Context Management Experiments

When long-running AI agents hit context limits, the standard approach is summarization -- an LLM rewrites the conversation as a condensed summary. This works, but it's lossy, and the loss compounds. By the third compaction you're working with a summary of a summary of a summary.

This repo tests an alternative: replace conversation content with lightweight pointers (chunk IDs with previews) and give the agent a tool to retrieve the originals on demand. Like a memory hierarchy -- hot data in context, cold data in storage, all addressable.
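To make the idea concrete, here is a minimal sketch of a pointer-style ref store (hypothetical code, not the repo's actual implementation — the `Ref` shape, `compact`, and `listRefs` names are illustrative): compaction swaps full content for an ID plus preview, while the originals stay retrievable losslessly.

```typescript
// Hypothetical sketch of pointer-based compaction. Each ref keeps the
// full original in a lossless store; the context window only carries a
// lightweight pointer with a short preview.

interface Ref {
  id: string;      // stable chunk ID, e.g. "ref_03"
  preview: string; // first ~80 chars, visible in context
  content: string; // full original, kept in the store
}

class RefStore {
  private refs = new Map<string, Ref>();
  private counter = 0;

  // Replace a chunk's content with a pointer; store the original.
  compact(content: string): string {
    const id = `ref_${String(this.counter++).padStart(2, "0")}`;
    const preview = content.slice(0, 80);
    this.refs.set(id, { id, preview, content });
    return `[${id}] ${preview}…`; // what remains in the context window
  }

  // The agent's retrieval tool: dereference a pointer on demand.
  getRef(id: string): string | undefined {
    return this.refs.get(id)?.content;
  }

  // A list_refs-style listing the model scans to decide what to fetch.
  listRefs(): string[] {
    return [...this.refs.values()].map((r) => `${r.id}: ${r.preview}`);
  }
}
```

Because `compact` never discards the original, re-compacting is a no-op on fidelity — the property the cascade experiment tests.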

Results

Five experiments, run against Claude Sonnet 4.6 with eight synthetic session fixtures. Full writeup in FINDINGS.md, but the short version:

  • Models reliably dereference pointers. 100% dereference rate with a prescriptive prompt and a list_refs tool.
  • Pointers beat summarization on cascaded compaction. At one compaction the difference is 1 point (85% vs 84% grounding). At two compactions it's 18 points (92% vs 74%). Summaries degrade on re-compaction; pointers don't, because the ref store is lossless.
  • Hybrid (summary + pointers) is a trap. The summary anchors the model on a lossy interpretation, undermining the retrieval. This has implications for any RAG system that prepends context summaries before retrieved chunks.
  • Good ref descriptions matter more than good prompts. "File: src/config.ts" beats "Tool result for tool_05". The model decides what to retrieve based on what it sees in list_refs, not what you tell it in the system prompt.
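The last point can be sketched as a labeling choice (hypothetical code — the `ToolResult` shape and function names are illustrative, not from the repo): a content-derived label gives the model something to match a question against, while a positional one does not.

```typescript
// Hypothetical: building ref descriptions from tool results.
interface ToolResult {
  toolName: string;
  index: number;
  args: Record<string, string>;
}

// Positional label: says nothing about what the ref contains.
function genericLabel(r: ToolResult): string {
  return `Tool result for tool_${String(r.index).padStart(2, "0")}`;
}

// Content-derived label: surfaces what the model would actually retrieve.
function descriptiveLabel(r: ToolResult): string {
  if (r.toolName === "read_file" && r.args.path) {
    return `File: ${r.args.path}`;
  }
  if (r.toolName === "run_command" && r.args.command) {
    return `Output of: ${r.args.command}`;
  }
  return genericLabel(r); // fall back when nothing content-derived exists
}
```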

See PLAN.md for the hypothesis and experiment designs.

Running

bun install
cp .env.example .env
# Add your Anthropic API key to .env

bun run exp:format      # Experiment 1: pointer format comparison
bun run exp:interface   # Experiment 2: retrieval interface comparison
bun run exp:strategy    # Experiment 3: compaction strategy comparison
bun run exp:cascade     # Experiment 4: cascaded compaction
bun test                # OpenCode grounding test

Experiments need session fixtures in fixtures/sessions/. Results go to results/.

License

MIT
