This repository contains the experimental framework used in the paper:
“Did You Check the Right Pocket? Store Routing for Memory-Augmented Agents”
The project evaluates routing policies that select which memory stores to query ahead of retrieval and generation. The framework supports both:
- Synthetic routing evaluation (store-selection correctness)
- End-to-end LLM QA evaluation (accuracy vs token cost)
.
├── run_experiments.py # Main entry point
├── benchmark_comprehensive.py # Synthetic memory generator
├── ablation_experiments.py # Feature ablation experiments
├── update_cost_experiments.py # Store-cost experiments
├── metrics_framework.py # Routing metrics (coverage, EM, waste)
├── er21.py # Real LLM QA evaluation
├── prompt1 / prompt2 # Prompt templates
└── requirements_e2e.txt # Dependencies
Create a Python environment and install dependencies:
pip install -r requirements_e2e.txt
The synthetic routing evaluation measures routing coverage, exact match, and waste using synthetic store labels:
python ablation_experiments.py
python benchmark_comprehensive.py
Outputs:
- Feature ablation tables
- Coverage / EM / waste metrics
The end-to-end evaluation runs routing policies on real LLM question-answering tasks:
python er21.py
Outputs:
- Accuracy per policy
- Token usage statistics
- Short-context vs long-context results
To run all synthetic experiments sequentially:
python run_experiments.py --all
The routing policies evaluated are:
- Uniform retrieval
- Oracle routing
- Fixed subset policies (e.g., STM+Sum+LTM)
- Hybrid heuristic routing
- Ablated feature routing variants
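To make the simpler policies in this list concrete, here is a minimal sketch of uniform, fixed-subset, and oracle routing as plain functions. It is an illustration, not the repository's implementation: the function names, the `ALL_STORES` inventory, and the `Episodic` store are assumptions (only STM, Sum, and LTM are named in this README), and the hybrid heuristic and ablated variants are not sketched because their features are not described here.

```python
from typing import Callable, Dict, FrozenSet, Set

# Hypothetical store inventory. Only STM, Sum, and LTM appear in this README;
# "Episodic" is added purely so that a fixed subset differs from "all stores".
ALL_STORES: FrozenSet[str] = frozenset({"STM", "Sum", "LTM", "Episodic"})

Policy = Callable[[str], Set[str]]


def uniform_policy(query: str) -> Set[str]:
    """Uniform retrieval: query every store, ignoring the question."""
    return set(ALL_STORES)


def make_fixed_policy(subset: Set[str]) -> Policy:
    """Fixed subset policy: always query the same stores."""
    unknown = set(subset) - ALL_STORES
    if unknown:
        raise ValueError(f"unknown stores: {sorted(unknown)}")

    def policy(query: str) -> Set[str]:
        return set(subset)

    return policy


def make_oracle_policy(gold_labels: Dict[str, Set[str]]) -> Policy:
    """Oracle routing: return the labelled set of required stores."""

    def policy(query: str) -> Set[str]:
        return set(gold_labels[query])

    return policy
```

Under these assumptions, `make_fixed_policy({"STM", "Sum", "LTM"})` corresponds to the STM+Sum+LTM example above, and the oracle acts as an upper bound because it reads the gold labels directly.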
The framework evaluates routing using:
- Coverage: required stores retrieved
- Exact Match (EM): exact store subset selected
- Waste: unnecessary stores retrieved
- Token Cost: context tokens inserted into prompts
- QA Accuracy: substring answer match
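The store-level metrics above can be sketched per query as set operations. This is a hedged illustration rather than the code in `metrics_framework.py`: the normalizations are assumptions (coverage as the fraction of required stores retrieved, waste as the fraction of retrieved stores that were not required), as is the case-insensitive substring check for QA accuracy.

```python
from typing import Dict, Set


def routing_metrics(selected: Set[str], required: Set[str]) -> Dict[str, float]:
    """Score one routing decision against the required store set."""
    # Assumed definition: share of required stores that were retrieved.
    coverage = len(selected & required) / len(required) if required else 1.0
    # Exact match: the selected subset equals the required subset.
    exact_match = float(selected == required)
    # Assumed definition: share of retrieved stores that were not needed.
    waste = len(selected - required) / len(selected) if selected else 0.0
    return {"coverage": coverage, "exact_match": exact_match, "waste": waste}


def substring_match(prediction: str, gold_answer: str) -> bool:
    """QA accuracy check: the gold answer appears inside the model output."""
    # Case and whitespace normalization is an assumption of this sketch.
    return gold_answer.strip().lower() in prediction.strip().lower()
```

For example, selecting STM, Sum, and LTM when only STM and LTM are required gives full coverage, no exact match, and a waste of one third under these definitions.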
Reproducibility notes:
- Synthetic routing labels are derived from query taxonomies
- Store contents are generated deterministically from seed values
- Temperature is set to 0 for LLM evaluation
- Bootstrap resampling is used for statistical significance
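As an illustration of the bootstrap step, the sketch below computes a percentile confidence interval for the mean per-question score difference between two policies, using a seeded generator so results repeat across runs. The README does not say which bootstrap variant the framework uses; the paired percentile form, the function name, and the default parameters here are assumptions.

```python
import random
from typing import List, Tuple


def paired_bootstrap_ci(
    scores_a: List[float],
    scores_b: List[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile CI for mean(scores_a - scores_b) over resampled questions."""
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("score lists must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed keeps the resampling deterministic
    n = len(scores_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample question indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

If the resulting interval excludes zero, the difference between the two policies would be treated as significant at the chosen `alpha` under this scheme.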
@inproceedings{store-routing-2026,
title={Did You Check the Right Pocket? Store Routing for Memory-Augmented Agents},
author={Anonymous},
year={2026}
}
For questions or issues, please open a repository issue or contact the authors.