Agent Trace Runtime

A small benchmark and replay harness built to argue that agent workloads should be optimized at the serving layer before the model is changed. Everything runs on one rented RTX 3090.

What's in here:

A FastAPI trace proxy and replay harness.
Six runtime policies: naive, exact-prefix-cache, prefix-aware-routing, prompt-compaction, max-token-prediction, and a learned policy model.
720 synthetic public agent traces, 40 SWE-bench Verified prompts, 120 OpenHands trajectory shapes.
13 JSON reports from real GPU runs and training jobs.
A metadata MLP router that hits 87.5% held-out accuracy without touching private content.
An interactive paper in web/ (Svelte 5 + Vite) that reads the same reports the Python code emits.
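
To make the exact-prefix-cache policy concrete, here is a minimal sketch of the idea. It is not the code in src/agent_trace_runtime. It assumes whitespace-split words stand in for model tokens, and the names `PrefixCache`, `insert`, and `reusable_tokens` are hypothetical, chosen for illustration only.

```python
class PrefixCache:
    """Toy exact-prefix cache: a trie over the tokens of earlier prompts."""

    def __init__(self):
        self._root = {}  # nested dicts: token -> child node

    def insert(self, tokens):
        """Record a prompt so later prompts can reuse its leading tokens."""
        node = self._root
        for tok in tokens:
            node = node.setdefault(tok, {})

    def reusable_tokens(self, tokens):
        """Count leading tokens that exactly match a previously seen prompt."""
        node, matched = self._root, 0
        for tok in tokens:
            if tok not in node:
                break
            node = node[tok]
            matched += 1
        return matched


# Two agent turns that share a system prompt but differ in the task.
system = "You are a coding agent . Use the tools below .".split()
first = system + "Fix the failing test".split()
second = system + "Rename the helper".split()

cache = PrefixCache()
cold = cache.reusable_tokens(first)    # nothing cached yet
cache.insert(first)
warm = cache.reusable_tokens(second)   # the shared system prompt matches

print(f"cold={cold} warm={warm} system_len={len(system)}")
```

In a real serving stack the matched prefix corresponds to KV-cache entries that do not need to be recomputed, which is why agent traces with long shared system prompts benefit from this kind of policy.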

Headline numbers

Public SWE-bench Verified prompts on Qwen3-4B-AWQ, one RTX 3090:

Stack	Naive avg latency	Shared-prefix avg latency	Latency cut
vLLM	644 ms	198 ms	69.33%
SGLang	976 ms	133 ms	86.38%

Same weights, same prompts. The only difference is the runtime policy. The full table and limitations are in RESULTS.md and WRITEUP.md.
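
The Cut column is a relative latency reduction. The sketch below recomputes it from the rounded averages in the table; the function name `latency_cut_percent` is hypothetical. From the rounded values it gives 69.25% and 86.37%, slightly different from the reported 69.33% and 86.38%, presumably because the reports use unrounded latencies (an assumption, not something the table states).

```python
def latency_cut_percent(naive_ms: float, shared_ms: float) -> float:
    """Relative latency reduction of the shared-prefix run versus the naive run."""
    return 100.0 * (naive_ms - shared_ms) / naive_ms


# Rounded averages copied from the table above.
rows = [("vLLM", 644, 198), ("SGLang", 976, 133)]

for stack, naive_ms, shared_ms in rows:
    cut = latency_cut_percent(naive_ms, shared_ms)
    speedup = naive_ms / shared_ms
    print(f"{stack}: {cut:.2f}% cut ({speedup:.1f}x faster)")
```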

Layout

src/agent_trace_runtime/   Python package: proxy, replay, policies, router, schema, FastAPI app
templates/, static/        Server-rendered HTMX dashboard
web/                       Svelte 5 + Vite static site
script/                    Data prep, benchmark drivers, training entry points
data/                      Synthetic + public agent traces (NDJSON / JSONL)
reports/                   JSON evidence from every run
tests/                     pytest tests for replay, privacy, web app
dashboard/                 Static zero-runtime dashboard (fallback)
goal.md                    What was being built
WRITEUP.md                 Paper-style narrative
RESULTS.md                 Per-run measurements, including negative results
BENCHMARKS.md, RUNBOOK.md  How to reproduce
DATA.md, PRIVACY.md        Data sources and privacy stance

Reproduce

Tested on Python 3.12, macOS and Linux.

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

PYTHONPATH=src python3 -m pytest tests

PYTHONPATH=src python3 -m uvicorn agent_trace_runtime.web_app:app --host 127.0.0.1 --port 8081
# open http://127.0.0.1:8081

For the GPU runs you need a CUDA box. RUNBOOK.md has the exact serving commands.

Interactive paper

cd web
npm install
npm run dev      # local
npm run build    # static dist/, deploys anywhere

Sections: Overview (hero token race), Results (latency race + sweep), Trace Explorer (720 traces), Router Lab (live policy selection), Agent Shapes (OpenHands), Paper (text), Data Appendix.

Models

Scores for the LoRA adapters and the metadata MLP are in reports/policy_*.json and reports/metadata_policy_classifier_report.json. Adapter weights are not committed; regenerate them from the public SFT data with the training scripts in script/.

Data

Everything in data/ and reports/ is public or synthetic:

data/synthetic_traces.ndjson - 720 generated app/change traces.
data/swe_bench_verified_tasks.ndjson - 40 tasks sampled from SWE-bench Verified.
data/policy_sft_public.jsonl - 760 public-only SFT rows.
reports/openhands_trace_shapes.json - shape stats over 120 public OpenHands trajectories.

No private content anywhere. See PRIVACY.md.
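
The .ndjson and .jsonl files hold one JSON object per line. A minimal loader sketch follows, assuming nothing about the record schema; the helper name `load_ndjson` and the `id` field in the demo are hypothetical and not taken from the repository.

```python
import json
import tempfile
from pathlib import Path


def load_ndjson(path):
    """Parse a newline-delimited JSON file, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so bad records are easy to find.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
    return rows


# Demo on a throwaway file; the "id" field is made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.ndjson"
    demo.write_text('{"id": 1}\n\n{"id": 2}\n', encoding="utf-8")
    demo_rows = load_ndjson(demo)

print(len(demo_rows))
```

If the files match the descriptions above, `len(load_ndjson("data/synthetic_traces.ndjson"))` should come out at 720 and the SFT file at 760.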

Hardware

1x NVIDIA RTX 3090 (24 GB VRAM, ~805 GB/s)
Xeon Gold 6138
vLLM 0.21.0, SGLang 0.5.9
Qwen3-4B-AWQ (primary), Qwen3-8B-AWQ (smoke runs)
One Vast.ai instance, single-GPU, single-process

License

MIT.
