Skip to content

v0.3.0 - loop detection

Choose a tag to compare

@kirder24-code kirder24-code released this 07 Jun 01:15
· 2 commits to main since this release

What's new

Loop detection - catches the agent that looks productive but is circling the same failure.

The hard case in stuck-detection is the agent that keeps producing output yet is really retrying the same dead end, just reworded each time. Plain hashing misses it: the prompt is similar but never byte-identical between loops, so the hash changes every turn and nothing trips.

Runcap now closes that gap from the one place that can see it - the gateway, which observes every request the agent sends in real time.

How it works

  • Each request's conversation shape is compared against the recent run with a line-similarity ratio (the same line-diff primitive the v0.2.2 delta-encoder already uses).
  • When N prompts in a row are near-identical (default: 3 prompts at 92%+ similarity) while the conversation never moves forward, the run is flagged loop.looping.
  • It surfaces a warning in runcap status, attaches a loop field to every gateway event, and fires an alert - so you can step in before the loop burns more budget.
  • Pure Node, no model call, single-digit ms. Tune or disable with AIM_LOOP_DETECT=off.

Proven end to end (not estimated)

Four reworded "let me try X instead" prompts pushed through the live gateway:

Prompt # repeats similarity looping
1 0 0% false
2 1 97.7% false
3 2 97.7% false
4 3 97.7% true

The signal escalates (0 → 1 → 2 → 3) instead of firing on a single slow step, and genuine progress or one long legit step never trips it.

How this is better than before

detectStuck in v0.2.2 was outcome-based: it scored a run after it ended (non-zero exit code, parsed errors, zero git diff). That catches obvious dead ends but is blind to an in-flight agent that is still "working" while going nowhere. Loop detection adds the missing behavioral, in-flight signal on top of it - you learn the agent is circling during the run, not in the post-mortem.

Honesty note

This is a calculated signal, not a proven dollar-saving like the delta-encoder. It tells you "the agent has sent N near-identical prompts in a row with no progress" so you can intervene. The token savings claim from v0.2.2 (37.9% on a real call) is unchanged and still the proven number.

Tests

5 new tests in scripts/loop-test.mjs, wired into npm test:

  • reworded same-failure attempts flagged as a loop
  • genuine progress NOT flagged
  • single long step NOT flagged
  • threshold boundary (2 repeats under the bar, not yet a loop)
  • OpenAI / Anthropic request-shape normalization

Install

npm install -g runcap@0.3.0

Full changelog: v0.2.2...v0.3.0