Skip to content

v3.9.31: Self-Learning Loop — Kernel-Side Dream Cycles, Three-Signal Routing

Choose a tag to compare

@proffesor-for-testing proffesor-for-testing released this 15 May 17:42
· 154 commits to main since this release
cfe4a15

What's New

The QE fleet's self-learning loop now closes end-to-end. Previously the pattern catalog would converge on whichever pattern won the first routing decision, hooks would never trigger dream cycles, and the Q-learning table filled up with data nothing consumed. This release fixes all of that.

Headline features

  • aqe learning loop-health — new operator dashboard. Shows liveness of the three pipeline components (CapturedExperienceBridge, LearningConsolidationWorker, DreamScheduler) plus a 7-day routing-diversification view (exploit/explore counts, avg quality per bucket, avg mincut multiplier, avg Q-weight). One command to know if the loop is closing.

  • Three-signal agent routing (ADR-095). Routing decisions now blend three signals: static scoring (existing), a Q-value bonus from rl_q_values (sigmoid-normalized, ramps from 0 to 0.4 weight over 20 visits per state×agent), and crypto-random ε-greedy exploration (default 5%) gated by a mincut safety multiplier that dampens exploration 5x when swarm topology is critical. Set AQE_ROUTER_EXPLORATION_RATE=0 to disable exploration as a rollback knob.

  • Kernel-side dream cycles (ADR-094). Dream cycles used to run inside short-lived hook subprocesses, holding the SQLite write transaction for up to 10 seconds and losing errors to stderr. They now run inside the long-lived kernel via the DreamScheduler. Hook subprocesses keep only the cheap experience-counter bump. A new boundary-enforcement test in CI prevents future regressions.

  • aqe init --auto daemon script now probes for globally-installed agentic-qe directly via require.resolve, so the daemon pidfile points at the real MCP server (not the npx wrapper that exits immediately).

Reliability fixes

  • dream_insights table now gets a retention sweep on every worker tick (30-day default; applied insights stay forever as audit trail).
  • Bridge cursor is monotonic — cross-process races can't regress the cursor and cause duplicate event re-publication.
  • qe_pattern_usage audit table is now structurally consistent with qe_patterns.usage_count (single shared writer).
  • Pre-task bridge now writes regardless of pattern match count, so Q-learning gets signal for low-confidence prompts.

Closes

  • #480 (post-edit dream trigger)
  • #486 (mineExperiences auto-trigger + qe_pattern_usage dual-write)
  • #487 (Q-learning starved for low-confidence prompts)
  • #488 (production-readiness blind spots — Phases 1-4)

Getting Started

```bash
npx agentic-qe@3.9.31 init --auto
aqe learning loop-health
```

See the CHANGELOG and v3.9.31 release notes for full details. Architecture decisions documented in ADR-094 and ADR-095.