v3.9.31: Self-Learning Loop — Kernel-Side Dream Cycles, Three-Signal Routing
What's New
The QE fleet's self-learning loop now closes end-to-end. Previously the pattern catalog would converge on whichever pattern won the first routing decision, hooks would never trigger dream cycles, and the Q-learning table filled up with data nothing consumed. This release fixes all of that.
Headline features
- `aqe learning loop-health`: new operator dashboard. Shows liveness of the three pipeline components (CapturedExperienceBridge, LearningConsolidationWorker, DreamScheduler) plus a 7-day routing-diversification view (exploit/explore counts, avg quality per bucket, avg mincut multiplier, avg Q-weight). One command to know if the loop is closing.
- Three-signal agent routing (ADR-095). Routing decisions now blend three signals: static scoring (existing), a Q-value bonus from `rl_q_values` (sigmoid-normalized, ramps from 0 to 0.4 weight over 20 visits per state×agent), and crypto-random ε-greedy exploration (default 5%) gated by a mincut safety multiplier that dampens exploration 5x when swarm topology is critical. Set `AQE_ROUTER_EXPLORATION_RATE=0` to disable exploration as a rollback knob.
- Kernel-side dream cycles (ADR-094). Dream cycles used to run inside short-lived hook subprocesses, holding the SQLite write transaction for up to 10 seconds and losing errors to stderr. They now run inside the long-lived kernel via the `DreamScheduler`. Hook subprocesses keep only the cheap experience-counter bump. A new boundary-enforcement test in CI prevents future regressions.
- `aqe init --auto` daemon script now probes for a globally installed agentic-qe directly via `require.resolve`, so the daemon pidfile points at the real MCP server (not the `npx` wrapper that exits immediately).
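
To make the three-signal blend concrete, here is a minimal sketch of how such a router could combine the signals. It is not the agentic-qe implementation: the type and function names, the additive blend, the linear ramp shape, and the injected `rand` source are all assumptions made for illustration. Only the constants (0.4 maximum Q-weight, 20-visit ramp, 5% default exploration, 5x dampening) come from the notes above.

```typescript
// Illustrative sketch only. All names and the exact formulas are assumptions.

interface Candidate {
  agent: string;       // hypothetical agent identifier
  staticScore: number; // existing static score, assumed to lie in [0, 1]
  qValue: number;      // raw Q-value for this state x agent pair
  visits: number;      // visit count for this state x agent pair
}

interface Decision {
  agent: string;
  mode: "exploit" | "explore";
}

const MAX_Q_WEIGHT = 0.4;     // from the release notes
const RAMP_VISITS = 20;       // from the release notes
const DEFAULT_EPSILON = 0.05; // from the release notes
const CRITICAL_DAMPENING = 5; // from the release notes

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// Q-weight ramps from 0 to 0.4 over the first 20 visits (linear ramp assumed).
function qWeight(visits: number): number {
  const clamped = Math.min(Math.max(visits, 0), RAMP_VISITS);
  return (MAX_Q_WEIGHT * clamped) / RAMP_VISITS;
}

// Static score plus a sigmoid-normalized Q-value bonus (additive blend assumed).
function blendedScore(c: Candidate): number {
  return c.staticScore + qWeight(c.visits) * sigmoid(c.qValue);
}

// Exploration is dampened 5x when the swarm topology is critical.
function effectiveEpsilon(epsilon: number, topologyCritical: boolean): number {
  return topologyCritical ? epsilon / CRITICAL_DAMPENING : epsilon;
}

// `rand` must return a uniform number in [0, 1). A real router would back it
// with a cryptographically secure source; it is injected here so the sketch
// stays dependency-free and deterministic under test.
function route(
  candidates: Candidate[],
  topologyCritical: boolean,
  rand: () => number,
  epsilon: number = DEFAULT_EPSILON,
): Decision {
  if (candidates.length === 0) {
    throw new Error("no candidates to route to");
  }
  // With epsilon = 0 this branch is never taken, which mirrors the rollback knob.
  if (rand() < effectiveEpsilon(epsilon, topologyCritical)) {
    const index = Math.floor(rand() * candidates.length);
    const pick = candidates[index];
    return { agent: pick.agent, mode: "explore" };
  }
  const best = candidates.reduce((a, b) =>
    blendedScore(b) > blendedScore(a) ? b : a,
  );
  return { agent: best.agent, mode: "exploit" };
}

// Usage: "agent-b" has the lower static score but a well-visited, positive
// Q-value, so the exploit branch picks it over "agent-a".
const candidates: Candidate[] = [
  { agent: "agent-a", staticScore: 0.6, qValue: 0, visits: 0 },
  { agent: "agent-b", staticScore: 0.5, qValue: 2, visits: 20 },
];
const decision = route(candidates, false, () => 0.99);
```

Injecting the random source keeps the exploit and explore paths individually testable, and an exploration rate of 0 reduces the router to the static score plus the Q-value bonus.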
Reliability fixes
- `dream_insights` table now gets a retention sweep on every worker tick (30-day default; applied insights stay forever as audit trail).
- Bridge cursor is monotonic: cross-process races can't regress the cursor and cause duplicate event re-publication.
- `qe_pattern_usage` audit table is now structurally consistent with `qe_patterns.usage_count` (single shared writer).
- Pre-task bridge now writes regardless of pattern match count, so Q-learning gets signal for low-confidence prompts.
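
The first two fixes can be pictured as two small pure functions. This is a sketch under stated assumptions rather than the shipped code: the `DreamInsight` row shape, the function names, and the use of epoch-millisecond timestamps are hypothetical, and the real sweep and cursor update presumably run against SQLite, not in-memory arrays.

```typescript
// Illustrative sketch only. Row shape and function names are assumptions.

interface DreamInsight {
  id: string;
  createdAt: number; // epoch milliseconds (assumed representation)
  applied: boolean;  // applied insights are kept as an audit trail
}

const DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_RETENTION_DAYS = 30; // from the release notes

// Retention sweep: keep every applied insight, and keep unapplied insights
// only while they are younger than the retention window.
function sweepInsights(
  rows: DreamInsight[],
  now: number,
  retentionDays: number = DEFAULT_RETENTION_DAYS,
): DreamInsight[] {
  const cutoff = now - retentionDays * DAY_MS;
  return rows.filter((row) => row.applied || row.createdAt >= cutoff);
}

// Monotonic cursor: a stale writer that lost a race can never move the
// cursor backwards, so already-published events are not re-published.
function advanceCursor(current: number, candidate: number): number {
  return Math.max(current, candidate);
}

// Usage with a fixed clock so the result is deterministic.
const now = 100 * DAY_MS;
const kept = sweepInsights(
  [
    { id: "old-unapplied", createdAt: now - 45 * DAY_MS, applied: false },
    { id: "old-applied", createdAt: now - 45 * DAY_MS, applied: true },
    { id: "recent", createdAt: now - 2 * DAY_MS, applied: false },
  ],
  now,
);
// kept contains "old-applied" and "recent"; "old-unapplied" is swept.

let cursor = 0;
cursor = advanceCursor(cursor, 42); // fast writer lands first
cursor = advanceCursor(cursor, 17); // slow writer arrives late; cursor stays at 42
```

Taking the maximum makes the cursor update idempotent as well as monotonic, so replaying the same write twice is harmless.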
Closes
- #480 (post-edit dream trigger)
- #486 (`mineExperiences` auto-trigger + `qe_pattern_usage` dual-write)
- #487 (Q-learning starved for low-confidence prompts)
- #488 (production-readiness blind spots — Phases 1-4)
Getting Started
```bash
npx agentic-qe@3.9.31 init --auto
aqe learning loop-health
```
See the CHANGELOG and v3.9.31 release notes for full details. Architecture decisions documented in ADR-094 and ADR-095.