
v3.9.36: Close the self-learning routing Q-loop (#499)


proffesor-for-testing released this 19 May 14:54
6b04dd9

What's New

Closes the self-learning routing loop reported by Jordi in #499. Before this release, the route hook's qWeight was structurally pinned at 0. The producer (updateHookRouterQValue) fired only from post-task, and only when the pre-task bridge was present. As a result, on real projects routing_outcomes reached 100+ rows while rl_q_values stayed at ~2 rows. The consumer side that ADR-095 wired up had nothing to read, so Q-learning never influenced routing.

ADR-096 documents the three-part fix that ships in this release:

  1. post-route now trains rl_q_values — the high-volume routing surface that fires on every user prompt can now train the table it reads. Writer and reader address byte-identical rows via the shared derivation helpers.
  2. post-task no longer drops Q-updates when the pre-task bridge is absent — direct Bash/Edit sessions and Task-tool runs where pre-task didn't fire now train Q from the task description. New --description option on aqe hooks post-task.
  3. Q-state collapsed from 4-dim to 3-dim — the previous length-keyed complexityBucket fragmented semantically identical tasks across cells. The new (taskType|priority|domain) key converges ~11× faster.
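The key alignment in item 1 and the state collapse in item 3 can be sketched as below. This is a minimal, hypothetical TypeScript sketch, not agentic-qe's implementation: deriveStateKey, legacyStateKey, QTable, the description-length thresholds, and the alpha/gamma values are illustrative assumptions, and an in-memory Map stands in for the rl_q_values table. It shows one shared helper producing the (taskType|priority|domain) key for both the writer and the reader, a standard tabular Q-learning update, and how a length-based bucket splits otherwise identical tasks across cells.

```typescript
// Hypothetical sketch only; none of these names are agentic-qe's real API.
interface TaskState {
  taskType: string;
  priority: string;
  domain: string;
}

// Shared derivation helper (assumed): the writer (post-route / post-task) and the
// reader (route hook) both call this, so they address byte-identical cells.
function deriveStateKey(s: TaskState): string {
  return [s.taskType, s.priority, s.domain].join("|");
}

// Illustration of the previous 4-dim shape: a length-keyed complexity bucket.
// The thresholds here are assumptions chosen for the example.
function legacyStateKey(s: TaskState, description: string): string {
  const n = description.length;
  const bucket = n < 50 ? "low" : n < 200 ? "medium" : "high";
  return [s.taskType, s.priority, s.domain, bucket].join("|");
}

class QTable {
  private readonly q = new Map<string, number>();
  private readonly alpha: number; // learning rate (assumed value)
  private readonly gamma: number; // discount factor (assumed value)

  constructor(alpha = 0.1, gamma = 0.9) {
    this.alpha = alpha;
    this.gamma = gamma;
  }

  private cell(stateKey: string, action: string): string {
    return `${stateKey}::${action}`;
  }

  // Unseen (state, action) pairs default to 0.
  get(stateKey: string, action: string): number {
    return this.q.get(this.cell(stateKey, action)) ?? 0;
  }

  // Tabular Q-learning: Q <- Q + alpha * (reward + gamma * maxNext - Q).
  // A single routing decision is treated as terminal, so maxNext defaults to 0.
  update(stateKey: string, action: string, reward: number, maxNext = 0): number {
    const old = this.get(stateKey, action);
    const next = old + this.alpha * (reward + this.gamma * maxNext - old);
    this.q.set(this.cell(stateKey, action), next);
    return next;
  }

  size(): number {
    return this.q.size;
  }
}

// Usage: the same kind of task, described briefly and at length.
const state: TaskState = { taskType: "test-generation", priority: "high", domain: "auth" };
const shortDesc = "Add login tests";
const longDesc = "Add login tests covering lockout, MFA and session expiry ".repeat(2);

const table = new QTable();
// Writer side: two routing outcomes for the same agent choice.
table.update(deriveStateKey(state), "qe-test-architect", 1);
table.update(deriveStateKey(state), "qe-test-architect", 1);
// Reader side: the route hook derives the same key and sees the learned value.
const learned = table.get(deriveStateKey(state), "qe-test-architect");
console.log(learned.toFixed(2));
console.log(legacyStateKey(state, shortDesc) === legacyStateKey(state, longDesc));
```

Under the 3-dim key both outcomes land in one cell, so its value accumulates; under the legacy key the two descriptions fall into different length buckets and would have been split across two cells, which is the fragmentation item 3 describes.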

Existing rl_q_values rows under the previous 4-dim key shape become orphans; field reports show real installs have ~2 such rows at most, so no migration ships. If your install has substantial 4-dim data, see docs/releases/v3.9.36.md for a one-shot UPDATE statement.

Thanks to Jordi Izquierdo for the exhaustive root-cause analysis in #499 and the verified external workaround that confirmed the producer/consumer alignment approach before any code change shipped.

Getting Started

npx agentic-qe init --auto

See CHANGELOG and v3.9.36 release notes for full details.