v3.9.36: Close the self-learning routing Q-loop (#499)
What's New
Closes the self-learning routing loop reported by Jordi in #499. Before this release, the route hook's qWeight was structurally pinned at 0: the producer (updateHookRouterQValue) fired only from post-task, and only when the pre-task bridge was present. On real projects routing_outcomes reached 100+ rows while rl_q_values stayed at ~2, so the consumer side that ADR-095 wired up had nothing to read and Q-learning never influenced routing.
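To make the failure mode concrete, here is a minimal sketch of a producer/consumer pair over a Q-table. It is not the project's implementation: the in-memory Map standing in for rl_q_values, the names updateQ and routeScore, the learning rate, and the blending rule are all assumptions for illustration. The update is a simplified one-step (bandit-style) rule with no next-state term.

```typescript
// Hypothetical in-memory stand-in for the rl_q_values table.
// Key format `${state}::${action}` is an assumption for this sketch.
type QTable = Map<string, number>;

const ALPHA = 0.1; // learning rate (assumed value)

function qKey(state: string, action: string): string {
  return `${state}::${action}`;
}

// Producer: move the stored Q-value a step toward the observed reward.
function updateQ(table: QTable, state: string, action: string, reward: number): void {
  const k = qKey(state, action);
  const old = table.get(k) ?? 0;
  table.set(k, old + ALPHA * (reward - old));
}

// Consumer: blend a heuristic score with the learned Q-value.
// If no row exists for this state/action, qWeight is 0 and Q has no influence.
function routeScore(
  table: QTable,
  state: string,
  action: string,
  heuristic: number,
  maxQWeight = 0.3, // assumed cap on Q influence
): { score: number; qWeight: number } {
  const q = table.get(qKey(state, action));
  const qWeight = q === undefined ? 0 : maxQWeight;
  return { score: (1 - qWeight) * heuristic + qWeight * (q ?? 0), qWeight };
}

const table: QTable = new Map();
const state = "test-generation|high|auth";

// Producer never fired: the consumer finds no row, so qWeight stays at 0.
const before = routeScore(table, state, "agent-a", 0.5);

// Producer fires once: the same state/action now has a row to read.
updateQ(table, state, "agent-a", 1);
const after = routeScore(table, state, "agent-a", 0.5);

console.log(before.qWeight, after.qWeight);
```

In this sketch the consumer is correct in isolation; it simply never sees data unless the producer writes rows under the same key, which is the alignment the release restores.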
ADR-096 documents the three-part fix that ships in this release:
- post-route now trains rl_q_values: the high-volume routing surface that fires on every user prompt can now train the table it reads. Writer and reader address byte-identical rows via the shared derivation helpers.
- post-task no longer drops Q-updates when the pre-task bridge is absent: direct Bash/Edit sessions and Task-tool runs where pre-task didn't fire now train Q from the task description. New --description option on aqe hooks post-task.
- Q-state collapsed from 4-dim to 3-dim: the previous length-keyed complexityBucket fragmented semantically identical tasks across cells. The new (taskType|priority|domain) key converges ~11× faster.
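The fragmentation described in the last item can be sketched as follows. This is an illustration under stated assumptions, not the shipped helpers: the RoutingTask type, the function names stateKey and legacyStateKey, and the length thresholds in complexityBucket are hypothetical. Only the (taskType|priority|domain) key shape and the fact that the old bucket was length-keyed come from the release notes.

```typescript
// Hypothetical task shape for this sketch.
interface RoutingTask {
  taskType: string;
  priority: string;
  domain: string;
  description: string;
}

// Assumed length-keyed bucket; the real thresholds are not given in the notes.
function complexityBucket(description: string): string {
  const len = description.length;
  if (len < 40) return "low";
  if (len < 120) return "medium";
  return "high";
}

// Previous 4-dim key: the fourth dimension depends on description length.
function legacyStateKey(t: RoutingTask): string {
  return [t.taskType, t.priority, t.domain, complexityBucket(t.description)].join("|");
}

// New 3-dim key. If writer and reader both call this one helper,
// they address the same row by construction.
function stateKey(t: RoutingTask): string {
  return [t.taskType, t.priority, t.domain].join("|");
}

const shortTask: RoutingTask = {
  taskType: "test-generation",
  priority: "high",
  domain: "auth",
  description: "Add login tests",
};

// Same kind of work, described at greater length.
const longTask: RoutingTask = {
  ...shortTask,
  description:
    "Add login tests covering token refresh, lockout after repeated failures, and session expiry",
};

// The 4-dim keys differ, so the two tasks train separate cells.
console.log(legacyStateKey(shortTask) === legacyStateKey(longTask));
// The 3-dim keys match, so both tasks train the same cell.
console.log(stateKey(shortTask) === stateKey(longTask));
```

With the length bucket removed, every observation for a given task type, priority, and domain lands in one cell, which is consistent with the faster convergence the notes report.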
Existing rl_q_values rows under the previous 4-dim key shape become orphans; field reports show real installs have ~2 such rows at most, so no migration ships. If your install has substantial 4-dim data, see docs/releases/v3.9.36.md for a one-shot UPDATE statement.
Thanks to Jordi Izquierdo for the exhaustive root-cause analysis in #499 and the verified external workaround that confirmed the producer/consumer alignment approach before any code change shipped.
Getting Started
npx agentic-qe init --auto
See CHANGELOG and v3.9.36 release notes for full details.