A downstream consumer (ruflo, ruvnet/ruflo#2222) found that SONA's trained adapter never changes a routing/recall decision — empirically Δ=0 after ~200 adapts — and attributed it to "@ruvector/ruvllm MicroLoRA apply() being inert." I cloned ruvnet/ruvector (c2089c4, 2026-05-28) and verified against the actual source. The real picture is more specific, and worth correcting: inference seams exist and work (applyLora, MicroLoRA::forward, LoraAdapter.forward) — this is not "no forward path." The gap is that the learn→adapt loop is not wired through the JS/WASM entry points a consumer actually calls, and even the Rust path only adapts under conditions a typical consumer won't hit. Filing so the seam isn't lost.
Findings (source + reproduction)
1. WasmSonaEngine::learn_from_feedback is a no-op — crates/sona/src/wasm.rs:183-192. It computes a reward and console.logs it, then returns; it never builds a trajectory, accumulates a gradient, or touches a weight:
pub fn learn_from_feedback(&self, success: bool, latency_ms: f32, quality: f32) {
    let reward = if success { quality } else { -quality };
    web_sys::console::log_1(&format!("Feedback: ... reward={}", reward).into());
}
This is the natural JS/WASM "I have an outcome, learn from it" entry point — calling it any number of times trains nothing.
2. The pure-JS SonaCoordinator.processInstantLearning is an empty stub (npm/packages/ruvllm/src/sona.js:449-452). Its body contains only the comment "// In full implementation, this updates LoRA weights", so the JS-package SONA coordinator never adapts LoRA either. (The package's LoraAdapter.forward/backward in lora.ts do work, but SonaCoordinator never instantiates or calls them.)
3. The Rust trajectory path adapts only on multi-step, varying-reward trajectories. LearningSignal::estimate_gradient (crates/sona/src/types.rs:69-101) is REINFORCE with a mean-reward baseline. A single-step trajectory — the common "one outcome per task" shape — has advantage = reward − mean(reward) = 0, giving a zero gradient. Reproduced: 200 single-step adapts → Δ = 0 exactly.
4. MicroLoRA::accumulate_gradient only writes grad_up (crates/sona/src/lora.rs:192-229). grad_down is allocated, zeroed, and reset, but never updated, so down_proj never changes and adaptation is up-projection-only (rank-deficient). Reproduced: down_proj Δ=0, up_proj |w|=0.566.
5. (Bonus correctness smell) estimate_gradient L2-normalizes before returning, so a uniform-reward trajectory (true advantage ≈ 0) has its f32 baseline residue amplified into a unit-norm gradient: a full-magnitude update from a no-information signal. Reproduced: ||gradient_estimate|| = 1.0, spurious Δ ≈ 1.26e-2.
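Findings 3 and 5 can be illustrated without the crate. The sketch below is not the code in types.rs; it is a minimal stand-alone model, assuming the estimator subtracts the mean reward from each step's reward and then L2-normalizes the result unconditionally. All function names here are hypothetical.

```rust
/// Per-step advantage under a mean-reward baseline: A_i = r_i - mean(r).
/// Hypothetical helper for illustration; not the crate's API.
fn mean_baseline_advantages(rewards: &[f32]) -> Vec<f32> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let mean = rewards.iter().sum::<f32>() / rewards.len() as f32;
    rewards.iter().map(|r| r - mean).collect()
}

/// Euclidean norm of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Unconditional L2 normalization: any nonzero input, however small,
/// comes back with norm 1. An exactly-zero input is returned unchanged.
fn normalize_unconditionally(v: &[f32]) -> Vec<f32> {
    let n = l2_norm(v);
    if n == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / n).collect()
}
```

With a single reward, mean_baseline_advantages(&[1.0]) is exactly [0.0], which matches the inert single-step case. A residue-sized vector such as [3e-8, -4e-8] has norm 5e-8 before normalization and norm 1 after, which is the amplification described in finding 5.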
Reproduction
Run:
cargo test -p ruvector-sona --test repro_delta_zero -- --nocapture
The test drives the real public API (MicroLoRA, LearningSignal::from_trajectory, TrajectoryBuilder) with no mocks and reports:
[single-step feedback] Δ after 200 adapts = 0e0 (inert)
[uniform-reward 3-step] ||grad||=1.0, spurious Δ = 1.26e-2 (FP residue amplified)
[varying-reward 2-step] Δ after 200 adapts = 1.71e-2 (works — control)
[freeze check] down_proj Δ = 0, up_proj |w| = 0.566
Full reproduction test (drop in crates/sona/tests/repro_delta_zero.rs)
use ruvector_sona::{LearningSignal, MicroLoRA, TrajectoryBuilder};

const DIM: usize = 16;
const RANK: usize = 2;
const LR: f32 = 0.01;

/// Run the adapter's forward pass on `input` and return the output buffer.
fn forward_probe(lora: &MicroLoRA, input: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0f32; input.len()];
    lora.forward(input, &mut out);
    out
}

/// Euclidean distance between two equal-length vectors.
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// The natural "task finished, here is its quality" pattern: one step, one final score.
#[test]
fn single_step_feedback_is_inert() {
    let mut lora = MicroLoRA::new(DIM, RANK);
    let probe: Vec<f32> = (0..DIM).map(|i| (i as f32 * 0.1).sin()).collect();
    let baseline = forward_probe(&lora, &probe);
    for n in 0..200u64 {
        let mut b = TrajectoryBuilder::new(n, probe.clone());
        b.add_step(probe.clone(), vec![1.0; DIM], 1.0);
        let signal = LearningSignal::from_trajectory(&b.build(0.95));
        lora.accumulate_gradient(&signal);
        lora.apply_accumulated(LR);
    }
    let delta = l2(&baseline, &forward_probe(&lora, &probe));
    println!("[single-step feedback] Δ after 200 adapts = {delta:e}");
    assert_eq!(delta, 0.0);
}

/// Uniform per-step rewards: true advantage 0, but L2-normalized FP residue → unit-norm gradient.
#[test]
fn uniform_reward_amplifies_fp_residue() {
    let mut lora = MicroLoRA::new(DIM, RANK);
    let probe: Vec<f32> = (0..DIM).map(|i| (i as f32 * 0.2).cos()).collect();
    let baseline = forward_probe(&lora, &probe);

    // Inspect the gradient norm of one uniform-reward trajectory.
    let mut b0 = TrajectoryBuilder::new(0, probe.clone());
    b0.add_step(probe.clone(), vec![1.0; DIM], 0.9);
    b0.add_step(probe.clone(), vec![1.0; DIM], 0.9);
    b0.add_step(probe.clone(), vec![1.0; DIM], 0.9);
    let sig0 = LearningSignal::from_trajectory(&b0.build(0.9));
    let gnorm: f32 = sig0.gradient_estimate.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("[uniform-reward] true advantage=0, yet ||gradient_estimate|| = {gnorm:e}");

    // Apply 200 such trajectories and measure the resulting drift.
    for n in 0..200u64 {
        let mut b = TrajectoryBuilder::new(n, probe.clone());
        b.add_step(probe.clone(), vec![1.0; DIM], 0.9);
        b.add_step(probe.clone(), vec![1.0; DIM], 0.9);
        b.add_step(probe.clone(), vec![1.0; DIM], 0.9);
        let signal = LearningSignal::from_trajectory(&b.build(0.9));
        lora.accumulate_gradient(&signal);
        lora.apply_accumulated(LR);
    }
    let delta = l2(&baseline, &forward_probe(&lora, &probe));
    println!("[uniform-reward 3-step] spurious Δ after 200 adapts = {delta:e}");
    assert!(delta > 0.0);
}

/// Control: varying per-step rewards DO adapt; the mechanism works when fed a real gradient.
#[test]
fn varying_reward_multistep_adapts() {
    let mut lora = MicroLoRA::new(DIM, RANK);
    let probe: Vec<f32> = (0..DIM).map(|i| (i as f32 * 0.3).sin() + 0.2).collect();
    let baseline = forward_probe(&lora, &probe);
    for n in 0..200u64 {
        let mut b = TrajectoryBuilder::new(n, probe.clone());
        b.add_step(probe.clone(), vec![1.0; DIM], 0.1);
        b.add_step(probe.clone(), vec![1.0; DIM], 0.9);
        let signal = LearningSignal::from_trajectory(&b.build(0.8));
        lora.accumulate_gradient(&signal);
        lora.apply_accumulated(LR);
    }
    let delta = l2(&baseline, &forward_probe(&lora, &probe));
    println!("[varying-reward 2-step] Δ after 200 adapts = {delta:e}");
    assert!(delta > 0.0);
}

/// down_proj is never adapted regardless of signal; only up_proj moves.
#[test]
fn down_proj_is_frozen() {
    let mut lora = MicroLoRA::new(DIM, RANK);
    // Snapshot down_proj before any adaptation.
    let (down_before, _) = {
        let (d, u) = lora.get_weights();
        (d.clone(), u.clone())
    };
    for n in 0..50u64 {
        let mut b = TrajectoryBuilder::new(n, vec![0.5; DIM]);
        b.add_step(vec![0.5; DIM], vec![1.0; DIM], 0.1);
        b.add_step(vec![0.5; DIM], vec![1.0; DIM], 0.9);
        lora.accumulate_gradient(&LearningSignal::from_trajectory(&b.build(0.8)));
        lora.apply_accumulated(LR);
    }
    let (down_after, up_after) = lora.get_weights();
    let down_delta = l2(&down_before, down_after);
    // Norm of up_proj after adaptation (reported as |w| in the output).
    let up_delta: f32 = up_after.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("[freeze check] down_proj Δ = {down_delta:e} up_proj |w| = {up_delta:e}");
    assert_eq!(down_delta, 0.0);
    assert!(up_delta > 0.0);
}
Net effect for consumers
A JS/WASM consumer driving SONA via the documented learnFromFeedback API — or via single-outcome trajectories — observes zero adaptation; applyLora/applyMicroLora then return the untrained (up_proj = 0) transform. The forward/inference seam exists; the learn→inference loop is just not connected through the bindings consumers reach for.
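The up_proj = 0 remark and the frozen down_proj from finding 4 can be made concrete with generic LoRA algebra. The sketch below is an illustration under stated assumptions, not the layout or API of crates/sona/src/lora.rs: it assumes the adapter contribution is delta = up · (down · x) with equal input and output dimension, with down stored as rank×dim and up as dim×rank in row-major order. The names project_down, lora_delta and lora_grads are hypothetical.

```rust
/// h = down · x, where `down` is rank×dim, row-major (layout assumed for this sketch).
fn project_down(down: &[f32], x: &[f32], rank: usize) -> Vec<f32> {
    let dim = x.len();
    (0..rank)
        .map(|j| (0..dim).map(|k| down[j * dim + k] * x[k]).sum::<f32>())
        .collect()
}

/// Adapter contribution: delta = up · (down · x), where `up` is dim×rank, row-major.
fn lora_delta(down: &[f32], up: &[f32], x: &[f32], rank: usize) -> Vec<f32> {
    let h = project_down(down, x, rank);
    (0..x.len())
        .map(|i| (0..rank).map(|j| up[i * rank + j] * h[j]).sum::<f32>())
        .collect()
}

/// Gradients of a loss w.r.t. both factors, given g = dL/d(delta).
/// Returns (grad_down, grad_up) in the same layouts as `down` and `up`.
fn lora_grads(
    down: &[f32],
    up: &[f32],
    x: &[f32],
    g: &[f32],
    rank: usize,
) -> (Vec<f32>, Vec<f32>) {
    let dim = x.len();
    let h = project_down(down, x, rank);
    // grad_up[i][j] = g[i] * h[j]
    let mut grad_up = vec![0.0f32; dim * rank];
    for i in 0..dim {
        for j in 0..rank {
            grad_up[i * rank + j] = g[i] * h[j];
        }
    }
    // back[j] = sum_i up[i][j] * g[i], i.e. the transpose of `up` applied to g
    let back: Vec<f32> = (0..rank)
        .map(|j| (0..dim).map(|i| up[i * rank + j] * g[i]).sum::<f32>())
        .collect();
    // grad_down[j][k] = back[j] * x[k]
    let mut grad_down = vec![0.0f32; rank * dim];
    for j in 0..rank {
        for k in 0..dim {
            grad_down[j * dim + k] = back[j] * x[k];
        }
    }
    (grad_down, grad_up)
}
```

With up all zeros, lora_delta returns all zeros for any input, so an adapter whose up-projection never receives a nonzero update contributes nothing at inference. Under this formulation grad_down is also zero while up is zero, so a two-sided update would only begin to move down after up has moved.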
Suggested directions
- Wire learn_from_feedback to build a single-step-safe LearningSignal (e.g. fall back to a query_embedding-based gradient when steps < 2, or skip baseline subtraction for a single step), then accumulate and flush.
- Implement processInstantLearning in @ruvector/ruvllm, or mark it explicitly unimplemented so consumers don't assume it adapts.
- Adapt down_proj too, or document up-only adaptation as intentional.
- Guard estimate_gradient against amplifying a near-zero gradient (don't normalize when the pre-norm magnitude is ≈ 0).
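The first and last directions can be sketched together. This is a hypothetical shape, not a patch against the crate: it assumes advantages are computed from per-step rewards, treats the raw reward as the advantage when there is only one step, and skips normalization when the pre-norm magnitude falls below a threshold. The function names and the EPS value are illustrative choices.

```rust
/// Threshold below which a gradient is treated as carrying no signal.
/// The value is an illustrative assumption, not taken from the crate.
const EPS: f32 = 1e-6;

/// Mean-baseline advantages, except that a single step keeps its raw reward
/// so one-outcome feedback still yields a nonzero signal.
fn single_step_safe_advantages(rewards: &[f32]) -> Vec<f32> {
    match rewards {
        [] => Vec::new(),
        [r] => vec![*r],
        _ => {
            let mean = rewards.iter().sum::<f32>() / rewards.len() as f32;
            rewards.iter().map(|r| r - mean).collect()
        }
    }
}

/// L2-normalize only when the vector is meaningfully nonzero; otherwise
/// return zeros so floating-point residue is not scaled up to unit norm.
fn normalize_guarded(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm < EPS {
        return vec![0.0; v.len()];
    }
    v.iter().map(|x| x / norm).collect()
}
```

Two trade-offs are worth noting. Without a baseline, a single-step signal always pushes in the direction of the reward's sign, so consistently positive rewards never produce a corrective update. An absolute EPS also depends on the scale of the embeddings, so a relative threshold may be a better fit in practice.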
Related but distinct: #516 (@ruvector/sona published without build output → MODULE_NOT_FOUND) is a packaging issue; this is about the no-op learn path when the module is loaded and running.
Environment: ruvnet/ruvector@c2089c4, Rust 1.95, crate ruvector-sona 0.2.0. Downstream context: ruflo 3.10.10 / Node 26.