SONA learn→inference loop unwired at the JS/WASM boundary: learn_from_feedback is a no-op; MicroLoRA only adapts on multi-step varying-reward trajectories

A downstream consumer (ruflo, [ruvnet/ruflo#2222](https://github.com/ruvnet/ruflo/issues/2222)) found that SONA's trained adapter never changes a routing/recall decision — empirically `Δ=0` after ~200 adapts — and attributed it to "`@ruvector/ruvllm` MicroLoRA `apply()` being inert." I cloned `ruvnet/ruvector` (`c2089c4`, 2026-05-28) and verified against the actual source. The real picture is more specific, and worth correcting: **inference seams exist and work** (`applyLora`, `MicroLoRA::forward`, `LoraAdapter.forward`) — this is *not* "no forward path." The gap is that the **learn→adapt loop is not wired through the JS/WASM entry points a consumer actually calls**, and even the Rust path only adapts under conditions a typical consumer won't hit. Filing so the seam isn't lost.

### Findings (source + reproduction)

**1. `WasmSonaEngine::learn_from_feedback` is a no-op** — `crates/sona/src/wasm.rs:183-192`. It computes a reward and `console.log`s it, then returns; it never builds a trajectory, accumulates a gradient, or touches a weight:

```rust
pub fn learn_from_feedback(&self, success: bool, latency_ms: f32, quality: f32) {
    let reward = if success { quality } else { -quality };
    web_sys::console::log_1(&format!("Feedback: ... reward={}", reward).into());
}
```

This is the natural JS/WASM "I have an outcome, learn from it" entry point — calling it any number of times trains nothing.

**2. The pure-JS `SonaCoordinator.processInstantLearning` is an empty stub** — `npm/packages/ruvllm/src/sona.js:449-452`: `// In full implementation, this updates LoRA weights` with no body. So the JS-package SONA coordinator never adapts LoRA either. (Its `LoraAdapter.forward`/`backward` in `lora.ts` *do* work — but `SonaCoordinator` never instantiates or calls them.)

**3. The Rust trajectory path adapts only on multi-step, *varying-reward* trajectories.** `LearningSignal::estimate_gradient` (`crates/sona/src/types.rs:69-101`) is REINFORCE with a mean-reward baseline. A **single-step** trajectory — the common "one outcome per task" shape — has `advantage = reward − mean(reward) = 0`, giving a zero gradient. Reproduced: 200 single-step adapts → **Δ = 0 exactly**.
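
The arithmetic behind this finding can be checked without the crate. The sketch below is a minimal, self-contained illustration of a mean-reward baseline, not the body of `estimate_gradient`; the names `advantages` and `is_inert` are hypothetical.

```rust
/// Hypothetical helper (not part of `ruvector-sona`): advantage of each step
/// under a mean-reward baseline, i.e. `reward_i - mean(rewards)`.
fn advantages(rewards: &[f32]) -> Vec<f32> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let mean = rewards.iter().sum::<f32>() / rewards.len() as f32;
    rewards.iter().map(|r| r - mean).collect()
}

/// True when every advantage is exactly zero, so any gradient weighted by
/// these advantages is the zero vector.
fn is_inert(rewards: &[f32]) -> bool {
    advantages(rewards).iter().all(|a| *a == 0.0)
}
```

With one step the mean equals the only reward, so `is_inert(&[r])` holds for any finite `r`, while `advantages(&[0.1, 0.9])` is roughly `[-0.4, 0.4]` and yields a real update.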

**4. `MicroLoRA::accumulate_gradient` only writes `grad_up`** (`crates/sona/src/lora.rs:192-229`). `grad_down` is allocated, zeroed, and reset but **never updated**, so `down_proj` never changes and adaptation is up-projection-only (rank-deficient). Reproduced: `down_proj Δ=0`, `up_proj |w|=0.566`.

**5. (Bonus correctness smell) `estimate_gradient` L2-normalizes before returning**, so a uniform-reward trajectory (true advantage ≈ 0) has its f32 baseline residue **amplified into a unit-norm gradient**: a full-magnitude update from a no-information signal. Reproduced: `||gradient_estimate|| = 1.0`, spurious `Δ ≈ 1.26e-2`.
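
To make the amplification concrete, the sketch below normalizes a vector of f32 residue and then shows one possible guard. It assumes the normalization divides by the L2 norm whenever that norm is nonzero; the function names and the `eps` threshold are hypothetical, and the real `estimate_gradient` body is not reproduced here.

```rust
/// L2 norm of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Assumed shape of the current behavior: rescale to unit norm whenever the
/// norm is nonzero, however small it is.
fn normalize_unguarded(v: &[f32]) -> Vec<f32> {
    let n = l2_norm(v);
    if n > 0.0 { v.iter().map(|x| x / n).collect() } else { v.to_vec() }
}

/// Hypothetical guard: treat a pre-norm magnitude below `eps` as "no signal"
/// and return zeros instead of amplifying it.
fn normalize_guarded(v: &[f32], eps: f32) -> Vec<f32> {
    let n = l2_norm(v);
    if n < eps { vec![0.0; v.len()] } else { v.iter().map(|x| x / n).collect() }
}
```

For a residue such as `[1e-8, -1e-8]`, `normalize_unguarded` returns a vector of norm about 1.0, while `normalize_guarded(&residue, 1e-6)` returns zeros. A genuine gradient like `[0.3, -0.4]` is normalized identically by both.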

### Reproduction

`cargo test -p ruvector-sona --test repro_delta_zero -- --nocapture` (drives the real public API — `MicroLoRA` + `LearningSignal::from_trajectory` + `TrajectoryBuilder`, no mocks):

```
[single-step feedback]   Δ after 200 adapts = 0e0          (inert)
[uniform-reward 3-step]  ||grad||=1.0, spurious Δ = 1.26e-2 (FP residue amplified)
[varying-reward 2-step]  Δ after 200 adapts = 1.71e-2       (works — control)
[freeze check]           down_proj Δ = 0,  up_proj |w| = 0.566
```

<details><summary>Full reproduction test (drop in <code>crates/sona/tests/repro_delta_zero.rs</code>)</summary>

```rust
use ruvector_sona::{LearningSignal, MicroLoRA, TrajectoryBuilder};

const DIM: usize = 16;
const RANK: usize = 2;
const LR: f32 = 0.01;

fn forward_probe(lora: &MicroLoRA, input: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0f32; input.len()];
    lora.forward(input, &mut out);
    out
}
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// The natural "task finished, here is its quality" pattern: one step, one final score.
#[test]
fn single_step_feedback_is_inert() {
    let mut lora = MicroLoRA::new(DIM, RANK);
    let probe: Vec<f32> = (0..DIM).map(|i| (i as f32 * 0.1).sin()).collect();
    let baseline = forward_probe(&lora, &probe);
    for n in 0..200u64 {
        let mut b = TrajectoryBuilder::new(n, probe.clone());
        b.add_step(probe.clone(), vec![1.0; DIM], 1.0);
        let signal = LearningSignal::from_trajectory(&b.build(0.95));
        lora.accumulate_gradient(&signal);
        lora.apply_accumulated(LR);
    }
    let delta = l2(&baseline, &forward_probe(&lora, &probe));
    println!("[single-step feedback] Δ after 200 adapts = {delta:e}");
    assert_eq!(delta, 0.0);
}

/// Uniform per-step rewards: true advantage 0, but L2-normalized FP residue → unit-norm gradient.
#[test]
fn uniform_reward_amplifies_fp_residue() {
    let mut lora = MicroLoRA::new(DIM, RANK);
    let probe: Vec<f32> = (0..DIM).map(|i| (i as f32 * 0.2).cos()).collect();
    let baseline = forward_probe(&lora, &probe);
    let mut b0 = TrajectoryBuilder::new(0, probe.clone());
    b0.add_step(probe.clone(), vec![1.0; DIM], 0.9);
    b0.add_step(probe.clone(), vec![1.0; DIM], 0.9);
    b0.add_step(probe.clone(), vec![1.0; DIM], 0.9);
    let sig0 = LearningSignal::from_trajectory(&b0.build(0.9));
    let gnorm: f32 = sig0.gradient_estimate.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("[uniform-reward] true advantage=0, yet ||gradient_estimate|| = {gnorm:e}");
    for n in 0..200u64 {
        let mut b = TrajectoryBuilder::new(n, probe.clone());
        b.add_step(probe.clone(), vec![1.0; DIM], 0.9);
        b.add_step(probe.clone(), vec![1.0; DIM], 0.9);
        b.add_step(probe.clone(), vec![1.0; DIM], 0.9);
        let signal = LearningSignal::from_trajectory(&b.build(0.9));
        lora.accumulate_gradient(&signal);
        lora.apply_accumulated(LR);
    }
    let delta = l2(&baseline, &forward_probe(&lora, &probe));
    println!("[uniform-reward 3-step] spurious Δ after 200 adapts = {delta:e}");
    assert!(delta > 0.0);
}

/// Control: varying per-step rewards DO adapt — the mechanism works when fed a real gradient.
#[test]
fn varying_reward_multistep_adapts() {
    let mut lora = MicroLoRA::new(DIM, RANK);
    let probe: Vec<f32> = (0..DIM).map(|i| (i as f32 * 0.3).sin() + 0.2).collect();
    let baseline = forward_probe(&lora, &probe);
    for n in 0..200u64 {
        let mut b = TrajectoryBuilder::new(n, probe.clone());
        b.add_step(probe.clone(), vec![1.0; DIM], 0.1);
        b.add_step(probe.clone(), vec![1.0; DIM], 0.9);
        let signal = LearningSignal::from_trajectory(&b.build(0.8));
        lora.accumulate_gradient(&signal);
        lora.apply_accumulated(LR);
    }
    let delta = l2(&baseline, &forward_probe(&lora, &probe));
    println!("[varying-reward 2-step] Δ after 200 adapts = {delta:e}");
    assert!(delta > 0.0);
}

/// down_proj is never adapted regardless of signal — only up_proj moves.
#[test]
fn down_proj_is_frozen() {
    let mut lora = MicroLoRA::new(DIM, RANK);
    let (down_before, _) = { let (d, u) = lora.get_weights(); (d.clone(), u.clone()) };
    for n in 0..50u64 {
        let mut b = TrajectoryBuilder::new(n, vec![0.5; DIM]);
        b.add_step(vec![0.5; DIM], vec![1.0; DIM], 0.1);
        b.add_step(vec![0.5; DIM], vec![1.0; DIM], 0.9);
        lora.accumulate_gradient(&LearningSignal::from_trajectory(&b.build(0.8)));
        lora.apply_accumulated(LR);
    }
    let (down_after, up_after) = lora.get_weights();
    let down_delta = l2(&down_before, down_after);
    let up_delta: f32 = up_after.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("[freeze check] down_proj Δ = {down_delta:e}  up_proj |w| = {up_delta:e}");
    assert_eq!(down_delta, 0.0);
    assert!(up_delta > 0.0);
}
```
</details>

### Net effect for consumers

A JS/WASM consumer driving SONA via the documented `learnFromFeedback` API — or via single-outcome trajectories — observes zero adaptation; `applyLora`/`applyMicroLora` then return the untrained (`up_proj = 0`) transform. The forward/inference seam exists; the learn→inference loop is just not connected through the bindings consumers reach for.

### Suggested directions

- Wire `learn_from_feedback` to build a single-step-safe `LearningSignal` (e.g. fall back to a `query_embedding`-based gradient when `steps < 2`, or skip baseline subtraction for a single step) and accumulate + flush.
- Implement `processInstantLearning` in `@ruvector/ruvllm`, or mark it explicitly unimplemented so consumers don't assume it adapts.
- Adapt `down_proj` too, or document up-only adaptation as intentional.
- Guard `estimate_gradient` against amplifying a near-zero gradient (don't normalize when the pre-norm magnitude is ≈ 0).
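
The first direction could look roughly like the sketch below, which derives a nonzero gradient direction from a single outcome by scaling the query embedding by the reward. It assumes the reward convention of the current `learn_from_feedback` (`quality` on success, `-quality` on failure). `feedback_gradient` is a hypothetical helper, not an existing API, and feeding its output into `MicroLoRA` is left out because that wiring is a maintainer decision.

```rust
/// Hypothetical single-step-safe signal (not an existing `ruvector-sona` API).
/// No baseline is subtracted, so one outcome still yields a nonzero gradient.
fn feedback_gradient(success: bool, quality: f32, query_embedding: &[f32]) -> Vec<f32> {
    // Same reward convention as the current `learn_from_feedback`.
    let reward = if success { quality } else { -quality };
    let norm = query_embedding.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm < 1e-6 {
        // Degenerate embedding: nothing meaningful to reinforce.
        return vec![0.0; query_embedding.len()];
    }
    // Unit-direction embedding scaled by the reward, so the magnitude tracks
    // how good or bad the outcome was.
    query_embedding.iter().map(|x| reward * x / norm).collect()
}
```

A successful outcome pushes along the embedding direction and a failure pushes against it. This sketch omits any running baseline across calls, which a lower-variance estimator would likely want.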

**Related but distinct:** #516 (`@ruvector/sona` published without build output → `MODULE_NOT_FOUND`) is a packaging issue; this is about the no-op learn path *when the module is loaded and running*.

**Environment:** `ruvnet/ruvector@c2089c4`, Rust 1.95, crate `ruvector-sona` 0.2.0. Downstream context: ruflo 3.10.10 / Node 26.

