
test_relpos_attention_local: local (rel_pos_local_attn) attention diverges from NeMo #44

Description

@rubdttcom

While building a full NeMo-baseline set to run the model-dependent test suite, I
hit a divergence in the local (Longformer) attention path that looks separate
from #39 (the streaming O(N²) fix), so I'm filing it on its own.

Symptom

test_relpos_attention_local fails on the 110m anchor on CPU (PARAKEET_DEVICE=cpu,
f32 GGUF), so it isn't an iGPU fp16 tolerance issue:

[relpos_attention_local] n=47616 max|d|=3.349e+02 mean|d|=9.779e+00 (worst@47338 got=0.44526 ref=335.37750) -> FAIL

The divergence is broad (mean |d| ≈ 10, not a single outlier element), and the worst
point is in the last time frame (flat index 47338 = frame 92 of T=93, channel 234,
with d_model=512 and n = 93 × 512 = 47616).
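For reference, a minimal sketch of the comparison behind that log line. It assumes the test flattens the [T, d_model] output frame-major and reports max/mean absolute difference plus the argmax; `compare` and `decode_index` are hypothetical names, not the test's actual code:

```python
# Hypothetical sketch of the got-vs-ref comparison (assumption: both outputs
# are flat float arrays of length T * d_model, stored frame-major).
from typing import Sequence, Tuple


def compare(got: Sequence[float], ref: Sequence[float]) -> Tuple[float, float, int]:
    """Return (max|d|, mean|d|, index of the worst element)."""
    if len(got) != len(ref):
        raise ValueError("length mismatch between got and ref")
    diffs = [abs(g - r) for g, r in zip(got, ref)]
    worst = max(range(len(diffs)), key=diffs.__getitem__)
    return diffs[worst], sum(diffs) / len(diffs), worst


def decode_index(flat: int, d_model: int) -> Tuple[int, int]:
    """Map a flat frame-major index to (frame, channel)."""
    return divmod(flat, d_model)


if __name__ == "__main__":
    T, d_model = 93, 512
    assert T * d_model == 47616  # matches n=47616 in the log line
    frame, channel = decode_index(47338, d_model)
    print(frame, channel)  # prints "92 234": the last frame of T=93
```

With d_model=512, `divmod(47338, 512)` gives (92, 234), which is how the worst index lands on the final frame.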

It's not the --att-context-size (W) chosen for the baseline

I regenerated PARAKEET_TEST_BASELINE_LOCAL at two windows and re-ran:

W    worst index   got (C++)   ref (NeMo)
64   47338         0.44526     335.37750
32   47338         0.44526     356.31686

The C++ output (got) is identical across W while NeMo's ref changes, i.e.
forward_local does not respond to the window the baseline encodes. (W=128 is
correctly rejected by the test, since W ≥ T.)
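To make the expected W dependence concrete: under a symmetric Longformer-style band of half-width W (an assumption about how the baseline's --att-context-size is applied), the last frame sees a different set of keys at W=64 than at W=32, so a forward_local that honours the window should not produce identical output there. `local_keys` is a hypothetical helper:

```python
# Hypothetical sketch: key frames visible to query frame i under a symmetric
# band of half-width W (assumption: left context == right context == W).
from typing import List


def local_keys(i: int, T: int, W: int) -> List[int]:
    """Key frames j with |i - j| <= W, clipped to [0, T)."""
    if not 0 < W < T:
        # mirrors the test rejecting W >= T
        raise ValueError(f"window W={W} must satisfy 0 < W < T={T}")
    return list(range(max(0, i - W), min(T, i + W + 1)))


if __name__ == "__main__":
    T, last = 93, 92
    for W in (64, 32):
        keys = local_keys(last, T, W)
        print(W, keys[0], keys[-1], len(keys))
    # W=64 -> keys 28..92 (65 keys); W=32 -> keys 60..92 (33 keys)
```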

test_relpos_attention_local_chunked and test_relpos_attention_local_memory
pass (they use an internal brute-force reference), so the gap is specific to
the non-chunked forward_local vs the NeMo rel_pos_local_attn baseline.
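For comparison with the chunked/memory tests, a brute-force local attention reference could look like the sketch below. It is a hypothetical single-head version that deliberately omits the relative-position terms entirely, so it only illustrates the band-masked softmax and is not the repo's internal reference:

```python
# Hypothetical brute-force local attention (single head, no positional terms):
# out_i = sum_j softmax_j(q_i . k_j / sqrt(d)) * v_j over the band |i - j| <= W.
import math
from typing import List

Matrix = List[List[float]]


def local_attention(q: Matrix, k: Matrix, v: Matrix, W: int) -> Matrix:
    T, d = len(q), len(q[0])
    out: Matrix = []
    for i in range(T):
        lo, hi = max(0, i - W), min(T, i + W + 1)  # band of keys for query i
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(lo, hi)
        ]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([
            sum(w[n] * v[lo + n][c] for n in range(hi - lo)) / z
            for c in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    q = k = [[1.0]] * 5                   # uniform scores -> plain average
    v = [[float(x)] for x in range(5)]
    print(local_attention(q, k, v, 1)[4][0])  # keys 3..4
    print(local_attention(q, k, v, 3)[4][0])  # keys 1..4
```

On this toy input the last query averages v over keys 3..4 at W=1 (3.5) but over keys 1..4 at W=3 (2.5). That is the kind of W sensitivity the NeMo ref shows and got does not.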

Reproduce

# baseline (NeMo): local attention with a finite window over speech.wav
python scripts/gen_nemo_baseline.py \
  --model nvidia/parakeet-tdt_ctc-110m \
  --audio tests/fixtures/speech.wav \
  --att-context-size 64 --output /tmp/baseline_local.gguf

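# convert the 110m anchor to an f32 GGUF (command not shown); pass its path as PARAKEET_TEST_GGUF
# then run the local-attention test on CPU against the NeMo baseline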
PARAKEET_DEVICE=cpu \
PARAKEET_TEST_GGUF=/tmp/pk110m-f32.gguf \
PARAKEET_TEST_BASELINE_LOCAL=/tmp/baseline_local.gguf \
  ./build/tests/test_relpos_attention_local
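To repeat the W sweep without editing the command by hand, a small sketch that builds the gen_nemo_baseline.py invocations using only the flags shown in the reproduce steps and skips W ≥ T. T=93 is specific to speech.wav with this model, and the per-W output paths are illustrative:

```python
# Hypothetical helper: build baseline-generation commands for several windows.
# Only prints them; nothing is executed.
import shlex
from typing import List


def baseline_cmds(windows: List[int], T: int) -> List[List[str]]:
    cmds = []
    for W in windows:
        if W >= T:
            continue  # the test rejects W >= T, so no baseline is needed
        cmds.append([
            "python", "scripts/gen_nemo_baseline.py",
            "--model", "nvidia/parakeet-tdt_ctc-110m",
            "--audio", "tests/fixtures/speech.wav",
            "--att-context-size", str(W),
            "--output", f"/tmp/baseline_local_w{W}.gguf",
        ])
    return cmds


if __name__ == "__main__":
    for cmd in baseline_cmds([32, 64, 128], T=93):
        # run each, then point PARAKEET_TEST_BASELINE_LOCAL at its --output
        print(shlex.join(cmd))
```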

Question

Is this a known limitation, a layout/convention mismatch between the dumped
pos_emb ([2W+1, d_model]) and what forward_local expects, or a real bug in
the non-chunked local path? Happy to dig into forward_local if it's worth a fix.
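On the layout question: a [2W+1, d_model] table indexed by relative offset r = j - i in [-W, W] can be stored in ascending or descending offset order. Which order NeMo dumps and which order forward_local reads is something to verify against both sources rather than assume. The sketch below, with hypothetical helper names, shows that reading a descending table as if it were ascending silently substitutes offset -r for r; only r = 0 is unaffected:

```python
# Hypothetical sketch: two possible row orders for a [2W+1, d_model]
# relative-position table indexed by offset r = j - i in [-W, W].


def row_ascending(r: int, W: int) -> int:
    """Row 0 holds offset -W, row 2W holds offset +W."""
    return r + W


def row_descending(r: int, W: int) -> int:
    """Row 0 holds offset +W, row 2W holds offset -W."""
    return W - r


def effective_offset(r: int, W: int) -> int:
    """Offset actually fetched when a descending table is read as ascending."""
    row = row_ascending(r, W)  # row the reader looks up for offset r
    return W - row             # offset the descending writer stored in that row


if __name__ == "__main__":
    print([effective_offset(r, 2) for r in range(-2, 3)])  # [2, 1, 0, -1, -2]
```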
