Qwen3.5-4B: DeltaNet conv1d unstable on short prompts — long context works fine #95

@unamedkr

Description

Root Cause Found

Qwen3.5-4B's DeltaNet layers produce garbage on short prompts (< ~50 tokens) but work correctly on long prompts (150+ tokens with document context).

Evidence

# SHORT prompt — FAILS (even on the first call, with fresh state)
"What is 2+2?" → "The answer to **"  # wrong

# LONG prompt — WORKS (4/4 correct)
"Document: Acme reported 847M...\nQuestion: What was revenue?" → "847 million"

Hypothesis

DeltaNet's conv1d layer (conv_width=4) needs enough input tokens to populate its rolling recurrent state (the conv_state buffer). With short prompts, the buffer is still dominated by uninitialized/zero values, which produces unstable attention patterns.

Impact on RLV

The coherence check prompt ("Is this answered? YES/NO") is short (~40 tokens) and triggers this bug, causing:

  • False UNSURE verdicts
  • Unnecessary retries (3x slower)
  • "empty response from server" when the model generates only template tokens

Proposed Fix

Option A: Pad short prompts with a warm-up prefix:

if (n_prompt < 50) {
    // Prepend neutral tokens to warm up conv_state
    prepend("The following is a question and answer.\n");
}

Option B: Initialize conv_state with a learned warm-up pass:

// Run a dummy forward pass with padding tokens before real generation
for (int i = 0; i < conv_width; i++)
    tq_forward(model, state, PAD_TOKEN, i);

Option C: Zero-initialize conv_state explicitly before each generate call (already done by calloc, but verify the state isn't being reused).

Environment

  • Model: unsloth/Qwen3.5-4B-GGUF (Q4_K_M, 2.6GB)
  • quant.h: latest main (e12fcbd DeltaNet fix)
  • OS: macOS 15 (Apple M3, 16GB)
