Skip to content

fix(vision): captured slice_update in DeepStack/section-assembly injection loops (qwen3_vl, qwen3_vl_moe, qwen2_vl, paddleocr_vl) #650

Description

@inureyes

Problem

mlxcel_core::slice_update is a functional op: it wraps mlx::core::slice_update, which returns a NEW array (MLX arrays are immutable) and does NOT mutate its src argument. Several visual-injection / section-assembly loops call it as a bare statement and discard the return value, so result is never updated and the write is silently lost.

This exact bug made Granite 4 Vision completely blind (the model ran text-only) and was fixed in PR #649 (see inject_at_positions in src/models/granitemoehybrid.rs, which now captures the return and has a co-located regression test). The same broken pattern exists at these other sites, copied from the same template:

  • src/models/qwen3_vl.rs:798 (DeepStack injection loop)
  • src/models/qwen3_vl_moe.rs:991 (DeepStack injection loop)
  • src/models/qwen2_vl.rs:205 (section-assembly loop)
  • src/models/paddleocr_vl.rs:191 (section-assembly loop)

Impact

  • qwen3_vl / qwen3_vl_moe: the MAIN image features enter via inputs_embeds, and DeepStack is only an auxiliary deeper-layer refinement, so the dropped write degrades output quality rather than fully blinding the model. Still a real correctness bug.
  • qwen2_vl / paddleocr_vl: these are section-assembly loops (mrope / feature packing); the discarded return may or may not drop data depending on control flow. Verify each before fixing.

Fix

Capture the return: result = mlxcel_core::slice_update(&result, &val, ...). For a contiguous run, collapse the per-row loop to a single slice_update (see the granite4 fix). Add a regression test per model where practical.

Validation

Each fix must be validated on a real checkpoint with a real image, comparing per-op absmeans (q/k/v post-rope, SDPA output) against the mlx-vlm reference. Diagnostic signature of a dropped injection: hidden-state at image positions stays ~0, so the all-positions q/k/v absmean collapses to roughly reference x (n_text / seq_len).

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:modelsModel architectures, weights, loading, metadatapriority:highHigh prioritystatus:readyReady to be worked ontype:bugBug fixes, error corrections, or issue resolutions

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions