context : use n_embd_out for pooled embedding extraction #20840

Merged
ggerganov merged 1 commit into ggml-org:master from extfs:fix/qwen3vl-embedding-pooling
Mar 21, 2026

@extfs extfs commented Mar 21, 2026

Summary

The MEAN/CLS/LAST pooling paths in encode() and decode() use n_embd (set to hparams.n_embd_inp()) to read from the pooled embedding tensor t_embd. For models with deepstack layers (qwen3vl), n_embd_inp() returns 16384 while the pooled tensor only has n_embd_out() = 4096 floats per sequence, causing:

GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds")

The fix uses hparams.n_embd_out() instead, consistent with how POOLING_TYPE_NONE (line 1756) already handles it.
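The mismatch can be reproduced in isolation. The following is a minimal sketch, not the llama.cpp code: `hparams_sketch`, `read_in_bounds`, and `extract_pooled` are hypothetical names, and the formula used for `n_embd_inp()` is an assumption chosen so that it yields the 16384 vs. 4096 widths quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the relevant hparams fields. The real struct has
// many more members; the formulas below are assumptions for illustration.
struct hparams_sketch {
    uint32_t n_embd;             // width of one hidden state
    uint32_t n_deepstack_layers; // 0 for models without deepstack

    // Assumption: the input width grows with the number of deepstack layers.
    uint32_t n_embd_inp() const { return n_embd * (1 + n_deepstack_layers); }
    // Width of one pooled output embedding.
    uint32_t n_embd_out() const { return n_embd; }
};

// Same condition as the assertion quoted above, returned as a bool so the
// failing case can be observed without aborting.
bool read_in_bounds(size_t offset, size_t size, size_t tensor_nbytes) {
    return offset + size <= tensor_nbytes;
}

// Copies the pooled embedding for sequence seq_idx out of a flat buffer that
// stores one row per sequence. row_width is the number of floats the caller
// believes each row has. Returns false if the read would leave the buffer.
bool extract_pooled(const std::vector<float> & t_embd, uint32_t row_width,
                    uint32_t seq_idx, std::vector<float> & out) {
    const size_t offset = (size_t) seq_idx * row_width * sizeof(float);
    const size_t size   = (size_t) row_width * sizeof(float);
    if (!read_in_bounds(offset, size, t_embd.size() * sizeof(float))) {
        return false;
    }
    const size_t first = (size_t) seq_idx * row_width;
    out.assign(t_embd.begin() + first, t_embd.begin() + first + row_width);
    return true;
}
```

With `n_embd = 4096` and `n_deepstack_layers = 3`, a buffer holding one pooled row rejects a read sized by `n_embd_inp()` and accepts one sized by `n_embd_out()`. When `n_deepstack_layers` is 0 the two widths coincide, so both reads succeed.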

Affected models

Qwen3-VL-Embedding-8B (and any model with n_deepstack_layers > 0)

Test

Tested with Qwen3-VL-Embedding-8B-Q4_K_M.gguf, --embedding --pooling last:

  • Before: crash on startup
  • After: correct 4096-dim L2-normalized embeddings

No impact on models without deepstack (n_embd_inp == n_embd_out).
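For anyone repeating the test, a returned vector can be sanity-checked for the expected dimension and unit length. This is a hedged sketch: `l2_norm` and `looks_like_normalized_embedding` are hypothetical helpers, and the tolerance is an arbitrary choice, not a value taken from the project.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean (L2) norm of an embedding, accumulated in double precision.
double l2_norm(const std::vector<float> & v) {
    double sum = 0.0;
    for (float x : v) {
        sum += (double) x * (double) x;
    }
    return std::sqrt(sum);
}

// Hypothetical check: the vector has the expected number of dimensions and
// its L2 norm is within tol of 1.0 (tol is an assumed tolerance).
bool looks_like_normalized_embedding(const std::vector<float> & v,
                                     size_t expected_dim, double tol = 1e-3) {
    return v.size() == expected_dim && std::fabs(l2_norm(v) - 1.0) <= tol;
}
```

A 4096-dim output passes `looks_like_normalized_embedding(v, 4096)` only if it is unit length. A vector of the wrong width, or one that was never normalized, fails.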

Use case

Qwen3-VL-Embedding can create embeddings for both images and text in the same vector space. A typical workflow is to index images using the full model on GPU, then run text-only retrieval queries against that index using a quantized GGUF on CPU. This fix makes the second step possible.
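The retrieval step of that workflow might look like the sketch below. It assumes the image index and the text query are already L2-normalized vectors in the same space, so a dot product equals cosine similarity. `dot` and `best_match` are hypothetical helpers and not part of any llama.cpp API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of two equal-length vectors. For L2-normalized inputs this is
// the cosine similarity.
float dot(const std::vector<float> & a, const std::vector<float> & b) {
    assert(a.size() == b.size());
    float s = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        s += a[i] * b[i];
    }
    return s;
}

// Returns the position of the index entry most similar to the query, or
// index.size() if the index is empty. Ties keep the earliest entry.
size_t best_match(const std::vector<std::vector<float>> & index,
                  const std::vector<float> & query) {
    size_t best = index.size();
    float best_score = 0.0f;
    for (size_t i = 0; i < index.size(); ++i) {
        const float score = dot(index[i], query);
        if (best == index.size() || score > best_score) {
            best = i;
            best_score = score;
        }
    }
    return best;
}
```

A linear scan is enough to show the idea. A large index would normally sit behind an approximate nearest-neighbor structure instead.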

Disclosure

AI was used as a research/debugging aid to locate the root cause.

The MEAN/CLS/LAST pooling paths in encode() and decode() used
n_embd_inp() (16384 for qwen3vl with deepstack) to read from the
pooled embedding tensor, which only has n_embd_out() (4096) floats
per sequence. This caused a tensor read out of bounds assertion.

Fixes embedding mode for Qwen3-VL-Embedding models.

@extfs extfs requested a review from ggerganov as a code owner March 21, 2026 12:29
@ggerganov ggerganov merged commit 212f452 into ggml-org:master Mar 21, 2026
45 of 48 checks passed