context : use n_embd_out for pooled embedding extraction by extfs · Pull Request #20840 · ggml-org/llama.cpp

extfs · 2026-03-21T12:29:57Z

Summary

The MEAN/CLS/LAST pooling paths in encode() and decode() use n_embd (set to hparams.n_embd_inp()) as the per-sequence width when reading from the pooled embedding tensor t_embd. For models with deepstack layers (qwen3vl), n_embd_inp() returns 16384, while the pooled tensor holds only n_embd_out() = 4096 floats per sequence, so the read runs past the end of the tensor and fails with:

GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds")

The fix uses hparams.n_embd_out() instead, consistent with how POOLING_TYPE_NONE (line 1756) already handles it.
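To make the failure concrete, here is a minimal C++ sketch of the bounds check. The `pooled_tensor` type and the `tensor_read` / `read_pooled` helpers are hypothetical stand-ins, not ggml or llama.cpp API; only the condition `offset + size <= nbytes` and the 4096/16384 widths come from the description above. The sketch throws where ggml aborts on the assertion.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the pooled tensor t_embd:
// one row of n_embd_out floats per sequence, stored contiguously.
struct pooled_tensor {
    std::vector<float> data;
    std::size_t nbytes() const { return data.size() * sizeof(float); }
};

// Mirrors the "offset + size <= ggml_nbytes(tensor)" check,
// but throws instead of aborting so the failure can be observed.
void tensor_read(const pooled_tensor & t, float * dst, std::size_t offset, std::size_t size) {
    if (offset + size > t.nbytes()) {
        throw std::out_of_range("tensor read out of bounds");
    }
    std::memcpy(dst, reinterpret_cast<const char *>(t.data.data()) + offset, size);
}

// Copies the pooled embedding of sequence `seq`, treating each row as `n_embd` floats.
// Passing n_embd_inp (16384) instead of n_embd_out (4096) is the bug being illustrated.
std::vector<float> read_pooled(const pooled_tensor & t, std::size_t seq, std::size_t n_embd) {
    std::vector<float> out(n_embd);
    tensor_read(t, out.data(), seq * n_embd * sizeof(float), n_embd * sizeof(float));
    return out;
}
```

With a single sequence the buffer holds 4096 floats (16384 bytes), so a 16384-float read from sequence 0 requests four times what is there; reading with the 4096 width stays in bounds for every sequence.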

Affected models

Qwen3-VL-Embedding-8B (and any model with n_deepstack_layers > 0)

Test

Tested with Qwen3-VL-Embedding-8B-Q4_K_M.gguf, --embedding --pooling last:

Before: crash on startup
After: correct 4096-dim L2-normalized embeddings

No impact on models without deepstack (n_embd_inp == n_embd_out).
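The 16384 vs 4096 gap can be reproduced with simple arithmetic, assuming (this is not stated in the PR) that n_embd_inp() grows by one n_embd per deepstack layer and that n_embd_out() equals n_embd for this model. Three deepstack layers at n_embd = 4096 give 16384; with no deepstack layers the two widths coincide. `hparams_sketch` is a hypothetical struct, not llama_hparams.

```cpp
#include <cstdint>

// Hypothetical mirror of the two widths, ASSUMING
// n_embd_inp = n_embd * (1 + n_deepstack_layers) and n_embd_out = n_embd.
struct hparams_sketch {
    std::uint32_t n_embd;
    std::uint32_t n_deepstack_layers;

    // Width of the input embedding, including the assumed deepstack slices.
    std::uint32_t n_embd_inp() const { return n_embd * (1 + n_deepstack_layers); }
    // Width of each pooled output row.
    std::uint32_t n_embd_out() const { return n_embd; }
};
```

Under this assumption, any model with n_deepstack_layers == 0 has n_embd_inp() == n_embd_out(), so switching the pooled read to n_embd_out() leaves it unchanged.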

Use case

Qwen3-VL-Embedding can create embeddings for both images and text in the same vector space. A typical workflow is to index images using the full model on GPU, then run text-only retrieval queries against that index using a quantized GGUF on CPU. This fix makes the second step possible.
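Because the extracted embeddings are L2-normalized, the text-to-image lookup in the second step reduces to a dot product. A minimal sketch, assuming an in-memory index of image embeddings produced in the first step; `l2_normalize`, `dot` and `best_match` are illustrative helpers, not part of llama.cpp.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Scales v to unit L2 norm (leaves an all-zero vector unchanged).
void l2_normalize(std::vector<float> & v) {
    double sum = 0.0;
    for (float x : v) sum += double(x) * x;
    const double norm = std::sqrt(sum);
    if (norm > 0.0) {
        for (float & x : v) x = float(x / norm);
    }
}

// For unit-length vectors, cosine similarity equals the dot product.
float dot(const std::vector<float> & a, const std::vector<float> & b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Returns the index of the stored (e.g. image) embedding closest to the query.
// Assumes a non-empty index whose rows are already L2-normalized.
std::size_t best_match(const std::vector<std::vector<float>> & index, const std::vector<float> & query) {
    std::size_t best = 0;
    float best_score = -2.0f;  // below the minimum possible score of -1
    for (std::size_t i = 0; i < index.size(); ++i) {
        const float s = dot(index[i], query);
        if (s > best_score) {
            best_score = s;
            best = i;
        }
    }
    return best;
}
```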

Disclosure

AI was used as a research/debugging aid to locate the root cause.

The MEAN/CLS/LAST pooling paths in encode() and decode() used n_embd_inp() (16384 for qwen3vl with deepstack) to read from the pooled embedding tensor, which only has n_embd_out() (4096) floats per sequence. This caused a tensor read out of bounds assertion. Fixes embedding mode for Qwen3-VL-Embedding models.

extfs requested a review from ggerganov as a code owner March 21, 2026 12:29

ggerganov approved these changes Mar 21, 2026

ggerganov merged commit 212f452 into ggml-org:master Mar 21, 2026
45 of 48 checks passed

ggerganov merged 1 commit into ggml-org:master from extfs:fix/qwen3vl-embedding-pooling
