
Conversation

@ngxson ngxson commented Dec 2, 2025

Fix #17676 (Misc. bug: --no-warmup failing in llama-server.exe for some vision models)

@ngxson ngxson requested a review from ggerganov as a code owner December 2, 2025 15:45

@ggerganov ggerganov left a comment


I find it a bit confusing though. Does it mean if you warmup with a small image, then using a larger image would fail?


ngxson commented Dec 2, 2025

> I find it a bit confusing though. Does it mean if you warmup with a small image, then using a larger image would fail?

Indeed, the bug was that buf_compute_meta was never allocated if warmup was never called, so warmup still needs to be called at least once.

If the user uses a bigger image later on, ggml_backend_sched_alloc_graph inside clip_image_batch_encode is responsible for allocating a bigger device buffer.
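
To make the mechanism concrete, here is a minimal sketch of the ggml pattern involved. This is illustrative only, not the actual clip.cpp code; the helper names (alloc_compute_meta, encode_sketch) and the sizing are made up. The point is that buf_compute_meta only backs tensor/graph metadata (no_alloc), so it must be sized before any graph is built, while the device memory for the actual tensors is reserved per graph by ggml_backend_sched_alloc_graph:

```cpp
#include "ggml.h"
#include "ggml-backend.h"
#include <cstdint>
#include <vector>

// In clip.cpp this buffer lives in the clip context; a plain global keeps the sketch short.
static std::vector<uint8_t> buf_compute_meta;

// Hypothetical helper: size the metadata buffer (before this fix it only ran during warmup).
static void alloc_compute_meta() {
    buf_compute_meta.resize(GGML_DEFAULT_GRAPH_SIZE * ggml_tensor_overhead() + ggml_graph_overhead());
}

// Hypothetical helper standing in for the graph-build + encode path.
static bool encode_sketch(ggml_backend_sched_t sched) {
    ggml_init_params params = {
        /*.mem_size   =*/ buf_compute_meta.size(),   // empty if warmup was skipped -> the bug
        /*.mem_buffer =*/ buf_compute_meta.data(),
        /*.no_alloc   =*/ true,                      // metadata only, no tensor data lives here
    };
    ggml_context * ctx0 = ggml_init(params);
    ggml_cgraph  * gf   = ggml_new_graph(ctx0);
    // ... build the vision graph for the current image into gf ...

    // The scheduler reserves device buffers sized for *this* graph, so an image
    // larger than the warmup one is handled here, not by buf_compute_meta.
    ggml_backend_sched_reset(sched);
    bool ok = ggml_backend_sched_alloc_graph(sched, gf);

    // ggml_backend_sched_graph_compute(sched, gf) would run here in the real encode path.
    ggml_free(ctx0);
    return ok;
}
```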


There is also an alternative way to fix this issue: allocate buf_compute_meta when initializing clip_ctx. But since warmup also does other things, such as deciding whether flash attention can be used and logging unsupported ops, it should still be called at least once.
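
For reference, that alternative would look roughly like the sketch below (hedged, not what this PR does; clip_ctx_sketch and the sizing are illustrative). The metadata buffer is sized unconditionally when the context is created, so it exists even when warmup is skipped; warmup would still be useful for the flash-attention decision and the unsupported-op logging mentioned above.

```cpp
// Sketch of the alternative: size the graph metadata buffer up front when the
// clip context is created. Names and sizing here are illustrative only.
struct clip_ctx_sketch {
    std::vector<uint8_t> buf_compute_meta;

    clip_ctx_sketch() {
        buf_compute_meta.resize(GGML_DEFAULT_GRAPH_SIZE * ggml_tensor_overhead() + ggml_graph_overhead());
    }
};
```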

Not sure if you have any other ideas?

@ngxson ngxson merged commit a96283a into ggml-org:master Dec 2, 2025
60 of 69 checks passed
khemchand-zetta pushed a commit to khemchand-zetta/llama.cpp that referenced this pull request Dec 4, 2025