fix llama4 kv quant #1199

Merged
mengniwang95 merged 2 commits into main from mengni/kv_fix
Dec 26, 2025

Conversation

@mengniwang95
Contributor

The Llama4 vision model's attention has no KV cache, so vLLM crashes during model loading if the checkpoint carries k/v scales for those layers:
https://github.com/vllm-project/vllm/blob/030fc4491465d361e4bed626d76c184f8a7d8a07/vllm/model_executor/models/mllama4.py#L258
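A minimal sketch of the idea behind the fix: drop k/v scale entries for vision attention layers before export. The function name, the `vision_model.` prefix, and the flat scales dict are illustrative assumptions, not this repo's actual API; see the PR diff for the real change.

```python
# Sketch: skip k/v scale export for Llama4 vision attention, which has no
# KV cache in vLLM (see the mllama4.py link above). Names and dict layout
# are assumptions for illustration only.

def filter_kv_scales(scales: dict[str, float]) -> dict[str, float]:
    """Drop k_scale/v_scale entries belonging to vision attention layers.

    vLLM's Llama4 vision attention does not create a KV cache, so loading a
    checkpoint that carries k/v scales for those layers crashes. Only the
    language model's attention keeps its scales.
    """
    return {
        name: scale
        for name, scale in scales.items()
        if not name.startswith("vision_model.")
    }


if __name__ == "__main__":
    scales = {
        "language_model.model.layers.0.self_attn.k_scale": 1.0,
        "language_model.model.layers.0.self_attn.v_scale": 1.0,
        "vision_model.model.layers.0.self_attn.k_scale": 1.0,  # dropped
    }
    print(filter_kv_scales(scales))
```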

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
@mengniwang95 mengniwang95 requested a review from yiliu30 December 25, 2025 12:34
@mengniwang95 mengniwang95 merged commit dd8ce2e into main Dec 26, 2025
29 checks passed
@mengniwang95 mengniwang95 deleted the mengni/kv_fix branch December 26, 2025 05:35