Describe the bug
When starting the vLLM engine, the process crashes with the following error:
RuntimeError: Worker failed with error 'This flash attention build does not support headdim not being a multiple of 32.'
Full traceback excerpt:
(EngineCore_DP0 pid=669) ERROR 11-17 00:53:02 [multiproc_executor.py:230] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
...
File "/data/vllm39/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 228, in _initialize_kv_caches
available_gpu_memory = self.model_executor.determine_available_memory()
...
RuntimeError: Worker failed with error 'This flash attention build does not support headdim not being a multiple of 32.'
This happens immediately when launching the engine via vllm.start() (or equivalent).
It seems to be caused by FlashAttention 2.8.3 rejecting the model’s head dimension, even though vLLM supposedly handles arbitrary head sizes.
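For reference, the head dimension can be checked from the model config before launching. This is a minimal sketch using Hugging Face transformers; the model path is a placeholder, not the actual checkpoint from this report:

```python
# Minimal sketch: check whether a model's head dimension is a multiple of 32.
# "path/to/your-model" is a placeholder for the affected checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("path/to/your-model")

# Some configs expose head_dim directly; otherwise derive it from hidden_size.
head_dim = getattr(config, "head_dim", None)
if head_dim is None:
    head_dim = config.hidden_size // config.num_attention_heads

print(f"head_dim = {head_dim}, multiple of 32: {head_dim % 32 == 0}")
```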
To Reproduce
Steps to reproduce (a minimal launch sketch follows this list):
1. Install the latest vLLM from GitHub (master).
2. Install FlashAttention 2.8.3.
3. Run a model whose attention head dimension is not a multiple of 32 (e.g., some custom / fine-tuned architectures).
4. Launch the engine.
5. The engine crashes before initialization completes.
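A minimal launch sketch using vLLM's Python API (the model name is a placeholder; in this report the engine is launched through Swift, but the failure is the same at engine initialization):

```python
# Minimal launch sketch; "path/to/your-model" is a placeholder for a checkpoint
# whose attention head dimension is not a multiple of 32.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/your-model")  # crashes here, during KV-cache initialization
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```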
Expected behavior
vLLM should either:
- fall back to a compatible attention kernel when FlashAttention cannot handle the model's head dimension (a possible workaround sketch is shown below), or
- provide a clear error message explaining which models are incompatible with the installed FlashAttention version.
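As a possible workaround sketch only (assuming the installed build honors the VLLM_ATTENTION_BACKEND environment variable and that an alternative backend such as FLASHINFER is installed), forcing a non-FlashAttention backend before creating the engine may avoid the crash:

```python
# Workaround sketch: force an alternative attention backend before vLLM builds the engine.
# Assumes the installed vLLM build honors VLLM_ATTENTION_BACKEND and that the chosen
# backend (e.g. FLASHINFER) is available; "path/to/your-model" is a placeholder.
import os
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM

llm = LLM(model="path/to/your-model")
```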
Hardware and system info
Torch: 2.9
CUDA: 12.8
GPU: H100
vLLM: latest GitHub master
FlashAttention: 2.8.3
Python: 3.12
Additional context
Environment: flash-attn 2.8.3 + latest vLLM master + Swift 3.10.
The error occurs consistently and prevents any model from loading in this environment.