
Qwen3-Omni + vLLM + flash-attn with audio+video+text input fails with error: This flash attention build does not support headdim not being a multiple of 32 #6617

@cheliu-computation

Describe the bug

When starting the vLLM engine, the process crashes with the following error:

RuntimeError: Worker failed with error 'This flash attention build does not support headdim not being a multiple of 32.'

Full traceback excerpt:

(EngineCore_DP0 pid=669) ERROR 11-17 00:53:02 [multiproc_executor.py:230] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
...
File "/data/vllm39/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 228, in _initialize_kv_caches
available_gpu_memory = self.model_executor.determine_available_memory()
...
RuntimeError: Worker failed with error 'This flash attention build does not support headdim not being a multiple of 32.'

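This happens immediately when launching the engine via vllm.start() (or equivalent). According to the traceback, the worker dies while the engine core is determining available GPU memory in _initialize_kv_caches.
The cause appears to be FlashAttention 2.8.3 rejecting the model's head dimension, even though vLLM supposedly handles arbitrary head sizes.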

To Reproduce

Steps to reproduce:

1. Install the latest vLLM from GitHub (master).
2. Install FlashAttention 2.8.3.
3. Run a model whose attention head dimension is not a multiple of 32 (e.g., some custom / fine-tuned architectures).
4. Launch the engine.
5. The engine crashes before initialization completes.
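To check whether a given model falls into the affected category in step 3, the head dimension can be computed from its configuration. The following is a minimal sketch, assuming a Hugging Face style config loaded as a plain dict with `hidden_size` and `num_attention_heads` keys (and optionally an explicit `head_dim`). The helper names and the example values are illustrative and are not taken from any specific model.

```python
def head_dim(config: dict) -> int:
    """Return the per-head attention dimension implied by a config dict.

    Assumes Hugging Face style keys; an explicit "head_dim" wins if present.
    """
    explicit = config.get("head_dim")
    if explicit is not None:
        return int(explicit)
    hidden = config["hidden_size"]
    heads = config["num_attention_heads"]
    if hidden % heads != 0:
        raise ValueError(
            f"hidden_size={hidden} is not divisible by num_attention_heads={heads}"
        )
    return hidden // heads


def is_multiple_of_32(dim: int) -> bool:
    """True if the head dimension satisfies the constraint named in the error."""
    return dim % 32 == 0


# Illustrative values only (hypothetical, not read from a real checkpoint).
example = {"hidden_size": 1152, "num_attention_heads": 16}
dim = head_dim(example)
print(f"head_dim={dim}, multiple of 32: {is_multiple_of_32(dim)}")
```

For multimodal models, each sub-module (text, audio, vision) may have its own config section, so the check would need to be repeated per section.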

Expected behavior

vLLM should either:

- fall back to a compatible attention kernel, or
- provide a clear error message explaining which models are incompatible with the installed FlashAttention version.
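The two behaviors requested above could look roughly like the following pre-flight check. This is only a sketch of the idea and not vLLM's actual backend selection logic: the function name, the backend labels, and the `allow_fallback` flag are all hypothetical, and the multiple-of-32 constraint is taken from the error message in this report.

```python
# Constraint quoted in the error message of this report.
FLASH_ATTN_HEAD_DIM_MULTIPLE = 32


def select_attention_backend(head_dim: int, allow_fallback: bool = True) -> str:
    """Pick a backend label for a head dimension (hypothetical helper).

    Returns "flash_attn" when the constraint is met, "fallback" when it is
    not and a fallback is allowed, and raises a descriptive error otherwise.
    """
    if head_dim <= 0:
        raise ValueError(f"head_dim must be positive, got {head_dim}")
    if head_dim % FLASH_ATTN_HEAD_DIM_MULTIPLE == 0:
        return "flash_attn"
    if allow_fallback:
        # Placeholder label for any kernel without this constraint.
        return "fallback"
    raise RuntimeError(
        f"head_dim={head_dim} is not a multiple of "
        f"{FLASH_ATTN_HEAD_DIM_MULTIPLE}; the installed FlashAttention build "
        "cannot run this model. Use a different attention backend or a "
        "FlashAttention build that supports this head dimension."
    )


print(select_attention_backend(128))  # constraint met
print(select_attention_backend(72))   # constraint not met, fallback allowed
```

Running such a check before worker start-up would surface the incompatibility with the offending head dimension in the message, rather than as a worker crash during KV cache initialization.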

Hardware and system info

Torch: 2.9

CUDA: 12.8

GPU: H100

vLLM: latest GitHub master

FlashAttention: 2.8.3

Python: 3.12
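For anyone trying to compare their setup against the versions listed above, the installed package versions can be gathered with the standard library. This is a minimal sketch; the distribution names `torch`, `vllm`, and `flash-attn` are the usual PyPI names, while `ms-swift` is an assumption about which package "Swift" refers to in this report.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(
    packages=("torch", "vllm", "flash-attn", "ms-swift"),  # "ms-swift" is assumed
) -> dict:
    """Collect the Python version and the versions of the given packages."""
    report = {"python": platform.python_version()}
    for name in packages:
        report[name] = package_version(name)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

This reports only package versions; the CUDA version and GPU model still need to be read from the driver tooling or from the framework at runtime.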

Additional context

Using flash-attn 2.8.3 + vLLM latest master + Swift 3.10 environment.

The error happens consistently and prevents any model from loading.
