From e011550ddf989c727df1d654ad1a9926aa264b84 Mon Sep 17 00:00:00 2001
From: Richard Palethorpe
Date: Sat, 25 Apr 2026 09:44:00 +0100
Subject: [PATCH] fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI
 mismatch

The pinned flash-attn 2.8.3+cu12torch2.7 wheel breaks at import time
once vllm 0.19.1 upgrades torch to its hard-pinned 2.10.0:

  ImportError: .../flash_attn_2_cuda...so: undefined symbol:
  _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib

That C10 CUDA symbol is libtorch-version-specific. Dao-AILab has not
yet published flash-attn wheels for torch 2.10 -- the latest release
(2.8.3) tops out at torch 2.8 -- so any wheel pinned here is silently
ABI-broken the moment vllm completes its install.

vllm 0.19.1 lists flashinfer-python==0.6.6 as a hard dep, which
already covers the attention path. The only other use of flash-attn in
vllm is the rotary apply_rotary import in
vllm/model_executor/layers/rotary_embedding/common.py, which is
guarded by find_spec("flash_attn") and falls back cleanly when absent.

Also unpin torch in requirements-cublas12.txt: the 2.7.0 pin only
existed to give the flash-attn wheel a matching torch to link against.
With flash-attn gone, vllm's own torch==2.10.0 dep is the binding
constraint regardless of what we put here.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe
---
 backend/python/vllm/requirements-cublas12-after.txt | 9 ++++++++-
 backend/python/vllm/requirements-cublas12.txt       | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/backend/python/vllm/requirements-cublas12-after.txt b/backend/python/vllm/requirements-cublas12-after.txt
index cab27c888e27..e6a61ea11ea2 100644
--- a/backend/python/vllm/requirements-cublas12-after.txt
+++ b/backend/python/vllm/requirements-cublas12-after.txt
@@ -1,2 +1,9 @@
-https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
+# flash-attn wheels are ABI-tied to a specific torch version. vllm forces
+# torch==2.10.0 as a hard dep, but flash-attn 2.8.3 (latest) only ships
+# prebuilt wheels up to torch 2.8 -- any wheel we pin here gets silently
+# broken when vllm upgrades torch during install, producing an undefined
+# libc10_cuda symbol at import time. FlashInfer (required by vllm) covers
+# attention, and rotary_embedding/common.py guards the flash_attn import
+# with find_spec(), so skipping flash-attn is safe and the only stable
+# choice until upstream ships a torch-2.10 wheel.
 vllm

diff --git a/backend/python/vllm/requirements-cublas12.txt b/backend/python/vllm/requirements-cublas12.txt
index 8bd72ae125fd..e007f0946daa 100644
--- a/backend/python/vllm/requirements-cublas12.txt
+++ b/backend/python/vllm/requirements-cublas12.txt
@@ -1,4 +1,4 @@
 accelerate
-torch==2.7.0
+torch
 transformers
 bitsandbytes
\ No newline at end of file
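
For reference, the fallback this change relies on is the find_spec()
guard in vllm/model_executor/layers/rotary_embedding/common.py. A
minimal sketch of that pattern (simplified, not vllm's verbatim code;
apply_rotary_emb and rotate_half are illustrative names):

    from importlib.util import find_spec

    import torch

    if find_spec("flash_attn"):
        # Fused CUDA kernel, only importable when a flash-attn wheel
        # matching torch's ABI is installed.
        from flash_attn.ops.triton.rotary import apply_rotary
    else:
        apply_rotary = None  # take the pure-torch path below

    def rotate_half(x: torch.Tensor) -> torch.Tensor:
        # Rotate the two halves of the last dim: (x1, x2) -> (-x2, x1).
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    def apply_rotary_emb(q: torch.Tensor, cos: torch.Tensor,
                         sin: torch.Tensor) -> torch.Tensor:
        if apply_rotary is not None:
            return apply_rotary(q, cos, sin)   # flash-attn fast path
        return q * cos + rotate_half(q) * sin  # torch-native fallback

Because the import is guarded, uninstalling flash-attn never raises at
import time; the rotary code simply takes the torch-native branch.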
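
A post-install smoke test along these lines (hypothetical helper
script, not part of this patch; assumes flashinfer exposes __version__)
confirms the expected state of the image:

    from importlib.util import find_spec

    import torch

    # flash-attn must be absent: any installed wheel would have to match
    # torch 2.10's C10 CUDA ABI, and no such wheel exists yet.
    assert find_spec("flash_attn") is None, "unexpected flash-attn install"

    # flashinfer is vllm's hard dep and carries the attention path; a
    # clean import shows the ABI-sensitive pieces resolve.
    import flashinfer
    print("torch", torch.__version__, "flashinfer", flashinfer.__version__)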