fix: Use SDPA instead of flash_attention_2 for ROCm/hipblas environments #9155
Open
localai-bot wants to merge 2 commits into mudler:master
Conversation
- Detect ROCm environments using torch.version.hip or HIP_VISIBLE_DEVICES
- When running on ROCm, use the 'sdpa' attention implementation instead of 'flash_attention_2'
- This fixes the issue where FlashAttention2 is not available in hipblas Docker images

Fixes issue mudler#9138

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
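A minimal sketch of the detection this commit describes (the helper name is illustrative; both signals are standard PyTorch/ROCm conventions):

```python
import os
import torch

def is_rocm_environment() -> bool:
    """Heuristic ROCm check: torch.version.hip is a version string on
    ROCm builds of PyTorch and None otherwise; HIP_VISIBLE_DEVICES is
    ROCm's counterpart to CUDA_VISIBLE_DEVICES."""
    return (getattr(torch.version, "hip", None) is not None
            or "HIP_VISIBLE_DEVICES" in os.environ)

# hipblas images ship without FlashAttention2, so fall back to
# PyTorch's built-in scaled-dot-product attention on ROCm.
attn_implementation = "sdpa" if is_rocm_environment() else "flash_attention_2"
```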
- Move is_rocm detection from inline in the cuda block to class initialization
- Use self.is_rocm for cleaner code and a single detection point
- Simplify the attention implementation selection logic
- Fixes review comment about device detection being in the wrong location

Signed-off-by: Team Coding Agent 1 <team-coding-agent-1@localai.dev>
Signed-off-by: localai-bot <localai-bot@noreply.github.com>
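The refactor the second commit describes might look roughly like this; the class and method names are placeholders, not LocalAI's actual backend code:

```python
import os
import torch

class TransformersBackend:  # placeholder name for illustration
    def __init__(self):
        # Detect ROCm once at initialization instead of inline in the
        # CUDA branch, so every code path consults the same flag.
        self.is_rocm = (getattr(torch.version, "hip", None) is not None
                        or "HIP_VISIBLE_DEVICES" in os.environ)

    def _attn_implementation(self) -> str:
        # Single decision point: SDPA on ROCm, FlashAttention2 elsewhere.
        return "sdpa" if self.is_rocm else "flash_attention_2"
```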
DCO Compliance Achieved

This PR replaces #9147, which was closed due to DCO issues (missing Signed-off-by trailers). Changes made: all commits in this PR now carry proper Signed-off-by trailers.
Ready for review. /cc @mudler
This PR fixes issue #9138 where FlashAttention2 is not available in hipblas Docker images.
Problem
When using the hipblas Docker image with LocalAI, models like qwen3-tts and qwen3-asr fail to load with FlashAttention2 errors.
Solution

Detect ROCm environments (via torch.version.hip or HIP_VISIBLE_DEVICES) and select the 'sdpa' attention implementation instead of 'flash_attention_2' when running on ROCm. All commits have proper Signed-off-by trailers for DCO compliance.
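In transformers, the chosen implementation is passed via the attn_implementation argument of from_pretrained; a hedged sketch (the model id is a placeholder, and this is not LocalAI's actual loader code):

```python
from transformers import AutoModelForCausalLM

# "sdpa" routes attention through torch.nn.functional.
# scaled_dot_product_attention, which works on ROCm without the
# flash-attn package that "flash_attention_2" requires.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",       # placeholder model id
    attn_implementation="sdpa",  # instead of "flash_attention_2" on ROCm
)
```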
Fixes #9138