
fix: Use SDPA instead of flash_attention_2 for ROCm/hipblas environments#9155

Open
localai-bot wants to merge 2 commits into mudler:master from localai-bot:fix/flashattention2-hipblas-9138-final

Conversation

@localai-bot (Contributor)

This PR fixes issue #9138, where FlashAttention2 is not available in hipblas Docker images.

Problem

When using the hipblas Docker image with LocalAI, models like qwen3-tts and qwen3-asr fail to load with FlashAttention2 errors.

Solution

  • Detect ROCm environments
  • Use 'sdpa' instead of 'flash_attention_2' when ROCm is detected

All commits have proper Signed-off-by trailers for DCO compliance.

Fixes #9138
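The detection and fallback described above can be sketched as follows. This is a minimal illustration, not LocalAI's actual backend code: the function names `is_rocm_env` and `attention_implementation` are hypothetical, and `torch` is treated as optional so the sketch runs even without PyTorch installed.

```python
import os

# Hedged sketch of the fix; LocalAI's real backend code differs.
try:
    import torch  # optional: detection still works without PyTorch
except ImportError:
    torch = None


def is_rocm_env() -> bool:
    """Return True when running under ROCm (HIP) rather than CUDA."""
    # ROCm builds of PyTorch set torch.version.hip to a version string;
    # CUDA builds leave it as None.
    if torch is not None and getattr(torch.version, "hip", None):
        return True
    # Fallback: ROCm/hipblas containers commonly export HIP_VISIBLE_DEVICES.
    return "HIP_VISIBLE_DEVICES" in os.environ


def attention_implementation() -> str:
    # FlashAttention2 wheels target CUDA, so on ROCm fall back to
    # PyTorch's built-in scaled_dot_product_attention ("sdpa").
    return "sdpa" if is_rocm_env() else "flash_attention_2"
```

The returned string is the kind of value passed as `attn_implementation` when loading a Hugging Face transformers model.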

Team Coding Agent 1 and others added 2 commits March 28, 2026 12:55
- Detect ROCm environments using torch.version.hip or HIP_VISIBLE_DEVICES
- When running on ROCm, use 'sdpa' attention implementation instead of 'flash_attention_2'
- This fixes the issue where FlashAttention2 is not available in hipblas Docker images
- Fixes issue mudler#9138

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
- Move is_rocm detection from inline in cuda block to class initialization
- Use self.is_rocm for cleaner code and single detection point
- Simplify the attention implementation selection logic
- Fixes review comment about device detection being in wrong location

Signed-off-by: Team Coding Agent 1 <team-coding-agent-1@localai.dev>
Signed-off-by: localai-bot <localai-bot@noreply.github.com>
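The refactor in the second commit (moving detection out of the inline CUDA block into class initialization) might look like the sketch below. `TransformersBackend` is a hypothetical class name used only for illustration; the real LocalAI backend differs.

```python
import os

try:
    import torch  # optional for this sketch
except ImportError:
    torch = None


class TransformersBackend:
    """Illustrative skeleton only; not LocalAI's actual class."""

    def __init__(self):
        # Single detection point at initialization, per the review
        # feedback, instead of inline checks in the CUDA code path.
        self.is_rocm = bool(
            (torch is not None and getattr(torch.version, "hip", None))
            or os.environ.get("HIP_VISIBLE_DEVICES")
        )

    def attn_implementation(self) -> str:
        # self.is_rocm is computed once, so every call site agrees.
        return "sdpa" if self.is_rocm else "flash_attention_2"
```

Computing `self.is_rocm` once keeps the selection logic in one place, which is the point of the review comment the commit addresses.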
@localai-bot (Contributor, Author)

DCO Compliance Achieved

This PR replaces #9147, which was closed due to DCO issues (missing Signed-off-by trailers).

Changes made:

Ready for review.

/cc @mudler



Successfully merging this pull request may close: FlashAttention2 not available in hipblas docker images (#9138)