FlashAttention2 not available in hipblas docker images #9138

@dougmaitelli

Description

LocalAI version:

LocalAI v4.0.0 (8e8b7df)

Environment, CPU architecture, OS, and Version:

AMD Strix Halo (Ryzen AI Max+ 395)
128 GB unified memory (96 GB allocated as VRAM)
Fedora 42

Describe the bug

I am unable to load either qwen3-tts or qwen3-asr; both fail with an error saying FlashAttention2 is not available.

To Reproduce

Docker Compose file (the `environment:` block originally mixed `DEBUG=true` with mapping syntax, which is invalid YAML; both entries now use mapping style):

```yaml
services:
  localai:
    container_name: localai
    image: localai/localai:latest-gpu-hipblas
    environment:
      DEBUG: "true"
      MODELS_PATH: /models
    ports:
      - "8080:8080"
    volumes:
      - ./backends:/backends
      - ./config:/config
      - ./models:/models
    devices:
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
    group_add:
      - "video"
    restart: unless-stopped
```
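If the `/dev/kfd` and `/dev/dri` device nodes are not actually visible inside the container, the backend cannot initialize ROCm at all, which can also surface as missing FlashAttention support. A minimal sanity check (a sketch to run inside the container, checking only the device paths mapped in the compose file above):

```python
import os

# The hipblas backend needs these device nodes mapped into the container;
# if either is missing, GPU initialization (and FlashAttention) will fail.
status = {dev: os.path.exists(dev) for dev in ("/dev/kfd", "/dev/dri")}
for dev, present in status.items():
    print(dev, "present" if present else "missing")
```

Both paths should report `present`; if `/dev/kfd` is missing, the container was started without the `devices:` mappings or the host kernel lacks the amdgpu/KFD driver.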

After that, install qwen3-tts and try to run the model.
The backend downloads successfully but fails to load.
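To confirm whether the backend's Python environment actually ships FlashAttention2, a minimal check (assuming the backend is a Python process and the package name is `flash_attn`, as used by Hugging Face Transformers) is:

```python
import importlib.util

# A missing flash-attn package reproduces the "FlashAttention2 is not
# available" error; find_spec returns None when it is not installed.
spec = importlib.util.find_spec("flash_attn")
print("flash_attn importable:", spec is not None)
```

Note that the flash-attn wheels published on PyPI target CUDA; ROCm users typically need a ROCm-specific build, which may explain why the hipblas image lacks it.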

Expected behavior

Models should load without errors.

Logs

(log output attached as a screenshot)

Additional context
