FlashAttention2 not available in hipblas docker images #9138

@dougmaitelli

Description

LocalAI version:

LocalAI v4.0.0 (8e8b7df)

Environment, CPU architecture, OS, and Version:

AMD Strix Halo (Ryzen AI Max+ 395)
128 GB unified memory (96 GB allocated as VRAM)
Fedora 42

Describe the bug

I am unable to load either qwen3-tts or qwen3-asr; both fail with an error saying FlashAttention2 is not available.

To Reproduce

Docker Compose file (the `environment:` block originally mixed `DEBUG=true` with mapping syntax, which is invalid YAML; both entries now use mapping style):

```yaml
services:
  localai:
    container_name: localai
    image: localai/localai:latest-gpu-hipblas
    environment:
      DEBUG: "true"
      MODELS_PATH: /models
    ports:
      - "8080:8080"
    volumes:
      - ./backends:/backends
      - ./config:/config
      - ./models:/models
    devices:
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
    group_add:
      - "video"
    restart: unless-stopped
```
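If the `/dev/kfd` and `/dev/dri` device nodes are not actually visible inside the container, the backend cannot initialize ROCm at all, which can also surface as missing FlashAttention support. A minimal sanity check (a sketch to run inside the container, checking only the device paths mapped in the compose file above):

```python
import os

# The hipblas backend needs these device nodes mapped into the container;
# if either is missing, GPU initialization (and FlashAttention) will fail.
status = {dev: os.path.exists(dev) for dev in ("/dev/kfd", "/dev/dri")}
for dev, present in status.items():
    print(dev, "present" if present else "missing")
```

Both paths should report `present`; if `/dev/kfd` is missing, the container was started without the `devices:` mappings or the host kernel lacks the amdgpu/KFD driver.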

After that, install qwen3-tts and try to run the model.
The backend downloads successfully but fails to load.
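To confirm whether the backend's Python environment actually ships FlashAttention2, a minimal check (assuming the backend is a Python process and the package name is `flash_attn`, as used by Hugging Face Transformers) is:

```python
import importlib.util

# A missing flash-attn package reproduces the "FlashAttention2 is not
# available" error; find_spec returns None when it is not installed.
spec = importlib.util.find_spec("flash_attn")
print("flash_attn importable:", spec is not None)
```

Note that the flash-attn wheels published on PyPI target CUDA; ROCm users typically need a ROCm-specific build, which may explain why the hipblas image lacks it.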

Expected behavior

Models should load without errors.

Logs

(log output attached as a screenshot)

Additional context
