[Feature Request] Enable Flash Attention in the released binary #258

@JohnAlcatraz

When I tried the released binaries, I was surprised to find that flash attention is not enabled in them. Specifically, these:

sd-master-ce1bcc7-bin-win-avx-x64.zip

sd-master-ce1bcc7-bin-win-avx2-x64.zip

sd-master-ce1bcc7-bin-win-avx512-x64.zip

The readme mentions this about flash attention:

Enabling flash attention reduces memory usage by at least 400 MB. At the moment, it is not supported when CUBLAS is enabled because the kernel implementation is missing.

The "bin-win-avx" binaries are CPU-only and do not use CUBLAS, so enabling flash attention in them shouldn't cause any problems, right?

When I then compiled the project myself with flash attention enabled, I also found that it does not seem to work unless an assert is manually disabled (#138).
