
Flash attention unavailable after 0.0.21 on Windows system #863

Open
KohakuBlueleaf opened this issue Sep 22, 2023 · 6 comments

Comments

@KohakuBlueleaf

🐛 Bug

Command

python -m xformers.info

To Reproduce

Steps to reproduce the behavior:

Install xformers 0.0.21, or build from source at the latest commit, on Windows; memory_efficient_attention.flshattF/B are both reported as unavailable.
(Also, build.env.TORCH_CUDA_ARCH_LIST in the pre-built wheel doesn't include 8.6 and 8.9.)

Expected behavior

Both the pre-built wheel and a build from source should give us flash attention support.
(If this is because Windows lacks some feature needed by FlashAttention-2, please at least keep FlashAttention-1 support on Windows.)

I also wondered whether this is a bug in xformers.info, but since xformers 0.0.21 actually gives me slower results than 0.0.20, I think flash attention is really gone.
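
As a runtime cross-check (independent of xformers.info), here is a minimal sketch of forcing the flash op explicitly. It assumes the documented xformers.ops API and a CUDA build of PyTorch; the exact exception type raised for an unavailable op may differ between versions:

```python
import torch
import xformers.ops as xops

# fp16 CUDA tensors in the (batch, seqlen, heads, head_dim) layout xformers expects
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k, v = q.clone(), q.clone()

try:
    # Force the Flash-Attention-backed op pair instead of letting the dispatcher
    # choose; this fails if flshattF/B were not built into the wheel.
    xops.memory_efficient_attention(
        q, k, v, op=xops.MemoryEfficientAttentionFlashAttentionOp
    )
    print("flash attention op is usable")
except Exception as e:
    print("flash attention op unavailable:", e)
```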

Environment

  • Python version: 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] (64-bit runtime)
  • Python platform: Windows-10-10.0.22621-SP0
  • Is CUDA available: True
  • CUDA runtime version: 12.1.105
  • CUDA_MODULE_LOADING set to: LAZY
  • GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4060 Ti
  • Nvidia driver version: 537.34
  • cuDNN version: Could not collect
  • HIP runtime version: N/A
  • MIOpen runtime version: N/A
  • Is XNNPACK available: True

Additional context

Here is the output of xformers.info on 0.0.21:

xFormers 0.0.21
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.decoderF:               available
memory_efficient_attention.flshattF@0.0.0:         unavailable
memory_efficient_attention.flshattB@0.0.0:         unavailable
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               False
is_functorch_available:                            False
pytorch.version:                                   2.0.1+cu118
pytorch.cuda:                                      available
gpu.compute_capability:                            8.9
gpu.name:                                          NVIDIA GeForce RTX 4060 Ti
build.info:                                        available
build.cuda_version:                                1108
build.python_version:                              3.11.4
build.torch_version:                               2.0.1+cu118
build.env.TORCH_CUDA_ARCH_LIST:                    5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.XFORMERS_BUILD_TYPE:                     Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   wheel-v0.0.21
build.nvcc_version:                                11.8.89
source.privacy:                                    open source

Here is the output of 0.0.20:

xFormers 0.0.20
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.flshattF:               available
memory_efficient_attention.flshattB:               available
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               False
is_functorch_available:                            False
pytorch.version:                                   2.0.1+cu118
pytorch.cuda:                                      available
gpu.compute_capability:                            8.9
gpu.name:                                          NVIDIA GeForce RTX 4060 Ti
build.info:                                        available
build.cuda_version:                                1108
build.python_version:                              3.11.3
build.torch_version:                               2.0.1+cu118
build.env.TORCH_CUDA_ARCH_LIST:                    5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6
build.env.XFORMERS_BUILD_TYPE:                     Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   wheel-v0.0.20
build.nvcc_version:                                11.8.89
source.privacy:                                    open source
@rltgjqmcpgjadyd

Commit af6b866 has the same problem:

xFormers 0.0.22+af6b866.d20230926
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.decoderF:               available
memory_efficient_attention.flshattF@0.0.0:         unavailable
memory_efficient_attention.flshattB@0.0.0:         unavailable
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
memory_efficient_attention.triton_splitKF:         unavailable
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               False
pytorch.version:                                   2.1.0.dev20230821+cu121
pytorch.cuda:                                      available
gpu.compute_capability:                            8.9
gpu.name:                                          NVIDIA GeForce RTX 4090
build.info:                                        available
build.cuda_version:                                1201
build.python_version:                              3.11.5
build.torch_version:                               2.1.0.dev20230821+cu121
build.env.TORCH_CUDA_ARCH_LIST:                    8.9
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              "-allow-unsupported-compiler"
build.env.XFORMERS_PACKAGE_FROM:                   None
build.nvcc_version:                                12.1.66
source.privacy:                                    open source

@danthe3rd
Contributor

Hi,
Flash-Attention does not support Windows at the moment, so we don't build it on Windows (see for instance Dao-AILab/flash-attention#565). We can still run our own implementation, which should be a bit faster than Flash v1 (but slower than Flash v2).
Once Flash-Attention v2 has support for Windows, we will add it back.
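
For anyone who wants to pin xformers' own implementation explicitly (the CUTLASS-based op that stays available on Windows), a minimal sketch assuming the documented xformers.ops op constants:

```python
import torch
import xformers.ops as xops

q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k, v = q.clone(), q.clone()

# Explicitly select the CUTLASS forward/backward pair rather than relying on
# automatic dispatch; this is the implementation that remains available on Windows.
out = xops.memory_efficient_attention(
    q, k, v, op=xops.MemoryEfficientAttentionCutlassOp
)
print(out.shape)  # torch.Size([1, 1024, 8, 64])
```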

@rbertus2000

> Hi, Flash-Attention does not support Windows at the moment, so we don't build it on Windows (see for instance Dao-AILab/flash-attention#565). We can still run our own implementation, which should be a bit faster than Flash v1 (but slower than Flash v2). Once Flash-Attention v2 has support for Windows, we will add it back.

It seems like flash-attention 2.3.2 supports Windows now. Dao-AILab/flash-attention#595 (comment)

@KohakuBlueleaf
Author

> Hi, Flash-Attention does not support Windows at the moment, so we don't build it on Windows (see for instance Dao-AILab/flash-attention#565). We can still run our own implementation, which should be a bit faster than Flash v1 (but slower than Flash v2). Once Flash-Attention v2 has support for Windows, we will add it back.

> It seems like flash-attention 2.3.2 supports Windows now. Dao-AILab/flash-attention#595 (comment)

I will try to build flash-attn with torch 2.1.0 and CUDA 12.1 to see if it works.

@Panchovix

> Hi, Flash-Attention does not support Windows at the moment, so we don't build it on Windows (see for instance Dao-AILab/flash-attention#565). We can still run our own implementation, which should be a bit faster than Flash v1 (but slower than Flash v2). Once Flash-Attention v2 has support for Windows, we will add it back.

> It seems like flash-attention 2.3.2 supports Windows now. Dao-AILab/flash-attention#595 (comment)

> I will try to build flash-attn with torch 2.1.0 and CUDA 12.1 to see if it works.

Does xformers automatically use FA2 if it is installed in the venv, or do you have to build xformers with FA2 installed instead?
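
(Not an answer to the dispatch question, but a quick way to confirm whether a standalone flash-attn build is even importable in the venv; a minimal sketch, and the __version__ attribute is an assumption about that package:)

```python
import importlib.util

# Check whether a separately installed flash-attn package can be imported at all,
# independent of which ops xformers itself was compiled with.
spec = importlib.util.find_spec("flash_attn")
print("flash_attn installed:", spec is not None)
if spec is not None:
    import flash_attn
    print("flash_attn version:", getattr(flash_attn, "__version__", "unknown"))
```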

@KohakuBlueleaf
Author

@danthe3rd Flash attention can be compiled/installed on Windows as of 2.3.2.
Will xformers be updated for it?
