
FlashAttention in PyTorch

A simplified implementation of FlashAttention in PyTorch. I have implemented the forward and backward pass algorithms from the paper and shown that they are equivalent to the standard attention formulation used in Transformers. Benchmarking code is also included.

Note that this is for educational purposes only, as I haven't implemented any of the CUDA kernels or SRAM-level memory tricks described in the paper.
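
For reference, the "normal" attention that the implementation is checked against is standard scaled dot-product attention. Below is a minimal sketch in plain PyTorch; the tensor shapes and the masking convention are assumptions for illustration, not necessarily what the repository uses:

```python
import math
import torch

def normal_attention(Q, K, V, mask=None):
    # Q, K, V: (batch, heads, seq_len, head_dim); mask broadcastable to the score matrix
    scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.shape[-1])
    if mask is not None:
        # Positions where mask == 0 are excluded from attention
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V
```

FlashAttention produces the same output block by block, so the forward-pass tests simply compare its result against this formulation.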

Requirements

  • einops==0.6.1
  • torch==2.0.1

Files

  • flash_attention.py - Implementation of the general formulation of FlashAttention, which takes in Q, K, V and a mask. The code includes both the forward and backward algorithms and a simple test that the forward pass is equivalent to normal attention.
  • flash_attention_causal.py - The causal version of FlashAttention, which takes in Q, K and V. The mask is calculated in a causal fashion, as is typically used in autoregressive models. This code also includes the forward and backward algorithms and a simple test that the forward pass is equivalent to normal (causal) attention.
  • bench.py, bench_causal.py - Benchmarking code for both general and causal versions of FlashAttention.
  • check_backward.py, check_backward_causal.py - These scripts verify two things: 1. that the gradients of Q, K and V (computed using PyTorch's jacrev) match between normal attention and FlashAttention, and 2. that these results match the implementation of the backward pass given in the paper. The loss function is simply the sum of the final output tensor (see the sketch after this list).
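
As a rough illustration of the gradient check, the sketch below computes gradients of a sum-of-outputs loss with torch.func.jacrev. The attention function here is a plain PyTorch stand-in, not this repository's implementation, and the comparison against a hand-written backward pass is only indicated in a comment:

```python
import torch
from torch.func import jacrev

# Stand-in for the repository's attention implementations (placeholder, not its API).
def attention(Q, K, V):
    scores = Q @ K.transpose(-2, -1) / Q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ V

# Loss is simply the sum of the output tensor, as in the check scripts.
def loss_fn(Q, K, V):
    return attention(Q, K, V).sum()

Q = torch.randn(1, 2, 8, 16)
K = torch.randn(1, 2, 8, 16)
V = torch.randn(1, 2, 8, 16)

# Gradients of the sum-loss w.r.t. Q, K and V via jacrev.
dQ, dK, dV = jacrev(loss_fn, argnums=(0, 1, 2))(Q, K, V)

# A hand-written backward pass (e.g. FlashAttention's) could then be compared with
# torch.allclose(dQ, dQ_manual, atol=1e-5), and likewise for dK and dV.
print(dQ.shape, dK.shape, dV.shape)
```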

To run

Forward pass

Causal mask
python flash_attention_causal.py

Random mask
python flash_attention.py

Benchmarking - Causal mask

FlashAttention
python bench_causal.py --b 1 --h 2 --q_len 16384 --kv_len 16384 --d 512 --type flash

Normal attention
python bench_causal.py --b 1 --h 2 --q_len 16384 --kv_len 16384 --d 512 --type normal

Add --profile to log additional details using PyTorch Profiler.

Benchmarking - Random mask

FlashAttention
python bench.py --b 1 --h 2 --q_len 16384 --kv_len 16384 --d 512 --type flash

Normal attention
python bench.py --b 1 --h 2 --q_len 16384 --kv_len 16384 --d 512 --type normal

Add --profile to log additional details using PyTorch Profiler.
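
The --profile flag uses the PyTorch Profiler; the exact setup lives in the benchmark scripts, but a typical (purely illustrative) usage looks roughly like this:

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
Q = K = V = torch.randn(1, 2, 1024, 64, device=device)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True, profile_memory=True) as prof:
    # Call whichever attention variant is being benchmarked here.
    out = torch.softmax(Q @ K.transpose(-2, -1) / 64 ** 0.5, dim=-1) @ V

# Summarise the recorded ops, sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```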

Backward pass

Causal mask
python check_backward_causal.py

Random mask
python check_backward.py
