
ita9naiwa/attention-impl


CUDA torch functions for LLM

For study purposes

Implemented attention variants

  • Naive Attention
  • Attention with KV cache
  • Attention with non-contiguous memory
  • Single Query Attention with non-contiguous KV cache (PagedAttention with block size 1; see the sketch after this list)
  • Multi Query Attention with non-contiguous KV cache (for Speculative Decoding)
  • Rotary Embedding
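
For reference, below is a minimal PyTorch sketch of the idea behind single-query attention over a non-contiguous KV cache, i.e. PagedAttention with block size 1. The function name, shapes, and block-table layout are illustrative assumptions for study, not this repo's actual API or CUDA kernel interface.

```python
# Illustrative sketch (not this repo's API): single-query attention where each
# past token's key/value lives in an arbitrary slot of a global cache pool, and
# a per-sequence block table maps logical positions to physical slots
# (PagedAttention with block size 1).
import torch

def single_query_paged_attention(q, k_cache, v_cache, block_table, scale=None):
    """q: [num_heads, head_dim] query for the current decode step.
    k_cache, v_cache: [num_slots, num_heads, head_dim] physical cache pools.
    block_table: [seq_len] LongTensor of physical slot indices (block size 1).
    """
    if scale is None:
        scale = q.shape[-1] ** -0.5
    # Gather this sequence's keys/values from scattered physical slots.
    k = k_cache[block_table]                              # [seq_len, num_heads, head_dim]
    v = v_cache[block_table]                              # [seq_len, num_heads, head_dim]
    # Scaled dot-product attention for a single query position.
    scores = torch.einsum("hd,shd->hs", q, k) * scale     # [num_heads, seq_len]
    probs = torch.softmax(scores, dim=-1)
    return torch.einsum("hs,shd->hd", probs, v)           # [num_heads, head_dim]

if __name__ == "__main__":
    num_slots, num_heads, head_dim, seq_len = 64, 8, 32, 10
    q = torch.randn(num_heads, head_dim)
    k_cache = torch.randn(num_slots, num_heads, head_dim)
    v_cache = torch.randn(num_slots, num_heads, head_dim)
    # Slots need not be contiguous or ordered.
    block_table = torch.randperm(num_slots)[:seq_len]
    print(single_query_paged_attention(q, k_cache, v_cache, block_table).shape)
```

The CUDA kernels in this repo implement the same computation without materializing the gathered `k`/`v` tensors, reading each slot directly from the paged cache.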

