CS student focused on LLM inference and AI infrastructure.
- Huazhong University of Science and Technology
- Wuhan
Highlights
- Pro
Stars
CUDA
7 repositories
A self-learning tutorial for CUDA High Performance Programming.
How to optimize some algorithms in CUDA.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Flash Attention in ~100 lines of CUDA (forward pass only)
High performance Transformer implementation in C++.

