Just helping myself keep track of LLM papers that I'm reading, with an emphasis on inference and model compression.

Transformer Architectures

Foundation Models

Position Encoding

KV Cache

Activation

Pruning

Quantization

Normalization

Sparsity and Rank Compression

Fine-tuning

Sampling

Scaling

Mixture of Experts

Watermarking

More