IST Austria Distributed Algorithms and Systems Lab

gptq Public

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 1.9k 153

sparsegpt Public

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Python 712 96

marlin Public

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 611 47
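
The "~4x" figure in the marlin description can be read as a memory-traffic bound: at small batch sizes a linear layer is typically limited by how fast weights are read, so shrinking weights from 16 bits to 4 bits cuts the bytes moved by a factor of four. The sketch below only does that arithmetic. The function names and the 4096x4096 layer size are hypothetical, it ignores quantization scales and other metadata, and it is not taken from the marlin code.

```python
def weight_bytes(num_weights: int, bits_per_weight: int) -> float:
    """Bytes needed to store num_weights values at the given bit width."""
    if num_weights < 0 or bits_per_weight <= 0:
        raise ValueError("num_weights must be >= 0 and bits_per_weight > 0")
    return num_weights * bits_per_weight / 8


def ideal_weight_only_speedup(baseline_bits: int = 16, quantized_bits: int = 4) -> float:
    """Upper bound on speedup if runtime is dominated by reading weights.

    Assumption: the kernel is fully memory-bound and only weight traffic
    matters (no scales, zero points, or activation traffic).
    """
    if baseline_bits <= 0 or quantized_bits <= 0:
        raise ValueError("bit widths must be positive")
    return baseline_bits / quantized_bits


if __name__ == "__main__":
    n = 4096 * 4096  # hypothetical layer size, chosen only for illustration
    print("FP16 weight bytes:", weight_bytes(n, 16))
    print("INT4 weight bytes:", weight_bytes(n, 4))
    print("Ideal speedup:", ideal_weight_only_speedup(16, 4))
```

As the batch grows, compute begins to dominate over weight reads, which is consistent with the description limiting the claim to batch sizes of 16-32 tokens.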

qmoe Public

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Python 261 22

PanzaMail Public

Python 257 14

QUIK Public

Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024).
