There has been a lot of progress in low-precision inference (int3/4/5) with LLMs such as LLaMA using GPTQ and llama.cpp. While we are adding quantization support to StableHLO, it would be good to ensure we can express a GPTQ-quantized LLaMA via StableHLO.
It would be awesome to add a small example of importing LLaMA from HuggingFace and exporting GPTQ-quantized StableHLO IR.
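For context on what the IR needs to express: GPTQ stores weights as low-bit integers with one scale and zero point per group of contiguous weights. The sketch below, a hypothetical illustration in NumPy, shows only that per-group int4 storage scheme and a round-to-nearest quantizer; real GPTQ additionally applies Hessian-based error compensation when choosing the rounded values. The function names and the group size of 128 are assumptions, not part of GPTQ or StableHLO.

```python
import numpy as np

def quantize_int4_per_group(w, group_size=128):
    """Illustrative per-group asymmetric int4 quantization.

    Only the storage scheme GPTQ targets is shown here; GPTQ itself
    uses Hessian-based error correction instead of plain rounding.
    """
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # int4 unsigned range: 0..15
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(groups / scale) + zero_point, 0, 15).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from int4 codes."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(4096).astype(np.float32)
q, s, z = quantize_int4_per_group(w)
w_hat = dequantize(q, s, z).reshape(-1)
print(np.abs(w - w_hat).max())  # reconstruction error is bounded by the group scale
```

A StableHLO representation would need to carry exactly these three pieces of information (int4 codes, per-group scales, per-group zero points) on the weight operand.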
We have started gathering what's needed in StableHLO to represent quantization in cutting-edge LLMs. The discussion in #1535 (comment) is quite useful in that direction. As a starting point, we are exploring some of the int4 LLaMA / Vicuna torch IR while working with @powderluv and @qedawkins.
Note that GPTQ is definitely within the StableHLO Quantizer team's scope of coverage, along with other algorithms. We will update the ticket once the plans are finalized.
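To make the representation question concrete, here is a hypothetical sketch of what an int4-quantized weight operand could look like in StableHLO, using the MLIR `!quant.uniform` element type. The function name, tensor shapes, and the scale/zero-point values are made up for illustration; note that a per-tensor quantized type like this cannot by itself express GPTQ's per-group scales, which is part of what needs to be worked out.

```mlir
// Hypothetical: a dot with an int4-quantized weight (per-tensor scale shown;
// GPTQ would need per-group/blockwise quantization parameters instead).
func.func @proj(%x: tensor<1x4096xf32>) -> tensor<1x4096xf32> {
  %w = stablehlo.constant dense<...> : tensor<4096x4096x!quant.uniform<i4:f32, 1.5e-2:0>>
  ...
}
```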