Add support for GPTQ style INT3/4/5 quantization of LLMs like LLaMA #1491

Open
powderluv opened this issue May 16, 2023 · 2 comments
Comments

@powderluv

Request description

There has been a lot of progress in low-precision inference (INT3/4/5) with LLMs like LLaMA using GPTQ and llama.cpp. While we are adding quantization support to StableHLO, it would be good to ensure we can express a GPTQ-quantized LLaMA via StableHLO.
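For context, a minimal sketch of what GPTQ-style low-bit weight storage looks like: weights are kept as small integers (e.g. int4) with per-group floating-point scales, and dequantized on the fly at inference time. This example uses simple round-to-nearest group quantization for brevity; real GPTQ additionally applies Hessian-based error compensation when rounding, but the resulting packed format (int weights + per-group scales) is the same. All names here are illustrative, not part of any library.

```python
import numpy as np

def quantize_groupwise(w, bits=4, group_size=128):
    """Group-wise symmetric round-to-nearest quantization.

    Illustrative only: GPTQ compensates rounding error using
    second-order (Hessian) information, but produces the same
    storage layout: low-bit integer weights plus per-group scales.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for int4
    groups = w.reshape(-1, group_size)              # one scale per group
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Reconstruct approximate fp32 weights from int weights + scales.
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)).astype(np.float32)
q, s = quantize_groupwise(w, bits=4, group_size=128)
w_hat = dequantize(q, s).reshape(w.shape)
err = np.abs(w - w_hat).max()                       # bounded by half a scale step
```

Expressing this in StableHLO would mean carrying the int4 values and per-group scales through the IR (e.g. via quantized tensor types or explicit dequantize ops), which is what this issue asks to be representable.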

It would be awesome if we added a small example of importing LLaMA from HuggingFace and exporting a GPTQ-quantized StableHLO IR.

@sdasgup3
Member

We have started gathering what's needed in StableHLO to represent quantization in cutting-edge LLMs. The discussion in #1535 (comment) is pretty useful in that direction. As a starting point, we are exploring some of the int4 LLaMA/Vicuna Torch IR while working with @powderluv and @qedawkins.

@sdasgup3
Member

Note that GPTQ is definitely within the StableHLO Quantizer team's scope of coverage, along with other algorithms. We will update this ticket once the plans are finalized.

cc @doyeonkim0

Projects
Status: In Progress
Status: Backlog
Development

No branches or pull requests

4 participants