[Quantization] AutoGPTQ refactor and matmul combination support #694

LeiWang1999 · 2023-08-08T12:05:24Z

This PR refactors the AutoGPTQ integration to better align with the framework design. It also adds support for AutoGPTQ quantization in MLC LLM with matmul combination.

With this PR, you can compile Llama2 with AutoGPTQ quantization using the following command:

python -m mlc_llm.build --model=Llama-2-7b-chat-hf --quantization autogptq_llama_q4f16_1 --target cuda

Note that the first run may take around 10 minutes for the AutoGPTQ quantization computation; subsequent runs will be much quicker. AutoGPTQ quantization requires the Python auto_gptq package, version 0.2.0 or later.
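For anyone who wants to verify the version requirement before starting the roughly 10 minute first build, here is a minimal sketch of a pre-flight check. It is not part of this PR or of mlc_llm. The helper names are hypothetical, and it assumes the package is installed under the distribution name `auto-gptq`; adjust that if your environment differs.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_AUTO_GPTQ = "0.2.0"  # minimum version stated in the PR description


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release part, e.g. '0.4.2+cu118' -> (0, 4, 2).

    Pre-release and local suffixes are ignored, so '0.2.0rc1' is treated
    as 0.2.0. That is a simplification made for this sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_AUTO_GPTQ) -> bool:
    """Compare two version strings by their numeric release parts."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '0.2' compares equal to '0.2.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_auto_gptq_version(dist_name: str = "auto-gptq") -> Optional[str]:
    """Return the installed version, or None if the package is missing.

    The distribution name is an assumption, not something this PR specifies.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_auto_gptq_version()
    if found is None:
        print("auto_gptq is not installed")
    elif not meets_minimum(found):
        print(f"auto_gptq {found} is older than the required {MIN_AUTO_GPTQ}")
    else:
        print(f"auto_gptq {found} satisfies the >= {MIN_AUTO_GPTQ} requirement")
```

Running the script before `python -m mlc_llm.build ...` reports a missing or outdated package up front instead of partway through the build.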

Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>

LeiWang1999 · 2023-08-08T12:05:49Z

Please cc @MasterJH5574

mlc_llm/core.py

LeiWang1999 · 2023-08-16T03:56:52Z

LGTM, thanks for your hard work on this PR, @MasterJH5574!

This PR refactors the AutoGPTQ integration to better align with the framework design. The PR, meanwhile, supports the AutoGPTQ quantization in MLC LLM with matmul combination.

With this PR, you will be able to compile Llama2 using the following command:

```shell
python -m mlc_llm.build --model=Llama-2-7b-chat-hf --quantization autogptq_llama_q4f16_1 --target cuda
```

to use the AutoGPTQ quantization. **Note that the first run may take around 10 min for AutoGPTQ quantization computation, and the following runs will be much quicker.** The AutoGPTQ quantization requires the Python `auto_gptq` package to have version at least 0.2.0.

Co-authored-by: Lei Wang <LeiWang1999@users.noreply.github.com>

Lurrobert · 2023-11-03T23:30:56Z

Do you think the auto_gptq module should also be added to the requirements?

Unfortunately, it does not work on macOS: AutoGPTQ/AutoGPTQ#299

MasterJH5574 self-assigned this Aug 9, 2023

MasterJH5574 force-pushed the lei/gptq-combined branch from 9976004 to 1efef2a Compare August 15, 2023 22:58

MasterJH5574 reviewed Aug 15, 2023


mlc_llm/core.py (outdated)

MasterJH5574 changed the title ~~[Param Manager] Combined Matmul Support for auto-gptq Quant Spec~~ [Quantization] AutoGPTQ refactor and matmul combination support Aug 15, 2023

MasterJH5574 force-pushed the lei/gptq-combined branch 2 times, most recently from 8e0400a to b5c1162 Compare August 16, 2023 02:07

MasterJH5574 approved these changes Aug 16, 2023


MasterJH5574 force-pushed the lei/gptq-combined branch from b5c1162 to 823d481 Compare August 25, 2023 20:47

MasterJH5574 merged commit 5fe6344 into mlc-ai:main Aug 25, 2023
