Support GPTQ for LLMs INT4 quantization

I am trying to quantize Qwen3-VL (a multimodal LLM) to INT4 with AIMET. The target hardware (Snapdragon 8295) restricts me to per-channel quantization. Basic PTQ did not give sufficient accuracy. SpinQuant improved accuracy, but the result is still not good enough. The SpinQuant paper states that GPTQ is used for quantization after the rotation is applied, but AIMET's implementation does not seem to follow this approach.
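To make the request concrete, here is a minimal NumPy sketch of the GPTQ step I have in mind for a single linear layer, using symmetric per-channel INT4 scales. It is not AIMET code and does not call any AIMET API. The function names (`quantize_per_channel`, `gptq_int4_per_channel`), the damping value, and the random calibration data are my own assumptions for illustration. It also leaves out the blocking and activation-ordering refinements of the reference GPTQ implementation.

```python
import numpy as np


def quantize_per_channel(w, scale, qmin=-8, qmax=7):
    """Symmetric round-to-nearest onto the INT4 grid (one scale per output channel)."""
    return np.clip(np.round(w / scale), qmin, qmax) * scale


def gptq_int4_per_channel(W, X, damp=0.01):
    """Hypothetical helper: GPTQ-style quantization of one linear layer.

    W: (out_features, in_features) weights.
    X: (in_features, n_samples) calibration inputs to the layer.
    Returns the dequantized INT4 weights and the per-channel scales.
    """
    W = W.astype(np.float64).copy()
    n_in = W.shape[1]

    # One symmetric scale per output channel, fixed from the original weights.
    scale = np.maximum(np.abs(W).max(axis=1), 1e-8) / 7.0

    # Hessian of the layer-wise reconstruction loss, with diagonal damping.
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(n_in)

    # Upper Cholesky factor U of H^-1, so that H^-1 = U.T @ U.
    U = np.linalg.cholesky(np.linalg.inv(H)).T

    Q = np.zeros_like(W)
    for i in range(n_in):
        # Quantize one input column for all output channels at once.
        q = quantize_per_channel(W[:, i], scale)
        Q[:, i] = q
        # Spread the scaled rounding error onto the not-yet-quantized columns.
        err = (W[:, i] - q) / U[i, i]
        W[:, i + 1:] -= np.outer(err, U[i, i + 1:])
    return Q, scale


# Small demo on synthetic, correlated calibration data (assumed, not real activations).
rng = np.random.default_rng(0)
mixing = rng.standard_normal((32, 32))
X_demo = mixing @ rng.standard_normal((32, 256))
W_demo = rng.standard_normal((16, 32))

Q_gptq, s_demo = gptq_int4_per_channel(W_demo, X_demo)
Q_rtn = quantize_per_channel(W_demo, s_demo[:, None])

# Layer output error for GPTQ versus plain round-to-nearest.
print("RTN  output error:", np.linalg.norm((W_demo - Q_rtn) @ X_demo))
print("GPTQ output error:", np.linalg.norm((W_demo - Q_gptq) @ X_demo))
```

In the flow described in the SpinQuant paper, a step like this would run on the rotated weights, with calibration activations collected after the rotation. Because the scales stay per output channel, the result should still fit the per-channel constraint of the target hardware.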

Support GPTQ for LLMs INT4 quantization #4100
