Support GPTQ for LLMs INT4 quantization #4100

@Yichangfa

I am trying to quantize Qwen3-VL (a multimodal LLM) to INT4 using AIMET. The target hardware (Snapdragon 8295) restricts me to per-channel quantization. I first tried basic PTQ, but the accuracy was insufficient. I then tried SpinQuant, which improved accuracy but still not enough. According to the SpinQuant paper, GPTQ is used for quantization after the rotation is applied. However, AIMET's implementation does not seem to follow this approach.
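To make the request concrete, here is a minimal NumPy sketch of the difference between plain per-channel round-to-nearest and a GPTQ-style pass over one linear layer. It is not AIMET's API: the function names (`per_channel_scales`, `rtn_quantize`, `gptq_quantize`) are hypothetical, and it assumes symmetric per-output-channel INT4 scales fixed from the original weights. The loop follows the column-by-column error compensation described in the GPTQ paper in simplified form, without blocking or activation ordering.

```python
import numpy as np

def per_channel_scales(W, n_bits=4):
    """Symmetric scales, one per output channel (row of W)."""
    qmax = 2 ** (n_bits - 1) - 1                      # 7 for INT4
    max_abs = np.abs(W).max(axis=1, keepdims=True)
    return np.maximum(max_abs, 1e-8) / qmax

def quantize(w, scale, n_bits=4):
    """Round to the nearest grid point, clip to the signed range, dequantize."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return np.clip(np.round(w / scale), qmin, qmax) * scale

def rtn_quantize(W, n_bits=4):
    """Baseline: independent round-to-nearest on every weight."""
    return quantize(W, per_channel_scales(W, n_bits), n_bits)

def gptq_quantize(W, X, n_bits=4, damp=0.01):
    """GPTQ-style pass (simplified sketch).

    W: (out_features, in_features) weights.
    X: (n_samples, in_features) calibration inputs to the layer.
    """
    W = W.astype(np.float64).copy()
    scale = per_channel_scales(W, n_bits)             # fixed before any updates
    H = 2.0 * X.T @ X                                 # layer-wise Hessian proxy
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])  # damping for stability
    U = np.linalg.cholesky(np.linalg.inv(H)).T        # upper factor of H^-1
    Q = np.zeros_like(W)
    for i in range(W.shape[1]):
        col = W[:, i:i + 1]
        q = quantize(col, scale, n_bits)
        Q[:, i:i + 1] = q
        # Spread this column's quantization error over the remaining columns.
        err = (col - q) / U[i, i]
        W[:, i + 1:] -= err @ U[i:i + 1, i + 1:]
    return Q

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64)) @ rng.normal(size=(64, 64))  # correlated inputs
W = rng.normal(size=(32, 64))
for name, Wq in [("RTN", rtn_quantize(W)), ("GPTQ", gptq_quantize(W, X))]:
    print(name, "output error:", np.linalg.norm(X @ W.T - X @ Wq.T))
```

Both functions return weights on the same per-channel INT4 grid, so the result stays compatible with a per-channel-only target. The only difference is that the GPTQ-style pass uses calibration activations to choose rounding that reduces the layer's output error.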
