[Feature Request] Support quantized MoE (eg Qwen3-30B-A3B-GPTQ) #5505

@tommyip

Description

What is your request?

The MoE layer does not currently support quantized weights. The main missing piece appears to be a quantized version of the grouped_matmul_ragged kernel.
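
To make the request concrete, here is a rough pure-Python reference of what a quantized variant might compute. This is only a sketch under assumptions, not the actual kernel: the function names, the argument layout (tokens sorted by expert, with an expert_offsets list marking each expert's row range), and the GPTQ-style group-wise scales and zero points are all my guesses at a plausible interface. A real kernel would presumably operate on packed 4-bit weights and fuse the dequantization into the matmul.

```python
def dequantize_row(codes, scales, zeros, group_size):
    """Dequantize one weight row: w[i] = (code[i] - zero[g]) * scale[g], g = i // group_size."""
    return [
        (c - zeros[i // group_size]) * scales[i // group_size]
        for i, c in enumerate(codes)
    ]


def grouped_matmul_ragged_quant_ref(x, q, scales, zeros, expert_offsets, group_size):
    """Hypothetical reference for a quantized ragged grouped matmul.

    x:              [total_tokens][in_features], rows sorted by expert
    q:              [num_experts][out_features][in_features] integer codes
    scales, zeros:  [num_experts][out_features][in_features // group_size]
    expert_offsets: [num_experts + 1] row offsets into x (assumed layout)
    """
    out = []
    for e in range(len(q)):
        # Dequantize this expert's weight matrix, one output row at a time.
        w = [
            dequantize_row(q[e][o], scales[e][o], zeros[e][o], group_size)
            for o in range(len(q[e]))
        ]
        # Multiply only the rows of x that were routed to expert e.
        for t in range(expert_offsets[e], expert_offsets[e + 1]):
            out.append([sum(a * b for a, b in zip(x[t], w_row)) for w_row in w])
    return out


# Tiny example: 3 tokens, 2 experts, in_features=2, out_features=1, group_size=2.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q = [[[3, 5]], [[9, 7]]]
scales = [[[0.5]], [[2.0]]]
zeros = [[[1]], [[8]]]
expert_offsets = [0, 2, 3]  # tokens 0-1 go to expert 0, token 2 to expert 1
out = grouped_matmul_ragged_quant_ref(x, q, scales, zeros, expert_offsets, group_size=2)
print(out)
```

In this example expert 0 dequantizes to weights [1.0, 2.0] and expert 1 to [2.0, -2.0], so each token is multiplied only by the weights of the expert it was routed to.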

What is your motivation for this change?

An MoE layer with quantization support would make it possible to implement popular quantized MoE models such as Qwen3-30B-A3B-GPTQ (Text, Coder & Omni).

Any other details?

No response
