Adds AWQ (Activation-aware Weight Quantization) support. #205
Merged
copybara-service[bot] merged 1 commit into main on Feb 4, 2026
Adds AWQ (Activation-aware Weight Quantization) support.
This CL implements AWQ to improve quantization accuracy by identifying salient weight channels based on activation magnitudes and applying per-channel scaling.
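For context on the mechanism: for `y = x @ w`, multiplying weight rows by a per-input-channel scale `s` while dividing the activations by the same `s` leaves the product unchanged, so the scale only changes where quantization error lands; choosing `s` from activation magnitudes protects the salient channels. Below is a minimal, self-contained JAX sketch of such a grid search; the names (`fake_quantize`, `awq_grid_search`) are illustrative, not this CL's API.

```python
import jax.numpy as jnp


def fake_quantize(w, bits=4):
  """Symmetric round-trip quantization per output column (illustrative)."""
  qmax = 2.0 ** (bits - 1) - 1.0
  scale = jnp.maximum(jnp.max(jnp.abs(w), axis=0, keepdims=True) / qmax, 1e-8)
  return jnp.round(w / scale) * scale


def awq_grid_search(x, w, num_grid=20, bits=4):
  """Grid search for per-input-channel scales s = act_scale**alpha.

  x: [num_tokens, in_features] calibration activations.
  w: [in_features, out_features] weight.
  Because (x / s) @ (s[:, None] * w) == x @ w exactly, the search only
  has to minimize the output error of (x / s) @ quantize(s * w) against
  the float reference output.
  """
  act_scale = jnp.mean(jnp.abs(x), axis=0)      # per-channel salience
  y_ref = x @ w
  best_err, best_s = float("inf"), jnp.ones_like(act_scale)
  for i in range(num_grid):
    alpha = i / num_grid
    s = jnp.clip(act_scale, 1e-5) ** alpha
    w_q = fake_quantize(w * s[:, None], bits)   # quantize the scaled weight
    err = float(jnp.mean((y_ref - (x / s) @ w_q) ** 2))
    if err < best_err:
      best_err, best_s = err, s
  return best_s
```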
Key changes:
- `AwqCalibrationProvider`: Added provider to collect activation statistics (`act_scale`) by intercepting `dot_general` and `einsum` (see the calibration sketch after this list).
  - Inherits from the new `StatsCalibrationProvider` to share interception logic with GPTQ.
- `AwqRule`: Added rule to enable AWQ configuration.
- `quantize_params`: Implemented AWQ scale search (grid search, as sketched above) and application.
  - Stores quantized weights wrapped in `WithAwqScale` alongside per-channel scales.
- `WithAwqScale`: Storage for localized AWQ scales alongside quantized data.
- `AwqInferenceProvider`: Added inference support that handles `WithAwqScale` inputs, performing on-the-fly dequantization and scale compensation during `dot_general` and `einsum` operations (see the inference sketch after this list).
- `StatsCalibrationProvider`: Refactored common interception logic (for `dot_general`/`einsum`) from GPTQ into a shared base class in `calibration.py`.

PiperOrigin-RevId: 865178716
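The calibration half reduces to accumulating mean absolute activation per contracted channel at every intercepted op. A rough sketch of that bookkeeping, assuming the channel axis is the last one (hypothetical helper names; this CL's actual shared interception logic lives in `StatsCalibrationProvider` in `calibration.py`):

```python
import jax
import jax.numpy as jnp


class ActStatsCollector:
  """Running mean of |activation| per input channel (hypothetical helper)."""

  def __init__(self):
    self.abs_sum = None
    self.count = 0

  def observe(self, x):
    # Collapse all leading axes; assume the contracting axis is last.
    flat = jnp.abs(x).reshape(-1, x.shape[-1])
    batch_sum = jnp.sum(flat, axis=0)
    self.abs_sum = batch_sum if self.abs_sum is None else self.abs_sum + batch_sum
    self.count += flat.shape[0]

  @property
  def act_scale(self):
    return self.abs_sum / self.count


def calibrated_dot_general(collector, lhs, rhs, dimension_numbers):
  """Stand-in for an intercepted dot_general: record stats, then contract."""
  collector.observe(lhs)
  return jax.lax.dot_general(lhs, rhs, dimension_numbers)
```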
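On the inference side, the stored scale has to be compensated on the activation operand before the contraction. A stand-in for the `WithAwqScale` wrapper and the compensation step (field and function names are assumptions, and `qvalue` is kept as dequantized floats for brevity; a real kernel would unpack the integer storage first):

```python
import dataclasses
import jax.numpy as jnp


@dataclasses.dataclass
class AwqScaledWeight:
  """Stand-in for WithAwqScale: quantized data plus its per-channel scale."""
  qvalue: jnp.ndarray     # weight that was quantized after scaling by awq_scale
  awq_scale: jnp.ndarray  # [in_features] per-input-channel AWQ scale


def awq_matmul(x, w):
  """On-the-fly compensation: (x / s) @ quantize(s * w) approximates x @ w."""
  return (x / w.awq_scale) @ w.qvalue


# Tying the sketches together:
#   s = awq_grid_search(x_calib, w)
#   w_awq = AwqScaledWeight(fake_quantize(w * s[:, None]), s)
#   y = awq_matmul(x, w_awq)
```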