
Adds AWQ (Activation-aware Weight Quantization) support. #205

Merged: copybara-service[bot] merged 1 commit into `main` from `test_863102715` on Feb 4, 2026.


@copybara-service

Adds AWQ (Activation-aware Weight Quantization) support.

This CL implements AWQ to improve quantization accuracy. AWQ identifies salient input channels of each weight from calibration activation magnitudes and applies a per-channel scale to those channels before quantization.
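For reference, the per-channel scaling rests on an exact identity for a linear layer with input $X$ and weight $W$. The scale search below follows the original AWQ paper, which is an assumption here: the PR text only says a grid search is used. In that formulation, $\bar{x}$ is the per-input-channel activation magnitude collected during calibration (the `act_scale` statistic), $Q$ is the weight quantizer, and the exponent $\alpha$ is what the grid search sweeps:

```math
XW = \left(X\,\mathrm{diag}(s)^{-1}\right)\left(\mathrm{diag}(s)\,W\right),
\qquad
s^{*} = \underset{s = \bar{x}^{\alpha},\ \alpha \in [0,1]}{\arg\min}
\left\lVert \left(X\,\mathrm{diag}(s)^{-1}\right) Q\!\left(\mathrm{diag}(s)\,W\right) - XW \right\rVert^{2}
```

Only $\mathrm{diag}(s)\,W$ is quantized, so the factor $\mathrm{diag}(s)^{-1}$ must be applied to the activations at inference time.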

Key changes:

- `AwqCalibrationProvider`: Added a provider that collects activation statistics (`act_scale`) by intercepting `dot_general` and `einsum`.
  - Inherits from the new `StatsCalibrationProvider` to share interception logic with GPTQ.
- `AwqRule`: Added a rule to enable AWQ configuration.
- `quantize_params`: Implemented the AWQ scale search (grid search) and its application to the weights.
  - Stores the quantized weights wrapped in `WithAwqScale` alongside the per-channel scales.
- `WithAwqScale`: Storage type that keeps the localized AWQ scales alongside the quantized data.
- `AwqInferenceProvider`: Added inference support that handles `WithAwqScale` inputs, performing on-the-fly dequantization and scale compensation during `dot_general` and `einsum` operations.
- `StatsCalibrationProvider`: Refactored the common interception logic (for `dot_general`/`einsum`) out of GPTQ into a shared base class in `calibration.py`.
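The pipeline in the list above (calibration statistics, scale grid search, storage, and inference-time compensation) can be illustrated with a toy, pure-Python sketch. It assumes the original AWQ recipe ($s = \text{act\_scale}^{\alpha}$, with $\alpha$ swept over a grid in $[0, 1)$) and symmetric round-to-nearest quantization with one scale per output channel. The functions `act_scale`, `quantize`, `awq_quantize`, `awq_matmul` and this `WithAwqScale` dataclass are hypothetical illustrations, not the PR's actual API or implementation.

```python
from dataclasses import dataclass


def act_scale(x):
    """Calibration statistic (assumed): mean |activation| per input channel."""
    return [sum(abs(row[i]) for row in x) / len(x) for i in range(len(x[0]))]


def quantize(w, bits=4):
    """Symmetric round-to-nearest quantization, one scale per output channel.

    Returns integer weights q[in][out] and the per-output-channel scales.
    """
    qmax = 2 ** (bits - 1) - 1
    n_in, n_out = len(w), len(w[0])
    qscale = [max(abs(w[i][j]) for i in range(n_in)) / qmax or 1.0
              for j in range(n_out)]
    q = [[max(-qmax - 1, min(qmax, round(w[i][j] / qscale[j])))
          for j in range(n_out)] for i in range(n_in)]
    return q, qscale


@dataclass
class WithAwqScale:
    """Hypothetical stand-in: quantized data plus the per-input-channel AWQ scale."""
    q: list          # integer weights, indexed [in][out]
    qscale: list     # per-output-channel quantization scale
    awq_scale: list  # per-input-channel AWQ scale s


def matmul(x, w):
    return [[sum(xr[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]
            for xr in x]


def dequantize(p):
    return [[p.q[i][j] * p.qscale[j] for j in range(len(p.qscale))]
            for i in range(len(p.q))]


def awq_matmul(x, p):
    """Inference: divide inputs by s (compensation), then use dequantized diag(s) W."""
    x_comp = [[v / s for v, s in zip(row, p.awq_scale)] for row in x]
    return matmul(x_comp, dequantize(p))


def sq_error(a, b):
    return sum((u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb))


def awq_quantize(w, calib_x, bits=4, grid=20):
    """Grid-search alpha in {0, 1/grid, ...} for s = act_scale ** alpha."""
    ref = matmul(calib_x, w)
    stats = act_scale(calib_x)
    best_err, best = None, None
    for k in range(grid):
        alpha = k / grid
        s = [max(a, 1e-8) ** alpha for a in stats]
        # Scale row i (input channel i) of W by s[i] before quantizing.
        w_scaled = [[wij * si for wij in row] for row, si in zip(w, s)]
        q, qscale = quantize(w_scaled, bits)
        cand = WithAwqScale(q, qscale, s)
        err = sq_error(awq_matmul(calib_x, cand), ref)
        if best_err is None or err < best_err:
            best_err, best = err, cand
    return best
```

Because $\alpha = 0$ gives $s = 1$ (plain round-to-nearest), the search cannot do worse than unscaled quantization on the calibration data. Scaling up the weight rows that see large activations shrinks their relative rounding error, at the cost of coarser rounding for the other rows.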

@copybara-service force-pushed the `test_863102715` branch 3 times, most recently from e62f814 to 7009dc7, on February 4, 2026 04:28.
PiperOrigin-RevId: 865178716
@copybara-service merged commit f7101de into `main` on Feb 4, 2026.
@copybara-service deleted the `test_863102715` branch on February 4, 2026 04:32.