From 1ff85ae1725eeb96eca8087533de850fbfa7b756 Mon Sep 17 00:00:00 2001
From: Michael Tuttle
Date: Fri, 3 Oct 2025 14:38:07 -0700
Subject: [PATCH] Add AimetQuantization pass documentation

Signed-off-by: Michael Tuttle
---
 docs/source/features/quantization.md | 84 ++++++++++++++++++++++++++++
 docs/source/reference/pass.rst       |  7 +++
 2 files changed, 91 insertions(+)

diff --git a/docs/source/features/quantization.md b/docs/source/features/quantization.md
index 6df0c02679..a59d21e261 100644
--- a/docs/source/features/quantization.md
+++ b/docs/source/features/quantization.md
@@ -198,3 +198,87 @@ Olive consolidates the NVIDIA TensorRT Model Optimizer-Windows quantization into
 ```
 
 Please refer to [Phi3.5 example](https://github.com/microsoft/olive-recipes/tree/main/microsoft-Phi-3.5-mini-instruct/NvTensorRtRtx) for usability and setup details.
+
+
+## Quantize with AI Model Efficiency Toolkit
+Olive supports quantizing models with Qualcomm's [AI Model Efficiency Toolkit](https://github.com/quic/aimet) (AIMET).
+
+AIMET is a software toolkit for quantizing trained ML models to optimize deployment on edge devices such as mobile phones or laptops. AIMET employs post-training and fine-tuning techniques to minimize accuracy loss during quantization.
+
+Olive consolidates AIMET quantization into a single pass called `AimetQuantization`, which supports LPBQ, SeqMSE, and AdaRound. Multiple techniques can be applied in a single pass by listing them in the `techniques` array. If no techniques are specified, AIMET applies basic static quantization to the model using the provided data.
+
+| Technique    | Description |
+|--------------|-------------|
+| **LPBQ**     | An alternative to blockwise quantization that allows backends to leverage existing per-channel quantization kernels while significantly improving encoding granularity. |
+| **SeqMSE**   | Optimizes the weight encodings of each layer of a model to minimize the difference between the layer's original and quantized outputs. |
+| **AdaRound** | Tunes the rounding direction for quantized model weights to minimize the local quantization error at each layer's output. |
+
+### Example Configuration
+
+```json
+{
+    "type": "AimetQuantization",
+    "data_config": "calib_data_config"
+}
+```
+
+#### LPBQ
+
+Configurations:
+
+- `block_size`: Number of input channels to group in each block (default: `64`).
+- `op_types`: List of operator types for which to enable LPBQ (default: `["Gemm", "MatMul", "Conv"]`).
+- `nodes_to_exclude`: List of node names to exclude from LPBQ weight quantization (default: `None`).
+
+```json
+{
+    "type": "AimetQuantization",
+    "data_config": "calib_data_config",
+    "techniques": [
+        {"name": "lpbq", "block_size": 64}
+    ]
+}
+```
+
+#### SeqMSE
+
+Configurations:
+
+- `data_config`: Data config to use for SeqMSE optimization. Defaults to the calibration set if not specified.
+- `num_candidates`: Number of encoding candidates to sweep for each weight (default: `20`).
+
+```json
+{
+    "type": "AimetQuantization",
+    "data_config": "calib_data_config",
+    "precision": "int4",
+    "techniques": [
+        {"name": "seqmse", "num_candidates": 20}
+    ]
+}
+```
+
+#### AdaRound
+
+Configurations:
+
+- `num_iterations`: Number of optimization steps to take for each layer (default: `10000`). The recommended value is
+  10K for weight bitwidths >= 8 bits and 15K for weight bitwidths < 8 bits.
+- `nodes_to_exclude`: List of node names to exclude from AdaRound optimization (default: `None`).
+
+```json
+{
+    "type": "AimetQuantization",
+    "data_config": "calib_data_config",
+    "techniques": [
+        {"name": "adaround", "num_iterations": 10000, "nodes_to_exclude": ["/lm_head/MatMul"]}
+    ]
+}
+```
+
+Please refer to [AimetQuantization](aimet_quantization) for more details about the pass and its config parameters.
+
diff --git a/docs/source/reference/pass.rst b/docs/source/reference/pass.rst
index 2014577151..ab040074fc 100644
--- a/docs/source/reference/pass.rst
+++ b/docs/source/reference/pass.rst
@@ -194,6 +194,13 @@ ModelBuilder
 ------------
 .. autoconfigclass:: olive.passes.ModelBuilder
 
+.. _aimet_quantization:
+
+AimetQuantization
+-----------------
+
+.. autoconfigclass:: olive.passes.AimetQuantization
+
 Pytorch
 =================================