Version 2.11.0
-
New Feature
-
PyTorch
- SpinQuant (experimental) - implement the SpinQuant PTQ technique (https://arxiv.org/pdf/2308.13137) for the Llama, Qwen2, and Mistral families (R1 rotation without optimization) (7364b37)
- Enable AdaScale and OmniQuant for Mistral (d33e98c)
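
Note on the SpinQuant R1 rotation entry: the sketch below shows, under stated assumptions, why an orthogonal rotation can be folded into a linear layer's weights without changing its output. It is not AIMET's implementation. The normalized Hadamard matrix is only an assumed stand-in for a fixed (non-optimized) rotation, and `hadamard`, `matmul`, and `transpose` are hypothetical helper names used for illustration.

```python
import math
import random


def hadamard(n):
    """Normalized Hadamard matrix (Sylvester construction); n must be a power of two."""
    h = [[1.0]]
    while len(h) < n:
        # H_2n = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


random.seed(0)
d = 8                                   # hidden size (assumed, power of two)
R = hadamard(d)                         # orthogonal: R @ R^T = I

x = [[random.gauss(0, 1) for _ in range(d)]]                    # 1 x d activation
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(4)]  # out x in weight

y_ref = matmul(x, transpose(W))         # original layer: y = x W^T

x_rot = matmul(x, R)                    # rotated activation: x R
W_rot = matmul(W, R)                    # rotation folded into the weight: W R
y_rot = matmul(x_rot, transpose(W_rot)) # (x R)(W R)^T = x R R^T W^T = x W^T

max_err = max(abs(a - b) for a, b in zip(y_ref[0], y_rot[0]))
print("max abs difference:", max_err)
```

Because `R` is orthogonal, the rotated layer computes the same function as the original one up to floating-point error, while the rotated activations and weights may have a different value distribution for the quantizer to handle.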
-
ONNX
- Enable llm_configurator for Llama (Experimental) (08c17b8)
-
-
Bug fixes and Improvements
-
Common
-
ONNX
- Apply matmul exception rule only for integer quantization (bb93c76)
- Optimize blockwise min-max encoding analyzer (4febdd4)
- Remove explicit FP32 model creation inside AdaRound and optimize building sessions during the optimization process (b1415bd)
- Make Concat output quantizer inherit fixed input range (50f35dd)
- Enable output quantizers to inherit input encoding when tying encodings (3750526)
- Fix bug in CLE with bn_conv groups (654f4b1)
-
PyTorch
-
-
Documentation Updates
-
Known Issues
- Keras
- Accuracy drop observed with AIMET Keras for certain models. Fix is planned for the next release.
- Skipping 2.11 aimet-keras release due to regression
- Keras