Releases: qualcomm/aimet
Version 2.32.1
Full Changelog: 2.32.0...2.32.1
Version 2.32.0
Bug fixes and Improvements
ONNX
- Add C++ support for bfloat16 quantization (ca7d3e0)
- Fix large model support with protobuf 7.x (9ef2251)
- Skip QDQ pair scale/zp in duplicate_shared_initializers (05e8332)
- Handle Identity passthrough in duplicate_shared_initializers (1b27d98)
- Fix SpinQuant embed_tokens filter to exclude non-embedding Gathers (81b8041)
- Inline fused supergroups after encoding propagation (68fdcb6)
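For readers unfamiliar with the bfloat16 format mentioned in the ONNX entries above, the following is a minimal sketch of rounding a float32 value to the nearest bfloat16 value. It illustrates the number format only and is not AIMET's C++ implementation; the function name `to_bfloat16` is hypothetical, and the sketch assumes finite inputs within float32 range.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even).

    Hypothetical helper for illustration; assumes x is finite and
    representable as float32.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the upper 16 bits of the float32 pattern.
    # Adding 0x7FFF plus the lowest kept bit rounds ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1.00390625):
        print(value, "->", to_bfloat16(value))
```

Because bfloat16 keeps only 7 explicit mantissa bits, nearby float32 values collapse onto the same bfloat16 value.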
Torch
Version 2.31.0
New Features
Removed Features
Bug fixes and Improvements
ONNX
- Fuse supergroups to ONNX function nodes in QuantSim init (441ac6d)
- Enable ONNX initializer deduplication pass in torch>=2.12 (21dc8e0)
- Detect post-writing norm incompatibility in ONNX SpinQuant (85bdbdb)
- Remove incorrect entries from grid-preserving ops list (4007d7f)
- Set self.session = None to avoid double memory allocation during session rebuild (18664a4)
- Give fused supergroup nodes intuitive naming (a775b6e)
Torch
- Raise ValueError for unsupported architectures in PyTorch SpinQuant (a962614)
Documentation
Version 2.30.0
New Features
Bug fixes and Improvements
Documentation
Version 2.29.0
New Features
Bug fixes and Improvements
Common
- Make export_int32_bias default to True if encoding_version >= 2.0.0 (22876ca)
ONNX
- Optimize QDQ latency for fp16 models (c817a17)
- Support pattern matching LayerNormalization without bias (84f880a)
- Make from_onnx_export ignore unloadable encodings by default (1b50727)
- Enable loading models with redundant back-to-back QDQ using from_onnx_qdq (0f2be91)
- Skip folding BatchNormalization when the Conv layer has shared weights (8f552b7)
- Fix bug in standalone BatchNormalization fold with shared tensors (eb7ae4b)
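As background for the BatchNormalization folding entries above, here is a minimal sketch of the standard folding arithmetic for a linear layer followed by BatchNorm. It is not AIMET's code; the function names and list-based tensors are assumptions made for illustration.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel BatchNorm parameters into a linear layer.

    weight holds one row per output channel. Returns (folded_weight, folded_bias).
    Hypothetical helper for illustration only.
    """
    folded_w, folded_b = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)          # per-channel BatchNorm scale
        folded_w.append([w * s for w in row])
        folded_b.append((b - m) * s + bt)
    return folded_w, folded_b


def linear(weight, bias, x):
    """Plain matrix-vector product plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm applied per channel."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


if __name__ == "__main__":
    W, b = [[1.0, 2.0], [0.5, -1.0]], [0.1, -0.2]
    gamma, beta = [2.0, 0.5], [0.3, 0.0]
    mean, var = [0.5, -0.5], [4.0, 1.0]
    x = [1.5, -2.0]
    Wf, bf = fold_batchnorm(W, b, gamma, beta, mean, var)
    print(batchnorm(linear(W, b, x), gamma, beta, mean, var))
    print(linear(Wf, bf, x))  # same values up to floating-point error
```

Folding rewrites the layer's weights, so a weight tensor shared with another layer would change for that layer as well. That is presumably why folding is skipped when the Conv layer has shared weights.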
Torch
- Disable activation quantizers for re-used stateless nn.Modules (8f552b7)
Version 2.28.0
Version 2.27.0
Bug fixes and Improvements
ONNX
- Add force_activation_as option to export APIs to control activation signedness (3583462)
- Add
Torch
- Reduce quantize-dequantize latency overhead (9ca3bf4, 525e993, b3de9a2)
- Optimize inference speed for GenAITests models (cacd5cc, b6ea5bd, 30ab60a)
- Allow checkpointing and loading during SeqMSE optimization (4eb97f0)
- Fix SeqMSE error when model contains unquantized Conv/Linear layers (3dd4ca9)
- Populate scalar constant Mul/Div output encodings at export (1228394, 169952d, ca2a324)
- Propagate tensor encodings through scalar Mul/Div operations (54c7462, 2cfd07e)
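The idea behind propagating encodings through scalar Mul/Div can be sketched as follows. This is a generic illustration of affine quantization arithmetic, not AIMET's implementation: the helper names are hypothetical, the sketch covers positive scalars only, and it assumes the convention real = scale * (q - zero_point).

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to an integer code (Python's round: ties to even)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to a real value."""
    return (q - zero_point) * scale


def propagate_through_scalar_mul(scale, zero_point, c):
    """Encoding of y = c * x given the encoding of x, for a scalar c > 0.

    Hypothetical helper: the integer codes stay the same and only the
    scale is multiplied by c. Division by c is multiplication by 1 / c.
    """
    if c <= 0:
        raise ValueError("this sketch only handles positive scalars")
    return scale * c, zero_point


if __name__ == "__main__":
    scale, zp, c = 0.5, 10, 2.0
    q = quantize(3.0, scale, zp)
    out_scale, out_zp = propagate_through_scalar_mul(scale, zp, c)
    # The same code q now decodes to c * 3.0 under the propagated encoding.
    print(q, dequantize(q, scale, zp), dequantize(q, out_scale, out_zp))
```

Since the output encoding follows directly from the input encoding, no separate calibration of the Mul/Div output is needed in this simplified setting.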
Common
- Propagate concat input quantizers to output when possible (5ee0f13)
Version 2.26.0
Bug fixes and Improvements
ONNX
- Implement onnxscript RMSNorm fusion for improved graph optimization (68710d9)
- Propagate encoding through Concat during ONNX QDQ export (4811a34)
- Export scale & offset as initializers instead of Constants in ONNX QDQ export (ea9a619)
- Fix AdaScale (aimet-onnx) for Qwen3 models (beac8f8)
- Fix BN fold for YOLO models (bae9953)
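For context on the RMSNorm fusion entry above, this is a minimal reference for what RMSNorm computes. Exported graphs commonly express it as a chain of elementwise and reduction ops, which a fusion pass can replace with a single node. The function name and the default `eps` are assumptions for illustration, not values taken from AIMET.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    out = rms_norm([3.0, 4.0], [1.0, 1.0])
    print(out)
    # With unit weights, the mean square of the output is close to 1.
    print(sum(v * v for v in out) / len(out))
```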
Torch
- Support int2 ONNX QDQ export (5fa79cf)
- Significantly improve ONNX export performance by eliminating O(N^2) iterations and redundant Q/DQ operations (695465e, cb1f9ae, abe0ef5, c547cfb, 310b43d, fb9629d, d54efa0)
- Add native support for Qwen3 MoE models (389d71f, ab6e810)
- Fix Triton kernel bug with transposed inputs (a1f6795)
- Fix GPU memory leak in AdaScale optimization loop (f52f2e2)
- Fix AdaScale device error with caching disabled (964d11f)
- Work around torch.compile bugs and exclude internal quantization methods from compilation (b8bcb47, d518f35)
- Fix tie quantizers removing relu encoding constraint (3cc7252)
- Fail immediately without retrying upon torch.cuda.OutOfMemoryError (4f84eb1)
- Release blockwise sampler input memory before yielding to reduce memory usage (ee3d193)
- Add aimet_torch.v1 end-of-life warning (8fc52c6)
- Use whitelist approach for enabling per-channel quantization in quantsim config (817d3b1)
Common
Documentation
Version 2.25.1
Version 2.25.0
Bug fixes and Improvements
ONNX
- Reduced peak CPU memory usage for AdaScale and SeqMSE techniques (28f89a7)
- Reduced peak CUDA memory usage for AdaScale technique (a29f44f)
- Added support for Qwen3 VL models in GenAITests (c014961)
- ONNX-IR based supergroup pattern detection and replacement (9972c1b)
- Tie concat and interpolation ops by default (a8ac6f4)
Torch
- Fix ONNX QDQ export with control flow ops (ae1abd1)
- Use Triton kernels by default if available (3adcbee)
- Introduces block_size parameter to EncodingAnalyzer (e250abd)
- Always export encodings as uint (ae7d5ef)
- float4/8 QDQ export support (135a0af)
- Support loading zero_point_shift with sim.load_encodings() (624ba30)
- Support built-in quantization of SyncBatchNorm (1e8eceb)
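One Torch entry above concerns exporting encodings as unsigned integers. The sketch below shows, in generic terms, why a signed encoding can be re-expressed as an unsigned one without changing the represented values. It assumes the convention real = scale * (q - zero_point); AIMET's exported format and offset convention may differ, and the helper name is hypothetical.

```python
def signed_to_unsigned(q_signed, zero_point_signed, bitwidth=8):
    """Shift a signed integer code and its zero point onto the unsigned grid.

    Hypothetical helper: both values move by 2**(bitwidth - 1), so the
    difference (q - zero_point), and with it the real value, is unchanged.
    """
    shift = 1 << (bitwidth - 1)
    return q_signed + shift, zero_point_signed + shift


if __name__ == "__main__":
    scale = 0.1
    q_s, zp_s = -28, -128                      # int8 code and zero point
    q_u, zp_u = signed_to_unsigned(q_s, zp_s)  # uint8 equivalents
    # Both encodings decode to the same real value.
    print(scale * (q_s - zp_s), scale * (q_u - zp_u))
```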