Version 2.26.0
-
Bug fixes and Improvements
-
ONNX
- Implement onnxscript RMSNorm fusion for improved graph optimization (68710d9)
- Propagate encoding through Concat during ONNX QDQ export (4811a34)
- Export scale & offset as initializers instead of Constants in ONNX QDQ export (ea9a619)
- Fix AdaScale (aimet-onnx) for Qwen3 models (beac8f8)
- Fix BN fold for YOLO models (bae9953)
-
Torch
- Support int2 ONNX QDQ export (5fa79cf)
- Significantly improve ONNX export performance by eliminating O(N^2) iterations and redundant Q/DQ operations (695465e, cb1f9ae, abe0ef5, c547cfb, 310b43d, fb9629d, d54efa0)
- Add native support for Qwen3 MoE models (389d71f, ab6e810)
- Fix Triton kernel bug with transposed inputs (a1f6795)
- Fix GPU memory leak in AdaScale optimization loop (f52f2e2)
- Fix AdaScale device error with caching disabled (964d11f)
- Work around torch.compile bugs and exclude internal quantization methods from compilation (b8bcb47, d518f35)
- Fix tie quantizers incorrectly removing the ReLU encoding constraint (3cc7252)
- Fail immediately on torch.cuda.OutOfMemoryError instead of retrying (4f84eb1)
- Release blockwise sampler input memory before yielding to reduce memory usage (ee3d193)
- Add aimet_torch.v1 end-of-life warning (8fc52c6)
- Use whitelist approach for enabling per-channel quantization in quantsim config (817d3b1)
-
Common
-
-
Documentation