
Version 2.26.0


aimetci released this 09 Mar 18:00
  • Bug fixes and Improvements

    • ONNX

      • Implement RMSNorm fusion using onnxscript, replacing the decomposed RMSNorm subgraph with a single fused node (68710d9)
      • Propagate encoding through Concat during ONNX QDQ export (4811a34)
      • Export scale & offset as initializers instead of Constant nodes in ONNX QDQ export (ea9a619)
      • Fix AdaScale (aimet-onnx) for Qwen3 models (beac8f8)
      • Fix BN fold for YOLO models (bae9953)
    • Torch

      • Support int2 ONNX QDQ export (5fa79cf)
      • Significantly improve ONNX export performance by eliminating O(N^2) iterations and redundant Q/DQ operations (695465e, cb1f9ae, abe0ef5, c547cfb, 310b43d, fb9629d, d54efa0)
      • Add native support for Qwen3 MoE models (389d71f, ab6e810)
      • Fix Triton kernel bug with transposed inputs (a1f6795)
      • Fix GPU memory leak in AdaScale optimization loop (f52f2e2)
      • Fix AdaScale device error with caching disabled (964d11f)
      • Work around torch.compile bugs and exclude internal quantization methods from compilation (b8bcb47, d518f35)
      • Fix quantizer tying so it no longer removes the ReLU encoding constraint (3cc7252)
      • Fail immediately, without retrying, on torch.cuda.OutOfMemoryError (4f84eb1)
      • Release blockwise sampler input memory before yielding to reduce memory usage (ee3d193)
      • Add aimet_torch.v1 end-of-life warning (8fc52c6)
      • Use a whitelist approach for enabling per-channel quantization in the quantsim config (817d3b1)
    • Common

      • Tie concat and interpolation op quantizers by default with safe edge case handling (5ce7229, 5084af3)
      • Implement supergroup unrolling without name mangling (e351112)
      • Treat CG_split as a grid-preserving op (738ee26)
      • Handle dynamic matmul add in connected graph passes (3c0de8e)
  • Documentation

    • Add AdaScale documentation with HuggingFace LLM example (c403562)
    • Update doc code examples to use aimet_torch.onnx.export (fed2a06)
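The int2 QDQ export and scale/offset items above both concern the quantize-dequantize (QDQ) pair. As a rough illustration only, the sketch below shows the arithmetic at 2-bit width, assuming ONNX QuantizeLinear/DequantizeLinear-style semantics (round half to even, then saturate) and a signed int2 range of [-2, 1]. The helper names `quantize_int2`, `dequantize` and `qdq` are hypothetical; this is not AIMET's export code, and AIMET's own offset sign convention may differ from the ONNX zero point used here.

```python
"""Hypothetical sketch of QDQ arithmetic at 2-bit width.

Assumes ONNX-style semantics: q = saturate(round(x / scale) + zero_point)
and x_hat = (q - zero_point) * scale, with round-half-to-even.
"""

INT2_MIN, INT2_MAX = -2, 1  # assumed signed 2-bit range


def quantize_int2(x: float, scale: float, zero_point: int = 0) -> int:
    # Python's round() rounds half to even, like QuantizeLinear
    q = round(x / scale) + zero_point
    # Saturate to the representable int2 range
    return max(INT2_MIN, min(INT2_MAX, q))


def dequantize(q: int, scale: float, zero_point: int = 0) -> float:
    return (q - zero_point) * scale


def qdq(x: float, scale: float, zero_point: int = 0) -> float:
    # Quantize then immediately dequantize, as a Q/DQ node pair does
    return dequantize(quantize_int2(x, scale, zero_point), scale, zero_point)


if __name__ == "__main__":
    values = [-1.2, -0.3, 0.25, 0.4, 2.0]
    print([quantize_int2(v, 0.5) for v in values])
    print([qdq(v, 0.5) for v in values])
```

With only four levels, values outside the grid saturate quickly (2.0 maps to the same code as 0.4 here), which is why the choice of scale matters far more at 2 bits than at 8.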
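The Concat items above (encoding propagation and tying concat quantizers by default) rely on every Concat input sharing one encoding with the output. A minimal sketch of that idea, under stated assumptions: an 8-bit asymmetric encoding derived from the union of the input ranges, with hypothetical helper names (`compute_encoding`, `tied_concat_encoding`, `quantize`) that are not AIMET's API and do not cover the edge-case handling or interpolation ops mentioned in the notes.

```python
"""Hypothetical sketch: one shared (tied) 8-bit encoding for all Concat inputs.

With a single encoding, each input's integer codes are already valid for the
Concat output, so no requantization is needed between inputs and output.
"""
from typing import List, Tuple


def compute_encoding(vmin: float, vmax: float, bitwidth: int = 8) -> Tuple[float, int]:
    # Widen the range to include 0.0 so zero is exactly representable
    vmin, vmax = min(vmin, 0.0), max(vmax, 0.0)
    qmax = 2 ** bitwidth - 1
    scale = (vmax - vmin) / qmax if vmax > vmin else 1.0
    zero_point = round(-vmin / scale)
    return scale, zero_point


def tied_concat_encoding(inputs: List[List[float]], bitwidth: int = 8) -> Tuple[float, int]:
    # Tie quantizers: derive one encoding from the union of all input ranges
    vmin = min(min(t) for t in inputs)
    vmax = max(max(t) for t in inputs)
    return compute_encoding(vmin, vmax, bitwidth)


def quantize(x: float, scale: float, zero_point: int, bitwidth: int = 8) -> int:
    q = round(x / scale) + zero_point
    return max(0, min(2 ** bitwidth - 1, q))


if __name__ == "__main__":
    a, b = [0.0, 1.0, 2.0], [-1.0, 0.5]
    scale, zp = tied_concat_encoding([a, b])
    # Quantizing each input and then concatenating yields the same codes as
    # quantizing the concatenated tensor with the output encoding.
    parts = [quantize(v, scale, zp) for v in a] + [quantize(v, scale, zp) for v in b]
    whole = [quantize(v, scale, zp) for v in a + b]
    print(scale, zp, parts == whole)
```

The trade-off of tying is that an input with a narrow range is quantized on a coarser grid than it would get on its own, in exchange for removing the requantization step at the Concat.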
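The RMSNorm fusion item refers to collapsing the chain of elementwise and reduction ops that RMSNorm is typically exported as into a single node. As a hedged illustration (pure Python, hypothetical function names, and an assumed op sequence in the comments; not the onnxscript rewrite rule itself), the two functions below compute the same values, which is the equivalence such a fusion depends on.

```python
"""Hypothetical sketch: decomposed RMSNorm vs. a single-op reference.

The op names in the comments are an assumed, typical export pattern; the
exact subgraph matched by the onnxscript fusion is not shown here.
"""
import math
from typing import List


def rmsnorm_decomposed(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    sq = [v ** 2 for v in x]                        # Pow(x, 2)
    mean = sum(sq) / len(sq)                        # ReduceMean over the last axis
    inv_rms = 1.0 / math.sqrt(mean + eps)           # Add(eps) -> Sqrt -> Reciprocal
    normed = [v * inv_rms for v in x]               # Mul
    return [n * w for n, w in zip(normed, weight)]  # Mul(weight)


def rmsnorm_fused(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    # Single-op reference: y = x / sqrt(mean(x^2) + eps) * weight
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    x, w = [1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0, 1.0]
    d, f = rmsnorm_decomposed(x, w), rmsnorm_fused(x, w)
    print(all(math.isclose(a, b) for a, b in zip(d, f)))
```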