Releases: qualcomm/aimet

Version 2.32.1

04 Jun 23:34

Version 2.32.0

03 Jun 16:25

  • Bug fixes and Improvements
    • ONNX

      • Add C++ support for bfloat16 quantization (ca7d3e0)
      • Fix large model support with protobuf 7.x (9ef2251)
      • Skip QDQ pair scale/zp in duplicate_shared_initializers (05e8332)
      • Handle Identity passthrough in duplicate_shared_initializers (1b27d98)
      • Fix SpinQuant embed_tokens filter to exclude non-embedding Gathers (81b8041)
      • Inline fused supergroups after encoding propagation (68fdcb6)
    • Torch

      • Disable output quantizers of reused modules before output encoding propagation (b66b9a1)
      • Inline Q/DQ nodes statically without re-invoking torch.export (de79ae4)
      • Stop incorrect encoding propagation through non-grid-preserving ops (66b2834)

Version 2.31.0

20 May 00:27

  • New Features

    • ONNX

      • Support Qwen 3VL in AdaScale ONNX (35d2440)
    • Torch

      • Add Gemma 3 support for AdaScale (a2da0de)
      • LoRA integration (0b90d8a)
  • Removed Features

  • Bug fixes and Improvements

    • ONNX

      • Fuse supergroups to ONNX function nodes in QuantSim init (441ac6d)
      • Enable ONNX initializer deduplication pass in torch>=2.12 (21dc8e0)
      • Detect post-writing norm incompatibility in ONNX SpinQuant (85bdbdb)
      • Remove incorrect entries from grid-preserving ops list (4007d7f)
      • Set self.session = None to avoid double memory allocation during rebuild session (18664a4)
      • Give fused supergroup nodes intuitive naming (a775b6e)
    • Torch

      • Raise ValueError for unsupported architectures in PyTorch SpinQuant (a962614)
  • Documentation

    • Add zero_point_shift to 1.0.0 encoding spec documentation (094dead)
    • Add float8/float4 encoding to 2.0.0 spec documentation (02e75aa)

Version 2.30.0

05 May 17:41

  • New Features

    • ONNX

      • Extend SpinQuant support for Vision-Language Models (VLM) (e5cd628)
    • Torch

      • Remove legacy aimet_torch v1; v2 is now the sole API (4192da7, 00e3c72)
  • Bug fixes and Improvements

    • ONNX

      • Improve set_and_freeze_param_encodings (39bf1f6, d320cae)
      • Optimize GPU calibration for fp16 models (d7faee0)
      • Only save model with external data if necessary (29fbef2)
    • Common

      • Fix exception rule bug with mixed precision (c9add22)
  • Documentation

    • Add SpinQuant ONNX documentation and examples (37454b3)
    • Document 2.0.0 encoding specification (7e54291)

Version 2.29.0

20 Apr 18:53

  • New Features

    • ONNX

      • Add support for Qwen 2.5 VL in aimet-onnx (f256686)
    • Torch

      • Support OOTB quantization of nn.MultiHeadAttention (4d19f47)
      • Support OOTB quantization of Qwen 3.5 normalization layers (01b912f)
      • Support OOTB quantization of InternVL GELU (c5f65b7)
  • Bug fixes and Improvements

    • Common

      • Make export_int32_bias default to True if encoding_version >= 2.0.0 (22876ca)
    • ONNX

      • Optimize QDQ latency for fp16 models (c817a17)
      • Support pattern matching LayerNormalization without bias (84f880a)
      • Make from_onnx_export ignore unloadable encodings by default (1b50727)
      • Enable loading models with redundant back-to-back QDQ using from_onnx_qdq (0f2be91)
      • Skip folding BatchNormalization when the Conv layer has shared weights (8f552b7)
      • Fix bug in standalone BatchNormalization fold with shared tensors (eb7ae4b)
    • Torch

      • Disable activation quantizers for re-used stateless nn.Modules (8f552b7)

Version 2.28.0

06 Apr 18:03

  • New Features

    • Torch

      • Add resumable checkpointing for AdaScale optimization (20ecb0a)
    • Common

      • Migrate pybind11 bindings to Cython using Python's Stable ABI to enable Python-version-independent wheels (0d6f856)
  • Bug fixes and Improvements

    • Torch
      • Fix rescale encodings not propagating with shared scale values (d9f3a90)
  • Documentation

    • Update docs and examples to use new API for setting lm_head precision (ac3e11e)

Version 2.27.0

26 Mar 05:40

  • Bug fixes and Improvements
    • ONNX

      • Add force_activation_as option to export APIs to control activation signedness (3583462)
    • Torch

      • Reduce quantize-dequantize latency overhead (9ca3bf4, 525e993, b3de9a2)
      • Optimize inference speed for GenAITests models (cacd5cc, b6ea5bd, 30ab60a)
      • Allow checkpointing and loading during SeqMSE optimization (4eb97f0)
      • Fix SeqMSE error when model contains unquantized Conv/Linear layers (3dd4ca9)
      • Populate scalar constant Mul/Div output encodings at export (1228394, 169952d, ca2a324)
      • Propagate tensor encodings through scalar Mul/Div operations (54c7462, 2cfd07e)
    • Common

      • Propagate concat input quantizers to output when possible (5ee0f13)

Version 2.26.0

09 Mar 18:00

  • Bug fixes and Improvements

    • ONNX

      • Implement onnxscript RMSNorm fusion for improved graph optimization (68710d9)
      • Propagate encoding through Concat during ONNX QDQ export (4811a34)
      • Export scale & offset as initializers instead of Constants in ONNX QDQ export (ea9a619)
      • Fix AdaScale (aimet-onnx) for Qwen3 models (beac8f8)
      • Fix BN fold for YOLO models (bae9953)
    • Torch

      • Support int2 ONNX QDQ export (5fa79cf)
      • Significantly improve ONNX export performance by eliminating O(N^2) iterations and redundant Q/DQ operations (695465e, cb1f9ae, abe0ef5, c547cfb, 310b43d, fb9629d, d54efa0)
      • Add native support for Qwen3 MoE models (389d71f, ab6e810)
      • Fix Triton kernel bug with transposed inputs (a1f6795)
      • Fix GPU memory leak in AdaScale optimization loop (f52f2e2)
      • Fix AdaScale device error with caching disabled (964d11f)
      • Work around torch.compile bugs and exclude internal quantization methods from compilation (b8bcb47, d518f35)
      • Fix tie quantizers removing relu encoding constraint (3cc7252)
      • Fail immediately without retrying upon torch.cuda.OutOfMemoryError (4f84eb1)
      • Release blockwise sampler input memory before yielding to reduce memory usage (ee3d193)
      • Add aimet_torch.v1 end-of-life warning (8fc52c6)
      • Use whitelist approach for enabling per-channel quantization in quantsim config (817d3b1)
    • Common

      • Tie concat and interpolation op quantizers by default with safe edge case handling (5ce7229, 5084af3)
      • Implement supergroup unrolling without name mangling (e351112)
      • Treat CG_split as grid-preserving op (738ee26)
      • Handle dynamic matmul add in connected graph passes (3c0de8e)
  • Documentation

    • Add AdaScale documentation with HuggingFace LLM example (c403562)
    • Update doc code examples to use aimet_torch.onnx.export (fed2a06)

Version 2.25.1

03 Mar 16:21

  • Bug fixes and Improvements
    • ONNX

      • Fix encoding propagation for concat layers (5084af3)
    • Torch

      • Reduce GPU RAM usage of AdaScale for the Qwen 3 VL model (ee3d193)

Version 2.25.0

25 Feb 17:22

  • Bug fixes and Improvements
    • ONNX

      • Reduce peak CPU memory usage for AdaScale and SeqMSE techniques (28f89a7)
      • Reduce peak CUDA memory usage for AdaScale technique (a29f44f)
      • Add support for Qwen3 VL models in GenAITests (c014961)
      • Add ONNX-IR-based supergroup pattern detection and replacement (9972c1b)
      • Tie concat and interpolation ops by default (a8ac6f4)
    • Torch

      • Fix ONNX QDQ export with control flow ops (ae1abd1)
      • Use Triton kernels by default if available (3adcbee)
      • Introduce block_size parameter to EncodingAnalyzer (e250abd)
      • Always export encodings as uint (ae7d5ef)
      • Support float4/float8 QDQ export (135a0af)
      • Support loading zero_point_shift with sim.load_encodings() (624ba30)
      • Support built-in quantization of SyncBatchNorm (1e8eceb)
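
Note on "Always export encodings as uint" (ae7d5ef): the sketch below is a generic illustration of why a signed integer grid can be re-expressed as an unsigned one without changing the represented real values. It does not use the AIMET API. The helper names (`dequantize`, `signed_to_unsigned`) are hypothetical, and the dequantization convention `x = scale * (q - zero_point)` is an assumption made for this example only.

```python
def dequantize(q, scale, zero_point):
    # Assumed convention for this sketch: real value = scale * (q - zero_point).
    return scale * (q - zero_point)


def signed_to_unsigned(q_signed, zero_point_signed, bitwidth=8):
    # Shifting both the quantized value and the zero point by 2**(bitwidth - 1)
    # moves the grid from [-2**(b-1), 2**(b-1) - 1] to [0, 2**b - 1].
    shift = 1 << (bitwidth - 1)
    return q_signed + shift, zero_point_signed + shift


scale = 0.05
zero_point_signed = -3

for q in (-128, -3, 0, 127):
    q_u, zp_u = signed_to_unsigned(q, zero_point_signed)
    # The shifted value lies on the unsigned 8-bit grid...
    assert 0 <= q_u <= 255
    # ...and dequantizes to exactly the same real value.
    assert dequantize(q, scale, zero_point_signed) == dequantize(q_u, scale, zp_u)
```

Because the shift cancels inside `q - zero_point`, the scale is unchanged and only the stored integers and the zero point move.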