04 Jun 23:34

aimet-bot

2.32.1

391e60f

Version 2.32.1 Latest

Full Changelog: 2.32.0...2.32.1

03 Jun 16:25

aimet-bot

2.32.0

9c2fbf1

Version 2.32.0

Bug fixes and Improvements
- ONNX
  - Add C++ support for bfloat16 quantization (ca7d3e0)
  - Fix large model support with protobuf 7.x (9ef2251)
  - Skip QDQ pair scale/zp in duplicate_shared_initializers (05e8332)
  - Handle Identity passthrough in duplicate_shared_initializers (1b27d98)
  - Fix SpinQuant embed_tokens filter to exclude non-embedding Gathers (81b8041)
  - Inline fused supergroups after encoding propagation (68fdcb6)
- Torch
  - Disable output quantizers of reused modules before output encoding propagation (b66b9a1)
  - Inline Q/DQ nodes statically without re-invoking torch.export (de79ae4)
  - Stop incorrect encoding propagation through non-grid-preserving ops (66b2834)
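
The last Torch item turns on which ops are "grid-preserving". A minimal sketch, assuming the term means that every output value is one of the op's input values, so the input's scale and zero point stay exact for the output. The helper names are hypothetical and this is plain Python, not AIMET code. It contrasts max pooling (outputs are picked from the inputs) with average pooling (outputs can fall between grid points).

```python
def quantize_dequantize(x, scale, zero_point, qmin=0, qmax=255):
    """Snap x onto the affine grid {scale * (q - zero_point)}."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))
    return scale * (q - zero_point)

def on_grid(x, scale, zero_point):
    """True if x is already representable on the grid (up to float error)."""
    return abs(quantize_dequantize(x, scale, zero_point) - x) < 1e-9

def max_pool_1d(xs, window=2):
    return [max(xs[i:i + window]) for i in range(0, len(xs), window)]

def avg_pool_1d(xs, window=2):
    return [sum(xs[i:i + window]) / window for i in range(0, len(xs), window)]

scale, zero_point = 0.5, 128
# Quantized activations: every value lies on the grid.
inputs = [quantize_dequantize(v, scale, zero_point) for v in (0.2, 1.4, -0.9, 2.6)]

# Max pooling only selects existing values, so the input encoding still fits.
max_ok = all(on_grid(v, scale, zero_point) for v in max_pool_1d(inputs))
# Averaging creates new values, so reusing the input encoding would be wrong.
avg_ok = all(on_grid(v, scale, zero_point) for v in avg_pool_1d(inputs))
```

Under this reading, propagating an encoding through the averaging op would assign it a grid its outputs do not actually lie on.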

20 May 00:27

bhushan23

2.31.0

8c4cbf2

Version 2.31.0

New Features
- ONNX
  - Support Qwen 3 VL in AdaScale for ONNX (35d2440)
- Torch
  - Add Gemma 3 support for AdaScale (a2da0de)
  - LoRA integration (0b90d8a)
Removed Features
- Torch
  - Delete AutoQuant (a238275)
  - Delete bias correction (a238275)
  - Delete quantizable transformer (a238275)
  - Delete winnow (a238275)
Bug fixes and Improvements
- ONNX
  - Fuse supergroups to ONNX function nodes in QuantSim init (441ac6d)
  - Enable ONNX initializer deduplication pass in torch>=2.12 (21dc8e0)
  - Detect post-writing norm incompatibility in ONNX SpinQuant (85bdbdb)
  - Remove incorrect entries from grid-preserving ops list (4007d7f)
  - Set self.session = None to avoid double memory allocation during rebuild session (18664a4)
  - Give fused supergroup nodes intuitive naming (a775b6e)
- Torch
  - Raise ValueError for unsupported architectures in PyTorch SpinQuant (a962614)
Documentation
- Add zero_point_shift to 1.0.0 encoding spec documentation (094dead)
- Add float8/float4 encoding to 2.0.0 spec documentation (02e75aa)

05 May 17:41

aimetci

2.30.0

cb18354

Version 2.30.0

New Features
- ONNX
  - Extend SpinQuant support for Vision-Language Models (VLM) (e5cd628)
- Torch
  - Remove legacy aimet_torch v1; v2 is now the sole API (4192da7, 00e3c72)
Bug fixes and Improvements
- ONNX
  - Improve set_and_freeze_param_encodings (39bf1f6, d320cae)
  - Optimize GPU calibration for fp16 models (d7faee0)
  - Only save model with external data if necessary (29fbef2)
- Common
  - Fix exception rule bug with mixed precision (c9add22)
Documentation
- Add SpinQuant ONNX documentation and examples (37454b3)
- Document 2.0.0 encoding specification (7e54291)

20 Apr 18:53

aimetci

2.29.0

f5666dd

Version 2.29.0

New Features
- ONNX
  - Add support for Qwen 2.5 VL in aimet-onnx (f256686)
- Torch
  - Support OOTB quantization of nn.MultiheadAttention (4d19f47)
  - Support OOTB quantization of Qwen 3.5 normalization layers (01b912f)
  - Support OOTB quantization of InternVL GELU (c5f65b7)
Bug fixes and Improvements
- Common
  - Make export_int32_bias default to True if encoding_version >= 2.0.0 (22876ca)
- ONNX
  - Optimize QDQ latency for fp16 models (c817a17)
  - Support pattern matching LayerNormalization without bias (84f880a)
  - Make from_onnx_export ignore unloadable encodings by default (1b50727)
  - Enable loading models with redundant back-to-back QDQ using from_onnx_qdq (0f2be91)
  - Skip folding BatchNormalization when the Conv layer has shared weights (8f552b7)
  - Fix bug in standalone BatchNormalization fold with shared tensors (eb7ae4b)
- Torch
  - Disable activation quantizers for re-used stateless nn.Modules (8f552b7)
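
Two of the ONNX items above concern BatchNormalization folding when a weight tensor is shared. A minimal sketch of why sharing matters, assuming the standard fold W' = W * gamma / sqrt(var + eps) and b' = (b - mean) * gamma / sqrt(var + eps) + beta. It models a single-channel 1x1 convolution as a scalar multiply-add in plain Python. All names are hypothetical and none of this is AIMET's implementation.

```python
import math

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (weight', bias') such that conv'(x) == bn(conv(x))."""
    factor = gamma / math.sqrt(var + eps)
    return weight * factor, (bias - mean) * factor + beta

def conv(x, weight, bias):
    # Stand-in for a single-channel 1x1 convolution.
    return weight * x + bias

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

shared_weight, bias = 2.0, 0.5
bn = dict(gamma=1.5, beta=-0.25, mean=0.3, var=4.0)
x = 3.0

# Folding reproduces Conv followed by BatchNormalization with one Conv.
reference = batch_norm(conv(x, shared_weight, bias), **bn)
folded_weight, folded_bias = fold_bn(shared_weight, bias, **bn)
folded = conv(x, folded_weight, folded_bias)

# A second Conv reuses shared_weight but has no BatchNormalization after it.
other_before = conv(x, shared_weight, 0.0)
# What it would compute if the fold overwrote the shared tensor in place.
other_after = conv(x, folded_weight, 0.0)
```

Here `folded` matches `reference`, while `other_after` differs from `other_before`, which is the hazard that skipping the fold (or duplicating the tensor first) avoids.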

06 Apr 18:03

aimetci

2.28.0

53dbee9

Version 2.28.0

New Features
- Torch
  - Add resumable checkpointing for AdaScale optimization (20ecb0a)
- Common
  - Migrate pybind11 bindings to Cython using Python's Stable ABI to enable Python-version-independent wheels (0d6f856)
Bug fixes and Improvements
- Torch
  - Fix rescale encodings not propagating with shared scale values (d9f3a90)
Documentation
- Update docs and examples to use new API for setting lm_head precision (ac3e11e)

26 Mar 05:40

aimetci

2.27.0

f87b96a

Version 2.27.0

Bug fixes and Improvements
- ONNX
  - Add force_activation_as option to export APIs to control activation signedness (3583462)
- Torch
  - Reduce quantize-dequantize latency overhead (9ca3bf4, 525e993, b3de9a2)
  - Optimize inference speed for GenAITests models (cacd5cc, b6ea5bd, 30ab60a)
  - Allow checkpointing and loading during SeqMSE optimization (4eb97f0)
  - Fix SeqMSE error when model contains unquantized Conv/Linear layers (3dd4ca9)
  - Populate scalar constant Mul/Div output encodings at export (1228394, 169952d, ca2a324)
  - Propagate tensor encodings through scalar Mul/Div operations (54c7462, 2cfd07e)
- Common
  - Propagate concat input quantizers to output when possible (5ee0f13)

09 Mar 18:00

aimetci

2.26.0

8106fb3

Version 2.26.0

Bug fixes and Improvements
- ONNX
  - Implement onnxscript RMSNorm fusion for improved graph optimization (68710d9)
  - Propagate encoding through Concat during ONNX QDQ export (4811a34)
  - Export scale & offset as initializers instead of Constants in ONNX QDQ export (ea9a619)
  - Fix AdaScale (aimet-onnx) for Qwen3 models (beac8f8)
  - Fix BN fold for YOLO models (bae9953)
- Torch
  - Support int2 ONNX QDQ export (5fa79cf)
  - Significantly improve ONNX export performance by eliminating O(N^2) iterations and redundant Q/DQ operations (695465e, cb1f9ae, abe0ef5, c547cfb, 310b43d, fb9629d, d54efa0)
  - Add native support for Qwen3 MoE models (389d71f, ab6e810)
  - Fix Triton kernel bug with transposed inputs (a1f6795)
  - Fix GPU memory leak in AdaScale optimization loop (f52f2e2)
  - Fix AdaScale device error with caching disabled (964d11f)
  - Work around torch.compile bugs and exclude internal quantization methods from compilation (b8bcb47, d518f35)
  - Fix quantizer tying removing the ReLU encoding constraint (3cc7252)
  - Fail immediately without retrying upon torch.cuda.OutOfMemoryError (4f84eb1)
  - Release blockwise sampler input memory before yielding to reduce memory usage (ee3d193)
  - Add aimet_torch.v1 end-of-life warning (8fc52c6)
  - Use whitelist approach for enabling per-channel quantization in quantsim config (817d3b1)
- Common
  - Tie concat and interpolation op quantizers by default with safe edge case handling (5ce7229, 5084af3)
  - Implement supergroup unrolling without name mangling (e351112)
  - Treat CG_split as grid-preserving op (738ee26)
  - Handle dynamic matmul add in connected graph passes (3c0de8e)
Documentation
- Add AdaScale documentation with HuggingFace LLM example (c403562)
- Update doc code examples to use aimet_torch.onnx.export (fed2a06)

03 Mar 16:21

aimetci

2.25.1

6eaf9c7

Version 2.25.1

Bug fixes and Improvements
- ONNX
  - Fix encoding propagation for concat layers (5084af3)
- Torch
  - Reduce GPU memory usage of AdaScale for the Qwen 3 VL model (ee3d193)

25 Feb 17:22

aimetci

2.25.0

cb9bb8e

Version 2.25.0

Bug fixes and Improvements
- ONNX
  - Reduce peak CPU memory usage for AdaScale and SeqMSE (28f89a7)
  - Reduce peak CUDA memory usage for AdaScale (a29f44f)
  - Add support for Qwen3 VL models in GenAITests (c014961)
  - Add ONNX-IR based supergroup pattern detection and replacement (9972c1b)
  - Tie concat and interpolation ops by default (a8ac6f4)
- Torch
  - Fix ONNX QDQ export with control flow ops (ae1abd1)
  - Use Triton kernels by default if available (3adcbee)
  - Add block_size parameter to EncodingAnalyzer (e250abd)
  - Always export encodings as uint (ae7d5ef)
  - Support float4/float8 QDQ export (135a0af)
  - Support loading zero_point_shift with sim.load_encodings() (624ba30)
  - Support built-in quantization of SyncBatchNorm (1e8eceb)
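
One way to see why exporting encodings as unsigned integers loses no information: a signed b-bit affine grid with zero point z represents exactly the same real values as an unsigned grid with zero point z + 2^(b-1). A minimal sketch in plain Python with hypothetical function names. It makes no claim about how AIMET stores offsets or which sign convention its encoding files use.

```python
def grid(scale, zero_point, qmin, qmax):
    """Real values representable by an affine quantizer."""
    return [scale * (q - zero_point) for q in range(qmin, qmax + 1)]

def signed_to_unsigned(zero_point, bitwidth):
    """Shift a signed zero point so the same grid uses unsigned integers."""
    return zero_point + 2 ** (bitwidth - 1)

bitwidth, scale, signed_zp = 8, 0.125, -3
unsigned_zp = signed_to_unsigned(signed_zp, bitwidth)

# int8 range [-128, 127] versus uint8 range [0, 255].
signed_grid = grid(scale, signed_zp, -128, 127)
unsigned_grid = grid(scale, unsigned_zp, 0, 255)
```

Both lists contain the same 256 values in the same order, so a consumer of the exported encodings only needs to handle the unsigned form.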

Releases: qualcomm/aimet
