Version 2.29.0

aimetci released this 20 Apr 18:53

f5666dd

New Features
- ONNX
  - Add support for Qwen 2.5 VL in aimet-onnx (f256686)
- Torch
  - Support OOTB quantization of nn.MultiheadAttention (4d19f47)
  - Support OOTB quantization of Qwen 3.5 normalization layers (01b912f)
  - Support OOTB quantization of InternVL GELU (c5f65b7)
Bug Fixes and Improvements
- Common
  - Make export_int32_bias default to True if encoding_version >= 2.0.0 (22876ca)
- ONNX
  - Optimize QDQ latency for fp16 models (c817a17)
  - Support pattern matching LayerNormalization without bias (84f880a)
  - Make from_onnx_export ignore unloadable encodings by default (1b50727)
  - Enable loading models with redundant back-to-back QDQ using from_onnx_qdq (0f2be91)
  - Skip folding BatchNormalization when the Conv layer has shared weights (8f552b7)
  - Fix bug in standalone BatchNormalization fold with shared tensors (eb7ae4b)
- Torch
  - Disable activation quantizers for re-used stateless nn.Modules (8f552b7)
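
For readers unfamiliar with why BatchNormalization folding interacts badly with shared weights, here is a minimal pure-Python sketch of the standard folding arithmetic. It is not AIMET's implementation: the function names are hypothetical, and a per-channel affine layer stands in for a Conv to keep the example short.

```python
import math

EPS = 1e-5  # assumed BatchNormalization epsilon, for illustration only


def affine(x, w, b):
    """Per-channel affine layer (hypothetical stand-in for a 1x1 Conv)."""
    return [wi * xi + bi for wi, xi, bi in zip(w, x, b)]


def batch_norm(y, gamma, beta, mean, var):
    """Inference-mode BatchNormalization using fixed running statistics."""
    return [
        g * (yi - m) / math.sqrt(v + EPS) + bt
        for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)
    ]


def fold_bn(w, b, gamma, beta, mean, var):
    """Return weights and bias that absorb the BatchNormalization."""
    scale = [g / math.sqrt(v + EPS) for g, v in zip(gamma, var)]
    w_folded = [wi * s for wi, s in zip(w, scale)]
    b_folded = [(bi - m) * s + bt for bi, m, s, bt in zip(b, mean, scale, beta)]
    return w_folded, b_folded


x = [1.0, -2.0]
w, b = [0.5, 2.0], [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.2, -0.1]
mean, var = [0.3, -1.0], [4.0, 0.25]

# Folding preserves the output of the layer that is followed by BatchNormalization.
reference = batch_norm(affine(x, w, b), gamma, beta, mean, var)
w_folded, b_folded = fold_bn(w, b, gamma, beta, mean, var)
folded = affine(x, w_folded, b_folded)

# A second layer reuses the same weights but has no BatchNormalization after it.
# If the shared weights were overwritten by the fold, its output would change.
other_bias = [0.0, 0.0]
other_before = affine(x, w, other_bias)
other_after = affine(x, w_folded, other_bias)

print("fold preserves output:", all(math.isclose(a, r) for a, r in zip(folded, reference)))
print("shared consumer changed:", other_before != other_after)
```

The second check shows the hazard: rewriting a weight tensor in place is only safe when the folded layer is its sole consumer, which is consistent with skipping the fold when weights are shared.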
